Closed Bug 94176 Opened 23 years ago Closed 22 years ago

Text copied from table with multiple rows has no CR/LF

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: justincwatt, Assigned: mozeditor)

References

()

Details

(Keywords: regression, Whiteboard: [plaintext] fixinhand edt_x3)

Attachments

(1 file, 3 obsolete files)

From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.1) Gecko/20010607 Netscape6/6.1b1 BuildID: 2001080110 As described on the following page: http://www.unc.edu/~jwatt/tablecopyproblem.html ...when text is copied out of a table with multiple rows and into a text editor (or email), there is no CR/LF (carriage return/line feed) between each row, therefore the table data is copied out of the table as one single line of text. Reproducible: Always Steps to Reproduce: 1. Copy text from table on http://www.unc.edu/~jwatt/tablecopyproblem.html 2. Paste in Notepad (or similar text editor) with auto word-wrap off Actual Results: Text from multiple table rows is pasted as one (infinitely) lone line of text Expected Results: There should be a CRLF after each row in the paste text
sounds like a serializer issue in Navigator; reassign
Assignee: mjudge → anthonyd
Component: Selection → DOM to Text Conversion
QA Contact: tpreston → sujay
so the request would be that a cr/lf be inserted for every <tr>? SO what would be the expected behavior for nested tables?
Severity: normal → enhancement
Priority: -- → P4
Whiteboard: [RFE][plaintext]
Target Milestone: --- → Future
This is a regression. We used to append newlines at the end of a table row. I wonder if this is another regression from the noxif landing? Cc'ing Ben and Daniel since they worked on the original code and probably want to know about the regression. Beth: for nested tables, it won't look like the original. Nobody ever had time to implement really smart table serializing code, though we all thought it would be nice to do some day. But we did have it looking reasonable for simple tables, at one point.
Keywords: regression
In nsPlaintextSerializer::DoCloseContainer: else if ((type == eHTMLTag_tr) || (type == eHTMLTag_li) || (type == eHTMLTag_dt)) { // Items that should always end a line, but get no more whitespace EnsureVerticalSpace(0); } So either EnsureVerticalSpace has broken so that it doesn't actually end the line any more, or somehow we're not getting to that point (hard to see how that could be, though, it's the first clause in DoCloseContainer after the check for body or html). Ben, Daniel, any idea whether there have been recent regressions in EnsureVerticalSpace(0)?
Status: UNCONFIRMED → NEW
Ever confirmed: true
> Ben, [...] any idea whether there have been recent regressions > in EnsureVerticalSpace(0)? No. Did you try to add a |printf| to that |if|-block?
I was wondering how it would look, for example if I have a table that is 1x2, each cell has text and the 2nd cell has a nested table in it, would that nested table be indented in regard to the content of the cell? |-----|----------------| |one | two | | | |------------| | | | |more | stuff| | | | |------------| | |-----|----------------| so would the paste look like this? one two more stuff or would it look like this? one two more stuff
The algorithm that we had before the regression inserted a newline after every tr and between tables (whether they were nested or not). I'm a little confused about your example since the border goes around "more" and "stuff" but not around "two", and to get that I think we'd need three levels of table nesting. But this is somewhat academic -- the issue is that if we don't put any newlines into tables when we convert them, we'll get really long lines and all tables (with more than one row) will be guaranteed hard to read. If we do put newlines in (as we used to before the regression) then simple tables will be easy to read, and complex ones will be no worse. If we want complex tables to be easy to read as well, then we need to make it a priority and put some manpower on it. (I'd estimate two weeks for something halfway decent, more to do proper alignment on pages that use tables for formatting).
let me code it out for you: <table> <tr> <td>one</td> <td>two <table> <tr> <td>more</td> <td>stuff</td> </tr> </table> </td> </tr> </table> as for the extra added work for complex tables, I would suggest adding a new bug and push it out to future.
Another related problem... (related to Composer) When I make a generic 2x2 table in Composer, the generated source looks like this (pay attention to the <BR>'s: <table cellpadding="2" cellspacing="2" border="1" width="100%"> <tbody> <tr> <td valign="Top">one<br> </td> <td valign="Top">two<br> </td> </tr> <tr> <td valign="Top">three<br> </td> <td valign="Top">four<br> </td> </tr> </tbody> </table> The automatically generated <BR> tags after the data in each <TD></TD> makes the text copied out of the table look like this: one two three four Yuck! Does this deserve a Composer-related bug? I wonder if the text copied out of the table above will look like the following when this bug (bug 94176) is fixed ? one two three four Also, why is there an initial space before the "one" ?
IMO: the editor adding those br tags is a bug and we shouldn't do that, but that's a separate issue, not part of this bug. I'm not sure if Joe has a bug on that or not. Offhand, I would expect Beth's sample to output as: one two more stuff because the nested <table> tag is a block element and would cause an EnsureVerticalSpace when it opens. And in fact, that's pretty close to what happens now -- the example doesn't trigger this bug, because there's no <tr> between table rows; the only <tr> which should show up is complicated by a <table> tag which triggers a newline all by itself. I see an extra space between one and two, which I expect is covered by another bug (I know we have several bugs on extra spaces showing up) but otherwise the output is what I predicted above.
The bug on being smarter about tables was filed long ago, and is bug 18012, Support Tables in Plaintext Output.
--> brade
Assignee: anthonyd → brade
I think this is fixed; resolve so QA can test.
Status: NEW → RESOLVED
Closed: 23 years ago
OS: Windows 2000 → All
Hardware: PC → All
Resolution: --- → FIXED
Target Milestone: Future → mozilla0.9.3
As far as I can tell, this bug is still busted in Mozilla 0.9.4, build: 2001091303 and the most recent nightly build: 2001100403. For example, when text is copied out of this simple html table: <html> <body> <table> <tr> <td>1</td> <td>2</td> </tr><tr> <td>3</td> <td>4</td> </tr> </table> </body> </html> it looks like this: 1 2 3 4 and it should look like this: 1 2 3 4 I believe this bug needs to be re-opened, unless I am missing something.
reopen based on recent testing (comments above)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
-->peterv
Assignee: brade → harishd
Status: REOPENED → NEW
Really reassigning.
Assignee: harishd → peterv
Target Milestone: mozilla0.9.3 → ---
Blocks: 103522
Severity: enhancement → normal
Priority: P4 → --
Whiteboard: [RFE][plaintext] → [plaintext]
Dmose was just complaining about this on #mozilla. And I've been hitting it a lot trying to copy discussions from chatzilla and paste them into plaintext editors. I'm wasting a lot of time re-formatting stuff that we screwed up at paste time. I looked into it a little to see if the fix was obvious. It turns out this regressed when the "copy encoding" stuff was added, because now <tr> and <td> are no longer passed along to nsPlaintextSerializer, so the serializer doesn't know to do the right thing with them. Adding nsHTMLAtoms::table, nsHTMLAtoms::tr, nsHTMLAtoms::th, and nsHTMLAtoms::td to the list of tags in nsHTMLCopyEncoder::IncludeInContext fixes table copy, but has the side effect that now text copied from inside one table cell (e.g. from the main part of a page which uses tables for formatting) appends a newline after the end (because the tr/table tags are included and the serializer sees a block node closing). We need a way of "including in context" only when the selection *spans* one of these nodes. Joe wrote the copy encoder stuff: Joe, how do we do that? Is that what the "promoted range"/"promoted point" stuff is about? (I'm not sure what they do.) I'll take this bug for now since it's clear that no one else plans to look at it and it's making a lot of people's lives miserable. But I need some guidance from Joe on the code.
Assignee: peterv → akkana
Haven't had much luck syncing with Joe. Joe, I think this is basically the same issue as the extra line breaks in text copied from pre blocks -- in both cases, we really need to figure out whether the selection crosses a block boundary as opposed to the selection living wholly inside a block boundary. I'm willing to put in some work on this since it's sometimes a major impediment to getting work done, but I don't understand how the current code works. Can you offer some pointers?
Assignee: akkana → jfrancis
I'm kinda surprised that the current code doesn't do the right thing here. I just made a table and removed the trailing <br>s in source mode, and expiremented. I see the bug if I select all of the table content by drag- selecting from *inside* the table. But I get the desired behavior if I select- all. The weird thing is that I thought the html stream we put into the clipboard should be identical in those two cases, and only the special data flavor that indicated how "deep" in the structure the real selection endpoints were would be different. Since that latter structure is only used in Composers paste code, and not by the serializer, I would think these two cases would get identical results in the serialier. But they don't. Akkana, I can't really help you unless I just investigate this bug myself. It doesn't make sense to me. The tr's should be getting through.
Target Milestone: --- → mozilla0.9.9
--> myself, after discussing with Nisheeth.
Assignee: jfrancis → tmutreja
Attached patch Patch to fix it... (obsolete) (deleted) — Splinter Review
Akkana, as you mentioned above, changes in IncludeInContext() are causing the regression by adding extra new lines but there seems to be no other way to catch these tags in Serializer. In the patch I'm trying to work around the addition of these new lines.
Comment on attachment 69655 [details] [diff] [review] Patch to fix it... This does fix the problem of the lacking newlines after <tr>, and looks like a solid fix to the problem, so I'm giving it r=akkana. Ideally I hope jfrancis can take a look at the changes to nsDocumentEncoder.cpp, since he's the expert on what might break when those lists change (but my guess is that it's fine when taken with the rest of this patch). The !mAtFirstColumn added for <td> nodes unfortunately isn't doing its job. I made a page that looked like this: <table> <tr><td>one<td>two<td>three<td>four <tr><td>a <td>b <td>c <td>d </table> and the first line came out: onetwothreefour That's true before this patch, too, so it shouldn't block acceptance of the patch, but it remains a problem to be fixed. (I don't know why the patch didn't fix it -- seems like it should.) I also noticed another problem (unrelated, probably already covered by a bug somewhere else): copy the html source beppe gave, and paste into an external plaintext app (I did this in order to create an html file for testing) -- you lose the initial whitespace from most of the lines. I'm not sure if that's a new problem or not (but it wasn't caused by this patch -- happens even without the patch).
Attachment #69655 - Flags: review+
Can you please add a comment to the new member variable, describing its purpose? There are so many variables of this kind already that I think it's needed. [OT] > you lose the initial whitespace from most of the lines. I know that I once filed a bug about that in textareas, not sure about it's status, though. I see the same behaviour you describe, but not in textareas anymore.
I second Ben's request for a comment describing the new member variable. Is there a bug for the missing whitespace between table cells? The other missing whitespace is probably (misbehaving?) whitespace compression I would guess.
Attached patch Incorporating above suggestions. (obsolete) (deleted) — Splinter Review
I see the reason why mAtFirstColumn is not working. It's because we are not manipulating it to suit our requirements. So I'm replacing it with a check for "mCurrentLine.IsEmpty()". That solves our purpose. Also included the description of new member variable in this patch. Akkana, would you please explain me about which white space are you talking here: "copy the html source beppe gave, and paste into an external plaintext app (I did this in order to create an html file for testing) -- you lose the initial whitespace from most of the lines." I tried with the testcase provided by beppe and could not understand this properly.
Comment on attachment 69841 [details] [diff] [review] Incorporating above suggestions. Ah, so mLineBreakIsDue is kind of a lazy line break, too avoid extra ending lines (which is a problem right now)... Have you tested it in mail too (sending a table as plain text), where the formatting flags are a little different. As for mAtFirstColumn it's only used when doing preformatted output. In the other case we don't wrap until later so we don't know which column text will end up at. mCurrentLine is not always used either. For some settings of the flags it's bypassed and the text is written directly to the output string/stream.
Attached patch Modified to incorporate Daniel's sugestions. (obsolete) (deleted) — Splinter Review
Thanks Daniel, Included a new member variable |mIsFirstTD| to nsPlainTextSerializer. It keeps a track if we are handling first <TD> of a <TR> or <TABLE>. If <TD> is not the first one, we skip adding a new line character to the current text. Using a separate member variable makes it independent of any specific flag setting.
Re the plaintext paste problems: look at beppe's comment #8. Copy that whole comment. Paste into a plaintext window. It pastes like this: ---- let me code it out for you: <table> <tr> <td>one</td> <td>two <table> <tr> <td>more</td> <td>stuff</td> </tr> </table> </td> </tr> </table> as for the extra added work for complex tables, I would suggest adding a new bug and push it out to future. ---- Notice that all the initial whitespace is lost. I think this is covered by ftang's bug that jfrancis just took, summary something like "plaintext copy/paste is very bad".
Status: NEW → ASSIGNED
Whiteboard: [plaintext] → [plaintext][needs sr=]
Comment on attachment 70063 [details] [diff] [review] Modified to incorporate Daniel's sugestions. This is a no-go. You can't make this change to IncludeInContext(). It will break html paste: copying anything in a table would cause a whole table to be pasted in Composer.
Attachment #70063 - Attachment is obsolete: true
Attachment #70063 - Flags: needs-work+
What is the expected behavior of HTML paste? Should we not get a table pasted on composer? I think we should.
Copying a word that happens to be in a table should result in a table being pasted. contributors to this discussion may be interested in reviewing/testing my patch in bug 98572
That was SUPPOSED to read: Copying a word that happens to be in a table should NOT result in a table being pasted.
The changes to nsPlainTextSerializer looks ok though. I assume you have run the tests (they are part of the tinderbox tests so failing one of them breaks tinderbox), and sent mail with it, verifying that it looks ok. If so, r=bratell@lysator.liu.se for the serializer. You can use that r= in coming patches as long as nsPlainTextSerializer stays the same. I'm very thankful for all your work. There are a lot of small things to fix, that I've never gotten to do.
Akkana, I think you're comment #19 is correct. I have to change the copy code to detect block crossings. Unfortunately, before I can do that I have to import a bunch of ws code that I originally wrote for the editor. The reason is you could be "trivially" crossing a block. For example, you could drag select a bunch of text that lives between two lists. But if you do your drag just right (just wrong?) you could actually have the selection starting at the very end of the prior list, and ending at the very beginning of the following list. Then, if I detect block crossings at copy time, I will force two single item lists (with empty items) to be output: one before your copied text and one after. This is hardly what the user wanted. I'm in the process of working on this issue this week. Once it is fixed the serializer should see table cells and rows, etc, for selections that are inside of tables and span table structure. Assigning to myself pending copy work. If there remains serialzer work after I improve copy I will assign back at that point.
Assignee: tmutreja → jfrancis
Status: ASSIGNED → NEW
*** Bug 125658 has been marked as a duplicate of this bug. ***
for the moment putting in moz1.1. hoping to bring it back but cant right now due to nervousness by the bug gods.
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.9 → mozilla1.1
removing myself from the cc list
*** Bug 134776 has been marked as a duplicate of this bug. ***
From bug 134776, this occurs in Chatzilla and is very troublesome to copy and paste discussions of bugs, etc. Hope this will increase priority/effort a little.
*** Bug 136830 has been marked as a duplicate of this bug. ***
*** Bug 103522 has been marked as a duplicate of this bug. ***
*** Bug 153858 has been marked as a duplicate of this bug. ***
The trunk is the wave of the future!
Target Milestone: mozilla1.1alpha → mozilla1.1beta
Heh. I looked into this some more and discovered what is really going on. The problem is that the parser is dropping partial table structure on the floor. The docencoder is indeed producing the tr's and td's, and clipboard is receiving them. When the clipboard code tries to convert this to plaintext, it calls the parser. The parser just drops table elements on the floor if it doesnt already have a table on it's tag stack. So we get just the contents of the td's. the parser needs to not drop the tr's/td's. Perhaps there should be a seperate mode for the parser where it understands it is just parsing a fragment, instead of a document. The clipboard code could then call the parser in this mode. Over to parser folks. CC'ing pinkerton for clipboard.
Component: DOM to Text Conversion → Parser
reassigning
Assignee: jfrancis → harishd
Status: ASSIGNED → NEW
QA Contact: sujay → moied
Boris was thinking about the fragment stuff at some point I believe, adding him to Cc.
back to me cuz i have a fix. the patch i am attaching depends on the patch in bug 159842.
Assignee: harishd → jfrancis
Attachment #69655 - Attachment is obsolete: true
Attachment #69841 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Whiteboard: [plaintext][needs sr=] → [plaintext]
Target Milestone: mozilla1.1beta → M1
Whiteboard: [plaintext] → [plaintext] fixinhand
*** Bug 162488 has been marked as a duplicate of this bug. ***
*** Bug 141862 has been marked as a duplicate of this bug. ***
I don't think M1 is the correct milestone...
Target Milestone: M1 → ---
resetting milestone; M1 is correct milestone if you want jfrancis to fix it soon.
Target Milestone: --- → M1
Oh, sorry. But it would be good if everyone stuck to the same meanings of milestone fields and other fields. Most(?) teams uses milestone to set the milestone to aim for, and priority to fine tune the order in which to fix the bugs in a milestone. Also, we have people estimating glide paths for releases etc. based on number of bugs in milestones. Using nonstandard milestones screws these estimates.
Comment on attachment 93106 [details] [diff] [review] widget patch to tell parser we are parsing fragment sr=kin@netscape.com
Attachment #93106 - Flags: superreview+
Depends on: 159842
fix landed on trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago22 years ago
Resolution: --- → FIXED
Please verify this bug
Whiteboard: [plaintext] fixinhand → [plaintext] fixinhand edt_x3
using the trunk build from 2003040708 on win2K. I created a simple table, viewed it in the browser, set my mail compose to be plaintext. And then copied & pasted the data from the browser window to the mail compose window. testing a simple table: <table> <tr> <td>one</td> <td>two</td> </tr> <tr> <td>more</td> <td>stuff</td> </tr> </table> results in this output: one two more stuff If I remove the content 'more' and do the same copy/paste - the ws is preserved.
Status: RESOLVED → VERIFIED
See also bug 236546, same bug when copying table cells/rows/columns with Ctrl.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: