94176 - Text copied from table with multiple rows has no CR/LF

Reporter

Description

•

23 years ago

From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.1) Gecko/20010607 Netscape6/6.1b1 BuildID: 2001080110 As described on the following page: http://www.unc.edu/~jwatt/tablecopyproblem.html ...when text is copied out of a table with multiple rows and into a text editor (or email), there is no CR/LF (carriage return/line feed) between each row, therefore the table data is copied out of the table as one single line of text. Reproducible: Always Steps to Reproduce: 1. Copy text from table on http://www.unc.edu/~jwatt/tablecopyproblem.html 2. Paste in Notepad (or similar text editor) with auto word-wrap off Actual Results: Text from multiple table rows is pasted as one (infinitely) lone line of text Expected Results: There should be a CRLF after each row in the paste text

Kathleen :Brade

Comment 1

•

23 years ago

sounds like a serializer issue in Navigator; reassign

Assignee: mjudge → anthonyd

Component: Selection → DOM to Text Conversion

QA Contact: tpreston → sujay

rubydoo123

Comment 2

•

23 years ago

so the request would be that a cr/lf be inserted for every <tr>? SO what would be the expected behavior for nested tables?

Severity: normal → enhancement

Priority: -- → P4

Whiteboard: [RFE][plaintext]

Target Milestone: --- → Future

Akkana Peck

Comment 3

•

23 years ago

This is a regression. We used to append newlines at the end of a table row. I wonder if this is another regression from the noxif landing? Cc'ing Ben and Daniel since they worked on the original code and probably want to know about the regression. Beth: for nested tables, it won't look like the original. Nobody ever had time to implement really smart table serializing code, though we all thought it would be nice to do some day. But we did have it looking reasonable for simple tables, at one point.

Keywords: regression

Akkana Peck

Comment 4

•

23 years ago

In nsPlaintextSerializer::DoCloseContainer: else if ((type == eHTMLTag_tr) || (type == eHTMLTag_li) || (type == eHTMLTag_dt)) { // Items that should always end a line, but get no more whitespace EnsureVerticalSpace(0); } So either EnsureVerticalSpace has broken so that it doesn't actually end the line any more, or somehow we're not getting to that point (hard to see how that could be, though, it's the first clause in DoCloseContainer after the check for body or html). Ben, Daniel, any idea whether there have been recent regressions in EnsureVerticalSpace(0)?

Status: UNCONFIRMED → NEW

Ever confirmed: true

Ben Bucksch (:BenB)

Comment 5

•

23 years ago

> Ben, [...] any idea whether there have been recent regressions > in EnsureVerticalSpace(0)? No. Did you try to add a |printf| to that |if|-block?

rubydoo123

Comment 6

•

23 years ago

I was wondering how it would look, for example if I have a table that is 1x2, each cell has text and the 2nd cell has a nested table in it, would that nested table be indented in regard to the content of the cell? |-----|----------------| |one | two | | | |------------| | | | |more | stuff| | | | |------------| | |-----|----------------| so would the paste look like this? one two more stuff or would it look like this? one two more stuff

Akkana Peck

Comment 7

•

23 years ago

The algorithm that we had before the regression inserted a newline after every tr and between tables (whether they were nested or not). I'm a little confused about your example since the border goes around "more" and "stuff" but not around "two", and to get that I think we'd need three levels of table nesting. But this is somewhat academic -- the issue is that if we don't put any newlines into tables when we convert them, we'll get really long lines and all tables (with more than one row) will be guaranteed hard to read. If we do put newlines in (as we used to before the regression) then simple tables will be easy to read, and complex ones will be no worse. If we want complex tables to be easy to read as well, then we need to make it a priority and put some manpower on it. (I'd estimate two weeks for something halfway decent, more to do proper alignment on pages that use tables for formatting).

rubydoo123

Comment 8

•

23 years ago

let me code it out for you: <table> <tr> <td>one</td> <td>two <table> <tr> <td>more</td> <td>stuff</td> </tr> </table> </td> </tr> </table> as for the extra added work for complex tables, I would suggest adding a new bug and push it out to future.

Justin Watt

Reporter

Comment 9

•

23 years ago

Another related problem... (related to Composer) When I make a generic 2x2 table in Composer, the generated source looks like this (pay attention to the <BR>'s: <table cellpadding="2" cellspacing="2" border="1" width="100%"> <tbody> <tr> <td valign="Top">one<br> </td> <td valign="Top">two<br> </td> </tr> <tr> <td valign="Top">three<br> </td> <td valign="Top">four<br> </td> </tr> </tbody> </table> The automatically generated <BR> tags after the data in each <TD></TD> makes the text copied out of the table look like this: one two three four Yuck! Does this deserve a Composer-related bug? I wonder if the text copied out of the table above will look like the following when this bug (bug 94176) is fixed ? one two three four Also, why is there an initial space before the "one" ?

Akkana Peck

Comment 10

•

23 years ago

IMO: the editor adding those br tags is a bug and we shouldn't do that, but that's a separate issue, not part of this bug. I'm not sure if Joe has a bug on that or not. Offhand, I would expect Beth's sample to output as: one two more stuff because the nested <table> tag is a block element and would cause an EnsureVerticalSpace when it opens. And in fact, that's pretty close to what happens now -- the example doesn't trigger this bug, because there's no <tr> between table rows; the only <tr> which should show up is complicated by a <table> tag which triggers a newline all by itself. I see an extra space between one and two, which I expect is covered by another bug (I know we have several bugs on extra spaces showing up) but otherwise the output is what I predicted above.

Akkana Peck

Comment 11

•

23 years ago

The bug on being smarter about tables was filed long ago, and is bug 18012, Support Tables in Plaintext Output.

rubydoo123

Comment 12

•

23 years ago

--> brade

Assignee: anthonyd → brade

Kathleen :Brade

Comment 13

•

23 years ago

I think this is fixed; resolve so QA can test.

Status: NEW → RESOLVED

Closed: 23 years ago

OS: Windows 2000 → All

Hardware: PC → All

Resolution: --- → FIXED

Target Milestone: Future → mozilla0.9.3

Justin Watt

Reporter

Comment 14

•

23 years ago

As far as I can tell, this bug is still busted in Mozilla 0.9.4, build: 2001091303 and the most recent nightly build: 2001100403. For example, when text is copied out of this simple html table: <html> <body> <table> <tr> <td>1</td> <td>2</td> </tr><tr> <td>3</td> <td>4</td> </tr> </table> </body> </html> it looks like this: 1 2 3 4 and it should look like this: 1 2 3 4 I believe this bug needs to be re-opened, unless I am missing something.

Kathleen :Brade

Comment 15

•

23 years ago

reopen based on recent testing (comments above)

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Kathleen :Brade

Comment 16

•

23 years ago

-->peterv

Assignee: brade → harishd

Status: REOPENED → NEW

Christopher Hoess (gone)

Comment 17

•

23 years ago

Really reassigning.

Assignee: harishd → peterv

Target Milestone: mozilla0.9.3 → ---

Jesse Ruderman

Updated

•

23 years ago

Blocks: 103522

Heikki Toivonen (remove -bugzilla when emailing directly)

Updated

•

23 years ago

Severity: enhancement → normal

Priority: P4 → --

Whiteboard: [RFE][plaintext] → [plaintext]

Akkana Peck

Comment 18

•

23 years ago

Dmose was just complaining about this on #mozilla. And I've been hitting it a lot trying to copy discussions from chatzilla and paste them into plaintext editors. I'm wasting a lot of time re-formatting stuff that we screwed up at paste time. I looked into it a little to see if the fix was obvious. It turns out this regressed when the "copy encoding" stuff was added, because now <tr> and <td> are no longer passed along to nsPlaintextSerializer, so the serializer doesn't know to do the right thing with them. Adding nsHTMLAtoms::table, nsHTMLAtoms::tr, nsHTMLAtoms::th, and nsHTMLAtoms::td to the list of tags in nsHTMLCopyEncoder::IncludeInContext fixes table copy, but has the side effect that now text copied from inside one table cell (e.g. from the main part of a page which uses tables for formatting) appends a newline after the end (because the tr/table tags are included and the serializer sees a block node closing). We need a way of "including in context" only when the selection *spans* one of these nodes. Joe wrote the copy encoder stuff: Joe, how do we do that? Is that what the "promoted range"/"promoted point" stuff is about? (I'm not sure what they do.) I'll take this bug for now since it's clear that no one else plans to look at it and it's making a lot of people's lives miserable. But I need some guidance from Joe on the code.

Assignee: peterv → akkana

Akkana Peck

Comment 19

•

23 years ago

Haven't had much luck syncing with Joe. Joe, I think this is basically the same issue as the extra line breaks in text copied from pre blocks -- in both cases, we really need to figure out whether the selection crosses a block boundary as opposed to the selection living wholly inside a block boundary. I'm willing to put in some work on this since it's sometimes a major impediment to getting work done, but I don't understand how the current code works. Can you offer some pointers?

Assignee: akkana → jfrancis

Joe Francis

Assignee

Comment 20

•

23 years ago

I'm kinda surprised that the current code doesn't do the right thing here. I just made a table and removed the trailing <br>s in source mode, and expiremented. I see the bug if I select all of the table content by drag- selecting from *inside* the table. But I get the desired behavior if I select- all. The weird thing is that I thought the html stream we put into the clipboard should be identical in those two cases, and only the special data flavor that indicated how "deep" in the structure the real selection endpoints were would be different. Since that latter structure is only used in Composers paste code, and not by the serializer, I would think these two cases would get identical results in the serialier. But they don't. Akkana, I can't really help you unless I just investigate this bug myself. It doesn't make sense to me. The tr's should be getting through.

Target Milestone: --- → mozilla0.9.9

Tanu Mutreja

Comment 21

•

23 years ago

--> myself, after discussing with Nisheeth.

Assignee: jfrancis → tmutreja

Tanu Mutreja

Comment 22

•

23 years ago

Attached patch Patch to fix it... (obsolete) (deleted) — Details — Splinter Review

Akkana, as you mentioned above, changes in IncludeInContext() are causing the regression by adding extra new lines but there seems to be no other way to catch these tags in Serializer. In the patch I'm trying to work around the addition of these new lines.

Akkana Peck

Comment 23

•

23 years ago

Comment on attachment 69655 [details] [diff] [review] Patch to fix it... This does fix the problem of the lacking newlines after <tr>, and looks like a solid fix to the problem, so I'm giving it r=akkana. Ideally I hope jfrancis can take a look at the changes to nsDocumentEncoder.cpp, since he's the expert on what might break when those lists change (but my guess is that it's fine when taken with the rest of this patch). The !mAtFirstColumn added for <td> nodes unfortunately isn't doing its job. I made a page that looked like this: <table> <tr><td>one<td>two<td>three<td>four <tr><td>a <td>b <td>c <td>d </table> and the first line came out: onetwothreefour That's true before this patch, too, so it shouldn't block acceptance of the patch, but it remains a problem to be fixed. (I don't know why the patch didn't fix it -- seems like it should.) I also noticed another problem (unrelated, probably already covered by a bug somewhere else): copy the html source beppe gave, and paste into an external plaintext app (I did this in order to create an html file for testing) -- you lose the initial whitespace from most of the lines. I'm not sure if that's a new problem or not (but it wasn't caused by this patch -- happens even without the patch).

Attachment #69655 - Flags: review+

Ben Bucksch (:BenB)

Comment 24

•

23 years ago

Can you please add a comment to the new member variable, describing its purpose? There are so many variables of this kind already that I think it's needed. [OT] > you lose the initial whitespace from most of the lines. I know that I once filed a bug about that in textareas, not sure about it's status, though. I see the same behaviour you describe, but not in textareas anymore.

Daniel Bratell

Comment 25

•

23 years ago

I second Ben's request for a comment describing the new member variable. Is there a bug for the missing whitespace between table cells? The other missing whitespace is probably (misbehaving?) whitespace compression I would guess.

Tanu Mutreja

Comment 26

•

23 years ago

Attached patch Incorporating above suggestions. (obsolete) (deleted) — Details — Splinter Review

I see the reason why mAtFirstColumn is not working. It's because we are not manipulating it to suit our requirements. So I'm replacing it with a check for "mCurrentLine.IsEmpty()". That solves our purpose. Also included the description of new member variable in this patch. Akkana, would you please explain me about which white space are you talking here: "copy the html source beppe gave, and paste into an external plaintext app (I did this in order to create an html file for testing) -- you lose the initial whitespace from most of the lines." I tried with the testcase provided by beppe and could not understand this properly.

Daniel Bratell

Comment 27

•

23 years ago

Comment on attachment 69841 [details] [diff] [review] Incorporating above suggestions. Ah, so mLineBreakIsDue is kind of a lazy line break, too avoid extra ending lines (which is a problem right now)... Have you tested it in mail too (sending a table as plain text), where the formatting flags are a little different. As for mAtFirstColumn it's only used when doing preformatted output. In the other case we don't wrap until later so we don't know which column text will end up at. mCurrentLine is not always used either. For some settings of the flags it's bypassed and the text is written directly to the output string/stream.

Tanu Mutreja

Comment 28

•

23 years ago

Attached patch Modified to incorporate Daniel's sugestions. (obsolete) (deleted) — Details — Splinter Review

Thanks Daniel, Included a new member variable |mIsFirstTD| to nsPlainTextSerializer. It keeps a track if we are handling first <TD> of a <TR> or <TABLE>. If <TD> is not the first one, we skip adding a new line character to the current text. Using a separate member variable makes it independent of any specific flag setting.

Akkana Peck

Comment 29

•

23 years ago

Re the plaintext paste problems: look at beppe's comment #8. Copy that whole comment. Paste into a plaintext window. It pastes like this: ---- let me code it out for you: <table> <tr> <td>one</td> <td>two <table> <tr> <td>more</td> <td>stuff</td> </tr> </table> </td> </tr> </table> as for the extra added work for complex tables, I would suggest adding a new bug and push it out to future. ---- Notice that all the initial whitespace is lost. I think this is covered by ftang's bug that jfrancis just took, summary something like "plaintext copy/paste is very bad".

Tanu Mutreja

Updated

•

23 years ago

Status: NEW → ASSIGNED

Whiteboard: [plaintext] → [plaintext][needs sr=]

Joe Francis

Assignee

Comment 30

•

23 years ago

Comment on attachment 70063 [details] [diff] [review] Modified to incorporate Daniel's sugestions. This is a no-go. You can't make this change to IncludeInContext(). It will break html paste: copying anything in a table would cause a whole table to be pasted in Composer.

Attachment #70063 - Attachment is obsolete: true

Attachment #70063 - Flags: needs-work+

Tanu Mutreja

Comment 31

•

23 years ago

What is the expected behavior of HTML paste? Should we not get a table pasted on composer? I think we should.

Joe Francis

Assignee

Comment 32

•

23 years ago

Copying a word that happens to be in a table should result in a table being pasted. contributors to this discussion may be interested in reviewing/testing my patch in bug 98572

Joe Francis

Assignee

Comment 33

•

23 years ago

That was SUPPOSED to read: Copying a word that happens to be in a table should NOT result in a table being pasted.

Daniel Bratell

Comment 34

•

23 years ago

The changes to nsPlainTextSerializer looks ok though. I assume you have run the tests (they are part of the tinderbox tests so failing one of them breaks tinderbox), and sent mail with it, verifying that it looks ok. If so, r=bratell@lysator.liu.se for the serializer. You can use that r= in coming patches as long as nsPlainTextSerializer stays the same. I'm very thankful for all your work. There are a lot of small things to fix, that I've never gotten to do.

Joe Francis

Assignee

Comment 35

•

23 years ago

Akkana, I think you're comment #19 is correct. I have to change the copy code to detect block crossings. Unfortunately, before I can do that I have to import a bunch of ws code that I originally wrote for the editor. The reason is you could be "trivially" crossing a block. For example, you could drag select a bunch of text that lives between two lists. But if you do your drag just right (just wrong?) you could actually have the selection starting at the very end of the prior list, and ending at the very beginning of the following list. Then, if I detect block crossings at copy time, I will force two single item lists (with empty items) to be output: one before your copied text and one after. This is hardly what the user wanted. I'm in the process of working on this issue this week. Once it is fixed the serializer should see table cells and rows, etc, for selections that are inside of tables and span table structure. Assigning to myself pending copy work. If there remains serialzer work after I improve copy I will assign back at that point.

Assignee: tmutreja → jfrancis

Status: ASSIGNED → NEW

Tanu Mutreja

Comment 36

•

23 years ago

*** Bug 125658 has been marked as a duplicate of this bug. ***

Joe Francis

Assignee

Comment 37

•

23 years ago

for the moment putting in moz1.1. hoping to bring it back but cant right now due to nervousness by the bug gods.

Status: NEW → ASSIGNED

Target Milestone: mozilla0.9.9 → mozilla1.1

rubydoo123

Comment 38

•

23 years ago

removing myself from the cc list

Akkana Peck

Comment 39

•

23 years ago

*** Bug 134776 has been marked as a duplicate of this bug. ***

Brant Gurganus

Comment 40

•

23 years ago

From bug 134776, this occurs in Chatzilla and is very troublesome to copy and paste discussions of bugs, etc. Hope this will increase priority/effort a little.

Akkana Peck

Comment 41

•

23 years ago

*** Bug 136830 has been marked as a duplicate of this bug. ***

Akkana Peck

Comment 42

•

23 years ago

*** Bug 103522 has been marked as a duplicate of this bug. ***

Alfonso Martinez

Comment 43

•

22 years ago

*** Bug 153858 has been marked as a duplicate of this bug. ***

Joe Francis

Assignee

Comment 44

•

22 years ago

The trunk is the wave of the future!

Target Milestone: mozilla1.1alpha → mozilla1.1beta

Joe Francis

Assignee

Comment 45

•

22 years ago

Heh. I looked into this some more and discovered what is really going on. The problem is that the parser is dropping partial table structure on the floor. The docencoder is indeed producing the tr's and td's, and clipboard is receiving them. When the clipboard code tries to convert this to plaintext, it calls the parser. The parser just drops table elements on the floor if it doesnt already have a table on it's tag stack. So we get just the contents of the td's. the parser needs to not drop the tr's/td's. Perhaps there should be a seperate mode for the parser where it understands it is just parsing a fragment, instead of a document. The clipboard code could then call the parser in this mode. Over to parser folks. CC'ing pinkerton for clipboard.

Component: DOM to Text Conversion → Parser

Joe Francis

Assignee

Comment 46

•

22 years ago

reassigning

Assignee: jfrancis → harishd

Status: ASSIGNED → NEW

QA Contact: sujay → moied

Heikki Toivonen (remove -bugzilla when emailing directly)

Comment 47

•

22 years ago

Boris was thinking about the fragment stuff at some point I believe, adding him to Cc.

Joe Francis

Assignee

Comment 48

•

22 years ago

back to me cuz i have a fix. the patch i am attaching depends on the patch in bug 159842.

Assignee: harishd → jfrancis

Joe Francis

Assignee

Comment 49

•

22 years ago

Attached patch widget patch to tell parser we are parsing fragment (deleted) — Details — Splinter Review

Attachment #69655 - Attachment is obsolete: true

Attachment #69841 - Attachment is obsolete: true

Joe Francis

Assignee

Updated

•

22 years ago

Status: NEW → ASSIGNED

Whiteboard: [plaintext][needs sr=] → [plaintext]

Target Milestone: mozilla1.1beta → M1

Joe Francis

Assignee

Updated

•

22 years ago

Whiteboard: [plaintext] → [plaintext] fixinhand

Alfonso Martinez

Comment 50

•

22 years ago

*** Bug 162488 has been marked as a duplicate of this bug. ***

Alfonso Martinez

Comment 51

•

22 years ago

*** Bug 141862 has been marked as a duplicate of this bug. ***

Heikki Toivonen (remove -bugzilla when emailing directly)

Comment 52

•

22 years ago

I don't think M1 is the correct milestone...

Target Milestone: M1 → ---

Kathleen :Brade

Comment 53

•

22 years ago

resetting milestone; M1 is correct milestone if you want jfrancis to fix it soon.

Target Milestone: --- → M1

Heikki Toivonen (remove -bugzilla when emailing directly)

Comment 54

•

22 years ago

Oh, sorry. But it would be good if everyone stuck to the same meanings of milestone fields and other fields. Most(?) teams uses milestone to set the milestone to aim for, and priority to fine tune the order in which to fix the bugs in a milestone. Also, we have people estimating glide paths for releases etc. based on number of bugs in milestones. Using nonstandard milestones screws these estimates.

kinmoz

Comment 55

•

22 years ago

Comment on attachment 93106 [details] [diff] [review] widget patch to tell parser we are parsing fragment sr=kin@netscape.com

Attachment #93106 - Flags: superreview+

Joe Francis

Assignee

Updated

•

22 years ago

Depends on: 159842

Joe Francis

Assignee

Comment 56

•

22 years ago

fix landed on trunk

Status: ASSIGNED → RESOLVED

Closed: 23 years ago → 22 years ago

Resolution: --- → FIXED

Charles Rosendahl

Comment 57

•

22 years ago

Please verify this bug

Whiteboard: [plaintext] fixinhand → [plaintext] fixinhand edt_x3

rubydoo123

Comment 58

•

22 years ago

using the trunk build from 2003040708 on win2K. I created a simple table, viewed it in the browser, set my mail compose to be plaintext. And then copied & pasted the data from the browser window to the mail compose window. testing a simple table: <table> <tr> <td>one</td> <td>two</td> </tr> <tr> <td>more</td> <td>stuff</td> </tr> </table> results in this output: one two more stuff If I remove the content 'more' and do the same copy/paste - the ws is preserved.

Status: RESOLVED → VERIFIED

Jesse Ruderman

Comment 59

•

21 years ago

See also bug 236546, same bug when copying table cells/rows/columns with Ctrl.

Patch to fix it... 23 years ago Tanu Mutreja (deleted), patch	akkzilla : review+	Details \| Diff \| Splinter Review
Incorporating above suggestions. 23 years ago Tanu Mutreja (deleted), patch		Details \| Diff \| Splinter Review
Modified to incorporate Daniel's sugestions. 23 years ago Tanu Mutreja (deleted), patch		Details \| Diff \| Splinter Review
widget patch to tell parser we are parsing fragment 22 years ago Joe Francis (deleted), patch	kinmoz : superreview+	Details \| Diff \| Splinter Review