Closed
Bug 94176
Opened 23 years ago
Closed 22 years ago
Text copied from table with multiple rows has no CR/LF
Categories
(Core :: DOM: HTML Parser, defect)
Core
DOM: HTML Parser
Tracking
()
VERIFIED
FIXED
M1
People
(Reporter: justincwatt, Assigned: mozeditor)
References
()
Details
(Keywords: regression, Whiteboard: [plaintext] fixinhand edt_x3)
Attachments
(1 file, 3 obsolete files)
(deleted),
patch
|
kinmoz
:
superreview+
|
Details | Diff | Splinter Review |
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.1)
Gecko/20010607 Netscape6/6.1b1
BuildID: 2001080110
As described on the following page:
http://www.unc.edu/~jwatt/tablecopyproblem.html
...when text is copied out of a table with multiple rows and into a text editor
(or email), there is no CR/LF (carriage return/line feed) between each row,
therefore the table data is copied out of the table as one single line of text.
Reproducible: Always
Steps to Reproduce:
1. Copy text from table on http://www.unc.edu/~jwatt/tablecopyproblem.html
2. Paste in Notepad (or similar text editor) with auto word-wrap off
Actual Results: Text from multiple table rows is pasted as one (infinitely)
lone line of text
Expected Results: There should be a CRLF after each row in the paste text
Comment 1•23 years ago
|
||
sounds like a serializer issue in Navigator; reassign
Assignee: mjudge → anthonyd
Component: Selection → DOM to Text Conversion
QA Contact: tpreston → sujay
Comment 2•23 years ago
|
||
so the request would be that a cr/lf be inserted for every <tr>? SO what would
be the expected behavior for nested tables?
Severity: normal → enhancement
Priority: -- → P4
Whiteboard: [RFE][plaintext]
Target Milestone: --- → Future
Comment 3•23 years ago
|
||
This is a regression. We used to append newlines at the end of a table row. I
wonder if this is another regression from the noxif landing? Cc'ing Ben and
Daniel since they worked on the original code and probably want to know about
the regression.
Beth: for nested tables, it won't look like the original. Nobody ever had time
to implement really smart table serializing code, though we all thought it would
be nice to do some day. But we did have it looking reasonable for simple
tables, at one point.
Keywords: regression
Comment 4•23 years ago
|
||
In nsPlaintextSerializer::DoCloseContainer:
else if ((type == eHTMLTag_tr) ||
(type == eHTMLTag_li) ||
(type == eHTMLTag_dt)) {
// Items that should always end a line, but get no more whitespace
EnsureVerticalSpace(0);
}
So either EnsureVerticalSpace has broken so that it doesn't actually end the
line any more, or somehow we're not getting to that point (hard to see how that
could be, though, it's the first clause in DoCloseContainer after the check for
body or html). Ben, Daniel, any idea whether there have been recent regressions
in EnsureVerticalSpace(0)?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 5•23 years ago
|
||
> Ben, [...] any idea whether there have been recent regressions
> in EnsureVerticalSpace(0)?
No.
Did you try to add a |printf| to that |if|-block?
Comment 6•23 years ago
|
||
I was wondering how it would look, for example if I have a table that is 1x2,
each cell has text and the 2nd cell has a nested table in it, would that nested
table be indented in regard to the content of the cell?
|-----|----------------|
|one | two |
| | |------------| |
| | |more | stuff| |
| | |------------| |
|-----|----------------|
so would the paste look like this?
one
two
more stuff
or would it look like this?
one
two more stuff
Comment 7•23 years ago
|
||
The algorithm that we had before the regression inserted a newline after every
tr and between tables (whether they were nested or not).
I'm a little confused about your example since the border goes around "more" and
"stuff" but not around "two", and to get that I think we'd need three levels of
table nesting.
But this is somewhat academic -- the issue is that if we don't put any newlines
into tables when we convert them, we'll get really long lines and all tables
(with more than one row) will be guaranteed hard to read. If we do put newlines
in (as we used to before the regression) then simple tables will be easy to
read, and complex ones will be no worse. If we want complex tables to be easy
to read as well, then we need to make it a priority and put some manpower on it.
(I'd estimate two weeks for something halfway decent, more to do proper
alignment on pages that use tables for formatting).
Comment 8•23 years ago
|
||
let me code it out for you:
<table>
<tr>
<td>one</td>
<td>two
<table>
<tr>
<td>more</td>
<td>stuff</td>
</tr>
</table>
</td>
</tr>
</table>
as for the extra added work for complex tables, I would suggest adding a new bug
and push it out to future.
Reporter | ||
Comment 9•23 years ago
|
||
Another related problem... (related to Composer)
When I make a generic 2x2 table in Composer, the generated source looks like
this (pay attention to the <BR>'s:
<table cellpadding="2" cellspacing="2" border="1" width="100%">
<tbody>
<tr>
<td valign="Top">one<br>
</td>
<td valign="Top">two<br>
</td>
</tr>
<tr>
<td valign="Top">three<br>
</td>
<td valign="Top">four<br>
</td>
</tr>
</tbody>
</table>
The automatically generated <BR> tags after the data in each <TD></TD> makes the
text copied out of the table look like this:
one
two
three
four
Yuck! Does this deserve a Composer-related bug? I wonder if the text copied out
of the table above will look like the following when this bug (bug 94176) is fixed ?
one
two
three
four
Also, why is there an initial space before the "one" ?
Comment 10•23 years ago
|
||
IMO: the editor adding those br tags is a bug and we shouldn't do that, but
that's a separate issue, not part of this bug. I'm not sure if Joe has a bug on
that or not.
Offhand, I would expect Beth's sample to output as:
one two
more stuff
because the nested <table> tag is a block element and would cause an
EnsureVerticalSpace when it opens.
And in fact, that's pretty close to what happens now -- the example doesn't
trigger this bug, because there's no <tr> between table rows; the only <tr>
which should show up is complicated by a <table> tag which triggers a newline
all by itself. I see an extra space between one and two, which I expect is
covered by another bug (I know we have several bugs on extra spaces showing up)
but otherwise the output is what I predicted above.
Comment 11•23 years ago
|
||
The bug on being smarter about tables was filed long ago, and is bug 18012,
Support Tables in Plaintext Output.
Comment 13•23 years ago
|
||
I think this is fixed; resolve so QA can test.
Status: NEW → RESOLVED
Closed: 23 years ago
OS: Windows 2000 → All
Hardware: PC → All
Resolution: --- → FIXED
Target Milestone: Future → mozilla0.9.3
Reporter | ||
Comment 14•23 years ago
|
||
As far as I can tell, this bug is still busted in Mozilla 0.9.4, build:
2001091303 and the most recent nightly build: 2001100403.
For example, when text is copied out of this simple html table:
<html>
<body>
<table>
<tr>
<td>1</td>
<td>2</td>
</tr><tr>
<td>3</td>
<td>4</td>
</tr>
</table>
</body>
</html>
it looks like this:
1 2 3 4
and it should look like this:
1 2
3 4
I believe this bug needs to be re-opened, unless I am missing something.
Comment 15•23 years ago
|
||
reopen based on recent testing (comments above)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 17•23 years ago
|
||
Really reassigning.
Assignee: harishd → peterv
Target Milestone: mozilla0.9.3 → ---
Updated•23 years ago
|
Severity: enhancement → normal
Priority: P4 → --
Whiteboard: [RFE][plaintext] → [plaintext]
Comment 18•23 years ago
|
||
Dmose was just complaining about this on #mozilla. And I've been hitting it a
lot trying to copy discussions from chatzilla and paste them into plaintext
editors. I'm wasting a lot of time re-formatting stuff that we screwed up at
paste time.
I looked into it a little to see if the fix was obvious. It turns out this
regressed when the "copy encoding" stuff was added, because now <tr> and <td>
are no longer passed along to nsPlaintextSerializer, so the serializer doesn't
know to do the right thing with them.
Adding nsHTMLAtoms::table, nsHTMLAtoms::tr, nsHTMLAtoms::th, and nsHTMLAtoms::td
to the list of tags in nsHTMLCopyEncoder::IncludeInContext fixes table copy, but
has the side effect that now text copied from inside one table cell (e.g. from
the main part of a page which uses tables for formatting) appends a newline
after the end (because the tr/table tags are included and the serializer sees a
block node closing).
We need a way of "including in context" only when the selection *spans* one of
these nodes. Joe wrote the copy encoder stuff: Joe, how do we do that? Is that
what the "promoted range"/"promoted point" stuff is about? (I'm not sure what
they do.)
I'll take this bug for now since it's clear that no one else plans to look at it
and it's making a lot of people's lives miserable. But I need some guidance
from Joe on the code.
Assignee: peterv → akkana
Comment 19•23 years ago
|
||
Haven't had much luck syncing with Joe. Joe, I think this is basically the same
issue as the extra line breaks in text copied from pre blocks -- in both cases,
we really need to figure out whether the selection crosses a block boundary as
opposed to the selection living wholly inside a block boundary.
I'm willing to put in some work on this since it's sometimes a major impediment
to getting work done, but I don't understand how the current code works. Can
you offer some pointers?
Assignee: akkana → jfrancis
Assignee | ||
Comment 20•23 years ago
|
||
I'm kinda surprised that the current code doesn't do the right thing here. I
just made a table and removed the trailing <br>s in source mode, and
expiremented. I see the bug if I select all of the table content by drag-
selecting from *inside* the table. But I get the desired behavior if I select-
all. The weird thing is that I thought the html stream we put into the clipboard
should be identical in those two cases, and only the special data flavor that
indicated how "deep" in the structure the real selection endpoints were would be
different. Since that latter structure is only used in Composers paste code, and
not by the serializer, I would think these two cases would get identical results
in the serialier. But they don't.
Akkana, I can't really help you unless I just investigate this bug myself. It
doesn't make sense to me. The tr's should be getting through.
Target Milestone: --- → mozilla0.9.9
Comment 22•23 years ago
|
||
Akkana, as you mentioned above, changes in IncludeInContext() are causing the
regression by adding extra new lines but there seems to be no other way to
catch these tags in Serializer. In the patch I'm trying to work around the
addition of these new lines.
Comment 23•23 years ago
|
||
Comment on attachment 69655 [details] [diff] [review]
Patch to fix it...
This does fix the problem of the lacking newlines after <tr>, and looks like a
solid fix to the problem, so I'm giving it r=akkana. Ideally I hope jfrancis
can take a look at the changes to nsDocumentEncoder.cpp, since he's the expert
on what might break when those lists change (but my guess is that it's fine
when taken with the rest of this patch).
The !mAtFirstColumn added for <td> nodes unfortunately isn't doing its job. I
made a page that looked like this:
<table>
<tr><td>one<td>two<td>three<td>four
<tr><td>a <td>b <td>c <td>d
</table>
and the first line came out:
onetwothreefour
That's true before this patch, too, so it shouldn't block acceptance of the
patch, but it remains a problem to be fixed. (I don't know why the patch
didn't fix it -- seems like it should.)
I also noticed another problem (unrelated, probably already covered by a bug
somewhere else):
copy the html source beppe gave, and paste into an external plaintext app (I
did this in order to create an html file for testing) -- you lose the initial
whitespace from most of the lines. I'm not sure if that's a new problem or not
(but it wasn't caused by this patch -- happens even without the patch).
Attachment #69655 -
Flags: review+
Comment 24•23 years ago
|
||
Can you please add a comment to the new member variable, describing its purpose?
There are so many variables of this kind already that I think it's needed.
[OT]
> you lose the initial whitespace from most of the lines.
I know that I once filed a bug about that in textareas, not sure about it's
status, though. I see the same behaviour you describe, but not in textareas anymore.
Comment 25•23 years ago
|
||
I second Ben's request for a comment describing the new member variable.
Is there a bug for the missing whitespace between table cells?
The other missing whitespace is probably (misbehaving?) whitespace compression I
would guess.
Comment 26•23 years ago
|
||
I see the reason why mAtFirstColumn is not working. It's because we are not
manipulating it to suit our requirements. So I'm replacing it with a check for
"mCurrentLine.IsEmpty()". That solves our purpose.
Also included the description of new member variable in this patch.
Akkana, would you please explain me about which white space are you talking
here:
"copy the html source beppe gave, and paste into an external plaintext app (I
did this in order to create an html file for testing) -- you lose the initial
whitespace from most of the lines."
I tried with the testcase provided by beppe and could not understand this
properly.
Comment 27•23 years ago
|
||
Comment on attachment 69841 [details] [diff] [review]
Incorporating above suggestions.
Ah, so mLineBreakIsDue is kind of a lazy line break, too avoid extra ending
lines (which is a problem right now)...
Have you tested it in mail too (sending a table as plain text), where the
formatting flags are a little different.
As for mAtFirstColumn it's only used when doing preformatted output. In the
other case we don't wrap until later so we don't know which column text will
end up at. mCurrentLine is not always used either. For some settings of the
flags it's bypassed and the text is written directly to the output
string/stream.
Comment 28•23 years ago
|
||
Thanks Daniel,
Included a new member variable |mIsFirstTD| to nsPlainTextSerializer. It keeps
a track if we are handling first <TD> of a <TR> or <TABLE>. If <TD> is not the
first one, we skip adding a new line character to the current text. Using a
separate member variable makes it independent of any specific flag setting.
Comment 29•23 years ago
|
||
Re the plaintext paste problems: look at beppe's comment #8. Copy that whole
comment. Paste into a plaintext window. It pastes like this:
----
let me code it out for you:
<table>
<tr>
<td>one</td>
<td>two
<table>
<tr>
<td>more</td>
<td>stuff</td>
</tr>
</table>
</td>
</tr>
</table>
as for the extra added work for complex tables, I would suggest adding a new bug
and push it out to future.
----
Notice that all the initial whitespace is lost.
I think this is covered by ftang's bug that jfrancis just took, summary
something like "plaintext copy/paste is very bad".
Updated•23 years ago
|
Status: NEW → ASSIGNED
Whiteboard: [plaintext] → [plaintext][needs sr=]
Assignee | ||
Comment 30•23 years ago
|
||
Comment on attachment 70063 [details] [diff] [review]
Modified to incorporate Daniel's sugestions.
This is a no-go. You can't make this change to IncludeInContext(). It will
break html paste: copying anything in a table would cause a whole table to be
pasted in Composer.
Attachment #70063 -
Attachment is obsolete: true
Attachment #70063 -
Flags: needs-work+
Comment 31•23 years ago
|
||
What is the expected behavior of HTML paste? Should we not get a table pasted on
composer? I think we should.
Assignee | ||
Comment 32•23 years ago
|
||
Copying a word that happens to be in a table should result in a table being pasted.
contributors to this discussion may be interested in reviewing/testing my patch
in bug 98572
Assignee | ||
Comment 33•23 years ago
|
||
That was SUPPOSED to read:
Copying a word that happens to be in a table should NOT result in a table being
pasted.
Comment 34•23 years ago
|
||
The changes to nsPlainTextSerializer looks ok though. I assume you have run the
tests (they are part of the tinderbox tests so failing one of them breaks
tinderbox), and sent mail with it, verifying that it looks ok.
If so, r=bratell@lysator.liu.se for the serializer. You can use that r= in
coming patches as long as nsPlainTextSerializer stays the same. I'm very
thankful for all your work. There are a lot of small things to fix, that I've
never gotten to do.
Assignee | ||
Comment 35•23 years ago
|
||
Akkana, I think you're comment #19 is correct. I have to change the copy code to
detect block crossings. Unfortunately, before I can do that I have to import a
bunch of ws code that I originally wrote for the editor. The reason is you could
be "trivially" crossing a block. For example, you could drag select a bunch of
text that lives between two lists. But if you do your drag just right (just
wrong?) you could actually have the selection starting at the very end of the
prior list, and ending at the very beginning of the following list. Then, if I
detect block crossings at copy time, I will force two single item lists (with
empty items) to be output: one before your copied text and one after.
This is hardly what the user wanted.
I'm in the process of working on this issue this week. Once it is fixed the
serializer should see table cells and rows, etc, for selections that are inside
of tables and span table structure.
Assigning to myself pending copy work. If there remains serialzer work after I
improve copy I will assign back at that point.
Assignee: tmutreja → jfrancis
Status: ASSIGNED → NEW
Comment 36•23 years ago
|
||
*** Bug 125658 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 37•23 years ago
|
||
for the moment putting in moz1.1. hoping to bring it back but cant right now due
to nervousness by the bug gods.
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.9 → mozilla1.1
Comment 38•23 years ago
|
||
removing myself from the cc list
Comment 39•23 years ago
|
||
*** Bug 134776 has been marked as a duplicate of this bug. ***
Comment 40•23 years ago
|
||
From bug 134776, this occurs in Chatzilla and is very troublesome to copy and
paste discussions of bugs, etc. Hope this will increase priority/effort a little.
Comment 41•23 years ago
|
||
*** Bug 136830 has been marked as a duplicate of this bug. ***
Comment 42•23 years ago
|
||
*** Bug 103522 has been marked as a duplicate of this bug. ***
Comment 43•22 years ago
|
||
*** Bug 153858 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 44•22 years ago
|
||
The trunk is the wave of the future!
Target Milestone: mozilla1.1alpha → mozilla1.1beta
Assignee | ||
Comment 45•22 years ago
|
||
Heh. I looked into this some more and discovered what is really going on. The
problem is that the parser is dropping partial table structure on the floor.
The docencoder is indeed producing the tr's and td's, and clipboard is receiving
them. When the clipboard code tries to convert this to plaintext, it calls the
parser. The parser just drops table elements on the floor if it doesnt already
have a table on it's tag stack. So we get just the contents of the td's.
the parser needs to not drop the tr's/td's. Perhaps there should be a seperate
mode for the parser where it understands it is just parsing a fragment, instead
of a document. The clipboard code could then call the parser in this mode.
Over to parser folks. CC'ing pinkerton for clipboard.
Component: DOM to Text Conversion → Parser
Assignee | ||
Comment 46•22 years ago
|
||
reassigning
Assignee: jfrancis → harishd
Status: ASSIGNED → NEW
QA Contact: sujay → moied
Boris was thinking about the fragment stuff at some point I believe, adding him
to Cc.
Assignee | ||
Comment 48•22 years ago
|
||
back to me cuz i have a fix. the patch i am attaching depends on the patch in
bug 159842.
Assignee: harishd → jfrancis
Assignee | ||
Comment 49•22 years ago
|
||
Attachment #69655 -
Attachment is obsolete: true
Attachment #69841 -
Attachment is obsolete: true
Assignee | ||
Updated•22 years ago
|
Status: NEW → ASSIGNED
Whiteboard: [plaintext][needs sr=] → [plaintext]
Target Milestone: mozilla1.1beta → M1
Assignee | ||
Updated•22 years ago
|
Whiteboard: [plaintext] → [plaintext] fixinhand
Comment 50•22 years ago
|
||
*** Bug 162488 has been marked as a duplicate of this bug. ***
Comment 51•22 years ago
|
||
*** Bug 141862 has been marked as a duplicate of this bug. ***
I don't think M1 is the correct milestone...
Target Milestone: M1 → ---
Comment 53•22 years ago
|
||
resetting milestone; M1 is correct milestone if you want jfrancis to fix it soon.
Target Milestone: --- → M1
Oh, sorry. But it would be good if everyone stuck to the same meanings of
milestone fields and other fields. Most(?) teams uses milestone to set the
milestone to aim for, and priority to fine tune the order in which to fix the
bugs in a milestone.
Also, we have people estimating glide paths for releases etc. based on number of
bugs in milestones. Using nonstandard milestones screws these estimates.
Comment 55•22 years ago
|
||
Comment on attachment 93106 [details] [diff] [review]
widget patch to tell parser we are parsing fragment
sr=kin@netscape.com
Attachment #93106 -
Flags: superreview+
Assignee | ||
Comment 56•22 years ago
|
||
fix landed on trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → FIXED
Comment 57•22 years ago
|
||
Please verify this bug
Whiteboard: [plaintext] fixinhand → [plaintext] fixinhand edt_x3
Comment 58•22 years ago
|
||
using the trunk build from 2003040708 on win2K. I created a simple table, viewed
it in the browser, set my mail compose to be plaintext. And then copied & pasted
the data from the browser window to the mail compose window.
testing a simple table:
<table>
<tr>
<td>one</td>
<td>two</td>
</tr>
<tr>
<td>more</td>
<td>stuff</td>
</tr>
</table>
results in this output:
one two
more stuff
If I remove the content 'more' and do the same copy/paste - the ws is preserved.
Status: RESOLVED → VERIFIED
Comment 59•21 years ago
|
||
See also bug 236546, same bug when copying table cells/rows/columns with Ctrl.
You need to log in
before you can comment on or make changes to this bug.
Description
•