Open Bug 62188 Opened 24 years ago Updated 2 years ago

Support start number for lists in plain text output

Categories

(Core :: DOM: Serializers, defect, P3)

defect

Tracking

()

REOPENED

People

(Reporter: bratell, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [#140541 will fix this])

Attachments

(1 file, 3 obsolete files)

Spinoff of bug 61173. We will start all numbered lists on 1 regardless if someone copy'n'pastes the number 4711 to 4717 in a list. There is support in html to specify number of an item in a list. We should look at that in the plaintext converter.
Bratell, would you please provide some more information about this bug. I tried with following: I copied a list of following from the browser : 1. one 2. two 3. three 4. four then pasted it on notepad and got the following : 1. one 2. two 3. three 4. four but when I copied : 2. two 3. three 4. four and pasted it on notepad I got : * two * three * four whereas if I pasted the same on composer I got: 1. two 2. three 3. four Do you mean that the numbering is maintained during HTML serialization but not in PlainTextSerialization. Precisely, do we want 1. two 2. three 3. four to appear while copying on the notepad.
What I meant this bug for was to add support to nsPlainTextSerializer for the start attribute in <ol>. That way the editor/copy code could add that attribute to the copied list items so that they would have the same numbers in the cut'n'paste. If the list is a ranking and I copy the part between 80 and 89 into a list to show how bad my friend is, it's bad that the output list is numbered 1 to 10. That you got an unordered list must be because the copy code doesn't output a <ol> around the copied <li>:s. Maybe this is the reason they don't? So to answer your question, I want notepad to get the following: 2. two 3. three 4. four Anyway, it's problably quite simple and safe to add the support in nsPlainTextSerializer. May be a one-liner almost, but I have bigger fish to catch right now so I will probably not look at it sometime soon and when that is done the copy code have to be changed to use the start attribute and I have no idea how simple or difficult that is.
Attached patch Patch to fix this. (obsolete) (deleted) — Splinter Review
Currently OL and LI tags are not being passed to the serializer so we have no clue about the ordinal number of LI elements. Problem got more complicated in order to take care of nested lists, partially selected ordered lists (where we may completely loose track of the ordinal number for LI if "value" attribute is not explicitly set with that LI) and to keep the view source cleaner to maximum possible extent. Also we could not set any other temporary attribute for internal processing of LI value as that attribute would not be interpreted correctly if we paste the content to any other HTML editor. I made following modifications to the code to take care of copying list items with 'n' levels of nesting and pasting them to some plain text editor / HTML editor / MSWord etc. : 1. Added OL and LI in nsHTMLCopyEncoder::IncludeInContext(). 2. Added a member variable |mIsCopying| to nsDocumentEncoder, which is true only when the content is coming via nsHTMLCOPYEncoder. 3. While copying the partial list (more complicated when the selected part is having nested lists too) as we loose track of the actual ordinal number of LI elements if it's not explicitly set, we added a temporary "value" attribute to the first LI element of an OL within the selected range. Based on this, ordinal numbers for the remaining LIs are calculated during serialization. 4. To get the value of this temporary attribute, we start from the first selected LI item and traverse the DOM Tree back till we get either a previous sibling LI node with an explicit "value" attribute or there is no more previous sibling to traverse. In case we don't get any sibling node with a value attribute, we need to decide the ordinal number based on "start" attribute of the parent OL node. If parent OL does not contain explicitly mentioned "start" attribute, set it to a default value of 0. Also to get the right ordinal, we need to keep track of the number of previous siblings we traversed. 5. Now as the "value" attributes are added to the original DOM Nodes, we have to avoid displaying them while "view source". To take care of this, as soon as we add that temporary "value" attribute we push the node in a stack and later when the destructor of nsDocumentEncoder is called after the copy part is done, we clean up nodes kept in the stack for removal of the added "value" attribute. Remove the stack once this clean up is done. 6. By this time, clipboard content is supposed to contain clearly mentioned ordinal number wherever required, so during HTMLSerialization ordinal numbers are correctly generated. 7. In PlainText Serializer, we defined a data structure a keep start attribute of an OL and number of LI nodes traversed. Later whenever LI tag is encountered, we check if it contains explicit "value" attribute otherwise we read the stack to get the start value and number of siblings traversed before(For that OL). Addition of these two values gives the right ordinal value for the LI node.
Comment on attachment 69038 [details] [diff] [review] Patch to fix this. I can only comment on the nsPlainTextSerializer part but I think you're making things a little more complicated than needed. In the nsPlainTextSerializer there is already a stack that is used to keep track of the next number for the current ordered list, mOLStack. you see the mOLStack[mOLStackIndex-1]++, where the number is increased (removed by your patch). When you encounter the value attribute you can just set that value directly instead of increasing by one. I don't see you ever using the startval number you're saving, except for copying it to other state structs. Just using the existing stack should make your patch much smaller.
Giving the bug to Tanu, since he/she has made good progress on it.
Assignee: bratell → tmutreja
RemoveElementAt might be a little cleaner than ReplaceElementAt(nsnull...), though I expect the implementations are pretty much the same. I agree with Daniel about using the existing stack; or if you can't use it, please delete it or integrate it into your new variable, so we don't have two different OL stacks being maintained.
Attached patch Incorporating the suggestions... (obsolete) (deleted) — Splinter Review
Thanks Daniel & Akkana! I think I really complicated the stuff. Modifying nsPlainTextSerializer here, as per your suggestions.
Keywords: patch
Whiteboard: [patch need r/sr=]
Comment on attachment 69222 [details] [diff] [review] Incorporating the suggestions... Much simpler, thanks! Looks fine to me, r=akkana. Personally I'd probably move the declaration of startAttr inside the next block along with startVal, but I'll leave the decision on that to you.
Attachment #69222 - Flags: review+
Comment on attachment 69222 [details] [diff] [review] Incorporating the suggestions... Yes, this looks fine. If I may ask for a favour, that would be to comment this: if (mULCount + mOLStackIndex == 0) - EnsureVerticalSpace(1); + EnsureVerticalSpace(0); Those constants are fragile since the serializer is used in so many places.
Comment on attachment 69222 [details] [diff] [review] Incorporating the suggestions... I'm not at all happy about us using a "value" attribute in the DOM for storing the number of an list item, this is something that should be dealt with in the text output code, w/o touching the DOM tree. For all we know someone else could've stored valuable information in the "value" attribute on the list items. Keep this information in the text output code.
Attachment #69222 - Flags: needs-work+
Johnny, here we are not setting "value" attribute if LI is already having it. But I see no other way to overcome special cases like the one below: If the original list is something like: <OL> <LI>ONE</LI> <LI VALUE = 100>TWO</LI> <LI>THREE</LI> <OL> <LI>FOUR</LI> </LI>FIVE</LI> </OL> <LI>SIX</LI> </OL> and then if I make a selection for copy starting from the LI having "THREE", we almost loose the context information for this node. I could not see any other way to set an ordinal as 101 for this node. Only if I set a value attribute to this node and pass further, serializer can know the actual ordinal for this one otherwise serializer will always take this with an ordinal as 3. So, as far as I understand your concern here, we are not doing anything with an LI having "value" attribute so there is no possibility of loosing any information. Moreover, as we are manipulating this additional attribute only while copying and cleaning the node as soon as we are done with copy, there should be no eventual impact of this. Once I'm clear with this part, I'll update the patch for incorporating other suggestions too.
Even if we're sure we don't loose value attributes when copying LI's, modifying the DOM tree while doing the copy is IMO not a good design. Why not simply have a trivial hash (plhash, or pldhash) that you store the "value"'s in as you see them (i.e. key on the LI's nsIDOMNode pointer value or something, and store the "value" in the hash based on that key), and then use that when you serialize the document? That would leave the DOM tree alone (and not cause reflows, mutation events, and who knows what as a result of setting the "value" attributes) and isolate this logic completely inside the serializer code. This is not hard, and is IMO well worth doing.
Johnny & Akkana, when we use nsPlainTextSerializer for pasting the content on text editor, we have Parser Nodes instead of DOM nodes. So putting the DOM nodes / elements on hash may not work because while pasting it we will not be having handle to the DOM. (Correct me, if I'm wrong here !!!) Other solution that I can think now is to set the value attribute to LIs while doing HTML serialization(during "copy") in nsHTMLContentSerializer. If feasible, that would just be shifting the logic from nsDocumentEncoder--> nsHTMLContentSerializer. Please suggest if you have anything better in mind and see if the above makes some sense.
If we're serializing to plaintext from parser nodes there is no problem here, right? In that case we'll be outputting every LI that the parser hands us, and we'll have every LI so we can easily compute the numbers for every LI as we serialize them. However, if we're serializing a partial DOM tree into HTML, we'll have a DOM tree, and we can use the hash ahd output the value atributes that are needed to make the output correct.
In this patch I've shifted the logic of setting "value" attributes to LI from nsDocumentEncoder --> nsHTMLContentSerializer and hence we are no more modifying the original DOM tree. Daniel, I'm still continuing with the EnsureVerticalSpace(0), as on skipping this I see 2 new line characters whenever I paste an OL to a plain text editor. Please let me know if I'm missing some cases where you think it mandatory to comment it out.
Please test sending OL:s in plain text mails to. If that looks ok, it's fine with me and r=bratell for the nsPlaintextSerializer part. Just make sure this matches the tests so we don't break tinderbox.
Attachment #70251 - Flags: superreview+
Comment on attachment 70251 [details] [diff] [review] Moving the logic from nsDocumentEncoder to nsHTMLContentSerializer... - In nsHTMLContentSerializer::IsFirstChildOfOL(): + if (state->isFirstListItem) + return PR_TRUE; + else + return PR_FALSE; + } + else + return PR_FALSE; two else-after-return's, doing it this way is much cleaner: + if (state->isFirstListItem) + return PR_TRUE; + + return PR_FALSE; + } + + return PR_FALSE; Other than that, sr=jst
Status: NEW → ASSIGNED
Whiteboard: [patch need r/sr=] → [patch need a=]
Attached patch Including jst's comments. (deleted) — Splinter Review
In nsHTMLContentSerializer::IsFirstChildOfOL(), replaced + if (state->isFirstListItem) + return PR_TRUE; + else + return PR_FALSE; + } + else + return PR_FALSE; by + if (state->isFirstListItem) + return PR_TRUE; + + return PR_FALSE; + } + + return PR_FALSE;
Attachment #69038 - Attachment is obsolete: true
Attachment #69222 - Attachment is obsolete: true
Attachment #70251 - Attachment is obsolete: true
Target Milestone: --- → mozilla1.0
Comment on attachment 71463 [details] [diff] [review] Including jst's comments. a=asa (on behalf of drivers) for checkin to 0.9.9 and the mozilla trunk. Also, moving reviews forward to the current patch. Let's try to keep the relevant reviews attached to the relevant patches. Thanks.
Attachment #71463 - Flags: superreview+
Attachment #71463 - Flags: review+
Attachment #71463 - Flags: approval+
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+, topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword. Please send any questions or feedback about this to adt@netscape.com. You can search for "Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
Whiteboard: [patch need a=] → [has-approval]
Fixed with checkin D:\mozilla\content\base>cvs commit cvs commit: Examining . cvs commit: Examining public cvs commit: Examining src Checking in public/nsIContentSerializer.h; /cvsroot/mozilla/content/base/public/nsIContentSerializer.h,v <-- nsIContentSe rializer.h new revision: 1.10; previous revision: 1.9 done Checking in src/nsDocumentEncoder.cpp; /cvsroot/mozilla/content/base/src/nsDocumentEncoder.cpp,v <-- nsDocumentEncode r.cpp new revision: 1.63; previous revision: 1.62 done Checking in src/nsHTMLContentSerializer.cpp; /cvsroot/mozilla/content/base/src/nsHTMLContentSerializer.cpp,v <-- nsHTMLCont entSerializer.cpp new revision: 1.40; previous revision: 1.39 done Checking in src/nsHTMLContentSerializer.h; /cvsroot/mozilla/content/base/src/nsHTMLContentSerializer.h,v <-- nsHTMLConten tSerializer.h new revision: 1.14; previous revision: 1.13 done Checking in src/nsPlainTextSerializer.cpp; /cvsroot/mozilla/content/base/src/nsPlainTextSerializer.cpp,v <-- nsPlainTextS erializer.cpp new revision: 1.48; previous revision: 1.47 done Checking in src/nsPlainTextSerializer.h; /cvsroot/mozilla/content/base/src/nsPlainTextSerializer.h,v <-- nsPlainTextSer ializer.h new revision: 1.18; previous revision: 1.17 done Checking in src/nsXMLContentSerializer.cpp; /cvsroot/mozilla/content/base/src/nsXMLContentSerializer.cpp,v <-- nsXMLConten tSerializer.cpp new revision: 1.21; previous revision: 1.20 done Checking in src/nsXMLContentSerializer.h; /cvsroot/mozilla/content/base/src/nsXMLContentSerializer.h,v <-- nsXMLContentS erializer.h new revision: 1.10; previous revision: 1.9 done
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
This commit added a compiler warning content/base/src/nsHTMLContentSerializer.cpp:1147 `struct nsHTMLContentSerializer::olState * state' might be used uninitialized in this function Would not it be better to change it : --- nsHTMLContentSerializer.cpp 2002/03/12 08:21:51 1.40 +++ nsHTMLContentSerializer.cpp 2002/03/12 20:47:53 @@ -1144,13 +1144,13 @@ if (parentName.EqualsIgnoreCase("OL")) { olState defaultOLState(0, PR_FALSE); - olState* state; + olState* state=nsnull; if (mOLStateStack.Count() > 0) state = (olState*)mOLStateStack.ElementAt(mOLStateStack.Count()-1); /* Though we should never reach to a "state" as null at this point as all LI are supposed to be inside some OL and OL tag should have pushed a state to the mOLStateStack.*/ - if (!state || mOLStateStack.Count() == 0) + if (!state) state = &defaultOLState; if (state->isFirstListItem)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This should've been filed as a new bug, but let's not worry about that now. sr=jst for the above patch.
a=asa (on behalf of drivers) for checkin to the 1.0 trunk of the patch in comment 22
Fix checked in for ayn2@cornell.edu D:\mozilla\content\base\src>cvs commit cvs commit: Examining . Checking in nsHTMLContentSerializer.cpp; /cvsroot/mozilla/content/base/src/nsHTMLContentSerializer.cpp,v <-- nsHTMLContentSerializer.cpp new revision: 1.42; previous revision: 1.41 done
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
I had to back out part of the patch. reopening
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
clearing milestone
Target Milestone: mozilla1.2alpha → ---
Does bug 136595 fit into this discussion at all, or is the list-style-type/CSS stuff another layer on top of it all which should be dealt with separately?
Backed part of the patch: @@ -1300,7 +1303,9 @@ tag.get() == nsHTMLAtoms::h3 || tag.get() == nsHTMLAtoms::h4 || tag.get() == nsHTMLAtoms::h5 || - tag.get() == nsHTMLAtoms::h6) { + tag.get() == nsHTMLAtoms::h6 || + tag.get() == nsHTMLAtoms::ol || + tag.get() == nsHTMLAtoms::li) { return PR_TRUE; } Though the remaining patch has not much meaning without this, we did not back the whole patch out to make sure that the problem to be fixed in future is just a smart addition of OL, LI, UL in nsDocumentEncoder::IncludeInContext(). I'm opening a new bug for that and marking it dependent on that.
Depends on: 140541
Whiteboard: [has-approval] → [#140541 will fix this]
*** Bug 149956 has been marked as a duplicate of this bug. ***
Any update on this? Just running through some old bugs i'm cc'd on and see no movement in almost 7 months
Depends on: 316775
QA Contact: sujay → dom-to-text

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: t_mutreja → nobody
Severity: minor → S4
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: