Open Bug 68489 Opened 24 years ago Updated 2 years ago

Savings from ignorable whitespace

Categories

(Core :: Layout, defect, P3)

defect

Tracking

()

Future

People

(Reporter: rbs, Unassigned)

References

Details

Attachments

(1 file, 2 obsolete files)

[Follow-up to bug 68411] The content model includes whitespace. However, the whitespace can be ignorable in many contexts so that it is not necessary to create any nsTextFrame for them. For example, the inter-tag whitespace in <table>\n<tr>\n<td> are ignorable. Not ignoring them would even be harmful. This is because the first child of the <table> is expected to be its first row, not the nsTextFrame that would have been created in between. A similar problem arises in mathml where, for example, the first child of <mfrac> <mi>a</mi> <mi>b</mi> </mfrac> is expected to be its numerator. In general therefore, whitespace is ignorable when there is an embedded ordering of children that is significant. This is the case within the <table>...</table> and the <math>...</math> environments. Currently, apart from table elements (and soon math elements) no special action is taken to avoid creating nsTextFrame associated to ignorable whitespace. On the contrary, a number of frame owners have been using special after-the-fact hacks to deal with the problem at the frame level. The patch on bug 68411 now provides a general mechanism for any container frame to instruct the frame construction code that it doesn't want ignorable whitespace as children. To do so, the Init() method of the container frame just has to include the statement: mState |= NS_FRAME_EXCLUDE_IGNORABLE_WHITESPACE; (there could be an if clause that causes the addition of the bit to be made only under certain conditions, e.g., if the style context of the frame doesn't have 'white-space: pre') This bug is a request for investigation of other frame classes that can benefit from the optimization. For each case that will be deemed fit for exclusion, there will be two aspects: 1) Just adding the bit in the Init() so that the creation of the nsTextFrames are stopped. This will bring savings in memory. This may also bring about savings is reflow-time, as it will cut down on the hacks that would be reached. 2) Removing the hacks. This is more involved and would require some familiarity with the code. It may appeal to reason to postpone this second aspect at a subsequent stage, when 1) is completed everywhere and has been tested for quite some time.
We *can't* drop the white-space from <table>. table { display: block; white-space: pre; } table * { display: inline; } <table> <tr><td>A</td></tr> <tr><td>B</td></tr> <tr><td>C</td></tr> <tr><td>D</td></tr> </table> ...should render just like: A B C D ...and not ABCD ...or whatever.
Tested, and it worked okay, as expected. It is the ignorable inter-tag whitespace <table>\n<tr>\n<td> that are wed out. ^^ ^^ The ones that may exist elsewhere (e.g., in <td> ...\n... </td>) are left as is.
Some explanations about how the flag work: First, <table> set the bit to say it doesn't want ignorable whitespace children, so all the inter-tag whitespace <tr>...</tr>\n<tr>...</tr>\n, etc, are wed out. Then <tr> set the bit too, so all the inter-tag <td>...<td>\n<td>...<td>\n are wed out. But <td> doesn't set the bit, so it keeps all of its children. If we suppose for a moment that only some of the <tr> set the bit, it would therefore be possible to selectively and independently exclude or include whitespace at any level of the hierarchy. That's in fact the motivation for investigating if any win could come from this scheme in the general frame model.
Ian -- we can drop the whitespace if we check first for 'white-space: pre' (and the 'display' types that will cause that to be recognized).
Ok, my misunderstanding is at which point is this drop done? Is it at the frame construction level, redone each time style is reresolved? If so, then yes, we can just check for the display types, and drop whitespace on table, inline- table, table-row, table-row-group, table-footer-group, table-header-group, table-coloumn, and table-column-group frames (keeping whitespace in table-cell frames of course). But then we can actually drop _all_ non-table frames there, since all such content should get wrapped by the appropriate anonymous frames, and so should never exist there in the first place. If this is done at the parser/content sink level, though, then we can't do any of this (for tables) since we don't know what is a table ahead of time, and it could change dynamically at any time. Note that we shouldn't ever do this based on the tagname of HTML elements; it should be done exclusively on 'display' properties. But then block elements never contain whitespace nodes either, theoretically, since those would be wrapped in anonymous inline boxes, and inline boxes can (if white-space is set to 'normal', the initial value) collapse all contiguous whitespace characters to a single space. (MathML elements are a different matter, since they have display semantics not described by CSS. For MathML, this would indeed be based on the namespace and tagname of the elements.)
Ian - It's done at frame construction time. We still want the whitespace in the content model. Since it's done with frame construction, if we change the style change hint for the 'white-space' property to NS_STYLE_HINT_FRAMECHANGE (or whatever it's called), dynamic changes to the 'white-space' property should be handled correctly.
Yes, currently changes to whitespace only cause a REFLOW hint, so we need to modify that as David suggested to make sure we reframe.
Cool, in that case I see no problems. :-)
Depends on: 68411
I'm willing to work on this early this summer, but if someone wants it done sooner they should take it...
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Priority: P1 → P2
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Target Milestone: mozilla0.9.5 → mozilla0.9.6
I had a look at the existing code. I don't think it's general enough, since it seems to me it would cause whitespace to be stripped in something like: <p><b>hello</b> <i>world</i></p> if we used it for block frames. Or did I miss something? For block frames, we can always (when 'white-space' is 'pre' or 'nowrap') strip whitespace: * at the beginning or the end * between child block frames, as opposed to child inline frames
It seems to me a more appropriate name for the current state bit would be NS_FRAME_EXCLUDE_WHITESPACE. I can't think of situations other than blocks that would have similar rules for the ones that would apply to blocks, so I don't think we need a second general solution in addition to the general one used for MathML and tables. However, the savings for blocks would be considerable (in terms of memory and performance and simplification of the block code), so I think we should try to do this for the block code. I guess the other frame classes I could think of that might benefit from the existing code are the XUL frames, except I think ignorable whitespace is stripped out in the XUL *content sink*.
It actually would be good to do this for the relevant XUL frames so that XUL display types work better from non-XUL documents.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Attached patch a hack to test an approach for block frames (obsolete) (deleted) — Splinter Review
Well, this was a little hack to see if I could do this for block frames. It actually doesn't seem all that hard -- except that this patch didn't succeed in getting rid of all of the frames I wanted to (never mind the ones separated by comments). It also crashes when I click on a link...
Oh, and I wonder why ProcessChildren and ProcessBlockChildren still need to be separate...
+ nsCOMPtr<nsIContent> content(*iter); + nsCOMPtr<nsITextContent> textContent = do_QueryInterface(content); + PRBool onlyWhitespace; + if (textContent && + NS_SUCCEEDED(textContent->IsOnlyWhitespace(&onlyWhitespace)) && + onlyWhitespace) could be s//'ed with the existing /*static PRBool*/ IsOnlyWhitespace(*iter) that is somewhere in that file.
Target Milestone: mozilla0.9.7 → Future
Attached patch a real patch to solve half the problem (obsolete) (deleted) — Splinter Review
This patch, after a quick test, seems to do what I intended it do, and is the first half of getting rid of ignorable whitespace at the beginning and end of blocks. What it doesn't deal with is ignorable whitespace within inlines in blocks, e.g., <div><span> <span>hi</span></span></div> which we do need to fix if we're going to use this to allow us to simplify the block code, although only because of block-within-inline cases, such as: <div><span> <span style="display: block">hi</span></span></div> However, this patch should give a bit of the performance and footprint gain (I suspect the majority) that we should get from this. I'll look into doing the second half later -- I may just add it to this patch, although I may decide I want to do enough refactoring to want it to be a separate patch.
Attachment #59019 - Attachment is obsolete: true
Actually, I realize there are other cases this won't handle, like: <div><p>foo</p> <p>foo</p></div> So maybe it's worth getting just this in for the performance improvement?
The patch looks good enough to me. The case "[block-elt] space [block-elt]" is pretty hard because it is not obvious to tell if an element is a block without resolving its style context which requires creating a frame for it in the first place. === typo: + } else if (aContent->IsContentOfType(nsIContent::eCOMMENT) || + aContent->IsContentOfType(nsIContent::eCOMMENT)) { should be: + } else if (aContent->IsContentOfType(nsIContent::eCOMMENT | nsIContent::ePI) === // never create frames for comments on PIs + // XXX It would be nice to move this logic into |ChildIterator| -- + // it's already partly there for |SkipInitialWhitespace| and + // |SkipFinalWhitespace|. if (tag == nsLayoutAtoms::commentTagName || tag == nsLayoutAtoms::processingInstructionTagName) - return rv; + return NS_OK; while there, you could perhaps move this in NeedFrameFor() while using the newly added IsContentOfType() there.
>typo: >+ } else if (aContent->IsContentOfType(nsIContent::eCOMMENT) || >+ aContent->IsContentOfType(nsIContent::eCOMMENT)) { > >should be: >+ } else if (aContent->IsContentOfType(nsIContent::eCOMMENT | nsIContent::ePI) Thanks, although it needs to be two separate calls -- the latter asks whether the content is both a comment and a processing instruction. I'd like to find a better way to do child filtering. We have three types of filtering that we want to do: 1. filtering out comments and processing instructions 2. (1) plus removing all text nodes that are only whitespace 3. (1) plus removing certain text nodes that are only whitespace -- those at the beginning of the child list (ignoring things filtered in (1), as are the following statements), those at the end of the child list, and those in the middle of the child list preceded or followed by a block (although only handling the preceded case would catch the cases of extra lines being created -- except for one inline-block case). I don't really see a good way to do this, yet.
What about trying a two-pass frame creation scheme (something that ocurred to me the other day after sending my earlier comment). In the first-pass, only create frames that are guaranteed to be needed (i.e,. non-textframes, non-comment|pi), whilst marking the other content nodes that are left out (e.g., by enlisting them in a voidarray or a similar structure that keeps track of the prev (and/or next) frame siblings at the insertion points. Then in the second pass, possibly create other frames that should be created [the caveat is that the child list is not claimed to be construted until all children are created]. However, if you are already measuring a sizeable win with the current patch, maybe the current patch is ok for the time being, and no further gymnastics are needed at the moment.
Attachment #62595 - Flags: needs-work+
I just removed this from my tree.
Attachment #62595 - Attachment is obsolete: true
Attachment #83873 - Flags: needs-work+
QA Contact: chrispetersen → layout
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Assignee: dbaron → nobody
Status: ASSIGNED → NEW
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: