Closed Bug 68411 Opened 24 years ago Closed 24 years ago

Patch to avoid ignorable whitespace in a general manner

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: rbs, Assigned: rbs)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(5 files)

Currently, frames are created for ignorable whitespace even when these are not necessary. For example, the whitespace between <table>\n<tr>\n<td> should be excluded during the construction of child frames. And indeed, the table code includes a logic at the frame construction level to avoid these ignorable whitespace. Unfortunately, other elements like math (or even xul/xbl) are left to themselves. As a consequence, I was led in the mathml code (see nsMathMLContaineFrame.cpp, for example) to temporarily avoid these whitespace in a _very innefficient_ manner, doing each loop in the form: while { if (!IsOnlyWhiteSpace(childFrame)) { ... } } On the other hand, xbl took the drastic but safe measure of getting rid of the whitespace nodes directly from the content model ! (see nsXBLService::StripWhitespaceNodes()]. This patch aims at extending the solution used by the table code so that other frames can benefit from it too! Essentially, a frame which doesn't want ignorable whitespace as children just has to set the newly added bit: state |= NS_FRAME_EXCLUDE_IGNORABLE_WHITESPACE; Hence, any frame container (most block elements?) will be able to set the bit to get the optimization. (I recollect that the optimization in memory and reflow-time by avoiding ignorable whitespace is something that has been in the air for quite some time, but nobody has taken the time to do it.) The addition of this bit has meant that I had to update the table code to set it there. (But these remained straightforward statements in the corresponding Init() of table related elements.) While adding this bit, I collected the bits that were elsewhere and added them to nsIFrame.h. I am not sure that everybody will be agreable with this collocation. Please let me know if this isn't okay, and I will put them back where they came from. I collected the bits there to avoid collisions in their use. Attachements coming. Of note: the second change in the diff of nsCSSFrameConstructor.cpp is the place where the bit is set for mathml frames, i.e., it doesn't affect other frames (and it is in #ifdef).
Attached file Patch in nsCSSFrameConstructor.cpp (deleted) —
Attached file Patch in the table land (deleted) —
Attached file Existing flags that were collected (deleted) —
Target Milestone: --- → mozilla0.9
This patch, of course, doesn't include the removal of the unneeded code we can rip out once this is done. :-) It might be useful to make sure (by dumping the frame tree in certain cases) that this is really taking out all the unneeded whitespace that we have hacks all over the place to ignore.
Indeed it is not the intention of this bug. But that could well be a follow-up bug. During development, I had the following additional debug radars: + nsFrameState state; + aParentFrame->GetFrameState(&state); + if (NS_FRAME_EXCLUDE_IGNORABLE_WHITESPACE & state) { nsFrame::ListTag(stdout, aParentFrame) printf(" is avoiding ignorable whitespace\n"); ... } else { nsFrame::ListTag(stdout, aParentFrame) printf(" is taking ignorable whitespace\n"); } It helped to see exactly the tags that were (or were not) filtering. I could see numerous tags like <div>, <p>, etc, taking the whitespace. Once the patch is checked in, another bug can be opened and someone familiar with where the hacks are could go ahead and set the bit for the cases, and then remove the hacks. A word of caution though: the present exclusion is made in ConstrutFrameByTag() which caters for the normal case of inter-tag relationship <tag>\n<tag>, like table and mathml. It would therefore be necessary to check that NS_NewTextFrame() is not getting called in any unexpected place (e.g., generated content that is only whitespace, assuming this case is possible).
Keywords: perf
Does this patch make sure that you're not excluding whitespace when 'white-space: pre'? If it doesn't, you'll break PRE and 'white-space: pre'. (Also, once that's done, the style hint for white-space will probably need to be NS_STYLE_HINT_FRAMECHANGE.)
Actually, the patch is not excluding whitespace on the frames. Its intention is only to provide a basis to do so. The whitespace that are excluded at those that the table code was already excluding. I just also added the exclusion of the whitespace not needed for mathml. This is in view of removing the numerous hacks "if (!IsOnlyWhitespace(childFrame))" that are polluting the mathml code. So, the patch by itself is low-risk, while the other much wider task involving other block/inline frames is high-risk and would need more care and scrutiny, _if_ it has to be carried out.
Filed follow-up bug 68489 on the issue of activating the bit in other frames.
r=? sr=?
BTW, apart from four cases from the renowned false positives in the table regression tests, I couldn't detect any weirdness with my changes. Moreover, I fortuitously uncovered, by inspection, a bug that was lingering in the <caption> case and which is going to be fixed by the patch. The previous filtering code was ignoring the whitespace even in the <caption>. This means that <caption> <i>italic</i> \n <b>bold></b> </caption> would have rendered "italicbold" jammed together. However, this situation never happened because of the other bug 58197 that causes nested elements in a <caption> to be dropped altogether. The <caption> (nsTableCaptionFrame) is a nsBlockFrame, and I didn't include the setting of the bit NS_FRAME_EXCLUDE_IGNORABLE_WHITESPACE in the Init() of nsTableCaptionFrame. For completeness, the bit is also not set in nsTableCellFrame -- the previous code didn't exclude whitespace there (<td><td> is very different from <td>\n</td>, isn't it). Apart from these two, the bit is set on all other table elements. The bit is set on all math elements because whitespace is ignorable everywhere within the <math>...</math> environment. (The inter-frame spacing is defined in a custom way, e.g., via the lspace="." or rspace="." attributes which have some default values in the Operator Dictionary.)
I'll let karnaze comment on the table diffs, but this certainly seems like a great idea! As to the consolidation of the frame state flags, I could go either way on this. It's nice to have them in one place; OTOH, I don't think we need to worry about collisions (we shouldn't be running through code that cares about NS_BLOCK_MARGIN_ROOT while reflowing an inline frame, e.g.) Was there a place that you actually saw a collision? (I seem to recall a message indicating that you did, but can't remember where...)
overall, not a bad idea. we'll discuss it this afternoon in our group mtg to see if there are any issues. I want to check with the editor folks because I recall some issue with whitespace and frames having to do with caret painting. I also want to verify that the output system doesn't need these frames to preserve whitespace on output. We just walk the content model, right? Editor folks, please remind us what the issues are here. one mistake: please do not consolidate all the flags into nsIFrame.h. The flags in nsIFrame.h are those that are universal to all frames... // The low 16 bits of the frame state word are reserved by this API. #define NS_FRAME_RESERVED 0x0000FFFF All the other flags are for the use of the specific implementation. By definition, there cannot be meaningful collision between high-order bits between frame types. The place where we have to be careful is if a frame class uses some bits, and other frames are derived from it, then the derived class has to respect the bits reserved by it's parent class.
I assume this will be checking with the style system, and omitting frames only for whitespace which is insignificant given the current style and context? If so, it shouldn't cause any problems I'm aware of (and sounds like a good idea). The output system only cares about the content model, and doesn't know anything about frames (at least currently). So this shouldn't cause a problem with output, as long as the whitespace still has nodes in the content model. (We've talked about having a smarter output system that could output what we're really showing, style and all, but it's not likely that's going to happen in the forseeable future, and even if it did, this would still be okay as long as significant whitespace still had frames.) The caret issue is that we always need a frame on which to draw the caret. As long as there will still be frames for *significant* whitespace, this shouldn't create an editor or caret problem. There have been editor/caret issues involving frames for whitespace, but it's always been related to significant whitespace -- e.g. newlines inside <pre> tags (if we could get frames for those instead of having to translate them to <br> tags, we could make huge performance gains).
what akkana sed
if this is done, be sure to test the case where a block is dynamically attributed with whitespace: pre, to make sure we construct the whitespace frames correctly. If this doesn't work, it's probably because the hint for that particular style change isn't strong enough. Marc A. knows all about this.
[collision of bits] (waterson, buster) I later understood as buster explained, that there wasn't need to worry about clashes in different frames so long as the meanings of bits are preserved across descedants. +#define NS_INLINE_FRAME_CONTAINS_PERCENT_AWARE_CHILD 0x00010000 +#define NS_BLOCK_SHRINK_WRAP 0x00010000 [consolidation] (buster) Since collisions are okay, I can indeed let each header file with its own set of relevant bits. buster, all the 16 lower bits have been taken... are you okay to reserve 4 additional bits, and shift implementations to the 12 higher bits? [style system] (akkana) yep, only ignorable whitespace are meant to be excluded. The decision is made by the parent frame. By avoiding children that are whitespace, no style contexts will be created for these. So, some potential gains may arise from that too. (I suspect also that the style system has its own separate hacks to avoid whitespace, in order, for example, to match things like :first-node ...) [what comes next] (dbaron, buster) This is just an enabling patch in the sense that it doesn't remove the whitespace. The actual bug to decide what/how to do is bug 68489. In the meantime, the patch enables frames (like table or math) that have no concern about whitespace to ignore them via the simple setting of that bit.
RE: reserve 4 additional bits, and shift implementations to the 12 higher bits? The 16/16 split is arbitrary, so it's fine to reserve a few more, as long as no implementation is already over the 12 bit limit. You'll have to check, if you haven't already. If so, we'd have to get creative.
I just did a search for "nsGetFrameState", then "nsFrameState", then "nsSetFrameState", then each of the bits 0x????0000 themselves, with comparative testings of the search results (in a contextual sense) to increase the unlikeness of leaving a nasty bug that will strike someday... Producers of additional bits that I could find were nsBlockFrame/nsInlineFrame (which are already in the patch of nsIFrame.h), and the others given below. Fortunately, nobody appears to be using more than 12 bits. nsTextFrame.cpp ============================================= 1079 // Flag indicating that whitespace was skipped 1080 #define TEXT_SKIP_LEADING_WS 0x01000000 1081 #define TEXT_HAS_MULTIBYTE 0x02000000 1082 #define TEXT_IN_WORD 0x04000000 1083 // This bit is set on the first frame in a continuation indicating 1084 // that it was chopped short because of :first-letter style. 1085 #define TEXT_FIRST_LETTER 0x08000000 1086 #define TEXT_WAS_TRANSFORMED 0x10000000 1087 1088 // Bits in mState used for reflow flags 1089 #define TEXT_REFLOW_FLAGS 0x1F000000 1090 1091 #define TEXT_TRIMMED_WS 0x20000000 1092 1093 #define TEXT_OPTIMIZE_RESIZE 0x40000000 1094 1095 #define TEXT_BLINK_ON 0x80000000 nsTextBoxFrame.cpp ============================================ 51 #define NS_STATE_NEED_LAYOUT 0x01000000 nsBox.h ============================================ 51 // flags from box 52 #define NS_STATE_IS_HORIZONTAL 0x00400000 53 #define NS_STATE_AUTO_STRETCH 0x00800000 54 #define NS_STATE_IS_ROOT 0x01000000 55 #define NS_STATE_CURRENTLY_IN_DEBUG 0x02000000 56 #define NS_STATE_SET_TO_DEBUG 0x04000000 57 #define NS_STATE_DEBUG_WAS_SET 0x08000000 58 #define NS_STATE_IS_COLLAPSED 0x10000000 59 #define NS_STATE_DEFAULT_HORIZONTAL 0x20000000 60 #define NS_STATE_STYLE_CHANGE 0x40000000 61 #define NS_STATE_EQUAL_SIZE 0x80000000 nsTableCellFrame.h ============================================ #define NS_TABLE_CELL_FRAME_CONTENT_EMPTY 0x80000000 nsTableRowGroupFrame.h ============================================ #define NS_ROWGROUP_REPEATABLE 0x80000000
The rtests (block and table) looked fine (no additional changes in the table). (My other changes on this topic are mathml-specific, mass removal of those !IsOnlyWhitespace(), and the ensuing consolidation/relief from knowing that these whitespace frames are never going to be seen again...)
rbs@localhost: ping -r -sr karnaze | waterson
OS: Windows 2000 → All
Hardware: PC → All
rbs: sr=waterson for 02/10/01 01:40 Patch in the table land 02/16/01 03:52 Collated diffs with implementation bits shifted in the upper 12 positions (If you'd meant to check in something else, please let me know!)
Roger, I did overlook a serious problem with the frame state bit you selected. Generic frame bits are the low order 16. Specific frame bits are the high order 16. You need to pick one of the remaining low order bits.
That's okay chrisK, since all the initial 16 bits were already taken, I suggested reserving 4 more bits and buster said it was okay, provided I first checked that nobody was already using more than 12 bits. That's why after searching/verification [in my comments of 2001-02-12 17:19], I shifted implementations to the 12 higher bits, leaving room for 4 more universal bits. I also updated the comments in nsIFrame.h to reflect this change of policy in the collated patch.
Checked-in. Resolving as FIXED.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Blocks: 68489
Marking verified per last comments.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: