Closed Bug 74656 Opened 24 years ago Closed 20 years ago

Optimizations possible in AppendLines

Categories

(Core :: Layout: Images, Video, and HTML Frames, defect, P5)

defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: hyatt, Assigned: shaver)

References

Details

(Keywords: perf)

Attachments

(17 files)

(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
Frame lists need to keep a pointer to their lastChild.  Because they don't do
this, content insertion and appending is preposterously slow (especially when a
large # of insertions/appends are done).
I foggily recall that we hashed through this before and decided it would be a 
Bad Thing to add more pointers to nsContainerFrame. Would it be possible to 
detect that the child list has grown past a certain limit (say, 16 children) and 
add a frame property?
Status: NEW → ASSIGNED
Summary: 20% of reply time spent in O(n^2) LastChild retrieval in frame code → 10% of reply time spent in O(n^2) line walking algorithm in nsLineBox
Target Milestone: --- → mozilla0.9.1
nsLineBox needs to be improved.  The implementation is a study in inefficiency.
I can hack around the frame problem, but nsLineBox has to be patched up.  Its
implementation is ridiculous.
Keywords: perf
QA Contact: amar → waterson
nsLineBox has to be rewritten. In fact, how about nsBlockFrame too while we are
at it? Considering where these live in the food-chain, I think we would do well
to get them as fast and clean as possible.

Chris, I like your idea of creating a lastChild pointer when the child list gets
long (I was thinking much longer than 16, but it could be tuned). Isn't
accessing frame properties pretty slow though? I know accessing the View for a
frame is via a frame property and it shows up occassionally on profiles or
really large documents, which was initially surprising to me. Also, would the
lastChild property have to be per-child-list, or would it just be for the
default child list?
(Tangent Alert.) Accessing frame properties should be O(1), but it isn't, thanks
to troy's baloney-headed nsDST. I think (but haven't proven) that although
intended to be O(log n), it degenerates because of the way it uses the pointer
bits. (Using pointer bits to generate a tree is silly. If you do it
left-to-right, you're screwed, because the high bits are likely to all be the
same: locality. If you go right-to-left, you're screwed, because the lowest bits
are all going to be very similar: alignment.) Anyway, I've been looking for an
excuse to replace it with pldhash. Point me to a profile!
I can show you plenty of profiles where GetPrimaryFrameFor shows up as a
significant amount.  It's obviously *not* O(1) because of the dumbass desire to
not use a hash table.
You correct, we did hash through this before, a few times.  I pointed this out
in August and in January.  Both times, all involved thought that the extra bytes
were not worth the O(n^2) (I have the email logs, in case anyone wants the info
from that heated debate).  I disagreed then, and am I glad to see that people
are finally seeing the light (even if I am not credited with the find anymore).

Go for it, fix this problem (there are a few O(N^2) pieces of code, all of which
will fall at some point, I hope).
hyatt: recall that GetPrimaryFrameFor() often needs to search for frames. Not
all frames are hashed (er, dst-ed) from the get-go. It would be interesting to
see how much of GetPrimiaryFrameFor() is spent in nsDST::Search() (or whatever).
I dunno about redoing nsLineBox, but I am right now working on making frame
properties much cheaper (in time and space), and then making nsLineBox use a
frame property to track its lastChild/lastLine.

Hyatt: feel free to assign this to me if you want.
[Just want to re-iterate that if rewriting ns[Block/Inline]Frame is in the air 
anytime, a fundamental change that I suggest to incorporate is bug 64763 so as
to get rid of the cruff within the PerFrameData stuff.]
Blocks: 22486
Waterson and I are no longer certain that this will be a space win, but I have a
patch anyway.  I'm going to Do The Math on this stuff shortly to see what our
real-world frame-property memory usage is with the current DST stuff and compare
to a pldhash implementation.

Waterson: as far as the effects of alignment on the bit patterns, it looks like
the DST code tries to skip the lower few bits of the pointer.
<hyatt> take it off my list!

OK, Dave.

The patch that I'm about to attach WORKSFORME, though I haven't run the layout
block-frame regressions, and I haven't run any profiles yet.  Waterson, honey,
can you profile for me?
Assignee: hyatt → shaver
Status: ASSIGNED → NEW
Oh yeah: I looked at the frame properties stuff, and we usually don't get more
than a few dozen properties.  We _do_ often ask for the same (frame,prop) pair
repeatedly in succession, sometimes more than 900 times (!), so I have a little
patch that adds a one-entry quick cache to avoid consulting the DST for those
cases.  I'm going to clean it up, and make PropertyListFor move the found list
to the head, and then I'll post it here.  It won't change the world, but every
bit counts, right?
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.1 → mozilla0.9
Marc: actually ViewProperty was one of the hot spots here, so my little patch
could help those cases a fair bit.  Do you have test cases we should use for
before/after profiling?
The latest patch is for the 1-entry cache I mentioned, which is only tangentally
related to this bug.  If people want, I can break it out into another.

Marc: if you're running profiles on those big pages again, do you want to see if
this helps take GetFrameProperty out of the picture?

I was tempted to make the FrameHashTable stuff use dhash, but I think it's
better to wait until brendan's done with bug 72722 and just convert to
nsHashtable there.

Comments and review solicited for my latest two patches.  (I haven't Done The
Math on memory usage for dhash vs. dst for frame properties yet, so I'm not
ready to put the first one in.)
The first block-frame patch didn't invalidate the cache when we removed frames,
which caused quite a bit of trouble.  This one should work much better (I was
editing documents with it for a while before I destroyed my tree).
Looks pretty good.

- You add two words to each block frame. It'd be better if we could
  conditionally cache this when we knew it'd be a big win (i.e., block
  frame has many lines). (That's my biggest complaint.)

- Not sure why you needed to add GetLastChild(). Couldn't you have just
  used the extant LastChild() method? (Also means less damage to
  nsBlockFrame.h)

- In this part:

  +    if (appending) {
  +      prevSibLine = mCachedLastLine;

  it's not immediately obvious that mCachedLastLine is guaranteed to be
  valid because of your previous call to GetLastChild() to compute
  the boolean ``appending''. You should put a bright red comment here.

- Rename GetLastLine() to ComputeLastLineFrom() or something. Heck you only
  call it from one place. Why does it need to be a function at all?

  Instead of...

  +    // make sure we're (still) tracking the last line
  +    nsLineBox *line;
  +    if (mCachedLastLine)
  +      line = mCachedLastLine;
  +    else
  +      line = mLines;
  +    line = GetLastLine(line);

  how about:

  nsLineBox line = mCachedLastLine ? mCachedLastLine : mLines;
  while (line->mNext)
    line = line->mNext;

- Local custom...

  . names pointers as |nsIFoo* foo|, not |nsIFoo *foo|
  . uses |nsnull| for null pointers, as opposed to bare zero
  . is not sparse on vertical whitespace (e.g., ``We check here'')

GetLastChild and LastLineFrom are both remnants of earlier revs, with much more
debugging and validation cruft in them.  They can (and should) go.

I'll comment and morph to dominant code style (bad me!).

After I do that (and post another patch), I'll look at using a frame property if
someone can tell me how big ``big enough''is, and whether we just count lines or
lines+children-in-last-line, etc.
Well, the problem hasn't been noticed except in editor and in a DOM case that 
edward cooked up, that inserted/appended one node at a time. So, that makes me 
believe that mainline layout is not typically suffering here, and is why I'm 
being stingy. :-)

Could you mangle your existing patch you could play with different values of 
`n', and empirically determine ``when you start to notice the evil behavior in 
editor''? If I had to pick a number out of my ass, I'd say something like 32 or 
64 should be a reasonable threshold at which to start caching the last child.
...and I think we should just count lines. You can count the lines in 
nsBlockFrame::ReflowDirtyLines(), and when the number of lines exceeds `n', set 
a new frame state bit (maybe NS_BLOCK_HAS_CACHED_LAST_CHILD) that tells your 
caching code to keep the properties up-to-date.
OK, that's a good tip.  I'll whack up some environment setting that gets used to
set `n' and you can try different values with your choice of test sets!
So, as I work on making all of this stuff tunable, I start to wonder if counting
lines is the right approach at all: aren't we more concerned with the access
model (lots of appends causing lots of LastChild calls, and therefore lots of
list-walking) than the number of lines?  If we just do a few big AppendFrames
calls, we can put together a large document without too much pain, right?

(I also wonder if perhaps we shouldn't just be inverting the list and storing
the _tail_, since appending would seem to be a much more frequent operation than
prepending.  But I haven't thought much about that yet.)

I'll post a patch in an hour or so that only caches the last line -- caching the
last child would cost us an additional two words per caching-enabled block
frame, and waterson says that lines are usually pretty short anyway -- and we
can discuss better heuristics.
Best patch yet!  Set BLOCK_FRAME_LINE_CACHE_THRESHOLD in your environment to
tune the point at which we switch to using a frame property (for last line, not
last child -- I now see that I should rename the frame-state flag).

Seth, Edward, do you want to play with settings here and see what effect they
have on your specific test cases?
collision: 
http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsHTMLParts.h#44

+#define NS_BLOCK_HAS_CACHED_LAST_CHILD      0x00100000
 #define NS_BLOCK_SHRINK_WRAP                0x00100000
Seems to me to be a rather large patch for what I thought to be a simple change.
 I guess I do not understand why we just do not add the 4 bytes for a tail
pointer and be done with it.  The amount of code needed for this patch and the
time overhead for the extra calculations seem to more than outweigh adding 4
bytes.  Please correct me if I am wrong, for I just want to know why?  Thanks.
Well, the patch at 04/10/01 07:21 isn't *that* much larger than the patch from 
04/07/01 20:47 (which just keeps a `last' pointer, all the time). And, it's 
probably a bit larger simply because shaver added some stuff to help pick a 
sweet spot.

In defense of not doing it all the time (since I lobbied for it), a large web 
page could have tens or even hundreds of thousands of block frames. So, keeping 
block frames as small as possible is important. We don't run in to this 
performance problem in the ``normal'' course of layout. So why pay the penalty 
in space unless there's a win in time?

I like it, I like it. Nits below.

+static int CachedLineThreshold = 0;
+

gCachedLineThreshold?

+#ifdef DEBUG
+      fprintf(stderr, "BLOCK_FRAME_LINE_CACHE_THRESHOLD is %d\n",
+              CachedLineThreshold);
+#endif

All kipps other debug printf()'s just go to stdout. (Or is this code just in
there until we figure out what the ``sweet spot'' is?)

+inline void
+nsBlockFrame::SetCachedLastLine(nsIPresShell *aPresShell,
+                                nsIFrameManager *aFrameManager,
+                                nsLineBox *aLine)

Why inline?

+    // If we fail to set it, clear the cached flag, because something
+    // is screwy in the cache and we'd rather be slow than crash.
+    mState &= ~NS_BLOCK_HAS_CACHED_LAST_CHILD;

Is this really necessary or even sufficient? Seems like LastLine() can recover
gracefully without this.

+  if (!mLines)
+    return nsnull;
+  
+  // If we don't have a cache, we walk lines
+  if (!(mState & NS_BLOCK_HAS_CACHED_LAST_CHILD))
+    return nsLineBox::LastLine(mLines);
+
+  nsCOMPtr<nsIFrameManager> frameManager;
+  aPresShell->GetFrameManager(getter_AddRefs(frameManager));
+  if (!frameManager)
+    return nsLineBox::LastLine(mLines);
+  
+  nsLineBox* cachedLine;
+  if (NS_FAILED(frameManager->GetFrameProperty(this,
+                                               nsLayoutAtoms::cachedLastLine,
+                                               0, (void **)&cachedLine))) {
+    return nsLineBox::LastLine(mLines);

Flow of control a bit muddy here. Why not just do nested if()? (It'll only end
up being three deep, and we've got a c-basic-offset of `2'...)

+  // the call to LastChild will ensure that the line cache is up to date
+  PRBool appending = (mState & NS_BLOCK_HAS_CACHED_LAST_CHILD &&
+                      aPrevSibling == LastChild(presShell));
+

Does bitwise-and have precedence over logical-and? I always forget. Parens?

+ * Frame state flag indicating that this frame is using a frame property to
+ * cache its last line and child.  See LastChild below and in nsBlockFrame.cpp
+ * for details.
+ */
+#define NS_BLOCK_HAS_CACHED_LAST_CHILD      0x00100000
+
+/*

what rbs said; this should really be in nsHTMLParts.h (it's not obvious to me
*why*, but that's where the other flags are).

+  void InvalidateLastLineCache(nsIPresShell* aPresShell) {
+    if (!(mState & NS_BLOCK_HAS_CACHED_LAST_CHILD))
+      return;
+    nsCOMPtr<nsIFrameManager> frameManager;
+    aPresShell->GetFrameManager(getter_AddRefs(frameManager));
+    if (frameManager)
+      frameManager->SetFrameProperty(this, nsLayoutAtoms::cachedLastLine,
+                                     nsnull, nsnull);
+  }

Does nested if make flow-of-control simpler?

Also, when you check in, should we just guess at a sweet spot for
[g]CachedLineThreshold? Say, 64 to start?
This seems overly complex to me.  Wouldn't it be simpler to not worry about the 
threshold stuff?
For the reasons I cite above, I will accept the *very marginal* increase in code
complexity.
gCachedLineThreshold it is.

Of course it's only there until we find the sweet spot!  Who would leave debug
spew like that in for any length of time? =)

Dunno why inline.  Not inline now!

LastLine _can't_ recover from the case where it has a line cached that is no
longer part of the block frame at all, and if I don't know what the state is in
the DST, I don't want to trust it.  Maybe that can't really happen, but I'm
treading very lightly here, and I had a spare belt that matched my suspenders.

I'll wield the mighty Nested If, yes.

And the mighty parens, just for you!

When (if?) I check in, I'm leaning towards leaving gCachedLineThreshold unset,
but only because I don't understand where the number ``64'' comes from.  Here's
a question I want to ask of you: what %age space-waste is acceptable for this? 
Then I can set gCachedLineThreshold such that the frame property is < n% of the
BlockFrame + its LineBox children.  3%? 5%? 10%?  (Don't cheat and work
backwards from 64!)

New patch coming right up.
Anyone interested in this patch?  Anyone dislike it?

Mozilla 0.9 is coming soon, and I want to know if I should just walk away from
this or spend precious end-of-release cycles making it appropriate for checkin.
Looked at the patch and it is good to go. Will give r=rbs so that waterson
can give sr for you to checkin.

Some nits:
=====
+static int gCachedLineThreshold = 0;

Since the whole point of the patch is to improve the perceived perf, maybe a 
non-zero value should be used to start with. It is not going to benefit the 
end-user if 0 is kept (the env variable is undiscoverable). In the meantime,
the hunt for the optimal value could continue.

==========
+  // XXX Use cached last-line data to speed up the line and other checks
+  // XXX below, instead of just invalidating
+  InvalidateLastLineCache(presShell);

Since 'XXX' more often than not implies: a todo, an uncertainty or doubt, a
problem with an edge-case, I will remove the 'XXX' and just say 'This uses 
the...'
If you can tell me a good number (or, better, answer the acceptable-bloat-%age
question I posed to waterson), I'll gleefully set gCachedLineThreshold to it. 
In the meantime, I was hoping to tell .performance about it and get feedback
from people like varada and stephend about the effects of various settings,
before we make everyone pay the bloat cost.

The XXX does, in fact, indicate a todo: we should use the cached line data to
speed up those other checks, rather than simply invalidating the cache there (as
we do on the line below the comment).

Want to r= it anyway, or wait until we settle the gCachedLineTheshold debate?
r=rbs. I don't see much that can be reliably added to this in this time-frame.

re:threshold, what about trying 64 to start-with as waterson suggested. Any
value >> 1 is okay to me as a start.

As an additional comment as to why it is better in the long run to add these
few lines of code so as not to add the 4 bytes all the time. The block frame is
pervasive and is also used elsewhere as a wrapper frame (e.g., in table) to 
provide functionalities not supported by other frames. Quite often, a block
frame can contain lineboxes with a block frame as sole children. That
contributes to the thousands of block frames that could be around in a page.
This 64 number seems really popular.  Are there data out there about the
line-count for nsBlockFrames in general?  Do people just like powers of two? 
I'd rather not just guess a number and check it in, because, as we saw with the
reflow interval stuff, it can stay that way for an unpleasantly long time.
The main mistake that was made with the interval stuff is that no non-futured
bug was filed in bugzilla. In order not to repeat the same mistake, a bug should
be entered this time around. As I was writting these comments, I thought of
filing that before completing my comments, but this code is yet to be
checked-in. Also since a shake-up will do most good to layout, this code may
perhaps not appear at some later stage in its present form.
The one-entry cache in the frame manager that is saving from the +900 successive 
requests of the same property is good stuff. The uncertainty about the threshold
is blurring that that one is nice.
I just debugged an issue where frame-state flags needed to be carried forward 
into a continuing frame (bug 74184). Let me convince myself that this won't 
cause problems if we need to create a continuing frame for the block. (I think 
this'd only happen while printing, which might make it doubly fun to debug!)

As far as picking a ``good value'', I've tried to think of some way to 
statically analyze the problem (to impress brendan, of course), but I can't. So, 
I think that the way to pick a good value is to pick the largest value that 
doesn't empirically create a ``noticable delay'' in the test case that's 
described above on a low-end machine. I've got a crappy machine at work (133MHz 
Win98) that I'll apply the patch on and try to cough up the juice.
I tryed that patch on linux and i get crash in large page:
http://www.student.oulu.fi/stats/Jan.wwwstats.html

Stacktrace is like this:

#0  0x41ac11e3 in nsBlockFrame::LastLine(nsIPresShell*) (this=0x91bc744,
aPresShell=0x87f1200)
    at ../../../../../layout/html/base/src/nsBlockFrame.cpp:5351
#1  0x41c60db3 in nsBlockFrame::LastChild(nsIPresShell*) (this=0x91bc744,
aPresShell=0x87f1200)
    at ../../../../../layout/html/base/src/nsBlockFrame.cpp:5373
#2  0x41ac130e in nsBlockFrame::AppendFrames(nsIPresContext*, nsIPresShell&,
nsIAtom*, nsIFrame*) (this=0x91bc744, 
    aPresContext=0x86f0bd0, aPresShell=@0x87f1200, aListName=0x0,
aFrameList=0x947d338)
    at ../../../../../layout/html/base/src/nsBlockFrame.cpp:5400
#3  0x41adae6e in FrameManager::AppendFrames(nsIPresContext*, nsIPresShell&,
nsIFrame*, nsIAtom*, nsIFrame*) (
    this=0x86a6398, aPresContext=0x86f0bd0, aPresShell=@0x87f1200,
aParentFrame=0x91bc744, aListName=0x0, 
    aFrameList=0x947d338) at
../../../../../layout/html/base/src/nsFrameManager.cpp:777
#4  0x41b91af7 in nsCSSFrameConstructor::AppendFrames(nsIPresContext*,
nsIPresShell*, nsIFrameManager*, nsIContent*, nsIFrame*, nsIFrame*)
(this=0xbfffe440, aPresContext=0x86f0bd0, aPresShell=0x87f1200,
aFrameManager=0x86a6398, 
    aContainer=0x872eab8, aParentFrame=0x91bc744, aFrameList=0x947d338)
    at ../../../../../layout/html/style/src/nsCSSFrameConstructor.cpp:7596
#5  0x41b93d86 in nsCSSFrameConstructor::ContentAppended(nsIPresContext*,
nsIContent*, int) (this=0x8753310, 
    aPresContext=0x86f0bd0, aContainer=0x872eab8, aNewIndexInContainer=57)
    at ../../../../../layout/html/style/src/nsCSSFrameConstructor.cpp:8142


code at crash is:
5349            // ...then we can use the cache!
5350            nsLineBox* line = cachedLine;
5351            while (line->mNext)            <-- crash here
5352              line = line->mNext;
Weird. In theory, you shouldn't get there at all because gCachedLineThreshold=0. 
Did you set a non-zero value in the env variable? If so, which value?
I set BLOCK_FRAME_LINE_CACHE_THRESHOLD=64 when it crashes, didnt
try other values.
retracting my r pending further investigations.
Dammit.  This isn't for 0.9, obviously.

I'll investigate that crash when I'm finished being sick.
Target Milestone: mozilla0.9 → mozilla0.9.1
So, here's what was happening: when we remove frames, we invalidate the cache,
by storing a null value in the property.  I guess I expected that a subsequent
retrieval would fail, but that's not the case (I think it was with my
dhash-based frame property code, which overloaded null).

Much better now.  I don't crash on Tomi's page (though it _is_ still
heart-breakingly slow :-((((((((( ).
(On my happier days, I dream that someone who really cares about this, like
perhaps the mail-compose performance guys, will test it and tell me if I've been
wasting my time and everyone else's.)
Even on the "full patch", I am mortified by my own incompetence.  I managed to
get a bit of hyatt's painting-delay patch in there, and also to forget the
nsLayoutAtomList.h change.

I'm going to go soak my head now, after I post another patch.  Special apologies
to stephend and blake, who I have thwarted repeatedly in their attempts to help
me profile.
let's get some of those mail compose guys you dream about cc'd on this bug.  It
looks like stephend is helping out with testing perf, but maybe JF or varada can
also comment on whether or not this speeds up reply.
I'm going to Quantify this as soon as Rational comes through with my keys.
Wondering if there is any connection with this bug an bug 29584: Exponential 
time to open text files in the Editor.
Blocks: 56854
Is profiling really necessary here? Can't someone just apply the darn patch and 
see if it makes any difference on the test case described in bug 22486?
The two output logs I attached showed the results of being unpatched and 
patched, but I haven't yet enabled the pref.  I'll test 
BLOCK_FRAME_LINE_CACHE_THRESHOLD values of 0, 1, 16, 64 and 128, and post logs 
for each run.
My machine used to generate this data is a Pentium III 750 with 128 megs of RAM, 
running Windows 2000 SP1 with 1 GIG of virtual memory.  The build was a 
debug trunk build pulled at around noon.
Cool, thanks for doing this Stephen. To tabularize this inline:

            
    n       Total  Insertion
    0 (off)  8734        681
    1        8164        678
   16        7939        678
   64        8253        683
  128        8679        689

I presume that the times are given in msec?

How many lines did the message that you replied to have? I was under the 
impression that insertion time contributed much more significantly to the 
overall time to open the window than is indicated by your data. (It looks like 
the delta that this makes to insertion time is in the noise, frankly. But maybe 
you tested with a small message?)
Oh, stone me senseless.  I knew this was for message compose, but didn't realize 
this was intended for replying to messages.  I thought this was just for 
initializing the editor widget.  I'm also assuming that this is milliseconds, as 
http://www.mozilla.org/mailnews/performance/msgcompose_loggin.html doesn't nail 
that down.  What I tested with was actually just a blank document, with a 2.43 
meg attachment.  Chris/Shaver, what size of messages should I be *replying* to 
to test this properly? Let me know, I'll run the tests again. 
> The build was a debug trunk build pulled at around noon.

Wouldn't it be better to test this way in an optimized build?
John is absolutely right, same procedure as with Quantify, etc, (setting
moz_debug to be null).  So, with the patches in my tree, the right variable
unset, and ready to rock and roll on some more testing, can I receive some
scenarios to help me get this tested properly in anticipation of Mike landing
his patch and helping out mail/news? Thanks.
If I understand that code correctly, the code-section being exercised is when a
_long_ content needs to be broken in several lines. Mail&News provide a good
example for this because the message text has to be broken in several lines in
the content window. But in addition to testing Mail&News, another really good
example is "viewsource". This is because all of the viewsource content is inside
a <pre>...</pre> block. Hence some interesting data can also be obtained by
viewing source of
http://lxr.mozilla.org/seamonkey/source/layout/html/style/src/nsCSSFrameConstructor.cpp. 

But this might be an extreme case and other cases could be found. As far as
testing is concerned, viewsource might be easier to deal with.
To test this, I created a 6,000 line plain-text message and mailed it to myself. 
Then, I tried to reply-to this message. I ended up with about 31s to reply to 
the message without the patch, and 31s to reply to the message with the patch, 
and with BLOCK_FRAME_LINE_CACHE_THRESHOLD=16. I also tried plain clipboard 
insertion, and that didn't seem to show any difference either.

Maybe something in editor has changed? Anyway, I'll re-profile this and report 
back soon.
Ok, so I don't have a ton of time to do a detailed analysis of the profile, but 
it looks like nsBlockFrame::AppendFrames() is called _once_, and accounts for a 
tiny fraction of the time. nsBlockFrame::InsertFrames() is called 11K times, and 
accounts for 14% of the time, which this patch won't help (I don't think). (I'm 
profiling clipboard insertion, too...)

There seems to be quite a bit of low-hanging fruit here; I'll attach a text 
version of the profile to the original bug (bug 22486).
This patch has rotted, and I'm sick of chasing the conflicts in my tree, so I
hereby declare it dormant.  If someone thinks it's worthwhile (and I'm pretty
sure it is for large pages, but hey), they can update it and live the adventure.

Untargetting and updating summary to keep mail-perf folks from suffering under
false hopes.  (Someone may want to remove the dependency, but I'm not going to
ass-u-me.)

Thanks for your time, everyone.
OS: Windows 2000 → All
Priority: -- → P5
Hardware: PC → All
Summary: 10% of reply time spent in O(n^2) line walking algorithm in nsLineBox → Optimizations possible in AppendLines
Target Milestone: mozilla0.9.1 → ---
mjudge and I were looking at some selection stuff yesterday (bug 62971), and 
stumbled across this thing that kipp allegedly hacked up days before his 
departure. It's a ``line iterator''. Here's how it works: it walks the line list 
once to count the lines, then allocates an array big enough to hold pointers to 
all of them, then walks the line list again to put the pointers into the array. 
Now the lines can be accessed like an array.

Which made me wonder...why aren't the lines just stored in an array? That would 
solve this problem (getting to the last line quickly). And it would also solve 
some of our hit-test headaches (e.g., in selection), since each line knows its 
bounds, lines don't overlap, and each line's y-coordinate increases 
monotonically. (I think, somebody call my bluff if this isn't the case.)

I think that there's very little code (if any) that inserts lines ``in the 
middle'' of the line array -- and that which does ends up throwing away all the 
subsequent lines anyway IIRC. Anyway, something else to think about.
I did some profiling with this, with long html page time spend
in nsBlockFrame::AddFrames() drop from 1.9sec to 0.02 (5.7%->0.1% total time)

I'll attach interesting parts from gprof output.
The work on bug 86947 should also fix the nsBlockFrame::AddFrames problem, since
it makes the line list doubly-linked (for other reasons) and searches the list
from the end in the AddFrames case.
I think dbaron will have fixed this in bug 86947.
Depends on: 86947
No longer depends on: 86947
so is this fixed?
remove self
What's the current status of this one?
Well, dbaron fixed the block frame; however, there are other places where it'd
be nice to have a last-child member. For example, see bug 145425.
Depends on: 233463
Blocks: 145425
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard
Component: Layout: HTML Frames → Layout: Images
Product: Core Graveyard → Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: