Closed Bug 56854 Opened 24 years ago Closed 6 years ago

[meta] slow loading typical long page

Categories

(Core :: DOM: HTML Parser, defect, P3)

defect

Tracking

()

VERIFIED FIXED
Future

People

(Reporter: omega2, Assigned: waterson)

References

(Depends on 5 open bugs, Blocks 1 open bug, )

Details

(Keywords: meta, perf)

From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20001010 BuildID: 2000101014 Overall performance reading JDK docs seems to have taken a hit. This may be related to bug 5588, but I think it is somewhat different. Reproducible: Always Steps to Reproduce: 1.Load the page 2. Compare perf to IE5, NS 4.6 Actual Results: Slow, sluggish load times up to 3x normal Expected Results:
win98 200101804 (post m18) trunk build, page loads quick and no problems with scrolling. any winnt people care to test this?
Keywords: qawanted
I changed the URL to one in which the problem seems to be more apparent. This is still with M18/ NT4. The time at the bottom reads 29 secs with this page which seems very slow.
we do load this page considerably slower thatn 4.x and ie 5.x Over to Layout.
Assignee: asa → clayton
Status: UNCONFIRMED → NEW
Component: Browser-General → Layout
Ever confirmed: true
QA Contact: doronr → petersen
Dividing up Clayton's bugs to triage
Assignee: clayton → dcone
Triage
Assignee: dcone → harishd
Component: Layout → Parser
QA Contact: petersen → janc
It spends a lot of time here! nsQueryInterface::nsQueryInterface(const nsQueryInterface & {...}) + 11 bytes do_QueryInterface(nsISupports * 0x02f71aa0, unsigned int * 0x00000000) line 347 + 27 bytes nsCOMPtr<nsIPresShell>::Assert_NoQueryNeeded() line 499 + 17 bytes nsGetterAddRefs<nsIPresShell>::~nsGetterAddRefs<nsIPresShell>() line 909 nsContainerFrame::GetOverflowFrames(nsIPresContext * 0x02fd7090, int 0) line 888 nsInlineFrame::Reflow(nsInlineFrame * const 0x00f50e14, nsIPresContext * 0x02fd7090, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, unsigned int & 1230364) line 265 + 14 bytes nsLineLayout::ReflowFrame(nsIFrame * 0x00f50e14, nsIFrame * * 0x0012d054, unsigned int & 1230364, nsHTMLReflowMetrics * 0x00000000, int & 0) line 921 nsBlockFrame::ReflowInlineFrame(nsBlockReflowState & {...}, nsLineLayout & {...}, nsLineBox * 0x00f50fac, nsIFrame * 0x00f50e14, unsigned char * 0x0012c5cc) line 4368 + 29 bytes nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState & {...}, nsLineLayout & {...}, nsLineBox * 0x00f50fac, int * 0x0012cc48, unsigned char * 0x0012ca90, int 0, int 0) line 4252 + 28 bytes nsBlockFrame::DoReflowInlineFramesAuto(nsBlockReflowState & {...}, nsLineBox * 0x00f50fac, int * 0x0012cc48, unsigned char * 0x0012ca90, int 0, int 0) line 4186 + 42 bytes nsBlockFrame::ReflowInlineFrames(nsBlockReflowState & {...}, nsLineBox * 0x00f50fac, int * 0x0012cc48, int 0, int 0) line 4131 + 32 bytes nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineBox * 0x00f50fac, int * 0x0012cc48, int 0) line 3267 + 29 bytes nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}) line 2956 + 27 bytes nsBlockFrame::Reflow(nsBlockFrame * const 0x00f50dc8, nsIPresContext * 0x02fd7090, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, unsigned int & 0) line 1746 + 15 bytes nsBlockReflowContext::DoReflowBlock(nsHTMLReflowState & {...}, nsReflowReason eReflowReason_Initial, nsIFrame * 0x00f50dc8, const nsRect & {...}, int 1, int 0, int 0, nsMargin & {...}, unsigned int & 0) line 561 + 36 bytes nsBlockReflowContext::ReflowBlock(nsIFrame * 0x00f50dc8, const nsRect & {...}, int 1, int 0, int 0, nsMargin & {...}, unsigned int & 0) line 331 + 50 bytes nsBlockFrame::ReflowBlockFrame(nsBlockReflowState & {...}, nsLineBox * 0x00f5b338, int * 0x0012d780) line 3884 + 56 bytes nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineBox * 0x00f5b338, int * 0x0012d780, int 1) line 3149 + 23 bytes nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}) line 2956 + 27 bytes nsBlockFrame::Reflow(nsBlockFrame * const 0x00e198fc, nsIPresContext * 0x02fd7090, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, unsigned int & 0) line 1746 + 15 bytes nsBlockReflowContext::DoReflowBlock(nsHTMLReflowState & {...}, nsReflowReason eReflowReason_Incremental, nsIFrame * 0x00e198fc, const nsRect & {...}, int 1, int 299, int 0, nsMargin & {...}, unsigned int & 0) line 561 + 36 bytes nsBlockReflowContext::ReflowBlock(nsIFrame * 0x00e198fc, const nsRect & {...}, int 1, int 299, int 0, nsMargin & {...}, unsigned int & 0) line 331 + 50 bytes nsBlockFrame::ReflowBlockFrame(nsBlockReflowState & {...}, nsLineBox * 0x00ef0aa8, int * 0x0012e2b8) line 3884 + 56 bytes nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineBox * 0x00ef0aa8, int * 0x0012e2b8, int 1) line 3149 + 23 bytes nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}) line 2956 + 27 bytes nsBlockFrame::Reflow(nsBlockFrame Buster is the man ;-)
Assignee: harishd → buster
A simple wall-clock test on my machine showed the rtm candidate build from 11/06 loaded this page in a similar fashion as IE 4 on WinNT. NS6 was slightly slower (25 seconds vs. 20 seconds.) A real test of layout requires doing the load with a local file to remove network traffic noise. Not a high priority this week, but we'll be doing lots of performance work over the next few months for the next release.
Status: NEW → ASSIGNED
Target Milestone: --- → Future
I have performed this test locally. IE5/NT4 loads the page in about 3-4 seconds, M16 in about 25. This is a big difference! This page is a part of the JDK 1.3 documentation and can be downloaded seperately.
moving to mozilla0.9
moving to mozilla0.9
Keywords: perf
Target Milestone: Future → mozilla0.9
updated qa contact.
QA Contact: janc → bsharma
Taking; will get a profile done by m0.9, but may not have a fix.
Assignee: buster → waterson
Status: ASSIGNED → NEW
Depends on: 74328
Depends on: 74319
1/3 of the total time is spent in nsBlockFrame::RecoverStateFrom which is called 1.5 million times from nsBlockFrame::ReflowDirtyLines which in itself is called 35000 times. I don't know what nsBlockFrame::RecoverStateFrom does, but it seems like something worth looking closer on.
Depends on: 74830
Depends on: 75001
Depends on: 75081
Depends on: 75083
On my machine, Quantify claims the total time for this page load is about 9,300msec. As Danne notes, the sore thumb is reflow. Loading this page causes 240 reflows (reasonable), which account for 5,285msec of time (unreasonable). The problem is the O(n**2) fanout at nsBlockFrame::ReflowDirtyLines(). Specifically, 16,962 calls to ReflowDirtyLines() fans out to 1.3M calls to nsBlockFrame::RecoverStateFrom(), accounting for about 65% of the reflow time (3,400msec). nsBlockFrame::RecoverStateFrom() appears to end up being particularly expensive because most of the line boxes contain blocks. This sends us through what appears to be a fairly expensive code path in nsBlockReflowState::RecoverVerticalMargins(). Ideally, we'd avoid the O(n**2) problem by storing off the state we need somewhere (and avoid ``recovering'' it altogether). But barring that (for, say, space reasons), we may be able to to optimize RecoverVerticalMargins(). The rest of reflow is spent reflowing lines: 1,420msec in 15,143 calls to nsBlockFrame::ReflowInlineFrames(). This time is dominated by text measurement in nsTextFrame::MeasureText(). Probably worth looking into at some point will be frame construction. There are 12,180 calls to nsCSSFrameConstructor::ContentAppended(), which winds up fanning out to 63,739 calls to nsCSSFrameConstructor::ConstructFrame(). This accounts for 1,729msec of time. 371msec is spent in 11,763 calls to nsHTMLAnchorElement::GetHrefCString(), called from nsStyleUtil::IsHTMLLink(). We do this so we can look up the link state in global history. (That's about 20% of frame construction time.) I'm going to see if I can improve the O(n**2)-ness in RecoverStateFrom().
Status: NEW → ASSIGNED
Blocks: 64858
End-of-milestone reality check.
Target Milestone: mozilla0.9 → mozilla0.9.1
This is where I write another chapter in my book entitled, ``I've been so burned by Quantify!'' When we last left Our Heroes, they'd made the assumption that the O(n**2) behavior of RecoverStateFrom() was the culprit. Well, it is and it isn't. I've put together an atrocious hack that avoids doing _any_ state recovery by caching the nsBlockReflowState object's per-line data stuff. On my fast machine, it doesn't make a white of difference. Well, maybe a whit: page load time dropped from about 11s to about 9s -- regardless, a far cry from the ~55% win I was expecting. So I thought about this a bit, and realized that on a fast machine, the layout reflow timers don't fire often enough to exacerbate the O(n**2) behavior of state recovery. In other words, the fast machine slurps up enough data that we don't need to do very many reflows. Unfortunately, when we're running under Quantify, the timers fire at the same rate, but the ``crippled'' binary does much less work per unit of time. This leads to more incremental reflows, which makes the O(n**2) algorithm jump out. To justify my self-worth, I moved my build over to a very, very slow machine (166MHz running Win98). On this machine, it drops the layout time from over 70s to about 50s. (On this machine, IE5 loads the page in 11s.) Once I get some of the other miscellaneous block frame changes out of my tree, I'll post a patch, and we can discuss whether this is worth pursuing. (``It's Risky.'') shaver: it'd probably work well if integrated with your last-line stuff. I'll post another profile (with my hacks, so the timers don't confound the results) once I get some free time to analyze it.
Depends on: 74656
Depends on: 77938
Depends on: 77939
Depends on: 77941
Depends on: 77942
Depends on: 77945
Depends on: 77947
Depends on: 77948
Depends on: 77949
Depends on: 77954
Depends on: 77956
Depends on: 77959
Profiled with my changes to speed up state recovery. Total time attributed to main() is 6,104msec. (All times attributed to functions are F+D, unless noted otherwise.) . 33% of the time (1,993msec) is spent processing 106 reflow commands. . 1/3 of that time (670msec, about 11% overall) is spent measuring text in nsTextFrame::MeasureText(). - Half of that (330msec, 5% overall) is spent in GetTextExtentPoint32A(); i.e., the native toolkit. - nsTextFrame::ComputeWordFragmentWidth() accounts for 22% of text measurement time (150msec, 2.5% overall). This should be examined carefully. [1] . Deleting the time spent measuring text leaves 688msec (35% of reflow, or 11% overall) spent in nsBlockFrame::ReflowInlineFrames(). nsTextFrame::Reflow() (sans text measurement) still accounts for 30% of the inline reflow time, 208msec. . nsTextFrame's nsTextStyle ctor accounts for 45% of this (95msec), mostly because it has to get color (?) and font metrics information from the OS. - Getting the color information looks to fall through to the OS each time, which makes it more expensive that need be. - Furthermore, the implementation of nsXPLookAndFeel::GetColor() is atrocious given that we're calling this 88K times! This should a straight lookup, not a table crawl (which will *miss* most of the time). - But why do we even need color information at all here? [2] Probably the way to fix this is to make nsLookAndFeel (probably a singleton?) cache the color IDs in a table so it doesn't have to fall through to nsXPLookAndFeel *or* the OS. Getting color information accounts for 6% of the line reflow time (40msec, a bit under 1% of the overall time). [3] As for font, nsFontCache::GetMetricsFor() accounts for 16% of the inline reflow time (115msec, 2% overall). The font cache look to be implemented pretty inefficiently. - 156K calls to GetMetricsFor() fan-out to 430K calls to nsFont::Equals(). This accounts for 26% of the time. Hash? - Addref'ing and release'ing the language group atom accounts for about 30% of the time. Do we need to addref & release atoms here to compare? - Only 15% of the time is spent actually going to the OS to get font information. So, improving the implementation here might yield a 1-1.5% speedup. [4] . 43K calls to nsFrame::Invalidate() account for 60msec (30% of text reflow, 3% of overall reflow, 1% overall). Could these be batched? (In fact, it looks like this eager invalidation in nsTextFrame::Reflow() may be the result of us hacking around some other bug.) [5] . nsLineLayout::VerticalAlignFrames() accounts for 5% of reflow time (99msec). - About of 1/3 of that (29msec) goes to nsRenderingContextWin::SetFont(), which busies itself with our inefficient font cache. - About 1/2 of it (50msec) goes to nsHTMLReflowState::CalcLineHeight(), which *again* winds up being dominated by the cost to fetch font metrics information from the font cache. A surprising 15msec is spent in ftol() from ComputeLineHeight(). Perhaps we ought evaluate how the math is done near the end of this routine (the only place that floating point appears to be necessary). [6] . nsHTMLReflowState's ctor accounts for 79msec; we should go over this carefully as it's done all over the place. [7] . Looks like we're actually allocating the PerFrameData structures from the global heap. This probably should be allocated from the PresShell's pool. [8] . 30% of the time (1,853msec) is spent in 381 calls to PresShell::ContentAppended(). . 44% of that time (825msec, 14% overall) is spent in 64K calls to nsCSSFrameConstructor::ResolveStyleContext() . 22K calls to malloc() account for 13% of this time (112msec, 2% overall). The malloc-happiness appears to come from PR_Malloc() and PL_strdup(). The PL_strdup()'s come from nsStdURL::Resolve(), no doubt preparing URLs for history lookup. ns PR_Malloc()'s come from nsStdURL::Resolve(). . Wow, nsStdURL::Resolve() is a hot ticket. Looking at this another way, we see that nsHTMLAnchorElement::GetHrefCString() accounts for 42% of style resolution (350msec, 6% overall). Whee, purple links! What I wouldn't give to be able to do this baloney lazily. [9] . 27% of this time (504msec, 8% overall) is spent in 62K calls to nsCSSFrameConstructor::CreateGeneratedContentFrame(). Although I can't see why this simple HTML page would trigger it, we end up with 5.8K calls to nsHTMLDocument::CreateElement(), so it looks like we're actually _creating_ content for pseudos here? As far as I can tell, the calls to CreateElement() bottom out in NS_NewHTMLSpanElement(), so we're ending up with spans for some reason. Worth looking into. [10] . 4% of the time (75msec, 1% overall) is spent constructing 33K text frames from nsCSSFrameConstructor::ConstructFrameByTag(). . 6% of the time (107msec, 2% overall) is spent in 381 calls to nsCSSFrameConstructor::AppendFrames(). . 1/4 of that time (22msec) is spent in nsFrameList::LastChild(); shaver's patch to fix LastChild() ought to fix this. . nsLineBox::FindLineContaining(), called from nsBlockFrame::AppendFrames(), also goes O(n**2), and accounts for 59msec. This may also be fixed by shaver's patch. Bottom line: shaver's last-line hackery may yield a 2% speedup here; better on slower machines because we do more appends. . 5% of the time (85msec, 1% overall) is spent in 12K calls to nsCSSFrameConstructor::HaveSpecialBlockStyle(). This bottoms out in more pseudo-style probing. . In fact, we spend at least 123msec (7% of the time) probing pseudo style contexts; i.e., in nsPresContext::ProbePseudoStyleContextFor(). What is this and why is it so slow? [11] . 11% of the time (680msec) is spent in 202 calls to nsParser::Tokenize(). . nsScanner::ReadUntil() seems to be the sore thumb here. There are about 51K calls, and it accounts for 35% of the time (240msec, 4% overall). There is a huge (12x) fanout to the string routines, indicating some further analysis would be worth while. [12] [1] Bug 77938 [2] Bug 77939 [3] Bug 77941 [4] Bug 77942 [5] Bug 77945 [6] Bug 77947 [7] Bug 77948 [8] Bug 77949 [9] Bug 77953 [10] Bug 77954 [11] Bug 77956 [12] Bug 77959
I see that a lot of the profiles end up in API calls: GetTextExtentPoint32A, the color lookup, strlen, malloc... I just want to remind you that those routines are (by default? always?) timed by Quantify instead of doing micro level profiling. My experience with mozilla is that Quantify overestimates all timed routines, so that they _might_ not be as significant as it seems. I still have in fresh memory where inlining code made Quantify report 20% less time spent, but the wall clock still reported the same real time used just because the functions were profiled for real instead of timed. Also, if the timed functions are overestimated, it means that the other non timed functions have a larger real share than Qunatify displays.
Depends on: 78055
Depends on: 78911
Target Milestone: mozilla0.9.1 → mozilla0.9.2
*** Bug 68232 has been marked as a duplicate of this bug. ***
Target Milestone: mozilla0.9.2 → mozilla1.0
Maybe the summary should be changed to make it more general.
Blocks: 84466
Not platform specific.
OS: Windows NT → All
Hardware: PC → All
Blocks: 71668
Depends on: 90872
QA Contact: bsharma → moied
updating summary to reflect this meta tracking bug "slow loading typical long page" , was "Still slow reading Java SDK docs"
Summary: Still slow reading Java SDK docs → slow loading typical long page
Depends on: 77953
Summary: slow loading typical long page → [meta] slow loading typical long page
Depends on: 103266
Depends on: 106513
Depends on: 97815
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 (you can query for this string to delete spam or retrieve the list of bugs I've moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
don't move bugs that are in the 1.0 dependency tree. sorry.
Target Milestone: mozilla1.0.1 → mozilla1.0
Target Milestone: mozilla1.0 → Future
adding 116437 to teh depends list
Depends on: 116437
Keywords: qawanted
Depends on: 124697
Keywords: mozilla1.0+
Keywords: mozilla1.0+mozilla1.0-
Blocks: 142143
Depends on: 112738
Depends on: 145425
Depends on: 148338
Depends on: 42412
Keywords: meta
QA Contact: moied → parser
Whiteboard: [Snappy]
Maybe this is a candidate for Snappy? I've seen a lot of bugs related to slow loading of long pages.
I do not think it is appropriate for Snappy to focus on responsiveness problems due to weird pages at this point. We have more painful bugs to kill.
Whiteboard: [Snappy]

We don't need this meta bug anymore, let's close it

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE
Status: RESOLVED → VERIFIED
Resolution: INACTIVE → FIXED
You need to log in before you can comment on or make changes to this bug.