Closed
Bug 56854
Opened 24 years ago
Closed 6 years ago
[meta] slow loading typical long page
Categories
(Core :: DOM: HTML Parser, defect, P3)
Core
DOM: HTML Parser
Tracking
()
VERIFIED
FIXED
Future
People
(Reporter: omega2, Assigned: waterson)
References
(Depends on 5 open bugs, Blocks 1 open bug, )
Details
(Keywords: meta, perf)
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20001010
BuildID: 2000101014
Overall performance reading JDK docs seems to have taken a hit. This may be
related to bug 5588, but I think it is somewhat different.
Reproducible: Always
Steps to Reproduce:
1.Load the page
2. Compare perf to IE5, NS 4.6
Actual Results: Slow, sluggish load times up to 3x normal
Expected Results:
Comment 1•24 years ago
|
||
win98 200101804 (post m18) trunk build, page loads quick and no problems with
scrolling.
any winnt people care to test this?
Keywords: qawanted
I changed the URL to one in which the problem seems to be more apparent. This
is still with M18/ NT4. The time at the bottom reads 29 secs with this page
which seems very slow.
Comment 3•24 years ago
|
||
we do load this page considerably slower thatn 4.x and ie 5.x
Over to Layout.
Assignee: asa → clayton
Status: UNCONFIRMED → NEW
Component: Browser-General → Layout
Ever confirmed: true
QA Contact: doronr → petersen
Comment 5•24 years ago
|
||
Triage
Assignee: dcone → harishd
Component: Layout → Parser
QA Contact: petersen → janc
It spends a lot of time here!
nsQueryInterface::nsQueryInterface(const nsQueryInterface & {...}) + 11 bytes
do_QueryInterface(nsISupports * 0x02f71aa0, unsigned int * 0x00000000) line 347
+ 27 bytes
nsCOMPtr<nsIPresShell>::Assert_NoQueryNeeded() line 499 + 17 bytes
nsGetterAddRefs<nsIPresShell>::~nsGetterAddRefs<nsIPresShell>() line 909
nsContainerFrame::GetOverflowFrames(nsIPresContext * 0x02fd7090, int 0) line 888
nsInlineFrame::Reflow(nsInlineFrame * const 0x00f50e14, nsIPresContext *
0x02fd7090, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 1230364) line 265 + 14 bytes
nsLineLayout::ReflowFrame(nsIFrame * 0x00f50e14, nsIFrame * * 0x0012d054,
unsigned int & 1230364, nsHTMLReflowMetrics * 0x00000000, int & 0) line 921
nsBlockFrame::ReflowInlineFrame(nsBlockReflowState & {...}, nsLineLayout &
{...}, nsLineBox * 0x00f50fac, nsIFrame * 0x00f50e14, unsigned char *
0x0012c5cc) line 4368 + 29 bytes
nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState & {...}, nsLineLayout &
{...}, nsLineBox * 0x00f50fac, int * 0x0012cc48, unsigned char * 0x0012ca90, int
0, int 0) line 4252 + 28 bytes
nsBlockFrame::DoReflowInlineFramesAuto(nsBlockReflowState & {...}, nsLineBox *
0x00f50fac, int * 0x0012cc48, unsigned char * 0x0012ca90, int 0, int 0) line
4186 + 42 bytes
nsBlockFrame::ReflowInlineFrames(nsBlockReflowState & {...}, nsLineBox *
0x00f50fac, int * 0x0012cc48, int 0, int 0) line 4131 + 32 bytes
nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineBox * 0x00f50fac, int
* 0x0012cc48, int 0) line 3267 + 29 bytes
nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}) line 2956 + 27 bytes
nsBlockFrame::Reflow(nsBlockFrame * const 0x00f50dc8, nsIPresContext *
0x02fd7090, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 0) line 1746 + 15 bytes
nsBlockReflowContext::DoReflowBlock(nsHTMLReflowState & {...}, nsReflowReason
eReflowReason_Initial, nsIFrame * 0x00f50dc8, const nsRect & {...}, int 1, int
0, int 0, nsMargin & {...}, unsigned int & 0) line 561 + 36 bytes
nsBlockReflowContext::ReflowBlock(nsIFrame * 0x00f50dc8, const nsRect & {...},
int 1, int 0, int 0, nsMargin & {...}, unsigned int & 0) line 331 + 50 bytes
nsBlockFrame::ReflowBlockFrame(nsBlockReflowState & {...}, nsLineBox *
0x00f5b338, int * 0x0012d780) line 3884 + 56 bytes
nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineBox * 0x00f5b338, int
* 0x0012d780, int 1) line 3149 + 23 bytes
nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}) line 2956 + 27 bytes
nsBlockFrame::Reflow(nsBlockFrame * const 0x00e198fc, nsIPresContext *
0x02fd7090, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 0) line 1746 + 15 bytes
nsBlockReflowContext::DoReflowBlock(nsHTMLReflowState & {...}, nsReflowReason
eReflowReason_Incremental, nsIFrame * 0x00e198fc, const nsRect & {...}, int 1,
int 299, int 0, nsMargin & {...}, unsigned int & 0) line 561 + 36 bytes
nsBlockReflowContext::ReflowBlock(nsIFrame * 0x00e198fc, const nsRect & {...},
int 1, int 299, int 0, nsMargin & {...}, unsigned int & 0) line 331 + 50 bytes
nsBlockFrame::ReflowBlockFrame(nsBlockReflowState & {...}, nsLineBox *
0x00ef0aa8, int * 0x0012e2b8) line 3884 + 56 bytes
nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineBox * 0x00ef0aa8, int
* 0x0012e2b8, int 1) line 3149 + 23 bytes
nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}) line 2956 + 27 bytes
nsBlockFrame::Reflow(nsBlockFrame
Buster is the man ;-)
Assignee: harishd → buster
A simple wall-clock test on my machine showed the rtm candidate build from 11/06
loaded this page in a similar fashion as IE 4 on WinNT. NS6 was slightly slower
(25 seconds vs. 20 seconds.) A real test of layout requires doing the load with
a local file to remove network traffic noise. Not a high priority this week,
but we'll be doing lots of performance work over the next few months for the
next release.
Status: NEW → ASSIGNED
Target Milestone: --- → Future
I have performed this test locally. IE5/NT4 loads the page in about 3-4
seconds, M16 in about 25. This is a big difference! This page is a part of the
JDK 1.3 documentation and can be downloaded seperately.
Assignee | ||
Comment 12•24 years ago
|
||
Taking; will get a profile done by m0.9, but may not have a fix.
Assignee: buster → waterson
Status: ASSIGNED → NEW
Comment 13•24 years ago
|
||
1/3 of the total time is spent in nsBlockFrame::RecoverStateFrom which is called
1.5 million times from nsBlockFrame::ReflowDirtyLines which in itself is called
35000 times. I don't know what nsBlockFrame::RecoverStateFrom does, but it seems
like something worth looking closer on.
Assignee | ||
Comment 14•24 years ago
|
||
On my machine, Quantify claims the total time for this page load is about
9,300msec. As Danne notes, the sore thumb is reflow. Loading this page causes
240 reflows (reasonable), which account for 5,285msec of time (unreasonable).
The problem is the O(n**2) fanout at nsBlockFrame::ReflowDirtyLines().
Specifically, 16,962 calls to ReflowDirtyLines() fans out to 1.3M calls to
nsBlockFrame::RecoverStateFrom(), accounting for about 65% of the reflow time
(3,400msec).
nsBlockFrame::RecoverStateFrom() appears to end up being particularly expensive
because most of the line boxes contain blocks. This sends us through what
appears to be a fairly expensive code path in
nsBlockReflowState::RecoverVerticalMargins(). Ideally, we'd avoid the O(n**2)
problem by storing off the state we need somewhere (and avoid ``recovering'' it
altogether). But barring that (for, say, space reasons), we may be able to to
optimize RecoverVerticalMargins().
The rest of reflow is spent reflowing lines: 1,420msec in 15,143 calls to
nsBlockFrame::ReflowInlineFrames(). This time is dominated by text measurement
in nsTextFrame::MeasureText().
Probably worth looking into at some point will be frame construction. There are
12,180 calls to nsCSSFrameConstructor::ContentAppended(), which winds up
fanning out to 63,739 calls to nsCSSFrameConstructor::ConstructFrame(). This
accounts for 1,729msec of time. 371msec is spent in 11,763 calls to
nsHTMLAnchorElement::GetHrefCString(), called from nsStyleUtil::IsHTMLLink().
We do this so we can look up the link state in global history. (That's about
20% of frame construction time.)
I'm going to see if I can improve the O(n**2)-ness in RecoverStateFrom().
Status: NEW → ASSIGNED
Comment 15•24 years ago
|
||
bug 64858 is probably a duplicate. See:
http://www.damowmow.com/mozilla/bugs/slow/002.html
Assignee | ||
Comment 16•24 years ago
|
||
End-of-milestone reality check.
Target Milestone: mozilla0.9 → mozilla0.9.1
Assignee | ||
Comment 17•24 years ago
|
||
This is where I write another chapter in my book entitled, ``I've been so burned
by Quantify!''
When we last left Our Heroes, they'd made the assumption that the O(n**2)
behavior of RecoverStateFrom() was the culprit. Well, it is and it isn't. I've
put together an atrocious hack that avoids doing _any_ state recovery by caching
the nsBlockReflowState object's per-line data stuff. On my fast machine, it
doesn't make a white of difference. Well, maybe a whit: page load time dropped
from about 11s to about 9s -- regardless, a far cry from the ~55% win I was
expecting.
So I thought about this a bit, and realized that on a fast machine, the layout
reflow timers don't fire often enough to exacerbate the O(n**2) behavior of
state recovery. In other words, the fast machine slurps up enough data that we
don't need to do very many reflows. Unfortunately, when we're running under
Quantify, the timers fire at the same rate, but the ``crippled'' binary does
much less work per unit of time. This leads to more incremental reflows,
which makes the O(n**2) algorithm jump out.
To justify my self-worth, I moved my build over to a very, very slow machine
(166MHz running Win98). On this machine, it drops the layout time from over 70s
to about 50s. (On this machine, IE5 loads the page in 11s.)
Once I get some of the other miscellaneous block frame changes out of my tree,
I'll post a patch, and we can discuss whether this is worth pursuing. (``It's
Risky.'') shaver: it'd probably work well if integrated with your last-line
stuff.
I'll post another profile (with my hacks, so the timers don't confound the
results) once I get some free time to analyze it.
Assignee | ||
Comment 18•24 years ago
|
||
Profiled with my changes to speed up state recovery.
Total time attributed to main() is 6,104msec. (All times attributed to
functions are F+D, unless noted otherwise.)
. 33% of the time (1,993msec) is spent processing 106 reflow commands.
. 1/3 of that time (670msec, about 11% overall) is spent measuring
text in nsTextFrame::MeasureText().
- Half of that (330msec, 5% overall) is spent in
GetTextExtentPoint32A(); i.e., the native toolkit.
- nsTextFrame::ComputeWordFragmentWidth() accounts for 22% of text
measurement time (150msec, 2.5% overall). This should be
examined carefully. [1]
. Deleting the time spent measuring text leaves 688msec (35% of
reflow, or 11% overall) spent in nsBlockFrame::ReflowInlineFrames().
nsTextFrame::Reflow() (sans text measurement) still accounts for
30% of the inline reflow time, 208msec.
. nsTextFrame's nsTextStyle ctor accounts for 45% of this
(95msec), mostly because it has to get color (?) and font
metrics information from the OS.
- Getting the color information looks to fall through to the OS
each time, which makes it more expensive that need be.
- Furthermore, the implementation of nsXPLookAndFeel::GetColor()
is atrocious given that we're calling this 88K times! This
should a straight lookup, not a table crawl (which will *miss*
most of the time).
- But why do we even need color information at all here? [2]
Probably the way to fix this is to make nsLookAndFeel (probably
a singleton?) cache the color IDs in a table so it doesn't have
to fall through to nsXPLookAndFeel *or* the OS. Getting color
information accounts for 6% of the line reflow time (40msec, a
bit under 1% of the overall time). [3]
As for font, nsFontCache::GetMetricsFor() accounts for 16% of
the inline reflow time (115msec, 2% overall). The font cache
look to be implemented pretty inefficiently.
- 156K calls to GetMetricsFor() fan-out to 430K calls to
nsFont::Equals(). This accounts for 26% of the time. Hash?
- Addref'ing and release'ing the language group atom accounts
for about 30% of the time. Do we need to addref & release
atoms here to compare?
- Only 15% of the time is spent actually going to the OS to get
font information.
So, improving the implementation here might yield a 1-1.5%
speedup. [4]
. 43K calls to nsFrame::Invalidate() account for 60msec (30% of
text reflow, 3% of overall reflow, 1% overall). Could these be
batched? (In fact, it looks like this eager invalidation in
nsTextFrame::Reflow() may be the result of us hacking around
some other bug.) [5]
. nsLineLayout::VerticalAlignFrames() accounts for 5% of reflow time
(99msec).
- About of 1/3 of that (29msec) goes to
nsRenderingContextWin::SetFont(), which busies itself with our
inefficient font cache.
- About 1/2 of it (50msec) goes to
nsHTMLReflowState::CalcLineHeight(), which *again* winds up
being dominated by the cost to fetch font metrics information
from the font cache. A surprising 15msec is spent in ftol() from
ComputeLineHeight(). Perhaps we ought evaluate how the math is
done near the end of this routine (the only place that floating
point appears to be necessary). [6]
. nsHTMLReflowState's ctor accounts for 79msec; we should go over
this carefully as it's done all over the place. [7]
. Looks like we're actually allocating the PerFrameData structures
from the global heap. This probably should be allocated from the
PresShell's pool. [8]
. 30% of the time (1,853msec) is spent in 381 calls to
PresShell::ContentAppended().
. 44% of that time (825msec, 14% overall) is spent in 64K calls to
nsCSSFrameConstructor::ResolveStyleContext()
. 22K calls to malloc() account for 13% of this time (112msec, 2%
overall). The malloc-happiness appears to come from PR_Malloc()
and PL_strdup(). The PL_strdup()'s come from
nsStdURL::Resolve(), no doubt preparing URLs for history
lookup. ns PR_Malloc()'s come from nsStdURL::Resolve().
. Wow, nsStdURL::Resolve() is a hot ticket. Looking at this
another way, we see that nsHTMLAnchorElement::GetHrefCString()
accounts for 42% of style resolution (350msec, 6%
overall). Whee, purple links! What I wouldn't give to be able
to do this baloney lazily. [9]
. 27% of this time (504msec, 8% overall) is spent in 62K calls to
nsCSSFrameConstructor::CreateGeneratedContentFrame(). Although I
can't see why this simple HTML page would trigger it, we end up
with 5.8K calls to nsHTMLDocument::CreateElement(), so it looks
like we're actually _creating_ content for pseudos here? As far as
I can tell, the calls to CreateElement() bottom out in
NS_NewHTMLSpanElement(), so we're ending up with spans for some
reason. Worth looking into. [10]
. 4% of the time (75msec, 1% overall) is spent constructing 33K text
frames from
nsCSSFrameConstructor::ConstructFrameByTag().
. 6% of the time (107msec, 2% overall) is spent in
381 calls to nsCSSFrameConstructor::AppendFrames().
. 1/4 of that time (22msec) is spent in nsFrameList::LastChild();
shaver's patch to fix LastChild() ought to fix this.
. nsLineBox::FindLineContaining(), called from
nsBlockFrame::AppendFrames(), also goes O(n**2), and accounts
for 59msec. This may also be fixed by shaver's patch.
Bottom line: shaver's last-line hackery may yield a 2% speedup
here; better on slower machines because we do more appends.
. 5% of the time (85msec, 1% overall) is spent in 12K calls to
nsCSSFrameConstructor::HaveSpecialBlockStyle(). This bottoms out
in more pseudo-style probing.
. In fact, we spend at least 123msec (7% of the time) probing pseudo
style contexts; i.e., in
nsPresContext::ProbePseudoStyleContextFor(). What is this and why
is it so slow? [11]
. 11% of the time (680msec) is spent in 202 calls to
nsParser::Tokenize().
. nsScanner::ReadUntil() seems to be the sore thumb here. There are
about 51K calls, and it accounts for 35% of the time (240msec, 4%
overall). There is a huge (12x) fanout to the string routines,
indicating some further analysis would be worth while. [12]
[1] Bug 77938
[2] Bug 77939
[3] Bug 77941
[4] Bug 77942
[5] Bug 77945
[6] Bug 77947
[7] Bug 77948
[8] Bug 77949
[9] Bug 77953
[10] Bug 77954
[11] Bug 77956
[12] Bug 77959
Comment 19•24 years ago
|
||
I see that a lot of the profiles end up in API calls: GetTextExtentPoint32A, the
color lookup, strlen, malloc... I just want to remind you that those routines
are (by default? always?) timed by Quantify instead of doing micro level
profiling. My experience with mozilla is that Quantify overestimates all timed
routines, so that they _might_ not be as significant as it seems.
I still have in fresh memory where inlining code made Quantify report 20% less
time spent, but the wall clock still reported the same real time used just
because the functions were profiled for real instead of timed.
Also, if the timed functions are overestimated, it means that the other non
timed functions have a larger real share than Qunatify displays.
Assignee | ||
Updated•24 years ago
|
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Assignee | ||
Comment 20•24 years ago
|
||
*** Bug 68232 has been marked as a duplicate of this bug. ***
Assignee | ||
Updated•24 years ago
|
Target Milestone: mozilla0.9.2 → mozilla1.0
Comment 21•24 years ago
|
||
Maybe the summary should be changed to make it more general.
Comment 23•23 years ago
|
||
updating summary to reflect this meta tracking bug "slow loading typical long
page" , was "Still slow reading Java SDK docs"
Summary: Still slow reading Java SDK docs → slow loading typical long page
Depends on: 77953
Summary: slow loading typical long page → [meta] slow loading typical long page
Comment 24•23 years ago
|
||
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1
(you can query for this string to delete spam or retrieve the list of bugs I've
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
Comment 25•23 years ago
|
||
don't move bugs that are in the 1.0 dependency tree. sorry.
Target Milestone: mozilla1.0.1 → mozilla1.0
Assignee | ||
Updated•23 years ago
|
Target Milestone: mozilla1.0 → Future
Updated•23 years ago
|
Keywords: mozilla1.0+
Updated•23 years ago
|
Keywords: mozilla1.0+ → mozilla1.0-
Updated•15 years ago
|
QA Contact: moied → parser
Updated•13 years ago
|
Whiteboard: [Snappy]
Comment 27•13 years ago
|
||
Maybe this is a candidate for Snappy? I've seen a lot of bugs related to slow loading of long pages.
Comment 28•13 years ago
|
||
I do not think it is appropriate for Snappy to focus on responsiveness problems due to weird pages at this point. We have more painful bugs to kill.
Whiteboard: [Snappy]
Comment 29•6 years ago
|
||
We don't need this meta bug anymore, let's close it
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE
Updated•6 years ago
|
Status: RESOLVED → VERIFIED
Resolution: INACTIVE → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•