Closed Bug 145425 Opened 23 years ago Closed 13 years ago

extreme cpu, memory usage and time to load big HTML file (in comparsion with other browsers)

Categories

(Core :: Layout, defect, P2)

x86
All
defect

Tracking

()

RESOLVED WORKSFORME
Future

People

(Reporter: aha, Assigned: waterson)

References

(Depends on 1 open bug)

Details

(4 keywords, Whiteboard: [reflow-refactor])

Attachments

(1 file, 6 obsolete files)

Linkchecker Xenu has no problem to generate several megs big HTML file as its report. Last week generate about 30 MB big HTML files and start Mozilla to display this report. Mozilla hangs for several minutes and its memory used growns to stars. So I prepared similar structured file (32 MBs, PREformated text with lot of links) and compared several common used browsers, below are results. Mozilla needs 13x more time to render page than NN 4.79 and eats near to 2x more memory than other browsers. In NN4.79 you could scroll document while loading, in Mozilla you can't anything. Mozilla 2002051708/trunk: Memory used before test: 16 MB Time to 1st displayed char: 4 s Time to load: 302 s Memory used peak: 239 MB Memory on Document Done: 202 MB Mozilla 2002051006/RC2: Memory used before test: 16 MB Time to 1st displayed char: 10 s Time to load: 292 s Memory used peak: 256 MB Memory on Document Done: 202 MB NN 4.79 Memory used before test: 9 MB Time to 1st displayed char: 0 s Time to load: 23 s Memory used peak: 132 MB Memory on Document Done: 130 MB Opera 6.02 (B#1101): Memory used before test: 10 MB Time to 1st displayed char: 0 s Time to load: 90 s Memory used peak: 151 MB Memory on Document Done: 151 MB MSIE 5.0: Memory used before test: 9.5 MB Time to 1st displayed char: 1 s Time to load: 58 s Memory used peak: 133 MB Memory on Document Done: 132 MB
Big testcase file (ZIP, 1.9MB) http://www.hauner.cz/testcasehtml.zip Tested on Win2K (SP2), AMD Duron 950 MHz, 512 MB RAM. While test was constant CPU usage by other running application (just supporting test). All browsers was started just before separate test with blank page.
Keywords: 4xp, perf, testcase
-> Layout
Assignee: Matti → attinasi
Component: Browser-General → Layout
QA Contact: imajes-qa → petersen
This is a duplicate. Please search for the "slow large page" tracking bug (those exact words). Then dup to one of its dependencies.
Whiteboard: DUPEME
Boris. 0 bugs found... I cannot reproduce this bug on Linux. testcase load very, very fast (<1 sec) Mozilla 1505200207 Linux redHat 7.3 KDE 1 ghz, 128 ram, 8 mbps connection.
Marking WFM in the OS X May 19th build (2002-05-19-05)
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
Chris: Huh? You marked this bug reported for W2K as WFM, if don't occur on Linux and OS X? I retried it again using 2002052108/trunk with same result - 5 minutes in hang, memory usage peak on 270 MBs, after load 205 MBs. Futhermore, I tested this bug now on Linux comp and opening and displaying file from local disk takes 5,5 minutes (2002051920/RH7.1) and similar amount of memory. Bug is still present, so I'm REOPENing it. Zbigniew: Did you try small file atteched with this bug or did you download ZIP file with real (and big) testcase?
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
I would be willing to be that we spend a large amount of time looking up the visited state of each of these links in global history. Reporter: could you try this with a clean profile and see if there is any difference in performance?
Chris: Using 2002052508/trunk/W2K with new profile and empty history I'm getting same results (253 MB peak and 205 MB after page load). Do you have any other idea, where browser could spend time? Boris: Zbigniew Braniecki tried to find bug to duplicate, but he wasn't lucky. (I'm changing OS to All because it occurs on Linux too - my comment 7).
OS: Windows 2000 → All
ah, sorry. "long" not "large". This bug needs a profile; otherwise we're all just guessing. I'll try to do that in a few weeks.
Blocks: 56854
Keywords: qawanted
Taking for now. What bzbarsky says: we need a profile.
Assignee: attinasi → waterson
Status: UNCONFIRMED → NEW
Ever confirmed: true
jprof was set up as setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.0015" trunk pulled may 25 Wall clock time to load 210 seconds
The jprof info seems to point away from checking the links, although I'm not so good at interpreting jprof output. However, I did also test making |nsStyleUtil::IsHTMLLink| into an early return (no-op) and it seemed to reduce wall clock time by about 10%. (I'll also note in passing that because of 8bit chars in some of the mailto: urls in that page, we load (on win32) msgcompo.dll and create a new nsIURI to parse those mail URLs just to pass them as char* to the global history to check the link for visited state. Ugh. 4.x linux doesn't appear to bother with :visited state for mailto: links).
(Actually, 8bit-in-mailto is too rare to worry about the extra cycles).
So.. the first obvious culprit is nsFrameList::LastChild. Would it make any sense to add a mLastChild member to nsFrameList? Because as it stands appending frames has to walk the entire existing frame list, so there are O(n^2) calls to LastChild() on pageload.... (and this starts to quickly suck for cases with lots of child frames). What would the footprint cost of adding this member be?
Every container frame would grow by a word, which isn't death but... It looks like the only reason that we're trying to _use_ the LastChild in nsCSSFrameConstructor::AppendFrames is to determine if there are :after pseudos -- a fairly unlikely case which we could probably afford to handle in a less efficient way.
Did you see my proposal in 144826?
Changing priority to P2.
Priority: -- → P2
Some thoughts about nsFrameList and adding |mLastChild|: Do we always go through the nsFrameList appending functions rather than just using SetNextSibling? (Note that things done in batches within the frame constructor already use the last child member of nsFrameItems.) Would it be better to do something similar to what I did for the rule node child storage: use a tagged pointer, where the low bit being unset means that it's just a first child, and if it's set, it points to a pair of pointers that has first and last child? We could then avoid switching to the latter mechanism until we hit 8 or 16 frames or so.
dbaron, I'm not sure how you're proposal would make things different in this case. Wouldn't we still need to get the last frame to look at it and see if it was the generated frame? I truncated the big file to 100,000 lines, and did some timing tests: 67s Current code 59s `#if 0' LastFrame check; always append content. 61s Probe parent for :after pseudo, and only then check to see if LastFrame is the pseudo frame (this patch).
Chris, any chance of seeing how that patch affects the pageload tests, if at all?
On the giant test case, I get: 267s #if 0 265s (!?) probe 308s original I'm just looking at the `Document: Done' number, so this isn't very scientific. The important thing is that we do start to see an algorithmic win on really, really, long pages. As for page load: Original. Test id: 3CF6C24902 Avg. Median : 734 msec Minimum : 177 msec Average : 758 msec Maximum : 2345 msec Test id: 3CF6C6D9C0 Avg. Median : 739 msec Minimum : 210 msec Average : 753 msec Maximum : 2353 msec Probe. Test id: 3CF6BCAF43 Avg. Median : 736 msec Minimum : 190 msec Average : 744 msec Maximum : 2292 msec Test id: 3CF6C03F6F Avg. Median : 734 msec Minimum : 179 msec Average : 745 msec Maximum : 2331 msec So it looks like probing might be a slight win, but it's probably just in the noise.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.1alpha
Comment on attachment 85661 [details] [diff] [review] probe parent for pseudo before walking frame list, diff -uw + // See if the eparent has an :after pseudo-element "the parent" right? r=bzbarsky with that. If it does not regress pageload I say land it and then reprofile.
Attachment #85661 - Flags: review+
Chris, could you retarget this bug as 1.1a is out?
Keywords: footprint
dbaron says we ought to put a bit in the style context that indicates whether or not a frame has a :before or :after pseudo; that way, we'll avoid doing the pseudo-probe. Since this is only an issue for really, really, really big pages (and it may have a negative impact on small pages), I'm going to FUTURE.
Target Milestone: mozilla1.1alpha → Future
We should look more at that jprof. There are some interesting things other than the Append to examine. Almost 1/2 the time is spent in laying out the data. Random example: around 150 hits in nsFont::Equals, similar numbers in converting text strings to single-byte for getting the width for layout. We spend much more than than (>5%-ish if I remember) dealing with creating and recovering block reflow states.
I re-test (with my primitive technique) giant testcase and I got these _bad_ numbers for actual nightbuild (used hardware is same): Mozilla 200212108/trunk: Memory used before test: 21 MB Time to 1st displayed char: 1 s Time to load: 689 s Memory used peak: 313 MB Memory on Document Done: 313 MB
Blocks: 23187
Whiteboard: DUPEME
Blocks: 174359
Blocks: 29805
Blocks: 112858
Attached patch waterson's patch, updated to trunk. (obsolete) (deleted) — Splinter Review
Attachment #85661 - Attachment is obsolete: true
Comment on attachment 121076 [details] [diff] [review] waterson's patch, updated to trunk. roc, would you do the honors?
Attachment #121076 - Flags: superreview?(roc+moz)
Attachment #121076 - Flags: review?(roc+moz)
Attached patch A bit more cleanup... (obsolete) (deleted) — Splinter Review
Attachment #121076 - Attachment is obsolete: true
Attachment #121079 - Flags: superreview?(roc+moz)
Attachment #121079 - Flags: review?(roc+moz)
Attachment #121076 - Flags: superreview?(roc+moz)
Attachment #121076 - Flags: review?(roc+moz)
Comment on attachment 121079 [details] [diff] [review] A bit more cleanup... > + NS_PRECONDITION(aContent, "Must have a content node"); This should actually be removed from nsLayoutUtils::IsGeneratedContentFor
Comment on attachment 121079 [details] [diff] [review] A bit more cleanup... OK, that looks fine. You could make HasPseudoStyle inline since it's really just wrapping a single method call.
Attachment #121079 - Flags: superreview?(roc+moz)
Attachment #121079 - Flags: superreview+
Attachment #121079 - Flags: review?(roc+moz)
Attachment #121079 - Flags: review+
Comment on attachment 121079 [details] [diff] [review] A bit more cleanup... removed extra assert, inlined that function, and checked in. Time to reprofile here, I'd say....
Attachment #121079 - Attachment is obsolete: true
Blocks: 203448
Are there other issues implied in this bug or can it be marked fix?
See comment 26 and comment 33 -- we need a new profile to identify any remaining problems.
dbradley/bz - could you do a profile to further investigate what can be done here.
Attached file Updated jprof of testcase (obsolete) (deleted) —
Looks like a good deal of time is spent in reflow or in GetPrimaryFrameFor (the latter all on #text nodes, in response to content being appended to those nodes...)
Attachment #85405 - Attachment is obsolete: true
Perhaps we need a way to avoid triggering reflow off these AppendData calls in cases when the text node's frame has not yet been constructed (or at least the text node's parent has not been flushed from the sink)? Or do we have one?
Depends on: 115049
Blocks: 147894
Depends on: 233463
Depends on: 74656
Depends on: 235255
Can someone do a fresh test/analysis of current builds?
quick load on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20050903 Firefox/1.6a1 with 2ghz amd 64 less than 2s. memory usage up 200k.
*** Bug 301862 has been marked as a duplicate of this bug. ***
*** Bug 61684 has been marked as a duplicate of this bug. ***
Blocks: 267525
(In reply to comment #39) > Can someone do a fresh test/analysis of current builds? > ditto (again)
Not worth it till the reflow branch lands.
Whiteboard: [reflow-refactor]
Attached file Bzip2-compressed testcase. (deleted) —
For what it's worth, I just tested on reflow branch and performance seems largely unaffected. I'll reprofile after the branch lands.
Attachment #84154 - Attachment is obsolete: true
Attachment #126694 - Attachment is obsolete: true
Attachment #234142 - Attachment is patch: false
Attachment #234142 - Attachment mime type: text/plain → application/bzip2
QA Contact: chrispetersen → layout
Performance in the testcase is better in Firefox 8.0a2 than in IE9. However, adding to MemShrink.
Whiteboard: [reflow-refactor] → [reflow-refactor][MemShrink]
On my Linux box, we load this page faster than chrome (7s versus 19s) and use almost exactly the same amount of memory. However, this testcase gives us high (~50%) heap-unclassified. I imagine this is covered by one of our existing darkmatter bugs, but I'll open a separate bug with the testcase just in case.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago13 years ago
Resolution: --- → WORKSFORME
Whiteboard: [reflow-refactor][MemShrink] → [reflow-refactor]
I would expect that textruns might account for a lot of the heap here... in any case, please cc me and njn on the new bug?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: