Closed Bug 352367 Opened 18 years ago Closed 2 years ago

Firefox loads large tables with forms MUCH slower as they grow in size

Categories

(Core :: Layout, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Performance Impact ?

People

(Reporter: bugzilla, Assigned: jesup)

References

(Depends on 2 open bugs, )

Details

(Keywords: perf, testcase, Whiteboard: jar:https://bugzilla.mozilla.org/attachment.cgi?id=238716!/index.html)

Attachments

(6 files, 3 obsolete files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060802 Mandriva/1.5.0.6-1.4.20060mdk (2006.0) Firefox/1.5.0.6 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060802 Mandriva/1.5.0.6-1.4.20060mdk (2006.0) Firefox/1.5.0.6 Bernd <bernd_mozilla@gmx.de> asked me to refile this with smaller test cases, but you can see the original submission on bug #328858. IE's speed on form display is linear. Firefox is not. It's not trivially exponential, either. It has burps as we go to larger tables, but on large tables, Firefox is unusably slow. This is an old problem, and has been around for years. It tends to keep Firefox out of corporate America, because Intranet apps are sometimes laundry lists with form elements. IE handles them just fine, but Firfox can take 15 seconds to minutes to load the page. It's very easy to just have a policy to use IE, and this problem goes away. A page with 911 form elements takes a small fraction of a second to load. One with 1821 takes twenty times longer. After that initial bump, things settle down to a linear speed: SLOW. Reproducible: Always Steps to Reproduce: 1. go to the URL (http://www.global-com.com/pom2/firefox_vs_ie) 2. click on the second table of examples. 3. But read the others without clicking, as you'll get bored fast if you do that. Actual Results: Firefox is magnificently slow. Which is a pity, since it was doing well against IE in the small tests, and we all know it's a much better browser in terms of standards compliance, security, etc. Expected Results: I expect Firefox to at least keep up with IE. A little slower would be fine. TWICE as slow would be fine. There's something going on here in the data structures. I can smell it. I'd assume some huge blowup which causes swapping, but I watched my memory -- lots of space. Is there a tree which gets so lopsided it's a list?
Keywords: topperf
*** Bug 328858 has been marked as a duplicate of this bug. ***
Paul, thank you for the very clear and detailed bug report! Bernd will comment in a minute with the numbers he's seeing in Firefox (trunk) and IE on the testcases (mostly on the "Text with some input boxes and a style in a table (10 times bigger)" testcase), but he asked me to look at this too. Here's what I have so far. Current trunk builds and current reflow branch builds are pretty similar. Reflow branch may be about 20% faster on the largest testcase, but no more than that. So we can't expect things to miraculously get better when the branch lands. Sad. ;) I cannot reproduce the large time jump when going from "5 Instances" to "10 instances". I wonder whether this is due to the way the time measurement is done. The testcases currently measure the time it takes to get to parsing the second <script> tag. Depending on HTTP packet boundaries and general timing characteristics of the hardware, incremental layout may or may not happen between parsing the two script tags. If we cross such a phase transition at the 5->10 boundary, that would explain the results. Paul, could you try putting the "end timing" script into the onload handler for the page? If you do that, do you still see the extreme nonlinearity in number of instances? I'm going to work on profiling the "Text with some input boxes and a style in a table (10 times bigger)" testcase, since Bernd reports that we're about 6 times slower than IE on that one for him. But he also reports that IE doesn't seem to do incremental rendering on that testcase.... :( Oh, I was wondering... Do we have your permission to attach those testcases to this bug? That would make sure we don't lose track of them.
Status: UNCONFIRMED → NEW
Component: General → Layout
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → layout
Version: unspecified → Trunk
Keywords: perf
There is some network dependency (or the amount of incr. reflows) http://www.global-com.com/pom2/firefox_vs_ie/b6.html ff old trunk (20060111) 3.5sec IE 3.2 sec http://www.global-com.com/pom2/firefox_vs_ie/b6a.html ff: 138 sec. IE: 30.6 sec Especially scaring is the time it takes to close a tab once http://www.global-com.com/pom2/firefox_vs_ie/b6a.html is loaded It might leak...
I tested a few browsers with bzbarsky's version of b6a http://web.mit.edu/bzbarsky/www/test.html loaded from harddrive. System was a 2.4GHz P4 with Debian Linux: SeaMonkey 1.0b: 186.959s (hang when closing) Galeon 1.3.2: 144.756s Konqueor 3.3.2: 24.782s Mozilla 1.7: 282.844s (hang when closing) Opera 9.01: 4.218s So Opera was by far the fastest browser, and both SeaMonkey and Moz 1.7 showed significant hangs when closing the page. It's noticeable, that Galeon (using GTK2+ and not XUL for frontend) didn't suffer from the hang at closing.
Depends on: 352390
Attached file bz2-compressed profile (deleted) —
Sadly, ZIP doesn't create a small enough file that I could attach it...
Attachment #238032 - Attachment is patch: false
Attachment #238032 - Attachment mime type: text/plain → application/x-bzip2
Separate bug on the "hang on close" problem, please. I filed bug 352390 on one obvious problem in that profile, which was done with a reflow branch build. In general, the breakdown for that profile is: Total hit count: 1281040. This breaks down largely as: 144772 nsContentUtils::GenerateStateKey (due to bug 352390). 495334 nsCSSFrameConstructor::ContentAppended (constructing layout objects). 584413 PresShell::ProcessReflowCommands Of the time under nsCSSFrameConstructor::ContentAppended, about 296115 hits are under nsTextControlFrame::InitEditor. That could be fixed by bug 221820, I think, but I filed bug 352394 on the more localized version of the problem. Under reflow, we have: 418422 BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths Which breaks down as: 231564 GetWidthInfo 156529 nsTableCellMap::GetCellInfoAt 14698 nsTableFrame::RoundToPixel So it seems like we're having to do a lot of column intrinsic width recomputation... I wonder whether we could store more state there as we go or something...
Depends on: 221820, 352394
The non-colspan part could be optimized pretty easily for handling of row addition. Optimizing the colspan part would be harder, except to just skip the colspan part entirely when there are no column-spanning cells.
(Unless, of course, I decide to change to a row-order-dependent table layout algorithm. Such an algorithm would have performance advantages...)
I think I can reduce the part that we spend in nsTableCellMap::GetCellInfoAt significantly. What we typically do in BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths is we loop over a row and pick up all cells. In order to do this we ask the cellmap what is pos(col,row). The function spends a large part of the time to find the cellmap containing the row and then returns the information. Now imagine a iterator that we setup initialy which returns the first cell on a row and then has a Next function which returns the next cell in this row together with its starting index. Knowing the startindex and the colspan we can advance the loop index. Now that I wrote it,... we have the infrastructure allready: nsTableRowFrame::GetFirstCell() nsTableCellFrame::GetNextCell() and nsTableCellFrame::GetColIndex(PRInt32 &aColIndex) do what I proposed
oh man, writing before the first coffee is not a good idea: we loop over the rows and not cols, so one would need a iterator in the cellmap that replaces the part: nsTableCellFrame *cellFrame = cellMap->GetCellInfoAt(row, col, &originates, &colSpan); if (!cellFrame || !originates || colSpan > 1) continue;
Depends on: 352461
I splitted the cellmap part off to bug 352461
Keywords: testcase
OS: Linux → All
> Oh, I was wondering... Do we have your permission to attach those testcases to > this bug? That would make sure we don't lose track of them. Boris, you're too polite -- you have my permission to do absolutely anything for this bug. Paul
I posted to www-style about behavior changes that would remove the O(N^2) aspect of this: http://lists.w3.org/Archives/Public/www-style/2006Sep/0127
> Boris, you're too polite Actually, I've just been burned by legal before; those testcases are copyrighted by you, so without your permission we can't really attach them to the bug, put them in our regression suites (which are licensed under the MPL), etc. ;)
Attached file Er, the real zipped-up testcases (deleted) —
Attachment #238715 - Attachment is obsolete: true
Attachment #238716 - Attachment is patch: false
Attachment #238716 - Attachment mime type: text/plain → application/zip
Whiteboard: jar:https://bugzilla.mozilla.org/attachment.cgi?id=238716!/index.html
>Optimizing the colspan part would be harder, except to just skip the >colspan part entirely when there are no column-spanning cells. The trunk code has this full optimization it creates a list of colspanning cells and then only works on those cells.
I tested the b6a.html testcase again. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070111 SeaMonkey/1.1 needs 148.75s to load the page and hangs when trying to close the tab for over a minute, consuming 100% of cpu time. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a2pre) Gecko/2007012210 SeaMonkey/1.5a aka Suiterunner including patches for bug 352394 and bug 352461 needs 79.95s and doesn't show the hang like SeaMonkey 1.1 does (it only uses ~23% of CPU for 2 seconds). IE7 needs 1.4s according to the testcase. So the situation is much better than it was in 1.8 branch, but IE7 is still far away.
Out of curiousity, what number does IE show if you move the var t2 = new Date(); t2 = t2.getTime(); alert("Elapsed time for load: " + ((t2 - t1) / 1000) + " seconds."); into an onload handler? So just wrap those lines in: window.onload = function() { ... } I am, to be honest, finding that 1.4s number a little hard to believe. ;)
(In reply to comment #19) > I am, to be honest, finding that 1.4s number a little hard to believe. ;) You're absolutely right ;-) With the change IE7 needs about 5 seconds.
Heh. OK, lots more work to do, clearly.
Boris how did the profile data changed after the reflow branch + celliterator patch landing?
Attachment #256768 - Attachment is patch: false
Attachment #256768 - Attachment mime type: text/plain → application/x-bzip2
So we spend about 10% of the total time just iterating over mutation observers on the document. I kinda wonder why... I'll look into that. Total hit count: 2774939 1666867 under PresShell::ProcessReflowCommands 933358 under nsParser::ResumeParse (plus frame construction) Of that 933358 number, 413958 hits are under editor init (or rather the editor's SetValue), and 211714 are under nsPlaintextEditor::Init and nsPlaintextEditor::SetWrapWidth. So that's the bug 221820 depency. Under reflow, we have 1595257 hits under ViewportFrame::Reflow. Of those, 1562182 are under nsTableOuterFrame::Reflow. Of these, 355427 are under BasicTableLayoutStrategy::ComputeIntrinsicWidths. This breaks down as: 265733 GetCellWidthInfo(nsIRenderingContext*, nsTableCellFrame*) 71429 nsCellMapColumnIterator::GetNextFrame(int*, int*) 2416 nsTableColFrame::AddMinCoord(int) 1838 memcpy The time under nsCellMapColumnIterator::GetNextFrame breaks down as: 21588 nsTableCellFrame::GetRowSpan() 11665 nsCellMap::GetEffectiveColSpan(nsTableCellMap const&, int, int, int&) 10884 nsTPtrArray::SafeElementAt(unsigned int) const 1840 nsTArray >::operator[](unsigned int) const 1323 nsStyleContext::GetPseudoType() const I suppose we could try only getting the rowspan if we're looking at a rowspanning celldata... not sure. We're calling nsFrame::IntrinsicWidthOffsets what looks like three separate times in GetWidthInfo -- once from GetMinWidth, once from GetPrefWidth, and once directly. Total time spent under nsFrame::IntrinsicWidthOffsets is 109737. I filed bug 372084 on that. Nothing else is jumping out at me right now, and it's a bit late at night, so... ;)
Depends on: 372084
Filed bug 372086 on the observer/range stuff.
Depends on: 372086
(In reply to comment #24) > Nothing else is jumping out at me right now, and it's a bit late at night, > so... ;) Someone obviously found something ;-) Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a7pre) Gecko/200707080909 SeaMonkey/2.0a1pre only needs about 18 seconds for b6a.html (latest Minefield shows the same order of magnitude). IE7 is still faster with about 5 seconds, but we get closer.
That's a lot longer than the time reported for IE7 in comment 18. Are you using the same hardware? Also, keep in mind that the time the testcase reports as-is is somewhat bogus. At least if you want to include the time needed to lay out the page.
(In reply to comment #27) > That's a lot longer than the time reported for IE7 in comment 18. Are you > using the same hardware? The 1.4s from comment 18 were without the change in the testcase you suggested in comment 19. With the change the time was and still is 5s for IE7 (tested on a Athlon64 @ 1.4GHz and Windows XP2).
Ah, ok. Interesting. I wonder when it fell. You want to try tracking that down? I don't like things getting faster for no reason -- it usually indicates that bugs got introduced. ;)
(In reply to comment #29) > Ah, ok. Interesting. > > I wonder when it fell. You want to try tracking that down? I don't like > things getting faster for no reason -- it usually indicates that bugs got > introduced. ;) I made a quick test with the Gran Paradiso Alpha releases and the biggest drop occured somewhen between Gran Paradiso Alpha2 (~55 sec) and Alpha 3 (~21 sec). Maybe it's bug 372086? All other fixed bugs in the depends list should already be part of Alpha2. I'll compile a new suiterunner build without the patch for bug 372086 to verify this (probably until the end of the week).
I suppose that on faster hardware than mine that bug could have a big effect... Note that you don't need to compile to test older builds; you can just download them from archive.mozilla.org.
I tested with some builds from archive.mozilla.org Gecko/20070301 SeaMonkey/1.5a: 24.1 sec Gecko/20070228 SeaMonkey/1.5a: 62.5 sec Gecko/20070301 Minefield/3.0a3pre: 22.1 sec Gecko/20070228 Minefield/3.0a3pre: 62.9 sec There was a big performance boost from 02-28 to 03-01. So it's probably really bug 372086, but maybe bug 367650 also has an effect. If you want I can compile a build with one of those backed out.
Nah. In that range, bug 372086 is the only thing that could have really helped. The percentage difference might be due to me not profiling the whole load, since that part would have been O(N^2) over the total pageload. Well, we should fix bug 221820 if we can, and see where we go from there.
No activity on this bug for over a year. It's a shame. Such table/input box combinations are common in the business world. And since there's no movement here, businesses that have such pages will continue to use IE, because: IE loads test case b6a.html in 0.719 seconds. Firefox 3.51 loads it in 16.615 seconds. Yes, the most recent release version of Firefox still takes more than 20 times as long to load the page. Also, after loading the page, clicking "About Mozilla Firefox" causes the browser to hang for several minutes. This is on Windows XP. Please, someone who might have the ability to fix this, take another look at this bug. Thanks.
Paul, there's certainly been activity on fixing the bugs that block the bugs that block this bug. There isn't a single issue to "fix" here; just a number of different hard architectural changes to make..
Here's our timing improvements with bug 221820: test | Before bug 221820 | After bug 221820 b3 | 0.204 | 0.006 b4 | 0.309 | 0.053 b5 | 0.220 | 0.021 b6 | 0.347 | 0.352 b6a | 14.916 | 11.745 b3a | 0.048 | 0.013 b3b | 0.114 | 0.015 b3c | 0.390 | 0.008 b3d | 1.061 | 0.016 b3e | 3.804 | 0.045 b3f | 7.348 | 0.754 b3g | 12.691 | 1.589 b3h | 19.831 | 1.850 b3i | 103.132 | 4.438 I'll shark b6a on a build with bug 221820 to see what's happening there.
For b6a, we spend 75.6% of the time reflowing, of which 61.6% is spent in nsTableOuterFrame::Reflow, 51.2% in nsTableRowFrame::Reflow, and 33.7% in nsTableCellFrame::Reflow.
Quick update/check on b6a.html (using reload several times since results are not very consistent). Note: I just used some existing instances, so this isn't a very scientific test. Machine 1 (Ubuntu 8.04, AMD x2) 3.6.14 on a browser with lots of tabs: 22-25 seconds Trunk as of a week or so ago: Typically 0.3-0.6s, sometimes 1-1.5s, once as long as 5s Machine 2 (WinXP, AMD x4 965 (fairly fast), 4GB 4.0.1 on a browser with ~440 tabs (really), 1GB VSS size: 7-22 seconds (varies wildly though entire system is on 5-8% loaded before I click reload). IE 8.0.6001: 1.95-2.05s So, I'd say that 3.6 and 4.0 are at least comparable, but something has made a huge improvement in Trunk (and it's way faster than IE now, at least most of the time). Still confused about the high variation even in the trunk build. I think it would be useful for someone to do a real retest of this bug suite, and we may well (finally) be able to close this bug out. A jprof on Trunk shows 46% of time was spent in Reflow. ~9.5% in nsCSSFrameConstructor::ProcessPendingRestyles(), and 37% in nsLayoutUtils::PaintFrame().
That's ... odd. On a Mac, I see us loading the page in about 0.2s in Firefox 4 and on trunk. Can you test in 4.0 without the huge session? ;) Also, make sure that you have the HTML5 parser enabled in your Firefox 4?
On a WinXP VM (2 processors) on a Core i7-920 6GB (i.e. fast), FF4.01 with nothing else open loads in 0.18 to 1.2 seconds. Note this is an unusually broad range, with a strong preference for 0.18-0.25s. On FF 3.6.12 on Fedora 11 (native) on a core i7-920, it loads in 12-15 seconds. Re-ran it on the hundreds-of-(unloaded)-tabs system - 4 runs in the 11-14 second range, one of 0.27s. Yes, html5 is enabled. Really, really odd (though perhaps not primarily this). (This is an AMDx4 965, 3GB ram, WinXP) Killed it, started a new instance on a new profile. Load time is either 5-7 seconds or 11-12 seconds (out of a dozen or so reloads). The wide variation in load times (both on trunk and FF4, and even more-so on the lots-of-tabs FF4) makes me wonder is there's something timing-related interacting with incremental reflow/coalescing.
The dichotomy makes me think *strongly* of the 0.250 second incremental delay that used to be configed via nglayout.initialwait.delay. I.e. if it builds the page in less than ~0.250 seconds, poof it all comes in at once, but if it misses that and starts incremental reflow, watch out, it'll suck. So Trunk might also not really be better - hard to tell without either running it on a slower machine or doing some variant comparisons.
Sorry, I meant nglayout.initialpaint.delay
Or turning off the paint suppression thing and seeing what the numbers look like, right?
In particular, on PresShell.cpp, change this bit: 2773 // For printing, we just immediately unsuppress. 2774 if (!mPresContext->IsPaginated()) { to say |if (1)| instead.
Er, I meant |if (0)|.
if (0): Still wide variation, though smaller range: Most are 2.1-2.5 seconds, with an occasional 0.6 or 0.3. (AMDx2 CPU, Ubuntu) I also tried larger values for nglayout.initialpaint.delay in this build, and on 4.01/WinXP - it seemed to have no impact on total time or the wide variation. if (!mPresContext->IsPaginated()): Most 0.9 to 1.5 seconds, with outliers at 0.6 and 2 seconds. So it seems to be a bit slower with if (0), but still with a lot of variation.
With normal reflow timers, across a bunch of reloads with starting/stopping a jprof at start and end of b6a.html (where it grabs the time for the alerts). These were generally in the 1.3-1.5s range - which is not bad at all, and way better than 3.x, though the variation is somewhat weird/unsettling (why is it sometimes circa 0.3s?). 832 (60.9%) nsRefreshDriver::Notify(nsITimer*) 1 (0.1%) PresShell::DispatchSynthMouseMove(nsGUIEvent*, int) 71728 0 (0.0%) 833 (60.9%) PresShell::FlushPendingNotifications(mozFlushType) 540 (39.5%) PresShell::ProcessReflowCommands(int) 292 (21.4%) nsCSSFrameConstructor::CreateNeededFrames() 1 (0.1%) nsViewManager::EndUpdateViewBatch(unsigned int) Almost all of the rest is in the html5 parser: 1370 (100.3%) NS_ProcessNextEvent_P(nsIThread*, int) 163049 0 (0.0%) 1365 (99.9%) nsThread::ProcessNextEvent(int, int*) 873 (63.9%) nsTimerEvent::Run() 255 (18.7%) nsHtml5ExecutorFlusher::Run() 142 (10.4%) nsHtml5StreamParserContinuation::Run() 57 (4.2%) nsHtml5ExecutorReflusher::Run() 22 (1.6%) nsHtml5DataAvailable::Run() 14 (1.0%) nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, int, unsigned int) 5 (0.4%) nsProxyObjectCallInfo::Run() 2 (0.1%) nsSynthMouseMoveEvent::Run()
I cloned b6a.html to be 8 times as long as a test. Time went from .6-1.5s to 110-170s, so it looks like something on the order of O(n^2). (Of course, it's starting from a much better place than 3.x and before did.) Looking over the jprof (which I'll attach here), the first thing that jumps out at me is that sPaintSuppressionCallback() went from 3% to almost 30%. 6533 (29.2%) nsTimerImpl::Fire() 71617 0 (0.0%) 6533 (29.2%) PresShell::sPaintSuppressionCallback(nsITimer*, void*) 6533 (29.2%) PresShell::UnsuppressAndInvalidate() 1/2 of which is: 3206 (14.3%) nsFrame::DestroyFrom(nsIFrame*) 71459 3206 (14.3%) 3206 (14.3%) PresShell::ClearFrameRefs(nsIFrame*) and the other 1/2 of which is: 3069 (13.7%) nsWeakFrame::InitInternal(nsIFrame*) 71445 3069 (13.7%) 3069 (13.7%) nsIPresShell::RemoveWeakFrameInternal(nsWeakFrame*) Another hotspot is nsRegion::Or() and SubRect() and InsertInPlace(): 2228 (10.0%) mozilla::FrameLayerBuilder::InvalidateThebesLayerContents(nsIFrame*, nsRect const&) 2 (0.0%) ViewportFrame::InvalidateInternal(nsRect const&, int, int, nsIFrame*, unsigned int) 69587 465 (2.0%) 2230 (9.9%) nsRegion::Or(nsRegion const&, nsRect const&) 1649 (7.4%) nsRegion::SubRect(nsRegion::nsRectFast const&, nsRegion&, nsRegion&) const 95 (0.4%) nsRegion::InsertInPlace(nsRegion::RgnRect*, int) 14 (0.1%) __i686.get_pc_thunk.bx 7 (0.0%) nsRegion::Copy(nsRegion const&) There are more, but I'll highlight one more: nsPresContext::NotifyInvalidation() which calls MayHavePaintEventListener(): 2518 (11.3%) nsPresContext::NotifyInvalidation(nsRect const&, unsigned int) 939 (4.2%) MayHavePaintEventListener(nsPIDOMWindow*) (self) 2 (0.0%) ViewportFrame::InvalidateInternal(nsRect const&, int, int, nsIFrame*, unsigned int) 2 (0.0%) nsHTMLScrollFrame::InvalidateInternal(nsRect const&, int, int, nsIFrame*, unsigned int) 71335 291 (1.3%) 2522 (11.2%) MayHavePaintEventListener(nsPIDOMWindow*) 939 (4.2%) MayHavePaintEventListener(nsPIDOMWindow*) (self) 920 (4.1%) nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) 633 (2.8%) nsCOMPtr_base::~nsCOMPtr_base() 244 (1.1%) nsCOMPtr_base::assign_with_AddRef(nsISupports*) 64 (0.3%) non-virtual thunk to nsWindowRoot::GetListenerManager(int) 58 (0.3%) nsXULElement::Release() 47 (0.2%) non-virtual thunk to nsWindowRoot::QueryInterface(nsID const&, void**) 40 (0.2%) non-virtual thunk to nsWindowRoot::Release() 34 (0.2%) nsInProcessTabChildGlobal::QueryInterface(nsID const&, void**) 29 (0.1%) nsWindowRoot::GetListenerManager(int) 26 (0.1%) non-virtual thunk to nsInProcessTabChildGlobal::Release() 22 (0.1%) __i686.get_pc_thunk.bx 19 (0.1%) nsInProcessTabChildGlobal::Release() 18 (0.1%) nsDOMEventTargetHelper::GetListenerManager(int) 17 (0.1%) nsXULElement::AddRef() 15 (0.1%) nsWindowRoot::GetParentTarget() 14 (0.1%) nsInProcessTabChildGlobal::GetOwnerContent() 13 (0.1%) nsWindowRoot::QueryInterface(nsID const&, void**) 5 (0.0%) non-virtual thunk to nsWindowRoot::GetParentTarget() 4 (0.0%) nsGenericElement::Release() 4 (0.0%) nsGenericElement::AddRef() 3 (0.0%) nsWindowRoot::Release() 2 (0.0%) nsDOMEventTargetHelper::Release() All together, those routines I've highlighted account for approaching 50% or more of the time, and are probably all execute or are executed at an O(n log(n)) or worse. Comparisons to smaller versions of this table will help show the rate of increase, and I'm going to start looking at some of this code.
Attached file jprof of b6b.html (deleted) —
b6b.html is b6a.html with the table rows cloned 8 times, and jprof start/stop in the scripts that capture and alert() the load time. I'll upload a compressed copy of it. This is two runs; one took ~110s, the other ~170s on an AMDx2 that was a reasonably fast machine maybe 3 years ago; Ubuntu 8.04.
Attached file b6a (updated for jprof) and b6b tests (deleted) —
Note that b6b.html is purposely huge to expose any O(n log(n)) or O(n^2) issues
Revise title as FF4 and especially trunk are now on the order of the same as IE or faster for smaller to moderately large tables, before O(n log(n)) or worse kicks in.
Summary: Firefox renders large tables with forms MUCH slower than IE (100x slower) → Firefox loads large tables with forms MUCH slower as they grow in size
The sPaintSuppressionCallback() time is entirely due to prevViewer->Destroy() in DocumentViewerImpl::Show(). The routines eating the time are spending all of it walking the mWeakFrames list backwards (getPreviousWeakFrame()) and removing them. One removes and exits; the other removes and keeps searching for more refs (ClearFrameRefs). It's interesting they use almost identical amounts of time; this implies to me that they're searching a long list from the end for a node at the start...
Ok; I know what's up with sPaintSuppressionCallback(): it ends up calling nsPresContext::EnsureVisible(), which does a Show() on the document. That destroys the previous viewer, if any, leading to the ClearFrameRefs() and RemoveWeakFrameInternal() calls. The weakframes are only getting cleared when you reload or leave the page (and it exits the cache and gets destroyed). So the first load is faster than reload (by a lot). And if you navigate away at some point (soon) it will be slow for no apparent reason. ClearFrameRefs() appears to never be finding the frames (at least in this testcase), and in any case it will always walk the entire list. RemoveWeakFrameInternal() is removing them but because the list exists only via a tail pointer and prev links, the weak references removed are always at the wrong end of the list (so in this case it has to walk 45000 items to find one to remove, then 44999 items, ad nauseum - for a total number of node walks of circa 2 billion.) FYI, b6a.html has around 5600 items, for around 32 million node walks to do a reload or navigate away. Still a lot. Reversing the list (or reversing the order of destruction) would cut that in half, but ClearFrameRefs() still will walk them all. Converting to hash would help enormously, with some expense to smaller pages. At some runtime cost you could change to a hash above a threshold. It might be possible to find a better place to store the weak refs such that they aren't all in one giant list.
There may VERY well be a subtlety I'm missing, but couldn't we just move mWeakFrames from the PresShell to the Frame? Typically each frame will have 0 or maybe 1 weak reference. It costs 1 pointer per active frame object but removes two O(n^2) functions. (I have to admit I found the storing of the WeakReferences in the PresShell odd...) I've implemented this (not cleaned up yet) and a cursory test seems to work fine, and PresShell::sPaintSuppressionCallback(nsITimer*, void*) dropped from about 30% to about 1.6% in the b6b.html test - and none of that in the WeakFrame code. I may upload a not-ready-for-checkin patch for people to look at/play with. Overall, though it was a pretty clean and easy change.
Taking bug. I should note that this weakFrame change doesn't remove all O(n log(n)) or O(n^2) effects in this testcase - it helps but does not solve the problem. The weakFrame stuff is just the tallest nail (and will probably help a number of other large-page perf bugs).
Assignee: nobody → rjesup
Attached file jprof with the weakFrame -> nsIFrame change (obsolete) (deleted) —
Updated jprof - 85% is now in Reflow. 14.5% in nsRegion::Or() alone...
Attached patch weakFrame patch - rough - NOT ready for checkin (obsolete) (deleted) — Splinter Review
For people to look at the approach I'm taking on weakFrame references - note I still have stuff #if'd out, etc.
OK, so in order: > if (0): Still wide variation, though smaller range: Most are 2.1-2.5 seconds, > with an occasional 0.6 or 0.3. OK, so no more 12-15 second things? Good. ;) > though the variation is somewhat weird/unsettling (why is it sometimes circa > 0.3s? Indeed. > Another hotspot is nsRegion::Or() and SubRect() and InsertInPlace(): Ah, invalidation.... See bug 539356.
Depends on: dlbi
Oh, I meant to add. Please do one bug blocking this bug per issue you find. That means as separate bug for the weakframe stuff. Keep this bug to analysis. That said... why are there 45000 weakframes around? > It costs 1 pointer per active frame object That's not so great, actually, for what should be a rare case (needing an nsWeakFrame). Let's follow up on all this in said separate weakframe bug.
Depends on: 655084
nsWeakFrames split off as bug 655084. This bug continues since even with that handled, there are still serious O(n^2) overheads left in Reflow, and also the inconsistency of loadtimes is concerning even in smaller tables. Note that I suspect VERY large tables are likely to be at least O(n log(n)) to layout, I suspect, even under good assumptions. And we should re-verify how fast we degrade as size goes up (more data points) against IE and Chrome, etc.
With bug 655084 fixed (soon), we can look at what remains here. This is load and reload of b6b.html Biggest hit is now nsRegion::Or() at 12.7% inclusive, and nsTableFrame::InvalidateFrame() stands out at 26.3% inclusive (probably including Or()).
Attachment #530234 - Attachment is obsolete: true
Attachment #530235 - Attachment is obsolete: true

Moving open bugs with topperf keyword to triage queue so they can be reassessed for performance priority.

Performance Impact: --- → ?
Keywords: topperf

:jesup could you provide an update here, and set the performance project flag appropriately?

Flags: needinfo?(rjesup)

I checked. The time seem to be linear, though far faster (O(1000-5000)). Not an issue any more.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(rjesup)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: