Open Bug 575614 Opened 14 years ago Updated 2 years ago

{inc} Resizing window is slow for long list with multicol (from "-moz-column-count: 3" or "column-width: 20em;")

Categories

(Core :: Layout: Columns, defect)

defect

Tracking

()

Performance Impact low

People

(Reporter: mdinger.bugzilla, Unassigned)

References

(Blocks 2 open bugs, )

Details

(Keywords: perf, perf:responsiveness, testcase, Whiteboard: [layout:backlog])

Attachments

(4 files)

Window resizing for the French wikipedia page is much slower than the same page in English.  It is a problem on 1.9.2 and trunk.


STR:
1. Open French page:
http://fr.wikipedia.org/wiki/Robert_Browning
2. Resize window using upper left corner of the screen.

Result:
Window resize is much slower than the same page in English.

Expected result:
Both English and French pages should resize similarly and quickly.


French page:
http://fr.wikipedia.org/wiki/Robert_Browning

English page:
http://en.wikipedia.org/wiki/Robert_Browning


Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a6pre) Gecko/20100627 Minefield/3.7a6pre
Keywords: testcase-wanted
Attached file Reduced testcase (deleted) —
Component: General → Layout
Product: Firefox → Core
QA Contact: general → layout
Summary: Wikipedia page in french resizes much slower than in english → Resizing window is slow if long list with style="-moz-column-count: 3;"
1.9.0 and 1.9.1 both show the problem.


Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.0.19pre) Gecko/2010021606 GranParadiso/3.0.19pre

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.11pre) Gecko/20100608 Shiretoko/3.5.11pre
Attachment #454866 - Attachment mime type: text/html → text/html; charset=iso-8859-1
> Both English and French pages should resize similarly and quickly.

I'll try to profile in the next few days, but it's not obvious to me why those are the expected results.  The pages may have different amounts of text, but more importantly non-ASCII characters can require slower text rendering APIs to be used on many OSes (including Windows).  So while it would be _good_ if the pages both resized fast, rendering text in French is somewhat to a lot slower than doing it in English, on average.
Attached file Faster testcase (deleted) —
This seems much faster: same as Reduced testcase without the div element.  Granted a list should be simpler than a multi-dimensional list.

I did notice the two pages were quite different after I posted...
Oh, yes.  Not having the huge columnset would make a giant difference, since you don't have to do the height-balacing dance on width change that columnsets have to do.

Still worth profiling, but fundamentally columns are an advanced typographic feature that's _very_ computationally expensive to do well...
Whiteboard: DUPEME
roc, do we have existing bugs about trying to avoid doing full dirty reflows of column kids when doing rebalancing?  If so, this is almost certainly dependent on them....
Attached file Profile of Reduced testcase (deleted) —
AMD CodeAnalyst profile on the following build for Reduced testcase.

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a6pre) Gecko/20100627 Minefield/3.7a6pre
Yep, definitely looks like the dirty reflows are what's going on.
We do try to avoid full dirty reflows.

I assme "if (reflowNext) child->AddStateBits(NS_FRAME_IS_DIRTY)" is triggering the setting of the dirty bit. In that case, we could try to refine the meaning of NS_FRAME_REFLOW_NEXTINFLOW so that it's only asking us to reflow the frames which have prev-in-flows in the previous column.

I don't know how much this would help, though. Especially during the early phases of the binary search, we're moving a lot of lines between columns and those lines all need to be reflowed unless we do even more tricky optimizations...
Hmm.  And when we move the line we have to blow away intrinsic widths, textrun info, and the like, I guess?
We actually don't blow away textruns. Matt's profile doesn't show us doing textrun construction. We don't cache intrinsic widths for text frames.
Hmm...  The profile shows us doing gfxTextRun::BreakAndMeasureText; we don't cache the results of that?  Also the nsFontCache::GetMetricsFor, etc....  Maybe we just don't cache this stuff and I thought we did?

Since we'd doing a width change we reflow all lines no matter what; I guess the point is that we do a _lot_ of reflows of all lines here due to the balancing?  I'd be interested in knowing how many balance passes we actually end up doing here.
BreakAndMeasureText gets called a lot because it's the core of text reflow, but I'm not sure how we could cache the results of that other than just by not reflowing the frame. nsFontCache::GetMetricsFor could be sped up but basically we're getting a cached object at the end. I think this profile just boils down to "we're reflowing a lot of lines a lot".
(BreakAndMeasureText basically just says "how many characters, starting a given offset in the text run, fit into a given width?" and answers that by adding up widths that are cached in the textrun.)
And I guess we do need to reflow when pushing across the boundary, due to floats, right?  Or are there other things that require us to mark the moved lines dirty?
I'm not sure. If the new container had a different available width, that would definitely be a problem. That can't happen with CSS3 columns currently, but we don't rely on it.

Not dirtying lines that get transferred to an overflow list and then pulled into a next-in-flow could be done. It would just be tricky and fragile.
Did you determine if this bug is valid? Should it be marked WONTFIX or INVALID?  My guess from the DUPEME is that it is valid.
It's certainly valid in the sense that we should try to do this faster.
Ok good.
Keywords: perf
Blocks: 698783
Ehsan, this might be worth considering for QF triage.  Not sure how many pages this is biting in practice, but even just Wikipedia is common enough...
Flags: needinfo?(ehsan)
Yeah, agreed, thanks for raising this!  Let's see if the layout team can take a look at this.
Blocks: FastReflows
Flags: needinfo?(ehsan) → needinfo?(bugs)
Whiteboard: DUPEME → [qf:p1]
To Neerja for a look while she's in multicol:span
Flags: needinfo?(bugs) → needinfo?(npancholi)
Assignee: nobody → npancholi
Flags: needinfo?(npancholi)
Profile: https://perfht.ml/2qoJtZi
I profiled this and also spoke with bz on IRC and I'm copying his thoughts on what might be the best way to proceed - 

So what's needed here is some thought on how we can optimize this process
Possible thoughts:
1) Do something smarter than binary search to find the height.  Especially in cases when we're doing an incremental reflow anyway, which is the resizing case
In that it seems like often enough the right answer will be pretty much the old answer..
2) Cache more something somewhere.  I'm not sure what we should cache where in this case.
A good place to start here would be to analyze what actually happens on a horizontal resize on the testcase in question
e.g. how many steps the binary search takes, whether we end up hitting the child->AddStateBits(NS_FRAME_IS_DIRTY) line, whether skipping that line actually helps performance, etc
dbaron says his forthcoming patches in bug 1308876 might help here, because it'll adjust how/when dirty flags are set & how many dirty reflows we do.

It'd be worth taking a look to see if this gets better after that lands. In the meantime, we're going to prioritize this as [qf:p3].
Whiteboard: [qf:p1] → [qf:p3]
One other thing I talked with roc about in the past (and couldn't really convince him of the value of) is that we could add an additional pagination mode that has the semantics of "break at the first breaking opportunity *after* height X" in addition to the current "break at the last opportunity *before* height X".  This would then let us do cheaply-balanced columns in just two passes (one pass to find the full height, and then one to split it up into columns, probably a little bit short in the last column).
FWIW, the French wikipedia page from comment 0 does still have "column-count:3" styling on its footnotes, but other also-slow-to-resize wikipedia pages (e.g. https://en.wikipedia.org/wiki/United_States ) seem to have "column-width: 20em" instead (and can end up with 4 columns or more depending on your window size).

So Wikipedia uses different styles for triggering multicol, it seems. In any case, both varieties are slow in Firefox Nightly right now. --> Broadening description slightly & adding {inc} to tag this as an incremental-layout bug.
Summary: Resizing window is slow if long list with style="-moz-column-count: 3;" → {inc} Resizing window is slow for long list with multicol (from "-moz-column-count: 3" or "column-width: 20em;")
Adding a summary of past discussions with Roc, Dbaron and Bz for future reference -

In my opinion, making the binary search faster would not be the most productive way to get noticeable improvements on this so I feel like the approach proposed by Roc might be the best because it preserves the binary search in case we need it for more correct results and proposes a shorter, faster layout path in column balancing cases where we can safely assume we don't need the binary search.

Example, say in the case of Wikipedia if we are laying out text only and assuming that we must balance this across columns then our algorithm would broadly be -
 1. Determine the column-count and/or column-width as we do now (on the basis of '3.4. Pseudo-algorithm' in the spec or what the nsColumnSetFrame::ChooseColumnStrategy method does)
 2. Check the content to see if it is text only.
 3. If yes -
    3.1. Layout all content in one column with the column-width above and let it's height grow unbounded. This gives us the total height of the content. 
    3.2. Now divide the total height by column-count and get column height (rounded to the next largest whole number)
    3.3. Use this column height to layout the multicol element. If there still is overflow, then abandon this approach and try   binary search. We are ok with some underflow as there probably will not be much here. 
 4. If no, perform binary search as normal. 

 Note: This approach would only be faster than the binary search if -
 1. We can say with some certainty that most affected cases of multicol use text only. (Maybe we can get some answers to this using Telemetry? Not sure if this is worth the effort.) We're considering text only here because we can safely predict that laying out text in one column *with fixed width* would break at about the same places as if this column was broken into multiple shorter columns of the same width. 
 2. Step 2. above i.e. checking content to see if it is text only is faster than the binary search itself. 

**One caveat might be that rounding to the next largest whole number in #3.2. might not be enough or unsightly once rendered.
Needinfoing Daniel in case he wants to add something here -
Flags: needinfo?(dholbert)
One correction from the conversation with me: to avoid overflow, you really have to round up to the nearest multiple of line-height or something, not the nearest whole number.  Otherwise you always end up with at least one line not fitting at the end.
(In reply to Neerja Pancholi[:neerja] from comment #35)
> Needinfoing Daniel in case he wants to add something here -

(I was discussing this with Neerja & was thinking we should bump this down from [qf:p1] -- hence this needinfo -- but then I realized that it's already rated [qf:p3] which I think is accurate.  Not high-priority for Firefox 57, but worth fixing when time/priorities allow.)
Flags: needinfo?(dholbert)
Unassigned myself in case someone else wants to pick up where I left off.
Assignee: npancholi → nobody
Component: Layout → Layout: Columns
Anything we can do to help? Would love if this 9 years old bug could be fixed :) Is this still a problem?
Yes, this is still a problem, though other priorities have kept it from getting recent attention. It's on our radar as a layout performance issue, though.

(Thanks for the offer of help, but I'm not aware of any help needed at this point -- we already have reduced testcases & known-affected URLs.  Really, this bug needs attention from an experienced gecko layout hacker, and there are a limited set of such people and many bugs jockeying for their attention, so it's largely a question of prioritization based on relative amounts of user pain & similar factors.)
OS: Windows 7 → All
Hardware: x86 → All
Whiteboard: [qf:p3] → [qf:p3:responsiveness]

Some other thoughts on possible solutions:

A. comment 34 (so we don't forget)

B. A proposal I had in the past: have a breaking mode that breaks at the first breaking point after a given height, and use that mode to do slightly-imperfect column balancing (leaving the last column a little short)

C. (more work) Add a mode that records all the break points (and how they add/remove height at the break) so that we can then do balancing on the entire flow without running layout more than once. This requires a bunch of code to verify that this code works correctly...

No longer blocks: 698783
Whiteboard: [qf:p3:responsiveness] → [qf:p3:responsiveness][layout:backlog]
Performance Impact: --- → P3
Whiteboard: [qf:p3:responsiveness][layout:backlog] → [layout:backlog]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: