Closed Bug 470978 Opened 16 years ago Closed 16 years ago

Hang [@ nsSplittableFrame::GetPrevInFlow] on print preview with table cell with large height

Categories

(Core :: Layout, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: martijn.martijn, Assigned: MatsPalmgren_bugz)

Details

(Keywords: hang, regression, testcase)

Attachments

(5 files, 4 obsolete files)

Attached file testcase (deleted) β€”
I seem to hang in this case when using a table cell with a large height.
I don't hang with it in Firefox 2, so it seems like a regression.

Stack of debug build of hang:
>	gklayout.dll!nsSplittableFrame::GetPrevInFlow()  Line 155 + 0x1f bytes	C++
 	gklayout.dll!nsSplittableFrame::GetFirstInFlow()  Line 186 + 0x8 bytes	C++
 	gklayout.dll!nsHTMLReflowState::InitResizeFlags(nsPresContext * aPresContext=0x042ab3a0)  Line 373 + 0x4c bytes	C++
 	gklayout.dll!nsHTMLReflowState::Init(nsPresContext * aPresContext=0x042ab3a0, int aContainingBlockWidth=-1, int aContainingBlockHeight=-1, const nsMargin * aBorder=0x00000000, const nsMargin * aPadding=0x00000000)  Line 296	C++
 	gklayout.dll!nsTableRowFrame::InitChildReflowState(nsPresContext & aPresContext={...}, const nsSize & aAvailSize={...}, int aBorderCollapse=0, nsTableCellReflowState & aReflowState={...})  Line 106	C++
 	gklayout.dll!nsTableRowFrame::ReflowChildren(nsPresContext * aPresContext=0x042ab3a0, nsHTMLReflowMetrics & aDesiredSize={...}, const nsHTMLReflowState & aReflowState={...}, nsTableFrame & aTableFrame={...}, unsigned int & aStatus=0)  Line 916	C++
 	gklayout.dll!nsTableRowFrame::Reflow(nsPresContext * aPresContext=0x042ab3a0, nsHTMLReflowMetrics & aDesiredSize={...}, const nsHTMLReflowState & aReflowState={...}, unsigned int & aStatus=0)  Line 1074 + 0x25 bytes	C++
 	gklayout.dll!nsContainerFrame::ReflowChild(nsIFrame * aKidFrame=0x0527266c, nsPresContext * aPresContext=0x042ab3a0, nsHTMLReflowMetrics & aDesiredSize={...}, const nsHTMLReflowState & aReflowState={...}, int aX=0, int aY=0, unsigned int aFlags=16, unsigned int & aStatus=0, nsOverflowContinuationTracker * aTracker=0x00000000)  Line 793 + 0x21 bytes	C++
 	gklayout.dll!nsTableRowGroupFrame::ReflowChildren(nsPresContext * aPresContext=0x042ab3a0, nsHTMLReflowMetrics & aDesiredSize={...}, nsRowGroupReflowState & aReflowState={...}, unsigned int & aStatus=0, int * aPageBreakBeforeEnd=0x0012b614)  Line 436 + 0x2c bytes	C++
 	gklayout.dll!nsTableRowGroupFrame::Reflow(nsPresContext * aPresContext=0x042ab3a0, nsHTMLReflowMetrics & aDesiredSize={...}, const nsHTMLReflowState & aReflowState={...}, unsigned int & aStatus=0)  Line 1336 + 0x1d bytes	C++
 	gklayout.dll!nsContainerFrame::ReflowChild(nsIFrame * aKidFrame=0x05272770, nsPresContext * aPresContext=0x042ab3a0, nsHTMLReflowMetrics & aDesiredSize={...}, const nsHTMLReflowState & aReflowState={...}, int aX=0, int aY=0, unsigned int aFlags=16, unsigned int & aStatus=0, nsOverflowContinuationTracker * aTracker=0x00000000)  Line 793 + 0x21 bytes	C++
etc...
Attached file complete stack of hang in a debug build (obsolete) (deleted) β€”
Whats going on here? New fuzzer? Christmas spoiling?
Sorry Bernd, it just happened that you and Mats fixed a few of these hangs and that got me motivated to find some more.
WARNING: Overflowed nscoord_MAX in conversion to nscoord: file ../../dist/include/gfx/nsCoord.h, line 339

so we cap the height to nscoord_MAX which leads to the row-group having
a desired height of 13421774 device pixels, which requires 20286 pages
to render.  Creating and reflowing the frame tree for that takes a couple
of minutes, but that's not the real hang.  After reflow there is a reflow
callback, nsGfxScrollFrameInner::ReflowFinished, which sets attributes
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsGfxScrollFrame.cpp#2396
on the scrollbar content, which leads to GetPrimaryFrameFor() for it,
which leads to nsCSSFrameConstructor::FindFrameWithContent() and this is
where the real problem is...
Component: Layout: Tables → Layout
OS: Windows XP → All
QA Contact: layout.tables → layout
Hardware: x86 → All
Attached file stack + frame tree (deleted) β€”
... this is what the frame tree looks like, note that the content is
"html" for all the frames down to the TableRowFrame.
In FindFrameWithContent we recurse all child frames with the same
content, and all their continuations.  We also process all continuations
of the parent.
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsCSSFrameConstructor.cpp?mark=10784,10793,10833#10765

The child frame continuations will be processed again for every
parent continuation...
Attachment #354337 - Attachment is obsolete: true
... we never find the scrollbar frame because we start at the parent
content's primary frame which is Block(html), not HTMLScroll(html)...
Attached patch WIP (obsolete) (deleted) β€” β€” Splinter Review
I think this should fix the multiple traversal of child continuations.
FWIW, Print Preview takes about 5 minutes (2.66GHz Core2 CPU)
with this patch in an opt build on Linux.
Attached file Frame dump #2 (deleted) β€”
FWIW, the scrollbar we're trying to find a frame for doesn't have
one in the Print Preview context at all.  It's anonymous content
in the Galley context.

(The first frame tree is from the Galley pres context, the second
is from the Print Preview pres context.)

Maybe we could suppress mutation notifications for native anonymous
content in "other" shells?
Attached patch Patch rev. 1 (obsolete) (deleted) β€” β€” Splinter Review
Fixes the frame walk.

It still takes a while to render the testcase but the fix for that is
probably to make the operation interruptable and have a Cancel button
in the dialog... ideally with progress feedback how many pages have
been generated...
Assignee: nobody → mats.palmgren
Attachment #354429 - Attachment is obsolete: true
Attachment #354454 - Flags: superreview?(bzbarsky)
Attachment #354454 - Flags: review?(bzbarsky)
Why do we need the initialParent thing?  If we're going to walk all the continuations, we can always skip the non-first-continuations for recursion purposes, no?

roc should probably look over the placeholder part of this.
Shouldn't we just be looping over all continuations of the initial parent in an outermost loop, and not searching over any continuations in any of the recursive calls?
(In reply to comment #10)
The initialParent thing is for picking up flow chains that crosses the
"starting edge" of this subtree.  If that never occurs then we can do
as you suggest.  I'm not sure if it does or not, especially in overflow
containers.

(In reply to comment #11)
That would work if the invariant "continuing frames should be found
as part of the walk over the top-most frame's continuing frames" holds.
We saw in bug 411835 comment 20 that it does not hold for overflow
containers given the top-most frame to DeletingFrameSubtree().

It probably holds for the top-most frame in this case (the primary frame
for the parent content) though, but I'm not sure.
I will try it and run some tests...
Attached patch Patch rev. 2 (obsolete) (deleted) β€” β€” Splinter Review
What roc suggested in comment 11, plus removing some useless code:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/layout/base/nsCSSFrameConstructor.cpp&rev=1.1476&root=/cvsroot&mark=11007-11012,11020-11030#10967
(I'm using CVS blame here since the code is the same and HG blame is
ridiculously slow)

The IsFrameSpecial block is not needed because FindFrameWithContent()
iterates with GetNextContinuationOrSpecialSibling() so the parent's
special siblings are already included.

The 2nd block is for retrying without a hint if we had one, for
text content only.  Again, FindFrameWithContent() already did that,
for all content:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/layout/base/nsCSSFrameConstructor.cpp&rev=1.1476&root=/cvsroot&mark=10911-10921#10911

It passes regression tests on Linux.  I also verified that the removed
blocks never find a frame with a printf: running regression tests and
loading Alexa Top100 sites was silent.

Sorry for the big patch but most of it is from moving the parent frame
loop from FindFrameWithContent() to FindPrimaryFrameFor() and the
resulting indentation change.  I tried "hg diff -w" but it didn't
produce a notably smaller patch.
Attachment #354454 - Attachment is obsolete: true
Attachment #354772 - Flags: superreview?(bzbarsky)
Attachment #354772 - Flags: review?(bzbarsky)
Attachment #354454 - Flags: superreview?(bzbarsky)
Attachment #354454 - Flags: review?(bzbarsky)
If you want real whitespace-ignoring diff, as opposed to the inanity that hg has, you can add the following to your .hgrc:

[extensions]
extdiff =

[extdiff]
cmd.wdiff = diff
opts.wdiff = -p -U 8 -w -r -N

Then use "hg wdiff" to get your diffs.  If you can do that for this patch, that would almost certainly make it easier to review...
Attached patch Patch rev. 3 (deleted) β€” β€” Splinter Review
Great!  Using that I found that I left a printf in there...
Attachment #354772 - Attachment is obsolete: true
Attachment #354772 - Flags: superreview?(bzbarsky)
Attachment #354772 - Flags: review?(bzbarsky)
Attached patch Patch rev. 3 (wdiff) (deleted) β€” β€” Splinter Review
Attachment #354775 - Flags: superreview?(bzbarsky)
Attachment #354775 - Flags: review?(bzbarsky)
Comment on attachment 354775 [details] [diff] [review]
Patch rev. 3 (wdiff)

>           // We search the immediate children only, but if the child frame has
>           // the same content pointer as its parent then we need to search its
>           // child frames, too.
>           // We also need to search if the child content is anonymous and scoped
>           // to the parent content.
>           // XXXldb What makes us continue the search once we're inside
>           // the anonymous subtree?
>           if (aParentContent == kidContent ||
>-              (aParentContent && IsBindingAncestor(kidContent, aParentContent))) 
>-          {
>+            (aParentContent && IsBindingAncestor(kidContent, aParentContent))) {

Why remove that aParentContent == kidContent check?  That looks wrong to me, and in particular it looks like it would never find the primary frame for the textnode in this markup:

  <span style="display: table-row">Text</span>

r+sr=me with that check reinstated, and ideally a regression test added for that case.
Attachment #354775 - Flags: superreview?(bzbarsky)
Attachment #354775 - Flags: superreview+
Attachment #354775 - Flags: review?(bzbarsky)
Attachment #354775 - Flags: review+
And my apologies for the lag time.  :(
> Why remove that aParentContent == kidContent check?

I didn't.  I made a hanging brace to match the coding style of the file.
Oh, duh.  I thought there was a '-' on that line for some reason.  In that case, looks great!
Nominating this for blocking1.9.1, because it's purportedly fixes bug 470272 (according to bug 470272 comment 2), and THAT bug...
 -- is blocking1.9.1
 -- is a regression in mozilla-central from bug 465913, which is currently holding up that bug (and others) from landing on 1.9.1, per bug 430332 comment 56

So basically, this is currently holding up a chain of 1.9.1-blockers.
Flags: blocking1.9.1?
Mats -- assuming this gets blocking status, this needs to land ASAP (first on mozilla-central, and then if it passes tests, on 1.9.1 branch).  At that point, if you can check it in yourself or add the "checkin-needed" keyword, that'd be much appreciated!
(In reply to comment #21)
> Nominating this for blocking1.9.1, because it's purportedly fixes bug 470272
> (according to bug 470272 comment 2)

FWIW, I just tried out patch v3, and I can confirm that it fixes bug 470272, on both of that bug's testcases.
Flags: blocking1.9.1? → blocking1.9.1+
Nevermind -- it turns out this no longer needs to block 1.9.1, at least not for the reasons that I requested it, because bug 465913's patch is no longer wanted on 1.9.1.  For reference, see bug 470272 comment 5 and bug 430332 comment 58.
 --> canceling blocking1.9.1 (please re-request if this should block for a different reason)
Flags: blocking1.9.1+
http://hg.mozilla.org/mozilla-central/rev/0f51447d7424

-> FIXED
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.2a1
The testcase still seems to hang for me in current trunk build. I let it appr. 5 minutes stay in print preview, but I got not indication that it soon would stop hanging.
Print Preview of the testcase takes about 5 minutes (2.66GHz Core2 CPU)
for me in an opt build on x86_64 Linux.  It generates 20284 pages, that is
about 67 pages per second.  It doesn't seem that slow to me.
I filed bug 481540 that request some way to Cancel the operation though.

Martijn, feel free to file a perf bug if you think it's too slow.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: