Closed Bug 393936 Opened 17 years ago Closed 16 years ago

[PATCH] Crash [@ nsCellMapColumnIterator::GetNextFrame] with display: table-footer-group, binding, contenteditable and image

Categories

(Core :: Graphics: ImageLib, defect, P3)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1

People

(Reporter: martijn.martijn, Assigned: joe)

References

(Blocks 1 open bug)

Details

(5 keywords)

Crash Data

Attachments

(5 files, 5 obsolete files)

Attached file binding needed for testcase (deleted) —
See upcoming testcase, which crash current trunk builds on load or reload. http://crash-stats.mozilla.com/report/index/a03be961-54fc-11dc-92fe-001a4bd43ed6 0 nsCellMapColumnIterator::GetNextFrame(int*, int*) mozilla/layout/tables/nsCellMap.cpp:2801 1 BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths(nsIRenderingContext*) mozilla/layout/tables/BasicTableLayoutStrategy.cpp:295 2 BasicTableLayoutStrategy::ComputeIntrinsicWidths(nsIRenderingContext*) mozilla/layout/tables/BasicTableLayoutStrategy.cpp:576 3 BasicTableLayoutStrategy::GetMinWidth(nsIRenderingContext*) mozilla/layout/tables/BasicTableLayoutStrategy.cpp:69 4 nsTableFrame::TableShrinkWidthToFit(nsIRenderingContext*, int) mozilla/layout/tables/nsTableFrame.cpp:1658 5 nsTableFrame::ComputeAutoSize(nsIRenderingContext*, nsSize, int, nsSize, nsSize, nsSize, int) mozilla/layout/tables/nsTableFrame.cpp:1690 6 nsFrame::ComputeSize(nsIRenderingContext*, nsSize, int, nsSize, nsSize, nsSize, int) mozilla/layout/generic/nsFrame.cpp:2966 7 nsTableFrame::ComputeSize(nsIRenderingContext*, nsSize, int, nsSize, nsSize, nsSize, int) mozilla/layout/forms/nsFieldSetFrame.cpp:429 8 nsHTMLReflowState::InitConstraints(nsPresContext*, int, int, nsMargin const*, nsMargin const*) mozilla/layout/generic/nsHTMLReflowState.cpp:1820 9 nsHTMLReflowState::Init(nsPresContext*, int, int, nsMargin const*, nsMargin const*) mozilla/layout/generic/nsHTMLReflowState.cpp:315 10 nsTableOuterFrame::InitChildReflowState(nsPresContext&, nsHTMLReflowState&) mozilla/layout/tables/nsTableOuterFrame.cpp:468 11 nsTableOuterFrame::OuterReflowChild(nsPresContext*, nsIFrame*, nsHTMLReflowState const&, void*, nsHTMLReflowMetrics&, int, nsSize&, nsMargin&, unsigned int&) mozilla/layout/tables/nsTableOuterFrame.cpp:1133 12 nsTableOuterFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) mozilla/layout/tables/nsTableOuterFrame.cpp:1255 13 nsBlockReflowContext::ReflowBlock(nsRect const&, int, nsCollapsingMargin&, int, int, nsMargin&, nsLineBox*, nsHTMLReflowState&, unsigned int&) mozilla/layout/generic/nsBlockReflowContext.cpp:338 14 nsBlockFrame::ReflowBlockFrame(nsBlockReflowState&, nsLineList_iterator, int*) mozilla/layout/generic/nsBlockFrame.cpp:2922 etc. This regressed between 2007-06-26 and 2007-06-28. And because the contenteditable attribute is necessary for the crash, I think this is somehow a regression from bug 237964.
Attached file testcase (deleted) —
Flags: blocking1.9?
Attached file stack (deleted) —
###!!! ASSERTION: reflowing in the middle of frame construction
OS: Windows XP → All
Can we make one of the calls on that stack (in the #39 -- #51 range) asynchronous off an event instead?
Uh... Cancel() must not notify anything sync. In particular, the removal from the loadgroup MUST be async per the nsIRequest contract (though we should document this more clearly in nsIRequest.idl).
Component: Layout: Tables → ImageLib
QA Contact: layout.tables → imagelib
Attached patch Possible patch (obsolete) (deleted) — Splinter Review
Compiled, but untested (don't have a fully compiling build at the moment).
So I just tested this, and it doesn't quite work. Unfortunately, mListener is a weak pointer and consumers rely on Cancel() nulling it out. There are at least some cases where the Cancel() call is actually in mListener's destructor. In these cases we really shouldn't be sending OnStopRequest to the listener either. Perhaps there should be a status code that such callsites can use to indicate "cancel and never talk to me again"? Feels hacky, but this whole setup with having to hold the imgIRequest which has a pointer back to us has this little leak problem if all the refs are made strong. :( In any case, with just that patch we crash in OnStartRequest on the testcase because mListener is pointing to deleted memory.
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Still crashing for me, using a 2007-11-01 trunk build.
Stuart, would you be OK with either introducing a status code to pass to cancel(), as described in comment 6, or an API which we could use to tell the imgRequestProxy to drop its listener reference instead of overloading cancel()?
Assignee: nobody → pavlov
Flags: wanted1.9.0.x+
Flags: tracking1.9+
Flags: blocking1.9-
Priority: P2 → --
Blocks: 428263
This testcase now triggers ###!!! ASSERTION: Someone forgot to block scripts: 'aIsSafeToFlush == nsContentUtils::IsSafeToRunScript()', file /Users/jruderman/trunk/mozilla/layout/base/nsPresShell.cpp, line 4532 in addition to the three assertions it has triggered for a long time ###!!! ASSERTION: reflowing in the middle of frame construction: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_FrameC] == 0', file ../../dist/include/layout/nsPresContext.h, line 948 ###!!! ASSERTION: Why no rowgroups?: 'mOrigCells == 0', file /Users/jruderman/trunk/mozilla/layout/tables/nsCellMap.h, line 632 ###!!! ASSERTION: Bogus mOrigCells?: 'mCurMapRow < mCurMapRelevantRowCount', file /Users/jruderman/trunk/mozilla/layout/tables/nsCellMap.cpp, line 2928
Still crashing for me, using: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b1pre) Gecko/20080905031348 Minefield/3.1b1pre
Flags: blocking1.9.1?
Assignee: pavlov → joe
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P1
Since we absolutely must make removal from the loadgroup async, this is an updated version of Boris' patch that includes a new function imgIRequest::CancelAndForgetObserver(). I think I fixed all callers of Cancel() who needed to be CancelAndForgetObserver instead, but there are a couple of calls to SetFrame(nsnull) in nsImageFrame and nsBulletFrame that I can't fix since I don't understand the class hierarchy. Ideally, that would be a simple call to CancelAndForgetObserver in nsImageFrame::Destroy() instead of removing the observer from the image loader and setting the frame to null, but I don't know how things are structured to do that.
Attachment #278728 - Attachment is obsolete: true
Attachment #341971 - Flags: review?(bzbarsky)
Also, I should mention that patch fixes the crash in this bug. I'm running through mochitests and reftests to make sure I haven't broken anything else.
Attached patch Enhanced patch with -U 8 (obsolete) (deleted) — Splinter Review
Attachment #341971 - Attachment is obsolete: true
Attachment #341977 - Flags: review?(bzbarsky)
Attachment #341971 - Flags: review?(bzbarsky)
So as I mentioned on irc, we need to keep the SetFrame(nsnull) calls that are in place now. Just replacing Cancel() with CancelAndForgetObserver() sounds reasonable to me.
Attached patch Re-add SetFrame(nsnull) calls (obsolete) (deleted) — Splinter Review
Attachment #341977 - Attachment is obsolete: true
Attachment #342282 - Flags: superreview?
Attachment #342282 - Flags: review?(bzbarsky)
Attachment #341977 - Flags: review?(bzbarsky)
Attachment #342282 - Flags: superreview? → superreview?(pavlov)
FYI, this passes all mochitests and reftests.
Attachment #342282 - Flags: review?(bzbarsky) → review+
Comment on attachment 342282 [details] [diff] [review] Re-add SetFrame(nsnull) calls >+NS_IMETHODIMP imgRequestProxy::CancelAndForgetObserver(nsresult aStatus) >+ NullOutListener(); >+} return NS_OK? Or return the return value of Cancel()? In any case, as written this won't compile on Windows: gotta return _something_. With that fixed, looks good.
Attached patch Address review comments (obsolete) (deleted) — Splinter Review
Carrying over r+.
Attachment #342288 - Flags: superreview?(pavlov)
Attachment #342288 - Flags: review+
Attachment #342282 - Attachment is obsolete: true
Attachment #342282 - Flags: superreview?(pavlov)
Attachment #342288 - Flags: superreview?(pavlov) → superreview+
From my tests, this no longer crashes in current Firefox 3.1 nightlies.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Crap - I forgot to reopen this. It bounced out immediately because this patch caused refcount leaks. I'm currently debugging that.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Fix refcount leaks (deleted) — Splinter Review
The previous patch had a nice subtle leak that occurred because CancelAndForgetObserver() called Cancel() before calling NullOutListener() (to avoid code duplication). Unfortunately, the important part of Cancel(), it calling mOwner->RemoveProxy(), was done asynchronously, so by the time it get called, mListener was null, and so OnStopRequest was never called on the listener.
Attachment #342288 - Attachment is obsolete: true
Attachment #351252 - Flags: superreview?(pavlov)
Attachment #351252 - Flags: review?(bzbarsky)
Hold on. Which listener needed the OnStopRequest call? See comment 6.
Hm, that's not entirely clear. (It might be nsGIFDecoder2, in at least one case, but it's not clear to me where its mObserver gets nulled out.) This patch does the "simple" thing, which is simply make it possible to do both an asynchronous and a synchronous cancel, with the mechanics of that cancel being the same. This isn't quite the same as what you mentioned in comment 6 (AIUI), since we do talk to the listener again, we just do it synchronously.
But the point is that a number of the callers of this method are calling it in a destructor. You _can't_ talk to them safely. I know the old code sort of did, but that seems bogus to me...
I agree. In fact, almost all callers of CancelAndForgetObserver() are directly _inside_ the destructor. However, I'm not sure that I feel 100% comfortable making that change, because it involves some pretty significant changes to the callers. In the end, I think it'd end up making CancelAndForgetObserver() entirely obsolete. If you really want that change, I can do it, but it doesn't feel like the safest thing to do. This way maintains the previous behaviour for the places that need it, and fixes Cancel() for the places that didn't know about imgRequestProxy's broken Cancel() behaviour.
Ok, fair enough. I guess this is at least not making things any worse. Will try to review tomorrow.
Attachment #351252 - Flags: review?(bzbarsky) → review+
Summary: Crash [@ nsCellMapColumnIterator::GetNextFrame] with display: table-footer-group, binding, contenteditable and image → [PATCH] Crash [@ nsCellMapColumnIterator::GetNextFrame] with display: table-footer-group, binding, contenteditable and image
Comment on attachment 351252 [details] [diff] [review] Fix refcount leaks Looks fine, but add more documentation for the new method and for the old cancel() explaining what they do and why you shouldn't use the new one in new code.
Attachment #351252 - Flags: superreview?(pavlov) → superreview+
Pushed to mozilla-central: http://hg.mozilla.org/mozilla-central/rev/5f6b82c5c67e After some baking on m-c, I'll push to 1.9.1.
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
had to back this due to leaks on tinderbox
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Pushed to mozilla-central: http://hg.mozilla.org/mozilla-central/rev/e87d4b42a4b2 After some baking, I'll push to 1.9.1.
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
I can't verify the fix, since builds before the fix were already not crashing with the testcase.
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.2a1
Verified with: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20090114 Shiretoko/3.1b3pre Ubiquity/0.1.4 ID:20090114020333 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090113 Minefield/3.2a1pre ID:20090113020320 Can we please get this checked-in as a crashtest?
Status: RESOLVED → VERIFIED
Flags: in-testsuite?
Attachment #358377 - Flags: review?(bzbarsky)
Blocks: 469880
Does that include the fix for bug 471101? Also, you presumably want Joe to review this too.
Attachment #358377 - Flags: superreview+
Comment on attachment 358377 [details] [diff] [review] for 1.9.0 with imgIRequest_MOZILLA_1_9_0_BRANCH Looks good, but you definitely need the fix to bug 471101 too. sr=joe so long as you do that.
Comment on attachment 358377 [details] [diff] [review] for 1.9.0 with imgIRequest_MOZILLA_1_9_0_BRANCH Looks ok if you don't change the CID and backport the other bug.
Attachment #358377 - Flags: review?(bzbarsky) → review+
Depends on: 493240
Depends on: 508057
Alexander, What ended up happening with your 1.9.0 patch? Did Linux distributions accept it/ship it?
asac: Ditto Joe's question. See also regression fix bug 508057 that would be required if you did ship this patch.
Crash Signature: [@ nsCellMapColumnIterator::GetNextFrame]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: