Closed Bug 53969 Opened 24 years ago Closed 24 years ago

crashes after printing and moving the mouse into a text frame

Categories

(Core :: Layout, defect, P1)

defect

Tracking

()

VERIFIED FIXED

People

(Reporter: dbaron, Assigned: waterson)

References

Details

(Keywords: crash, topcrash, Whiteboard: [nsbeta3-][rtm++] FIXED ON TRUNK and BRANCH)

Attachments

(7 files)

One of the top crashes on talkback reports (around #10 or so) is a crash in nsTableFrame::GetRowGroupFrame with the top of the stack like this (from 2000091806): nsTableFrame::GetRowGroupFrame [d:\builds\seamonkey\mozilla\layout\html\table\s rc\nsTableFrame.cpp line 1127] nsTableFrame::ReflowMappedChildren [d:\builds\seamonkey\mozilla\layout\html\table\src\nsTableFrame.cpp line 3086] nsTableFrame::ResizeReflowPass2 [d:\builds\seamonkey\mozilla\layout\html\table\src\nsTableFrame.cpp line 2066] nsTableFrame::Reflow [d:\builds\seamonkey\mozilla\layout\html\table\src\nsTableFrame.cpp line 1710] See http://www.mozilla.org/projects/seamonkey/reports/ns6analysis.html for lots of data on this crash.
Keywords: crash, nsbeta3, topcrash
Looks like either a bad or null frame is being passed in to the function and it is crashing here: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/html/table/src/nsTableFrame.cpp&rev=3.396&mark=1125#1115
There are also one stack in the talkback data listed under 0x00000000 as the stack signature (from build 2000091409), where the stack is just: 0x00000000 (everything I pasted in above)
Many of the user comments in the talkback reports mention printing, particularly printing in mail.
Summary: crashes in nsTableFrame::GetRowGroupFrame → crashes while printing [ @ nsTableFrame::GetRowGroupFrame]
Marking nsbeta3+.
Status: NEW → ASSIGNED
Whiteboard: [nsbeta3+]
Adding me to CC. This bug is easy to reproduce. Just print out this page. Crashes for me with WinNT 2000092120. In the crash analysis somebody mentioned that he crashed while printing out a bug. (And did not file a bug????) The print out appeared at our network printer completely. It seems that after a printout the redraw of the page fails.
Attached patch patch to fix the crash (deleted) — Splinter Review
I've added a patch to fix the bug, I think. I can't test it right now because my build has some other problems. There may be other problems later, but the one that dbaron pointed out should be fixed. Don, we need to find out why this only seems to happen when printing.
Whiteboard: [nsbeta3+] → [nsbeta3+]fix-in-hand-maybe
I applied chris's patch. It does not work, now the second argument is a null pointer.
adding [@ nsTableFrame::GetRowGroupFrame] for topcrash tracking.
Summary: crashes while printing [ @ nsTableFrame::GetRowGroupFrame] → crashes while printing [@ nsTableFrame::GetRowGroupFrame]
Bernd, I'm not seeing a crash in GetRowGroupFrame (on http://bugzilla.mozilla.org/enter_bug.cgi?product=Browser) after applying the patch on my 9/22 build (besides, there is already a check for null on the 2nd parameter). I do get a crash in other code after printing when the mouse goes into the browser. dcone mentioned that there is a recent bug assigned to joki in which mousing over a text field after printing causes a crash. I'm CCing him and we should probably wait for him to fix that bug before proceeding with this one.
I am seeing the crash on the following URL http://bugzilla.mozilla.org/show_bug.cgi?id=53969 The crash happens at http://lxr.mozilla.org/seamonkey/source/layout/html/table/src/nsTableFrame.cpp#1 125 Inside the handling routine for IFrameTypeIn. The problem is: the virtual function pointer of aframe is null. Calling the GetFrameType method produces the crash.
Don, I just tried this again on my 9/22 build and am getting a serious assertion (but only after moving the mouse into the browser area after printing this page). The stack below is different than what Bernd reported, which supports the theory that there may be some evil corruption going on (or as Kevin suggests, maybe references to frames are being held after the frames are destroyed). In the assertion below, the content does not have a document. I continued past all of the assertions and never saw table code on the stack. I think this is either your problem or joki's. nsDebug::Assertion(const char * 0x01478a34, const char * 0x01478a30, const char * 0x01478a00, int 4170) line 256 + 13 bytes nsDebug::WarnIfFalse(const char * 0x01478a34, const char * 0x01478a30, const char * 0x01478a00, int 4170) line 358 + 21 bytes nsTextFrame::Reflow(nsTextFrame * const 0x00e0c66c, nsIPresContext * 0x017a9790, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, unsigned int & 1228400) line 4170 + 38 bytes nsLineLayout::ReflowFrame(nsIFrame * 0x00e0c66c, nsIFrame * * 0x0012c8a8, unsigned int & 1228400, nsHTMLReflowMetrics * 0x00000000, int & 0) line 921
Assignee: karnaze → dcone
Status: ASSIGNED → NEW
I'm upgrading the priority of this bug because currently, PDT is moving all P3-P5 nsbeta3+ bugs to nsbeta3- and nominate for rtm if needed. This bug sounds like it should be addressed before ship. Do we think that this bugs needs to be fixed for nsbeta3? It sounds like under a certain case, we crash when printing?
Priority: P3 → P2
Not holding PR3 for this. While I trust "topcrash", I've printed a ton of mail and bug reports and not crashed. Marking nsbeta3- but adding rtm keyword
Keywords: rtm
Whiteboard: [nsbeta3+]fix-in-hand-maybe → [nsbeta3-]fix-in-hand-maybe
This is a duplicate of bug 53219.. has the same stack trace.. *** This bug has been marked as a duplicate of 53219 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
Whiteboard: [nsbeta3-]fix-in-hand-maybe → [nsbeta3-]
How is this the same stack trace?
when me an Karnaze looked at this.. when he would move the cursor over the textfield.. we got the same stacktrace that was in the other bug. The stacktraces to vary depending on what you do after the print.. but the common thing is that if you print and do nothing (don't move the cursor over the page), no crash.. if you start moving the cursor over the page.. all kinds of bad things happen. Joki has this bug .. I am going to go back to previous builds and find out when this regression was introduced...
So is this a dup of #53219?
*** Bug 54068 has been marked as a duplicate of this bug. ***
This bug isn't a duplicate of bug 53912 because this crash still occurs after fixing that bug. Re-opening...
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
I am really impressed by the PDT ignorance to this bug, basing the judgement on private impression instead of facts (there are only 27 bugs listed in todays crash data), but it is your product and your reputation.
Adding rtm+, waiting for PDT approval. Bernd, thanks for reminding (the rest of) us that this is important.
Whiteboard: [nsbeta3-] → [nsbeta3-][rtm+]
This is the same stack trace as 53967. Something unstable happened.. and it was introduced between 9/19/00 and 9/20/00. My regression builds point to this time frame when printing then moving the cursor over the page causes crashes.. this is a duplicate of that bug. *** This bug has been marked as a duplicate of 53967 ***
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → DUPLICATE
Sorry wrong bug number.. reopening this.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
The simplified test case is just a <input value=foo type=text> inside a document. Print the document and then put the mouse inside the text control and the fun begins. Thanks Chris...
Assignee: dcone → waterson
Status: REOPENED → NEW
Summary: crashes while printing [@ nsTableFrame::GetRowGroupFrame] → crashes after printing and moving the mouse into a text frame
Status: NEW → ASSIGNED
Here is one way to fix the problem. Instead of storing the nsIAnonymousContentCreator-generated content in the document (well, the binding manager, which is 1-1 with the document), we store it in the pres shell. Note that nsBindingManager::ChangeDocumentFor() now iterates through aOldDocument's shells to update each shell's anonymous content.
This is P1. You crash after every form you try to print.
Priority: P2 → P1
PDT agrees [rtm need info] until code reviews are available. Definitely want a fix for this topcrash though.
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm need info]
If this bug is not TABLE specific, please change to appropriate component.
Component: HTMLTables → Layout
The changes make sense to me. It seems a little strange to be going through the pres shells in nsBindingManager::ChangeDocumentFor, but then again, it means the code only has to be in one place (rather than two). Is it possible for some of a document's pres shells to be destroyed before the document is? If so, should you be setting the anonymous content to null in the pres shell destructor too? (Should the setting to null then be done through a method on the pres shell that makes sure it's only done once?) (While we're here, is |nsIAnonymousContentCreator| deprecated? Should it be marked as such in layout/html/base/src/nsIAnonymousContentCreator.h? And should we remove my |SetDocumentForAnonymousContent| hack plus the one remaining call to it in nsDocument.cpp that should have been removed with the fix for bug 50999?)
> Is it possible for some of a document's pres shells to be destroyed before > the document is? Yes, and this patch doesn't account for that, I don't think. For example, let's print the document. Now we'll create a PresShell for printing, insert some anonymous content in it, and then kill the PresShell. At no time will we ChangeDocumentFor(...) with a null document. > If so, should you be setting the anonymous content to null in the > pres shell destructor too? (Should the setting to null then be done > through a method on the pres shell that makes sure it's only done once?) I'll give that a try. I'll think about the rest of your question (removing the nsIAnonymousContentCreator hacks) tomorrow... :-)
The above patch incorporates dbaron's feedback: 1. nsPresShell::~nsPresShell now enumerates the anonymous content left in the table and ensures that SetDocument(nsnull) is called on each. N.B. that this may duplicate some of the work done by ChangeDocumentFor() when the document is destroyed; however, if an element's mDocument is already null, the SetDocument() implementation's bail early (so we're basically wasting an extra function call). Nevertheless, this is necessary to handle the case where a PresShell dies before the document does: in theory, the PresShell may have triggered anonymous content to be generated with a script object. 2. We no longer need the SetDocumentForAnonymousContent() method on nsIAnonymousContentCreator, so I removed it, and the one call made by it. dbaron, care to give this another look? The patch is pretty much the same as last time, modulo addition of ClearDocumentEnumerator() in nsPresShell.cpp and gutting of nsIAnonymousContentCreator::SetDocumentForAnonymousContent().
*** Bug 55130 has been marked as a duplicate of this bug. ***
Am I correct that this solution means all anonymous content should get a SetDocument(null) twice (once when the pres shell owning the frame owning the content goes away, and once when the "owning" content gets SetDocument(null))? Keeping all anonymous content around until the pres shell goes away seems like it could be bad for memory usage, but I don't see a better way (and I've proposed that we should do something similar for other content...). Is it possible for there to be problems with content->[JS]->document->presShell->content circular references? (Would the added code *ever* break such circular references where it didn't before? i.e., was my previous "pres shell destructor" suggestion bad?) Who owns the pres shell? Would it be better to do the pres shell's half of the SetDocument(null) work only when the document releases the pres shell instead of from the pres shell's dtor?
My comment about circular references doesn't make much sense -- if the document still owns the pres shell then we're fine, because the iteration in the content's half of the work should get it. But would it be better to be more tolerant of leaked pres shells (i.e., not leak ten times more along with it...)?
Does that fix also solve the original problem reported by dbaron? Or did this bug morph underneath? See bug 55162 for another manifestation of the original bug.
I don't think this bug has morphed. The problem causing the crash (based on what Chris Waterson's fix does) was in the way we were unrooting (root meaning a root for JS GC) anonymous content that was attached to frames (which is needed to prevent huge leaks - see bug 45676 and bug 42895, although hyatt's more recent changes to create event.originalTarget may have "fixed" those cases by not causing JS objects to be created for anonymous content in the tooltip code). My original fix (for bug 45676) was very slow (and also didn't account for reframes at all) -- see bug 50999. Waterson's fix for bug 50999 just needs to be modified to correctly account for multiple sets of frames (each set owned by a PresShell) on the same document. These problems are caused by attaching anonymous content to frames (any frame implementing nsIAnonymousContentCreator does this). Things get somewhat messy once you start creating multiple frame trees (e.g., for printing). That's why I mentioned deprecating nsIAnonymousContentCreator above (in favor of XBL anonymous content). Hopefully that explanation is at least related to the truth...
> Am I correct that this solution means all anonymous content should get a > SetDocument(null) twice (once when the pres shell owning the frame owning the > content goes away, and once when the "owning" content gets SetDocument(null))? This will only happen in the "longest lived" pres shell. (Of course, that means it will happen in every pres shell that's used to view a web page!) But, as I noted earlier, SetDocument() implementations will short circuit much of the work if mDocument == aDocument, so the second iteration will waste a traversal of a shallow DOM tree, but not much else. > Keeping all anonymous content around until the pres shell goes away > seems like it could be bad for memory usage, Perhaps we can explore bounding the lifetime of the anonymous content to the lifetime of the frame that created it in the future. > But would it be better to be more tolerant of leaked pres > shells (i.e., not leak ten times more along with it...)? One problem, one bug, please! :-) > Does that fix also solve the original problem reported by dbaron? Or > did this bug morph underneath? No, this doesn't fix the table crasher. Yes, the bug has morphed, but is still valid. I think that bug 55162 should be dup'd to bug 54829, which covers the bad table row group dereference.
>This will only happen in the "longest lived" pres shell. (Of course, that means Right... >> But would it be better to be more tolerant of leaked pres >> shells (i.e., not leak ten times more along with it...)? > >One problem, one bug, please! :-) What I meant was that it might be better to put the code you were adding to the pres shell's destructor at the place where the document releases the pres shell.
Ah, I see. That's a good idea. I'll work up another patch.
I incorporated dbaron's suggestion to add out-of-band signalling to the PresShell to nuke the document pointers: there's now nsIPresShell::ReleaseAnonymousContent() that's called from nsDocument::- and nsXULDocument::SetScriptGlobalObject(nsnull). This method is implemented such that it will drop the mAnonymousContentTable after being called, avoiding multiple traversals through the anonymous content to call SetDocument().
I'm guessing that, for OS/2, |ClearDocumentEnumerator| needs |PR_CALLBACK|, i.e., |static PRBool PR_CALLBACK|. The behavior of |PresShell::SetAnonymousContentFor| when given a null |aAnonymousElements| seems odd - you remove the anonymous content from the table without calling |SetDocument|. Perhaps you should either: * change |if (oldAnonymousElements && aAnonymousElements)| to |if (oldAnonymousElements)| * (my preference) make null |aAnonymousElements| an assertion+failure return, since you have |ReleaseAnonymousContent| now. My comments about resistance to leaked |PresShell| don't make any sense at all -- if we leak a pres shell we leak the entire document anyway (my guess at the ownership model was backwards -- I should have looked). So I guess some of the current changes aren't as useful as I thought, but I don't think there's anything wrong with them. Or does the stuff I got you to add do anything at all? (I'm tired... I can't tell. Sorry to change my mind so much.)
Fixed the PR_CALLBACK stuff: good catch. mkaply will thank you, even though he doesn't know it yet. You're right about allowing null for aAnonymousItems. Bogus. Vestigial junk from an earlier incarnation. Fixed that. I actually like the consistency of unrooting the anonymous content at the same time we unroot real content. Although I'm almost positive that it's probably not strictly necessary (as indicated in the comment), it certainly feels tidier. I'll attach another patch.
r=dbaron (assuming you've run some basic leak tests, e.g., bloat tests and a little mousing around in the browser)
dbaron: yep.
a=hyatt
PDT, returning to + status. You've ++'d it before. Still want it?
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm+]
PDT marking [rtm need info]. We'd still like a fix for a topcrash (don't think it was ever ++) but this patch is really big. Please validate this on the trunk for a day and come back after that was successful.
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm need info]
The size of the patch is deceptive. Most of it is moving code from one place to another or removing unused code.
PDT: this is a top crash, and while the patch is long, I can verify what david says: "number of lines changed" here is deceptive, because the patch largely moves functionality from one place to another, and obsoletes some methods which are correctly removed. It isn't 0-risk, but the risk is not high either. After a day or two of validation on the trunk and maybe some extra testing help from QA and/or dev staff, we should take this fix on the branch.
I agree with buster, this is a big deal.. printing will not work in 90% of the cases if this is not fixed.
I filed a bunch of printing bugs on the beta and some of them got funneled into this bug. We really should fix this for rtm, I think we look pretty bad without this fix.
Fix checked in to trunk. Marking rtm+ again for re-eval.
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm+] FIXED ON TRUNK
I agree that we need this. I just filed 55910 on a bunch of printing issues, which are most likely this bug. For one, after printing, scroll bars are totally hosed.
checked into the embedding branch.
marking rtm++
Whiteboard: [nsbeta3-][rtm+] FIXED ON TRUNK → [nsbeta3-][rtm++] FIXED ON TRUNK
fixed on branch
Status: ASSIGNED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Status Whiteboard says 'FIXED ON TRUNK' - comments says fixed on branch. I assume sice it is an rtm++ bug, it was fixed on the branch. Please clarify prior to verification. Thanks
Both.
Whiteboard: [nsbeta3-][rtm++] FIXED ON TRUNK → [nsbeta3-][rtm++] FIXED ON TRUNK and BRANCH
Verified fixed on Win, Mac and Linux with 10_11 branch build. Added vtrunk keyword.
Keywords: vtrunk
I didn't get crash on Win NT nov_1 nor Mac oct_24 nor Linux nov_2 trunk builds by "printing and moving the mouse into a text frame", printing in mail. So I'm marking this bug as VERIFIED and removing vtrunk keyword.
Status: RESOLVED → VERIFIED
Keywords: vtrunk
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: