Closed Bug 96911 Opened 23 years ago Closed 23 years ago

app crash when bringing up composition window (for both new or reply to msgs) -- gcc -O2

Categories

(Core :: DOM: Editor, defect)

x86
Linux
defect
Not set
blocker

Tracking

()

VERIFIED FIXED
mozilla0.9.4

People

(Reporter: cls, Assigned: cls)

References

Details

(Keywords: crash, regression, Whiteboard: possible fix attached; need testing)

Attachments

(6 files)

I'm sure this is a dupe but I can't seem to find the previous bugs that I've seen on this topic. The entire app crashes whenever I try to "Reply To" an existing mail message. It also crashes when I try to compose a new message. First noticed in a custom build from 1am. I repulled at 12:30pm and clobbered. I do not see the problem in the egcs nightly build but the gcc 2.95.2 -O2 nightly build crashes as well. My local builds are gcc 2.95.2 -O2 & -O3.
Attached file stdout log + backtrace (deleted) —
Keywords: crash, smoketest
I am on vacation. Reassign to varada.
Assignee: ducarroz → varada
Ok, after backing out jfrancis' checkin from 8/21 22:32, the crash went away. Reassigning.
Assignee: varada → brade
Component: Composition → Editor
Product: MailNews → Browser
QA Contact: sheelar → sujay
Moving to jfrancis.
Assignee: brade → jfrancis
doh! Investigating.
Status: NEW → ASSIGNED
Any other platforms seeing this? Is there a known problem with functions that return nsCOMPtrs on gcc? From the stack trace I can only assume gcc is doing the wrong thing in this situation. I am unable to reproduce this on mac. Scott, do you know of gcc probs with functions that return nsCOMPtrs?
Afaik, it's only gcc and only with a higher optimization level (-O2 or higher). I thought it was similar to bug 80988 and bug 61501 but I can't seem to find a quick fix for this one. The previous bugs were caused by gcc mis-optimizing nsCOMPtrs when they were declared inside a do-while loop. This problem appears to be fixed in the gcc 3.x releases but we haven't upgraded to them.
Summary: app crash when bringing up composition window (for both new or reply to msgs) → app crash when bringing up composition window (for both new or reply to msgs) -- gcc -O2
adding regression keyword, and making "the -O[2,3] bug" dependent on this...
Blocks: O2
Keywords: regression
Well... maybe a stupid idea... what about using |volatile| to tell the compiler to turn all optimisations off for that variable (assuming that gcc listens to |volatile| ...) ?
It's a return value. Is volatile nsCOMPtr<nsIFoo> Bar(); a valid function declaration? I haven't seen volatile used that way before, nor do I know what it would mean there.
Does anyone have a better stack-trace on this? With arguments, perhaps? And where in nsSupportsArray::AppendElement it is? This may very well be the compiler bug, but I can't be sure from the stack trace. Even better, a disasm of the code?
The problem won't be seen by looking at disasm of AppendElement. The real problem is inside nsEditor::GetNextNode(). It is calling a routine in there that returns an nsCOMPtr. Before the comptr was an outParam of the function (ie, an * nsCOMPtr). Now it's the return value. GCC seems to be destroying the comptr prematurely somewhere along the way - perhaps the temporary generated by the compiler is getting destroyed before it is used to set the returned value.
Note that http://mozilla.org/projects/xpcom/nsCOMPtr.html#guide_nsCOMPtr_in_APIs says that |nsCOMPtr|s should not be used as return values of functions.
This is not seen in the release builds for today, removing smoketest keyword.
Keywords: smoketest
In an effort to get the tree open quickly, couldn't we just change GetNextNode() and GetPriorNode() to assign into COMPtrs when calling GetLeftmostChild()/GetRightmostChild()? I believe right now they are assigning directly into raw pointers, which could be why things are getting released prematurely?
Ok, I misread the code ... it's been a long day already ... disregard my previous statements above. I have a clue now. ;-)
I disagree with the guideline DBaron mentions. I wouldn't disagree if the "already addrefed" return type worked correctly, but kin experienced problems wen trying to use it. In the meantime, comptrs are the lesser of evils. Non-comptrs have greater opportunity for usage error (not agreeing on ownership) than comptrs (assigning into non-comptrs, which I never do anyway).
is there a way to crank down gcc opts in the souce with pragmas?
Attached patch diffs to work around gcc prob (deleted) — Splinter Review
fix in hand; need reviews, and need a= if we want for 094
Whiteboard: fix in hand; need r/sr/a
Target Milestone: --- → mozilla0.9.4
also, if a linux gcc user could verify that this fixes things that would be nice.
oops, didn't diff all the files, new patch coming shortly
Attached patch more complete diffs (deleted) — Splinter Review
After the patch, it still crashes with the same stack. I've tried compiling a couple of the files with -g to get some extra debugger info but that seems to make the crash go away. *sigh*
I created a new profile and started up Mail/News. I hit the "New Msg" button and the composer window opened fine along with the account creation wizard, in which I created a imap account instance. When I hit finish, Mozilla core dumped. Here are the last few lines of gdb's output. I'll attach the full output separately. Loaded symbols for /export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libmsgimap.so Reading symbols from /export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libmsgdb.so...done. Loaded symbols for /export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libmsgdb.so #0 0x40b1a0a5 in nsDocument::~nsDocument () from /export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libgkcontent.so (gdb) I did all of this WITH the latest patch in this bug thread.
*** Bug 97081 has been marked as a duplicate of this bug. ***
attaching possible fix from scc and kin
Whiteboard: fix in hand; need r/sr/a → possible fix attached; need testing
Attached patch patch for editor base (deleted) — Splinter Review
The latest patch fails on compile (Linux/gcc2.95.3) with the following: c++ -o nsEditor.o -c -DENABLE_EDITOR_API_LOG=1 -DOSTYPE=\"Linux2.4\" -DOSARCH=\"Linux\" -DOJI -I../../dist/include -I../../dist/include -I/export/src/mozilla/builds/mozilla-trunk/dist/include/nspr -I/usr/X11R6/include -fPIC -I/usr/X11R6/include -fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Wbad-function-cast -Wcast-align -Woverloaded-virtual -Wsynth -pedantic -Wno-long-long -pipe -pthread -O2 -march=i686 -DNDEBUG -DTRIMMED -I/usr/X11R6/include -DMOZILLA_CLIENT -include ../../config-defs.h -Wp,-MD,.deps/nsEditor.pp /export/src/mozilla/mozilla/editor/base/nsEditor.cpp /export/src/mozilla/mozilla/editor/base/nsEditor.cpp: In method `struct already_AddRefed<nsIDOMNode> nsEditor::GetRightmostChild(nsIDOMNode *, int = 0)': /export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3398: conversion from `nsCOMPtr<nsIDOMNode>' to non-scalar type `already_AddRefed<nsIDOMNode>' requested /export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3388: warning: unused variable `nsresult result' /export/src/mozilla/mozilla/editor/base/nsEditor.cpp: In method `struct already_AddRefed<nsIDOMNode> nsEditor::GetLeftmostChild(nsIDOMNode *, int = 0)': /export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3425: conversion from `nsCOMPtr<nsIDOMNode>' to non-scalar type `already_AddRefed<nsIDOMNode>' requested /export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3415: warning: unused variable `nsresult result' gmake[3]: *** [nsEditor.o] Error 1 gmake[3]: Leaving directory `/export/src/mozilla/builds/mozilla-trunk/editor/base' gmake[2]: *** [install] Error 2 gmake[2]: Leaving directory `/export/src/mozilla/builds/mozilla-trunk/editor' gmake[1]: *** [install] Error 2 gmake[1]: Leaving directory `/export/src/mozilla/builds/mozilla-trunk' gmake: *** [build] Error 2
With the 8/28 23:00 gcc295 nightly, I'm not seeing this problem any longer. I'm also not seeing it in my local builds any longer. Not sure what "fixed" it. Is anyone else still seeing the problem?
Hmm, my build from 2001-08-28, between 12:00 and 13:00 PST (I think, it was around 21:30 MEST here when I pulled) does work OK as well though (I think) about 24 hours earlier I saw this still happening. What has happened here?
I pulled the latest changes at 2:30 and did a depend build. A relatively new profile (the one that caused the crash in my message of 2001-08-27 17:58) was able to start up composer and send an email just fine. The same build, with my normal profile (which had been imported from NN4.7 into Mozilla 0.9 or so) freezes with the same symptoms it has been having for the last week or so: whole app freezes and I have to Control-C to kill it. Just to be clear: I only had an actual core-dump once. Every other time it has just hung the whole app. Here's my .mozconfig, in case that might help: ac_add_options --disable-logging ac_add_options --disable-editor-api-log ac_add_options --disable-tests ac_add_options --disable-debug ac_add_options --enable-crypto ac_add_options --enable-optimize="-O2 -march=i686" ac_add_options --with-jpeg ac_add_options --with-zlib ac_add_options --with-png ac_add_options --with-gtk
Ok, I verified that it was kin's checkin to nsEditor.{h,cpp} yesterday that "fixed" the problem. I also noticed, that with a pre-kin version of nsEditor.cpp, if I move the declaration of node outside of the do-while loops in GetPriorNode() & GetNextNode(), then the app doesn't crash when the window comes up. It crashes when the first character is typed in the window.
Ok, I found the pre-kin fix for the original crash problem that supported my original hypothesis of not declaring nsCOMPtrs inside of do-while loops.
cool beans. I'm much happier knowing that Chris has isolated the problem rather than have me keep guessing at it. r=jfrancis
sr=kin@netscape.com Chris, go ahead and check that in, if you can get approval.
Assignee: jfrancis → cls
Status: ASSIGNED → NEW
a=blizzard on behalf of drivers for 0.9.4
Patch checked in.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
thanks to chris, kin, and scott for al lthe help on this.
*** Bug 97585 has been marked as a duplicate of this bug. ***
Christopher, can you verify this fix and mark this verified-fixed? thanks.
Setting QA Contact to sheelar for verification.
Status: RESOLVED → VERIFIED
QA Contact: sujay → sheelar
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: