Closed
Bug 718999
Opened 13 years ago
Closed 13 years ago
make "You can't dereference a NULL nsCOMPtr" assertions fatal
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla12
People
(Reporter: mak, Assigned: mak)
Details
Attachments
(1 file)
(deleted),
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
While it's true this is most likely followed by a crash, in some cases we don't get a decent stack from it, for example if this happens on tinderbox and causes a segfault there is no way to get a stack.
Thus, I suggest we convert these assertions to MOZ_ASSERT so we abort and get a stack in all cases.
An example of a tinderbox failure:
###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../../dist/include/nsCOMPtr.h, line 809
NS_DebugBreak+0x00000026 [/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/dist/bin/libxpcom.so +0x0000203C]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x000025AA]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x00003A3B]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x00004F7F]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x0000211C]
UNKNOWN [/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/dist/bin/libxul.so +0x010F7C37]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x00008B2C]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x00006211]
__libc_start_main+0x000000DC [/lib/libc.so.6 +0x00015DEC]
###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../../dist/include/nsCOMPtr.h, line 809
/bin/sh: line 1: 32579 Segmentation fault XPCOM_DEBUG_BREAK=stack-and-abort /builds/slave/m-aurora-lnx-dbg/build/obj-firefox/dist/bin/run-mozilla.sh ../../../../../dist/bin/$f
make[5]: *** [check] Error 139
make[5]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit/components/places/tests/cpp'
make[4]: *** [check] Error 2
make[4]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit/components/places/tests'
make[3]: *** [check] Error 2
make[3]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit/components/places'
make[2]: *** [check] Error 2
make[2]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit/components'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit'
make: *** [check] Error 2
program finished with exit code 2
elapsedTime=782.101013
TinderboxPrint: check<br/>16834/0
Unknown Error: command finished with exit code: 2
Assignee | ||
Comment 1•13 years ago
|
||
As a side note, we can't use MOZ_ASSERT here, due to bug 717540.
So either I use NS_ABORT_IF_FALSE or I have to wait for that bug.
Depends on: 717540
Comment 2•13 years ago
|
||
I don't understand how using MOZ_ASSERT will help: this test is already using stack-and-abort, which called mozalloc_abort. MOZ_ASSERT forwards to CrashInJS, which uses the same basic technique, so I don't see how we'd get any better stack...
I think this is just an issue where binary test programs aren't hooked up to breakpad while running tests, so we don't get the good crashreport stacks, just whatever stacks get printed by the program itself.
No longer depends on: 717540
Assignee | ||
Comment 3•13 years ago
|
||
well, we'd get a stack on the xpcom deref rather than later, plus the benefit of having an aborting assertion rather than a weak one, the error is pretty important to catch.
Even if, according to what you say, looks like even aborting at the deref we'd not get a stack due to how binary tests are built? Would that be hard to fix?
Comment 4•13 years ago
|
||
I don't oppose the change, but it won't make this particular case any better.
Each of the binary tests would have to enable the crash reporter and we'd have to run them in a test environment which is aware of and processes the resulting minidumps. For things that don't go through XRE_main that's a real PITA, and it's a pretty large chunk of work.
Assignee | ||
Comment 5•13 years ago
|
||
I see, it's a pity we can't figure these failures, being unable to reproduce locally :(
Btw, in case we want to take an abort_if_false patch, here it is.
Attachment #589512 -
Flags: review?(benjamin)
Comment 6•13 years ago
|
||
Why aren't we getting good stacks out of these crashes? They're running on the build machines as part of "make check", so if they're hitting assertions they should get good stacks on debug builds. (The binaries should be unstripped, so NS_Stackwalk should work just fine.) Are we not hitting these on debug builds, only release builds, or what?
Assignee | ||
Comment 7•13 years ago
|
||
Sorry, I don't know the details of how our stackwalk works, though I can say that these failures happen really late in the shutdown process, much after xpcom-shutdown, so aborting earlier may help, or not.
Assignee | ||
Comment 8•13 years ago
|
||
just as a reference, the above has been copied from this log https://tbpl.mozilla.org/php/getParsedLog.php?id=8628442&tree=Mozilla-Aurora
Comment 9•13 years ago
|
||
Okay, after talking to glandium on IRC, I think the problem here is that NS_StackWalk just hasn't ever given us good symbols on Linux (because we're just using dladdr, and it doesn't read debug info). fix-linux-stack.pl exists to fix this very problem, so we should just fix our C++ unit tests to pipe their output through that. Filed bug 719120 on that.
Assignee | ||
Comment 10•13 years ago
|
||
Thank you, fixing that will be really useful.
Updated•13 years ago
|
Attachment #589512 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 11•13 years ago
|
||
Target Milestone: --- → mozilla12
Comment 12•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 13•13 years ago
|
||
This change makes my life harder :/ I'll have to add some fragile code to http://www.squarefree.com/2010/11/21/how-my-dom-fuzzer-ignores-known-bugs/ in order to ignore specific assertions with specific stack traces.
Assignee | ||
Comment 14•13 years ago
|
||
The scope was to make easier for people to detect null dereferences and to get their stacks from automated tests (once bug 719120 is fixed), though, if this makes things harder for the fuzzer we use to debug, we can backout.
My assumption was that this kind of failure should be reported and fixed immediately.
Comment 15•13 years ago
|
||
Fuzzer bugs often don't get fixed immediately. The DOM fuzzer's ignore list currently has signatures for 32 crash bugs.
Comment 16•13 years ago
|
||
I don't understand the problem, Jesse: isn't this assertion *always* immediately followed by a crash? There's really no continuing from a nullptr deref.
Comment 17•13 years ago
|
||
The problem is that when the assertion is changed to an abort, the crash doesn't happen. My fuzzer scripts use the entire stack to decide whether a crash is a known bug, but only the assertion message to decide whether an assertion failure is a known bug.
Comment 18•13 years ago
|
||
That doesn't seem like the right way to do it, given that we have JS_ASSERTs at highly centralized pinch points, and a failure at such a pinch point may well alias more than one issue...
You need to log in
before you can comment on or make changes to this bug.
Description
•