Closed Bug 673017 Opened 13 years ago Closed 12 years ago

Intermittent segfault while running test_IHistory.cpp on Linux, or bus error on OS X [***NOT: Expected true, got false***][*** [check] Error 139]

Categories

(Toolkit :: Places, defect)

defect
Not set
major

Tracking

()

RESOLVED WORKSFORME
mozilla11
Tracking Status
firefox11 --- affected
firefox12 --- affected

People

(Reporter: cmtalbert, Assigned: espindola)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 12 obsolete files)

(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
mak
: review+
espindola
: checkin+
Details | Diff | Splinter Review
(deleted), patch
dietrich
: review+
Details | Diff | Splinter Review
There is an intermittent segfault occurring when running the make check test test_nsIHistory.cpp on linux desktop mobile builds. It has happened so far on both a QT (Bq) and standard (Bm) build. Re-running the Bm build on the same changeset caused it to go green, proving this is intermittent. I re-ran the build here: https://build.mozilla.org/buildapi/self-serve/mozilla-central/build/4656276 First known case of this (as far as I can tell) is changeset http://hg.mozilla.org/mozilla-central/rev/a80aa88ded45 today. The one I noticed it on and re-ran the build for was changeset: http://hg.mozilla.org/mozilla-central/rev/5c7a49dfa7f7 = Errors = Changeset: a80aa88ded45 s: moz2-linux-slave16 Linux QT mozilla-central build on 2011/07/20 12:44:02 TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_two_null_links_same_uri. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | 246 of 246 tests passed /bin/sh: line 1: 18859 Segmentation fault XPCOM_DEBUG_BREAK=stack-and-abort ../../../../../dist/bin/run-mozilla.sh ../../../../../dist/bin/$f NEXT ERROR make[5]: *** [check] Error 139 make[5]: Leaving directory `/builds/slave/m-cen-linuxqt/build/obj-firefox/toolkit/components/places/tests/cpp' NEXT ERROR make[4]: *** [check] Error 2 make[4]: Leaving directory `/builds/slave/m-cen-linuxqt/build/obj-firefox/toolkit/components/places/tests' NEXT ERROR make[3]: *** [check] Error 2 make[3]: Leaving directory `/builds/slave/m-cen-linuxqt/build/obj-firefox/toolkit/components/places' make[2]: *** [check] Error 2 make[2]: Leaving directory `/builds/slave/m-cen-linuxqt/build/obj-firefox/toolkit/components' make[1]: *** [check] Error 2 make[1]: Leaving directory `/builds/slave/m-cen-linuxqt/build/obj-firefox/toolkit' make: *** [check] Error 2 program finished with exit code 2 Changeset: 5c7a49dfa7f7 s: moz2-linux-slave21 Linux Mobile Desktop mozilla-central build on 2011/07/20 13:37:57 TEST-INFO | (/builds/slave/m-cen-lnx-mb/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_two_null_links_same_uri. TEST-INFO | (/builds/slave/m-cen-lnx-mb/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | 240 of 240 tests passed /bin/sh: line 1: 15022 Segmentation fault XPCOM_DEBUG_BREAK=stack-and-abort ../../../../../dist/bin/run-mozilla.sh ../../../../../dist/bin/$f NEXT ERROR make[5]: *** [check] Error 139 make[5]: Leaving directory `/builds/slave/m-cen-lnx-mb/build/obj-firefox/toolkit/components/places/tests/cpp' NEXT ERROR make[4]: *** [check] Error 2 make[4]: Leaving directory `/builds/slave/m-cen-lnx-mb/build/obj-firefox/toolkit/components/places/tests' NEXT ERROR make[3]: *** [check] Error 2 make[3]: Leaving directory `/builds/slave/m-cen-lnx-mb/build/obj-firefox/toolkit/components/places' make[2]: *** [check] Error 2 make[2]: Leaving directory `/builds/slave/m-cen-lnx-mb/build/obj-firefox/toolkit/components' make[1]: *** [check] Error 2 make[1]: Leaving directory `/builds/slave/m-cen-lnx-mb/build/obj-firefox/toolkit' make: *** [check] Error 2 program finished with exit code 2
This looks like the same failure: http://tinderbox.mozilla.org/showlog.cgi?log=Build-System/1311268102.1311273340.12134.gz#err0 It's on a regular desktop Firefox Linux64 build. Linux x86-64 build-system build on 2011/07/21 10:08:22
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1311283026.1311287431.14880.gz Linux mozilla-central build on 2011/07/21 14:17:06 make[5]: Entering directory `/builds/slave/m-cen-lnx/build/obj-firefox/toolkit/components/places/tests/cpp' TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Disabling Idle Service. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_set_places_enabled. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_unvisted_does_not_notify_part1. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visited_notifies. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_unvisted_does_not_notify_part2. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_same_uri_notifies_both. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_unregistered_visited_does_not_notify. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_new_visit_notifies_waiting_Link. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_RegisterVisitedCallback_returns_before_notifying. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_observer_topic_dispatched. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_inserts. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_updates. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_preserves_shown_and_typed. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_creates_visit. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_transition_typed. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_transition_embed. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_new_visit_adds_place_guid. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_two_null_links_same_uri. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | 240 of 240 tests passed /bin/sh: line 1: 31114 Segmentation fault XPCOM_DEBUG_BREAK=stack-and-abort ../../../../../dist/bin/run-mozilla.sh ../../../../../dist/bin/$f NEXT ERROR make[5]: *** [check] Error 139 make[5]: Leaving directory `/builds/slave/m-cen-lnx/build/obj-firefox/toolkit/components/places/tests/cpp' NEXT ERROR make[4]: *** [check] Error 2 make[4]: Leaving directory `/builds/slave/m-cen-lnx/build/obj-firefox/toolkit/components/places/tests' make[3]: *** [check] Error 2 make[3]: Leaving directory `/builds/slave/m-cen-lnx/build/obj-firefox/toolkit/components/places' make[2]: Leaving directory `/builds/slave/m-cen-lnx/build/obj-firefox/toolkit/components' make[2]: *** [check] Error 2 make[1]: *** [check] Error 2make[1]: Leaving directory `/builds/slave/m-cen-lnx/build/obj-firefox/toolkit' make: *** [check] Error 2 program finished with exit code 2 elapsedTime=212.060998
Summary: Intermittent segfault while running test_nsIHistory.cpp on desktop mobile builds → Intermittent segfault while running test_nsIHistory.cpp on Linux
probably the same as bug 672493, or better, probably same glib bug. we'll see when that patch lands.
Summary: Intermittent segfault while running test_nsIHistory.cpp on Linux → Intermittent segfault while running test_IHistory.cpp on Linux
(In reply to comment #5) > probably the same as bug 672493, or better, probably same glib bug. > we'll see when that patch lands. For what its worth, SeaMonkey Linux64 Opt has been frequently hitting this, and that bug looks like its only protecting against a debug case at a quick glance.
ignore comment 14 please, wrong bug.
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312450742.1312463476.7323.gz tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312452008.1312465643.19461.gz tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312429808.1312437099.15888.gz
tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1313094896.1313097201.29198.gz
This one's different! It's a bus error! http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1313157524.1313168265.2882.gz OS X 10.6.2 mozilla-central build on 2011/08/12 06:58:44 make[5]: Nothing to be done for `check'. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Disabling Idle Service. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_set_places_enabled. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_unvisted_does_not_notify_part1. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visited_notifies. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_unvisted_does_not_notify_part2. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_same_uri_notifies_both. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_unregistered_visited_does_not_notify. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_new_visit_notifies_waiting_Link. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_RegisterVisitedCallback_returns_before_notifying. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_observer_topic_dispatched. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_inserts. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_updates. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_preserves_shown_and_typed. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_creates_visit. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_transition_typed. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visituri_transition_embed. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_new_visit_adds_place_guid. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_two_null_links_same_uri. TEST-INFO | (/builds/slave/m-cen-osx64/build/toolkit/components/places/tests/cpp/test_IHistory.cpp) | 246 of 246 tests passed /bin/sh: line 1: 79922 Bus error XPCOM_DEBUG_BREAK=stack-and-abort ../../../../../dist/bin/run-mozilla.sh ../../../../../dist/bin/$f make[5]: *** [check] Error 138 make[4]: *** [check] Error 2 make[3]: *** [check] Error 2 make[2]: *** [check] Error 2 make[1]: *** [check] Error 2 make: *** [check] Error 2 program finished with exit code 2 elapsedTime=946.137366 TinderboxPrint: check<br/>10574/0 Unknown Error: command finished with exit code: 2 === Output ended ===
OS: Linux → All
Summary: Intermittent segfault while running test_IHistory.cpp on Linux → Intermittent segfault while running test_IHistory.cpp on Linux, or bus error on OS X
I am trying to run the test under valgrind to see if there is some non deterministic behavior in it.
Attached patch demo patch (obsolete) (deleted) — Splinter Review
This patch fixes the problems valgrind was finding. Given the comment just after the memset I am inserting, it looks like the problem is xpc_UnmarkGrayObject being called before the first gc cycle. I will try to debug this a bit more, but have just been preempted by another bug.
Comment on attachment 555813 [details] [diff] [review] demo patch This makes the program "valgrind clean", but it is probably not the fix we want. Do you think it is a good idea to check it in for a day or so to see if there is some other problem that is not seem by valgrind?
Attachment #555813 - Flags: review?(ehsan)
I was unable to reproduce the problem on my desktop today. Will try it on one of the reference machines.
Attachment #555813 - Flags: review?(ehsan) → review?(jorendorff)
The interesting bits from valgrind: ==12911== Conditional jump or move depends on uninitialised value(s) ==12911== at 0x5FDAADF: xpc_UnmarkGrayObject(JSObject*) (xpcpublic.h:177) ==12911== by 0x63CE437: _ZL12FinishCreateR14XPCCallContextP21XPCWrappedNativeScopeP18XPCNativeInterfaceP14nsWrapperCacheP16XPCWrappedNativePS8_.clone.173 (xpcprivate.h:2469) ==12911== by 0x63D0FC2: XPCWrappedNative::GetNewOrUsed(XPCCallContext&, xpcObjectHelper&, XPCWrappedNativeScope*, XPCNativeInterface*, int, XPCWrappedNative**) (xpcwrappednative.cpp:610) ==12911== by 0x63B85B0: nsXPCComponents::AttachNewComponentsObject(XPCCallContext&, XPCWrappedNativeScope*, JSObject*) (xpccomponents.cpp:4370) ==12911== by 0x63AB3E6: nsXPConnect::InitClasses(JSContext*, JSObject*) (nsXPConnect.cpp:1014) ==12911== by 0x63C7A26: XPCJSContextStack::GetSafeJSContext(JSContext**) (xpcthreadcontext.cpp:290) ==12911== by 0x623CA4D: nsScriptSecurityManager::GetSafeJSContext() (nsScriptSecurityManager.cpp:326) .... ==12911== Uninitialised value was created by a heap allocation ==12911== at 0x4A05016: memalign (vg_replace_malloc.c:532) ==12911== by 0x4A0506F: posix_memalign (vg_replace_malloc.c:660) ==12911== by 0x63C49B9: XPConnectGCChunkAllocator::doAlloc() (xpcjsruntime.cpp:1469) ==12911== by 0x6945024: void* js::gc::RefillTypedFreeList<JSObject>(JSContext*, unsigned int) (jsgcchunk.h:63) ==12911== by 0x6A419AB: js::GlobalObject::create(JSContext*, js::Class*) (jsgcinlines.h:219) ==12911== by 0x68D35EB: JS_NewCompartmentAndGlobalObject (jsapi.cpp:3044) ==12911== by 0x63AD3BE: CreateNewCompartment(JSContext*, JSClass*, nsIPrincipal*, xpc::CompartmentPrivate*, JSObject**, JSCompartment**) (nsXPConnect.cpp:1052) ==12911== by 0x63AE0C2: xpc_CreateMTGlobalObject(JSContext*, JSClass*, nsISupports*, JSObject**, JSCompartment**) (nsXPConnect.cpp:1122) ==12911== by 0x63C79B0: XPCJSContextStack::GetSafeJSContext(JSContext**) (xpcthreadcontext.cpp:264) ==12911== by 0x623CA4D: nsScriptSecurityManager::GetSafeJSContext() (nsScriptSecurityManager.cpp:326)
Attached patch updated patch (obsolete) (deleted) — Splinter Review
I think this is actually the correct fix. If the bit is marked gray in the uninitiated memory, a call to xpc_UnmarkGrayObject will call xpc_UnmarkGrayObjectRecursive. I am not familiar with this part of the code, but even if the objects themselves are initialized and calling xpc_UnmarkGrayObjectRecursive is safe, it is probably more expensive than clearing the bitmap early.
Attachment #555813 - Attachment is obsolete: true
Attachment #556095 - Flags: review?(jorendorff)
Attachment #555813 - Flags: review?(jorendorff)
Attachment #556095 - Flags: review?(jorendorff) → review?(wmccloskey)
Comment on attachment 556095 [details] [diff] [review] updated patch Thanks, Rafael! This is a really scary bug. I'd like to fix it the other way though.
Attachment #556095 - Flags: review?(wmccloskey)
Attached patch alternate fix (obsolete) (deleted) — Splinter Review
This returns early from UnmarkGray if we haven't done the first GC yet.
Attachment #556134 - Flags: review?(continuation)
Comment on attachment 556134 [details] [diff] [review] alternate fix Actually, I realize this won't fix the valgrind warning. It's still going to see those bits as undefined. Maybe the other patch would be better. I'll try to measure the cost of zeroing out the mark bits while running V8, which allocates a lot of new chunks.
Attachment #556134 - Flags: review?(continuation)
Yeah, I was just going to say that. We probably don't want to do any fancy tests in GetWrapper() either. You don't need to do the zeroing after the GC has run once, so maybe you could guard the bitmap clearing with a check that the GC has run? I don't know the perf implications of any of it. Can we zero out the mark bits for an object when it is put into a wrapper cache? If that only happens once per wrapper cache maybe it isn't so terrible? I'm not quite sure of their lifetime...
Comment on attachment 556095 [details] [diff] [review] updated patch Review of attachment 556095 [details] [diff] [review]: ----------------------------------------------------------------- I have a few drive-by nits on the comments. Also, the comments should probably be // to match the rest of the file. Good work, though! Pretty scary that this has been like this. I wonder how many random crashes this will fix... ::: js/src/jsgc.cpp @@ +339,5 @@ > for (size_t i = 0; i != JS_ARRAY_LENGTH(markingDelay); ++i) > markingDelay[i].init(); > > + /* We clear the bitmap to guard againt xpc_UnmarkGrayObject being > + called before the first gc cycle. */ The clearing is guarding against xpc_IsGrayGCThing being called on uninitialized data, not to keep xpc_UnmarkGrayObject from being called. minor typo: against, not againt capitalize "GC". @@ +342,5 @@ > + /* We clear the bitmap to guard againt xpc_UnmarkGrayObject being > + called before the first gc cycle. */ > + bitmap.clear(); > + > + /* The rest of info fields is initailzied in PickChunk. */ is initailzied -> are initialized
Oops, sorry, I was confused about what file this is in. I think the two line comment should be formatted more like this: /* * foo... * bar... */ The format of the single line one is okay.
Comment on attachment 556095 [details] [diff] [review] updated patch This doesn't seem to cause any V8 regressions. I talked to Andrew about this. He realized that we have to worry about new chunks even after the first GC, since their mark bits will also be in an undefined state. So the patch I submitted wouldn't work at all. Given the difficulty of reasoning about this problem, I think this patch is the right way to go. r+ with Andrew's changes.
Attachment #556095 - Flags: review+
Assignee: nobody → respindola
Component: Places → JavaScript Engine
Product: Toolkit → Core
QA Contact: places → general
Version: unspecified → Trunk
With this patch in place, we might be able to tear out some of the checking-for-GC-before-CC stuff. Though now that we use the GC to recover from UnmarkGrey overflow there's probably less that can be actually removed.
http://tbpl.allizom.org/php/getParsedLog.php?id=6179378 (four pushes later on inbound, so apparently that wasn't actually it)
Weird. I tried running this test locally. Strangely, I don't get any valgrind warnings even without the patch. Instead I get this output: ... TEST-INFO | (/home/billm/mozilla/in0/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_new_visit_adds_place_guid. TEST-INFO | (/home/billm/mozilla/in0/toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_two_null_links_same_uri. TEST-PASS | /home/billm/mozilla/in0/toolkit/components/places/tests/cpp/test_IHistory.cpp TEST-INFO | (/home/billm/mozilla/in0/toolkit/components/places/tests/cpp/test_IHistory.cpp) | 240 of 240 tests passed WARNING: nsExceptionService ignoring thread destruction after shutdown: file /home/billm/mozilla/in0/xpcom/base/nsExceptionService.cpp, line 197 WARNING: could not open zipfile for write: file /home/billm/mozilla/in0/startupcache/StartupCache.cpp, line 364 WARNING: An event was posted to a thread that will never run it (rejected): file /home/billm/mozilla/in0/xpcom/threads/nsThread.cpp, line 388 WARNING: An event was posted to a thread that will never run it (rejected): file /home/billm/mozilla/in0/xpcom/threads/nsThread.cpp, line 388 WARNING: An event was posted to a thread that will never run it (rejected): file /home/billm/mozilla/in0/xpcom/threads/nsThread.cpp, line 388 WARNING: NS_ENSURE_TRUE(asyncCloseWasCalled) failed: file /home/billm/mozilla/in0/storage/src/mozStorageConnection.cpp, line 822 WARNING: NS_ENSURE_TRUE(asyncCloseWasCalled) failed: file /home/billm/mozilla/in0/storage/src/mozStorageConnection.cpp, line 822 Finished running IHistory tests. Am I maybe running the test incorrectly? I just did a Linux 64 build and then said: make -C objdir-ff-dbg/toolkit/components/places/tests/cpp
I got different results in different systems, which I guess is because of the use before initialization problem was sensitive to the timing of the GC pass. I will try to reproduce this again tomorrow.
Status: NEW → ASSIGNED
The patch clearly did not fix this problem, but since the bug now has a landed patch associated with it, should we dupe this bug into a new one, use the new one for the orange, and turn this bug into something for just the landed patch?
(In reply to Andrew McCreight [:mccr8] from comment #201) > The patch clearly did not fix this problem, but since the bug now has a > landed patch associated with it, should we dupe this bug into a new one, use > the new one for the orange, and turn this bug into something for just the > landed patch? No, we can keep it open. The patch landed in this bug has not fixed it.
Right, but how do we track the patch that we did land? I may want to request that we put the patch in Aurora and Beta, but it seems bizarre to request that in this bug that is basically unrelated. And setting "fixed in 8" here or whatever doesn't make any sense either.
It's quite common for orange bugs to have multiple patches because people try different things. I don't like multiple patches per bug myself, but if you do want to get this landed on Aurora, I suggest you file a new bug, attach the desired patch for branch and request approval there.
Attached patch Patch to turn the segfault in to an assert (obsolete) (deleted) — Splinter Review
Not a fix, but this turns the the segmentation fault into a test_IHistory: ../../../dist/include/nsTArray.h:171: nsTArray_base::size_type nsTArray_base<Alloc>::Length() const [with Alloc = nsTArrayDefaultAllocator, nsTArray_base::size_type = unsigned int]: Assertion `this' failed.
I have just been preempted by another bug, but will try to get back to this one this week. The backtrace I got was #0 0x00237410 in __kernel_vsyscall () #1 0x0087bd26 in nanosleep () from /lib/libc.so.6 #2 0x0087bb4f in sleep () from /lib/libc.so.6 #3 0x080493e3 in crap_handler (signum=6) at /home/espindola/mozilla-central/toolkit/components/places/tests/cpp/test_IHistory.cpp:149 #4 <signal handler called> #5 0x00237410 in __kernel_vsyscall () #6 0x00984e61 in raise () from /lib/libpthread.so.0 #7 0x0115b869 in nsContentUtils::AddScriptBlocker () at /home/espindola/mozilla-central/content/base/src/nsContentUtils.cpp:4354 #8 0x01784a3b in nsAutoScriptBlocker (this=0x9978110, aURI=0x9ba35e0) at ../../../dist/include/nsContentUtils.h:1947 #9 mozilla::places::History::NotifyVisited (this=0x9978110, aURI=0x9ba35e0) at /home/espindola/mozilla-central/toolkit/components/places/History.cpp:1325 #10 0x01784c5a in mozilla::places::(anonymous namespace)::VisitedQuery::NotifyVisitedStatus (this=0x9978cf0) at /home/espindola/mozilla-central/toolkit/components/places/History.cpp:402 #11 0x01784d2d in mozilla::places::(anonymous namespace)::VisitedQuery::HandleCompletion (this=0x9978cf0, aReason=0) at /home/espindola/mozilla-central/toolkit/components/places/History.cpp:394 #12 0x016f33ee in mozilla::storage::(anonymous namespace)::CompletionNotifier::Run (this=0x99cbf40) at /home/espindola/mozilla-central/storage/src/mozStorageAsyncStatementExecution.cpp:179 #13 0x019004fc in nsThread::ProcessNextEvent (this=0x97edc98, mayWait=1, result=0xbfd7fc0c) at /home/espindola/mozilla-central/xpcom/threads/nsThread.cpp:631 #14 0x018cf4a4 in NS_ProcessNextEvent_P (thread=<value optimized out>, mayWait=1) at /home/espindola/mozilla-central/obj-i686-pc-linux-gnu/xpcom/build/nsThreadUtils.cpp:245 #15 0x01900821 in nsThread::Shutdown (this=0x9b5af48) at /home/espindola/mozilla-central/xpcom/threads/nsThread.cpp:494 #16 0x019042b2 in TimerThread::Shutdown (this=0x97ede10) at /home/espindola/mozilla-central/xpcom/threads/TimerThread.cpp:172 #17 0x019030c8 in nsTimerImpl::Shutdown () at /home/espindola/mozilla-central/xpcom/threads/nsTimerImpl.cpp:194 #18 0x018d38b2 in mozilla::ShutdownXPCOM (servMgr=0x0) at /home/espindola/mozilla-central/xpcom/build/nsXPComInit.cpp:612 #19 0x00609e99 in NS_ShutdownXPCOM (svcMgr=0x0) at /home/espindola/mozilla-central/xpcom/stub/nsXPComStub.cpp:167 #20 0x080496f5 in ScopedXPCOM::~ScopedXPCOM (this=0xbfd7fda8, __in_chrg=<value optimized out>) at ../../../../../dist/include/testing/TestHarness.h:204 #21 0x0804c6c3 in main (aArgc=Cannot access memory at address 0xb50 ) at /home/espindola/mozilla-central/toolkit/components/places/tests/cpp/places_test_harness_tail.h:117
Attachment #557538 - Attachment is obsolete: true
Blocks: 685007
Attached patch proposed patch (obsolete) (deleted) — Splinter Review
I noticed that a crash was happening when sBlockedScriptRunners was set to null in the following backtrace. #0 nsContentUtils::Shutdown () at /home/espindola/mozilla-central/content/base/src/nsContentUtils.cpp:1031 #1 0x0134fae6 in nsLayoutStatics::Shutdown () at /home/espindola/mozilla-central/layout/build/nsLayoutStatics.cpp:326 #2 0x0134e43b in LayoutShutdownObserver::Observe (this=0x81e3c80, aSubject=0x8064ecc, aTopic=0x209fde1 "xpcom-shutdown", someData=0x0) at /home/espindola/mozilla-central/layout/build/nsLayoutModule.cpp:350 #3 0x01c636ca in nsObserverList::NotifyObservers (this=0x823835c, aSubject=0x8064ecc, aTopic=0x209fde1 "xpcom-shutdown", someData=0x0) at /home/espindola/mozilla-central/xpcom/ds/nsObserverList.cpp:130 #4 0x01c63af8 in nsObserverService::NotifyObservers (this=0x8179088, aSubject=0x8064ecc, aTopic=0x209fde1 "xpcom-shutdown", someData=0x0) at /home/espindola/mozilla-central/xpcom/ds/nsObserverService.cpp:182 #5 0x01c5888e in mozilla::ShutdownXPCOM (servMgr=0x0) at /home/espindola/mozilla-central/xpcom/build/nsXPComInit.cpp:595 #6 0x008bbe99 in NS_ShutdownXPCOM (svcMgr=0x0) at /home/espindola/mozilla-central/xpcom/stub/nsXPComStub.cpp:167 #7 0x080496f5 in ScopedXPCOM::~ScopedXPCOM (this=0xbfffe298, __in_chrg=<value optimized out>) at ../../../../../dist/include/testing/TestHarness.h:204 #8 0x0804c6c3 in main (aArgc=0, aArgv=0x0) at /home/espindola/mozilla-central/toolkit/components/places/tests/cpp/places_test_harness_tail.h:117 and the nsIOService was passed NS_XPCOM_SHUTDOWN_OBSERVER_ID after that. The attached patch fixes the issue by having nsIOService react to NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID. Thanks to ehsan for explaining the xpcom message passing semantics.
Attachment #558902 - Flags: review?(bzbarsky)
Comment on attachment 558902 [details] [diff] [review] proposed patch I _think_ this should be ok, but I'd really like biesi or Jason to double-check it....
Attachment #558902 - Flags: review?(jduell.mcbugs)
Attachment #558902 - Flags: review?(cbiesinger)
Attachment #558902 - Flags: review?(bzbarsky)
Attachment #558902 - Flags: review+
Comment on attachment 558902 [details] [diff] [review] proposed patch Review of attachment 558902 [details] [diff] [review]: ----------------------------------------------------------------- I don't know this code at all, so I'm hesitant to review it. But I can learn it if biesi can't get around to it (would ehsan also be a fair reviewer?)
Comment on attachment 558902 [details] [diff] [review] proposed patch Here's my drive-by r+. Here's the logic of this patch. The "xpcom-will-shutdown" event should be used for the components which may use other components that go away during shutdown. The IO service is currently shutting down during "xpcom-shutdown", but its successful shutdown really depends on whether the layout services are available or not. Those services die during "xpcom-shutdown", which means that nsIOService should self destruct during "xpcom-will-shutdown".
Attachment #558902 - Flags: review+
Comment on attachment 558902 [details] [diff] [review] proposed patch Looks reasonable.
Attachment #558902 - Flags: review?(cbiesinger) → review+
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #237) > #0 0x00237410 in __kernel_vsyscall () > #1 0x0087bd26 in nanosleep () from /lib/libc.so.6 > #2 0x0087bb4f in sleep () from /lib/libc.so.6 > #3 0x080493e3 in crap_handler (signum=6) at > /home/espindola/mozilla-central/toolkit/components/places/tests/cpp/ > test_IHistory.cpp:149 > #4 <signal handler called> > #5 0x00237410 in __kernel_vsyscall () > #6 0x00984e61 in raise () from /lib/libpthread.so.0 > #7 0x0115b869 in nsContentUtils::AddScriptBlocker () at > /home/espindola/mozilla-central/content/base/src/nsContentUtils.cpp:4354 > #8 0x01784a3b in nsAutoScriptBlocker (this=0x9978110, aURI=0x9ba35e0) at > ../../../dist/include/nsContentUtils.h:1947 > #9 mozilla::places::History::NotifyVisited (this=0x9978110, aURI=0x9ba35e0) > at /home/espindola/mozilla-central/toolkit/components/places/History.cpp:1325 > #10 0x01784c5a in mozilla::places::(anonymous > namespace)::VisitedQuery::NotifyVisitedStatus (this=0x9978cf0) > at > /home/espindola/mozilla-central/toolkit/components/places/History.cpp:402 > #11 0x01784d2d in mozilla::places::(anonymous > namespace)::VisitedQuery::HandleCompletion (this=0x9978cf0, aReason=0) > at > /home/espindola/mozilla-central/toolkit/components/places/History.cpp:394 > #12 0x016f33ee in mozilla::storage::(anonymous > namespace)::CompletionNotifier::Run (this=0x99cbf40) > at > /home/espindola/mozilla-central/storage/src/ > mozStorageAsyncStatementExecution.cpp:179 > #13 0x019004fc in nsThread::ProcessNextEvent (this=0x97edc98, mayWait=1, > result=0xbfd7fc0c) at > /home/espindola/mozilla-central/xpcom/threads/nsThread.cpp:631 > #14 0x018cf4a4 in NS_ProcessNextEvent_P (thread=<value optimized out>, > mayWait=1) at > /home/espindola/mozilla-central/obj-i686-pc-linux-gnu/xpcom/build/ > nsThreadUtils.cpp:245 > #15 0x01900821 in nsThread::Shutdown (this=0x9b5af48) at > /home/espindola/mozilla-central/xpcom/threads/nsThread.cpp:494 > #16 0x019042b2 in TimerThread::Shutdown (this=0x97ede10) at > /home/espindola/mozilla-central/xpcom/threads/TimerThread.cpp:172 > #17 0x019030c8 in nsTimerImpl::Shutdown () at > /home/espindola/mozilla-central/xpcom/threads/nsTimerImpl.cpp:194 > #18 0x018d38b2 in mozilla::ShutdownXPCOM (servMgr=0x0) at > /home/espindola/mozilla-central/xpcom/build/nsXPComInit.cpp:612 > #19 0x00609e99 in NS_ShutdownXPCOM (svcMgr=0x0) at > /home/espindola/mozilla-central/xpcom/stub/nsXPComStub.cpp:167 > #20 0x080496f5 in ScopedXPCOM::~ScopedXPCOM (this=0xbfd7fda8, > __in_chrg=<value optimized out>) at > ../../../../../dist/include/testing/TestHarness.h:204 > #21 0x0804c6c3 in main (aArgc=Cannot access memory at address 0xb50 > ) at > /home/espindola/mozilla-central/toolkit/components/places/tests/cpp/ > places_test_harness_tail.h:117 So, this backtrace is the sign of another problem. We try to shutdown the timer thread (and the rest of the threads) here: <http://mxr.mozilla.org/mozilla-central/source/xpcom/build/nsXPComInit.cpp#615>. At this point, "xpcom-shutdown" has been dispatched, and layout is dead. This causes us to lose. I think that the thread shutdown code should be moved to after "xpcom-will-shutdown" and before "xpcom-shutdown". CCing some people who may know this code better than me.
Attachment #558902 - Flags: review?(jduell.mcbugs) → checkin+
The XPCOM shutdown order is pretty precise, and I don't think we need to mess with it: https://wiki.mozilla.org/XPCOM_Shutdown In this case the places (or storage?) code should be watching xpcom-shutdown and doing all of its cleanup during that notification.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #259) > The XPCOM shutdown order is pretty precise, and I don't think we need to > mess with it: https://wiki.mozilla.org/XPCOM_Shutdown > > In this case the places (or storage?) code should be watching xpcom-shutdown > and doing all of its cleanup during that notification. So you think that sBlockedScriptRunners should not be delete in xpcom-shutdown instead of moving code that depend on it existing to will-shutdown?
I'm saying that it shouldn't be doing it *after* xpcom-shutdown like it is now, and that messing with the thread shutdown sequence is probably not a good idea. Although I also think we could clean up sBlockedScriptRunners from nsLayoutStatics::Shutdown.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #261) > I'm saying that it shouldn't be doing it *after* xpcom-shutdown like it is > now, and that messing with the thread shutdown sequence is probably not a > good idea. Although I also think we could clean up sBlockedScriptRunners > from nsLayoutStatics::Shutdown. Are you talking about the timer? Moving it to xpcom-shutdown would still have the same problem as we had with the IO service service, no? It can work or fail depending on the order xpcom-shutdown is delivered.
What timer? I'm talking about mozilla::places::History::NotifyVisited and mozilla::places::(anonymous namespace)::VisitedQuery::HandleCompletion. Presumably all those queries should be cancelled at shutdown. The order of xpcom-shutdown is LIFO with AddObserver calls. Normally the order in which they are registered is pretty deterministic based on module load, but this is not well documented and there may be hidden dependencies.
Marco: do you know about how places shuts down these jobs?
> What timer? The one in comment 237. See the TimerThread::Shutdown from nsXPComInit.cpp:612.. > The order of xpcom-shutdown is LIFO with AddObserver calls. Normally the > order in which they are registered is pretty deterministic based on module > load, but this is not well documented and there may be hidden dependencies. Interesting. Do we want to document it? If so my previous patch can be changed to just change the order of AddObserver. IMHO it is better to not depend on it.
I don't see any interesting timer in comment 237, just a thread-shutdown stack which is pumping the event loop and running an unrelated event (a places event in this case). We can certainly document the LIFO ordering of the observer service, since we depend on it. I'm not sure which observers you'd be reordering, though, since don't they depend on the order in which the module ctors run?
(In reply to Benjamin Smedberg [:bsmedberg] from comment #274) > I don't see any interesting timer in comment 237, just a thread-shutdown > stack which is pumping the event loop and running an unrelated event (a > places event in this case). The thread being shutdown from nsTimerImpl::Shutdown which is called from mozilla::ShutdownXPCOM. > We can certainly document the LIFO ordering of the observer service, since > we depend on it. I'm not sure which observers you'd be reordering, though, > since don't they depend on the order in which the module ctors run? If a reordering is done, it should make sure nsIOService gets the message before Layout. That is effective what the patch https://bug673017.bugzilla.mozilla.org/attachment.cgi?id=558902 did (but my using NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID).
Attached patch proposed patch (obsolete) (deleted) — Splinter Review
This patch implements Ehsan's proposed solution in comment 255. I was able to reproduce the crash. This time on a 64 bit bot. With the patch applied, I was able to run the test 1000 times with no crashes. A try run is at https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=b617c8a868d1
Attachment #560031 - Flags: review?(bzbarsky)
Comment on attachment 560031 [details] [diff] [review] proposed patch Absolutely not, this must occur after xpcom-shutdown-threads. Otherwise client code will try to use timers and cause very strange/unexpected errors.
Attachment #560031 - Flags: review?(bzbarsky) → review-
Attached patch proposed patch (obsolete) (deleted) — Splinter Review
The connection were never being closed and their threads never being shut down. These threads were the one responsible for the events after xpcom shutdown. I am no so sure about "calling" profile-change-teardown on ShutdownXPCOM, but I could not find a better place to put it. It also reverts 1d1143dde4bb which is not needed any more.
Attachment #558902 - Attachment is obsolete: true
Attachment #560031 - Attachment is obsolete: true
Attachment #560185 - Flags: review?(benjamin)
Comment on attachment 560185 [details] [diff] [review] proposed patch This is all fine except for the profile-change-teardown notification. That notification should only be sent if there is a profile selected, and it is sent during normal Firefox operation from here: http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsXREDirProvider.cpp#776 called from here: http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsAppRunner.cpp#1080 It's possible that this test is failing because xpcshell tests don't have a profile by default, and storage assumes that there is always a profile (and that therefore it only has to watch profile-change-teardown and not also one of the xpcom-shutdown observers).
Attachment #560185 - Flags: review?(benjamin) → review-
Attached patch New patch (obsolete) (deleted) — Splinter Review
This version changes nsNavHistory.cpp to react to NS_XPCOM_SHUTDOWN_OBSERVER_ID.
Attachment #556095 - Attachment is obsolete: true
Attachment #560185 - Attachment is obsolete: true
Attachment #560210 - Flags: review?(benjamin)
Attachment #560210 - Flags: review?(benjamin) → review+
Attached patch new new patch (obsolete) (deleted) — Splinter Review
The only difference from the previous one is + // Finalize all statements. + FinalizeInternalStatements(); That the debug builds found was missing.
Attachment #560210 - Attachment is obsolete: true
Attachment #560420 - Flags: review?(benjamin)
What component should this be in? It sounds like JS isn't appropriate any more.
(In reply to Andrew McCreight [:mccr8] from comment #294) > What component should this be in? It sounds like JS isn't appropriate any > more. Toolkit/Places maybe?
Component: JavaScript Engine → Places
Product: Core → Toolkit
QA Contact: general → places
An update on the problems I am having patching this: * We have to shutdown the threads used by the connections, or at least make sure that they are not producing new events. * To do that, I am trying to close the connections. * To do that, we need to finalize the sql statements. * It is not clear how to know which ones have to be finalized.
Comment on attachment 560420 [details] [diff] [review] new new patch This looks ok to me, but mak should review it also.
Attachment #560420 - Flags: review?(mak77)
Attachment #560420 - Flags: review?(benjamin)
Attachment #560420 - Flags: review+
Attached patch "final" patch (obsolete) (deleted) — Splinter Review
Try at https://tbpl.mozilla.org/?tree=Try&rev=2984fcdfc0b2 The differences from the previous ones are: * Add a way to get the services (Bookmark, Annotation, Favicon) if they exist. * Notify that we are about to close the connection, so that nsPlacesExpiration.js can finish its statements. I also added a bit of dead code removal. Let me know if you would like for that to be in another patch.
Attachment #560420 - Attachment is obsolete: true
Attachment #560662 - Flags: review?(dietrich)
Attachment #560662 - Flags: feedback?(benjamin)
Attachment #560420 - Flags: review?(mak77)
Attachment #560662 - Attachment is obsolete: true
Attachment #560662 - Flags: review?(dietrich)
Attachment #560662 - Flags: feedback?(benjamin)
The last version of this patch is at https://tbpl.mozilla.org/?tree=Try&rev=086453e7f478 but I think I have convinced myself that AsyncClose should not call Shutdown. That makes it not really async and causes the event loop to spin, which causes the callback to be called with AsyncClose still on the stack, which in turn can cause recursive generator invocations in javascript. What should probably happen is that the history code should be refactored to use the closed callback as a continuation.
A version with shutdown that passed xpcshell-tests locally is at https://tbpl.mozilla.org/?tree=Try&rev=1a1484638ae3 observing TOPIC_PLACES_CONNECTION_CLOSED is probably a hack, if that works I will try to use a lighter test. I will also try to audit the other calls to AsyncClose.
Places tries to stop using the database at profile-change-teardown, but clearly there may still be runnables or queries on the async thread, canceling them may be bad for a lot of reasons, just think about clear history on shutdown for example. I think the issue of possibly running after xpcom-shutdown is valid for any Storage consumer, not specific to Places (even if this may be visible here since we do lots of stuff for links coloring and visits, so the queue may be long). For this reason spinning the events loop in Places looks like just a workaround to something that may actually happen for any Storage consumer. And actually, may not that be true for anyone relying on xpcom-shutdown-threads and implicit nsThread::shutdown spinning and merge the thread there? Regarding the tests missing a profile (and thus profile notifications), that's a bug in the test, all Places tests ensure to call do_get_profile(), and xpcshell harness in such a case will take care to invoke the right profile shutdown notifications. Places is not supposed to work without a profile, so there's no point in supporting that special case. One interesting fact you found here, is that test_IHistory uses a separate cpp harness, that is indeed lacking profile creation and shutdown (and related notifications). Most likely fixing that harness would fix the randomness. I'dd suggest for now to take your changes to tests and the removal of TOPIC_PLACES_CONNECTION_CLOSING (looks unused atm), that are safe and good fixes, apart the test_browserhistory one that you filed a separate bug for. I'd like the harness in toolkit/components/places/tests/cpp/ to correctly implement something like do_get_profile() and fire the right notifications. The problem with Storage relying on thread shutdown to spin the events loop should most likely be moved and discusses in a Storage bug, Sdwilsh and Asuth have far better knowledge of the implications of changing that and this bug is too spammy to follow a discussion.
OK. I marked the test_browserhistory bug as a dependency of this one and created one just for the missing profile problem. I will split the bits that are unrelated to the missing profile or spinning so that they can go first and then open a forth bug about spinning (I don't do a full shutdown in the last version of the patch patch).
Attached patch the simple bits (obsolete) (deleted) — Splinter Review
Attachment #561069 - Flags: review?(mak77)
Comment on attachment 561069 [details] [diff] [review] the simple bits Review of attachment 561069 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/base/src/nsIOService.cpp @@ +969,4 @@ > } > } > } > + else if (!strcmp(topic, NS_XPCOM_SHUTDOWN_OBSERVER_ID)) { Hm is this undoing what Bug 673017 did? sounds like a bogus patch queue :) ::: services/sync/tests/unit/test_places_guid_downgrade.js @@ +159,4 @@ > do_check_eq(result.length, 1); > do_check_eq(result[0].id, tbid); > > + stmt.finalize(); nit: the newline before finalize() is not coherent with the others finalize() you added ::: toolkit/components/places/tests/head_common.js @@ +614,4 @@ > aCallback.apply(scope, args); > } > }); > + commit.finalize(); could you please fix the same in all places where this code is duplicated? see http://mxr.mozilla.org/mozilla-central/search?string=db.createAsyncStatement%28%22BEGIN%20EXCLUSIVE%22%29
Attachment #561069 - Flags: review?(mak77)
Attached patch new patch (obsolete) (deleted) — Splinter Review
The reversal of 1d1143dde4bb is intentional as that didn't fix this orange in the end. https://tbpl.mozilla.org/?tree=Try&rev=ca044377525a
Attachment #561069 - Attachment is obsolete: true
Attachment #561206 - Flags: review?(mak77)
Attachment #561223 - Flags: review?(mak77) → review+
Regarding bug 687726, while we wait for an answer and discussion results, if current patches won't fix this failure, we may evaluate a patch that spins the events loop in Places. But it should be a separate patch from the current ones, and I think it should spin the Storage async thread used by Places so that all queries and runnables on it are executed. This may be tricky though, since we can only spin the thread we are on, we may try to ->shutdown() it though (need to ensure how Storage reacts to such a thing though).
Whiteboard: [orange] → [orange] [more patches before this is fixed]
Comment on attachment 561223 [details] [diff] [review] new with the nsNavHistory.cpp bits that were missing [Checked in: Comment 319] https://hg.mozilla.org/mozilla-central/rev/02ce78afb984
most of the last reported failures are not this bug, "Expected true, got false" is not the segfault
Summary: Intermittent segfault while running test_IHistory.cpp on Linux, or bus error on OS X → Intermittent segfault while running test_IHistory.cpp on Linux, or bus error on OS X [***NOT: Expected true, got false***]
(In reply to Marco Bonardo [:mak] from comment #322) > most of the last reported failures are not this bug, "Expected true, got > false" is not the segfault This might be another Symptom of the same bug, but it is OK if we want to track it in another bug.
https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=b92894dab13b linux & linuxqt segv ================ make -C cpp check make[5]: Entering directory `/builds/slave/try-lnx/build/obj-firefox/toolkit/components/places/tests/cpp' TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_two_null_links_same_uri. TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | 247 of 247 tests passed /bin/sh: line 1: 27772 Segmentation fault XPCOM_DEBUG_BREAK=stack-and-abort /builds/slave/try-lnx/build/obj-firefox/dist/bin/run-mozilla.sh ../../../../../dist/bin/$f make[5]: *** [check] Error 139 make[5]: Leaving directory `/builds/slave/try-lnx/build/obj-firefox/toolkit/components/places/tests/cpp' make[4]: Leaving directory `/builds/slave/try-lnx/build/obj-firefox/toolkit/components/places/tests' make[4]: *** [check] Error 2
Daddy, is this where you keep your blog?
Hush, honey, Daddy's busy annoying people into inaction. https://tbpl.mozilla.org/php/getParsedLog.php?id=6980097&tree=Mozilla-Inbound
Summary: Intermittent segfault while running test_IHistory.cpp on Linux, or bus error on OS X [***NOT: Expected true, got false***] → Intermittent segfault while running test_IHistory.cpp on Linux, or bus error on OS X [***NOT: Expected true, got false***][*** [check] Error 139]
https://tbpl.mozilla.org/php/getParsedLog.php?id=7221641&tree=Mozilla-Inbound Not like we're gaining anything by having this [orange]/randomorange, other than misstars, might as well take them off.
No longer blocks: 438871
Whiteboard: [orange] [more patches before this is fixed] → [more patches before this is fixed]
according to some Try logging, this may improve the situation, we try to notifyvisited really late, at that point the service may have gone.
Attachment #573153 - Flags: review?(dietrich)
Attachment #573153 - Flags: review?(dietrich) → review+
let's see how this behaves. On the other side in the last days the failure has greatly reduces, likely some different distribution of work in the process. https://hg.mozilla.org/integration/mozilla-inbound/rev/c00340467919
Whiteboard: [more patches before this is fixed] → [please leave open, gathering failure stats]
Target Milestone: --- → mozilla11
Blocks: 654190
hasResult is false but the call succeeded isn't the most exciting assertion ever, but still... https://tbpl.mozilla.org/php/getParsedLog.php?id=7837063&tree=Mozilla-Inbound TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_same_uri_notifies_both. ###!!! ASSERTION: hasResult is false but the call succeeded?: 'hasResult', file ../../../../toolkit/components/places/nsNavHistory.cpp, line 587 UNKNOWN [/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/libxul.so +0x00EA7BE9] UNKNOWN [/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/libxul.so +0x00EAF9C4] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00003A07] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00004E20] UNKNOWN [../../../../../dist/bin/test_IHistory +0x0000211C] UNKNOWN [/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/libxul.so +0x010FC2F8] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00008B34] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00006217] __libc_start_main+0x000000DC [/lib/libc.so.6 +0x00015DEC] ###!!! ASSERTION: hasResult is false but the call succeeded?: 'hasResult', file ../../../../toolkit/components/places/nsNavHistory.cpp, line 587 /bin/sh: line 1: 6076 Segmentation fault XPCOM_DEBUG_BREAK=stack-and-abort /builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/run-mozilla.sh ../../../../../dist/bin/$f
Attachment #573153 - Attachment description: check history → check history [Checked in: Comment 739]
Attachment #561223 - Attachment description: new with the nsNavHistory.cpp bits that were missing → new with the nsNavHistory.cpp bits that were missing [Checked in: Comment 319]
https://tbpl.mozilla.org/php/getParsedLog.php?id=8074556&tree=Try (Q: What's the last thing this bug wants to see? A. A bunch of pushes of mozilla-beta to try.)
Whiteboard: [please leave open, gathering failure stats] → [please leave open, gathering dust]
This is happening (quite frequently) on comm-central too: http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1326219071.1326232266.13250.gz Linux comm-central build on 2012/01/10 10:11:11 http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1325965928.1325970160.6313.gz Linux x86-64 comm-central build on 2012/01/07 11:52:08 + http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1326237018.1326239351.28538.gz Linux x86-64 comm-central-trunk build on 2012/01/10 15:10:18
Severity: normal → major
Hardware: x86 → All
https://tbpl.mozilla.org/php/getParsedLog.php?id=8535052&tree=Firefox Linux x86-64 mozilla-central pgo-build on 2012-01-13 12:45:08 PST for push 790cd9bba7f5
https://tbpl.mozilla.org/php/getParsedLog.php?id=8628442&tree=Mozilla-Aurora TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_unvisited_does_not_notify_part1. WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 823 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 823 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 418 WARNING: NS_ENSURE_TRUE(mDB) failed: file ../../../../toolkit/components/places/nsNavHistory.cpp, line 382 WARNING: NS_ENSURE_TRUE(serv) failed: file ../../../../toolkit/components/places/nsNavHistory.h, line 152 WARNING: NS_ENSURE_TRUE(navHistory) failed: file ../../../../toolkit/components/places/History.cpp, line 348 TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visited_notifies. WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 823 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 823 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 418 WARNING: NS_ENSURE_TRUE(mDB) failed: file ../../../../toolkit/components/places/nsNavHistory.cpp, line 382 ###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../../dist/include/nsCOMPtr.h, line 809 NS_DebugBreak+0x00000026 [/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/dist/bin/libxpcom.so +0x0000203C] UNKNOWN [../../../../../dist/bin/test_IHistory +0x000025AA] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00003A3B] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00004F7F] UNKNOWN [../../../../../dist/bin/test_IHistory +0x0000211C] UNKNOWN [/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/dist/bin/libxul.so +0x010F7C37] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00008B2C] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00006211] __libc_start_main+0x000000DC [/lib/libc.so.6 +0x00015DEC] ###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../../dist/include/nsCOMPtr.h, line 809 /bin/sh: line 1: 32579 Segmentation fault XPCOM_DEBUG_BREAK=stack-and-abort /builds/slave/m-aurora-lnx-dbg/build/obj-firefox/dist/bin/run-mozilla.sh ../../../../../dist/bin/$f
So, something tries to resolve link status too late in the shutdown process. Though I was unable to find the null ptr, would be cool if we'd have decent stacks, maybe we could convert the null nsCOMPtr assertion to a MOZ_ASSERT to get them.
filed bug 718999 to make those assertions fatal
https://tbpl.mozilla.org/php/getParsedLog.php?id=8900959&tree=Firefox Linux x86-64 mozilla-central build on 2012-01-28 00:32:44 PST for push dba0b31edfbf
https://tbpl.mozilla.org/php/getParsedLog.php?id=8919478&tree=Firefox Linux x86-64 mozilla-central build on 2012-01-29 02:23:44 PST for push f35a3eb44138
(In reply to Phil Ringnalda (:philor) from comment #602) > Hush, honey, Daddy's busy annoying people into inaction. You're not very good at this daddy. :(
Sorry daddy I confused inaction with action. Like like inflammible! Sorry about the house, too.
https://tbpl.mozilla.org/php/getParsedLog.php?id=9732423&tree=Mozilla-Beta Linux mozilla-beta build on 2012-02-29 15:10:44 PST for push 529498671c4c
https://tbpl.mozilla.org/php/getParsedLog.php?id=10673441&tree=Firefox Linux mozilla-central build on 2012-04-05 12:03:25 PDT for push b7adf1bde2f6
https://tbpl.mozilla.org/php/getParsedLog.php?id=11141681&tree=Firefox Linux QT mozilla-central build on 2012-04-23 18:11:54 PDT for push 21da3f655b30 slave: moz2-linux-slave09
https://tbpl.mozilla.org/php/getParsedLog.php?id=11587063&tree=Mozilla-Inbound You can't dereference a NULL nsCOMPtr with operator->(), but you already knew that.
This is that stack, FWIW: ###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../../dist/include/nsCOMPtr.h, line 809 libxpcom.so + 0x203c nsCOMPtr<nsINavHistoryService>::operator-> [nsCOMPtr.h:809] addURI [places_test_harness.h:193] test_visited_notifies [mozalloc.h:229] RunNextTest::Run [places_test_harness_tail.h:66] nsThread::ProcessNextEvent [nsThread.cpp:656] NS_ProcessNextEvent [nsThreadUtils.cpp:245] main [places_test_harness_tail.h:138]
(In reply to Ted Mielczarek [:ted] from comment #896) > So in that particular assertion it's barfing here: > http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/ > tests/cpp/places_test_harness.h#193 > because it's failing to get nsINavHistoryService. > > called via this test: > http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/ > tests/cpp/test_IHistory.cpp#185 Presumably because InitSchema is failing here: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/Database.cpp#456 which causes nsNavHistory::Init to fail as well. Marco, do you know why this could happen?
(In reply to Ehsan Akhgari [:ehsan] from comment #897) > Presumably because InitSchema is failing here: > http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/ > Database.cpp#456 which causes nsNavHistory::Init to fail as well. > > Marco, do you know why this could happen? The failures in comment 893 and comment 894 are different from the previous ones, did something change yesterday? My previous assumption based on failures up to 892 was that some thread was running late, later than xpcom-shutdown, though these ones are quite the opposite.
Could be the spinning we do in bug 744294 that is changing a bit the rules here, the stack may be useful, will take a look at it.
Actually what fails is CreateBookmarkRoots, now if we'd have kept using NS_ENSURE_SUCCESS I'd have sort of a stack, but since we started using if (NS_FAILED(rv)) return rv; i have no idea where it bailed out.
though, the error is NS_ERROR_STORAGE_CONSTRAINT
The fact it started happening differently on Aurora too is puzzling, something must have changed in the tinderboxes or it's time sensitive. The only constraints that I may think of are: 1. we generate a duplicate guid. Should not happen this often, we use /dev/urandom on Linux to generate random bytes. 2. we try to insert 2 identical roots, though this may happen only if CreateBookmarkRoots runs twice on the same database, that can't happen, we'd bail out earlier with a warning in InitSchema when failing to create existing tables, but I don't see any other warnings here. I'll put back NS_ENSURE_SUCCESS and try to get a failure on try, to check which query is failing.
fwiw, I tried to reproduce this on try with some additional logging and error checking, but out of 100 runs I got no failures :(
https://tbpl.mozilla.org/php/getParsedLog.php?id=14669171&tree=Mozilla-Inbound Linux mozilla-inbound leak test build on 2012-08-24 05:50:43 PDT for push e048ac9eb279 slave: mv-moz2-linux-ix-slave04 TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_unvisited_does_not_notify_part1. WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 818 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 818 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 410 WARNING: NS_ENSURE_TRUE(mDB) failed: file ../../../../toolkit/components/places/nsNavHistory.cpp, line 298 WARNING: NS_ENSURE_TRUE(serv) failed: file ../../../../toolkit/components/places/nsNavHistory.h, line 113 WARNING: NS_ENSURE_TRUE(navHistory) failed: file ../../../../toolkit/components/places/History.cpp, line 324 TEST-INFO | (../../../../../../toolkit/components/places/tests/cpp/test_IHistory.cpp) | Running test_visited_notifies. WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 818 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 818 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80630003: file ../../../../toolkit/components/places/Database.cpp, line 410 WARNING: NS_ENSURE_TRUE(mDB) failed: file ../../../../toolkit/components/places/nsNavHistory.cpp, line 298 ###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../../dist/include/nsCOMPtr.h, line 781 NS_DebugBreak+0x00000026 [/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/libxpcom.so +0x0000203C] UNKNOWN [../../../../../dist/bin/test_IHistory +0x0000335C] UNKNOWN [../../../../../dist/bin/test_IHistory +0x000048D9] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00005F93] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00002D96] UNKNOWN [/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/libxul.so +0x012E17C7] UNKNOWN [../../../../../dist/bin/test_IHistory +0x000132E0] UNKNOWN [../../../../../dist/bin/test_IHistory +0x00007245] __libc_start_main+0x000000DC [/lib/libc.so.6 +0x00015DEC] ###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../../dist/include/nsCOMPtr.h, line 781 /bin/sh: line 1: 30773 Segmentation fault XPCOM_DEBUG_BREAK=stack-and-abort /builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/run-mozilla.sh ../../../../../dist/bin/$f make[5]: *** [check] Error 139
Since I fixed bug 787176, we ought to be able to fix this test to generate a minidump and thus a usable stack. It should be as simple as copying this block of code: http://mxr.mozilla.org/mozilla-central/source/media/mtransport/test/mtransport_test_utils.h#54 to somewhere in here: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/tests/cpp/places_test_harness_tail.h#84
Depends on: 787176
we don't see the segfault from September, closing.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Whiteboard: [please leave open, gathering dust]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: