721025 - Lots of crashes in GetJSContext from single-threaded runtime release assert on main thread, and about:jank addon

Oh, this is the single-threaded runtime release assert. If I read the buildids correctly, the crashes were before the landing of bug 675078 yesterday. Also, a weird thing about these stacks is that the aborts are happening on the main thread. That would imply some funny business being played with threads? I wonder why this hasn't been hit until recently, since the assert has been in for several months. I also see that this is all on the nightly-profiling branch; perhaps something specific to this branch is involved?

(no longer active)

Comment 4

13 years ago

I just got this crash: https://crash-stats.mozilla.com/report/index/bp-e1d430be-c981-41ab-99eb-111f42120127 Please let me know if I can provide any helpful information. Thanks!

Tim Taubert [:ttaubert] (inactive)

Comment 5

13 years ago

I crashed as well and would be happy to help! https://crash-stats.mozilla.com/report/index/bp-097fba0e-9442-4c36-be2c-573832120127

Axel Hecht

Comment 6

13 years ago

https://crash-stats.mozilla.com/report/index/bp-9c56aa00-4f82-44e8-a013-c21442120127, too, it seems.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 7

13 years ago

I think this crash is caused by about:jank/profiler. Has anyone who sees the crash not installed either of those addons?

Tim Taubert [:ttaubert] (inactive)

Comment 8

13 years ago

Uh, yeah I installed that some minutes before the crash.

Axel Hecht

Comment 9

13 years ago

Same same.

Luke Wagner [:luke]

Comment 10

13 years ago

Ah, clues! So does anyone know if the about:jank profiler does any tricks with threads or thread ids? Basically, JSRuntime stores PR_CurrentThread when it is created and asserts that it is equal to PR_CurrentThread any time JS_AbortIfWrongThread is called (e.g. in GetJSContext). There is also a little stupid dance where we temporarily change threads (in a single-threaded manner) during cycle collection, in case that is relevant...
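The single-threaded runtime check described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not SpiderMonkey's code: OwnedRuntime, onOwnerThread and abortIfWrongThread are hypothetical names, and std::thread::id stands in for the PRThread* that the real runtime records.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <thread>

// Hypothetical stand-in for JSRuntime's single-thread ownership check.
// std::thread::id replaces the PRThread* the real runtime stores.
class OwnedRuntime {
 public:
  // Record the creating thread as the owner.
  OwnedRuntime() : owner_(std::this_thread::get_id()) {}

  bool onOwnerThread() const { return owner_ == std::this_thread::get_id(); }

  // Analogous to JS_AbortIfWrongThread: a release-mode check that aborts
  // the process, rather than a debug-only assert.
  void abortIfWrongThread() const {
    if (!onOwnerThread()) {
      std::fprintf(stderr, "runtime used off its owner thread\n");
      std::abort();
    }
  }

 private:
  std::thread::id owner_;
};
```

Because the check is a release abort, any path that reaches it on a thread other than the recorded owner terminates the process, which is how a thread mix-up surfaces as a crash in GetJSContext.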

(no longer active)

Comment 11

13 years ago

(In reply to Tim Taubert [:ttaubert] from comment #8)
> Uh, yeah I installed that some minutes before the crash.

Same here.

Axel Hecht

Comment 12

13 years ago

Taras blogged about:jank.

(no longer active)

Comment 13

13 years ago

(In reply to Axel Hecht from comment #12)
> Taras blogged about:jank.

Jeff wrote about:jank! ;-)

Benoit Girard (:BenWa)

Comment 14

13 years ago

(In reply to Luke Wagner [:luke] from comment #10)
> Ah, clues! So does anyone know if the about:jank profiler does any tricks
> with threads or thread ids? Basically, JSRuntime stores PR_CurrentThread
> when it is created and asserts that is equal to PR_CurrentThread anytime
> JS_AbortIfWrongThread is called (e.g. in GetJSContext). There is also a
> little stupid dance where we temporarily change threads (in a
> single-threaded manner) during cycle collection in case that is relevant...

We're building with frame pointers. While the extension is active, we send a signal every 10ms to the main thread, which will run a signal handler, perform a backtrace and resume.
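A minimal POSIX sketch of the sampling scheme described above: a separate thread interrupts a target thread with a signal every 10ms, and the handler runs on the interrupted thread. Assumptions: SIGPROF and pthread_kill are used here only for illustration (the comment does not say which signal or delivery mechanism the profiler uses), and RunSampler, InstallSampleHandler and OnSample are hypothetical names. The handler only increments a lock-free atomic counter, which is safe from a signal handler; a real sampler would walk the frame-pointer chain at that point.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <csignal>
#include <pthread.h>
#include <thread>

// Samples taken so far; lock-free atomics may be used from a signal handler.
static std::atomic<int> gSamples{0};

// Runs on the interrupted thread. A real profiler would walk the
// frame-pointer chain here; this sketch only counts samples.
extern "C" void OnSample(int) {
  gSamples.fetch_add(1, std::memory_order_relaxed);
}

// Install the handler; SA_RESTART lets most interrupted syscalls resume.
void InstallSampleHandler() {
  struct sigaction sa {};
  sa.sa_handler = OnSample;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  sigaction(SIGPROF, &sa, nullptr);
}

// Hypothetical sampler loop: signal `target` every `interval` until `stop`.
void RunSampler(pthread_t target, std::chrono::milliseconds interval,
                const std::atomic<bool>& stop) {
  while (!stop.load()) {
    pthread_kill(target, SIGPROF);
    std::this_thread::sleep_for(interval);
  }
}
```

Since a handler like this never touches JS or XPConnect state, the sampling by itself should not change which thread a runtime records as its owner, consistent with comment 16.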

Luke Wagner [:luke]

Comment 15

13 years ago

Does the signal handler possibly touch JS or XPConnect?

Benoit Girard (:BenWa)

Comment 16

13 years ago

That would be 'TableTicker::Tick'. The only thing we use from gecko is TimeStamps. There's no JS or XPConnect.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 17

13 years ago

So my best guess here is that perhaps something with TLS is getting screwed up.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 18

13 years ago

I was able to catch this in a debugger. mJSContext->runtime->ownerThread_ is equal to 0xc1ea12 when it should be equal to 0x100336220 (the value returned by PR_GetCurrentThread() and pthread_getspecific(261)).
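PR_GetCurrentThread() and pthread_getspecific() returning the same value suggests the current-thread identity lives in a pthread thread-specific key. The sketch below shows that pattern in isolation, assuming a hypothetical ThreadRecord type in place of NSPR's PRThread; it is not NSPR's implementation. In the debugger session above the lookup returned the expected pointer, which points at the stored ownerThread_ value, not the TLS lookup, as the thing that changed.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical per-thread record, playing the role of NSPR's PRThread.
struct ThreadRecord {
  int unused = 0;
};

static pthread_key_t gThreadKey;
static pthread_once_t gKeyOnce = PTHREAD_ONCE_INIT;

// Called at thread exit for each non-null value stored under the key.
static void FreeRecord(void* p) { delete static_cast<ThreadRecord*>(p); }
static void MakeKey() { pthread_key_create(&gThreadKey, FreeRecord); }

// Return this thread's record, creating it on first use. Repeated calls on
// the same thread yield the same pointer, so it works as a thread identity.
ThreadRecord* CurrentThreadRecord() {
  pthread_once(&gKeyOnce, MakeKey);
  auto* rec = static_cast<ThreadRecord*>(pthread_getspecific(gThreadKey));
  if (!rec) {
    rec = new ThreadRecord();
    pthread_setspecific(gThreadKey, rec);
  }
  return rec;
}
```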

Josh Matthews [:jdm]

Comment 19

13 years ago

That would be caused by JSRuntime::clearOwnerThread: http://mxr.mozilla.org/mozilla-central/source/js/src/jsapi.cpp#891

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 20

13 years ago

Also, for this runtime suspendCount = 1 and requestDepth = 0; I'm not sure if that's expected or not.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 21

13 years ago

(In reply to Josh Matthews [:jdm] from comment #19)
> That would be caused by JSRuntime::clearOwnerThread:
> http://mxr.mozilla.org/mozilla-central/source/js/src/jsapi.cpp#891

It looks like the likely caller of that is nsXPConnect::NotifyLeaveMainThread(), which is called in only one place, by the cycle collector. Further, I don't see how we could avoid calling NotifyEnterMainThread(), which should reset it to the proper value.
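The ownership handoff around cycle collection can be modeled as a small state machine: the main thread clears the owner, the collector thread claims it and later releases it, and the main thread reclaims it. The sketch below is hypothetical (Runtime, CurrentThreadId and kNoOwner are illustrative names, and a thread_local address stands in for PRThread*); it reuses the 0xc1ea12 sentinel seen in the debugger and asserts the same precondition as the JS_ASSERT reported during nsXPConnect::NotifyEnterCycleCollectionThread(). If the main thread touches the runtime between its clearOwnerThread() and its final setOwnerThread(), it sees the sentinel, which is the state found in the debugger.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Unique per-thread address used as an opaque thread identity.
const void* CurrentThreadId() {
  static thread_local char tag;
  return &tag;
}

// Sentinel stored while no thread owns the runtime.
const void* const kNoOwner =
    reinterpret_cast<const void*>(std::uintptr_t{0xc1ea12});

// Hypothetical model of the handoff. owner_ is not atomic; in this sketch
// ordering between threads comes from thread start/join.
class Runtime {
 public:
  Runtime() : owner_(CurrentThreadId()) {}

  // The owner gives up the runtime (cf. JSRuntime::clearOwnerThread).
  void clearOwnerThread() {
    assert(owner_ == CurrentThreadId());
    owner_ = kNoOwner;
  }

  // A thread claims an unowned runtime; asserts the sentinel first.
  void setOwnerThread() {
    assert(owner_ == kNoOwner);
    owner_ = CurrentThreadId();
  }

  bool ownedByCurrentThread() const { return owner_ == CurrentThreadId(); }
  const void* owner() const { return owner_; }

 private:
  const void* owner_;
};
```

Usage: main thread leaves, collector enters and leaves, main thread re-enters.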

Luke Wagner [:luke]

Comment 22

13 years ago

Hmm, all hints seem to point toward something happening with cycle collection. One random idea is that perhaps a bug in NSPR is causing the cycle collector thread to get notified unexpectedly, so it would run a cycle collection concurrently with the main thread (bad!) and leave rt->ownerThread_ in the 'clear' state.

First, I'd try running a debug build (which may catch things a lot earlier). Second, I'd put a printf (with flush, obviously) in JS_ClearRuntimeThread, JS_SetRuntimeThread, nsCycleCollector::BeginCollection, and the signal handler and see if there are any weird interleavings.
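The invariant this theory relies on is that the main thread stays blocked for as long as the collector thread runs. A minimal sketch of such a handshake with std::mutex and std::condition_variable follows, including flushed trace output of the kind suggested above. CollectorRunner, Collect, Run and Shutdown are hypothetical names loosely modeled on nsCycleCollectorRunner, not its actual code. Both waits re-check a predicate, so an unexpected wakeup can neither start a collection nor release the main thread early.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

// Print and flush immediately so the last lines survive a crash.
void Trace(const char* what) {
  std::printf("%s\n", what);
  std::fflush(stdout);
}

// Hypothetical handshake: the main thread stays in Collect() until the
// collector thread reports that the requested collection is done.
class CollectorRunner {
 public:
  // Main thread: request one collection and block until it finishes.
  void Collect() {
    std::unique_lock<std::mutex> lock(mutex_);
    Trace("main: request collection");
    requested_ = true;
    cv_.notify_all();
    cv_.wait(lock, [this] { return done_; });  // re-checks after any wakeup
    done_ = false;
    Trace("main: collection finished");
  }

  // Collector thread: serve requests until Shutdown() is called.
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return requested_ || shutdown_; });
      if (shutdown_) return;
      requested_ = false;
      Trace("collector: begin");
      ++collections_;  // stand-in for the real collection work
      Trace("collector: end");
      done_ = true;
      cv_.notify_all();
    }
  }

  void Shutdown() {
    std::lock_guard<std::mutex> lock(mutex_);
    shutdown_ = true;
    cv_.notify_all();
  }

  int collections() {
    std::lock_guard<std::mutex> lock(mutex_);
    return collections_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool requested_ = false, done_ = false, shutdown_ = false;
  int collections_ = 0;
};
```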

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 23

13 years ago

(In reply to Luke Wagner [:luke] from comment #22)
> First, I'd try running a debug build (which may catch things a lot earlier).
> Second, I'd put a printf (with flush, obviously) in JS_ClearRuntimeThread,
> JS_SetRuntimeThread, nsCycleCollector::BeginCollection, and the signal
> handler and see if there are any weird interleaving.

That sounds like a good idea. If someone can come up with a good way of reproducing this, that would be great. I've only been able to see it after about a day of regular use in a browser.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 24

13 years ago

I was able to get a core dump of this assertion failing:

JS_ASSERT(ownerThread_ == (void *)0xc1ea12);

during nsXPConnect::NotifyEnterCycleCollectionThread().

Here's some state when this happens:
cycle collector thread - 0x117801000
main thread - 0x7fff70f6fcc0
rt->ownerThread - 0x7fff70f6fcc0
condition variable lock owner - 0x117801000

The main thread is not waiting in nsCycleCollectorRunner::Collect.

I don't yet have any theories as to what's going wrong.

Luke Wagner [:luke]

Comment 25

13 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #24)

The main thread not waiting is a big red flag. This sounds like exactly what comment 22 para 1 is guessing. Perhaps there is some bug involving signal handlers (or the backtrace facility) and NSPR's condition variables? Another question: I only see crash reports from OS X above. Does anyone know anyone using about:jank on Windows and having/not having problems?

Josh Matthews [:jdm]

Comment 26

13 years ago

One thing to note is that NSPR, and specifically PR_WaitCondVar (which is the underlying implementation of the CondVar class), does not protect against spurious wakeups or interrupted threads. CondVar::Wait actually returns an nsresult, so checking it with NS_FAILED, along with PR_GetError, would be useful to see if NSPR is being fussy here.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 27

13 years ago

I added the following code:

  nsresult result = mRequest.Wait();
  if (result != NS_OK) {
    printf("%x\n", result);
    assert(result == NS_OK);
  }

and I still get the crash without hitting the assert.
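A status check like the one above cannot detect a spurious wakeup: POSIX pthread_cond_wait reports success when it returns spuriously, so, assuming PR_WaitCondVar passes that through, Wait() would still return NS_OK. The usual defense is to wait in a loop on a predicate that the signaling side sets under the same lock. The hypothetical Event type below contrasts the two patterns using std::condition_variable rather than NSPR.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot event. `fired` is the predicate a bare wait
// never re-checks.
struct Event {
  std::mutex mutex;
  std::condition_variable cv;
  bool fired = false;

  // Fragile: returns after any wakeup, including a spurious one, with no
  // error status to reveal it. It also blocks forever if Fire() already
  // ran before the wait started.
  void WaitOnce() {
    std::unique_lock<std::mutex> lock(mutex);
    cv.wait(lock);
  }

  // Robust: re-checks the predicate after every wakeup and does not wait
  // at all if the event has already fired.
  void Wait() {
    std::unique_lock<std::mutex> lock(mutex);
    while (!fired) cv.wait(lock);
    assert(fired);
  }

  void Fire() {
    {
      std::lock_guard<std::mutex> lock(mutex);
      fired = true;
    }
    cv.notify_all();
  }
};
```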

Luke Wagner [:luke]

Comment 28

13 years ago

Perhaps the bug is in Wait()?

Josh Matthews [:jdm]

Comment 29

13 years ago

Wait is just a wrapper around PR_WaitCondVar, with some extra machinery for the deadlock detector in debug builds.

Luke Wagner [:luke]

Comment 30

13 years ago

Ok, then perhaps PR_WaitCondVar (on OS X) has a bug.

Scoobidiver (away)

Updated

13 years ago

Crash Signature: [@ CrashInJS] → [@ CrashInJS] [@ CrashInJS | XPCCallContext::GetJSContext] [@ CrashInJS | XPCCallContext::GetJSContext(JSContext**)]

Robert Kaiser

Comment 31

13 years ago

This is topcrash #3 in 11.0b3 now: https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/11.0b3 Do we have any clue how to fix this?

tracking-firefox11: --- → ?

Robert Kaiser

Comment 32

13 years ago

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #31)
> This is topcrash #3 in 11.0b3 now:
> https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/11.0b3
>
> Do we have any clue how to fix this?

Sorry, it's actually bug 715757.

tracking-firefox11: ? → ---

Scoobidiver (away)

Updated

13 years ago

Crash Signature: [@ CrashInJS] [@ CrashInJS | XPCCallContext::GetJSContext] [@ CrashInJS | XPCCallContext::GetJSContext(JSContext**)] → [@ CrashInJS] [@ CrashInJS | XPCCallContext::GetJSContext ] [@ CrashInJS | XPCCallContext::GetJSContext(JSContext**) ]

Wayne Mery (:wsmwk)

Updated

9 years ago

Summary: Lots of crashes in GetJSContext → Lots of crashes in GetJSContext from single-threaded runtime release assert on main thread, and about:jank addon

Wayne Mery (:wsmwk)

Comment 33

8 years ago

This signature doesn't exist anymore https://crash-stats.mozilla.com/signature/?signature=CrashInJS&date=%3E%3D2016-06-14T04%3A38%3A06.000Z&date=%3C2016-12-14T04%3A38%3A06.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → INCOMPLETE