Closed
Bug 953435
Opened 11 years ago
Closed 10 years ago
Browser hang on Mac if an AfterProcessNextEvent callback tries to spin the event loop
Categories
(Core :: Widget: Cocoa, defect)
Tracking
()
RESOLVED
WORKSFORME
mozilla29
People
(Reporter: bzbarsky, Assigned: smaug)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
Attachments
(3 files, 2 obsolete files)
(deleted),
text/html
|
Details | |
(deleted),
patch
|
smichaud
:
review+
froydnj
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review |
The attached testcase seems to completely freeze my browser on Mac: we never respond to clicks on the "OK" button of the alert. The only weird thing I see in here is that we're spinning the event loop from inside this stack: #61 0x000000010565565d in nsDOMMutationObserver::HandleMutations () at nsDOMMutationObserver.h:390 #62 0x00000001056457b4 in nsXPConnect::AfterProcessNextEvent (this=0x10060bbe0, aThread=0x100615520, aRecursionDepth=0, aEventWasProcessed=true) at nsXPConnect.cpp:1130 #63 0x0000000105645b84 in non-virtual thunk to nsXPConnect::AfterProcessNextEvent(nsIThreadInternal*, unsigned int, bool) () at nsXPConnect.cpp:1246 #64 0x0000000103b79379 in nsThread::ProcessNextEvent (this=0x100615520, mayWait=false, result=0x7fff5fbfcb13) at nsThread.cpp:651 #65 0x0000000103a8255b in NS_ProcessPendingEvents (thread=0x100615520, timeout=20) at nsThreadUtils.cpp:210 via the DOM mutation observer calling alert(). I'm not sure whether this is cross-platform or Mac-only... Olli, do you see the same issue on Linux?
Flags: needinfo?(bugs)
Assignee | ||
Comment 2•11 years ago
|
||
And works fine on Windows too.
Assignee | ||
Comment 3•11 years ago
|
||
Somewhat similar to Bug 397439?
Assignee | ||
Updated•11 years ago
|
Component: XPCOM → Widget: Cocoa
Reporter | ||
Updated•11 years ago
|
Flags: needinfo?(smichaud)
Assignee | ||
Comment 4•11 years ago
|
||
A possible patch coming, but can't test it since I don't have a (working) mac.
Assignee | ||
Comment 5•11 years ago
|
||
May or may not compile on OSX, and may or may not fix the issue :) GetIsProcessingEvents could be made thread-safe, if needed (by using atomic counter), but other similar things aren't thread-safe either there. https://tbpl.mozilla.org/?tree=Try&rev=5bb11e763dc0
Assignee | ||
Updated•11 years ago
|
Summary: Browser hang (at least on Mac) if an AfterProcessNextEvent callback tries to spin the event loop → Browser hang on Mac if an AfterProcessNextEvent callback tries to spin the event loop
Assignee | ||
Comment 6•11 years ago
|
||
And it doesn't compile.
Assignee | ||
Comment 7•11 years ago
|
||
Maybe this https://tbpl.mozilla.org/?tree=Try&rev=e07a3ef2a8be
Attachment #8351740 -
Attachment is obsolete: true
Reporter | ||
Comment 8•11 years ago
|
||
v2 fixes the bug for me once I fix the presumably-backwards test in the assert in GetIsProcessingEvents.
Assignee | ||
Comment 9•11 years ago
|
||
Oops, the assert in the patch is indeed wrong
Assignee | ||
Comment 10•11 years ago
|
||
The patch makes Cocoa appshell less fragile, by making it use generic thread API. isProcessingEvents is similar, but not quite the same to recursionDepth. (I think we should merge nsIThreadInternal to nsIThread)
Assignee: nobody → bugs
Attachment #8351743 -
Attachment is obsolete: true
Attachment #8354929 -
Flags: review?(smichaud)
Attachment #8354929 -
Flags: review?(nfroyd)
Assignee | ||
Comment 11•11 years ago
|
||
(And yes, isProcessingEvents is a bit odd, given that it isn't threadsafe, but for now it should be enough.)
Comment 12•11 years ago
|
||
Comment on attachment 8354929 [details] [diff] [review] +asserts fixed Review of attachment 8354929 [details] [diff] [review]: ----------------------------------------------------------------- r=me with the changes below. ::: xpcom/threads/nsIThread.idl @@ +83,5 @@ > */ > boolean processNextEvent(in boolean mayWait); > + > + /** > + * true if we're processing runnables or thread observers. Please add a blurb about only accessing this on the current thread. I see that nsThread::GetRecursionDepth, which is similar to this, MOZ_ASSERTs for mismatched threads, which seems a little more likely to catch errors. Please use MOZ_ASSERT instead of warning. (Bonus points for making this attribute [infallible], then.) ::: xpcom/threads/nsThread.h @@ +149,5 @@ > PRThread *mThread; > uint32_t mRunningEvent; // counter > uint32_t mStackSize; > > + uint32_t mIsProcessingEvents; Naming this mProcessingEvent for parallelism with mRunningEvent seems like a better choice.
Attachment #8354929 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 13•11 years ago
|
||
nsThread::GetRecursionDepth does *not* assert. That is the reason why I removed assert from my patch and made it consistent with recursionDepth
Comment 14•11 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #13) > nsThread::GetRecursionDepth does *not* assert. That is the reason why I > removed > assert from my patch and made it consistent with recursionDepth Oh, whoops, you're right. I was either looking at an old version or was just confused. Carry on!
Comment 15•11 years ago
|
||
Comment on attachment 8354929 [details] [diff] [review] +asserts fixed This looks fine to me, but I'd like to test it. I should be able to do that later today or tomorrow.
Flags: needinfo?(smichaud)
Comment 16•11 years ago
|
||
Comment on attachment 8354929 [details] [diff] [review] +asserts fixed I just tested this in current trunk code, and everything went as expected. This bug's testcase "froze" Firefox without the patch, but didn't with the patch. Conceptually, using nsIThread::isProcessingEvents is exactly what I wanted to do from the beginning. But I convinced myself that the "recursion depth" is a good proxy. And indeed it seems to be, since this code is old, and we don't seem to have had much trouble with it since it landed. What I don't understand, though, is exactly how/why using nsIThread::isProcessingEvents can sometimes give a different result from using the "recursion depth". Can anyone here explain?
Attachment #8354929 -
Flags: review?(smichaud) → review+
Assignee | ||
Comment 17•11 years ago
|
||
recursionDepth value is decreased before calling the thread observers.
Assignee | ||
Comment 18•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/e176126d2b42
Comment 19•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/e176126d2b42
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Assignee | ||
Comment 20•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/7e51a10dee23
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 22•10 years ago
|
||
this bug has originally showed about a 5.47% increase in tscroll on osx 10.6 and 10.8: http://graphs.mozilla.org/graph.html#tests=[[287,63,21]]&sel=none&displayrange=30&datatype=running The backout put us back to the original levels. This is seen in better detail on datazilla and two tests specifically benefit: tiled-fixed.html: https://datazilla.mozilla.org/?start=1390855933&stop=1391460733&product=Firefox&repository=Mozilla-Inbound&test=tscrollx&page=tiled-fixed.html&x86=false&x86_64=true&project=talos and tiled.html: https://datazilla.mozilla.org/?start=1390855933&stop=1391460733&product=Firefox&repository=Mozilla-Inbound&test=tscrollx&page=tiled.html&x86=false&x86_64=true&project=talos
Whiteboard: [talos_regression]
Comment 23•10 years ago
|
||
Do we want to track this as a talos regression? This was backed out and the original patch is a performance imporovement. I am just following up on all performance bugs that have now migrated to Aurora.
Flags: needinfo?(bugs)
Assignee | ||
Comment 24•10 years ago
|
||
We don't want to track the talos regression.
Flags: needinfo?(bugs)
Updated•10 years ago
|
Comment 26•10 years ago
|
||
Smaug, could you reland this bug's patch on trunk (or mozilla-inbound), now that I've landed a bandaid patch for bug 959281? It will ultimately need to be backed out again when a patch for bug 996848 lands (since that patch will make it redundant). We can look out for another talos regression -- but I suspect the bandaid patch from bug 959281 will prevent it from happening.
Flags: needinfo?(bugs)
Assignee | ||
Comment 27•10 years ago
|
||
Feel free to land, or I'll land tomorrow.
Comment 28•10 years ago
|
||
Go ahead and land it tomorrow. I'm at the end of my day and tired :-)
Assignee | ||
Comment 29•10 years ago
|
||
Try https://tbpl.mozilla.org/?tree=Try&rev=8aa0cdfd6ca9
Updated•10 years ago
|
Flags: needinfo?(bugs)
Assignee | ||
Comment 30•10 years ago
|
||
Do we still need some variant of this? I don't have access to any OSX devices? nsAppShell.mm seems to have changed quite a bit, and the patch certainly doesn't apply anymore.
Flags: needinfo?(smichaud)
Comment 31•10 years ago
|
||
> Do we still need some variant of this? As long as my patch for bug 996848 remains in the tree, we don't. So far that patch has done pretty well. But it's conceivable that some issue will arise that we can't fix (or can't fix quickly enough). In that case we'd probably want to back it out and fall back to a combination of my bandaid patch for bug 959281 and your patch for this bug.
Flags: needinfo?(smichaud)
Comment 32•10 years ago
|
||
Let's close this WORKSFORME for the time being.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•