Closed
Bug 1013571
Opened 10 years ago
Closed 10 years ago
support PBackground on workers
Categories
(Core :: DOM: Workers, defect)
Tracking
()
RESOLVED
FIXED
mozilla33
People
(Reporter: bkelly, Assigned: bkelly)
References
Details
Attachments
(1 file, 25 obsolete files)
(deleted),
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
In bug 940273 we'd like to use PBackground to implement the ServiceWorker cache interface between the child and parent processes. Currently PBackground is not available on worker threads. From talking with Ben Turner about this, it appears the main issue is that PBackground uses a thread-local to store some state. Since workers reuse and share existing threads, this thread-local must be cleaned up to avoid having IPC messages routed to the wrong worker.
Assignee | ||
Comment 1•10 years ago
|
||
It seems like RuntimeService::WorkerThread::SetWorker() may be a good place to teardown the previous PBackground child and setup the new PBackground child: http://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2377
Assignee | ||
Comment 2•10 years ago
|
||
Work-in-progress to allow the background child to be detached from the current thread. Ben, is there anything I need to do other than null out the thread local and let the registered destructor run?
Attachment #8426549 -
Flags: feedback?(bent.mozilla)
Assignee | ||
Comment 3•10 years ago
|
||
Work-in-progress to enable PBackground in workers. I'm running into a fun link error on windows at the moment, though, so I'm not sure if this actually works yet or not. It links on linux, but I don't have a convenient way to do adhoc testing on my headless linux VM. Here is a try build to see if it completely blows up: https://tbpl.mozilla.org/?tree=Try&rev=ccf1ecc1a7f1
Comment on attachment 8426549 [details] [diff] [review] Part 1: Support detaching PBackground from current thread. (v0) Review of attachment 8426549 [details] [diff] [review]: ----------------------------------------------------------------- I believe that this is the basic approach that should work, yes. However, it's unclear to me what happens when Close() is called on a channel when there are pending messages in the queue... This will need some testing because I expect that there will be problems.
Attachment #8426549 -
Flags: feedback?(bent.mozilla) → feedback+
Comment on attachment 8426554 [details] [diff] [review] Part 2: Enable PBackground in workers. (v0) Review of attachment 8426554 [details] [diff] [review]: ----------------------------------------------------------------- For workers I think that we shouldn't do a 'primer' pattern (like we do with the main thread of child processes) but instead explicitly wait for the actor to be connected before we run the worker's script. It should only be a slight startup delay but then the code that uses such actors can be much simpler by assuming that it always has already been connected.
Assignee | ||
Comment 6•10 years ago
|
||
Fix a worker shutdown crash by making ChildImpl use thread safe reference counting. I still need to verify if this works when closing a channel with pending messages.
Attachment #8426549 -
Attachment is obsolete: true
Assignee | ||
Comment 7•10 years ago
|
||
Delay worker startup until after PBackground actor has been created. This was necessary not just for the reasons discussed in comment 5, but also to avoid assertions about running non-worker runnables on the worker thread after start. Note, I currently have an extra bounce to the main thread here to make initialization work with the current assertions. See the big XXX comment in the patch for details. I hope to tweak the assertions around with an extra state variable to avoid this.
Attachment #8426554 -
Attachment is obsolete: true
Assignee | ||
Comment 8•10 years ago
|
||
> Note, I currently have an extra bounce to the main thread here to make
> initialization work with the current assertions. See the big XXX comment in
> the patch for details. I hope to tweak the assertions around with an extra
> state variable to avoid this.
Fix this by treating |mWorkerPrivate == null && PR_CurrentThread() == mThread| as legal in WorkerThread::Dispatch().
Attachment #8428269 -
Attachment is obsolete: true
Hrm, you shouldn't need to do any thread bouncing here I don't think...
Assignee | ||
Comment 10•10 years ago
|
||
So these are the current steps as I think I have them arranged: 1) RuntimeService::ScheduleWorker() schedules a WorkerThreadInitPBackgroundRunnable() to run on the selected WorkerThread. 2) WorkerThreadInitPBackgroundRunnable() calls BackgroundChild::GetOrCreateForCurrentThread() from the WorkerThread. A WorkerBackgroundChildCallback() is registered with the WorkerThread as the target. 3) ChildImpl::GetOrCreateForCurrentThread() calls OpenProtocolOnMainThread() from the WorkerThread. 4) The actor protocol gets opened on the main thread. 5) WorkerBackgroundChildCallback is called on the WorkerThread. 6) WorkerBackgroundChildCallback dispatches a WorkerThreadPrimaryRunnable to the WorkerThread. 7) WorkerThreadPrimaryRunnable executes WorkerPrivate::DoRunLoop() for the WorkerThread. The issue described in comment 7 was that I initially could not perform step (6) with hitting an assert. This was due to WorkerThread::Dispatch() expecting mWorkerPrivate to be set before any runnable is dispatched from its own thread. I worked around this with a main thread bounce. In attachment 8429382 [details] [diff] [review] I loosened this restriction a bit so that the main thread bounce is not needed. Does this all sound correct/ok?
Assignee | ||
Comment 11•10 years ago
|
||
Allow IPC DoWorkRunnable objects to be canceled. This is a requirement for runnables executed on active worker threads. Right now the Cancel() method is implemented in a somewhat lame manner. I prevent the Run() method from executing if cancel was previously called, but I don't terminate nested execution within the MessageLoop.
Assignee | ||
Comment 12•10 years ago
|
||
When ENABLE_TESTS is compiled in and pbackground.testing is true, then send PBackgroundTest messages at the beginning and end of worker threads. Right now this uses a fixed test string, but I plan to make it random or unique per-thread in order to help verify that message routing works correctly. Note that the test messages at the end of the worker thread do trigger these warnings on the console: ###!!! [Parent][MessageChannel] Error: Channel closing: too late to send/recv, messages will be lost On b2g these are generally deemed harmless. Is that the case here as well? I can drain the MessageChannel on worker close if necessary, but it would be nice to avoid the added complexity.
Assignee | ||
Comment 13•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=42c4242b07cb
Assignee | ||
Comment 14•10 years ago
|
||
Fix the build. Seems perhaps ./mach build binaries was not enough to catch that for some reason. https://tbpl.mozilla.org/?tree=Try&rev=fd772b7a06f8
Attachment #8429578 -
Attachment is obsolete: true
Assignee | ||
Comment 15•10 years ago
|
||
When detaching the PBackground thread local, explicitly flush the pending queue before closing the actor. I'm not sure this really buys us much, though, as there will still be a race between the flush and the Close().
Attachment #8428267 -
Attachment is obsolete: true
Assignee | ||
Comment 16•10 years ago
|
||
Attachment #8429382 -
Attachment is obsolete: true
Assignee | ||
Comment 17•10 years ago
|
||
Randomize PBackgroundTest constructor values to help catch any cross-posting of messages.
Attachment #8429582 -
Attachment is obsolete: true
Assignee | ||
Comment 18•10 years ago
|
||
Ben, do you have any thoughts on how to further test PBackground on workers? Currently I am re-using PBackgroundTest and simply sending a test message at worker startup and shutdown. Anything else you think I am missing before flagging for review?
Flags: needinfo?(bent.mozilla)
Assignee | ||
Comment 19•10 years ago
|
||
Hmm, this try build suggests there might be a problem in Win7 debug M2 with the patches: https://tbpl.mozilla.org/?tree=Try&rev=67265addfa29 An assertion failure in: /tests/dom/datastore/tests/test_sync_worker.html
Assignee | ||
Comment 20•10 years ago
|
||
That try also shows OSX 10.6 M5 crashing in DeadlockDetector::CheckAquisition() which was called under MessageChannel::OnMaybeDequeueOne().
Assignee | ||
Comment 21•10 years ago
|
||
I'm having a hard time seeing how this error could be caused by PBackground changes since its in PContentParent's destructor: https://tbpl.mozilla.org/php/getParsedLog.php?id=40567673&tree=Try&full=1#error2 Should the IToplevelProtocol LinkedList be protected by a mutex?
(In reply to Ben Kelly [:bkelly] from comment #21) > Should the IToplevelProtocol LinkedList be protected by a mutex? This sounds a lot like bug 976479 ...
Assignee | ||
Comment 23•10 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #20) > That try also shows OSX 10.6 M5 crashing in > DeadlockDetector::CheckAquisition() which was called under > MessageChannel::OnMaybeDequeueOne(). This looks like bug 966972, so probably unrelated to these changes.
Assignee | ||
Updated•10 years ago
|
Assignee | ||
Comment 24•10 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #21) > I'm having a hard time seeing how this error could be caused by PBackground > changes since its in PContentParent's destructor: > > > https://tbpl.mozilla.org/php/getParsedLog. > php?id=40567673&tree=Try&full=1#error2 > > Should the IToplevelProtocol LinkedList be protected by a mutex? As far as I can tell this is a case of a PContentParent being cycle collected before all of its dependent, open actors have been released. This seems unrelated to the changes here so I opened bug 1017639 to track that issue. Its very low frequency (hasn't reproduced after 20 pushes) but it would be worth fixing if we can.
Assignee | ||
Comment 25•10 years ago
|
||
Since try looks relatively green I'm going to go ahead and ask for review here.
Flags: needinfo?(bent.mozilla)
Assignee | ||
Updated•10 years ago
|
Attachment #8430088 -
Flags: review?(bent.mozilla)
Assignee | ||
Updated•10 years ago
|
Attachment #8430103 -
Flags: review?(bent.mozilla)
Assignee | ||
Updated•10 years ago
|
Attachment #8429674 -
Flags: review?(bent.mozilla)
Assignee | ||
Updated•10 years ago
|
Attachment #8430104 -
Flags: review?(bent.mozilla)
Comment on attachment 8430088 [details] [diff] [review] Part 1: Support detaching PBackground from current thread. (v2) Review of attachment 8430088 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/BackgroundImpl.cpp @@ +377,5 @@ > { > AssertIsOnMainThread(); > } > > + NS_INLINE_DECL_THREADSAFE_REFCOUNTING(ChildImpl) This shouldn't be necessary... How is this ever being touched on a different thread? @@ +1662,5 @@ > + auto threadLocalInfo = > + static_cast<ThreadLocalInfo*>(PR_GetThreadPrivate(sThreadLocalIndex)); > + if (threadLocalInfo) { > + if (threadLocalInfo->mActor) { > + threadLocalInfo->mActor->FlushPendingInterruptQueue(); Splitting patches like this is hard to follow...
Attachment #8430088 -
Flags: review?(bent.mozilla) → review-
Comment on attachment 8430088 [details] [diff] [review] Part 1: Support detaching PBackground from current thread. (v2) Review of attachment 8430088 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/BackgroundChild.h @@ +57,5 @@ > GetOrCreateForCurrentThread(nsIIPCBackgroundChildCreateCallback* aCallback); > > + // See above. > + static void > + DetachFromCurrentThread(); Also let's call this something like "CloseForCurrentThread"... Detaching implies something else.
Assignee | ||
Comment 28•10 years ago
|
||
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #26) > ::: ipc/glue/BackgroundImpl.cpp > @@ +377,5 @@ > > { > > AssertIsOnMainThread(); > > } > > > > + NS_INLINE_DECL_THREADSAFE_REFCOUNTING(ChildImpl) > > This shouldn't be necessary... How is this ever being touched on a different > thread? I got consistent crashes when clearing the thread local without this. I believe it was dying on this line: http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsISupportsImpl.h#429
Comment on attachment 8430103 [details] [diff] [review] Part 2: Enable PBackground in workers. (v3) Review of attachment 8430103 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/workers/RuntimeService.cpp @@ +907,5 @@ > private: > WorkerPrivate* mWorkerPrivate; > }; > > +class WorkerThreadInitPBackgroundRunnable MOZ_FINAL : public nsRunnable Rather than splitting this process into two runnables I think it would be much simpler to have WorkerThreadPrimaryRunnable::Run() just use NS_ProcessNextEvent and spin a nested loop waiting for the actor to be created. Something like: WorkerThreadPrimaryRunnable::Run() { // ... bool done; bool success; nsCOMPtr<nsIIPCBackgroundChildCreateCallback> callback = new WorkerBackgroundChildCallback(&done, &success); if (!BackgroundChild::GetOrCreateForCurrentThread(callback)) { // Handle failure. } while (!done) { if (NS_WARN_IF(!NS_ProcessNextEvent(mThread))) { // Handle failure. } } if (!success) { // Handle failure. } // ... } Does that make sense? @@ +2683,5 @@ > > // It is no longer safe to touch mWorkerPrivate. > mWorkerPrivate = nullptr; > > + BackgroundChild::DetachFromCurrentThread(); Hrm, I think it might be safer to call this earlier... Too many messages may go to components that expect the runtime to exist. Maybe move this to right before we unregister the runtime from the sampler?
Attachment #8430103 -
Flags: review?(bent.mozilla) → review-
(In reply to Ben Kelly (PTO, back Mon June 9) [:bkelly] from comment #28) > I got consistent crashes when clearing the thread local without this. Hrm, I guess we'll need stacks.
Assignee | ||
Comment 31•10 years ago
|
||
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #30) > (In reply to Ben Kelly (PTO, back Mon June 9) [:bkelly] from comment #28) > > I got consistent crashes when clearing the thread local without this. > > Hrm, I guess we'll need stacks. I got this without the threadsafe refcounting on my mac closing a tab: Hit MOZ_CRASH(ChildImpl not thread-safe) at /Users/bkelly/devel/hg/mozilla-central/ipc/glue/BackgroundImpl.cpp:381 PR_SetThreadPrivate+0x000000A6 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x002159A6] Hit MOZ_CRASH(ChildImpl not thread-safe) at /Users/bkelly/devel/hg/mozilla-central/ipc/glue/BackgroundImpl.cpp:381 Hit MOZ_CRASH(ChildImpl not thread-safe) at /Users/bkelly/devel/hg/mozilla-central/ipc/glue/BackgroundImpl.cpp:381 mozilla::ipc::BackgroundChild::DetachFromCurrentThread()+0x00000035 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x003FE495] PR_SetThreadPrivate+0x000000A6 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x002159A6] PR_SetThreadPrivate+0x000000A6 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x002159A6] (anonymous namespace)::WorkerThreadPrimaryRunnable::Run()+0x00000790 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x016930D0] mozilla::ipc::BackgroundChild::DetachFromCurrentThread()+0x00000035 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x003FE495] mozilla::ipc::BackgroundChild::DetachFromCurrentThread()+0x00000035 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x003FE495] nsThread::ProcessNextEvent(bool, bool*)+0x0000050B [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x000B819B] (anonymous namespace)::WorkerThreadPrimaryRunnable::Run()+0x00000790 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x016930D0] (anonymous namespace)::WorkerThreadPrimaryRunnable::Run()+0x00000790 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x016930D0] NS_ProcessNextEvent(nsIThread*, bool)+0x00000033 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x00020293]
Assignee | ||
Comment 32•10 years ago
|
||
I think I understand why we encounter MOZ_CRASH() when using the non-threadsafe NS_INLINE_DECL_REFCOUNTING macro in ChildImpl. 1) GetOrCreateForCurrentThread() is called on the worker thread 2) CreateActorRunnable is dispatched to the main thread 3) ChildImpl is constructed on the main thread 4) ChildImpl's owning thread is set to the main thread 5) Any call to AddRef() or Release() from the worker thread will trigger MOZ_CRASH() via NS_ASSERT_OWNINGTHREAD() Right now we can trigger a ChildImpl::AddRef() just by calling GetForCurrentThread(). It seems we can avoid this by using threadLocalInfo->mActor.get() instead of just returning mActor. We then just have to handle the release side of things. It seems we must do that by dispatching a runnable to the main thread to destroy the actor. I'll see if I can get that working. To be honest, though, I'm not sure what the advantage of this is over just using thread safe reference counting.
(In reply to Ben Kelly [:bkelly] from comment #32) > Right now we can trigger a ChildImpl::AddRef() just by calling > GetForCurrentThread(). It seems we can avoid this by using > threadLocalInfo->mActor.get() instead of just returning mActor. That isn't possible, we'd be leaking all over. Returning a nsRefPtr like that will convert it to the raw pointer type and will not add a reference. > We then just have to handle the release side of things. It seems we must do > that by dispatching a runnable to the main thread to destroy the actor. Yes, from my read this should be correct. In general it's always best to create and destroy things on the same thread.
Assignee | ||
Comment 34•10 years ago
|
||
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #33) > (In reply to Ben Kelly [:bkelly] from comment #32) > > Right now we can trigger a ChildImpl::AddRef() just by calling > > GetForCurrentThread(). It seems we can avoid this by using > > threadLocalInfo->mActor.get() instead of just returning mActor. > > That isn't possible, we'd be leaking all over. Returning a nsRefPtr like > that will convert it to the raw pointer type and will not add a reference. It seems that the conditional operator causes a temporary nsRefPtr object to be created here. If I change the code to use an if-statement I can avoid the temporary. Alternatively, using .get() avoids the AddRef() call.
(In reply to Ben Kelly [:bkelly] from comment #34) > It seems that the conditional operator causes a temporary nsRefPtr object to > be created here. Oh, ick. That's subtle and horrible. > If I change the code to use an if-statement I can avoid > the temporary. Yes, this one please.
Assignee | ||
Comment 36•10 years ago
|
||
Updated part 1 that reverts back to using non-threadsafe refcounts for ChildImpl. Ben suggested the code to run the ChildImpl::Release on the main thread. I'll address the remaining review feedback and submit a combined patch for review tomorrow.
Attachment #8430088 -
Attachment is obsolete: true
Assignee | ||
Comment 37•10 years ago
|
||
This addresses previous review feedback and combines the previous parts into a single patch. https://tbpl.mozilla.org/?tree=Try&rev=fa1d4d2ff3ad
Attachment #8429674 -
Attachment is obsolete: true
Attachment #8430103 -
Attachment is obsolete: true
Attachment #8430104 -
Attachment is obsolete: true
Attachment #8437232 -
Attachment is obsolete: true
Attachment #8429674 -
Flags: review?(bent.mozilla)
Attachment #8430104 -
Flags: review?(bent.mozilla)
Attachment #8437955 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 38•10 years ago
|
||
It occurs to me I could possibly address the XXX comment in MessagePump.cpp by setting |loop->SetNestableTasksAllowed(false)| in DoWorkRunnable::Cancel(). That's assuming I can get the right loop without running on the thread.
Assignee | ||
Comment 39•10 years ago
|
||
Updated with a few things I was thinking about last night: 1) Assert that ChildImpl::ActorDestroy() has been called in the thread local destructor in addition to ~ChildImpl. This avoids racing with the main thread. 2) Don't MOZ_CRASH if we fail to create the PBackground actor. Instead return NS_ERROR_FAILURE from the PrimaryWorkerRunnable::Run() similar to what is down if the context can't be created.
Attachment #8437955 -
Attachment is obsolete: true
Attachment #8437955 -
Flags: review?(bent.mozilla)
Attachment #8438678 -
Flags: review?(bent.mozilla)
Comment on attachment 8438678 [details] [diff] [review] Support PBackground on workers. (merged v1) Review of attachment 8438678 [details] [diff] [review]: ----------------------------------------------------------------- This looks really good! The big thing I think that still needs to be worked on is the lengthy comment in BackgroundImpl.cpp ::: dom/workers/RuntimeService.cpp @@ +62,5 @@ > #include "SharedWorker.h" > #include "WorkerPrivate.h" > #include "WorkerRunnable.h" > +#include "nsIIPCBackgroundChildCreateCallback.h" > +#include "BackgroundChild.h" Nit: Please alphabetize these @@ +917,5 @@ > > +class WorkerBackgroundChildCallback MOZ_FINAL : > + public nsIIPCBackgroundChildCreateCallback > +{ > + bool& mDone; Nit: I prefer pointers to refs in cases like this because it makes it obvious at the call site that you're modifying the value (e.g. |new WorkerBackgroundChildCallback(&done)|) @@ +1094,5 @@ > + { > + using mozilla::ipc::BackgroundChild; > + using mozilla::ipc::PBackgroundChild; > + if (gTestPBackground) { > + // Randomize value to validate workers are not cross-posting messages. Maybe also append the value of PR_GetCurrentThread()? @@ +1096,5 @@ > + using mozilla::ipc::PBackgroundChild; > + if (gTestPBackground) { > + // Randomize value to validate workers are not cross-posting messages. > + uint32_t testValue; > + PR_GetRandomNoise(&testValue, sizeof(testValue)); Might as well MOZ_RELEASE_ASSERT that PR_GetRandomNoise returns the correct size. @@ -1537,5 @@ > return false; > } > > -#ifdef DEBUG > - thread->SetAcceptingNonWorkerRunnables(false); I'd actually prefer that we leave this here, and then set true (and then false again) in WorkerThreadPrimaryRunnable::Run() before and after calling SynchronouslyCreatePBackground(). That will reduce the amount of time that something could slip in without triggering the assertions. @@ +2564,5 @@ > mWorkerPrivate->AssertIsOnWorkerThread(); > + > + // If the PBackground child is not created yet, then we must permit > + // blocking event processing to support SynchronouslyCreatePBackground(). > + MOZ_ASSERT(!aMayWait || !BackgroundChild::GetForCurrentThread()); Hrm, if we're waiting on PBackground then we shouldn't call into mWorkerPrivate here since it's not yet set up. That's probably fine today but I don't want that to change in the future. How about we change this to: if (aMayWait) { MOZ_ASSERT(aRecursionDepth == 2); MOZ_ASSERT(!BackgroundChild::GetForCurrentThread()); return NS_OK; } @@ +2640,5 @@ > + if (NS_WARN_IF(NS_FAILED(rv))) { > + // XXX need to fire an error at parent. > + return rv; > + } > +#ifdef ENABLE_TESTS Nit: Please add an extra line before #ifdef @@ +2705,5 @@ > > +#ifdef ENABLE_TESTS > + mThread->TestPBackground(); > +#endif > + BackgroundChild::CloseForCurrentThread(); I think we should do this sooner... Can it be moved to right after the block with DoRunLoop() but before we unregister from the profiler? @@ +2729,5 @@ > + bool done = false; > + nsCOMPtr<nsIIPCBackgroundChildCreateCallback> callback = > + new WorkerBackgroundChildCallback(done); > + > + if (!BackgroundChild::GetOrCreateForCurrentThread(callback)) { NS_WARN_IF here please, and in the other spots below where you return early due to failures. @@ +2734,5 @@ > + return NS_ERROR_FAILURE; > + } > + > + while (!done) { > + if (!NS_ProcessNextEvent(mThread)) { Nit: Since this matters earlier it would be clearer if this was: NS_ProcessNextEvent(mThread, true /* aMayWait */); ::: ipc/glue/BackgroundImpl.cpp @@ +1612,5 @@ > static_cast<ThreadLocalInfo*>(PR_GetThreadPrivate(sThreadLocalIndex)); > > + if (threadLocalInfo) { > + return threadLocalInfo->mActor; > + } else { Nit: No need for else after a return. @@ +1678,5 @@ > > return true; > } > > +/* static */ Uber-nit: |// static| to match the others. @@ +1687,5 @@ > + "BackgroundChild::Startup() was never called!"); > + auto threadLocalInfo = > + static_cast<ThreadLocalInfo*>(PR_GetThreadPrivate(sThreadLocalIndex)); > + if (threadLocalInfo) { > + if (threadLocalInfo->mActor) { I'm a little concerned about this... If we have a |threadLocalInfo| then someone called |GetOrCreateForCurrentThread()|. If we have an actor already then everything is fine. If we don't have an actor then the actor is being created somewhere and it just hasn't been assigned yet. In this case we're not going to do anything here which seems wrong. We're basically allowing a race condition to cause different behavior. I think we could do two things here: 1. MOZ_CRASH if we get into this state. 2. Set some flags and add extra code so that we close the actor once it finally gets created. What do you think? I would prefer not to allow a racy API like this if it's not too difficult... ::: ipc/glue/MessagePump.cpp @@ +220,5 @@ > > NS_IMETHODIMP > DoWorkRunnable::Run() > { > + if (mCanceled) { I actually think we can't bail out here... There's a 1:1 relationship between DoWorkRunnable and tasks in the chromium queue so if we skip one we'll leave something else unprocessed. For now just have Cancel() call Run() directly and we can revisit if something breaks?
Attachment #8438678 -
Flags: review?(bent.mozilla) → review-
Assignee | ||
Comment 41•10 years ago
|
||
> @@ +1687,5 @@ > I think we could do two things here: > > 1. MOZ_CRASH if we get into this state. > 2. Set some flags and add extra code so that we close the actor once it > finally gets created. I chose to MOZ_CRASH() for a few reasons: 1) This could also be a double close. 2) Lets not add complexity unless we find we need it. 3) I don't want to add something I can't test and writing good, racy test cases can be hard without a real use case that triggers it. New try: https://tbpl.mozilla.org/?tree=Try&rev=fca7c291b8cf
Attachment #8438678 -
Attachment is obsolete: true
Attachment #8442596 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 42•10 years ago
|
||
Also, I assumed if the runnable Cancel() is called, then I can assume Run() won't be called. Is that correct? Or do I still need the flag to short-circuit Run() to avoid double execution?
(In reply to Ben Kelly [:bkelly] from comment #42) > Also, I assumed if the runnable Cancel() is called, then I can assume Run() > won't be called. Is that correct? This is true for the worker run loop but I can't say for other things. > Or do I still need the flag to short-circuit Run() to avoid double execution? I would do that to be safe. Maybe assert too.
Assignee | ||
Comment 44•10 years ago
|
||
Sorry, realized after posting the previous patch I needed to fix the CloseForCurrentThread() description in BackgroundChild.h. It now indicates that the call may only be made once and only after the actor has been created. Also fix another case of whitespace missing around an #ifdef block.
Attachment #8442596 -
Attachment is obsolete: true
Attachment #8442596 -
Flags: review?(bent.mozilla)
Attachment #8442601 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 45•10 years ago
|
||
Sigh. Uploaded wrong patch.
Attachment #8442601 -
Attachment is obsolete: true
Attachment #8442601 -
Flags: review?(bent.mozilla)
Attachment #8442602 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 46•10 years ago
|
||
Hmm, try is unhappy. It seems the modified code is allowing the callback from the main thread to occur while SetAcceptingNonWorkerRunnables(false) for some reason. Investigating.
Assignee | ||
Comment 47•10 years ago
|
||
So it turns out at least some, if not all, the try errors were fallout from implementing this: > > > > -#ifdef DEBUG > > - thread->SetAcceptingNonWorkerRunnables(false); > > I'd actually prefer that we leave this here, and then set true (and then > false again) in WorkerThreadPrimaryRunnable::Run() before and after calling > SynchronouslyCreatePBackground(). > > That will reduce the amount of time that something could slip in without > triggering the assertions. The problem is that this creates a race between setting SetAcceptingNonWorkerRunnables(false) in ScheduleWorker() and setting it true on the worker thread. Usually it works, but every now and then we hit this condition: 1) Main thread calls thread->Dispatch() from ScheduleWorker() 2) Worker thread wakes up 3) Worker thread runs PrimaryWorkerRunnable 4) Worker thread calls SetAcceptingNonWorkerRunnables(true) 5) Worker thread blocks waiting for actor callback 6) Main thread wakes up 7) Main thread calls SetAcceptingNonWorkerRunnables(false) 8) Main thread creates actor 9) Main thread calls thread->Dispatch() for callback At that point we hit the assert because mAcceptingNonWorkerRunnables is false. I think the correct thing here is to make the final dispatch from the main thread in ScheduleWorker atomic with setting mAcceptingNonWorkerRunnable false. This patch does this by adding a DispatchAndStopAcceptingNonWorkerRunnables() method. This performs the Dispatch() and sets the boolean with a lock. Note, I had to switch to using a ReentrantMonitor because Dispatch() immediately tries to lock again by calling IsAcceptingNonWorkerRunnables(). I attempted to make lock recursion only allowed in IsAcceptingNonWorkerRunnables() by checking mMonitor.AssertNotCurrentThreadIn() in the other methods. It looks like this is not actually implemented yet, though. I wrote bug 1027772 to fix that. Anyway, what do you think of this solution?
Attachment #8442602 -
Attachment is obsolete: true
Attachment #8442602 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 48•10 years ago
|
||
This fixed the errors locally for me. Lets see how the new try does: https://tbpl.mozilla.org/?tree=Try&rev=a4180b4248a5
(In reply to Ben Kelly [:bkelly] from comment #47) > Anyway, what do you think of this solution? This sounds like a lot of work just to tighten that assertion... Sorry to send you down that rabbit hole, but I think we can just live with the old assertion behavior.
Assignee | ||
Comment 50•10 years ago
|
||
No problem. That's what I was thinking too. :-)
Assignee | ||
Comment 51•10 years ago
|
||
I believe this addresses all the review feedback to-date. Also passed on past try runs, I expect this to be green: https://tbpl.mozilla.org/?tree=Try&rev=1ac3e93dcf0e
Attachment #8443002 -
Attachment is obsolete: true
Attachment #8443214 -
Flags: review?(bent.mozilla)
Assignee | ||
Updated•10 years ago
|
Assignee | ||
Comment 52•10 years ago
|
||
Using the debugger Ben confirmed that early delivery of worker messages is still safe with this patch because the WorkerPrivate is still in a pending state while spinning the event loop in SynchronouslyCreatePBackground(). This causes those early worker messages to be queued separately until its state moves to running. This patch update adds asserts to SynchronouslyCreatePBackground() to verify that the worker private is indeed in its pending state. https://tbpl.mozilla.org/?tree=Try&rev=f8affed3ca5d
Attachment #8443214 -
Attachment is obsolete: true
Attachment #8443214 -
Flags: review?(bent.mozilla)
Attachment #8447268 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 53•10 years ago
|
||
Talking with Ben, it sounds like we can guarantee this safely without the assert. So remove the assert and just add a comment instead.
Attachment #8447268 -
Attachment is obsolete: true
Attachment #8447268 -
Flags: review?(bent.mozilla)
Attachment #8447315 -
Flags: review?(bent.mozilla)
Comment on attachment 8447315 [details] [diff] [review] Support PBackground on workers. (merged v8) Review of attachment 8447315 [details] [diff] [review]: ----------------------------------------------------------------- This looks great, thanks! ::: dom/workers/RuntimeService.cpp @@ +1095,5 @@ > + void > + TestPBackground() > + { > + using mozilla::ipc::BackgroundChild; > + using mozilla::ipc::PBackgroundChild; Nit: using namespace mozilla::ipc; @@ +1103,5 @@ > + PRSize randomSize = PR_GetRandomNoise(&testValue, sizeof(testValue)); > + MOZ_RELEASE_ASSERT(randomSize == sizeof(testValue)); > + nsCString testStr; > + testStr.AppendInt(testValue); > + testStr.AppendInt(reinterpret_cast<uint64_t>(PR_GetCurrentThread())); Nit: intptr_t here instead of uint64_t @@ +2735,5 @@ > return NS_OK; > } > > +nsresult > +WorkerThreadPrimaryRunnable::SynchronouslyCreatePBackground() Please assert here that BackgroundChild::GetForCurrentThread returns null. ::: ipc/glue/BackgroundImpl.cpp @@ +2009,5 @@ > AssertIsOnBoundThread(); > > BackgroundChildImpl::ActorDestroy(aWhy); > + > + mActorDestroyed = true; Nit: Let's move this to before you call BackgroundChildImpl::ActorDestroy. ::: ipc/glue/MessagePump.cpp @@ +264,5 @@ > + // here. If we don't process all of these then we will leave something > + // unprocessed in the chromium queue. Therefore, eagerly complete our work > + // instead by immediately calling Run(). > + mCanceled = true; > + mCallingRunWhileCanceled = true; Nit: Please use AutoRestore for this.
Attachment #8447315 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 55•10 years ago
|
||
Fixed review feedback, except for the intptr_t comment. Turns out we don't have an intptr_t compatible AppendInt() method. Updated commit log.
Attachment #8447315 -
Attachment is obsolete: true
Attachment #8447398 -
Flags: review+
Assignee | ||
Comment 56•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/303027a0da95
Comment 57•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/303027a0da95
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Assignee | ||
Updated•10 years ago
|
Assignee | ||
Comment 58•10 years ago
|
||
I pushed a backout of mozilla-inbound since fixing the oranges has been much trickier than I thought: https://hg.mozilla.org/integration/mozilla-inbound/rev/306fc8865aae
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 59•10 years ago
|
||
This landed on m-c as well: https://hg.mozilla.org/mozilla-central/rev/306fc8865aae
Updated•10 years ago
|
Target Milestone: mozilla33 → ---
Assignee | ||
Comment 60•10 years ago
|
||
After talking with Ben and much experimentation, we think the best approach is to make MessageChannel::Close() call ActorDestroy() synchronously in the case where the channel is still opening. This patch does this along with a few other minor fixes we uncovered. It works in the debugger, but needs a full try run to see if any of the oranges are still a problem. https://tbpl.mozilla.org/?tree=Try&rev=d6dc74bd5e77
Attachment #8447398 -
Attachment is obsolete: true
Assignee | ||
Comment 61•10 years ago
|
||
Comment on attachment 8451878 [details] [diff] [review] Reland support for PBackground on workers. (v10) Try looks good, so flagging for review. The splinter interdiff to v9 is mostly correct, although you have to ignore the first two hunks.
Attachment #8451878 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 62•10 years ago
|
||
And by try being good, I mean that there have been no failures in the previously orange tests. B2G ICS Emulator Debug M12 was the worst before. So far no failures after 20 pushes.
Comment on attachment 8451878 [details] [diff] [review] Reland support for PBackground on workers. (v10) Review of attachment 8451878 [details] [diff] [review]: ----------------------------------------------------------------- Thanks! ::: dom/workers/RuntimeService.cpp @@ +919,5 @@ > +public: > + WorkerBackgroundChildCallback(bool* aDone) > + : mDone(aDone) > + { > + MOZ_ASSERT(mDone); Let's also MOZ_ASSERT(!NS_IsMainThread()) ::: ipc/glue/BackgroundChild.h @@ +38,5 @@ > // time. > // > +// CloseForCurrentThread() will close the current PBackground actor. Subsequent > +// calls to GetForCurrentThread will return null. CloseForCurrentThread() may > +// only be called exactly once per thread. Currently it is illegal to call this CloseForCurrentThread() may be called more than once, you just have to create another actor in between. This comment needs more specificity. ::: ipc/glue/BackgroundImpl.cpp @@ +422,5 @@ > if (threadLocalInfo) { > if (threadLocalInfo->mActor) { > threadLocalInfo->mActor->Close(); > + threadLocalInfo->mActor->AssertActorDestroyed(); > + // Since the actor is created on the main thread it must only Nit: Please add a newline between the assert and the comment. @@ +1697,5 @@ > + MOZ_CRASH("Attempting to close a non-existent PBackground actor!"); > + } > + > + if (threadLocalInfo->mActor) { > + threadLocalInfo->mActor->FlushPendingInterruptQueue(); This really shouldn't be needed (it only does anything if we have rpc messages, which we shouldn't...) but I don't think it hurts either. @@ +1699,5 @@ > + > + if (threadLocalInfo->mActor) { > + threadLocalInfo->mActor->FlushPendingInterruptQueue(); > + } > + DebugOnly<PRStatus> status = PR_SetThreadPrivate(sThreadLocalIndex, nullptr); Nit: Please add a newline before this one, and maybe a comment saying that clearing the threadlocal will close the actor synchronously. @@ +2007,5 @@ > ChildImpl::ActorDestroy(ActorDestroyReason aWhy) > { > AssertIsOnBoundThread(); > > + mActorDestroyed = true; Nit: Please also assert that mActorDestroyed is false before you set it here. ::: ipc/glue/MessageChannel.cpp @@ +1662,5 @@ > } > > if (ChannelOpening == mChannelState) { > + // SynchronouslyClose() waits for an ack from the other side, so > + // the opening process should complete before this returns. Nit: s/process/sequence/ maybe? "process" is overloaded in this context. ::: ipc/glue/MessagePump.cpp @@ +12,5 @@ > #include "base/basictypes.h" > #include "base/logging.h" > #include "base/scoped_nsautorelease_pool.h" > #include "mozilla/Assertions.h" > +#include "mozilla/AutoRestore.h" Nit: This isn't needed any longer.
Attachment #8451878 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 64•10 years ago
|
||
Updated with review feedback. Retested on debugger for good measure due to that extra assert we added in ChildImpl::ActorDestroy.
Attachment #8451878 -
Attachment is obsolete: true
Attachment #8452511 -
Flags: review+
Assignee | ||
Comment 65•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/725b7db309c3
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/99c726eaa286 at bkelly's request.
Assignee | ||
Comment 67•10 years ago
|
||
Lets see if we can reproduce these android ThreadLink crashes in this try: https://tbpl.mozilla.org/?tree=Try&rev=02dff17f5139
Assignee | ||
Comment 68•10 years ago
|
||
I got it to reproduce once in that try. Took about 40 pushes of Android 2.3 Opt M6. That's about a 2% or 3% reproduction rate.
Assignee | ||
Comment 69•10 years ago
|
||
It appears this is easy to reproduce locally just by adding a printf in here: http://dxr.mozilla.org/mozilla-central/source/ipc/glue/MessageLink.cpp#189 This code looks very racy and could use more checking. Also, whats the history of this comment: // :TODO: MonitorAutoLock lock(*mChan->mMonitor); Ben, do you know?
Flags: needinfo?(bent.mozilla)
Assignee | ||
Comment 70•10 years ago
|
||
I think that TODO comment is about prevent messages from being sent on the channel, not about racing with the other ThreadLink destructor. So probably not strictly relevant here.
Flags: needinfo?(bent.mozilla)
Assignee | ||
Comment 71•10 years ago
|
||
This patch adds a static mutex to protect the operations in ~ThreadLink(). This code needs to run automically with respect to the other side's ~ThreadLink(). In theory a static mutex is a bit broad as we could have separate channels tearing down, but this does not seem too common a situation and probably won't contend much. Also, it seems the most straightforward as its not clear where else to stick a mutex that would be common to both ThreadLink objects. Taking both channel monitors seems like a recipe for deadlock. Here's a new try: https://tbpl.mozilla.org/?tree=Try&rev=3b44eaef7693 Based on local debugging I'm fairly confident this will solve the problem, so flagging for review now.
Attachment #8452511 -
Attachment is obsolete: true
Attachment #8453057 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 72•10 years ago
|
||
I suppose to avoid that mutex being abused elsewhere, I could declare it as a static within the ~ThreadLink() method scope itself.
Assignee | ||
Comment 73•10 years ago
|
||
On IRC Ben pointed out that mMonitor is shared between the two MessageChannel objects, so we can just lock that. I also dug into why it was commented out with a TODO. It appears that it was an unintended bug from merging channels together into on class in bug 901789. This patch restores the mMonitor lock in ~ThreadLink(), adds a comment about the monitor being shared, and adds some MOZ_ASSERTs for good measure. https://tbpl.mozilla.org/?tree=Try&rev=2cb59580ab0c
Attachment #8453057 -
Attachment is obsolete: true
Attachment #8453057 -
Flags: review?(bent.mozilla)
Attachment #8453181 -
Flags: review?(bent.mozilla)
Updated•10 years ago
|
Attachment #8453181 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 74•10 years ago
|
||
Try looks good with over 40 triggers without a failure. Back to mozilla-inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/fc4d6b5ee8ce
Comment 75•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/fc4d6b5ee8ce
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
You need to log in
before you can comment on or make changes to this bug.
Description
•