Closed
Bug 1155547
Opened 10 years ago
Closed 9 years ago
Wait Nuwa forking a process if preallocated process isn't present
Categories
(Core :: IPC, defect)
Tracking
()
People
(Reporter: kk1fff, Assigned: cyu)
References
(Blocks 1 open bug)
Details
(Whiteboard: [caf priority: p3][CR 786372])
Attachments
(6 files, 13 obsolete files)
+++ This bug was initially created as a clone of Bug #1053634 +++
I discussed with Kanru. As the change is really large to make loading an frame asynchronous, and the complexity doesn't seen converge now, the risk of taking patches of 1053634 to 2.2 has increased. It have been already quite hard to just rebase the patches onto 2.2, and it could incur unexpected regression.
I've implemented short version that synchronously waits for nuwa forking a process when a process request comes after nuwa ready. I think we would need more time and complex work to make loading a frame become a real asynchronous task.
Attachment #8593798 -
Flags: review?(khuey)
Reporter | ||
Comment 1•10 years ago
|
||
with 8 lines of context.
Attachment #8593798 -
Attachment is obsolete: true
Attachment #8593798 -
Flags: review?(khuey)
Attachment #8593800 -
Flags: review?(khuey)
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(mlee)
Reporter | ||
Updated•10 years ago
|
Summary: Make all app processes forked from Nuwa → Wait Nuwa forking a process if preallocated process isn't present
Comment 2•10 years ago
|
||
According to comment 93 of bug 1053634, this bug is the simpler fix of a 2.2+ bug
blocking-b2g: --- → 2.2+
Reporter | ||
Comment 3•10 years ago
|
||
Frameloader can try several times before it actually get the process, adding a waiting list to make sure only one process per manifest url.
Attachment #8593800 -
Attachment is obsolete: true
Attachment #8593800 -
Flags: review?(khuey)
Attachment #8594714 -
Flags: review?(khuey)
Reporter | ||
Comment 4•10 years ago
|
||
Attachment #8594714 -
Attachment is obsolete: true
Attachment #8594714 -
Flags: review?(khuey)
Attachment #8594717 -
Flags: review?(khuey)
Updated•10 years ago
|
Assignee: nobody → kk1fff
Comment 5•10 years ago
|
||
Hi Patrick,
I noticed your needinfo request to me but don't see a question indicating what info do you need from me. Please clarify.
Thanks,
Mike
Flags: needinfo?(mlee) → needinfo?(kk1fff)
Reporter | ||
Comment 6•10 years ago
|
||
(In reply to Mike Lee [:mlee] from comment #5)
> I noticed your needinfo request to me but don't see a question indicating
> what info do you need from me. Please clarify.
I just like to know if you have any thought about this as a quick fix of bug 1053634, since we had some email contact about that bug. :)
Flags: needinfo?(kk1fff)
Reporter | ||
Comment 7•10 years ago
|
||
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #6)
> I just like to know if you have any thought about this as a quick fix of bug
> 1053634, since we had some email contact about that bug. :)
^ I meant.. had some conversations on mail
Comment 8•10 years ago
|
||
Low(er) risk fix for v2.2 sounds great to me. How will this compare to the previous approach in terms of memory and performance?
Once this lands will be plugin-container process be used for anything in b2g?
Reporter | ||
Comment 9•10 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #8)
> Low(er) risk fix for v2.2 sounds great to me. How will this compare to the
> previous approach in terms of memory and performance?
The patch simply require all process being forked from Nuwa. If this landed, I'd see original one as an architectural refactor for making loading frame asynchronous.
Both original one and this don't wait for Nuwa ready. As bug 1053634 comment 52 states, index db becomes busy when an x-heavy workload pushed, and Nuwa needs several minutes to becoming ready, waiting for Nuwa doesn't seem feasible. Thus, before Nuwa ready, we just fork from B2G. But after Nuwa ready, both approaches make sure that all processes are forked from Nuwa. The difference is that the previous one do it asynchronously while this one just enter nested loop and wait. I think the memory consumption should be close, and the loading time increases only when no preallocated process present, which is within 5 seconds after a process launched.
Comment 10•10 years ago
|
||
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #7)
> (In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #6)
> > I just like to know if you have any thought about this as a quick fix of bug
> > 1053634, since we had some email contact about that bug. :)
> ^ I meant.. had some conversations on mail
I agree with mvines comment 8 re: this lower risk patch.
Comment on attachment 8594717 [details] [diff] [review]
Patch
Review of attachment 8594717 [details] [diff] [review]:
-----------------------------------------------------------------
I'm afraid of this change, but I'm willing to try it.
::: dom/ipc/PreallocatedProcessManager.h
@@ +85,5 @@
> static void MaybeForgetSpare(ContentParent* aContent);
> static bool IsNuwaReady();
> static void OnNuwaReady();
> static bool PreallocatedProcessReady();
> + static already_AddRefed<ContentParent> WaitNewProcess(const nsAString& aManifestURL);
This is nit-picking, but I would prefer to call this BlockForNewProcess.
Regardless, it should definitely have a "For" in there.
Attachment #8594717 -
Flags: review?(khuey) → review+
Reporter | ||
Comment 12•10 years ago
|
||
Thanks, Kyle!
Fixing nit.
Attachment #8594717 -
Attachment is obsolete: true
Attachment #8598383 -
Flags: review+
Reporter | ||
Comment 13•10 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/e91d718aa500 for xpcshell orange:
https://treeherder.mozilla.org/logviewer.html#?job_id=9371513&repo=mozilla-inbound
Flags: needinfo?(kk1fff)
Comment 15•10 years ago
|
||
Patrick, can you provide a status update here on the outlook for getting this patch re-landed?
Reporter | ||
Comment 16•10 years ago
|
||
I am going to move a patch that disables nuwa on xpcshell to here. B2G on try seems broken yesterday, thus I can't run the test cases. I'll try today.
Flags: needinfo?(kk1fff)
Reporter | ||
Comment 17•10 years ago
|
||
Looks like disabling Nuwa on xpcshell doesn't help. oauth test timed out. I would like to see if Cervantes could help take a look..
Flags: needinfo?(cyu)
Comment 18•10 years ago
|
||
Hey Patrick, can you please attach a v2.2 patch for this bug? We'll pull this in to our build in parallel to the xpchell test debug.
Flags: needinfo?(kk1fff)
Reporter | ||
Comment 19•10 years ago
|
||
Flags: needinfo?(kk1fff)
Assignee | ||
Comment 20•10 years ago
|
||
I'll continue working on this.
Assignee: kk1fff → cyu
Flags: needinfo?(cyu)
Comment 21•10 years ago
|
||
We applied the patch but it is causing major instability. cafbot will attach minidump and log info in a few minutes. Please do not merge the change as-is, thanks!
Comment 22•10 years ago
|
||
Comment 23•10 years ago
|
||
Comment 24•10 years ago
|
||
Comment 25•10 years ago
|
||
Assignee | ||
Comment 26•10 years ago
|
||
Greg, is there any STR for the above crashes? Thanks.
Flags: needinfo?(ggrisco)
Comment 27•10 years ago
|
||
(In reply to Cervantes Yu from comment #26)
> Greg, is there any STR for the above crashes? Thanks.
We found this in stability test, so no clear steps are available, although it is happening quite often. We've seen it around 40 times in a relatively small number of test hours.
Flags: needinfo?(ggrisco)
Assignee | ||
Comment 28•9 years ago
|
||
I also reproduced the crash. It turns out that TabParent::mFrameElement is destroyed and then is later used in referenced in TabParent::IsVisible() or TabParent::GetFrameLoader(). I guess the nested loop used in this patch reenters some code that is not reentrant. Need to investigate more.
Flags: needinfo?(ggrisco)
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(ggrisco)
Comment 29•9 years ago
|
||
(In reply to Cervantes Yu from comment #28)
> I also reproduced the crash. It turns out that TabParent::mFrameElement is
> destroyed and then is later used in referenced in TabParent::IsVisible() or
> TabParent::GetFrameLoader(). I guess the nested loop used in this patch
> reenters some code that is not reentrant. Need to investigate more.
Cervantes, What is your status? Are you still investigating or are you working on a fix?
Updated•9 years ago
|
status-b2g-v2.2:
--- → affected
Updated•9 years ago
|
Updated•9 years ago
|
Flags: needinfo?(cyu)
Assignee | ||
Comment 30•9 years ago
|
||
I am still trying to fix the crash or prove that the approach of using a nested loop is a dead end. I plan to have a conclusion by the end of this week.
Flags: needinfo?(cyu)
Assignee | ||
Comment 31•9 years ago
|
||
It turns out that the assertion placed in nsFrameLoader::TryRemoteBrowser() is violated. 2 TabParents are created for the same nsFrameLoader. So the 2nd TabParent can reference the dangling pointer to the nsGenericHTMLFrameElement if the frame element is destroyed.
Assignee | ||
Comment 32•9 years ago
|
||
Update to the previous patch: make it safe to reenter at places that may enter the nested loop.
A hard lesson for using the nested loop is that we must assume that the method invocation chain can repeat itself with the use of a nested loop. The crash happens because the nested loop is called inside a constructor-like method: ContentParent::CreateBrowserOrApp(). When the constructor-like method repeats itself, a frame loader creates 2 TabParent instances. When the frame element is destroyed, one of the 2 TabParent instances isn't unaware about it and holds a dangling weak reference. Later the TabParent instance crashes when dereferencing it.
The crash is fixed by making the call that might enter a nested loop explicit as ContentParent::BlockForNewProcess(). nsFrameLoader::TryRemoteBrowser() and ContentParent::RecvCreateChildProcess(). The frame loader also checks whether the remote browser is created deep inside the nested loop and skip the creation of another one if the nested loop already created it.
Attachment #8598383 -
Attachment is obsolete: true
Attachment #8606234 -
Flags: review?(khuey)
Assignee | ||
Comment 33•9 years ago
|
||
Assignee | ||
Comment 34•9 years ago
|
||
Greg, would you initiate a round of stability test with this patch to see if there are still other crashes or quirks? Thanks.
Attachment #8599939 -
Attachment is obsolete: true
Attachment #8606253 -
Flags: feedback?(ggrisco)
Comment 35•9 years ago
|
||
>
> Greg, would you initiate a round of stability test with this patch to see if
> there are still other crashes or quirks? Thanks.
Hi Cervantes, we won't be able to bring this in now since we are trying to stabilize for major milestone. Once you have stabilized this on master and our milestone has passed, we can bring it in. Thanks.
Updated•9 years ago
|
Attachment #8606253 -
Flags: feedback?(ggrisco)
Updated•9 years ago
|
Whiteboard: [caf priority: p1][CR 786372] → [caf priority: p3][CR 786372]
Comment 36•9 years ago
|
||
Kyle, have you had a chance to start looking at this review yet?
Flags: needinfo?(khuey)
Comment on attachment 8606234 [details] [diff] [review]
Use a nested loop to wait until Nuwa completes forking the process.
Review of attachment 8606234 [details] [diff] [review]:
-----------------------------------------------------------------
I was going to hold off on r+ing this until it got some testing, but whatever. I still don't think we should check this in to branches without extensive testing.
Attachment #8606234 -
Flags: review?(khuey) → review+
Flags: needinfo?(khuey)
Assignee | ||
Comment 38•9 years ago
|
||
From the push log, some tests seem to be permfail. I think there are still other issues with the use of nested loop. We need to consider another alternative.
Comment 39•9 years ago
|
||
Hi Cervantes,
Do you have any update about the failed test for push?
Thanks!
Flags: needinfo?(cyu)
Assignee | ||
Comment 40•9 years ago
|
||
(In reply to Josh Cheng [:josh] from comment #39)
> Hi Cervantes,
> Do you have any update about the failed test for push?
> Thanks!
I give up the approach in the original patch because the approach is too risky. I am working on another approach to fix this bug.
Flags: needinfo?(cyu)
Assignee | ||
Comment 41•9 years ago
|
||
My plan B is moving the Nuwa operations to the background thread. When the frameloader requests a process but there is no spare one, the request will be processed using the background thread while the frameloader gets blocked (hopefully a short time). This is supposed to be much safer than using a nested loop.
Kyle, do you see any issue with this approach?
Flags: needinfo?(khuey)
Updated•9 years ago
|
Target Milestone: --- → 2.2 S14 (12june)
That sounds like a good approach to try.
This should be dropped from the 2.2 scope though. There's nowhere near enough time to be confident in the stability of a change like that for 2.2.
Flags: needinfo?(khuey)
Comment 43•9 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #42)
> That sounds like a good approach to try.
>
> This should be dropped from the 2.2 scope though. There's nowhere near
> enough time to be confident in the stability of a change like that for 2.2.
Agree with Kyle. And according to whiteboard, this is a bug with caf priority 3 for partner. It seems not a
very critical issue. Can we discuss with partner to remove it from CAF Blocker list?
Flags: needinfo?(jocheng)
Flags: needinfo?(doliver)
Comment 44•9 years ago
|
||
Confirmed that we will move this off the 2.2 list. Bumping to 3.0? to retain visibility.
blocking-b2g: 2.2+ → 3.0?
Flags: needinfo?(jocheng)
Flags: needinfo?(doliver)
Comment 45•9 years ago
|
||
Removing dependency on CAF 2.2, per agreement with partner.
No longer blocks: CAF-v2.2-metabug
Assignee | ||
Comment 46•9 years ago
|
||
This is the rollup patch for the current WIP.
TODO:
* All fork requests are sync now. The sync fork request takes 60 to 90 ms on flame. We should only block when we need one ContentParent and there is no prellocated process. The request to fork a preallocated process should run async.
Attachment #8606234 -
Attachment is obsolete: true
Attachment #8606253 -
Attachment is obsolete: true
Updated•9 years ago
|
Blocks: CAF-v2.2-metabug
Updated•9 years ago
|
No longer blocks: CAF-v2.2-metabug
Assignee | ||
Comment 47•9 years ago
|
||
Attachment #8617276 -
Attachment is obsolete: true
Attachment #8617875 -
Flags: review?(khuey)
Assignee | ||
Comment 48•9 years ago
|
||
Attachment #8617876 -
Flags: review?(khuey)
Assignee | ||
Comment 49•9 years ago
|
||
Attachment #8617875 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 50•9 years ago
|
||
Changes: fix PreallocatedProcessManagerImpl::GetSpareProcess() and missing #include in NuwaChild.cpp.
Attachment #8617876 -
Attachment is obsolete: true
Attachment #8617876 -
Flags: review?(khuey)
Attachment #8623560 -
Flags: review?(khuey)
Am I expecting a new patch here? Or is this the latest version?
Flags: needinfo?(cyu)
Assignee | ||
Comment 52•9 years ago
|
||
This is the latest version, but I just found a conflict in updating m-c. I will rebase and fix a build break on Windows.
Flags: needinfo?(cyu)
Assignee | ||
Comment 53•9 years ago
|
||
Fix build breaks on Windows and Linux.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=be14c1ef2408
Attachment #8623560 -
Attachment is obsolete: true
Attachment #8623560 -
Flags: review?(khuey)
Attachment #8628126 -
Flags: review?(khuey)
Comment on attachment 8628126 [details] [diff] [review]
Part 2: PNuwa protocol implementation
Review of attachment 8628126 [details] [diff] [review]:
-----------------------------------------------------------------
r=me, looks good with comments addressed.
::: dom/ipc/ContentChild.cpp
@@ +532,5 @@
> if (IsNuwaProcess()) {
> return;
> }
>
> +
nit: extra /n
::: dom/ipc/ContentParent.h
@@ +938,5 @@
> #endif
>
> PProcessHangMonitorParent* mHangMonitorActor;
> +
> + // NuwaParent and ContentParent holds strong references to each other. The
hold
::: dom/ipc/NuwaChild.cpp
@@ +34,5 @@
> +#ifdef MOZ_NUWA_PROCESS
> +
> +namespace {
> +
> +// FIXME indentation: 2
I think you fixed this.
::: dom/ipc/NuwaParent.cpp
@@ +46,5 @@
> + , mClonedActor(nullptr)
> + , mWorkerThread(do_GetCurrentThread())
> + , mNewProcessPid(0)
> +{
> + MOZ_ASSERT(!NS_IsMainThread());
AssertIsOnBackgroundThread();
@@ +103,5 @@
> + MOZ_ASSERT(runnable);
> + MOZ_ALWAYS_TRUE(NS_SUCCEEDED(mWorkerThread->Dispatch(runnable,
> + NS_DISPATCH_NORMAL)));
> +
> + lock.Wait();
wait in a loop
@@ +114,5 @@
> + // We can't call ActorConstructed() right after Alloc() in the above runnable.
> + // To be safe we dispatch a runnable to the current thread to do it.
> + runnable = NS_NewRunnableFunction([actor] () -> void
> + {
> + // On the main thread.
Add a MOZ_ASSERT(NS_IsMainThread()) in this one.
@@ +117,5 @@
> + {
> + // On the main thread.
> + nsCOMPtr<nsIRunnable> nested = NS_NewRunnableFunction([actor] () -> void
> + {
> + // Call NuwaParent::ActorConstructed() on the worker thread.
And an AssertIsOnBackgroundThread() in this one.
@@ +150,5 @@
> + contentParent->SetNuwaParent(nullptr);
> + // Need to clear the ref to ContentParent on the main thread.
> + self->mContentParent = nullptr;
> +
> + });
nit: extra \n
@@ +166,5 @@
> + }
> +
> + nsCOMPtr<nsIRunnable> runnable =
> + NS_NewNonOwningRunnableMethod(mContentParent.get(),
> + &ContentParent::OnNuwaReady);
Add a comment about how NonOwning is safe because destroying the mContentParent ref would have to go to the main thread anyways.
@@ +236,5 @@
> + return false;
> + }
> +
> + MonitorAutoLock lock(mMonitor);
> + mBlocked = true;
nit: only one space here
@@ +237,5 @@
> + }
> +
> + MonitorAutoLock lock(mMonitor);
> + mBlocked = true;
> + lock.Wait();
wait on the condvar in a loop
::: dom/ipc/NuwaParent.h
@@ +17,5 @@
> +
> +class ContentParent;
> +
> +class NuwaParent: public mozilla::dom::PNuwaParent
> +{
nit: "NuwaParent :"
Attachment #8628126 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 55•9 years ago
|
||
Assignee | ||
Comment 56•9 years ago
|
||
Rebased to latest and added/removed #include's accordingly.
Attachment #8617875 -
Attachment is obsolete: true
Attachment #8640980 -
Flags: review+
Assignee | ||
Comment 57•9 years ago
|
||
Rebased and addressed review comments.
Attachment #8628126 -
Attachment is obsolete: true
Attachment #8640981 -
Flags: review+
Assignee | ||
Comment 58•9 years ago
|
||
Kicked off another round of tests on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=58aaebff90db
Assignee | ||
Comment 59•9 years ago
|
||
url: https://hg.mozilla.org/integration/mozilla-inbound/rev/2bdcd74e503d782614c12a9fb013ad80e1784de2
changeset: 2bdcd74e503d782614c12a9fb013ad80e1784de2
user: Cervantes Yu <cyu@mozilla.com>
date: Fri Jul 31 15:25:14 2015 +0800
description:
Bug 1155547, Part 1: Fix unified build breakage in adding new sources under dom/ipc/. r=khuey
url: https://hg.mozilla.org/integration/mozilla-inbound/rev/89d7e4e93bec246c56efe4225f5e816763e4ba5d
changeset: 89d7e4e93bec246c56efe4225f5e816763e4ba5d
user: Cervantes Yu <cyu@mozilla.com>
date: Fri Jul 31 15:25:27 2015 +0800
description:
Bug 1155547, Part 2: Create PNuwa protocol (managed by PBackground) for forking content processes. r=khuey
This allows us to send a sync fork request to the Nuwa process when we need one but there is no
spare process available. After an app is launched, the request to fork a spare process is still
handled asynchronously.
You need to log in
before you can comment on or make changes to this bug.
Description
•