Closed Bug 968604 Opened 11 years ago Closed 11 years ago

[Tarako] The browser process needs to be forked from Nuwa

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla31

Project Flags:

tracking-b2g

backlog

Tracking Flags:

Tracking

Status

b2g-v1.3T

---

fixed

People

(Reporter: llin, Assigned: cyu)

References

Details

Attachments

(5 files, 1 obsolete file)

Screenshot from 2014-02-06 10:12:11.png 11 years ago Lawrence Lin (deleted), image/png		Details
For the browser process from Nuwa 11 years ago Cervantes Yu [:cyu] [:cervantes] (deleted), patch	khuey : review+	Details \| Diff \| Splinter Review
Fix the test case test_Preallocated.html 11 years ago Cervantes Yu [:cyu] [:cervantes] (deleted), patch	khuey : review+	Details \| Diff \| Splinter Review
Fork the browser process from Nuwa 11 years ago Cervantes Yu [:cyu] [:cervantes] (deleted), patch	cyu : review+	Details \| Diff \| Splinter Review
Disable test_Preallocated.html 11 years ago Cervantes Yu [:cyu] [:cervantes] (deleted), patch	fabrice : review+	Details \| Diff \| Splinter Review
Fork the browser process from Nuwa (for m-c) 11 years ago Cervantes Yu [:cyu] [:cervantes] (deleted), patch	cyu : review+	Details \| Diff \| Splinter Review

Lawrence Lin

Reporter

Description

•

11 years ago

Attached image Screenshot from 2014-02-06 10:12:11.png (deleted) — Details

After smoking test of Tarako, it was found at final memory usage check. Reproduce steps are still wanted. The screenshot of b2g-info is attached.

[:fabrice] Fabrice Desré

Comment 1

•

11 years ago

Browser processes are the ones used by browser tabs. So it can be normal to have more than one.
The potential issue is that they are forked from the main b2g process, and not from a preallocated one.

Thinker Li [:sinker]

Comment 2

•

11 years ago

Was the Nuwa process killed and re-born? Lawrence, could you provide logcat or information of creation/killing of processes?

[:fabrice] Fabrice Desré

Comment 3

•

11 years ago

Thinker, I just did a quick check:

Before opening a page in the browser:
fabrice@fabrice-x240:~/dev/tarako$ b2g-info 
                         |     megabytes     |
           NAME PID PPID CPU(s) NICE  USS  PSS  RSS VSIZE OOM_ADJ USER   
            b2g  86    1   28.4    0 40.9 44.8 52.5 147.0       0 root   
         (Nuwa) 321   86    0.9    0  0.8  3.4  9.5  49.5       0 root   
     Homescreen 338  321    5.5    1 10.2 14.6 23.6  64.7       2 app_338
(Preallocated a 390  321    0.9   18  4.5  7.6 15.0  57.5      10 root   

And after opening a page:

fabrice@fabrice-x240:~/dev/tarako$ b2g-info 
                    |     megabytes     |
      NAME PID PPID CPU(s) NICE  USS  PSS  RSS VSIZE OOM_ADJ USER   
       b2g  86    1   38.7    0 33.9 37.0 42.5 150.0       0 root   
    (Nuwa) 321   86    1.1    0  0.3  1.1  3.4  49.5       0 root   
Homescreen 338  321    5.9   18  6.8  8.6 12.8  63.7       8 app_338
   Browser 438   86    4.8    1 13.2 16.2 21.8  71.2       2 app_438

So we don't reuse the preallocated process, but just fork from b2g. The USS of the browser app is then pretty high (in this test I could not even get a resource because I have no network access...)

Lawrence Lin

Reporter

Comment 4

•

11 years ago

In the smoke test of Tarako, there is no browser tab related cases. 
It means, during the test, no additional tab will be created.

The issue is that I cannot see or kill the additional broswer process via the application manager, and what the browser process being killed is the latest created. 
Even the application manager shows "No recent apps", the browser process (in the case, PID is 1024) is still there.
Have we been got any reported zombie processes?

Lawrence Lin

Reporter

Comment 5

•

11 years ago

Possible reproduce steps:

1. Load reference workload (light) on the target phone
2. After reboot, launch Music and play a song
3. Launch Contacts
4. Launch Messages
5. Launch Gallery
6. Call the target phone from another phone
7. The dialer(Communications) is not launched and the call is redirected to the voice mail directly
8. Then, the instance of zombie browser is on the process list

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 6

•

11 years ago

I think there is a bug in creating the browser process so it forks from b2g. I will take a look.

Assignee: nobody → cyu

Cervantes Yu [:cyu] [:cervantes]

Assignee

Updated

•

11 years ago

Summary: [Tarako] There might be two existing instances of Browser → [Tarako] The browser process needs to be forked from Nuwa

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 7

•

11 years ago

Attached patch For the browser process from Nuwa (obsolete) (deleted) — Details — Splinter Review

* Perform ContentChild::InitProcessAttributes() when PBrowser is first allocated in the child process. This is to reset child-side attributes for browser.
* Add ContentParent::TransformPreallocatedIntoBrowser() to reset ContentParent's member fields and set OS privileges of the child process.

Attachment #8375423 - Flags: review?(khuey)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 8

•

11 years ago

Tests on try: https://tbpl.mozilla.org/?tree=Try&rev=8fb61c9e31a4

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 9

•

11 years ago

Comment on attachment 8375423 [details] [diff] [review]
For the browser process from Nuwa

Review of attachment 8375423 [details] [diff] [review]:
-----------------------------------------------------------------

very nice. r=me

Attachment #8375423 - Flags: review?(khuey) → review+

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 10

•

11 years ago

This patch causes dom/browser-element/mochitest/priority/test_Simple.html to fail though ... are we not setting the priority of the browser process?

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 11

•

11 years ago

We are.

ProcessPriorityManager::SetProcessPriority(this,                                           PROCESS_PRIORITY_FOREGROUND);

should do this. I will take a look why it fails.

Cervantes Yu [:cyu] [:cervantes]

Assignee

Updated

•

11 years ago

blocking-b2g: --- → 1.3T?

[:fabrice] Fabrice Desré

Updated

•

11 years ago

blocking-b2g: 1.3T? → 1.3T+

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 12

•

11 years ago

The test case passes when run alone, but fails when run in in the suite.

This seems to be relevant to the test case failure.

 1:03.11 [Parent 15327] WARNING: Ignoring duplicate observer.: file /home/cervantes/hg/mozilla-central/modules/libpref/src/nsPrefBranch.cpp, line 622
 1:03.12 [Parent 15327] WARNING: Adding an observer twice!: '!mEventObservers.Contains(observer)', file /home/cervantes/hg/mozilla-central/xpcom/threads/nsThread.cpp, line 752

Still checking why there is a duplicate observer.

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 13

•

11 years ago

The failure is a false alarm not indicating to a bug in this fix. The problem is that (very unfortunately) test_Preallocated.html runs before test_Simple.html. The preallocated process created in test_Preallocated.html is used by test_Simple.html, then "process-priority-manager:TEST-ONLY:process-created" will not be fired because it already was in test_Preallocated.html.

We need to shutdown the preallocated process in test_Preallocated.html so it's not affecting the test case running after it.

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 14

•

11 years ago

Attached patch Fix the test case test_Preallocated.html (deleted) — Details — Splinter Review

This fixes the test case that the preallocated process is not actually shut down in the cleanup() function:

- PreallocatedProcessManagerImpl::AllocateNow() creates the ContentParent instance.
- ContentParent notifies that the process is created
- The test case calls cleanup() to flip the preference back to false and then PreallocatedProcessManagerImpl::Disable() is called.
- PreallocatedProcessManagerImpl::Disable() won't call mPreallocatedAppProcess->Close() because the assignment in PreallocatedProcessManagerImpl::AllocateNow() hasn't been done.

This fixes the test case by making cleanup() async.

Attachment #8381368 - Flags: review?(khuey)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 15

•

11 years ago

Tests: https://tbpl.mozilla.org/?tree=Try&rev=b966175e43ff

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

11 years ago

Attachment #8381368 - Flags: review?(khuey) → review+

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 16

•

11 years ago

(In reply to Cervantes Yu from comment #15)
> Tests: https://tbpl.mozilla.org/?tree=Try&rev=b966175e43ff

Mochitest group 2 on Linux Debug and Linux X64 Debug still fail in a line. Seems not intermittent. I need to look at it.

[:fabrice] Fabrice Desré

Comment 17

•

11 years ago

Cervantes, do you need help fixing these tests issues?

Flags: needinfo?(cyu)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 18

•

11 years ago

(In reply to Fabrice Desré [:fabrice] from comment #17)
> Cervantes, do you need help fixing these tests issues?

That would be great. :)

Flags: needinfo?(cyu)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 19

•

11 years ago

From the log it looks like it completed all the test cases in mochitest-2, then it get stuck in shutting down. The test runner sends SIGABRT to force it to shut down. I still don't make the test get stuck. Is there any way to bisect the offending test case in mochitest-2?

Joe Cheng [:jcheng] (please needinfo)

Comment 20

•

11 years ago

ni? Fabrice to see if any help can be provided regarding comment 19

Flags: needinfo?(fabrice)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 21

•

11 years ago

For more information, I ran mochitest-2 locally with the patches applied. The failure is intermittent. I still have no clue why it is permfail on try.

Jason Smith [:jsmith]

Updated

•

11 years ago

Component: General → IPC

Product: Firefox OS → Core

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 22

•

11 years ago

From the crash stack in the log, the failure indicates that it is performing PlatformThread::join(), called from ShutdownXPCOM() to delete sIOThread. It's still unclear how the change can make this pthread_join() to block until we got killed by the test runner.

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 23

•

11 years ago

Still wrestling with mochitest-2 failures. With https://bugzilla.mozilla.org/attachment.cgi?id=8381368 fixing the inherent bug in the test case, we saw consecutive mochitest-2 failure. It's hard to say that this is intermittent. I'll try to reorder or disabling this test case to see if we can pass the suite. If yes, can we unblock the patch to 1.3T and reenable the test case once we fix it. Fabrice, how does this sound?

James Zhang (Spreadtrum)

Comment 24

•

11 years ago

When can we land this patch?

Flags: needinfo?(styang)

[:fabrice] Fabrice Desré

Comment 25

•

11 years ago

(In reply to Cervantes Yu from comment #23)
> Still wrestling with mochitest-2 failures. With
> https://bugzilla.mozilla.org/attachment.cgi?id=8381368 fixing the inherent
> bug in the test case, we saw consecutive mochitest-2 failure. It's hard to
> say that this is intermittent. I'll try to reorder or disabling this test
> case to see if we can pass the suite. If yes, can we unblock the patch to
> 1.3T and reenable the test case once we fix it. Fabrice, how does this sound?

Last resort we'll disable the test temporarily. Are you sure it's a test issue and not a bug in the patch itself?

(In reply to James Zhang from comment #24)
> When can we land this patch?

Worst case we'll land Tuesday with the test disabled, sooner if Cervantes finds what's going wrong.

Flags: needinfo?(fabrice)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 26

•

11 years ago

(In reply to Fabrice Desré [:fabrice] from comment #25)
> (In reply to Cervantes Yu from comment #23)
> Last resort we'll disable the test temporarily. Are you sure it's a test
> issue and not a bug in the patch itself?
> 
Seriously, I am not sure. But from the log the test cases finished, but the suite cannot finish and get killed by the test runner after no output for 330 seconds. mochitest-2 on Linux Debug and Linux 64 Debug are the only 2 suites that failed during shutdown.

3 more tests:

1. https://tbpl.mozilla.org/?tree=Try&rev=69bea5abc0f8
Only with test case fix. 1 passed, 2 failed. This should always pass in theory.

2. https://tbpl.mozilla.org/?tree=Try&rev=d96de9c1e324
Code and test fixes, plus a little change to not use preallocated process for the browser. All 3 failed.

3. 2. with test_Preallocated.html disable. 1 green (for the 1st time...), 2 more running.

Because of 1. I tend to believe there is something wrong in how we run the tests. I need to dig into the tests further to fix that.

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 27

•

11 years ago

Oops, the URL for 3. in comment #26 is https://tbpl.mozilla.org/?tree=Try&rev=aa61c08dab3b

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 28

•

11 years ago

I guess the main thread fails to pthread_join() the IO thread because it is stuck in the following call:

Thread 3
 0  libpthread-2.15.so + 0xf88d
 1  libxul.so!ChildReaper::WaitForChildExit [process_watcher_posix_sigchld.cc:69bea5abc0f8 : 68 + 0xa]
 2  libxul.so!ChildLaxReaper::WillDestroyCurrentMessageLoop [process_watcher_posix_sigchld.cc:69bea5abc0f8 : 158 + 0x4]
 3  libxul.so!MessageLoop::~MessageLoop() [message_loop.cc:69bea5abc0f8 : 155 + 0x5]
 4  libxul.so!base::Thread::ThreadMain() [thread.cc:69bea5abc0f8 : 174 + 0xb]
 5  libxul.so!ThreadFunc [platform_thread_posix.cc:69bea5abc0f8 : 39 + 0x2]
 6  libpthread-2.15.so + 0x7e99

There is a loop in HANDLE_EINTR(waitpid()). It could be an infinite loop so the main thread can never shut down.

Since this is not reproduced locally, I pushed a test with more debugging instruments. Let's see if it's really what I guess. If so then I think we have another bug.
https://tbpl.mozilla.org/?tree=Try&rev=5d82ded68f39

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 29

•

11 years ago

waitpid() for the prealloc'd process doesn't return but instead block indefinitely. I think PreallocatedProcessManagerImpl::Disable() doesn't close the prealloc'd process correctly.

Cervantes Yu [:cyu] [:cervantes]

Assignee

Updated

•

11 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=987164

Steven Yang [:styang]

Updated

•

11 years ago

Flags: needinfo?(styang)

Joe Cheng [:jcheng] (please needinfo)

Comment 30

•

11 years ago

(In reply to Fabrice Desré [:fabrice] from comment #25)
> (In reply to James Zhang from comment #24)
> > When can we land this patch?
> 
> Worst case we'll land Tuesday with the test disabled, sooner if Cervantes
> finds what's going wrong.

It seems to be difficult to debug the test case failure on try server, can we land first and follow up to fix the test case? thanks

status-b2g-v1.3T: --- → affected

Flags: needinfo?(fabrice)

Flags: needinfo?(cyu)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 31

•

11 years ago

Attached patch Fork the browser process from Nuwa (deleted) — Details — Splinter Review

Updated patch for 1.3t:
* Check for dom.ipc.processPrelaunch.enabled before returning null from ContentParent::GetNewOrUsed() so it will work correctly for NUWA + no prelaunch config.
* Don't #ifdef in ContentChild::RecvPBrowserConstructor() so it will work correctly for NUWA + no prelaunch config.

Attachment #8375423 - Attachment is obsolete: true

Attachment #8396307 - Flags: review+

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 32

•

11 years ago

(In reply to Joe Cheng [:jcheng] from comment #30)
> It seems to be difficult to debug the test case failure on try server, can
> we land first and follow up to fix the test case? thanks

Fabrice, if you agree, we can land this on 1.3t with test_Preallocated.html disabled temporarily (the test is never correctly cleaned up, see comment #14). I will fix it and reenable the test in bug 987164.

Flags: needinfo?(cyu)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 33

•

11 years ago

Attached patch Disable test_Preallocated.html (deleted) — Details — Splinter Review

Attachment #8396314 - Flags: review?(fabrice)

[:fabrice] Fabrice Desré

Updated

•

11 years ago

Depends on: 987164

[:fabrice] Fabrice Desré

Comment 34

•

11 years ago

Comment on attachment 8396314 [details] [diff] [review]
Disable test_Preallocated.html

Review of attachment 8396314 [details] [diff] [review]:
-----------------------------------------------------------------

ok, let's land.

Attachment #8396314 - Flags: review?(fabrice) → review+

[:fabrice] Fabrice Desré

Updated

•

11 years ago

Flags: needinfo?(fabrice)

Keven Kuo [:kkuo]

Comment 35

•

11 years ago

Ying, are you okay to uplift it? thanks!

--
Keven

Flags: needinfo?(ying.xu)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 36

•

11 years ago

Attached patch Fork the browser process from Nuwa (for m-c) (deleted) — Details — Splinter Review

Attachment #8396937 - Flags: review+

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 37

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/0c3c8215ef73
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3ddffe9d504
https://hg.mozilla.org/integration/mozilla-inbound/rev/65092435a027

Ryan VanderMeulen [:RyanVM]

Comment 38

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/0c3c8215ef73
https://hg.mozilla.org/mozilla-central/rev/e3ddffe9d504
https://hg.mozilla.org/mozilla-central/rev/65092435a027

Status: NEW → RESOLVED

Closed: 11 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: --- → mozilla31

[:fabrice] Fabrice Desré

Comment 39

•

11 years ago

Cervantes, can you rebase and push to 1.3t? thanks!

Flags: needinfo?(cyu)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 40

•

11 years ago

Push to 1.3t branch.

https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/c6ec532e89ca
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/c9715ee1bdd1
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/0e12c0414846

Flags: needinfo?(cyu)

Cervantes Yu [:cyu] [:cervantes]

Assignee

Updated

•

11 years ago

status-b2g-v1.3T: affected → fixed

ying.xu

Updated

•

11 years ago

Flags: needinfo?(ying.xu)

Kan-Ru Chen [:kanru] (UTC+9)

Updated

•

11 years ago

Depends on: 992853

Ben Kelly [:bkelly, not reviewing]

Updated

•

11 years ago

Blocks: 987556

Ben Kelly [:bkelly, not reviewing]

Comment 41

•

11 years ago

Nom for 1.3? because we will need this to help fix bug 987556 which is a 1.3+ blocker.

blocking-b2g: 1.3T+ → 1.3?

Jason Smith [:jsmith]

Comment 42

•

11 years ago

What's the regression risk analysis on this patch?

Flags: needinfo?(cyu)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 43

•

11 years ago

This is relatively high risk, IMO.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 44

•

11 years ago

Although we're going to have to fix any issues that come up for Tarako since we took this on 1.3T, so ...

Jason Smith [:jsmith]

Comment 45

•

11 years ago

Ok - in that case, we're probably going to need to punt this then. We can't take anything risky at this point. I'll go back & discuss the blocking bug here with Vance here to see if we can get a waiver for it.

Flags: needinfo?(cyu)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 46

•

11 years ago

bkelly has a new theory about the blocking bug, fwiw

Ben Kelly [:bkelly, not reviewing]

Comment 47

•

11 years ago

Yes, let me see if I can get browsermark to run without this.  Its a bit up in the air at the moment.

That sad truth is our baseline memory usage went up from b2g18 to b2g26.  We more or less compensated with NUWA.

That being said, the major issue with the blocking bug is with script source compression being disabled.  I just need to verify if I can get browsermark to run with compression or another fix, but without NUWA.

Cervantes Yu [:cyu] [:cervantes]

Assignee

Comment 48

•

11 years ago

You don't have to really test on Tarako. For a quick verification we can test it on m-c, ./edit-prefs.sh to disable "dom.ipc.processPrelaunch.enabled" and retest.

Andrew Overholt [:overholt]

Comment 49

•

11 years ago

Triage decided to move this to backlog but reevaluate if this is deemed necessary to fix bug 987556.

blocking-b2g: 1.3? → backlog

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

10 years ago

Depends on: 1003046

Nobody; OK to take it and work on it

Updated

•

10 years ago

blocking-b2g: backlog → ---

tracking-b2g: --- → backlog

You need to log in before you can comment on or make changes to this bug.