Closed Bug 941466 Opened 11 years ago Closed 10 years ago

Load preload.js in Nuwa process may freeze Nuwa

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Linux
defect
Not set
normal

Tracking

(blocking-b2g:-)

RESOLVED DUPLICATE of bug 970307
blocking-b2g -

People

(Reporter: kk1fff, Assigned: kk1fff)

References

Details

Attachments

(1 file, 4 obsolete files)

Currently, threads start to be freezed after PreloadSlowThings(). However, there may be some asynchronous events (for example, input stream ready) dispatched to main thread, and some of them will wait for conditions that are designed to be notified by some freezed threads. This will cause main thread be blocked.
Since we can't control what is done deep in PreloadSlowThings(), we plan to buffer the message after the Nuwa process is already frozen. The buffered messages won't be sent to the Nuwa process but instead replicated and sent to the spawned processes. Here are the details: * The Nuwa process calls PreloadSlowThings(). * In PreloadSlowThings(), some script sends an async request to the chrome process. * PrealoadSlowThings() returns, the Nuwa process starts to freeze its threads. * The chrome process fulfills the request on behalf of the Nuwa process and tries to send an async response back. The current problem is as follows: * The Nuwa process receives the response and notifies the main thread. * The main thread requests another thread, which is already frozen, to perform the request and waits for completion. * The main thread is blocked forever. The proposed solution is as follows: * After PreloadSlowThings(), send out a request (e.g. BeginNuwaPreparation) to the chrome process. * The chrome process receives the request and starts to buffer the messages after BeginNuwaPreparation except NuwaFork. * The chrome process acks BeginNuwaPreparation. * The Nuwa process starts to freeze the threads after receiving the ack. * The Nuwa process sends out NuwaReady as before. * The chrome process sends out NuwaFork, receives AddNewProcess, and then replays the buffered message. The potential pitfall of this approach is that if the buffered messages to be replayed differs for each process, this approach will fail. If this ever happens, what is called in PreloadSlowThings() that results in this message is not safe for the Nuwa process. It should be taken out from the preload.
I'm very concerned about trying to fix this quickly for 1.3. I have two thoughts: 1. What if we just never call PreloadSlowThings for now? Does that make the memory savings unacceptable? 2. What if we remove anything from PreloadSlowThings that sends an async message to the parent? We could enforce with a hard abort. These are obviously not great long term solutions but I think they're much safer than trying to alter MessageChannel for buffering messages in the 1.3 timeframe.
Some experiment results: Without PreloadSlowThings(): APPLICATION PID Vss Rss Pss Uss cmdline Homescreen 21514 36804K 30204K 16533K 12792K /system/b2g/plugin-container Settings 22632 30964K 30960K 15864K 11360K /system/b2g/plugin-container Calendar 23004 30116K 30112K 15143K 10728K /system/b2g/plugin-container (Nuwa) 21187 20088K 20088K 6709K 444K /system/b2g/plugin-container (Preallocated a 23248 11940K 11940K 3776K 436K /system/b2g/plugin-container ------ ------ ------ 67313K 44596K TOTAL With PreloadSlowThings(): APPLICATION PID Vss Rss Pss Uss cmdline Homescreen 14429 35588K 28988K 15243K 11728K /system/b2g/plugin-container Settings 15670 29532K 29528K 14444K 10232K /system/b2g/plugin-container Calendar 15828 28056K 28052K 13059K 8916K /system/b2g/plugin-container (Nuwa) 14106 25404K 25404K 10476K 4616K /system/b2g/plugin-container (Preallocated a 15953 14632K 14632K 5556K 3088K /system/b2g/plugin-container ------ ------ ------ 68077K 47396K TOTAL It's about 1.x MB per app in USS. We can turn off PreloadSlowThings() and land the enablement. In the meantime, we are going to look for async events sent from PreloadSlowThings() so we pick them out.
Messages after PreloadSlowThings called. TabChild::PreloadSlowThings() [time:1385558116607387][378][PHalChild] Sending Msg_GetCurrentScreenConfiguration([TODO]) [time:1385558116611593][378][PHalChild] Received reply Reply_GetCurrentScreenConfiguration([TODO]) [time:1385558116614448][378][PContentChild] Sending Msg_ReadFontList([TODO]) [time:1385558116618122][378][PContentChild] Received reply Reply_ReadFontList([TODO]) [time:1385558116939452][378][PContentChild] Sending Msg_SyncMessage([TODO]) [time:1385558116951151][378][PContentChild] Received reply Reply_SyncMessage([TODO]) [time:1385558116998537][378][PContentChild] Sending Msg_PIndexedDBConstructor([TODO]) [time:1385558117048892][378][PNeckoChild] Sending Msg_PCookieServiceConstructor([TODO]) [time:1385558117072046][378][PContentChild] Sending Msg_ReadPermissions([TODO]) [time:1385558117085069][378][PContentChild] Received reply Reply_ReadPermissions([TODO]) [time:1385558117334462][378][PContentChild] Sending Msg_PStorageConstructor([TODO]) [time:1385558117622830][378][PContentChild] Received Msg_LoadAndRegisterSheet([TODO]) [time:1385558117647017][378][PContentChild] Received Msg_LoadAndRegisterSheet([TODO]) [time:1385558117649317][378][PContentChild] Received Msg_LoadAndRegisterSheet([TODO]) [time:1385558117652447][378][PContentChild] Received Msg_AsyncMessage([TODO]) [time:1385558117663834][378][PContentChild] Received Msg_FileSystemUpdate([TODO]) [time:1385558117665284][378][PStorageChild] Received Msg_ScopesHavingData([TODO]) [time:1385558117666088][378][PContentChild] Sending Msg_NuwaReady([TODO]) <----------- Threads are frozen [time:1385558117693758][378][PContentChild] Received Msg_NotifyProcessPriorityChanged([TODO]) [time:1385558117694639][378][PContentChild] Received Msg_CancelMinimizeMemoryUsage([TODO]) [time:1385558117694898][378][PContentChild] Received Msg_AsyncMessage([TODO]) [time:1385558117695214][378][PContentChild] Received Msg_NuwaFork([TODO])
Depends on: 944665
Assignee: nobody → pwang
blocking-b2g: --- → 1.3?
Whiteboard: [tarako]
hi Patrick, can you confirm if this is still needed? during triage, it is believed that this bug is no longer needed
Flags: needinfo?(pwang)
(In reply to Joe Cheng [:jcheng] from comment #5) > hi Patrick, can you confirm if this is still needed? during triage, it is > believed that this bug is no longer needed We disabled PreloadSlowThings on Nuwa since it breaks test cases on TBPL. Re-enabling preloading on Nuwa can save more memory, I think this is what we should do. However, Nuwa works without this, so I don't think it is necessary for 1.3.
Flags: needinfo?(pwang)
Patrick, we must check that not running PreloadSlowThings doesn't cause regressions in app startup time though.
We still have PreloadSlowThings, although it is not called in Nuwa process. After Nuwa produced a Preallocated process, we call PreloadSlowThings() like original preallocated process does.
Patrick, Please provide next steps and ETA.
Flags: needinfo?(pwang)
Our plan is to separate PreloadSlowThings into two parts: we only run part of PreloadSlowThings that doesn't make Nuwa frozen before Nuwa start to freeze thread, and the rest is run after new process created. I will figure out which part we can load this week.
Flags: needinfo?(pwang)
Attached patch WIP (obsolete) (deleted) — Splinter Review
Attached patch WIP (obsolete) (deleted) — Splinter Review
Attachment #8349238 - Attachment is obsolete: true
Attachment #8349241 - Attachment is obsolete: true
Did some tests for separating the preload slow things: Disable preload (i.e. preload everything after preallocated process launched): APPLICATION PID Vss Rss Pss Uss cmdline b2g 108 77156K 67148K 53027K 48948K /system/b2g/b2g Homescreen 352 32208K 32208K 19134K 16160K /system/b2g/plugin-container Settings 423 30464K 30460K 15051K 10652K /system/b2g/plugin-container Usage 349 27188K 27188K 14717K 12060K /system/b2g/plugin-container Calendar 497 32432K 28232K 14119K 10820K /system/b2g/plugin-container (Preallocated a 516 16552K 16548K 5987K 2784K /system/b2g/plugin-container (Nuwa) 327 18944K 18944K 5793K 1804K /system/b2g/plugin-container ------ ------ ------ 139744K 113324K TOTAL Move following part into preload2.js: 1. Preloading IndexedDBHelper.jsm 2. Preloading SettingsDB.jsm 3. initializing cooking service 4. initializing preference service 5. calling docShell.createAboutBlankContentViewer and load preload2.js after preallocated process launched, the procrank result: APPLICATION PID Vss Rss Pss Uss cmdline b2g 620 76828K 66832K 52620K 48520K /system/b2g/b2g Homescreen 680 32116K 32116K 19014K 16040K /system/b2g/plugin-container Settings 753 30048K 30044K 14646K 10260K /system/b2g/plugin-container Usage 677 27136K 27136K 14641K 11984K /system/b2g/plugin-container Calendar 775 31748K 27548K 13595K 10360K /system/b2g/plugin-container (Nuwa) 661 21908K 21904K 8091K 4596K /system/b2g/plugin-container (Preallocated a 792 16092K 16088K 5828K 3372K /system/b2g/plugin-container ------ ------ ------ 138179K 113696K TOTAL Saved about 480k per process.
Patrick, did you do startup timings? That would be nice to know how we do with and without preload2.js That's unfortunate that we need to move docShell.createAboutBlankContentViewer() into preload2.js since that's one the main responsible for startup time if I recall correctly.
We measured app start-up time for following case: 1. Preload after preallocate process launched, i.e. without any patch in this bug. cold_load_time: median:1068, mean:1071, std: 54, max:1293, min:967 2. Preload staff that don't send asynchronous messages before Nuwa ready, and preload preload2.js after preallocate process launched, i.e. with these two patches. cold_load_time: median:1309, mean:1287, std: 49, max:1337, min:1162 3. Preload staff that don't send asynchronous messages before Nuwa ready, but don't call preload2.js cold_load_time: median:1362, mean:1339, std: 46, max:1395, min:1208 The result shows that the first case is the fastest in terms of app start-up time, and it takes 300 more ms to start in the 2nd case. The 300ms value is almost constant in our test on Setting, Clock and Calendar app. Based on the 2nd case, we also did some experiments on changing the timing of calling GC. If we don't call GC before Nuwa ready and call GC after preallocated process launched, the startup time decreased by about 100 ms. If we call GC before Nuwa ready and after preallocated process launched, the startup time decreased by about 100 ms as well. Based on the result, we presume that the initialization process of an app may modify the values that have been preloaded. For the 2nd case and the 3rd case, these actions cause memory copies, which introduce latency during app startup. But for the first case, since the preloading occurs after fork, they won't trigger memory copy.
hi Patrick, do you mind giving an update on how to move forward with this bug? thanks
Flags: needinfo?(pwang)
(In reply to Fabrice Desré [:fabrice] from comment #16) > Patrick, did you do startup timings? That would be nice to know how we do > with and without preload2.js > > That's unfortunate that we need to move > docShell.createAboutBlankContentViewer() into preload2.js since that's one > the main responsible for startup time if I recall correctly. I took measurement of how long it takes. In the preallocated process, nsDocShell::CreateAboutBlankContentViewer() is called twice, taking ~ 35 and 25 ms, respectively. And preload2.js runs in the preallocated process, that is, it is already executed when an app is launched. I don't think it should have such significant impact to app launch time.
COW doesn't fully explain how such big slowdown came to be. Bug 948648 comment #5 has some test result.
(In reply to Joe Cheng [:jcheng] from comment #18) > hi Patrick, do you mind giving an update on how to move forward with this > bug? thanks We are still investigating the cause of longer launch time. I am trying to figure out which scripts trigger COW. As Cervantes mentioned in comment 20, longer launch time may be the result of many factors, not just COW. We will need more investigation on root cause to avoid slowing down the app launch time.
Flags: needinfo?(pwang)
(In reply to Patrick Wang (ChihKai) (:kk1fff) from comment #6) > (In reply to Joe Cheng [:jcheng] from comment #5) > > hi Patrick, can you confirm if this is still needed? during triage, it is > > believed that this bug is no longer needed > > We disabled PreloadSlowThings on Nuwa since it breaks test cases on TBPL. > Re-enabling preloading on Nuwa can save more memory, I think this is what we > should do. However, Nuwa works without this, so I don't think it is > necessary for 1.3. Marking blocking- per this comment
blocking-b2g: 1.3? → -
add [POVB] to note that this can go into Tarako branch
Whiteboard: [tarako] → [tarako][POVB]
Summary: Freezing threads too early makes main thread blocked → Load preload.js in Nuwa process may freeze Nuwa
As Cervantes' test (Bug 957509 comment 7), loading Preload in Nuwa but not loading Preload2 in Preallocated Process saves more USS in Preallocated process. According to experiment in comment 17, preloading only first stage should be a better way to preload than preload in preallocated process under the condition of not blocking Nuwa.
Patrick, any update?
I pushed my patch to try and got failure in test cases, I think that it is because the patch modifies the BrowserElementChildPreload stuff, still investigating.
I just get an idea of this bug. If I am right, every IPC requests from content process would have one or more tasks/runnables in the chrome process, in queue or running, until they are done. (I means all tasks for IPC messages are done.) The IPC messages send back to the content process only before done. That means once all queues of the chrome process are empty no any messages would be sent to the Nuwa process in the future, except requests for forking a new process. The idea is to let the Nuwa process keep awake until the first time all task queues of the chrome are empty, then the Nuwa process could go to frozen.
remove [tarako] whiteboard as it's not going to be in time for tarako timeframe
Whiteboard: [tarako][POVB]
Separating preload.js would help reduce USS of preallocated process. We can take solution in comment 27 as a long term solution and separate the script to make it not freeze nuwa for now.
Attachment #8349970 - Attachment is obsolete: true
Attachment #8349971 - Attachment is obsolete: true
Blocks: 970307
Try run without preload2. (preload scripts that don't freeze nuwa only) https://tbpl.mozilla.org/?tree=Try&rev=99a6df383e1f
No longer blocks: 970307
This bug is for temporary solution for preload in Nuwa. However, we have decided to focus on bug 970307, which is filed for a more complete solution.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: