Closed Bug 921659 Opened 11 years ago Closed 11 years ago

Preallocated plugin container consumes more memory in FFOS 1.2 than in FFOS 1.1

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:-)

RESOLVED WORKSFORME
1.3 C2/1.4 S2(17jan)
blocking-b2g -

People

(Reporter: tkundu, Assigned: khuey)

References

Details

(Keywords: perf, regression, Whiteboard: [MemShrink:P1] [c=memory p= s= u=])

Attachments

(1 file, 9 obsolete files)

I launched multiple apps and ran |adb shell procrank| on both FFOS 1.1 and 1.2.

I found that the preallocated app uses memory as follows:

In 1.1 : 

Preallocated app PSS (KB)	Preallocated app USS (KB)
10516	                  7964
10028	                  7968
9689	                  7960
9455	                  7964
9453	                  7964
9453	                  7968
9369	                  8068
9870	                  8640
9529	                  7960

In 1.2 :

Preallocated app PSS (KB)	Preallocated app USS (KB)
16052	                11552
14835	                11432
13961	                11220
13779	                11456
13368	                11456
13248	                11456
18781	                16872
17808	                14960
18853	                16872
18274	                16040


It seems that some Gecko changes are causing the preallocated process to use more USS memory in 1.2 than in 1.1.

Can anyone tell us what those changes are? It would be great if we could reduce this.
Note - memory issues are tracked with the MemShrink whiteboard; perf is used for responsiveness issues. This keeps memory issues separate from the rest of the performance issues, so that the MemShrink team is aware of memory-specific issues they may need to investigate.
Keywords: perf
Whiteboard: [MemShrink]
Keywords: perf
See comment 1.
Keywords: perf
I don't see any harm in tagging with both??
blocking-b2g: --- → koi?
Adding 'perf' so we can review it.
'regression' because it is worse than in the previous release.
Keywords: perf, regression
(In reply to Tapas Kumar Kundu from comment #0)
> I launched multiple apps and run |adb shell procrank| in both FFOS 1.1 and
> 1.2 . 
> 
> I found that Preallocated app is taking memory as below :

Can we get a dashboard that tracks that to make sure we don't keep regressing that more?
Hi, the patch from bug 811671 improves memory usage for the preallocated process.
(In reply to Tapas Kumar Kundu from comment #6)
> Hi, the patch from bug 811671 improves memory usage for the preallocated
> process.

Yes, but we still need to understand (and fix if possible) the regression.
Whiteboard: [MemShrink] → [MemShrink] [c=memory p= s= u=]
Can we get about:memory dumps from the preallocated process on 1.1 and 1.2?
Assignee: nobody → mchang
Assignee: mchang → nobody
blocking-b2g: koi? → koi+
I will get about:memory dumps on hamachi for v1.1 and v1.2.
Assignee: nobody → mchang
Priority: -- → P1
Status: NEW → ASSIGNED
Whiteboard: [MemShrink] [c=memory p= s= u=] → [MemShrink] [c=memory p=2 s= u=]
Attached file About Memory dumps v1.2 vs v1.1 (obsolete) (deleted) —
Attached are about:memory dumps from an otoro device on branch v1.2 versus v1.1-train. I rebooted the device, then ran tools/get_about_memory.py. It does indeed look like there is a regression in the preallocated process: v1.2 uses 8.23 MB in explicit allocations versus 5.11 MB on v1.1. Can I help with anything else?
Flags: needinfo?(khuey)
Mason, I can't get the 1.2 dump to load in my nightly : "Invalid memory report(s): non-sentence other description: redundant/js-main-runtime-compartments/system,"
Fabrice, yeah I had that issue as well. The about:memory report only works on Aurora for me.
Target Milestone: --- → 1.2 C3(Oct25)
Whiteboard: [MemShrink] [c=memory p=2 s= u=] → [MemShrink:P1] [c=memory p=2 s= u=]
(In reply to Fabrice Desré [:fabrice] from comment #5)
> (In reply to Tapas Kumar Kundu from comment #0)
> 
> Can we get a dashboard that tracks that to make sure we don't keep
> regressing that more?

Bug 917717 has been filed for that.
Thanks Mason.

The most interesting part:

3,281,372 B (100.0%) -- explicit
├──1,501,156 B (45.75%) -- workers/workers()/worker(resource://gre/modules/osfile/osfile_async_worker.js, 0xNNN)

:-(
Flags: needinfo?(khuey)
Yoric do you know if anything on b2g is actually using OS.file?  I suspect the answer is yes, but if it's no this would be easy to fix ...
Flags: needinfo?(dteller)
Mason is no longer the right owner for this.  I'll take it for now.
Assignee: mchang → khuey
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> Yoric do you know if anything on b2g is actually using OS.file?  I suspect
> the answer is yes, but if it's no this would be easy to fix ...

And to be clear, this is anything in the child process, which afaik can't actually do I/O ...
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #17)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> > Yoric do you know if anything on b2g is actually using OS.file?  I suspect
> > the answer is yes, but if it's no this would be easy to fix ...
> 
> And to be clear, this is anything in the child process, which afaik can't
> actually do I/O ...

Kyle, I'll take a look tomorrow at what we preload in child processes. We may import a JSM that can run in both the child and the parent and does I/O in the parent.
We should explore making osfile.jsm create the worker lazily too.
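A minimal sketch of the lazy-creation idea (not the actual osfile.jsm code; the helper name is made up), spawning the ChromeWorker only when the first OS.File request needs it, so a process that never touches the filesystem never pays for it:

// Sketch: defer worker creation until first use.
let worker = null;
function getWorker() {
  if (!worker) {
    worker = new ChromeWorker(
      "resource://gre/modules/osfile/osfile_async_worker.js");
  }
  return worker;
}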
Attached patch patch (obsolete) (deleted) — Splinter Review
Kyle, can you check if this patch fixes the issue?
Attachment #817624 - Flags: feedback?(khuey)
It is very unclear to me what happens in child processes and what happens in the parent, so I'm afraid I can't tell whether B2G child processes use OS.File, sorry.

A few things, though:
- unless the implementation of workers is different between Firefox and B2G, the worker should be created lazily by the underlying C++ code;
- 1.5 MB of memory used by OS.File looks very surprising – could this be related to bug 894993?
Flags: needinfo?(dteller)
Comment on attachment 817624 [details] [diff] [review]
patch

Mason can you try this?  I don't have a 1.2 build env available.
Attachment #817624 - Flags: feedback?(khuey) → feedback?(mchang)
Attached file fabriceData.zip (obsolete) (deleted) —
Just uploaded the about:memory dumps from an otoro for v1.2. v1.2-init is a fresh flash of an otoro device; v1.2fix includes Fabrice's patch. Memory usage of the pre-allocated process is roughly the same (8.46 MB pre-patch, 8.37 MB post-patch). Per khuey's comment 14, the worker resource still exists in both about:memory dumps. 

For Fabrice's patch, I also attached an adb logcat. I do not see the dump error if a parent does not exist. At the moment, it looks like the patch does not resolve the issue.
Yep, that confirms what b2g-info is showing. I'll dig a bit more...
Attachment #817624 - Flags: feedback?(mchang)
So, the worker usage comes from netwerk/protocol/http/UserAgentUpdates.jsm, which loads the ua-update.json file from disk (it has special permissions so that content processes can read it).

This file uses OS.File, and the worker is never shut down :( I see two options here:
1) the OS.File team provides a way to shut down the worker.
2) we rewrite the file access code in UserAgentUpdates.jsm to not use OS.File anymore.

Yoric, can you tell us whether 1) is feasible?
Flags: needinfo?(dteller)
Also, even with that worker issue fixed, we won't be back at 1.1 level it seems.
That should be possible. I assume that you want the worker to be reloaded transparently next time it is needed, don't you?
Can you file a bug and mention your deadline?
Flags: needinfo?(dteller)
Depends on: 927560
(In reply to David Rajchenbach Teller [:Yoric] from comment #28)
> That should be possible. I assume that you want the worker to be reloaded
> transparently next time it is needed, don't you?

I want things to work, yes ;)

> Can you file a bug and mention your deadline?

I filed bug 927560.
We need to backport that to b2g 1.2... so we need this fix asap.
For what it's worth: Child processes shouldn't be opening files themselves; any interaction with the filesystem should go through the parent.  Eventually, it will be the case that if the child process tries to use the open() system call, it will be immediately killed by the OS, but there are a few known issues in Gecko or underlying libraries (media cache, font loading, graphics drivers) that need to be dealt with first.
(In reply to Fabrice Desré [:fabrice] from comment #27)
> Also, even with that worker issue fixed, we won't be back at 1.1 level it
> seems.

No, although getting rid of the worker should get us at least half the way there.
Depends on: 927633
It's not clear to me how the worker could take 1.5 MB, though.
(In reply to David Rajchenbach Teller [:Yoric] from comment #32)
> It's not clear to me how the worker could take 1.5 MB, though.

Workers need their own JS runtime. Also, if you're using js-ctypes, I've heard that it is not the most memory-efficient API we have.
Actually, that makes sense.

For reference, on my desktop, osfile_async_worker.js takes 1.94 MB after ~4 days of use:
- 0.89 MB runtime;
- three zones of 0.25 to 0.4 MB each.

I am just a little surprised by these three zones. I was certain that workers only had one zone/one compartment each. Maybe there is something that can be improved there.
Yes, nbp is working on workers memory improvements.
Actually, come to think about it, it may be possible to bring the memory down by making each js-ctypes binding lazy (bug 801376). Tell me if you need me to focus on that.
Flags: needinfo?(fabrice)
(In reply to David Rajchenbach Teller [:Yoric] from comment #34)
> Actually, that makes sense.
> 
> For reference, on my desktop, osfile_async_worker.js takes 1.94 MB after ~4
> days of use:
> - 0.89 MB runtime;
> - three zones of 0.25 to 0.4 MB each.
> 
> I am just a little surprised by these three zones. I was certain that
> workers only had one zone/one compartment each. Maybe there is something
> that can be improved there.

There's one for the content JS, one for atoms, and one for self-hosted JS.
(In reply to David Rajchenbach Teller [:Yoric] from comment #36)
> Actually, come to think about it, it may be possible to bring the memory
> down by making each js-ctypes binding lazy (bug 801376). Tell me if you need
> me to focus on that.

I don't think it's needed for us here. We'll need to do bug 927633 sooner or later.
Flags: needinfo?(fabrice)
Attached file User Agent Moved to Parent Process about:memory dump (obsolete) (deleted) —
Per the bug 927633 patch, attached is an updated about:memory dump. By moving the UserAgent logic to use IPC rather than file reading/writing in every child process, memory usage for a preallocated app is reduced to 6.15 MB! So to recap, the memory usage for a preallocated app:

v1.2 - 8.23 MB
v1.1 - 5.11 MB
v1.2 w/ UserAgent moved to parent - 6.15 MB
Attachment #820805 - Attachment description: User Agent Moved to Parent Process → User Agent Moved to Parent Process about:memory dump
Thanks Mason.  That's great: a 2/3rds reduction in the memory increase.

After that we're left with an increase of 1,106,364 B (100.0%) -- explicit.  That's scattered across a lot of things though ... it might even be noise.  I think after we fix the worker we should close this as fixed or retriage it off the koi+ list.
Can we retest here now that bug 927560 has landed?  Hopefully that buys us everything bug 927633 does.
Flags: needinfo?(mchang)
Bug 927560 does *not* shut down the worker automatically. We still need ~1 line of code in the right place to shut down the worker once it is not needed anymore. I have no idea where that line is needed, though.
There are uses of OS.File in UserAgentUpdates.jsm and AppsUtils.jsm. It's not clear to me in which order these things are called. Could someone who knows the code better than I do add a call to |OS.File.resetWorker()| in the right place?

Note that this method will throw (asynchronously) if some files are not closed yet.
Alternatively, I could add a pref-controlled mechanism to auto-close the worker. That would take a little more time, though.
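For concreteness, the kind of one-liner being discussed might look like the sketch below; the placement is exactly the open question above, and this assumes resetWorker returns a promise that rejects (rather than throwing synchronously) when files are still open:

// Sketch: once this module no longer needs OS.File, ask for the worker to be
// shut down, and tolerate the asynchronous error raised if another consumer
// still has a file or directory open.
OS.File.resetWorker().then(
  function onReset() { dump("OS.File worker shut down\n"); },
  function onError(e) { dump("Could not shut down the OS.File worker yet: " + e + "\n"); }
);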
Does resetWorker affect the whole world?  That is, do we have to coordinate all the modules that use OS.File to shut down the worker?
Yes. OS.File.resetWorker shuts down the worker for everyone. The worker is relaunched transparently as needed. The other constraint is that you can't shut down the worker while a file or directory is open.
Can each module just call it when they're done with it, and if some other module is still using it just swallow the exception?  That seems like the best way to use it to me.
Unless I misunderstand your proposal, that sounds fragile/complicated/invasive.
Attached patch closeFile.patch (obsolete) (deleted) — Splinter Review
Patch to close file worker with bug 927560.
Attached file workers.zip (obsolete) (deleted) —
Attached are the about:memory reports from a fresh mozilla-central build with and without attachment 822649 [details] (closeFile.patch). The "fix" directory is with patch 822648 applied. The "init" directory is a clean mozilla-central build which includes the landed bug 927560. It shows that by explicitly calling resetWorker, we clean up the worker in a preallocated app on a fresh boot, and it no longer shows up in the about:memory profile.

Sidenote - the patch in bug 927560 does not apply cleanly to mozilla-aurora, so I had to use mozilla-central, which consumes more memory for a preallocated app in general.

There are a few issues that I can see with this fix versus bug 927633.
1) According to adb logcat, we re-initialized/closed the worker twice after a fresh 'adb reboot'. In general, my understanding is that creating a worker every time we start a process just to close it will add a performance penalty to launching new apps. Simplified logcat dump:

I/Gecko   (  339): Reset worker in apply saved update
I/Gecko   (  339): Cleaned up the worker
....
I/Gecko   (  446): Reset worker in apply saved update
I/Gecko   (  446): Cleaned up the worker

2) We could have a lingering memory leak if any module is reading from a file, since the worker is shared. Maybe we could create a new bug to track the work Kyle mentioned in comment 48? I heard that due to an upcoming sandboxing policy, children won't be able to read from files soon anyway? Is that true?

3) The problem with bug 927633 is that it requires more CPU cycles to make a sync call over IPC than the previous implementation did. I think bug 927633 buys more overall, though, because (a) we don't have an explicit file open/close in every process, which delays app startup, and (b) the performance slowdown occurs only if one specific call path is taken (navigator.userAgent). Although this equation could flip, and reading/closing the file would be better if the app reads navigator.userAgent all the time.

That's my current understanding of the b2g system, but I'm new so please let me know if I'm way off!
Flags: needinfo?(mchang)
I'm not a big fan of comment 48, but here's how we can do it.

I assume that we know the list of OS.File clients. We can assign each client a promise to be resolved once it doesn't use OS.File anymore, and attach Promise.all(...).then(OS.File.resetWorker).
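A rough sketch of that coordination, with hypothetical "whenDone" promises standing in for whatever signal each OS.File client would expose:

// Sketch: each client resolves its promise once it no longer needs the file
// worker; when all of them have finished, shut the worker down.
let clientsDone = [
  UserAgentUpdates.whenDone, // hypothetical promise
  AppsUtils.whenDone,        // hypothetical promise
];
Promise.all(clientsDone).then(function () {
  return OS.File.resetWorker();
});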
> 1) According to adb logcat, we re-initialized/closed the worker twice after a fresh 'adb reboot'. In general, my understanding is that creating a worker every time we start a process just to close it will add a performance penalty to launching new apps.

I'm not sure I understand the "creating a worker every time we start a process just to close it", but yes, creating a worker certainly costs performance.

Other than that, I'm not sure how I can be of further assistance on this bug. Please ping me / needinfo? me if you think I can help.
> I'm not sure I understand the "creating a worker every time we start a
> process just to close it", but yes, creating a worker certainly costs
> performance.

If I'm reading the code correctly, preload.js (used when a new process is created) inits UserAgentOverrides, which inits UserAgentUpdates, which reads the file ua-update.json. Since we read the file from disk, we create the OS.File worker, read our file, then close the worker again. Ideally, we could avoid this by just having the worker open once on device power-up and asking the worker to read the file when we need it, thereby avoiding the worker creation/shutdown cost every time we start a process.
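To illustrate the "read once in the parent" direction (a sketch of the idea only, not the actual bug 927633 patch; the message name and caching variable are made up):

const { classes: Cc, interfaces: Ci } = Components;

// Parent process: read ua-update.json once here (so OS.File and its worker
// only ever live in the parent) and answer child queries over IPC.
let gCachedOverrides = null; // filled in once by the parent's own reader
const ppmm = Cc["@mozilla.org/parentprocessmessagemanager;1"]
               .getService(Ci.nsIMessageBroadcaster);
ppmm.addMessageListener("UAUpdates:GetOverrides", function () {
  return gCachedOverrides; // returned to the child's sendSyncMessage call
});

// Child process: ask the parent instead of opening the file (and the worker).
const cpmm = Cc["@mozilla.org/childprocessmessagemanager;1"]
               .getService(Ci.nsISyncMessageSender);
let overrides = cpmm.sendSyncMessage("UAUpdates:GetOverrides")[0];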

It looks like bug 927633 will require a bit more work, so as a temporary solution, we could use the fix in bug 927560. Kyle, what do you think?
Flags: needinfo?(khuey)
I concur that having a single OS.File worker in the parent process would certainly be more efficient.
Updating Target Milestone for FxOS Perf koi+'s.
Target Milestone: 1.2 C3(Oct25) → 1.2 C4(Nov8)
Whiteboard: [MemShrink:P1] [c=memory p=2 s= u=] → [MemShrink:P1] [c=memory p=2 s= u=1.2]
Attached file workers.zip (obsolete) (deleted) —
Bug 927633 has landed. Attached are updated about:memory numbers with the landed patch. 

To summarize the memory size for the preallocated app:
v1.1 - 5.11 MB
v1.2 orig - 8.72 MB
v1.2 w/ bug 927633 - 6.94 MB

So we still have a regression, but do we keep this koi+, or close it as fixed as Kyle stated in comment 40?
Attachment #822656 - Attachment is obsolete: true
Well we certainly can't close this until bug 927633 lands on 1.2.
Flags: needinfo?(khuey)
Can we get somebody from QA to figure out how we look now?  The numbers seem to be a bit noisy, so a few measurements each on 1.1, 1.2 without bug 927633, and 1.2 with 927633 would be nice.
Flags: needinfo?(jsmith)
Keywords: qawanted
Sure. I'll find someone to look into this.
Flags: needinfo?(jsmith)
QA Contact: sarsenyev
I have just looked at the memory used by OS.File on my system, and it seems to have decreased, presumably due to bug 801376.
Whiteboard: [MemShrink:P1] [c=memory p=2 s= u=1.2] → [MemShrink:P1] [c=memory p= s= u=1.2]
Target Milestone: 1.2 C4(Nov8) → 1.2 C5(Nov22)
Here are a couple of measurements for 1.1 and 1.2 on a Leo device after launching multiple apps:

Preallocated app on 1.1
     Pss      Uss  
   8484K    7872K  
   8498K    7888K  
   9010K    7876K  
   9010K    7876K  
   10807K   10080K  

Preallocated app on 1.2 after the bug 927633 fix landed
     Pss      Uss
   10529K    9064K  
   10558K    9092K  
   10540K    9076K 
   10532K    9068K 
   10537K    9072K  
1.1
Device: Leo 1.1 MOZ RIL
BuildID: 20131108041203
Gaia: d9c6b0c3384625e467f892c9dafbc40cfe132180
Gecko: 31fa87bfba88
Version: 18.0
Firmware Version:V10c

1.2 
Device: Leo 1.2 MOZ RIL
BuildID: 20131108004004
Gaia: 4cf40fb30c7b8380ea83ed0d4efe043b8b81737f
Gecko: a886c641a306
Version: 26.0
Firmware Version:V10c
Keywords: qawanted
That's not sufficient for the QA Wanted request. You need to include preallocated app values before bug 927633 landed as well & indicate how you measured it, how often you repeated the test, etc.
Keywords: qawanted
Attached file b2g-procrank1.1&1.2.zip (obsolete) (deleted) —
As per comment 63, adding additional info on the measurement of the preallocated app.
For the test I used two commands: "adb shell b2g-procrank" and "adb shell procrank".
I repeated each step 5 times, restarting the device each time.
A Leo device was used for the test; for some reason the "procrank" command doesn't work on a Buri device.

Here is the measurement on Leo 1.2 before the fix from bug 927633 landed:
Preallocated   13613K   12084K  
               12321K   11328K  
               12831K   11300K 
               12905K   11376K 
               12292K   11300K
Keywords: qawanted
The info about the last build before the fix landed:
Device: Leo 1.2 MOZ RIL
BuildID: 20131107004003
Gaia: 590eb598aacf1e2136b2b6aca5c3124557a365ca
Gecko: 26f1e160e696
Version: 26.0
Firmware Version: V10c
Moving to 1.3
blocking-b2g: koi+ → 1.3?
Whiteboard: [MemShrink:P1] [c=memory p= s= u=1.2] → [MemShrink:P1] [c=memory p= s= u=]
Target Milestone: 1.2 C5(Nov22) → 1.3 Sprint 5 - 11/22
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #59)
> Can we get somebody from QA to figure out how we look now?  The numbers seem
> to be a bit noisy, so a few measurements each on 1.1, 1.2 without bug
> 927633, and 1.2 with 927633 would be nice.

Kyle, now that QA has responded to your qawanted can you take a look at this?
Flags: needinfo?(khuey)
Target Milestone: 1.3 Sprint 5 - 11/22 → ---
So fixing the OS.File thing shaved about 2.3 MB off the USS increase in the preallocated app process.  We still have a 1.1 MB USS increase from 1.1 to 1.2.
Flags: needinfo?(khuey)
Can we get another set of about:memory dumps from 1.1 and the tip of 1.2?
Flags: needinfo?(sarsenyev)
The memory dumps from the latest Buri 1.2
13104K   10184K 
10773K    9116K
9318K    8156K 
12097K   10212K 
12236K   10596K 

The memory dumps from the latest Leo 1.1
9061K    8004K 
9176K    8004K 
9168K    7996K 
10609K    9296K 
9172K    8000K
Flags: needinfo?(sarsenyev)
No, I'm not talking about just the summary numbers.  I'm talking about the memory-reports JSON file that get_about_memory.py grabs.  Like what mchang collected in comment 10.
Flags: needinfo?(sarsenyev)
Attached file The latest-1.2VS1.1.zip (obsolete) (deleted) —
The latest memory reports are in the latest attachment ("latest-1.2VS1.1.zip").
Flags: needinfo?(sarsenyev)
Kyle,

Do we have the requisite info to make progress?

Which component is the bug seen in?
Flags: needinfo?(khuey)
The latest reports show that the USS of the preallocated app process has increased by 1.9 MB from 1.1 to 1.2.

The about:memory reports show that the explicit/ tree has increased by 1.1 MB.  But 0.6 MB of that is from the reporters added in Bug 898558, which moved reporters from outside the explicit tree to inside the explicit tree.  So really, the explicit/ tree has increased 0.5 MB and the other 1.4 MB increase comes from somewhere outside the normal heap.

Continuing to investigate ...
Flags: needinfo?(khuey)
Minusing from 1.3 since major causes have been identified and addressed. Remaining issues likely have multiple causes that will take time to resolve. See comment 39, comment 40 & comment 41 for details.

Investigation will continue alongside development of the memory monitoring and debugging tools covered by bug 917717 and bug 945973.
blocking-b2g: 1.3? → -
Attached file 11v12.zip (deleted) —
The lack of the smaps reporter made diagnosing this further a pain, so I added it back in.  The result is that almost all of the remaining memory regression is caused by additional private bytes attributed to libxul.so.  Since libxul increased in size by 33% from Gecko 18 to Gecko 26, it's not that surprising.  I'm going to build with prelink enabled to see if that reduces the regression.
Attachment #815496 - Attachment is obsolete: true
Attachment #817624 - Attachment is obsolete: true
Attachment #817948 - Attachment is obsolete: true
Attachment #820805 - Attachment is obsolete: true
Attachment #822648 - Attachment is obsolete: true
Attachment #825537 - Attachment is obsolete: true
Attachment #829477 - Attachment is obsolete: true
Attachment #8342571 - Attachment is obsolete: true
Is there anything else actionable here, or can it be closed?
Yeah, I think we're done here.  The remaining regression is accounted for by increased binary size and is mitigated by prelinking.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Target Milestone: --- → 1.3 C2/1.4 S2(17jan)