Closed Bug 957346 Opened 11 years ago Closed 11 years ago

[B2G][Music]The music application terminates when playing the game "Poppit"

Categories

(Firefox OS Graveyard :: Gaia::Music, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:1.3+, firefox27 wontfix, firefox28 fixed, firefox29 fixed, b2g-v1.2 unaffected, b2g-v1.3 fixed, b2g-v1.4 fixed)

RESOLVED FIXED
1.3 C2/1.4 S2(17jan)
blocking-b2g 1.3+
Tracking Status
firefox27 --- wontfix
firefox28 --- fixed
firefox29 --- fixed
b2g-v1.2 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.4 --- fixed

People

(Reporter: selkabule, Assigned: bkelly)

References

Details

(Keywords: perf, regression, Whiteboard: [c=memory p=5 s= u=1.3] dogfood1.3 [MemShrink])

Attachments

(2 files, 2 obsolete files)

Attached file Music cat log.txt (deleted) —
Description:
while playing a song in the background and launching a game application. The music terminates unexpectedly 
Repro Steps:
1) Updated Buri1.3 Moz Ril
2) Download the game "Poppit" from the marketplace
3) Launch the music application 
4) Select a song to play and exit the app
5) launch the game "poppit"

Actual:
The music application terminates and the music in the background stops unexpectedly either during game play or upon launching the game

Expected:
The music should play in the back ground 

Environmental Variables
Device: buri1.3 Moz Ril
Build ID: 20140107004001
Gecko: http://hg.mozilla.org/releases/mozilla-aurora/rev/71ad7ff30010
Gaia: ff5f06dd321f7d2bf4a5b311ded2c56e865d4f83
Platform Version: 28.0a2

Notes:
Repro frequency: 100%
See attached: video clip(http://www.youtube.com/watch?v=ZFxWTpdmwiM), logcat
This issue does not reproduce on buri1.2
Environmental Variables
Device: buri1.2 Moz Ril
Build ID: 20140106004001
Gecko: http://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/d552c08a72d0
Gaia: 8441587c3b352e052fee07665c21fd192540f19f
Platform Version: 26.0
Sounds like a memory regression if we aren't seeing this happen on 1.2.
blocking-b2g: --- → 1.3?
Keywords: perf
Whiteboard: dogfood1.3 → dogfood1.3 [MemShrink]
(In reply to Jason Smith [:jsmith] from comment #2)
> Sounds like a memory regression if we aren't seeing this happen on 1.2.

More specifically, Poppit is consuming far more memory in 1.3 than 1.2 to cause the music app to terminate in the background.
QA Contact: rkunkel
blocking-b2g: 1.3? → 1.3+
Whiteboard: dogfood1.3 [MemShrink] → [c=memory p= s= u=1.3] dogfood1.3 [MemShrink]
I was unable to open both the music app and poppit app for v1.3 Buri builds 9/19 - 9/24. The Poppit app will not fully open if the music app is in use on these builds.

- First Broken - 

9/25
Environmental Variables:
Device: Buri 1.3 MOZ
BuildID: 20130925174530
Gaia: 7e42b4d690049709c62e8783910f16ab20869f42
Gecko: fa0e6916f88c
Version: 27.0a1
Firmware Version: 20131115
Priority: -- → P1
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
I took some b2g-info captures while playing poppit on my phone.  I just wanted to compare overall memory usage with the game at first, so I'm not even running the music app yet.

v1.2 b2g-info:

      NAME   PID NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
       b2g 13324    0 44.4 46.6 50.7 170.6       0 root
Homescreen 13395   18 14.1 16.0 19.7  69.5       8 app_13395
  Poppit!" 13548    0 75.3 77.6 81.7 481.3       1 app_13548


v1.3 b2g-info:

    NAME   PID NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
     b2g 14524    0 49.6 52.8 56.2 172.5       0 root
Poppit!" 14812    1 85.3 88.6 92.1 495.6       2 app_14812


So both the b2g parent and poppit processes are using more memory.  I'll see if I can get about:memory running.
This is a link to a zip archive containing a number of about-memory reports running just poppit on my buri.  It was too large to attach to the bug directly.

For both v1.2 and v1.3 I captured:

- 3 reports taken immediately after launching poppit and revealing the game board
- 3 reports taken after playing about half a board of poppit

I'm still analyzing the information, but here are my observations so far:

1) ril_worker.js is always present in the v1.3 reports and missing from the v1.2 reports.  This takes about 2.7MB in the parent process.
2) The poppit process consistently has much more unused-gc-things in the app js-zone for v1.3 compared to v1.2.  This accounts for a difference ranging from 1.5MB to 2.5MB.
3) The shell.html in the parent process takes a little bit more memory in v1.3 compared to v1.2.  This accounts for 0.1MB.
4) The keyboard takes slightly more memory in v1.3 compared to v1.2.  This accounts for 0.1MB.

I'm going to focus on (1) and (2) for now since they explain a loss of ~5MB.  This could easily explain no longer being able to run the music app given the current memory levels.

I'll write a separate bug for (1) to see if I can get anyone from the RIL team to look at it.

I'm still investigating the unused-gc-things in (2).  From talking to many folks on #jsapi it sounds like this is due to a shrinking GC.  Most of the items in a memory chunk have been free'd, but there are still some outstanding allocations preventing the entire chunk from being free'd.  These "unused things" are essentially a memory fragmentation issue.

My best line on a possible cause of the fragmentation is an observation that there appears to be more global-parent JS shapes in use.  For example, see the js-compartment -> shapes -> gc-heap -> tree -> global-parent entries in v1.2:


│   │  ├──2.40 MB (03.39%) -- js-compartment(app://{a6594ada-be91-43fa-817f-ca6aa5e68ce3}/index.html)
│   │  │  ├──0.70 MB (00.98%) ++ baseline
│   │  │  ├──0.65 MB (00.92%) -- objects
│   │  │  │  ├──0.49 MB (00.69%) -- gc-heap
│   │  │  │  │  ├──0.30 MB (00.43%) ── ordinary
│   │  │  │  │  ├──0.14 MB (00.20%) ── function
│   │  │  │  │  └──0.05 MB (00.07%) ── dense-array
│   │  │  │  ├──0.16 MB (00.23%) -- malloc-heap
│   │  │  │  │  ├──0.12 MB (00.16%) ── slots
│   │  │  │  │  └──0.05 MB (00.07%) ── elements/non-asm.js
│   │  │  │  └──0.00 MB (00.00%) ── non-heap/code/asm.js
│   │  │  ├──0.46 MB (00.65%) ── ion-data
│   │  │  ├──0.42 MB (00.59%) -- shapes
│   │  │  │  ├──0.22 MB (00.31%) -- gc-heap
│   │  │  │  │  ├──0.13 MB (00.19%) ── tree/global-parented
│   │  │  │  │  ├──0.07 MB (00.10%) ── base
│   │  │  │  │  └──0.01 MB (00.01%) ── dict
│   │  │  │  └──0.20 MB (00.29%) -- malloc-heap
│   │  │  │     ├──0.13 MB (00.18%) ── tree-tables
│   │  │  │     ├──0.06 MB (00.08%) ── compartment-tables
│   │  │  │     ├──0.01 MB (00.02%) ── tree-shape-kids
│   │  │  │     └──0.01 MB (00.01%) ── dict-tables
│   │  │  ├──0.09 MB (00.13%) ++ type-inference
│   │  │  ├──0.06 MB (00.09%) ++ scripts
│   │  │  └──0.01 MB (00.02%) ++ sundries

And in v1.3:

│   │  ├──3.18 MB (04.45%) -- js-compartment(app://{d051d64e-9c4a-4927-b010-b9820603f9bd}/index.html)
│   │  │  ├──1.25 MB (01.74%) -- objects
│   │  │  │  ├──0.87 MB (01.22%) -- gc-heap
│   │  │  │  │  ├──0.43 MB (00.60%) ── function
│   │  │  │  │  ├──0.39 MB (00.55%) ── ordinary
│   │  │  │  │  └──0.05 MB (00.07%) ── dense-array
│   │  │  │  └──0.37 MB (00.52%) -- (2 tiny)
│   │  │  │     ├──0.37 MB (00.52%) -- malloc-heap
│   │  │  │     │  ├──0.33 MB (00.46%) ── slots
│   │  │  │     │  ├──0.04 MB (00.05%) ── elements/non-asm.js
│   │  │  │     │  └──0.01 MB (00.01%) ── regexp-statics
│   │  │  │     └──0.00 MB (00.00%) ── non-heap/code/asm.js
│   │  │  ├──1.06 MB (01.48%) ++ (5 tiny)
│   │  │  └──0.87 MB (01.22%) -- shapes
│   │  │     ├──0.60 MB (00.84%) -- gc-heap
│   │  │     │  ├──0.36 MB (00.50%) -- tree
│   │  │     │  │  ├──0.35 MB (00.48%) ── global-parented
│   │  │     │  │  └──0.02 MB (00.02%) ── non-global-parented
│   │  │     │  ├──0.19 MB (00.27%) ── base
│   │  │     │  └──0.05 MB (00.07%) ── dict
│   │  │     └──0.27 MB (00.38%) -- malloc-heap
│   │  │        ├──0.12 MB (00.17%) ── tree-tables
│   │  │        ├──0.08 MB (00.12%) ── compartment-tables
│   │  │        ├──0.03 MB (00.05%) ── tree-shape-kids
│   │  │        └──0.03 MB (00.04%) ── dict-tables 

This suggests to me that there are differences in how the APIs being used by the app are behaving.  Poppit seems to use mozAudio, which is a somewhat rare API, so I will look there.  (For example, I don't see this change in homescreen, so mozAudio is something that poppit uses, but homescreen does not.)

Fortunately the code for poppit is not minified, so I can easily modify it to investigate different theories.
Depends on: 958760
Ok, the feedback I got on the ril_worker difference was that my device is running COM ril on v1.2 and MOZ ril on v1.3.  I closed that bug as invalid.  I'll have to remeasure with MOZ RIL on v1.2.
I've updated the about-memory zip file with MOZ RIL data.

I'm still investigating, but wanted to note that about 25MB to 30MB of the memory in use by poppit is due to the way it loads sounds using HTML5 audio elements.  It wants to be able to play multiple instances of the same sound quickly or overlapping so it loads each sound 4 times.  This means the app keeps 4 copies of the decoded data in memory.  The heap-unclassified also seems to grow quite a bit with multiple copies of the sounds, so its likely the driver libraries are using memory for each copy as well.

Here is an article about the multi-channel audio technique it is using:

  http://phoboslab.org/log/2011/03/multiple-channels-for-html5-audio

cloneNode() did not help.  The feedback I got on #media was that the duplicated memory could be avoided by using WebAudio.

I'll continue investigating as this particular memory issue is in both v1.2 and v1.3.  I just wanted to note there are major gains to be had in poppit there.
Depends on: 959603
I opened an evangelism bug to investigate tuning some settings in impactjs to reduce the issue described in comment 8.  Still investigating this bug.
Spamming b2g-info while launching poppit during music playing showed this:

                    |     megabytes    |
    NAME   PID NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
     b2g 18026    0 46.8 49.0 52.8 167.3       0 root
   Music 20189    7 16.4 18.7 22.6  87.8      10 app_20189
Poppit!" 20293    1 75.2 77.3 81.2 479.8       2 app_20293

System memory info:

            Total 179.7 MB
     Used - cache 158.7 MB
  B2G procs (PSS) 144.9 MB
    Non-B2G procs  13.7 MB
     Free + cache  21.0 MB
             Free   3.3 MB
            Cache  17.8 MB

And then this:

                           |     megabytes    |
           NAME   PID NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
            b2g 18026    0 47.2 49.9 52.8 167.1       0 root
       Poppit!" 20293    1 76.3 79.1 82.2 480.2       2 app_20293
plugin-containe 20978   18  1.4  1.6  2.0  37.2       8 app_20978

System memory info:

            Total 179.7 MB
     Used - cache 143.6 MB
  B2G procs (PSS) 130.6 MB
    Non-B2G procs  13.0 MB
     Free + cache  36.1 MB
             Free  16.4 MB
            Cache  19.7 MB

The replacement of the music app with the pre-allocated process suggests to me we might want to implement bug 947571.  It also makes me wonder if the memory in the pre-allocated process has increased even further beyond bug 921659.
Depends on: 947571
Actually, it appears that on v1.3 the Music app is getting an OOM_ADJ of 10 while on v1.2 it has an OOM_ADJ of 1.  That explains why its getting killed so easily.  Now to see why the OOM_ADJ value changed...
For comparison, here is v1.2 b2g-info with poppit and music running at the same time:

                   |     megabytes    |
    NAME  PID NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
     b2g  810    0 47.4 49.1 52.3 167.4       0 root
   Music 4215    0 15.1 16.6 19.7  80.7       1 app_4215
Poppit!" 4287   18 75.4 77.2 80.6 483.6       2 app_4287

System memory info:

            Total 179.7 MB
     Used - cache 159.0 MB
  B2G procs (PSS) 143.0 MB
    Non-B2G procs  16.0 MB
     Free + cache  20.7 MB
             Free   4.6 MB
            Cache  16.1 MB
(In reply to Ben Kelly [:bkelly] from comment #11)
> Actually, it appears that on v1.3 the Music app is getting an OOM_ADJ of 10
> while on v1.2 it has an OOM_ADJ of 1.  That explains why its getting killed
> so easily.  Now to see why the OOM_ADJ value changed...

Question - what does OOM_ADJ do? Does this affect process priorities when something gets killed by the low memory killer?
(In reply to Jason Smith [:jsmith] from comment #13)
> Question - what does OOM_ADJ do? Does this affect process priorities when
> something gets killed by the low memory killer?

Yes.  On v1.2 music is set to a high priority when music starts playing.  On v1.3 this high priority is not getting set.  Seems to be a gaia change to me, but haven't completely isolated it yet.
It looks like the super high priority that music app was receiving in v1.2 was a bug.  In v1.2 all apps incorrectly get mozapptype set to 'critical' by this code:

  https://github.com/mozilla-b2g/gaia/blob/v1.2/apps/system/js/browser_frame.js#L69

Specifically, the following function call will always return true:

      config.url.startsWith(window.location.protocol +
                              '//communications.gaiamobile.org' +
                              window.location.port ?
                              '' : (':' + window.location.port) +
                              '/dialer')

This is due to order of operations on the inline conditional.  The intention was for |window.location.port| to be used in the conditional statement to control if the port is appended or not.  Unfortunately, though, the string concatenation binds more tightly and the condition triggers on the entire |window.location.protocol + '//communications.gaiamobile.org' + window.location.port|.  This of course is truthy which results in |config.url.startsWith('')| being executed.  This will always return true.

This bug was fixed as a side effect of bug 905116 with this commit:

  https://github.com/mozilla-b2g/gaia/commit/afd555ae68dd12429bd9162d7826b8329f251f90

This means that Music now correctly gets the "perceived" priority in v1.3 while actually playing music.  This results in an OOM_ADJ value of 6.

Unfortunately, there is a further issue/bug.  When the Music app changes tracks it will sometimes be temporarily dropped back down to a normal background priority with an OOM_ADJ of 10.  This is a race condition between starting the next track and the one second priority reset timer firing.

Combine this with the slightly larger footprint of poppit on v1.3 and it seems understandable why Music is getting killed.  In v1.2 Music was being protected by its high priority and poppit was forced to GC repeatedly during load.  Now, poppit is free to grow more quickly and Music is at a higher risk of OOM death with an adj value of 6.  If even Music survives the initial load of poppit, its likely to be killed when it switches tracks and temporarily drops back to an OOM_ADJ of 10.

My plan of action going forward will be:

1) See if I can better tune the priority reset timer to accommodate media apps that transition between tracks.
2) Attempt to track down the increased memory usage due to gc-unused-things and other items in the memory report.
I opened bug 959897 to document the mozapptype issue on v1.2.  Its unclear to me if we should try to fix that or not at this point in the release cycle.
Before proceeding I decided to run some more comparison between v1.2 and v1.3.  In order to make it an apples-to-apples comparison I hacked up the v1.3 gaia to run music with mozapptype 'critical'.

With music being marked as critical, I get pretty much the same performance in v1.3 and v1.2.  You can launch poppit and play the game.  Sometimes poppit will get killed while music continues to play.  This game OOM happens in both v1.2 and v1.3.

I also took memory profiles in both v1.2 and my modified v1.3.  They were nearly identical with only some minor differences.
Attached patch perceive-grace.patch (obsolete) (deleted) — Splinter Review
This patch against mozilla-aurora allows me to play music in the background and also play poppit in the foreground.  It simply extends the grace period for downgrading process priority to 5 seconds for perceivable background apps.  This seems reasonable to me since apps working with media will often have to deal with track changes, etc.

Asking for feedback while I work on a mozilla-central patch with a unit test.
Attachment #8360478 - Flags: feedback?(gsvelto)
Attachment #8360478 - Flags: feedback?(fabrice)
Attached patch perceive-grace.patch (deleted) — Splinter Review
Essentially the same patch against mozilla-central for review.

I looked at writing a test, but the comments in dom/browser-element/mochitest/priority/test_HighPriority.html made me think checking for delay differences would be a bad idea for the mochitest:

        // Unfortunately our timers
        // are not accurate!  There's little we can do here except fudge.
        // Thankfully all we're trying to test is that we get /some/ delay; the
        // exact amount of delay isn't so important.

Of course, if we think its important enough I can implement another test for the new delay against perceivable background processes.

Here is a try:

  https://tbpl.mozilla.org/?tree=Try&rev=7f67b567193d
Attachment #8360478 - Attachment is obsolete: true
Attachment #8360478 - Flags: feedback?(gsvelto)
Attachment #8360478 - Flags: feedback?(fabrice)
Attachment #8360567 - Flags: review?(fabrice)
Attachment #8360567 - Flags: feedback?(gsvelto)
Comment on attachment 8358692 [details]
Link to memory zip:  https://www.dropbox.com/s/mfvq9ps1mcerfqr/poppit-memory.zip

Obsolete this memory report per comment 17.
Attachment #8358692 - Attachment is obsolete: true
An alternative, simpler patch would be to just bump up the grace period for all background processes to 5 seconds.
Comment on attachment 8360567 [details] [diff] [review]
perceive-grace.patch

This looks good to me and I favor it over increasing the grace period for all applications. The latter approach could have some serious side-effects when switching a resource-intensive application into to background while launching something important (e.g. incoming call when playing a game).
Attachment #8360567 - Flags: feedback?(gsvelto) → feedback+
Comment on attachment 8360567 [details] [diff] [review]
perceive-grace.patch

Review of attachment 8360567 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ProcessPriorityManager.cpp
@@ +816,5 @@
> +    if (mPriority == PROCESS_PRIORITY_BACKGROUND_PERCEIVABLE) {
> +      ScheduleResetPriority("backgroundPerceivableGracePeriodMS");
> +    } else {
> +      ScheduleResetPriority("backgroundGracePeriodMS");
> +    }

I would have written :
ScheduleResetPriority(mPriority == PROCESS_PRIORITY_BACKGROUND_PERCEIVABLE ? ... : ...) but feel free to land your version.
Attachment #8360567 - Flags: review?(fabrice) → review+
Thanks fabrice.  I went with the normal if-statement since it seems more readable to me.  The constant and strings are so long here its hard to fit the inline conditional on a single line.

Pushed to b2g-inbound:

  https://hg.mozilla.org/integration/b2g-inbound/rev/3d534667a5df
https://hg.mozilla.org/mozilla-central/rev/3d534667a5df
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.3 C2/1.4 S2(17jan)
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [MemShrink] → [c=memory p=5 s= u=1.3] dogfood1.3 [MemShrink]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: