Closed Bug 857135 Opened 12 years ago Closed 9 years ago

AWSY: ~1MB regression in resident memory usage on Fennec ARMv6 from push notifications

Categories

(Firefox for Android Graveyard :: General, defect)

22 Branch
ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID
Firefox 23

People

(Reporter: kats, Assigned: nsm)

References

()

Details

(Whiteboard: [MemShrink:P2])

Attachments

(3 files)

Attached file Full about:memory dumps (deleted) —
The "Start +30s" resident memory usage has a persistent ~2.09 MiB regression that first appeared between cset 753f285620 and cset cc72c7cba674ca. Pushlog link: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=753f285620fe5c0&tochange=cc72c7cba674ca - this appears to mostly consist of bug 822712. The "Start" line has a regression on the same cset of ~1.3 MiB. Note however, that the regression is much smaller on the "explicit" graph data; looking at the detailed about:memory dumps, it looks like over half of the difference in RSS is from "rss/anonymous/anonymous, outside brk()/[rw-p]" and the other half is distributed in "rss/other-files". Therefore I don't think bug 822712 is directly to blame, but probably triggered more stuff to get loaded into memory somehow. I'm not sure if this is expected, and if not, what the best way to track this down is. Any help would be appreciated. "Start" data from AWSY/mobile [view] 753f285620fe5c029e5d9b96199205db7585d737 96.74MiB Δ 124.00KiB [view] cc72c7cba674ca2e2ff0c8e0b60c24e08a19fd93 98.05MiB Δ 1,344.00KiB [view] 71545c41ea4c072e06e6c26173686156061baba3 97.59MiB Δ -472.00KiB "Start +30s" data from AWSY/mobile [view] 753f285620fe5c029e5d9b96199205db7585d737 104.30MiB Δ 220.00KiB [view] cc72c7cba674ca2e2ff0c8e0b60c24e08a19fd93 106.39MiB Δ 2.09MiB [view] 71545c41ea4c072e06e6c26173686156061baba3 105.34MiB Δ -1,072.00KiB Diff of the "rss" trees between csets 753f285620 and cc72c7cba6: (This is organized in tree form, and indicates that of the 2199552 byte difference in "rss" between the two csets, 1150976 bytes were in the rss/anonymous subtree, 1003520 bytes were in the rss/other-files subtree, and 45056 bytes were in the rss/shared-libraries subtree, etc.). 2199552 rss 1150976 rss/anonymous 1150976 rss/anonymous/anonymous, outside brk() 1150976 rss/anonymous/anonymous, outside brk()/[rw-p] 1003520 rss/other-files 413696 rss/other-files/dalvik-heap 413696 rss/other-files/dalvik-heap/[rw-p] 319488 rss/other-files/data@app@org.mozilla.fennec-1.apk@classes.dex 319488 rss/other-files/data@app@org.mozilla.fennec-1.apk@classes.dex/[r--p] 196608 rss/other-files/dalvik-bitmap-2 (added) 196608 rss/other-files/dalvik-bitmap-2/[rw-p] (added) 135168 rss/other-files/dalvik-LinearAlloc 135168 rss/other-files/dalvik-LinearAlloc/[rw-p] 36864 rss/other-files/core.odex 36864 rss/other-files/core.odex/[r--p] 28672 rss/other-files/org.mozilla.fennec-1.apk 28672 rss/other-files/org.mozilla.fennec-1.apk/[r--p] 24576 rss/other-files/dalvik-jit-code-cache 24576 rss/other-files/dalvik-jit-code-cache/[r-xp] 16384 rss/other-files/framework.odex 16384 rss/other-files/framework.odex/[r--p] 12288 rss/other-files/bouncycastle.odex 12288 rss/other-files/bouncycastle.odex/[r--p] 12288 rss/other-files/dalvik-aux-structure 12288 rss/other-files/dalvik-aux-structure/[rw-p] -192512 rss/other-files/dalvik-bitmap-1 (removed) -192512 rss/other-files/dalvik-bitmap-1/[rw-p] (removed) 45056 rss/shared-libraries 40960 rss/shared-libraries/shared-libraries-other 32768 rss/shared-libraries/shared-libraries-other/libicuuc.so 32768 rss/shared-libraries/shared-libraries-other/libicuuc.so/[r-xp] 8192 rss/shared-libraries/shared-libraries-other/libcrypto.so 8192 rss/shared-libraries/shared-libraries-other/libcrypto.so/[r-xp] 4096 rss/shared-libraries/shared-libraries-mozilla 4096 rss/shared-libraries/shared-libraries-mozilla/libxul.so 4096 rss/shared-libraries/shared-libraries-mozilla/libxul.so/[r-xs]
kats, we are pushing a patch to disable this feature on all but b2g. Can you check to see if you see this 2MB drop? Justin, any idea of what could be using this much memory?
> Justin, any idea of what could be using this much memory? Merely loading the JS could easily use 2mb of memory.
We might be able to explain the difference in the size of the increase of RSS vs. explicit by arguing that explicit was counting some decommitted memory, and now it's counting less decommitted memory. It's hard to say.
(In reply to Doug Turner (:dougt) from comment #1) > kats, we are pushing a patch to disable this feature on all but b2g. Can > you check to see if you see this 2MB drop? Sure, let me know the cset when it lands and I'll check to see if the regression goes away.
The landing of https://hg.mozilla.org/integration/mozilla-inbound/rev/7f1a8ad8bf0c does not appear to have significantly helped. I'll test a try build with the changes completely backed out to confirm that it is in fact bug 822712 to blame.
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=753f285620fe&tochange=71545c41ea4c Test cc72c7cba674ca2e2ff0c8e0b60c24e08a19fd93 seems to be a bit noisy, it might be that it is a superfluous spike -- is it possible that https://hg.mozilla.org/integration/mozilla-inbound/rev/71545c41ea4c is the culprit?
With bug 822712 +dependencies backed out [1] the "Start +30s" resident memory drops from 111529984 bytes to 110808064 bytes (average over 4 samples), which is ~0.7 MiB less. With bug 842887 backed out [2], the "Start +30s" resident memory drops from 111529984 bytes to 111407104 bytes (again averaged over 4 samples) which is about ~0.1 MiB less. So at this point it looks more like bug 822712 than bug 842887, but it's quite possible that the actual regression was combined with some noise and thus is over-represented. [1] https://tbpl.mozilla.org/?tree=Try&rev=df18d2e18cf1 [2] https://tbpl.mozilla.org/?tree=Try&rev=d87e43f77464
Assignee: nobody → doug.turner
Whiteboard: [MemShrink] → [MemShrink:P2]
Also, just to double-check, I re-ran the inbound builds from csets 753f285620fe and cc72c7cba674 through my harness 5 times each. resident memory from "Start +30s" on 753f285620fe: Samples: 5 Average: 109227212.80 Stddev: 48847924.30 Max: 109309952 Min: 109154304 resident memory from "Start +30s" on cc72c7cba674: Samples: 5 Average: 110338048.00 Stddev: 49344813.95 Max: 110485504 Min: 110125056 So there is definite regression of at least ~1 MiB; the original 2 MB seems to have included a random spike.
Summary: AWSY: ~2MB regression in resident memory usage on Fennec ARMv6 → AWSY: ~1MB regression in resident memory usage on Fennec ARMv6 from push notifications
(In reply to Nikhil Marathe from comment #9) > @kats: > > Can you try this? https://tbpl.mozilla.org/?tree=Try&rev=b3c3102dc6a2 Unfortunately it looks like this didn't build the ARMv6 APK which is weird because it is specified in the trychooser syntax. Maybe a tbpl bug? Anyway I've re-pushed your patch in that try push to a new try push at https://tbpl.mozilla.org/?tree=Try&rev=9231e08b8b1c and I will run the build from there once it's ready.
Oh, it's probably because you specified -b d in the syntax which only does debug builds.
Sorry for the delay. It looks like your patch does help but doesn't reduce it completely: resident memory from "Start +30s" without your patch: Samples: 4 Average: 111354880.00 Stddev: 55677442.59 Max: 111382528 Min: 111304704 resident memory from "Start +30s" with your patch: Samples: 4 Average: 110767104.00 Stddev: 55383554.96 Max: 110796800 Min: 110731264 That's a reduction of ~580 KiB.
I don't understand. That patch completely prevents the loading of Push related components by removing them from the package (@kats: can you reply if any of {Push,PushService}.{js,manifest} exist in the android build or end up in libxul?), so any regression specific to push should have been eliminated. @jlebar, @johns: Any ideas? Otherwise I'll setup a local android dev environment and try to reproduce.
(In reply to Nikhil Marathe from comment #13) > I don't understand. That patch completely prevents the loading of Push > related components by removing them from the package (@kats: can you reply > if any of {Push,PushService}.{js,manifest} exist in the android build or end > up in libxul?), so any regression specific to push should have been > eliminated. I see none of those files in the APK (I looked inside omni.ja as well). Also a strings | grep on libxul showed only a handful of push-related strings so I don't think anything ended up in there. The build is at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/kgupta@mozilla.com-6c6f70b2c37d/try-android-armv6/fennec-23.0a1.en-US.android-arm-armv6.apk if you'd like to verify yourself. I am also confused as to why the number didn't go back down by the full amount. However it's possible that even with 4 samples there's some noise in the data that is throwing off the numbers, so maybe land the patch anyway? I don't really have much else in the way of suggestions, unfortunately.
Justin, any ideas? Should I push the patch?
Flags: needinfo?(justin.lebar+bug)
Attachment #733623 - Flags: review?(justin.lebar+bug)
Attachment #733623 - Flags: feedback?(doug.turner)
Flags: needinfo?(justin.lebar+bug)
Comment on attachment 733623 [details] [diff] [review] Remove Push code from Android and Desktop builds. There is no evidence that we need to back this out at this point. I see a lot of confusion and concern that the test is noisy. If you want to back this out for a day or two to see if the number drop, i think that is fine. But if there is no change to the data, it should land immediately back again.
Attachment #733623 - Flags: feedback?(doug.turner) → feedback-
When I look at: https://areweslimyet.com/mobile/ I see a big increase in just about every test. Is this what we are chasing? If so, above each of the big increases, i see a big question mark that when I mouse over says: Switched device running the test to a Galaxy Nexus; baseline values expected to change. Are we chasing snipes, or is there something real here?
Attached image What we are chasing (deleted) —
No, what we are chasing is the teeny tiny bump highlighted in the attached screenshot. You'll probably want to zoom in to the data around March 29 on areweslimyet.com/mobile to see it.
Comment on attachment 733623 [details] [diff] [review] Remove Push code from Android and Desktop builds. I think this the right thing to do. There's no reason to take a possible memory hit on desktop/mobile for code that is disabled there. We have zero to lose by disabling this and a few mb to gain. Separately, we need to figure out how to avoid eating up megabytes of memory for relatively trivial DOM services. We really can't afford even 1mb for this on B2G. That may mean not using JS to implement features like this.
Attachment #733623 - Flags: review?(justin.lebar+bug)
Attachment #733623 - Flags: review+
Attachment #733623 - Flags: feedback?(doug.turner)
Attachment #733623 - Flags: feedback-
We need to know exactly why this is eating up ~1mb. nsm, can you refactor *all* of the start-up component stuff into another file (so that we don't import stuff). I suspect that is what is causing all of mem usage.
Assignee: doug.turner → nsm.nikhil
> We need to know exactly why this is eating up ~1mb. That's cool, but if we believe that Firefox for Android is a real product, we cannot leave regression in place while we investigate.
Comment on attachment 733623 [details] [diff] [review] Remove Push code from Android and Desktop builds. Review of attachment 733623 [details] [diff] [review]: ----------------------------------------------------------------- remove while we investigate. kats, when this lands, please double check that we see the memory savings.
Attachment #733623 - Flags: feedback?(doug.turner) → feedback+
There are memory savings of about 900 KiB on "Start +30s" from the above cset: [view] ba48781b74e49a71329c8be4a5933a30233a8a9d 105.55MiB Δ 28.00KiB [view] fc82676827254b9e517351c45a574c24259e74de 104.68MiB Δ -896.00KiB [view] ad371c559524ab9edbcb78660b932b9652d162c6 104.63MiB Δ -52.00KiB
(In reply to Justin Lebar [:jlebar] from comment #22) > > We need to know exactly why this is eating up ~1mb. > > That's cool, but if we believe that Firefox for Android is a real product, > we cannot leave regression in place while we investigate. We believe! We believe!
thanks Kats. Now the second part -- figuring out why.
Looks like i screwed up in bug 856440. We should backout fc8267682725 when the fix in bug 856440 lands and ensure we see no regression.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 23
@kats: Can you try this set https://tbpl.mozilla.org/?tree=Try&rev=ad6eeb6f8176 Backs out the patch here, and uses the fixed pref object.
@kats: also this one. https://tbpl.mozilla.org/?tree=Try&rev=31d3ba80743e (the no indexeddb patch itself shows up in https://tbpl.mozilla.org/?tree=Try&rev=77661155b6d6, but i forgot to push the backout change with it) This patch is not supposed to land, but it tests a hypotheses that the regression is due to IndexedDB being accessed. It is a possibility that the Talos tests being run never access IndexedDB and the use of Push causes IDB to be loaded or other libraries to be loaded. This patch always enables push on all platforms but disables creating the PushDB object. If this also results in no regression, we'll know the problem is not in the size of code/imports. Reopening bug since we're investigating various approaches.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
For the baseline value (m-c changeset d09a5a5666ba): Samples: 4 Average: 109752320.00 Stddev: 54876193.36 Max: 109871104 Min: 109633536 For try build ad6eeb6f8176 (comment 30): Samples: 4 Average: 110587904.00 Stddev: 55293963.34 Max: 110649344 Min: 110534656 This shows an *increase* of 816 KiB, indicating that the fixed pref thing doesn't really work. For try build 31d3ba80743e (comment 31): Samples: 4 Average: 110262272.00 Stddev: 55131156.80 Max: 110350336 Min: 110223360 This shows an increase of 498 KiB over the baseline.
From Kats: ---- So to test memory usage on android, you can use the instructions at https://wiki.mozilla.org/Mobile/Fennec/Android#about:memory to dump the about:memory. this includes the resident memory usage which is what the top graph on AWSY shows. the AWSY faq describes how the data is generated for that, but usually just getting the about:memory dump shortly after startup and comparing will be good enough to show the differences ----
So I set up the Android emulator on my laptop, and ran some simplified scripts based on kats test harness. In the 'nexus 7' config on the emulator I'm getting inconsistent results versus kats :/ m-c (no Push related files in build) Samples: 4 Average: 79449088.00 Stddev: 39743468.67 Max: 81731584 Min: 78225408 Use global pref (try ad6eeb6f8176) Samples: 4 Average: 80024576.00 Stddev: 40032167.85 Max: 81723392 Min: 78336000 (575.5kb INCREASE over m-c) Disable IDB load at startup (try 668410c3bcc6 applied on top of ad6eeb6f8176) Samples: 4 Average: 79431680.00 Stddev: 39737193.38 Max: 81879040 Min: 78245888 (17kb DECREASE over m-c, i assume the decrease is due to random factors, but there wasn't any sharp increase)
It might be easier to just compare the about:memory dumps to see where the memory is going. I took one of the dumps I had from each of d09a5a5666ba and ad6eeb6f8176 in comment 32 and ran a differ on them. The explicit memory usage increase between the two that I looked at was 610240 bytes. Of this, 46556 came from the newly-added PushService.js zone, 33508 came from a newly-added zone for preferences.js, 23136 came from a newly-added zone for Timer.jsm. There was also an increase in 49984 bytes in the size of the gc-heap and an increase of 90128 bytes in explicit/js-non-window/runtime/script-data. I think almost all of this (a total of 243312 bytes) can be attributed to just loading in the new PushService.js file and the things it imports, even if it is completely disabled. In my data from comment 32, the IDB-disabled version showed an improvement of 318 KiB from the pref-disabled version, so presumably the IDB stuff takes up around that much. So far this accounts for around ~560 KiB which is a pretty good start. I've uploaded all the memory dumps I got from those runs to https://people.mozilla.com/~kgupta/awsy/try-data.tgz (warning: 31MB) if you want to take a closer look.
My suggestion in comment 21 probably would make this go away then..
I can salvage 23K by implementing setTimeout in the file rather than use Timer.jsm as I've done on the b2g18 branch. As of now, the code on m-c disables the files existing in the package, so we should be fine in releases (Comment 24 and 25). When we do want Push, the 46K hit is inevitable as that's the file size :/ I'll work on a patch to delay touching IDB until it is really needed and see if JS can use the system SQLite in some way. I'm just going out on a limb here though. Ideally Android will have a PushService written in Java that ties into GCM or similar, but we'll let cpeterson think about that. Mark as fixed?
fair enough... Re-enable this on desktop and mark the bug as fixed?
(In reply to Doug Turner (:dougt) from comment #38) > Re-enable this on desktop and mark the bug as fixed? Not to stand in your way, but what's the purpose of enabling this on desktop before this is usable on desktop?
Flags: needinfo?(doug.turner)
Flags: needinfo?(doug.turner)
I am closing this as we don't ship and don't intend to ship the JS PushService on android and this code has been superseded by webpush.
Status: REOPENED → RESOLVED
Closed: 12 years ago9 years ago
Resolution: --- → INVALID
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: