Closed Bug 906747 Opened 11 years ago Closed 11 years ago

AWSY: ~6.7 MB regression in startup memory usage on Aug 12

Categories

(Firefox for Android Graveyard :: General, defect)

23 Branch
All
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: kats, Unassigned)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(2 files, 1 obsolete file)

areweslimyet.com/mobile shows a regression in resident memory usage on startup on August 12. The increase (measured on m-i) corresponds to a merge of a bunch of changes from m-c to m-i. Note that the memory usage 30 seconds after startup is roughly unchanged; this means that something is running earlier than it used to be and eating more memory right on startup. 6009fb97314a51926de3b96e38bd2f1eb0df6bae 123.35MiB 80229aca54f9d27e6cf37963f93bcd5b54162f13 130.04MiB Δ 6.69MiB https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=6009fb97314a&tochange=80229aca54f9 Looking at the about:memory dumps for the two changesets, I see that about 2.7 MB of the extra memory is taken up by libnss3, libxul, libnssckbi, libsoftokn3. This generally indicates that some sort of SSL connection is happening on startup and loading more of these libraries than before. 3.1 MB of the extra memory is going to the heap, and looking at the explicit memory stats, I see that a worker thread has been spawned for osfile_async_worker.js and that's taking up about 2.7 MB. The remainder of the extra resident memory is going towards a larger APK, it looks like.
Attached file Full about:memory diff (obsolete) (deleted) —
Attached file Full about:memory diff (deleted) —
With resident sections expanded
Attachment #792257 - Attachment is obsolete: true
Looking at the diff in that range the most likely culprit is bug 853388. To be clear I'm not sure how much we care about this because it looks like the memory usage just moved to be a little more up-front, but it probably does still impact startup time for the worse. I would like to get confirmation that this is expected behaviour and an acceptable tradeoff.
Blocks: 853388
Flags: needinfo?(irving)
The load of osfile_async_worker and the other osfile modules is from bug 853388, but note that they should only load during startup if the profile_dir/extensions.json file doesn't exist or is out of date. This should only happen on new profiles or on application upgrades that modify extensions (including themes). If we'd like to avoid this startup cost on new profiles, we could look at pre-populating the profile with a valid extensions.json instead of creating it on first run. Not sure what's triggering the SSL library load, unless we're triggering extension blacklist or up-to-date checks earlier than we used to. I don't expect bug 853388 would cause that, but that's the only related SSL activity I can think of.
Flags: needinfo?(irving)
Looking at the ts Android talos run, there might be a startup time regression as well: http://graphs.mozilla.org/graph.html#tests=[[16,11,20]]&sel=1375041320332.21,1376942662025,3190.2403043913055,6233.033097184099&displayrange=30&datatype=running I know the graph is almost bimodal and super-noisy, but prior to Aug 12, none of the data points exceed 5500 ms whereas there's quite a few after Aug 12. Also before Aug 12 there are a handful of data points < 3500ms but none after that. I don't know how reliable this is but it probably means we should investigate a little more.
(In reply to :Irving Reid from comment #4) > Not sure what's triggering the SSL library load, unless we're triggering > extension blacklist or up-to-date checks earlier than we used to. I don't > expect bug 853388 would cause that, but that's the only related SSL activity > I can think of. This might be from some other bug in the range I linked to in comment 0. I can bisect into the range to see if I can narrow it down.
(In reply to :Irving Reid from comment #4) > Not sure what's triggering the SSL library load, unless we're triggering > extension blacklist or up-to-date checks earlier than we used to. I don't > expect bug 853388 would cause that, but that's the only related SSL activity > I can think of. Any fetch from AMO would trigger this since AMO APIs use https
Ok, so I bisected the range in comment 0 and got the following measurements (averaged across 4 runs): HG cset Try cset mem usage 61e0fb09a73c 01fd3ddb6135 136501248.00 2e31fb22f922 d95ec40886db 136420352.00 cccd4578a3c6 90aa64ba0cb2 136610816.00 cc533b0c791d 45c1451706dc 136392704.00 acba57ca53b8 c16ef7ea4110 132715520.00 a0114f95b034 aa6a3e9dc7a8 129567744.00 6009fb97314a 833a01575696 129355776.00 The cset at https://hg.mozilla.org/integration/mozilla-inbound/rev/acba57ca53b8 introduced the increased library usage, and the cset at https://hg.mozilla.org/integration/mozilla-inbound/rev/cc533b0c791d introduced the osfile_async worker explicit allocations.
Do we have any measures of startup time that are not as noisy as ts? Irving, did you get a chance to look at the telemetry data to see if there was a noticeable impact?
Flags: needinfo?(irving)
Whiteboard: [MemShrink] → [MemShrink:P2]
Looking at the telemetry for both desktop Firefox and Fennec Nightly, I'm not seeing a significant move in startup time measures (which is a bit disappointing, I was hoping they'd improve noticeably). The measures I looked at are SIMPLE_MEASURES_XPI_STARTUP_END and SIMPLE_MEASURES_SESSIONRESTORED; the change in those since 08/13 is within the noise.
Flags: needinfo?(irving)
There is no obvious dependency from OS.File to SSL, but it may be that Workers initialization somehow causes SSL to be loaded. bent, would you know anything about that?
Flags: needinfo?(bent.mozilla)
The worker code doesn't do anything with SSL. Any Worker object that gets created that uses https, though, would.
Flags: needinfo?(bent.mozilla)
So it seems like so far the net perf effect of this patch is a small talos startup time regression and moving memory usage to also be a bit earlier. Is it possible to land a follow-up that moves this code to run later during startup? That should cancel out the effects we're seeing while preserving the functional changes.
Flags: needinfo?(irving)
I've been discussing this with irving and it may be that some of this memory regression is specific to the test harness because I use an old saved profile with an addon, and the addon manager will do a SQLite -> JSON conversion in that scenario. I will update the profile to be a fresh one and re-run the tests. Flagging myself for ni? to do that once the fiasco with guest mode/symlinking profile bustage is fixed on inbound.
Flags: needinfo?(irving) → needinfo?(bugmail.mozilla)
As kats said, the main issue is that his test uses a pre-built Fennec profile that was created with an older version, so running that profile with the new code will always trigger a sqlite=>JSON conversion at startup. That aside, I looked into what seemed to be causing the startup cost when we *do* need to rebuild the DB. It definitely looks like the bulk of the startup cost here is in starting the OS.File worker thread. There is a bit of library decompression going on during the first time my code touches the DeferredSave.jsm and osfile.jsm modules, though it's hard to tell from the debug log what exactly is demanding those library chunks. I hacked my build to delay my OS.File async writes by a long time, but it turns out other code is doing an async read of 'distribution.json' soon after so the worker thread is firing up anyway; even worse, turning on OS.File logging to see what's causing the thread to start, forces the thread to start at module load time (bug 910827) Again, a bit hard to tell from my trace, but it looks like about 900ms to to spin up the worker thread for OS.File; there is other activity, but here's a timeline extracted from the attached log. 08-29 14:22:26.660 E/GeckoConsole(27748): LOG addons.xpi-utils: Make addon app-profile:projects.awsyarmv6.ffext@staktrace.com visible ... (XPIProvider now triggers load of DeferredSave.jsm and osfile.jsm) 08-29 14:22:26.765 I/Gecko (27748): OS Controller Posting message {"fun":"SET_DEBUG","args":[true],"id":1} 08-29 14:22:26.765 I/Gecko (27748): OS Controller PromiseWorker: starting up ChromeWorker ... some library decompression 08-29 14:22:26.790 I/Gecko (27748): OS Controller Message posted 08-29 14:22:26.795 I/Gecko (27748): *** LOG DeferredSave/extensions.json: Save changes ... control has returned from initializing osfile and DeferredSave 08-29 14:22:26.860 E/Profiler(27748): BPUnw: [3 total] thread_register_for_profiling(me=0x5906b268, stacktop=0x69f64d6b) ... OS notices a new thread, no way to tell if it's our ChromeWorker ... (main thread goes back to XPIProvider, which finishes doing its thing) 08-29 14:22:26.895 I/Gecko (27748): *** LOG addons.manager: Addon Manager startup done ... TONS of library decompression, and a tiny bit of other logged activity 08-29 14:22:27.190 I/Gecko (27748): *** LOG DeferredSave/extensions.json: Starting timer ... a promise scheduled by the DeferredSave "Save changes" above, finally gets a chance to run ... even more TONS of library decompression and 08-29 14:22:27.320 E/Profiler(27748): BPUnw: [4 total] thread_register_for_profiling(me=0x58f20aa0, stacktop=0x6a9ffdef) ... decompress 08-29 14:22:27.320 E/Profiler(27748): BPUnw: [5 total] thread_register_for_profiling(me=0x59092d80, stacktop=0x6aaffdef) ... (not sure who those threads belonged to), decompress more 08-29 14:22:27.675 I/Gecko (27748): OS Agent Method SET_DEBUG succeeded 08-29 14:22:27.675 I/Gecko (27748): OS Agent Sending positive reply id is 1 08-29 14:22:27.675 I/Gecko (27748): OS Controller Received message from worker {"id":1} ... we finally hear back from the ChromeWorker (These times are with a custom Fennec build, optimized, profiling built in but not turned on; running on a Samsung Galaxy S3)
So I rebuilt the profile used in the AWSY harness from a recent m-c build, and ran a few runs of the latest inbound build using that profile. The memory usage drops back by about 6.5 MiB but it's a little hard to tell how much of that is because of this addon-manager related stuff and how much is from other things. The explicit memory usage drops by about 1 MiB, of which ~171k is from osfile_async_worker.js. 800k comes off for libnss3.so. This doesn't quite cover everything in the about:memory diff in attachment 792258 [details] but it does account for some of it. I think in general this supports the theory that the old profile was causing a bunch of extra work on startup, including some in the addon manager, and that updating it has cleaned this up somewhat. I'd be ok to resolve this bug as invalid if Irving is satisfied with his investigation so far, and I'll use the new profile going forward. Perhaps in the future I should update the profile for every version bump to prevent this sort of thing.
Flags: needinfo?(bugmail.mozilla)
Resolving as invalid based on comment 16.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: