Closed Bug 1013333 Opened 11 years ago Closed 11 years ago

HTTP cache v2: ~15MB bump in RSS memory usage, ~11MB bump in Explicit memory usage on mobile

Categories

Product/Component: Core :: Networking: Cache
Type: defect
Platform: All
OS: Android
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla32

People

(Reporter: mayhemer, Assigned: michal)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink])

Attachments

(4 files)

(In reply to Mark Finkle (:mfinkle) from bug 913806 comment #13)
> When the new cache was turned on we saw a ~15MB bump in RSS memory usage

What exactly does this measure?

> and ~11MB bump in Explicit memory usage:
> https://areweslimyet.com/mobile/
>
> It looks like the Explicit jump is due to network/cache2/disk-storage()
> which wasn't reported before turning on, but is now ~10MB. Is this memory
> that just wasn't reported before with the old cache?

This (network/cache2/disk-storage()) is an area used only by the new cache, as the name suggests, so it was 0 with cache2=off. However, it should not be 11MB: we keep only metadata of disk entries, not their data, and the limit is 250kB, so way below 11MB.

Possible causes:
- simply a bug in the limit counter, but then why don't I see this on desktop too?
- a failure to write data, in which case we may keep the data in memory (have to check in the code and with Michal, though)
- extremely slow writes to the media; everything received from the network is buffered and then written in the background, so when writes are slow, this may pile up
Summary: [MemShrink] HTTP cache v2: ~15MB bump in RSS memory usage, ~11MB bump in Explicit memory usage on mobile → HTTP cache v2: ~15MB bump in RSS memory usage, ~11MB bump in Explicit memory usage on mobile
Whiteboard: [MemShrink]
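To illustrate the last possible cause above (data arriving from the network faster than a slow medium can absorb it), here is a minimal standalone C++ sketch. This is not the real cache2 write-back code; the class, the chunk size and the limit value are invented for the example:

// Hypothetical sketch: a write-back queue that accounts for buffered bytes.
// When the background writer is slow, Enqueue() outruns OnChunkWritten()
// and the buffered total piles up well past any intended limit.
#include <cstddef>
#include <cstdio>
#include <queue>
#include <utility>
#include <vector>

struct Chunk { std::vector<char> data; };

class WriteBackQueue {
public:
  explicit WriteBackQueue(size_t aSoftLimit) : mSoftLimit(aSoftLimit), mBuffered(0) {}

  // Called for every block received from the network.
  void Enqueue(Chunk aChunk) {
    mBuffered += aChunk.data.size();
    mQueue.push(std::move(aChunk));
  }

  // Called by the background I/O thread once a chunk has hit the disk.
  void OnChunkWritten() {
    if (mQueue.empty()) return;
    mBuffered -= mQueue.front().data.size();
    mQueue.pop();
  }

  bool OverLimit() const { return mBuffered > mSoftLimit; }
  size_t Buffered() const { return mBuffered; }

private:
  std::queue<Chunk> mQueue;
  size_t mSoftLimit;
  size_t mBuffered;
};

int main() {
  WriteBackQueue q(1 * 1024 * 1024);  // arbitrary 1MB soft limit for the sketch
  // 100 x 256kB chunks arrive while the writer makes no progress at all.
  for (int i = 0; i < 100; ++i)
    q.Enqueue(Chunk{std::vector<char>(256 * 1024)});
  std::printf("buffered: %zu bytes, over limit: %s\n",
              q.Buffered(), q.OverLimit() ? "yes" : "no");
  return 0;
}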
One possibility is that this is a problem in the harness itself - I use a canned profile for the Fennec run, and I only re-generate the profile periodically. It could be that the profile being used here is too old and not playing well with the new code. I'll re-generate the profile and re-run the before/after builds on this regression to confirm or eliminate that as a possible source of problems.
Looking into the pushlogs at the edges of the bumps, there are no cache2 patches at all. Or am I looking at the wrong data? (I've checked that the changesets are all expanded.)
Also, for reference, the raw data from the before run is at [1] and the raw data from the after run is at [2]. In this case the relevant files are the unified-memory-report-Tabs*.json.gz files, and they can be loaded into about:memory to investigate and compare.

[1] http://areweslimyet.mobi/data/mozilla-inbound/1400171192/
[2] http://areweslimyet.mobi/data/mozilla-inbound/1400171971/
Attached image AWSY memory increase (deleted) —
(In reply to Honza Bambas (:mayhemer) from comment #2)
> Looking into the pushlogs at the edges of the bumps, there are no cache2
> patches at all. Or am I looking at the wrong data? (I've checked that the
> changesets are all expanded.)

I'm not sure which part you're looking at, but I'm attaching a screenshot of the bump in question. You have to zoom in around May 14-15, and then clicking on the data point with the increase will give you changeset info.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> I'm not sure which part you're looking at, but I'm attaching a screenshot of
> the bump in question. You have to zoom in around May 14-15, and then
> clicking on the data point with the increase will give you changeset info.

Aha, got it. https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8d7dedb27da9&tochange=4e38abbbd329

So, it's cache2.
From the data in comment 3, I can see one strange thing:

├───10.12 MB (07.81%) -- network
│   ├──10.12 MB (07.81%) -- cache2
│   │  ├───9.98 MB (07.70%) ── disk-storage()
│   │  └───0.14 MB (00.11%) -- (3 tiny)
│   │      ├──0.07 MB (00.05%) ── io
│   │      ├──0.06 MB (00.05%) ── index
│   │      └──0.01 MB (00.01%) ── service

The index is practically 0, as is all the rest. Typical for my desktop is:

├────3,672,024 B (00.83%) -- network
│    ├──3,015,078 B (00.68%) -- cache2
│    │  ├──1,041,472 B (00.24%) ── index
│    │  ├──1,027,718 B (00.23%) ── disk-storage()
│    │  ├────936,512 B (00.21%) ── memory-storage(/M)
│    │  ├──────6,288 B (00.00%) ── io
│    │  ├──────2,720 B (00.00%) ── service
│    │  └────────368 B (00.00%) ── disk-storage(a,)

Michal, any idea what state the mobile could be in?
Flags: needinfo?(michal.novotny)
It is hard to guess without any additional data. If the about:memory report was taken shortly after startup, the cache index could be corrupted and the rebuild process might not have started yet, so the index would contain only entries opened in this session. It is also possible that there are really just a few big entries.
Flags: needinfo?(michal.novotny)
I regenerated the profile and ran the before/after inbound builds 5 times each. I got these results for the resident memory numbers:

Start
  Before  Average: 132492492.80  Stddev: 2036496.27
  After   Average: 133459148.80  Stddev: 748003.62
StartSettled
  Before  Average: 150307635.20  Stddev: 2182000.98
  After   Average: 151481548.80  Stddev: 102429.49
TabsOpen
  Before  Average: 280479334.40  Stddev: 1239386.52
  After   Average: 297619456.00  Stddev: 4021658.80
TabsOpenSettled
  Before  Average: 280418713.60  Stddev: 1080066.66
  After   Average: 279783014.40  Stddev: 32026614.90
TabsOpenForceGC
  Before  Average: 269561856.00  Stddev: 1382852.31
  After   Average: 270358118.40  Stddev: 28334823.45
TabsClosed
  Before  Average: 257646592.00  Stddev: 633729.43
  After   Average: 260224614.40  Stddev: 23949774.22
TabsClosedSettled
  Before  Average: 185928908.80  Stddev: 1057543.73
  After   Average: 187814707.20  Stddev: 2717032.63
TabsClosedForceGC
  Before  Average: 185603686.40  Stddev: 957155.89
  After   Average: 186922598.40  Stddev: 3907056.38

There still appears to be a ~16MB regression in the "TabsOpen" stage, but it seems to disappear later. I can attach the full data as well.
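For reference, the Average/Stddev values above are just the per-checkpoint mean and standard deviation over the 5 runs. A small standalone sketch of that arithmetic (the sample values are placeholders, and whether the harness divides by N or N-1 is an assumption, not something verified here):

// Sketch only: mean and standard deviation of five resident-memory samples.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  // Placeholder values; substitute the 5 RSS readings of one checkpoint.
  std::vector<double> rss = {2.80e8, 2.81e8, 2.79e8, 2.82e8, 2.80e8};
  double mean = 0.0;
  for (double v : rss) mean += v;
  mean /= rss.size();
  double var = 0.0;
  for (double v : rss) var += (v - mean) * (v - mean);
  var /= rss.size();  // population variance; the harness may use N-1 instead
  std::printf("Average: %.2f  Stddev: %.2f\n", mean, std::sqrt(var));
  return 0;
}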
Full data in the tarball at people.mozilla.com/~kgupta/tmp/awsy-cache2.tgz - the 1400171192-* folders are from before the cache was enabled and the 1400171971-* folders are from after.
Kartikaya, can we get a NSPR_LOG_MODULES=cache2:5 log?
Attached file NSPR_LOG_MODULES=cache2:5 log (deleted) —
Yup, here is the logcat from a full run with cache2 logging turned on. In this run the resident memory on the TabsOpen was 294162432 bytes.
Thanks! We believe we know the cause now: a bug in our preload logic. The fix is one line. Michal will take this now.
Assignee: nobody → michal.novotny
Status: NEW → ASSIGNED
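For readers without the attachment handy: the actual one-line fix is in the patch below and is not reproduced here. The following self-contained sketch only illustrates the kind of bound a preload limit enforces; kPreloadChunkCount, kTotalChunks and PreloadChunk are invented names, not the real cache2 identifiers:

// Hypothetical sketch of a bounded preload loop; without the guard,
// every received chunk stays referenced in memory.
#include <cstdint>
#include <cstdio>
#include <vector>

static void PreloadChunk(std::vector<uint32_t>& aLoaded, uint32_t aIndex) {
  aLoaded.push_back(aIndex);  // stand-in for keeping a chunk alive in memory
}

int main() {
  const uint32_t kPreloadChunkCount = 4;  // assumed preference value for the sketch
  const uint32_t kTotalChunks = 100;
  std::vector<uint32_t> loaded;
  for (uint32_t i = 0; i < kTotalChunks; ++i) {
    if (i >= kPreloadChunkCount)  // the kind of guard a one-line fix adds
      break;
    PreloadChunk(loaded, i);
  }
  std::printf("chunks kept in memory: %zu\n", loaded.size());
  return 0;
}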
Attached patch patch v1 (deleted) — Splinter Review
Attachment #8425765 - Flags: review?(honzab.moz)
Attachment #8425765 - Flags: review?(honzab.moz) → review+
I ran the two try builds you listed above through the harness and got the results below. "Before" is pure m-c, "after" is with the patch. In general the patch seems to make it worse? However, the "before" numbers are now also lower than what I reported in comment 8, and looking at areweslimyet.com/mobile over the last day or so there's a lot of fluctuation; it looks like bimodal data now. This patch seems to at least reduce the standard deviation on the numbers, so maybe it will be easier to fix the numbers with the patch in place?

Start
  Before  Average: 135365427.20  Stddev: 1187691.40
  After   Average: 134992691.20  Stddev: 1244891.32
StartSettled
  Before  Average: 152170496.00  Stddev: 327935.90
  After   Average: 152095129.60  Stddev: 384261.56
TabsOpen
  Before  Average: 276848640.00  Stddev: 19287159.80
  After   Average: 286408704.00  Stddev: 4176588.99
TabsOpenSettled
  Before  Average: 222875648.00  Stddev: 31899915.70
  After   Average: 281151897.60  Stddev: 5074918.94
TabsOpenForceGC
  Before  Average: 221931110.40  Stddev: 30593746.35
  After   Average: 276035174.40  Stddev: 3689857.77
TabsClosed
  Before  Average: 219013939.20  Stddev: 26508946.75
  After   Average: 264722022.40  Stddev: 3794672.52
TabsClosedSettled
  Before  Average: 184007884.80  Stddev: 7964210.58
  After   Average: 191765708.80  Stddev: 3975449.73
TabsClosedForceGC
  Before  Average: 184693555.20  Stddev: 7287549.70
  After   Average: 191699353.60  Stddev: 4163810.43
Kats, the patch should have completely fixed this. Can you produce a cache2:5 log again, please? This is weird. Thanks.
And about the '"before" is pure m-c' part: what cset is that? You can also easily turn the cache off on any changeset: make netwerk/cache2/CacheObserver.cpp!CacheObserver::UseNewCache() return false immediately.
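A sketch of that local hack; it edits an existing Gecko source file, so only the early return is shown (the rest of the method body would simply become unreachable):

// netwerk/cache2/CacheObserver.cpp -- local measurement hack only, not for landing
bool CacheObserver::UseNewCache()
{
  return false;  // ignore the prefs/experiment logic and always use the old cache
  // ... original body left in place, now unreachable ...
}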
(In reply to Honza Bambas (:mayhemer) from comment #17)
> Kats, the patch should have completely fixed this. Can you produce a
> cache2:5 log again, please? This is weird. Thanks.

Will do.

(In reply to Honza Bambas (:mayhemer) from comment #18)
> And about the '"before" is pure m-c' part: what cset is that?

I just used the builds from your try pushes, so cset 85c54f4e09c4. The ARMv6 builds, specifically.
Attached file cache2:5 log with patch applied (deleted) —
Ugh, I just realized I've been running with the cache2:5 logging enabled since yesterday - I forgot to turn it off after getting that log you wanted. The logging may have affected the numbers, so I'll re-run with that disabled. Here's a log from one of the runs I just did (comment 16) which has the patch on this bug applied.
85c54f4e09c4 - mozilla-central with cache2 ENABLED, so it should manifest this bug (add the 15MB/11MB regressions)

d0afe20c43a1 - the same plus the patch for this bug, so it should revert to what you saw before cache2 was enabled on May 15

Could you have switched the changesets?
(In reply to Honza Bambas (:mayhemer) from comment #21)
> Could you have switched the changesets?

I double-checked to make sure I hadn't switched them. But yes, I agree the new numbers are not what I would expect and don't make much sense. Let me finish re-running them with the logging disabled and see if that makes any difference.

Also, I have noticed in the past that try builds in general have different baseline values than inbound builds, so it's possible that the patch does regress things more but the baseline is different, so you can't compare the numbers in comment 16 to the numbers in comment 8.

Did you take a look at the log in comment 20 to see if it demonstrates the desired behaviour from the patch on this bug?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #22)
> (In reply to Honza Bambas (:mayhemer) from comment #21)
> > Could you have switched the changesets?
>
> I double-checked to make sure I hadn't switched them. But yes, I agree the
> new numbers are not what I would expect and don't make much sense. Let me
> finish re-running them with the logging disabled and see if that makes any
> difference.

Logging is not the cause here, I'm pretty sure.

> Also, I have noticed in the past that try builds in general have different
> baseline values than inbound builds, so it's possible that the patch does
> regress things more but the baseline is different, so you can't compare the
> numbers in comment 16 to the numbers in comment 8.

Of course. I thought you were comparing the two runs from comment 15 and not try/d0afe20c43a1 vs m-c.

> Did you take a look at the log in comment 20 to see if it demonstrates the
> desired behaviour from the patch on this bug?

Yes, and it seems fixed. Although the log is not complete (the shutdown lines are missing), we apparently release all data chunks now. Maybe the leak is elsewhere?
Here is the data without logging. "Before" is 85c54f4e09c4 and "after" is d0afe20c43a1. It does show an improvement in TabsOpen but it's worse in a few of the others. The reduction in stddev is still definitely present with the patch, so that's good at least.

Start
  Before  Average: 134497894.40  Stddev: 722326.32
  After   Average: 134868992.00  Stddev: 894887.42
StartSettled
  Before  Average: 151438131.20  Stddev: 534325.67
  After   Average: 151687987.20  Stddev: 697650.65
TabsOpen
  Before  Average: 297408921.60  Stddev: 2990509.32
  After   Average: 287926681.60  Stddev: 3421254.81
TabsOpenSettled
  Before  Average: 264090419.20  Stddev: 40713536.12
  After   Average: 272293068.80  Stddev: 29536189.76
TabsOpenForceGC
  Before  Average: 255603507.20  Stddev: 37161970.37
  After   Average: 264262451.20  Stddev: 25701413.40
TabsClosed
  Before  Average: 247148544.00  Stddev: 32913039.80
  After   Average: 253178675.20  Stddev: 21840942.43
TabsClosedSettled
  Before  Average: 194768076.80  Stddev: 11626403.00
  After   Average: 188168601.60  Stddev: 4334807.15
TabsClosedForceGC
  Before  Average: 194641920.00  Stddev: 11505177.69
  After   Average: 188130918.40  Stddev: 3387477.29
BTW, the about:memory logs would be good. How do I get the link from AWSY?
about:memory logs for the runs in comment 24 are at http://people.mozilla.org/~kgupta/tmp/awsy-cache2.tgz (since I'm running these from try builds the data isn't automatically pushed anywhere publicly accessible).
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #26)
> about:memory logs for the runs in comment 24 are at
> http://people.mozilla.org/~kgupta/tmp/awsy-cache2.tgz (since I'm running
> these from try builds the data isn't automatically pushed anywhere publicly
> accessible).

Thanks. According to these logs, this bug is clearly fixed at the cache2 level. I was checking explicit/network/cache2/disk-storage(). In the 85c54f4e09c4 reports (unfixed m-c with cache2 on) there is almost always >10MB used. In *all* of the d0afe20c43a1 reports (fixed cache2) it is ~1MB, which is the expected overhead for the 250kB metadata limit.
Ok, in that case please feel free to land this patch. At the very least it fixes one known regression. I'm interested in seeing the data from inbound after this lands and comparing to before cache2 was turned on and/or looking for other sources of regressions.
Also, a note that the old cache was eating some memory (for performance reasons) as well, but that was never reported. Now we do report it, hence there is more memory reported :) cache2 might just be more honest!
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
I did some more analysis on the TabsClosedForceGC data point for the 200 inbound csets around the cache2 enabling (100 csets before, 100 csets after). The graph data is at [1], although if you're unfamiliar with bug 1000268 it may be hard to interpret (well, it's hard to interpret regardless). The upshot of all this is that there were significant jumps in both explicit/network and explicit/heap-overhead/waste at that time.

[1] http://areweslimyet.mobi/cache2/plotter/
Sorry for the delay here, but AWSY was broken for a bit. Now that the fix above has had time to sit, I did another analysis of the 100 csets before and after the fix landed to see what changed. Data is at http://areweslimyet.mobi/cache2/plotter2/

At around the time this fix landed, there were a lot of significant changes. Here are the first few I found (not necessarily the largest in terms of byte-count changes):

* decommitted/js-non-window/gc-heap/decommitted-arenas stopped being hexa-modal and settled to a stable low value (dropped by ~5MB on average)
* explicit/heap-overhead/waste jumped by about 4MB
* explicit/images/content/raster/unused/raw jumped by about 600k
* explicit/js-non-window/gc-heap/chunk-admin stopped being hexa-modal and dropped by ~80k on average
* general reductions across explicit/js-non-window/zones/ totalling 150k
* explicit/storage/sqlite went up by 600k
* many more...

It'll take a while to go through all of these; some of them are probably from other changesets that landed at around the same time. Really, I'd rather wait until bug 1000268 is fixed to get tools that generate all this data rather than looking manually.