"Resident Unique Memory" measurement on macOS is suspicious
Categories: Toolkit :: about:memory, defect
People: Reporter: gcp, Assigned: pbone
References: Depends on 1 open bug, Blocks 1 open bug
Attachments: 3 files, 5 obsolete files
Bug 1546442 should have caused a memory usage regression of less than 0.5%. Instead, it's being reported as a 55% improvement in resident unique memory (https://bugzilla.mozilla.org/show_bug.cgi?id=1546442#c41). This doesn't make any sense.
Looking at the code, this isn't a simple value we probe the OS for, but a complicated calculation:
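For reference, the calculation is roughly of this shape: walk the task's VM regions with mach_vm_region and VM_REGION_TOP_INFO, and count the resident pages of regions judged to be private. The following is only an illustrative sketch in the style of psutil's USS computation (see comment 1 below), not the exact code in nsMemoryReporterManager.cpp:

```c
// Illustrative sketch only: a psutil-style "unique set size" walk over the
// current task's VM regions using VM_REGION_TOP_INFO.
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdint.h>

static uint64_t ApproxResidentUnique(void) {
  task_t task = mach_task_self();
  vm_size_t pageSize = 0;
  host_page_size(mach_host_self(), &pageSize);

  uint64_t privatePages = 0;
  mach_vm_address_t addr = MACH_VM_MIN_ADDRESS;
  for (;;) {
    mach_vm_size_t size = 0;
    vm_region_top_info_data_t info;
    mach_msg_type_number_t count = VM_REGION_TOP_INFO_COUNT;
    mach_port_t objectName = MACH_PORT_NULL;
    kern_return_t kr = mach_vm_region(task, &addr, &size, VM_REGION_TOP_INFO,
                                      (vm_region_info_t)&info, &count,
                                      &objectName);
    if (kr != KERN_SUCCESS) {
      break;  // KERN_INVALID_ADDRESS: walked past the last region.
    }
    // A copy-on-write region with a single reference is effectively private.
    if (info.share_mode == SM_COW && info.ref_count == 1) {
      info.share_mode = SM_PRIVATE;
    }
    switch (info.share_mode) {
      case SM_PRIVATE:
        privatePages += info.private_pages_resident;
        privatePages += info.shared_pages_resident;
        break;
      case SM_COW:
        privatePages += info.private_pages_resident;
        break;
      default:
        break;  // SM_SHARED and friends are not counted at all.
    }
    addr += size;
  }
  return privatePages * pageSize;
}
```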
Based on the above, we should probably check if this is working correctly.
Comment 1•3 years ago
You could look at the implementation of psutil_proc_memory_uss in psutil. That was also done by Eric Rahm at around the same time, so I expect it is basically the same, but maybe they've diverged in the last 6 years.
Assignee
Comment 2•3 years ago
Sharing an update, and reminding Monday-Paul what Friday-Paul was thinking:
When macOS is asked to decommit a page within a region of memory using munmap, the kernel splits the private memory region into two shared memory regions (which map different pages). This is why Bug 1546442 halves our reported resident unique memory. I don't know why macOS doesn't special-case anonymous memory and just create two separate private mappings. I'd say "let's just count shared mappings towards our resident unique memory", but that would also count mappings that are genuinely shared with other processes. I'm looking for a way to distinguish the two cases.
In a pinch I guess we could map the pages specifically so that the guard page of Bug 1546442 does not split a mapping into two shared mappings. But there may be other code in Firefox that uses this technique to decommit memory, and I don't know that I can fix it in all situations. SpiderMonkey uses madvise rather than munmap for its memory, which might be okay but probably has the same problem. Anyway, I'd rather work around macOS's reporting than change how we decommit memory.
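For concreteness, a reduced testcase for the claimed splitting might look like the sketch below (hypothetical code, not attached to this bug; note that in comment 7 a simple testcase like this reportedly did not reproduce the behaviour):

```c
// Hypothetical reduced testcase: map an anonymous private region, make it
// resident, punch a one-page hole with munmap, and ask the kernel how it now
// classifies the remaining pieces.
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static void DumpRegion(mach_vm_address_t addr) {
  mach_vm_size_t size = 0;
  vm_region_top_info_data_t info;
  mach_msg_type_number_t count = VM_REGION_TOP_INFO_COUNT;
  mach_port_t objectName = MACH_PORT_NULL;
  if (mach_vm_region(mach_task_self(), &addr, &size, VM_REGION_TOP_INFO,
                     (vm_region_info_t)&info, &count,
                     &objectName) == KERN_SUCCESS) {
    printf("region at 0x%llx size %llu share_mode %d ref_count %u\n",
           (unsigned long long)addr, (unsigned long long)size,
           (int)info.share_mode, info.ref_count);
  }
}

int main(void) {
  size_t page = (size_t)getpagesize();
  size_t len = 64 * page;
  char* base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON, -1, 0);
  if (base == MAP_FAILED) return 1;

  for (size_t i = 0; i < len; i += page) base[i] = 1;  // make it resident
  munmap(base + page, page);  // decommit one page, splitting the mapping

  DumpRegion((mach_vm_address_t)(uintptr_t)base);               // before the hole
  DumpRegion((mach_vm_address_t)(uintptr_t)(base + 2 * page));  // after the hole
  return 0;
}
```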
Assignee
Comment 3•3 years ago
Assignee
Comment 4•3 years ago
Depends on D139803
Assignee
Comment 5•3 years ago
Depends on D139804
Assignee
Comment 6•3 years ago
Depends on D139805
Comment 7•3 years ago
> When macOS is asked to decommit a page within a region of memory using munmap, the kernel splits the private memory region into two shared memory regions (which map different pages). This is why Bug 1546442 halves our reported resident unique memory.
Do you have a reduced testcase? I ask because I can't reproduce this behavior independently in a dumb testcase.
Assignee
Comment 8•3 years ago
No, I observed it by adding printfs to jemalloc's chunk allocation (so I knew the addresses of chunks) and to the memory reporter in https://searchfox.org/mozilla-central/rev/e9cd2997be1071b9bb76fc14df0f01a2bd721c30/xpcom/base/nsMemoryReporterManager.cpp#459. In the memory reporter, the jemalloc chunks were detected as shared before my patch and private after it. I expect that after landing it I'll receive a perf alert that resident unique memory has doubled. I think I also noticed a difference in about:memory, but I don't remember.
I should be able to push to try and see a difference in Perfherder there.
Comment 9•3 years ago
Comment on attachment 9265665 [details]
Bug 1743781 - pt 1. Remove obsolete comment r=glandium
Revision D139803 was moved to bug 1760256. Setting attachment 9265665 [details] to obsolete.
Comment 10•3 years ago
Comment on attachment 9265666 [details]
WIP: Bug 1743781 - pt 2. Use MADV_FREE_REUSABLE on MacOS
Revision D139804 was moved to bug 1760254. Setting attachment 9265666 [details] to obsolete.
Comment 11•3 years ago
Comment on attachment 9265667 [details]
WIP: Bug 1743781 - pt 3. Disable MALLOC_DOUBLE_PURGE on MacOS
Revision D139805 was moved to bug 1760254. Setting attachment 9265667 [details] to obsolete.
Comment 12•3 years ago
> I should be able to push to try and see a difference in Perfherder there.
Did Perfherder show something?
Assignee
Comment 13•3 years ago
Yes, it shows an increase in resident unique memory; this is expected, and it indicates that we're measuring it properly now.
It also shows some performance regressions, which is weird. I wasn't expecting anything noticeable, but maybe it really doesn't like unmapping and remapping the memory. I'll look into this further.
Assignee
Comment 14•3 years ago
The latest try run shows a 5% performance regression on the PDF tests.
Our options are:
- Persist, and lose performance on macOS.
- Enable this only for Nightly: normal performance for release, but inaccurate memory counting there.
- Enable this only for Nightly: release builds get no guard page between header and payload on macOS.
- Don't change the guard page creation; instead change how we check for private unique memory: if a mapping is reported as shared, check for a guard page, and if we find one assume the memory is private (see the sketch after this list).
- Change the way chunks are allocated so they come with the guard page already in place; I worry that this is hard because of alignment constraints.
- Query jemalloc directly to see whether it knows about a given chunk, and if so count it as resident unique.
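As a sketch of the guard-page check mentioned in the fourth option (illustrative only, with a hypothetical helper name; this is not the patch that landed), a region reported as shared could be treated as private when the page immediately after it is a single inaccessible guard page:

```c
// Hypothetical helper: report whether the page right after `regionEnd` looks
// like the jemalloc guard page between a chunk header and its payload.
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdbool.h>

static bool FollowedByGuardPage(mach_vm_address_t regionEnd) {
  mach_vm_address_t addr = regionEnd;
  mach_vm_size_t size = 0;
  vm_region_basic_info_data_64_t info;
  mach_msg_type_number_t count = VM_REGION_BASIC_INFO_COUNT_64;
  mach_port_t objectName = MACH_PORT_NULL;
  if (mach_vm_region(mach_task_self(), &addr, &size, VM_REGION_BASIC_INFO_64,
                     (vm_region_info_t)&info, &count,
                     &objectName) != KERN_SUCCESS) {
    return false;
  }
  // Only count it as a guard page if it starts exactly where the previous
  // region ended, is a single page, and is completely inaccessible.
  return addr == regionEnd && size == (mach_vm_size_t)vm_page_size &&
         info.protection == VM_PROT_NONE;
}
```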
Assignee
Comment 15•3 years ago
Assignee
Comment 16•3 years ago
Assignee
Comment 17•3 years ago
Assignee
Comment 18•3 years ago
Assignee
Comment 19•3 years ago
glandium,
In https://phabricator.services.mozilla.com/D144702#inline-808294 you requested a vmmap of when this happens. I have attached vmmap.txt and log.txt. Every entry in log.txt is a mapping that this patch detects should be resident unique memory. In vmmap.txt you can find that address and see that it is a 16KB (1 page) mapping of the header of a jemalloc chunk, followed by a 976KB mapping of the payload section of that chunk. There is a guard page between the header and the payload section.
When gcp landed Bug 1546442 he noticed a 55% improvement in our resident unique memory. This was because the guard page causes a private mapping to be split into two shared mappings.
Comment 20•3 years ago
> In https://phabricator.services.mozilla.com/D144702#inline-808294 you requested a vmmap of when this happens. I have attached vmmap.txt and log.txt. Every entry in log.txt is a mapping that this patch detects should be resident unique memory. In vmmap.txt you can find that address and see that it is a 16KB (1 page) mapping of the header of a jemalloc chunk, followed by a 976KB mapping of the payload section of that chunk. There is a guard page between the header and the payload section.
They all have SM=ZER or SM=NUL. I can see how the memory reporter might be missing them because we're not doing anything for SM_SHARED, but I also don't see why we should need any knowledge from jemalloc to account for those.
Comment 21•3 years ago
(it's also not clear what the refcount accounts for, and it's not there in the vmmap)
Assignee
Comment 22•3 years ago
(In reply to Mike Hommey [:glandium] from comment #20)
> > In https://phabricator.services.mozilla.com/D144702#inline-808294 you requested a vmmap of when this happens. I have attached vmmap.txt and log.txt. Every entry in log.txt is a mapping that this patch detects should be resident unique memory. In vmmap.txt you can find that address and see that it is a 16KB (1 page) mapping of the header of a jemalloc chunk, followed by a 976KB mapping of the payload section of that chunk. There is a guard page between the header and the payload section.
> They all have SM=ZER or SM=NUL. I can see how the memory reporter might be missing them because we're not doing anything for SM_SHARED, but I also don't see why we should need any knowledge from jemalloc to account for those.
Are you saying that rather than asking jemalloc we should instead check whether they're SM=ZER or SM=NUL, rather than SM=COW or SM=SHM? That would work; my intention was to ask jemalloc in order to avoid false positives. It depends on how much we want to avoid tight coupling to jemalloc versus avoiding false positives.
I saw the refcount of 2 in all of them by putting printfs in the memory reporter and inspecting the output.
Comment 23•3 years ago
From my experimentation, it seems that using VM_REGION_EXTENDED_INFO instead of VM_REGION_TOP_INFO provides information that is more accurate for our needs, in that those vmmap sections appear as SM_PRIVATE_ALIASED instead of SM_SHARED, and have pages_resident with apparently appropriate values.
As for the ref_count, it seems to count the number of mappings that are preserved from the original mapping (so if you poke more holes, you get a larger ref_count, and if you use madvise + mprotect for the guard pages, the guard pages too count towards the ref_count).
I haven't looked at what happens in practice for memory shared between processes.
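As an illustration of the approach described above (a sketch with a hypothetical helper name, not the actual patch), a VM_REGION_EXTENDED_INFO query could classify a region as resident unique like this:

```c
// Hypothetical helper: classify the region containing `addr` using
// VM_REGION_EXTENDED_INFO, treating SM_PRIVATE_ALIASED (the share mode the
// munmap-split pieces reportedly get) as resident unique.
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdbool.h>
#include <stdint.h>

static bool CountsAsResidentUnique(mach_vm_address_t addr,
                                   uint64_t* residentBytesOut) {
  *residentBytesOut = 0;
  mach_vm_size_t size = 0;
  vm_region_extended_info_data_t info;
  mach_msg_type_number_t count = VM_REGION_EXTENDED_INFO_COUNT;
  mach_port_t objectName = MACH_PORT_NULL;
  if (mach_vm_region(mach_task_self(), &addr, &size, VM_REGION_EXTENDED_INFO,
                     (vm_region_info_t)&info, &count,
                     &objectName) != KERN_SUCCESS) {
    return false;
  }
  bool isUnique = info.share_mode == SM_PRIVATE ||
                  info.share_mode == SM_PRIVATE_ALIASED ||
                  (info.share_mode == SM_COW && info.ref_count == 1);
  *residentBytesOut = (uint64_t)info.pages_resident * vm_page_size;
  return isUnique;
}
```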
Assignee
Comment 24•3 years ago
(In reply to Mike Hommey [:glandium] from comment #23)
> From my experimentation, it seems that using VM_REGION_EXTENDED_INFO instead of VM_REGION_TOP_INFO provides information that is more accurate for our needs, in that those vmmap sections appear as SM_PRIVATE_ALIASED instead of SM_SHARED, and have pages_resident with apparently appropriate values.
> As for the ref_count, it seems to count the number of mappings that are preserved from the original mapping (so if you poke more holes, you get a larger ref_count, and if you use madvise + mprotect for the guard pages, the guard pages too count towards the ref_count).
> I haven't looked at what happens in practice for memory shared between processes.
That sounds better.
Yes, I had avoided counting the cases where the ref-count is above two, in case that would produce false positives, such as memory allocated by jemalloc and later shared with another process. But I was aware that more holes in the mapping would create more references, and hoped there would be few of those.
Using VM_REGION_EXTENDED_INFO sounds better. I'll modify my patch to try both and have it report when there are differences, so I can be confident in it and verify that we don't capture shared memory by mistake.
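A rough illustration of that cross-check (hypothetical helper name, not the actual patch): query both flavors for the same address and log when they disagree about whether the region counts as unique.

```c
// Hypothetical cross-check: compare VM_REGION_TOP_INFO and
// VM_REGION_EXTENDED_INFO classifications for the region containing `addr`.
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdbool.h>
#include <stdio.h>

static void CompareFlavors(mach_vm_address_t addr) {
  mach_vm_size_t size = 0;
  mach_port_t objectName = MACH_PORT_NULL;

  vm_region_top_info_data_t top;
  mach_msg_type_number_t topCount = VM_REGION_TOP_INFO_COUNT;
  mach_vm_address_t topAddr = addr;
  if (mach_vm_region(mach_task_self(), &topAddr, &size, VM_REGION_TOP_INFO,
                     (vm_region_info_t)&top, &topCount,
                     &objectName) != KERN_SUCCESS) {
    return;
  }

  vm_region_extended_info_data_t ext;
  mach_msg_type_number_t extCount = VM_REGION_EXTENDED_INFO_COUNT;
  mach_vm_address_t extAddr = addr;
  if (mach_vm_region(mach_task_self(), &extAddr, &size,
                     VM_REGION_EXTENDED_INFO, (vm_region_info_t)&ext,
                     &extCount, &objectName) != KERN_SUCCESS) {
    return;
  }

  bool topUnique = top.share_mode == SM_PRIVATE ||
                   (top.share_mode == SM_COW && top.ref_count == 1);
  bool extUnique = ext.share_mode == SM_PRIVATE ||
                   ext.share_mode == SM_PRIVATE_ALIASED ||
                   (ext.share_mode == SM_COW && ext.ref_count == 1);
  if (topUnique != extUnique) {
    fprintf(stderr, "flavor mismatch at 0x%llx: top=%d ext=%d\n",
            (unsigned long long)addr, (int)top.share_mode,
            (int)ext.share_mode);
  }
}
```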
Assignee
Comment 25•3 years ago
BTW @glandium,
When I search for these macOS APIs I find basically zero documentation. The best sources are either header files or code from other open-source projects, passed around as a kind of shared oral knowledge. Do you have an authoritative or otherwise good source for this information?
Thanks.
Comment 26•3 years ago
I looked at the xnu source, but not deep enough to figure out why it's doing what it's doing.
Comment 27•2 years ago
Comment 28•2 years ago
bugherder
Comment 29•2 years ago
Perfherder has detected an awsy performance regression from push a23d45700fdb8bbaebb4bd3f764076f1db047e51.
There's no need for this regression to be addressed.
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new)
---|---|---|---|---
98% | Base Content Resident Unique Memory | macosx1015-64-shippable-qr | fission | 7,936,170.67 -> 15,693,312.00
40% | Resident Memory | macosx1015-64-shippable-qr | fission tp6 | 710,827,420.23 -> 995,950,121.90
37% | Resident Memory | macosx1015-64-shippable-qr | fission tp6 | 714,100,910.64 -> 979,332,166.95
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
For more information on performance sheriffing please see our FAQ.
Comment 30•2 years ago
Backed out from beta 104 as requested for causing bug 1779138.
Backout link:
https://hg.mozilla.org/releases/mozilla-beta/rev/4b1c4d84b63c1843611da3db161528e51d48ea54