Closed Bug 1743781 Opened 3 years ago Closed 2 years ago

"Resident Unique Memory" measurement on macOS is suspicious

Categories

(Toolkit :: about:memory, defect)

All
macOS
defect

Tracking


RESOLVED FIXED
105 Branch
Tracking Status
firefox96 --- wontfix
firefox104 --- wontfix
firefox105 --- fixed

People

(Reporter: gcp, Assigned: pbone)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(3 files, 5 obsolete files)

Bug 1546442 should have caused a memory usage regression of less than 0.5%. Instead, it's being reported as a 55% improvement in resident unique memory (https://bugzilla.mozilla.org/show_bug.cgi?id=1546442#c41). This doesn't make any sense.

Looking at the code, this isn't a simple value we query from the OS; it's a complicated calculation:

https://searchfox.org/mozilla-central/rev/e9cd2997be1071b9bb76fc14df0f01a2bd721c30/xpcom/base/nsMemoryReporterManager.cpp#459

Based on the above, we should probably check if this is working correctly.

Severity: -- → S4

You could look at the implementation of psutil_proc_memory_uss in psutil. That was also written by Eric Rahm at around the same time, so I expect it is basically the same as ours, but maybe they've diverged in the last 6 years.
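For context, a minimal sketch of a TOP_INFO-based USS calculation in the style of psutil's psutil_proc_memory_uss (assumption: the Firefox reporter linked above follows the same general shape; details may differ):

  #include <mach/mach.h>
  #include <mach/mach_vm.h>
  #include <unistd.h>
  #include <cstdint>

  // Sketch only: walk the task's regions and count the pages that appear to
  // be unique to this process. SM_SHARED regions are skipped entirely, which
  // is where the decommit behaviour discussed below goes wrong.
  static uint64_t ResidentUniqueSketch() {
    mach_vm_address_t addr = 0;
    uint64_t pages = 0;
    for (;;) {
      mach_vm_size_t size = 0;
      vm_region_top_info_data_t info;
      mach_msg_type_number_t count = VM_REGION_TOP_INFO_COUNT;
      mach_port_t objectName;
      if (mach_vm_region(mach_task_self(), &addr, &size, VM_REGION_TOP_INFO,
                         reinterpret_cast<vm_region_info_t>(&info), &count,
                         &objectName) != KERN_SUCCESS) {
        break;  // KERN_INVALID_ADDRESS once we're past the last region.
      }
      // A copy-on-write object with a single reference is effectively private.
      if (info.share_mode == SM_COW && info.ref_count == 1) {
        info.share_mode = SM_PRIVATE;
      }
      switch (info.share_mode) {
        case SM_PRIVATE:
          pages += info.private_pages_resident + info.shared_pages_resident;
          break;
        case SM_COW:
          pages += info.private_pages_resident;
          break;
        default:  // SM_SHARED and friends are not counted.
          break;
      }
      addr += size;
    }
    return pages * static_cast<uint64_t>(getpagesize());
  }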

Assignee: nobody → pbone
Status: NEW → ASSIGNED

Sharing an update / reminding Monday-Paul what Friday-Paul was thinking:

When MacOS is asked to decommit a page within a region of memory using munmap, the kernel splits the private memory region into two shared memory regions (which map different pages). This is why Bug 1546442 halves our reported resident unique memory. I don't know why MacOS doesn't special-case anonymous memory and just make two separate private mappings. I'd say "Let's just count shared mappings towards our resident unique memory", but that would mean counting mappings shared with other processes. I'm looking for a way to distinguish this case.

In a pinch I guess we could map the pages specifically so that the guard page from Bug 1546442 does not split a mapping into two shared mappings. But there may be other code in Firefox that uses this technique to decommit memory, and I don't know that I can fix it in all situations. SpiderMonkey uses madvise rather than munmap for its memory, which might be okay but probably has the same problem. Anyway, I'd rather work around MacOS's reporting than change how we decommit memory.
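For concreteness, a sketch of the kind of reduced testcase this describes (assumptions: a plain MAP_PRIVATE | MAP_ANON mapping with a one-page hole punched by munmap stands in for a jemalloc chunk plus guard page; as the following comments show, glandium could not reproduce the SM_SHARED splitting with a testcase like this, so the printed share_mode may not match what real jemalloc chunks show):

  #include <mach/mach.h>
  #include <mach/mach_vm.h>
  #include <sys/mman.h>
  #include <unistd.h>
  #include <cstdio>
  #include <cstring>

  static void DumpRegion(mach_vm_address_t aAddr) {
    mach_vm_size_t size = 0;
    vm_region_top_info_data_t info;
    mach_msg_type_number_t count = VM_REGION_TOP_INFO_COUNT;
    mach_port_t objectName;
    if (mach_vm_region(mach_task_self(), &aAddr, &size, VM_REGION_TOP_INFO,
                       reinterpret_cast<vm_region_info_t>(&info), &count,
                       &objectName) != KERN_SUCCESS) {
      return;
    }
    printf("region %#llx size %#llx share_mode=%d ref_count=%u\n",
           static_cast<unsigned long long>(aAddr),
           static_cast<unsigned long long>(size), info.share_mode,
           info.ref_count);
  }

  int main() {
    const size_t page = static_cast<size_t>(getpagesize());
    const size_t len = 256 * page;  // stand-in for a ~1MiB jemalloc chunk
    char* p = static_cast<char*>(mmap(nullptr, len, PROT_READ | PROT_WRITE,
                                      MAP_PRIVATE | MAP_ANON, -1, 0));
    if (p == MAP_FAILED) {
      return 1;
    }
    memset(p, 1, len);  // touch the pages so they are resident
    DumpRegion(reinterpret_cast<mach_vm_address_t>(p));
    munmap(p + page, page);  // punch a one-page hole, like the guard page
    DumpRegion(reinterpret_cast<mach_vm_address_t>(p));             // "header"
    DumpRegion(reinterpret_cast<mach_vm_address_t>(p + 2 * page));  // "payload"
    return 0;
  }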

Blocks: 1742828

Depends on D139803

Depends on D139804

Depends on D139805

Blocks: 1756885

When MacOS is asked to decommit a page within a region of memory using munmap, the kernel splits the private memory region into two shared memory regions (which map different pages). This is why Bug 1546442 halves our reported resident unique memory.

Do you have a reduced testcase? I'm asking because I can't reproduce this behavior independently in a dumb testcase.

Flags: needinfo?(pbone)

No, I observed it by adding printfs to jemalloc's chunk allocation (so I knew the addresses of chunks) and to the memory reporter in https://searchfox.org/mozilla-central/rev/e9cd2997be1071b9bb76fc14df0f01a2bd721c30/xpcom/base/nsMemoryReporterManager.cpp#459. In the memory reporter I observed that the jemalloc chunks were detected as shared before my patch and private after it. After landing it I expect to receive a perf alert saying that resident-unique memory has doubled. I think I also noticed a difference in about:memory, but I don't remember.

I should be able to push to try and see a difference in Perfherder there.

Flags: needinfo?(pbone)

Comment on attachment 9265665 [details]
Bug 1743781 - pt 1. Remove obsolete comment r=glandium

Revision D139803 was moved to bug 1760256. Setting attachment 9265665 [details] to obsolete.

Attachment #9265665 - Attachment is obsolete: true
Attachment #9265668 - Attachment description: Bug 1743781 - pt 4. On MacOS avoid punching a hole in a memory chunk r=glandium → Bug 1743781 - On MacOS avoid punching a hole in a memory chunk r=glandium
Attachment #9265666 - Attachment description: Bug 1743781 - pt 2. Use MADV_FREE_REUSABLE on MacOS r=glandium → WIP: Bug 1743781 - pt 2. Use MADV_FREE_REUSABLE on MacOS
Attachment #9265667 - Attachment description: Bug 1743781 - pt 3. Disable MALLOC_DOUBLE_PURGE on MacOS r=glandium → WIP: Bug 1743781 - pt 3. Disable MALLOC_DOUBLE_PURGE on MacOS

Comment on attachment 9265666 [details]
WIP: Bug 1743781 - pt 2. Use MADV_FREE_REUSABLE on MacOS

Revision D139804 was moved to bug 1760254. Setting attachment 9265666 [details] to obsolete.

Attachment #9265666 - Attachment is obsolete: true

Comment on attachment 9265667 [details]
WIP: Bug 1743781 - pt 3. Disable MALLOC_DOUBLE_PURGE on MacOS

Revision D139805 was moved to bug 1760254. Setting attachment 9265667 [details] to obsolete.

Attachment #9265667 - Attachment is obsolete: true

I should be able to push to try and see a difference in Perfherder there.

Did perfherder show something?

The latest test has a 5% performance regression for PDF tests.

Our options are:

  • Persist and lose performance on MacOS.
  • Enable this only for Nightly: normal performance for release, but inaccurate memory counting.
  • Enable this only for Nightly: no guard page between header & payload for MacOS.
  • Don't change the guard page creation; instead change how we check for private unique memory: if the memory is shared, check for a guard page, and if we find one assume the memory is private.
  • Change the way chunks are allocated so they come with the guard page already there; I worry that this is hard because of alignment constraints.
  • Query jemalloc directly to see if it knows about the chunk and therefore count it as resident unique (see the sketch below).
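A rough sketch of that last option (assumptions: IsOurJemallocChunk() is a hypothetical helper standing in for whatever query the real patch uses; the counting rule here only illustrates the idea):

  #include <mach/mach.h>
  #include <mach/mach_vm.h>
  #include <cstdint>

  // Hypothetical helper, not an existing mozjemalloc API: returns true if
  // aAddr is the start of a chunk our jemalloc knows about.
  bool IsOurJemallocChunk(mach_vm_address_t aAddr);

  // Adjusted counting rule for one region: a region that TOP_INFO reports as
  // shared, but which jemalloc recognises as one of its chunks, was merely
  // split by munmap and is still private to this process.
  static uint64_t CountUniquePages(mach_vm_address_t aAddr,
                                   const vm_region_top_info_data_t& aInfo) {
    switch (aInfo.share_mode) {
      case SM_PRIVATE:
        return aInfo.private_pages_resident + aInfo.shared_pages_resident;
      case SM_COW:
        return aInfo.ref_count == 1
                   ? aInfo.private_pages_resident + aInfo.shared_pages_resident
                   : aInfo.private_pages_resident;
      case SM_SHARED:
        return IsOurJemallocChunk(aAddr)
                   ? aInfo.private_pages_resident + aInfo.shared_pages_resident
                   : 0;
      default:
        return 0;
    }
  }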
Attachment #9274173 - Attachment is obsolete: true
Attachment #9265668 - Attachment is obsolete: true
Attached file vmmap.txt (deleted) —
Attached file log.txt (deleted) —

glandium,

In https://phabricator.services.mozilla.com/D144702#inline-808294 you requested a vmmap of when this happens. I have attached vmmap.txt and log.txt. Every entry in log.txt is a mapping this patch detects should be unique resident memory. In vmmap.txt you can find each address and see that it is a 16KB (one page) mapping of the header of a jemalloc chunk, followed by a 976KB mapping of the payload section of that chunk. There is a guard page between the header and the payload section.

When gcp landed Bug 1546442 he noticed a 55% improvement in our resident unique memory. This was because the guard page caused a private mapping to be split into two shared mappings.

Flags: needinfo?(mh+mozilla)

In https://phabricator.services.mozilla.com/D144702#inline-808294 you requested a vmmap of when this happens. I have attached vmmap.txt and log.txt. Every entry in log.txt is a mapping this patch detects should be unique resident memory. In vmmap.txt you can find each address and see that it is a 16KB (one page) mapping of the header of a jemalloc chunk, followed by a 976KB mapping of the payload section of that chunk. There is a guard page between the header and the payload section.

They all have SM=ZER or SM=NUL. I can see how the memory reporter might be missing them because we're not doing anything for SM_SHARED, but I also don't see why we should need any knowledge from jemalloc to account for those.

Flags: needinfo?(mh+mozilla) → needinfo?(pbone)

(it's also not clear what the refcount accounts for, and it's not there in the vmmap)

(In reply to Mike Hommey [:glandium] from comment #20)

In https://phabricator.services.mozilla.com/D144702#inline-808294 you requested a vmmap of when this happens. I have attached vmmap.txt and log.txt. Every entry in log.txt is a mapping this patch detects should be unique resident memory. In vmmap.txt you can find each address and see that it is a 16KB (one page) mapping of the header of a jemalloc chunk, followed by a 976KB mapping of the payload section of that chunk. There is a guard page between the header and the payload section.

They all have SM=ZER or SM=NUL. I can see how the memory reporter might be missing them because we're not doing anything for SM_SHARED, but I also don't see why we should need any knowledge from jemalloc to account for those.

Are you saying that rather than asking jemalloc we should instead check whether they're SM=ZER or SM=NUL, rather than SM=COW or SM=SHM? That would work; my intention was to ask jemalloc in order to avoid false positives. It depends on how much we wish to avoid tightly coupling to jemalloc versus avoiding false positives.

I saw the refcount of 2 in all of them by putting printfs in the memory reporter and inspecting that.

Flags: needinfo?(pbone) → needinfo?(mh+mozilla)

From my experimentation, it seems that using VM_REGION_EXTENDED_INFO instead of VM_REGION_TOP_INFO provides information that is more accurate for our needs, in that those vmmap sections appear as SM_PRIVATE_ALIASED instead of SM_SHARED, and have pages_resident apparently with appropriate values.

As for the ref_count, it seems to count the number of mappings that are preserved from the original mapping (so if you poke more holes, you get a larger ref_count, and if you use madvise + mprotect for the guard pages, the guard pages count in the ref_count too).

I haven't looked in practice what happens for memory shared between processes.

Flags: needinfo?(mh+mozilla)

(In reply to Mike Hommey [:glandium] from comment #23)

From my experimentation, it seems that using VM_REGION_EXTENDED_INFO instead of VM_REGION_TOP_INFO provides information that is more accurate for our needs, in that those vmmap sections appear as SM_PRIVATE_ALIASED instead of SM_SHARED, and have pages_resident apparently with appropriate values.

As for the ref_count, it seems to count the number of mappings that are preserved from the original mapping (so if you poke more holes, you get a larger ref_count, and if you use madvise + mprotect for the guard pages, the guard pages count in the ref_count too).

I haven't looked in practice what happens for memory shared between processes.

That sounds better.

Yes, I had deliberately not counted cases where the ref-count is above two, in case that would produce false positives, e.g. memory allocated by jemalloc and then later shared with another process. But I was aware that punching more holes in a mapping would create more references, and I hoped there would be few of those.

Using VM_REGION_EXTENDED_INFO sounds better. I'll modify my patch to try both and have it report when there are differences, so I can be confident in it and verify that we don't capture shared memory by mistake.
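A minimal sketch of that diagnostic pass (assumptions: re-querying the same address with the second flavor is sufficient for reporting, and only the SM_SHARED vs SM_PRIVATE_ALIASED disagreement is of interest; the real patch may filter differently):

  #include <mach/mach.h>
  #include <mach/mach_vm.h>
  #include <cstdio>

  // Walk the address space and report regions where the two flavors disagree
  // in the way described above: TOP_INFO says SM_SHARED but EXTENDED_INFO
  // says SM_PRIVATE_ALIASED. Purely diagnostic.
  static void ReportFlavorDifferences() {
    mach_vm_address_t addr = 0;
    for (;;) {
      mach_vm_size_t size = 0;
      vm_region_top_info_data_t top;
      mach_msg_type_number_t topCount = VM_REGION_TOP_INFO_COUNT;
      mach_port_t objectName;
      if (mach_vm_region(mach_task_self(), &addr, &size, VM_REGION_TOP_INFO,
                         reinterpret_cast<vm_region_info_t>(&top), &topCount,
                         &objectName) != KERN_SUCCESS) {
        break;
      }
      mach_vm_address_t extAddr = addr;
      mach_vm_size_t extSize = size;
      vm_region_extended_info_data_t ext;
      mach_msg_type_number_t extCount = VM_REGION_EXTENDED_INFO_COUNT;
      if (mach_vm_region(mach_task_self(), &extAddr, &extSize,
                         VM_REGION_EXTENDED_INFO,
                         reinterpret_cast<vm_region_info_t>(&ext), &extCount,
                         &objectName) == KERN_SUCCESS &&
          top.share_mode == SM_SHARED &&
          ext.share_mode == SM_PRIVATE_ALIASED) {
        printf("aliased-private region %#llx size %#llx resident pages %u\n",
               static_cast<unsigned long long>(addr),
               static_cast<unsigned long long>(size), ext.pages_resident);
      }
      addr += size;
    }
  }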

BTW @glandium,

When I search for these MacOS APIs I find basically zero documentation. The best sources are either header files or code from other open-source projects, passed around the open-source world as a kind of shared oral knowledge. Do you have an authoritative or otherwise good source for this information?

Thanks.

Flags: needinfo?(mh+mozilla)

I looked at the xnu source, but not deep enough to figure out why it's doing what it's doing.

Flags: needinfo?(mh+mozilla)
Blocks: 1775089
Pushed by pbone@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a23d45700fdb The memory reporter will count jemalloc chunks as unique r=glandium
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 104 Branch

Perfherder has detected an awsy performance regression from push a23d45700fdb8bbaebb4bd3f764076f1db047e51.

There's no need for this regression to be addressed.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
98% Base Content Resident Unique Memory macosx1015-64-shippable-qr fission 7,936,170.67 -> 15,693,312.00
40% Resident Memory macosx1015-64-shippable-qr fission tp6 710,827,420.23 -> 995,950,121.90
37% Resident Memory macosx1015-64-shippable-qr fission tp6 714,100,910.64 -> 979,332,166.95

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Regressions: 1779138

Backed out of beta 104 as requested for causing bug 1779138

Backed out link:
https://hg.mozilla.org/releases/mozilla-beta/rev/4b1c4d84b63c1843611da3db161528e51d48ea54

Target Milestone: 104 Branch → 105 Branch
