Assertion failure: length > 0 && length % pageSize == 0, at /mnt/dan/firefox.git/js/src/gc/Memory.cpp:730
Categories
(Core :: JavaScript: GC, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox65 | --- | unaffected |
firefox66 | --- | fixed |
firefox67 | --- | fixed |
People
(Reporter: dan, Assigned: ehoogeveen)
References
Details
(Keywords: regression)
Attachments
(1 file, 2 obsolete files)
(deleted),
patch
|
ehoogeveen
:
review+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
I see the following assertion on starting current trunk FF on Fedora 28 on ppc64le (64KB page size)
Assertion failure: length > 0 && length % pageSize == 0, at /mnt/dan/firefox.git/js/src/gc/Memory.cpp:730
The backtrace is
#0 0x00007fffa6189d5c in raise () at /lib64/libpthread.so.0
#1 0x00007fff9c41ad38 in nsProfileLock::FatalSignalHandler(int, siginfo_t*, void*) (signo=<optimized out>, info=<optimized out>, context=0x7fffe9a13f80)
at /mnt/dan/firefox.git/toolkit/profile/nsProfileLock.cpp:165
#2 0x00007fff9cc41520 in js::UnixExceptionHandler(int, siginfo_t*, void*) (signum=<optimized out>, info=0x7fffe9a14cf8, context=0x7fffe9a13f80)
at /mnt/dan/firefox.git/js/src/ds/MemoryProtectionExceptionHandler.cpp:273
#3 0x00007fffa61f04d8 in <signal handler called> () at arch/powerpc/kernel/vdso64/sigtramp.S
#4 0x00007fff9cf43178 in js::gc::MarkPagesUnused(void*, unsigned long) (region=<optimized out>, length=<optimized out>) at /mnt/dan/firefox.git/js/src/gc/Memory.cpp:730
#5 0x00007fff9ce82370 in js::gc::Chunk::decommitAllArenas() (this=0x2c9de800000) at /mnt/dan/firefox.git/js/src/gc/Allocator.cpp:761
#6 0x00007fff9ce82418 in js::gc::Chunk::init(JSRuntime*) (this=0x2c9de800000, rt=0x7fff92418000) at /mnt/dan/firefox.git/js/src/gc/Allocator.cpp:750
#7 0x00007fff9ce91948 in js::gc::GCRuntime::pickChunk(js::AutoLockGCBgAlloc&) (this=0x7fff924186a0, lock=...) at /mnt/dan/firefox.git/js/src/gc/Allocator.cpp:683
#8 0x00007fff9cea17c8 in js::gc::ArenaLists::refillFreeListAndAllocate(js::gc::FreeLists&, js::gc::AllocKind, js::gc::ShouldCheckThresholds) (this=0x7fff92562098, freeLists=..., thingKind=js::gc::AllocKind::ATOM, checkThresholds=js::gc::ShouldCheckThresholds::CheckThresholds) at /mnt/dan/firefox.git/js/src/gc/Allocator.cpp:477
#9 0x00007fff9cea19a0 in js::gc::GCRuntime::refillFreeListFromMainThread(JSContext*, js::gc::AllocKind) (cx=0x7fff92561000, thingKind=js::gc::AllocKind::ATOM)
at /mnt/dan/firefox.git/js/src/gc/Allocator.cpp:421
#10 0x00007fff9cea1c38 in js::gc::GCRuntime::refillFreeListFromAnyThread(JSContext*, js::gc::AllocKind) (cx=0x7fff92561000, thingKind=js::gc::AllocKind::ATOM)
at /mnt/dan/firefox.git/js/src/gc/Allocator.cpp:409
#11 0x00007fff9cf0c9b8 in js::gc::GCRuntime::tryNewTenuredThing<js::NormalAtom, (js::AllowGC)0>(JSContext*, js::gc::AllocKind, unsigned long) (cx=0x7fff92561000, kind=<optimized out>, thingSize=<optimized out>) at /mnt/dan/firefox.git/js/src/vm/JSContext.h:207
#12 0x00007fff9cf0ca2c in js::Allocate<js::NormalAtom, (js::AllowGC)0>(JSContext*) (cx=0x7fff92561000) at /mnt/dan/firefox.git/js/src/vm/JSContext.h:288
#13 0x00007fff9cb0a708 in JSThinInlineString::new_<(js::AllowGC)0>(JSContext*) (cx=0x7fff92561000)
at /mnt/dan/firefox.git/obj-powerpc64le-unknown-linux-gnu/dist/include/mozilla/Assertions.h:38
#14 0x00007fff9cb0a708 in js::AllocateInlineString<(js::AllowGC)0, unsigned char> (chars=<synthetic pointer>, len=1, cx=0x7fff92561000) at /mnt/dan/firefox.git/js/src/vm/StringType-inl.h:34
#15 0x00007fff9cb0a708 in js::NewInlineString<(js::AllowGC)0, unsigned char>(JSContext*, mozilla::Range<unsigned char const>) (cx=0x7fff92561000, chars=...)
at /mnt/dan/firefox.git/js/src/vm/StringType-inl.h:61
#16 0x00007fff9cb0a988 in js::StaticStrings::init(JSContext*) (this=0x7fff91450000, cx=0x7fff92561000) at /mnt/dan/firefox.git/js/src/vm/StringType.cpp:1130
#17 0x00007fff9c870f40 in JSRuntime::initializeAtoms(JSContext*) (this=0x7fff92418000, cx=0x7fff92561000) at /mnt/dan/firefox.git/js/src/vm/JSAtom.cpp:216
#18 0x00007fff9cc6d0e8 in JS::InitSelfHostedCode(JSContext*) (cx=0x7fff92561000) at /mnt/dan/firefox.git/js/src/jsapi.cpp:438
#19 0x00007fff9694205c in nsXPConnect::InitStatics() () at /mnt/dan/firefox.git/js/xpconnect/src/nsXPConnect.cpp:141
#20 0x00007fff968e9fe4 in xpcModuleCtor() () at /mnt/dan/firefox.git/js/xpconnect/src/XPCModule.cpp:11
#21 0x00007fff9ac99928 in nsLayoutModuleInitialize() () at /mnt/dan/firefox.git/layout/build/nsLayoutModule.cpp:237
#22 0x00007fff9545bb10 in nsComponentManagerImpl::Init() (this=0x7fff93760160) at /mnt/dan/firefox.git/xpcom/components/nsComponentManager.cpp:361
#23 0x00007fff95500d80 in NS_InitXPCOM2(nsIServiceManager**, nsIFile*, nsIDirectoryServiceProvider*) (aResult=0x7fffa29309d0, aBinDirectory=<optimized out>, aAppFileLocationProvider=<optimized out>) at /mnt/dan/firefox.git/xpcom/build/XPCOMInit.cpp:626
#24 0x00007fff9c42e6f4 in ScopedXPCOMStartup::Initialize() (this=0x7fffa29309d0) at /mnt/dan/firefox.git/toolkit/xre/nsXREDirProvider.h:77
#25 0x00007fff9c43c960 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) (this=0x7fffe9a157d0, argc=4, argv=0x7fffe9a16ed8, aConfig=...)
at /mnt/dan/firefox.git/toolkit/xre/nsAppRunner.cpp:4522
#26 0x00007fff9c43d2d4 in XRE_main(int, char**, mozilla::BootstrapConfig const&) (argc=<optimized out>, argv=0x7fffe9a16ed8, aConfig=...)
at /mnt/dan/firefox.git/toolkit/xre/nsAppRunner.cpp:4610
#27 0x00007fff9c44d160 in mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) (this=<optimized out>, argc=<optimized out>, argv=<optimized out>, aConfig=...)
at /mnt/dan/firefox.git/toolkit/xre/Bootstrap.cpp:39
#28 0x00000001154d7138 in do_main(int, char**, char**) (argc=4, argv=0x7fffe9a16ed8, envp=<optimized out>) at /mnt/dan/firefox.git/browser/app/nsBrowserApp.cpp:214
#29 0x00000001154d7264 in main(int, char**, char**) (argc=<optimized out>, argv=0x7fffe9a16ed8, envp=0x7fffe9a16f00) at /mnt/dan/firefox.git/browser/app/nsBrowserApp.cpp:293
My last runtime test of FF was on January 14 (no problems), so there was a change in last 8 days that broke it. Will try to look further into it.
Comment 1•6 years ago
|
||
Emanuel has been in that code recently, 64KB pages may not have been tested very well.
Comment 2•6 years ago
|
||
This is unexpected, the page size should be the allocation granularity of the GC, and as such should be the protection granularity as well.
I wonder what the length and pageSize values are at the entry of js::gc::MarkPagesUnused when it crashes.
(In reply to Lars T Hansen [:lth] from comment #1)
Emanuel has been in that code recently, 64KB pages may not have been tested very well.
I do not know why the MemoryReporter does on the stack, but the assertion raised is from the js::gc::MarkPagesUnused function.
(In reply to Nicolas B. Pierron [:nbp] from comment #2)
This is unexpected, the page size should be the allocation granularity of the GC, and as such should be the protection granularity as well.
I wonder what the length and pageSize values are at the entry of js::gc::MarkPagesUnused when it crashes.
after adding a printf() I get
region=0x1d927f900000, length=fc000, pageSize=10000
last 2 numbers are in hexa too
Comment 4•6 years ago
|
||
I'm suspecting bug 1502733.
Comment 5•6 years ago
|
||
I have a reproducing build up. In Chunk::decommitAllArenas()
, ArenasPerChunk
is 252, not 256. That yields the short page size and triggers the assert. However, fixing that causes another assert in Chunk::fetchNextDecommittedArena()
.
Comment 6•6 years ago
|
||
I'm trying to debug this further, but I'm not understanding what the invariants are, so I need some clarification. ppc64le (I'm on a POWER9 Talos II) uses 64K pages natively. The assertions in js::gc::MarkPagesUnused
and js::gc::MarkPagesInUse
seem to require that regions be aligned on a native page boundary, and that lengths be an integral multiple of the page size. Is this correct? If so, why are arenas 4K? Should they be 64K on ppc64le as well so that the call to MarkPagesInUse
in js::gc::Chunk::fetchNextDecommittedArena
, which uses ArenaSize
, is using the page size?
Comment 7•6 years ago
|
||
Incidentally, when I go ahead and set arenas to 64K, I start failing a crapload of static asserts.
Comment 8•6 years ago
|
||
After some more fiddling with it, I'm suspicious that the assertions are just not correct for anything other than a 4K page size. When I comment them out, the browser starts and seems to operate normally. Dan, does the attached patch seem to fix your build?
If it does, I can try to tinker with them to make them more generalized. Failing that, I can just do a bandaid patch to disable them on ppc64le for the time being.
Of course, the assertions could be pointing out a real problem, but if that's so I still need to know what the invariants I should be shooting for are.
Assignee | ||
Comment 9•6 years ago
|
||
Arenas being 4k is fairly fundamental to the GC, making it dynamic is probably not a short term thing (assuming we would even want 64k arenas).
The assertion I added here [1] is just wrong - we shouldn't expect the address argument to match the alignment of the page size when our arenas don't. I'll get a fix up after work today.
[1] https://hg.mozilla.org/mozilla-central/annotate/69a279908152/js/src/gc/Memory.cpp#l729
Comment 10•6 years ago
|
||
Questions from comment 6 are for people with better knowledge than me about the internal of our GC, unless Emanuel provided the right answer already. Forwarding to Jon.
Comment 11•6 years ago
|
||
(In reply to Cameron Kaiser [:spectre] from comment #6)
The assertions in
js::gc::MarkPagesUnused
andjs::gc::MarkPagesInUse
seem to require that regions be aligned on a native page boundary, and that lengths be an integral multiple of the page size. Is this correct? If so, why are arenas 4K? Should they be 64K on ppc64le as well so that the call toMarkPagesInUse
injs::gc::Chunk::fetchNextDecommittedArena
, which usesArenaSize
, is using the page size?
We don't support decommit on systems where the page size is not the same size as an arena (4KB). The recent changes to that file added assertions before the DecommitEnabled() check, which is what I think has caused this.
Comment 12•6 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #11)
(In reply to Cameron Kaiser [:spectre] from comment #6)
If so, why are arenas 4K? Should they be 64K on ppc64le as well so that the call to
MarkPagesInUse
injs::gc::Chunk::fetchNextDecommittedArena
, which usesArenaSize
, is using the page size?We don't support decommit on systems where the page size is not the same size as an arena (4KB). The recent changes to that file added assertions before the DecommitEnabled() check, which is what I think has caused this.
That makes sense, but clearly something is trying to decommit the pages, because decommitAllArenas()
is in the call stack (I can confirm this on my own build). So are there two bugs, i.e., a bad assertion, and that we're trying to decommit on a system that doesn't support it?
Comment 13•6 years ago
|
||
(In reply to Cameron Kaiser [:spectre] from comment #12)
Well the way we handle systems where we can't decommit is by making those functions do nothing rather than by not calling them at all. So this is expected, but can certainly be improved.
Assignee | ||
Comment 14•6 years ago
|
||
This relaxes some overzealous assertions to check that the arguments are 4k (ArenaSize) aligned instead. This is what the APIs actually care about.
Assignee | ||
Comment 15•6 years ago
|
||
Dan, could you check if the assertion failures are gone with the patch in comment #14 applied?
Comment 16•6 years ago
|
||
(In reply to Cameron Kaiser [:spectre] from comment #6)
ppc64le (I'm on a POWER9 Talos II) uses 64K pages natively.
Smol comment that in fact, POWER9 can do 4K page sizes as well, and both Gentoo and Adélie Linux ship 4K page size kernels, and Debian has a community-maintained repository with 4K page sizes. This is necessary for proper operation of nouveau, some SAS chips, etc...
Comment 17•6 years ago
|
||
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #15)
Dan, could you check if the assertion failures are gone with the patch in comment #14 applied?
I'm not Dan, but the patch works for me on my Talos II (Fedora 29). Thanks!
Comment 18•6 years ago
|
||
Comment 19•6 years ago
|
||
allocGranularity
on this 64K page POWER9 is also 64K (confirmed by printf
in MarkPagesUnused
).
Assignee | ||
Comment 20•6 years ago
|
||
In general I think the assumption is that allocGranularity >= pageSize >= ArenaSize. Allocation granularity only really exists as a separate concept on Windows, where it's 64k for historical reasons [1]. On Posix platforms we just set it to the same value as the page size, because you definitely can't allocate less than a page. In practice, I don't think there exist systems with a page size of less than 4k (= ArenaSize), though this could theoretically happen.
The smallest unit on which MarkPagesUnused and its twin can operate is the page size - you can't mark half a page unused, et cetera. The allocation granularity isn't really relevant here, although decommitting/recommitting memory that spans multiple allocation is not legal on Windows. Now we could do something like look at the size and alignment of the request and only honor the part that spans full pages, but in theory MarkPagesUnused and MarkPagesInUse should be paired - you shouldn't touch pages without telling the OS you want them again. In practice because we do a soft decommit, MarkPagesInUse actually does very little, but it still feels fishy to do that without an explicit contract with the caller (which might start using its memory again in any random order, not necessarily with page granularity). To get around that we'd have to internally track which pages have been decommitted, and eagerly recommit them when the caller touches any part of them through MarkPagesInUse.
So instead we've decided to sidestep the whole issue (which primarily affects the GC with its 4k arenas) by disabling both functions if ArenaSize != pageSize, and so that's what the assertions should check. There could theoretically be a problem if pageSize < ArenaSize - the most correct check would be to have the assertions use std::min(pageSize, ArenaSize) - but given that this is already a workaround it felt overzealous to care about theoretical platforms with such a small page size (yes, I know I just argued in favor of caring about the mostly theoretical pairing of MarkPagesUnused and MarkPagesInUse - but we do already pair them for the sake of tools like Valgrind, and anyway I think it would be a surprising implementation difference from how a hard decommit/recommit would work).
[1] https://blogs.msdn.microsoft.com/oldnewthing/20031008-00/?p=42223/
Reporter | ||
Comment 21•6 years ago
|
||
My feedback will be delayed till next week, I'm on a conference now.
Updated•6 years ago
|
Assignee | ||
Comment 22•6 years ago
|
||
Comment 23•6 years ago
|
||
Can we get the modified asserts, at least, on tree in time for beta and spin off any additional refactoring to another bug? Since these are release asserts, any system with a non-4K page size (which includes some aarch64 systems as well as ppc64) will be dead in the water until then.
Updated•6 years ago
|
Comment 24•6 years ago
|
||
Assignee | ||
Comment 25•6 years ago
|
||
Hmm, they definitely won't work that way around - checking for page alignment is what effectively caused the crashes in this bug. The problem is that the GC calls this function on Arenas, which are 4k-aligned but not necessarily pageSize aligned.
I could rename the function to something more obvious like MarkArenas{Unused,InUse} or add those as separate variants that do assert ArenaSize-alignment. I think I'll do the latter just to be perfectly clear; adjusting the callers looks like a pain and possibly undesirable.
Comment 26•6 years ago
|
||
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #25)
Hmm, they definitely won't work that way around - checking for page alignment is what effectively caused the crashes in this bug. The problem is that the GC calls this function on Arenas, which are 4k-aligned but not necessarily pageSize aligned.
I could rename the function to something more obvious like MarkArenas{Unused,InUse} or add those as separate variants that do assert ArenaSize-alignment. I think I'll do the latter just to be perfectly clear; adjusting the callers looks like a pain and possibly undesirable.
Doh! You're right, I got those backwards. Let me try again:
MarkPages() {
MOZ_RELEASE_ASSERT(aligned to ArenaSize, "caller screwed up");
if (!DecommitEnabled()) return false;
MOZ_RELEASE_ASSERT(aligned to PageSize, "we're going to affect more than we want to");
}
The messaging is the same, but for somewhat the opposite reasoning: we expect the caller to be making request in terms of arenas, or something is wrong. Then, if we're actually going to do a decommit, we'd better be asking for an aligned multiple of pagesize or other things will be affected. This code would still be valid if at some point later we set ArenaSize to 2*PageSize, for example, and allowed decommit in that case.
I suppose the clearest of all might be
MarkPages() {
MOZ_RELEASE_ASSERT(aligned to ArenaSize, "caller screwed up");
if (ArenaSize != PageSize) {
/* Do nothing; we can't decommit part of a page. */
} else {
...do the decommit...
}
}
but with either of these, I'm ok with the function name staying as-is. The contract might be to only call it in terms of arenas, but its function is to mark some (whole) number of pages, so it's all good.
Assignee | ||
Comment 27•6 years ago
|
||
Carrying forward r+. I added the 2nd set of assertions as suggested and relaxed the ArenaSize assert into a regular MOZ_ASSERT - these functions are pretty heavily tested so a regular assert should be enough to catch API abuse, and there's no sense in asserting twice in opt builds.
Build only try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=27ebb9788fade7763cb9f1fa0af5f9f33f0e1aa6
Comment 29•6 years ago
|
||
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d6a44b35acc
Relax overzealous assertions on memory alignment in commit and decommit functions. r=sfink
Comment 30•6 years ago
|
||
bugherder |
Reporter | ||
Comment 31•6 years ago
|
||
no assertion with trunk, sorry for late reply
Assignee | ||
Comment 32•6 years ago
|
||
No problem, thanks for checking! I'll see about getting this uplifted.
Assignee | ||
Comment 33•6 years ago
|
||
Comment on attachment 9040515 [details] [diff] [review]
v2 - Relax overzealous assertions on memory alignment in commit and decommit functions.
Beta/Release Uplift Approval Request
Feature/Bug causing the regression
User impact if declined
Crashes on systems with a page size greater than 4kiB
Is this code covered by automated tests?
Yes
Has the fix been verified in Nightly?
Yes
Needs manual test from QE?
No
If yes, steps to reproduce
List of other uplifts needed
None
Risk to taking this patch
Low
Why is the change risky/not risky? (and alternatives if risky)
This relaxes assertions added in bug 1502733.
String changes made/needed
Comment 34•6 years ago
|
||
Thank you for requesting the uplift!
Updated•6 years ago
|
Comment 36•6 years ago
|
||
bugherder uplift |
Description
•