Closed
Bug 969012
Opened 11 years ago
Closed 11 years ago
Figure out why MNewSlot sometimes leaks in GGC builds
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla32
People
(Reporter: terrence, Assigned: terrence)
References
(Blocks 1 open bug)
Details
Attachments
(3 files, 13 obsolete files)
(deleted),
patch
|
terrence
:
review+
terrence
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
terrence
:
review+
terrence
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
It seems to be intermittent, so maybe a race?
Assignee | ||
Comment 1•11 years ago
|
||
Right, the basic problem here is that I accidentally dropped notifyInitialSlots when fixing the way we allocate initial slots in the nursery in the interpreter. The attached patch fixes this, but I'm going to keep the bug open to fix the underlying problem as well.
The way we implement allocation in the JIT doesn't quite match up with either our old or new behavior in the interpreter. The current situation: we never allocate in the jit if the object/array requires malloc. This is controlled by MNewFoo::shouldUseVM. The one exception is for CallObject, which I guess is critical for EarleyBoyer. In the case of CallObject, we create an MNewSlots, which does a callVM to js_malloc, then links its exit to MNewCallObject which follows the normal inline object path, then overwrites the inline slots pointer.
I think we should push this behavior down into the allocator itself and allow any random object with external slots to be jit allocated. I'll try to whip something up this week.
Attachment #8373511 -
Flags: review?(jcoppeard)
Comment 2•11 years ago
|
||
Comment on attachment 8373511 [details] [diff] [review]
notify_initial_slots-v0.diff
Review of attachment 8373511 [details] [diff] [review]:
-----------------------------------------------------------------
Right, yes this meant we didn't track slots that were externally allocated and passed in to JSObject::create().
Attachment #8373511 -
Flags: review?(jcoppeard) → review+
Assignee | ||
Comment 3•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Whiteboard: [leave open
Comment 4•11 years ago
|
||
Whiteboard: [leave open → [leave open]
Assignee | ||
Comment 5•11 years ago
|
||
This patch simplifies the interpreter side of things by removing the externalSlots parameter from JSObject::create. On the jit side we still allocate and use the MNewSlots slots, leaking the interpreter-allocated slots if we happen to take the non-jit path. There are more dependencies to remove before we can kill off the MNewSlots and plug the leak, but otherwise jit-tests and octane pass.
Note: I'll be squashing this entire series of 5 patches together before landing, but it should be much easier to review split up like this.
Note2: This series is more or less performance neutral currently; I still need to apply the new optimization to object allocations other than CallObject.
Attachment #8375801 -
Flags: review?(jdemooij)
Assignee | ||
Comment 6•11 years ago
|
||
Part 2 of 5: refactor the jit GC object allocator. This is mostly code motion an comment fixes. It also adds the ability to allocate slots inline in the nursery. With this patch we now leak the interpreter allocated slots /and/ the new inline nursery slots. Otherwise, octane and jit-tests still pass.
Attachment #8375805 -
Flags: review?(jdemooij)
Assignee | ||
Comment 7•11 years ago
|
||
Part 3 of 5: teach the jit how to malloc (and free on error) external slots when allocating in the tenured generation. This is required so that we can kill off MNewSlots without taking a 1000 point performance hit on EarleyBoyer when GGC is disabled. I think after GGC is permanently enabled we should rip out this extra complexity.
Attachment #8375808 -
Flags: review?(jdemooij)
Assignee | ||
Comment 8•11 years ago
|
||
Part 4 of 5: rip out MNewSlots.
12 files changed, 39 insertions(+), 232 deletions(-)
Unfortunately, PJS was depending on MNewSlots, however I believe this is more an artifact of cargo culting than intention. I needed to trim down the size of two of their tests to ensure they stay compiling with only inline slots in CallObjects. It should be easy enough to fix if they were actually depending on this for performance, but I guess they don't have any key benchmarks yet.
Attachment #8375815 -
Flags: review?(jdemooij)
Assignee | ||
Comment 9•11 years ago
|
||
Part 5 of 5: test the new code.
It turns out the JSAPI methods for toggling GGC had rotted quite a bit and weren't exposed in the shell either. The tests themselves are copies of date-format-tofte with greatly stripped down iteration counts. I used masm.printf to ensure that the relevant allocation paths are getting hit. I tested the free path manually, but was not able to reduce a reliable test for it: oomAfterALlocations is a hard tool to use manually. I will fold and request fuzzing once I get review on the series.
Attachment #8375820 -
Flags: review?(jdemooij)
Comment 10•11 years ago
|
||
Comment on attachment 8375801 [details] [diff] [review]
1_of_5_stop_accepting_external_slots-v0.diff
Review of attachment 8375801 [details] [diff] [review]:
-----------------------------------------------------------------
Nice simplification.
::: js/src/jit/VMFunctions.cpp
@@ +525,5 @@
> }
>
> JSObject *
> NewCallObject(JSContext *cx, HandleScript script,
> + HandleShape shape, HandleTypeObject type)
Nit: this fits on one line now I think.
Attachment #8375801 -
Flags: review?(jdemooij) → review+
Comment 11•11 years ago
|
||
Comment on attachment 8375805 [details] [diff] [review]
2_of_5_refactor_jit_allocator-v0.diff
Review of attachment 8375805 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/IonMacroAssembler.cpp
@@ +733,5 @@
> +{
> + checkAllocatorState(fail);
> + if (nurseryAllocate(result, allocKind, fail, nDynamicSlots, initialHeap))
> + return;
> + freeSpanAllocate(result, allocKind, fail);
Nice.
Attachment #8375805 -
Flags: review?(jdemooij) → review+
Comment 12•11 years ago
|
||
Comment on attachment 8375808 [details] [diff] [review]
3_of_5_malloc_external_slots_for_tenured_jit_allocs-v0.diff
Review of attachment 8375808 [details] [diff] [review]:
-----------------------------------------------------------------
Looks great. r=me with comments addressed.
I agree we should take this out once GGC is enabled, but for now this looks fine.
::: js/src/jit/CodeGenerator.cpp
@@ +145,5 @@
> +// API call to malloc for slots.
> +template <typename LIR>
> +class OutOfLineMallocSlots : public OutOfLineCodeBase<CodeGenerator>
> +{
> + LIR *lir_;
This member is not used, can we remove it + the LIR template parameter? Same for the other OOL class.
@@ +259,5 @@
> +// objects that require them. In the case that such paths are not required, the
> +// given labels are returned as nullptr.
> +template <typename T>
> +bool
> +CodeGenerator::emitOutOfLineMallocFree(T *lir, const Register ®, size_t nDynamicSlots,
You can remove the lir argument + LIR template parameter here IIUC. Also from visitOutOfLineMallocSlots + visitOutOfLineFreeSlots.
@@ +275,5 @@
> + if (!addOutOfLineCode(oolMalloc))
> + return false;
> +
> + OutOfLineFreeSlots<T> *oolFree =
> + new(alloc()) OutOfLineFreeSlots<T>(lir, reg, fallback);
Nit: fits on one line.
::: js/src/jit/IonMacroAssembler.cpp
@@ +758,5 @@
> + push(result);
> +
> + // Allocate the object and write the malloced slots.
> + freeSpanAllocate(result, allocKind, &allocFail);
> + pop(Operand(Address(result, JSObject::offsetOfSlots())));
Please add |pop(const Address &dest)| to Assembler-x86-shared.h (just like the |push(const Address &src)| there), so that you don't need the extra Operand(...).
We should also add it to MacroAssembler-arm.h, there you can use the scratch register I think. The ARM simulator can be used to test this (build an x86 shell and pass --enable-arm-simulator to configure).
@@ +897,5 @@
> + uint32_t nslots = templateObject->lastProperty()->slotSpan(templateObject->getClass());
> + if (nslots == 0)
> + return;
> +
> + //masm.moveValue(UndefinedValue(), reg); then masm.storePtr(reg, ...)
Nit: remove this line.
Attachment #8375808 -
Flags: review?(jdemooij) → review+
Updated•11 years ago
|
Attachment #8375815 -
Flags: review?(jdemooij) → review+
Comment 13•11 years ago
|
||
Comment on attachment 8375820 [details] [diff] [review]
5_of_5_tests-v0.diff
Review of attachment 8375820 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit-test/tests/ion/callobj-nursery-slots.js
@@ +1,1 @@
> +// |jit-test| tz-pacific
The other test has disableGenerationalGC(), but does this one check anything not tested by check-date-format-tofte?
::: js/src/jsfriendapi.cpp
@@ +946,2 @@
> #ifdef JSGC_GENERATIONAL
> + if (!IsGenerationalGCEnabled(rt)) {
Good catch.
Attachment #8375820 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 14•11 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #12)
> ::: js/src/jit/CodeGenerator.cpp
> @@ +145,5 @@
> > +// API call to malloc for slots.
> > +template <typename LIR>
> > +class OutOfLineMallocSlots : public OutOfLineCodeBase<CodeGenerator>
> > +{
> > + LIR *lir_;
>
> This member is not used, can we remove it + the LIR template parameter? Same
> for the other OOL class.
D'oh! I was using that to only sync the actually live registers using the safepoint, but forgot to remove it when I realized it's not actually that useful.
(In reply to Jan de Mooij [:jandem] from comment #13)
> Comment on attachment 8375820 [details] [diff] [review]
> ::: js/src/jit-test/tests/ion/callobj-nursery-slots.js
> @@ +1,1 @@
> > +// |jit-test| tz-pacific
>
> The other test has disableGenerationalGC(), but does this one check anything
> not tested by check-date-format-tofte?
This test gets us out-of-line CallObject slots in the nursery. The other test disables the nursery, forcing those slots to use the malloc path. Otherwise, it is very hard to test the fallback in --enable-gcgenerational builds because you need a singleton call object.
> ::: js/src/jsfriendapi.cpp
> @@ +946,2 @@
> > #ifdef JSGC_GENERATIONAL
> > + if (!IsGenerationalGCEnabled(rt)) {
>
> Good catch.
I'm pretty sure I added that code in the first place. O_O
========
I spent all of yesterday taking a closer look at allocation in Octane. We have 9 ways to create an object in the jit and their relative usage looks like this:
Opcode calls | percentage
--------------------------------------------
CreateThisWithTemplate 53,312,388 | 62.98%
NewArray 10,231,017 | 12.08%
NewObject 8,778,040 | 10.37%
NewCallObject 7,557,391 | 8.93%
Lambda 4,760,668 | 5.62%
Array::concat 8,797 | 0.01%
NewDeclEnvObject 471 | 0%
NewStringObject 0 | 0%
Rest 0 | 0%
========================================
Total 84648772 | 100%
Of the 62% CreateThisWithTemplate, 19,670,001 (e.g. 37%) of them occur in EarleyBoyer with a size class of OBJECT2. Of the NewCallObject, 4,964,684 (e.g. 66%) of them occur in EarleyBoyer. Of those, only a couple thousand use dynamic slots, needing this optimization. However, if we do just disable this optimization (e.g. use callVM instead of callAPI), we lose about 1000 points on EarleyBoyer.
I thought it was simple noise at first, but with the above series applied we lose about 50 points on EB. The only difference in codegen is that the new code inverts the stores for |slots| and |elements|, making them occur out-of-order. I need to play around a bit more here. Should have a patch soon that at the very least fixes our ordering here.
Assignee | ||
Comment 15•11 years ago
|
||
Fixing the object initialization write order fixed the regression on EB. Moreover, requesting an extra temp register on all of the above allocators did not affect Octane performance. In combination, I think this means we can get a big boost to allocation performance by making use of a second register. In particular it would let us kill at least 27M unneeded loads of a constant UndefinedValue into a register, even on x86, and take at least one sub out of the hottest allocation path. I'll work on this next.
Assignee | ||
Comment 16•11 years ago
|
||
Great success; with this patch and the next, we gain ~100-300 points on EarleyBoyer. This patch adds the extra register to the allocation paths, but doesn't do anything with it. As far as I can tell running locally, this has no effect on octane performance.
Attachment #8376613 -
Flags: review?(jdemooij)
Comment 17•11 years ago
|
||
Comment on attachment 8376613 [details] [diff] [review]
6_of_5_add_extra_temp_register_for_allocator-v0.diff
Review of attachment 8376613 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/IonMacroAssembler.h
@@ +773,5 @@
> void checkAllocatorState(Label *fail);
> + bool nurseryAllocate(const Register &result, const Register &temp, gc::AllocKind allocKind,
> + Label *fail, size_t nDynamicSlots, gc::InitialHeap initialHeap);
> + void freeSpanAllocate(const Register &result, const Register &temp, gc::AllocKind allocKind,
> + Label *fail);
Nit: we use both "const Register ®" and "Register reg" for Register arguments. I think Register is ok as it's a small class that fits in 32-bits, and using that for these methods would make these signatures a bit shorter/more legible.
Attachment #8376613 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 18•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/bea3f06585ec
Green try run at: https://tbpl.mozilla.org/?tree=Try&rev=3528ebd9858b
(In reply to Jan de Mooij [:jandem] from comment #13)
> ::: js/src/jit-test/tests/ion/callobj-nursery-slots.js
> @@ +1,1 @@
> > +// |jit-test| tz-pacific
>
> The other test has disableGenerationalGC(), but does this one check anything
> not tested by check-date-format-tofte?
No, it isn't. I'm not sure why I misread the question before, but I've removed the new duplicate test now.
Assignee | ||
Comment 19•11 years ago
|
||
We can also get rid of one more sub in the CallObject-with-extended-slots path with our new temp reg and get another 50 points on EarleyBoyer on top of the slot-initialization gains. This may not be a huge performance win, but it is a pretty nice simplicity win. I just hope the extra register doesn't regress x86. Try run is up to check that: https://tbpl.mozilla.org/?tree=Try&rev=a9c9e9e42e0b
I left out the loop-based initialization since there are only a handful of objects that would hit that path currently. I tried to make the CreateThisWithTemplate path use extended slots to test this, but we go to extreme lengths to hide these from IonBuilder entirely. I have yet to figure out how to make that stop happening; e.g. if I make grow-slots on the various CreateThisWithTemplate template objects work as expected, we get several thousand templates floating around that would use extended slots, but somehow none end up flowing into compilations. I expect there is probably a hardcoded test in one of the analysis passes that I haven't found yet.
I think at this point NewArray is likely to be lower hanging fruit, so I'll probably try that next.
Assignee | ||
Comment 20•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d1e797181e3
This seems to be failing on m-i, even though my local tests ran fine. I'll investigate tomorrow.
Assignee | ||
Comment 21•11 years ago
|
||
The issue here is that we are now generating more than 32MiB of code on arm for some jit-tests, so the arm assembler is exploding. Marty thinks there will be a patch fixing this available soon, so I'm going to hold off on landing until then.
Assignee | ||
Comment 22•11 years ago
|
||
These ARM failures are unexpected and Marty wants to debug them locally.
Attachment #8382401 -
Flags: feedback?(mrosenberg)
Assignee | ||
Comment 23•11 years ago
|
||
As far as I can tell locally this is performance neutral on octane on an x86 build running on x64 and a tiny perf win on x64.
Attachment #8377351 -
Attachment is obsolete: true
Attachment #8383299 -
Flags: review?(jdemooij)
Comment 24•11 years ago
|
||
Comment on attachment 8383299 [details] [diff] [review]
7_use_extra_reg_in_init-v0.diff
Review of attachment 8383299 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/IonMacroAssembler.cpp
@@ +803,5 @@
> + movl(Imm32(jv.s.payload.i32), temp);
> + for (unsigned i = start; i < nfixed; i++)
> + movl(temp, ToPayload(Address(obj, JSObject::getFixedSlotOffset(i))));
> +#else
> + moveValue(UndefinedValue(), temp);
This won't work on ARM, maybe we could use JS_NUNBOX32 instead of JS_CODEGEN_X86 and change movl to store32?
@@ +861,5 @@
> int elementsOffset = JSObject::offsetOfFixedElements();
>
> addPtr(Imm32(elementsOffset), obj);
> storePtr(obj, Address(obj, -elementsOffset + JSObject::offsetOfElements()));
> addPtr(Imm32(-elementsOffset), obj);
Doesn't have to happen in this patch, but with the temp register we could do something like this instead:
computeEffectiveAddress(Address(obj, elementsOffset), temp);
storePtr(temp, Address(obj, JSObject::offsetOfElements());
We have similar-looking code in newGCThing that could maybe do this as well.
Attachment #8383299 -
Flags: review?(jdemooij)
Assignee | ||
Comment 25•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/3784de966811
I also switching out the same add/store/sub pattern in the allocators for lea/store. This seems to be worth ~1000 on EB, at least on ggc x64.
Assignee | ||
Updated•11 years ago
|
Keywords: leave-open
Whiteboard: [leave open]
Assignee | ||
Updated•11 years ago
|
Attachment #8376613 -
Flags: checkin+
Comment 26•11 years ago
|
||
Assignee | ||
Comment 27•11 years ago
|
||
This is a big and complicated patch with lots of moving parts. Sorry for asking for review for a second time; at least the changed bits are all fairly isolated.
Differences in this version:
1) Switch per-op OOL malloc/free code to a single stub on the JitRuntime to work around an ARM codegen bug and to save us lots of code space as we add more of these. Of particular note: I'm curious if there is a normal or expected way to write the code in MacroAssembler::call{Malloc|Free}Stub. I've just used move/xchg to setup the correct register state, but it seems like there is probably a tool I don't know about for this? Exchanging some allocated registers for a hard-coded set of registers seems like it would be a common problem.
2) Merge new/initGCThing to createGCObject where possible. If we allocate external slots, we return that in the temp reg. I wanted to prevent people from accidentally clobbering the temp reg between newGCThing and initGCThing. It seems that all the callers that can't trivially be merged do not allocate external slots yet, so I've added assertions to the other case.
3) Rework fillSlotsWithUndefined to take an address and manually bump the offset. This lets us use the same code for fixed and dynamic slots. Also invert the two loops for the 32bit case so that we actually fill low to high, as intended. Whoops!
This is still missing some important optimizations that will become more important as we start allocating more things with external slots.
1) We currently fill the fixed and dynamic slots separately. This ends up with us loading the same UndefinedValue immediate into the same register a second time. Not a huge deal, but as we have no hyperoptimizer/peepholer, we might want to pass a bool to kill the second load.
2) Make the slots initialization use a loop. We'll need a scratch reg for the counter, so I guess we only want to do this on ARM and x64?
3) When we allocate the external slots in the nursery, they are right after the fixed slots. We should be able to do the initialization of both in a single loop, rather than having two back-to-back loops. This will also keep us from having to push/pop for a spare reg.
4) And of course, actually use this code for more than NewCallObject. Sadly this is not entirely straightforward, so will have to be done as followup.
Also, for the record, the underlying problem this bug is solving is the leak of the MNewSlots malloc pointer when we inline allocation of the gc thing in the nursery for the MNewCallObject that takes the slots. We only end up recording the slots in hugeSlots if we bail; when we stay in jitcode, LNewCallObject just writes the non-nursery slots into the object without telling the nursery to free them later.
Attachment #8375801 -
Attachment is obsolete: true
Attachment #8375805 -
Attachment is obsolete: true
Attachment #8375808 -
Attachment is obsolete: true
Attachment #8375815 -
Attachment is obsolete: true
Attachment #8375820 -
Attachment is obsolete: true
Attachment #8387896 -
Flags: review?(jdemooij)
Assignee | ||
Updated•11 years ago
|
Attachment #8383299 -
Flags: checkin+
Comment 28•11 years ago
|
||
Comment on attachment 8387896 [details] [diff] [review]
inline_allocation_of_external_slots-v1.diff
Review of attachment 8387896 [details] [diff] [review]:
-----------------------------------------------------------------
Looks great. It's indeed pretty complicated to get right, but splitting it up in different MacroAssembler methods helps a lot.
::: js/src/jit/Ion.cpp
@@ +288,5 @@
> return true;
> }
>
> JitCode *
> +JitRuntime::generateMallocStub(JSContext *cx)
Please move these methods to CodeGenerator.cpp (generateStringConcatStub is also there).
@@ +302,5 @@
> + regs.takeUnchecked(regNBytes);
> + masm.PushRegsInMask(regs);
> +
> + masm.setupUnalignedABICall(2, regTemp);
> + masm.movePtr(ImmPtr(GetIonContext()->runtime), regRuntime);
Nit: cx->runtime() (GetIonContext() is relatively slow and although it won't matter here, we try to avoid it as much as possible.)
::: js/src/jit/IonMacroAssembler.cpp
@@ +723,5 @@
> +
> + JS_ASSERT(nbytes > 0);
> +
> + if (result != regNBytes)
> + movePtr(regNBytes, result);
As discussed, it seems simpler to use push(regNBytes); here and movePtr(regNBytes, result); pop(regNBytes); later.
@@ +724,5 @@
> + JS_ASSERT(nbytes > 0);
> +
> + if (result != regNBytes)
> + movePtr(regNBytes, result);
> + mov(ImmWord(nbytes), regNBytes);
Nit: move32(Imm32(nbytes), regNBytes);
@@ +738,5 @@
> + // This register must match the one in JitRuntime::generateFreeStub.
> + const Register regSlots = CallTempReg0;
> +
> + if (slots != regSlots)
> + xchg(slots, regSlots);
I think it'd be simplest to use push/pop here as well.
@@ +918,3 @@
>
> + Address addr = base;
> + mov(ImmWord(jv.s.payload.i32), temp);
move32(Imm32(...), temp);
@@ +925,1 @@
> mov(ImmWord(jv.s.tag), temp);
Same here.
::: js/src/jit/arm/MacroAssembler-arm.h
@@ +601,5 @@
> }
> + void xchg(Register reg1, Register reg2) {
> + ma_eor(reg2, reg1);
> + ma_eor(reg1, reg2);
> + ma_eor(reg2, reg1);
You can remove this method now (FWIW, on ARM we have a scratch register, so we could use 3 moves.)
Attachment #8387896 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 29•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/907d5bc3bd26
Passes jit-tests on all 3 tier-1 platforms with and without ggc.
Assignee | ||
Comment 30•11 years ago
|
||
And backed out for windows exclusive M-3 and T-d failures. Will investigate before re-landing.
https://hg.mozilla.org/integration/mozilla-inbound/rev/20e8191247fd
Seems to be Linux and Windows exclusive. Only Mac has finished successfully.
Assignee | ||
Comment 32•11 years ago
|
||
I think the problem is that I dropped the same-reg guards before popping, which clobbers the output. Remind me never to program when I'm sick, no matter how simple it looks.
https://tbpl.mozilla.org/?tree=Try&rev=c214de088292
Assignee | ||
Comment 33•11 years ago
|
||
Manually verified that the orange in the push above was from the other patch.
https://hg.mozilla.org/integration/mozilla-inbound/rev/893b864b4b18
Comment 34•11 years ago
|
||
Flags: in-testsuite+
Comment 35•11 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #33)
> Manually verified that the orange in the push above was from the other patch.
>
> https://hg.mozilla.org/integration/mozilla-inbound/rev/893b864b4b18
This needs to be backed out. This causes an OS restart crash in bug 984653 on Firefox Marketplace in Firefox OS.
Comment 36•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #35)
> This needs to be backed out. This causes an OS restart crash in bug 984653
> on Firefox Marketplace in Firefox OS.
Backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7181bf175776
Comment 37•11 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #36)
> Backed out:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/7181bf175776
And Aurora:
https://hg.mozilla.org/releases/mozilla-aurora/rev/0546e4e8eac0
Comment 38•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Updated•11 years ago
|
Assignee: terrence → jcoppeard
Comment 39•11 years ago
|
||
Rebased patch. Passes SM tests.
Attachment #8373511 -
Attachment is obsolete: true
Attachment #8376613 -
Attachment is obsolete: true
Attachment #8382401 -
Attachment is obsolete: true
Attachment #8383299 -
Attachment is obsolete: true
Attachment #8387896 -
Attachment is obsolete: true
Attachment #8382401 -
Flags: feedback?(mrosenberg)
Attachment #8408377 -
Flags: review+
Comment 41•11 years ago
|
||
Rebased patch.
Attachment #8408377 -
Attachment is obsolete: true
Attachment #8418054 -
Flags: review+
Comment 42•11 years ago
|
||
Comment on attachment 8418054 [details] [diff] [review]
bug969012-ion-slots
:jonco requested a fuzz-run over email (preferably on ARM) for this patch, adding :decoder too.
Attachment #8418054 -
Flags: feedback?(gary)
Attachment #8418054 -
Flags: feedback?(choller)
Assignee | ||
Comment 43•11 years ago
|
||
Jon, the reason you weren't seeing any dynamic slots is that there was a check in CodeGenerator::visitCallObject that was totally disabling the optimization if numDynamicSlots(). I assume this is simple merge bustage. With this hunk fixed, I was able to repro and fix the test in bug 985732: we just need to compile a jump to the ool path instead of asserting.
Even with that fixed, there was still an opt crash in b2g with this, so I'd like to get fuzzing on it. Gary, do we have arm fuzzing now? If so, this patch would benefit.
Assignee: jcoppeard → terrence
Attachment #8418054 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8418054 -
Flags: feedback?(gary)
Attachment #8418054 -
Flags: feedback?(choller)
Attachment #8418329 -
Flags: review+
Attachment #8418329 -
Flags: feedback?(gary)
Updated•11 years ago
|
Attachment #8418329 -
Flags: feedback?(choller)
Comment 44•11 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #43)
Sorry, I messed up the merge when I rebased this the first time. In hindsight I should have got you to check the results when it unexpectedly worked...
Assignee | ||
Comment 45•11 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #44)
> (In reply to Terrence Cole [:terrence] from comment #43)
> Sorry, I messed up the merge when I rebased this the first time. In
> hindsight I should have got you to check the results when it unexpectedly
> worked...
I read the new patch 3 times before debugging, so it wouldn't have helped ;-).
Comment 46•11 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #43)
> Even with that fixed, there was still an opt crash in b2g with this, so I'd
> like to get fuzzing on it. Gary, do we have arm fuzzing now? If so, this
> patch would benefit.
The first few hours on opt ARM fuzzing aren't showing up anything yet, but I'll leave this on overnight.
Comment 47•11 years ago
|
||
Hmmm, I get crashes like the following with unreproducible testcases so far:
#0 setNext (next_=0x0, this=0xb4427ac8) at /home/odroid/trees/mozilla-inbound/js/src/jit/shared/IonAssemblerBuffer.h:68
#1 perforate (this=0xb1d6e594) at /home/odroid/trees/mozilla-inbound/js/src/jit/shared/IonAssemblerBuffer.h:251
#2 perforate (this=0xb1d6e594) at /home/odroid/trees/mozilla-inbound/js/src/jit/shared/IonAssemblerBufferWithConstantPools.h:620
#3 markGuard (this=0xb1d6e594) at /home/odroid/trees/mozilla-inbound/js/src/jit/shared/IonAssemblerBufferWithConstantPools.h:997
#4 js::jit::Assembler::as_b (this=0xb1d6e380, off=..., c=<optimized out>, isPatchable=<optimized out>) at /home/odroid/trees/mozilla-inbound/js/src/jit/arm/Assembler-arm.cpp:1839
#5 0x0018279c in js::jit::Assembler::as_b (this=0xb1d6e380, l=0xb62efcd0, c=js::jit::Assembler::Always, isPatchable=<optimized out>) at /home/odroid/trees/mozilla-inbound/js/src/jit/arm/Assembler-arm.cpp:1871
#6 0x00182804 in js::jit::Assembler::as_b (this=<optimized out>, l=l@entry=0xb62efcd0, c=c@entry=js::jit::Assembler::Always, isPatchable=isPatchable@entry=false) at /home/odroid/trees/mozilla-inbound/js/src/jit/arm/Assembler-arm.cpp:1876
#7 0x000c3c70 in jump (label=0xb62efcd0, this=<optimized out>) at /home/odroid/trees/mozilla-inbound/js/src/jit/arm/MacroAssembler-arm.h:711
#8 js::jit::CodeGenerator::visitCallKnown (this=0xb1d6e350, call=0xb24761f8) at /home/odroid/trees/mozilla-inbound/js/src/jit/CodeGenerator.cpp:2398
#9 0x000c54a0 in js::jit::CodeGenerator::generateBody (this=this@entry=0xb1d6e350) at /home/odroid/trees/mozilla-inbound/js/src/jit/CodeGenerator.cpp:3417
#10 0x000c79e4 in js::jit::CodeGenerator::generate (this=this@entry=0xb1d6e350) at /home/odroid/trees/mozilla-inbound/js/src/jit/CodeGenerator.cpp:6419
#11 0x00120528 in GenerateCode (lir=0xb2eaf5f0, mir=0x17297980) at /home/odroid/trees/mozilla-inbound/js/src/jit/Ion.cpp:1662
#12 js::jit::CompileBackEnd (mir=mir@entry=0x17297980) at /home/odroid/trees/mozilla-inbound/js/src/jit/Ion.cpp:1680
#13 0x001fbacc in js::WorkerThread::handleIonWorkload (this=this@entry=0xed5838) at /home/odroid/trees/mozilla-inbound/js/src/jsworkers.cpp:806
#14 0x001fbd74 in js::WorkerThread::threadLoop (this=0xed5838) at /home/odroid/trees/mozilla-inbound/js/src/jsworkers.cpp:1043
#15 0xb6f5e11e in _pt_root () from /home/odroid/Desktop/shell-cache/js-opt-32-dm-ts-hfp-linux-8a5a9a06f59a-969012-c43-3368daa84dec/libnspr4.so
#16 0xb6f76fbc in start_thread (arg=0xb62f0450) at pthread_create.c:314
#17 0xb6d92b3c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Are these somewhat related?
Assignee | ||
Comment 48•11 years ago
|
||
I have no idea. Maybe Marty will know what's going on here?
Flags: needinfo?(mrosenberg)
Comment 49•11 years ago
|
||
Man, I wish all the assembler buffer bugs were this easy.
Attachment #8419685 -
Flags: review?(terrence)
Flags: needinfo?(mrosenberg)
Assignee | ||
Comment 50•11 years ago
|
||
Comment on attachment 8419685 [details] [diff] [review]
derp_oom-r0.patch
Review of attachment 8419685 [details] [diff] [review]:
-----------------------------------------------------------------
r=me
Great find, Gary!
Attachment #8419685 -
Flags: review?(terrence) → review+
Assignee | ||
Comment 51•11 years ago
|
||
Relanding with the test from bug 985732:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f844291b895b
No try run this time, but we have a day and a half of fuzzing and it didn't destroy the tree last time it landed.
Assignee | ||
Comment 53•11 years ago
|
||
Comment on attachment 8418329 [details] [diff] [review]
remove_mnewslots_and_use_malloc_directly-rebased-20140506.diff
Thanks for the fuzzing!
Attachment #8418329 -
Flags: feedback?(gary)
Attachment #8418329 -
Flags: feedback?(choller)
Attachment #8418329 -
Flags: checkin+
Assignee | ||
Comment 54•11 years ago
|
||
Comment on attachment 8419685 [details] [diff] [review]
derp_oom-r0.patch
And Marty's oom fix, since I'm pushing things now.
https://hg.mozilla.org/integration/mozilla-inbound/rev/01f27ad85b1b
Attachment #8419685 -
Flags: checkin+
Assignee | ||
Updated•11 years ago
|
Keywords: leave-open
(In reply to Terrence Cole [:terrence] from comment #51)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/f844291b895b
(In reply to Terrence Cole [:terrence] from comment #54)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/01f27ad85b1b
Both backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/4e33e9729ab6 for failures:
https://tbpl.mozilla.org/php/getParsedLog.php?id=39312946&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=39312525&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=39313640&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=39313752&tree=Mozilla-Inbound
Flags: needinfo?(terrence)
Assignee | ||
Comment 56•11 years ago
|
||
Yeah, I forgot to put a header on the test telling it to expect a js error.
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/1a7b71c4b284
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/accdf191ac4e
Flags: needinfo?(terrence)
Comment 57•11 years ago
|
||
this bug appears to have regressed Octane-Raytrace on machine 11 by 6.3% on AWFY.
From AWFY : http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=11291a7123f4&tochange=accdf191ac4e
Comment 58•11 years ago
|
||
(In reply to mayankleoboy1 from comment #57)
> this bug appears to have regressed Octane-Raytrace on machine 11 by 6.3% on
> AWFY.
>
> From AWFY :
> http://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=11291a7123f4&tochange=accdf191ac4e
Needinfo? from Terrence about comment 57, perhaps it needs a new bug?
Flags: needinfo?(terrence)
Comment 59•11 years ago
|
||
Confirmed it is indeed this patch. I can reproduce locally and bisecting gives http://hg.mozilla.org/integration/mozilla-inbound/rev/accdf191ac4e .
Comment 60•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/1a7b71c4b284
https://hg.mozilla.org/mozilla-central/rev/accdf191ac4e
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Assignee | ||
Comment 61•11 years ago
|
||
Found it. Here is the code we are now generating for Allocation and init:
Check if the object metadata hook is set:
0x7fffeec2c4dc: cmpl $0x0,0xf2ade8
0x7fffeec2c4e4: jne 0x7fffeec7d684
Allocate object4 from the nursery:
0x7fffeec2c4ea: mov 0xf2acc8,%rdi
0x7fffeec2c4f2: lea 0x40(%rdi),%r8
0x7fffeec2c4f6: cmp %r8,0xf2acd8
0x7fffeec2c4fe: jbe 0x7fffeec7d684
0x7fffeec2c504: mov %r8,0xf2acc8
Assign shape and type from template:
0x7fffeec2c50c: movabs $0x7fffee967740,%r11
0x7fffeec2c516: mov %r11,(%rdi)
0x7fffeec2c519: movabs $0x7fffee9682e0,%r11
0x7fffeec2c523: mov %r11,0x8(%rdi)
Assign null slots and empty elements:
0x7fffeec2c527: movq $0x0,0x10(%rdi)
0x7fffeec2c52f: movq $0xc84610,0x18(%rdi)
Initialize all slots with undefined:
0x7fffeec2c538: movabs $0xfff9000000000000,%r8
0x7fffeec2c542: mov %r8,0x20(%rdi)
0x7fffeec2c546: mov %r8,0x28(%rdi)
0x7fffeec2c54a: mov %r8,0x30(%rdi)
=> 0x7fffeec2c54e: mov %r8,0x38(%rdi)
This is completely identical before and after with the exception of the last instruction. The slot-span for the object is less than the fixed slot count. The fix is a one-liner, naturally.
Of course immediately after this we go off to "initialize" the slots with zero. This looks like:
=> 0x7ffff7e36a1c: movabs $0xfff8800000000000,%r11
0x7ffff7e36a26: mov %r11,0x20(%rdx)
0x7ffff7e36a2a: movabs $0xfff8800000000000,%r11
0x7ffff7e36a34: mov %r11,0x28(%rdx)
0x7ffff7e36a38: movabs $0xfff8800000000000,%r11
0x7ffff7e36a42: mov %r11,0x30(%rdx)
We should be able to kill some of these loads and some of these writes, somehow.
Attachment #8421359 -
Flags: review?(jdemooij)
Flags: needinfo?(terrence)
Comment 62•11 years ago
|
||
Comment on attachment 8421359 [details] [diff] [review]
raytrace_regression-v0.diff
Review of attachment 8421359 [details] [diff] [review]:
-----------------------------------------------------------------
Subtle, good find.
Attachment #8421359 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 63•11 years ago
|
||
Comment 64•11 years ago
|
||
Awfy seems to agree this is fixed. Thanks!
You need to log in
before you can comment on or make changes to this bug.
Description
•