Closed Bug 884079 Opened 11 years ago Closed 11 years ago

APCS frame pointer compatibility option for Ion/BC(/Odin?) stack frames

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jld, Assigned: jld)

References

Details

(Keywords: perf, Whiteboard: [c=profiling p=5 s=2013.08.09])

Attachments

(2 files, 1 obsolete file)

To review:

For B2G profiling, we're probably not going to be able to get away from some kind of frame-pointer-based unwinding, as being able to use the kernel profiler to get a view of the entire system (and at relatively high time resolution, if need be) can be very useful.  (See bug 856899 comment 2 for a lament on why these are connected, among other related topics.)

So far we've been using the original ARM Procedure Call Standard's frame pointer variant, in order to not have to modify the kernel.  It's β€œobsolete” according to ARM and not supported at all for the Thumb instruction set, which we're using, but I've modified GCC to support enough of it for our purposes for Thumb2 devices (which all the Tier-1 B2G targets are).

In the time of JaegerMonkey this was fine, as it ignored r11 (the frame pointer).  IonMonkey and BC do not so cooperate, and presumably neither does Odin but I haven't tested it.  So, it would be nice if we had the option of compiling our new monkeys to interoperate somehow.

A small problem is that Ion will allocate r11 to hold program values, but that's a simple change.

The bigger problem is that BC uses r11 as a "frame pointer" in an x86-like way: the old frame pointer is at [fp, #0], and a return address might be at [fp, #4].  In contrast, for the APCS frame these are at [fp, #-12] and [fp, #-4] respectively.  ([fp, #-8] and [fp, 0] should be the saved SP and PC, but we're not using them.  Which is good, because Thumb2 makes it hard to do that.  Yes, it's an odd layout; it's from the 1980s.)

I was working on a patch to augment BC's frames to have the info in both places, and adjust various kinds of trampolines and stubs that create or alter them to respect this; it mostly works but seems to have subtle bugs, and it introduces ifdef'ed machine-dependent code into place that seems as if they really shouldn't be doing that.

It might also be possible to change BC's frame pointer to use a different register, but there'd need to be one available for that.  (Also, this wouldn't help get native frames for JS frames, which would be nice, but it's a side goal and there are other approaches to it.)
As previously mentioned, I don't really like how this patch does what it does, and it seems to be not entirely right β€” it broke the B2G keyboard app the last time I tried it.  But if I'm doing it wrong, I may as well be doing it wrong in public.
Attachment #763844 - Flags: feedback?
Keywords: perf
Whiteboard: c= p=2 → c= p=2 ,
Comment on attachment 763844 [details] [diff] [review]
Attempt to make IonMonkey and BC cooperate with APCS frame pointer ABI

Review of attachment 763844 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know much of the baseline compiler yet, I think Kannan can help there.

Odin is supposed to follow the architecture call convention, so I do not see any issue to make it follow APCS [1].
Ion and Baseline are using a different frame format, as the baseline compiler is adding a frame pointer again.  I will see if I can revise that later.  The problem being that we need to re-organize the layout of the stack and the way we mark it.

[1] http://www.cl.cam.ac.uk/~fms27/teaching/2001-02/arm-project/02-sort/apcs.txt

::: js/src/ion/arm/Trampoline-arm.cpp
@@ +202,5 @@
>          regs.take(r11);
>          regs.take(ReturnReg);
>  
> +#ifdef HAVE_APCS_FRAME
> +        const Address slot_numStackValues(r11, offsetof(EnterJITStack, numStackValues) - 40);

nit: You should avoid constant such as 40 and prefer sizeof(EnterJITStack) instead, if you cannot please add a comment next to each use of the constant.  In addition to be sane, this help keep track what needs to be updated.

@@ +239,5 @@
>          Register framePtr = r11;
>          masm.subPtr(Imm32(BaselineFrame::Size()), sp);
> +#ifdef HAVE_APCS_FRAME
> +        masm.storePtr(framePtr, Address(sp, BaselineFrame::Size() - 12));
> +        masm.storePtr(lr, Address(sp, BaselineFrame::Size() - 4));

Is the following relation true:

-12 == offsetof(EnterJITStack, r11) - sizeof(EnterJITStack)
-4 == offsetof(EnterJITStack, lr) - sizeof(EnterJITStack)

?

@@ -229,1 @@
>          masm.push(r0); // jitcode

I guess you want to keep this line when not compiled with APCS.

@@ +261,5 @@
>          masm.push(r0); // jitcode
>  
>          masm.setupUnalignedABICall(3, scratch);
> +#ifdef HAVE_APCS_FRAME
> +        aasm->as_sub(scratch, framePtr, Imm8(BaselineFrame::Size()));

nit: You should prefer the masm variant of it.
Attachment #763844 - Flags: feedback? → feedback?(jdemooij)
This is in preparation for conditionally changing BaselineFrameReg to r7 if the ABI would use r11 in a way that conflicts with the monkeys, as it does for the APCS frame pointer option.

No functional change intended.
Attachment #764527 - Flags: review?
Specifically: don't allocate r11 in IonMonkey, and use r7 instead for the BC frame pointer (incompatible with the APCS frame format).  The Baseline trampoline is adjusted to deal with a register conflict that arises.
Attachment #763844 - Attachment is obsolete: true
Attachment #763844 - Flags: feedback?(jdemooij)
Attachment #764539 - Flags: review?
Attachment #764527 - Attachment description: Ion ARM cleanups: use BaselineFrameReg, not r11; remove dead write to r7. → #1: Ion ARM cleanups: use BaselineFrameReg, not r11; remove dead write to r7.
Attachment #764539 - Attachment description: Make JIT code avoid r11 on ARM if APCS frame pointers are in use. → #2: Make JIT code avoid r11 on ARM if APCS frame pointers are in use.
(In reply to Nicolas B. Pierron [:nbp] from comment #2)
> Review of attachment 763844 [details] [diff] [review]:

I decided not to attempt this (yet), but it seems wrong not to answer the feedback:

> ::: js/src/ion/arm/Trampoline-arm.cpp
> > +#ifdef HAVE_APCS_FRAME
> > +        const Address slot_numStackValues(r11, offsetof(EnterJITStack, numStackValues) - 40);
> 
> nit: You should avoid constant such as 40 and prefer sizeof(EnterJITStack)
> instead, if you cannot please add a comment next to each use of the
> constant.  In addition to be sane, this help keep track what needs to be
> updated.

Yeah, I'd have cleaned up the magic numbers when I'd gotten the change working, if I had.  The problem with sizeof(EnterJITStack) is that there are more fields in it than just the saved registers, so I'd have to factor out that part of the struct, or do things with offsetof instead.

> @@ +239,5 @@
> >          Register framePtr = r11;
> >          masm.subPtr(Imm32(BaselineFrame::Size()), sp);
> > +#ifdef HAVE_APCS_FRAME
> > +        masm.storePtr(framePtr, Address(sp, BaselineFrame::Size() - 12));
> > +        masm.storePtr(lr, Address(sp, BaselineFrame::Size() - 4));
> 
> Is the following relation true:
> 
> -12 == offsetof(EnterJITStack, r11) - sizeof(EnterJITStack)
> -4 == offsetof(EnterJITStack, lr) - sizeof(EnterJITStack)

That's true for the part of the EnterJITStack with the saved registers, but not the whole EnterJITStack.  Adding static assertions for these offset dependencies would have made all of this clearer, I think.

> @@ -229,1 @@
> >          masm.push(r0); // jitcode
> 
> I guess you want to keep this line when not compiled with APCS.

I'm not sure if you're asking about this line specifically or the change immediately above it, so I'll answer both, because in hindsight that change wasn't as obvious as it could be.

What's going on in that part of the patch is that r11 never needs to be saved across the call to C++, because it's callee-saved in both ABI variants (and the stack alignment change doesn't matter, because of setupUnalignedABICall).  I could also have left it in for both cases Β­β€” the baseline trampoline isn't a fast path, I don't think, so it's not an important optimization β€” but I was trying to make the frame-pointer-related data flow simpler to convince myself that I hadn't missed anything.

As for r0, it's an argument register, so both cases need to save it.

> @@ +261,5 @@
> >          masm.push(r0); // jitcode
> >  
> >          masm.setupUnalignedABICall(3, scratch);
> > +#ifdef HAVE_APCS_FRAME
> > +        aasm->as_sub(scratch, framePtr, Imm8(BaselineFrame::Size()));
> 
> nit: You should prefer the masm variant of it.

I don't think there is a masm variant of it?  lea(Address, RegisterID) seems like the closest, but it will generate an add instruction, and the negative "offset" can't be encoded as an immediate so it will have to use an extra instruction and a temporary register.
Attachment #764527 - Flags: review? → review?(nicolas.b.pierron)
Attachment #764539 - Flags: review? → review?(nicolas.b.pierron)
Comment on attachment 764527 [details] [diff] [review]
#1: Ion ARM cleanups: use BaselineFrameReg, not r11; remove dead write to r7.

Review of attachment 764527 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/arm/Trampoline-arm.cpp
@@ -373,5 @@
>  
>      masm.moveValue(UndefinedValue(), r5, r4);
>  
>      masm.ma_mov(sp, r3); // Save %sp.
> -    masm.ma_mov(sp, r7); // Save %sp again.

Nice catch.
Attachment #764527 - Flags: review?(nicolas.b.pierron) → review+
Is it possible to always use r7? Using r11 or r7 based on some configure flag is going to break or introduce bugs that are hard to track down.
Comment on attachment 764539 [details] [diff] [review]
#2: Make JIT code avoid r11 on ARM if APCS frame pointers are in use.

Review of attachment 764539 [details] [diff] [review]:
-----------------------------------------------------------------

This change is fine for me, I will let jandem check on his own as this might change the way gdb recognize baseline frame and if he has any objections against this patch.

::: js/src/ion/arm/BaselineRegisters-arm.h
@@ +20,5 @@
>  // r13 = stack-pointer
> +#ifdef HAVE_APCS_FRAME
> +// r11 is a frame pointer for C/C++, but it's a different frame layout from BC.
> +// Thus, "r7 = frame-pointer" for our purposes.
> +static const Register BaselineFrameReg = r7;

Note: r7 is in conflict with the argument rectifier, and the temporary registers used by Ion calls.  So this baseline frame register cannot be used as-is in Ion.
Attachment #764539 - Flags: review?(nicolas.b.pierron)
Attachment #764539 - Flags: review?(jdemooij)
Attachment #764539 - Flags: feedback+
I'd be interested in doing the Odin patch for this.  Work in bug 860736 on the Odin ARM stack should help make adding APCS support easier.  R11 is currently used for the heap base but this can probably be easily moved to another register.
(In reply to Jan de Mooij [:jandem] from comment #7)
> Is it possible to always use r7? Using r11 or r7 based on some configure
> flag is going to break or introduce bugs that are hard to track down.

Good point.  I've been trying to minimize the impact of my changes on builds that aren't using this kind of stack walking, but breakage dependent on non-default configuration isn't a good idea either (and in particular has a history of making my life more painful).  Assuming we can deal with the issues raised in comment #8, I'd be fine with unifdef'ing that part of the changes.
(In reply to Nicolas B. Pierron [:nbp] from comment #8)
> > +static const Register BaselineFrameReg = r7;
> 
> Note: r7 is in conflict with the argument rectifier, and the temporary
> registers used by Ion calls.  So this baseline frame register cannot be used
> as-is in Ion.

Oh, I see: CallTempReg2?  But on x86, BaselineFrameReg is ebp, and so is CallTempReg6.  Which of these register assignments can safely alias on arm?
(In reply to Jed Davis [:jld] from comment #11)
> (In reply to Nicolas B. Pierron [:nbp] from comment #8)
> > > +static const Register BaselineFrameReg = r7;
> > 
> > Note: r7 is in conflict with the argument rectifier, and the temporary
> > registers used by Ion calls.  So this baseline frame register cannot be used
> > as-is in Ion.
> 
> Oh, I see: CallTempReg2?  But on x86, BaselineFrameReg is ebp, and so is
> CallTempReg6.  Which of these register assignments can safely alias on arm?

As far as I know, CallTempRegs are only used in the Lowering phase which is used for Odin/Ion code compilation.  So, this should not matter that much for instrumenting the baseline as you do in your first patch.

As for Ion/Odin, CallTempReg are registers which used to follow the order for the call convention.  In practice, we should remove these CallTempRegs and use GetTempRegForIntArg & CallTempNonArg functions.  These functions are more verbose but we can tie a clear meaning to them.
jld, do you have patches that pass jit-tests etc with APCS frames? I'm a bit afraid this will also require changes to IonFrames-arm.h etc.
Now that I've finally gotten jit tests to work on b2g, this is what's broken after the patch I posted: http://pastebin.mozilla.org/2566442 and here's what's broken without my change: http://pastebin.mozilla.org/2566439

89 failures with and 33 failures without, so it clearly doesn't work yet.  And I'm not at all clear on what else is referencing r7 in order to cause this to be a problem.
Jed, could you run one of the failing parallelarray tests in a debug shell with PAFLAGS=full and paste the output?
(In reply to Shu-yu Guo [:shu] from comment #15)
> Jed, could you run one of the failing parallelarray tests in a debug shell
> with PAFLAGS=full and paste the output?

http://pastebin.mozilla.org/2569540

If this is an actual bug, we should get a separate bug for it, since it's also broken without my IonMonkey address.
Status: NEW → ASSIGNED
Whiteboard: c= p=2 , → [c= p=2]
Attachment #764539 - Flags: review?(jdemooij)
It's not clear at this point that we're going to try to wrestle the monkeys into giving up a register; alternatives based on the Exception Handling ABI (and some way of dealing with compiled script) seem more promising, but I don't know that we're sure enough of that to close this bug yet.
No longer blocks: 856899
Depends on: 856899
Whiteboard: [c= p=2] → [c= p=5]
Whiteboard: [c= p=5] → [c=profiling p=5]
The monkeys win.  (We'll need some other way to walk jitcode frames, but that's a different bug.)
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Whiteboard: [c=profiling p=5] → [c=profiling p=5 s=2013.08.09]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: