Closed Bug 810526 Opened 12 years ago Closed 11 years ago

Initial support for native stacks in SPS on B2G

Categories

(Core :: Gecko Profiler, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla26
Tracking Status
b2g18 ? ---

People

(Reporter: fabrice, Assigned: jld)

References

Details

(Keywords: perf, Whiteboard: [c=profiling p=4][d-watch])

Attachments

(7 files, 10 obsolete files)

(deleted), text/html
dhylands
: review+
Details
(deleted), text/plain
Details
(deleted), patch
BenWa
: review+
Details | Diff | Splinter Review
(deleted), patch
BenWa
: review+
Details | Diff | Splinter Review
(deleted), patch
BenWa
: review+
Details | Diff | Splinter Review
(deleted), patch
BenWa
: review+
Details | Diff | Splinter Review
(deleted), patch
jld
: review+
Details | Diff | Splinter Review
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp: In member function 'void TableTicker::doBacktrace(ThreadProfile&, TickSample*)':
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:829: warning: ISO C++ forbids braced-groups within expressions
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:838: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:838: error: 'R0' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:839: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:839: error: 'R1' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:840: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:840: error: 'R2' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:841: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:841: error: 'R3' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:842: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:842: error: 'R4' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:843: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:843: error: 'R5' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:844: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:844: error: 'R6' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:845: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:845: error: 'R7' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:846: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:846: error: 'R8' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:847: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:847: error: 'R9' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:848: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:848: error: 'R10' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:849: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:849: error: 'R11' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:850: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:850: error: 'R12' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:851: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:851: error: 'R13' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:852: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:852: error: 'R14' was not declared in this scope
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:853: error: 'struct sigcontext' has no member named 'gregs'
/home/fabrice/dev/inbound/tools/profiler/TableTicker.cpp:853: error: 'R15' was not declared in this scope
BenWa, I want to build b2g for the phone with frame pointers, in the hopes that will give us real stacks (c.f. the profile in bug 809668 comment 1, which we think is missing C++ backtraces).

We had been using --enable-profiling for "enable frame pointers", but we hit this compile error.  Can we make --enable-profiling on B2G not build this stuff, assuming it's not necessary?  Otherwise, can you help us fix this problem?
Blocks: slim-fast
Summary: Compilation of TableTicker.cpp fails when building with --enable-profiling → Can't build B2G with --enable-profiling -- compilation of TableTicker.cpp fails
What you need to do here is pretty simple:
1) Change the ifdef's in TableTicker.cpp from using glibc backtrace 'USE_BACKTRACE' ifdef which isn't safe to use anyways to use 'USE_NS_STACKWALK' ifdef instead. Then you want to make sure that the NS_Stackwalk implementation calls b2g for framepointers.
2) Read in the important registers from sigcontext for NS_Stackwalk which is the pc, frame pointers and maybe stack pointers.
3) Ideally you also want to pass in the right stackEnd to prevent crashing if the frame pointers fall off the stacks.

Note that if you unwind while executing binary user space drivers they wont have frame pointers so doing (3) is important.
Thanks, Benoit.

Gabriele, this might be up your alley.  Are you interested in taking this bug, so we can get better backtraces with the profiler on target?
(In reply to Justin Lebar [:jlebar] from comment #3)
> Gabriele, this might be up your alley.  Are you interested in taking this
> bug, so we can get better backtraces with the profiler on target?

Yup, I'll take this bug. Having more detailed traces will be quite useful, if you look at cold application startup for example there's a whole chunk of libc calls in which we don't have any visibility at the moment.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Benoit, fixing this should give us proper stack traces for native code, right?  This isn't just wishful thinking on our part?
The NS_Stackwalk code may need to be adapted. I really don't know what ARM + framepointers looks like. Maybe gcc can tell you more about it. Ideally it's identical. You may want to watch out for this:
http://stackoverflow.com/questions/10928646/whats-the-equivalent-of-bp-register-frame-pointer-on-arm-processors

This response seems to indicate the you may want a different resister for thumb vs. arm. In any case I think gcc has the final word so check the documentation for the flags you use.

I don't know anything for certain because I'm not familiar with low level ARM. But on the SPS side of things it the rest should just work.
Summary: Can't build B2G with --enable-profiling -- compilation of TableTicker.cpp fails → SPS on B2G doesn't get useful native stacks, because we can't build with --enable-profiling; compilation of TableTicker.cpp fails
This patch makes default-gecko-config in gonk recognize the B2G_PROFILING environment variable and turn on profiling at configure time when it is set
After working a bit with the source I was able to establish that we're currently using _Unwind_Backtrace to navigate the stack trace. I've changed the defines so that the NS_StackWalk code for x86/PowerPC is used instead and modified that code to accommodate for ARM; if I understand it correctly we're using only Thumb2 code so I'm assuming the frame-pointer is always in R7.

With this code I'm getting the same traces as before so I'm probably missing something. I'll post a WIP patch soon and dig further into the traces tomorrow.
We wont call NS_StackWalk by default, you want to make sure 'mUseStackWalk' is true. Is NS_StackWalk getting called?
(In reply to Benoit Girard (:BenWa) from comment #9)
> We wont call NS_StackWalk by default, you want to make sure 'mUseStackWalk'
> is true. Is NS_StackWalk getting called?

I stepped in with the profiler and NS_StackWalk wasn't being called; while at it I've discovered that this bug seems related to bug 761287, I'm syncing the tree now that those patches are in to see how they affect us. If we can use the stock Android code for walking stacks in B2G then there is no need for modifying NS_StackWalk.
In the ideal case that code should go into NS_StackWalk as a B2G implementation which aims to provide the best stackwalk possible for the platform and can be used by other tools.
(In reply to Benoit Girard (:BenWa) from comment #11)
> In the ideal case that code should go into NS_StackWalk as a B2G
> implementation which aims to provide the best stackwalk possible for the
> platform and can be used by other tools.

I see, well I will probably have to fix NS_StackWalk anyway because I've finally got the Android boilerplate code (using libunwind) to compile cleanly and it crashes the application when trying to pull a profile. I hit a build issue today that made me waste tons of time so I didn't have a chance to figure out what is causing the crash but if it turns out that libunwind doesn't work I'll try to make NS_StackWalk code work on ARM/Gonk.
Modified the patch to disable the crash reporter when profiling is required, otherwise the libunwind-based profiler would not start at all (see here http://mxr.mozilla.org/mozilla-central/source/tools/profiler/TableTicker.cpp#1013).

Unfortunately running ./profile.sh symbolicate doesn't yield a better trace nevertheless. However while inspecting the output I realized that the issue might actually lie in the profile-symbolicate.py script or in how the libc library is compiled. Inspecting it with objdump shows no regular symbols, only dynamic ones. This is giving me the feeling that libc is getting stripped completely during the build or profile-symbolicate.py cannot cope with dynamic symbols. Either way I'll try to fix the issue.

Anyway to test the (very simple) patch just apply it to your gonk-misc checkout, then add this line:

export B2G_PROFILING=1

To your .userconfig file in the B2G root directory.
Attachment #681528 - Attachment is obsolete: true
After some investigation I discovered that profile-symbolicate.py is looking up libraries in directories where they're sometimes already stripped, a small change to make it prefer the ones in out/target/product/<DEVICE>/symbols seems to fix the problem.

Here is a sample trace with the libc calls properly labeled:

http://people.mozilla.com/~bgirard/cleopatra/#report=54fca58923abfaf9b36086535e50b5ef0977ed33

I'll post a patch shortly.
In that profile there's only the pseudostack and the leaf. Getting these profiles doesn't require profiling builds so you're not using your frame pointer. Note that we've tried using libunwind on Android with zero success.

There's always breakpad you can used being worked in bug 779291 if you're willing to wait.
(In reply to Benoit Girard (:BenWa) from comment #15)
> In that profile there's only the pseudostack and the leaf. Getting these
> profiles doesn't require profiling builds so you're not using your frame
> pointer. Note that we've tried using libunwind on Android with zero success.

I see, I've ensured I was also using the Android libunwind code but as you suggest it doesn't work then.

> There's always breakpad you can used being worked in bug 779291 if you're
> willing to wait.

I'm afraid we might need something sooner; I'm all for getting proper traces and I would be very willing to do the work but we might have more pressing matters regarding B2G at this stage. I'll try to land these small fixes quickly for what they're worth and then I'll bring up this bug during the next B2G triage so as to know how much it should be prioritized.

If there aren't other more important things I'll work on it next week, or maybe we could open another bug specifically for getting proper traces in B2G and close this one once the build is fully fixed.
> I'll try to land these small fixes quickly for what they're worth

I'm confused as to what these patches are "worth", so to speak.  What do they buy us?

I'm all for a hack which gets us a not-perfect stack, but if we only get the leaf of the stacktrace, that's probably not helpful.
(In reply to Justin Lebar [:jlebar] from comment #17)
> I'm confused as to what these patches are "worth", so to speak.  What do
> they buy us?

They fix building with --enable-profiling and make sure libc calls are properly labeled when adding symbols to a trace; in a sense they pave the way for doing proper stack traces.

> I'm all for a hack which gets us a not-perfect stack, but if we only get the
> leaf of the stacktrace, that's probably not helpful.

I will open another ticket for the task of making sure the stack-walking code works. I'll first check the Android bugs because the code will be shared because as far as I can see the code based on libunwind doesn't work there anyway (see comment 15).
FYI we plan on removing libunwind from the tree unless someone can get it working before that. bug 809571
Attachment #683036 - Flags: review?(dhylands)
Pointer to Github pull-request
Attachment #683186 - Flags: review?(mwu)
Attachment #682517 - Attachment is obsolete: true
I've spent most of the last two days trying to coerce libunwind into working but with very little success. With the crash reporter disabled I can get the profiler to start but the main b2g process seems to crash somewhere during startup when profiling is enabled and I wasn't able to debug it yet. I also tried getting partial traces with _Unwind_Backtrace() to get at least a few more calls above the leaf but it tends to segfault (I think the C++ frames are confusing it).

So I would say that either we make this bug block on bug 779291 (and on bug 650239 too?) and then I can start helping out getting breakpad to work or I can keep trying to get something out of libunwind but I'm not sure if it's worth the effort (especially if it's already been decided that it should go).
FWIW I have managed to get _Unwind_Backtrace to work (with -funwind-tables).  I haven't seen any segfaults.

Perhaps you're not stopping the backtrace when you see a PC of 0?  Have a look at bionic's malloc_debug_leak.c (in the bionic directory in B2G).
(In reply to Justin Lebar [:jlebar] from comment #23)
> FWIW I have managed to get _Unwind_Backtrace to work (with -funwind-tables).
> I haven't seen any segfaults.

That's excellent news! If I could also get it to work we would at least have a stop-gap solution for profiling.

> Perhaps you're not stopping the backtrace when you see a PC of 0?  Have a
> look at bionic's malloc_debug_leak.c (in the bionic directory in B2G).

That might be the cause, the code in nsStackWalk.c doesn't seem to check for that condition though it's not easy to parse it because of all the #ifdef's. I'll see if I can get it to work, thanks for the tip!
Comment on attachment 683036 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/B2G/pull/175

Duplicating the comment on the pull request.

Appending /symbols to the path will only work if it takes the PRODUCT_OUT environment side of the if.

Since the code in main requires PRODUCT_OUT in the environment, we should remove the else and append /symbols to the product_out and then check for existence of the directory.

r=me with that change.
Attachment #683036 - Flags: review?(dhylands) → review+
A further update on this one after much stepping into code and looking at raw stack dumps. One thing I hadn't taken into account is that NS_StackWalk is being called from inside a signal handler, turns out _Unwind_Backtrace cannot unwind from it, interestingly neither gdb can. It might be possible that the issues I was hitting with libunwind were also caused by this even though with the right parameters set libunwind should be able to generate the backtrace correctly.

I will look at the way we're compiling the profiler code and the signal handler in the hope that the issue is caused by either the way we compiler or link it.
I finally discovered what is causing the issue: the gecko code is using Thumb-2 encoding while the frame generated by the signal handler itself is non-Thumb 32-bit ARM code and this equally confuses _Unwind_Backtrace, libunwind and GDB. I'll figure out where that code is and make sure it also gets compiled with Thumb-2 or I'll enable inter-working when using --enable-profiling. Either way once unwinding from signal handlers is fixed we should be able to get proper back-traces.
The signal handler will give you the state of the thread before it was interrupted. Looks like you might be able to resume from there using _Unwind_SetGR _Unwind_SetIP.

A more important problem is that _Unwind_Backtrace is not signal safe:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24724

Apparently gpreftools will generate a first backtrace to, what appears, to initialize Unwind_Backtrace but you'd have to verify that it can't allocate after that:
http://google-perftools.googlecode.com/svn/trunk/src/stacktrace_x86_64-inl.h
(In reply to Benoit Girard (:BenWa) from comment #28)
> The signal handler will give you the state of the thread before it was
> interrupted. Looks like you might be able to resume from there using
> _Unwind_SetGR _Unwind_SetIP.

Yes, that was my plan.

> A more important problem is that _Unwind_Backtrace is not signal safe:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24724

Ah, nasty :( I had quickly looked at the ARM _Unwind_Backtrace code but hadn't noticed this particular issue (and the code is conveniently documentation-free). At this stage I don't know if libunwind was failing for the same issue, but if it was and I can fix it I can use that instead of _Unwind_Backtrace. Hopefully that should work from the signal handler.

> Apparently gpreftools will generate a first backtrace to, what appears, to
> initialize Unwind_Backtrace but you'd have to verify that it can't allocate
> after that:
> http://google-perftools.googlecode.com/svn/trunk/src/stacktrace_x86_64-inl.h

OK, I'll try stepping through it to figure out when the allocations are happening.
I tried making _Unwind_Backtrace() work across the signal handler frame over the week-end by using _Unwind_SetGR() in the callback to restore the state that the signal handler had encountered. Unfortunately I wasn't very successful with it, in most of the cases this would yield to either segmentation faults or _Unwind_Backtrace() just immediately bailing out with an error. I'm not sure if having mixed Thumb/non-Thumb frames is making the problem worse but it's certainly not helping me in figuring out what to try and pass to _Unwind_Backtrace().

Generally this seems a very fragile approach to me and considering the amount of work involved in bug 779291 I think it might be better if we'd focus on getting breakpad to work instead. As I had mentioned before I'm all for helping out for that (if help is needed/wanted that is).
Attachment #683186 - Flags: review?(mwu) → review+
I've managed to apply the last patch in bug 779291 to my B2G build after some refreshing and after disabling a couple of things. I've succeeded in pulling this trace out of it:

http://people.mozilla.com/~bgirard/cleopatra/#report=2bdf5dc589ace9119c7a54e73b32cba72d8e6ead

It's not pretty (missing symbols and what look like a few glitches) but it's far more detailed than what we had before. I'll post more updates on this tomorrow, now it's getting late over here.
This was my last WIP patch that applies to gonk-misc, it adds a B2G_PROFILING variable that can be set in .userconfig and which will turn on the following options when running the configure script:

- --enable-profiling
- --disable-elfhack
- add '-mtpcs-frame -fno-omit-frame-pointer' to the CXXFLAGS

I've fiddled with the CXXFLAGS & NS_StackWalk() quite a bit but w/o being able to force the generated code to place the FP in a reliable place on the stack.

Pick this up freely if you have a quick solution for this. It would also be worth renaming this bug to "make native B2G profiling work" or something along the lines as the current name doesn't describe the issue correctly anymore.
Attachment #683186 - Attachment is obsolete: true
Gabriele, are you still working on it?
Are there any WiP patches flying around?
Jed could help here as well.
It looks as if, compiling with -mapcs-frame -mtpcs-frame -fno-omit-frame-pointer, every stack frame has its caller's r11 stored at [r11, #-12], and lr at [r11, #-4], for both ARM and Thumb.  Also, this appears to work across signal handlers (i.e., the handler is called with r11 unchanged from where the signal is raised).  So, if this is actually the case, it should be possible to do a simple stack walk until unwinding can be made to work.

Has this already been tried?
Summary: SPS on B2G doesn't get useful native stacks, because we can't build with --enable-profiling; compilation of TableTicker.cpp fails → SPS on B2G doesn't get maximally useful native stacks
(In reply to Jed Davis [:jld] from comment #34)
> It looks as if, compiling with -mapcs-frame -mtpcs-frame
> -fno-omit-frame-pointer, every stack frame has its caller's r11 stored at
> [r11, #-12], and lr at [r11, #-4], for both ARM and Thumb.  Also, this
> appears to work across signal handlers (i.e., the handler is called with r11
> unchanged from where the signal is raised).  So, if this is actually the
> case, it should be possible to do a simple stack walk until unwinding can be
> made to work.
> 
> Has this already been tried?

No it hasn't been tried. If trying it is not much effort I would welcome for someone to take a look. We're going to retain both unwind methods for the foreseeable future and if this one proves to be either/both more accurate and fast then it worth doing.
> Has this already been tried?

I was going to look at this when I finished my b2g blockers, but if you can get to it first, I would very much welcome it.  I'd be happy to review.
(In reply to Jed Davis [:jld] from comment #34)
> It looks as if, compiling with -mapcs-frame -mtpcs-frame
> -fno-omit-frame-pointer, every stack frame has its caller's r11 stored at
> [r11, #-12], and lr at [r11, #-4], for both ARM and Thumb.  Also, this
> appears to work across signal handlers (i.e., the handler is called with r11
> unchanged from where the signal is raised).  So, if this is actually the
> case, it should be possible to do a simple stack walk until unwinding can be
> made to work.
> 
> Has this already been tried?

Personally I haven't tried it yet, the last combination I had tried was -mtpcs-frame -fno-omit-frame-pointer but that wasn't enough to get a reliable frame chain in the stack. I already have code that modifies NS_StackWalk() for this purpose so I should be able to modify it quickly and test this.

As Justin mentioned however we're stuck fixing tef+/shira+ bugs until the end of the week so unless I get some extra spare time it'll have to wait until next week.
> It looks as if, compiling with -mapcs-frame -mtpcs-frame -fno-omit-frame-pointer, every stack frame 
> has its caller's r11 stored at [r11, #-12], and lr at [r11, #-4], for both ARM and Thumb.

You probably also want -mtpcs-leaf-frame.  But also, is it r7 in Thumb mode, instead of r11?
Gabriele, since you and I have other bugs we need to work on, perhaps we should unleash Jed on this.  Can you post your patches?
(In reply to Justin Lebar [:jlebar] from comment #39)
> Gabriele, since you and I have other bugs we need to work on, perhaps we
> should unleash Jed on this.  Can you post your patches?

You're right, that would be the best course of action. I'll dig out the patches that get rid of libunwind and ensure NS_StackWalk() is being used, the one for gonk-misc is already attached and would need only changes to the CXXFLAGS variable.
Assignee: gsvelto → jld
(In reply to Justin Lebar [:jlebar] from comment #38)
> You probably also want -mtpcs-leaf-frame.  But also, is it r7 in Thumb mode,
> instead of r11?

It's complicated.  Compiling with -O2 -fno-omit-frame-pointer -mthumb gives me this:

   0:   b580            push    {r7, lr}
   2:   af00            add     r7, sp, #0

Adding -mtpcs-frame turns the prologue into this sequence:

   0:   b084            sub     sp, #16
   2:   b580            push    {r7, lr}
   4:   ab06            add     r3, sp, #24
   6:   9303            str     r3, [sp, #12]
   8:   467b            mov     r3, pc
   a:   9305            str     r3, [sp, #20]
   c:   465b            mov     r3, fp
   e:   9302            str     r3, [sp, #8]
  10:   4673            mov     r3, lr
  12:   9304            str     r3, [sp, #16]
  14:   ab05            add     r3, sp, #20
  16:   469b            mov     fp, r3
  18:   af00            add     r7, sp, #0

I pieced this together on a whiteboard, and what it's doing is building the same 4-word frame as ARM code gets with -mapcs-frame:

        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4

and putting that on top of the 2-word Thumb frame.

So, I don't know why all of these flags do what they do, but it looks as if it should work.
(In reply to Jed Davis [:jld] from comment #41)
> Adding -mtpcs-frame turns the prologue into this sequence:

except when it doesn't.  Which seems to depend on the -mcpu flag and possibly other things?  I don't understand this as well as I thought.  Time to go read the GCC source....
The relevant bit is whether the target CPU supports Thumb2 (TARGET_THUMB1 and TARGET_32BIT).  The additional 4-word item is created iff it's Thumb1-only.

So we might be back to needing to either modify the compiler or get real unwinding to work.
> So we might be back to needing to either modify the compiler or get real unwinding to work.

Is writing our own unwinder for tpcs-frame still an option?
(Or is that what you meant by "real unwinding"?)
(1) -mtpcs-frame -mtpcs-leaf-frame does nothing in the presence of a Thumb2 target like -march-=armv7-a

(2) The regular Thumb frame can look like this:

        push    {r7}
        add     r7, sp, #0

or this:

        push    {r7, lr}
        add     r7, sp, #0

or this:

        push    {r4, r5, r7, lr}
        add     r7, sp, #0

Those are all from the same translation unit.  The offset from r7 to the saved r7 would need to be obtained from the unwind info (or disassembling the binary, but that might be more work than waiting for PR 779291).

The relevant part of GCC is this, in thumb_set_frame_pointer in arm.c:

  amount = offsets->outgoing_args - offsets->locals_base;
  if (amount < 1024)
    insn = emit_insn (gen_addsi3 (hard_frame_pointer_rtx,
                                  stack_pointer_rtx, GEN_INT (amount)));

So it's a pointer to the start of the stacked locals, if I'm reading that right.

And this doesn't get into how to find the saved r7/r11 at ARM/Thumb boundaries.
Would we have a better shot here if we compiled with -marm -mapcs-frame?  We need to compile B2G with GCC 4.6 in order for this to even run (I got mine from NDK r8d).  You can force Gecko to compile for arm with ac_add_options --with-thumb=no and you can add -mapcs-frame to the c/cxxflags.
Any thoughs on this approach Julian?
Attached patch NS_StackWalk() scaffolding (obsolete) (deleted) — Splinter Review
This patch contains the changes required to force usage of NS_StackWalk() in mozilla-b2g18. This involves getting rid of libunwind and turning on the right defines and options. It doesn't do anything else besides this as all my attempts at properly implementing the frame-pointer chasing in NS_StackWalk() failed.

I didn't bother testing on mozilla-central because the patches for using breakpad for unwinding are coming in (see bug 779291) so I'll rather test those on B2G when using central than trying to come up with our own solution.
I'm unable to say whether this particular scheme of examining prologue
instructions will work well enough.  It might, it might not.

Overall, prologue analysis is a well known technique, but does not
have a good reputation -- it is generally fragile to changes in
compiler output.

If you are using a 779291 snapshot, it might be worth turning on some
of its debug output so we can assess how well it is doing and to check
it is actually finding and reading the CFI.

* set LOGLEVEL to 3 (at first) and 4 (maybe) in UnwinderThread.cpp

* what does your function 'LoadSymbols' in
  src/common/linux/dump_symbols.cc look like?  Does it have 
  any logging output to indicate success/failure?

* any other info in the Android log about success/failure in
  debuginfo loading?

* later versions of 779291 have a fix for a static_cast vs
  dynamic_cast mixup which could potentially reduce unwinding
  accuracy.  You might want to try that.  I can break out the
  relevant diffs.
(In reply to Justin Lebar [:jlebar] from comment #47)
> Would we have a better shot here if we compiled with -marm -mapcs-frame?  We
> need to compile B2G with GCC 4.6 in order for this to even run

Would -mno-thumb -mapcs-frame work?
Last night I tried building GCC trunk for arm-linux-androideabi, to see if -mtpcs-frame not working on Thumb2 targets was a bug that had been fixed at some point.  (Thumb2 should be able to use LDM/STM the same way as ARM, if I'm reading the spec correctly, instead of the move/store sequences needed for Thumb1.)  Instead, it's worse: it still doesn't do anything if Thumb2 is present, and in the Thumb1 case I get a function with this prologue:

        push    {r7, lr}
        add     r7, sp, #0

and this epilogue:

        mov     sp, r7
        pop     {r7}
        pop     {r1, r2, r3}
        mov     fp, r2
        mov     sp, r3
        bx      r1

That is, the code that generates the APCS frame in the prologue is absent, but the code that tries to retrieve the saved FP/SP from it is still present.

I also notice that there are no test cases that specify -mtpcs-frame.  I wonder if it just bit-rotted.
(In reply to Jed Davis [:jld] from comment #51)
> (In reply to Justin Lebar [:jlebar] from comment #47)
> > Would we have a better shot here if we compiled with -marm -mapcs-frame?  We
> > need to compile B2G with GCC 4.6 in order for this to even run
> 
> Would -mno-thumb -mapcs-frame work?

I haven't tried, so I don't know.  I'm sort of surprised -mno-thumb would be different than -marm, but that's probably just my ignorance of the toolchain...
It looks like I misunderstood the comment about -marm/-mno-thumb — the problem there is bug 832379.
So... bug 779291 (breakpad unwinding for SPS) might take care of this, but I think that would require copying debug symbols to the device.  That can probably be made to work with SD card space, but maybe we don't want to rely on that?

Thus, see bug 831611 comment #47 for the current state of a GCC modification to get APCS-like frames for Thumb2 code, which seems to work well enough for perf.  That plus making NS_StackWalk do the same thing as perf should mostly work.  It will run into trouble with code that isn't compiled normally (binaries from vendors or Android, or the asm code in pixman that uses all the registers), so things need to not chase frame pointers completely blindly, but perf can sometimes find the stack anyway.
> So... bug 779291 (breakpad unwinding for SPS) might take care of this, but I think that would 
> require copying debug symbols to the device.

FWIW strip -g'ed libxul.so's are pretty small.  You lose line numbers, but line numbers aren't all that meaningful in release builds anyway.  I don't know if that would work for breakpad's purposes, but I suspect it would...
Line information aren't needed for unwinding. It can simply live on a symbol server or a copy of the unstripped library obtain when symbolicating the profile.
(In reply to Justin Lebar [:jlebar] from comment #57)
> FWIW strip -g'ed libxul.so's are pretty small.  You lose line numbers, but
> line numbers aren't all that meaningful in release builds anyway.  I don't
> know if that would work for breakpad's purposes, but I suspect it would...

Having investigated a little, I think what we'd need at runtime is .debug_frame; libxul's is only 5-6 MiB.  In particular, I think we don't need the really huge parts of the debug information (mainly .debug_info; also .debug_line and .debug_str) on the phone.

`strip -g` isn't quite what we'd want (it removes .debug_frame and leaves .symtab/.strtab, which also shouldn't be needed until symbolication time), but there will be some way to use strip or objcopy to do this.
With the experimental toolchain from bug 831611, I modified NS_StackWalk to walk the frame pointers I described in the earlier comments (and TableTicker to work with NS_StackWalk on Linux, which is not something that's been done before, it looks like).  And that seems to more or less work ­— or at least the stacks make sense, modulo a change I had to make in scripts/profile-symbolicate.py.

I suspect that JITed script will break my stacks, but I haven't investigated enough to see this in action yet.
This is a profile of me swiping the homescreen back and forth, in case anyone wants to see what things look like.
This patch is undoubtedly doing many things the wrong way (in particular it looks like this should incorporate attachment 713831 [details] [diff] [review]), but it's here if people need something profiled before I have it cleaned up.

Note that it's not useful without the toolchain changes, and I haven't yet figured out how those might be integrated.
Whiteboard: c=performance
(The first time I renamed this bug I was unfamiliar with the profiler and misunderstood what the extensive pseudo-frames actually were.)
Summary: SPS on B2G doesn't get maximally useful native stacks → Native stacks for SPS on B2G
Depends on: 809571
Depends on: 870639
Whiteboard: c=performance → c=profiling
Depends on: 873332
Attached patch Allow NS_StackWalk to use APCS frame pointers (obsolete) (deleted) — Splinter Review
This patch has the Gecko bits for making NS_StackWalk work on B2G, if HAVE_APCS_FRAME is defined and something sufficiently like -mapcs-frame is used (e.g., -mthumb2-fake-apcs-frame with https://github.com/jld/linaro-android-gcc/compare/master...gcc443-apcs applied to GCC), and for making TableTicker use it for SPS, if those conditions hold and MOZ_PROFILING is defined.

I have not attempted to expose -DHAVE_APCS_FRAME as a configure option, nor to have the configure script try to probe for APCS frame pointer support automatically.  I already need to add -DHAVE_APCS_FRAME globally for Gonk, so I don't know if it would help much.
Attachment #713831 - Attachment is obsolete: true
Attachment #728099 - Attachment is obsolete: true
Attachment #751985 - Flags: review?
Component: General → Gecko Profiler
Product: Boot2Gecko → Core
Version: unspecified → Trunk
Attachment #751985 - Flags: review? → review?(bgirard)
Comment on attachment 751985 [details] [diff] [review]
Allow NS_StackWalk to use APCS frame pointers

I think I can do better than the bionic-internal __get_stack_base here.
Attachment #751985 - Flags: review?(bgirard)
…also, this is mozilla-central and not b2g18, so I need to make IonMonkey not step on r11.  (Or generate its own stack frames.)  But this assumes that we want to take this approach; once the current work week is over and people are in a smaller range of time zones there's some discussion that needs to happen.
Blocks: 886918
Whiteboard: c=profiling → c=profiling p=5
Depends on: 892201
No longer depends on: 870639
Blocks: 895154
Keywords: perf
Whiteboard: c=profiling p=5 → [c=profiling p=5]
A MOZ_PROFILING build already uses -funwind-tables, which gives us the exception handling info (even though we don't use exceptions) for this.  Note that this can't unwind through jitcode, or libraries that aren't loaded when the profiler starts; these can be handled separately and there are separate bugs for them.

The breakpad unwinder can use this info, but interpreting it in-place with specialized code like this consumed approximately 1/40 the CPU time and 1/100 the RAM when last I tested (including the time overhead of the profiler infrastructure, which is on the same order as my unwinder), and this is important on b2g.

Feel free to add/change reviewers if needed.
Attachment #708135 - Attachment is obsolete: true
Attachment #751985 - Attachment is obsolete: true
Attachment #789314 - Flags: review?(bgirard)
Summary: Native stacks for SPS on B2G → Initial support for native stacks in SPS on B2G
Comment on attachment 789314 [details] [diff] [review]
Add an implementation of ARM EHABI stack unwinding for the profiler.

Review of attachment 789314 [details] [diff] [review]:
-----------------------------------------------------------------

r- because of async safety

::: tools/profiler/EHABIStackWalk.cpp
@@ +17,5 @@
> + * possible places where an async signal could occur (e.g., in a
> + * prologue or epilogue), this bounds-checks all stack accesses.
> + */
> +
> +#include "EHABIStackWalk.h"

I can't leave a useful review for this. Maybe someone from the perf team can?

@@ +450,5 @@
> +}
> +
> +
> +const AddrSpace *AddrSpace::Current() {
> +  SharedLibraryInfo info = SharedLibraryInfo::GetInfoForSelf();

This does many calls which aren't async signal safe (GetInfoForSelf/new/vector).

::: tools/profiler/EHABIStackWalk.h
@@ +10,5 @@
> + *   http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038a/IHI0038A_ehabi.pdf
> + */
> +
> +#ifndef mozilla_EHABIStackWalk_h__
> +#define mozilla_EHABIStackWalk_h__

This header doesn't expose a clean interface to what's implemented here. Conceptually we're providing something where the client can populate registers via a mcontext_t and get a unwind out. I think this is all that should be exposed in this header. The rest can move to the cpp. I'd also rename State to something like EHABIUnwinder.

@@ +34,5 @@
> +  // influence the unwinding process, so this must track all of them.
> +  uint32_t mRegs[16];
> +  bool unwind(const Entry *aEntry, const void *stackBase);
> +  uint32_t &operator[](int i) { return mRegs[i]; }
> +  const uint32_t &operator[](int i) const { return mRegs[i]; }

I'm not a fan of using operator[] for this TBH but I don't feel strongly against it.

::: tools/profiler/TableTicker.cpp
@@ +346,5 @@
>    }
>  }
>  
> +#ifdef USE_BACKTRACE
> +void TableTicker::doNativeBacktrace(ThreadProfile &aProfile, TickSample* aSample)

This was removed in bug 880158. Did you mean to include this?

@@ +484,5 @@
> +  const ehabi::AddrSpace *space = spaceCell;
> +  if (!space) {
> +    space = ehabi::AddrSpace::Current();
> +    if (!spaceCell.compareExchange(nullptr, space)) {
> +      delete space;

We can't call delete since we might of pause something holding the heap lock. Everything in the unwinder called by this must also only use async signal safe code. Note that it's possible to write a custom allocator if really required but we've been avoiding this so far.

@@ +496,5 @@
> +    void *stackLimit;
> +    size_t stackSize;
> +
> +    pthread_getattr_np(pthread_self(), &attr);
> +    pthread_attr_getstack(&attr, &stackLimit, &stackSize);

Yes, this needs to be fixed because it's not async signal safe.
Attachment #789314 - Flags: review?(bgirard) → review-
Whiteboard: [c=profiling p=5] → [c=profiling s=2013.08.23 p=4]
Blocks: 904869
Blocks: 895195
No longer blocks: 895195
Blocks: 904881
Blocks: 895175
Yes, the calls that allocate/deallocate definitely don't belong in the signal handler — I truly don't know what I was thinking when I wrote that.  They should be where the profiler is started, which ought to be in normal context (but maybe not?  Don't we have support for starting the profiler in response to a signal?).  Or the signal-sending thread might work for this.

(As an aside, it might be nice if the non-async-signal-safe parts of code like this could assert that they're called in an appropriate context.  (Of course, we'd need profiling + DEBUG + ARM to work for that to help here.)  Looks like it'd need a call to sigprocmask, though, which might be too expensive for some things.)

The USE_BACKTRACE stuff is a merge mistake; I rebased across the removal and didn't realize what had happened.

The stack bounds... I think the Android versions of those routines are probably safe (for now), but it's true that they aren't in general.  I seem to recall the profiler already tracks stack bounds, for the breakpad stack-copier, so that should be easy enough to remove.
Whiteboard: [c=profiling s=2013.08.23 p=4] → [c=profiling p=4]
Whiteboard: [c=profiling p=4] → [c=profiling p=4][d-watch]
In sprint planning this morning it was decided that I'd take over this bug since it is blocking so many people and jld is on jury duty.
(In reply to Jed Davis [:jld] from comment #71)
> Don't we have support for starting the profiler in
> response to a signal?

Yes we do and it's not safe. In theory we're taking a risk there but because it only happens once during user interaction it's not a huge issue. It would be better if the signal just posted a tasks to the main event queue but even that isn't safe. The safe thing would be to flip a 'start me' bit that is getting checked from time to time from the main thread but that's gross, maybe there's a better way for the signal handler to safely dispatch an event to the main thread?

In general starting the profiler should be assumed to not be running from a signal handler.
The revised patch is in several pieces, for (hopefully) ease of reading.
Attachment #795205 - Flags: review?(bgirard)
In this revised version I've attempted to fix the functional problems of the last patch: the async signal safety issues, and the mis-merge of the deleted USE_BACKTRACE stuff.  I have not attempted to make the EHABIStackWalk interface more high-level at this time.  If we want that kind of thing dealt with before checkin, I won't have much time until Friday, but :huseby can stack more patches on top of this if need be.
Attachment #789314 - Attachment is obsolete: true
Attachment #795208 - Flags: review?(bgirard)
Attachment #795208 - Attachment description: 0004-Bug-810526-Add-an-implementation-of-ARM-EHABI-stack-.hg.diff → [4/4] Add an implementation of ARM EHABI stack unwinding for the profiler.
> The safe thing would be to flip a 'start me' bit that is
> getting checked from time to time from the main thread but that's gross,
> maybe there's a better way for the signal handler to safely dispatch an
> event to the main thread?

IIRC, that's what signalfd is for: http://linux.die.net/man/2/signalfd
(In reply to Dave Huseby [:huseby] from comment #78)
> > The safe thing would be to flip a 'start me' bit that is
> > getting checked from time to time from the main thread but that's gross,
> > maybe there's a better way for the signal handler to safely dispatch an
> > event to the main thread?
> 
> IIRC, that's what signalfd is for: http://linux.die.net/man/2/signalfd

Thanks, filed bug 909403
Attachment #795205 - Flags: review?(bgirard) → review+
Comment on attachment 795206 [details] [diff] [review]
[2/4] Factor out native/pseudo stack merging.

Review of attachment 795206 [details] [diff] [review]:
-----------------------------------------------------------------

If I recall correctly bug 867757 is also doing something similar but let's not block on it. If this lands first great!
Attachment #795206 - Flags: review?(bgirard) → review+
Attachment #795207 - Flags: review?(bgirard) → review+
Comment on attachment 795208 [details] [diff] [review]
[4/4] Add an implementation of ARM EHABI stack unwinding for the profiler.

Review of attachment 795208 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is signal safe now. Looking much better. Leaving Dave to review the EHABI unwinding bits since its outside my domain.

r- because I want EHABIStackWalk.h to be more self contain and smaller to keep the maintenance burden on the profiler low.

::: tools/profiler/EHABIStackWalk.cpp
@@ +4,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +/*
> + * This is an implementation of stack unwinding according to a subset

This should be removed or replace with a comment pointing to the header.

@@ +446,5 @@
> +                                    + exidxHdr->p_memsz);
> +  mEntries.reserve(endTable - startTable);
> +  for (const Entry *i = startTable; i < endTable; ++i)
> +    mEntries.push_back(i);
> +  std::sort(mEntries.begin(), mEntries.end());

Are we sure sort is in place (signal safe)? Reasonable assumptions but not necessarily. Not a huge deal I guess, we will quickly find a deadlock and attach a debugger.

@@ +452,5 @@
> +
> +
> +mozilla::Atomic<const AddrSpace*> AddrSpace::sCurrent(nullptr);
> +
> +const AddrSpace *AddrSpace::Current(bool aSignalContext) {

I don't like this. This is begging to be two functions.
if aSignalContext is false then we update the AddrSpace, if its true then we used the cache version. We should have two functions:
AddrSpace::Get() / AddrSpace::Update()

::: tools/profiler/EHABIStackWalk.h
@@ +25,5 @@
> +
> +#include "mozilla/Atomics.h"
> +
> +namespace mozilla {
> +namespace ehabi {

Let's avoid nested namespaces. Id prefer to see stuff prefixed with EHABI if they will collide.
https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style#Namespaces

@@ +29,5 @@
> +namespace ehabi {
> +
> +struct Entry;
> +
> +class State {

I don't like the names, but everything here should be local to the compilation unit and doesn't belong in the header. See my next comment about moving this.

::: tools/profiler/TableTicker.cpp
@@ +478,5 @@
> +      break;
> +    const ehabi::Entry *entry = table->lookup(pc);
> +    if (!entry)
> +      break;
> +    if (!state.unwind(entry, stackBase))

I think EHABIStackWalk.h shouldn't expose all of these internal details. Instead it should only need to expose something like 'eabi:unwind(PCArray* array)'.
Attachment #795208 - Flags: review?(dhuseby)
Attachment #795208 - Flags: review?(bgirard)
Attachment #795208 - Flags: review-
I'll get to the rest of this review soon, but one thing can be answered more easily:

(In reply to Benoit Girard (:BenWa) from comment #81)
> @@ +446,5 @@
> > +                                    + exidxHdr->p_memsz);
> > +  mEntries.reserve(endTable - startTable);
> > +  for (const Entry *i = startTable; i < endTable; ++i)
> > +    mEntries.push_back(i);
> > +  std::sort(mEntries.begin(), mEntries.end());
> 
> Are we sure sort is in place (signal safe)? Reasonable assumptions but not
> necessarily. Not a huge deal I guess, we will quickly find a deadlock and
> attach a debugger.

This code isn't meant to be called in <del>interrupt</del> signal handler context — that vector::reserve will definitely allocate.  It's used at profiler startup time only (and since we're not done constructing the AddrSpace yet, we can't be racing with the signal handler).
taking this from jld since he's on jury duty.  let's see if we can get this landed.
Assignee: jld → dhuseby
Comment on attachment 795208 [details] [diff] [review]
[4/4] Add an implementation of ARM EHABI stack unwinding for the profiler.

Review of attachment 795208 [details] [diff] [review]:
-----------------------------------------------------------------

I can't figure out why you choose to use struct instead of class in some cases.  I think you should try to make that more consistent.  I usually only use struct when POD semantics are appropriate.  I also am concerned about the endian issues.  If we are assuming little endian on all ARM targets, then I have no problems, but we should probably wrap the little endian specific code in checks for endianness.

::: tools/profiler/EHABIStackWalk.cpp
@@ +36,5 @@
> +
> +namespace mozilla {
> +namespace ehabi {
> +
> +struct PRel31 {

class?  there is private data.  I'm having a hard time determining your rules for when to use a struct and when to use a class.

@@ +38,5 @@
> +namespace ehabi {
> +
> +struct PRel31 {
> +  uint32_t bits() const { return mBits; }
> +  bool topBit() const { return mBits & 0x80000000; }

endian issues?

@@ +39,5 @@
> +
> +struct PRel31 {
> +  uint32_t bits() const { return mBits; }
> +  bool topBit() const { return mBits & 0x80000000; }
> +  uint32_t value() const { return mBits & 0x7fffffff; }

endian issues here too.

@@ +210,5 @@
> +
> +    // 00xxxxxx: vsp = vsp + (xxxxxx << 2) + 4
> +    // 01xxxxxx: vsp = vsp - (xxxxxx << 2) - 4
> +    if ((insn & M_ADDSP) == I_ADDSP) {
> +      uint32_t offset = ((insn & 0x3f) << 2) + 4;

are we just assuming little endian for all arm machines?

@@ +446,5 @@
> +                                    + exidxHdr->p_memsz);
> +  mEntries.reserve(endTable - startTable);
> +  for (const Entry *i = startTable; i < endTable; ++i)
> +    mEntries.push_back(i);
> +  std::sort(mEntries.begin(), mEntries.end());

std::sort is in place.

@@ +452,5 @@
> +
> +
> +mozilla::Atomic<const AddrSpace*> AddrSpace::sCurrent(nullptr);
> +
> +const AddrSpace *AddrSpace::Current(bool aSignalContext) {

+1 please make this into two functions.

::: tools/profiler/EHABIStackWalk.h
@@ +33,5 @@
> +class State {
> +public:
> +  // Note that any core register can be used as a "frame pointer" to
> +  // influence the unwinding process, so this must track all of them.
> +  uint32_t mRegs[16];

If you're going to provide operator[] access, this should be private.

@@ +36,5 @@
> +  // influence the unwinding process, so this must track all of them.
> +  uint32_t mRegs[16];
> +  bool unwind(const Entry *aEntry, const void *stackBase);
> +  uint32_t &operator[](int i) { return mRegs[i]; }
> +  const uint32_t &operator[](int i) const { return mRegs[i]; }

these can/should be inlined.

@@ +54,5 @@
> +
> +struct Table {
> +  uint32_t mStartPC;
> +  uint32_t mEndPC;
> +  uint32_t mLoadOffset;

Make this a proper class and encapsulate the member data?

@@ +63,5 @@
> +  std::string mName;
> +
> +  Table(const void *aELF, size_t aSize, const std::string &aName);
> +  const Entry *lookup(uint32_t aPC) const;
> +  operator bool() const { return mEntries.size() > 0; }

I know this is subjective, but I don't like this shortcut.  It hides what is going.  I think you should expose mEntries.size() through Table::size() and force clients of this class to do if (Table.size() > 0).  It makes client code easier to read.

@@ +69,5 @@
> +  uint32_t loadOffset() const { return mLoadOffset; }
> +};
> +
> +class AddrSpace {
> +  std::vector<uint32_t> mStarts;

Again, this is subjective, but I like explicit "private:" especially since you're mixing structs and classes in this file.

::: tools/profiler/TableTicker.cpp
@@ +478,5 @@
> +      break;
> +    const ehabi::Entry *entry = table->lookup(pc);
> +    if (!entry)
> +      break;
> +    if (!state.unwind(entry, stackBase))

+1
Attachment #795208 - Flags: review?(dhuseby) → review-
giving this back to jed since he's working full time this week.
Assignee: dhuseby → jld
With respect to endianness: I was assuming that the exception handling data, which is defined in terms of machine words, would use the regular data endianness.  I'm failing to see any explicit reference to endianness or byte order in the spec, but let's try assembling a test program:

.text
foo:
        .fnstart
        .save {r4, lr}
        stmdb sp!, {r4, lr}
        ldmia sp!, {r4, pc}
        .fnend

To reduce the possibility of followup questions about this, I've convinced the linker to relocate the result, so we can see both fields of an index entry.  Here's the exception index with -EL:

0000007c  f8 ff ff 7f b0 b0 a8 80

And with -EB:

0000007c  7f ff ff f8 80 a8 b0 b0
"Include what you use."  This turned up after I made EHABIStackWalk.h more minimal.

This patch is meant to be applied after [3/4] and before [4/4].  Hopefully we'll be done with this bug before the patch numbering turns into a BASIC program.
Attachment #801610 - Flags: review?(bgirard)
Attachment #801610 - Flags: review?(bgirard) → review+
Changes:
* Moved all the details into EHABIStackWalk.cpp.
* Ripped out `namespace ehabi`.
* Split the bool-parameterized function that was doing two different things.
* Made the comments less bad.
* Cleaned up the various struct/class and public/private issues.  Mostly; `class N { private: ` has not been done.
* Made the `operator bool` a thing with a name instead.
Attachment #795208 - Attachment is obsolete: true
Attachment #801615 - Flags: review?(dhuseby)
Attachment #801615 - Flags: review?(bgirard)
Comment on attachment 801615 [details] [diff] [review]
[4/4 v2] Add an implementation of ARM EHABI stack unwinding for the profiler.

Review of attachment 801615 [details] [diff] [review]:
-----------------------------------------------------------------

I'll leave the review of the EHABI bits to huseby.

::: tools/profiler/EHABIStackWalk.cpp
@@ +201,5 @@
> +
> +  bool unwind();
> +
> +private:
> +  // FIXME: GCC seems to think that `mState` can alias `*this`.

When do you plan on fixing this? If not before landing can you make the comment clearer? I've never seen something like this.

::: tools/profiler/TableTicker.cpp
@@ +48,5 @@
>  #ifdef USE_NS_STACKWALK
>   #include "nsStackWalk.h"
>  #endif
>  
> +#if defined(SPS_ARCH_arm) && defined(linux)

Have you tested this on android? I believe the ifdef will enable it there too
Attachment #801615 - Flags: review?(bgirard) → review+
(In reply to Benoit Girard (:BenWa) from comment #89)
> ::: tools/profiler/EHABIStackWalk.cpp
> @@ +201,5 @@
> > +
> > +  bool unwind();
> > +
> > +private:
> > +  // FIXME: GCC seems to think that `mState` can alias `*this`.
> 
> When do you plan on fixing this? If not before landing can you make the
> comment clearer? I've never seen something like this.

It should have been able to optimize away repeated reads of vSP (meaning mState.mRegs[13]) between the different stack pointer checks, but it wasn't, and I conjecture that it's because it thinks setting the mFailed flag could alias it.

With the new simpler interface I could just flatten EHState into EHInterp so that they can't alias (which also simplifies the code) and see if that helps, but I think it might be better to leave the code as-is, since it works, and improve the comment / file a bug for the followup.

> ::: tools/profiler/TableTicker.cpp
> @@ +48,5 @@
> >  #ifdef USE_NS_STACKWALK
> >   #include "nsStackWalk.h"
> >  #endif
> >  
> > +#if defined(SPS_ARCH_arm) && defined(linux)
> 
> Have you tested this on android? I believe the ifdef will enable it there too

It ought to work, but no, this "feature" has not been tested there, and it probably needs more than our automated tests before turning it on.  So I think I'll change the conditional to be B2G-only and leave the question of Fennec for people who already have a development environment set up for it.
Comment on attachment 801615 [details] [diff] [review]
[4/4 v2] Add an implementation of ARM EHABI stack unwinding for the profiler.

Review of attachment 801615 [details] [diff] [review]:
-----------------------------------------------------------------

I have just one little nitpick in EHABIStackWalk.cpp, just make that change and then proceed.

::: tools/profiler/EHABIStackWalk.cpp
@@ +300,5 @@
> +  while (!mFailed) {
> +    uint8_t insn = next();
> +#if 0
> +    LOGF("unwind insn = %02x", (unsigned)insn);
> +#endif

Maybe do an #ifdef DEBUG_B2G_UNWIND around the LOGF so that we can turn it on when doing a build.
Attachment #801615 - Flags: review?(dhuseby) → review+
Carrying over r+.  Incremental changes, as suggested:
* More/better TODO comments.
* Disable on non-B2G ARM platforms, pending more testing.
* ifdef the LOGFs

Try builds in progress:
https://tbpl.mozilla.org/?tree=Try&rev=86c5272fb610
https://tbpl.mozilla.org/?tree=Try&rev=8823590b4a61
Attachment #801615 - Attachment is obsolete: true
Attachment #803075 - Flags: review+
Passes all the builds, and all the tests that are either on a platform where the change is enabled (B2G) or a test that would exercise the profiler (xpcshell, as I understand it; there aren't many profiler tests at present).  Note that the reds seen there are for timeouts much like the other spurious timeouts I've been intermittently getting on try today.

For checkin to b2g-inbound, applied in increasing order (1, 2, 3, 3.1, 4).
Keywords: checkin-needed
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: