Closed
Bug 1016629
Opened 11 years ago
Closed 10 years ago
Add native stack support in ThreadStackHelper
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla34
People
(Reporter: jchen, Assigned: jchen)
References
Details
Attachments
(8 files, 1 obsolete file)
(deleted),
patch
|
vladan
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
snorp
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
snorp
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
snorp
:
review+
jseward
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
snorp
:
review+
jseward
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
BenWa
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
snorp
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jchen
:
review+
|
Details | Diff | Splinter Review |
ThreadStackHelper should be able to use the new unwinder library to perform unwinding and get a native stack in addition to the pseudo-stack.
Assignee | ||
Updated•10 years ago
|
Summary: Add LUL support in ThreadStackHelper → Add native stack support in ThreadStackHelper
Assignee | ||
Comment 1•10 years ago
|
||
Keep a separate native stack for each pseudostack.
Attachment #8448982 -
Flags: review?(vdjeric)
Assignee | ||
Comment 2•10 years ago
|
||
Simple class to set/clear a pointer within a scope.
Attachment #8448991 -
Flags: review?(snorp)
Assignee | ||
Comment 3•10 years ago
|
||
Simplify what we had before.
Attachment #8448995 -
Flags: review?(snorp)
Updated•10 years ago
|
Attachment #8448991 -
Flags: review?(snorp) → review+
Updated•10 years ago
|
Attachment #8448995 -
Flags: review?(snorp) → review+
Assignee | ||
Comment 4•10 years ago
|
||
Add a GetNativeStack method to ThreadStackHelper. It first uses GetStack to get the pseudostack, as well as the thread context within the signal handler. It then unwinds the stack outside of the signal handler.
Attachment #8450300 -
Flags: review?(snorp)
Assignee | ||
Comment 5•10 years ago
|
||
Add implementation for platform-specific bits. Tested on Windows, OS X, Linux, and Android.
Attachment #8450301 -
Flags: review?(snorp)
Assignee | ||
Comment 6•10 years ago
|
||
Get the native stack during a permanent hang if we don't have a native stack yet.
Attachment #8450303 -
Flags: review?(snorp)
Assignee | ||
Comment 7•10 years ago
|
||
Comment on attachment 8450303 [details] [diff] [review]
f. Get native stack for permahangs in BHM (v1)
Actually I'm going to move this patch to bug 1034138.
Attachment #8450303 -
Attachment is obsolete: true
Attachment #8450303 -
Flags: review?(snorp)
Assignee | ||
Comment 8•10 years ago
|
||
On ARM Android, dl_iterate_phdr is provided by the custom linker. So if libxul was loaded by the system linker (e.g. as part of xpcshell when running tests), it won't be available and we should not call it.
Attachment #8450324 -
Flags: review?(bgirard)
Assignee | ||
Comment 9•10 years ago
|
||
The address sanitizer catches us copying the stack and thinks it's a bug. We should blacklist FillThreadContext and not use memcpy in that case.
Attachment #8450351 -
Flags: review?(snorp)
Comment 10•10 years ago
|
||
Comment on attachment 8450324 [details] [diff] [review]
f. Don't call dl_iterate_phdr if it's not available (v1)
Review of attachment 8450324 [details] [diff] [review]:
-----------------------------------------------------------------
Can you put the comment in the patch description in line please?
Attachment #8450324 -
Flags: review?(bgirard) → review+
Comment 11•10 years ago
|
||
Comment on attachment 8448982 [details] [diff] [review]
Associate a native stack for each hang (v1)
Review of attachment 8448982 [details] [diff] [review]:
-----------------------------------------------------------------
Trivial review :)
What's the bigger picture here though.. are you just reporting C++ stacks of all threads during a permanent hang on Fennec, or do you plan to extend it beyond that?
Attachment #8448982 -
Flags: review?(vdjeric) → review+
Assignee | ||
Comment 12•10 years ago
|
||
(In reply to Vladan Djeric (:vladan) from comment #11)
> Comment on attachment 8448982 [details] [diff] [review]
> Associate a native stack for each hang (v1)
>
> Review of attachment 8448982 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Trivial review :)
>
> What's the bigger picture here though.. are you just reporting C++ stacks of
> all threads during a permanent hang on Fennec, or do you plan to extend it
> beyond that?
Right now it's just the C++ stack for the hung thread, in addition to the pseudostack that we already report. I don't think we have plans beyond that; we want to at least look at the gathered data first.
Comment 13•10 years ago
|
||
Comment on attachment 8450300 [details] [diff] [review]
d. Add and implement GetNativeStack method in ThreadStackHelper (v1)
Review of attachment 8450300 [details] [diff] [review]:
-----------------------------------------------------------------
Seems like I'm not the right person to review this, but looks good to me. Someone who knows breakpad should also have a look.
Attachment #8450300 -
Flags: review?(snorp) → review+
Comment 14•10 years ago
|
||
Comment on attachment 8450351 [details] [diff] [review]
g. Avoid ASan flag when copying stack (v1)
Review of attachment 8450351 [details] [diff] [review]:
-----------------------------------------------------------------
Is there a way to get ASAN or our test harness to ignore this? I wonder if the loop there is significantly slower than the (presumably optimized?) libc implementation.
Attachment #8450351 -
Flags: review?(snorp) → review+
Comment 15•10 years ago
|
||
Comment on attachment 8450301 [details] [diff] [review]
e. Implement platform-specific code for filling in context (v1)
Review of attachment 8450301 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good I guess, but I don't know any of that code. Should definitely get another person to review.
Attachment #8450301 -
Flags: review?(snorp) → review+
Assignee | ||
Comment 16•10 years ago
|
||
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #14)
> Comment on attachment 8450351 [details] [diff] [review]
> g. Avoid ASan flag when copying stack (v1)
>
> Review of attachment 8450351 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Is there a way to get ASAN or our test harness to ignore this? I wonder if
> the loop there is significantly slower than the (presumably optimized?) libc
> implementation.
The loop is only enabled for ASAN builds so I don't think we have to worry about performance here.
Assignee | ||
Comment 17•10 years ago
|
||
Comment on attachment 8450300 [details] [diff] [review]
d. Add and implement GetNativeStack method in ThreadStackHelper (v1)
Asking :jseward to review this because he wrote the original Breakpad unwinding code in the profiler.
I chose to use Breakpad for unwinding because it's relatively straightforward to use and it covers all platforms/architectures (AFAIK the LUL only covers Linux at the moment). Also, unwinding performance is not critical in this case because we only unwind when there's a hang of several seconds.
Attachment #8450300 -
Flags: review?(jseward)
Assignee | ||
Comment 18•10 years ago
|
||
Comment on attachment 8450301 [details] [diff] [review]
e. Implement platform-specific code for filling in context (v1)
See comment 17
Attachment #8450301 -
Flags: review?(jseward)
Comment 19•10 years ago
|
||
(In reply to Jim Chen [:jchen :nchen] from comment #17)
> Comment on attachment 8450300 [details] [diff] [review]
> d. Add and implement GetNativeStack method in ThreadStackHelper (v1)
On looking at this, and in at the implementation of class
ThreadStackHelper::CodeModulesProvider, I was unclear how it
guarantees not to fall off the upper end (base) of the stack and
segfault. Can you clarify?
Comment 20•10 years ago
|
||
Comment on attachment 8450301 [details] [diff] [review]
e. Implement platform-specific code for filling in context (v1)
Review of attachment 8450301 [details] [diff] [review]:
-----------------------------------------------------------------
Linux and MacOS bits are plausible. I can't say anything useful
for the Windows specific bits.
How well tested is it? In particular, have you tested
ThreadStackHelper::GetThreadStackBase to ensure it gives plausible
values on all three targets? Finding safe stack bounds reliably has
is a difficult problem IME.
::: xpcom/threads/ThreadStackHelper.cpp
@@ +641,5 @@
> + mContextToFill->mContext.rbp = context.uc_mcontext.gregs[REG_RBP];
> + mContextToFill->mContext.rsi = context.uc_mcontext.gregs[REG_RSI];
> + mContextToFill->mContext.rdi = context.uc_mcontext.gregs[REG_RDI];
> + memcpy(&mContextToFill->mContext.r8,
> + &context.uc_mcontext.gregs[REG_R8], 8 * sizeof(int64_t));
This seems a bit strange to me. For RAX .. RDI you copy each one
individually. No problem there. But for R8 to R15 you're assuming
that they are packed back-to-back, in order, in uc_mcontext.gregs[],
and that that layout is never going to change in future. It would be
safer and more consistent to assume no particular layout in the
mcontext, and copy all of them individually.
@@ +708,5 @@
> + mContextToFill->mContext.rbp = GET_REGISTER(state, rbp);
> + mContextToFill->mContext.rsi = GET_REGISTER(state, rsi);
> + mContextToFill->mContext.rdi = GET_REGISTER(state, rdi);
> + memcpy(&mContextToFill->mContext.r8,
> + &GET_REGISTER(state, r8), 8 * sizeof(int64_t));
Same comment as above.
Attachment #8450301 -
Flags: review?(jseward) → review+
Assignee | ||
Comment 21•10 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #20)
> Comment on attachment 8450301 [details] [diff] [review]
> e. Implement platform-specific code for filling in context (v1)
>
> Review of attachment 8450301 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> How well tested is it? In particular, have you tested
> ThreadStackHelper::GetThreadStackBase to ensure it gives plausible
> values on all three targets? Finding safe stack bounds reliably has
> is a difficult problem IME.
I tested that we get reasonable stacks on Windows, OS X, Linux, and Android. I assume B2G is close enough to Android to also work. I didn't look at what GetThreadStackBase returned specifically, but I will take a look.
> ::: xpcom/threads/ThreadStackHelper.cpp
> @@ +641,5 @@
> > + mContextToFill->mContext.rbp = context.uc_mcontext.gregs[REG_RBP];
> > + mContextToFill->mContext.rsi = context.uc_mcontext.gregs[REG_RSI];
> > + mContextToFill->mContext.rdi = context.uc_mcontext.gregs[REG_RDI];
> > + memcpy(&mContextToFill->mContext.r8,
> > + &context.uc_mcontext.gregs[REG_R8], 8 * sizeof(int64_t));
>
> This seems a bit strange to me. For RAX .. RDI you copy each one
> individually. No problem there. But for R8 to R15 you're assuming
> that they are packed back-to-back, in order, in uc_mcontext.gregs[],
> and that that layout is never going to change in future. It would be
> safer and more consistent to assume no particular layout in the
> mcontext, and copy all of them individually.
Yeah I verified that these registers are consecutive in the structs, and I wanted to make the code less messy. I'm pretty confident that, for binary compatibility, these structures will never change for these architectures.
Assignee | ||
Comment 22•10 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #19)
> (In reply to Jim Chen [:jchen :nchen] from comment #17)
> > Comment on attachment 8450300 [details] [diff] [review]
> > d. Add and implement GetNativeStack method in ThreadStackHelper (v1)
>
> On looking at this, and in at the implementation of class
> ThreadStackHelper::CodeModulesProvider, I was unclear how it
> guarantees not to fall off the upper end (base) of the stack and
> segfault. Can you clarify?
We keep the stack buffer in ThreadStackHelper::ThreadContext, and ThreadStackHelper::ThreadContext::GetMemoryAtAddressInternal() makes sure that we don't read from an address that's outside of the buffer's bounds. Is this what you meant?
Comment 23•10 years ago
|
||
Comment on attachment 8450300 [details] [diff] [review]
d. Add and implement GetNativeStack method in ThreadStackHelper (v1)
Review of attachment 8450300 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/threads/ThreadStackHelper.cpp
@@ +296,5 @@
> + size_t mStackSize;
> + // End of stack area
> + const void* mStackEnd;
> +
> + ThreadContext() : mValid(false), mStackSize(0), mStackEnd(nullptr) {}
This doesn't initialise mStackBase. Should it?
Attachment #8450300 -
Flags: review?(jseward) → review+
Assignee | ||
Comment 24•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/19fe3fd54e72
https://hg.mozilla.org/integration/mozilla-inbound/rev/4e4358c01816
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e869dd7e82a
https://hg.mozilla.org/integration/mozilla-inbound/rev/8d539315541d
https://hg.mozilla.org/integration/mozilla-inbound/rev/e008d505335a
https://hg.mozilla.org/integration/mozilla-inbound/rev/4cd08369896b
https://hg.mozilla.org/integration/mozilla-inbound/rev/7cae21b839ad
Comment 25•10 years ago
|
||
sorry had to back this changes in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=6ffcd2030ed8 since one of this changesets caused a nearly permanent error like https://tbpl.mozilla.org/php/getParsedLog.php?id=44598113&tree=Mozilla-Inbound
Assignee | ||
Comment 26•10 years ago
|
||
Trivial patch to fix wrong sysinfo usage.
Attachment #8463479 -
Flags: review+
Assignee | ||
Comment 27•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/bab4b9114778
https://hg.mozilla.org/integration/mozilla-inbound/rev/76bd2f5fa1f0
https://hg.mozilla.org/integration/mozilla-inbound/rev/8e34e09d49a1
https://hg.mozilla.org/integration/mozilla-inbound/rev/a47ff1dae6d2
https://hg.mozilla.org/integration/mozilla-inbound/rev/96de856b2c82
https://hg.mozilla.org/integration/mozilla-inbound/rev/4592de6b1b14
https://hg.mozilla.org/integration/mozilla-inbound/rev/60a248ab5036
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa87806142d4
Comment 28•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/bab4b9114778
https://hg.mozilla.org/mozilla-central/rev/76bd2f5fa1f0
https://hg.mozilla.org/mozilla-central/rev/8e34e09d49a1
https://hg.mozilla.org/mozilla-central/rev/a47ff1dae6d2
https://hg.mozilla.org/mozilla-central/rev/96de856b2c82
https://hg.mozilla.org/mozilla-central/rev/4592de6b1b14
https://hg.mozilla.org/mozilla-central/rev/60a248ab5036
https://hg.mozilla.org/mozilla-central/rev/fa87806142d4
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
You need to log in
before you can comment on or make changes to this bug.
Description
•