Closed
Bug 1236108
Opened 9 years ago
Closed 9 years ago
Crash report memory annotations not implemented for child processes (content processes)
Categories
(Toolkit :: Crash Reporting, defect, P1)
Toolkit
Crash Reporting
Tracking
()
RESOLVED
FIXED
mozilla48
People
(Reporter: mccr8, Assigned: bugzilla)
References
(Blocks 1 open bug)
Details
(Whiteboard: [MemShrink:P1])
Attachments
(3 files, 1 obsolete file)
(deleted),
text/x-review-board-request
|
benjamin
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
haik
:
review+
bobowen
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
benjamin
:
review+
lizzard
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta+
|
Details |
While looking at the crash reports in bug 1235633, I noticed that crash signature is "OOM | unknown" rather than "OOM | large" or "OOM | small", which is odd, because I've seen a crash at the same allocation site before and it was annotated properly. Then I looked at the individual crash signatures and none of the ones I looked at have the nice memory information (Total Virtual Memory, Available Virtual Memory, System Memory Use Percentage, etc.) that we usually get in a crash report. I'm guessing the "unknown" results from the allocation size field being missing, too.
If you click on the "experiment" link in bug 1234647 comment 0, every OOM crash signature is "unknown", so it looks like this isn't working at all.
The odd thing is, if I look at the top 300 crashes from Nightly or Aurora, there are no "OOM | unknown", so I'm not sure if there's a problem specific to beta with these memory annotations or what.
These memory annotations are important to understanding our OOM crashes.
Reporter | ||
Comment 1•9 years ago
|
||
Benjamin, do you know who might be able to look into this now that dmajor has gone? Thanks.
Reporter | ||
Comment 2•9 years ago
|
||
It turns out the experiment link isn't aggregating crashes together, so it isn't an apples-to-apples comparison, but still, in the first three pages of crashes I found a number of "OOM | unknown" crashes, but no other "OOM" crashes.
Reporter | ||
Comment 3•9 years ago
|
||
The "unknown" ones seem to have a "Process Type" field "content" and the other ones don't, so maybe we aren't properly recording the memory information for content process OOM crashes? Depending on how exactly that information is being generated and recorded, I could imagine there being some kind of sandboxing issues.
Comment 4•9 years ago
|
||
This is about content processes. We record the memory information here for main-process crashes: http://hg.mozilla.org/mozilla-central/annotate/0771c5eab32f/toolkit/crashreporter/nsExceptionHandler.cpp#l833
But since content process crashes don't go through this code path (the parent writes the annotations on behalf of the child process, rather than the child writing them directly), these aren't recorded.
Flags: needinfo?(benjamin)
Summary: Crash report memory annotations not working in e10s beta experiment → Crash report memory annotations not implemented for child processes (content processes)
Comment 5•9 years ago
|
||
Georg, is this something you can take care of?
Assignee | ||
Comment 6•9 years ago
|
||
I'll take it.
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Flags: needinfo?(gfritzsche)
Comment 7•9 years ago
|
||
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #6)
> I'll take it.
Thanks
Updated•9 years ago
|
Whiteboard: [MemShrink] → [MemShrink:P1]
Comment 8•9 years ago
|
||
This is substantial to be able to do useful crash comparisons and to know which OOM crashes are worth looking at, as "small" (<256k) OOMs are pretty much unactionable (just a sign that we generally are running out of memory), while large ones probably need to be worked on.
Comment 9•9 years ago
|
||
Sounds like we need to revisit the P4 classification here.
Flags: needinfo?(jgriffiths)
Comment 10•9 years ago
|
||
Based on discussion today, I think this is a P1. It's already assigned and when landed should get uplifted to 46 to catch our Beta cycle there.
Flags: needinfo?(jgriffiths)
Priority: P4 → P1
Updated•9 years ago
|
Updated•9 years ago
|
Assignee | ||
Comment 11•9 years ago
|
||
Assignee | ||
Comment 12•9 years ago
|
||
Assignee | ||
Comment 13•9 years ago
|
||
Assignee | ||
Comment 14•9 years ago
|
||
This code is super-close to being reviewable, but I need to sort out problems with the regression test that I wrote for it. Stand by...
Assignee | ||
Comment 15•9 years ago
|
||
Assignee | ||
Comment 16•9 years ago
|
||
Assignee | ||
Comment 17•9 years ago
|
||
Assignee | ||
Comment 18•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/37623/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37623/
Attachment #8725777 -
Flags: review?(benjamin)
Assignee | ||
Comment 19•9 years ago
|
||
The previous patch in this series creates a new directory service entry
specifically for obtaining the content process temp directory.
This patch converts everything else to reference that entry. It also sets
appropriate environment variables in the content processes so that system
APIs automatically pick up the directory. This is necessary for the crash
reporter to be able to call those APIs in exception handling contexts.
Review commit: https://reviewboard.mozilla.org/r/37625/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37625/
Attachment #8725778 -
Flags: review?(haftandilian)
Attachment #8725778 -
Flags: review?(bobowen.code)
Assignee | ||
Comment 20•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/37627/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37627/
Attachment #8725779 -
Flags: review?(bobowen.code)
Assignee | ||
Comment 21•9 years ago
|
||
This patch redefines XP_PATH_MAX on Windows to be MAX_PATH + 1. I did this
because the longer definition would actually not work with most Windows APIs.
Some APIs can work with longer lengths if the path is prefixed with "\\?\", but
that is not guaranteed in general.
See https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#maxpath
Review commit: https://reviewboard.mozilla.org/r/37629/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37629/
Attachment #8725780 -
Flags: review?(benjamin)
Assignee | ||
Comment 22•9 years ago
|
||
Comment on attachment 8725777 [details]
MozReview Request: Bug 1236108: Add temp directory for sandboxed content processes to directory
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37623/diff/1-2/
Attachment #8725777 -
Attachment description: MozReview Request: Bug 1236108: Add temp directory for low-integrity content processes to directory service; r?bsmedberg → MozReview Request: Bug 1236108: Add temp directory for sandboxed content processes to directory service; r?bsmedberg
Assignee | ||
Comment 23•9 years ago
|
||
Comment on attachment 8725778 [details]
MozReview Request: Bug 1236108: Modify sandbox initialization code to use directory service to obtain content process temp directory; r?bobowen, haik
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37625/diff/1-2/
Assignee | ||
Comment 24•9 years ago
|
||
Comment on attachment 8725779 [details]
MozReview Request: Bug 1236108: Add Windows sandbox policy for exception-context crash report annotations; r?bobowen
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37627/diff/1-2/
Assignee | ||
Comment 25•9 years ago
|
||
Comment on attachment 8725780 [details]
MozReview Request: Bug 1236108: Add support for exception-context annotations for content processes to the crash reporter; r?bsmedberg
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37629/diff/1-2/
Assignee | ||
Comment 26•9 years ago
|
||
Comment 27•9 years ago
|
||
Comment on attachment 8725778 [details]
MozReview Request: Bug 1236108: Modify sandbox initialization code to use directory service to obtain content process temp directory; r?bobowen, haik
https://reviewboard.mozilla.org/r/37625/#review34377
When I originally created the low integrity temp, I was setting TEMP and TMP.
I had to get rid of this as it was causing locking problems when I was trying to delete the temp directory in the content process.
At that point it was creating a unique per process dir, so this meant they got left lying around (I added temporary code to clean these up).
This shouldn't be an issue now that the deletion happens in the parent process as any locks should have gone.
However, I still decided not to add it back in straight away because I was unsure whether it was the best thing to do.
If we override these then other things pick it up, they might not be expecting the directory to get deleted all the time.
See bug 1166637 for example.
I suggested using sandboxing rules for this in the description, but I'm less sure that's a good idea.
Maybe we could create a separate AppData\LocalLow\Mozilla\Temp and override TEMP and TMP to that.
Of course that wouldn't get cleared and people would not know that it was something they might clear manually.
So, perhaps the always deleted solution is best, even if it would only give benefits like caching for that process.
::: dom/ipc/ContentProcess.cpp:52
(Diff revision 2)
> + NS_WARN_IF(!SetEnvironmentVariableW(L"TMP", fullTmpPath.get()));
Should we be setting TEMP here as well?
Attachment #8725778 -
Flags: review?(bobowen.code)
Comment 28•9 years ago
|
||
Comment on attachment 8725779 [details]
MozReview Request: Bug 1236108: Add Windows sandbox policy for exception-context crash report annotations; r?bobowen
https://reviewboard.mozilla.org/r/37627/#review34401
::: security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp:208
(Diff revision 2)
> + mPolicy->AddRule(sandbox::TargetPolicy::SUBSYS_FILES,
I don't understand why we would need this if we are using a low integrity directory.
Either way, GetTempPath will be different and this would be the pid of the parent not child.
Attachment #8725779 -
Flags: review?(bobowen.code)
Comment 29•9 years ago
|
||
I meant to say thanks for cleaning up the getting of the directory in the first place.
I originally hadn't put things into the directory service, because I didn't think I could use prefs there, but I hadn't thought of using the app one.
Updated•9 years ago
|
Attachment #8725778 -
Flags: review?(haftandilian) → review+
Comment 30•9 years ago
|
||
Comment on attachment 8725778 [details]
MozReview Request: Bug 1236108: Modify sandbox initialization code to use directory service to obtain content process temp directory; r?bobowen, haik
https://reviewboard.mozilla.org/r/37625/#review34421
On the Mac side, looks good to me, just as long as it won't be a problem that the temp directory is removed/recreated during startup. i.e., assuming that wouldn't result in lost crash data.
I referenced another temp dir location for OS X on comment 23 of 1237847 in case that's useful: https://bugzilla.mozilla.org/show_bug.cgi?id=1237847#c23
Assignee | ||
Comment 31•9 years ago
|
||
Comment on attachment 8725780 [details]
MozReview Request: Bug 1236108: Add support for exception-context annotations for content processes to the crash reporter; r?bsmedberg
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37629/diff/2-3/
Assignee | ||
Updated•9 years ago
|
Attachment #8725779 -
Attachment is obsolete: true
Assignee | ||
Comment 32•9 years ago
|
||
https://reviewboard.mozilla.org/r/37625/#review34421
Yeah, this is only used during a very short window of time between the exception being caught in content and the crash being picked up in chrome. It doesn't need to persist any longer than that.
Assignee | ||
Comment 33•9 years ago
|
||
https://reviewboard.mozilla.org/r/37625/#review34377
It sounds like we should QA this at least?
> Should we be setting TEMP here as well?
That shouldn't be necessary: GetTempPath searches the environment variables in a specific order, and uses the first one that it finds. TMP is the first one that it checks.
Assignee | ||
Comment 34•9 years ago
|
||
https://reviewboard.mozilla.org/r/37625/#review34377
> That shouldn't be necessary: GetTempPath searches the environment variables in a specific order, and uses the first one that it finds. TMP is the first one that it checks.
Unless we're worrying about third-party garbage?
Comment 35•9 years ago
|
||
https://reviewboard.mozilla.org/r/37625/#review34421
Thanks for explaining. Should we be concerned about a crash in content that somehow results in a crash in the parent before it processes the content crash? If we had to pick one, I suppose the parent crash would be more important to resolve first.
Comment 36•9 years ago
|
||
https://reviewboard.mozilla.org/r/37625/#review34377
> Unless we're worrying about third-party garbage?
Yes, I was thinking of other things using the TMP and TEMP variable and not GetTempPath.
Hopefully they wouldn't do that, but having TMP and TEMP different might cause issues if that other code is internally inconsistent, so why not set both.
Comment 37•9 years ago
|
||
Comment on attachment 8725777 [details]
MozReview Request: Bug 1236108: Add temp directory for sandboxed content processes to directory
https://reviewboard.mozilla.org/r/37623/#review34525
Please add a more descriptive commit message: explain what code you're moving around and what this dir is "for".
::: toolkit/xre/nsXREDirProvider.cpp:634
(Diff revision 2)
> +GetContentProcessTempBaseDir()
This should be Get...DirKey
Attachment #8725777 -
Flags: review?(benjamin) → review+
Updated•9 years ago
|
Attachment #8725780 -
Flags: review?(benjamin) → review+
Comment 38•9 years ago
|
||
Comment on attachment 8725780 [details]
MozReview Request: Bug 1236108: Add support for exception-context annotations for content processes to the crash reporter; r?bsmedberg
https://reviewboard.mozilla.org/r/37629/#review34527
r+ with corrections as posted
::: toolkit/crashreporter/nsExceptionHandler.cpp:166
(Diff revision 3)
> +static const XP_CHAR childCrashAnnotationFileName[] = XP_TEXT("GeckoChildCrash");
s/FileName/BaseName/
::: toolkit/crashreporter/nsExceptionHandler.cpp:262
(Diff revision 3)
> +static nsIFile* contentProcessTmpDir = nullptr;
1. This should be documented as being valid only in chrome process (not content)
2. This shouldn't be a nsIFile* because of shutdown issues and leak checking. xpstring (or maybe XP_CHAR) is the better choice.
::: toolkit/crashreporter/nsExceptionHandler.cpp:1186
(Diff revision 3)
> + static XP_CHAR tempPath[XP_PATH_MAX] = {0};
Can you assert that this is a content-type process here? It helps with readability.
::: toolkit/crashreporter/nsExceptionHandler.cpp:2848
(Diff revision 3)
> +{
Assert: this runs in chrome process.
::: toolkit/crashreporter/nsExceptionHandler.cpp:2952
(Diff revision 3)
> + AnnotationTable exceptionTimeAnnotations;
Make this a separate helper function.
::: toolkit/crashreporter/nsExceptionHandler.cpp:2986
(Diff revision 3)
> + if (exceptionTimeExtra) {
This is a side effect which is important to document in nsExceptionHandler.h
Assignee | ||
Comment 39•9 years ago
|
||
Comment on attachment 8725777 [details]
MozReview Request: Bug 1236108: Add temp directory for sandboxed content processes to directory
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37623/diff/2-3/
Attachment #8725777 -
Attachment description: MozReview Request: Bug 1236108: Add temp directory for sandboxed content processes to directory service; r?bsmedberg → MozReview Request: Bug 1236108: Add temp directory for sandboxed content processes to directory
Assignee | ||
Comment 40•9 years ago
|
||
Comment on attachment 8725778 [details]
MozReview Request: Bug 1236108: Modify sandbox initialization code to use directory service to obtain content process temp directory; r?bobowen, haik
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37625/diff/2-3/
Attachment #8725778 -
Flags: review?(bobowen.code)
Assignee | ||
Comment 41•9 years ago
|
||
Comment on attachment 8725780 [details]
MozReview Request: Bug 1236108: Add support for exception-context annotations for content processes to the crash reporter; r?bsmedberg
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37629/diff/3-4/
Assignee | ||
Comment 42•9 years ago
|
||
Assignee | ||
Comment 43•9 years ago
|
||
Comment 44•9 years ago
|
||
Comment on attachment 8725778 [details]
MozReview Request: Bug 1236108: Modify sandbox initialization code to use directory service to obtain content process temp directory; r?bobowen, haik
https://reviewboard.mozilla.org/r/37625/#review35173
::: dom/ipc/ContentProcess.cpp:44
(Diff revision 3)
> - // On Windows, the sandbox-writable temp directory resides in the
> - // low integrity sandbox base directory.
> - return NS_WIN_LOW_INTEGRITY_TEMP_BASE;
> + // Save the TMP environment variable so that is is picked up by GetTempPath().
> + // Note that we specifically write to the TMP variable, as that is the first
> + // variable that is checked by GetTempPath() to determine its output.
nit: comment needs updating slightly maybe.
Attachment #8725778 -
Flags: review?(bobowen.code) → review+
Assignee | ||
Comment 45•9 years ago
|
||
Comment on attachment 8725778 [details]
MozReview Request: Bug 1236108: Modify sandbox initialization code to use directory service to obtain content process temp directory; r?bobowen, haik
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37625/diff/3-4/
Assignee | ||
Comment 46•9 years ago
|
||
Comment on attachment 8725780 [details]
MozReview Request: Bug 1236108: Add support for exception-context annotations for content processes to the crash reporter; r?bsmedberg
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37629/diff/4-5/
Comment 47•9 years ago
|
||
Comment 48•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a6ad577af025
https://hg.mozilla.org/mozilla-central/rev/19074ba43502
https://hg.mozilla.org/mozilla-central/rev/4b734a8db65c
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment 49•9 years ago
|
||
Only appears to partially have worked.
OOM Allocation Size in a couple but no other memory information.
bp-ca43d6b2-f28c-4f14-a477-ee3f32160310
bp-aa75ad1b-8595-48ad-9d81-0f0c82160310 (Thread 1)
Not even OOM Allocation Size (possibly large)
bp-db979201-a2a7-4f81-8997-434532160310 (Thread 1)
bp-00b8d782-b989-40f6-bab7-f72da2160310
Don't think this has allocation size but should have other memory information.
bp-f3dc2cb6-68d0-4916-8126-21de72160310
Comment 50•9 years ago
|
||
Just spotted after posting first two are correct, both linux; does not list memory information.
Assignee | ||
Comment 51•9 years ago
|
||
(In reply to Jonathan Howard from comment #50)
> Just spotted after posting first two are correct, both linux; does not list
> memory information.
FWIW, for the purposes of this bug, the OOMAllocationSize annotation is the only additional information that should be expected.
I'm not sure what is going on with those Windows builds, but it depends on a few factors. If we're not calculating the OOM allocation size correctly and it is being set to 0, there won't be an annotation. OTOH there could be some other kind of problem.
I'm going to wait another day and see what else comes up. If this is a persistent issue with the next day's crash reports then I'll try to investigate further.
Comment 52•9 years ago
|
||
Aaron, should we uplift this crash reporter fix to Beta 46? We're running an e10s experiment on Beta 46, so we want to be testing with all the e10s fixes for known issues.
Flags: needinfo?(aklotz)
Comment 53•9 years ago
|
||
From my side, I'm already happy if we end up with signatures that are either "OOM | ..." or "js::AutoEnterOOMUnsafeRegion::crash | ..." (though I personally would like the latter to also start with "OOM |" but that's out of scope for this bug and not e10s-dependent) - those kinds of signatures we can detect as being OOM and therefore we can start usefully comparing e10s with non-e10s crashes.
Comment 54•9 years ago
|
||
Oh, and yes, I'd love to see an uplift to 46.
Assignee | ||
Comment 55•9 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #52)
> Aaron, should we uplift this crash reporter fix to Beta 46? We're running an
> e10s experiment on Beta 46, so we want to be testing with all the e10s fixes
> for known issues.
I plan to uplift, yes. I'd like this to bake for one more day so that I can look at some more incoming crash reports and ensure that everything looks okay.
Flags: needinfo?(aklotz)
Comment 56•9 years ago
|
||
Sounds good.
[Tracking Requested - why for this release]:
We'd like to track this crash reporter fix for 46 because we need it to accurately monitor stability of our e10s beta experiment.
status-firefox45:
--- → wontfix
status-firefox46:
--- → affected
status-firefox47:
--- → affected
tracking-firefox46:
--- → ?
Assignee | ||
Comment 57•9 years ago
|
||
Comment on attachment 8725780 [details]
MozReview Request: Bug 1236108: Add support for exception-context annotations for content processes to the crash reporter; r?bsmedberg
(This request is for all three patches in this bug)
Approval Request Comment
[Feature/regressing bug #]: OOM crash report annotations for content processes
[User impact if declined]: More difficult for us to determine OOM crash rates with e10s
[Describe test coverage new/current, TreeHerder]: These patches add a new test which has landed and is passing on Nightly. The expected OOM annotations are now available on crash-stats for Nightly builds.
[Risks and why]: These are not trivial patches, but they are not super complex either. I'd say that the risk is fairly low, and the benefits from being able to properly classify content OOMs is worth it.
[String/UUID change made/needed]: None
Attachment #8725780 -
Flags: approval-mozilla-beta?
Attachment #8725780 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 58•9 years ago
|
||
Note to Sheriffs: These patches don't push cleanly on beta. I have a rebase that I can push myself once approved.
Marking affected/tracking for 46 since we are running a beta 46 e10s experiment.
Comment on attachment 8725780 [details]
MozReview Request: Bug 1236108: Add support for exception-context annotations for content processes to the crash reporter; r?bsmedberg
Better OOM info in crash-stats sounds great to me.
Aaron, please go ahead and push your beta fix, I would love this in beta 2 which goes to build on Monday.
Attachment #8725780 -
Flags: approval-mozilla-beta?
Attachment #8725780 -
Flags: approval-mozilla-beta+
Attachment #8725780 -
Flags: approval-mozilla-aurora?
Attachment #8725780 -
Flags: approval-mozilla-aurora+
Comment 61•9 years ago
|
||
bugherder uplift |
Comment 62•9 years ago
|
||
Backed out from Aurora for test_content_exception_time_annotation.js failures. Looks like you'll need to handle all the uplifts, Aaron.
https://hg.mozilla.org/releases/mozilla-aurora/rev/34a219cbd67d
https://treeherder.mozilla.org/logviewer.html#?job_id=2125699&repo=mozilla-aurora
test_content_exception_time_annotation.js | run_test/< - [run_test/< : 15] false == true
Flags: needinfo?(aklotz)
Assignee | ||
Comment 63•9 years ago
|
||
Assignee | ||
Comment 64•9 years ago
|
||
Bug 1256541 is blocking uplift. Leaving myself on ni? To land this once that has been taken care of.
Bug 1256541 landed in m-c, so we should be able to uplift both of these soon (for beta 6 or 7)
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 67•9 years ago
|
||
Comment 68•9 years ago
|
||
hey Liz, do you know which beta build this made it into?
Flags: needinfo?(lhenry)
Beta 7, released April 4th. We released Beta 8 on April 5th so there may not be very much data for beta 7.
Flags: needinfo?(lhenry)
Updated•8 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•