Closed
Bug 1366217
Opened 7 years ago
Closed 7 years ago
Crash in LongestPhaseSelfTime
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | wontfix |
firefox56 | --- | fixed |
firefox57 | --- | fixed |
People
(Reporter: calixte, Assigned: jonco)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: [clouseau][tbird crash])
Crash Data
Attachments
(4 files)
(deleted),
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
sfink
:
review+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-6ff83aa6-587f-4952-81c6-a67f80170519.
=============================================================
There is 1 crash in nightly 55 with buildid 20170518030213. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1361458.
[1] https://hg.mozilla.org/mozilla-central/rev?node=b2eb05d5fad2fea928f47d3e6a329024c0aaf70e
Flags: needinfo?(jcoppeard)
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 1•7 years ago
|
||
This shouldn't be possible.
I'm wondering if it's worth changing the way we store the times for each phase to only store self-time and not the sum of time under this phase. That would make this go away although it wouldn't fix whatever is causing this.
Comment 2•7 years ago
|
||
49 crashes across 53 installations in the last 7 days, on 55.
Comment 3•7 years ago
|
||
56 is affected as well, updating the status flag.
status-firefox56:
--- → affected
Comment 4•7 years ago
|
||
Let's check crashstats next week, once it has some more exposure on Beta.
Flags: needinfo?(miket)
Assignee | ||
Comment 5•7 years ago
|
||
Not a fix, but adds some more assertions that:
1) Time increases monotonically
2) The current phase is only changed from the main thread
I made the former a release assert since the failure here is a release check with infrequent failures. The latter will fail in testing if this is in fact a problem.
Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)
Attachment #8879090 -
Flags: review?(sphink)
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Updated•7 years ago
|
Attachment #8879090 -
Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/123b7b9082b2
Add assertions to GC stats code that time increases monotonically r=sfink
Updated•7 years ago
|
Flags: needinfo?(miket)
Comment 7•7 years ago
|
||
bugherder |
Reporter | ||
Comment 8•7 years ago
|
||
There is 1 crash ([1]) in nightly 56 with signature 'js::gcstats::Statistics::recordPhaseBegin'. The backtrace shows that the crash is due to a just added assertion.
[1] https://crash-stats.mozilla.com/report/index/df348caa-2951-4c18-8e12-599f80170622
Crash Signature: [@ LongestPhaseSelfTime] → [@ LongestPhaseSelfTime]
[@ js::gcstats::Statistics::recordPhaseBegin ]
Comment 9•7 years ago
|
||
[@ js::gcstats::Statistics::endPhase] shows up in crash-stats as another crash after the assertion was added.
Crash Signature: [@ LongestPhaseSelfTime]
[@ js::gcstats::Statistics::recordPhaseBegin ] → [@ LongestPhaseSelfTime]
[@ js::gcstats::Statistics::recordPhaseBegin ]
[@ js::gcstats::Statistics::endPhase]
Updated•7 years ago
|
Crash Signature: [@ LongestPhaseSelfTime]
[@ js::gcstats::Statistics::recordPhaseBegin ]
[@ js::gcstats::Statistics::endPhase] → [@ LongestPhaseSelfTime]
[@ js::gcstats::Statistics::recordPhaseBegin ]
[@ js::gcstats::Statistics::endPhase]
[@ js::gcstats::Statistics::recordPhaseEnd]
Assignee | ||
Comment 10•7 years ago
|
||
These asserts are failing rarely on Android.
The patch bodges the now time on Android if it is before the previous time and sets the aborted flag.
I also did some cleanup around the aborted flag, making it so we don't report any telemetry if this flag is set (rather than only some telemetry). We also checked this flag in endSlice() after it could have been cleared by endGC() so I moved the clearing to the end.
Attachment #8884006 -
Flags: review?(sphink)
Comment 11•7 years ago
|
||
Comment on attachment 8884006 [details] [diff] [review]
bug1366217-stats-time
Review of attachment 8884006 [details] [diff] [review]:
-----------------------------------------------------------------
I really ought to figure out a good way to test that aborted flag. Thanks.
Attachment #8884006 -
Flags: review?(sphink) → review+
Comment 12•7 years ago
|
||
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b07957a01b98
Disregard GC stats if we get an inconsistent timestamp on Android r=sfink
Comment 13•7 years ago
|
||
bugherder |
Comment 14•7 years ago
|
||
Jon, is there anything useful to glean from crashstats in the past week?
The only crash with a comment says "Opening Google Docs": https://crash-stats.mozilla.com/report/index/34f7db77-2960-41ab-8105-885ae0170709
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 15•7 years ago
|
||
I'm not hopeful, but I added a few more assertions to the stats code in the hope of tracking this down, removed some dead code and applied the Android time leniency change to recording parallal tasks.
Attachment #8887418 -
Flags: review?(sphink)
Comment 16•7 years ago
|
||
Comment on attachment 8887418 [details] [diff] [review]
bug1366217-add-more-assertions
Review of attachment 8887418 [details] [diff] [review]:
-----------------------------------------------------------------
Man, this thing is infuriating. And I thought one of the main points of TimeStamp was to have monotonic time. It's tempting to inject consistency checks into eg TimeStamp::Now() or something. (But it'd require a TLS read/write, so maybe not doable in a place where it would get wide enough testing.)
Attachment #8887418 -
Flags: review?(sphink) → review+
Comment 17•7 years ago
|
||
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9b93d7a06157
Add more assertions to help track down LongestPhaseSelfTime crashes r=sfink
Comment 18•7 years ago
|
||
bugherder |
Comment 19•7 years ago
|
||
There's been one new hit in bug 1367055 since the latest diagnostic patches landed. Not sure if it'll help, though, since it was on an opt run.
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=117054484
Assignee | ||
Updated•7 years ago
|
Crash Signature: [@ LongestPhaseSelfTime]
[@ js::gcstats::Statistics::recordPhaseBegin ]
[@ js::gcstats::Statistics::endPhase]
[@ js::gcstats::Statistics::recordPhaseEnd] → [@ LongestPhaseSelfTime]
[@ LongestPhaseSelfTimeInMajorGC ]
[@ js::gcstats::Statistics::recordPhaseBegin ]
[@ js::gcstats::Statistics::endPhase]
[@ js::gcstats::Statistics::recordPhaseEnd]
Assignee | ||
Comment 21•7 years ago
|
||
Crash stats shows that we're getting inconsistent timestamps on Windows too and that the checks in LongestPhaseSelfTime are still failing.
Here's a patch to make the checks assert in debug mode only and to ignore the timestamps in release mode. It's better that we don't crash and just skip collecting some telemetry data.
Flags: needinfo?(jcoppeard)
Attachment #8893302 -
Flags: review?(sphink)
Comment 22•7 years ago
|
||
Comment on attachment 8893302 [details] [diff] [review]
bug1366217-stats-assert
Review of attachment 8893302 [details] [diff] [review]:
-----------------------------------------------------------------
This is definitely good. I suppose to make it complete, we ought to have telemetry for how often this happens. I don't think it's anywhere near the rate at which it could explain our 1.4% massively long GC_MAX_PAUSE_MS, but if we had the measurement we could rule it out. And it would be nice to see the OS breakdown. That could be another bug.
Attachment #8893302 -
Flags: review?(sphink) → review+
Assignee | ||
Updated•7 years ago
|
Comment 23•7 years ago
|
||
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/95846f969c25
Discard GC stats if timing inconsistencies are found r=sfink
Comment 24•7 years ago
|
||
bugherder |
Updated•7 years ago
|
status-firefox57:
--- → affected
Comment 25•7 years ago
|
||
#6 crash for Firefox 56.0b1. Needs uplift to beta
https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=56.0b1
Updated•7 years ago
|
Whiteboard: [clouseau] → [clouseau][tbird crash]
Comment 26•7 years ago
|
||
now in 56.0b the new asserts are getting hit a couple of times more often:
https://crash-stats.mozilla.com/signature/?signature=TimeSince&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=moz_crash_reason&_sort=-date&page=1#reports
Crash Signature: [@ LongestPhaseSelfTime]
[@ LongestPhaseSelfTimeInMajorGC ]
[@ js::gcstats::Statistics::recordPhaseBegin ]
[@ js::gcstats::Statistics::endPhase]
[@ js::gcstats::Statistics::recordPhaseEnd] → [@ LongestPhaseSelfTime]
[@ LongestPhaseSelfTimeInMajorGC ]
[@ js::gcstats::Statistics::recordPhaseBegin ]
[@ js::gcstats::Statistics::endPhase]
[@ js::gcstats::Statistics::recordPhaseEnd]
[@ TimeSince ]
Assignee | ||
Comment 27•7 years ago
|
||
Comment on attachment 8893302 [details] [diff] [review]
bug1366217-stats-assert
Approval Request Comment
[Feature/Bug causing the regression]: Previous release assertions added by the first patch in this bug are failing.
[User impact if declined]: Possible crashes.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This patch relaxes release mode assertions into debug mode ones and has been on m-c for 27 days.
[String changes made/needed]: None.
Attachment #8893302 -
Flags: approval-mozilla-beta?
Comment on attachment 8893302 [details] [diff] [review]
bug1366217-stats-assert
This should help with a top crash, and it's been on Nightly for a while. Let's try uplift for beta 10.
Attachment #8893302 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 29•7 years ago
|
||
bugherder uplift |
Assignee | ||
Comment 30•7 years ago
|
||
This should be fixed since the final patch landed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•