Closed Bug 1366217 Opened 7 years ago Closed 7 years ago

Crash in LongestPhaseSelfTime

Categories

(Core :: JavaScript: GC, defect)

55 Branch
Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- wontfix
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: calixte, Assigned: jonco)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [clouseau][tbird crash])

Crash Data

Attachments

(4 files)

This bug was filed from the Socorro interface and is report bp-6ff83aa6-587f-4952-81c6-a67f80170519. ============================================================= There is 1 crash in nightly 55 with buildid 20170518030213. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1361458. [1] https://hg.mozilla.org/mozilla-central/rev?node=b2eb05d5fad2fea928f47d3e6a329024c0aaf70e
Flags: needinfo?(jcoppeard)
This shouldn't be possible. I'm wondering if it's worth changing the way we store the times for each phase to only store self-time and not the sum of time under this phase. That would make this go away although it wouldn't fix whatever is causing this.
49 crashes across 53 installations in the last 7 days, on 55.
56 is affected as well, updating the status flag.
Let's check crashstats next week, once it has some more exposure on Beta.
Flags: needinfo?(miket)
Attached patch bug1366217-add-stats-assertions (deleted) — Splinter Review
Not a fix, but adds some more assertions that: 1) Time increases monotonically 2) The current phase is only changed from the main thread I made the former a release assert since the failure here is a release check with infrequent failures. The latter will fail in testing if this is in fact a problem.
Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)
Attachment #8879090 - Flags: review?(sphink)
Attachment #8879090 - Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/123b7b9082b2 Add assertions to GC stats code that time increases monotonically r=sfink
Flags: needinfo?(miket)
There is 1 crash ([1]) in nightly 56 with signature 'js::gcstats::Statistics::recordPhaseBegin'. The backtrace shows that the crash is due to a just added assertion. [1] https://crash-stats.mozilla.com/report/index/df348caa-2951-4c18-8e12-599f80170622
Crash Signature: [@ LongestPhaseSelfTime] → [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ]
[@ js::gcstats::Statistics::endPhase] shows up in crash-stats as another crash after the assertion was added.
Crash Signature: [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] → [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase]
Crash Signature: [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] → [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd]
Attached patch bug1366217-stats-time (deleted) — Splinter Review
These asserts are failing rarely on Android. The patch bodges the now time on Android if it is before the previous time and sets the aborted flag. I also did some cleanup around the aborted flag, making it so we don't report any telemetry if this flag is set (rather than only some telemetry). We also checked this flag in endSlice() after it could have been cleared by endGC() so I moved the clearing to the end.
Attachment #8884006 - Flags: review?(sphink)
Comment on attachment 8884006 [details] [diff] [review] bug1366217-stats-time Review of attachment 8884006 [details] [diff] [review]: ----------------------------------------------------------------- I really ought to figure out a good way to test that aborted flag. Thanks.
Attachment #8884006 - Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b07957a01b98 Disregard GC stats if we get an inconsistent timestamp on Android r=sfink
Jon, is there anything useful to glean from crashstats in the past week? The only crash with a comment says "Opening Google Docs": https://crash-stats.mozilla.com/report/index/34f7db77-2960-41ab-8105-885ae0170709
Flags: needinfo?(jcoppeard)
Attached patch bug1366217-add-more-assertions (deleted) — Splinter Review
I'm not hopeful, but I added a few more assertions to the stats code in the hope of tracking this down, removed some dead code and applied the Android time leniency change to recording parallal tasks.
Attachment #8887418 - Flags: review?(sphink)
Comment on attachment 8887418 [details] [diff] [review] bug1366217-add-more-assertions Review of attachment 8887418 [details] [diff] [review]: ----------------------------------------------------------------- Man, this thing is infuriating. And I thought one of the main points of TimeStamp was to have monotonic time. It's tempting to inject consistency checks into eg TimeStamp::Now() or something. (But it'd require a TLS read/write, so maybe not doable in a place where it would get wide enough testing.)
Attachment #8887418 - Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/9b93d7a06157 Add more assertions to help track down LongestPhaseSelfTime crashes r=sfink
There's been one new hit in bug 1367055 since the latest diagnostic patches landed. Not sure if it'll help, though, since it was on an opt run. https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=117054484
Crash Signature: [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd] → [@ LongestPhaseSelfTime] [@ LongestPhaseSelfTimeInMajorGC ] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd]
Attached patch bug1366217-stats-assert (deleted) — Splinter Review
Crash stats shows that we're getting inconsistent timestamps on Windows too and that the checks in LongestPhaseSelfTime are still failing. Here's a patch to make the checks assert in debug mode only and to ignore the timestamps in release mode. It's better that we don't crash and just skip collecting some telemetry data.
Flags: needinfo?(jcoppeard)
Attachment #8893302 - Flags: review?(sphink)
Comment on attachment 8893302 [details] [diff] [review] bug1366217-stats-assert Review of attachment 8893302 [details] [diff] [review]: ----------------------------------------------------------------- This is definitely good. I suppose to make it complete, we ought to have telemetry for how often this happens. I don't think it's anywhere near the rate at which it could explain our 1.4% massively long GC_MAX_PAUSE_MS, but if we had the measurement we could rule it out. And it would be nice to see the OS breakdown. That could be another bug.
Attachment #8893302 - Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/95846f969c25 Discard GC stats if timing inconsistencies are found r=sfink
Whiteboard: [clouseau] → [clouseau][tbird crash]
Crash Signature: [@ LongestPhaseSelfTime] [@ LongestPhaseSelfTimeInMajorGC ] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd] → [@ LongestPhaseSelfTime] [@ LongestPhaseSelfTimeInMajorGC ] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd] [@ TimeSince ]
Comment on attachment 8893302 [details] [diff] [review] bug1366217-stats-assert Approval Request Comment [Feature/Bug causing the regression]: Previous release assertions added by the first patch in this bug are failing. [User impact if declined]: Possible crashes. [Is this code covered by automated tests?]: Yes. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: This patch relaxes release mode assertions into debug mode ones and has been on m-c for 27 days. [String changes made/needed]: None.
Attachment #8893302 - Flags: approval-mozilla-beta?
Comment on attachment 8893302 [details] [diff] [review] bug1366217-stats-assert This should help with a top crash, and it's been on Nightly for a while. Let's try uplift for beta 10.
Attachment #8893302 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
This should be fixed since the final patch landed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Keywords: leave-open
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: