Closed
Bug 1385971
Opened 7 years ago
Closed 7 years ago
stylo: Crash in core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade
Categories
(Core :: CSS Parsing and Computation, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla58
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | --- | wontfix |
firefox57 | --- | fixed |
firefox58 | --- | fixed |
People
(Reporter: calixte, Assigned: manishearth, NeedInfo)
References
Details
(Keywords: crash)
Crash Data
Attachments
(3 files)
(deleted),
text/x-review-board-request
|
bholley
:
review+
|
Details |
(deleted),
patch
|
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-2a72f9aa-bb7c-42f5-ac45-0f7880170729.
=============================================================
There is 1 crash in nightly 56 with buildid 20170729100254.
The backtrace is showing that 'style::properties::cascade' is calling std::collections::hash::map.
By the way this crash signature exists for non-stylo crashes.
Updated•7 years ago
|
Priority: -- → P2
Updated•7 years ago
|
Priority: P2 → --
Updated•7 years ago
|
Priority: -- → P3
Comment 1•7 years ago
|
||
FWIW, bug 1385976 has a crash report with the same signature starting in mp4parse_new, where I don't see any way the HashMap init machinery would be called, so I fear this is unrelated corruption.
Comment 2•7 years ago
|
||
(In reply to Ralph Giles (:rillian) | needinfo me from comment #1)
> FWIW, bug 1385976 has a crash report with the same signature starting in
> mp4parse_new, where I don't see any way the HashMap init machinery would be
> called, so I fear this is unrelated corruption.
Ok.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Comment 3•7 years ago
|
||
Reopening this bug and marking this bug as Stylo P1 because this is top crash #10 in Beta 56. I see 88 instances of this crash signature in Beta 56.0b1, but none from Nightly 57.0a1.
https://crash-stats.mozilla.com/search/?signature=%3Dcore%3A%3Aresult%3A%3Aunwrap_failed%3CT%3E%20%7C%20std%3A%3Acollections%3A%3Ahash%3A%3Amap%3A%3A%7B%7Bimpl%7D%7D%3A%3Anew%3A%3AKEYS%3A%3A__init%20%7C%20style%3A%3Aproperties%3A%3Acascade&product=Firefox&date=%3E%3D2017-07-11T08%3A00%3A28.000Z&date=%3C2017-08-11T08%3A00%3A28.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Status: RESOLVED → REOPENED
status-firefox57:
--- → unaffected
Priority: P3 → P1
Resolution: WORKSFORME → ---
Summary: Stylo: Crash in core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init → stylo: Crash in core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade
Comment 4•7 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #3)
> Reopening this bug and marking this bug as Stylo P1 because this is top
> crash #10 in Beta 56. I see 88 instances of this crash signature in Beta
> 56.0b1, but none from Nightly 57.0a1.
Is the stylo experiment running in beta 56?
Flags: needinfo?(cpeterson)
Comment 5•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4)
> Is the stylo experiment running in beta 56?
No. We haven't started the beta experiment yet. I see now that all 88 crash reports are from a single Windows 7 user. They must have manually set the layout.css.servo.enabled perf.
Many of their crash URLs look like JIRA bug dashboards. I'll ask around to see if anyone has access to a JIRA system.
Crash Signature: [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init] → [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init
[@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade ]
Flags: needinfo?(cpeterson)
Priority: P1 → P3
Comment 6•7 years ago
|
||
Doesn't seem to happen after build 20170813100233. Closing as WFM.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → WORKSFORME
Comment 7•7 years ago
|
||
I still see a couple crash reports from more recent Nightly 57 builds, such as bp-110ace36-65b4-469b-902b-bffe10170830 from build id 20170829100404. But not enough instances to warrant reopening this bug at this time.
Reporter | ||
Comment 8•7 years ago
|
||
For information, there are 102 crashes for 5 installations in 57.0b0 (dev edition).
Comment 9•7 years ago
|
||
The 57.0b crashes are from build ids 20170917031738 and 20170921191414 on Windows 7. There are no crashes reported in Nightly 58 yet.
Crash Signature: [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init
[@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade ] → [@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init]
[@ core::result::unwrap_failed<T> | std::collections::hash::map::{{impl}}::new::KEYS::__init | style::properties::cascade ]
status-firefox58:
--- → unaffected
Comment 10•7 years ago
|
||
These new reports all seem to have MOZ_CRASH_REASON: to create an OS RNG: Error { repr: Os { code: 127, message: "OS Error 127 (FormatMessageW() returned error 15100)" } }
Status: RESOLVED → REOPENED
Flags: needinfo?(manishearth)
Resolution: WORKSFORME → ---
Comment 11•7 years ago
|
||
If we're failing to open an RNG filehandle or something, we should consider failing gracefully somehow and just not reseeding rather than bringing down the whole process.
Assignee | ||
Comment 12•7 years ago
|
||
Ok, so what's happening here is that the OS RNG returned some error (we don't know what). This triggered a panic, and in attempting to get more error information from the OS we get ERROR_MUI_FILE_NOT_FOUND, which means that the language resources (l10n files likely) for the system language are somehow broken, which also triggered a panic. This is not a double-panic situation since the second panic is triggered before the first panic actually starts, in the Display implementation of the error (which is called in attempting to create a string to pass to the panic).
I don't know how to fix this, but we can fix up the Windows error handling stuff in Rust to display the original error code when this double-error situation happens. Sound good?
Assignee | ||
Comment 13•7 years ago
|
||
Yeah, we're failing to somehow init an OS RNG instance.
The stdlib doesn't expose an error state for this, and hashglobe doesn't actually fork RandomState (we reuse the stdlib one).
Flags: needinfo?(manishearth)
Assignee | ||
Comment 14•7 years ago
|
||
Hm, wait, error code 127 is ERROR_PROC_NOT_FOUND.
That's ... weird.
Comment 15•7 years ago
|
||
unique crash reasons from the reports:
failed to create an OS RNG: Error { repr: Os { code: 127, message: "The specified procedure could not be found." } }
failed to create an OS RNG: Error { repr: Os { code: 127, message: "OS Error 127 (FormatMessageW() returned error 15100)" } }
failed to create an OS RNG: Error { repr: Os { code: -2146893818, message: "OS Error -2146893818 (FormatMessageW() returned error 15100)" } }
failed to create an OS RNG: Error { repr: Os { code: -2146893818, message: "Firma no válida." } }
failed to create an OS RNG: Error { repr: Os { code: -2146893801, message: "Provider type not defined." } }
failed to create an OS RNG: Error { repr: Os { code: -2146893795, message: "Библиотека поставщика проинициализирована неправильно." } }
Comment 16•7 years ago
|
||
NI Manish to write up his analysis here per discussion in the meeting.
Flags: needinfo?(manishearth)
Assignee | ||
Comment 17•7 years ago
|
||
So my analysis of error 127 was basically "it shouldn't happen", that error occurs when you request a nonexistant API.
Similarly, the other error has to do with the provider not existing. However in OsRng we *always* pass down PROV_RSA_FULL and that's a preexisting provider (not user-defined) so it should always exist.
Basically this feels a lot like the DLL is corrupted somehow.
The Go folks had similar troubles and also chalked it up to DLL corruption on the target system (https://github.com/golang/go/issues/15655).
What's interesting is that Firefox seems to not have had such crashes before? (or maybe it has, I can't find anything on socorro) Perhaps we don't report the error as clearly (or we ignore it? looking into this)
Since it's very likely a corrupted DLL, I'd probably wontfix this unless it gets more frequent. The fix involves forking RandomState as well and making all of its APIs fallible too, which won't be nice.
Flags: needinfo?(manishearth)
Comment 18•7 years ago
|
||
We've had periodic problems with Firefox startup crashes in calls to the Windows API rand_s() because bad third-party software injected DLL hooks into advapi32.dll (e.g. bug 694344, bug 1167248, bug 1369361). OsRng calls CryptGenRandom() in advapi32.dll, so maybe we're hitting a similar problem here.
I looked for suspicious third-party DLLs in a few of these crash reports' module lists, but I didn't see any DLLs in common that stood out. If we do find a problem DLL, we can add it to our DLL blocklist.
status-firefox-esr52:
--- → unaffected
OS: Windows 10 → Windows
Comment 19•7 years ago
|
||
Now that Beta 57.0b3 is live, we're seeing more of these __init crashes. There are 477 crashes from 17 installations of 57.0b3.
I'm bumping this bug's priority from P3 to P2 because it is now Beta top crash #17.
Priority: P3 → P2
Comment 20•7 years ago
|
||
Could we add a separate codepath to fallibly try to create an OS RNG, and if that fails, set a flag on hashglobe to skip the random seed?
Flags: needinfo?(manishearth)
Comment 21•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #20)
> Could we add a separate codepath to fallibly try to create an OS RNG, and if
> that fails, set a flag on hashglobe to skip the random seed?
(To be clear, I'm talking about doing this at startup)
Assignee | ||
Comment 22•7 years ago
|
||
We could, yes. A global atomic flag I presume?
I'm also going to file a Rust bug about this, this seems pretty bad.
Flags: needinfo?(manishearth)
Comment 23•7 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #22)
> We could, yes. A global atomic flag I presume?
Yep. If we load/store it Relaxed it should be free.
> I'm also going to file a Rust bug about this, this seems pretty bad.
Yeah, definitely worth doing.
Assignee | ||
Comment 24•7 years ago
|
||
I have interviews tomorrow, but will try to get to this as soon as i can.
Flags: needinfo?(manishearth)
Updated•7 years ago
|
Assignee: nobody → manishearth
Assignee | ||
Comment 25•7 years ago
|
||
Assignee | ||
Comment 26•7 years ago
|
||
FWIW I tried the approach of forking RandomState first because if it's actually easy I'd prefer to do that over using global statics.
It's not easy.
The problem is that while it's easy to fork RandomState and have it use hardcoded "random"[1] values when OsRng fails, the BuildHasher implementation on RandomState can't be forked, because it needs the ability to initialize a SipHasher13 with its random state. SipHasher13 is also an unstable/private API. We could fork SipHasher13 and DefaultHasher as well, but that's a lot of forking. Alternatively we could fork DefaultHasher only and switch it over to using SipHasher24, which for some reason does have a stable API. I believe both hashing algorithms are fine for this purpose but one may be slower than the other; and in this case the penalty would be borne by all.
So I'll stick to using the OsRng.
[1]: https://xkcd.com/221/
Flags: needinfo?(manishearth)
Assignee | ||
Comment 27•7 years ago
|
||
> So I'll stick to using the OsRng.
er, stick to using the "trigger OsRng in Servo_Initialize() with a once-init static" solution
Comment hidden (mozreview-request) |
Assignee | ||
Comment 29•7 years ago
|
||
Really this patch should remove the new() functions entirely so when we unfork HashMap we'll still use FNV. I'll make this change after emilio's patch for fixing the property hashset lands.
Comment 30•7 years ago
|
||
mozreview-review |
Comment on attachment 8913848 [details]
Bug 1385971 - stylo: Use FnvHashMap everywhere, remove default HashMap construction methods;
https://reviewboard.mozilla.org/r/185240/#review190244
Per IRC discussion, we're just going to remove the default here.
Attachment #8913848 -
Flags: review?(bobbyholley) → review-
Updated•7 years ago
|
Comment 31•7 years ago
|
||
Note that even though this hasn't landed, https://github.com/servo/servo/pull/18679 should have removed the offending code.
Comment hidden (mozreview-request) |
Comment 33•7 years ago
|
||
mozreview-review |
Comment on attachment 8913848 [details]
Bug 1385971 - stylo: Use FnvHashMap everywhere, remove default HashMap construction methods;
https://reviewboard.mozilla.org/r/185240/#review190670
Attachment #8913848 -
Flags: review?(bobbyholley) → review+
Assignee | ||
Comment 34•7 years ago
|
||
https://github.com/servo/servo/pull/18712
We'll want to uplift both, all of these seem to be in paths that will eventually get called; the cascade() one just gets called first.
Assignee | ||
Comment 35•7 years ago
|
||
Assignee | ||
Comment 36•7 years ago
|
||
Approval Request Comment
[Feature/Bug causing the regression]: None
[User impact if declined]: May crash on windows sometimes
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: No (but the crashing code is gone)
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: The other patch attached here
[Is the change risky?]: May cause minor changes in performance, probably positive
[Why is the change risky/not risky?]:
[String changes made/needed]: None
Attachment #8915011 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 37•7 years ago
|
||
Attachment #8915012 -
Flags: approval-mozilla-beta?
Comment 38•7 years ago
|
||
Can we mark this bug as fixed now given the comment 35?
Flags: needinfo?(manishearth)
Assignee | ||
Comment 39•7 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Flags: needinfo?(manishearth)
Resolution: --- → FIXED
Updated•7 years ago
|
Target Milestone: --- → mozilla58
Comment 40•7 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #35)
> Emilio's patch: https://hg.mozilla.org/mozilla-central/rev/7e790d5276a1
Note to relman - I really want the above in the next beta, since it may help us diagnose some other unrelated crashes.
Updated•7 years ago
|
Flags: needinfo?(rkothari)
Comment on attachment 8915011 [details] [diff] [review]
Emilio's patch (for uplift)
Crash fix in Stylo code, Beta57+
Flags: needinfo?(rkothari)
Attachment #8915011 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8915012 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Hi Stuart, Kevin, Joel (from the monitoring/perf team), there is a potential perf impact (mostly positive). Just FYI in case you notice something on the talos tests.
Flags: needinfo?(sphilp)
Flags: needinfo?(kbrosnan)
Flags: needinfo?(jmaher)
Comment 43•7 years ago
|
||
The uplift patches attached don't apply cleanly to Beta. Please rebase.
Flags: needinfo?(manishearth)
Comment 45•7 years ago
|
||
uplift |
Updated•7 years ago
|
Flags: needinfo?(kbrosnan)
You need to log in
before you can comment on or make changes to this bug.
Description
•