Closed
Bug 1492241
Opened 6 years ago
Closed 6 years ago
Crash in core::option::expect_failed | core::iter::{{impl}}::next<T>
Categories
(Core :: Graphics: WebRender, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla66
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox63 | --- | unaffected |
firefox64 | --- | disabled |
firefox65 | --- | fixed |
firefox66 | --- | fixed |
People
(Reporter: marcia, Assigned: nical)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, topcrash)
Crash Data
This bug was filed from the Socorro interface and is
report bp-3b62a536-27d4-4f60-a236-ccf0e0180915.
=============================================================
Small volume Win 10 crash spotted during nightly crash triage: https://bit.ly/2NmprgM. Crashes started using 20180914220208.
Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=efccb758c78cc4c42170c886294bf335d90e4383&tochange=750e71a8f79b3b745a6653e555e60acb122f1241
There was a webrender update in that changeset.
Top 10 frames of crashing thread:
0 xul.dll static void std::panicking::rust_panic_with_hook src/libstd/panicking.rs:521
1 xul.dll static void std::panicking::continue_panic_fmt src/libstd/panicking.rs:426
2 xul.dll static void std::panicking::rust_begin_panic src/libstd/panicking.rs:337
3 xul.dll static void core::panicking::panic_fmt src/libcore/panicking.rs:92
4 xul.dll static void core::option::expect_failed src/libcore/option.rs:960
5 xul.dll static union core::option::Option<webrender_bindings::moz2d_renderer::{{impl}}::rasterize::Job> core::iter::{{impl}}::next<webrender_bindings::moz2d_renderer::{{impl}}::rasterize::Job, core::slice::Iter<webrender_api::image::BlobImageParams>, closure> src/libcore/iter/mod.rs:1326
6 xul.dll static struct alloc::vec::Vec< gfx/webrender_bindings/src/moz2d_renderer.rs:358
7 xul.dll static void webrender::resource_cache::ResourceCache::block_until_all_resources_added gfx/webrender/src/resource_cache.rs:1365
8 xul.dll static struct webrender::tiling::Frame webrender::frame_builder::FrameBuilder::build gfx/webrender/src/frame_builder.rs:368
9 @0x20f3fa83b87
=============================================================
Comment 1•6 years ago
|
||
> no entry found for key
Blocks: wr-stability, stage-wr-trains
Priority: -- → P3
Updated•6 years ago
|
Priority: P3 → P4
Reporter | ||
Comment 2•6 years ago
|
||
Last crashes were in the 10-10 build. No crashes seen since then.
Updated•6 years ago
|
status-firefox65:
--- → affected
status-firefox-esr60:
--- → unaffected
Updated•6 years ago
|
Updated•6 years ago
|
Crash Signature: [@ core::option::expect_failed | core::iter::{{impl}}::next<T>] → [@ core::option::expect_failed | core::iter::{{impl}}::next<T>]
[@ core::option::expect_failed | <core::iter::Map<I, F> as core::iter::iterator::Iterator>::next]
Priority: P4 → P2
Comment 4•6 years ago
|
||
From bug 1508714:
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #1)
> According to Socorro (Bugzilla Crash Stop) and Mozregression:
> last build without this crash signature: 20181119100448
> first build with this crash signature: 20181119220031
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=e44bb5b4bc79be613d29b3f95d7b508e68e3d128&tochange=bd4c
> ebdbed4bcd34d449f67d51f139f9bdf75edc
Comment 5•6 years ago
|
||
Increased the priority here since it looks like this has spiked to the #1 GPU process crasher now.
Comment 6•6 years ago
|
||
There are only 2 WR changes in that regression range, and one is just a version bump for a dependency (euclid).
Seems like that this is due to https://github.com/servo/webrender/pull/3323
Comment 7•6 years ago
|
||
Tried to untangle this, but the logic is pretty complicated. What I think is happening is:
We're crashing trying to access Moz2DBlobRasterizer::blob_commands[key] [1], because there's a key in the requests parameter that we don't have in our hashmap.
The callstack is coming from rasterize_missing_blob_images (inlined into block_until_all_resources_added), and that calls self.blob_image_handler.prepare_resources. prepare_resources checks the blob_commands HashMap on BlobImageHandler, so the relevant key must be in that one.
Moz2DBlobRasterizer is created with a clone of the blob_commands from BlobImageHandler, so it seems like we must have added a new blob since we created the rasterizer, or the objects are actually unrelated.
It appears that the normal process is that we get ApiMsg::UpdateDocument on the render_backend, that updates the BlobImageHandler with new blobs, creates a Moz2DBlobRasterizer, and then sends it all to the scene builder. The scene builder rasterizes blobs, and then returns the Moz2DBlobRasterizer [2] to the render_backend's resource cache.
It also seems like ApiMsg::UpdateResources also adds new blobs to the BlobImageHandler, but doesn't do scene building, so it I think the Moz2DBlobRasterizer's internal map of blobs will now be out of date.
Should we be recreating the blob rasterizing with a new clone of the blob_commands map? What if there's an existing scene build in-progress, how do we handle ordering of those and making sure we don't clobber our new rasterizer with an older one?
It's not super obvious why Bobby's patch regressed this, though I guess just eviction timing changes resulting in us needing to rasterize missing blobs in different situations.
[1] https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/gfx/webrender_bindings/src/moz2d_renderer.rs#475
[2] https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/gfx/wr/webrender/src/render_backend.rs#770
Flags: needinfo?(nical.bugzilla)
Comment 8•6 years ago
|
||
It might be simpler to just persist a blob rasterizer on each of the render_backend and scene_builders, instead of trying to pass them around.
Then each scene builder request can pass a clone of the current blob commands to be integrated into the existing rasterizer, and when done we can just discard the commands instead of trying to return them (since the render_backend already has them, and they're guaranteed to be the same, or newer).
Updated•6 years ago
|
See Also: → https://github.com/servo/webrender/pull/3351
Comment 9•6 years ago
|
||
Hi Nical -- This is an important one. Can you take this?
Assignee: nobody → nical.bugzilla
Assignee | ||
Comment 10•6 years ago
|
||
> Hi Nical -- This is an important one. Can you take this?
Yep
> It's not super obvious why Bobby's patch regressed this, though I guess just eviction timing changes resulting in us needing to rasterize missing blobs in different situations.
Blobs should not be affected by the eviction timings (they are always manually evicted from the cache), or if they are it's another bug to look at.
Flags: needinfo?(nical.bugzilla)
Comment 11•6 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> There are only 2 WR changes in that regression range, and one is just a
> version bump for a dependency (euclid).
>
> Seems like that this is due to https://github.com/servo/webrender/pull/3323
Those are the only two changes to upstream WebRender, and as has been noted, neither is a particularly likely candidate.
However, it's quite a big range, and there's certainly some other interesting stuff in there that might have triggered this. Bug 1446309, bug 1477551, and bug 1486958 all seem like they could be related here.
Comment 12•6 years ago
|
||
I put up https://github.com/servo/webrender/pull/3371 to maybe help.
Comment 13•6 years ago
|
||
So https://crash-stats.mozilla.com/report/index/fc9523ac-16a6-4b8a-84c3-3ec010181130 and https://crash-stats.mozilla.com/report/index/3762f238-2e31-4e9d-86f6-e64170181130 are crashes that happened after bug 1510882 / pull/3371 landed. However we're still crashing in the same place. This is very surprising as the crash should have moved earlier.
Comment 14•6 years ago
|
||
So it looks like the inconsistency probably comes from: https://searchfox.org/mozilla-central/rev/8f0db72fb6e35414fb9a6fc88af19c69f332425f/gfx/webrender_bindings/src/moz2d_renderer.rs#593
We clone the blob_commands table there and so it's believable that inconsistencies between the two arise.
Comment 15•6 years ago
|
||
I poked around the push logs for when this signature first appeared, but we enabled gfx.webrender.qualified.all on nightly around this time (20180913100107) so I suspect it was already present.
Earliest signatures for reference:
https://crash-stats.mozilla.com/report/index/3b62a536-27d4-4f60-a236-ccf0e0180915 [20180914220208, Windows]
https://crash-stats.mozilla.com/report/index/4e8df691-9d4f-4c9d-8bdb-cdbe20181123 [20181119220031, OSX]
https://crash-stats.mozilla.com/report/index/af5bad9d-70f0-4ed1-8d73-cf83b0181120 [20181119220031, Linux]
Comment 16•6 years ago
|
||
Here is a theory that kvark and I hashed out:
On the first pass of prepare_transaction:
1) Resource update contains only an AddBlobImage, but it is not visible according to viewport_tiles (set in SetBlobImageVisibleArea)
2) We call ResourceCache::create_blob_scene_builder_requests but since the image is not visible, it doesn't create any BlobImageParams entries (containing the blob image key) to be stored in txn.blob_requests
3) Since the transaction has no blob image updates, we can take the shortcut to call RenderBackend::update_document directly
4) We didn't call ResourceCache::set_blob_rasterizer with the updated txn.blob_rasterizer.
On the second pass of prepare_transaction:
1) Resource update contains only an SetBlobImageVisibleArea
2) We don't call ResourceCache::create_blob_scene_builder_requests since there is no Add/UpdateBlobImage entries.
3) Since the transaction has no blob image updates, we can take the shortcut to call RenderBackend::update_document directly
4) We are missing the blob we added in the previous step in ResourceCache::request_image, and it adds it to missing_blob_images.
5) We try to process missing_blob_images in ResourceCache::rasterize_missing_blob_images with the old blob_rasterizer.
6) Crash.
Comment 17•6 years ago
|
||
bp-25efa768-2eff-4c1a-b7fe-bb3390181130 I marked text in a bugzilla comment box and pressed Del.
Comment 18•6 years ago
|
||
Putting together a fix now, fingers crossed this is it.
Comment 19•6 years ago
|
||
Comment 20•6 years ago
|
||
(In reply to Andrew Osmond [:aosmond] from comment #18)
> Putting together a fix now, fingers crossed this is it.
Doesn't look like it. Sadness.
Comment 21•6 years ago
|
||
I reproduced locally readily outside of rr, and with some effort, in as well.
STR:
1) Comment out https://searchfox.org/mozilla-central/rev/3fdc51e66c9487b39220ad58dcee275fca070ccd/gfx/wr/webrender/src/scene_builder.rs#471-476 to force us to always create blobs on the RenderBackend thread.
2) MOZ_CHAOSMODE=1 ./mach run https://en.wikipedia.org/wiki/Fedora_(operating_system)#Releases
3) Ensure you are scrolled to the top of the page.
4) Resize vigorously until it crashes.
This confirmed nical's theory that the a low priority and high priority request raced. They both created their own rasterizer, the order got inverted, and when we tried to rasterize the missing blob, our snapshot was missing the key added in the other transaction.
Comment 23•6 years ago
|
||
Just got https://crash-stats.mozilla.com/report/index/bc16ca25-ebec-4259-ae08-a18a70181210 on macOS. Different or the same?
Comment 24•6 years ago
|
||
I don't think what landed can solve the priority inversion I found. Trying to reproduce again... I have a patch to force the correct ordering by forcing high priority scene builder transactions follow the low priority route if it has blob images to rasterize. Probably not ideal however.
Comment 25•6 years ago
|
||
Linked to another solution (which is simple and appears to fix the issue).
See Also: → https://github.com/servo/webrender/pull/3398
Comment 26•6 years ago
|
||
\o/ Looks like bug 1512730 fixed the crash.
Comment 27•6 years ago
|
||
Indeed! Let's mark it fixed until proven otherwise.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment 28•6 years ago
|
||
Should bug 1512730 be uplifted to 65?
Comment 29•6 years ago
|
||
Yeah, that would be good. :aosmond or :kvark, do you want to request uplift on that WR update?
status-firefox66:
--- → fixed
Target Milestone: --- → mozilla66
Comment 30•6 years ago
|
||
kats, the request is up at https://bugzilla.mozilla.org/show_bug.cgi?id=1512730#c5
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•