Closed
Bug 1176363
Opened 9 years ago
Closed 9 years ago
crash in SkPixelRef::lockPixels()
Categories
(Core :: WebRTC, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla42
backlog | webrtc/webaudio+ |
People
(Reporter: potch, Assigned: pehrsons)
References
Details
(Keywords: crash)
Crash Data
Attachments
(3 files, 2 obsolete files)
(deleted),
text/x-review-board-request
|
pehrsons
:
review+
ritu
:
approval-mozilla-aurora+
|
Details |
(deleted),
patch
|
pehrsons
:
review+
ritu
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-18a57c8e-1d15-4ac1-9a36-eb10c2150617.
=============================================================
This usually happens a few minutes after calling canvas.captureStream().
Here's the code I used: http://codepen.io/anon/pen/Ejvqbw
Comment 1•9 years ago
|
||
Is this a GFX bug? (Or should someone else answer the question?)
backlog: --- → webRTC+
Rank: 10
Flags: needinfo?(jgilbert)
Priority: -- → P1
Assignee | ||
Comment 2•9 years ago
|
||
George is the skia person I believe.
Flags: needinfo?(jgilbert) → needinfo?(gwright)
Comment 3•9 years ago
|
||
This is a thread safety issue with the way captureStream is implemented.
Moz2D isn't really threadsafe, and taking a snapshot (via CanvasRenderingContext2D::GetSurfaceSnapshot) returns a copy-on-write lightweight snapshot that refers back to the original DrawTarget (each DrawTarget implements this separately).
This mechanism isn't threadsafe, so have the snapshot accessed on the ImageBridge thread (the crashing thread) is racing with access to the DrawTarget for canvas drawing on the main thread.
Comment 4•9 years ago
|
||
Aha. Thanks Matt. Is there some way to add a thread-safety assertion to the snapshots? And is there a workaround to resolve the problem without causing a threading issue?
Related question: how does the impact of such workarounds affect performance?
Flags: needinfo?(matt.woodrow)
Comment 5•9 years ago
|
||
I'll also note I tried it on my Mac and got no crashes in 10 minutes. Andreas, did we make any chances in how we grab frames since this was reported?
Flags: needinfo?(pehrsons)
Comment 6•9 years ago
|
||
By the looks of things we actually use runnables to avoid releasing off the main thread and hitting those assertions. Threading in Moz2D is something we need to look into further.
The best fix is to change the implementation of TimerDriver::TakeSnapshot() to allocate a new SourceSurfaceRawData and copy the pixels from the canvas snapshot into that. That way we guarantee that we have ownership of the data we're using.
This will likely perform worse as its making an extra copy, but there isn't much choice right now.
Updated•9 years ago
|
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(gwright)
Assignee | ||
Comment 7•9 years ago
|
||
Ah, I wish I had known this earlier. As Randell mentioned, some thread safety assertions would have been very nice.
I'll have a patch up real soon. It can hopefully fix some of the other issues we've been seeing with captureStream as well.
(In reply to Randell Jesup [:jesup] from comment #5)
> I'll also note I tried it on my Mac and got no crashes in 10 minutes.
> Andreas, did we make any chances in how we grab frames since this was
> reported?
No changes, no. It's probably just rare.
Flags: needinfo?(pehrsons)
Assignee | ||
Comment 8•9 years ago
|
||
Bug 1176363 - Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Attachment #8629783 -
Flags: review?(matt.woodrow)
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 9•9 years ago
|
||
Comment 10•9 years ago
|
||
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
https://reviewboard.mozilla.org/r/12645/#review11133
Ship It!
Attachment #8629783 -
Flags: review?(matt.woodrow) → review+
Assignee | ||
Updated•9 years ago
|
Updated•9 years ago
|
status-firefox41:
--- → affected
status-firefox42:
--- → affected
Comment 11•9 years ago
|
||
Keywords: checkin-needed
Comment 12•9 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/80da8a74fad2 for frequent Win8 mochitest-gl and less frequent mochitest-1 assertion failures like https://treeherder.mozilla.org/logviewer.html#?job_id=11416725&repo=mozilla-inbound or the ones I retriggered up for you on your try push.
Assignee | ||
Comment 13•9 years ago
|
||
OK, I figured out what is racing now.
The stack trace in most failures (like comment 12) is:
> 04:44:35 INFO - Thread 0 (crashed) <- main thread
> 04:44:35 INFO - 0 xul.dll!mozilla::gfx::DataSourceSurface::Unmap() [2D.h:f90e28d039c1 : 475 + 0x53]
> 04:44:35 INFO - 1 xul.dll!mozilla::gfx::DrawTargetD2D1::OptimizeSourceSurface(mozilla::gfx::SourceSurface *) [DrawTargetD2D1.cpp:f90e28d039c1 : 1584 + 0x1d]
> 04:44:35 INFO - 2 xul.dll!nsLayoutUtils::SurfaceFromElement(mozilla::dom::HTMLVideoElement *,unsigned int,mozilla::gfx::DrawTarget *) [nsLayoutUtils.cpp:f90e28d039c1 : 6918 + 0x10]
> 04:44:35 INFO - 3 xul.dll!nsLayoutUtils::SurfaceFromElement(mozilla::dom::Element *,unsigned int,mozilla::gfx::DrawTarget *) [nsLayoutUtils.cpp:f90e28d039c1 : 6947 + 0x7]
> 04:44:35 INFO - 4 xul.dll!mozilla::dom::CanvasRenderingContext2D::DrawImage(mozilla::dom::HTMLImageElementOrHTMLCanvasElementOrHTMLVideoElement const &,double,double,double,double,double,double,double,double,unsigned char,mozilla::ErrorResult &) [CanvasRenderingContext2D.cpp:f90e28d039c1 : 4379 + 0x16]
> 04:44:35 INFO - 5 xul.dll!mozilla::dom::CanvasRenderingContext2DBinding::drawImage [CanvasRenderingContext2DBinding.cpp:f90e28d039c1 : 4126 + 0x9]
> ...
I was debugging on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4f063845c0b
One of the M-gl tests failed, but due to timing changes gave me the other stack trace, http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/pehrsons@gmail.com-b4f063845c0b/try-win64-debug/try_win8_64-debug_test-mochitest-gl-bm119-tests1-windows-build1085.txt.gz:
> 05:59:39 INFO - Thread 34 (crashed)
> 05:59:39 INFO - 0 xul.dll!mozilla::gfx::DataSourceSurface::Unmap() [2D.h:b4f063845c0b : 477 + 0x53]
> 05:59:39 INFO - 1 xul.dll!mozilla::gfx::CreatePartialBitmapForSurface [HelpersD2D.h:b4f063845c0b : 620 + 0x13]
> 05:59:39 INFO - 2 xul.dll!mozilla::gfx::DrawTargetD2D1::GetImageForSurface(mozilla::gfx::SourceSurface *,mozilla::gfx::Matrix &,mozilla::gfx::ExtendMode,mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const *) [DrawTargetD2D1.cpp:b4f063845c0b : 1552 + 0x3d]
> 05:59:39 INFO - 3 xul.dll!mozilla::gfx::DrawTargetD2D1::CopySurface(mozilla::gfx::SourceSurface *,mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const &,mozilla::gfx::IntPointTyped<mozilla::gfx::UnknownUnits> const &) [DrawTargetD2D1.cpp:b4f063845c0b : 329 + 0x2d]
> 05:59:39 INFO - 4 xul.dll!mozilla::layers::CairoImage::GetTextureClient(mozilla::layers::CompositableClient *) [ImageContainer.cpp:b4f063845c0b : 654 + 0x3b]
> 05:59:39 INFO - 5 xul.dll!mozilla::layers::ImageClientSingle::UpdateImage(mozilla::layers::ImageContainer *,unsigned int) [ImageClient.cpp:b4f063845c0b : 154 + 0x8]
> 05:59:39 INFO - 6 xul.dll!mozilla::layers::UpdateImageClientNow [ImageBridgeChild.cpp:b4f063845c0b : 408 + 0x11]
> 05:59:39 INFO - 7 xul.dll!RunnableFunction<void (*)(mozilla::layers::ImageClient *,mozilla::layers::ImageContainer *),Tuple2<mozilla::layers::ImageClient *,nsRefPtr<mozilla::layers::ImageContainer> > >::Run() [task.h:b4f063845c0b : 420 + 0x9]
> ...
So it's a race between
* drawing the video element (with our now self-owned raw datasourcesurface) to a 2d canvas on the main thread,
* and compositing the video element with the same raw surface on the image client bridge thread (?)
So it basically boils down to an existing issue on Windows that I just happen to exercise a tad bit more than existing tests.
Matt, any ideas for fixes?
Flags: needinfo?(matt.woodrow)
Assignee | ||
Updated•9 years ago
|
Comment 14•9 years ago
|
||
Flags: needinfo?(matt.woodrow)
Attachment #8633709 -
Flags: review?(bas)
Comment 15•9 years ago
|
||
Attachment #8633769 -
Flags: review?(matt.woodrow)
Comment 16•9 years ago
|
||
Updated•9 years ago
|
Attachment #8633709 -
Flags: review?(bas) → review+
Comment 17•9 years ago
|
||
Comment on attachment 8633769 [details] [diff] [review]
Stop using DrawTargets off the main thread
Review of attachment 8633769 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/d3d11/TextureD3D11.cpp
@@ +491,5 @@
> }
>
> gfxWindowsPlatform* windowsPlatform = gfxWindowsPlatform::GetPlatform();
> + ID3D11Device* d3d11device = windowsPlatform->GetD3D11DeviceForCurrentThread();
> + bool haveD3d11Backend = windowsPlatform->GetContentBackend() == BackendType::DIRECT2D1_1 || NS_IsMainThread();
Comment about why using d3d11 is ok (even with d2d 1.1) off the main thread.
@@ +502,5 @@
> D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE);
>
> newDesc.MiscFlags = D3D11_RESOURCE_MISC_SHARED;
>
> + if (!NS_IsMainThread()) {
Comment here about sync object handling this for us on the main thread.
::: gfx/layers/d3d11/TextureD3D11.h
@@ +63,5 @@
> virtual bool CanExposeDrawTarget() const override { return true; }
>
> virtual gfx::DrawTarget* BorrowDrawTarget() override;
>
> + virtual void UpdateFromSurface(gfx::DataSourceSurface* aSurface) override;
Comment about how we use this for off-main-thread updates, and that BorrowDrawTarget() is not safe for this.
Attachment #8633769 -
Flags: review?(matt.woodrow) → review+
This is increasingly a Windows bug.
OS: Mac OS X → Windows
Comment 19•9 years ago
|
||
Andreas, feel free to land my patch if you want to land your stuff. We can just leave the bug open until Bas lands with the review comments fixed.
Assignee | ||
Comment 20•9 years ago
|
||
Thanks for the work here guys. I'm running our patches on try now Matt: https://treeherder.mozilla.org/#/jobs?repo=try&revision=83fd58f54a8f
Assignee | ||
Comment 21•9 years ago
|
||
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Attachment #8629783 -
Attachment description: MozReview Request: Bug 1176363 - Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow → MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Attachment #8629783 -
Flags: review+
Assignee | ||
Comment 22•9 years ago
|
||
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Rebased and added part number. Carrying r=mattwoodrow.
Attachment #8629783 -
Flags: review+
Assignee | ||
Comment 23•9 years ago
|
||
Rebased Matt's patch and fixed a typo (Atomic.h -> Atomics.h). Carrying forward r=bas.
Attachment #8633709 -
Attachment is obsolete: true
Attachment #8634648 -
Flags: review+
Assignee | ||
Comment 24•9 years ago
|
||
Martin, see linux M-e10s-2 on [1], `application crashed [@ mozilla::TransportLayerShutdown]` is going crazy now. Maybe some timing changed with these patches but they seem unrelated to the crash. Is this on your radar?
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=83fd58f54a8f
Flags: needinfo?(martin.thomson)
Comment 25•9 years ago
|
||
I think that I want your patches if they are that effective.I've landed a fix for the crash, but not the underlying problem, which I've been trying to reproduce unsuccessfully for most of this week.
Flags: needinfo?(martin.thomson)
Assignee | ||
Comment 26•9 years ago
|
||
Please land part 1 from reviewboard and part 2 from bugzilla.
Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=83fd58f54a8f
The failing test was since disabled, so we should be good here.
Leaving open for bas to land his patch when it's ready.
Thanks!
Keywords: checkin-needed,
leave-open
Comment 27•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/1ece1edb6956
https://hg.mozilla.org/integration/mozilla-inbound/rev/0d3ad656d93f
Keywords: checkin-needed
Comment 29•9 years ago
|
||
Update with a bunch of fixes and made to pass try.
Attachment #8639136 -
Flags: review?(matt.woodrow)
Comment 30•9 years ago
|
||
Comment on attachment 8639136 [details] [diff] [review]
Part 1: Stop using DrawTargets off the main thread v2
Review of attachment 8639136 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ImageContainer.cpp
@@ +649,5 @@
> + RefPtr<DataSourceSurface> dataSurf = surface->GetDataSurface();
> + textureClient->UpdateFromSurface(dataSurf);
> +
> + if (NS_IsMainThread()) {
> + textureClient->SyncWithObject(forwarder->GetSyncObject());
This is a bit weird because the TextureClient implicitly decides whether to use the sync object (or keyedmutex), but we're required to explicitly tell it to sync (sometimes).
It'd be a bit clearer if these two things matched, or if we had a textureClient->NeedsSync() method to check (rather than NS_IsMainThread which isn't very obvious).
Anyway, a comment about why we only do this for the main thread would be a great start.
Attachment #8639136 -
Flags: review?(matt.woodrow) → review+
Updated•9 years ago
|
Keywords: leave-open
Comment 31•9 years ago
|
||
Comment 32•9 years ago
|
||
Comment 33•9 years ago
|
||
Comment 34•9 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Assignee | ||
Comment 35•9 years ago
|
||
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Approval Request Comment
[Feature/regressing bug #]: Canvas CaptureStream
[User impact if declined]: A live 30FPS canvas captureStream causes Firefox to crash after a couple of minutes. Depends a bit on the platform (timing sensitive) but has shown up frequently in automation, and been reported by users. Canvas captureStreams are protected by a pref and are not available by default.
[Describe test coverage new/current, TreeHerder]: Good coverage in automation that shows the issue as largely fixed. (occasional crashes have occurred, perhaps once a month, but not several a day like before this fix)
[Risks and why]: low. This patch only affects code behind a pref. Accompanying patches (part 2 on this bug and bug 1185011) are in very central code and thus well tested in automation.
[String/UUID change made/needed]: none
Attachment #8629783 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 36•9 years ago
|
||
Comment on attachment 8634648 [details] [diff] [review]
Part 2. Allow mapping of SourceSurfaceRawData from multiple threads
Approval Request Comment
See comment 35.
Attachment #8634648 -
Flags: approval-mozilla-aurora?
Matt, I was unable to find this crash in FF41 in the past 28 days. Do you see it? I am just wondering whether this change can just ride the train instead of getting uplifted to Aurora. Thoughts?
Flags: needinfo?(matt.woodrow)
Comment 38•9 years ago
|
||
I was never able to reproduce this, so unsure.
Flags: needinfo?(matt.woodrow)
Comment 39•9 years ago
|
||
The test failures in the dependent bugs definitely affect Aurora41.
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Seems safe to uplift to Aurora as 1) this has been in m-c for a week and 2) by default CanvasStream is pref'd off.
Attachment #8629783 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8634648 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 41•9 years ago
|
||
I was able to uplift Andreas' patches, but Bas' patch needs rebasing.
https://hg.mozilla.org/releases/mozilla-aurora/rev/ef164895b708
https://hg.mozilla.org/releases/mozilla-aurora/rev/ff1c96567f25
Flags: needinfo?(bas)
Assignee | ||
Comment 42•9 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #41)
> I was able to uplift Andreas' patches, but Bas' patch needs rebasing.
>
> https://hg.mozilla.org/releases/mozilla-aurora/rev/ef164895b708
> https://hg.mozilla.org/releases/mozilla-aurora/rev/ff1c96567f25
I only requested aurora approval for those because I don't think Bas' patch is necessary. As I take it it just asserts that no-one is using DrawTargets in the wrong way and thus is mainly needed for new offenders - existing issues would probably pop up as intermittent oranges anyway.
I'll let Bas chip in as well.
Comment 43•9 years ago
|
||
The crashes didn't go away for good until Bas' patch landed.
Assignee | ||
Comment 44•9 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #43)
> The crashes didn't go away for good until Bas' patch landed.
Ah, got it. That's why there were some stray oranges for a few days..
Updated•9 years ago
|
Updated•9 years ago
|
Comment 45•9 years ago
|
||
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #42)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #41)
> > I was able to uplift Andreas' patches, but Bas' patch needs rebasing.
> >
> > https://hg.mozilla.org/releases/mozilla-aurora/rev/ef164895b708
> > https://hg.mozilla.org/releases/mozilla-aurora/rev/ff1c96567f25
>
> I only requested aurora approval for those because I don't think Bas' patch
> is necessary. As I take it it just asserts that no-one is using DrawTargets
> in the wrong way and thus is mainly needed for new offenders - existing
> issues would probably pop up as intermittent oranges anyway.
>
> I'll let Bas chip in as well.
No, my patch fixes the actual problem :-). Matt's patches just make the crash significantly less likely to occur. Having said that, my patches are significantly more risky. They've had some testing but they at least need bug 1190950 uplifted as well. I'll leave it up to release management to determine if we want to do that uplift.
Flags: needinfo?(rkothari)
Flags: needinfo?(pehrsons)
Flags: needinfo?(bas)
Updated•9 years ago
|
Attachment #8633769 -
Attachment is obsolete: true
Assignee | ||
Comment 46•9 years ago
|
||
(In reply to Bas Schouten (:bas.schouten) from comment #45)
> No, my patch fixes the actual problem :-).
Right, not sure why I thought they were just asserting...
Flags: needinfo?(pehrsons)
Bas, can you please nominate the patch(s) from bug 1190950 for uplift to Beta? Without that tracking flag "approval-mozilla-aurora/beta" set to "?" I cannot do much.
Flags: needinfo?(rkothari) → needinfo?(bas)
Comment 49•9 years ago
|
||
Bas, were you still going to look at this rebase for Beta41?
Flags: needinfo?(bas)
Comment 50•9 years ago
|
||
Hi Bas and Matt,
I have a question about the main thread assertion in TextureClient::BorrowDrawTarget()[1].
If we have off-main-thread painting, we still need to call the BorrowDrawTarget() at painting thread and hit the assertion.
With [2], if we borrow the drawTarget and get the snapshot at main thread, we still have the data race problem when we access the snapshot at another thread.
Any suggestion for calling borrowTarget() at painting thread?
[1]
https://hg.mozilla.org/mozilla-central/annotate/a6786bf8d71d4cf40c3d40e06d8e3c9866863475/gfx/layers/client/TextureClient.cpp#l836
[2]
https://bugzilla.mozilla.org/attachment.cgi?id=8639136
Flags: needinfo?(matt.woodrow)
Comment 51•9 years ago
|
||
Jerry, we should move this to a new bug, since fixing it might take a bit of work.
BorrowDrawTarget asserts that it's only called on the main thread because Moz2D isn't really threadsafe and some backends have issues with using it across threads (especially d2d).
We're only going to be using a DrawTarget (and snapshots of it) from a single thread, so this should be solvable fairly easy. D2D in particular will need to be setup with a separate device for each thread that it is used on.
Flags: needinfo?(matt.woodrow)
Updated•9 years ago
|
Updated•9 years ago
|
Flags: needinfo?(bas)
Depends on: slow-canvas-to-webgl
Updated•9 years ago
|
No longer depends on: slow-canvas-to-webgl
You need to log in
before you can comment on or make changes to this bug.
Description
•