Closed Bug 1641256 Opened 4 years ago Closed 4 years ago

Crash in [@ mozilla::gfx::ReadElementConstrained<T>]

Categories

(Core :: Graphics: Canvas2D, defect, P1)

All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr68 --- disabled
firefox-esr78 --- disabled
firefox76 --- disabled
firefox77 --- disabled
firefox78 --- disabled
firefox79 --- disabled
firefox80 --- fixed

People

(Reporter: bobowen, Assigned: bobowen)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-8b2f5e30-873d-4f6f-8979-675c90200525.

Top 10 frames of crashing thread:

0 xul.dll CrashStatsLogForwarder::CrashAction gfx/thebes/gfxPlatform.cpp:407
1 xul.dll mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush gfx/2d/Logging.h:279
2 xul.dll mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log gfx/2d/Logging.h:272
3 xul.dll mozilla::gfx::ReadElementConstrained<mozilla::gfx::EventRingBuffer, mozilla::gfx::BackendType> gfx/2d/RecordingTypes.h:57
4 xul.dll mozilla::gfx::RecordedDrawTargetCreation::RecordedDrawTargetCreation<mozilla::gfx::EventRingBuffer> gfx/2d/RecordedEventImpl.h:1921
5 xul.dll static mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::EventRingBuffer> gfx/2d/RecordedEventImpl.h:3923
6 xul.dll mozilla::layers::CanvasTranslator::TranslateRecording gfx/layers/ipc/CanvasTranslator.cpp:196
7 xul.dll mozilla::layers::CanvasTranslator::StartTranslation gfx/layers/ipc/CanvasTranslator.cpp:154
8 xul.dll mozilla::detail::RunnableMethodImpl< xpcom/threads/nsThreadUtils.h:1237
9 xul.dll mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:212

Some of these seem to be failures to allocate and then the stream being misread because of that.
I'll probably file a separate bug for these and get the fix landed.

However many of them seem to be after a failure to create a surface of size (1,4161536) or (2,4161536).
4161536 in hex is 3F8000, which looks like a corrupted reading of float 1.
It's not clear at the moment whether this might be a fault on the writing or reading side.

Depends on: 1641578
Depends on: 1641722

Hi Bob, what's the status of this bug? Is remote canvas still aiming to ride 79 to release?

Flags: needinfo?(bobowencode)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #2)

Hi Bob, what's the status of this bug? Is remote canvas still aiming to ride 79 to release?

Hi Ryan, I'm still looking into this.
In bug 1641722 I landed a change to deactivate remote canvas when we fail to create a device and when we get an error reading from the stream.
This crash is Nightly / Dev Edition only, so I was hoping that with the fallback to software for these small number of cases we could still roll out.

However I also landed telemetry for the number of activations and deactivations of remote canvas and the count for the stream read errors are more common.
Very close to 1% of all sessions and a bit better in Beta. It's possible that these are a similar root cause to this bug or they could just be an artefact of shutdown. Until I can understand more about what is causing those read errors I think I'm less inclined to let this roll out in 79.
I've been running one of our canvas fuzzing tests locally for the last three days, but I haven't hit a read error yet.
It's still only enabled for early beta, so I was going to have to uplift to enable it fully anyway.

Flags: needinfo?(bobowencode)

As for this specific bug, I have got a little closer.

It looks like the reading is getting misaligned and we are misreading a RecordedSetTransform(Identity Matrix) followed by a RecordedSnapshot as RecordedDrawTargetCreations.
I can tell this because:

  • we see a RecordedDrawTargetCreation failure because of one of the Float(1)s from the transform Matrix is being read as part of the draw target IntRect, which fails in D2D1 code because it is too big
  • the other Float(1) gets read as the reference pointer for the next bogus RecordedDrawTargetCreation
  • the 24 (RecordedSnapshot event type) gets read as part of that RecordedDrawTargetCreation's IntRect and the SurfaceFormat gets read from the RecordedSnapshot's reference pointer, which will nearly always result in this crash.

Hmm, I think I've just realised what the problem might be.
/me goes to investigate.

My suspicion is that these are nearly all down to shutdown of the content process mid way through writing.
This is down to the fact that this shot up after bug 1633791 landed.
I'm also hoping that the large number of deactivations due to stream read errors (that don't cause this crash) is down to the same problem.

This also removes the gfxDevCrashes when an invalid enum is read, because we
suspect these are mainly due to the writer shutting down mid event.
We could only crash when the writer hasn't failed, but because these reads are
in templated code it would mean updating the different event streams with a new
function. If we are still getting high numbers in the deactivation telemetry we
will need to do this to try and track down the problem.

Pushed by bobowencode@gmail.com: https://hg.mozilla.org/integration/autoland/rev/50bf5757f911 Don't deactivate remote canvas due to read error when the writer has already failed. r=jrmuizel
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80

We're not going to uplift, remote canvas will be disabled on 79 after early Beta.
Actually this gfxDevCrash wouldn't happen in Beta anyway.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: