Closed Bug 1574935 Opened 5 years ago Closed 5 years ago

Intermittent testing\marionette\harness\marionette_harness\tests\unit\test_screenshot.py TestScreenCaptureContent.test_capture_vertical_bounds | UnknownException: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" in "capture.toBase64()"

Categories

(Remote Protocol :: Marionette, defect, P5)

Version 3
All
Windows 7
defect

Tracking

(firefox70 fixed)

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

This is an intermittent failure which will start happening once my patch on bug 1559592 landed. It's Windows only, and unclear yet what specifically causes the problem.

The call to canvas.toDataURL() fails with a generic exception: UnknownException: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE).

https://searchfox.org/mozilla-central/rev/5912f376ab6a17afcba2b7654586013158ed64b5/testing/marionette/harness/marionette_harness/tests/unit/test_screenshot.py#285

Here the code:
https://searchfox.org/mozilla-central/rev/5912f376ab6a17afcba2b7654586013158ed64b5/dom/html/HTMLCanvasElement.cpp#725-770

I don't know what could be wrong here given that the call to drawSnapshot didn't raise any error. So maybe there is some kind of invalid data? But given the documentation on MDN only a security error is allowed to be raised.

Olli, any idea what this could be related to? Thanks

Flags: needinfo?(bugs)

Without someone debugging where the error is coming, hard to say anything. Looks like that particular caller shouldn't trigger CanvasRenderingContextHelper.cpp#239-248
(I don't have access to a Windows machine atm)

Flags: needinfo?(bugs)

Debug builds actually show the following:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262082900&repo=try&lineNumber=38096-38097

[task 2019-08-16T21:15:41.655Z] 21:15:41 INFO - [Parent 2044, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file z:/build/build/src/dom/base/ImageEncoder.cpp, line 410
[task 2019-08-16T21:15:41.655Z] 21:15:41 INFO - [Parent 2044, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file z:/build/build/src/dom/html/HTMLCanvasElement.cpp, line 758

Also this is Windows 7 only. Windows 10 isn't affected:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3813dd87ba4933bf333ae33d0dc5fe0ab2e2aa43

OS: Windows → Windows 7

Timothy, do you have an idea why that is failing? See the recent comments on this bug. If you need more please let me know and I could trigger some try builds. Thanks

Flags: needinfo?(tnikkel)

So that means we are hitting this

https://searchfox.org/mozilla-central/rev/428cf94f4ddfb80eebc6028023a7d89eb277791b/dom/base/ImageEncoder.cpp#410

It doesn't tell us which one of the giant if branches we took above. Going through all the places in there where the rv could have come from it's either in the actual image encoder (imgIEncoder::InitFromData) or somewhere in canvas code (nsICanvasRenderingContextInternal::GetInputStream and AsyncCanvasRenderer::GetInputStream). We need more info on what calls are failing to know. Since it's intermittent my guess would be in the canvas code because I would expect the actual image encoders to be mostly deterministic and the name AsyncCanvasRenderer makes me think we could have synchronization bugs. Sprinkling some more NS_ENSURE_SUCCESS around to narrow down what's failing would be the next step if debugging this via try server is the easiest way to reproduce.

Flags: needinfo?(tnikkel)

Doesn't seem to fail on my Windows 7 machine.

I made a push with a bunch of ns ensure success to get more logging.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=53af9e2a13e2490110d4b736a24c8a1fc0b782f6

But when I look at the log the line numbers and warnings seem as they come from a build without my extra ns ensure success calls. The same two ensure success fails with the same line numbers, even though the ImageEncoder line numbers have changed and line 410 no longer correspond to a line with ns ensure success.

Oh, I copied the try fuzzy commit from the try push in comment 3 and it enabled artifact builds.

We fail on this line

https://searchfox.org/mozilla-central/rev/03853a6e87c4a9405fce1de49e5d03b9e7a7a274/gfx/2d/DataSurfaceHelpers.cpp#149

the size of the image we are allocating is

1147327674 = 150334996

Not too big, but this is a 32bit system, and I guess if there is other stuff using memory it could fail? Henrik pointed out to me that the new api requires keeping the screenshot around as well as drawing it to a canvas, so that would use extra memory. I guess this is the cause.

Lets needinfo Matt for further details. Matt, can you please check the comment 8 from Timothy? Thanks

Flags: needinfo?(matt.woodrow)

Note that for the time being I will skip this particular test only for Windows 32bit over on bug 1559592.

Blocks: 1487124

Actually before skipping the test I wanted to try if just deleting the snapshot after drawing it into the canvas might help to temporarily get rid of the second copy. I would love to keep this test running on 32bit Windows.

Here a try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=570473bd28bbdbe58239fa65a9ca6a4364776421

As Matt pointed out on IRC there is snapshot.close() which we could use for now. It's way better than having to force a GC.

Here the updated try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7032df66872996fcb04928d0d70b3102fbd8c570

Using snapshot.close() works fine for the interim. So I will make use of it similar to other places in Firefox itself.

Timothy, shall we move this bug to another component, or better file a new one for the underlying memory footprint?

Flags: needinfo?(matt.woodrow) → needinfo?(tnikkel)

Sorry I think that this question actually should have still gone to Matt.

Flags: needinfo?(tnikkel) → needinfo?(matt.woodrow)

I think we should close this for now, since the bug as reported no longer happens.

I don't have any great ideas right now that are better than just using close(), so I think it's fine to just keep doing that for now.

Flags: needinfo?(matt.woodrow)
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Maybe 32bit systems are gone at some point, so that we don't have to think more about it... Anyway, works for me for the time being.

Assignee: nobody → hskupin
Target Milestone: --- → mozilla70
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.