Intermittent testing\marionette\harness\marionette_harness\tests\unit\test_screenshot.py TestScreenCaptureContent.test_capture_vertical_bounds | UnknownException: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" in "capture.toBase64()"
Categories
(Remote Protocol :: Marionette, defect, P5)
Tracking
(firefox70 fixed)
Tracking | Status | |
---|---|---|
firefox70 | --- | fixed |
People
(Reporter: whimboo, Assigned: whimboo)
References
Details
This is an intermittent failure which will start happening once my patch on bug 1559592 landed. It's Windows only, and unclear yet what specifically causes the problem.
The call to canvas.toDataURL() fails with a generic exception: UnknownException: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE).
Here the code:
https://searchfox.org/mozilla-central/rev/5912f376ab6a17afcba2b7654586013158ed64b5/dom/html/HTMLCanvasElement.cpp#725-770
I don't know what could be wrong here given that the call to drawSnapshot didn't raise any error. So maybe there is some kind of invalid data? But given the documentation on MDN only a security error is allowed to be raised.
Olli, any idea what this could be related to? Thanks
Assignee | ||
Comment 1•5 years ago
|
||
Reply from Olli on the other bug was:
There is at least https://searchfox.org/mozilla-central/rev/5912f376ab6a17afcba2b7654586013158ed64b5/dom/canvas/CanvasRenderingContextHelper.cpp#239-248
Where is the code calling toDataURL?
The JS code from Marionette lives here:
https://searchfox.org/mozilla-central/rev/5912f376ab6a17afcba2b7654586013158ed64b5/testing/marionette/capture.js#199
Comment 2•5 years ago
|
||
Without someone debugging where the error is coming, hard to say anything. Looks like that particular caller shouldn't trigger CanvasRenderingContextHelper.cpp#239-248
(I don't have access to a Windows machine atm)
Assignee | ||
Comment 3•5 years ago
|
||
Debug builds actually show the following:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262082900&repo=try&lineNumber=38096-38097
[task 2019-08-16T21:15:41.655Z] 21:15:41 INFO - [Parent 2044, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file z:/build/build/src/dom/base/ImageEncoder.cpp, line 410
[task 2019-08-16T21:15:41.655Z] 21:15:41 INFO - [Parent 2044, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file z:/build/build/src/dom/html/HTMLCanvasElement.cpp, line 758
Also this is Windows 7 only. Windows 10 isn't affected:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3813dd87ba4933bf333ae33d0dc5fe0ab2e2aa43
Assignee | ||
Comment 4•5 years ago
|
||
Timothy, do you have an idea why that is failing? See the recent comments on this bug. If you need more please let me know and I could trigger some try builds. Thanks
Comment 5•5 years ago
|
||
So that means we are hitting this
It doesn't tell us which one of the giant if branches we took above. Going through all the places in there where the rv could have come from it's either in the actual image encoder (imgIEncoder::InitFromData) or somewhere in canvas code (nsICanvasRenderingContextInternal::GetInputStream and AsyncCanvasRenderer::GetInputStream). We need more info on what calls are failing to know. Since it's intermittent my guess would be in the canvas code because I would expect the actual image encoders to be mostly deterministic and the name AsyncCanvasRenderer makes me think we could have synchronization bugs. Sprinkling some more NS_ENSURE_SUCCESS around to narrow down what's failing would be the next step if debugging this via try server is the easiest way to reproduce.
Comment 6•5 years ago
|
||
Doesn't seem to fail on my Windows 7 machine.
I made a push with a bunch of ns ensure success to get more logging.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=53af9e2a13e2490110d4b736a24c8a1fc0b782f6
But when I look at the log the line numbers and warnings seem as they come from a build without my extra ns ensure success calls. The same two ensure success fails with the same line numbers, even though the ImageEncoder line numbers have changed and line 410 no longer correspond to a line with ns ensure success.
Comment 7•5 years ago
|
||
Oh, I copied the try fuzzy commit from the try push in comment 3 and it enabled artifact builds.
Comment 8•5 years ago
|
||
We fail on this line
the size of the image we are allocating is
1147327674 = 150334996
Not too big, but this is a 32bit system, and I guess if there is other stuff using memory it could fail? Henrik pointed out to me that the new api requires keeping the screenshot around as well as drawing it to a canvas, so that would use extra memory. I guess this is the cause.
Assignee | ||
Comment 9•5 years ago
|
||
Lets needinfo Matt for further details. Matt, can you please check the comment 8 from Timothy? Thanks
Assignee | ||
Comment 10•5 years ago
|
||
Note that for the time being I will skip this particular test only for Windows 32bit over on bug 1559592.
Assignee | ||
Comment 11•5 years ago
|
||
Actually before skipping the test I wanted to try if just deleting the snapshot after drawing it into the canvas might help to temporarily get rid of the second copy. I would love to keep this test running on 32bit Windows.
Here a try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=570473bd28bbdbe58239fa65a9ca6a4364776421
Assignee | ||
Comment 12•5 years ago
|
||
As Matt pointed out on IRC there is snapshot.close()
which we could use for now. It's way better than having to force a GC.
Here the updated try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7032df66872996fcb04928d0d70b3102fbd8c570
Assignee | ||
Comment 13•5 years ago
|
||
Using snapshot.close()
works fine for the interim. So I will make use of it similar to other places in Firefox itself.
Timothy, shall we move this bug to another component, or better file a new one for the underlying memory footprint?
Assignee | ||
Comment 14•5 years ago
|
||
Sorry I think that this question actually should have still gone to Matt.
Comment hidden (Intermittent Failures Robot) |
Comment 16•5 years ago
|
||
I think we should close this for now, since the bug as reported no longer happens.
I don't have any great ideas right now that are better than just using close(), so I think it's fine to just keep doing that for now.
Updated•5 years ago
|
Assignee | ||
Comment 17•5 years ago
|
||
Maybe 32bit systems are gone at some point, so that we don't have to think more about it... Anyway, works for me for the time being.
Updated•2 years ago
|
Description
•