Get reftests and crashtests running on geckoview-qr
Categories
(Core :: Graphics: WebRender, enhancement, P3)
Tracking
()
 | Tracking | Status |
---|---|---|
firefox69 | --- | fixed |
People
(Reporter: kats, Assigned: kats)
References
(Depends on 6 open bugs, Blocks 1 open bug)
Details
(Whiteboard: [gfx-noted][wr-amvp][wr-q2])
Attachments
(6 files)
We should get reftests running in automation for GeckoView with WebRender enabled. This bug tracks that work (it will likely turn into a metabug).
Assignee | ||
Comment 1•6 years ago
Link to a recent try run of reftests on GeckoView by gbrown (for my future reference): https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=a50af786c7dfe77cb535e2ab698c4efde81f23e5
Assignee | ||
Comment 2•6 years ago
I tried adding geckoview QR jobs:
Looks like the emulator/AVD that we're using doesn't support GL ES 3.0 so we either need to upgrade that or find some other solution.
Assignee | ||
Comment 3•6 years ago
Specifically in the logcat I see this output:
04-03 15:30:57.533 1046 1046 I SurfaceFlinger: OpenGL ES informations:
04-03 15:30:57.533 1046 1046 I SurfaceFlinger: vendor : Google (Google Inc.)
04-03 15:30:57.533 1046 1046 I SurfaceFlinger: renderer : Android Emulator OpenGL ES Translator (Google SwiftShader)
04-03 15:30:57.533 1046 1046 I SurfaceFlinger: version : OpenGL ES 2.0 (OpenGL ES 3.0 SwiftShader 4.0.0.1)
04-03 15:30:57.533 1046 1046 I SurfaceFlinger: extensions: GL_EXT_debug_marker GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_depth24 GL_OES_depth32 GL_OES_element_index_uint GL_OES_texture_float GL_OES_texture_float_linear GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth_texture GL_OES_texture_half_float GL_OES_texture_half_float_linear GL_OES_packed_depth_stencil GL_OES_standard_derivatives GL_OES_texture_npot GL_OES_rgb8_rgba8 ANDROID_EMU_CHECKSUM_HELPER_v1 GL_OES_vertex_array_object ANDROID_EMU_gles_max_version_2
04-03 15:30:57.533 1046 1046 I SurfaceFlinger: GL_MAX_TEXTURE_SIZE = 8192
04-03 15:30:57.533 1046 1046 I SurfaceFlinger: GL_MAX_VIEWPORT_DIMS = 8192
and then when we start Gecko we get this:
04-03 15:32:01.680 2451 2467 D EGL_emulation: eglCreateContext: 0xe1584440: maj 2 min 0 rcv 2
04-03 15:32:01.680 2451 2467 D EGL_emulation: eglMakeCurrent: 0xe1584440: ver 2 0 (tinfo 0xe1593d70)
04-03 15:32:01.690 2451 2500 I Gecko : [GFX1-]: Failed to create EGLConfig!
04-03 15:32:01.690 2451 2500 I Gecko : [GFX1-]: Failed GL context creation for WebRender: 0x0
04-03 15:32:01.690 2451 2500 I Gecko : [GFX1-]: Failed to create EGLConfig!
04-03 15:32:01.690 2451 2500 I Gecko : [GFX1-]: Failed GL context creation for WebRender: 0x0
04-03 15:32:01.890 2451 2467 I Gecko : 1554301921890 Marionette TRACE Received observer notification command-line-startup
04-03 15:32:01.900 2451 2467 W ResourceType: Too many attribute references, stopped at: 0x01010099
04-03 15:32:01.910 2451 2500 D : HostConnection::get() New Host Connection established 0xcafc66c0, tid 2500
04-03 15:32:01.920 2451 2500 E EGL_emulation: eglCreateContext: EGL_BAD_CONFIG: no ES 3 support
04-03 15:32:01.920 2451 2500 E EGL_emulation: tid 2500: eglCreateContext(1404): error 0x3005 (EGL_BAD_CONFIG)
04-03 15:32:01.920 2451 2500 I Gecko : [GFX1-]: Failed to create EGLContext!: 0x3005
04-03 15:32:01.920 2451 2500 I Gecko : [GFX1-]: Failed GL context creation for WebRender: 0x0
04-03 15:32:01.920 2451 2500 I Gecko : [GFX1-]: Failed to get shared GL context
04-03 15:32:01.920 2451 2500 E EGL_emulation: eglCreateContext: EGL_BAD_CONFIG: no ES 3 support
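For triaging similar runs, the two signals in this logcat (the GL ES version SurfaceFlinger reports, and the emulator's "no ES 3 support" rejection) can be pulled out mechanically. The following is a minimal sketch, assuming the log lines keep the shape shown above; the function name `summarize_gles` and both regular expressions are illustrative and not part of any Mozilla tooling.

```python
import re

# Matches the emulator error seen in the logcat above.
NO_ES3 = re.compile(r"eglCreateContext: EGL_BAD_CONFIG: no ES 3 support")
# Matches the SurfaceFlinger version line, e.g. "version : OpenGL ES 2.0 (...)".
GL_VERSION = re.compile(r"SurfaceFlinger:\s+version\s*:\s*OpenGL ES (\d+)\.(\d+)")


def summarize_gles(logcat_text):
    """Return (reported_version, es3_rejected) for raw logcat text.

    reported_version is a (major, minor) tuple, or None if no version line
    was found. es3_rejected is True if the emulator refused an ES 3 context.
    """
    version = None
    es3_rejected = False
    for line in logcat_text.splitlines():
        match = GL_VERSION.search(line)
        if match and version is None:
            version = (int(match.group(1)), int(match.group(2)))
        if NO_ES3.search(line):
            es3_rejected = True
    return version, es3_rejected
```

On the excerpt above this would report version 2.0 together with a rejected ES 3 context, which matches the diagnosis in comment 2.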
Comment 4•6 years ago
:aerickson -- Can you investigate? I found, but did not verify, this tip: https://stackoverflow.com/questions/40797975/android-emulator-and-opengl-es3-egl-bad-config. I would think the existing emulator and avd would not need an upgrade for this, but who knows?
Comment 5•6 years ago
(In reply to Geoff Brown [:gbrown] from comment #4)
I would think the existing emulator and avd would not need an upgrade for this
Actually, our deployed emulator is version 27.3.10, and https://developer.android.com/studio/releases/emulator indicates there were egl improvements in version 28.0.16, so maybe an emulator (sdk) update is the first thing to try.
Assignee | ||
Comment 6•6 years ago
Is this something I can try? It looks like we get the emulator and AVDs out of tooltool but I'm not sure how to go about testing an updated version.
Comment 7•6 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
Is this something I can try?
Without hacking to get around tooltool, you'd need to create an updated emulator/sdk archive, upload to tooltool, then update the manifest in a try push. Maybe best to leave it to Andrew?
Comment 8•6 years ago
Yes, I will investigate. I've created Bug 1541955 for tracking.
Assignee | ||
Comment 9•6 years ago
I tried using the patch on bug 1541955 and I get the same results. I wonder if there's something else we need to do (e.g. passing additional flags to the emulator) to enable ES 3. I'll try experimenting locally.
Assignee | ||
Comment 10•6 years ago
Even with emulator 28.0.23 installed locally, it looks like I still need to add GLESDynamicVersion = on to the ~/.android/advancedFeatures.ini file in order to get GL ES 3. According to this thread they are whitelisting host GPUs and so presumably my host GPU (and whatever we're using in automation) is not whitelisted.
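As a rough illustration of the workaround, the setting can be written into the features file with a small script. This is a sketch only: the key and value (GLESDynamicVersion = on) and the file location come from the comment above, while the helper name `enable_gles_dynamic_version` and the replace-then-append behaviour are assumptions of this example.

```python
from pathlib import Path


def enable_gles_dynamic_version(ini_path):
    """Ensure 'GLESDynamicVersion = on' is present in the given ini file."""
    path = Path(ini_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any existing entry for this key so the setting is not duplicated.
    kept = [l for l in lines if l.split("=")[0].strip() != "GLESDynamicVersion"]
    kept.append("GLESDynamicVersion = on")
    # Create the parent directory (e.g. ~/.android) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(kept) + "\n")
    return path
```

It would be called as `enable_gles_dynamic_version(Path.home() / ".android" / "advancedFeatures.ini")` before starting the emulator.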
Assignee | ||
Comment 11•6 years ago
With the GLESDynamicVersion setting added in automation via bug 1541955, the reftests are running, but a lot of things are failing, including some sanity tests. So there's some work to do to investigate why that's happening. A lot of stuff is rendering with a black background on all/most of the page, and I'm not entirely sure why.
Assignee | ||
Comment 12•6 years ago
Quick update: I'm still seeing a lot of black stuff when running in the emulator. Getting a WR capture (using an x86_64 android build, because of bug 1546516) didn't show the problem - the capture rendered fine on desktop. Which is not totally surprising, but it eliminates the geckoview test app as a source of the problem. More likely the problem is in WR or the GLES implementation in the emulator.
To narrow this down I tried running reftests on a Pixel 2 device (I had a detour to root it). All the snapshots were coming out blank, for which I filed bug 1547097.
Assignee | ||
Comment 13•6 years ago
I got the reftests running on a Pixel 2 device, but was seeing intermittent failures from nondeterminism somewhere in the pipeline. I was just running the reftest-sanity suite, and getting ~15 failures, mostly to do with text rendering. I looked at one of the simpler ones and found that it would fail intermittently, even if I ran it as == div.html div.html which should always render exactly the same. This obviously points to nondeterminism in the rendering, but I wasn't sure if it was in WR code or in the pixel 2 graphics driver/stack.
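The self-comparison idea generalizes to a simple determinism check: render the same input several times and count how many distinct outputs come back. The sketch below assumes a hypothetical `render` callable that returns a snapshot as bytes; it is not how the reftest harness captures snapshots.

```python
import hashlib


def count_distinct_renderings(render, runs=10):
    """Call render() repeatedly and count how many distinct outputs appear.

    `render` is a hypothetical callable returning the snapshot as bytes.
    A result greater than 1 means the rendering is not deterministic.
    """
    digests = set()
    for _ in range(runs):
        digests.add(hashlib.sha256(render()).hexdigest())
    return len(digests)
```

A result of 1 over many runs does not prove determinism, but any larger value confirms the kind of intermittent mismatch described here.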
After that I tried to get the wrench reftests running on the emulator in the hopes that it would help narrow down problems. After some fiddling I got those working. I ran into bug 1547833 which is easy to work around for now, but am also running into a problem where tex_sub_image_3d_pbo is returning a GL error 0x502. This affects multiple tests in the wrench/reftests/image/ directory. I tried an emulator image based on Android 9 in the hopes that a newer implementation wouldn't have this problem but it still did.
Assignee | ||
Comment 14•6 years ago
Other than the tex_sub_image_3d_pbo problem I ran into bug 1548092, bug 1548099, bug 1548131, and another assertion failure due to a reftest being too wide for the pixel 2 screen dimensions that I'm running with. There were also a bunch of test failures that I haven't yet looked at.
Assignee | ||
Comment 15•6 years ago
There were also a bunch of test failures that I haven't yet looked at.
I suspect a lot of these are because I'm not running in headless mode. Doing that involves cross-compiling osmesa for android which I'll probably have to do eventually but it was really painful with macOS so I'm procrastinating having to tackle that.
Assignee | ||
Comment 16•5 years ago
Unwinding the stack here a bit...
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)
I suspect a lot of these are because I'm not running in headless mode. Doing that involves cross-compiling osmesa for android which I'll probably have to do eventually but it was really painful with macOS so I'm procrastinating having to tackle that.
We decided not to do this, and instead just annotate the failures. I have wrench reftests running in CI now on the emulator, and patches are up in bug 1555479 to run them on a pixel 2 device in CI as well. So far I have NOT seen any evidence of nondeterminism in the results.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #13)
I got the reftests running on a Pixel 2 device, but was seeing intermittent failures from nondeterminism somewhere in the pipeline.
I'll try this again and see if I can still reproduce the nondeterminism. If there's no nondeterminism in the wrench reftests, but there is in the gecko reftests, then that's going to be tricky to deal with. I already verified that the display lists emitted by gecko for the intermittent reftests are deterministic, so the nondeterminism must be coming from some sort of complex interaction between different parts.
Assignee | ||
Comment 17•5 years ago
Another update: we now have wrench reftests running in CI on both emulator and device (pixel 2 running Android 8). So far no evidence of nondeterminism there. I also just pushed the land button for gecko reftests on non-WR geckoview in the emulator (in bug 1501582) which will serve as a baseline for the corresponding WR-enabled reftests. I did find a bunch of nondeterminism and I don't know which component is producing that.
Anyway, I'll do another try run with gecko reftests on WR-enabled geckoview and see what it looks like now.
Assignee | ||
Comment 18•5 years ago
The nondeterminism when running on a Pixel2 is quite troublesome. I've done a few rounds of annotations and try pushes and I'm still getting lots of fuzzy failures: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=252314928&revision=338fba58c190d5ae04d4c2c89b1aea9ffe80786d
Assignee | ||
Comment 19•5 years ago
I tried modifying the reftest harness for the webrender && geckoview case to just eat maxDifference values of 1, and that seems to work better. Instead of annotating a bazillion tests that are constantly shifting I just need to annotate a much smaller mostly-constant set.
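To make "eating" a maxDifference of 1 concrete, here is a minimal sketch of the comparison, assuming that maxDifference means the largest per-channel difference between corresponding pixels. The function names and the pixel representation (lists of RGBA tuples) are hypothetical; the real harness is not written in Python.

```python
def compare_snapshots(pixels_a, pixels_b):
    """Return (max_difference, differing_pixels) for two RGBA pixel lists."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("snapshots must have the same number of pixels")
    max_difference = 0
    differing_pixels = 0
    for a, b in zip(pixels_a, pixels_b):
        # Largest absolute difference across the channels of this pixel.
        channel_diff = max(abs(x - y) for x, y in zip(a, b))
        if channel_diff > 0:
            differing_pixels += 1
            max_difference = max(max_difference, channel_diff)
    return max_difference, differing_pixels


def passes_with_autofuzz(pixels_a, pixels_b, autofuzz_limit=1):
    """Treat the snapshots as equal if no channel differs by more than the limit."""
    max_difference, _ = compare_snapshots(pixels_a, pixels_b)
    return max_difference <= autofuzz_limit
```

With a limit of 1, off-by-one rounding noise passes while anything larger still needs an explicit annotation.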
Assignee | ||
Comment 20•5 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=f52cc2c0241aadfacf7aae7de150f80b62440a5d is looking much better. Still a few random intermittents but mostly now just hitting the crasher bugs that are marked as deps of this one.
Assignee | ||
Comment 21•5 years ago
Latest try push is at https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&tier=1%2C2%2C3&revision=d4bc947ea7f3291625cdb029c6c893c1537af929 and has the patches rebased on top of bug 1558598. I'm tempted to increase the autofuzz from 1 to 2 because I'm still getting a trickle of intermittents with maxDifference=2. Anyway I'll wait for some of the dependencies to land while I try and debug the crash in bug 1560367.
Assignee | ||
Comment 22•5 years ago
I increased the autofuzz to 2. New try push:
Assignee | ||
Comment 23•5 years ago
I disabled a bunch of tests that were triggering crashes. Opt is now green, debug is still hitting bug 1559958 randomly.
Assignee | ||
Comment 24•5 years ago
Just a few more fuzz tweaks (which I've done locally) and we're good to go.
Assignee | ||
Comment 25•5 years ago
Assignee | ||
Comment 26•5 years ago
Due to the sheer number of tests that exhibit a random fuzz with maxDifference=1
and maxDifference=2 with WR on Android, it's easier to just tweak the harness
to autofuzz these away. This adds machinery to do so, and also adds a new
annotation that can be used to disable the autofuzzing on specific tests.
Depends on D36794
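The decision logic described in this patch summary might look roughly like the sketch below. It is an assumption-laden illustration, not the landed code: `ReftestEntry`, `autofuzz_disabled`, and `comparison_passes` are hypothetical names, the opt-out flag stands in for the new annotation (whose real name is not given in this thread), and only == comparisons are modelled.

```python
from dataclasses import dataclass


@dataclass
class ReftestEntry:
    name: str
    # Hypothetical stand-in for the annotation that disables autofuzzing.
    autofuzz_disabled: bool = False


def comparison_passes(entry, max_difference, autofuzz_limit=2, autofuzz_active=True):
    """Decide whether an == comparison passes given the observed maxDifference.

    autofuzz_active models "WR on Android"; elsewhere only exact matches pass.
    """
    if max_difference == 0:
        return True
    if not autofuzz_active or entry.autofuzz_disabled:
        return False
    return max_difference <= autofuzz_limit
```

Keeping the opt-out per test means a test that specifically checks for one-level colour differences can still fail when it should.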
Assignee | ||
Comment 27•5 years ago
Depends on D36796
Assignee | ||
Comment 28•5 years ago
Depends on D36797
Assignee | ||
Comment 29•5 years ago
Depends on D36798
Assignee | ||
Comment 30•5 years ago
Only enabled on try/m-c as tier-2 for now, per email discussion, to minimize
load on bitbar Pixel 2 devices.
Depends on D36799
Comment 31•5 years ago
Comment 32•5 years ago
Backed out 6 changesets (bug 1525314) for reftest failures at reftests/svg/filters/css-filters/saturate-zero.html
Backout: https://hg.mozilla.org/integration/autoland/rev/2e7a7f345b274b29bf742619daafc81428ff4ca9
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=254829162&repo=autoland&lineNumber=36505
Assignee | ||
Comment 33•5 years ago
I had a typo, webrender&&!webrender instead of webrender&&!geckoview. Whoops.
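The effect of the typo is easy to see by enumerating the configurations each condition matches. This sketch models the two manifest conditions as plain boolean functions; the helper names are illustrative, and how the harness actually evaluates conditions is not shown in this thread.

```python
from itertools import product


def typo_condition(webrender, geckoview):
    # webrender&&!webrender: a contradiction, so it can never be true.
    return webrender and not webrender


def intended_condition(webrender, geckoview):
    # webrender&&!geckoview: WebRender enabled, but not on GeckoView.
    return webrender and not geckoview


def matching_configs(condition):
    """List the (webrender, geckoview) pairs for which the condition holds."""
    return [(w, g) for w, g in product([False, True], repeat=2) if condition(w, g)]
```

The typo'd condition matches no configuration at all, so any annotation guarded by it would silently never apply.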
Comment 34•5 years ago
Comment 35•5 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7032c413182e
https://hg.mozilla.org/mozilla-central/rev/4f43e8655fef
https://hg.mozilla.org/mozilla-central/rev/b9b49a1f5e97
https://hg.mozilla.org/mozilla-central/rev/065c8eee9249
https://hg.mozilla.org/mozilla-central/rev/4c912cace666
https://hg.mozilla.org/mozilla-central/rev/a1666a9348ce