TEST-UNEXPECTED-ERROR | testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py TestWindowRect.test_resize_larger_than_screen | OSError: Process has been unexpectedly closed (Exit code: 1) (Reason: No data received over socket)
Categories
(Core :: Widget: Gtk, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox112 | --- | disabled |
People
(Reporter: rmader, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
test-linux1804-64-qr/debug-marionette-e10s
crashes on EGL with:
Gdk-Message: 02:28:25.677: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
https://treeherder.mozilla.org/logviewer?job_id=338769888&repo=try&lineNumber=33761
https://treeherder.mozilla.org/jobs?repo=try&revision=23bdfbfa47d3f82a174a96389bccb94cf57d6761&selectedTaskRun=HXxI_Q8_St2H_MXVE5MP0A.0
GLX run for comparison: https://treeherder.mozilla.org/jobs?repo=try&revision=b615e04059eb5415c28686722afcb31f49f2902e&selectedTaskRun=ADiHguSdTJuybJyJkDUGPg.2
Comment 1•4 years ago
Logs point to a GDK/X11 IO error here.
Reporter | ||
Comment 2•4 years ago
So IIUC this should be reproducible locally by running:
MOZ_X11_EGL=1 LIBGL_ALWAYS_SOFTWARE=1 ./mach marionette-test --enable-webrender testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py
I verified that the env variable is picked up: MOZ_ENABLE_WAYLAND=1 also works and produces the expected results (tests involving set_position fail there).
I can't reproduce the crash here - all I get is an occasional:
FAIL testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py TestWindowRect.test_resize_larger_than_screen - AssertionError: 1536 != 3072
But it happens on both EGL and GLX (not on Wayland, though). So it looks to me like this might be a Mesa bug that's already fixed in newer versions (21.0 here).
Reporter | ||
Comment 3•3 years ago
This has not failed in two consecutive runs now: https://treeherder.mozilla.org/jobs?repo=try&revision=9f7ce2537ea59a4f8602d13138235e4556a35e1b&selectedTaskRun=Ldo9yD5fS4KBOgnkiFDPnA.0 and https://treeherder.mozilla.org/jobs?repo=try&revision=9efa5d079c3ec8c20625dea46b292a05b3c9ccfa&selectedTaskRun=H5O_9cTzSeCqy_BORjrL-A.1
Thus closing.
Reporter | ||
Comment 4•3 years ago
Ouch, unfortunately this was a typo by me: MOZ_x11_EGL instead of MOZ_X11_EGL. Once corrected, things fail again: https://treeherder.mozilla.org/jobs?repo=try&revision=c47509a1ed13f382b480cb4941b54ccc5801b833&selectedTaskRun=GgBWjnDoTPyIYXLksRYS-Q.0
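Environment variable names are case-sensitive on Linux, so a miscased name is simply a different, unused variable and nothing reports it. A minimal sketch of a pre-flight check follows; the helper `find_miscased` and the `EXPECTED_VARS` list are hypothetical and not part of mach or the try tooling.

```python
# Hypothetical list of variables we expect to set for these runs.
EXPECTED_VARS = ("MOZ_X11_EGL", "MOZ_ENABLE_WAYLAND", "LIBGL_ALWAYS_SOFTWARE")


def find_miscased(env, expected=EXPECTED_VARS):
    """Return (given, intended) pairs for names that match an expected
    variable only when letter case is ignored."""
    by_upper = {name.upper(): name for name in expected}
    problems = []
    for name in env:
        intended = by_upper.get(name.upper())
        if intended is not None and name != intended:
            problems.append((name, intended))
    return problems


# The typo from this comment is reported; the correct spelling is not.
print(find_miscased({"MOZ_x11_EGL": "1", "LIBGL_ALWAYS_SOFTWARE": "1"}))
# → [('MOZ_x11_EGL', 'MOZ_X11_EGL')]
```

Passing `os.environ` (or the environment dict handed to a try push) to such a check before starting a run would have flagged the misspelling.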
Comment 5•3 years ago
bug 1723112 mentioned EGL and Marionette. Could bug 1712665 also play a role?
https://firefoxci.taskcluster-artifacts.net/GgBWjnDoTPyIYXLksRYS-Q/0/public/logs/live_backing.log
[task 2021-08-24T19:04:22.934Z] 19:04:22 INFO - [Parent 4599, Main Thread] WARNING: Failed to create EGLContext with khr_rbab_attribs: file /builds/worker/checkouts/gecko/gfx/gl/GLContextProviderEGL.cpp:735
[task 2021-08-24T19:04:22.935Z] 19:04:22 INFO - [Parent 4599, Main Thread] WARNING: Failed to create EGLContext with khr_robustness_attribs: file /builds/worker/checkouts/gecko/gfx/gl/GLContextProviderEGL.cpp:747
Is this relevant?
gnome-session-check-accelerated: GL Helper exited with code 512
gnome-session-check-accelerated: GLES Helper exited with code 512
gnome-session-binary[57]: WARNING: Could not get session id for session. Check that logind is properly installed and pam_systemd is getting used at login.
_IceTransmkdir: ERROR: euid != 0,directory /tmp/.ICE-unix will not be created.
gnome-session-binary[57]: WARNING: Could not parse desktop file nm-applet.desktop or it references a not found TryExec binary
gnome-keyring-daemon: insufficient process capabilities, insecure memory might get used
GNOME_KEYRING_CONTROL=/builds/worker/.cache/keyring-YPMT80
gnome-keyring-daemon: insufficient process capabilities, insecure memory might get used
gnome-keyring-daemon: insufficient process capabilities, insecure memory might get used
GNOME_KEYRING_CONTROL=/builds/worker/.cache/keyring-YPMT80
GNOME_KEYRING_CONTROL=/builds/worker/.cache/keyring-YPMT80
SSH_AUTH_SOCK=/builds/worker/.cache/keyring-YPMT80/ssh
(gnome-shell:278): mutter-WARNING **: 18:55:38.099: Failed to use linear monitor configuration: Invalid mode 1600x1200 (-nan) for monitor 'unknown unknown'
(gnome-shell:278): mutter-WARNING **: 18:55:38.099: Failed to use fallback monitor configuration: Invalid mode 1600x1200 (-nan) for monitor 'unknown unknown'
Window manager warning: Display “:0” already has a window manager; try using the --replace option to replace the current window manager.
gnome-session-binary[57]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
(gnome-shell:332): mutter-WARNING **: 18:55:38.453: Failed to use linear monitor configuration: Invalid mode 1600x1200 (-nan) for monitor 'unknown unknown'
(gnome-shell:332): mutter-WARNING **: 18:55:38.454: Failed to use fallback monitor configuration: Invalid mode 1600x1200 (-nan) for monitor 'unknown unknown'
Window manager warning: Display “:0” already has a window manager; try using the --replace option to replace the current window manager.
gnome-session-binary[57]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
gnome-session-binary[57]: WARNING: App 'org.gnome.Shell.desktop' respawning too quickly
gnome-session-binary[57]: CRITICAL: We failed, but the fail whale is dead. Sorry....
Reporter | ||
Comment 6•3 years ago
(In reply to Darkspirit from comment #5)
bug 1723112 mentioned EGL and Marionette. Could bug 1712665 also play a role?
bug 1723112 looks unrelated to me - there headless mode is used and libegl only gets mentioned because it runs in glxtest even in headless mode.
bug 1712665 could indeed play a role and we should fix it either way - will look into it.
Reporter | ||
Comment 7•3 years ago
Unlike bug 1709585 and bug 1709586, this issue cannot be worked around by reverting bug 1684194. Thus it also appears to be unrelated to bug 1712665.
Comment 8•3 years ago
https://firefoxci.taskcluster-artifacts.net/dWu5td-QRL6ZyGGG11X_mw/0/public/logs/live_backing.log
[task 2021-08-25T09:59:38.115Z] executing ['/builds/worker/bin/test-linux.sh', '--setpref=toolkit.asyncshutdown.log=true', '--setpref=media.peerconnection.mtransport_process=false', '--setpref=network.process.enabled=false', '--allow-software-gl-layers', '--enable-webrender', '--setpref=layers.d3d11.enable-blacklist=false', '--download-symbols=true']
[task 2021-08-25T10:01:03.351Z] 10:01:03 INFO - 'MOZ_LAYERS_ALLOW_SOFTWARE_GL': '1',
[task 2021-08-25T09:59:38.121Z] ++ VERSION='18.04.5 LTS (Bionic Beaver)'
WARNING: GLX_swap_control unsupported, ASAP mode may still block on buffer swaps.: file /builds/worker/checkouts/gecko/gfx/gl/GLContextProviderGLX.cpp:225
WARNING: SGI_video_sync unsupported. Falling back to software vsync.: file /builds/worker/checkouts/gecko/gfx/thebes/gfxPlatformGtk.cpp:870
So is this LIBGL_ALWAYS_SOFTWARE=1 MOZ_X11_EGL=1 MOZ_WEBRENDER=1 ./firefox on Ubuntu 18.04 with software vsync and dmabuf WebGL enabled by default?
Testing on my Debian Testing:
GALLIUM_DRIVER=softpipe is super slow.
GALLIUM_DRIVER=llvmpipe is good. Hopefully llvmpipe is the default on Ubuntu 18.04?
The third option seems to have been removed:
$ GALLIUM_DRIVER=swr LIBGL_ALWAYS_SOFTWARE=1 MOZ_X11_EGL=1 MOZ_WEBRENDER=1 ./firefox
[GFX1-]: glxtest: libEGL initialize failed
[GFX1-]: glxtest: X error, error_code=158, request_code=150, minor_code=6
[GFX1-]: glxtest: process failed (exited with status 1)
libGL error: failed to create dri screen
libGL error: failed to load driver: swrast
[GFX1-]: Failed GL context creation for WebRender: 0
[GFX1-]: FEATURE_FAILURE_WEBRENDER_INITIALIZE_UNSPECIFIED
[GFX1-]: Failed to connect WebRenderBridgeChild.
[GFX1-]: Fallback WR to SW-WR
No EGL and no GLX (bug 1680512) = SW-WR
bug 1709585 and bug 1709586 seem to run on hardware (Worker Group: mdc1, Worker ID: t-linux64-ms-011), but this one runs in Docker:
https://firefox-ci-tc.services.mozilla.com/tasks/dWu5td-QRL6ZyGGG11X_mw
(Worker Group: us-east-1, Worker ID: i-0fa95d586b5398196)
docker-image-ubuntu1804-test
https://searchfox.org/mozilla-central/rev/00be3c92c269d789663791cf518161d0f47c9b96/taskcluster/ci/docker-image/kind.yml#56
https://searchfox.org/mozilla-central/source/taskcluster/docker/recipes/ubuntu1804-test-system-setup-base.sh
-> libegl-mesa0 could simply be missing here. This xvfb tutorial explicitly installs it.
Reporter | ||
Comment 9•3 years ago
(In reply to Darkspirit from comment #8)
...
-> libegl-mesa0 could simply be missing here. This xvfb tutorial explicitly installs it.
To me it does not look like it directly fails - it only fails on specific tasks such as:
[task 2021-08-25T10:11:34.585Z] 10:11:34 INFO - 1629886294583 Marionette TRACE [37] MarionetteCommands actor created for window id 4294967297
[task 2021-08-25T10:11:34.588Z] 10:11:34 INFO - 1629886294588 Marionette DEBUG 21 <- [1,6,null,{"value":{"width":1600,"height":1200}}]
[task 2021-08-25T10:11:34.589Z] 10:11:34 INFO - 1629886294589 Marionette DEBUG 21 -> [0,7,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":1040,"width":1280}]
[task 2021-08-25T10:11:34.591Z] 10:11:34 INFO - 1629886294591 Marionette DEBUG 21 <- [1,7,null,{"x":0,"y":0,"width":1280,"height":1040}]
[task 2021-08-25T10:11:34.592Z] 10:11:34 INFO - 1629886294592 Marionette DEBUG 21 -> [0,8,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":2400,"width":3200}]
[task 2021-08-25T10:11:34.618Z] 10:11:34 INFO - 1629886294617 Marionette DEBUG 21 <- [1,8,null,{"x":0,"y":0,"width":3200,"height":2400}]
[task 2021-08-25T10:11:35.039Z] 10:11:35 INFO - Gdk-Message: 10:11:35.038: firefox: Fatal IO error 0 (Success) on X server :0.
One random guess of mine is that:
- the test runs on software Mesa
- EGL may require more RAM than GLX, e.g. because it's double buffered or so
- allocating very big buffers (3200x2400) thus gets us OOMed or simply fails
So one thing to try here could be increasing the RAM limit - maybe it works around the issue.
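To put rough numbers on that guess, here is a back-of-the-envelope sketch. It assumes 4 bytes per pixel (e.g. RGBA8888) and no padding or extra surfaces; what Mesa/llvmpipe actually allocates is not known from this bug, and `buffer_bytes` is a hypothetical helper.

```python
MIB = 1024 * 1024


def buffer_bytes(width, height, bytes_per_pixel=4):
    """Size of one tightly packed colour buffer (assumed 4 bytes/pixel)."""
    return width * height * bytes_per_pixel


small = buffer_bytes(1280, 1040)   # window size requested before the resize
large = buffer_bytes(3200, 2400)   # size requested by test_resize_larger_than_screen

print(small, small / MIB)          # → 5324800 5.078125
print(large, large / MIB)          # → 30720000 29.296875
print(2 * large / MIB)             # double buffered → 58.59375
```

Under these assumptions a single 3200x2400 buffer is about 29 MiB, or about 59 MiB when double buffered, roughly 5.8 times the 1280x1040 case. How that compares to the memory available to the worker has to be checked against the actual instance limits.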
Reporter | ||
Comment 10•3 years ago
From matrix:
it would be --worker-override "t-linux-large=gecko-t/t-linux-xlarge"
or you can adjust instance-size from default -> xlarge
where to edit: https://searchfox.org/mozilla-central/source/taskcluster/ci/test/marionette.yml#49
example of using xlarge: https://searchfox.org/mozilla-central/source/taskcluster/ci/test/awsy.yml#8
A corresponding try run unfortunately still fails[1], BUT appears to get slightly further:
[task 2021-08-26T14:36:39.114Z] 14:36:39 INFO - 1629988599114 Marionette TRACE [37] MarionetteCommands actor created for window id 4294967297
[task 2021-08-26T14:36:39.121Z] 14:36:39 INFO - 1629988599120 Marionette DEBUG 21 <- [1,6,null,{"value":{"width":1600,"height":1200}}]
[task 2021-08-26T14:36:39.123Z] 14:36:39 INFO - 1629988599122 Marionette DEBUG 21 -> [0,7,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":1040,"width":1280}]
[task 2021-08-26T14:36:39.126Z] 14:36:39 INFO - 1629988599125 Marionette DEBUG 21 <- [1,7,null,{"x":0,"y":0,"width":1280,"height":1040}]
[task 2021-08-26T14:36:39.128Z] 14:36:39 INFO - 1629988599127 Marionette DEBUG 21 -> [0,8,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":2400,"width":3200}]
[task 2021-08-26T14:36:39.154Z] 14:36:39 INFO - 1629988599153 Marionette DEBUG 21 <- [1,8,null,{"x":0,"y":0,"width":3200,"height":2400}]
[task 2021-08-26T14:36:39.160Z] 14:36:39 INFO - 1629988599160 Marionette DEBUG 21 -> [0,9,"WebDriver:GetWindowRect",{}]
[task 2021-08-26T14:36:39.164Z] 14:36:39 INFO - 1629988599163 Marionette DEBUG 21 <- [1,9,null,{"x":0,"y":0,"width":3200,"height":2400}]
[task 2021-08-26T14:36:39.168Z] 14:36:39 INFO - 1629988599168 Marionette DEBUG 21 -> [0,10,"WebDriver:SetWindowRect",{"x":0,"y":0,"height":1040,"width":1280}]
[task 2021-08-26T14:36:39.174Z] 14:36:39 INFO - [Child 7127, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7067
[task 2021-08-26T14:36:39.175Z] 14:36:39 INFO - [Child 7127, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7070
[task 2021-08-26T14:36:39.187Z] 14:36:39 INFO - 1629988599186 Marionette DEBUG 21 <- [1,10,null,{"x":0,"y":0,"width":3200,"height":2400}]
[task 2021-08-26T14:36:39.195Z] 14:36:39 INFO - 1629988599194 Marionette DEBUG 21 -> [0,11,"WebDriver:ExecuteScript",{"script":"return document.fullscreenElement;","args":[],"newSandbox":true,"sandbox":null,"line":52,"filename":"tests/testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py"}]
[task 2021-08-26T14:36:39.206Z] 14:36:39 INFO - [Child 7127, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7067
[task 2021-08-26T14:36:39.206Z] 14:36:39 INFO - [Child 7127, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7070
[task 2021-08-26T14:36:39.686Z] 14:36:39 INFO - Gdk-Message: 14:36:39.685: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
So we might be on the right track here. As said in comment 2, I also suspect this to be a Mesa bug, as I can't reproduce the issue locally. Maybe an update to Ubuntu 20.04.3, which has a very recent Mesa (given the hardware enablement stack is enabled), would help.
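The Marionette traces in this comment pair each request (`[0, id, command, params]`) with a response (`[1, id, error, result]`), so the command that was still in flight when the X connection died can be picked out mechanically. A minimal sketch, with the packet layout inferred from the log lines here and `unanswered_requests` a hypothetical helper; the sample lines are copied from the trace with the task prefix dropped.

```python
import json
import re

# Matches the packet part of a "Marionette DEBUG <n> -> [...]" log line.
PACKET_RE = re.compile(r"Marionette DEBUG \d+ (?:->|<-) (\[.*\])\s*$")


def unanswered_requests(lines):
    """Return [(msg_id, command)] for requests that never got a response."""
    pending = {}
    for line in lines:
        match = PACKET_RE.search(line)
        if not match:
            continue  # not a Marionette packet line (warnings, Gdk messages, ...)
        packet = json.loads(match.group(1))
        msg_type, msg_id = packet[0], packet[1]
        if msg_type == 0:            # request: [0, id, command, params]
            pending[msg_id] = packet[2]
        else:                        # response: [1, id, error, result]
            pending.pop(msg_id, None)
    return list(pending.items())


SAMPLE = [
    '1629988599168 Marionette DEBUG 21 -> [0,10,"WebDriver:SetWindowRect",{"x":0,"y":0,"height":1040,"width":1280}]',
    '1629988599186 Marionette DEBUG 21 <- [1,10,null,{"x":0,"y":0,"width":3200,"height":2400}]',
    '1629988599194 Marionette DEBUG 21 -> [0,11,"WebDriver:ExecuteScript",{"script":"return document.fullscreenElement;","args":[],"newSandbox":true,"sandbox":null,"line":52,"filename":"tests/testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py"}]',
    'Gdk-Message: 14:36:39.685: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.',
]

print(unanswered_requests(SAMPLE))
# → [(11, 'WebDriver:ExecuteScript')]
```

For this run, request 10 had already been answered and the `WebDriver:ExecuteScript` call was pending when the Gdk IO error was logged.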
Reporter | ||
Comment 11•3 years ago
Correction: another run with xlarge doesn't show us getting further, so the run above was probably an outlier. As the crash does not reproduce locally, I'll hope for bug 1725245 to maybe fix this.
Reporter | ||
Comment 12•3 years ago
Interestingly, the test in question passes in an optimized build. It later fails in test_resize_to_available_screen_size, but that could be unrelated (possibly related to bug 1684194, need to confirm).
Reporter | ||
Comment 13•3 years ago
FTR, I tried again to reproduce the issue locally, but it does not reproduce on my machine.
Reporter | ||
Comment 14•3 years ago
Moving this to bug 788319.
Reporter | ||
Comment 15•3 years ago
This currently cannot be tested because of bug 1732671. Update: works again, that bug seems to be fixed.
Reporter | ||
Comment 16•3 years ago
Some small updates:
- The test failures affect at least test-linux1804-64-qr/debug-marionette-e10s, test-linux1804-64-qr/debug-marionette-fis-e10s, test-linux1804-64-qr/opt-marionette-e10s and test-linux1804-64-qr/opt-marionette-fis-e10s, i.e. they affect both debug and optimized builds.
- They can be run via ./mach try fuzzy --full --env MOZ_X11_EGL=1.
- The test usually fails in TestWindowRect.test_resize_larger_than_screen (link) at [0,8,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":2400,"width":3200}], but sometimes successfully finishes and only fails early in the following TestWindowRect.test_resize_to_available_screen_size test (link).
- The test currently runs on llvmpipe, Mesa 20.0.8, LLVM 10.0.0 (Ubuntu 18.04.6).
- Local reproducer should AFAIK be: MOZ_X11_EGL=1 LIBGL_ALWAYS_SOFTWARE=1 MOZ_WEBRENDER=1 ./mach marionette-test --enable-webrender testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py
- Not yet sure if increasing the RAM size (--worker-override "t-linux-large=gecko-t/t-linux-xlarge") helps - if it does, not much.
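Since the local failures seen so far were only occasional, it may help to run the local reproducer several times in a row. A minimal sketch, assuming it is started from the root of a mozilla-central checkout and that a non-zero exit code from mach indicates a failed run; `build_env` and `run_repeatedly` are hypothetical helpers, and the environment variables and command are the ones listed above.

```python
import os
import subprocess

TEST_PATH = (
    "testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py"
)
# Environment and command taken from the local reproducer in this comment.
REPRO_ENV = {"MOZ_X11_EGL": "1", "LIBGL_ALWAYS_SOFTWARE": "1", "MOZ_WEBRENDER": "1"}
REPRO_CMD = ["./mach", "marionette-test", "--enable-webrender", TEST_PATH]


def build_env(base=None):
    """Copy the base environment (default: os.environ) and add the repro vars."""
    env = dict(os.environ if base is None else base)
    env.update(REPRO_ENV)
    return env


def run_repeatedly(runs=10, cwd="."):
    """Run the reproducer `runs` times and return the list of exit codes."""
    codes = []
    for _ in range(runs):
        result = subprocess.run(REPRO_CMD, env=build_env(), cwd=cwd)
        codes.append(result.returncode)
    return codes
```

Calling `run_repeatedly(20)` and counting the non-zero entries gives a rough failure rate to compare between EGL and GLX runs (drop `MOZ_X11_EGL` from `REPRO_ENV` for the latter).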
Reporter | ||
Comment 17•3 years ago
This may be related to bug 1741956 / bug 1743551 which may get fixed in https://launchpad.net/ubuntu/+source/libx11/2:1.6.4-3ubuntu0.5
Comment 18•3 years ago
Please test disabling GLX vsync.
Reporter | ||
Comment 19•3 years ago
(In reply to Darkspirit from comment #18)
Please test disabling GLX vsync.
Already tried, doesn't help :(
Comment 20•3 years ago
Can you execute $ xrandr --listproviders in a try build (bug 1742708)?
Reporter | ||
Comment 21•3 years ago
(In reply to Darkspirit from comment #20)
Can you execute $ xrandr --listproviders in a try build (bug 1742708)?
This is on a headless system AFAIK, i.e. no graphics hardware (the Mesa driver is llvmpipe).