Closed Bug 1432309 Opened 7 years ago Closed 7 years ago

Text rendering corruptions with WebRender

Categories

(Core :: Graphics: WebRender, defect, P1)

x86_64
All
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- unaffected
firefox59 --- unaffected
firefox60 --- disabled

People

(Reporter: linuxhippy, Assigned: Gankra)

References

(Blocks 1 open bug, )

Details

(Keywords: nightly-community)

Attachments

(6 files)

Attached image Bildschirmfoto_2018-01-22_22-47-56.png (deleted) —
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
Build ID: 20180122100120
Steps to reproduce:
1. Used Firefox for several hours with WebRender enabled (YouTube playback, Gmail, ...)
2. Browsed to planet3dnow.de
Actual results:
3. Text was garbled (it looked somewhat like Klingon), but fine during layer transitions (e.g. at 0:02 and 0:12): https://youtu.be/k-qJVguTvWI
4. After some time I noticed the same text issues in the chrome of other tabs while content was fine -> looks like a glyph cache corruption
Expected results:
Text should always render fine.
Nightly 59 x64 20180122100120 de_DE @ Debian Testing (KDE, Radeon RX480)
I can't reproduce this, but other people saw similar issues.
Component: Untriaged → Graphics: WebRender
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Yes, I got something like this on dragging a tab out to create a new (second) window
Assignee: nobody → a.beingessner
Priority: -- → P1
Attached image Screenshot_20180125_201010.png (deleted) —
I copied the revision hash from bug 1432101 comment 4 to get the previous WebRender update from bug 1430829.
mozregression --repo autoland --launch 6d9dc65ca0ed1a374dde7592a5b4191a7a10759c --pref gfx.webrender.all:true gfx.webrender.hit-test:true general.autoScroll:true privacy.trackingprotection.enabled:true startup.homepage_welcome_url:"https://www.planet3dnow.de/cms/|https://www.planet3dnow.de/cms/|https://www.planet3dnow.de/cms/"
I moved the second and third tabs out of the window to create two new windows, so I have three. The third window was broken like in bug 1431955 comment 4. Then I moved the tab from the second window back into the first window and switched to the first tab. That's what you can see here. In an earlier test those social buttons looked broken in exactly the same way as in comment 0. It's the same first bad revision as in bug 1431955 comment 0.
Attached video 2018-01-25_20-32-08.mp4 (deleted) —
mozregression --repo autoland --launch 6d9dc65ca0ed1a374dde7592a5b4191a7a10759c --pref gfx.webrender.all:true general.autoScroll:true privacy.trackingprotection.enabled:true startup.homepage_welcome_url:"https://www.planet3dnow.de/cms/|https://www.planet3dnow.de/cms/|https://www.planet3dnow.de/cms/"
I could reproduce this even without hit-test :(, but with the exact same steps. Now I'll try to find the regressing range.
Flags: needinfo?(linuxhippy)
I should note that I moved the second and third tab out of the first window while everything was still loading.
Attached image Screenshot_20180125_204857.png (deleted) —
It's still bad with kat's try build from bug 1432541 comment 2. But I got my scrollbar red, lol.
mozregression --repo try --launch 7c443bd0fa3951123aa8b21c06cece0c8fb3b386 --pref gfx.webrender.all:true general.autoScroll:true privacy.trackingprotection.enabled:true startup.homepage_welcome_url:"https://www.planet3dnow.de/cms/|https://www.planet3dnow.de/cms/|https://www.planet3dnow.de/cms/"
This issue definitely has something to do with multiple windows. When Firefox restores previously open windows/tabs after each upgrade, I get the corruption right after startup.
I can also confirm this is still happening on macOS by just snapping tabs off to new windows.
This issue can easily be reproduced by playing YouTube videos in two separate windows.
Running with MESA_DEBUG=1 errors start shortly after opening multiple windows. I'll disable WebRender until this issue is fixed.
WebRender - OpenGL version new 4.5 (Core Profile) Mesa 17.2.4
WebRender - OpenGL version new 4.5 (Core Profile) Mesa 17.2.4
WebRender - OpenGL version new 4.5 (Core Profile) Mesa 17.2.4
WebRender - OpenGL version new 4.5 (Core Profile) Mesa 17.2.4
Mesa: User error: GL_INVALID_OPERATION in glBindTexture(non-gen name)
Mesa: 1 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture image)
Mesa: 2 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_OPERATION in glBindFramebuffer(buffer)
Mesa: User error: GL_INVALID_VALUE in glUseProgram
Mesa: User error: GL_INVALID_OPERATION in glUniformMatrix(program not linked)
Mesa: 2 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_OPERATION in glBindTexture(non-gen name)
Mesa: 1 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_OPERATION in glBindFramebuffer(buffer)
Mesa: User error: GL_INVALID_VALUE in glUseProgram
Mesa: User error: GL_INVALID_OPERATION in glUniformMatrix(program not linked)
Mesa: 2 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_OPERATION in glBindTexture(non-gen name)
Mesa: User error: GL_INVALID_VALUE in glUseProgram
Mesa: User error: GL_INVALID_OPERATION in glUniformMatrix(program not linked)
Mesa: 2 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_VALUE in glUseProgram
Mesa: User error: GL_INVALID_OPERATION in glUniformMatrix(program not linked)
Mesa: 2 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_VALUE in glUseProgram
Could this be as simple as Gecko not ensuring make_current is called on the right GL context before calling wr.update() and wr.render()? The GL errors above are what I'd expect to see if that was occurring, and it could certainly cause the exact visual corruption that can be seen in the screenshots above.
I added some logging which confirms the theory above. It appears that, at least in some cases, moving a tab to another window results in an incorrect GL context at some point.
What I did (this is Linux-specific, but the same idea applies on Mac/Windows using the correct APIs):
* Add a call to glXGetCurrentContext() in Renderer::new() and store that pointer.
* Add an assert in renderer.update() and renderer.render() that glXGetCurrentContext() returns the same pointer as stored in new().
When running Gecko in single-window mode, I can browse normally without assertions firing. As soon as I drag a tab out to a new window, I get an assert in renderer.update() that there is a GL context mismatch.
It's slightly tricky to add this validation to WR itself, since glXGetCurrentContext() and friends are platform-specific. I imagine Gecko already has a platform-independent way to get the current GL context? If so, we could add a validation inside Gecko before calling any of those functions above. I'm fairly confident this will explain all the corruption we're seeing in various bugs.
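For illustration, here is a minimal sketch of the validation described above. It is not the actual WebRender code: the GL context is simulated with a thread-local value so the example is self-contained, and `ContextId`, `make_current`, `current_context` and `check_context` are hypothetical names. A real build would query glXGetCurrentContext() or the platform equivalent, and the original experiment used asserts rather than returning an error.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a platform GL context handle. A real build
/// would compare the pointer returned by glXGetCurrentContext() instead.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ContextId(pub u32);

thread_local! {
    // Simulates the driver's notion of "the context current on this thread".
    static CURRENT: Cell<Option<ContextId>> = Cell::new(None);
}

/// Simulated make-current call (assumption: one current context per thread).
pub fn make_current(ctx: ContextId) {
    CURRENT.with(|c| c.set(Some(ctx)));
}

/// Simulated query for the context that is current on this thread.
pub fn current_context() -> Option<ContextId> {
    CURRENT.with(|c| c.get())
}

pub struct Renderer {
    // The context that was current when the renderer was created.
    owner: ContextId,
}

impl Renderer {
    /// Records whichever context is current at creation time.
    /// Returns None if no context is current.
    pub fn new() -> Option<Renderer> {
        current_context().map(|owner| Renderer { owner })
    }

    /// Reports a mismatch between the stored and the current context.
    fn check_context(&self, call: &str) -> Result<(), String> {
        match current_context() {
            Some(ctx) if ctx == self.owner => Ok(()),
            other => Err(format!(
                "{}: expected {:?}, but current context is {:?}",
                call, self.owner, other
            )),
        }
    }

    /// Validates the context before doing any (omitted) GL work.
    pub fn update(&mut self) -> Result<(), String> {
        self.check_context("update")
    }

    /// Validates the context before doing any (omitted) GL work.
    pub fn render(&mut self) -> Result<(), String> {
        self.check_context("render")
    }
}
```

With a single renderer the check never fails, because nothing else changes the current context. Once a second renderer is created under a different context, calling update() on the first one without making its context current again returns an error, which mirrors the assertion that fired when a tab was dragged out to a new window.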
It looks like we're not calling makeCurrent before calling wr_renderer_update in RendererOGL::Update. Apparently this has always been wrong, but this is more likely to cause a problem with recent versions of webrender. Currently testing a patch that adds it.
Ah, it seems like a regression from Bug 1328602. Before that, renderer.update() and renderer.render() were called after MakeCurrent(). But Bug 1328602 caused renderer.update() to be called before MakeCurrent() :(
Blocks: 1328602
Pushed up a fairly naive patch, let me know if there's a better way to handle this. Hard to be certain it fixes it because repro is so inconsistent, but I haven't broken it yet.
I also confirmed that the problem is addressed on Windows 10 with the change. I confirmed it with YouTube playback on multiple tabs. When I opened multiple tabs with YouTube playback and then dragged a tab out to open a new window, I could reproduce the problem easily. To fix the problem, it seems safer to call wr_renderer_update() just before calling wr_renderer_render() in RendererOGL::Render(), and remove RendererOGL::Update().
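The ordering issue and the suggested fix can be sketched as follows. This is a simplified model, not the Gecko patch: `GlContext`, `WindowRenderer`, `render_update_first` and the counters are hypothetical, and the GL context is again simulated with a thread-local value. The only point it illustrates is that the update step must run after the window's context has been made current.

```rust
use std::cell::Cell;

thread_local! {
    // Simulates which GL context is current on the render thread.
    static CURRENT_GL: Cell<Option<u32>> = Cell::new(None);
}

/// Hypothetical stand-in for a per-window GL context.
pub struct GlContext {
    id: u32,
}

impl GlContext {
    pub fn new(id: u32) -> Self {
        GlContext { id }
    }

    pub fn make_current(&self) {
        CURRENT_GL.with(|c| c.set(Some(self.id)));
    }

    pub fn is_current(&self) -> bool {
        CURRENT_GL.with(|c| c.get()) == Some(self.id)
    }
}

/// Hypothetical per-window renderer, loosely modeled on the RendererOGL
/// discussed in this bug.
pub struct WindowRenderer {
    gl: GlContext,
    pub frames_rendered: u32,
    pub wrong_context_calls: u32,
}

impl WindowRenderer {
    pub fn new(id: u32) -> Self {
        WindowRenderer { gl: GlContext::new(id), frames_rendered: 0, wrong_context_calls: 0 }
    }

    // Stands in for wr_renderer_update(): counts calls made on a foreign context.
    fn update(&mut self) {
        if !self.gl.is_current() {
            self.wrong_context_calls += 1;
        }
    }

    // Stands in for wr_renderer_render(): counts a frame only on the right context.
    fn draw(&mut self) {
        if self.gl.is_current() {
            self.frames_rendered += 1;
        } else {
            self.wrong_context_calls += 1;
        }
    }

    /// Problematic ordering: update runs before the context is made current.
    pub fn render_update_first(&mut self) {
        self.update();
        self.gl.make_current();
        self.draw();
    }

    /// Suggested ordering: make the context current, then update, then render.
    pub fn render(&mut self) {
        self.gl.make_current();
        self.update();
        self.draw();
    }
}
```

In this model a single window only misbehaves on its very first frame with the problematic ordering, because its own context stays current afterwards. With two windows taking turns, every update lands on the other window's context, which matches the observation that the corruption shows up once a second window exists.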
Attachment #8947333 - Flags: review?(sotaro.ikeda.g)
Attachment #8947333 - Flags: review?(nical.bugzilla)
Attachment #8947333 - Flags: review?(bugmail)
This certainly seems plausible. Sotaro or nical would be better reviewers though.
Patch updated to sotaro's suggested implementation. Try build with linux+win+mac: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b99243dfc5b1207447d147bdaa2d64936fde904b (haven't finished building locally as I needed to rebase)
Comment on attachment 8947333
Bug 1432309 - ensure the GL context is current when updating wr.
https://reviewboard.mozilla.org/r/217062/#review222900
Looks good and I confirmed the patch worked locally on Windows 10 :)
Attachment #8947333 - Flags: review?(sotaro.ikeda.g) → review+
Keywords: checkin-needed
Pushed by kgupta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c7dffbc2563a ensure the GL context is current when updating wr. r=sotaro
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60