Closed
Bug 769949
Opened 12 years ago
Closed 12 years ago
With WebGL antialiasing enabled, leaving a WebGL FBO bound causes broken compositing
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
VERIFIED
FIXED
mozilla16
People
(Reporter: epinal99-bugzilla2, Assigned: jgilbert)
References
(Blocks 1 open bug, )
Details
(Keywords: regression, Whiteboard: [games:p1])
Attachments
(3 files, 1 obsolete file)
(deleted),
text/html
|
Details | |
(deleted),
patch
|
bjacob
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jgilbert
:
review+
lsblakk
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
STR
1) Be sure HWA is enabled
2) Open Tony Lee WebGL demo (textures and meshes on 3D model)
http://sandbox.ayzenberg.com/creative_prototypes/html5_/webgl/prototypeEightMe.html
3) You need to wait 5-10 sec to see the 3D model rotating
Once you are on the demo page, you can toggle through different textures by clicking anywhere on the tab.
More details: http://labs.ayzenberg.com/2012/03/html5-webgl-demo-tony-lee/
Result:
With HWA on, the 3D model stays frozen and doesn't rotate with the mouse and clicking anywhere on the page doesn't change the texture.
With HWA off, the demo plays fine.
In FF13, the demo plays fine with HWA on/off.
Mozregression range:
m-c
good=2012-05-04
bad=2012-05-05
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2db9df42823d&tochange=0a48e6561534
m-i
good=2012-05-02
bad=2012-05-03
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c3813fbb1c9a&tochange=de5745bce8bc
Suspected bug:
Nicholas Cameron - Bug 716439 - (GPU-clipping-rounded) Implement clipping to rectangles with rounded corners on the GPU
Updated•12 years ago
|
status-firefox15:
--- → affected
status-firefox16:
--- → affected
tracking-firefox15:
--- → ?
tracking-firefox16:
--- → ?
Assignee: nobody → ncameron
Comment 1•12 years ago
|
||
I seem to get different results than reported on comment 0:
- with default settings (layers accel enabled), the demo plays "fine" except that it's extremely slow (have to wait far more than 10 seconds to see the results of mouse interaction).
- with layers.acceleration.disabled=true, the canvas seems to remain blank black.
This with NVIDIA 296.88 driver on Windows 7 64bit, 32bit Nightly from 06-29.
Comment 2•12 years ago
|
||
Ah, wait!
With layers.acceleration.disabled, switching to another tab and coming back to the Tony Lee tab seems to immediately "fix" the rendering.
Yes but the 3D model move stays "frozen" and is'nt smooth like with FF13.
Comment 4•12 years ago
|
||
Does anyone else get an ASSERT thrown by Texture.cpp:241 in the Angle library when running this e.g. in a debug build? I get it fairly often, but not every run. I see similar behaviour to bjacob with HWA disabled, and as reported for HWA enabled. Looking into this now...
Comment 5•12 years ago
|
||
Actually, I get the same behaviour with and without accelerated layers, it is frozen and generally borked, but switching tabs causes an update. I can switch textures with accelerated layers on or off, but only with a tab change. Turning Direct2D off fixes everything.
Comment 6•12 years ago
|
||
This webgl demo, http://schumann.elis.ugent.be/#, seems to work fine with HWA, but not without, failing in the the same way as the reported one, again it is fine with or without HWA if D2D is turned off, so I assume it is the same bug. Not sure what that is though.
Comment 7•12 years ago
|
||
On the HWA side, not using a shareHandle to make mTexture in CanvasLayerD3D10 and leaving mUsingSharedTexture false (around line 85 in CanvasLayerD3D10.cpp) makes the problem go away. I couldn't find an equivalent fix for basic layers. I don't understand why, and I couldn't 'fix' the problem by doing anything different when updating or rendering the canvas (e.g., getting a fresh shareHandle and mTexture or reattaching the shader resource view).
I tried a few more prefs (in particular Azure content), but the only thing that seems to make a difference is Direct2D, so I'm assuming this problem is some interaction between Direct2D and WebGL/ANGLE. I can't see anything in the mask layers patches which would cause this, and preventing most of the mask layers code from executing does not fix this. But, I agree that it still looks like the most likely thing to affect this in the regression range.
I think the precise problem is that the webgl canvas is not repainting unless it is re-initialised.
The failed ANGLE assertion (Texture.cpp:241) is extremely intermittent and only presents in debug builds (obviously it won't assert in an optimised build but it could bork, I can't recover after the assertion in a debug build). Possibly due to time? WebGL in a debug build really crawls. I can't work out if it is related to the bug here or not, but the bug occurs always, and the assertion not very often, so possibly not. I don't know what causes the assertion.
I'm pretty stuck with this. I'm going to leave it for today and have another look tomorrow, I'd really appreciate it if anyone has any ideas for things to look at/try.
Comment 8•12 years ago
|
||
[Triage comment]
Tracking, since it's an unshipped regression so getting this fixed before release is ideal.
Nick, you can try to run another WebGL demo of Tony Lee if it's more "simple" to find the bug like this one about LOTR Ring:
http://sandbox.ayzenberg.com/creative_prototypes/html5_/webgl/prototypeEightLOD.html
On FF16, the ring doesn't rotate slowly and stays frozen.
Comment 10•12 years ago
|
||
I did some inbound builds this morning, and I can confirm that it is something in the mask layers (bug 716439) patches that is causing this problem, trying to work out exactly what now...
Comment 11•12 years ago
|
||
narrowed it down to changeset f7b8deeb0cc4.
I didn't mean to change the tracking flags, but can't change them back, sorry :-(
Comment 12•12 years ago
|
||
The problem appears to be that the webgl canvas is being rendered as an active layer (Canvas layer), whereas before the mask layers patch it was treated as inactive and rendered on a Thebes layer. No idea why that causes the breakage yet though.
Comment 13•12 years ago
|
||
Turning bug 618892 off fixes this, and doesn't break other WebGL examples, but I guess we lose speed. No idea why D2D + pbuffers + Tony Lee breaks.
Comment 14•12 years ago
|
||
Maybe this path breaks the DidTransactionCallback mechanism? The WebGL implementation needs to know when compositing occurred, and knows that when the DidTransactionCallback function is called. If this callback isn't properly called by this path, that could explain it.
Comment 15•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #14)
> Maybe this path breaks the DidTransactionCallback mechanism? The WebGL
> implementation needs to know when compositing occurred, and knows that when
> the DidTransactionCallback function is called. If this callback isn't
> properly called by this path, that could explain it.
Thanks for the tip. I checked it though, and it seems to be getting called just fine.
Comment 16•12 years ago
|
||
I think bug 725747 may have broken this, commenting out http://mxr.mozilla.org/mozilla-central/source/gfx/gl/GLContext.h#1115 seems to fix this problem. Perhaps the internal FBO needs to be reset before calling this method or something?
That's very bad if that fixes it -- it means that some of the FBO tracking is busted. I don't -fully- understand the internal/external FBO stuff, though I think I understand the intent. But gets called before any read operation, and I guess the check is to see if there's a user bound fbo or not. If there isn't, then it does a blit from the offscreen draw fbo to the offscreen read fbo if they're different, which, I admit, I don't understand when that would be the case.
Or wait, yes I do. It happens if we want to have a multisample front buffer, where we can't ReadPixels from directly -- we'd need to do a framebuffer resolve first, but we want to hide that fact from content (especially WebGL). Ok; so I'd do some dumps to see what framebuffer stuff that WebGL thing is doing, and then see where our bindings are going wrong.
My guess -- it's drawing to framebuffer 0 (the front/display buffer), but is leaving some other FBO bound at the end of the JS draw operation. If some code isn't being careful, that could cause this... I'd see what code is being read when fReadPixels (or raw_ReadPixels) is being called, and who's calling it, if we're going down the readpixels path. Otherwise there might be something similar going on in another path.
Whiteboard: [games:p1]
Comment 18•12 years ago
|
||
Aha --- setting webgl.msaa=0 (which disables WebGL multisampling) fixes it.
Comment 19•12 years ago
|
||
minimal test case
Updated•12 years ago
|
Summary: Tony Lee WebGL demo (textures and meshes on 3D model) is broken with hardware acceleration enabled → With WebGL antialiasing enabled, leaving a WebGL FBO bound causes broken compositing
Comment 20•12 years ago
|
||
Jeff: See Nick's Comment 16 and the testcase.
Assignee: ncameron → jgilbert
Comment 21•12 years ago
|
||
I think I have a fix for this, it seems to me that BeforeGLReadCall should blit the offscreen read to the offscreen draw buffer if we are about to do a glFinish, even if the offscreen buffer is not currently bound. This certainly fixes the original Tony Lee example and bjacob's test case, not 100% sure it is right though.
I've pushed to Try: https://tbpl.mozilla.org/?tree=Try&rev=fdf1104da540
Comment 22•12 years ago
|
||
could do with a comment, but just to get the idea of what I think the fix might be
Attachment #639221 -
Flags: review?(jgilbert)
Reporter | ||
Comment 23•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #19)
> Created attachment 639217 [details]
> testcase
>
> minimal test case
There is something I don't understand. Why does your testcase not work with FF13 while Tony Lee's demo seems to work fine with FF13?
Indeed, mozregression found a different range for your testcase:
m-i
good=2012-03-12
bad=2012-03-13
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=eceb16d4057b&tochange=d4496bf4bb64
Is this bug in 2 regression steps?
Comment 24•12 years ago
|
||
Pre-mask layers, any object with rounded corners was forced to be rendered to an inactive layer, this uses a different code path for canvases, so the Tony Lee demo rendered fine. With the mask layers patches (in the regression range), the Tony Lee demo got it's own layer (with a mask, but the mask has nothing to do with the bug), and so you get the same effects (bug) as in the minimal test case. So, the rounded corners hid the underlying bug which was exposed when mask layers where introduced, the mask layers didn't introduce the bug. You can test this by removing the rounded corners from the demo, and it breaks in the same way as for more recent versions of FF.
I haven't checked the regression range for the test case, but I suspect it will be for one of the patches which improved the FBO handling.
Reporter | ||
Comment 25•12 years ago
|
||
(In reply to Nick Cameron [:nrc] from comment #24)
> I haven't checked the regression range for the test case, but I suspect it
> will be for one of the patches which improved the FBO handling.
Thanks for the explanation.
Indeed, the suspected bug seems to be:
Jeff Gilbert — Bug 726396 - Repair ANGLE d3d share handle fetching an PBuffer creation behavior - r=bjacob
The patch touched code about FBO bindings.
(In reply to Nick Cameron [:nrc] from comment #21)
> I think I have a fix for this, it seems to me that BeforeGLReadCall should
> blit the offscreen read to the offscreen draw buffer if we are about to do a
> glFinish, even if the offscreen buffer is not currently bound. This
> certainly fixes the original Tony Lee example and bjacob's test case, not
> 100% sure it is right though.
>
> I've pushed to Try: https://tbpl.mozilla.org/?tree=Try&rev=fdf1104da540
Nope, not a good fix -- you're accidentally fixing the symptom, but without fixing the underlying bug. glFinish should have nothing to do with blitting nor should it care about what FBO is currently bound. Putting the BeforeGLReadCall wrapper around there probably causes the right front buffer to get updated at that time when the FBO bindings just happen to be "right".
Updated•12 years ago
|
Attachment #639221 -
Flags: review?(jgilbert) → review-
Assignee | ||
Comment 27•12 years ago
|
||
Comment on attachment 639221 [details] [diff] [review]
possible patch
Review of attachment 639221 [details] [diff] [review]:
-----------------------------------------------------------------
Vlad's right, that's not it. I'll take a look. My guess is something touched the somewhat-delicate FBO binding code.
Assignee | ||
Comment 28•12 years ago
|
||
Yeah, looks GuaranteeResolve() will not trigger a AA-resolve blit if we're bound to something other than 0 (from the user's point of view). Patch forthcoming.
Assignee | ||
Updated•12 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Updated•12 years ago
|
Assignee: jgilbert → nobody
Component: Graphics → Graphics: Layers
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → jgilbert
Assignee | ||
Comment 29•12 years ago
|
||
GuaranteeResolved() must call BlitDirtyFBOs(). Previously, we were only calling fFinish(), which only blitted FBs if we were bound to FB '0'.
On my linux machine, bug confirmed, and fix verified.
Attachment #639221 -
Attachment is obsolete: true
Attachment #639525 -
Flags: review?(bjacob)
Assignee | ||
Comment 30•12 years ago
|
||
This affects all desktop platforms, but not mobile.
OS: Windows 7 → All
Hardware: x86_64 → All
Comment 31•12 years ago
|
||
Comment on attachment 639525 [details] [diff] [review]
patch
Review of attachment 639525 [details] [diff] [review]:
-----------------------------------------------------------------
I'm happy to see the "flush guarantees resolve" thing go away since it didn't seem to be used anyway; and happy too that fFinish() is no longer wrapped as a read-call.
Attachment #639525 -
Flags: review?(bjacob) → review+
Assignee | ||
Comment 32•12 years ago
|
||
Target Milestone: --- → mozilla16
Comment 33•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Updated•12 years ago
|
tracking-firefox16:
? → ---
Assignee | ||
Comment 34•12 years ago
|
||
I'll put together a backport of this, since it's really simple.
Comment 35•12 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #34)
> I'll put together a backport of this, since it's really simple.
Please nominate the backport for mozilla-beta approval now that 15 is on Beta. Since we have STR it would be great to get a fix landed in an earlier beta so we can get time to verify.
Comment 36•12 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #34)
> I'll put together a backport of this, since it's really simple.
Please do ASAP.
Assignee | ||
Comment 37•12 years ago
|
||
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 726396
User impact if declined: Failure to render webgl sometimes under normal operation.
Testing completed (on m-c, etc.): m-c, on aurora.
Risk to taking this patch (and alternatives if risky): Low: This is well understood.
String or UUID changes made by this patch: None
Carry r+ forward from original patch.
Attachment #645564 -
Flags: review+
Attachment #645564 -
Flags: approval-mozilla-beta?
Comment 38•12 years ago
|
||
Comment on attachment 645564 [details] [diff] [review]
backport
Still early enough in beta cycle that we can take this fix for a regression.
Attachment #645564 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 39•12 years ago
|
||
Comment 40•12 years ago
|
||
Verified on Firefox 15 beta 3 that WebGL loads fine having HWA enabled - the demo from the Description plays fine and that the canvas from the test case attached in Comment 19 repaints constantly.
Verified on Windows 7, Ubuntu 12.04 and Mac OS X 10.7:
Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0
Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20100101 Firefox/15.0
Updated•11 years ago
|
Blocks: gecko-games
Comment 41•11 years ago
|
||
Verified fixed 32.0a1 (2014-05-18), Win 7 x64
Status: RESOLVED → VERIFIED
status-firefox32:
--- → verified
You need to log in
before you can comment on or make changes to this bug.
Description
•