Closed
Bug 907463
Opened 11 years ago
Closed 11 years ago
Deal properly with IPDL abnormal shutdowns and fix some other OMTC d3d11 reftest bugs
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla28
People
(Reporter: nrc, Assigned: nrc)
References
Details
Attachments
(3 files, 6 obsolete files)
(deleted),
patch
|
roc
:
review+
nrc
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
nical
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review |
We have a whole bunch of reftest failures for d3d11 OMTC (between 18 and 24 depending on the platform). See https://tbpl.mozilla.org/?tree=Try&rev=8ce25d14a7a2 (but ignore the Win XP line).
There are (at first glance) three categories:
Large canvases, which usually pass but occasionally fail. Mark as random.
Gradients which are hard alternations. Mark as fail, see bug 582236.
Anti-aliasing errors for text. Not sure what to do here. I would like to understand the cause and why it does not manifest using MTC.
Assignee | ||
Comment 1•11 years ago
|
||
This makes me sad because I put a lot of work in to make this work, but it only works like 80% of the time.
Attachment #793191 -
Flags: review?(bas)
Assignee | ||
Comment 2•11 years ago
|
||
Attachment #793198 -
Flags: review?(bas)
Assignee | ||
Comment 3•11 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=8ce25d14a7a2
So the remaining reftests are all text. They all seem to be off by fuzz in anti-aliasing. I don't see that we are going down different rendering paths for the different cases so that suggests that maybe DirectWrite is non-deterministic. But we don't have these problems with the current code. Also it seems like a very similar set of tests fail on each run and with very, very similar numbers of wrong pixels, so things are a little predictable.
Any ideas for what is causing this? And for what we should do about it?
Flags: needinfo?(jdaggett)
Flags: needinfo?(bas)
Comment 4•11 years ago
|
||
Comment on attachment 793191 [details] [diff] [review]
patch 1: large canvases
Review of attachment 793191 [details] [diff] [review]:
-----------------------------------------------------------------
Wait... why does it only work 80% of the time, large canvases not always working is bad no?
Assignee | ||
Comment 5•11 years ago
|
||
(In reply to Bas Schouten (:bas.schouten) from comment #4)
> Comment on attachment 793191 [details] [diff] [review]
> patch 1: large canvases
>
> Review of attachment 793191 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Wait... why does it only work 80% of the time, large canvases not always
> working is bad no?
As far as I can tell, it works 100% of the time in real life (subject to OOM, but there is not much we can do about that). It only passes tests 80% of the time, I presume also due to memory, but I'm not really sure why.
Comment 6•11 years ago
|
||
(In reply to Nick Cameron [:nrc] from comment #2)
> Created attachment 793198 [details] [diff] [review]
> patch 2: gradients
I need to understand why this happens, does this happen on current trunk D3D11 OMTC?
I'll have a look at the text as well, I suspect maybe we're not setting the ALLOW_SUBPIXEL_AA (or whatever the flag's called) correctly.
Assignee | ||
Comment 7•11 years ago
|
||
(In reply to Bas Schouten (:bas.schouten) from comment #6)
> (In reply to Nick Cameron [:nrc] from comment #2)
> > Created attachment 793198 [details] [diff] [review]
> > patch 2: gradients
>
> I need to understand why this happens, does this happen on current trunk
> D3D11 OMTC?
>
Yes, all these are for current trunk.
> I'll have a look at the text as well, I suspect maybe we're not setting the
> ALLOW_SUBPIXEL_AA (or whatever the flag's called) correctly.
Thanks!
Assignee | ||
Comment 8•11 years ago
|
||
(In reply to Bas Schouten (:bas.schouten) from comment #6)
> I'll have a look at the text as well, I suspect maybe we're not setting the
> ALLOW_SUBPIXEL_AA (or whatever the flag's called) correctly.
So, the only place we set the Moz2D aa flag from client layers is http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ClientThebesLayer.cpp#31. That looks like it is doing the right thing to me. And whenever we use it, it seems to be doing the right thing. But, a lot of the test failures don't have any transparency so we don't set the flag in either test or ref cases. In fact most of the failing tests don't use subpixel AA, so I suspect this flag is not the issue. (Looking at the failures in the reftest analyser, one or two have errors in colour fringing, but most do not.)
Assignee | ||
Comment 9•11 years ago
|
||
I just noticed that we are failing a box shadow test on Win8 only. We seem to be one rgb value out on every pixel on two lines at the edge of the shadow. Could it be a d2d 1.1 issue?
(Try push is here https://tbpl.mozilla.org/?tree=Try&rev=8dc6f5c64683).
Assignee | ||
Comment 10•11 years ago
|
||
The font errors are due to Azure vs Thebes rendering. When we render the first (test) page we have no texture clients to test, so we have to assume we can't use Azure. The next time around we figure out that we can use Azure so do. This leads to small errors when rendering lines at non-integer offsets.
Matt Woodrow is very close to getting Azure/Cairo done. So I think the best fix for this is to just use Azure for everything and get rid of the Thebes paths (which we want to do anyway). Just waiting for some Try runs to confirm that is bug free. Then we can rip out Thebes from ThebesLayerBuffer and these bugs should all just disappear.
Flags: needinfo?(jdaggett)
Flags: needinfo?(bas)
Assignee | ||
Comment 11•11 years ago
|
||
Bug 914505 fixes most of these failures. That will land sooner than Azure everywhere :-)
Depends on: 914505
Assignee | ||
Updated•11 years ago
|
Attachment #793191 -
Attachment is obsolete: true
Attachment #793191 -
Flags: review?(bas)
Assignee | ||
Updated•11 years ago
|
Attachment #793198 -
Attachment is obsolete: true
Attachment #793198 -
Flags: review?(bas)
Assignee | ||
Comment 12•11 years ago
|
||
I confirmed that the patch with these changes regressed the OMTC win7 reftests. The only way I can see why is these changes. I'm not really sure why. It looks like we are blending in a tiny amount from outside our rect, but I checked all the sizes, positions, and transforms, and everything is an integer, so I am not sure how that could happen. I prefer to lose this optimisation and get OMTC turned on. We can always come back to this if we have more ideas.
Attachment #805858 -
Flags: review?(matt.woodrow)
Comment 13•11 years ago
|
||
Comment on attachment 805858 [details] [diff] [review]
revert OP_SOURCE optimisatin
Review of attachment 805858 [details] [diff] [review]:
-----------------------------------------------------------------
Sorry, but this isn't an optimization. It's required for correct rendering with BasicLayers.
Attachment #805858 -
Flags: review?(matt.woodrow) → review-
Assignee | ||
Comment 14•11 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #13)
> Comment on attachment 805858 [details] [diff] [review]
> revert OP_SOURCE optimisatin
>
> Review of attachment 805858 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Sorry, but this isn't an optimization. It's required for correct rendering
> with BasicLayers.
Why?
Could we keep OP_SOURCE and still clear the surface?
Comment 15•11 years ago
|
||
Clearing the surface means that we'll draw the window pixels in two passes. It's possible for the pixels to get presented to the screen in the middle of that, resulting in flashing colors.
We really need things to be 'atomic'.
Assignee | ||
Comment 16•11 years ago
|
||
We are failing this test for OMTC Windows due to OOMing. Running this test individually (locally) is fine. However, running it multiple times causes us to fail with the same logs and result as on Try. This is because we can't allocate more memory. I believe because the previously created canvases are referenced by JS and need to be GCed - waiting for a short time then refreshing brings success. This is only an issue on OMTC because we need to allocate memory to transport the canvas output from main thread to compositor and that is where we fail - we have enough memory for the canvas itself.
We don't crash when runnning this test locally or on Try (a fix in another bug), and using random-if rather than skip-if means we continue to test for crashes.
Attachment #806422 -
Flags: review?(roc)
Comment 17•11 years ago
|
||
Comment on attachment 805858 [details] [diff] [review]
revert OP_SOURCE optimisatin
Review of attachment 805858 [details] [diff] [review]:
-----------------------------------------------------------------
Trailing whitespace!
Attachment #805858 -
Flags: review- → review+
Comment 18•11 years ago
|
||
(In reply to Nick Cameron [:nrc] from comment #16)
> Created attachment 806422 [details] [diff] [review]
> ignore result of very big canvas test
>
> We are failing this test for OMTC Windows due to OOMing. Running this test
> individually (locally) is fine. However, running it multiple times causes us
> to fail with the same logs and result as on Try. This is because we can't
> allocate more memory. I believe because the previously created canvases are
> referenced by JS and need to be GCed - waiting for a short time then
> refreshing brings success. This is only an issue on OMTC because we need to
> allocate memory to transport the canvas output from main thread to
> compositor and that is where we fail - we have enough memory for the canvas
> itself.
>
> We don't crash when runnning this test locally or on Try (a fix in another
> bug), and using random-if rather than skip-if means we continue to test for
> crashes.
Some of our tests use SpecialPowers.forceGC();
Would that be enough to fix the failing test here?
Assignee | ||
Comment 19•11 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #18)
> (In reply to Nick Cameron [:nrc] from comment #16)
> > Created attachment 806422 [details] [diff] [review]
> > ignore result of very big canvas test
> >
> > We are failing this test for OMTC Windows due to OOMing. Running this test
> > individually (locally) is fine. However, running it multiple times causes us
> > to fail with the same logs and result as on Try. This is because we can't
> > allocate more memory. I believe because the previously created canvases are
> > referenced by JS and need to be GCed - waiting for a short time then
> > refreshing brings success. This is only an issue on OMTC because we need to
> > allocate memory to transport the canvas output from main thread to
> > compositor and that is where we fail - we have enough memory for the canvas
> > itself.
> >
> > We don't crash when runnning this test locally or on Try (a fix in another
> > bug), and using random-if rather than skip-if means we continue to test for
> > crashes.
>
> Some of our tests use SpecialPowers.forceGC();
> Would that be enough to fix the failing test here?
Probably, but that is only available in mochitests (afaik) and this is a reftest.
Attachment #806422 -
Flags: review?(roc) → review+
Assignee | ||
Comment 20•11 years ago
|
||
See https://bugzilla.mozilla.org/show_bug.cgi?id=910962#c5 for why we should do this.
Doing this prevents an assertion in bc mochitests where one of our processes (child) crashes and we try to release shmem in the parent, but IPDL has already taken care of it.
Attachment #808094 -
Flags: review?(nical.bugzilla)
Comment 21•11 years ago
|
||
Comment on attachment 808094 [details] [diff] [review]
Do not release shmems if our IPDL tree is going to be destroyed
Review of attachment 808094 [details] [diff] [review]:
-----------------------------------------------------------------
I would much much prefer to propagate an "OnActorDestroy()" hook (or something of the like) on all compositable/texture clients, rather than having using the notion of Shmem in all of these classes.
::: gfx/layers/client/CanvasClient.h
@@ +88,5 @@
> virtual void OnDetach() MOZ_OVERRIDE
> {
> mBuffer = nullptr;
> }
> +
nit: trailing space
Assignee | ||
Comment 22•11 years ago
|
||
I used OnActorDestroy and then changed to ForgetShmem, to make it clearer what was going on, but I don't feel strongly either way. This version uses OnActorDestroy (I do want to use that rather than just ActorDestroy to make clear that we don't think we are overriding the IPDL method).
Attachment #808094 -
Attachment is obsolete: true
Attachment #808094 -
Flags: review?(nical.bugzilla)
Attachment #808333 -
Flags: review?(nical.bugzilla)
Comment 23•11 years ago
|
||
Comment on attachment 808333 [details] [diff] [review]
Do not release shmems if our IPDL tree is going to be destroyed
Review of attachment 808333 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks!
Attachment #808333 -
Flags: review?(nical.bugzilla) → review+
Assignee | ||
Comment 24•11 years ago
|
||
Comment 25•11 years ago
|
||
Backed out for compilation errors:
https://tbpl.mozilla.org/php/getParsedLog.php?id=28458455&tree=Mozilla-Inbound
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/d25c7133a76e
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/1172762e4fe8
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/6b9e77c98785
Comment 26•11 years ago
|
||
Also failures on crashtest-ipc:
https://tbpl.mozilla.org/php/getParsedLog.php?id=28461128&tree=Mozilla-Inbound
Assignee | ||
Comment 27•11 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+1] from comment #26)
> Also failures on crashtest-ipc:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=28461128&tree=Mozilla-
> Inbound
Annoyingly Cipc was not working properly when I did my Try push a week or so ago. Sigh.
Assignee | ||
Comment 28•11 years ago
|
||
Asking for re-review because I changed quite a few things - now destroys client side as well as host, only forgets on abnormal shutdown, etc.
Attachment #808333 -
Attachment is obsolete: true
Attachment #820861 -
Flags: review?(nical.bugzilla)
Assignee | ||
Comment 29•11 years ago
|
||
Updated•11 years ago
|
Attachment #820861 -
Flags: review?(nical.bugzilla) → review+
Assignee | ||
Comment 30•11 years ago
|
||
Comment 31•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/8e1b8db80906
https://hg.mozilla.org/mozilla-central/rev/b89258451b74
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Comment 32•11 years ago
|
||
This might have caused bug 930575. If it did, then this needs to be backed out.
Comment 33•11 years ago
|
||
Marcia asked me to back this out:
remote: https://hg.mozilla.org/integration/b2g-inbound/rev/0a033a45ca4d
remote: https://hg.mozilla.org/integration/b2g-inbound/rev/96adbcb3ba6a
remote: https://hg.mozilla.org/integration/b2g-inbound/rev/2e77e215c7f3
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 34•11 years ago
|
||
Bug 930575 still occurs after the backout, so that did not address the graphical scrolling issue which is still being seen on Buri using:
Gaia afbf45f26a73b7cd5e0a831bea48087331975286
SourceStamp 2f2a45f04e7c
BuildID 20131025100746
Version 27.0a1
Comment 35•11 years ago
|
||
(In reply to Marcia Knous [:marcia] from comment #34)
> Bug 930575 still occurs after the backout, so that did not address the
> graphical scrolling issue which is still being seen on Buri using:
>
> Gaia afbf45f26a73b7cd5e0a831bea48087331975286
> SourceStamp 2f2a45f04e7c
> BuildID 20131025100746
> Version 27.0a1
The build that Ed spun does not contain the backout of this patch, as it hasn't landed on central.
Comment 36•11 years ago
|
||
Please see Bug 930575#c12 - Backing this patch did address the issue seen in that bug.
Comment 37•11 years ago
|
||
We should probably change the summary of this bug to better reflect what we're actually doing in the fix?
Assignee | ||
Updated•11 years ago
|
Summary: Fix OMTC d3d11 reftest bugs → Deal properly with IPDL abnormal shutdowns and fix some other OMTC d3d11 reftest bugs
Assignee | ||
Comment 38•11 years ago
|
||
Slightly tweaked - speculative fix for smoke test issues.
Comment 39•11 years ago
|
||
Nick, any chance this is related to bug 933867?
Assignee | ||
Comment 40•11 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #39)
> Nick, any chance this is related to bug 933867?
Sort of, commenting there.
Assignee | ||
Comment 42•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #806422 -
Flags: checkin+
Assignee | ||
Updated•11 years ago
|
Attachment #805858 -
Attachment is obsolete: true
Assignee | ||
Comment 43•11 years ago
|
||
We no longer need that patch, as demonstrated by https://tbpl.mozilla.org/?tree=Try&rev=2436e9f4b2e5. I bet it was fixed by bug 923434. That is nice, since that patch was the source of bug 930575 which got this lot backed out the last time.
Comment 44•11 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Target Milestone: mozilla27 → mozilla28
Assignee | ||
Comment 45•11 years ago
|
||
Whoops, forget my [leave open] whiteboard thingy.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 46•11 years ago
|
||
There is this issue which is stopping me landing the last patch (from bug 930575):
> > > 5. 3d test from the marketplace, seems to work.
> > > 6. played with http://bit.ly/css3d-mol (from bug 862097).
> > >
> > > - I had one experience where if the device goes to sleep some graphics
> > > doesn't render correctly when coming back on the http://bit.ly/css3d-mol.
> > > Specifically, the molecules didn't render.
> > >
> >
> > This is something that could be due to my patch and is a bit worrying. I
> > would like to investigate this before landing.
> >
> > Is this reproducible? Does it happen every time? And have you seen this
> > behaviour before?
>
> It happened one time and I can not reproduce it. I am currently unsure how to
> reproduce it; I tried multiple times to just have the device sleep and then wake
> it. I think there's some other state I have to get into in order to reproduce
> it... I hadn't figured it out. I have not seen the behavior before in the main
> build.
This sounds like something this patch could be responsible for, not sure how to proceed though
Comment 47•11 years ago
|
||
There are some B2G bugs pending on this to land (where we try to deallocate shmems twice). Nick, do you have a rough ETA on this? Is it possible to take the shmem part into a separate patch which would be maybe easier to land without causing the regression in bug 930575?
Assignee | ||
Comment 48•11 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #47)
> There are some B2G bugs pending on this to land (where we try to deallocate
> shmems twice). Nick, do you have a rough ETA on this? Is it possible to take
> the shmem part into a separate patch which would be maybe easier to land
> without causing the regression in bug 930575?
I think we can just land this. There is only the shmem part to land really. There is one issue which happened which I am puzzled by and would like to investigate (see the recent comments in 930575), but given that it is rare and hard to repro, I'm not sure what to do about it. If you want to spin up this patch and have a play and see if you can repro, that would be appreciated. Otherwise I think we can land this once inbound opens again.
Assignee | ||
Comment 49•11 years ago
|
||
Assignee | ||
Comment 50•11 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #47)
> There are some B2G bugs pending on this to land (where we try to deallocate
> shmems twice). Nick, do you have a rough ETA on this? Is it possible to take
> the shmem part into a separate patch which would be maybe easier to land
> without causing the regression in bug 930575?
Given that we're not sure this will stick past smoke testing, please don't land stuff on top for a few days so we can easily backout if necessary. Thanks!
Comment 51•11 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Oops. Does this bug have to do with bug 940745?
Assignee | ||
Comment 53•11 years ago
|
||
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #52)
> Oops. Does this bug have to do with bug 940745?
I don't think so. The patch doesn't change any IPDL protocols, it only changes what happens when IPDL trees are shutdown. Therefore, I wouldn't suspect it except in situations where we see something like 'destroy' or 'shutdown' in the stack or which happen when a process crashes or something is shutdown or changes state dramatically (like when a phone goes to sleep or wakes up).
You need to log in
before you can comment on or make changes to this bug.
Description
•