Closed Bug 1373578 Opened 7 years ago Closed 7 years ago

Intermittent Assertion failure: [GFX1]: Failed to create DrawTarget, Type: 3 Size: Size(800,1000), at z:\build\build\src\obj-firefox\dist\include\mozilla/gfx/Logging.h:520

Categories

(Core :: Layout, defect, P3)

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- fixed

People

(Reporter: aryx, Assigned: jmaher)

References

(Blocks 2 open bugs)

Details

(Keywords: assertion, intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(2 files)

https://treeherder.mozilla.org/logviewer.html#?job_id=107575203&repo=autoland

23:40:06 INFO - REFTEST TEST-START | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/shapes1/shape-outside-polygon-018.html == file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/shapes1/shape-outside-polygon-018-ref.html
23:40:06 INFO - REFTEST INFO | RESTORE PREFERENCE pref(layout.css.shape-outside.enabled,false)
23:40:06 INFO - REFTEST INFO | SET PREFERENCE pref(layout.css.shape-outside.enabled,true)
23:40:06 INFO - REFTEST TEST-LOAD | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/shapes1/shape-outside-polygon-018.html | 1631 / 7783 (20%)
23:40:06 INFO - ++DOMWINDOW == 287 (12A05C00) [pid = 3940] [serial = 7065] [outer = 1B2A5400]
23:40:06 INFO - [GFX1-]: Failed 2 buffer db=00000000 dw=00000000 for 0, 0, 800, 1000
23:40:06 INFO - [GFX1]: Failed to create DrawTarget, Type: 3 Size: Size(800,1000)
23:40:06 INFO - Assertion failure: [GFX1]: Failed to create DrawTarget, Type: 3 Size: Size(800,1000), at c:\builds\moz2_slave\autoland-w32-d-000000000000000\build\src\obj-firefox\dist\include\mozilla/gfx/Logging.h:519
23:40:24 INFO - #01: mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::Flush() [obj-firefox/dist/include/mozilla/gfx/Logging.h:283]
23:40:24 INFO -
23:40:24 INFO - #02: mozilla::gfx::Factory::CreateDrawTarget(mozilla::gfx::BackendType,mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const &,mozilla::gfx::SurfaceFormat) [gfx/2d/Factory.cpp:397]
23:40:24 INFO -
23:40:24 INFO - #03: gfxPlatform::CreateDrawTargetForBackend(mozilla::gfx::BackendType,mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const &,mozilla::gfx::SurfaceFormat) [gfx/thebes/gfxPlatform.cpp:1454]
23:40:24 INFO -
23:40:24 INFO - #04: mozilla::layers::PersistentBufferProviderBasic::Create(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>,mozilla::gfx::SurfaceFormat,mozilla::gfx::BackendType) [gfx/layers/PersistentBufferProvider.cpp:73]
23:40:24 INFO -
23:40:24 INFO - #05: mozilla::layers::LayerManager::CreatePersistentBufferProvider(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const &,mozilla::gfx::SurfaceFormat) [gfx/layers/Layers.cpp:149]
23:40:24 INFO -
23:40:24 INFO - #06: mozilla::layers::ClientLayerManager::CreatePersistentBufferProvider(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const &,mozilla::gfx::SurfaceFormat) [gfx/layers/client/ClientLayerManager.cpp:920]
23:40:24 INFO -
23:40:24 INFO - #07: mozilla::dom::CanvasRenderingContext2D::TrySharedTarget(RefPtr<mozilla::gfx::DrawTarget> &,RefPtr<mozilla::layers::PersistentBufferProvider> &) [dom/canvas/CanvasRenderingContext2D.cpp:1892]
23:40:24 INFO -
23:40:24 INFO - #08: mozilla::dom::CanvasRenderingContext2D::EnsureTarget(mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits,float> const *,mozilla::dom::CanvasRenderingContext2D::RenderingMode) [dom/canvas/CanvasRenderingContext2D.cpp:1697]
23:40:24 INFO -
23:40:24 INFO - #09: mozilla::dom::CanvasRenderingContext2D::DrawWindow(nsGlobalWindow &,double,double,double,double,nsAString const &,unsigned int,mozilla::ErrorResult &) [dom/canvas/CanvasRenderingContext2D.cpp:5601]
23:40:24 INFO -
23:40:24 INFO - #10: mozilla::dom::CanvasRenderingContext2DBinding::drawWindow [obj-firefox/dom/bindings/CanvasRenderingContext2DBinding.cpp:2311]
23:40:24 INFO -
23:40:24 INFO - #11: mozilla::dom::GenericBindingMethod(JSContext *,unsigned int,JS::Value *) [dom/bindings/BindingUtils.cpp:2960]
23:40:24 INFO - ....
23:40:24 ERROR - TEST-UNEXPECTED-FAIL | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/shapes1/shape-outside-polygon-018.html | application terminated with exit code 1
23:40:24 INFO - REFTEST INFO | Copy/paste: C:\slave\test\build\win32-minidump_stackwalk.exe c:\users\cltbld\appdata\local\temp\tmpubwyc8.mozrunner\minidumps\48e261cc-30b6-4ed0-aa5d-ceb0bed311a0.dmp C:\slave\test\build\symbols
23:40:26 INFO - REFTEST INFO | Saved minidump as C:\slave\test\build\blobber_upload_dir\48e261cc-30b6-4ed0-aa5d-ceb0bed311a0.dmp
23:40:26 INFO - REFTEST INFO | Saved app info as C:\slave\test\build\blobber_upload_dir\48e261cc-30b6-4ed0-aa5d-ceb0bed311a0.extra
23:40:26 INFO - REFTEST PROCESS-CRASH | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/shapes1/shape-outside-polygon-018.html | application crashed [@ xul.dll + 0x8f2fe3]
23:40:26 INFO - Crash dump filename: c:\users\cltbld\appdata\local\temp\tmpubwyc8.mozrunner\minidumps\48e261cc-30b6-4ed0-aa5d-ceb0bed311a0.dmp
23:40:26 INFO - Operating system: Windows NT
23:40:26 INFO - 6.1.7601 Service Pack 1
23:40:26 INFO - CPU: x86
23:40:26 INFO - GenuineIntel family 6 model 45 stepping 7
23:40:26 INFO - 8 CPUs
23:40:26 INFO -
23:40:26 INFO - GPU: UNKNOWN
23:40:26 INFO -
23:40:26 INFO - Crash reason: EXCEPTION_BREAKPOINT
23:40:26 INFO - Crash address: 0x5d922fe3
23:40:26 INFO - Process uptime: 767 seconds
23:40:26 INFO -
23:40:26 INFO - Thread 0 (crashed)
23:40:26 INFO - 0 xul.dll + 0x8f2fe3
23:40:26 INFO - eip = 0x5d922fe3 esp = 0x001da418 ebp = 0x001da420 ebx = 0x001da598
23:40:26 INFO - esi = 0x6098f950 edi = 0x00000208 eax = 0x00000000 ecx = 0x70e906ef
23:40:26 INFO - edx = 0x00000060 efl = 0x00000206
23:40:26 INFO - Found by: given as instruction pointer in context
This really started up on July 13th and was failing quite often by July 15th. I think this is on track for a high frequency failure (24 failures since July 13th). This is the same failure as posted above. :jet, could you help find someone to look at fixing this crash in the next 2 weeks?
Flags: needinfo?(bugs)
Whiteboard: [stockwell needswork]
Summary: Intermittent layout/reftests/w3c-css/submitted/shapes1/shape-outside-polygon-018.html | application crashed [@ xul.dll + 0x8f2fe3] → Intermittent Assertion failure: [GFX1]: Failed to create DrawTarget, Type: 3 Size: Size(800,1000), at z:\build\build\src\obj-firefox\dist\include\mozilla/gfx/Logging.h:519
This is happening often enough that we should disable tests, but there are a lot of different tests failing this way currently. (Many, but not all, are in css-break.)
(In reply to Joel Maher ( :jmaher) (UTC-9) (PTO: back August 2nd) from comment #3)
> This really started up on July 13th and was failing quite often by July
> 15th. I think this is on track for a high frequency failure (24 failures
> since July 13th). This is the same failure as posted above.

This failure was seen much earlier, but reported in other bugs. For instance, July 3, https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=56cb20e086a2d67a467809d406d5fed673267aca&filter-searchStr=windows+debug+reftest. AFAIK, all of these failures are Windows 7, non-e10s, so perhaps this is another case where we can wait for non-e10s testing to end.
(In reply to Geoff Brown [:gbrown] from comment #10)
> AFAIK, all of these failures are Windows 7, non-e10s, so perhaps this is
> another case where we can wait for non-e10s testing to end.

I'm not sure that all the "Failed to create DrawTarget" errors really are constrained to non-e10s Win32. +cc: Milan who may know more about that.
Flags: needinfo?(bugs) → needinfo?(milan)
This is a failure with backend type Skia - which means non-accelerated Windows. I think the rest of the Windows testing has acceleration, right? So, it's a coincidence that we're only seeing it on those configurations, although I guess e10s could play part of this if it turns out it is "just" running out of memory. Mason, was there any OMTP that landed mid July that could explain some of these problems increasing in frequency? I guess we don't usually have a recorded, so there shouldn't be, but just checking.
Flags: needinfo?(milan) → needinfo?(mchang)
(In reply to Milan Sreckovic [:milan] from comment #12)
> This is a failure with backend type Skia - which means non-accelerated
> Windows. I think the rest of the Windows testing has acceleration, right?
> So, it's a coincidence that we're only seeing it on those configurations,
> although I guess e10s could play part of this if it turns out it is "just"
> running out of memory.
>
> Mason, was there any OMTP that landed mid July that could explain some of
> these problems increasing in frequency? I guess we don't usually have a
> recorded, so there shouldn't be, but just checking.

I scanned a couple of the crash signatures and all of them are coming from Canvas. All the OMTP stuff only changed in ClientPaintedLayer, which isn't in the callstack at all here. From comment 3, this started on July 13th. I did a bugzilla search to see what OMTP patches landed between July 13-15 and came up with bug 1380493, and bug 1380483. Bug 1380483 might be the most suspicious in that we start checking that content client exists before recording, but this doesn't actually create a DrawTarget and would've happened by default since we don't enable OMTP on inbound yet.
Flags: needinfo?(mchang)
Week over week, 61/888 -> 1/901, beta only.
This was happening on non-e10s. We are only running e10s windows reftests now. This might return if we start running non-e10s windows reftests again.
Whiteboard: [stockwell needswork] → [stockwell disabled]
Summary: Intermittent Assertion failure: [GFX1]: Failed to create DrawTarget, Type: 3 Size: Size(800,1000), at z:\build\build\src\obj-firefox\dist\include\mozilla/gfx/Logging.h:519 → Intermittent Assertion failure: [GFX1]: Failed to create DrawTarget, Type: 3 Size: Size(800,1000), at z:\build\build\src\obj-firefox\dist\include\mozilla/gfx/Logging.h:520
We are looking at 160+ failures in the last week, all on Windows 7 debug non-e10s: https://brasstacks.mozilla.com/orangefactor/index.html?display=Bug&bugid=1373578

There is no single test to turn off, as this happens throughout the reftests (many in css-break, as pointed out earlier). Is this a graphics-specific crash? I see comments regarding non-accelerated graphics from :milan and :mchang.

:jet, I see you are the triage owner- can you help find the right person to fix this in the next week?
Flags: needinfo?(bugs)
Whiteboard: [stockwell disabled] → [stockwell needswork]
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #26)
> :jet, I see you are the triage owner- can you help find the right person to
> fix this in the next week?

As noted above, we're running into resource contention on the Win32 Debug variants. This will only get worse on those platforms as more reftests get added. I'm willing to take a cheap fix here (e.g., restart the browser after n tests, or shrink the test bucket sizes) but it seems unlikely that we'll see a fix in Skia for a use case (thousands of sequential large canvas bitmap snapshots in debug builds) that only our test harness has.
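For illustration only, the "restart the browser after n tests" idea could look roughly like the sketch below. This is not the reftest harness's code: `batches`, the batch size, and the test names are all hypothetical. The point is only that each group would run in a fresh browser session, so bitmap allocations cannot pile up across thousands of tests.

```python
from typing import Iterator, List, Sequence


def batches(tests: Sequence[str], n: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most n tests.

    Assumption: the caller starts a fresh browser for each group.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    for start in range(0, len(tests), n):
        yield list(tests[start:start + n])


# Hypothetical test names, for illustration only.
tests = [f"test-{i:04d}.html" for i in range(10)]
groups = list(batches(tests, 4))

for index, group in enumerate(groups, start=1):
    # A real harness would launch the browser here and shut it down afterwards.
    print(f"session {index}: {len(group)} tests")
```

A smaller n means more restarts (slower runs) but less accumulated state per session, which is the trade-off the "cheap fix" is weighing.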
Flags: needinfo?(bugs)
Bug 1302203 is to run reftests per manifest, which should fix this. We could disable this on non-e10s win7-debug.
Depends on: 1302203
Attached patch disable_win7_reftest.patch (deleted) — Splinter Review
I don't think there is a single test to disable. This leaves us with either disabling all the tests on win7/debug (non-e10s), or possibly running in more chunks. This patch would disable the reftests on win7-debug for non-e10s. As a note, this is the configuration that we specifically turned back on for non-e10s; our coverage of non-e10s reftests will be Android only. I am happy to do more chunks instead, or to try other ideas.
Attachment #8903174 - Flags: review?(bugs)
I tried running with 16 chunks, but still hit this failure, so I don't think more chunks is feasible: https://treeherder.mozilla.org/#/jobs?repo=try&revision=326923e740a99e12e9280d3798ed6cdd77d85b35
It seems that just disabling reftests wholesale is our only option right now. We have been working on making the harness restart between manifests, and that is a lot harder than it would seem. Possibly we could run a really small subset of the tests if we find there is important value on non-e10s for a specific feature or two?
https://treeherder.mozilla.org/logviewer.html#?job_id=128115082&repo=mozilla-inbound - non-e10s Win7 debug devtools, so good luck with just not running reftests.
If 98% of the failures go away along with the reftests, then we will win.
Attachment #8903174 - Flags: review?(bugs) → review+
This patch disables reftests on win32/debug/non-e10s. I gave it an r+ given the frequency of this failure on this build config, and after receiving a release-drivers notice today that e10s-multi is shipping to 100% of eligible users on the 55 Release. mrbkap: please comment if you're concerned that we're losing too much test coverage too soon here. Thx!
Flags: needinfo?(mrbkap)
What is the definition of "eligible users"? What fraction of our Windows user base is _not_ getting e10s-multi? Also, do we have non-debug reftest coverage on win32/e10s?
Flags: needinfo?(bugs)
This would only be disabled on win7-non-e10s (which is only run on debug for win7). We still have opt/debug/pgo reftests on Windows for e10s coverage.
Sorry, I misspoke. Do we still have non-debug reftest coverage on win32/non-e10s?
No, this is it for win32. We do have Android reftest non-e10s coverage, and some on Linux when profiling with jsdcov.
OK. Then my main question remains: are we still shipping win32/non-e10s to users?
I just had the same conversation with Jet. Unfortunately, "eligible users" is still not 100% of all users. In particular, we have users on old-style extensions that disable e10s on 55 and 56 (release and beta currently) as well as users with a11y enabled that disable e10s everywhere. I don't know what the breakdown of those users is wrt win32 vs win64. Jim, would you know?
Flags: needinfo?(mrbkap) → needinfo?(jmathies)
Flags: needinfo?(bugs)
Attachment #8903174 - Flags: review+ → review?(bugs)
I spoke with dbolter on irc about this. A11y users on win32 still get non-e10s and will continue to do so at least through FF56. A go/no-go decision for A11y+e10s on FF57 is expected this Friday. Per dbolter: "Basically we'd like a week before maybe saying yeah turn off single process tests. I'm pushing hard to ship." Let's revisit this one next week. Resetting the r? on the patch.
sounds good, thanks for the discussion here
(In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from comment #49)
> OK. Then my main question remains: are we still shipping win32/non-e10s to
> users?

Sure, accessibility users and incompat add-on users, plus anyone who disables e10s. Our 64-bit distro #s are still really small (2% of release I think?) but this is changing (starting on beta this week) as we've started migrating 32-bit users on 64-bit machines over to 64-bit builds.
Flags: needinfo?(jmathies)
Priority: -- → P3
this continues to be our #1 intermittent, who is working on fixing this?
Flags: needinfo?(bugs)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #58)
> this continues to be our #1 intermittent, who is working on fixing this?

@dbolter asked to have until Friday (9/15) for the decision status on shipping a11y+e10s which will reduce the need for these tests on this configuration.
Flags: needinfo?(bugs) → needinfo?(dbolter)
Correct, agreed. This NI should make me come back Friday, but if I don't please feel free to ping. Note while there is a chance we'll ride 57 to Beta, it doesn't mean we'll stick.
Flags: needinfo?(dbolter)
Flags: needinfo?(dbolter)
Blocks: 1393934
:davidb, checking in with you here- I assume all is well?
Chatted with :davidb online; he would like to see us keep reftests in non-e10s mode around until we ship with a11y+e10s (ideally 57 release in November).

I thought of solving this problem another way: what if we run the tests in many chunks? This has each browser session doing a lot less, and in the end I have ~1000 reftest jobs and 1 instance of this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ce7c0b8206e16566c625d03279dd2e276dbd22b

We typically see 400-500 jobs/week for reftest-non-e10s, so this appears to be a significant win for this bug and bug 1393934. :gbrown, what do you think of this approach?
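As a rough sketch of the chunking idea (an assumption about the general technique, not how the harness actually assigns tests to jobs), the test list can be cut into a fixed number of nearly equal contiguous chunks, one per job. The function `chunk`, its parameters, and the test names below are hypothetical.

```python
from typing import List, Sequence


def chunk(tests: Sequence[str], this_chunk: int, total_chunks: int) -> List[str]:
    """Return the tests for 1-based chunk `this_chunk` out of `total_chunks`.

    Chunks are contiguous and their sizes differ by at most one test.
    """
    if total_chunks < 1 or not 1 <= this_chunk <= total_chunks:
        raise ValueError("this_chunk must be between 1 and total_chunks")
    base, extra = divmod(len(tests), total_chunks)
    # The first `extra` chunks each take one additional test.
    start = (this_chunk - 1) * base + min(this_chunk - 1, extra)
    size = base + (1 if this_chunk <= extra else 0)
    return list(tests[start:start + size])


# Hypothetical test list, for illustration only.
tests = [f"test-{i:03d}.html" for i in range(100)]
sizes = [len(chunk(tests, c, 32)) for c in range(1, 33)]
print(f"32 chunks, sizes between {min(sizes)} and {max(sizes)} tests")
```

More chunks means each browser session takes fewer snapshots before it exits, at the cost of more jobs and more per-job startup overhead.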
Flags: needinfo?(gbrown)
That result is slightly inconsistent with my experience in comment 34...but you used more chunks, so maybe it will work? Let's try it!
Flags: needinfo?(gbrown)
Yeah, I looked back at that try push: 16 chunks with 5 data points/chunk (80 jobs total), and I see 2 instances of this error. Now to figure out how to chunk 32 times on non-e10s only!
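A quick back-of-the-envelope comparison of the two try pushes mentioned in this bug, using only the counts quoted in the comments (2 failures in 80 jobs at 16 chunks, and about 1 failure in roughly 1000 jobs with many more chunks). The samples are small and the second job count is approximate, so treat the ratio as a rough indication rather than a measurement. The helper name `failure_rate` is made up for this sketch.

```python
def failure_rate(failures: int, jobs: int) -> float:
    """Fraction of jobs that hit the assertion."""
    if jobs <= 0:
        raise ValueError("jobs must be positive")
    return failures / jobs


# 16 chunks x 5 data points per chunk = 80 jobs, 2 failures.
rate_16_chunks = failure_rate(2, 16 * 5)
# Roughly 1000 jobs with many more chunks, 1 failure (approximate count).
rate_many_chunks = failure_rate(1, 1000)
improvement = rate_16_chunks / rate_many_chunks

print(f"16 chunks:   {rate_16_chunks:.2%} of jobs failed")
print(f"many chunks: {rate_many_chunks:.2%} of jobs failed")
print(f"roughly {improvement:.0f}x fewer failures per job")
```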
This is the simplest way I could think of solving this bug. I am open to other ideas if you have them.
Attachment #8909459 - Flags: review?(gbrown)
Comment on attachment 8909459
win7/debug non-e10s reftests at 32 chunks

Review of attachment 8909459:
-----------------------------------------------------------------

I don't have a better suggestion. Thanks for calling it out as a hack. How can we help ensure this gets cleaned up when we stop running non-e10s? Maybe a note in tests.yml?
Attachment #8909459 - Flags: review?(gbrown) → review+
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ac60362e1cc
split reftest non-e10s into 32 chunks. r=gbrown
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Assignee: nobody → jmaher
Flags: needinfo?(dbolter)
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Attachment #8903174 - Flags: review?(bugs)
Blocks: 1431291