Closed Bug 1137716 Opened 10 years ago Closed 10 years ago

Startup crash on Optimus w/ Intel Ironlake Graphics mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier()

Categories

(Core :: Graphics, defect)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox36 --- unaffected
firefox37 + fixed
firefox38 + fixed
firefox39 + fixed
firefox40 --- fixed

People

(Reporter: kairo, Assigned: jrmuizel)

References

(Depends on 1 open bug)

Details

(Keywords: crash, topcrash, Whiteboard: gfx-noted)

Crash Data

Attachments

(3 files, 1 obsolete file)

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is report bp-1d7d48e2-0107-466e-8e71-38a252150225.
=============================================================
Stack:
0 xul.dll mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier() gfx/layers/d3d11/CompositorD3D11.cpp
1 xul.dll mozilla::layers::CompositorParent::AllocPLayerTransactionParent(nsTArray<mozilla::layers::LayersBackend> const&, unsigned __int64 const&, mozilla::layers::TextureFactoryIdentifier*, bool*) gfx/layers/ipc/CompositorParent.cpp
2 xul.dll mozilla::layers::PCompositorParent::OnMessageReceived(IPC::Message const&, IPC::Message*&) obj-firefox/ipc/ipdl/PCompositorParent.cpp
3 xul.dll mozilla::ipc::MessageChannel::DispatchSyncMessage(IPC::Message const&) ipc/glue/MessageChannel.cpp
4 xul.dll mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ipc/glue/MessageChannel.cpp
5 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc
6 xul.dll `anonymous namespace'::ThreadFunc(void*) ipc/chromium/src/base/platform_thread_win.cc
7 kernel32.dll BaseThreadInitThunk

This crash signature is #6 with 1.2% of all crashes in early 37.0b1 data. This is Win7-only and all crash addresses end in "caa1".

Note that in all crash reports I looked into, I found detoured.dll in the modules list, which according to https://coderrr.wordpress.com/2008/08/27/how-to-get-rid-of-microsoft-detours-detoureddll/ belongs to Microsoft Detours; http://research.microsoft.com/en-us/projects/detours/ seems to be the product page for that tool. I wonder why this would be used by any larger number of people, though.

Firefox 36 is not affected by this, but we have crashes from 37 Beta, 37 Dev Edition, and 38 Nightly.
Tracking topcrash for 37+. (Going to assume 39 is affected.) Milan - Do you have anyone available to investigate?
Flags: needinfo?(milan)
Keywords: topcrash
GetSharedHandle() seems to return S_OK, but the handle is null, and we just call MOZ_CRASH. Seemingly only Windows 7, and not there on 36 may suggest some connection with D2D1.1 (trying to, even if we fail?) NI :nical because of the caller stack.
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(milan)
Flags: needinfo?(bas)
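For illustration, the failure mode described in comment 3 (a success code paired with a null handle) can be guarded against by validating both values before using the handle. The following is a minimal sketch, not the code in CompositorD3D11.cpp: `IsUsableSharedHandle` is a hypothetical helper, and the `HRESULT`, `HANDLE`, `S_OK` and `E_FAIL` definitions are stand-ins so the snippet compiles without the Windows headers.

```cpp
#include <cstdint>

// Stand-ins for the Windows types and constants (normally from <windows.h>),
// defined here only so the sketch is self-contained.
using HRESULT = std::int32_t;
using HANDLE = void*;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_FAIL = -2147467259;  // 0x80004005

// Hypothetical helper: a shared handle is usable only if the call reported
// success AND the handle itself is non-null. A driver that returns S_OK with
// a null handle is treated the same as an outright failure.
bool IsUsableSharedHandle(HRESULT hr, HANDLE handle) {
  if (hr < 0) {
    return false;  // equivalent to the FAILED(hr) macro
  }
  return handle != nullptr;
}
```

A caller could use a `false` result to fall back to another compositor backend instead of crashing; whether that is practical here is a separate question, and the thread below pursues blacklisting instead.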
Regarding detoured.dll, it would not surprise me at all if the graphics driver were hooking some APIs so that it could play dual-GPU tricks. I've seen such things before.

Rank  App notes  Count  %
1     has        2768   100.00 %
2     gpus       2768   100.00 %
3     gpu        2768   100.00 %
4     dual       2768   100.00 %

mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier()|EXCEPTION_BREAKPOINT (642 crashes)
100% (641/642) vs. 2% (1353/60686) nvd3d9wrap.dll
100% (641/642) vs. 2% (1354/60686) nvdxgiwrap.dll
100% (641/642) vs. 2% (1445/60686) nvapi.dll
100% (641/642) vs. 2% (1449/60686) nvumdshim.dll
100% (641/642) vs. 3% (1962/60686) nvinit.dll
97% (624/642) vs. 1% (627/60686) d3d8.dll
100% (641/642) vs. 5% (2997/60686) nvwgf2um.dll
97% (624/642) vs. 7% (4412/60686) d3d10.dll
97% (624/642) vs. 7% (4412/60686) d3d10core.dll
100% (641/642) vs. 11% (6405/60686) igd10umd32.dll
97% (624/642) vs. 11% (6446/60686) d3d8thk.dll
97% (624/642) vs. 14% (8258/60686) d3d9.dll
64% (408/642) vs. 2% (934/60686) detoured.dll
37% (239/642) vs. 1% (714/60686) _etoured.dll
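As a worked example of how the module correlation figures above can be read: each line appears to compare how often a module is loaded in crashes with this signature against how often it is loaded in a larger baseline set of crashes. The sketch below assumes that reading; `RoundedPercent` and `Overrepresentation` are hypothetical helper names, not part of any Socorro tooling.

```cpp
#include <cmath>

// Rounds part/total to a whole-number percentage, as in the listing above.
// Returns 0 for an empty total.
int RoundedPercent(long part, long total) {
  if (total <= 0) {
    return 0;
  }
  return static_cast<int>(std::lround(100.0 * part / total));
}

// Ratio of a module's frequency within the signature to its frequency in the
// baseline. Returns 0 when either population is empty or the module never
// appears in the baseline.
double Overrepresentation(long inSignature, long signatureTotal,
                          long inBaseline, long baselineTotal) {
  if (signatureTotal <= 0 || baselineTotal <= 0 || inBaseline <= 0) {
    return 0.0;
  }
  return (static_cast<double>(inSignature) / signatureTotal) /
         (static_cast<double>(inBaseline) / baselineTotal);
}
```

With the nvd3d9wrap.dll numbers (641/642 vs. 1353/60686), the module is roughly 45 times more common in this signature than in the baseline, which is what points at the NVIDIA Optimus driver stack.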
(In reply to Milan Sreckovic [:milan] from comment #3)
> GetSharedHandle() seems to return S_OK, but the handle is null, and we just
> call MOZ_CRASH. Seemingly only Windows 7, and not there on 36 may suggest
> some connection with D2D1.1 (trying to, even if we fail?)
> NI :nical because of the caller stack.

No, I strongly suspect not. At least some of these have D2D 1.1 running. I suspect this is related to the dual GPUs. These are all Optimus GPUs and it seems to be a fairly narrow range of models. We may have to do something along the lines of blacklisting somehow.
Flags: needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #3)
> NI :nical because of the caller stack.

Nothing comes to mind as far as the stack is concerned; S_OK with a null handle looks like a driver not doing what it should.
Flags: needinfo?(nical.bugzilla)
OK, if we're going to blacklist, Bas, can you figure out what should be blacklisted?
Assignee: nobody → bas
Whiteboard: gfx-noted
Are there any driver version correlations?
Flags: needinfo?(dmajor)
The Intel adapter is always device 0x0046 and the Intel driver is versions 8.15.10.2008 to 8.15.10.2622 inclusive. The nVidia adapter varies, mostly 0x0a70 0x0df4 0x0df1 0x0df0. The nVidia DLLs have versions 8.17.12.5730 to 8.17.12.6901 inclusive. The crashes are only on Win7 and Win7SP1.
Flags: needinfo?(dmajor)
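To make the data in comment 9 concrete, here is a minimal sketch of a range check for the Intel side only. It assumes each dotted driver version is packed into one 64-bit integer, 16 bits per component, so that integer comparison matches version ordering. `PackVersion`, `ParseDriverVersion` and `IsAffectedIntelConfig` are hypothetical names for illustration, not the GfxInfo API, and the OS (Win7) and NVIDIA adapter conditions are left out.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Packs a.b.c.d into a single integer, 16 bits per component (assumed layout).
constexpr std::uint64_t PackVersion(std::uint64_t a, std::uint64_t b,
                                    std::uint64_t c, std::uint64_t d) {
  return (a << 48) | (b << 32) | (c << 16) | d;
}

// Parses "a.b.c.d" into the packed form. Returns false on malformed input,
// on components larger than 16 bits, or on trailing characters.
bool ParseDriverVersion(const std::string& text, std::uint64_t* out) {
  std::istringstream in(text);
  std::uint64_t packed = 0;
  for (int i = 0; i < 4; ++i) {
    unsigned long part = 0;
    if (!(in >> part) || part > 0xFFFF) {
      return false;
    }
    packed = (packed << 16) | part;
    if (i < 3 && in.get() != '.') {
      return false;
    }
  }
  if (in.peek() != std::istringstream::traits_type::eof()) {
    return false;
  }
  *out = packed;
  return true;
}

// True for the Intel configuration reported in comment 9: device 0x0046 with
// a driver version from 8.15.10.2008 to 8.15.10.2622 inclusive.
bool IsAffectedIntelConfig(std::uint32_t deviceId,
                           const std::string& driverVersion) {
  std::uint64_t version = 0;
  if (!ParseDriverVersion(driverVersion, &version)) {
    return false;
  }
  return deviceId == 0x0046 &&
         version >= PackVersion(8, 15, 10, 2008) &&
         version <= PackVersion(8, 15, 10, 2622);
}
```

One property of this kind of rule is that a mistyped bound does not fail loudly: the range simply matches the wrong set of drivers, so it is worth checking both endpoints against known affected versions.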
Bas - Can you blacklist based on the information in comment 9? If so, can you have a patch ready for Beta 7 gtb on Thu?
Flags: needinfo?(bas)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #10)
> Bas - Can you blacklist based on the information in comment 9? If so, can
> you have a patch ready for Beta 7 gtb on Thu?

I don't know how blacklisting with dual GPUs works. I'm not sure if anyone does. :( Jeff, do you have any idea who we might ask?
Flags: needinfo?(bas) → needinfo?(jmuizelaar)
Flags: needinfo?(jmuizelaar)
Summary: crash in mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier() → Startup crash in mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier()
I'll try to get a patch together.
(In reply to David Major [:dmajor] (UTC+13) from comment #9)
> The Intel adapter is always device 0x0046 and the Intel driver is versions
> 8.15.10.2008 to 8.15.10.2622 inclusive.
>
> The nVidia adapter varies, mostly 0x0a70 0x0df4 0x0df1 0x0df0. The nVidia
> DLLs have versions 8.17.12.5730 to 8.17.12.6901 inclusive.
>
> The crashes are only on Win7 and Win7SP1.

David, can you get an exhaustive list of adapter id's?
Flags: needinfo?(dmajor)
So we don't really have infrastructure to handle dual gpu blacklisting...
Attached patch Here's a patch. It may work. Who knows... (obsolete) (deleted) — Splinter Review
Attachment #8580166 - Flags: review?(bas)
Attached patch A version that builds (deleted) — Splinter Review
Attachment #8580166 - Attachment is obsolete: true
Attachment #8580166 - Flags: review?(bas)
Attachment #8580189 - Flags: review?(bas)
> David, can you get an exhaustive list of adapter id's?

0x0a70 0x0df1 0x0df4 0x0df0 0x0a7a 0x0a35 0x0dee 0x0a6c 0x0dd3 0x0a2d 0x0caf 0x0df2 0x0a2b 0x0a72 0x0a29 0x0df3
Flags: needinfo?(dmajor)
Comment on attachment 8580189 [details] [diff] [review]
A version that builds

Review of attachment 8580189 [details] [diff] [review]:
-----------------------------------------------------------------

It's a shame we will also blacklist these NVidia devices as secondary GPUs now when the Intel device is not 0x0046. But for 37 let's do this; can you put a comment in to look at this?

::: widget/GfxInfoBase.cpp
@@ +630,3 @@
>
> #if defined(XP_WIN) || defined(ANDROID)
> + uint64_t driverVersion;

What changed here?
Attachment #8580189 - Flags: review?(bas) → review+
Summary: Startup crash in mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier() → Startup crash on Optimus w/ Intel Ironlake Graphics mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier()
I tried to find a laptop that reproduced this but made a bad assumption about what kind of intel graphics it was happening on.
Backed both out in https://hg.mozilla.org/integration/mozilla-inbound/rev/a2e34f98c85a - whether or not it's going to eventually work on Windows, that broke tests on Mac and Android.
I agree that WINDOWS_7 is better, but what's weird is that WINDOWS7 seems to have built here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=47ac91f92b2d
Ah never mind, I accidentally made the fix in the try push: https://hg.mozilla.org/try/rev/47ac91f92b2d
Comment on attachment 8580189 [details] [diff] [review]
A version that builds

Approval Request Comment
[Feature/regressing bug #]: Unknown
[User impact if declined]: Startup crashes on people's machines that have particular hardware.
[Describe test coverage new/current, TreeHerder]: Very limited. Hasn't been in Nightly yet. We don't have the hardware to test it on.
[Risks and why]: This changes the blocklisting infrastructure so it definitely has some risk, especially this late in the cycle.
Attachment #8580189 - Flags: approval-mozilla-beta?
Attachment #8580189 - Flags: approval-mozilla-aurora?
Comment on attachment 8580189 [details] [diff] [review] A version that builds I am going to take in aurora even if it didn't land in m-c to maximize testing for beta.
Attachment #8580189 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This is a bad bug and a new issue in 37. However, this is too risky to land directly on Beta. We're going to wait until at least tomorrow to try and get some data. We may decide to land this later and test over the weekend with the risk of pushing the release if required.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Assignee: bas → jmuizelaar
Comment on attachment 8580189 [details] [diff] [review] A version that builds This change hasn't produced obvious problems on Nightly or Aurora but won't be able to really be tested until we get it onto Beta. We'll take this in the 37 RC as this looks like a significant enough issue to block the release. Beta+
Attachment #8580189 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I see some Intel devices with DeviceID 0x0116 that hit this crash in the csv files. David can you confirm that you don't see any 0x0116 intel devices?
Flags: needinfo?(dmajor)
I do see them now. This may be a new development. I'm pretty sure it was more like 99% 0x0046 when I originally posted.

Rank  Adapter device id  Count  %
1     0x0046             1703   91.95 %
2     0x0116             107    5.78 %
3     0x0106             39     2.11 %
4     0x0126             3      0.16 %
Flags: needinfo?(dmajor)
This crash was supposed to be fixed, but it is the #1 crash in early 37.0 release data.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
So I noticed these crash reports of WARP- which is not really expected. Something weird might be going on there.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #36)
> So I noticed these crash reports of WARP- which is not really expected.
> Something weird might be going on there.

Might that be another case of or connected to bug 1149761?
The WARP- seems innocuous. The current code will have WARP- whenever we call InitD3D11Devices and don't succeed at WARP, even if we never tried. I've filed bug 1150124 to improve this reporting.
It seems as though the D3D11 compositor is being used for reasons unknown.
So I just realized that our ScopedGfxFeatureReporter writes to the AppNotes using an event posted to the main thread. This means that during startup the AppNotes will not necessarily contain all of the data that we would like to see. This likely explains why it seems like we're using the D3D11 compositor without reporting that in the AppNotes. It's conceivable that the blocklisting code is just not working properly and not blocking these laptops.
Depends on: 1150324, 1150124
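The timing issue described above can be sketched in isolation: if a note is written by a task posted to another thread's queue, a crash that happens before the queue is drained captures notes without it. This is a simplified single-threaded model, not the ScopedGfxFeatureReporter implementation; `MainThreadQueue`, `AppNotes` and `ReportFeatureAsync` are hypothetical names.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for the main-thread event queue.
class MainThreadQueue {
 public:
  void Post(std::function<void()> task) { tasks_.push(std::move(task)); }

  // Runs every queued task in posting order.
  void RunPending() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      task();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Hypothetical stand-in for the crash-report annotation text.
struct AppNotes {
  std::string text;
};

// Defers the annotation to the queue instead of writing it immediately.
// Until RunPending() executes the task, `notes` does not contain `feature`.
void ReportFeatureAsync(MainThreadQueue& queue, AppNotes& notes,
                        const std::string& feature) {
  queue.Post([&notes, feature] { notes.text += feature + " "; });
}
```

In this model, a crash report generated between `ReportFeatureAsync` and `RunPending` would show empty notes even though the feature was already in use, which matches the symptom of the D3D11 compositor running without a corresponding AppNotes entry.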
I typo'd the version number in the blacklisting patch. That explains why the blacklist didn't work.
Attached patch Fix driver version typo (deleted) — Splinter Review
Approval Request Comment
[Feature/regressing bug #]: 1137716
[User impact if declined]: Crashes on startup
[Describe test coverage new/current, TreeHerder]: None
[Risks and why]: Unintentional blacklisting
Attachment #8587560 - Flags: approval-mozilla-release?
Attachment #8587560 - Flags: approval-mozilla-beta?
Attachment #8587560 - Flags: approval-mozilla-aurora?
Comment on attachment 8587560 [details] [diff] [review]
Fix driver version typo

We're going to take this blacklist typo correction for a start-up crash in 37.0.1.

Release+ Beta+ Aurora+
Attachment #8587560 - Flags: approval-mozilla-release?
Attachment #8587560 - Flags: approval-mozilla-release+
Attachment #8587560 - Flags: approval-mozilla-beta?
Attachment #8587560 - Flags: approval-mozilla-beta+
Attachment #8587560 - Flags: approval-mozilla-aurora?
Attachment #8587560 - Flags: approval-mozilla-aurora+
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Target Milestone: mozilla39 → mozilla40
We do not have any dual GPU setups to test this, so I guess verification of the fix can only be done by analyzing Socorro data. Please let me know if you think there is a way to manually verify this.
There are still some hits on this in 37.0.1 but the volume is greatly reduced. I think it is a matter of additional device IDs. The stragglers are: 0x0dd2 0x0dd3 0x1050 0x1051 0x1054.
I had listed 0x0dd3 in comment 17 but I don't see it in the patch (it's about 2/3 of the remaining crashes). The other device IDs must be ones that were too low volume to notice on beta.
I have one of the machines that should reproduce this (Identical machine, identical drivers), but I'm not able to for some reason... I'll push a patch that adds the additional device ids...
Attached patch Block more devices (deleted) — Splinter Review
Approval Request Comment
[Feature/regressing bug #]: 37
[User impact if declined]: Startup crashes
[Risks and why]: Just more devices being blocked
Attachment #8588711 - Flags: approval-mozilla-release?
Attachment #8588711 - Flags: approval-mozilla-beta?
Attachment #8588711 - Flags: approval-mozilla-aurora?
Comment on attachment 8588711 [details] [diff] [review] Block more devices Should be in 38 beta 2 or 3.
Attachment #8588711 - Flags: approval-mozilla-beta?
Attachment #8588711 - Flags: approval-mozilla-beta+
Attachment #8588711 - Flags: approval-mozilla-aurora?
Attachment #8588711 - Flags: approval-mozilla-aurora+
For whoever decides the release approval on that patch: The relative volume of this signature is much lower now: 0.5% of 37.0.1 crashes, versus 6.6% of 37.0 crashes. But we'll need to weigh the low volume against the fact that it's a startup crash.
set back to affected to make sure sheriffs see it
I was able to reproduce this by forcing firefox to use the nvidia gpu on the machine in question.
I believe this was caused by our current blacklist breaking in some way.
Comment on attachment 8588711 [details] [diff] [review] Block more devices This will ride along in 37.0.2. Release+
Attachment #8588711 - Flags: approval-mozilla-release? → approval-mozilla-release+
Jeff, could you confirm that the fix works correctly on the machine where you reproduced this? At least for Firefox 37.0.2: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/37.0.2-candidates/build1/
Flags: needinfo?(jmuizelaar)
The original patch fixed this on the machine that I have. The latest patch only impacts more rare machines.
Flags: needinfo?(jmuizelaar)
Thanks Jeff! I guess that means we need more crash data to confirm.