Closed
Bug 1137716
Opened 10 years ago
Closed 10 years ago
Startup crash on Optimus w/ Intel Ironlake Graphics mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier()
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
mozilla40
People
(Reporter: kairo, Assigned: jrmuizel)
References
(Depends on 1 open bug)
Details
(Keywords: crash, topcrash, Whiteboard: gfx-noted)
Crash Data
Attachments
(3 files, 1 obsolete file)
(deleted), patch | bas.schouten: review+ | Sylvestre: approval-mozilla-aurora+ | lmandel: approval-mozilla-beta+
(deleted), patch | lmandel: approval-mozilla-aurora+ | lmandel: approval-mozilla-beta+ | lmandel: approval-mozilla-release+
(deleted), patch | Sylvestre: approval-mozilla-aurora+ | Sylvestre: approval-mozilla-beta+ | lmandel: approval-mozilla-release+
[Tracking Requested - why for this release]:
This bug was filed from the Socorro interface and is
report bp-1d7d48e2-0107-466e-8e71-38a252150225.
=============================================================
Stack:
0 xul.dll mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier() gfx/layers/d3d11/CompositorD3D11.cpp
1 xul.dll mozilla::layers::CompositorParent::AllocPLayerTransactionParent(nsTArray<mozilla::layers::LayersBackend> const&, unsigned __int64 const&, mozilla::layers::TextureFactoryIdentifier*, bool*) gfx/layers/ipc/CompositorParent.cpp
2 xul.dll mozilla::layers::PCompositorParent::OnMessageReceived(IPC::Message const&, IPC::Message*&) obj-firefox/ipc/ipdl/PCompositorParent.cpp
3 xul.dll mozilla::ipc::MessageChannel::DispatchSyncMessage(IPC::Message const&) ipc/glue/MessageChannel.cpp
4 xul.dll mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ipc/glue/MessageChannel.cpp
5 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc
6 xul.dll `anonymous namespace'::ThreadFunc(void*) ipc/chromium/src/base/platform_thread_win.cc
7 kernel32.dll BaseThreadInitThunk
This crash signature is #6 with 1.2% of all crashes in early 37.0b1 data. This is Win7-only and all crash addresses end in "caa1".
Note that in all crash reports I looked into, I found detoured.dll in the modules list. According to https://coderrr.wordpress.com/2008/08/27/how-to-get-rid-of-microsoft-detours-detoureddll/ it belongs to Microsoft Detours, and http://research.microsoft.com/en-us/projects/detours/ seems to be the product page for that tool. I wonder why this would be used by any larger number of people, though.
Firefox 36 is not affected by this, but we have crashes from 37 Beta, 37 Dev Edition, and 38 Nightly.
Reporter | ||
Comment 1•10 years ago
|
||
More stats and reports can be found at https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3AGetTextureFactoryIdentifier%28%29
Comment 2•10 years ago
|
||
Tracking topcrash for 37+. (Going to assume 39 is affected.)
Milan - Do you have anyone available to investigate?
status-firefox36:
--- → unaffected
status-firefox39:
--- → affected
tracking-firefox38:
--- → +
tracking-firefox39:
--- → +
Flags: needinfo?(milan)
Keywords: topcrash
Comment 3•10 years ago
|
||
GetSharedHandle() seems to return S_OK, but the handle is null, and we just call MOZ_CRASH. Seemingly only Windows 7, and not there on 36 may suggest some connection with D2D1.1 (trying to, even if we fail?)
NI :nical because of the caller stack.
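The failure mode described above (a success code paired with a null handle) can be guarded against by checking both values before using the handle. A minimal, portable sketch follows; `HResult`, `Handle`, `kOk`, `Failed`, and `SharedHandleIsUsable` are hypothetical stand-ins so that it compiles without Windows headers, and none of this is the actual Gecko code:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the Windows HRESULT and HANDLE types.
using HResult = std::int32_t;
using Handle = void*;
constexpr HResult kOk = 0;  // mirrors S_OK

// Negative result codes indicate failure, as with HRESULT.
constexpr bool Failed(HResult hr) { return hr < 0; }

// A shared handle is only usable if the call succeeded AND the handle is
// non-null; a misbehaving driver may report success yet return nothing.
bool SharedHandleIsUsable(HResult hr, Handle handle) {
  return !Failed(hr) && handle != nullptr;
}
```

With a check like this, a caller could presumably fall back to another compositor backend instead of crashing, though whether that is practical here is not established in this bug.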
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(milan)
Flags: needinfo?(bas)
Regarding detoured.dll, it would not surprise me at all if the graphics driver were hooking some APIs so that it could play dual-GPU tricks. I've seen such things before.
Rank App notes Count %
1 has 2768 100.00 %
2 gpus 2768 100.00 %
3 gpu 2768 100.00 %
4 dual 2768 100.00 %
mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier()|EXCEPTION_BREAKPOINT (642 crashes)
100% (641/642) vs. 2% (1353/60686) nvd3d9wrap.dll
100% (641/642) vs. 2% (1354/60686) nvdxgiwrap.dll
100% (641/642) vs. 2% (1445/60686) nvapi.dll
100% (641/642) vs. 2% (1449/60686) nvumdshim.dll
100% (641/642) vs. 3% (1962/60686) nvinit.dll
97% (624/642) vs. 1% (627/60686) d3d8.dll
100% (641/642) vs. 5% (2997/60686) nvwgf2um.dll
97% (624/642) vs. 7% (4412/60686) d3d10.dll
97% (624/642) vs. 7% (4412/60686) d3d10core.dll
100% (641/642) vs. 11% (6405/60686) igd10umd32.dll
97% (624/642) vs. 11% (6446/60686) d3d8thk.dll
97% (624/642) vs. 14% (8258/60686) d3d9.dll
64% (408/642) vs. 2% (934/60686) detoured.dll
37% (239/642) vs. 1% (714/60686) _etoured.dll
Comment 5•10 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #3)
> GetSharedHandle() seems to return S_OK, but the handle is null, and we just
> call MOZ_CRASH. Seemingly only Windows 7, and not there on 36 may suggest
> some connection with D2D1.1 (trying to, even if we fail?)
> NI :nical because of the caller stack.
No, I strongly suspect not. At least some of these have D2D 1.1 running. I suspect this is related to the dual GPUs. These are all Optimus GPUs and it seems to be a fairly narrow range of models. We may have to do something along the lines of blacklisting somehow.
Flags: needinfo?(bas)
Comment 6•10 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #3)
> NI :nical because of the caller stack.
Nothing comes to mind as far as the stack is concerned; S_OK with a null handle looks like a driver not doing what it should.
Flags: needinfo?(nical.bugzilla)
Comment 7•10 years ago
|
||
OK, if we're going to blacklist, Bas, can you figure out what should be blacklisted?
Assignee: nobody → bas
Updated•10 years ago
|
Whiteboard: gfx-noted
The Intel adapter is always device 0x0046 and the Intel driver is versions 8.15.10.2008 to 8.15.10.2622 inclusive.
The nVidia adapter varies, mostly 0x0a70 0x0df4 0x0df1 0x0df0. The nVidia DLLs have versions 8.17.12.5730 to 8.17.12.6901 inclusive.
The crashes are only on Win7 and Win7SP1.
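The Intel criteria above (one device ID plus an inclusive driver version range) could be expressed as a range check on packed version numbers. This is only a sketch under assumptions: `PackVersion` and `IntelAdapterMatches` are hypothetical names, each version part is assumed to fit in 16 bits, and the actual blocklist patch may be structured differently:

```cpp
#include <cstdint>

// Pack a four-part driver version into one integer, 16 bits per part, so
// that versions can be compared with ordinary integer operators.
constexpr std::uint64_t PackVersion(std::uint64_t a, std::uint64_t b,
                                    std::uint64_t c, std::uint64_t d) {
  return (a << 48) | (b << 32) | (c << 16) | d;
}

// Values taken from the crash data in this comment.
constexpr std::uint16_t kIntelDevice = 0x0046;
constexpr std::uint64_t kIntelMin = PackVersion(8, 15, 10, 2008);
constexpr std::uint64_t kIntelMax = PackVersion(8, 15, 10, 2622);

// True when the adapter is the affected device with a driver version
// inside the inclusive range.
bool IntelAdapterMatches(std::uint16_t deviceId, std::uint64_t driverVersion) {
  return deviceId == kIntelDevice &&
         driverVersion >= kIntelMin && driverVersion <= kIntelMax;
}
```

Because both endpoints are inclusive, 8.15.10.2622 matches and 8.15.10.2623 does not.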
Flags: needinfo?(dmajor)
Comment 10•10 years ago
|
||
Bas - Can you blacklist based on the information in comment 9? If so, can you have a patch ready for Beta 7 gtb on Thu?
Flags: needinfo?(bas)
Comment 11•10 years ago
|
||
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #10)
> Bas - Can you blacklist based on the information in comment 9? If so, can
> you have a patch ready for Beta 7 gtb on Thu?
I don't know how blacklisting with Dual GPUs works.. I'm not sure if anyone does.. :( Jeff.. do you have any idea who we might ask?
Flags: needinfo?(bas) → needinfo?(jmuizelaar)
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(jmuizelaar)
Summary: crash in mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier() → Startup crash in mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier()
Assignee | ||
Comment 12•10 years ago
|
||
I'll try to get a patch together.
Assignee | ||
Comment 13•10 years ago
|
||
(In reply to David Major [:dmajor] (UTC+13) from comment #9)
> The Intel adapter is always device 0x0046 and the Intel driver is versions
> 8.15.10.2008 to 8.15.10.2622 inclusive.
>
> The nVidia adapter varies, mostly 0x0a70 0x0df4 0x0df1 0x0df0. The nVidia
> DLLs have versions 8.17.12.5730 to 8.17.12.6901 inclusive.
>
> The crashes are only on Win7 and Win7SP1.
David, can you get an exhaustive list of adapter id's?
Flags: needinfo?(dmajor)
Assignee | ||
Comment 14•10 years ago
|
||
So we don't really have infrastructure to handle dual gpu blacklisting...
Assignee | ||
Comment 15•10 years ago
|
||
Attachment #8580166 -
Flags: review?(bas)
Assignee | ||
Comment 16•10 years ago
|
||
Attachment #8580166 -
Attachment is obsolete: true
Attachment #8580166 -
Flags: review?(bas)
Attachment #8580189 -
Flags: review?(bas)
Comment 17•10 years ago
|
||
> David, can you get an exhaustive list of adapter id's?
0x0a70 0x0df1 0x0df4 0x0df0 0x0a7a 0x0a35 0x0dee 0x0a6c 0x0dd3 0x0a2d 0x0caf 0x0df2 0x0a2b 0x0a72 0x0a29 0x0df3
Flags: needinfo?(dmajor)
Comment 18•10 years ago
|
||
Comment on attachment 8580189 [details] [diff] [review]
A version that builds
Review of attachment 8580189 [details] [diff] [review]:
-----------------------------------------------------------------
It's a shame that we will now also blacklist these NVidia devices as secondary GPUs when the Intel device is not 0x0046. But for 37 let's do this; can you put a comment in to look at this?
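The trade-off raised in this review can be illustrated by comparing two predicates: one that blocks on the secondary GPU alone, and a stricter one that also requires the primary adapter to be 0x0046. This is a hypothetical sketch, not the patch itself; the function names are invented and the NVidia list is only the four most common IDs from comment 9:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

constexpr std::uint16_t kIntelPrimaryId = 0x0046;
// Illustrative subset of the NVidia device IDs reported in this bug.
constexpr std::array<std::uint16_t, 4> kNvidiaIds = {0x0a70, 0x0df4, 0x0df1,
                                                     0x0df0};

bool IsListedNvidia(std::uint16_t id) {
  return std::find(kNvidiaIds.begin(), kNvidiaIds.end(), id) !=
         kNvidiaIds.end();
}

// Broad rule: block whenever the secondary GPU is listed, regardless of
// which adapter is primary.
bool BlockOnSecondaryOnly(std::uint16_t /*primaryId*/,
                          std::uint16_t secondaryId) {
  return IsListedNvidia(secondaryId);
}

// Stricter rule: block only when both adapters match.
bool BlockOnPair(std::uint16_t primaryId, std::uint16_t secondaryId) {
  return primaryId == kIntelPrimaryId && IsListedNvidia(secondaryId);
}
```

The broad rule catches more machines than the crash data strictly justifies, which is the cost the review accepts for 37.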
::: widget/GfxInfoBase.cpp
@@ +630,3 @@
>
> #if defined(XP_WIN) || defined(ANDROID)
> + uint64_t driverVersion;
What changed here?
Attachment #8580189 -
Flags: review?(bas) → review+
Assignee | ||
Updated•10 years ago
|
Summary: Startup crash in mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier() → Startup crash on Optimus w/ Intel Ironlake Graphics mozilla::layers::CompositorD3D11::GetTextureFactoryIdentifier()
Assignee | ||
Comment 19•10 years ago
|
||
I tried to find a laptop that reproduced this but made a bad assumption about what kind of intel graphics it was happening on.
Comment 20•10 years ago
|
||
You said: https://hg.mozilla.org/integration/mozilla-inbound/rev/726f8309756a
I said: https://hg.mozilla.org/integration/mozilla-inbound/rev/76b1809d22eb because it's more fun when it actually compiles
Comment 21•10 years ago
|
||
Backed both out in https://hg.mozilla.org/integration/mozilla-inbound/rev/a2e34f98c85a - whether or not it's going to eventually work on Windows, that broke tests on Mac and Android.
Assignee | ||
Comment 22•10 years ago
|
||
I agree that WINDOWS_7 is better, but what's weird is that WINDOWS7 seems to have built here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=47ac91f92b2d
Assignee | ||
Comment 23•10 years ago
|
||
Ah never mind, I accidentally made the fix in the try push: https://hg.mozilla.org/try/rev/47ac91f92b2d
Assignee | ||
Comment 24•10 years ago
|
||
Assignee | ||
Comment 25•10 years ago
|
||
Comment on attachment 8580189 [details] [diff] [review]
A version that builds
Approval Request Comment
[Feature/regressing bug #]: Unknown
[User impact if declined]: Startup crashes on people's machines that have particular hardware.
[Describe test coverage new/current, TreeHerder]: Very limited. Hasn't been in Nightly yet. We don't have the hardware to test it on.
[Risks and why]: This changes the blocklisting infrastructure so it definitely has some risk, especially this late in the cycle
Attachment #8580189 -
Flags: approval-mozilla-beta?
Attachment #8580189 -
Flags: approval-mozilla-aurora?
Comment 26•10 years ago
|
||
Comment on attachment 8580189 [details] [diff] [review]
A version that builds
I am going to take in aurora even if it didn't land in m-c to maximize testing for beta.
Attachment #8580189 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 27•10 years ago
|
||
This is a bad bug and a new issue in 37. However, this is too risky to land directly on Beta. We're going to wait until at least tomorrow to try and get some data. We may decide to land this later and test over the weekend with the risk of pushing the release if required.
Comment 28•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Updated•10 years ago
|
Assignee: bas → jmuizelaar
Comment 30•10 years ago
|
||
Comment on attachment 8580189 [details] [diff] [review]
A version that builds
This change hasn't produced obvious problems on Nightly or Aurora but won't be able to really be tested until we get it onto Beta. We'll take this in the 37 RC as this looks like a significant enough issue to block the release. Beta+
Attachment #8580189 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 31•10 years ago
|
||
Comment 32•10 years ago
|
||
Updated•10 years ago
|
Assignee | ||
Comment 33•10 years ago
|
||
I see some Intel devices with DeviceID 0x0116 that hit this crash in the csv files. David can you confirm that you don't see any 0x0116 intel devices?
Flags: needinfo?(dmajor)
Comment 34•10 years ago
|
||
I do see them now. This may be a new development. I'm pretty sure it was more like 99% 0x0046 when I originally posted.
Rank Adapter device id Count %
1 0x0046 1703 91.95 %
2 0x0116 107 5.78 %
3 0x0106 39 2.11 %
4 0x0126 3 0.16 %
Flags: needinfo?(dmajor)
Reporter | ||
Comment 35•10 years ago
|
||
This crash was supposed to be fixed, but it is the #1 crash in early 37.0 release data.
Assignee | ||
Updated•10 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 36•10 years ago
|
||
So I noticed that these crash reports contain WARP-, which is not really expected. Something weird might be going on there.
Reporter | ||
Comment 37•10 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #36)
> So I noticed these crash reports of WARP- which is not really expected.
> Something weird might be going on there.
Might that be another case of or connected to bug 1149761?
Assignee | ||
Comment 38•10 years ago
|
||
The WARP- seems innocuous. The current code will have WARP- whenever we call InitD3D11Devices and don't succeed at WARP, even if we never tried. I've filed bug 1150124 to improve this reporting.
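One way to avoid conflating "never tried" with "tried and failed" is to track the attempt as a three-state value and emit a note only when an attempt was actually made. The sketch below is an assumption about how such reporting could look; `WarpStatus`, `WarpNote`, and the "WARP+" string are hypothetical, and this is not what bug 1150124 necessarily implements:

```cpp
#include <string>

// Three states instead of a success/failure boolean.
enum class WarpStatus { NotAttempted, Failed, Succeeded };

// Returns the annotation to append to the app notes. Nothing is written
// when WARP was never attempted, so "WARP-" always means a real failure.
std::string WarpNote(WarpStatus status) {
  switch (status) {
    case WarpStatus::Failed:
      return "WARP- ";
    case WarpStatus::Succeeded:
      return "WARP+ ";
    case WarpStatus::NotAttempted:
      return "";
  }
  return "";
}
```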
Assignee | ||
Comment 39•10 years ago
|
||
It seems as though the D3D11 compositor is being used for reasons unknown.
Assignee | ||
Updated•10 years ago
|
Assignee | ||
Comment 40•10 years ago
|
||
So I just realized that our ScopedGfxFeatureReporter writes to the AppNotes using an event posted to the main thread. This means that during startup they will not necessarily contain all of the data that we would like to see. This likely explains why it seems like we're using the D3D11 compositor without reporting that in the AppNotes.
It's conceivable that the blocklisting code is just not working properly and not blocking these laptops.
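The effect of posting the AppNotes write to the main thread can be modelled with a simple task queue: a note that has been posted but not yet run is invisible to anything that inspects the notes in the meantime, such as a crash report taken during startup. This is a simplified, single-threaded model with hypothetical names (`MainThreadModel`, `PostNote`, `Drain`), not the real ScopedGfxFeatureReporter:

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical model of a main-thread event queue plus crash annotations.
struct MainThreadModel {
  std::queue<std::function<void()>> pending;
  std::string appNotes;

  // Queue the note instead of writing it immediately.
  void PostNote(const std::string& note) {
    pending.push([this, note] { appNotes += note; });
  }

  // Run everything queued so far, as the main thread eventually would.
  void Drain() {
    while (!pending.empty()) {
      std::function<void()> task = std::move(pending.front());
      pending.pop();
      task();
    }
  }
};
```

If the process crashes between `PostNote` and `Drain`, `appNotes` is still empty, which matches the missing annotations described in this comment.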
Assignee | ||
Comment 41•10 years ago
|
||
I typo'd the version number in the blacklisting patch. That explains why the blacklist didn't work.
Assignee | ||
Comment 42•10 years ago
|
||
Assignee | ||
Comment 43•10 years ago
|
||
Approval Request Comment
[Feature/regressing bug #]: 1137716
[User impact if declined]: Crashes on startup
[Describe test coverage new/current, TreeHerder]: None
[Risks and why]: Unintentional blacklisting
Attachment #8587560 -
Flags: approval-mozilla-release?
Attachment #8587560 -
Flags: approval-mozilla-beta?
Attachment #8587560 -
Flags: approval-mozilla-aurora?
Comment 44•10 years ago
|
||
Comment on attachment 8587560 [details] [diff] [review]
Fix driver version typo
We're going to take this blacklist typo correction for a start-up crash in 37.0.1 Release+ Beta+ Aurora+
Attachment #8587560 -
Flags: approval-mozilla-release?
Attachment #8587560 -
Flags: approval-mozilla-release+
Attachment #8587560 -
Flags: approval-mozilla-beta?
Attachment #8587560 -
Flags: approval-mozilla-beta+
Attachment #8587560 -
Flags: approval-mozilla-aurora?
Attachment #8587560 -
Flags: approval-mozilla-aurora+
Comment 45•10 years ago
|
||
Comment 46•10 years ago
|
||
Comment 47•10 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
status-firefox40:
--- → fixed
Resolution: --- → FIXED
Target Milestone: mozilla39 → mozilla40
Comment 49•10 years ago
|
||
We do not have any dual GPU setups to test this, so I guess verification of the fix can only be done by analyzing Socorro data. Please let me know if you think there is a way to manually verify this.
Comment 50•10 years ago
|
||
There are still some hits on this in 37.0.1 but the volume is greatly reduced. I think it is a matter of additional device IDs. The stragglers are: 0x0dd2 0x0dd3 0x1050 0x1051 0x1054.
Comment 51•10 years ago
|
||
I had listed 0x0dd3 in comment 17 but I don't see it in the patch (it's about 2/3 of the remaining crashes). The other device IDs must be ones that were too low volume to notice on beta.
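The cross-check described here (which straggler IDs had already been reported earlier) amounts to a set intersection. A small sketch using the two ID lists from comment 17 and comment 50; the helper name `AlreadyReported` is hypothetical:

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Device IDs listed in comment 17.
const std::set<std::uint16_t> kComment17Ids = {
    0x0a70, 0x0df1, 0x0df4, 0x0df0, 0x0a7a, 0x0a35, 0x0dee, 0x0a6c,
    0x0dd3, 0x0a2d, 0x0caf, 0x0df2, 0x0a2b, 0x0a72, 0x0a29, 0x0df3};

// Device IDs still crashing on 37.0.1, from comment 50.
const std::set<std::uint16_t> kStragglerIds = {0x0dd2, 0x0dd3, 0x1050, 0x1051,
                                               0x1054};

// Returns the stragglers that were already present in the reported list.
// std::set iterates in sorted order, which set_intersection requires.
std::vector<std::uint16_t> AlreadyReported(
    const std::set<std::uint16_t>& reported,
    const std::set<std::uint16_t>& stragglers) {
  std::vector<std::uint16_t> out;
  std::set_intersection(reported.begin(), reported.end(), stragglers.begin(),
                        stragglers.end(), std::back_inserter(out));
  return out;
}
```

Calling `AlreadyReported(kComment17Ids, kStragglerIds)` yields only 0x0dd3, the ID that was reported but apparently missed in the patch.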
Assignee | ||
Comment 52•10 years ago
|
||
I have one of the machines that should reproduce this (Identical machine, identical drivers), but I'm not able to for some reason...
I'll push a patch that adds the additional device ids...
Assignee | ||
Comment 53•10 years ago
|
||
Approval Request Comment
[Feature/regressing bug #]: 37
[User impact if declined]: Startup crashes
[Risks and why]: Just more devices being blocked
Attachment #8588711 -
Flags: approval-mozilla-release?
Attachment #8588711 -
Flags: approval-mozilla-beta?
Attachment #8588711 -
Flags: approval-mozilla-aurora?
Comment 54•10 years ago
|
||
Comment on attachment 8588711 [details] [diff] [review]
Block more devices
Should be in 38 beta 2 or 3.
Attachment #8588711 -
Flags: approval-mozilla-beta?
Attachment #8588711 -
Flags: approval-mozilla-beta+
Attachment #8588711 -
Flags: approval-mozilla-aurora?
Attachment #8588711 -
Flags: approval-mozilla-aurora+
Comment 55•10 years ago
|
||
For whoever decides the release approval on that patch: The relative volume of this signature is much lower now: 0.5% of 37.0.1 crashes, versus 6.6% of 37.0 crashes. But we'll need to weigh the low volume against the fact that it's a startup crash.
Comment 56•10 years ago
|
||
set back to affected to make sure sheriffs see it
Comment 57•10 years ago
|
||
Assignee | ||
Comment 58•10 years ago
|
||
I was able to reproduce this by forcing firefox to use the nvidia gpu on the machine in question.
Assignee | ||
Comment 59•10 years ago
|
||
I believe this was caused by our current blacklist breaking in some way.
Comment 60•10 years ago
|
||
Comment 61•10 years ago
|
||
Comment 62•10 years ago
|
||
Comment on attachment 8588711 [details] [diff] [review]
Block more devices
This will ride along in 37.0.2. Release+
Attachment #8588711 -
Flags: approval-mozilla-release? → approval-mozilla-release+
Comment 63•10 years ago
|
||
Comment 64•10 years ago
|
||
Jeff, could you confirm that the fix works correctly on the machine where you reproduced this? At least for Firefox 37.0.2: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/37.0.2-candidates/build1/.
Flags: needinfo?(jmuizelaar)
Assignee | ||
Comment 65•10 years ago
|
||
The original patch fixed this on the machine that I have. The latest patch only affects rarer machines.
Flags: needinfo?(jmuizelaar)
Comment 66•10 years ago
|
||
Thanks Jeff! I guess that means we need more crash data to confirm.