Closed
Bug 1359416
Opened 8 years ago
Closed 7 years ago
Increase in Intel graphics driver crashes with gfx.direct3d11.allow-intel-mutex on
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | + | disabled |
firefox55 | + | fixed |
firefox56 | + | fixed |
People
(Reporter: philipp, Assigned: bas.schouten)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, topcrash-win)
Crash Data
Attachments
(2 files)
(deleted),
text/x-review-board-request
|
milan
:
review+
gchang
:
approval-mozilla-beta+
|
Details |
(deleted),
text/x-review-board-request
|
dvander
:
review+
jcristau
:
approval-mozilla-beta+
|
Details |
This bug was filed from the Socorro interface and is
report bp-ac8e08c0-4db2-4d81-a6c1-7b7dd0170425.
=============================================================
intel graphics driver crashes on windows are sharply increasing on the beta channel after 54.0b1 got rolled out there. while they where accounting to 1.5% of crashes in the browser process during the 53.0b cycle, crashes with signatures starting with igd* are now 15% of browser crashes in 54.0b1.
http://bit.ly/2q0vGay shows the signatures with the most movement between the two beta cycles (i've added all signatures which share increased to this bug).
this is happening on all supported windows versions and seems to originate from a couple of particular gpus:
Adapter device id facet
1 0x0116 794 33.32 %
2 0x0166 422 17.71 %
3 0x0152 318 13.34 %
4 0x0102 284 11.92 %
5 0x0106 271 11.37 %
6 0x0126 208 8.73 %
7 0x0156 61 2.56 %
8 0x0162 17 0.71 %
9 0x0122 5 0.21 %
10 0x0112 3 0.13 %
adding a ni for milan to make sure that the proper people are looking into this :-)
Flags: needinfo?(milan)
Reporter | ||
Comment 1•8 years ago
|
||
a good junk of the increase seems to come from reports with IMFYCbCrImage::GetD3D11TextureData in the stack, so maybe that's related to bug 1300699?
Bug 1300699 was hopefully just unit test code, but there are enough changes in there that I'd like that confirmed - Kevin, Jerry, did bug 1300699 patches contain any changes to the code that runs outside of the unit tests, even if those changes "should not make any difference"?
Flags: needinfo?(milan)
Flags: needinfo?(kechen)
Flags: needinfo?(hshih)
Looks like bug 1348320
Reporter | ||
Comment 4•8 years ago
|
||
i forgot to add the query for the crashes mentioned in comment #1:
https://crash-stats.mozilla.com/search/?proto_signature=~IMFYCbCrImage%3A%3AGetD3D11TextureData&date=%3E%3D2016-12-01T00%3A00%3A00.000Z&_facets=signature&_facets=version&_facets=user_comments&_facets=adapter_vendor_id&_facets=build_id&_facets=install_time&_facets=platform_pretty_version&_facets=useragent_locale&_facets=release_channel&_facets=cpu_arch&_facets=adapter_driver_version&_facets=adapter_device_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version
Comment 6•8 years ago
|
||
After reviewing the patches in Bug 1300699 again, there are two minor changes outside of the unit test.
1. The timing of executing GetSize()[1] :
Before the patch, this method was involved when creating DXGIYCbCrTextureData[2] while the patch changed the function call to the beginning of the function. But, from my understanding, there was no chance to change the mSize between these two timing.
2. An additional call to GetContent/CompositorDevice in[3].
[1] https://hg.mozilla.org/mozilla-central/file/3f8ed553e230/gfx/layers/IMFYCbCrImage.cpp#l335
[2] https://hg.mozilla.org/mozilla-central/file/17853e696113/gfx/layers/IMFYCbCrImage.cpp#l331
[3] https://hg.mozilla.org/mozilla-central/file/3f8ed553e230/gfx/layers/IMFYCbCrImage.cpp#l227
Flags: needinfo?(kechen)
Comment 7•8 years ago
|
||
Track 54+ as the volume of crash in 54 is high.
Updated•8 years ago
|
status-firefox53:
--- → unaffected
Comment 8•8 years ago
|
||
Perhaps the increasing in 54.0b1 is related to Bug 1335971?
It seems like we still have the problem in Bug 1292923.
Hello Bas, any idea about this?
Flags: needinfo?(bas)
Reporter | ||
Comment 9•8 years ago
|
||
(In reply to Kevin Chen[:kechen] (UTC + 8) from comment #8)
> Perhaps the increasing in 54.0b1 is related to Bug 1335971?
yeah, that might be in line with coincidental crash stats data. the spike wasn't very visible during the 54 nightly cycle (maybe people there generally use newer hardware), except perhaps for the [@ igd10umd64.dll] signature which popped up somewhat more often starting with build 20170205030206 - that would fit well with when bug 1335971 landed: https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&process_type=browser&signature=igd10umd64.dll&date=%3E%3D2017-01-01T00%3A00%3A00.000Z#reports (sort by buildid).
could we attempt a backout of bug 1335971 from beta, as the crash volume continues to be really high there (16% of all browser crashes from 54.0b2 have a signature starting with igd*)?
The functionality from bug 1335971 is behind a preference gfx.direct3d11.allow-intel-mutex, which means we can turn this off using a system add-on. Probably less intrusive than backing out the code - Bas, I'm assuming disabling this would be the same as backing the code out.
Ryan, I can't recall where to start with the system add-on approach, and I'm assuming that's preferable from rel coordination point of view?
Flags: needinfo?(ryanvm)
Comment 11•8 years ago
|
||
We could ship a hotfix addon theoretically. But we also ship two beta builds a week, so wouldn't it be easier to just flip the pref on Beta and have it go out that way?
Flags: needinfo?(ryanvm)
Reporter | ||
Updated•8 years ago
|
Summary: Crash in igd10umd32.dll | CResource<T>::CLS::FinalConstruct → Increase in Intel graphics driver crashes in 54.0b cycle
Changing the pref on beta would be great.
Assignee: nobody → bas
Assignee | ||
Comment 13•8 years ago
|
||
Having looked at this a little more I have some doubts whether bug 1335971 is responsible. But let's flip the pref anyway and see.
Flags: needinfo?(bas)
Assignee | ||
Comment 14•8 years ago
|
||
Although.. maybe for the ones with IMFYCbCrImage::GetD3D11TextureData in it it is. So if those are where the increase is coming from it could work.
Comment hidden (mozreview-request) |
Comment 16•8 years ago
|
||
mozreview-review |
Comment on attachment 8863147 [details]
Bug 1359416: Disable uploading on client side on Intel devices on Beta.
https://reviewboard.mozilla.org/r/134972/#review138056
Attachment #8863147 -
Flags: review?(milan) → review+
Reporter | ||
Updated•8 years ago
|
Keywords: checkin-needed
Hi Bas, this was mentioned as a top crasher on Beta. If this fix helps, could you please nominate this patch for uplift to Beta55? Thanks!
Flags: needinfo?(bas)
Reporter | ||
Comment 18•8 years ago
|
||
...54 beta 5 :-)
Comment 19•8 years ago
|
||
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.
hg error in cmd: hg rebase -s a2479bb25aca -d 83fe6517d5be: rebasing 393202:a2479bb25aca "Bug 1359416: Disable uploading on client side on Intel devices on Beta. r=milan" (tip)
merging gfx/thebes/gfxPrefs.h
warning: conflicts while merging gfx/thebes/gfxPrefs.h! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Comment 20•8 years ago
|
||
In retrospect, comment 19 was silly because we only want this on Beta. Which means it needs approval first and shouldn't get autolanded once it has it.
Keywords: checkin-needed
Updated•8 years ago
|
Flags: needinfo?(hshih)
Assignee | ||
Comment 21•8 years ago
|
||
Comment on attachment 8863147 [details]
Bug 1359416: Disable uploading on client side on Intel devices on Beta.
Approval Request Comment
[Feature/Bug causing the regression]: 1335971
[User impact if declined]: Crashes
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: No
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: Reverting to old behavior.
[String changes made/needed]: None
Flags: needinfo?(bas)
Attachment #8863147 -
Flags: approval-mozilla-beta?
Comment 22•8 years ago
|
||
Comment on attachment 8863147 [details]
Bug 1359416: Disable uploading on client side on Intel devices on Beta.
Revert to old behavior and see how it goes in Beta54. Beta54+. Should be in 54 beta 5.
Attachment #8863147 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 23•8 years ago
|
||
bugherder uplift |
Comment 24•8 years ago
|
||
(In reply to Bas Schouten (:bas.schouten) from comment #21)
> [Is this code covered by automated tests?]: Yes
> [Has the fix been verified in Nightly?]: No
> [Needs manual test from QE? If yes, steps to reproduce]: No
Setting qe-verify- based on Bas' assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
Reporter | ||
Comment 25•7 years ago
|
||
toggling gfx.direct3d11.allow-intel-mutex to false in 54.0b5 worked very well - the intel driver crashes are back to a level we were used to see in prior versions.
Crash Signature: igd10umd32.dll | NDXGI::CDevice::SetSharedResourceCreationArgs ]
[@ igd10umd32.dll | CContext::ID3D11DeviceContext2_UpdateSubresource_<T> ]
[@ igd10umd32.dll | d3d11.dll@0xc0651 ] → igd10umd32.dll | NDXGI::CDevice::SetSharedResourceCreationArgs ]
[@ igd10umd32.dll | CContext::ID3D11DeviceContext2_UpdateSubresource_<T> ]
[@ igd10umd32.dll | d3d11.dll@0xc0651 ]
[@ memset | TextStageManager::MapTextureTransferSurface ]
Assignee | ||
Comment 26•7 years ago
|
||
(In reply to [:philipp] from comment #25)
> toggling gfx.direct3d11.allow-intel-mutex to false in 54.0b5 worked very
> well - the intel driver crashes are back to a level we were used to see in
> prior versions.
Any chance you could verify the patch in bug 1348320 has also been effective at reducing our crash rates for 55?
Flags: needinfo?(madperson)
Reporter | ||
Comment 27•7 years ago
|
||
curiously there's a minuscule volume of these crashes on nightly in the first place - i could only see a hand full of them on 55.0a1 vs many thousands on 54.0b (not sure why that is, maybe the affected intel hd 2500/3000/4000 devices aren't as present there).
so in order to verify the fix i think it would be either necessary to uplift the patch and flip back the preference for a trial run again or wait until the next cycle and see how things turn out then...
Flags: needinfo?(madperson)
Comment 28•7 years ago
|
||
[Tracking Requested - why for this release]: Topcrash in early Beta 55 builds
Comment 29•7 years ago
|
||
Based on comment 27 it sounds like bug 1348320 was not as effective as it was hoped. Should we land this patch here in 55 too?
Flags: needinfo?(bas)
Comment 30•7 years ago
|
||
... I meant comment 28.
Reporter | ||
Updated•7 years ago
|
Assignee | ||
Comment 31•7 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #29)
> Based on comment 27 it sounds like bug 1348320 was not as effective as it
> was hoped. Should we land this patch here in 55 too?
It would be a pretty big perf setback. We should probably figure out what's causing beta to be disproportionately affected by this and blacklist specifically the set of hardware causing this.
Flags: needinfo?(bas)
Reporter | ||
Comment 32•7 years ago
|
||
so far these are some correlations from the 55.0b rollout:
Platform pretty version facet
1 Windows 10 284 90.45 %
2 Windows 7 21 6.69 %
3 Windows 8.1 6 1.91 %
4 Windows 8 3 0.96 %
Adapter device id facet
1 0x0116 180 57.32 %
2 0x0126 75 23.89 %
3 0x0102 24 7.64 %
4 0x0106 18 5.73 %
5 0x0166 12 3.82 %
6 0x0152 4 1.27 %
7 0x0112 1 0.32 %
Adapter driver version facet
1 9.17.10.4459 248 78.98 %
2 9.17.10.4229 36 11.46 %
3 8.15.10.2418 18 5.73 %
4 9.17.10.2770 2 0.64 %
5 9.17.10.2828 2 0.64 %
...
this is the crash stats query i've used for this: http://bit.ly/2stomHP
Reporter | ||
Updated•7 years ago
|
Summary: Increase in Intel graphics driver crashes in 54.0b cycle → Increase in Intel graphics driver crashes with gfx.direct3d11.allow-intel-mutex on
Comment 33•7 years ago
|
||
This is a top crash in beta, I think we should land the pref flip now, rather than keeping this crash around for weeks when we know we have a workaround.
Reporter | ||
Comment 34•7 years ago
|
||
milan, could you make the the pref flip happen again?
the issue is hanging around on 55.0b for a while now already & while the volume of this bug seems less severe than it has been in the 54.0b cycle, with ~200 daily crashes on beta it's still one of the top issues and crash rates in 55 are generally higher than we'd like to see...
Flags: needinfo?(milan)
If we block gen6/sandybridge, we'll take care of #1, #2, #3, #4, #7 in the list in comment 32. That's about 11% of all the users; I don't know what percentage of Windows 10 users that is.
Bas, what do you think about blocking those users?
Flags: needinfo?(milan) → needinfo?(bas)
Assignee | ||
Comment 36•7 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #35)
> If we block gen6/sandybridge, we'll take care of #1, #2, #3, #4, #7 in the
> list in comment 32. That's about 11% of all the users; I don't know what
> percentage of Windows 10 users that is.
> Bas, what do you think about blocking those users?
Taking the perf improvement from 11% of the user is definitely better than taking it from all of them!
Flags: needinfo?(bas)
Comment 37•7 years ago
|
||
Any chance we can get either the pref flip or blocking a subset of chips done this week?
Flags: needinfo?(bas)
Assignee | ||
Comment 38•7 years ago
|
||
Jeff, can you send me the list of device IDs that constitute Tesla devices?
Flags: needinfo?(bas) → needinfo?(jmuizelaar)
Comment 39•7 years ago
|
||
The list is here in the Tesla section: https://github.com/jrmuizel/gpu-db/blob/master/nvidia.json
Flags: needinfo?(jmuizelaar)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 41•7 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #39)
> The list is here in the Tesla section:
> https://github.com/jrmuizel/gpu-db/blob/master/nvidia.json
I posted this on the wrong bug, but thanks!
Assignee | ||
Comment 42•7 years ago
|
||
mozreview-review |
Comment on attachment 8887673 [details]
Bug 1359416: Allow blocklisting of D3D11 keyed mutex and block it for Intel Gen 6.
https://reviewboard.mozilla.org/r/158560/#review164116
Attachment #8887673 -
Flags: review?(bas) → review+
Updated•7 years ago
|
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 45•7 years ago
|
||
Pushed by msreckovic@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4d83b498dc00
Allow blocklisting of D3D11 keyed mutex and block it for Intel Gen 6. r=bas
I had to back this out for various test failures with assertions like https://treeherder.mozilla.org/logviewer.html#?job_id=116542086&repo=autoland
Flags: needinfo?(bas)
Comment 47•7 years ago
|
||
Backout by kwierso@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ea24b188a4b0
Backed out changeset 4d83b498dc00 for assertions in ServoStyleSet a=backout CLOSED TREE
The preferences now get first accessed off the main thread, and had no chance to be initialized. I'll take a look.
Updated•7 years ago
|
Flags: needinfo?(bas) → needinfo?(milan)
Updated•7 years ago
|
Flags: needinfo?(milan)
Comment hidden (mozreview-request) |
(In reply to Milan Sreckovic [:milan] from comment #50)
> Comment on attachment 8887673 [details]
> Bug 1359416: Allow blocklisting of D3D11 keyed mutex and block it for Intel
> Gen 6.
>
> Review request updated; see interdiff:
> https://reviewboard.mozilla.org/r/158560/diff/3-4/
Changed to David as Bas already reviewed the other part, and this deals with the whole gfxVars story.
Comment 52•7 years ago
|
||
mozreview-review |
Comment on attachment 8887673 [details]
Bug 1359416: Allow blocklisting of D3D11 keyed mutex and block it for Intel Gen 6.
https://reviewboard.mozilla.org/r/158560/#review166368
Attachment #8887673 -
Flags: review?(dvander) → review+
Comment 53•7 years ago
|
||
Pushed by msreckovic@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/38778100700e
Allow blocklisting of D3D11 keyed mutex and block it for Intel Gen 6. r=bas,dvander
Comment 54•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Updated•7 years ago
|
status-firefox-esr52:
--- → unaffected
Reporter | ||
Comment 55•7 years ago
|
||
thanks for the patch, could you request uplift to beta if you deem fit to do so?
Flags: needinfo?(milan)
Comment on attachment 8887673 [details]
Bug 1359416: Allow blocklisting of D3D11 keyed mutex and block it for Intel Gen 6.
Approval Request Comment
[Feature/Bug causing the regression]:
[User impact if declined]: Users that should be blocked get the feature.
[Is this code covered by automated tests?]:
[Has the fix been verified in Nightly?]: It is probably too early to see the trends already.
[Needs manual test from QE? If yes, steps to reproduce]:
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: Going back to old functionality, low risk.
[Why is the change risky/not risky?]:
[String changes made/needed]:
Flags: needinfo?(milan)
Attachment #8887673 -
Flags: approval-mozilla-beta?
In addition to the above comments, another reason to take the patch into beta - this gives us the ability to use the downloadable blocklist to turn this functionality off remotely, if any additional troublesome configurations present themselves.
Comment 58•7 years ago
|
||
Comment on attachment 8887673 [details]
Bug 1359416: Allow blocklisting of D3D11 keyed mutex and block it for Intel Gen 6.
add blocklist for a d3d11 feature on intel gen 6 to try and work around crashes. should be in 55.0b13
Attachment #8887673 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 59•7 years ago
|
||
bugherder uplift |
Reporter | ||
Updated•7 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Blocks: gfx-driver-bug
You need to log in
before you can comment on or make changes to this bug.
Description
•