Closed
Bug 1063048
Opened 10 years ago
Closed 10 years ago
Firefox 32 startup crash in _VEC_memzero | _VEC_memzero
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
mozilla35
People
(Reporter: kairo, Assigned: bjacob)
References
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
patch (deleted): bas.schouten: review+, lmandel: approval-mozilla-aurora+, lmandel: approval-mozilla-beta+, lmandel: approval-mozilla-release+
This bug was filed from the Socorro interface and is
report bp-5e80cac5-a6e4-4d8d-9c98-91d852140902.
=============================================================
In early Firefox 32 stats, we have a startup crash in "_VEC_memzero | _VEC_memzero" at #2 in the topcrash list, at ~40% the rate of the leading OOM|small signature.
See https://crash-stats.mozilla.com/report/list?signature=_VEC_memzero+|+_VEC_memzero&product=Firefox&process_type=browser&version=Firefox%3A32.0
Is this a resurrection of bug 988549 or something else?
The crash reasons seem to all be EXCEPTION_ACCESS_VIOLATION_WRITE and the address patterns sound familiar, possibly from that older bug.
Reporter
Comment 1•10 years ago
[Tracking Requested - why for this release]:
#2 top crash in early 32 data.
David, is this related to bug 1062452 and bug 1063052 potentially?
Benoit, is this a similar issue to bug 988549?
Assignee
Comment 2•10 years ago
No idea, the stack really doesn't tell us much, and _VEC_memzero is very unspecific: it just means we're crashing while zeroing some buffer. Perhaps the best way to compare it to past gfx bugs would be to see how it correlates with AdapterVendorID / AdapterDeviceID.
Flags: needinfo?(bjacob)
Comment 3•10 years ago
Yes, it may be related to bug 1062452; these are mostly switchable Intel+ATI:
_VEC_memzero | _VEC_memzero|EXCEPTION_ACCESS_VIOLATION_WRITE (267 crashes)
92% (246/267) vs. 8% (3812/46772) atiuxpag.dll
98% (261/267) vs. 16% (7352/46772) d3d10.dll
98% (261/267) vs. 16% (7352/46772) d3d10core.dll
100% (267/267) vs. 23% (10944/46772) igd10umd32.dll
Flags: needinfo?(dmajor)
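The module correlation lines above compare how often a module is loaded in crashes with this signature against how often it is loaded in all crashes. A minimal sketch of that arithmetic follows; the struct and function names are hypothetical and are not part of Socorro, and rounding to the nearest integer is an assumption that happens to match the figures quoted in this bug.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical record for one module in a correlation report.
struct ModuleCount {
    std::string module;
    long inSignature;     // crashes with this signature that loaded the module
    long signatureTotal;  // all crashes with this signature
    long inAll;           // crashes of any signature that loaded the module
    long allTotal;        // all crashes in the comparison set
};

// Percentage rounded to the nearest integer (assumed rounding rule).
int RoundedPercent(long part, long total) {
    if (total <= 0) {
        return 0;
    }
    return static_cast<int>(std::lround(100.0 * part / total));
}

// Formats one row in the "92% (246/267) vs. 8% (3812/46772) module" style.
std::string FormatRow(const ModuleCount& m) {
    char buf[256];
    std::snprintf(buf, sizeof(buf), "%d%% (%ld/%ld) vs. %d%% (%ld/%ld) %s",
                  RoundedPercent(m.inSignature, m.signatureTotal),
                  m.inSignature, m.signatureTotal,
                  RoundedPercent(m.inAll, m.allTotal),
                  m.inAll, m.allTotal, m.module.c_str());
    return buf;
}
```

A large gap between the two percentages, as with atiuxpag.dll here, is what makes a module interesting; a module that is common everywhere tells us little.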
Updated•10 years ago
status-firefox32: --- → affected
Comment 4•10 years ago
Something that caught my eye is that this is 100% Win7 RTM (not SP1). That means these machines have a known crashy driver version -- see bug 988549 comment 34.
100% (879/880) vs. 23% (12843/55579) igd10umd32.dll
8% (69/880) vs. 0% (121/55579) 8.15.10.2125
92% (808/880) vs. 2% (1017/55579) 8.15.10.2141
Benoit, didn't we get the blacklisting for these versions sorted out?
Flags: needinfo?(bjacob)
Comment 5•10 years ago
Also: this is 97% 0x8086/0x0046 "Intel Graphics Media Accelerator HD". A spot-check shows D2D+ on all.
Assignee
Comment 6•10 years ago
(In reply to David Major [:dmajor] from comment #4)
> Something that caught my eye is that this is 100% Win7 RTM (not SP1). That
> means these machines have a known crashy driver version -- see bug 988549
> comment 34.
>
> 100% (879/880) vs. 23% (12843/55579) igd10umd32.dll
> 8% (69/880) vs. 0% (121/55579) 8.15.10.2125
> 92% (808/880) vs. 2% (1017/55579) 8.15.10.2141
>
> Benoit, didn't we get the blacklisting for these versions sorted out?
(In reply to David Major [:dmajor] from comment #5)
> Also: this is 97% 0x8086/0x0046 "Intel Graphics Media Accelerator HD". A
> spot-check shows D2D+ on all.
Indeed, on the release channel, versions < 8.15.10.2202 should be blacklisted:
http://hg.mozilla.org/releases/mozilla-release/file/tip/widget/windows/GfxInfo.cpp#l936
as device 0x0046 here falls under the "4500HD" category:
http://hg.mozilla.org/releases/mozilla-release/file/tip/widget/xpwidgets/GfxDriverInfo.cpp#l144
So it is mysterious why these would have D2D+. Do these machines have a second GPU (as reported in App Notes)? Why does the system in comment 0 have an AMD GPU?
Flags: needinfo?(bjacob)
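To make the "versions < 8.15.10.2202" rule concrete, here is a minimal sketch of a four-part driver version comparison. It is not the code in GfxInfo.cpp: the function names are hypothetical, the 16-bits-per-part packing is an assumption, and treating an unparseable version as blocked is an illustrative policy choice.

```cpp
#include <cstdint>
#include <cstdio>

// Packs a.b.c.d into one 64-bit value so that plain integer comparison
// orders versions correctly. Assumes each part fits in 16 bits.
constexpr std::uint64_t PackVersion(std::uint32_t a, std::uint32_t b,
                                    std::uint32_t c, std::uint32_t d) {
    return (std::uint64_t(a) << 48) | (std::uint64_t(b) << 32) |
           (std::uint64_t(c) << 16) | std::uint64_t(d);
}

// Parses a string such as "8.15.10.2141"; returns false on malformed input.
bool ParseVersion(const char* text, std::uint64_t* out) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    char trailing = 0;
    // Exactly four numeric fields and no trailing character are accepted.
    if (std::sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing) != 4) {
        return false;
    }
    if (a > 0xFFFF || b > 0xFFFF || c > 0xFFFF || d > 0xFFFF) {
        return false;
    }
    *out = PackVersion(a, b, c, d);
    return true;
}

// Threshold quoted in this comment for the affected Intel device family.
constexpr std::uint64_t kFirstGoodVersion = PackVersion(8, 15, 10, 2202);

// True if the driver version falls under the "< 8.15.10.2202" rule.
bool IsBelowThreshold(const char* driverVersion) {
    std::uint64_t v = 0;
    // Hypothetical policy: an unparseable version is treated as blocked.
    return !ParseVersion(driverVersion, &v) || v < kFirstGoodVersion;
}
```

Both DLL versions seen in comment 4 (8.15.10.2125 and 8.15.10.2141) fall below the threshold under this comparison, which is why D2D+ on these machines is surprising.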
Comment 7•10 years ago
> So it is mysterious why these would have D2D+. Do these machines have a
> second GPU (as reported in App Notes) ? Why does the system in comment 0
> have an AMD GPU?
Comment 3 shows most of these having an ATI module loaded. I don't see dual GPU in App Notes, though. I do see "DriverVersionMismatch" on all of them:
https://crash-stats.mozilla.com/search/?version=32.0&signature=%3D_VEC_memzero+|+_VEC_memzero&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=app_notes
Assignee
Comment 8•10 years ago
(In reply to David Major [:dmajor] from comment #7)
> I do see "DriverVersionMismatch" on all of them:
Dang! Excellent find. We used to blacklist on that condition. The second patch on bug 984417 changed that so that we only report it in AppNotes, which is what you saw there, and no longer blacklist.
As a hot fix for Firefox 32, we can easily revert to that behavior. That is, just back out https://hg.mozilla.org/mozilla-central/rev/35ff4bfb198f .
On mozilla-central and IMHO aurora at least, we should go for something smarter than that.
The issue at hand here is that the driver version is given to us in two different places: 1) from the Windows registry, as we report in AppNotes, and 2) from the DLL, as you examined in comment 4.
Our blacklisting logic, which also writes these AppNotes, uses only the value from the Windows registry.
We've long known that the value from the Windows registry is sometimes wrong, i.e. different from the value in the DLL. That's what we call "DriverVersionMismatch".
We used to blacklist Direct2D whenever a DriverVersionMismatch happened, which is why we didn't crash. Going back to that behavior is the easy fix for a 32 chemspill. But going forward, we really should stop using this unreliable value from the registry, since we have the value from the DLL anyway.
This is nontrivial, because there are other things that we don't know how to do without the Windows registry. For example, how to get the GPU device id.
Well, we do know how to do that: it's what we've been doing on desktop Linux, where we start a separate process that queries the information by creating an actual device/context. But we've never prioritized doing the right thing there.
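The two-source problem described in this comment can be sketched as follows. This is an illustration only: the struct, the function names, and the rule that a missing DLL version counts as "no mismatch" are assumptions, not the logic in GfxInfo.cpp.

```cpp
#include <string>

// Hypothetical pair of version strings for the same graphics driver.
struct DriverVersionSources {
    std::string fromRegistry;  // value used by the blacklist and shown in AppNotes
    std::string fromDll;       // version of the driver DLL; empty if unknown
};

// A mismatch means both values are known and they disagree.
// Assumption: with no DLL version there is nothing to compare against.
bool HasDriverVersionMismatch(const DriverVersionSources& s) {
    return !s.fromDll.empty() && s.fromDll != s.fromRegistry;
}

// The longer-term idea from this comment: prefer the DLL's version and fall
// back to the registry value only when the DLL version is unavailable.
const std::string& VersionToTrust(const DriverVersionSources& s) {
    return s.fromDll.empty() ? s.fromRegistry : s.fromDll;
}
```

Under the restored behavior, any mismatch is enough to disable the affected features regardless of which value is correct; the follow-up work would instead feed the trusted value into the normal version check.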
Assignee
Comment 9•10 years ago
See the comment above. The behavior that we sadly have to revert to is a mess, but it is made necessary by the Windows registry containing unreliable information. I will file a follow-up bug to stop depending on the Windows registry for blacklisting, but that is a nontrivial engineering project. The present backout patch is the only reasonable thing to do for the aurora/beta/release channels.
Attachment #8486886 -
Flags: review?(bas)
Assignee
Comment 10•10 years ago
Filed bug 1065212 about no longer relying on the Windows registry to get this information.
Comment 11•10 years ago
Comment on attachment 8486886 [details] [diff] [review]
backout-unblacklisting-DriverVersionMismatch
Review of attachment 8486886 [details] [diff] [review]:
-----------------------------------------------------------------
::: widget/windows/GfxInfo.cpp
@@ +1029,5 @@
> +
> + if (mHasDriverVersionMismatch) {
> + if (aFeature == nsIGfxInfo::FEATURE_DIRECT3D_10_LAYERS ||
> + aFeature == nsIGfxInfo::FEATURE_DIRECT3D_10_1_LAYERS ||
> + aFeature == nsIGfxInfo::FEATURE_DIRECT2D)
Note we're not blocking D3D_11, which we maybe should do. But let's try this first and see if we run into trouble.
Attachment #8486886 -
Flags: review?(bas) → review+
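The review excerpt above shows only the condition of the restored check. A self-contained sketch of the same gating idea is given below; the enum is a stand-in for the nsIGfxInfo feature constants, and the alsoBlockD3D11 switch is a hypothetical parameter that models the open question about D3D 11 rather than anything in the patch.

```cpp
// Stand-ins for the nsIGfxInfo::FEATURE_* constants named in the patch.
enum class Feature {
    Direct2D,
    Direct3D10Layers,
    Direct3D10_1Layers,
    Direct3D11Layers,
    Other  // any feature the mismatch check does not touch
};

// Returns true if the feature should be blocked because the registry and
// DLL driver versions disagree.
bool BlockedByVersionMismatch(bool hasDriverVersionMismatch, Feature feature,
                              bool alsoBlockD3D11) {
    if (!hasDriverVersionMismatch) {
        return false;
    }
    switch (feature) {
        case Feature::Direct3D10Layers:
        case Feature::Direct3D10_1Layers:
        case Feature::Direct2D:
            return true;  // the three features listed in the excerpt
        case Feature::Direct3D11Layers:
            return alsoBlockD3D11;  // hypothetical toggle for the D3D 11 case
        default:
            return false;
    }
}
```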
Assignee
Comment 12•10 years ago
Comment on attachment 8486886 [details] [diff] [review]
backout-unblacklisting-DriverVersionMismatch
Note: OK, for channels that have DIRECT3D_11, I will add it and consider that the r+ extends to it.
Approval Request Comment
[Feature/regressing bug #]: bug 984417
[User impact if declined]: lots of crashes - enough to be a chemspill driver
[Describe test coverage new/current, TBPL]: none, in fact none of our Windows test slaves uses Intel graphics, afaik.
[Risks and why]: very, very low risk: just backing out a small, simple patch.
[String/UUID change made/needed]: none
Attachment #8486886 -
Flags: approval-mozilla-release?
Attachment #8486886 -
Flags: approval-mozilla-beta?
Attachment #8486886 -
Flags: approval-mozilla-aurora?
Comment 13•10 years ago
Benoit - Great that you have a simple patch to fix this bug and that you know the long term solution as well (bug 1065212).
I don't see many crashes with this signature on Aurora (30), Nightly (4), or Beta (0). It may be worth landing on Aurora to see if the crash volume drops to zero but we will have to wait a few days for results. The point of waiting is to confirm to the best of our ability that this fix really does address the issue before pushing out 32.0.1.
Comment 14•10 years ago
Let's ignore the specific signature for the moment, since that can change a lot across channels.
Here's a list of the crashes which have DriverVersionMismatch in them in FF32:
https://crash-stats.mozilla.com/search/?version=32.0&app_notes=DriverVersionMismatch&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=app_notes
Here's the same list in recent nightly:
https://crash-stats.mozilla.com/search/?release_channel=nightly&app_notes=DriverVersionMismatch&build_id=%3E20140901000000&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=app_notes
Ignoring bug 1062612 which is unrelated, we could check whether this patch improves OOM | Small or the abort rates on trunk. But that data will take several days to be reliable, and I think we'd be better off just doing 32.0.1 with this, and updating existing 32 users first to see if this fixes the regression.
Reporter
Comment 16•10 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #14)
> Let's ignore the specific signature for the moment, since that can change a
> lot across channels.
Well, the signature in this bug makes sense because we know from other blocklisting issues (see bug 988549) that older Intel drivers crash with this signature when we enable D2D.
> Here's the same list in recent nightly:
Most of those are actually crashes that we know have different causes and just happen to people with driver version mismatches as well.
That said, if this has an impact on OOM|small, I'd love to see that. :)
Comment 17•10 years ago
Comment on attachment 8486886 [details] [diff] [review]
backout-unblacklisting-DriverVersionMismatch
This is the driver for the 32.0.1 desktop release. Approving the backout for aurora, beta, and release.
Attachment #8486886 -
Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #8486886 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8486886 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 18•10 years ago
https://hg.mozilla.org/releases/mozilla-release/rev/227d1d0bf16b
Queued for Aurora/Beta as well.
Assignee: nobody → bjacob
status-firefox33: --- → affected
status-firefox34: --- → affected
status-firefox35: --- → fixed
status-firefox-esr31: --- → unaffected
Comment 20•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Reporter
Comment 22•10 years ago
I can verify that this is fixed in 32.0.1, judging by the crash data from over the weekend.