Closed Bug 711656 Opened 13 years ago Closed 13 years ago

Firefox 11.0a1 Crash in TextStageManager::AddStagesForSubrect with Intel GMA X4500HD, 4500MHD, and HD Graphics up to 8.15.10.2141

Categories

(Core :: Graphics, defect)

11 Branch
All
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla13
Tracking Status
firefox10 --- unaffected
firefox11 + verified
firefox12 + fixed
firefox13 --- fixed
firefox-esr10 --- unaffected

People

(Reporter: marcia, Assigned: drs)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [startupcrash][qa-] Keep Open!!)

Crash Data

Attachments

(8 files, 4 obsolete files)

(deleted), patch
joe
: review+
Details | Diff | Splinter Review
(deleted), patch
jrmuizel
: review+
Details | Diff | Splinter Review
(deleted), patch
bjacob
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
bas.schouten
: review+
Details | Diff | Splinter Review
(deleted), image/png
Details
(deleted), image/png
Details
(deleted), image/png
Details
Seen while looking at Windows crash stats. https://crash-stats.mozilla.com/report/list?signature=__entry_from_strcat_in_strcpy. Crashes started in crash stats using the 2011120100 build, but there was a rather large spike on 2011121500 with 116 crashes. Possible pushlog regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cc94a16983b0&tochange=e9686560b98d Frame Module Signature Source 0 msvcrt.dll __entry_from_strcat_in_strcpy 1 d2d1.dll TextStageManager::AddStagesForSubrect(IDWritePrivateGlyphRunAnalysis*,SubRectInfo const*) 2 msvcrt.dll free 3 DWrite.dll DWriteGlyphRunAnalysis::GetAlphaTextureBounds(DWRITE_TEXTURE_TYPE,tagRECT*) 4 DWrite.dll GlyphRunAnalysis::GlyphRunAnalysis(ClientSideCacheContext&,FontFaceElement const&,DWRITE_FONT_SIMULATIONS,DWRITE_GLYPH_RUN const*,float,DWRITE_MATRIX const*,DWRITE_RENDERING_MODE,DWRITE_MEASURING_MODE,float,float) 5 d2d1.dll HwGlyphRunRealizer::IssueRenderingCommands(BatchedBrush*)
I added related crash signatures that started in 11.0a1/20111215.
Crash Signature: [@ __entry_from_strcat_in_strcpy] [@ memcpy | TextStageManager::AddStagesForSubrect(IDWritePrivateGlyphRunAnalysis*, SubRectInfo const*)] [@ memset | TextStageManager::AddStagesForSubrect(IDWritePrivateGlyphRunAnalysis*, SubRectInfo const*)]
Summary: Firefox 11.0a1 Crash [@ __entry_from_strcat_in_strcpy ] → Firefox 11.0a1 Crash in TextStageManager::AddStagesForSubrect
The stacks here look fairly bogus to me, unfortunately. Trying to guess what could be going on... the crashes I looked at are all EXCEPTION_ACCESS_VIOLATION_WRITE, and virtually all of them occured at a page boundary (the only exception I saw was at 0x20, which sounds like it's related to an unexpectedly-NULL pointer). For the rest of them, I'm guessing that someone is writing beyond an allocated buffer, perhaps corrupting something else that follows, but crashing when they reach the end of the writable page. As a precaution, I'm marking this as security-sensitive, given that it looks like there's code writing to memory it shouldn't be touching.... if someone figures out the specific scenario where this happens, they might be able to control what gets corrupted, and hence arrive at some kind of exploit. The dwrite-related addresses on the stack seem to imply involvement of font/text-rendering code, even though I don't actually believe the backtraces are accurate. Most of the crashes are happening at startup, and one has a comment "facebook" .... Looking at the possible pushlog range suggested in comment #0, I'm concerned there might be a link with the system-font-fallback telemetry that was added in bug 705590 (though I don't actually see a problem in the patch there). Hitting font fallback on facebook (due to the other-language links) at startup is likely very common.
Group: core-security
Whiteboard: sg:critical?
I don't think this is something due to the Telemetry probe in system font fallback that landed 11/30, there are reports of this happening as far back as June but the problem seems to have grown more frequent around 12/15, as noted in comment 1. The crashes after 12/15 all seem to happen very close to startup time (uptime = 1, 2, 3, 6, 7, etc.). DirectWrite GlyphRunAnalysis objects are used in drawing. Bas's patch for bug 702851 was landed around 12/15, maybe the patch there somehow tickled the code that causes this crash?
I pulled down 167 dumps from the past four weeks. All of these seem to contain the following stack on one of the non-crashing threads: 0 ntdll.dll ZwGetContextThread 1 KERNELBASE.dll KERNELBASE.dll@0x6249f 2 KERNELBASE.dll KERNELBASE.dll@0x518a 3 kernel32.dll kernel32.dll@0x4addb 4 dbghelp.dll Win32LiveSystemProvider::EnumModules 5 KERNELBASE.dll KERNELBASE.dll@0x13240 6 kernel32.dll kernel32.dll@0xe5eb 7 ntdll.dll RtlAllocateMemoryBlockLookaside 8 dbghelp.dll NtWin32LiveSystemProvider::EnumFunctionTables 9 ntdll.dll RtlNtStatusToDosErrorNoTeb 10 ntdll.dll RtlNtStatusToDosErrorNoTeb 11 dbghelp.dll AllocMemory 12 dbghelp.dll GenGetDebugRecord 13 KERNELBASE.dll KERNELBASE.dll@0xc095 14 dbghelp.dll GenAllocateModuleObject 15 KERNELBASE.dll KERNELBASE.dll@0x1852 16 kernel32.dll kernel32.dll@0x22a60 17 dbghelp.dll GenGetAuxMemory 18 dbghelp.dll GenGetProcessInfo 19 ntdll.dll RtlDosApplyFileIsolationRedirection_Ustr Anyone have a clue as to what this might be? I don't see this in other crashdumps.
I'm guessing Bas would be the best person to track this down.
Assignee: nobody → bas.schouten
That doesn't ring any bells. Can you pull one of these dumps into Visual Studio or WinDBG and get a better stack for that thread?
The crash volume is now the same as before the spike that has decreased slowly. I added the _VEC_memzero crash signature which has similar stack traces and spiked at the same time.
Crash Signature: [@ __entry_from_strcat_in_strcpy] [@ memcpy | TextStageManager::AddStagesForSubrect(IDWritePrivateGlyphRunAnalysis*, SubRectInfo const*)] [@ memset | TextStageManager::AddStagesForSubrect(IDWritePrivateGlyphRunAnalysis*, SubRectInfo const*)] → [@ __entry_from_strcat_in_strcpy] [@ memcpy | TextStageManager::AddStagesForSubrect(IDWritePrivateGlyphRunAnalysis*, SubRectInfo const*)] [@ memset | TextStageManager::AddStagesForSubrect(IDWritePrivateGlyphRunAnalysis* SubRectInfo const*)] [@ _VEC_mem…
If this now occurs at startup for some configurations, then we may have forced these users to not use Nightly, hence the drop in crashes.
Adding the top crash keyword - #8 on the trunk.
Keywords: topcrash
The _VEC_memzero crash signature is #6 top crasher in 11.0a2 and #30 in 12.0a1. The first frames of the stack trace are: Frame Module Signature [Expand] Source 0 msvcrt.dll _VEC_memzero 1 d2d1.dll TextStageManager::MapTextureTransferSurface 2 d2d1.dll TextStageManager::AddStagesForSubrect 3 d2d1.dll HwGlyphRunRealizer::IssueRenderingCommands 4 d2d1.dll CHwSurfaceRenderTarget::DrawGlyphRunInternal 5 d2d1.dll CCommand_DrawGlyphRun::Execute 6 d2d1.dll CHwSurfaceRenderTarget::ProcessBatch 7 d2d1.dll CBatchSerializer::FlushInternal 8 d2d1.dll CBatchSerializer::FlushInternalAndTryIncreaseStorage 9 d2d1.dll DrawingContext::SetTextAntialiasMode 10 d2d1.dll D2DRenderTargetBase<ID2D1DCRenderTarget>::SetTextAntialiasMode 11 xul.dll _cairo_dwrite_show_glyphs_on_d2d_surface gfx/cairo/cairo/src/cairo-d2d-surface.cpp:4099 12 xul.dll _cairo_d2d_show_glyphs gfx/cairo/cairo/src/cairo-d2d-surface.cpp:4205 13 xul.dll _cairo_surface_show_text_glyphs gfx/cairo/cairo/src/cairo-surface.c:2768 14 xul.dll _cairo_gstate_show_text_glyphs gfx/cairo/cairo/src/cairo-gstate.c:1998 15 xul.dll GlyphBuffer::Flush gfx/thebes/gfxFont.cpp:1135 16 xul.dll gfxFont::Draw gfx/thebes/gfxFont.cpp:1328 17 xul.dll gfxTextRun::Draw gfx/thebes/gfxFont.cpp:3679 18 xul.dll nsTextFrame::DrawTextRun layout/generic/nsTextFrameThebes.cpp:5345 19 xul.dll nsTextFrame::DrawText layout/generic/nsTextFrameThebes.cpp:5467 20 xul.dll nsTextFrame::PaintText layout/generic/nsTextFrameThebes.cpp:5333 21 xul.dll nsDisplayText::Paint layout/generic/nsTextFrameThebes.cpp:4177 22 xul.dll mozilla::FrameLayerBuilder::DrawThebesLayer layout/base/FrameLayerBuilder.cpp:2120 23 xul.dll mozilla::layers::ThebesLayerD3D10::DrawRegion gfx/layers/d3d10/ThebesLayerD3D10.cpp:406 24 xul.dll mozilla::layers::ThebesLayerD3D10::Validate gfx/layers/d3d10/ThebesLayerD3D10.cpp:281 25 xul.dll mozilla::layers::ContainerLayerD3D10::Validate gfx/layers/d3d10/ContainerLayerD3D10.cpp:380 26 xul.dll mozilla::layers::ContainerLayerD3D10::Validate gfx/layers/d3d10/ContainerLayerD3D10.cpp:382 27 xul.dll mozilla::layers::LayerManagerD3D10::Render gfx/layers/d3d10/LayerManagerD3D10.cpp:686 28 xul.dll mozilla::layers::LayerManagerD3D10::EndTransaction gfx/layers/d3d10/LayerManagerD3D10.cpp:355 29 xul.dll nsDisplayList::PaintForFrame layout/base/nsDisplayList.cpp:633 30 xul.dll nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:1700 More reports at: https://crash-stats.mozilla.com/report/list?signature=_VEC_memzero
An unknown bug added _VEC_memzero to the skiplist (see https://github.com/mozilla/socorro/blob/master/scripts/config/processorconfig.py.dist). Comments for crashes not appearing on startup let me thinks it's a dupe of bug 635464: "switched graphics card. (amd 6650->intel i5 built-in)" "switched graphics card" Startup crashes might be caused by an automatic GPU switch.
Crash Signature: SubRectInfo const*)] [@ _VEC_memzero] → SubRectInfo const*)] [@ _VEC_memzero] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface]
(In reply to Scoobidiver from comment #13) > An unknown bug added _VEC_memzero to the skiplist bug 715921 (as mentioned in the commit message for this line - https://github.com/mozilla/socorro/commit/ef1bfa72005612d560bd4feea7a1fe93bf8e8a88 ) as requested by :mats.
Depends on: 715921
Crash Signature: SubRectInfo const*)] [@ _VEC_memzero] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface] → SubRectInfo const*)] [@ _VEC_memzero] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)]
93% of crashes happen within one minute. It's #3 top crasher in 11.0b1. I checked manually some crash reports and they all happens with Intel GPUs: 2e22 (GMA X4500HD), 2a42 (GMA 4500MHD), 0042 (desktop HD Graphics), 0046 (mobile HD Graphics). When it's a single GPU, driver versions are up to 8.15.10.2141. The latest driver version is 8.15.10.2555 or 2622 depending on the device ID. When it's a dual GPU with an ATI GPU, driver versions are up to 8.752.4.0. If there was a downloaded blocklist, it should be implemented before the final release. For new installations, blocklist.xml should be already up-to-date in the install folder which is not the case. Indeed, it's updated in the profile folder after the first startup. More reports at: https://crash-stats.mozilla.com/report/list?signature=_VEC_memzero%20|%20TextStageManager%3A%3AMapTextureTransferSurface%28D2D_RECT_U%20const%26%2C%20unsigned%20char**%2C%20unsigned%20int*%29
Blocks: 605779
Summary: Firefox 11.0a1 Crash in TextStageManager::AddStagesForSubrect → Firefox 11.0a1 Crash in TextStageManager::AddStagesForSubrect with Intel GMA X4500HD, 4500MHD, and HD Graphics up to 8.15.10.2141
Whiteboard: sg:critical? → sg:critical?[startupcrash]
Version: Trunk → 11 Branch
These Intel GPUs are already D2D-blocklisted in hard code for these driver versions. bp-7c333271-3b23-4730-8ebd-0040e2120207: User Comments update to Firfox 11 beta 1 and it is crashing on my Windows 7 environment App Notes AdapterVendorID: 0x8086, AdapterDeviceID: 0x0046, AdapterSubsysID: 043f1028, AdapterDriverVersion: 8.15.10.2141 D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ According to that, I don't think users set gfx.direct2d.force-enabled to true. I think it's a regression from bug 704710 that landed in 11.0a1/20111215 and that has canceled hard-coded blocklists.
Blocks: 704710
Correlations confirm that the D2D-blocklist for Intel GMA 4500 and HD is gone when Device and vendor IDs were converted to string: 100% (633/633) vs. 8% (2348/30240) igd10umd32.dll 35% (219/633) vs. 1% (242/30240) 8.15.10.2125 1% (5/633) vs. 0% (5/30240) 8.15.10.2131 64% (406/633) vs. 1% (410/30240) 8.15.10.2141
Crash Signature: SubRectInfo const*)] [@ _VEC_memzero] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] → SubRectInfo const*)] [@ _VEC_memzero] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ msvcrt.dll@0x9cc6]
I was unable to repro this issue when I spoofed a few of the given driver versions and adapter/vendor combos. That is, they were blocked when I spoofed them. My best guess is that there's some kind of string case issue when comparing, for example "0x2A22" to "0x2a22", so I wrote this patch to fix that.
Attachment #599055 - Flags: review?(joe)
Try push: https://tbpl.mozilla.org/?tree=Try&rev=37b1a9ec0eb0 I'd appreciate it if someone having this issue or someone with access to a broken configuration could test this try build, once it's done.
Assignee: bas.schouten → bugzilla
Status: NEW → ASSIGNED
Comment on attachment 599055 [details] [diff] [review] Patch v1.0, fix driver blocklist string comparison casing. Review of attachment 599055 [details] [diff] [review]: ----------------------------------------------------------------- Yes, this seems very likely to be the problem.
Attachment #599055 - Flags: review?(joe) → review+
By the way, many many thanks to Doug, who looked at this while the rest of the graphics team was up to its eyes in Fennec work.
Comment on attachment 599055 [details] [diff] [review] Patch v1.0, fix driver blocklist string comparison casing. [Triage Comment]
Attachment #599055 - Flags: approval-mozilla-beta+
Attachment #599055 - Flags: approval-mozilla-aurora+
Approving for landing as per Alex K.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
Shouldn't this be kept open until we confirm that this patch actually eliminates the underlying crash? I don't see any confirmation that someone reproduced the crash and then verified that the patch fixed the problem. We seem to be guessing that this will solve the problem.
I agree with jtd, I was really shooting from the hip here. If this doesn't fix it, I'll probably need more data from someone who can repro the bug.
Target Milestone: mozilla13 → ---
(In reply to Jonathan Kew (:jfkthame) from comment #2) > As a precaution, I'm marking this as security-sensitive, given that it looks > like there's code writing to memory it shouldn't be touching.... if someone > figures out the specific scenario where this happens, they might be able to > control what gets corrupted, and hence arrive at some kind of exploit. Is it still security sensitive because bug 590373 was not?
Whiteboard: sg:critical?[startupcrash] → [sg:critical][startupcrash]
(In reply to Scoobidiver from comment #29) > The patch fixes nothing I wouldn't be completely sure of that - there are no crashes with those signatures in builds from yesterday.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #30) > I wouldn't be completely sure of that - there are no crashes with those > signatures in builds from yesterday. They are supposed to be fixed in 13.0a1 and 12.0a2/20120222. In addition, the Table and Graph tabs don't take into account crashes with the last build that is why you were mislead.
(In reply to Scoobidiver from comment #31) > (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #30) > > I wouldn't be completely sure of that - there are no crashes with those > > signatures in builds from yesterday. > They are supposed to be fixed in 13.0a1 and 12.0a2/20120222. In addition, > the Table and Graph tabs don't take into account crashes with the last build > that is why you were mislead. They take account anything up to the last full UTC day, at least once the reports have been generated for that day, which they have been for the 23rd, AFAIK, and we don't have crashes for those signatures in builds from the 23rd yet. Also, the Reports tab lists everything we have processed so far, and it also doesn't list any crashes for builds newer than the 22nd - on Aurora, that is. On trunk, I see 2 of the msvcrt ones, with "AdapterDeviceID: 0x0046", which wouldn't be affected by this patch anyhow.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #32) > On trunk, I see 2 of the msvcrt ones, with "AdapterDeviceID: 0x0046", which > wouldn't be affected by this patch anyhow. 0x0046 is the device ID for HD Graphics. Anyway, the D2D-blocklist introduced in bug 590373 is still not effective.
(In reply to Scoobidiver from comment #33) > (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #32) > > On trunk, I see 2 of the msvcrt ones, with "AdapterDeviceID: 0x0046", which > > wouldn't be affected by this patch anyhow. > 0x0046 is the device ID for HD Graphics. Anyway, the D2D-blocklist > introduced in bug 590373 is still not effective. I don't understand what the patch there did block exactly, and how this is related to those msvcrt.dll@0x9cc6 signatures we seem to still be seeing. I don't think either of us two can't really be sure what is effective there and what isn't, we need the graphics team to assess that.
I can't repro this problem with any spoofed configuration, and there's so many things that could be causing this that it's really difficult to pinpoint it without a computer that has the problem. I'm going to keep looking but any solutions I come up with are guesses at best for now. I think a good next step at this point is to open up this bug report to the public (instead of marking it as secure) so that maybe someone who can actually repro this can step forward and help. It seems like the security concerns were warranted until we found out it was due to a blocklisting problem rather than a (realistic) potential attack vector.
Let's have a look at crashes with build id >= 20120222 in the following list, and see what we can say about them: https://crash-stats.mozilla.com/report/list?signature=_VEC_memzero%20|%20TextStageManager%3A%3AMapTextureTransferSurface%28D2D_RECT_U%20const%26%2C%20unsigned%20char**%2C%20unsigned%20int*%29 *** https://crash-stats.mozilla.com/report/index/d65ce81d-6ca8-478e-be74-85d7c2120225 (+ many others: this case is the vast majority) AdapterVendorID: 0x8086, AdapterDeviceID: 0x0046, AdapterSubsysID: 10391462, AdapterDriverVersion: 8.15.10.2125 This driver is blacklisted. (Occurs with a few other versions too) https://crash-stats.mozilla.com/report/index/5b00c309-dc24-4ad5-b964-b9cf62120226 AdapterVendorID: 0x8086, AdapterDeviceID: 0x0046, AdapterSubsysID: 1600103c, AdapterDriverVersion: 8.741.1.1000 anomaly: this is a AMD driver version on Intel vendor ID way to check: 8.741 not in Intel range https://crash-stats.mozilla.com/report/index/85483fe2-a6d3-490f-8370-6c3492120227 Cisco VPN AdapterVendorID: 0x1002, AdapterDeviceID: 0x9480, AdapterSubsysID: 20561b0a, AdapterDriverVersion: 8.750.0.0 Has dual GPUs. GPU #2: AdapterVendorID2: 0x8086, AdapterDeviceID2: 0x0046, AdapterSubsysID2: 20561b0a, AdapterDriverVersion2: 8.750.0.0 Here we have two 'anomalies': the GPU#2 is again a Intel with AMD driver version, and "Cisco VPN" reminds of bug 435756 https://crash-stats.mozilla.com/report/index/a3fee3b3-30fe-4edf-80a0-d35e62120227 AdapterVendorID: 0x8086, AdapterDeviceID: 0x2e22, AdapterSubsysID: 02511025, AdapterDriverVersion: 8.15.10.2141 This driver is blacklisted. https://crash-stats.mozilla.com/report/index/1ef8a7a4-e846-4637-9bd9-5657c2120226 Cisco VPN AdapterVendorID: 0x8086, AdapterDeviceID: 0x0046, AdapterSubsysID: 1425103c, AdapterDriverVersion: 8.15.10.2202 This one is "fine"... it's exactly the minimum version required for this device and OS. Maybe we could up the version requirement by a notch? Or maybe we should really worry about this Cisco VPN thing. *** Questions: 1. is it plausible that all these users force-enabled or should we assume there is a bug in our blacklisting logic? Now I wish we were reporting on force-enabled prefs in crash reports :-/ 2. in dual GPU setups, is it normal to have a AMD driver version on a Intel device? Other than that, note that *all* of these crashes that are not on blacklisted drivers, are on driver versions that are only slightly above the minimum required version, so they could be blacklisted by a tiny increase of the version requirement.
Group: core-security
Keywords: qawanted
Whiteboard: [sg:critical][startupcrash] → [startupcrash]
Cheng, it would really help if we could ask users who have given us their email address in crash stacks the contents of their about:support. Since this seems to be a startup crash, it might be necessary for those users to start up in safe mode, but their about:support will still give us the necessary information.
Benoit Jacob has agreed to 1) prepare a potential backout patch for Doug's string changes, just in case, and 2) create a patch to add the force-enabled state to our crash report annotations.
Attached patch report force-enabled features (deleted) — Splinter Review
That is what Jeff originally asked for in bug 627464. This patch replaces the '?' by '!' for force-enabled features.
Attachment #600984 - Flags: review?(jmuizelaar)
(In reply to Joe Drew (:JOEDREW!) from comment #39) > Benoit Jacob has agreed to 1) prepare a potential backout patch for Doug's > string changes, just in case, and 2) create a patch to add the force-enabled > state to our crash report annotations. 2) done; for 1), it turns out that Doug's patch (bug 704710) does not unapply cleanly at all. Following IRC discussion, we're going the alternate route of basically reimplementing the relevant subset of the old blacklisting code, alongside the new. By 'relevant part' I mean the part that deals with Intel 4500/HD devices on Windows. Doug said on IRC he would make the patch today.
Comment on attachment 600984 [details] [diff] [review] report force-enabled features Review of attachment 600984 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/src/gfxCrashReporterUtils.cpp @@ -110,1 @@ > Why this change?
Attachment #600984 - Flags: review?(jmuizelaar) → review+
Because I didn't like that a lot. Nested feature initialization resulted in seemingly random line breaks. Adapting it to the present change would have made it even more awkward.
Whiteboard: [startupcrash] → [startupcrash] Keep Open!!
Comment on attachment 600984 [details] [diff] [review] report force-enabled features [Approval Request Comment] This patch is very low-risk and can help understanding crashes.
Attachment #600984 - Flags: approval-mozilla-aurora?
This is a hack to hopefully fix this issue for the reported bad set of cards. It will not solve the problem in the correct way but will prevent us from having to back out the whole thing. It will also not deal with the issue if more cards than this specific set are not being blocked when they should. This is bad to land but probably necessary.
Attachment #601094 - Flags: review?(joe)
Comment on attachment 601094 [details] [diff] [review] Patch v1.0, special case the entire Intel GMAX4500HD series. Review of attachment 601094 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/windows/GfxInfo.cpp @@ +916,5 @@ > > + /** Bug 711656: HACK HACK HACK > + * Special case this, even though it's already in the blocklist. > + * Note that we're getting the driver version twice because the first one > + * happens within the 'if (!aDriverInfo->Length())' check, but we don't want to This needs to be inside the if (!aDriverInfo->Length()) check, though, because we don't want to evaluate the system blocklist on a blocklist download ping. @@ +957,5 @@ > + driverVersion < V(8,15,10,2202) || > + mWindowsVersion == gfxWindowsPlatform::kWindows7 && > + driverVersion < V(8,15,10,2202)) && > + gVendorID == 0x8086 /* Intel */) { > + for (PRUint32 i = 0; i < sizeof(IntelGMAX4500HD)/sizeof(IntelGMAX4500HD[0]); i++) { ArrayLength(IntelGMAX4500HD)
Attachment #601094 - Flags: review?(joe) → review+
Comment on attachment 600984 [details] [diff] [review] report force-enabled features [Triage Comment] Low risk - approving for Aurora 12 in support of bug investigation.
Attachment #600984 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 601094 [details] [diff] [review] Patch v1.0, special case the entire Intel GMAX4500HD series. Review of attachment 601094 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/windows/GfxInfo.cpp @@ +948,5 @@ > + 0x2a02, /* IntelGL960_1 */ > + 0x2a03, /* IntelGL960_2 */ > + 0x2a12, /* IntelGM965_1 */ > + 0x2a13 /* IntelGM965_2 */ > + }; This is the wrong list of device id's! This is IntelGMAX3000, not IntelGMAX4500HD.
Attachment #601094 - Flags: review-
Good catch!
Attachment #601094 - Attachment is obsolete: true
Attachment #601155 - Flags: review?(bjacob)
Comment on attachment 601155 [details] [diff] [review] Patch v1.1, special case the entire Intel GMAX4500HD series. Review of attachment 601155 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/windows/GfxInfo.cpp @@ +957,5 @@ > + > + if ((mWindowsVersion == gfxWindowsPlatform::kWindowsXP && > + driverVersion < V(6,14,10,5284) || > + mWindowsVersion == gfxWindowsPlatform::kWindowsVista && > + driverVersion < V(8,15,10,2202) || This is correct C++ but will have compilers like GCC scream warnings: "suggest parentheses around && inside of ||"
Attachment #601155 - Flags: review?(bjacob) → review+
Crash Signature: SubRectInfo const*)] [@ _VEC_memzero] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ msvcrt.dll@0x9cc6] → SubRectInfo const*)] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ msvcrt.dll@0x9cc6]
I received the about:support from one user who could reproduce at some point, but at the moment she couldn't reproduce anymore (in firefox 11) and her about:support showed 8.15.10.2125 correctly blacklisted. Maybe we have some uninitialized value somewhere?
(In reply to Benoit Jacob [:bjacob] from comment #60) > Good news! On Firefox 13, there is no crash with build id > 20120301 ! First, there are crashes after the last patch landed (see comment 57). Then, if you look at the history, you can have up to 5 builds without crashes (startup crashes depend on new users).
Could it be a scheduling issue because the blocklisting check comes too late?
No, the blocklist check comes first and blocks (occurs in the main thread) the actual initialization of the graphics feature.
Can we add a downloaded blocklist? It'll work if Firefox 10 users use Firefox at least during one day before upgrading to Firefox 11. For new installations, it won't work.
Crash Signature: SubRectInfo const*)] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ msvcrt.dll@0x9cc6] → SubRectInfo const*)] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ msvcrt.dll@0x9cc6] [@ msvcrt.dll@0x15a45]
Hardware: x86 → All
The problem here is that our whole blacklisting mechanism has a bug. Adding a downloaded blacklist entry will almost certainly not solve it.
Bug 715401 is a very similar bug, with our blacklisting not working, and the people there have the Yahoo Toolbar. Similarly, it is worth checking if the the present bug also has such a correlation with a particular add-on or module.
Many crash reports only list {972ce4c6-7e08-4474-a285-3208198ce6fd} under Extensions, and that means that they had no extension installed. So this crash is not related to any extension. I haven't checked Modules.
(In reply to Benoit Jacob [:bjacob] from comment #66) > Similarly, it is worth checking if the the present bug also has such a > correlation with a particular add-on or module. There are no loaded extensions in startup crash reports. Nevertheless, there are correlations per module (sometimes related to extensions): * 11.0: msvcrt.dll@0x9cc6|EXCEPTION_ACCESS_VIOLATION_WRITE (193 crashes) 32% (61/193) vs. 10% (3619/34679) GrooveShellExtensions.dll (MS Office Groove) 23% (45/193) vs. 8% (2898/34679) aswProperty.dll (Avast! antivirus) 12% (23/193) vs. 1% (224/34679) DpOFeedb.dll (DigitalPersona Pro) 13% (26/193) vs. 3% (1055/34679) BtMmHook.dll (Bluetooth Software) 8% (16/193) vs. 0% (31/34679) BGLsp.dll (BullGuard) 8% (16/193) vs. 1% (264/34679) BabyFox.dll (Babylon Toolbar) _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)|EXCEPTION_ACCESS_VIOLATION_WRITE (68 crashes) 44% (30/68) vs. 10% (3619/34679) GrooveShellExtensions.dll (MS Office Groove) 28% (19/68) vs. 8% (2898/34679) aswProperty.dll (Avast! antivirus) 16% (11/68) vs. 2% (761/34679) DropboxExt.14.dll (DropBox)
Here, https://tbpl.mozilla.org/?tree=Try&rev=b9c53783cfa9 is a try push of mozilla-beta with the following changesets cleanly backed out: undo-39ce0a82bb75 undo-fe937bac6e75 undo-dd2e6ce53715 undo-838e8168ea50 undo-49b8bec6d175 undo-8f7893e1c20c undo-b9c1b8afb35a undo-8c075fee9be4 # this is the patch from bug 704710 that we want to back out the other changesets here are later patches affecting the same files, that I had to back out as well in order to have a clean backout of bug 704710. I could have kept them and done a non-clean backout by hand, but our goal right now is to minimize risk. Note that it would be very difficult to do the same on mozilla-central, because Bug 679226 / cset e57e271bf328 completely restructured the widget/ directory: moved files around, and even changed indentation, meaning that we can't back stuff out of mozilla-central without a lot of manual work, meaning a high risk of mistake.
Here's my backout patch queue folded into one patch. Applies against current mozilla-beta. That's what's on try (previous comment).
Previous Try failed because I forgot to re-add some old xpcshell tests. New Try: https://tbpl.mozilla.org/?tree=Try&rev=fe369816c08e
Backed out the patch here from mozilla-beta: http://hg.mozilla.org/releases/mozilla-beta/rev/cf3911792b59 As well as 7 other patches, including Bug 704710. http://hg.mozilla.org/releases/mozilla-beta/rev/ff5f2055aba4 (the other 6 csets are in between these two ones)
Benoit, thanks for this work! I hope it helps and we're not backing out (too many) newer blocks that are needed as well, I wouldn't like if this set of high-volume crashers would just be replaced by another set of high-volume crashers. I guess you guys will work on a better solution that hopefully will even be available in time for 12?
Yes, of course we have to work on the real solution. The backout just buys us 6 weeks.
This bug was first reported the day after the patch in bug 699482 was landed, and that wasn't backed out. Maybe it was that which caused it.
The patch in bug 699482 inserted FEATURE_STATUS_UNKNOWN = 2; in the IDL. Previously we had: FEATURE_BLOCKED_DRIVER_VERSION = 2; We are currently (since bug 625160 landed, and until bug 653102 is fixed) caching these status under the gfx.blacklist.* preferences, which are plain integers like |2|. What could happen is that people had the value |2| in the preference, and suddenly the meaning of that cached value changed.
So, the good news is that if the theory in comment 78 is true, this can be fixed by just clearing the gfx.blacklist.* prefs.
This implements the minimal fix assuming the theory in comment 78 is true. Patch for mozilla-central.
Same, but for mozilla-beta (just different directory structure)
Argh!! The way the downloaded blacklisting is implemented, relies fundamentally on these prefs, not just as cache but as the only intermediate storage between the evaluation of the downloaded blacklist and the actual blacklisting decision.
Attachment #604659 - Attachment description: minimal mozilla-central patch to NOT use cached blacklisting decisions → minimal mozilla-central patch to NOT use cached blacklisting decisions (ARGH! breaks downloaded blacklist!)
Attachment #604660 - Attachment description: minimal mozilla-beta patch to NOT use cached blacklisting decisions → minimal mozilla-beta patch to NOT use cached blacklisting decisions (ARGH! breaks downloaded blacklist!)
Also note that if the theory in comment 78 is correct, backing out bug 699482 won't help, as now the people with prefs from after this patch will get wrong blacklisting decisions.
(In reply to Benoit Jacob [:bjacob] (On vacation until March 18) from comment #83) > Also note that if the theory in comment 78 is correct, backing out bug > 699482 won't help, as now the people with prefs from after this patch will > get wrong blacklisting decisions. It's true for people with version 11 and above. This patch didn't land in the Release channel. So the backout of bug 699482 in Beta could work for people who haven't switched between channels with the same profile, if your theory is right.
Got it! Here's what happened. As comment 78 says, bug 699482 changed the value of FEATURE_BLOCKED_DRIVER_VERSION so that users who had a cached FEATURE_BLOCKED_DRIVER_VERSION now get FEATURE_STATUS_UNKNOWN instead. Normally that shouldn't cause real trouble because gfx features should block on anything else than FEATURE_NO_INFO (which means: no bad news. No news is good news). But D2D initialization has a bug here, that is triggered by this change: http://hg.mozilla.org/mozilla-central/file/b98d1ff2d051/gfx/thebes/gfxWindowsPlatform.cpp#l412 412 #ifdef CAIRO_HAS_D2D_SURFACE 413 bool d2dDisabled = false; 414 bool d2dForceEnabled = false; 415 bool d2dBlocked = false; 416 417 nsCOMPtr<nsIGfxInfo> gfxInfo = do_GetService("@mozilla.org/gfx/info;1"); 418 if (gfxInfo) { 419 PRInt32 status; 420 if (NS_SUCCEEDED(gfxInfo->GetFeatureStatus(nsIGfxInfo::FEATURE_DIRECT2D, &status))) { 421 if (status != nsIGfxInfo::FEATURE_NO_INFO) { 422 d2dDisabled = true; 423 if (status == nsIGfxInfo::FEATURE_BLOCKED_DRIVER_VERSION || 424 status == nsIGfxInfo::FEATURE_BLOCKED_DEVICE) 425 { 426 d2dBlocked = true; 427 } 428 } 429 } 430 } 431 432 d2dDisabled = Preferences::GetBool("gfx.direct2d.disabled", false); 433 d2dForceEnabled = Preferences::GetBool("gfx.direct2d.force-enabled", false); When status == nsIGfxInfo::FEATURE_STATUS_UNKNOWN, at line 422 we set d2dDisabled=true, but we don't pass the if() condition at line 423-424 so we don't set d2dBlocked=true. Below at line 432 we overwrite d2dDisabled anyway, so there is a plain bug here: line 422 never has any effect.
Attached patch fix d2d blocklisting check (deleted) — Splinter Review
Not asking for review, this is a trivial fix and I want to land it now to ask for beta approval.
Marking as depending on Bug 653102. I hope that this time we'll really prioritize fixing bug 653102.
Depends on: 653102
Comment on attachment 604666 [details] [diff] [review] fix d2d blocklisting check [Approval Request Comment] Regression caused by (bug #): 699482 User impact if declined: crashiness. specifically, Direct2D/DirectWrite blacklisting is broken in Firefox 11+ for users upgrading from Firefox 10- Testing completed (on m-c, etc.): Just landed on m-c Risk to taking this patch (and alternatives if risky): No risk. Trivial bug fix. String changes made by this patch: none
Attachment #604666 - Flags: approval-mozilla-beta?
Attachment #604666 - Flags: approval-mozilla-aurora?
Also, if my theory is correct, here are the STR for this bug: 1. use a Win7 machine where Direct2D should be blacklisted (use spoofing to simulate an old driver version if needed) 2. go to about:config, set gfx.blacklist.direct2d=2, restart browser, check that the pref is still there 3. go to about:support, you should see Direct2D not blacklisted anymore
(In reply to Benoit Jacob [:bjacob] (On vacation until March 18) from comment #86) > Not asking for review, this is a trivial fix and I want to land it now to > ask for beta approval. You should ask for review even if you don't want to block on it - any patch that we land on beta could use a second set of eyes.
Comment on attachment 604666 [details] [diff] [review] fix d2d blocklisting check Alright, asking Bas for review per Gavin's suggestion.
Attachment #604666 - Flags: review?
Attachment #604666 - Flags: review? → review+
Attachment #604659 - Attachment is obsolete: true
Attachment #604660 - Attachment is obsolete: true
Verified bjacob's STR, though it was using a spoofed configuration. Doing it this way doesn't mean the STR may not work on real hardware, but it doesn't verify that this is actually the problem.
STR: 0) Start with an ATI Radeon HD 4200 (vendor ID 1002, device ID 9710, driver version 8.733.0.0) 1) Download/install FF10beta6: https://ftp.mozilla.org/pub/mozilla.org/firefox/releases/10.0b6/win32/en-US/Firefox%20Setup%2010.0b6.exe 2) Ensure this is a clean new profile (firefox.exe -P, create new profile) 3) Check about:support for the value of "Direct2D Enabled" ("Blocked for your graphics...") 4) Check gfx.blacklist.direct2d in about:config (it's unset at this point) 5) Force a blocklist ping in the Error Console: Components.classes["@mozilla.org/extensions/blocklist;1"].getService(Components.interfaces.nsITimerCallback).notify(null); 6) Check about:support for the value of "Direct2D Enabled" ("Blocked for your graphics...") 7) Check gfx.blacklist.direct2d in about:config (it's now set to 2) 8) Go to "About Mozilla Firefox" and apply the update to FF11beta7 9) Check about:support for the value of "Direct2D Enabled" ("true") 10) Check gfx.blacklist.direct2d in about:config (still set to 2)
STR 2 (no update, just using the same profile across 10/nightly): 0) Start with an ATI Radeon HD 4200 (vendor ID 1002, device ID 9710, driver version 8.733.0.0) 1) Download/install FF10.0.2 2) Ensure this is a clean new profile (firefox.exe -P, create new profile) 3) Check about:support for the value of "Direct2D Enabled" ("Blocked for your graphics...") 4) Check gfx.blacklist.direct2d in about:config (it's unset at this point) 5) Force a blocklist ping in the Error Console: Components.classes["@mozilla.org/extensions/blocklist;1"].getService(Components.interfaces.nsITimerCallback).notify(null); 6) Check about:support for the value of "Direct2D Enabled" ("Blocked for your graphics...") 7) Check gfx.blacklist.direct2d in about:config (it's now set to 2) 8) Install the latest available Nightly (3fdc1c14a8ce) 9) Check about:support for the value of "Direct2D Enabled" ("true") 10) Check gfx.blacklist.direct2d in about:config (still set to 2) Steps to Verify (no update, just using the same profile across 10/nightly TB build): 0) Start with an ATI Radeon HD 4200 (vendor ID 1002, device ID 9710, driver version 8.733.0.0) 1) Download/install FF10.0.2 2) Ensure this is a clean new profile (firefox.exe -P, create new profile) 3) Check about:support for the value of "Direct2D Enabled" ("Blocked for your graphics...") 4) Check gfx.blacklist.direct2d in about:config (it's unset at this point) 5) Force a blocklist ping in the Error Console: Components.classes["@mozilla.org/extensions/blocklist;1"].getService(Components.interfaces.nsITimerCallback).notify(null); 6) Check about:support for the value of "Direct2D Enabled" ("Blocked for your graphics...") 7) Check gfx.blacklist.direct2d in about:config (it's now set to 2) 8) Install the Nightly TB build with Benoit's patch (b9357da14ab8) 9) Check about:support for the value of "Direct2D Enabled" ("false") 10) Check gfx.blacklist.direct2d in about:config (still set to 2) Is #9 in the steps to verify expected to report back "false" instead of "Blocked.."?
Thanks for verifying this on real hardware!
Comment on attachment 604666 [details] [diff] [review] fix d2d blocklisting check [Triage Comment] Please land on Aurora 12, Beta 11, and Release 11.
Attachment #604666 - Flags: approval-mozilla-release+
Attachment #604666 - Flags: approval-mozilla-beta?
Attachment #604666 - Flags: approval-mozilla-beta+
Attachment #604666 - Flags: approval-mozilla-aurora?
Attachment #604666 - Flags: approval-mozilla-aurora+
Blocks: 699482
No longer blocks: 704710
Verified that "Direct2D Enabled" in about:support correctly reports "false" using the STR in https://bugzilla.mozilla.org/show_bug.cgi?id=711656#c95 for Beta 8.
Noticing a difference with AzureBackend: direct 2d across FF 10 & 11 on Windows 7 64-bit: - FF 10.0.2 makes no mention of AzureBackend: direct2d - FF 11 Beta 8 indicates a mention of AzureBackend: direct2d Why would these two be different?
Comparing about:support on Win 7 32-bit on FF 10.0.2 vs. FF 11 Beta 8, the values appear to identical. Could this be a Win 7 64-bit specific issue?
Adding results from Windows 7 64-bit VM at Alex's request... Adapter Description - VMWare SVGA 3D (Microsoft Corporation - WDDM) Driver Version - 7.14.1.1134 Firefox 10.0.2: Direct2d Enabled - Blocked by your graphics card because of unresolved driver issues gfx.blacklist.direct2d - does not exist *ping blocklist* Direct2d Enabled - Blocked by your graphics card because of unresolved driver issues gfx.blacklist.direct2d - 3 *pave over with Firefox 11.0b8* Direct2d Enabled - Blocked for your graphics driver version gfx.blacklist.direct2d - 3
Attached image FF 10.0.2 About Support Graphics (deleted) —
Also for comment 101, GPU Accelerated Windows shows 0 in beta 8, 0/1 for 10.0.2
The difference in GPU Accelerated Windows is bug 598339, and the additional adapter description for GPU #2 is bug 679110. The addition of AzureBackend=direct2d is of interest.
(In reply to Jason Smith from comment #101) > - FF 11 Beta 8 indicates a mention of AzureBackend: direct2d It was introduced in 11.0 by bug 702770 and the misalignment fixed in 12.0 by bug 707223.
I confirm that the AzureBackend line is nothing to worry about. It only says which Azure backend would be used regardless of blacklisting, it doesn't override blacklisting.
There is no blocklist on Windows 8 (not a regression). See https://crash-stats.mozilla.com/report/list?signature=msvcrt.dll%400x9890 The Intel hack should be removed from m-c.
Crash Signature: SubRectInfo const*)] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ msvcrt.dll@0x9cc6] [@ msvcrt.dll@0x15a45] → SubRectInfo const*)] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ msvcrt.dll@0x9cc6] [@ msvcrt.dll@0x15a45] [@ msvcrt.dll@0x9890]
Confirmed that the only crash here is still the one Windows 8 user running into [@ msvcrt.dll@0x9890 ]
There are two crashes in 11.0b8 and one crash in 12.0a2/20120311, but it's bug 635464. The Windows 8 graphics blocklist is treated in bug 706908. I let it opened in order to remove the Intel hack.
Blocks: 715401
Verified fixed in Firefox 11 (build#2) using the STR in comment#95.
There is one crash in 11.0 build 2 with Intel's driver version 8.15.10.2281 that is not in the blocklist: bp-02a4a511-afb9-445d-b62a-3cf1d2120314 Those crashes should be monitored carefully to see if the cutoff version should be shifted.
Backed out the Intel special case code: https://hg.mozilla.org/mozilla-central/rev/0dcc716da095
I close it as fixed. For remaining issues (Windows 8, driver versions not in the blocklist), file a new bug.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
I don't think qawanted is needed anymore -- please re-add if this is incorrect. Alex, can you use your hardware to verify this is fixed with the latest Firefox 12 and 13 builds?
Keywords: qawanted
qa- RE comment 118. Alex, if you want QA to verify, please give someone the hardware (probably Marcia since Juan is on PTO).
Whiteboard: [startupcrash] Keep Open!! → [startupcrash][qa-] Keep Open!!
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #118) > I don't think qawanted is needed anymore -- please re-add if this is > incorrect. > > Alex, can you use your hardware to verify this is fixed with the latest > Firefox 12 and 13 builds? I'll bring it in tomorrow - keep forgetting :)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: