Closed Bug 705258 Opened 13 years ago Closed 13 years ago

Hang in gfxDWriteFontEntry::GetFontTable

Categories

(Core :: Graphics, defect)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla13
Tracking Status
firefox11 - ---

People

(Reporter: scoobidiver, Assigned: jtd)

References

Details

(Keywords: hang, Whiteboard: [Snappy:P1])

Crash Data

It's different from bug 705240, which is about the crash reporter. It's the #1 top crasher in today's build.
The regression range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3c8147998124&tochange=de483d897af4
The likely culprit is bug 627842.

Signature: chromehang | NtGdiGetFontData
UUID: dfcad005-c7b6-4099-bc23-ce8ad2111125
Date Processed: 2011-11-25 03:27:15.958910
Uptime: 55
Install Age: 3.1 hours since version was first installed.
Install Time: 2011-11-25 08:22:29
Product: Firefox
Version: 11.0a1
Build ID: 20111124031031
Release Channel: nightly
OS: Windows NT
OS Version: 6.1.7601 Service Pack 1
Build Architecture: x86
Build Architecture Info: GenuineIntel family 6 model 23 stepping 6
Crash Reason: EXCEPTION_BREAKPOINT
Crash Address: 0x7392193d
User Comments: My Firefox crash with Kaspersky Internet Security
App Notes: AdapterVendorID: 10de, AdapterDeviceID: 0640, AdapterSubsysID: 00000000, AdapterDriverVersion: 8.17.12.8562 D2D? D2D+ DWrite? DWrite+ xpcom_runtime_abort(###!!! ABORT: HangMonitor triggered: file e:/builds/moz2_slave/m-cen-w32-ntly/build/xpcom/threads/HangMonitor.cpp, line 111)
Processor Notes:
EMCheckCompatibility: False

Thread 0
Frame  Module  Signature  Source
0  ntdll.dll  KiFastSystemCallRet
1  gdi32.dll  NtGdiGetFontData
2  gdi32.dll  GetFontData
3  xul.dll  gfxDWriteFontEntry::GetFontTable  gfx/thebes/gfxDWriteFontList.cpp:304
4  xul.dll  gfxDWriteFontEntry::ReadCMAP  gfx/thebes/gfxDWriteFontList.cpp:366
5  xul.dll  gfxFontEntry::TestCharacterMap  gfx/thebes/gfxFont.cpp:111
6  xul.dll  gfxFontFamily::FindFontForChar  gfx/thebes/gfxFont.cpp:696
7  xul.dll  gfxPlatformFontList::FindFontForCharProc  gfx/thebes/gfxPlatformFontList.cpp:450
8  xul.dll  nsBaseHashtable<nsCStringHashKey,nsAutoPtr<mozilla::scache::CacheEntry>,mozilla::scache::CacheEntry*>::s_EnumStub  obj-firefox/dist/include/nsBaseHashtable.h:364
9  xul.dll  PL_DHashTableEnumerate  obj-firefox/xpcom/build/pldhash.cpp:755
10  xul.dll  nsBaseHashtable<nsCStringHashKey,nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>,nsHttpConnectionMgr::nsConnectionEntry*>::Enumerate  obj-firefox/dist/include/nsBaseHashtable.h:239
11  xul.dll  gfxPlatformFontList::FindFontForChar  gfx/thebes/gfxPlatformFontList.cpp:412
12  xul.dll  gfxFontGroup::WhichSystemFontSupportsChar  gfx/thebes/gfxFont.cpp:2931
13  xul.dll  gfxFontGroup::FindFontForChar
14  xul.dll  gfxFontGroup::ComputeRanges  gfx/thebes/gfxFont.cpp:2759
15  nvwgf2um.dll  nvwgf2um.dll@0x532cf7
16  xul.dll  gfxFontGroup::InitScriptRun  gfx/thebes/gfxFont.cpp:2586
17  nvwgf2um.dll  nvwgf2um.dll@0x509931
18  nvwgf2um.dll  nvwgf2um.dll@0x532dfb
19  nvwgf2um.dll  nvwgf2um.dll@0x50bbd0
20  xul.dll  gfxFontGroup::InitTextRun  gfx/thebes/gfxFont.cpp:2555

More reports at: https://crash-stats.mozilla.com/report/list?signature=chromehang%20|%20NtGdiGetFontData
John, do you think this might just have to do with Kaspersky? Or did we make changes here?
Given that the regression range includes some changes Jonathan made to harfbuzz for bug 701637, I wonder if that's causing corruption that then causes things to fail in weird and wonderful places elsewhere.
The regression range also includes hang detection being enabled, which seems like a much more likely culprit. Note that this is a hang at http://hg.mozilla.org/mozilla-central/annotate/de483d897af4/gfx/thebes/gfxDWriteFontList.cpp#l304, which is, I think, a system call. If Kaspersky is intercepting our font reads and causing them to take a long time (it seems that the hang detector timeout is 30s), I could see this hang appearing. I think we have a few options:
1) Somehow read font data in a way that isn't susceptible to antivirus programs slowing us down
2) Tell antivirus programs (using some magical API which probably doesn't exist) that certain files are safe
3) Get Kaspersky to fix it on their end.
I'm CC:ing Benjamin Smedberg to get him to comment on the hang detector, and Kev Needham to see if he has any contacts at Kaspersky that he can ping and/or CC on this bug.
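To make the mechanism concrete, here is a minimal C++ sketch of a watchdog-style hang detector, assuming the model described in this thread: the UI thread marks when it starts and finishes each event, and a separate watchdog thread periodically asks whether the current event has been running longer than a timeout (30 s in this bug). This is not Mozilla's HangMonitor.cpp; the class and method names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical watchdog-style hang detector (not Mozilla's HangMonitor.cpp).
// The UI thread reports event start/finish; a watchdog thread polls IsHung().
class HangDetector {
 public:
  using Clock = std::chrono::steady_clock;

  explicit HangDetector(Clock::duration timeout) : timeout_(timeout) {}

  // UI thread: called just before it starts processing an event.
  void EventStarted(Clock::time_point now) {
    startTicks_.store(now.time_since_epoch().count(), std::memory_order_relaxed);
    busy_.store(true, std::memory_order_release);  // publishes startTicks_
  }

  // UI thread: called when it goes back to waiting for events.
  // Time spent idle is never treated as a hang.
  void EventFinished() { busy_.store(false, std::memory_order_release); }

  // Watchdog thread: true if the current event has run for at least timeout_.
  bool IsHung(Clock::time_point now) const {
    if (!busy_.load(std::memory_order_acquire)) return false;
    const Clock::time_point start{
        Clock::duration{startTicks_.load(std::memory_order_relaxed)}};
    return now - start >= timeout_;
  }

 private:
  const Clock::duration timeout_;
  std::atomic<Clock::rep> startTicks_{0};
  std::atomic<bool> busy_{false};
};
```

Under this model, a GetFontData call that blocks the UI thread for 30 s (for example, while a security product inspects font files) trips the detector even though nothing is deadlocked, while time spent idle between events never counts.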
Blocks: hang-detector
No longer blocks: font-inflation
Summary: Crash in gfxDWriteFontEntry::GetFontTable → Hang in gfxDWriteFontEntry::GetFontTable
The hang detector is definitely the immediate cause of this new signature. Presuming that the hang detector is working correctly (which we believe it is on Windows and Linux), this means that there is in fact a UI pause of 30 seconds. Marcia/KaiRo, I wonder if we can contact the reporters to see if they have in fact been seeing large UI pauses prior to the recent nightlies?
I think there's a bug with the hang detector on OSX; running this testcase causes it to fire: http://people.mozilla.org/~jdaggett/tests/measuretext-basic.html
I'm also noticing that none of the crash reports for this seem to be posting either. What's odd is that the test runs for a while and completes, but I don't think any portion of it is "hanging" while running the test. The hang detector crash seems to occur 30 secs *after* the testcase completes, when the browser is doing *nothing*!!
https://crash-stats.mozilla.com/report/index/bp-ebd9395a-d67e-4925-a93d-5d4db2111127
(In reply to John Daggett (:jtd) from comment #5)
> I think there's a bug with the hang detector on OSX, running this testcase
> causes it to fire
The hang monitor on Mac OS X is temporarily disabled from 11.0a1/20111126 (see bug 705154).
(In reply to Joe Drew (:JOEDREW!) from comment #3)
> 1) Somehow read font data in a way that isn't susceptible to antivirus
> programs slowing us down
> 2) Tell antivirus programs (using some magical API which probably doesn't
> exist) that certain files are safe
> 3) Get Kaspersky to fix it on their end.
4) Implement an improved font fallback system that tries harder to avoid synchronously crawling all fonts to get their cmaps.
In the past we discussed various options for this. One option is to ship a table mapping character code ranges to fonts that are shipped on our major platforms and are likely to contain glyphs for the given characters, and try those fonts first. This is probably not very difficult and would probably eliminate most of the problem. Instead, or additionally, we could build a persistent table dynamically based on the fonts found on a user's system.
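Option 4's hard-coded table could look roughly like the C++ sketch below. It assumes a hypothetical `FallbackRange` table and a caller-supplied `fontHasChar` callback that checks a single font's cmap; the names and structure are illustrative, not Gecko's actual fallback code. The point is that only the handful of fonts listed for a range need their cmaps read, instead of every installed font.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hard-coded entry: a character range and the fonts (commonly
// shipped on a platform) to try first for characters in that range.
struct FallbackRange {
  char32_t first;
  char32_t last;
  std::vector<std::string> preferredFonts;
};

// Returns the first preferred font that supports ch, or an empty string if
// none does; the caller would then fall back to the slow path of scanning
// every installed font's cmap.
std::string FindPreferredFallback(
    const std::vector<FallbackRange>& table, char32_t ch,
    const std::function<bool(const std::string&, char32_t)>& fontHasChar) {
  for (const FallbackRange& r : table) {
    if (ch < r.first || ch > r.last) continue;
    for (const std::string& font : r.preferredFonts) {
      // Only these few candidates have their cmap consulted.
      if (fontHasChar(font, ch)) return font;
    }
  }
  return {};
}
```

Because an empty result simply sends the caller down the full scan, the table only has to be a good first guess, not a complete mapping.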
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #7)
> Instead, or additionally, we could build a persistent table dynamically
> based on the fonts found on a user's system.
Sounds like bug 600713.
(In reply to Karl Tomlinson (:karlt) from comment #8)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #7)
> > Instead, or additionally, we could build a persistent table dynamically
> > based on the fonts found on a user's system.
>
> Sounds like bug 600713.
Bug 600713 is specifically about caching cmaps. But a smarter way to do this is to combine better hard-coded fallback with caching of a bit-vector of scripts supported by the union of fonts in a family, since there's generally a strong correlation in cmap contents within a family. And the aim should be to improve the performance of system font fallback, not just to avoid font table I/O. We should instrument first, then make improvements, not the reverse.
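A minimal sketch of the per-family script bit-vector idea, assuming a small hypothetical `Script` enumeration (a real implementation would need the full set of Unicode scripts): each face records which scripts its cmap covers, the family caches the union, and fallback can skip any family whose union lacks the needed script without reading a single cmap.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, deliberately tiny script list for illustration.
enum Script : std::size_t { kLatin, kGreek, kCyrillic, kArabic, kHebrew, kThai, kHan, kScriptCount };
using ScriptSet = std::bitset<kScriptCount>;

struct FontFace {
  ScriptSet scripts;  // scripts covered by this face's cmap
};

struct FontFamily {
  std::vector<FontFace> faces;
  ScriptSet unionScripts;  // cached union over all faces; small enough to persist

  void RecomputeUnion() {
    unionScripts.reset();
    for (const FontFace& f : faces) unionScripts |= f.scripts;
  }

  // Cheap pre-filter during fallback: if no face covers the script,
  // the family can be skipped without loading any cmap.
  bool MightSupport(Script s) const { return unionScripts.test(s); }
};
```

The family-level union is only a filter: a set bit means some face covers part of that script, so a candidate family still needs a per-character cmap check, but families with a clear bit can be skipped outright.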
Looking at data from 11/27, I found it interesting that about 34% of these hangs were on 64-bit systems. Some of the more frequent configurations are listed below (all chromehang | NtGdiGetFontData, 11.0a1, amd64):

Count | OS version | CPU | CPUs
18 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 23 stepping 10 | 2
9 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 42 stepping 7 | 4
9 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 30 stepping 5 | 8
8 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 15 stepping 13 | 2
7 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 37 stepping 5 | 4
7 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 23 stepping 6 | 2
6 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 37 stepping 2 | 4
6 | Windows NT 6.1.7600 | family 6 model 15 stepping 13 | 2
4 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 23 stepping 10 | 4
4 | Windows NT 6.1.7601 Service Pack 1 | family 16 model 6 stepping 2 | 2
4 | Windows NT 6.1.7601 Service Pack 1 | family 16 model 4 stepping 2 | 4
3 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 42 stepping 7 | 8
3 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 30 stepping 5 | 4
3 | Windows NT 6.1.7601 Service Pack 1 | family 6 model 26 stepping 5 | 8
3 | Windows NT 6.1.7601 Service Pack 1 | family 16 model 4 stepping 3 | 4
3 | Windows NT 6.1.7600 | family 6 model 30 stepping 5 | 8
3 | Windows NT 6.1.7600 | family 6 model 23 stepping 10 | 2
Here is an easier-to-read list with the stepping info removed (all chromehang | NtGdiGetFontData, 11.0a1):

Count | CPU
34 | family 6 model 23
22 | family 6 model 15
17 | family 6 model 37
16 | family 6 model 42
15 | family 6 model 30
4 | family 6 model 26
2 | family 15 model 72
2 | family 15 model 67
2 | family 15 model 6
2 | family 15 model 4
1 | family 15 model 47
1 | family 15 model 124
1 | family 15 model 107
1 | family 15 model 104
9 | family 16 model 4
8 | family 16 model 6
4 | family 16 model 5
1 | family 16 model 2
1 | family 16 model 10
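One way to collapse per-stepping rows into a list like this is to key on family and model only and sum the counts. A minimal C++ sketch, where `CountByFamilyModel` is a hypothetical helper that takes (count, CPU string) pairs:

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: sums report counts per (family, model), ignoring the
// stepping (and anything else) in strings like "family 6 model 23 stepping 10".
std::map<std::pair<int, int>, int> CountByFamilyModel(
    const std::vector<std::pair<int, std::string>>& rows) {
  static const std::regex re(R"(family (\d+) model (\d+))");
  std::map<std::pair<int, int>, int> totals;
  for (const auto& [count, cpu] : rows) {
    std::smatch m;
    if (std::regex_search(cpu, m, re)) {
      totals[{std::stoi(m[1].str()), std::stoi(m[2].str())}] += count;
    }
    // Rows that don't match the expected pattern are skipped.
  }
  return totals;
}
```

For example, the family 6 model 23 rows at stepping 10 (18 reports) and stepping 6 (7 reports) collapse into a single family 6 model 23 entry of 25.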
chofmann, why do you think that is interesting? Does that differ significantly from the normal population of nightly users?
Data from the crash dumps (mostly inconclusive):

Distribution of uptimes in 1278 crash dumps:
<50s: 169
<100s: 726
<200s: 1152
<500s: 1258

OS version:
XP: 373
Vista: 43
Win7: 861

DirectWrite enabled: 495

DirectWrite DLL versions:
509 6.1.7601.17563
76 6.1.7600.16763
74 6.1.7600.16385
53 6.1.7601.17514
34 7.0.6002.18409
29 6.1.7600.16699
9 6.1.7600.20905
2 6.1.7601.16562
2 6.1.7260.0
1 7.0.6002.18107
1 6.1.7601.21664
1 6.1.7600.20710

Top 20 add-ons by frequency:
1223 972ce4c6-7e08-4474-a285-3208198ce6fd == feedback add-on?
415 d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d
197 testpilot@labs.mozilla.com
162 compatibility@addons.mozilla.org
122 b9db16a4-6edc-47ec-a1f4-b86292ed211d
112 D4DD63FA-01E4-46a7-B6B1-EDAB7D6AD389
108 19503e42-ca3c-4c27-b1e2-9cdb2170ee34
99 8620c15f-30dc-4dba-a131-7c5d20cf4a29
97 e4a8a97b-f2ed-450b-b12d-ee082ba24781
80 ffxtlbr@babylon.com
73 firebug@software.joehewitt.com
72 jqs@sun.com
69 73a6fe31-595d-460b-a920-fcc0f8843232
68 personas@christopher.beard
66 DDC359D1-844A-42a7-9AA1-88A850A938A8
57 elemhidehelper@adblockplus.org
56 64161300-e22b-11db-8314-0800200c9a66
54 46551EC9-40F0-4e47-8E18-8E5CF550CFB8
51 toolbar@ask.com
50 c0c9a2c7-2e5c-4447-bc53-97718bc91e1b

DLL frequency:
1225 nssdbm3.dll, 1216 freebl3.dll, 1206 nssckbi.dll, 1204 nsi.dll, 1196 sapi.dll, 1073 browsercomps.dll, 1049 dll.dll, 1007 xpcom.dll, 995 smime3.dll, 989 nss3.dll, 987 xul.dll, 987 nssutil3.dll, 985 ssl3.dll, 983 softokn3.dll, 981 mozsqlite3.dll, 979 plds4.dll, 978 mozalloc.dll, 961 plc4.dll, 945 mozutils.dll, 895 nspr4.dll, 847 feclient.dll, 825 dbghelp.dll, 795 mscms.dll, 791 DWrite.dll, 767 shell32.dll, 738 wsock32.dll, 726 winspool.drv, 721 firefox.exe, 712 winrnr.dll, 710 setupapi.dll, 710 rpcrt4.dll, 709 EhStorShell.dll, 707 NapiNSP.dll, 705 pnrpnsp.dll, 703 gdi32.dll, 702 msimg32.dll, 700 rasadhlp.dll, 700 oleaut32.dll, 697 winmm.dll, 694 ole32.dll, 691 shlwapi.dll, 685 explorerframe.dll, 683 kernel32.dll, 680 usp10.dll, 676 ws2_32.dll, 673 advapi32.dll, 669 comdlg32.dll, 658 Wldap32.dll, 655 imm32.dll, 653 cscapi.dll, 652 comctl32.dll, 647 user32.dll, 646 IPHLPAPI.DLL, 645 lpk.dll, 645 dnsapi.dll, 643 WindowsCodecs.dll, 641 ntshrui.dll, 636 shdocvw.dll, 630 msctf.dll, 628 wintrust.dll, 628 mswsock.dll, 626 clbcatq.dll, 616 propsys.dll, 615 msvcrt.dll, 613 ntdll.dll, 613 crypt32.dll, 612 nlaapi.dll, 606 apphelp.dll, 604 userenv.dll, 604 msasn1.dll, 600 powrprof.dll, 594 ntmarta.dll, 588 uxtheme.dll, 588 FWPUCLNT.DLL, 586 sechost.dll, 581 winnsi.dll, 580 dwmapi.dll, 566 KERNELBASE.dll, 563 msvcr90.dll, 560 mozjs.dll, 559 cfgmgr32.dll, 550 dui70.dll, 549 RpcRtRemote.dll, 544 msvcp90.dll, 544 duser.dll, 538 version.dll, 534 WSHTCPIP.DLL, 529 slc.dll, 520 wship6.dll, 520 srvcli.dll, 518 rsaenh.dll, 505 psapi.dll, 486 devobj.dll, 479 cryptsp.dll, 455 profapi.dll, 444 CRYPTBASE.dll, 439 d3d10core.dll, 438 t2embed.dll, 436 d2d1.dll, 435 d3d10.dll, 422 cscdll.dll, 421 WLIDNSP.DLL, 420 d3d10_1.dll, 418 cscui.dll, 416 d3d10_1core.dll, 413 dxgi.dll, 403 msvcr80.dll, 390 msvcp80.dll, 374 AudioSes.dll, 339 normaliz.dll, 339 MMDevAPI.dll, 289 mdnsNSP.dll, 288 wshbth.dll, 284 net.dll, 272 urlmon.dll, 267 wininet.dll, 241 iertutil.dll, 239 Resource.dll, 239 ATL90.dll, 238 OFFICE.ODF, 238 GROOVEEX.DLL, 237 GrooveIntlResource.dll, 232 icm32.dll, 153 xpsp2res.dll, 141 sspicli.dll, 133 idmmkb.dll, 127 nvwgf2umx.dll, 115 d3d9.dll, 113 d3d8thk.dll, 104 tapi32.dll, 101 oleacc.dll, 100 msi.dll
Operation being done at the time the hang reporter decided to crash the browser:
System font fallback (probably underestimated here): 635
Looking up full names via src: local(xxx) in @font-face rules: 138
Looking up localized names: 6

There are lots of stacks that appear to be truncated in the crash report data. Example:
Module|DWrite.dll|6.1.7601.17563|DWrite.pdb|6254FFC9C8114DC1AE61EA8708E725661|0x7feefa10000|0x7feefb8dfff|0
0|0|gdi32.dll|NtGdiGetFontData|||0xa
0|1|gdi32.dll|GetFontData|||0x86
0|2|xul.dll|AutoSelectFont::AutoSelectFont(HDC__ *,tagLOGFONTW *)|hg:hg.mozilla.org/mozilla-central:gfx/thebes/gfxGDIFontList.h:c58bad0b4640|79|0x12
0|3|xul.dll|gfxDWriteFontEntry::GetFontTable(unsigned int,FallibleTArray<unsigned char> &)|hg:hg.mozilla.org/mozilla-central:gfx/thebes/gfxDWriteFontList.cpp:c58bad0b4640|304|0x3d
0|4|xul.dll|nsTArray_base<nsTArrayFallibleAllocator>::ShrinkCapacity(unsigned int,unsigned __int64)|hg:hg.mozilla.org/mozilla-central:obj-firefox/dist/include/nsTArray-inl.h:c58bad0b4640|216|0x18
0|5|xul.dll|mozilla::imagelib::RasterImage::~RasterImage()|hg:hg.mozilla.org/mozilla-central:image/src/RasterImage.cpp:c58bad0b4640|242|0xe
0|6|xul.dll|AutoFallibleTArray<unsigned char,16384>::AutoFallibleTArray<unsigned char,16384>()|hg:hg.mozilla.org/mozilla-central:obj-firefox/dist/include/nsTArray.h:c58bad0b4640|1372|0xd
0|7|xul.dll|gfxDWriteFontEntry::ReadCMAP()|hg:hg.mozilla.org/mozilla-central:gfx/thebes/gfxDWriteFontList.cpp:c58bad0b4640|366|0x16
[... nothing beyond here ...]

ReadCMAP on stack: 1053
Depends on: 705594, 705590
Keywords: crash, regression → hang
For the reports with truncated stacks, you might try downloading the raw dump + the matching Firefox build, opening it in VS/WinDBG with the symbol server, and getting a stack from there. With the binaries available, Microsoft's debuggers are somewhat better at unwinding the stack reliably.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #12)
> chofmann, why do you think that is interesting? Does that differ
> significantly from the normal population of nightly users?
I thought it would, but it's hard to tell. Crashes and chromehangs reported from amd64 systems make up a higher percentage than I would have expected. Here are the numbers for 11/27. This chromehang is a bit more frequent on amd64 than the population of general crashes, but less frequent on amd64 than general chromehangs.

Total non-chromehang reports:
Share | Count | Arch
0.65 | 1265 | x86
0.29 | 578 | amd64
0.04 | 85 | \N
0.00 | 12 | arm
0.00 | 1 | ppc

Total chromehang reports:
Share | Count | Arch
0.61 | 1874 | x86
0.37 | 1146 | amd64
0.01 | 33 | \N
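The share columns can be recomputed from the counts. A small C++ sketch, assuming each table's rows make up its complete total: truncating to whole percentages (rather than rounding) reproduces the quoted figures, e.g. 578/1941 ≈ 0.298 appears as 0.29 and 1146/3053 ≈ 0.375 as 0.37. `TruncatedPercent` is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: each architecture's percentage of the total,
// truncated toward zero by integer division (29.78 -> 29, not 30).
std::map<std::string, int> TruncatedPercent(const std::map<std::string, int>& counts) {
  long long total = 0;
  for (const auto& entry : counts) total += entry.second;
  std::map<std::string, int> percent;
  if (total == 0) return percent;  // avoid dividing by zero on empty input
  for (const auto& [arch, n] : counts) {
    percent[arch] = static_cast<int>(100LL * n / total);
  }
  return percent;
}
```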
(In reply to John Daggett (:jtd) from comment #13)
> [...] 1278 crash dumps [...]
> Top 20 addons by freq
> 1223 972ce4c6-7e08-4474-a285-3208198ce6fd == feedback add-on?
FYI, feedback is testpilot, IIRC, but this is the default theme, which should always be installed (if it gets lost, people can never switch back to it after switching to a different theme unless they uninstall the other theme; note that everything still works if it's gone, this is just a hook for the add-on manager to recognize it).
Whiteboard: [Snappy:P1]
(In reply to Ted Mielczarek [:ted, :luser] from comment #15)
> For the reports with truncated stacks, you might try downloading the raw
> dump + the matching Firefox build, opening it in VS/WinDBG with the symbol
> server, and getting a stack from there. With the binaries available,
> Microsoft's debuggers are somewhat better at unwinding the stack reliably.
Thanks Ted, but I was trying to assess aggregate stats for the whole set of crash reports, so unless this is an easily scriptable process it doesn't help. We've instrumented the codepath in question with telemetry probes, so that should tell us more.
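A telemetry-style probe of the kind mentioned here is often written as an RAII timer wrapped around the code path (for example, around a cmap read). The C++ sketch below is hypothetical and is not Mozilla's Telemetry API: it measures a scope with `std::chrono::steady_clock` and passes the elapsed milliseconds to a caller-supplied sink, which in a real browser might feed a histogram.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical scoped probe: records how long the enclosing scope took.
class ScopedTimer {
 public:
  using Sink = std::function<void(long long /*elapsed ms*/)>;

  explicit ScopedTimer(Sink sink)
      : sink_(std::move(sink)), start_(std::chrono::steady_clock::now()) {}

  // Reports the elapsed time when the scope ends, however it is exited.
  ~ScopedTimer() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    if (sink_) {
      sink_(std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count());
    }
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  Sink sink_;
  std::chrono::steady_clock::time_point start_;
};
```

Usage would be a block such as `{ ScopedTimer t(recordCmapReadMs); /* read the cmap */ }`, where `recordCmapReadMs` is whatever hypothetical function collects the samples.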
Assignee: nobody → jdaggett
Cheng is usually the person who reaches out to users, but I could do it as well. It might be best to wait until we have an @mozilla email address to use.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> The hang detector is definitely the immediate cause of this new signature.
> Presuming that the hang detector is working correctly (which we believe it
> is on Windows and Linux), this means that there is in fact a UI pause of 30
> seconds. Marcia/KaiRo, I wonder if we can contact the reporters to see if
> they have in fact been seeing large UI pauses prior to the recent nightlies?
(In reply to Marcia Knous [:marcia] from comment #19)
> Cheng usually is the person who reaches out to users, but I could do it as
> well. Might be best to wait until we have an @mozilla email address to use.
>
> (In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> > The hang detector is definitely the immediate cause of this new signature.
> > Presuming that the hang detector is working correctly (which we believe it
> > is on Windows and Linux), this means that there is in fact a UI pause of 30
> > seconds. Marcia/KaiRo, I wonder if we can contact the reporters to see if
> > they have in fact been seeing large UI pauses prior to the recent nightlies?
There is no point. This is a known problem that we've been able to reproduce. This isn't in any way biased by the hang detector.
On Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:11.0a1) Gecko/20111216 Firefox/11.0a1 ID:20111216031140, there was a hang with signature [@ chromehang | NtGdiGetFontData ]. Is bp-43ead796-f0ae-4df5-bd60-ae2b02111221 this bug or a different one?
Yes, but that should only occur if the hangmonitor thread is running: http://mxr.mozilla.org/mozilla-central/source/modules/libpref/src/init/all.js#1479 Must be someone experimenting with the hangmonitor.
We're no longer seeing this in the top crash list for FF11. Could this have been bug 722394?
(In reply to Alex Keybl [:akeybl] from comment #23)
> We're no longer seeing this in the top crash list for FF11. Could this have
> been bug 722394?
It's not a crash, but a hang, and the hang monitor has been disabled.
(In reply to Scoobidiver from comment #24)
> (In reply to Alex Keybl [:akeybl] from comment #23)
> > We're no longer seeing this in the top crash list for FF11. Could this have
> > been bug 722394?
> It's not a crash, but a hang and the hang monitor has been disabled.
If this was caused by the hang monitor, then there's no reason to track for a specific release; this will be covered as part of the Snappy initiative.
Blocks: 734308
No longer blocks: 734308
Depends on: 734308
This has been fixed by bug 705594: we no longer enumerate cmaps on font fallback when using DirectWrite. Instead, we use a custom DirectWrite text renderer to extract the appropriate fallback font.
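The idea behind the fix can be sketched without DirectWrite itself: instead of asking each font whether it covers a character, hand the text to the layout engine, let it perform its own fallback, and capture the font it chose in a renderer callback that draws nothing. The interface below is hypothetical and only loosely modeled on the shape of DirectWrite's `IDWriteTextRenderer::DrawGlyphRun`, which receives each glyph run along with its font face; the real COM types and the layout call are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one run the layout engine would draw.
struct GlyphRunInfo {
  std::string fontFamily;    // font the layout engine selected, possibly via fallback
  std::size_t textPosition;  // offset of the run in the laid-out text
  std::size_t textLength;    // length of the run
};

// Hypothetical renderer callback interface.
class TextRenderer {
 public:
  virtual ~TextRenderer() = default;
  virtual void DrawGlyphRun(const GlyphRunInfo& run) = 0;
};

// Renderer that draws nothing: it only records which font each run used,
// so the caller learns the fallback font without scanning any cmaps.
class FallbackFontCollector : public TextRenderer {
 public:
  void DrawGlyphRun(const GlyphRunInfo& run) override {
    fonts_.push_back(run.fontFamily);
  }
  const std::vector<std::string>& fonts() const { return fonts_; }

 private:
  std::vector<std::string> fonts_;
};
```

The design choice is to reuse the platform's fallback logic, which already knows the installed fonts, rather than reimplementing it by reading every font's cmap on the UI thread.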
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13