Closed Bug 705287 Opened 13 years ago Closed 6 years ago

Hang in gfxDWriteFontList::DelayedInitFontList

Categories

(Core :: Graphics, defect)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX

People

(Reporter: scoobidiver, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: hang, Whiteboard: [Snappy:P1])

Crash Data

It's a new crash signature and is #73 top crasher in today's build. The regression range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3c8147998124&tochange=de483d897af4 The likely culprit is bug 627842. Signature chromehang | InitializeFontMetrics(OpenTypeFace const&, DWRITE_FONT_METRICS*, PhysicalFontFace const*) UUID d666d9b5-6687-435c-b471-ec5662111125 Date Processed 2011-11-25 04:55:53.367164 Uptime 51 Install Age 20.6 hours since version was first installed. Install Time 2011-11-24 16:16:16 Product Firefox Version 11.0a1 Build ID 20111124031031 Release Channel nightly OS Windows NT OS Version 6.1.7601 Service Pack 1 Build Architecture x86 Build Architecture Info GenuineIntel family 6 model 23 stepping 6 Crash Reason EXCEPTION_BREAKPOINT Crash Address 0x6e3e193d User Comments App Notes AdapterVendorID: 10de, AdapterDeviceID: 0640, AdapterSubsysID: 00000000, AdapterDriverVersion: 8.17.12.6099 D2D? D2D+ DWrite? DWrite+ xpcom_runtime_abort(###!!! ABORT: HangMonitor triggered: file e:/builds/moz2_slave/m-cen-w32-ntly/build/xpcom/threads/HangMonitor.cpp, line 111) Processor Notes EMCheckCompatibility True Thread 0 Frame Module Signature [Expand] Source 0 DWrite.dll InitializeFontMetrics 1 DWrite.dll FontCollectionBuilder::Builder::AddFontInternal 2 DWrite.dll FontCollectionBuilder::Builder::AddFont 3 DWrite.dll FontCollectionBuilder::Builder::AddFile 4 DWrite.dll FontCollectionBuilder::FontCollectionBuilder 5 DWrite.dll FontCollectionElement::AddToCacheImpl 6 DWrite.dll CacheWriter::AddElement 7 DWrite.dll ClientSideCacheContext::ClientLookup 8 DWrite.dll ClientSideCacheContext::InitializeElementImpl 9 DWrite.dll FontCollectionElement::FontCollectionElement 10 DWrite.dll DWriteFontCollection::DWriteFontCollection 11 DWrite.dll ComObject<DWriteFontCollection>::ComObject<DWriteFontCollection><unsigned __int64*,unsigned int,unsigned __int64,IntrusivePtr<ClientSideCacheContext>,DWriteFactory*,FontCollection> 12 DWrite.dll InnerComObject<DWriteFactory,DWriteFontCollection>::InnerComObject<DWriteFactory,DWriteFontCollection><unsigned __int64*,unsigned int,unsigned __int64,IntrusivePtr<ClientSideCacheContext>,DWriteFactory*,FontCollection> 13 DWrite.dll DWriteFactory::GetSystemFontCollectionInternal 14 DWrite.dll DWriteFactory::GetSystemFontCollection 15 xul.dll gfxDWriteFontList::DelayedInitFontList gfx/thebes/gfxDWriteFontList.cpp:764 16 xul.dll gfxDWriteFontList::ResolveFontName gfx/thebes/gfxDWriteFontList.cpp:1145 17 xul.dll gfxWindowsPlatform::ResolveFontName gfx/thebes/gfxWindowsPlatform.cpp:591 18 xul.dll gfxFontGroup::ForEachFontInternal gfx/thebes/gfxFont.cpp:2373 19 @0xe 20 @0x65006c More reports at: https://crash-stats.mozilla.com/report/list?signature=chromehang%20|%20InitializeFontMetrics%28OpenTypeFace%20const%26%2C%20DWRITE_FONT_METRICS*%2C%20PhysicalFontFace%20const*%29
Crash Signature: [@ chromehang | InitializeFontMetrics(OpenTypeFace const&, DWRITE_FONT_METRICS*, PhysicalFontFace const*) ] → [@ chromehang | InitializeFontMetrics(OpenTypeFace const&, DWRITE_FONT_METRICS* PhysicalFontFace const*) ] [@ chromehang | `anonymous namespace''::TrueTypeFontMetricsBuilder::GetBlackBox(unsigned short) ] [@ chromehang | FontFileAnalyzer::IsPfm(unsigned…
This is probably exactly the same issue as bug 705258.
Blocks: hang-detector
No longer blocks: font-inflation
Summary: Crash in gfxDWriteFontList::DelayedInitFontList → Hang in gfxDWriteFontList::DelayedInitFontList
(In reply to Scoobidiver from comment #0) > It's a new crash signature and is #73 top crasher in today's build. > > The regression range is: > http://hg.mozilla.org/mozilla-central/ > pushloghtml?fromchange=3c8147998124&tochange=de483d897af4 > The likely culprit is bug 627842. To be clear: "chromehang | " is a new type of crash report, introduced by bug 429592, which landed in that range.
Some of the stacks in the dumps are incomplete but most of the stacks take place within the call to DWriteFactory::GetSystemFontCollection, the DirectWrite call that returns the object needed to look up names. DirectWrite dll version: Module|DWrite.dll|6.1.7600.16385 Module|DWrite.dll|6.1.7600.16385 Module|DWrite.dll|6.1.7600.16385 Module|DWrite.dll|6.1.7600.16385 Module|DWrite.dll|6.1.7600.16385 Module|DWrite.dll|6.1.7601.17514 Module|DWrite.dll|6.1.7601.17514 Module|DWrite.dll|6.1.7601.17563 Module|DWrite.dll|6.1.7601.17563 Module|DWrite.dll|6.1.7601.17563 Module|DWrite.dll|6.1.7601.17563 Module|DWrite.dll|6.1.7601.17563 Module|DWrite.dll|7.0.6002.18409 The delay problem in GetSystemFontCollection was fixed by Microsoft in KB2505438, which included DWrite.dll versions 6.1.7600.16763 and higher (non-SP1) and 6.1.7601.17563 and higher (SP1). Windows 8 should't have this problem at all. This data seems to indicate that the problem is not completely fixed, 7/14 cases have supposedly fixed versions of the DWrite.dll.
Keywords: crash, regressionhang
Scoobidiver, if we have a crash signature, we should leave the crash keyword on it or else Socorro will probably not find it any more.
Keywords: crash
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #4) > Scoobidiver, if we have a crash signature, we should leave the crash keyword > on it or else Socorro will probably not find it any more. I don't think the crash keyword is required so that the bug shows up in Socorro. See bug 705761 for instance. In addition, there are many crashes in Bugzilla with no crash keyword.
Yeah, I also don't think chromehang's should be tagged with 'crash' since that will directly affect crash-related metrics when they *aren't* crashes but hangs.
(In reply to Scoobidiver from comment #5) > I don't think the crash keyword is required so that the bug shows up in > Socorro. See bug 705761 for instance. Interesting, then I may be wrong. OK, I'll probably need to rewrite some other of my tooling so that "crash" and "hang" are counted in it as "crash bugs", I guess, or my analysis of issues to track in CrashKill is wrong... But thanks for making me realize that. > In addition, there are many crashes in Bugzilla with no crash keyword. Crashes without the crash keyword are wrong, that needs to be changed.
Keywords: crash
Assignee: nobody → jdaggett
Whiteboard: [Snappy:P1]
Need to track down the DirectWrite DLL version associated with the outliers here. There's a known MS bug with the timing of DirectWrite font cache startup than can cause this, but MS claims this is fixed. Telemetry variable: DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
Blocks: 859558
As mentioned in comment 3, this is fundamentally an OS-level bug that occurs because of a timing issue at *cold* startup between the system font cache and the dwrite system library. Microsoft has said they fixed this previously so hopefully as users update their OS the problem will disappear altogether. The only "solution" here is to basically notice when a long delay has occurred and effectively disable hardware acceleration. Given the *very* small percentage of users that are affected by this, I don't think that's the right way to go but it's certainly a possibility. Bug 706340 added gfxinfo to telemetry data. So we should be able to use that to answer the question: How many users of FF {23,24,25} hit a delay here of more that 2secs and what versions of DirectWrite are they using? Vladan, could you look at the telemetry data for this? Or point me at the docs on how to pull this out? Telemetry item: DWRITEFONT_DELAYEDINITFONTLIST_COLLECT The latest version of the telemetry metrics interface doesn't seem to have an export feature, that's kind of unfortunate. http://telemetry.mozilla.org/#path=release/25/DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
Flags: needinfo?(vdjeric)
Depends on: 863122
The chromehangs link I emailed you also contains info on hang stacks involving gfxDWriteFontList::DelayedInitFontList and the associated DWriteVersions. I took a quick look at the top 5 hang signatures involving DelayedInitFontList in that link, and I found the following DWriteVersions (in various proportions): 6.1.7600.16385 6.1.7600.16699 6.1.7600.16763 6.1.7600.16961 6.1.7600.16972 6.1.7600.20655 6.1.7600.20710 6.1.7600.20781 6.1.7600.20830 6.1.7600.21162 6.1.7601.16562 6.1.7601.17105 6.1.7601.17514 6.1.7601.17563 6.1.7601.17776 6.1.7601.17789 6.1.7601.18126 6.1.7601.21935 6.2.9200.16433 6.2.9200.16492 6.2.9200.16571 6.2.9200.16581 7.0.6002.23097 Granted, the vast majority of the hangs reported in this sample are from DWrite versions before Microsoft's fix. Unfortunately, that means real users are still running into this issue. If you'd like to work with the raw Telemetry data, you can do your analysis directly on our new Telemetry backend: http://mreid-moz.github.io/blog/2013/11/06/current-state-of-telemetry-analysis/ http://jonasfj.dk/blog/2013/11/telemetry-rebooted-analysis-future/ Please come by the #telemetry channel for more details
Flags: needinfo?(vdjeric)
Latest results from Telemetry stats: http://jsfiddle.net/69QNY/2/ For the past few months, using Firefox 24 release data: The DirectWrite call GetSystemFontCollection() takes over 2.3secs in roughly 2.1 million startups. This is out of a total of 137 million DirectWrite startups (for comparison, 209 million GDI startups). data: release/24/DWRITEFONT_DELAYEDINITFONTLIST_COLLECT gfxDWriteFontList::DelayedInitFontList GetSystemFontCollection (ms) index start end count pct cum pct 6 647 2324 480054 0.35 98.45 7 2324 8350 1171404 0.85 99.31 8 8350 30000 790817 0.58 99.88 9 >30000 159719 0.12 100.00 total counts: 137304675 data: release/24/GDI_INITFONTLIST_TOTAL gfxGDIFontList::InitFontList Total (ms) index start end count pct cum pct 6 647 2324 1712605 0.82 99.56 7 2324 8350 696282 0.33 99.90 8 8350 30000 189563 0.09 99.99 9 >30000 28550 0.01 100.00 total counts: 209074700 I think we need to figure out two things -- are these cold startups (i.e. does the startup occur soon after a system restart) and, for the long delay cases, how many of these are due to large numbers of fonts on the system? One "solution" would be to shunt users affected by consistent long DirectWrite startup times to GDI by effectively disabling hardware acceleration. This is a bit of a "nuclear solution", the user would lose the benefits of DirectWrite and related acceleration.
Distribution of number of fonts at startup, only for DirectWrite users: data: release/24/DWRITEFONT_DELAYEDINITFONTLIST_COUNT gfxDWriteFontList::DelayedInitFontList Font Family Count index start end count pct cum pct 0 0 1 0 0.00 0.00 1 1 3 24 0.00 0.00 2 3 10 60 0.00 0.00 3 10 32 3231 0.00 0.00 4 32 101 140744 0.10 0.10 5 101 319 123098407 89.65 89.76 6 319 1006 12889622 9.39 99.15 7 1006 3172 1076709 0.78 99.93 8 3172 10000 94342 0.07 100.00 9 >10000 1539 0.00 100.00 total counts: 137304678 So there's a *very* small percentage of users with more than 3000 fonts on their system (roughly 0.07% of startups).
Updated telemetry stats don't show any significant change in the percentage of users affected by long times spent in DirectWrite system font collection initialization: http://jsfiddle.net/eGvw5/4/ release/24 DWRITEFONT_DELAYEDINITFONTLIST_COLLECT index start end count pct cum pct 6 647 2324 427213 0.35 98.46 7 2324 8350 1048207 0.85 99.32 8 8350 30000 701801 0.57 99.89 9 >30000 134983 0.11 100.00 total counts: 122697422 release/27 DWRITEFONT_DELAYEDINITFONTLIST_COLLECT index start end count pct cum pct 6 647 2324 621027 0.37 98.44 7 2324 8350 1491087 0.88 99.32 8 8350 30000 963853 0.57 99.89 9 >30000 180538 0.11 100.00 total counts: 168563706 release/28 DWRITEFONT_DELAYEDINITFONTLIST_COLLECT index start end count pct cum pct 6 647 2324 36820 0.32 98.77 7 2324 8350 81229 0.71 99.48 8 8350 30000 50133 0.44 99.92 9 >30000 9719 0.08 100.00 total counts: 11472782 nightly/31 DWRITEFONT_DELAYEDINITFONTLIST_COLLECT index start end count pct cum pct 6 647 2324 10084 0.42 98.44 7 2324 8350 21144 0.89 99.32 8 8350 30000 13668 0.57 99.90 9 >30000 2487 0.10 100.00 total counts: 2388461 Given that system startup time was added to telemetry data at the end of January, it would be interesting to know the following data: For the set of telemetry data where (DWRITEFONT_DELAYEDINITFONTLIST_COLLECT > 2324), what's the distribution of time since system startup? Are these all within 60secs of startup? Additionally, for the set of telemetry data where (DWRITEFONT_DELAYEDINITFONTLIST_COLLECT > 2324) what's the distribution of DirectWrite library versions? Are there recent versions included in that list? This would be useful information to report back to Microsoft. In the past they've assured me that this problem has been "fixed" in recent versions of the DirectWrite library.
Crash Signature: , PhysicalFontFace const*) ] [@ chromehang | `anonymous namespace''::TrueTypeFontMetricsBuilder::GetBlackBox(unsigned short) ] [@ chromehang | FontFileAnalyzer::IsPfm(unsigned __int64) ] → , PhysicalFontFace const*) ] [@ chromehang | `anonymous namespace''::TrueTypeFontMetricsBuilder::GetBlackBox(unsigned short) ] [@ chromehang | FontFileAnalyzer::IsPfm(unsigned __int64) ] [@ chromehang | InitializeFontMetrics ] [@ chromehang | `anonymo…
Assignee: jd.bugzilla → nobody
Closing because no crash reported since 12 weeks.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Closing because no crash reported since 12 weeks.
You need to log in before you can comment on or make changes to this bug.