Closed Bug 1399847 Opened 7 years ago Closed 5 years ago

Crash in IPCError-browser | ShutDownKill with FontCollectionBuilder

Categories

(Core :: Graphics, defect, P2)

55 Branch
x86
Windows 7
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox55 --- wontfix
firefox56 + wontfix
firefox57 + wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix

People

(Reporter: philipp, Assigned: aosmond)

References

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

This bug was filed from the Socorro interface and is report bp-a21d3006-4d1e-4b15-a9f8-f070e0170914. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 dwrite.dll FontFileAnalyzer::IsPfm(unsigned __int64) 1 dwrite.dll FontFileAnalyzer::Analyze() 2 dwrite.dll FontFileAnalyzer::FontFileAnalyzer(FontFileReference const&) 3 dwrite.dll FontFileReference::GetLastWriteTime() 4 dwrite.dll FontCollectionBuilder::FontCollectionBuilder(IDWriteFactory*, void const*, unsigned int, unsigned __int64, FontLoaderManagers const&, FontCollection const&, CountedPtr<AccessToken> const&) 5 dwrite.dll FontCollectionElement::AddToCacheImpl(FontLoaderManagers const&, CacheWriter&, void const**, unsigned int*) 6 dwrite.dll CacheWriter::AddElement(FontLoaderManagers const&, IFontCacheElement&, unsigned int, unsigned int, void const**, unsigned int*, bool*) 7 dwrite.dll ClientSideCacheContext::ClientLookup(IFontCacheElement&, unsigned int, unsigned int) 8 dwrite.dll ClientSideCacheContext::InitializeElementImpl(IFontCacheElement&, unsigned int, unsigned int) 9 dwrite.dll FontCollectionElement::FontCollectionElement(void const*, unsigned int, unsigned __int64, ClientSideCacheContext*, DWriteFactory*, FontCollection const&) 10 dwrite.dll DWriteFontCollection::DWriteFontCollection(void const*, unsigned int, unsigned __int64, ClientSideCacheContext*, DWriteFactory*, FontCollection const&) 11 dwrite.dll ComObject<DWriteFontCollection>::ComObject<DWriteFontCollection><unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection>(unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection) 12 dwrite.dll InnerComObject<DWriteFactory, DWriteFontCollection>::InnerComObject<DWriteFactory, DWriteFontCollection><unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection>(unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection) 13 dwrite.dll DWriteFactory::GetSystemFontCollectionInternal(bool) 14 dwrite.dll DWriteFactory::GetSystemFontCollection(IDWriteFontCollection**, int) 15 xul.dll gfxDWriteFontList::InitFontListForPlatform() gfx/thebes/gfxDWriteFontList.cpp:899 16 xul.dll gfxPlatformFontList::InitFontList() gfx/thebes/gfxPlatformFontList.cpp:288 17 xul.dll gfxWindowsPlatform::CreatePlatformFontList() gfx/thebes/gfxWindowsPlatform.cpp:499 18 xul.dll gfxPlatformFontList::Init() gfx/thebes/gfxPlatformFontList.h:102 19 xul.dll gfxPlatform::Init() gfx/thebes/gfxPlatform.cpp:761 20 xul.dll gfxPlatform::InitChild(mozilla::gfx::ContentDeviceData const&) gfx/thebes/gfxPlatform.cpp:565 21 xul.dll mozilla::dom::ContentChild::RecvSetXPCOMProcessAttributes(mozilla::dom::XPCOMInitData const&, mozilla::dom::ipc::StructuredCloneData const&, nsTArray<LookAndFeelInt>&&) dom/ipc/ContentChild.cpp:531 22 xul.dll mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PContentChild.cpp:6920 23 xul.dll mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) ipc/glue/MessageChannel.cpp:2092 24 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) ipc/glue/MessageChannel.cpp:2018 25 xul.dll mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) ipc/glue/MessageChannel.cpp:1887 26 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run() ipc/glue/MessageChannel.cpp:1920 27 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1446 28 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:97 29 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:302 30 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319 31 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299 32 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156 33 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:278 34 xul.dll XRE_RunAppShell() toolkit/xre/nsEmbedFunctions.cpp:882 35 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:270 36 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319 37 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299 38 xul.dll XRE_InitChildProcess(int, char** const, XREChildData const*) toolkit/xre/nsEmbedFunctions.cpp:699 39 xul.dll mozilla::BootstrapImpl::XRE_InitChildProcess(int, char** const, XREChildData const*) toolkit/xre/Bootstrap.cpp:65 40 firefox.exe content_process_main(mozilla::Bootstrap*, int, char** const) ipc/contentproc/plugin-container.cpp:64 41 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:115 42 firefox.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253 43 kernel32.dll BaseThreadInitThunk 44 ntdll.dll __RtlUserThreadStart 45 ntdll.dll _RtlUserThreadStart content process shutdownkill reports happening in FontCollectionBuilder are regressing in firefox 56 on windows 7, mainly on 32bit builds: https://crash-stats.mozilla.com/signature/?product=Firefox&process_type=content&proto_signature=~FontCollectionBuilder%3A%3AFontCollectionBuilder&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2017-07-01T00%3A00%3A00.000Z#graphs on 56.0b they account for just a bit under 10% of the [@ IPCError-browser | ShutDownKill] signature last week... the first occurrence on nightly was in 56.0a1 build 20170714030205 which contained two apparent font related changes: bug 1376026 & bug 1376964 whole pushlog to the day before: https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-07-13&tochange=67cd1ee26f2661fa5efe3d952485ab3c89af4271
Flags: needinfo?(milan)
Lee, Jonathan, can you take a look? There seems to be a lot of these...
Flags: needinfo?(milan)
Flags: needinfo?(lsalzman)
Flags: needinfo?(jfkthame)
Priority: -- → P1
Whiteboard: [gfx-noted]
It looks like the first patch in bug 1376026 may cause us to use DWrite on older Win7 systems where previously we'd have skipped it -- perhaps that means we're trying to use it on flakier, older systems and hence hitting more crashes? Bug 1376964 was to do with webfont loading, so shouldn't be involved here; the stack here shows a crash happening during platform initialization, way before we've gotten to the point of fetching and instantiating webfonts.
Flags: needinfo?(jfkthame)
I good first step might be to add telemetry that measures the time that gfxDWriteFontList::InitFontListForPlatform takes.
As I understand it this is not a crash but instead just the content process not shutting down in time. It's conceivable that this could happen if you had a lot of fonts with a slow disk.
We should find out why we have all these content processes getting created during shutdown...
Flags: needinfo?(lsalzman)
Priority: P1 → P2
(In reply to Milan Sreckovic [:milan] from comment #6) > We should find out why we have all these content processes getting created > during shutdown... The crash reports generally seem to show a pretty short uptime (less than a minute, in most cases). Are we in the middle of restoring a session (or something like that) when the user changes their mind and closes the browser? So perhaps it's not so much that we're creating content processes during shutdown, and more like the browser is shutting down while content processes are still being created...?
(In reply to Jonathan Kew (:jfkthame) from comment #7) > (In reply to Milan Sreckovic [:milan] from comment #6) > > We should find out why we have all these content processes getting created > > during shutdown... > > The crash reports generally seem to show a pretty short uptime (less than a > minute, in most cases). Are we in the middle of restoring a session (or > something like that) when the user changes their mind and closes the > browser? So perhaps it's not so much that we're creating content processes > during shutdown, and more like the browser is shutting down while content > processes are still being created...? A conceivable course of events: Jeff noted that sometimes pauses on the order of seconds up to 30s can occur during the GetSystemFontCollection call, according to telemetry. I might guess that this occurs perhaps due to hitting the pagefile? Maybe the user is seeing the hang, panic ensues, and trigger shutdown and/or a frenzy of trying to kill locked tabs and create new ones?
Andrew, is this related to the shutdown bugs you're working on at the moment?
Flags: needinfo?(aosmond)
The timing sort of matches. It was also pointed out to me that we may be spinning up a content process on shutdown to deal with page snapshorts for activity stream or something related.
No crashes since 20170917031738 ?
(In reply to Milan Sreckovic [:milan] from comment #11) > No crashes since 20170917031738 ? if i narrow down the graph from comment #0 to the nightly channel this doesn't really seem to stop: https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&process_type=content&proto_signature=~FontCollectionBuilder%3A%3AFontCollectionBuilder&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2017-07-01T00%3A00%3A00.000Z#graphs and there are also still reports from current builds like bp-c6edd743-17de-4b6a-ada4-233fb0170920
I thought this was fixed with our latest RC, but this is still getting a large amount of crash reports in 56 beta 12 and 56.0RC builds. It also still sounds like a content process issue and not something that causes a startup crash. Milan, Andrew, any thoughts here? The difference between 55.0.3 and 56 seems so huge, do we know how this might affect users?
Flags: needinfo?(milan)
Hmmm, about 12% of the signatures from the 56 RC contain gfxPlatform::Init in the stack trace. There does not appear to be any single code path which dominates the reports (I identified several which have similar numbers). The similar crash I had investigated signature wise was bug 1400637 but that was 57+ only, so clearly this has a different root cause.
Andrew's currently the freshest on shutdown issues, so even though this is not quite related to work he'd have done, it makes sense to assign to him.
Assignee: nobody → aosmond
Flags: needinfo?(milan)
Priority: P2 → P1
I don't think this is any more prevalent than it was historically. I don't think we are getting stuck -- it is just a common place to be when setting up a content process if shutdown comes in soon after. (From what I can gather in the crash reports, it really only happens on x86 in significant numbers, amd64 has a much much smaller rate of occurrence.) As per bug 1375704 comment 20, we should see the number of reports go down to a more acceptable level now.
Flags: needinfo?(aosmond)
It's back...
Flags: needinfo?(aosmond)
Supposedly bug 1399796 was going to help, but this is not reflected in the crash stats. On 57.0b, gfxPlatform::Init related IPC ShutDownKill reports remain around 7%. On 58.0a, we are now less than 1%. The dip observed in the graph appears to be luck. The one build that actually did see fewer crash reports was on a Saturday with an almost empty pushlog (and the one change was related to tests). The prediction in bug 1375704 comment 24 where we would see an uptick in reports in 57b.0 appears to be accurate...
It was suggested that bug 1385249 landing might reduce the crash rate in bug 1375704 comment 27, but it does not appear to have had any impact on nightly and it has been in for several days now.
Depends on: 1406442
Huge numbers here, but not a lot of user comments. On release 56, we're seeing 19791 crash reports here in the last week. I don't think we are going to make any progress here for 56; we don't know how to improve it right now, and this probably isn't a good dot release candidate.
I'm guessing the same will happen for 57.
Priority: P1 → P2
Version: 56 Branch → 55 Branch
This still appears to have a pretty high crash rate :(

Reviewing the recent crashes, while we still see FontCollectionBuilder shutdown crashes, they are a drop in the bucket compared to the general IPC browser errors. None of the big hitters do not appear to be graphics related to me.

Flags: needinfo?(aosmond)

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.