Closed Bug 1577507 Opened 5 years ago Closed 5 years ago

Crash in [@ nsAppShellInit]

Categories

(Core :: Graphics, defect)

Type: defect
Priority: Not set
Severity: critical

Tracking

RESOLVED FIXED
mozilla75
Tracking Status
firefox-esr68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- fixed
firefox75 --- fixed

People

(Reporter: marcia, Assigned: tnikkel)

References

(Regression)

Details

(Keywords: crash, regression)

Attachments

(1 file)

This bug is for crash report bp-225125e1-d129-467e-804a-3e2980190805.

Seen while looking at macOS crash stats: https://bit.ly/327Y2Sw. Crashes started in build 20190805095413 on 70. There are also a handful of crashes in 69.0b14 and 69.0b15.

Code was touched in Bug 1545381. Adding ni on nfroyd for some insight. All of the crashes have MOZ_RELEASE_ASSERT(((bool)(__builtin_expect(!!(!NS_FAILED_impl(rv)), 1))))

Top 9 frames of crashing thread:

0 XUL nsAppShellInit widget/nsAppShellSingleton.h:47
1 XUL nsFactoryEntry::GetFactory xpcom/components/nsComponentManager.cpp:1858
2 XUL nsComponentManagerImpl::GetServiceLocked xpcom/components/nsComponentManager.cpp:1384
3 XUL nsComponentManagerImpl::GetService xpcom/components/nsComponentManager.cpp:1436
4 XUL XRE_RunAppShell toolkit/xre/nsEmbedFunctions.cpp:876
5 XUL XRE_InitChildProcess ipc/chromium/src/base/message_loop.cc:315
6 plugin-container main ipc/app/MozillaRuntimeMain.cpp:23
7 libdyld.dylib libdyld.dylib@0x1014 
8 libdyld.dylib libdyld.dylib@0x1014 
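
For anyone reading the assertion text above: it looks like the preprocessor expansion of a release assert on NS_SUCCEEDED(rv), so frame 0 is the point where a failed app shell init is turned into a deliberate crash. Below is a minimal standalone sketch of that pattern. It is not the code in nsAppShellSingleton.h; Result, Failed, AppShellInitSketch and the injectable crash callback are all hypothetical stand-ins so the snippet builds without XPCOM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <functional>

// Stand-in for XPCOM's nsresult: the high bit marks failure.
using Result = uint32_t;
constexpr Result kOk = 0;
constexpr Result kFailure = 0x80004005;  // numerically equal to NS_ERROR_FAILURE

inline bool Failed(Result rv) { return (rv & 0x80000000u) != 0; }

// Hypothetical stand-in for the singleton constructor. `init` plays the role
// of the platform app shell's Init(); `crash` plays the role of the release
// assert and aborts the process unless the caller overrides it.
inline Result AppShellInitSketch(
    const std::function<Result()>& init,
    const std::function<void()>& crash = [] { std::abort(); }) {
  assert(init && "an init callable is required");
  Result rv = init();
  if (Failed(rv)) {
    // Same spirit as a release assert on success: the process stops here,
    // and the reason Init() failed is not recorded anywhere.
    crash();
  }
  return rv;
}
```

The point of the sketch is the last comment: with a single assert at the call site, every failure inside Init() produces the same signature, which is why the stack alone does not say what went wrong.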

OS: Windows 10 → macOS
Version: 70 Branch → Trunk
Blocks: 1545381

:marcia, since this bug is a regression, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Flags: needinfo?(mozillamarcia.knous)

Adding a ni on Nathan - I guess I forgot to do it initially. This is a very low volume crash.

Flags: needinfo?(nfroyd)

I don't have anything to add beyond bug 1545381 comment 19; somebody who's knowledgeable about how our plugins work would need to take a look.

Flags: needinfo?(nfroyd)
Flags: needinfo?(mozillamarcia.knous)
Regressed by: 1545381

Last crashes in nightly 71 were in the 10-3 build - nothing since then on nightly. One crash in 70.0b8. Perhaps we can close this bug out as WFM.

Marking fix-optional to remove this from regression triage. Closing it is also probably ok.

Since my last Comment 4, I noticed this is still visible in 72 nightly, with anywhere from 2-9 crashes per build. However, they appear to be single users hitting the crash in most cases. I also changed the platform to all since I see some Windows crashes as well.

OS: macOS → All
Hardware: Desktop → All

High correlation to 10.13 on macOS: (90.00% in signature vs 00.57% overall) platform_pretty_version = OS X 10.13 [90.00% vs 12.60% if platform = Mac OS X]

For what it's worth, most (all?) of these are in the gpu child process.

Adding 72 and 73 as affected. Still relatively low volume, but this signature is currently #4 overall on 73 nightly. There are no URLs and no comments.

Low volume crash at this point, marking it fix-optional across all versions to remove it from weekly triage.

I am setting 74 back to affected, as we have ~20 crashes per day and only macOS users are affected. That seems to be a lot for nightly, especially since we have few macOS users.

(In reply to Marcia Knous [:marcia] from comment #7)

High correlation to 10.13 on macOS: (90.00% in signature vs 00.57% overall) platform_pretty_version = OS X 10.13 [90.00% vs 12.60% if platform = Mac OS X]

This correlation is still there on mozilla-central (though it's much weaker):

platform_pretty_version = OS X 10.13 [26.00% vs 10.17% if platform = Mac OS X]

But for some reason, macOS 10.15 now predominates on beta:

platform_pretty_version = OS X 10.15 [92.59% vs 53.35% if platform = Mac OS X]

The 73 and later crashes are all in the GPU process.

Looking over the code, there's nothing that could obviously be a source of failures in the GPU process on OSX. Maybe something with sandboxing? Maybe people have broken installs that interact in some weird way with sandboxing? The obvious next step would be to crash any place the init method can return a failure, and then we'd get more information about what is failing.

For reference:

failure handling that triggers the crash -
https://searchfox.org/mozilla-central/rev/2e355fa82aaa87e8424a9927c8136be184eeb6c7/widget/nsAppShellSingleton.h#41

since this mostly happens on OSX, OSX Init method -
https://searchfox.org/mozilla-central/rev/2e355fa82aaa87e8424a9927c8136be184eeb6c7/widget/cocoa/nsAppShell.mm#282

There are a few NS_ENSURE_STATE checks here; it is hard to say which one returns.

Also the nsBaseAppShell call might be failing too -
https://searchfox.org/mozilla-central/rev/2e355fa82aaa87e8424a9927c8136be184eeb6c7/widget/nsBaseAppShell.cpp#40

I think mccr8 has the right idea: add some asserts to these, see which one fails, and debug from there.
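
A rough sketch of what "crash at each place init can fail" buys us, assuming three hypothetical failure points. The step names and InitInputs fields are made up for illustration and do not correspond to the actual checks in nsAppShell.mm or nsBaseAppShell.cpp. The idea is only that each early return gets its own label, so the crash report names the failing check instead of the shared assert in the caller.

```cpp
#include <cassert>
#include <string>

// Hypothetical failure points inside an Init() method.
enum class InitStep { None, BaseInit, RunLoop, EventSource };

// Hypothetical inputs standing in for the state each check inspects.
struct InitInputs {
  bool baseInitOk;
  bool haveRunLoop;
  bool haveEventSource;
};

// Returns the first check that fails, or InitStep::None if all pass.
inline InitStep FirstFailingStep(const InitInputs& in) {
  if (!in.baseInitOk) return InitStep::BaseInit;
  if (!in.haveRunLoop) return InitStep::RunLoop;
  if (!in.haveEventSource) return InitStep::EventSource;
  return InitStep::None;
}

// Label that a per-check crash (or crash annotation) could carry.
inline std::string Describe(InitStep step) {
  switch (step) {
    case InitStep::None: return "init succeeded";
    case InitStep::BaseInit: return "base app shell init failed";
    case InitStep::RunLoop: return "run loop unavailable";
    case InitStep::EventSource: return "event source unavailable";
  }
  assert(false && "unhandled InitStep");
  return "unknown";
}
```

In a real patch each early return would presumably become its own release assert or crash annotation; the enum here just makes the "which one failed" question answerable in a testable way.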

(In reply to Andrew McCreight [:mccr8] from comment #16)

Looking over the code, there's nothing that could obviously be a source of failures in the GPU process on OSX. Maybe something with sandboxing? Maybe people have broken installs that interact in some weird way with sandboxing? The obvious next step would be to crash any place the init method can return a failure, and then we'd get more information about what is failing.

We don't have a Mac GPU process, so something is going wrong here for the crashes to be reported as being in the GPU process.

I recently opened a bug that seems to be triggered by assuming that a gpu child process has been launched when it really hasn't been -- bug 1595420.

Top crash on the OSX February 12 Nightlies.

I don't know if this is significant, but every crash report I looked at (probably about 50) has telemetry data of either "gpuProcess": { "status": "force_enabled" } or "gpuProcess": { "status": "available" }. And environment.rst indicates that available means in use. I will look into this some more.

Note: the OOM bug Steven Michaud referenced in comment 19 is marked as being regressed by bug 1550422 which was a big gfx pref refactoring and landed in 69.

(In reply to Haik Aftandilian [:haik] from comment #21)

I don't know if this is significant, but every crash report I looked at (probably about 50) has telemetry data of either "gpuProcess": { "status": "force_enabled" } or "gpuProcess": { "status": "available" }. And environment.rst indicates that available means in use. I will look into this some more.

Setting the pref layers.gpu-process.force-enabled on DeveloperEdition/74 causes this crash for me. The crash is not user-visible and results in a zombie plugin-container child. Crash report: https://crash-stats.mozilla.org/report/index/4e11d13e-1970-40e6-beee-46cd50200214

Component: XPCOM → Graphics

Since this moved components, please re-triage (see Haik's comments)

Flags: needinfo?(jbonisteel)

That pref is not intended for use on Mac, which is probably why that crash is happening.

Flags: needinfo?(jbonisteel) → needinfo?(haftandilian)

(In reply to Jessie [:jbonisteel] plz needinfo from comment #24)

That pref is not intended for use on Mac, which is probably why that crash is happening.

But it shows that if graphics incorrectly tries to start a GPU process on macOS we will get these crashes.

I think for this bug we need the graphics team to root cause why we are trying to start the GPU process on macOS, whether it is users incorrectly setting that pref or graphics code enabling the GPU process due to a bug.

Does the telemetry noted in comment 21 indicate what is happening?

(Per Andrew McCreight [:mccr8] from comment #20)

Top crash on the OSX February 12 Nightlies.

Flags: needinfo?(haftandilian) → needinfo?(jbonisteel)

Timothy - if it isn't readily clear why this is happening, can you make a patch that disables the pref on Mac?

Flags: needinfo?(jbonisteel) → needinfo?(tnikkel)
Assignee: nobody → tnikkel
Status: NEW → ASSIGNED
Flags: needinfo?(tnikkel)
Pushed by tnikkel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1d37dcac0fb3
Disable GPU process on mac even if prefs try to enable it. r=aosmond
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75
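
Going only by the commit title, the fix amounts to a platform gate that wins over the prefs. The following is a minimal sketch of that idea, not the landed patch: Platform, GpuProcessPrefs, ShouldLaunchGpuProcess and AssertLaunchAllowed are hypothetical names, and the `enabled` field is an assumed ordinary enable pref alongside the layers.gpu-process.force-enabled pref mentioned in this bug.

```cpp
#include <cassert>

enum class Platform { Windows, MacOS, Linux };

// Hypothetical snapshot of the relevant prefs.
struct GpuProcessPrefs {
  bool enabled;       // assumed ordinary enable pref
  bool forceEnabled;  // models layers.gpu-process.force-enabled
};

// The platform check runs first, so no pref combination can turn the
// GPU process on where it is unsupported.
inline bool ShouldLaunchGpuProcess(Platform platform,
                                   const GpuProcessPrefs& prefs) {
  if (platform == Platform::MacOS) {
    return false;
  }
  return prefs.enabled || prefs.forceEnabled;
}

// Hypothetical launch-site guard: code about to spawn the process checks
// the single decision above instead of re-reading prefs on its own.
inline void AssertLaunchAllowed(Platform platform,
                                const GpuProcessPrefs& prefs) {
  assert(ShouldLaunchGpuProcess(platform, prefs) &&
         "GPU process launch attempted where it is disabled");
}
```

Keeping the decision in one function means a user flipping force-enabled on Mac gets the pref silently ignored rather than a child process that asserts during app shell init.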

Comment on attachment 9127930 [details]
Bug 1577507. Disable GPU process on mac even if prefs try to enable it. r?aosmond

Beta/Release Uplift Approval Request

  • User impact if declined: crashes for users who have turned on gpu process pref on mac
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Properly disables a configuration that will crash, only affects users who enabled pref
  • String changes made/needed: none
Attachment #9127930 - Flags: approval-mozilla-beta?

Since the fix landed, there aren't any more of these crashes on crash-stats. The most recent build ID with the crash is 20200221214911 and this fix went into 20200222095208.

Comment on attachment 9127930 [details]
Bug 1577507. Disable GPU process on mac even if prefs try to enable it. r?aosmond

macOS crash fix that proved effective on Nightly; uplift approved for 74.0b8, thanks.

Attachment #9127930 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Has Regression Range: --- → yes