Closed Bug 1322633 Opened 8 years ago Closed 7 years ago

Intermittent various tests | application crashed [@ mozilla::layers::CrossProcessCompositorBridgeParent::AllocPAPZCTreeManagerParent]

Categories

(Core :: Panning and Zooming, defect, P3)

53 Branch
x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kats)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [gfx-noted])

Crash Data

Attachments

(1 file)

Crash appears to be at https://hg.mozilla.org/integration/autoland/file/3be54c62e0011dca5367be2a04e54bc6882f1908/gfx/layers/ipc/CrossProcessCompositorBridgeParent.cpp#l138
OS: Unspecified → Linux
Priority: -- → P3
Hardware: Unspecified → x86
Whiteboard: [gfx-noted]
Version: unspecified → 53 Branch
Seeing this on other tests too, like test_navigation.py TestNavigate.test_about_blank_for_new_docshell (Linux debug Mn job)
Summary: Intermittent browser/components/sessionstore/test/browser_windowStateContainer.js | application crashed [@ mozilla::layers::CrossProcessCompositorBridgeParent::AllocPAPZCTreeManagerParent] → Intermittent various tests | application crashed [@ mozilla::layers::CrossProcessCompositorBridgeParent::AllocPAPZCTreeManagerParent]
Firefox UI tests are also affected:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=Firefox%20UI&bugfiler&selectedJob=71836529

The crash there happens when opening and closing browser windows.
Severity: normal → critical
(In reply to Henrik Skupin (:whimboo) from comment #7)
> Firefox UI tests are also affected:
> https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-
> searchStr=Firefox%20UI&bugfiler&selectedJob=71836529

https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=28765570b2933291117323689668a4ce99c03da8&selectedJob=71836529 is a better url.

It seems like all the crashes are happening on python-based tests. I'm not sure why. I've been trying to reproduce this locally but haven't had any success. If I can't get it today I'll try to just land some logging and wait for it to hit in automation.
Assignee: nobody → bugmail
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> It seems like all the crashes are happening on python-based tests.

Scratch that, it happens on browser-chrome as well, which I missed.
I spent some time looking at the code. Because the crash is happening in CrossProcessCompositorBridgeParent::AllocPAPZCTreeManagerParent, the content process must be at [1]. That function in turn is called at [2], just a little bit after the PLayerTransaction constructor is sent across the same channel. If we look at the parent side of that PLayerTransaction constructor, it does a similar kind of check that AllocPAPZCTreeManagerParent does, but has a fallback path [3] to handle the case where the layers id is not found in sIndirectLayerTrees. This fallback path looks like a bogus thing based on the comment there and bug 900745. If we remove that bogus thing then presumably the crash will move up to that location instead of crashing in AllocPAPZCTreeManagerParent. I don't know if that's helpful but at least it's a more honest crash in that the right code will get blamed. :)

[1] http://searchfox.org/mozilla-central/rev/8fa84ca6444e2e01fb405e078f6d2c8da0e55723/dom/ipc/TabChild.cpp#2638
[2] http://searchfox.org/mozilla-central/rev/8fa84ca6444e2e01fb405e078f6d2c8da0e55723/dom/ipc/TabChild.cpp#2617
[3] http://searchfox.org/mozilla-central/rev/8fa84ca6444e2e01fb405e078f6d2c8da0e55723/gfx/layers/ipc/CrossProcessCompositorBridgeParent.cpp#97
Depends on: 900745
The main change in this patch is moving the InitAPZState() call inside the if (success) guard, so that we don't call it if the PLayerTransaction construction wasn't successful (see bug 900745, which this patch depends on). The rest of the patch is just moving a hunk of code so that mLayersId is still set before the call to InitAPZState.

I'll file a follow-up bug for the graphics branch which will need a call to InitAPZState() inserted at https://hg.mozilla.org/projects/graphics/file/d52552ae7dcf/dom/ipc/TabChild.cpp#l2608 for the WR layermanager case.
Attachment #8831570 - Flags: review?(dvander)
bug 900745 looks super ancient. Why exactly do we fail to create a LayerTransactionParent?
Comment on attachment 8831570 [details] [diff] [review]
Don't call InitAPZState if PLayerTransaction creation failed

Review of attachment 8831570 [details] [diff] [review]:
-----------------------------------------------------------------

Actually if we're opening and closing a window I guess maybe we could deallocate the layer tree ID before the tab is shown?
Attachment #8831570 - Flags: review?(dvander) → review+
(In reply to David Anderson [:dvander] from comment #17)
> Actually if we're opening and closing a window I guess maybe we could
> deallocate the layer tree ID before the tab is shown?

I guess I never said that anywhere, but yes - that seems to be the only way in which this situation might arise. The PLayerTransaction constructor (inadvertently) avoided the crash because of the dummy LayerTransactionParent that it creates on the parent side but the PAPZCTreeManager constructor just crashes. Instead of creating a dummy APZCTreeManagerParent as well I figured we could resurrect that "success" flag and use it as a guard to solve this problem.
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d500ad0713bb
Don't initialize APZ protocols unless PLayerTransaction was successfully set up. r=dvander
https://hg.mozilla.org/mozilla-central/rev/d500ad0713bb
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Adding crash signatures. The crash volume on Nightly that matches this bug seems low enough that it won't be useful in letting us know if this is really fixed. I'm going to wait a few days and see if there are still matching intermittent failures - if not, then I'll request aurora uplift. I'm not sure it's worth uplifting to beta given it's relatively low frequency. I'll leave 52 as "affected" for now and decide later.
Crash Signature: [@ RefPtr<mozilla::layers::APZCTreeManager>::RefPtr| mozilla::layers::CrossProcessCompositorBridgeParent::AllocPAPZCTreeManagerParent] [@ RefPtr<T>::RefPtr<T> | mozilla::layers::CrossProcessCompositorBridgeParent::AllocPAPZCTreeManagerParent] [@ mozilla…
Comment on attachment 8831570 [details] [diff] [review]
Don't call InitAPZState if PLayerTransaction creation failed

Approval Request Comment
[Feature/Bug causing the regression]: gpu process separation, most likely
[User impact if declined]: intermittent failures/crashes when rapidly opening and closing windows
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: we don't have clear STR so it's hard to say it's verified. the intermittent failures/crashes are also very low volume, but they haven't occurred since this landed.
[Needs manual test from QE? If yes, steps to reproduce]: no, since we don't have STR
[List of other uplifts needed for the feature/fix]: bug 900745
[Is the change risky?]: shouldn't be
[Why is the change risky/not risky?]: it's a pretty small change, only affects the "degenerate" code path where the window is closed so quickly that we're still in the process of initializing it. that's what causes the crash.
[String changes made/needed]: none
Attachment #8831570 - Flags: approval-mozilla-beta?
Attachment #8831570 - Flags: approval-mozilla-aurora?
Comment on attachment 8831570 [details] [diff] [review]
Don't call InitAPZState if PLayerTransaction creation failed

Fix a crash. Aurora53+.
Attachment #8831570 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8831570 [details] [diff] [review]
Don't call InitAPZState if PLayerTransaction creation failed

fix apz init crash, for beta52
Attachment #8831570 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
So it looks like the code on beta for this is quite different. In particular, the InitRenderingState function has some early-returns and cleanup paths that aren't there in 53+ and the code movement hunk in the patch might have different effects on 52. I'll see if I can write an equivalent patch for 52 that's less risky.
Oh, actually - bug 900745 by itself should fix this on 52, because the TabChild.cpp code has an early return codepath that would avoid calling the AllocPAPZCTreeManager function. So we can just uplift bug 900745 to beta and don't need to bother with this patch.
I uplifted bug 900745 to beta, so marking this fixed on 52. For posterity, I'll just note that the patch on bug 900745 should cause the child process to go down the branch at [1] which should avoid the crash even without the patch on this bug.

[1] https://hg.mozilla.org/releases/mozilla-beta/file/dcdc85ecb2f3/dom/ipc/TabChild.cpp#l2628
Seems like this may not have fixed the issue, since bug 1338422 was just filed which looks similar and is from a build that contains the fix :( I'll use that bug to track.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: