1255823 - test_bug260264.html can kill the child process because of a mismatched Msg_UpdateZoomConstraints__ID message

Reporter

Description

•

9 years ago

STR: 1. Be on Linux. (This may also happen elsewhere) 2. Apply the patch in bug 1255121. 3. Run: $ MOZ_CHAOSMODE=0 ./mach mochitest --e10s test_bug260264.html --debugger=rr --debugger-args="record -h" This happens once out of three or four runs when debugging under rr chaos mode. You'll sometimes get: ... TEST-PASS | /tests/dom/tests/mochitest/bugs/test_bug260264.html | not properly blocked TEST-PASS | /tests/dom/tests/mochitest/bugs/test_bug260264.html | not properly opened TEST-PASS | /tests/dom/tests/mochitest/bugs/test_bug260264.html | not properly blocked TEST-PASS | /tests/dom/tests/mochitest/bugs/test_bug260264.html | not properly blocked ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x7,name=???) Route error: message sent to unknown actor ID [Parent 29704] WARNING: pipe error (34): Connection reset by peer: file /home/ehsan/moz/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 459 [Parent 29704] WARNING: pipe error (36): Connection reset by peer: file /home/ehsan/moz/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 459 ###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0077,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv Followed by the child being killed. msgtype 0x7 is Msg_UpdateZoomConstraints__ID based on <https://wiki.mozilla.org/Electrolysis/Debugging#Working_backwards_from_a_C.2B.2B_Message_to_its_IPDL_message>

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 1

•

9 years ago

I talked to Ehsan about this on IRC, I'll investigate.

Assignee: nobody → bugmail.mozilla

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 2

•

9 years ago

So what appears to be happening here is that: 1) parent process creates a PAPZParent and sends over the constructor 2) child process creates a PAPZChild 3) parent process destroys the PAPZParent and sends over the delete (it's an async message) 4) child process sends over the UpdateZoomConstraints message 5) child process receives the delete when the message in 4 goes over to the parent, it can't find the actor anymore, because it destroyed it in step 3, and barfs. According to IRC conversation we need to have a two-step destruction process for protocols with an async delete so that this sort of thing doesn't happen because a message was inflight before the delete was received. Also I'm fairly certain this was a regression from bug 1020199. Peter, do you have cycles to pick this up in the immediate future?

Blocks: 1020199

Flags: needinfo?(peterv)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

status-firefox46: --- → unaffected

status-firefox47: --- → affected

status-firefox48: --- → affected

Keywords: regression

Whiteboard: [gfx-noted]

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 3

•

9 years ago

The immediate future has come and gone. I wrote a patch which turns the PAPZ shutdown into a two-step process. Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4ecccfefebf9

Flags: needinfo?(peterv)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 4

•

9 years ago

Try push seems green.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 5

•

9 years ago

Attached file MozReview Request: Bug 1255823 - Add a two-step destruction process to PAPZ. r? (obsolete) (deleted) — Details

Review commit: https://reviewboard.mozilla.org/r/44189/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/44189/

Attachment #8737924 - Flags: review?(peterv)

Attachment #8737924 - Flags: review?(dvander)

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 6

•

9 years ago

I forget if we have a bug filed (bill would know), but we want that two-step handshake pattern to be built into IPDL at some point. It's kind of ridiculous we need to hand-roll it out in every new actor.

David Anderson [:dvander] - inactive, e-mail if emergency

Updated

•

9 years ago

Attachment #8737924 - Flags: review?(dvander) → review+

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 7

•

9 years ago

Comment on attachment 8737924 [details] MozReview Request: Bug 1255823 - Add a two-step destruction process to PAPZ. r? https://reviewboard.mozilla.org/r/44189/#review40827

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 8

•

9 years ago

I haven't heard back from peterv and I'd like to land this asap to get more bake time before uplift, so I'm going to go ahead and do that. Leaving r? on the patch to get a post-landing review from him, since I'd still like him to look at the patch.

Pulsebot

Comment 9

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/65e9d46daf3e

Peter Van der Beken [:peterv]

Comment 10

•

9 years ago

Comment on attachment 8737924 [details] MozReview Request: Bug 1255823 - Add a two-step destruction process to PAPZ. r? https://reviewboard.mozilla.org/r/44189/#review41233 Thanks for picking this up! ::: gfx/layers/ipc/APZChild.cpp:93 (Diff revision 1) > if (mObserver) { > nsCOMPtr<nsIObserverService> os = services::GetObserverService(); > os->RemoveObserver(mObserver, "tab-child-created"); > - } else { > + } else if (mBrowser) { > mBrowser->SetAPZChild(nullptr); > + mBrowser = nullptr; Is there a reason why you're nulling a member in the destructor? ::: gfx/layers/ipc/RemoteContentController.cpp:339 (Diff revision 1) > RemoteContentController::Destroy() > { > RefPtr<RemoteContentController> controller = this; > NS_DispatchToMainThread(NS_NewRunnableFunction([controller] { > if (controller->CanSend()) { > - DeletePAPZParent(controller); > + if (controller->SendDestroy()) { Maybe combine these to |if (controller->CanSend() && controller->SendDestroy() {|. ::: gfx/layers/ipc/RemoteContentController.cpp:347 (Diff revision 1) > + // __delete__ back, otherwise we might get destroyed in the meantime and > + // the IPC code will crash on try to handle the __delete__. > + uint64_t key = controller->mLayersId; > + MOZ_ASSERT(sDestroyedControllers.find(key) == sDestroyedControllers.end()); > + sDestroyedControllers[key] = controller; > + } Do we really need a hash, it seems like we could just call AddRef here and Release in ActorDestroy?

Attachment #8737924 - Flags: review?(peterv) → review+

Peter Van der Beken [:peterv]

Comment 11

•

9 years ago

Sigh, I reviewed it this morning, but probably didn't push the right buttons in MozReview :-(.

Pulsebot

Comment 12

•

9 years ago

Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/1bc6fd8ffd85

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 13

•

9 years ago

^ was for crashes in PAPZParent::DestroySubtree. I'm not sure off hand what's going on, will look into it. 16:06:11 KWierso: https://treeherder.mozilla.org/logviewer.html#?job_id=25289631&repo=mozilla-inbound is one 16:06:21 KWierso: another https://treeherder.mozilla.org/logviewer.html#?job_id=25286481&repo=mozilla-inbound 16:06:30 KWierso: and another https://treeherder.mozilla.org/logviewer.html#?job_id=25284784&repo=mozilla-inbound

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 14

•

9 years ago

(In reply to Peter Van der Beken [:peterv] from comment #10) > > + mBrowser = nullptr; > > Is there a reason why you're nulling a member in the destructor? Hm, good point, that shouldn't be needed. > > - DeletePAPZParent(controller); > > + if (controller->SendDestroy()) { > > Maybe combine these to |if (controller->CanSend() && > controller->SendDestroy() {|. Will do. > > + MOZ_ASSERT(sDestroyedControllers.find(key) == sDestroyedControllers.end()); > > + sDestroyedControllers[key] = controller; > > + } > > Do we really need a hash, it seems like we could just call AddRef here and > Release in ActorDestroy? We could, yeah. I'm usually not a fan of manually calling AddRef/Release though because even though it might be fine now, future copypasta or modifications can get them out of sync and then it's harder to track down the problem.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 15

•

9 years ago

I'm having a hard time reproducing the crash, but from tracing through the code it looks like something like this is happening: 1. RemoteContentController::Destroy() is called. This sends a destroy message to the child and puts the RemoteContentController in the sDestroyedControllers map (aka AddRef's it). 2. The child process gets the destroy message from the parent and sends back a __delete__ message 3. ActorDestroy is called on the parent side (unsure why). This releases the ref and actually destroys the RemoteContentController. 4. The parent side gets the __delete__ message and tries to call DestroySubtree() on the PAPZParent which crashes because it derefs a dangling pointer to RemoteContentController. I suspect the right fix is that in RemoteContentController::ActorDestroy, we skip the ref release bit for certain values of ActorDestroyReason. However I'm not sure what values, and the documentation isn't very clear either. Guessing wrong could end up leaking stuff. Also note that the crashes in comment 13 are during tests that intentionally crash the child process and trigger some sort of error shutdown in IPC code.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 16

•

9 years ago

I moved the release-ref into an overridden Recv__delete__. Hopefully this will solve the problem without leaking stuff. I added some more comments in the code to explain why I think this should work. Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=40d8b05d79d7

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 17

•

9 years ago

That version leaked a single RemoteContentController on NormalShutdown (the ActorDestroy(NormalShutdown) call happened between the SendDestroy() and the Recv__delete__, which prevented the Recv__delete__ from ever arriving, and leaked the entry in the map). New attempt: https://treeherder.mozilla.org/#/jobs?repo=try&revision=60047c2bc5e7

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 18

•

9 years ago

Ok, that one looks good. Updated patch coming up.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 19

•

9 years ago

Comment on attachment 8737924 [details] MozReview Request: Bug 1255823 - Add a two-step destruction process to PAPZ. r? Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44189/diff/1-2/

Attachment #8737924 - Attachment description: MozReview Request: Bug 1255823 - Add a two-step destruction process to PAPZ. r?peterv,dvander → MozReview Request: Bug 1255823 - Add a two-step destruction process to PAPZ. r?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 20

•

9 years ago

Comment on attachment 8737924 [details] MozReview Request: Bug 1255823 - Add a two-step destruction process to PAPZ. r? Flipping flags back to r? for a review of the updated patch.

Attachment #8737924 - Flags: review?(peterv)

Attachment #8737924 - Flags: review?(dvander)

Attachment #8737924 - Flags: review+

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 21

•

9 years ago

Comment on attachment 8737924 [details] MozReview Request: Bug 1255823 - Add a two-step destruction process to PAPZ. r? https://reviewboard.mozilla.org/r/44189/#review42877 ::: gfx/layers/ipc/RemoteContentController.cpp:364 (Diff revision 2) > + // the IPC code will crash on try to handle the __delete__. > + uint64_t key = controller->mLayersId; > + MOZ_ASSERT(sDestroyedControllers.find(key) == sDestroyedControllers.end()); > + sDestroyedControllers[key] = controller; > + } else { > + // If the send failed, then this object must already have had ActorDestroy Is this always true? I thought that ActorDestroy was posted to the event loop, so if the Send fails for some other reason, it might not be called yet. But I could be wrong.

Attachment #8737924 - Flags: review?(dvander)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 22

•

9 years ago

(In reply to David Anderson [:dvander] from comment #21) > > + // If the send failed, then this object must already have had ActorDestroy > > Is this always true? I thought that ActorDestroy was posted to the event > loop In the generated ipdl code (e.g. objdir/ipc/ipdl/PAPZParent.cpp) I see ActorDestroy being called directly during IPC message handling, not getting posted to the event loop. I'm not sure if that's what you mean. > so if the Send fails for some other reason, it might not be called > yet. What other failure causes are there? If the send fails for other reasons, I'm assuming the channel is in a bad/error state anyway and is going to get torn down shortly. Will ActorDestroy still be called under those circumstances, and if so, do we know what the ActorDestroyReason will be? Because then I can keep a ref to the actor if the condition |CanSend() && !SendDestroy()| is true, and clear it when I get appropriate ActorDestroyReason. > But I could be wrong. So could I! :)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 23

•

9 years ago

Really I think the wonky part about all this is that the IPC code can call Recv__delete__ on an actor *after* it has already called ActorDestroy on it. That just seems wrong, and that's the crash I described in comment 15. The ActorDestroy in step 3 there was for NormalShutdown, and it was followed by a Recv__delete__.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 24

•

9 years ago

err I mean AbnormalShutdown. I think.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 25

•

9 years ago

Try push with another variation that might be a bit more robust: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9258a8c63837

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 26

•

9 years ago

Attached patch Alternate patch (obsolete) (deleted) — Details — Splinter Review

Here's a slightly different fix - let me know if you prefer this one. The diff between the first approach and this one can be found at https://hg.mozilla.org/try/rev/7cf2492cc13e5ea2de8a1c489967f577c7ee1899 (try push is in the last comment).

Attachment #8741738 - Flags: review?(dvander)

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 27

•

9 years ago

Comment on attachment 8741738 [details] [diff] [review] Alternate patch Review of attachment 8741738 [details] [diff] [review]: ----------------------------------------------------------------- I do like this one better, I hope it works - it's terrible we have to do any of this.

Attachment #8741738 - Flags: review?(dvander) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Attachment #8741738 - Flags: review?(peterv)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Attachment #8737924 - Attachment is obsolete: true

Attachment #8737924 - Flags: review?(peterv)

Pulsebot

Comment 28

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/6d973d2f1bae

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 29

•

9 years ago

This apparently caused some tests to leak like https://treeherder.mozilla.org/logviewer.html#?job_id=26026924&repo=mozilla-inbound Possibly only on Linux64? Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/6e6c5143feaa

Flags: needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 30

•

9 years ago

For some reason I'm not able to reproduce the problem I was seeing before where the Recv__delete__ is called after the ActorDestroy. Without that problem the fix is relatively straightforward, just unconditionally release the ref in ActorDestroy. Try push with that seems to be pretty green so far. https://treeherder.mozilla.org/#/jobs?repo=try&revision=90aed9aeea7e&group_state=expanded

Flags: needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 31

•

9 years ago

Attached patch Patch v3 (deleted) — Details — Splinter Review

Ok, that does seem to be looking good, and I can't reproduce the scenario I was seeing before. I'll wait for the few remaining jobs on the try push and then land this, carrying r=dvander (the only change in this version is that I took out the "aWhy != ActorDestroyReason::AbnormalShutdown" condition and made that code always execute).

Attachment #8741738 - Attachment is obsolete: true

Attachment #8741738 - Flags: review?(peterv)

Attachment #8742509 - Flags: review?(peterv)

Pulsebot

Comment 32

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/e9500cac731d

Carsten Book [:Tomcat]

Comment 33

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e9500cac731d

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox48: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla48

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 34

•

9 years ago

I'm going to wontfix this for 47 because it carries some risk that I don't want to uplift to beta. It already didn't spend very much time on Nightly before it merged to aurora. And as far as I can tell the underlying issue that it fixes hasn't been reported anywhere else or shown up in crash-stats. Also e10s/APZ will not go to release in 47 so it should be ok to wontfix for 47.

status-firefox47: affected → wontfix

(no longer active)

Reporter

Comment 35

•

9 years ago

kats, did you test your fix with the patch for bug 1255121? Or should I do that separately before landing that patch?

Flags: needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 36

•

9 years ago

I did test it with that patch locally but not on try. It's probably worth testing again just to make sure.

Flags: needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 37

•

9 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #34) > And as far as I can tell the underlying issue > that it fixes hasn't been reported anywhere else or shown up in crash-stats. My crash-stats-fu is apparently horrible, because this appears to be the #1 topcrash on 47, bug 1268881. We should uplift this fix after checking crash-stats to make sure this actually fixes stuff and doesn't make anything else worse.

status-firefox47: wontfix → affected

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Blocks: 1268881

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 38

•

9 years ago

Comment on attachment 8742509 [details] [diff] [review] Patch v3 Approval Request Comment [Feature/regressing bug #]: bug 1020199 [User impact if declined]: lotsa crashes, see bug 1268881. [Describe test coverage new/current, TreeHerder]: not a whole lot, but we have crash-stats data that indicates this fixed the crashes in bug 1268881. [Risks and why]: medium-ish risk, IPDL shutdown is hairy at the best of times. But I assume if something went wrong it would have shown up on crash-stats by now. [String/UUID change made/needed]: none

Attachment #8742509 - Flags: review?(peterv)

Attachment #8742509 - Flags: review+

Attachment #8742509 - Flags: approval-mozilla-beta?

Jim Mathies [:jimm]

Comment 39

•

9 years ago

We would very much like to get this in 47 beta build 2.

Flags: needinfo?(rkothari)

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 40

•

9 years ago

(In reply to Jim Mathies [:jimm] from comment #39) > We would very much like to get this in 47 beta build 2. Sure. I will approve the uplift if you think it helps! I was looking at the crash data from bug 1268881 and whether we've had any occurrences on 48.0a2 since 04-20 (when this landed in central) and I do see an instances from 04-28 https://crash-stats.mozilla.com/report/index/a3ac3947-5019-4d14-9876-f887c2160429 and one from 04-25 https://crash-stats.mozilla.com/report/index/2a7db262-7034-40c3-8c97-20b2b2160425. This is for the first of 3 crash signatures associated with bug 1268881. The data for 2nd and 3rd signature looks promising from 48.0a2.

Flags: needinfo?(rkothari)

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 41

•

9 years ago

Comment on attachment 8742509 [details] [diff] [review] Patch v3 Top crash on 47.0b1, Beta47+

Attachment #8742509 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 42

•

9 years ago

(In reply to Ritu Kothari (:ritu) from comment #40) > https://crash-stats.mozilla.com/report/index/a3ac3947-5019-4d14-9876- > f887c2160429 and one from 04-25 > https://crash-stats.mozilla.com/report/index/2a7db262-7034-40c3-8c97- > 20b2b2160425. These two crashes have the following in their ipc_channel_error: (msgtype=0xEC0003,name=PTCPSocket::Msg_Data) Processing error: message was deserialized, but the handler returned false (indicating failure) so it seems to have a different root cause. Thanks for the quick approval, I'll land the uplift shortly.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 43

•

9 years ago

https://hg.mozilla.org/releases/mozilla-beta/rev/caa76bbe680c

status-firefox47: affected → fixed

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 44

•

9 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #42) > (In reply to Ritu Kothari (:ritu) from comment #40) > > https://crash-stats.mozilla.com/report/index/a3ac3947-5019-4d14-9876- > > f887c2160429 and one from 04-25 > > https://crash-stats.mozilla.com/report/index/2a7db262-7034-40c3-8c97- > > 20b2b2160425. > > These two crashes have the following in their ipc_channel_error: > > (msgtype=0xEC0003,name=PTCPSocket::Msg_Data) Processing error: message was > deserialized, but the handler returned false (indicating failure) > > so it seems to have a different root cause. > > Thanks for the quick approval, I'll land the uplift shortly. Ok. That's good to know. I am glad you got a chance to review those two reports. Awesome job all around flagging this one as a top crasher and finding a good fix to uplift! :)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 46

•

9 years ago

Doesn't look like beta is liking the uplift, there's a bunch of orange :(

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 47

•

9 years ago

Backed out while I investigate: https://hg.mozilla.org/releases/mozilla-beta/rev/259d41769207

status-firefox47: fixed → affected

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 48

•

9 years ago

So there's three paths we can follow here: 1) Back out bug 1020199 from 47. Although the patch is conceptually not too complex, it's a monster patch in terms of lines touched. Still, I created a backout patch and pushed it to try based on top of current 47 beta [1]. 2) Bisect through the stuff that landed on 48 before this patch (pushlog at [2]) to see which of those patches fixed the OS X crashes. If we can find it, and it's upliftable, then we just need to uplift that along with this and we're good. I've started doing this bisection via try pushes ([3], [4]) and will continue as I can over the weekend. 3) Investigate the crashes in situ (they are reproducible on a local beta OS X build) and find some fix for it. Given my poor understanding of IPC/shutdown code this seems pretty hard to do, but if somebody else is willing to do so please go right ahead. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=02e8416c0deb [2] http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=552ef9bae233&tochange=e9500cac731d [3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=d23741cc3a5b [4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=9bf652334ad6

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 49

•

9 years ago

Quick update: 1) The backout try push looks mostly clean, there's some failures but they look unrelated. 2) I've narrowed down the range to http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e17ded471c3f&tochange=e7a1c3636af1 so far, and am continuing to bisect.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 50

•

9 years ago

Narrowed the range further to http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=95e61ede373d&tochange=a57dc2f4a6f6, so I'm pretty sure it's bug 1219296 which is responsible. It moves CSS scroll snapping into the compositor and eliminates a RequestFlingSnap IPC message, which I presume is the thing that's causing the crash. I rebased those patches onto beta, try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=444d4e11fab6

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 51

•

9 years ago

That try push looks ok. I'll request uplift on bug 1219296.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 52

•

9 years ago

Ritu, I flagged bug 1219296 and its follow-up bug 1267471 for uplift to beta. Given the options I think uplifting those two bugs along with this one to beta is probably the safest approach. An alternate approach, as I said in comment 48, would be to back out bug 1020199 from 47, but I think that would be more risky (you can see how big that patch is). I can also try to find a "smaller" version of bug 1219296 for beta, but it's unlikely I can do that before the Monday deadline.

Flags: needinfo?(rkothari)

Jim Mathies [:jimm]

Comment 53

•

9 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #51) > That try push looks ok. I'll request uplift on bug 1219296. Thanks kats!

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 54

•

9 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #52) > Ritu, I flagged bug 1219296 and its follow-up bug 1267471 for uplift to > beta. Given the options I think uplifting those two bugs along with this one > to beta is probably the safest approach. > > An alternate approach, as I said in comment 48, would be to back out bug > 1020199 from 47, but I think that would be more risky (you can see how big > that patch is). I can also try to find a "smaller" version of bug 1219296 > for beta, but it's unlikely I can do that before the Monday deadline. Yup, if this is the least risky option to take to address the top crash, let's do it. I am glad this is e10s specific only so the overall impact to 47 release which will ship with e10s off by default should be minimal.

Flags: needinfo?(rkothari)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 55

•

9 years ago

Relanded on beta on top of the dependencies: https://hg.mozilla.org/releases/mozilla-beta/rev/595707e05f3d

status-firefox47: affected → fixed

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Version: unspecified → 47 Branch

Virtual_ManPL [:Virtual] 🇵🇱 - (please needinfo? me - so I will see your comment/reply/question/etc.)

Updated

•

8 years ago

Blocks: 1265812

MozReview Request: Bug 1255823 - Add a two-step destruction process to PAPZ. r? 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) (deleted), text/x-review-board-request		Details
Alternate patch 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) (deleted), patch	dvander : review+	Details \| Diff \| Splinter Review
Patch v3 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) (deleted), patch	kats : review+ ritu : approval-mozilla-beta+	Details \| Diff \| Splinter Review