1098057 - Sync SendMessage calls from chrome may deadlock in the new plugin model

Assignee

Description

•

10 years ago

and the reverse of bug 1098055, going the other way, chrome -> sync sendmessage -> plugin -> sync ipc -> content -> sync ipc -> chrome I'm not sure how common this is.

Jim Mathies [:jimm]

Assignee

Updated

•

10 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1098055

Jim Mathies [:jimm]

Assignee

Comment 1

•

10 years ago

we also, afaict, have very few options for avoiding this. in the old model this would have been: chrome -> sync sendmessage -> plugin -> sync ipc -> chrome I don't see how we're avoiding this. ni'ing bsmedberg, curious if you've seen hangs like this?

Flags: needinfo?(benjamin)

Blake Kaplan (:mrbkap) (inactive)

Updated

•

10 years ago

tracking-e10s: ? → m5+

Jim Mathies [:jimm]

Assignee

Updated

•

10 years ago

Blocks: e10s-plugins

Benjamin Smedberg

Comment 2

•

10 years ago

We used to, but then we fixed a few things like UpdateWindow to stop sending sync messages and haven't seen reports of this recently. But none of our hang reporters would catch this, so we wouldn't have data on it if it were happening.

Flags: needinfo?(benjamin)

Jim Mathies [:jimm]

Assignee

Comment 3

•

10 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #2) > We used to, but then we fixed a few things like UpdateWindow to stop sending > sync messages and haven't seen reports of this recently. > > But none of our hang reporters would catch this, so we wouldn't have data on > it if it were happening. Why wouldn't the general chrome hang detector thread pick these up? The app shell dispatch loop wouldn't be pumping, which should trigger that detector.

Jim Mathies [:jimm]

Assignee

Updated

•

10 years ago

Assignee: nobody → jmathies

Jim Mathies [:jimm]

Assignee

Comment 4

•

10 years ago

I wonder about code like this - http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp#6404 I'm not sure if enumerating all the child windows in a sandboxed child flash process can trigger sync sends to the procedure in that process, but it certainly seems possible.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 5

•

10 years ago

Just curious: why doesn't REQUIRE_DEFERRED_MESSAGE_PROTECTION protect against this?

Jim Mathies [:jimm]

Assignee

Comment 6

•

10 years ago

(In reply to Bill McCloskey (:billm) from comment #5) > Just curious: why doesn't REQUIRE_DEFERRED_MESSAGE_PROTECTION protect > against this? which connection are you curious about?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 7

•

10 years ago

In comment 0, when the plug-in sends a sync message to content, would it process/defer the Windows message while waiting and thus process the message from chrome? Similarly, in comment 1, when the plugin sends a sync message to chrome, would it process/defer the Windows message from chrome?

Jim Mathies [:jimm]

Assignee

Comment 8

•

10 years ago

(In reply to Bill McCloskey (:billm) from comment #7) > In comment 0, when the plug-in sends a sync message to content, would it > process/defer the Windows message while waiting and thus process the message > from chrome? If the Windows message hit the plugin process message procedure while the plugin process was in the sync ipc send to content, the plugin <-> content channel deferral trap would catch it. However in the case where the native sent message triggers the sync ipc call to content, I think the system is vulnerable. > Similarly, in comment 1, when the plugin sends a sync message to chrome, > would it process/defer the Windows message from chrome? I think this is the same situation as above.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 9

•

10 years ago

I see. I did not realize we could send sync messages while dispatching these Windows messages. I think I understand why Benjamin said that our hang detector wouldn't catch this. The plugin hang detector just looks for places where a sync IPC takes too long. In the case of comment 1, the too-long IPC is from plugin to chrome. The hang detector does support these sorts of hangs, but it was disabled in bug 683967 (by you!). It looks like we were thinking of enabling it again when bug 679238 was fixed, but that got WONTFIXed without much explanation. The situation is comment 0 is similar, although we'd probably need more work to get crash reports for the right processes. It seems like crash reports could be pretty helpful in sorting this situation out, although we would need support for paired minidumps on the crashstats website, which we need for other stuff as well.

Jim Mathies [:jimm]

Assignee

Comment 10

•

10 years ago

(In reply to Bill McCloskey (:billm) from comment #9) > I see. I did not realize we could send sync messages while dispatching these > Windows messages. > > I think I understand why Benjamin said that our hang detector wouldn't catch > this. The plugin hang detector just looks for places where a sync IPC takes > too long. In the case of comment 1, the too-long IPC is from plugin to > chrome. The hang detector does support these sorts of hangs, but it was > disabled in bug 683967 (by you!). It looks like we were thinking of enabling > it again when bug 679238 was fixed, but that got WONTFIXed without much > explanation. We could revisit self aborting in the content process when the browser isn't responding. I'm really not a fan of long timeouts when chrome is unresponsive, so I wonder if we can come up with some way of blocking this situation vs. hanging the browser and waiting for ~45 seconds, or whatever the current plugin hang timeout is.

Jim Mathies [:jimm]

Assignee

Comment 11

•

10 years ago

chrome -> sync sendmessage -> plugin -> sync ipc -> content -> sync ipc -> chrome For example, I think we might be able to detect that the plugin process main thread is in SendMessage before we sync ipc send from plugin to content here, and just fail that call.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 12

•

10 years ago

(In reply to Jim Mathies [:jimm] from comment #10) > We could revisit self aborting in the content process when the browser isn't > responding. I'm really not a fan of long timeouts when chrome is > unresponsive, so I wonder if we can come up with some way of blocking this > situation vs. hanging the browser and waiting for ~45 seconds, or whatever > the current plugin hang timeout is. I was thinking we could use crash reports to figure out all the places where a Windows messages triggers a sync IPC. Then hopefully we could eliminate those cases somehow, perhaps by deferring the Windows message to the next turn of the event loop. It sounds like that's what Benjamin was suggesting we did in comment 2. I agree that such long timeouts are bad, but if we just use them as a way to gather information, I don't think it's so much of a problem.

Jim Mathies [:jimm]

Assignee

Comment 13

•

10 years ago

Kicking this out for re-triage - this bug is about a hypothetical deadlock, we have no idea if this actually occurs in the wild. This bug encompasses instrumenting using the crash reporter and a timeout to capture hung chrome processes. This seems more like a "want" vs. "must have", at least until we see evidence of this type of hang in the wild.

tracking-e10s: m5+ → ?

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

tracking-e10s: ? → +

Jim Mathies [:jimm]

Assignee

Comment 14

•

10 years ago

(In reply to Bill McCloskey (:billm) from comment #12) > (In reply to Jim Mathies [:jimm] from comment #10) > > We could revisit self aborting in the content process when the browser isn't > > responding. I'm really not a fan of long timeouts when chrome is > > unresponsive, so I wonder if we can come up with some way of blocking this > > situation vs. hanging the browser and waiting for ~45 seconds, or whatever > > the current plugin hang timeout is. > > I was thinking we could use crash reports to figure out all the places where > a Windows messages triggers a sync IPC. Then hopefully we could eliminate > those cases somehow, perhaps by deferring the Windows message to the next > turn of the event loop. It sounds like that's what Benjamin was suggesting > we did in comment 2. You're talking about all the places flash might call out from handling a native windowing event, right? So isn't your list basically the entire plugin api?

Flags: needinfo?(wmccloskey)

Jim Mathies [:jimm]

Assignee

Comment 15

•

10 years ago

(In reply to Jim Mathies [:jimm] from comment #14) > (In reply to Bill McCloskey (:billm) from comment #12) > > (In reply to Jim Mathies [:jimm] from comment #10) > > > We could revisit self aborting in the content process when the browser isn't > > > responding. I'm really not a fan of long timeouts when chrome is > > > unresponsive, so I wonder if we can come up with some way of blocking this > > > situation vs. hanging the browser and waiting for ~45 seconds, or whatever > > > the current plugin hang timeout is. > > > > I was thinking we could use crash reports to figure out all the places where > > a Windows messages triggers a sync IPC. Then hopefully we could eliminate > > those cases somehow, perhaps by deferring the Windows message to the next > > turn of the event loop. It sounds like that's what Benjamin was suggesting > > we did in comment 2. > > You're talking about all the places flash might call out from handling a > native windowing event, right? So isn't your list basically the entire > plugin api? Or alternatively on the chrome side the list would include any win32 api that potentially could process native events. This list is long and hard to identify. I think either way, we've been attacking this problem from this kind of angle for a while now and it doesn't seem to help. Regardless, if we had a nice way of detecting the and triggering a three way abort, we could collect some stats. I'd rather try experimenting with caching or failing plugin calls on the plugin side when we know we're in SendMessage in the plugin process, or maybe pass some information on to content about this such that content can return cached values or do chrome related things async.

Jim Mathies [:jimm]

Assignee

Updated

•

10 years ago

Depends on: 1160807

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 16

•

10 years ago

(In reply to Jim Mathies [:jimm] from comment #15) > (In reply to Jim Mathies [:jimm] from comment #14) > > (In reply to Bill McCloskey (:billm) from comment #12) > > > (In reply to Jim Mathies [:jimm] from comment #10) > > > > We could revisit self aborting in the content process when the browser isn't > > > > responding. I'm really not a fan of long timeouts when chrome is > > > > unresponsive, so I wonder if we can come up with some way of blocking this > > > > situation vs. hanging the browser and waiting for ~45 seconds, or whatever > > > > the current plugin hang timeout is. > > > > > > I was thinking we could use crash reports to figure out all the places where > > > a Windows messages triggers a sync IPC. Then hopefully we could eliminate > > > those cases somehow, perhaps by deferring the Windows message to the next > > > turn of the event loop. It sounds like that's what Benjamin was suggesting > > > we did in comment 2. > > > > You're talking about all the places flash might call out from handling a > > native windowing event, right? So isn't your list basically the entire > > plugin api? Yes, that's what I was suggesting. > I think either way, we've been attacking this problem from this kind of > angle for a while now and it doesn't seem to help. I think you might be overstating the case here, since we don't know whether this particular problem happens in practice, but I certainly won't argue against better hang detection. > Regardless, if we had a > nice way of detecting the and triggering a three way abort, we could collect > some stats. Yes, I agree. Perhaps each sync IPC call could include a list of processes that are transitively waiting for the result. If we send a sync IPC call to process P, and the list already includes process P, then we can crash everyone in the list.

Flags: needinfo?(wmccloskey)

Jim Mathies [:jimm]

Assignee

Updated

•

10 years ago

Blocks: e10s-hangs

Jim Mathies [:jimm]

Assignee

Updated

•

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → WONTFIX

BMO Automation

Updated

•

3 years ago

Product: Core → Core Graveyard

Bugzilla

Sync SendMessage calls from chrome may deadlock in the new plugin model

Categories

(Core Graveyard :: Plug-ins, defect)

Tracking

(e10s+)

People

(Reporter: jimm, Assigned: jimm)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Updated

Updated

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Updated

Comment 14

Comment 15

Updated

Comment 16

Updated

Updated

Updated