Closed Bug 1016352 Opened 11 years ago Closed 10 years ago

WebSocketChannelChild must be thread-safe

Categories

(Core :: Networking: WebSockets, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: baku, Assigned: baku)

References

Details

Attachments

(1 file, 2 obsolete files)

Currently it's not thread-safe and the porting of WebSocket to workers fails in several e10s mochitests.
Why are we refcounting it off the main thread? I don't think channels are usually threadsafe.
WebSocketChannel is thread-safe and all the callbacks are received in the target thread, specified by RetargetDeliveryTo().
Hey baku - I assume it's this test you're talking about: https://tbpl.mozilla.org/php/getParsedLog.php?id=39963172&tree=Try&full=1 ? Ignore the rest of this comment if you're referring to something else. I tried NS_DECL_THREADSAFE_ISUPPORTS in WebSocketChannelChild, but I get what looks like the same error as in that log. Looks like SendPWebSocketConstructor is passing a null PBrowserChild/TabChild to PNeckoChild::Write?? I'm not sure - I'll need to look at it more. AsyncOpen should be the same in both single process and e10s, though. Both should be running on the main thread.
> I tried NS_DECL_THREADSAFE_ISUPPORTS in WebSocketChannelChild, but I get > what looks like the same error as in that log. Looks like > SendPWebSocketConstructor is passing a null PBrowserChild/TabChild to > PNeckoChild::Write?? I'm not sure - I'll need to look at it more. Yes. This is the issue. And last time I debugged it, the issue was that. Would be great if you can take a look. Let me know if I can help.
Took a deeper look today: So, the crash I see is in PNeckoChild::Write(PBrowserChild ...) - the PBrowserChild arg is null, but Write does not want it to be. There are a couple of things here. 1. I compared SendPWebSocketConstructor with it's HTTP and FTP equivalents; HTTP is set to allow null PBrowserChild args, FTP (like WebSockets) is not. This can be fixed by adding a 'nullable' qualifier to PWebSocket. Since we're already checking MissingRequiredTabChild in WebSocketChannelChild::AsyncOpen, it leads me to believe that this should be ok. I'm thinking that it is ok to have it null for mochitests or whatever, basically in cases where network.disable.ipc.security is set to true. Jason, can you confirm this? 2. Even with 'nullable' added, the test still fails :) But much more of the test is able to run, so that's progress :) Out of interest, Baku, have you looked at the behavior outside of automated tests - is there still a crash in PNeckoChild::Write? Note: I also had NS_DECL_THREADSAFE_ISUPPORTS in WebSocketChannelChild for this debugging.
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(amarchesini)
Why should it be ok to have it as null for mochitests? We use the PBrowserChild to check permissions for requests coming from content processes, and websockets are inherently associated with particular pages, unlike HTTP channels.
Ah, I see. Then we can't do that :) Does that mean that MissingRequiredTabChild is wrong? Or am I conflating UsingNeckoIPCSecurity and this nullable qualifier? Anyway, that is a secondary issue. So somehow we need to get a TabChild/PBrowserChild from the NotificationCallbacks of the WebSocketChannel, or from the loadgroup. Both of those seem to be set in WebSocketImpl. I change WebSocketImpl::GetInterface to also try QueryInterface on it's mParent object (class WebSocket), but I still don't see a TabChild returned here.
MissingRequiredTabChild provides us with an error from a more useful context than the actual IPC message encoding (which is where the presence of nullable in the protocol definition takes effect).
We only allow a null PBrowser for xpcshell tests, where there's no browser to pass. Websockets don't work without the DOM, so they're always mochitests, so as jdm mentions, it's ok that we disallow null. > Since we're already checking MissingRequiredTabChild in WebSocketChannelChild::AsyncOpen I don't understand how we're getting past this check (and calling the constructor with a null TabChild): http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannelChild.cpp#381 I.e. shouldn't we have already returned from AsyncOpen with ILLEGAL_VALUE if the tabchild is null? This bug seems mistitled to me--AFACIT this has nothing to do with WebSocketChannelChild being thread-safe or not. It sounds like the client code isn't setting the .notificationCallbacks for the channel to something that provides a nsITabChild.
Flags: needinfo?(jduell.mcbugs)
> This bug seems mistitled to me--AFACIT this has nothing to do with > WebSocketChannelChild being thread-safe or not. It sounds like the client > code isn't setting the .notificationCallbacks for the channel to something > that provides a nsITabChild. What I see in my tests is this: nsresult WebSocketImpl::UpdateURI() { // Check for Redirections nsRefPtr<BaseWebSocketChannel> channel; channel = static_cast<BaseWebSocketChannel*>(mChannel.get()); 0:18.33 Hit MOZ_CRASH(WebSocketChannelChild not thread-safe) at /home/baku/Sources/m/websocket/src/netwerk/protocol/websocket/WebSocketChannelChild.cpp:24 0:18.55 mozilla::net::WebSocketChannelChild::AddRef() (/home/baku/Sources/m/websocket/src/netwerk/protocol/websocket/WebSocketChannelChild.cpp:24 (discriminator 1)) 0:18.72 nsRefPtr<mozilla::net::BaseWebSocketChannel>::assign_with_AddRef(mozilla::net::BaseWebSocketChannel*) (/home/baku/Sources/m/websocket/build/content/base/src/../../../dist/include/nsAutoPtr.h:841) 0:18.72 nsRefPtr<mozilla::net::BaseWebSocketChannel>::operator=(mozilla::net::BaseWebSocketChannel*) (/home/baku/Sources/m/websocket/build/content/base/src/../../../dist/include/nsAutoPtr.h:945) the same code works on the main-process.
Flags: needinfo?(amarchesini)
Attached patch child.patch (obsolete) (deleted) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=709a3a2d93f8 Ok, this patch makes try (almost) green except for test 33. But this is a separated issue and it's related to WebSocket to workers. I mean, if jduell is happy with this patch we can land it before bug 504553.
Attachment #8444879 - Flags: review?(jduell.mcbugs)
Comment on attachment 8444879 [details] [diff] [review] child.patch Review of attachment 8444879 [details] [diff] [review]: ----------------------------------------------------------------- necko changes look fine. You need a DOM peer (feel free to pick a different one if you like) for review the WebSocket.cpp bits--the JS incantations are a mystery to me :)
Attachment #8444879 - Flags: review?(jduell.mcbugs)
Attachment #8444879 - Flags: review?(bugs)
Attachment #8444879 - Flags: review+
Attached patch child.patch (obsolete) (deleted) — Splinter Review
Attachment #8444879 - Attachment is obsolete: true
Attachment #8444879 - Flags: review?(bugs)
Attached patch child.patch (deleted) — Splinter Review
Actually, the WebSocket changes are not needed for this patch and I moved them to the other bug. If the patch is green on try, I'll land it. https://tbpl.mozilla.org/?tree=Try&rev=e4a04f2cc786
Attachment #8444959 - Attachment is obsolete: true
Checkin-needed, green on try, and just necko changes.
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: