Closed
Bug 1279514
Opened 8 years ago
Closed 8 years ago
Crash in IPCError-browser | (msgtype=0xEC0003,name=PTCPSocket::Msg_Data) Processing error: message was deserialized, but the han
Categories
(Core :: WebRTC: Networking, defect, P1)
Tracking
()
RESOLVED
FIXED
People
(Reporter: jimm, Assigned: drno)
References
Details
(Keywords: crash)
Crash Data
https://crash-stats.mozilla.com/report/index/e38e9793-07e3-40b1-a401-eacd72160609
Pretty high up content process crasher in beta 48 build 1.
Reporter | ||
Comment 1•8 years ago
|
||
Reporter | ||
Updated•8 years ago
|
Blocks: e10s-crashes
tracking-e10s:
--- → ?
Comment 2•8 years ago
|
||
I don't understand the stacks in any of the reports I looked at. The TCPSocket data message is a message sent from child to parent; all of the stacks show content processes doing very non-TCPSocket things.
Comment 3•8 years ago
|
||
https://crash-stats.mozilla.com/report/index/74ab1a0b-da10-44f6-b673-93c3d2160610#allthreads does actually show thread 11 doing TCP-related stuff, which suggests that e10s webrtc is triggering these crashes. I still can't sort out where the IPC error gets reported though, since deserialization should be happening on the parent side (barring bug 1268900 which hasn't landed yet).
Comment 4•8 years ago
|
||
This one shows dispatching of a message:
https://crash-stats.mozilla.com/report/index/1674fb96-28c7-4ede-8a50-6462c2160505#allthreads
It is coming from media/mtransport.
I will move it to media/.. so that someone from there can take a look and tell us more about what is happening.
Component: Networking → WebRTC: Networking
Assignee | ||
Comment 5•8 years ago
|
||
(In reply to Josh Matthews [:jdm] from comment #3)
> https://crash-stats.mozilla.com/report/index/74ab1a0b-da10-44f6-b673-
> 93c3d2160610#allthreads does actually show thread 11 doing TCP-related
> stuff, which suggests that e10s webrtc is triggering these crashes.
So thread 11 here is apparently trying to write to a TCP connection to the configured HTTP proxy. WebRTC uses (so far) TCP connections to talk to media relays (TURN servers), and if an HTTP proxy is configured it will try to talk to such a media relay through the HTTP proxy.
Assignee | ||
Comment 6•8 years ago
|
||
(In reply to Josh Matthews [:jdm] from comment #3)
> https://crash-stats.mozilla.com/report/index/74ab1a0b-da10-44f6-b673-
> 93c3d2160610#allthreads does actually show thread 11 doing TCP-related
> stuff, which suggests that e10s webrtc is triggering these crashes.
On a second look: yes, the WebRTC HTTP proxy tunnel code just got its callback that the e10s TCP socket to the proxy connected. But it is only trying to write its initial log messages into our internal ICE ring log buffer; it has not sent any data yet.
And the crashing thread 0 in this case does not show anything related to IPC TCP, but some JS garbage collection stuff. So I'm wondering what makes crash-stats believe this particular crash is related to this bug here.
Assignee | ||
Comment 7•8 years ago
|
||
In this case: https://crash-stats.mozilla.com/report/index/627703d2-9aa3-4eb4-bf75-6aba42160618#allthreads
Thread 11 actually tries to log an error message about a failure to write to a TCP connection (this could be a direct connection to the media relay or a connection through an HTTP proxy, I think).
Also in this case there was clearly a WebRTC call going on with audio and video decoding happening in multiple threads.
Assignee | ||
Comment 8•8 years ago
|
||
After looking at a couple of the crashes it looks like quite a lot of them show the following:
- one thread in webrtc::AudioDeviceWindowsCore::DoCaptureThread()
- another thread doing webrtc::AudioDeviceWindowsCore::DoGetCaptureVolumeThread()
- and a third thread doing webrtc::AudioDeviceWindowsCore::DoSetCaptureVolumeThread()
Example: in https://crash-stats.mozilla.com/report/index/0e342207-2893-40e4-85f3-b13f22160616#allthreads threads 36, 37 and 38 appear to be blocked in the above functions.
@padenot: is it normal/expected that one thread captures, while another appears to change the volume and yet another tries to read the volume?
Flags: needinfo?(padenot)
Updated•8 years ago
|
Rank: 15
Priority: -- → P1
Comment 9•8 years ago
|
||
It's weird code, but it's what is intended by the webrtc.org developers. This happens all the time when capturing using the webrtc.org code on Windows. This is going away soon; it's been replaced by better code as part of the full duplex project.
Updated•8 years ago
|
Flags: needinfo?(padenot)
Assignee | ||
Comment 10•8 years ago
|
||
If Socorro is correctly classifying crashes in the same category (I have no idea how reliable that Socorro feature is), then we appear to have at least one user report, in bug 1275216, claiming that the combination of TCP+TURN+HTTP-Proxy causes this crash.
Updated•8 years ago
|
Assignee: nobody → drno
Rank: 15 → 11
Assignee | ||
Comment 11•8 years ago
|
||
So the answer is: this is the TCP filtering code from bug 1244926, which does its job and prevents any connection where the initial packets are not ICE/STUN.
In other words: having to do an HTTP CONNECT first before exchanging STUN packets was overlooked when we designed the TCP packet filter.
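To illustrate why the CONNECT request trips a STUN-only filter, here is a minimal sketch of a STUN header check based on the RFC 5389 header layout (top two bits zero, length a multiple of 4, fixed magic cookie). This is not the filter from bug 1244926; the function name, the sample packets and the `turn.example.org` host are all made up for illustration.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
STUN_HEADER_LEN = 20            # type (2) + length (2) + cookie (4) + transaction id (12)


def looks_like_stun(data: bytes) -> bool:
    """Return True if data starts with a plausible RFC 5389 STUN header.

    Hypothetical helper for illustration only, not Gecko code.
    """
    if len(data) < STUN_HEADER_LEN:
        return False
    msg_type, msg_len, cookie = struct.unpack("!HHI", data[:8])
    if msg_type & 0xC000:   # the two most significant bits must be zero
        return False
    if msg_len % 4 != 0:    # STUN attributes are padded to 4-byte boundaries
        return False
    return cookie == STUN_MAGIC_COOKIE


# A bare Binding Request: type 0x0001, no attributes, all-zero transaction id.
binding_request = struct.pack("!HHI", 0x0001, 0, STUN_MAGIC_COOKIE) + bytes(12)

# What the proxy tunnel code has to send first (host name is a placeholder).
http_connect = (
    b"CONNECT turn.example.org:443 HTTP/1.1\r\n"
    b"Host: turn.example.org:443\r\n\r\n"
)

print(looks_like_stun(binding_request))
print(looks_like_stun(http_connect))
```

The CONNECT request starts with the ASCII bytes "CO", so it already fails the two-leading-zero-bits check, and a filter that only admits STUN as the first packet rejects it.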
Assignee | ||
Comment 12•8 years ago
|
||
Two options come to my mind:
A) Based on the proxy settings of FF, activate a different filter which enforces that the first outgoing message is an HTTP CONNECT. The problem is that unless we also enforce that the destination is exactly the proxy from the FF configuration, someone could simply connect to any open proxy on the Internet and do whatever they want. And enforcing the destination is tricky because DNS resolution can give us different results.
B) If I'm not mistaken the Necko HTTP client actually lives in the parent process. So if we remove our HTTP proxy code and use the Necko client instead we could hopefully still enforce the filtering of initial STUN packets, once the Necko HTTP client got a success response from the HTTP proxy.
In either case we probably need to disable the TCP filtering until we have a working solution to not have e10s clients crash once e10s gets released :-(
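A rough sketch of what option A could look like, purely as an illustration: the class, its parameters and the host names are hypothetical, and comparing the proxy address literally sidesteps the DNS problem mentioned above rather than solving it. It also does not wait for the proxy's success response before admitting STUN, which a real filter probably should.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value from RFC 5389


def looks_like_stun(data: bytes) -> bool:
    # Minimal header check: 20 bytes, top two bits zero, magic cookie present.
    if len(data) < 20:
        return False
    msg_type, _length, cookie = struct.unpack("!HHI", data[:8])
    return (msg_type & 0xC000) == 0 and cookie == STUN_MAGIC_COOKIE


class ProxyConnectFilter:
    """Hypothetical option-A filter; not the Gecko implementation."""

    def __init__(self, configured_proxy, remote):
        # Both arguments are (host, port) tuples. A literal comparison is an
        # assumption made for this sketch; it ignores DNS resolution entirely.
        self.destination_ok = configured_proxy == remote
        self.connect_seen = False

    def allow_outgoing(self, data: bytes) -> bool:
        if not self.destination_ok:
            # Socket does not point at the configured proxy: block everything.
            return False
        if not self.connect_seen:
            # The first outgoing message must be an HTTP CONNECT request.
            self.connect_seen = data.startswith(b"CONNECT ")
            return self.connect_seen
        # After the CONNECT, only STUN-looking traffic may pass.
        return looks_like_stun(data)


proxy = ("proxy.example", 3128)
stun = struct.pack("!HHI", 0x0001, 0, STUN_MAGIC_COOKIE) + bytes(12)
connect = b"CONNECT turn.example:443 HTTP/1.1\r\n\r\n"

good = ProxyConnectFilter(proxy, proxy)
print(good.allow_outgoing(connect), good.allow_outgoing(stun))

open_proxy = ProxyConnectFilter(proxy, ("open-proxy.example", 8080))
print(open_proxy.allow_outgoing(connect))
```

Option B would move this decision into the parent process, where the Necko client already knows whether the proxy accepted the tunnel.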
Assignee | ||
Comment 13•8 years ago
|
||
I'll leave this open with NI to check back on crash stats in a couple of weeks to confirm that we no longer see this crash from 48.0b7 on forward.
Flags: needinfo?(drno)
Assignee | ||
Comment 14•8 years ago
|
||
As expected I don't see any crashes from 48.0b7 on forward.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(drno)
Resolution: --- → FIXED
Comment 15•8 years ago
|
||
Crash volume for signature 'IPCError-browser | (msgtype=0xEC0003,name=PTCPSocket::Msg_Data) Processing error: message was deserialized, but the han':
- nightly (version 50): 0 crashes from 2016-06-06.
- aurora (version 49): 0 crashes from 2016-06-07.
- beta (version 48): 715 crashes from 2016-06-06.
- release (version 47): 0 crashes from 2016-05-31.
- esr (version 45): 0 crashes from 2016-04-07.
Crash volume on the last weeks:
           W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
- nightly       0       0       0       0       0       0       0
- aurora        0       0       0       0       0       0       0
- beta         26      43     103     120     122      99     126
- release       0       0       0       0       0       0       0
- esr           0       0       0       0       0       0       0
Affected platforms: Windows, Mac OS X, Linux
status-firefox48:
--- → affected