1279514 - Crash in IPCError-browser | (msgtype=0xEC0003,name=PTCPSocket::Msg_Data) Processing error: message was deserialized, but the han

I don't understand the stacks in any of the reports I looked at. The TCPSocket data message is a message sent from child to parent; all of the stacks show content processes doing very non-TCPSocket things.

Josh Matthews [:jdm]

Comment 3

•

8 years ago

https://crash-stats.mozilla.com/report/index/74ab1a0b-da10-44f6-b673-93c3d2160610#allthreads does actually show thread 11 doing TCP-related stuff, which suggests that e10s webrtc is triggering these crashes. I still can't sort out where the IPC error gets reported though, since deserialization should be happening on the parent side (barring bug 1268900 which hasn't landed yet).

Dragana Damjanovic [:dragana]

Comment 4

•

8 years ago

This one shows dispatching of a message: https://crash-stats.mozilla.com/report/index/1674fb96-28c7-4ede-8a50-6462c2160505#allthreads. It is coming from media/mtransport. I will move it to media/.. so that someone from there can take a look and tell us more about what is happening.

Component: Networking → WebRTC: Networking

Nils Ohlmeier [:drno]

Assignee

Comment 5

•

8 years ago

(In reply to Josh Matthews [:jdm] from comment #3)
> https://crash-stats.mozilla.com/report/index/74ab1a0b-da10-44f6-b673-93c3d2160610#allthreads
> does actually show thread 11 doing TCP-related stuff, which suggests that
> e10s webrtc is triggering these crashes.

So thread 11 in here is apparently trying to write to a TCP connection to the configured HTTP proxy. WebRTC uses (so far) TCP connections to talk to media relays (TURN servers), and if an HTTP proxy is configured it will try to talk to such a media relay through the HTTP proxy.

Nils Ohlmeier [:drno]

Assignee

Comment 6

•

8 years ago

(In reply to Josh Matthews [:jdm] from comment #3)
> https://crash-stats.mozilla.com/report/index/74ab1a0b-da10-44f6-b673-93c3d2160610#allthreads
> does actually show thread 11 doing TCP-related stuff, which suggests that
> e10s webrtc is triggering these crashes.

On a second look: yes, the WebRTC HTTP proxy tunnel code just got its callback that the e10s TCP socket to the proxy connected. But it is only trying to write its initial log messages into our internal ICE ring log buffer; it has not sent any data yet. And the crashing thread 0 in this case does not show anything related to IPC TCP, only some JS garbage collection stuff. So I'm wondering what makes crash-stats believe this particular crash is related to this bug here.

Nils Ohlmeier [:drno]

Assignee

Comment 7

•

8 years ago

In this case: https://crash-stats.mozilla.com/report/index/627703d2-9aa3-4eb4-bf75-6aba42160618#allthreads thread 11 actually tries to log an error message about a failure to write to a TCP connection (this could be a direct connection to the media relay or one through an HTTP proxy, I think). Also, in this case there was clearly a WebRTC call going on, with audio and video decoding happening in multiple threads.

Nils Ohlmeier [:drno]

Assignee

Comment 8

•

8 years ago

After looking at a couple of the crashes, it looks like quite a lot of them show the following:
- one thread in webrtc::AudioDeviceWindowsCore::DoCaptureThread()
- another thread in webrtc::AudioDeviceWindowsCore::DoGetCaptureVolumeThread()
- and a third thread in webrtc::AudioDeviceWindowsCore::DoSetCaptureVolumeThread()

Example: in https://crash-stats.mozilla.com/report/index/0e342207-2893-40e4-85f3-b13f22160616#allthreads threads 36, 37 and 38 appear to be blocked in the above functions.

@padenot: is it normal/expected that one thread captures, while another appears to change the volume and yet another tries to read the volume?

Flags: needinfo?(padenot)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Updated

•

8 years ago

Rank: 15

Priority: -- → P1

Paul Adenot (:padenot)

Comment 9

•

8 years ago

It's weird code, but it's what is intended by the webrtc.org developers. This happens all the time when capturing using the webrtc.org code on Windows. It is going away soon: it's been replaced by better code as part of the full duplex project.

Paul Adenot (:padenot)

Updated

•

8 years ago

Flags: needinfo?(padenot)

Jim Mathies [:jimm]

Reporter

Updated

•

8 years ago

tracking-e10s: ? → +

Nils Ohlmeier [:drno]

Assignee

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1275216

Nils Ohlmeier [:drno]

Assignee

Comment 10

•

8 years ago

If Socorro is correctly classifying crashes in the same category (I have no idea how reliable that Socorro feature is), then we appear to have at least one user report in bug 1275216 from someone who claims that the combination of TCP+TURN+HTTP-Proxy causes this crash.

Maire Reavy [:mreavy]

Updated

•

8 years ago

Assignee: nobody → drno

Rank: 15 → 11

Nils Ohlmeier [:drno]

Assignee

Comment 11

•

8 years ago

So the answer is: this is the TCP filtering code from bug 1244926, which does its job and prevents any connection where the initial packets are not ICE/STUN. In other words: having to do an HTTP CONNECT first before exchanging STUN packets was overlooked when we designed the TCP packet filter.
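To illustrate why an HTTP CONNECT trips a STUN-only filter, here is a minimal sketch of a "does this look like STUN?" check based on the STUN header layout in RFC 5389. It is not the filter from bug 1244926: the function name `looks_like_stun`, the sample host `turn.example.org` and the decision to ignore any stream framing are assumptions made for illustration only.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed magic cookie value from RFC 5389
STUN_HEADER_LEN = 20            # type (2) + length (2) + cookie (4) + transaction id (12)


def looks_like_stun(packet: bytes) -> bool:
    """Hypothetical helper: True if the bytes plausibly start with a STUN header."""
    if len(packet) < STUN_HEADER_LEN:
        return False
    msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    # The two most significant bits of a STUN message type are always zero.
    if msg_type & 0xC000:
        return False
    # STUN attributes are padded to 4 bytes, so the length is a multiple of 4.
    if msg_len % 4 != 0:
        return False
    return cookie == STUN_MAGIC_COOKIE


# A bare STUN Binding Request header: type 0x0001, no attributes, zeroed transaction id.
binding_request = struct.pack("!HHI", 0x0001, 0, STUN_MAGIC_COOKIE) + bytes(12)

# What the proxy tunnel code has to send first (illustrative host name).
http_connect = (
    b"CONNECT turn.example.org:443 HTTP/1.1\r\n"
    b"Host: turn.example.org:443\r\n\r\n"
)

print(looks_like_stun(binding_request))
print(looks_like_stun(http_connect))
```

The CONNECT request starts with the ASCII bytes "CO", so its first two bits are not zero and it carries no magic cookie; a filter of this shape rejects it before any STUN packet can be exchanged.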

Nils Ohlmeier [:drno]

Assignee

Updated

•

8 years ago

Depends on: 1244926

Nils Ohlmeier [:drno]

Assignee

Comment 12

•

8 years ago

Two options come to my mind:

A) Based on the proxy settings of FF, activate a different filter which enforces that the first outgoing message is an HTTP CONNECT. The problem is that, unless we also enforce that the destination is only the one from the FF configuration, someone could simply connect to any open proxy on the Internet and do whatever they want. And enforcing the destination is tricky, with DNS resolution giving us different results.

B) If I'm not mistaken, the Necko HTTP client actually lives in the parent process. So if we remove our HTTP proxy code and use the Necko client instead, we could hopefully still enforce the filtering of initial STUN packets once the Necko HTTP client has received a success response from the HTTP proxy.

In either case we probably need to disable the TCP filtering until we have a working solution, so that e10s clients do not crash once e10s gets released :-(
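A rough sketch of what a proxy-aware filter could look like, combining the CONNECT check from option A with the "STUN only after the proxy reports success" idea from option B. Everything here is hypothetical: the class `ProxyAwareFilter`, its states and the host names are invented for illustration, the target is compared as a plain string (which sidesteps rather than solves the DNS problem mentioned above), and opening the connection fully after the first STUN packet is a simplification.

```python
from enum import Enum, auto

STUN_MAGIC_COOKIE = bytes.fromhex("2112a442")  # RFC 5389 magic cookie


class State(Enum):
    EXPECT_CONNECT = auto()   # nothing sent yet
    EXPECT_PROXY_OK = auto()  # CONNECT sent, waiting for the proxy's answer
    EXPECT_STUN = auto()      # tunnel is up, first payload must be STUN
    OPEN = auto()             # initial STUN packet seen


class ProxyAwareFilter:
    """Hypothetical filter: CONNECT to the configured target first, then STUN."""

    def __init__(self, allowed_target: str):
        self.allowed_target = allowed_target  # e.g. "turn.example.org:443"
        self.state = State.EXPECT_CONNECT

    def allow_outgoing(self, data: bytes) -> bool:
        if self.state is State.EXPECT_CONNECT:
            request_line = data.split(b"\r\n", 1)[0].decode("ascii", "replace")
            parts = request_line.split(" ")
            ok = (len(parts) == 3 and parts[0] == "CONNECT"
                  and parts[1] == self.allowed_target)
            if ok:
                self.state = State.EXPECT_PROXY_OK
            return ok
        if self.state is State.EXPECT_STUN:
            ok = (len(data) >= 20 and (data[0] & 0xC0) == 0
                  and data[4:8] == STUN_MAGIC_COOKIE)
            if ok:
                self.state = State.OPEN
            return ok
        # While waiting for the proxy nothing may be sent; afterwards all is allowed.
        return self.state is State.OPEN

    def on_incoming(self, data: bytes) -> None:
        if self.state is State.EXPECT_PROXY_OK:
            status_line = data.split(b"\r\n", 1)[0].split(b" ")
            # Any 2xx status means the proxy established the tunnel.
            if len(status_line) >= 2 and status_line[1].startswith(b"2"):
                self.state = State.EXPECT_STUN


stun = bytes([0x00, 0x01, 0x00, 0x00]) + STUN_MAGIC_COOKIE + bytes(12)
f = ProxyAwareFilter("turn.example.org:443")
print(f.allow_outgoing(b"CONNECT turn.example.org:443 HTTP/1.1\r\n\r\n"))
f.on_incoming(b"HTTP/1.1 200 Connection established\r\n\r\n")
print(f.allow_outgoing(stun))
```

In this sketch a CONNECT to any target other than the configured one is rejected up front, which is the open-proxy concern raised in option A.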

Nils Ohlmeier [:drno]

Assignee

Updated

•

8 years ago

Depends on: 1285318

Nils Ohlmeier [:drno]

Assignee

Comment 13

•

8 years ago

I'll leave this open with NI to check back on crash stats in a couple of weeks to confirm that we no longer see this crash from 48.0b7 on forward.

Flags: needinfo?(drno)

Nils Ohlmeier [:drno]

Assignee

Comment 14

•

8 years ago

As expected I don't see any crashes from 48.0b7 on forward.

Status: NEW → RESOLVED

Closed: 8 years ago

Flags: needinfo?(drno)

Resolution: --- → FIXED

BugBot [:suhaib / :marco/ :calixte]

Comment 15

•

8 years ago

Crash volume for signature 'IPCError-browser | (msgtype=0xEC0003,name=PTCPSocket::Msg_Data) Processing error: message was deserialized, but the han':
 - nightly (version 50): 0 crashes from 2016-06-06.
 - aurora (version 49): 0 crashes from 2016-06-07.
 - beta (version 48): 715 crashes from 2016-06-06.
 - release (version 47): 0 crashes from 2016-05-31.
 - esr (version 45): 0 crashes from 2016-04-07.

Crash volume on the last weeks:
              W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly         0       0       0       0       0       0       0
 - aurora          0       0       0       0       0       0       0
 - beta           26      43     103     120     122      99     126
 - release         0       0       0       0       0       0       0
 - esr             0       0       0       0       0       0       0

Affected platforms: Windows, Mac OS X, Linux

status-firefox48: --- → affected


Categories

(Core :: WebRTC: Networking, defect, P1)

Tracking

()

People

(Reporter: jimm, Assigned: drno)

References

Details

(Keywords: crash)

Crash Data

Security

(public)
