823056 - Intermittent hang in test_peerConnection_basicAudio.html | Failure: 'application timed out after 330 seconds with no output'

Reporter

Description

•

12 years ago

With the last merge from m-c to alder the peer connection tests are starting to hang and crash on Linux based platforms: https://tbpl.mozilla.org/php/getParsedLog.php?id=18062650&tree=Alder#error0 52 INFO TEST-START | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html ++DOMWINDOW == 14 (0x4008a18) [serial = 19] [outer = 0x534b5d0] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004003: file ../../../../intl/uconv/src/nsCharsetConverterManager.cpp, line 301 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004003: file ../../../../intl/uconv/src/nsCharsetConverterManager.cpp, line 301 TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | application timed out after 330 seconds with no output args: ['/home/cltbld/talos-slave/test/build/bin/screentopng'] INFO | automation.py | Application ran for: 0:05:41.625900 INFO | automation.py | Reading PID log: /tmp/tmp0Lr3kgpidlog PROCESS-CRASH | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | application crashed [@ libpthread-2.11.so + 0xdc44] Crash dump filename: /tmp/tmpNqLUxy/minidumps/7f3fa034-3d37-b72b-7dc340b5-0e6b817d.dmp Operating system: Linux 0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64 CPU: amd64 family 6 model 23 stepping 10 2 CPUs Crash reason: SIGABRT Crash address: 0x1f400000880 Thread 0 (crashed) 0 libpthread-2.11.so + 0xdc44 rbx = 0x00000000036d5d00 r12 = 0x00007fffbbf42740 r13 = 0x0000000000000000 r14 = 0x00007fffbbf429e0 r15 = 0x00007f2fc2ec4b26 rip = 0x00000034d360dc44 rsp = 0x00007fffbbf42690 rbp = 0x00007fffbbf42760 Found by: given as instruction pointer in context 1 libpthread-2.11.so + 0x8f4a rip = 0x00000034d3608f4b rsp = 0x00007fffbbf426a8 rbp = 0x00007fffbbf42760 Found by: stack scanning 2 libxul.so!mozilla::BlockingResourceBase::CheckAcquire(mozilla::CallStack const&) [BlockingResourceBase.cpp : 107 + 0x8] rip = 0x00007f2fc2072f46 rsp = 0x00007fffbbf426b0 rbp = 0x00007fffbbf42760 Found by: stack scanning 3 libxul.so!CSF::CC_SIPCCService::notifyDeviceEventObservers(ccapi_device_event_e, linked_ptr<CSF::CC_Device>, linked_ptr<CSF::CC_DeviceInfo>) [CC_SIPCCService.cpp : 717 + 0x4] rbx = 0x0000000002a4e060 r12 = 0x00007fffbbf42790 rip = 0x00007f2fc250f089 rsp = 0x00007fffbbf42770 rbp = 0x00007fffbbf42800 Found by: call frame info 4 libxul.so!CSF::CC_SIPCCService::onDeviceEvent(ccapi_device_event_e, unsigned int, cc_device_info_t_*) [CC_SIPCCService.cpp : 607 + 0x22] rbx = 0x00007fffbbf42980 r12 = 0x00007fffbbf42990 r13 = 0x0000000000000000 r14 = 0x00007fffbbf429e0 r15 = 0x00007f2fc2ec4b26 rip = 0x00007f2fc250f4b5 rsp = 0x00007fffbbf42810 rbp = 0x00007fffbbf42a30 Found by: call frame info 5 libxul.so!ccsnap_gen_deviceEvent [ccapi_snapshot.c : 414 + 0xd] rbx = 0x0000000004763050 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x00000034e08fe158 r15 = 0x00007fffbbf431a0 rip = 0x00007f2fc25241b9 rsp = 0x00007fffbbf42a40 rbp = 0x00007fffbbf42a60 Found by: call frame info 6 libxul.so!parse_setup_properties [ccapi_config.c : 59 + 0x8] rbx = 0x00000000041f76e8 r12 = 0x00000034e08fe158 r13 = 0x00000000036d5dc8 r14 = 0x00000034e08fe158 r15 = 0x00007fffbbf431a0 rip = 0x00007f2fc252175c rsp = 0x00007fffbbf42a70 rbp = 0x00007fffbbf42a90 Found by: call frame info 7 libxul.so!CCAPI_Start_response [ccapi_config.c : 41 + 0x10] rbx = 0x00000000041f76e8 r12 = 0x0000000000000000

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 1

•

12 years ago

Sounds critical so we should better make it s-s.

Group: core-security

Benjamin Smedberg

Comment 2

•

12 years ago

Nothing s-s about this, it's just a deadlock which is then being aborted by the test harness (SIGABRT).

Group: core-security

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 3

•

12 years ago

Looks like it's only happening on 64bit Linux and with a debug build.

Hardware: All → x86_64

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Updated

•

12 years ago

Summary: Intermittent hang and crash in test_peerConnection_basicAudio.html | Failure: 'application timed out after 330 seconds with no output' and 'application crashed' [@ libpthread-2.11.so + 0xdc44] → [Linux64 debug] Intermittent hang in test_peerConnection_basicAudio.html | Failure: 'application timed out after 330 seconds with no output'

Maire Reavy [:mreavy]

Updated

•

12 years ago

Assignee: nobody → adam

Priority: -- → P1

Whiteboard: [WebRTC][automation-blocked] → [WebRTC][automation-blocked][blocking-webrtc+]

Robert Kaiser

Updated

•

12 years ago

Crash Signature: [@ libpthread-2.11.so + 0xdc44] → [@ libpthread-2.11.so@0xdc44]

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Updated

•

12 years ago

Crash Signature: [@ libpthread-2.11.so@0xdc44]

Randell Jesup [:jesup] (needinfo me)

Updated

•

12 years ago

Whiteboard: [WebRTC][automation-blocked][blocking-webrtc+] → [WebRTC][blocking-webrtc+][automation-blocked][blocking-webrtc+]

Randell Jesup [:jesup] (needinfo me)

Updated

•

12 years ago

Whiteboard: [WebRTC][blocking-webrtc+][automation-blocked][blocking-webrtc+] → [WebRTC][automation-blocked][blocking-webrtc+]

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 4

•

12 years ago

Also happens for x86 builds now: https://tbpl.mozilla.org/php/getParsedLog.php?id=18177793&tree=Alder Can we get this prioritized to get fixed cause it early aborts our testrun?

Hardware: x86_64 → All

Summary: [Linux64 debug] Intermittent hang in test_peerConnection_basicAudio.html | Failure: 'application timed out after 330 seconds with no output' → [Linux debug] Intermittent hang in test_peerConnection_basicAudio.html | Failure: 'application timed out after 330 seconds with no output'

Comment hidden (Legacy TBPL/Treeherder Robot)

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Updated

•

12 years ago

Summary: [Linux debug] Intermittent hang in test_peerConnection_basicAudio.html | Failure: 'application timed out after 330 seconds with no output' → Intermittent hang in test_peerConnection_basicAudio.html | Failure: 'application timed out after 330 seconds with no output'

Eric Rescorla (:ekr)

Comment 7

•

12 years ago

These tbpls date from mid-day 12/21. Bug 820102 was merged to m-c on 12/24. Please verify that this is still an issue.

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 8

•

12 years ago

Randell, would have to do a merge from m-c to alder.

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 9

•

12 years ago

merge done For reference: "cd m-c; hg pull -u; cd ../alder; hg pull -u; hg pull ../m-c; hg merge" Deal with any conflicts during the merge; generally there are none. "hg commit -m 'Merge m-c to alder'; hg push alder" In my .hgrc, I have: [merge-tools] emacs.args = -q --eval "(ediff-merge-with-ancestor \"$local\" \"$other\" \"$base\" nil \"$output\")" But that's not necessary.

Flags: needinfo?(rjesup)

Eric Rescorla (:ekr)

Comment 10

•

12 years ago

OK, these are in fact still failing. However, since they work on my machine, it is not possible to diagnose. Please turn up the logging (e.g., NSPR_LOG_MODULES=mediapipeline:5,ikran:5) and re-run

Eric Rescorla (:ekr)

Comment 11

•

12 years ago

Actually, wait: these errors are all TEST_UNEXPECTED_PASS. The problem would appear to be that we are missing the latest updates from m-i.

Comment hidden (Legacy TBPL/Treeherder Robot)

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 13

•

12 years ago

I'm not able to get this test hanging on my local machine. I'm running a recent debug build from m-c on 10.7, so it matches the platform as given in the last comment. But looks like that something is still different.

Eric Rescorla (:ekr)

Comment 14

•

12 years ago

What even is it waiting for?

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 15

•

12 years ago

Comment 12 is most likely bug 824851. So it doesn't belong to this bug. That said no platform shows the hang currently for the last merge from Randell. We should keep this bug open and watch further merges if it reproduces or not.

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 16

•

12 years ago

I have seen this hang this morning when I was not connected to a known network. I filed the issue as bug 824919 and a possible fix might also fix our problem here.

Depends on: 824919

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 17

•

12 years ago

With my patch on bug 824923 merged to m-c and then to alder we should get better information what's going on here.

Depends on: 824923

Maire Reavy [:mreavy]

Comment 18

•

12 years ago

(In reply to Henrik Skupin (:whimboo) [away 12/21 - 01/01] from comment #15) > Comment 12 is most likely bug 824851. So it doesn't belong to this bug. That > said no platform shows the hang currently for the last merge from Randell. > We should keep this bug open and watch further merges if it reproduces or > not. Removing this from the blocking list, but keeping the bug open to catch regressions. Henrik -- Once you test with your patch from bug 824923, if you find that this is happening frequently or blocking automation, we can promote this back to blocking.

Severity: critical → normal

Priority: P1 → P3

Whiteboard: [WebRTC][automation-blocked][blocking-webrtc+] → [WebRTC][blocking-webrtc-]

Randell Jesup [:jesup] (needinfo me)

Comment 19

•

12 years ago

If this is "we kill it after 330 sec", there's one bug (Failure to shut down SRTCP) which I've seen today when testing the peerconnection leak patch. This bug can cause it to fail to exit

Comment hidden (Legacy TBPL/Treeherder Robot)

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 22

•

12 years ago

Randell, the last two tinderbox failures are bug 824851 and not this one.

Comment hidden (Legacy TBPL/Treeherder Robot)

Jason Smith [:jsmith]

Comment 26

•

12 years ago

Bumping this back to blocking - this is intermittent failure on m-c now. Let's clean this up.

Priority: P3 → P1

Whiteboard: [WebRTC][blocking-webrtc-] → [WebRTC][blocking-webrtc+]

Comment hidden (Legacy TBPL/Treeherder Robot)

Move locks to protect linked_ptr<> instances 12 years ago Adam Roach [:abr] (deleted), patch		Details \| Diff \| Splinter Review
Move locks to protect linked_ptr<> instances 12 years ago Adam Roach [:abr] (deleted), patch	ehugg : review+	Details \| Diff \| Splinter Review
Move locks to protect linked_ptr<> instances 12 years ago Adam Roach [:abr] (deleted), patch	abr : review+ jesup : checkin+	Details \| Diff \| Splinter Review