Open Bug 1544654 Opened 5 years ago Updated 2 years ago

WPT on Geckoview frequently crashes in some webrtc tests

Categories

(Core :: WebRTC: Signaling, defect, P2)

Unspecified
Android
defect

Tracking Status
firefox68 --- affected

People

(Reporter: KWierso, Assigned: bwc)

References

(Blocks 1 open bug)

Details

(Whiteboard: [geckoview:p2])

Attachments

(1 file)

I'm trying to get web-platform-tests running against Geckoview's test runner activity, and I'm seeing some frequent (but not necessarily permanent) crashes during these webrtc tests. I can try to mark it as expected crashes to green up the test runs, but figured I should file a bug to track actually fixing the crashes.

/webrtc/RTCRtpTransceiver.https.html
https://treeherder.mozilla.org/logviewer.html#?job_id=239860267&repo=try
https://treeherder.mozilla.org/logviewer.html#?job_id=239860299&repo=try

/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html.ini
https://treeherder.mozilla.org/logviewer.html#?job_id=239453707&repo=try

In every case where webrtc/RTCRtpTransceiver.https.html crashed (really, the browser became non-responsive), webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html is failing because the harness was expecting a crash that didn't happen. However, the meta file says nothing about this expected crash, so this must be a local modification to that meta file because you have observed that test to crash.

Let's focus on why webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html is crashing. Do you have some try logs for this?

Component: WebRTC: Networking → WebRTC: Signaling
Flags: needinfo?(wkocher)

(In reply to Byron Campen [:bwc] from comment #1)

> In every case where webrtc/RTCRtpTransceiver.https.html crashed (really, the browser became non-responsive), webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html is failing because the harness was expecting a crash that didn't happen. However, the meta file says nothing about this expected crash, so this must be a local modification to that meta file because you have observed that test to crash.

Ah, yeah, I've been working on updating the expectations. Since this is technically a new platform, I was just taking the current results as the expectation data, based on 4 or 5 retriggers of each wpt test chunk. Only tests that consistently had the same results for each of those retriggers would get updated expectations. I did a few pushes with those 4-5 retriggers, and a bunch of tests consistently crashed in one push, and then consistently passed in a following push, so the test runs consistently showed as orange in Treeherder, but the reasoning bounced back and forth between EXPECTED-OK and EXPECTED-CRASH. These failures must have been from one of the pushes where we were expecting crashes but ended up passing.
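The update rule described here (record a new expectation only when every retrigger in a push agreed) can be sketched as below. This is an illustration, not the actual wpt metadata-update tooling; the function name consensus_expectation and its input shape (one status string per retrigger) are assumptions made for the example.

```python
from typing import Optional, Sequence


def consensus_expectation(results: Sequence[str]) -> Optional[str]:
    """Return the status to record as the new expectation, or None.

    Hypothetical helper: `results` holds one status string (e.g. "OK",
    "CRASH", "TIMEOUT") per retrigger of the same test. An expectation is
    only recorded when every retrigger reported the same status.
    """
    if not results:
        return None
    first = results[0]
    return first if all(r == first for r in results) else None


# Each push can be internally consistent and still disagree with the next
# push, which is how an expectation can flip between CRASH and OK.
print(consensus_expectation(["CRASH"] * 5))              # CRASH
print(consensus_expectation(["OK"] * 5))                 # OK
print(consensus_expectation(["OK", "CRASH", "OK", "OK"]))  # None
```

Consistency within one push says nothing about stability across pushes, so an intermittent crash can still produce a run of UNEXPECTED-OK or UNEXPECTED-CRASH results on the next push.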

> Let's focus on why webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html is crashing. Do you have some try logs for this?

I've added support for processing crash dumps, and did a new try push with none of the updated expectations. One of the failures is a crash in webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html.

Hope that helps!

Flags: needinfo?(wkocher) → needinfo?(docfaraday)

(In reply to Wes Kocher (:KWierso) from comment #2)

> Ah, yeah, I've been working on updating the expectations. Since this is technically a new platform, I was just taking the current results as the expectation data, based on 4 or 5 retriggers of each wpt test chunk. Only tests that consistently had the same results for each of those retriggers would get updated expectations. I did a few pushes with those 4-5 retriggers, and a bunch of tests consistently crashed in one push, and then consistently passed in a following push, so the test runs consistently showed as orange in Treeherder, but the reasoning bounced back and forth between EXPECTED-OK and EXPECTED-CRASH. These failures must have been from one of the pushes where we were expecting crashes but ended up passing.

(And indeed, after I took that push's results and updated the expectations so that test was EXPECTED-CRASH, the following push had that test fail as UNEXPECTED-OK...)

We seem to be crashing due to a call to pipe failing:

[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - PROCESS-CRASH | mozrunner-startup | application crashed [@ mozalloc_abort]
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - Crash dump filename: /tmp/tmpJgAKPe/003ca424-f6c2-74bd-e05f-a05831d7706e.dmp
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - Operating system: Android
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - 0.0.0 Linux 3.10.0+ #1 PREEMPT Thu Jan 5 00:46:30 UTC 2017 x86_64
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - CPU: amd64
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - family 6 model 2 stepping 3
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - 1 CPU
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO -
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - GPU: UNKNOWN
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO -
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - Crash address: 0x0
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - Process uptime: not available
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO -
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - Thread 11 (crashed)
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - 0 libmozglue.so!mozalloc_abort [mozalloc_abort.cpp:2ccc6648064315964dd23039ad28ebf7d9f82999 : 33 + 0x11]
[task 2019-04-19T02:10:13.443Z] 02:10:13 INFO - rax = 0x00007db36636e811 rdx = 0x0000000000000005
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - rcx = 0x00007db3663fd4a0 rbx = 0x00007db366f62890
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - rsi = 0x00007db366370080 rdi = 0x00007db366004476
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - rbp = 0x00007db366f62880 rsp = 0x00007db366f62870
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - r8 = 0x0000000000000000 r9 = 0x0000000000000006
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - r10 = 0xfffffffffffff486 r11 = 0x0000000000000000
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - r12 = 0x0000000000000000 r13 = 0x00007db366f62d58
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - r14 = 0x00007db366f628f8 r15 = 0x00007db363ef3110
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - rip = 0x00007db3662aae49
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - Found by: given as instruction pointer in context
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - 1 libmozglue.so!abort [mozalloc_abort.cpp:2ccc6648064315964dd23039ad28ebf7d9f82999 : 79 + 0x8]
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - rbx = 0x00007db366f62890 rbp = 0x00007db366f628e0
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - rsp = 0x00007db366f62890 r12 = 0x0000000000000000
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - r13 = 0x00007db366f62d58 r14 = 0x00007db366f628f8
[task 2019-04-19T02:10:13.444Z] 02:10:13 INFO - r15 = 0x00007db363ef3110 rip = 0x00007db3662aae8c
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - Found by: call frame info
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - 2 libxul.so!rtc::FatalMessage::~FatalMessage() [checks.cc:2ccc6648064315964dd23039ad28ebf7d9f82999 : 69 + 0x5]
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - rbx = 0x00007db37ff8fbf0 rbp = 0x00007db366f62920
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - rsp = 0x00007db366f628f0 r12 = 0x0000000000000000
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - r13 = 0x00007db366f62d58 r14 = 0x00007db366f628f8
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - r15 = 0x00007db363ef3110 rip = 0x00007db3628a1c8f
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - Found by: call frame info
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - 3 libxul.so!rtc::TaskQueue::Impl::Impl(char const*, rtc::TaskQueue*, rtc::TaskQueue::Priority) [task_queue_libevent.cc:2ccc6648064315964dd23039ad28ebf7d9f82999 : 287 + 0x49]
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - rbx = 0x00007db366f62938 rbp = 0x00007db366f62a70
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - rsp = 0x00007db366f62930 r12 = 0x0000000000000000
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - r13 = 0x00007db366f62d58 r14 = 0x00007db34d08a408
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - r15 = 0x00007db363ef3110 rip = 0x00007db3628ab35a
[task 2019-04-19T02:10:13.445Z] 02:10:13 INFO - Found by: call frame info

https://searchfox.org/mozilla-central/rev/f46e2bf881d522a440b30cbf5cf8d76fc212eaf4/media/webrtc/trunk/webrtc/rtc_base/task_queue_libevent.cc#287

This seems like an OS bug (or limitation) that we're tripping over. dminor, what do you think?

Flags: needinfo?(docfaraday) → needinfo?(dminor)

Also, I note that this try run was e10s; maybe we're running into some sort of sandboxing problem? I guess that would not be intermittent...

Yes, I think this is a similar problem to Bug 1505509. In that bug, we ended up accumulating a large number of webrtc.org platform threads, enough to cause intermittent OOMs on win32 just from their stack allocations. If the tests run to completion, the platform threads mostly end up being freed, so maybe we're just not closing connections often enough, or the GC is not running often enough.
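The hypothesis that per-connection resources are only reclaimed when connections are closed or garbage-collected can be illustrated with a small analogy. This is Python, not Gecko: the hypothetical FakeConnection class stands in for an object that, like the rtc::TaskQueue frame in the crash stack above, allocates a pipe when constructed, and only releases it when finalized. Each object sits in a reference cycle, and disabling the cycle collector mimics "the GC is not running often enough". Linux is assumed, since /proc/self/fd is used to count open descriptors.

```python
import gc
import os
from typing import Tuple


def open_fd_count() -> int:
    # Linux-specific: each entry in /proc/self/fd is one open descriptor.
    return len(os.listdir("/proc/self/fd"))


class FakeConnection:
    """Hypothetical stand-in for a connection object that owns a pipe."""

    def __init__(self) -> None:
        self.read_fd, self.write_fd = os.pipe()  # two fds per object
        self.self_ref = self  # reference cycle: refcounting alone never frees it

    def __del__(self) -> None:
        os.close(self.read_fd)
        os.close(self.write_fd)


def leak_then_collect(n: int) -> Tuple[int, int]:
    """Drop n objects without collecting them, then run the cycle collector.

    Returns (extra fds held before gc.collect(), extra fds held after).
    """
    was_enabled = gc.isenabled()
    gc.disable()  # simulate "the GC is not running often enough"
    try:
        baseline = open_fd_count()
        for _ in range(n):
            FakeConnection()  # unreachable at once, but kept alive by the cycle
        before = open_fd_count() - baseline
        gc.collect()  # finalizers run here and close the pipes
        after = open_fd_count() - baseline
        return before, after
    finally:
        if was_enabled:
            gc.enable()


print(leak_then_collect(50))  # (100, 0) on Linux
```

The descriptors are only returned once the collector runs, so if objects are created faster than they are collected, the process can hit its fd limit even though nothing is permanently leaked.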

Flags: needinfo?(dminor)
Priority: -- → P2
Whiteboard: [geckoview]

Here's a log with a crash on a webrtc test that might actually have a useful crash stack: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=245897383&repo=autoland&lineNumber=5253

I can try to take a look at enabling debug geckoview wpt next week if that sounds helpful.

Flags: needinfo?(docfaraday)

That stack is something I can actually work with. Preserving the bit I need here just in case I don't get back to it next week:

[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - 1 libxul.so!mozilla::net::PUDPSocketChild::SendClose() [PUDPSocketChild.cpp: : 205 + 0xc]
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - rbx = 0x0000728ee7e96500 rbp = 0x0000728ed6c50d30
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - rsp = 0x0000728ed6c50d10 r12 = 0x0000728ee3cac480
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - r13 = 0x00000000ffffffff r14 = 0x0000728ee6087eb0
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - r15 = 0x0000728ee7126b80 rip = 0x0000728eed8c8ec2
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - Found by: call frame info
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - 2 libxul.so!mozilla::NrUdpSocketIpc::close_i() [nr_socket_prsock.cpp:cd771a0b9c075b615eb0f2ce1a439c6f744ba4b8 : 1556 + 0x5]
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - rbx = 0x0000728edb871c00 rbp = 0x0000728ed6c50d50
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - rsp = 0x0000728ed6c50d40 r12 = 0x0000728ee3cac480
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - r13 = 0x00000000ffffffff r14 = 0x0000728edb871c00
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - r15 = 0x0000728ee7126b80 rip = 0x0000728eeda72d4a
[task 2019-05-11T01:28:53.358Z] 01:28:53 INFO - Found by: call frame info

Flags: needinfo?(docfaraday)
OS: Unspecified → Android
Whiteboard: [geckoview] → [geckoview:p2]
Assignee: nobody → docfaraday

Still seeing crashes; is there some way to force these try runs to give me a stack?

Flags: needinfo?(wkocher)

Hrm, it should be printing out the stack. Maybe the "Browser not responding, setting status to CRASH" failure mode doesn't do that...

The logcat has some maybe-interesting things going on if you Ctrl-F for "Fatal error in /builds/worker/workspace/build/src/media/webrtc/trunk/webrtc/rtc_base/task_queue_libevent.cc"

Flags: needinfo?(wkocher)

Yeah, that still doesn't tell me enough; there's a SIGSEGV, but the stack doesn't have symbols. Is there a bug open for that?

06-03 18:06:10.864 21875 21875 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
06-03 18:06:10.864 21875 21875 F DEBUG : rax 0000782e0d9e311b rbx 0000782e0e374d90 rcx 0000782e0da714a0 rdx 0000000000000002
06-03 18:06:10.864 21875 21875 F DEBUG : rsi 0000782e0d9e4b10 rdi 0000782e0d604479
06-03 18:06:10.864 21875 21875 F DEBUG : r8 0000000000000000 r9 0000000000000009 r10 fffffffffffff489 r11 0000000000000000
06-03 18:06:10.864 21875 21875 F DEBUG : r12 0000000000000000 r13 0000782e0e375088 r14 0000782e0e374df8 r15 0000782e0b4a40b6
06-03 18:06:10.864 21875 21875 F DEBUG : cs 0000000000000033 ss 000000000000002b
06-03 18:06:10.864 21875 21875 F DEBUG : rip 0000782e0d91ef27 rbp 0000782e0e374d80 rsp 0000782e0e374d70 eflags 0000000000000206
06-03 18:06:10.874 21875 21875 F DEBUG :
06-03 18:06:10.874 21875 21875 F DEBUG : backtrace:
06-03 18:06:10.874 21875 21875 F DEBUG : #00 pc 0000000000016f27 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so (offset 0x3b000)
06-03 18:06:10.874 21875 21875 F DEBUG : #01 pc 0000000000016f69 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so (offset 0x3b000)
06-03 18:06:10.874 21875 21875 F DEBUG : #02 pc 0000000002092150 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #03 pc 000000000209b81b /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #04 pc 000000000209c351 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #05 pc 000000000209c309 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #06 pc 0000000001fabc65 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #07 pc 0000000001fab831 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #08 pc 00000000006b9298 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #09 pc 00000000006b8e34 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #10 pc 00000000006bb0ef /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #11 pc 0000000000ca567d /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #12 pc 00000000010bc04d /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #13 pc 0000000000002653 [anon:js-executable-memory:0000138903baf000]
06-03 18:06:10.874 21875 21875 F DEBUG : #14 pc 000000000006b4ef [anon:jemalloc:0000782e02400000]
06-03 18:06:10.874 21875 21875 F DEBUG : #15 pc 00000000000104de [anon:js-executable-memory:0000138903a4f000]
06-03 18:06:10.874 21875 21875 F DEBUG : #16 pc 0000000002b0c587 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #17 pc 00000000023bbe67 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #18 pc 00000000023d157a /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #19 pc 000000000243c5ad /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #20 pc 0000000002450b2a /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #21 pc 0000000000010743 [anon:js-executable-memory:0000138903baf000]
06-03 18:06:10.874 21875 21875 F DEBUG : #22 pc 00000000000e2577 [anon:jemalloc:0000782e00900000]
06-03 18:06:10.874 21875 21875 F DEBUG : #23 pc 000000000001cacc [anon:js-executable-memory:0000138903c1f000]
06-03 18:06:10.874 21875 21875 F DEBUG : #24 pc 00000000000104de [anon:js-executable-memory:0000138903a4f000]
06-03 18:06:10.874 21875 21875 F DEBUG : #25 pc 0000000002b0c587 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #26 pc 00000000023bbe67 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #27 pc 00000000023d157a /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #28 pc 000000000246fbb1 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #29 pc 0000000002453661 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #30 pc 00000000023d123c /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #31 pc 000000000278349e /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #32 pc 0000000000cbec25 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #33 pc 000000000004034a /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #34 pc 0000000000037055 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #35 pc 00000000000373c3 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #36 pc 000000000062785b /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #37 pc 00000000000a346d /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #38 pc 00000000000a45d2 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #39 pc 00000000004019e7 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #40 pc 00000000003d9384 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #41 pc 00000000017bfc90 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #42 pc 00000000022d9dee /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #43 pc 00000000003d9384 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #44 pc 00000000022d9c96 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libxul.so (offset 0xf1000)
06-03 18:06:10.874 21875 21875 F DEBUG : #45 pc 0000000000023027 /data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so (offset 0x3b000)
06-03 18:06:10.874 21875 21875 F DEBUG : #46 pc 000000000076de32 /data/app/org.mozilla.geckoview.test-1/oat/x86_64/base.odex (offset 0x72b000)
06-03 18:06:10.874 21875 21875 F DEBUG : #47 pc 0000000000901c4f <anonymous:0000782e0da76000>
06-03 18:06:10.874 21875 21875 F DEBUG : #48 pc 00000000002e7259 /system/lib64/libart.so (_ZN3art11interpreter30EnterInterpreterFromEntryPointEPNS_6ThreadEPKNS_7DexFile8CodeItemEPNS_11ShadowFrameE+105)

Flags: needinfo?(snorp)

We should be getting symbols, so I guess something is going south there. Maybe Geoff can help us?

Flags: needinfo?(snorp) → needinfo?(gbrown)

Flags: needinfo?(gbrown)

This crash is not generating any minidumps.

"Normal" crashes do generate minidumps on Android, in all test suites, including web-platform tests, and the crash reports have symbols:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=57dbf77ec0087a914767251c35163581f03c7c20

It looks like the logs have expired since comment 12, so I retriggered to generate some new failures.

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=252487649&repo=try&lineNumber=5315
has logcat:
https://taskcluster-artifacts.net/U_ouwoxxRuKarGIz6wlKhg/0/public/test_info//logcat-emulator-5554.log

06-19 20:51:37.850 19848 19863 I Gecko   : [Child 19848: Main Thread]: I/signaling [main|sdp_config] sdp_config.c:86: SDP: Initialized config pointer: 0x7a52cb5763c0
06-19 20:51:37.850 19848 19863 I Gecko   : [Child 19848: Main Thread]: I/jsep [1560973897840000 (id=2147483838 url=https://web-platform.test:8443/webrtc/RTCRtpTransceiver.https.html)]: stable -> have-remote-offer
06-19 20:51:37.850 19848 19863 E rtc     : 
06-19 20:51:37.850 19848 19863 E rtc     : 
06-19 20:51:37.850 19848 19863 E rtc     : #
06-19 20:51:37.850 19848 19863 E rtc     : # Fatal error in /builds/worker/workspace/build/src/media/webrtc/trunk/webrtc/rtc_base/task_queue_libevent.cc, line 287
06-19 20:51:37.850 19848 19863 E rtc     : # last system error: 24
06-19 20:51:37.850 19848 19863 E rtc     : # Check failed: pipe(fds) == 0
06-19 20:51:37.850 19848 19863 E rtc     : # 
06-19 20:51:37.850 19848 19863 E rtc     : #
06-19 20:51:37.850 19848 19863 E Gecko   : mozalloc_abort: abort() called from :0x7a52e740f151 ()
--------- beginning of crash
06-19 20:51:37.850 19848 19863 F libc    : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 19863 (Web Content)
06-19 20:51:37.850   997   997 W         : debuggerd: handling request: pid=19848 uid=10062 gid=10062 tid=19863

Is the minidump missing because of the way we abort? Related to crashing in the content process?

:froydnj -- Do you have an idea?

Flags: needinfo?(nfroyd)

Another example that looks very similar:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=252487689&repo=try&lineNumber=5303

https://taskcluster-artifacts.net/NcM4o_GOQoqOknPRmnrclA/0/public/test_info//logcat-emulator-5554.log

06-19 20:50:45.590 19813 19839 I Gecko   : [Parent 19813: Unnamed thread 0x73a7aa022570]: D/mtransport NrIceCtx static call to find local stun addresses
06-19 20:50:45.590 19873 19888 I Gecko   : [Child 19873: Main Thread]: I/signaling [main|sdp_config] sdp_config.c:86: SDP: Initialized config pointer: 0x73a78da7a6d0
06-19 20:50:45.590 19873 19888 I Gecko   : [Child 19873: Main Thread]: I/jsep [1560973845590000 (id=2147483838 url=https://web-platform.test:8443/webrtc/RTCRtpTransceiver.https.html)]: stable -> have-remote-offer
06-19 20:50:45.590 19873 19888 E rtc     : 
06-19 20:50:45.590 19873 19888 E rtc     : 
06-19 20:50:45.590 19873 19888 E rtc     : #
06-19 20:50:45.590 19873 19888 E rtc     : # Fatal error in /builds/worker/workspace/build/src/media/webrtc/trunk/webrtc/rtc_base/task_queue_libevent.cc, line 287
06-19 20:50:45.590 19873 19888 E rtc     : # last system error: 24
06-19 20:50:45.590 19873 19888 E rtc     : # Check failed: pipe(fds) == 0
06-19 20:50:45.590 19873 19888 E rtc     : # 
06-19 20:50:45.590 19873 19888 E rtc     : #
06-19 20:50:45.590 19873 19888 E Gecko   : mozalloc_abort: abort() called from :0x73a7a67e3151 ()
--------- beginning of crash
06-19 20:50:45.590 19873 19888 F libc    : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 19888 (Web Content)
06-19 20:50:45.590   990   990 W         : debuggerd: handling request: pid=19873 uid=10062 gid=10062 tid=19888

This case looks different:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=252487648&repo=try&lineNumber=5309

https://taskcluster-artifacts.net/dxdMVx5BQ6Wkb46nxeIM6Q/0/public/test_info//logcat-emulator-5554.log

06-19 20:51:34.944 19820 19837 I Gecko   : [Child 19820: Main Thread]: I/signaling [main|sdp_config] sdp_config.c:86: SDP: Initialized config pointer: 0x77c133051f20
06-19 20:51:34.944 19820 19837 I Gecko   : [Child 19820: Main Thread]: I/jsep [1560973894934133 (id=2147483838 url=https://web-platform.test:8443/webrtc/RTCRtpTransceiver.https.html)]: stable -> have-remote-offer
06-19 20:57:34.654 21903 21903 D AndroidRuntime: >>>>>> START com.android.internal.os.RuntimeInit uid 0 <<<<<<
06-19 20:57:34.654 21903 21903 D AndroidRuntime: CheckJNI is ON
06-19 20:57:34.664 21903 21903 W art     : Unexpected CPU variant for X86 using defaults: x86_64
06-19 20:57:34.664 21903 21903 D ICU     : No timezone override file found: /data/misc/zoneinfo/current/icu/icu_tzdata.dat
06-19 20:57:34.674 21903 21903 E memtrack: Couldn't load memtrack module (No such file or directory)
06-19 20:57:34.674 21903 21903 E android.os.Debug: failed to load memtrack module: -2
06-19 20:57:34.674 21903 21903 I Radio-JNI: register_android_hardware_Radio DONE
06-19 20:57:34.674 21903 21903 D AndroidRuntime: Calling main entry com.android.commands.am.Am
06-19 20:57:34.674  1303  1682 I ActivityManager: Force stopping org.mozilla.geckoview.test appid=10062 user=0: from pid 21903
06-19 20:57:34.674  1303  1682 I ActivityManager: Killing 19820:org.mozilla.geckoview.test:tab/u0a62 (adj 0): stop org.mozilla.geckoview.test

In this case, I think the test stalled and timed out after about 6 minutes. Then the harness stopped the application, apparently without signalling to trigger a minidump. A wpt harness change (to signal on timeout) could probably generate a minidump in this case.
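A sketch of the suggested harness change: on timeout, first deliver a crash-type signal that an in-process crash handler can catch (so it has a chance to write a minidump), and only force-kill if the process is still alive after a grace period. This is a generic POSIX illustration using subprocess, not wptrunner's code; the function name wait_or_abort, the choice of SIGABRT, and the grace period are all assumptions. On the emulator the signal would have to be delivered to the content process on the device (for example through adb), which this sketch does not cover.

```python
import signal
import subprocess
import sys


def wait_or_abort(proc: subprocess.Popen, timeout: float, grace: float = 5.0) -> int:
    """Wait for proc; on timeout send SIGABRT, then fall back to SIGKILL.

    Hypothetical helper. A force-stop (SIGKILL) can never be intercepted,
    so no crash handler runs and no minidump is written; sending a
    catchable crash signal first leaves room for one.
    """
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGABRT)
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # last resort: nothing can run after SIGKILL
            return proc.wait()


# Usage: a child that "stalls" is aborted after one second.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
rc = wait_or_abort(child, timeout=1.0)
print(rc == -signal.SIGABRT)  # True on POSIX: the child died from SIGABRT
```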

gsvelto said the only reasons you wouldn't get minidumps on Android are:

  • Some signal that's not handled by the crashreporter--which I don't think is the case here; or
  • This crash is happening prior to the crashreporter getting initialized, so very early in startup.

Do we start running things for webrtc that early in startup?

Flags: needinfo?(nfroyd)

Comment 19 and comment 20 show SIGSEGV -- should be handled!

From comment 19:

06-19 20:45:37.230  1303  1367 I ActivityManager: Start proc 19848:org.mozilla.geckoview.test:tab/u0a62 for service org.mozilla.geckoview.test/org.mozilla.gecko.process.GeckoServiceChildProcess$tab
06-19 20:45:37.230 19848 19848 D GeckoThread: State changed to LAUNCHED
...
06-19 20:51:37.850 19848 19863 E rtc     : # Fatal error in /builds/worker/workspace/build/src/media/webrtc/trunk/webrtc/rtc_base/task_queue_libevent.cc, line 287

That process was running for 6 minutes -- startup should be complete.

Comment 20 shows a similar process lifetime:

06-19 20:44:50.850  1295  1358 I ActivityManager: Start proc 19873:org.mozilla.geckoview.test:tab/u0a62 for service org.mozilla.geckoview.test/org.mozilla.gecko.process.GeckoServiceChildProcess$tab
06-19 20:44:50.850 19873 19873 D GeckoThread: State changed to LAUNCHED
...
06-19 20:50:45.590 19873 19888 E rtc     : # Fatal error in /builds/worker/workspace/build/src/media/webrtc/trunk/webrtc/rtc_base/task_queue_libevent.cc, line 287
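The lifetimes quoted here are the difference between the ActivityManager "Start proc" timestamp and the timestamp of the fatal error. A small helper makes that arithmetic explicit; the function names are hypothetical. It assumes the default logcat prefix "MM-DD HH:MM:SS.mmm", which carries no year, so an arbitrary leap year is substituted (only differences matter, and two lines from one test run will not span a year boundary).

```python
from datetime import datetime

_ASSUMED_YEAR = "2000"  # arbitrary leap year so that 02-29 would also parse


def logcat_time(line: str) -> datetime:
    """Parse the 'MM-DD HH:MM:SS.mmm' prefix (18 characters) of a logcat line."""
    return datetime.strptime(_ASSUMED_YEAR + "-" + line[:18], "%Y-%m-%d %H:%M:%S.%f")


def seconds_between(start_line: str, end_line: str) -> float:
    return (logcat_time(end_line) - logcat_time(start_line)).total_seconds()


# Timestamps taken from the comment 19 and comment 20 excerpts.
print(seconds_between("06-19 20:45:37.230 ...", "06-19 20:51:37.850 ..."))  # 360.62
print(seconds_between("06-19 20:44:50.850 ...", "06-19 20:50:45.590 ..."))  # 354.74
```

Both processes had been alive for roughly six minutes, well past startup.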

(In reply to Geoff Brown [:gbrown] from comment #23)

> Comment 19 and comment 20 show SIGSEGV -- should be handled!

I'll look into this bit and other crash handling in wpt harness next week. Thanks!

There's an r+ patch which didn't land and no activity in this bug for 2 weeks.
:bwc, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(docfaraday)

The patch does not seem to help.

Flags: needinfo?(docfaraday)

I should note that in bug 1526666, it looks like we are running out of fds, and getting the same error we see here. What kind of tools do we have to debug fd exhaustion on try? Can we increase the limit?
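"last system error: 24" in the logcat excerpts is errno 24, which on Linux (and therefore Android) is EMFILE, "Too many open files". That matches fd exhaustion. The failing check, pipe(fds) == 0 in task_queue_libevent.cc, can be reproduced in isolation by lowering the per-process descriptor limit and creating pipes until one fails. This is a Linux-only Python sketch; the limit of 64 is an arbitrary assumption, and the original soft limit is restored afterwards.

```python
import errno
import os
import resource
from typing import Tuple


def pipes_until_emfile(soft_limit: int = 64) -> Tuple[int, int]:
    """Create pipes under a lowered RLIMIT_NOFILE until pipe() fails.

    Returns (pipes_created, errno_of_failure). Every pipe costs two fds,
    like the pipe(fds) call in rtc::TaskQueue::Impl's constructor.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)  # the soft limit may not exceed hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            try:
                r, w = os.pipe()
            except OSError as e:
                return len(fds) // 2, e.errno
            fds.extend((r, w))
    finally:
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


created, err = pipes_until_emfile()
print(err == errno.EMFILE, errno.EMFILE)  # True 24 on Linux
```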

Flags: needinfo?(snorp)

We can't increase the fd limit. As for debugging the exhaustion, lsof works on Android emulators. Something like lsof -a -p <pid of app> should yield interesting results.
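As a complement to lsof, the open descriptors of a process can be tallied from /proc/<pid>/fd, where each entry is a symlink whose target names the resource (pipe:[...], socket:[...], anon_inode:..., or a file path). A pipe count that keeps growing from test to test would point at leaked task-queue pipes. This helper is a sketch, not existing tooling; it assumes Linux /proc semantics and permission to read the target's fd directory. On the emulator the same raw data is visible with ls -l /proc/<pid>/fd from adb shell, subject to the same permissions.

```python
import os
from collections import Counter


def fd_kinds(pid: str = "self") -> Counter:
    """Count a process's open descriptors by kind, using /proc/<pid>/fd.

    Link targets look like 'pipe:[12345]', 'socket:[678]',
    'anon_inode:[eventfd]', or an absolute filesystem path.
    """
    fd_dir = f"/proc/{pid}/fd"
    kinds: Counter = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            continue  # fd closed while iterating (e.g. listdir's own fd)
        if target.startswith("/"):
            kinds["file"] += 1
        else:
            kinds[target.split(":", 1)[0]] += 1  # 'pipe', 'socket', 'anon_inode', ...
    return kinds


before = fd_kinds()
r, w = os.pipe()
after = fd_kinds()
print(after["pipe"] - before["pipe"])  # 2
os.close(r)
os.close(w)
```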

Flags: needinfo?(snorp)
Severity: normal → S3