Closed
Bug 946178
Opened 11 years ago
Closed 11 years ago
Trees closed due to Marionette Bustage - | test_outgoing_radio_off.js | InvalidResponseException: Could not successfully complete transport of message to Gecko, socket closed? or AttributeError: 'NoneType' object has no attribute 'close'
Categories
(Remote Protocol :: Marionette, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: cbook, Unassigned)
References
()
Details
https://tbpl.mozilla.org/php/getParsedLog.php?id=31435030&tree=B2g-Inbound
b2g_emulator_vm b2g-inbound opt test marionette-webapi on 2013-12-04 01:59:40 PST for push 7ecc3f3f84b3
slave: tst-linux64-ec2-489
TEST-UNEXPECTED-FAIL | test_outgoing_radio_off.js | InvalidResponseException: Could not successfully complete transport of message to Gecko, socket closed?
TEST-UNEXPECTED-FAIL | test_outgoing_badNumber.js | InvalidResponseException: Could not successfully complete transport of message to Gecko, socket closed?
TEST-UNEXPECTED-FAIL | test_outgoing_busy.js | InvalidResponseException: Could not successfully complete transport of message to Gecko, socket closed?
investigating
Reporter
Updated•11 years ago
Summary: Intermittent TEST-UNEXPECTED-FAIL | test_outgoing_radio_off.js | InvalidResponseException: Could not successfully complete transport of message to Gecko, socket closed? → Intermittent TEST-UNEXPECTED-FAIL | test_outgoing_radio_off.js | InvalidResponseException: Could not successfully complete transport of message to Gecko, socket closed? or AttributeError: 'NoneType' object has no attribute 'close'
Reporter
Comment 1•11 years ago
I spent some time looking into this. The first signs of this problem appeared around 2am, after the push for https://hg.mozilla.org/integration/b2g-inbound/rev/253e97c9c3f0
However, that push does not seem to be the cause: the merge from b2g-i a few hours earlier did not include this changeset, yet m-c shows the problem as well.
For example:
05:05 < Tomcat|sheriffduty> https://tbpl.mozilla.org/?tree=B2g-Inbound&rev=263f538a5509 is the b2g-i cset one push after the merge
05:05 < Tomcat|sheriffduty> marionette test green
05:06 < Tomcat|sheriffduty> https://tbpl.mozilla.org/?rev=9688476c1544 is the cset of the merge - marionette red
Even mozilla-inbound now shows this error. Could this be some kind of infrastructure issue that started hitting us around 2am and is still ongoing?
Trees are closed.
Severity: normal → blocker
Summary: Intermittent TEST-UNEXPECTED-FAIL | test_outgoing_radio_off.js | InvalidResponseException: Could not successfully complete transport of message to Gecko, socket closed? or AttributeError: 'NoneType' object has no attribute 'close' → Trees closed due to Marionette Bustage - | test_outgoing_radio_off.js | InvalidResponseException: Could not successfully complete transport of message to Gecko, socket closed? or AttributeError: 'NoneType' object has no attribute 'close'
Reporter
Comment 2•11 years ago
BTW, example failure logs for this issue:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31439005&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=31436742&tree=Mozilla-Central
Comment 3•11 years ago
b2g26 appears to be unaffected
Comment 4•11 years ago
Looks like the b2g process crashes while the tests are running. From https://tbpl.mozilla.org/php/getParsedLog.php?id=31439005&full=1&branch=mozilla-inbound#error0 as an example:
03:49:17 INFO - 12-04 06:46:33.869 45 45 I Gecko : MARIONETTE LOG: INFO: == Test Start ==
03:49:17 INFO - 12-04 06:46:33.879 45 45 I Gecko : MobileConnection initialized
03:49:17 INFO - 12-04 06:46:33.899 45 45 I Gecko : MARIONETTE TEST RESULT:TEST-PASS | test_outgoing_radio_off.js | connection is instanceof [object MozMobileConnection] - true was true, expected true
03:49:17 ERROR - 12-04 06:46:33.989 45 45 F libc : Fatal signal 11 (SIGSEGV) at 0x0000002d (code=-6)
03:49:17 ERROR - This usually indicates the B2G process has crashed
03:49:17 INFO - 12-04 06:46:34.408 33 33 I ServiceManager: service 'media.resource_manager' died
03:49:17 INFO - 12-04 06:46:34.499 37 37 I DEBUG : debuggerd committing suicide to free the zombie!
You can see the test start and perform a check, and then the b2g process suddenly crashes. If Marionette itself had a bug, it would display a related error in the logs. This looks like a change was made somewhere else in the b2g process.
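For anyone triaging similar failures, the reasoning above (find the last test that reported a result, then the first fatal-signal line after it) can be sketched as a small log scanner. This is a minimal illustration, not part of the Marionette harness; the helper name `find_crash` and both regexes are hypothetical and assume the logcat line format shown in the excerpt above.

```python
import re

# Hypothetical patterns, assuming the logcat line format quoted in this comment.
TEST_RESULT_RE = re.compile(r"MARIONETTE TEST RESULT:\S+ \| (\S+) \|")
FATAL_RE = re.compile(r"Fatal signal (\d+) \((\w+)\)")


def find_crash(lines):
    """Return (test_name, signal_name) for the first fatal signal, or None.

    test_name is the test file named in the most recent TEST RESULT line
    seen before the crash (None if no result line preceded it).
    """
    last_test = None
    for line in lines:
        result = TEST_RESULT_RE.search(line)
        if result:
            last_test = result.group(1)
        fatal = FATAL_RE.search(line)
        if fatal:
            return last_test, fatal.group(2)
    return None


# Usage with lines trimmed from the excerpt above.
log = [
    "I Gecko : MARIONETTE LOG: INFO: == Test Start ==",
    "I Gecko : MARIONETTE TEST RESULT:TEST-PASS | test_outgoing_radio_off.js | connection is instanceof [object MozMobileConnection]",
    "F libc : Fatal signal 11 (SIGSEGV) at 0x0000002d (code=-6)",
]
print(find_crash(log))  # ('test_outgoing_radio_off.js', 'SIGSEGV')
```

Scanning for the signal line rather than for a Marionette error matches the point made here: when Gecko itself dies, the harness only sees a closed socket, so the useful evidence is in the device log.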
Comment 5•11 years ago
Judging from the test and its output, the code triggering the error is likely in this function: http://mxr.mozilla.org/mozilla-central/source/dom/telephony/test/marionette/test_outgoing_radio_off.js#20 but before onradiostatechanged is called (otherwise we'd get some additional output). So the crash most likely happens where the test uses mozMobileConnection to do some RIL work, meaning either that mozMobileConnection WebAPI call or some RIL-layer code caused this.
tomcat, do we have a good idea of which checkin caused the crash to start happening?
Flags: needinfo?(cbook)
Comment 8•11 years ago
FYI, I disabled test_outgoing_radio_off.js on m-c and am waiting for the tests to finish. If that works, it will at least allow us to reopen until someone who knows this test better can investigate.
https://hg.mozilla.org/mozilla-central/rev/b2b20bc6576a
Comment 9•11 years ago
The failure just moved to the next test, so we're still stuck.
https://tbpl.mozilla.org/php/getParsedLog.php?id=31446693&tree=Mozilla-Central
Comment 10•11 years ago
At Clint's suggestion, I diffed sources.xml between a good build and a bad one to see if it's another external repo causing problems. However, the only differences I'm seeing between good and bad builds are the Gecko and Gaia revisions. Are there external repos not covered by sources.xml?
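The sources.xml comparison described above can be sketched in Python. This is a minimal illustration under an assumption not confirmed in this bug: that sources.xml follows the Android repo-manifest layout, with each external repo listed as a `<project>` element carrying `name` and `revision` attributes. The helpers `project_revisions` and `diff_manifests` and the sample revisions are hypothetical.

```python
import xml.etree.ElementTree as ET


def project_revisions(manifest_xml):
    """Map project name -> revision for every <project> element.

    Assumes a repo-manifest layout (<manifest><project name=... revision=.../>).
    """
    root = ET.fromstring(manifest_xml)
    return {p.get("name"): p.get("revision") for p in root.iter("project")}


def diff_manifests(good_xml, bad_xml):
    """Return {name: (good_rev, bad_rev)} for projects whose revision differs.

    A project present in only one manifest shows None on the other side.
    """
    good = project_revisions(good_xml)
    bad = project_revisions(bad_xml)
    return {
        name: (good.get(name), bad.get(name))
        for name in sorted(good.keys() | bad.keys())
        if good.get(name) != bad.get(name)
    }


# Usage with made-up revisions: only the gaia entry differs.
GOOD = """<manifest>
  <project name="gaia" path="gaia" revision="aaa111"/>
  <project name="platform_build" path="build" revision="ccc333"/>
</manifest>"""
BAD = """<manifest>
  <project name="gaia" path="gaia" revision="bbb222"/>
  <project name="platform_build" path="build" revision="ccc333"/>
</manifest>"""
print(diff_manifests(GOOD, BAD))  # {'gaia': ('aaa111', 'bbb222')}
```

Comparing parsed revisions rather than a raw text diff ignores ordering and whitespace changes, so only repos that actually moved between the good and bad builds are reported.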
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment 15•11 years ago
Disabling all tests that use setRadioEnabled.
https://hg.mozilla.org/mozilla-central/rev/9906961b21af
Comment 16•11 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #10)
> At Clint's suggestion, I diffed sources.xml between a good build and a bad
> one to see if it's another external repo causing problems. However, the only
> differences I'm seeing between good and bad builds are the Gecko and Gaia
> revisions. Are there external repos not covered by sources.xml?
Not that I'm aware of.
Comment 17•11 years ago
Generating new emulator builds on previously-green changesets is producing the same failures, so this is definitely not tied to a particular recent Gecko change.
Comment 18•11 years ago
We're pretty confident at this point that there's an underlying B2G issue here deeper than Gecko. I don't think we want to wait for TPE to wake up before investigating further. Can you please find someone to help?
Flags: needinfo?(overholt)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment 21•11 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #15)
> Disabling all tests that use setRadioEnabled.
>
> https://hg.mozilla.org/mozilla-central/rev/9906961b21af
Still busted.
https://tbpl.mozilla.org/php/getParsedLog.php?id=31450341&tree=Mozilla-Central
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment 23•11 years ago
The crashes are reproducible when running the TBPL builds locally, so that rules out an infra issue.
Comment 24•11 years ago
This try job *might* give a crash stack:
https://tbpl.mozilla.org/?tree=Try&rev=5f5b7c060570
I couldn't reproduce the crash locally, but someone who can could apply it for faster results.
Comment 25•11 years ago
Jgriffin applied this locally, and it looks like no minidumps are being generated. He verified that crash reporting is enabled.
Comment 26•11 years ago
I reverted comment 8 and comment 15 since they didn't help.
https://hg.mozilla.org/mozilla-central/rev/e8f983b8f586
Comment 27•11 years ago
Local testing showed the problem was https://github.com/mozilla-b2g/gaia/commit/71e2fcdaf3a6b86d11645ab1163e037179f591a3, which I've just reverted in https://github.com/mozilla-b2g/gaia/commit/f6a5c9991765bc91511e552e175cb91e10efae89
Comment 28•11 years ago
FYI, the real proof is in the pudding, so we'll see if these tests go green again after the above backout.
Comment 29•11 years ago
The pudding smells a little off - https://tbpl.mozilla.org/php/getParsedLog.php?id=31461295&tree=Mozilla-Inbound says it built with https://git.mozilla.org/?p=releases/gaia.git;a=tree;h=041206149b26d40545f7788f48cced09300f19f5 which seems to be the backout, but https://tbpl.mozilla.org/php/getParsedLog.php?id=31461703&tree=Mozilla-Inbound is still red.
Comment 30•11 years ago
The backout worked. Trees reopened at 17:11 MVT.
Flags: needinfo?(overholt)
Comment 31•11 years ago
Oh, turns out that the "gaia-revlink" that we tinderboxprint is only suitable for confusing the crap out of you - that's the push before the backout, but that URL leads to a display which has the summary of whatever happens to be the tip commit prominently featured at the top, the better to confuse you with.
Comment 32•11 years ago
I'm trying to understand the process here. If the backout worked, shouldn't this bug be closed, or are we waiting on re-enabling something?
Reporter
Comment 33•11 years ago
Marking this as fixed, since the fix for the problem is being handled in the backout bug 933203.
Reporter
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Keywords: intermittent-failure
Updated•2 years ago
Product: Testing → Remote Protocol