Closed Bug 681861 Opened 13 years ago Closed 11 years ago

Intermittent Android test failure with repeated "reconnecting socket" "unable to connect socket"

Categories

(Release Engineering :: General, defect, P3)

ARM
Android
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Assigned: jmaher)

References

Details

(Keywords: intermittent-failure, Whiteboard: [android_tier_1_mozpool])

Crappy summary, but that's all the log has to tell me:

http://tbpl.allizom.org/php/getParsedLog.php?id=6113674
Android Tegra 250 mozilla-central opt test reftest-2 on 2011-08-24 19:46:31 PDT

REFTEST TEST-START | http://10.250.48.202:30047/tests/layout/reftests/table-background/border-collapse-table-ref.html
failed to validate file when downloading /mnt/sdcard/tests/reftest/reftest.log!
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket

INFO | automation.py | Application ran for: 0:20:43.805144
INFO | automation.py | Reading PID log: /tmp/tmpINLOajpidlog
WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!

REFTEST INFO | runreftest.py | Running tests: end.
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
program finished with exit code 0
elapsedTime=1506.754477
TinderboxPrint: reftest-2<br/><em class="testfail">T-FAIL</em>
Priority: -- → P3
Whiteboard: [orange] → [orange][android_tier_1]
No update here in over 2 weeks on this android_tier_1 bug. Is this still an issue?
I'm probably cheating, and some of the last few are actually something new and different. Luckily, who's going to catch me?

https://tbpl.mozilla.org/php/getParsedLog.php?id=6620798&tree=Mozilla-Inbound
(In reply to Phil Ringnalda (:philor) from comment #11)
> I'm probably cheating, and some of the last few are actually something new
> and different. Luckily, who's going to catch me?
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=6620798&tree=Mozilla-Inbound

1 of the last 3 is a different "tegra has gone away" type, but it is still the overall pattern that sometimes tegras just "go away"

one was during reboot and the other 2 are during install step
Assignee: nobody → jmaher
Depends on: 690311
https://tbpl.mozilla.org/php/getParsedLog.php?id=7509483&tree=Firefox
Android XUL Tegra 250 mozilla-central opt test jsreftest-3 on 2011-11-21 08:30:35 PST for push 78cd6a30e250

after regress-380506.js
Whiteboard: [orange][android_tier_1] → [orange][android_tier_1_mozharness]
https://tbpl.mozilla.org/php/getParsedLog.php?id=8287142&tree=Firefox
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Er, no, not yet fixed, I'm afraid.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looks like someone added some verbosity:

https://tbpl.mozilla.org/php/getParsedLog.php?id=10650718&tree=Mozilla-Inbound

failed to validate file when downloading /mnt/sdcard/tests/reftest/reftest.log!
Error receiving data from socket. cmd=ps
; err=[Errno 60] Operation timed out
reconnecting socket
unable to connect socket: [Errno 60] Operation timed out
reconnecting socket
unable to connect socket: [Errno 64] Host is down
reconnecting socket
unable to connect socket: [Errno 64] Host is down
reconnecting socket
unable to connect socket: [Errno 64] Host is down
https://tbpl.mozilla.org/php/getParsedLog.php?id=10653799&tree=Mozilla-Inbound

Error receiving data from socket. cmd=ps
; err=[Errno 60] Operation timed out
reconnecting socket
unable to connect socket: [Errno 60] Operation timed out
reconnecting socket
unable to connect socket: [Errno 60] Operation timed out
reconnecting socket
unable to connect socket: [Errno 60] Operation timed out
reconnecting socket
unable to connect socket: [Errno 60] Operation timed out
https://tbpl.mozilla.org/php/getParsedLog.php?id=10671305&tree=Mozilla-Inbound
Error receiving data from socket. cmd=ps
; err=[Errno 65] No route to host
reconnecting socket
unable to connect socket: [Errno 60] Operation timed out
reconnecting socket
unable to connect socket: [Errno 60] Operation timed out
reconnecting socket
unable to connect socket: [Errno 60] Operation timed out
reconnecting socket
unable to connect socket: [Errno 60] Operation timed out
https://tbpl.mozilla.org/php/getParsedLog.php?id=11250612&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11252456&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11250676&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11251802&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11249790&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11249637&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11247623&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11243582&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11243890&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11245656&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11253960&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=11253741&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=11253750&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=11252125&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=11249524&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=11249324&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=11244797&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=11246075&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=11231629&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=11231716&tree=Firefox
Several of the reports on 2012-05-14 were caused by the upgrade to SUTagent 1.08.
I reviewed many of the Mozilla-Inbound logs reported here for the last 2 weeks. There are several unique logs, but more than 70% of the failures look very much like this:

========= Started Cleanup Device failed (results: 2, elapsed: 30 secs) (at 2012-05-09 22:40:12.426146) =========
python /builds/sut_tools/cleanup.py 10.250.50.83
 in dir /builds/tegra-173/test/../talos-data (timeout 1200 secs)
 watching logfiles {}
 argv: ['python', '/builds/sut_tools/cleanup.py', '10.250.50.83']
 environment:
  PATH=/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
  PWD=/builds/tegra-173/talos-data
  SUT_IP=10.250.50.83
  SUT_NAME=tegra-173
  __CF_USER_TEXT_ENCODING=0x1F5:0:0
 closing stdin
 using PTY: False
Warning proxy.flg found during cleanup
Connecting to: 10.250.50.83
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
program finished with exit code 1
elapsedTime=30.290938
========= Finished Cleanup Device failed (results: 2, elapsed: 30 secs) (at 2012-05-09 22:40:42.740397) =========


 - failure occurs during cleanup.py
 - there are 18 attempts to reconnect
 - elapsed time is ~30 seconds

Can anyone tell me what "Warning proxy.flg found during cleanup" indicates?
https://tbpl.mozilla.org/php/getParsedLog.php?id=12495651&tree=Mozilla-Aurora

Aiui, the proxy flag is set while the device is rebooting.
This may be indicative of the device not coming back up before the test is run (long reboot) or the device going down permanently, or some other variant of the proxy flag being set but not cleaned up.

This is part of a larger class of bugs where we're allocating the test to a tegra that isn't online.
Blocks: 790595
Depends on: 793091
No longer blocks: 790595
Depends on: 793678, 790595
Depends on: 794017
Whiteboard: [orange][android_tier_1_mozharness] → [android_tier_1_mozharness]
Not really mozharness that will fix this, but mozpool.
Whiteboard: [android_tier_1_mozharness] → [android_tier_1_mozpool]
Product: mozilla.org → Release Engineering
Status: NEW → RESOLVED
Closed: 13 years ago11 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.