Closed
Bug 675375
Opened 13 years ago
Closed 13 years ago
Frequent Android talos red with "devicemanager.FileError: error returned from pull: could not find metadata"
Categories
(Testing :: Talos, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: dholbert, Unassigned)
References
()
Details
(Keywords: intermittent-failure)
In the last two days, we've been getting very frequent Talos reds (around 1 red per 1 or 2 pushes, on m-c and m-i) with this error message:
> devicemanager.FileError: error returned from pull: could not find metadata
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311956009.1311957454.10061.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311962909.1311964350.6883.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311975178.1311976645.28669.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311972269.1311973677.15655.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311978029.1311979476.9356.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311849689.1311851237.5678.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311852539.1311853997.23538.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311855354.1311856792.6359.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311888725.1311890153.8445.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311894389.1311895829.2984.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311896109.1311897548.11519.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311896110.1311897557.11584.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311904949.1311906393.23041.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311939652.1311941059.20206.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311935369.1311936855.26358.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1311914969.1311916422.6084.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1311979463.1311980867.14135.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1311975233.1311976679.28766.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1311957064.1311958497.14507.gz
Reporter | ||
Updated•13 years ago
|
OS: Linux → Android
Hardware: x86_64 → ARM
Summary: Frequent talos red with "devicemanager.FileError: error returned from pull: could not find metadata" → Frequent Android talos red with "devicemanager.FileError: error returned from pull: could not find metadata"
Reporter | ||
Comment 1•13 years ago
|
||
Every log in comment 0 is from one of {remote-ts, remote-tpan, remote-tzoom}.
Summary: Frequent Android talos red with "devicemanager.FileError: error returned from pull: could not find metadata" → Frequent Android talos red with "devicemanager.FileError: error returned from pull: could not find metadata" (affects remote-ts, remote-tpan, remote-tzoom)
Reporter | ||
Comment 3•13 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311989189.1311990630.21356.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311987029.1311988436.11987.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311987029.1311988468.12075.gz
CC'ing Aki and Bear, in the hopes that they might be the right people to investigate this or know who the right people are.
Comment 4•13 years ago
|
||
Adding some members of ateam, since the problem may be in the device manager.
Comment 7•13 years ago
|
||
http://hg.mozilla.org/build/talos/file/c5e493459dd0/devicemanager.py#l704
This appears to be a devicemanager bug.
Component: Release Engineering → Talos
Product: mozilla.org → Testing
QA Contact: release → talos
Version: other → unspecified
Comment 11•13 years ago
|
||
Joel, have you found any way to repro/diagnose this?
Comment 12•13 years ago
|
||
I have only spent a couple of hours on this, and I haven't been able to reproduce it locally. I might have to look at just fixing the code without a repro.
Any tips for reproducing this would be appreciated.
Updated•13 years ago
|
Whiteboard: [orange]
Comment 13•13 years ago
|
||
So, walking through the code, it appears we are getting an exception while doing a sock.recv().
Here is the function:
def uread(to_recv, error_msg):
    """ unbuffered read """
    try:
        data = self._sock.recv(to_recv)
        if not data:
            err(error_msg)
            return None
        return data
    except:
        err(error_msg)
        return None
The error is being raised from the except clause, which means self._sock.recv() itself is throwing an exception. This looks like the socket disconnecting mid-read, at which point we report the error. I can test around that condition.
One theory is that we are killing the networking on the tegra. I have seen this during testing when we eat up a lot of memory, and the fix is to power cycle. We could test that theory by looking at the tegra's history when this happens, to see whether it goes offline unplanned during or after this test.
I don't think a fennec crash would cause this to throw an exception.
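The uread() function quoted above reports the same message whether recv() raised or simply returned no data, so the log cannot say which path was hit. A minimal sketch of a variant that keeps the two cases apart, assuming Python 3; ReadError, the explicit sock parameter, and demo() are hypothetical names for illustration and are not part of devicemanager:

```python
import socket


class ReadError(Exception):
    """Hypothetical error type; not part of devicemanager."""


def uread(sock, to_recv, error_msg):
    """Unbuffered read that records why the read failed."""
    try:
        data = sock.recv(to_recv)
    except OSError as exc:
        # In Python 3, socket.error is an alias of OSError.
        raise ReadError("%s (recv raised: %r)" % (error_msg, exc))
    if not data:
        # An empty result from recv() means the peer closed the connection.
        raise ReadError("%s (connection closed by peer)" % error_msg)
    return data


def demo():
    """Exercise both a successful read and a read after the peer closes."""
    a, b = socket.socketpair()
    b.sendall(b"hello")
    first = uread(a, 5, "could not find metadata")
    b.close()
    try:
        uread(a, 5, "could not find metadata")
    except ReadError as exc:
        second = str(exc)
    a.close()
    return first, second
```

With the reason attached to the message, a log line would show whether the device dropped the connection or the local socket failed, which is the distinction the theories in this comment depend on.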
Reporter | ||
Comment 14•13 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312896570.1312898024.25352.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312896570.1312898008.25281.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312906679.1312908116.15534.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312917389.1312918855.2808.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312917390.1312918851.2782.gz
Comment 33•13 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=6408444&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6410969&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6410970&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6411818&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6412219&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6409890&tree=Firefox
Comment 39•13 years ago
|
||
Oh, I finally saw this reproduce! It appears that we are hitting a race condition in which the devicemanager is being shut down at the same time that we are doing a pullFile(). I call this an inch of progress!
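This bug does not record how (or whether) the race was fixed, so the following is only a generic sketch of one way to keep a shutdown from overlapping a read: serialize both behind a lock. It assumes Python 3; GuardedConnection, pull(), close(), and demo() are hypothetical names and do not correspond to devicemanager's actual API:

```python
import socket
import threading


class GuardedConnection:
    """Hypothetical wrapper; the names here are not from devicemanager."""

    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()
        self._closed = False

    def pull(self, nbytes):
        # While the lock is held, close() cannot run partway through a read.
        with self._lock:
            if self._closed:
                raise RuntimeError("pull attempted after shutdown")
            return self._sock.recv(nbytes)

    def close(self):
        # close() waits for any in-progress pull() to finish first.
        with self._lock:
            if not self._closed:
                self._closed = True
                self._sock.close()


def demo():
    """Show a normal pull, then a pull that arrives after shutdown."""
    a, b = socket.socketpair()
    conn = GuardedConnection(a)
    b.sendall(b"meta")
    data = conn.pull(4)
    conn.close()
    try:
        conn.pull(4)
        outcome = "no error"
    except RuntimeError as exc:
        outcome = str(exc)
    b.close()
    return data, outcome
```

One trade-off of this design is that a pull() blocked in recv() also blocks close(), so a real implementation would likely want a socket timeout as well.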
Comment 40•13 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=6473159&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6473596&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6473736&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6473735&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6480578&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6475333&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6475826&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6476530&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6480513&tree=Mozilla-Inbound
Comment 42•13 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=6563772&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6563768&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6563978&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6556503&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6558122&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6564508&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6564505&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6564350&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6565536&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6562011&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6562009&tree=Mozilla-Inbound
Did we do something in the last 10 days to become more racy? This feels way more common than it was before I left.
Summary: Frequent Android talos red with "devicemanager.FileError: error returned from pull: could not find metadata" (affects remote-ts, remote-tpan, remote-tzoom) → Frequent Android talos red with "devicemanager.FileError: error returned from pull: could not find metadata"
Comment 43•13 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=6565836&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6565691&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6566128&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6566241&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6566121&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6566933&tree=Mozilla-Inbound
Comment 44•13 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=6572288&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6572276&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6573292&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6573297&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6573821&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6574328&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6574494&tree=Mozilla-Inbound
Comment 49•13 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=6584893&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6585720&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6586633&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6586845&tree=Mozilla-Inbound
Comment 50•13 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=6589567&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6588677&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6588197&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=6588683&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6593548&tree=Mozilla-Inbound
Comment 58•13 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=6613248&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6608775&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6607175&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6614922&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=6615600&tree=Mozilla-Inbound
Comment 59•13 years ago
|
||
I think at the time I knew what this was supposed to be fixed-by, but by now I've forgotten.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Assignee | ||
Updated•12 years ago
|
Keywords: intermittent-failure
Assignee | ||
Updated•12 years ago
|
Whiteboard: [orange]