Closed
Bug 1428790
Opened 7 years ago
Closed 7 years ago
Intermittent command timed out: 7200 seconds elapsed running ['/tools/buildbot/bin/python', 'scripts/scripts/talos_script.py', '--suite', 'h2-e10s', '--add-option', '--webServer,localhost', '--branch-name', 'Mozilla-Inbound-Non-PGO', '--cfg'
Categories
(Testing :: Talos, defect, P5)
Tracking
(firefox58 fixed, firefox59 fixed)
RESOLVED
FIXED
mozilla59
People
(Reporter: intermittent-bug-filer, Assigned: rwood)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])
Attachments
(1 file)
Comment hidden (Intermittent Failures Robot) |
Comment 2•7 years ago
|
||
Since this bug was opened (in the last 3 days) there are 77 failures.
They occur mostly on Linux x64 and the rest of them on OS X 10.10, Windows 7, Windows10 64. The affected builds type are debug, opt, pgo.
Recent log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=154685960&lineNumber=8278
and a relevant part of it:
8278
command timed out: 7200 seconds elapsed running ['/tools/buildbot/bin/python', 'scripts/scripts/talos_script.py', '--suite', 'h2-e10s', '--add-option', '--webServer,localhost', '--branch-name', 'Mozilla-Inbound-Non-PGO', '--cfg', 'talos/linux_config.py', '--download-symbols', 'ondemand', '--use-talos-json', '--blob-upload-branch', 'Mozilla-Inbound-Non-PGO'], attempting to kill
8279
process killed by signal 9
8280
program finished with exit code -1
8281
elapsedTime=7200.009734
8282
========= master_lag: 0.13 =========
:rwood, can you please have a look at this?
Flags: needinfo?(rwood)
Whiteboard: [stockwell needswork]
Comment 3•7 years ago
|
||
The log seems to indicate that the h2 talos spends a lot of time copying profiles. Note the timestamps:
12:54:37 INFO - Cloning profile located at /home/cltbld/.mozilla/profiles/simple
12:54:37 INFO - => minidumps
12:54:37 INFO - => bookmarkbackups
12:54:37 INFO - => safebrowsing
12:54:37 INFO - => OfflineCache
12:54:37 INFO - => sessionstore-backups
12:54:37 INFO - => crashes
12:54:37 INFO - => crashes/events
12:54:37 INFO - => browser-extension-data
12:54:37 INFO - => browser-extension-data/screenshots@mozilla.org
12:54:37 INFO - => datareporting
12:54:37 INFO - => datareporting/archived
12:54:37 INFO - => datareporting/archived/2017-11
12:54:38 INFO - => datareporting/archived/2017-10
12:54:38 INFO - => datareporting/archived/2018-01
12:54:38 INFO - => datareporting/archived/2017-09
12:54:38 INFO - => datareporting/archived/2017-12
12:54:38 INFO - => startupCache
12:54:38 INFO - => gmp
12:54:38 INFO - => gmp/Linux_x86_64-gcc3
12:54:38 INFO - => storage
12:54:38 INFO - => storage/permanent
12:54:38 INFO - => storage/permanent/chrome
12:54:38 INFO - => storage/permanent/chrome/idb
12:54:38 INFO - => storage/permanent/chrome/idb/3561288849sdhlie.files
12:54:38 INFO - => storage/permanent/chrome/idb/2918063365piupsah.files
12:54:38 INFO - => storage/temporary
12:54:38 INFO - => storage/default
12:54:38 INFO - => storage/default/https+++ir.ebaystatic.com
12:54:38 INFO - => storage/default/https+++ir.ebaystatic.com/idb
12:54:38 INFO - => storage/default/https+++ir.ebaystatic.com/idb/12183338011.files
12:54:38 INFO - => storage/default/https+++tpc.googlesyndication.com
12:54:38 INFO - => storage/default/https+++tpc.googlesyndication.com/idb
12:54:38 INFO - => storage/default/https+++tpc.googlesyndication.com/idb/12183338011.files
12:54:38 INFO - => storage/default/https+++ad.doubleclick.net
12:54:38 INFO - => storage/default/https+++ad.doubleclick.net/idb
12:54:38 INFO - => storage/default/https+++ad.doubleclick.net/idb/12183338011.files
12:54:38 INFO - => storage/default/https+++www.youtube.com
12:54:38 INFO - => storage/default/https+++www.youtube.com/cache
12:54:38 INFO - => storage/default/https+++www.youtube.com/cache/morgue
12:54:38 INFO - => storage/default/https+++www.youtube.com/cache/morgue/54
12:54:38 INFO - => storage/default/https+++www.youtube.com/cache/morgue/56
12:54:38 INFO - => storage/default/https+++www.youtube.com/cache/morgue/35
12:54:38 INFO - => storage/default/https+++www.youtube.com/cache/morgue/84
12:54:38 INFO - => storage/default/https+++www.youtube.com/cache/morgue/194
12:54:38 INFO - => storage/default/https+++www.youtube.com/cache/morgue/22
12:54:38 INFO - => storage/default/https+++www.youtube.com/cache/morgue/118
12:54:38 INFO - => storage/default/https+++www.youtube.com/idb
12:54:38 INFO - => storage/default/https+++www.youtube.com/idb/3211250388sbwdpsunsohintoatciif.files
12:54:38 INFO - => storage/default/https+++images-na.ssl-images-amazon.com
12:54:38 INFO - => storage/default/https+++images-na.ssl-images-amazon.com/idb
12:54:38 INFO - => storage/default/https+++images-na.ssl-images-amazon.com/idb/12183338011.files
12:54:38 INFO - => gmp-gmpopenh264
12:54:38 INFO - => gmp-gmpopenh264/1.6
12:54:38 INFO - => gmp-gmpopenh264/1.7.1
12:54:38 INFO - => saved-telemetry-pings
12:54:39 INFO - => cache2
12:54:39 INFO - => cache2/entries
13:20:03 INFO - => cache2/doomed
13:20:04 INFO - => thumbnails
It looks like copying cache2/entries takes 20+ minutes, and this happens 4 times during a single run of h2. All the h2 seem to take close to 2 hours and will randomly go over 2 hours, resulting in this failure. This "simple" profile doesn't seem very simple, and might have gotten polluted somewhere along the way.
Comment 4•7 years ago
|
||
yeah, windows is slow by far- possibly we need to determine how to only unpack the profile once. :rwood, could you look into this to see if we can realistically only unpack/copy the profile once?
Assignee | ||
Comment 5•7 years ago
|
||
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #4)
> yeah, windows is slow by far- possibly we need to determine how to only
> unpack the profile once. :rwood, could you look into this to see if we can
> realistically only unpack/copy the profile once?
Sure I'll have a look, thx
Flags: needinfo?(rwood)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 9•7 years ago
|
||
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #4)
> yeah, windows is slow by far- possibly we need to determine how to only
> unpack the profile once. :rwood, could you look into this to see if we can
> realistically only unpack/copy the profile once?
I spent some time looking into this. Problems with this approach (copying the profile once):
- we currently blow away the temporary folder used by talos which includes the profile; if we want to reuse the profile we'll need to keep the temporary folder around
- when running tp6 we have to install a mitmproxy CA certificate into the profile and set up a Firefox proxy; right now we are blowing away the profile so we don't have to remove those. However if we want to reuse the same profile we'll need to write code to remove the CA certificate and also remove the proxy
- we set browser prefs for the tppageloader options; if we aren't blowing away the profile then we'll have to make sure we remove/reset any of those browser prefs after the test run
- I'm a bit concerned about running tests using the same profile and the impact that may have on tests further down the chain
IMO I don't think just copying the profile once is the way to go here. Problem is I don't have another solution in mind ATM.
Comment 10•7 years ago
|
||
Can we use a slimmer profile then? It seems like most of the time is spent copying cache entries in the profile, if we can reduce the cache size in the canned profile it will likely help quite a bit.
Assignee | ||
Comment 11•7 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #10)
> Can we use a slimmer profile then? It seems like most of the time is spent
> copying cache entries in the profile, if we can reduce the cache size in the
> canned profile it will likely help quite a bit.
The profile is heavy/large on purpose to test using a heavy user profile, so that would default the purpose really IMO. We may need to just disable this on Windows as it's fine on the other platforms.
:tarek, what do you think? I think we need to disable this on Windows, or as Kartikaya suggests use a reduced heavy profile (however is there a point to the test then?). Thanks!
Flags: needinfo?(tarek)
Comment 12•7 years ago
|
||
This is already disabled on Windows. It's failing on linux64 mostly, now.
Assignee | ||
Comment 13•7 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #12)
> This is already disabled on Windows. It's failing on linux64 mostly, now.
Ah right yes, I saw the 'h2' symbol on inbound Win 10 - turns out mochitest headless also uses the same 'h2' symbol
Assignee | ||
Comment 14•7 years ago
|
||
Talos h1 (heavy 1) is still running on Windows but not h2
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 17•7 years ago
|
||
Yeah I think we could skip windows altogether since copying around files is notoriously slow there
Flags: needinfo?(tarek)
Comment 18•7 years ago
|
||
we need to skip linux also
Assignee | ||
Comment 19•7 years ago
|
||
Talos h1 has the same issue on Win10 (Bug 1428794). Alright I'm going to disable h1 and h2 on Windows and Linux, at least until a fix can be found...
Comment hidden (mozreview-request) |
Comment 22•7 years ago
|
||
mozreview-review |
Comment on attachment 8943313 [details]
Bug 1428790 - Disable talos heavy user profile tests on windows and linux;
https://reviewboard.mozilla.org/r/213624/#review219342
ok!
Attachment #8943313 -
Flags: review?(jmaher) → review+
Comment 23•7 years ago
|
||
mozreview-review |
Comment on attachment 8943313 [details]
Bug 1428790 - Disable talos heavy user profile tests on windows and linux;
https://reviewboard.mozilla.org/r/213624/#review219348
::: taskcluster/ci/test/talos.yml:216
(Diff revision 1)
> try-name: h1
> treeherder-symbol: tc-T(h1)
> max-run-time: 5400
> run-on-projects:
> by-test-platform:
> .*-qr/.*: ['mozilla-central', 'try']
You'll need to drop this -qr line as well. Same in h2. Otherwise platforms like linux64-qr will match both the .*-qr/.* line and linux.* and the python code will complain about it
Assignee | ||
Comment 24•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8943313 [details]
Bug 1428790 - Disable talos heavy user profile tests on windows and linux;
https://reviewboard.mozilla.org/r/213624/#review219348
Just discovered that on my try push, thanks!
Comment hidden (mozreview-request) |
Comment 26•7 years ago
|
||
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/89cb08844b17
Disable talos heavy user profile tests on windows and linux; r=jmaher
Comment 27•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox59:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Updated•7 years ago
|
Assignee: nobody → rwood
status-firefox58:
--- → affected
Comment hidden (Intermittent Failures Robot) |
Comment 29•7 years ago
|
||
bugherder uplift |
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Comment 30•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/472ecdbbe0ca (FIREFOX_58b_RELBRANCH)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•