Bug 1154377
Opened 10 years ago
Closed 9 years ago
Intermittent OS X build Automation Error: mozprocess timed out after 2400 seconds running ['/tools/buildbot/bin/python', 'mach', '--log-no-times', 'build', '-v'] timed out after 2400 seconds of no output
Categories: Release Engineering :: General, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: jlund, Assigned: glandium
Attachments (3 files)
(deleted), patch | nthomas: review+, nthomas: checked-in-
(deleted), patch | mshal: review+
(deleted), patch | mshal: review+
+++ This bug was initially created as a clone of Bug #1145507 +++
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 48 • 10 years ago
Could this have something to do with sccache? Lots of different Mac slaves in scl3 are failing, all in intl/icu or js.
Flags: needinfo?(mh+mozilla)
Comment 49 • 10 years ago
Perhaps not that specific actually, but sccache is try-specific IIRC.
Comment 50 (Assignee) • 10 years ago
I don't know. Maybe. But without looking at a stuck slave while it is stuck, I can't tell.
Flags: needinfo?(mh+mozilla)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 115 • 10 years ago
glandium and I dug into this: sccache is having issues, apparently in its server and network communication. So this disables sccache; r+ from Callek on IRC. Jobs starting after 1826 Pacific should be OK.
https://hg.mozilla.org/build/mozharness/rev/2dc80314c97e
https://hg.mozilla.org/build/mozharness/rev/2e1cd6e8c253
Attachment #8593141 - Flags: review+
Attachment #8593141 - Flags: checked-in+
Comment 117 • 10 years ago
Comment on attachment 8593141 [details] [diff] [review]
[mozharness] Disable sccache on mac
Backed out because I missed the mozharness pinning in-tree, which means it's not going to be effective.
https://hg.mozilla.org/build/mozharness/rev/1d37d6e92c7f
https://hg.mozilla.org/build/mozharness/rev/8466a94c95b2
I'll work out how to do this in buildbot instead.
Attachment #8593141 - Flags: checked-in+ → checked-in-
Comment 118 • 10 years ago
Buildbot fix:
https://hg.mozilla.org/build/buildbot-configs/rev/58939afabf7c
https://hg.mozilla.org/build/buildbot-configs/rev/c9362236ab6c
and a typo fix, because it's one of those days
https://hg.mozilla.org/build/buildbot-configs/rev/b90ef9540c76
https://hg.mozilla.org/build/buildbot-configs/rev/faefb8767843
Reconfiged the try masters, done at 2100 Pacific.
Comment 119 (Assignee) • 10 years ago
So, one part of the equation is that for some reason the main sccache server
process stops listening to its port. This part needs further investigation.
Now, this would be less of a problem if the port wasn't still bound and
listening, because the server subprocesses still have an open file descriptor
they inherited from the main sccache process, except they are not handling
incoming connections (they're not supposed to)...
Comment 120 (Assignee) • 10 years ago
(In reply to Mike Hommey [:glandium] from comment #119)
> So, one part of the equation is that for some reason the main sccache server
> process stops listening to its port. This part needs further investigation.
This part is actually a red herring.
What is happening is that one process gets stuck on an HTTP request to S3. Other processes continue to work until make has nothing more to do, at which point it waits for that stuck process. Then 5 minutes pass and the sccache server stops listening on its socket because no new request came in. From then on, any new sccache client that happens to start for some reason just gets stuck.
> Now, this would be less of a problem if the port wasn't still bound and
> listening, because the server subprocesses still have an open file descriptor
> they inherited from the main sccache process, except they are not handling
> incoming connections (they're not supposed to)...
This is still possibly true, but there are two issues here:
- Network is currently flaky.
- Sccache doesn't handle that very well.
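For illustration only, a minimal sketch of guarding a cache fetch with a timeout so that one stalled HTTP request cannot block the build indefinitely. This is not the patch attached to this bug; the function name `fetch_with_timeout`, its parameters, and the 30-second default are hypothetical, and a stalled request is simply treated as a cache miss here.

```python
import http.client
import socket


def fetch_with_timeout(host, port, path, timeout_s=30.0):
    """Return the response body, or None if the server stalls past timeout_s.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    # The timeout applies to the connect and to each blocking socket read.
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    except socket.timeout:
        # Treat a stalled request as a cache miss instead of hanging forever.
        return None
    finally:
        conn.close()
```

A caller would presumably fall back to compiling locally whenever this returns None.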
Comment 121 (Assignee) • 10 years ago
Note that the "keeps listening in subprocesses" behavior is specific to OS X. It doesn't happen on Linux.
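A minimal sketch of the idea behind closing inherited listening sockets in forked workers, assuming a POSIX `os.fork()`-based server. The helper `spawn_worker` and its arguments are hypothetical and are not taken from the sccache source or from the patches attached here.

```python
import os
import socket


def spawn_worker(listener, work):
    """Fork a worker that drops the inherited listening socket before working.

    Hypothetical helper: `listener` is the parent's listening socket and
    `work` is a zero-argument callable run in the child.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            # Child: close our inherited copy so the port stops accepting
            # connections as soon as the parent closes its own copy.
            listener.close()
            work()
            status = 0
        finally:
            # Exit without running the parent's cleanup handlers.
            os._exit(status)
    return pid
```

With this in place, once the parent closes its listener a new client gets a connection refusal right away rather than hanging on a socket nobody accepts from.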
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 123 (Assignee) • 10 years ago
Assignee: nobody → mh+mozilla
Attachment #8593323 - Flags: review?(mshal)
Comment 124 (Assignee) • 10 years ago
Attachment #8593324 - Flags: review?(mshal)
Updated • 10 years ago
Attachment #8593323 - Flags: review?(mshal) → review+
Comment 125 • 10 years ago
Comment on attachment 8593324 [details] [diff] [review]
Close listening sockets when forking processes
Any theories as to why the network would be flaky just within the past week or so? (Bug 1153012 was reported on 4/9)
Attachment #8593324 - Flags: review?(mshal) → review+
Comment 126 (Assignee) • 10 years ago
No idea. Someone with knowledge of the network infra should look into it.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 131 • 10 years ago
Comment 132 • 10 years ago
Comment 133 • 10 years ago
mozharness production tag moved to: https://hg.mozilla.org/build/mozharness/rev/production
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 137 (Assignee) • 10 years ago
Comment 138 (Assignee) • 10 years ago
(In reply to Nick Thomas [:nthomas] from comment #118)
> Buildbot fix:
> https://hg.mozilla.org/build/buildbot-configs/rev/58939afabf7c
> https://hg.mozilla.org/build/buildbot-configs/rev/c9362236ab6c
> and a typo fix, because it's one of those days
> https://hg.mozilla.org/build/buildbot-configs/rev/b90ef9540c76
> https://hg.mozilla.org/build/buildbot-configs/rev/faefb8767843
>
> Reconfiged the try masters, done at 2100 Pacific.
This can be backed out after the landing of bug 1155476 is picked up by most try pushes.
Comment 139 • 10 years ago
<glandium> nthomas: 72 of the past 100 try pushes are using the fixed sccache, fwiw
Waiting a little longer.
Comment 140 • 10 years ago
Backed out with https://hg.mozilla.org/build/buildbot-configs/rev/23e73092ea0a. By the time this goes live in a reconfig we should be at a higher proportion, and the network would need to still be broken for it to manifest.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 142 (Reporter) • 10 years ago
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 148 (Assignee) • 9 years ago
I think we can close this bug now, especially since /different/ failures have been hit and attributed here (comment 147).
Updated (Assignee) • 9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 149 • 9 years ago
Heh, like we'd stop starring just because a bug was closed-and-not-the-right-one.
Keywords: intermittent-failure
Comment 150 (Assignee) • 9 years ago
I'd hope you look at why something is closed before starring.
Updated • 7 years ago
Component: General Automation → General