Closed
Bug 797245
(talos-r4-lion-063)
Opened 12 years ago
Closed 12 years ago
talos-r4-lion-063 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
References
Details
(Whiteboard: [buildslaves][capacity][badslave?] This slave might have issues even if hardware diagnostics did not catch anything)
Attachments
(2 files)
(deleted), patch | rail: review+
(deleted), patch | rail: review+
Please disable this slave and have hardware diagnostics run on it, particularly RAM tests. We've had this bug 795215 "pink pixel of death" problem for a while: we get intermittent failures in a ton of reftests where the only difference is that one canvas has a single pixel which is pink rather than white (a single bit of difference). Once I filed it, we could see that it was actually talos-r4-lion-063 twice in a row. Hello, bad RAM.
Comment 1•12 years ago (Reporter)
Updated•12 years ago (Reporter)
Summary: talos-r4-lion-063 problem tracking → [disable me] talos-r4-lion-063 problem tracking
Comment 2•12 years ago (Reporter)
Comment 3•12 years ago (Reporter)
Comment 4•12 years ago (Reporter)
Comment 5•12 years ago (Reporter)
Updated•12 years ago
Summary: [disable me] talos-r4-lion-063 problem tracking → talos-r4-lion-063 problem tracking
Comment 6•12 years ago
Putting back into the pool.
This slave might have issues even if hardware diagnostics did not catch anything.
If it gives us trouble, we should dig into the problem and try to find a test case.
If such a test case is found, we should put the slave in bug 712206.
At that point we can ask for re-imaging and try again.
If that fails again, we should decommission the slave.
Whiteboard: [buildduty][buildslaves][capacity][badslave?] → [buildduty][buildslaves][capacity][badslave?] This slave might have issues even if hardware diagnostics did not catch anything
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 7•12 years ago (Reporter)
https://tbpl.mozilla.org/php/getParsedLog.php?id=17848078&tree=Mozilla-Aurora is the same old pink pixel of death. So is https://tbpl.mozilla.org/php/getParsedLog.php?id=17878263&tree=Ionmonkey.
Test case? Write the same large number of values to memory twice, read them, make sure they are still the same when you read them. Although there's something in the reftest harness about "reusing canvases" that may mean it's partially more like writing a big block of zeros, overwriting a small part with non-zero data, then reading it over and over again and making sure none of the zeros become non-zero.
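A rough sketch of the two checks described above, in Python. This is only an illustration of the idea, not what the reftest harness does and not a substitute for real hardware diagnostics: a user-space script cannot choose which physical RAM it lands on, and on a healthy machine both checks simply return empty lists. The function names, buffer sizes, and the four-byte patch value are all assumptions made for the example.

```python
def make_pattern(size):
    """Deterministic, non-trivial byte pattern (an arbitrary choice for this sketch)."""
    return bytes((i * 31 + 7) % 256 for i in range(size))


def find_mismatches(a, b):
    """Return the offsets at which two equal-length buffers differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def write_twice_check(size=1 << 20):
    """Write the same values into two buffers, read them back, and compare."""
    pattern = make_pattern(size)
    first = bytearray(pattern)
    second = bytearray(pattern)
    return find_mismatches(first, second)


def zero_canvas_check(size=1 << 20, offset=4096, patch=b"\xff\xc0\xcb\xff", passes=50):
    """Zero a buffer, overwrite a small part, then re-read it repeatedly.

    Returns the pass numbers on which a zero became non-zero or the patch changed.
    The patch must contain no zero bytes so that zeros can simply be counted.
    """
    assert 0 not in patch and offset + len(patch) <= size
    canvas = bytearray(size)
    canvas[offset:offset + len(patch)] = patch
    expected_zeros = size - len(patch)
    bad_passes = []
    for n in range(passes):
        zeros_ok = canvas.count(0) == expected_zeros
        patch_ok = canvas[offset:offset + len(patch)] == patch
        if not (zeros_ok and patch_ok):
            bad_passes.append(n)
    return bad_passes


if __name__ == "__main__":
    print("write-twice mismatches:", write_twice_check())
    print("zero-canvas bad passes:", zero_canvas_check())
```

A non-empty result from either function would be the kind of evidence worth attaching here; an empty result proves little, since the failures on this slave are intermittent.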
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 8•12 years ago (Reporter)
Nice, now it has burned 12 jobs in a row with "hdiutil: attach failed - Device not configured".
Severity: normal → major
Comment 9•12 years ago
[16:15] <philor> who's on sledgehammerduty?
[16:17] <philor> that damn talos-r4-lion-063, which has hardware problems no matter how many inadequate runs of inadequate hardware diagnostics it gets, has burned its last dozen runs
Disabled in slavealloc
Updated•12 years ago (Reporter)
Severity: major → normal
Comment 10•12 years ago
Comment 11•12 years ago
Comment 12•12 years ago
Comment 13•12 years ago
Comment 14•12 years ago
(In reply to Aki Sasaki [:aki] from comment #9)
> Disabled in slavealloc
And it managed to dodge the sledgehammer:
talos-r4-lion-063:~ cltbld$ uptime
11:26 up 19:51, 3 users, load averages: 0.66 0.58 0.53
I have now manually restarted the slave, so it should properly come back disabled.
Comment 15•12 years ago
We should decomm this slave. Hardware diagnostics show no problems with it, but it keeps burning jobs.
Comment 16•12 years ago
Attachment #699979 - Flags: review?(rail)
Comment 17•12 years ago
Attachment #699981 - Flags: review?(rail)
Updated•12 years ago
Attachment #699979 - Flags: review?(rail) → review+
Comment 18•12 years ago
Comment on attachment 699981 [details] [diff] [review]
remove from buildbot
I think it'll conflict with https://bugzilla.mozilla.org/attachment.cgi?id=699975&action=edit
Attachment #699981 - Flags: review?(rail) → review-
Comment 19•12 years ago
(In reply to Rail Aliiev [:rail] from comment #18)
> Comment on attachment 699981 [details] [diff] [review]
> remove from buildbot
>
> I think it'll conflict with
> https://bugzilla.mozilla.org/attachment.cgi?id=699975&action=edit
I was planning to resolve that when I land.
Comment 20•12 years ago
Comment on attachment 699981 [details] [diff] [review]
remove from buildbot
r+ in this case.
BTW, maybe it'll be easier to read the configs if we have something like this:
'lion': dict([("talos-r4-lion-%03i" % x, {}) for x in
              set(range(4, 85)) - set([10, 63, 83])]),
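For reference, a standalone sketch of what that expression builds. The range and the excluded numbers are taken from the snippet above; the function name `lion_slaves` and its parameters are hypothetical, and the layout of the real buildbot config is not shown in this bug.

```python
def lion_slaves(first=4, last=84, excluded=(10, 63, 83)):
    """Build a {slave_name: {}} mapping for the lion pool, skipping excluded numbers."""
    # range(4, 85) in the original snippet covers 4..84 inclusive.
    numbers = sorted(set(range(first, last + 1)) - set(excluded))
    return {"talos-r4-lion-%03i" % n: {} for n in numbers}


slaves = lion_slaves()
print(len(slaves))
print("talos-r4-lion-063" in slaves)
```

Written this way, decommissioning a slave means adding one number to the excluded set rather than deleting a line from a long list.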
Attachment #699981 - Flags: review- → review+
Comment 21•12 years ago
Buildduty no longer needs to track this.
Ben --> can you please make sure all the patches have landed properly?
Flags: needinfo?(bhearsum)
Whiteboard: [buildduty][buildslaves][capacity][badslave?] This slave might have issues even if hardware diagnostics did not catch anything → [buildslaves][capacity][badslave?] This slave might have issues even if hardware diagnostics did not catch anything
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Flags: needinfo?(bhearsum)
Resolution: --- → FIXED
Updated•11 years ago (Assignee)
Product: mozilla.org → Release Engineering
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard