Closed
Bug 902657
Opened 11 years ago
Closed 11 years ago
panda-recovery
Categories
(Infrastructure & Operations :: DCOps, task)
Infrastructure & Operations
DCOps
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: jpech)
References
Details
+++ This bug was initially created as a clone of Bug #817103 +++
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0282
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0292
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0295
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0300
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0301
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0305
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0306
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0325
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0340
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0387
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0396
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0479
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0482
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0729
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0737
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0743
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0763
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0769
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0770
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0788
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0172
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0180
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0296
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0313
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0371
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0392
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0395
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0664
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0674
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0696
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0720
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0820
Updated•11 years ago
|
Assignee: relops → jwatkins
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0810
Updated•11 years ago
|
No longer blocks: panda-0479
Updated•11 years ago
|
No longer blocks: panda-0482
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0739
Updated•11 years ago
|
Blocks: panda-0784
Updated•11 years ago
|
Blocks: panda-0044
Updated•11 years ago
|
Blocks: panda-0870
Updated•11 years ago
|
Blocks: panda-0816
Updated•11 years ago
|
Blocks: panda-0795
Updated•11 years ago
|
Blocks: panda-0036
Updated•11 years ago
|
Blocks: panda-0864
Updated•11 years ago
|
Assignee: jwatkins → achavez
Comment 3•11 years ago
|
||
All had failures; SD cards were replaced on 2013-08-26.
The following list of pandas can be put back into service:
panda-0282 ready
panda-0300 ready
panda-0301 ready
panda-0305 ready
panda-0306 ready
panda-0313 ready
panda-0325 ready
panda-0387 ready
panda-0396 ready
panda-0729 ready
panda-0737 ready
panda-0743 ready
panda-0763 ready
panda-0769 ready
panda-0770 ready
panda-0788 ready
panda-0810 ready
panda-0820 ready
Comment 4•11 years ago
|
||
The following had SD card failures. The SD cards were replaced and the pandas can be put back in production:
panda-0292
panda-0295
panda-0296
Panda board failure, decommissioned and will be replaced with a new panda:
panda-0172
Updated•11 years ago
|
Blocks: panda-0265
Updated•11 years ago
|
Blocks: panda-0731
Updated•11 years ago
|
Blocks: panda-0803
Updated•11 years ago
|
Blocks: panda-0834
Updated•11 years ago
|
Blocks: panda-0801
Updated•11 years ago
|
Blocks: panda-0259
Updated•11 years ago
|
Blocks: panda-0835
Updated•11 years ago
|
Blocks: panda-0262
Updated•11 years ago
|
Assignee: achavez → arich
Status: NEW → ASSIGNED
Updated•11 years ago
|
Assignee: arich → achavez
Comment 5•11 years ago
|
||
These pandas were incorrectly flagged by the new mozpool selftest. The selftest has since been corrected and they are now passing the selftest without issue. We can close the tracker bugs and return them to service.
0835
0834
0803
0731
0674
0664
Comment 6•11 years ago
|
||
Also a false positive. Please return to service.
0819
Comment 7•11 years ago
|
||
panda-0036 removed/decommissioned
panda-0044 removed/decommissioned
panda-0172 removed/decommissioned
panda-0180 passed self test/sd card replaced
panda-0259 passed self test/sd card replaced
panda-0262 passed self test/sd card replaced
panda-0265 passed self test/sd card replaced
panda-0340 removed/decommissioned
panda-0371 removed/decommissioned
panda-0392 removed/decommissioned
panda-0395 removed/decommissioned
panda-0696 removed/decommissioned
panda-0720 removed/decommissioned
panda-0739 passed self test/sd card replaced
panda-0784 passed self test/sd card replaced
panda-0795 passed self test/sd card replaced
panda-0801 passed self test/sd card replaced
panda-0816 passed self test/sd card replaced
panda-0864 passed self test/sd card replaced
panda-0870 passed self test/sd card replaced
Comment 8•11 years ago
|
||
Before completely removing the decommissioned boards from mozpool/production, we should double check them. I see some boards that were removed but had previously passing tests.
Updated•11 years ago
|
Component: RelOps → Server Operations: DCOps
Product: Infrastructure & Operations → mozilla.org
QA Contact: arich → dmoore
Assignee | ||
Updated•11 years ago
|
colo-trip: --- → scl1
Updated•11 years ago
|
Whiteboard: [Will work with Jake on this Thursday]
Updated•11 years ago
|
Blocks: panda-0694
Updated•11 years ago
|
Blocks: panda-0621
Updated•11 years ago
|
Blocks: panda-0056
Updated•11 years ago
|
Blocks: panda-0081
Updated•11 years ago
|
Blocks: panda-0260
Updated•11 years ago
|
Blocks: panda-0261
Updated•11 years ago
|
Blocks: panda-0263
Updated•11 years ago
|
Blocks: panda-0264
Updated•11 years ago
|
Blocks: panda-0266
Updated•11 years ago
|
Blocks: panda-0258
Updated•11 years ago
|
Blocks: panda-0267
Updated•11 years ago
|
Blocks: panda-0753
Updated•11 years ago
|
Blocks: panda-0733
Updated•11 years ago
|
Blocks: panda-0797
Updated•11 years ago
|
Blocks: panda-0808
Updated•11 years ago
|
Blocks: panda-0831
Updated•11 years ago
|
Blocks: panda-0856
Updated•11 years ago
|
Blocks: panda-0275
Updated•11 years ago
|
Blocks: panda-0273
Updated•11 years ago
|
Blocks: panda-0278
Updated•11 years ago
|
Blocks: panda-0279
Updated•11 years ago
|
Blocks: panda-0281
Updated•11 years ago
|
Blocks: panda-0283
Updated•11 years ago
|
Blocks: panda-0286
Updated•11 years ago
|
Blocks: panda-0288
Updated•11 years ago
|
Blocks: panda-0285
Updated•11 years ago
|
Blocks: panda-0287
Updated•11 years ago
|
Blocks: panda-0284
Updated•11 years ago
|
Blocks: panda-0290
Updated•11 years ago
|
Blocks: panda-0289
Updated•11 years ago
|
Blocks: panda-0268
Updated•11 years ago
|
Blocks: panda-0270
Updated•11 years ago
|
Blocks: panda-0271
Updated•11 years ago
|
Blocks: panda-0277
Updated•11 years ago
|
Blocks: panda-0272
Updated•11 years ago
|
Blocks: panda-0276
Updated•11 years ago
|
Blocks: panda-0269
Updated•11 years ago
|
Blocks: panda-0274
Updated•11 years ago
|
Blocks: panda-0280
Updated•11 years ago
|
Blocks: panda-0554
Updated•11 years ago
|
Blocks: panda-0562
Updated•11 years ago
|
Blocks: panda-0598
Updated•11 years ago
|
Blocks: panda-0679
Updated•11 years ago
|
Blocks: panda-0750
Updated•11 years ago
|
Whiteboard: [Will work with Jake on this Thursday] → [Will work with Jake on this after summit2013]
Updated•11 years ago
|
Blocks: panda-0758
Updated•11 years ago
|
Blocks: panda-0137
Updated•11 years ago
|
Blocks: panda-0552
Updated•11 years ago
|
Blocks: panda-0757
Updated•11 years ago
|
Blocks: panda-0782
Updated•11 years ago
|
Blocks: panda-0136
Updated•11 years ago
|
Blocks: panda-0829
Updated•11 years ago
|
Blocks: panda-0756
Updated•11 years ago
|
Blocks: panda-0745
Updated•11 years ago
|
Blocks: panda-0783
Updated•11 years ago
|
Blocks: panda-0744
Reporter | ||
Comment 9•11 years ago
|
||
Ashlee Chavez [:Ashlee] 2013-10-02 16:47:22 EDT
Whiteboard: [Will work with Jake on this Thursday] → [Will work with Jake on this after summit2013]
ETA on peeking guys?
Flags: needinfo?(jwatkins)
Flags: needinfo?(achavez)
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0558
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0843
Updated•11 years ago
|
Blocks: panda-0026
Updated•11 years ago
|
Blocks: panda-0734
Updated•11 years ago
|
Blocks: panda-0139
Updated•11 years ago
|
Blocks: panda-0796
Comment 10•11 years ago
|
||
(In reply to Justin Wood (:Callek) from comment #9)
> Ashlee Chavez [:Ashlee] 2013-10-02 16:47:22 EDT
> Whiteboard: [Will work with Jake on this Thursday] → [Will work with Jake on
> this after summit2013]
>
> ETA on peeking guys?
I've spoken with Jake via IRC; we have not yet settled on when we will be able to tackle this.
Jake, any ideas?
Updated•11 years ago
|
Blocks: panda-0678
Updated•11 years ago
|
Blocks: panda-0832
Updated•11 years ago
|
Blocks: panda-0718
Updated•11 years ago
|
Depends on: panda-0818
Updated•11 years ago
|
Flags: needinfo?(achavez)
Updated•11 years ago
|
Blocks: panda-0818
No longer depends on: panda-0818
Updated•11 years ago
|
Blocks: panda-0023
Updated•11 years ago
|
Blocks: panda-0444
Updated•11 years ago
|
Blocks: panda-0091
Updated•11 years ago
|
Blocks: panda-0046
Updated•11 years ago
|
Blocks: panda-0362
Updated•11 years ago
|
Blocks: panda-0858
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0479
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0482
Updated•11 years ago
|
Blocks: panda-0095
Updated•11 years ago
|
Blocks: panda-0107
Updated•11 years ago
|
Blocks: panda-0344
Updated•11 years ago
|
Blocks: panda-0337
Updated•11 years ago
|
Blocks: panda-0330
Updated•11 years ago
|
Blocks: panda-0665
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0701
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0357
Reporter | ||
Updated•11 years ago
|
Blocks: panda-0347
Assignee | ||
Comment 11•11 years ago
|
||
Since Ashlee has moved to another team, I want to volunteer to take over this bug, and I hope to resolve it before my internship ends. (T.T) Who can help guide me through the process of recovering the panda boards?
Flags: needinfo?(bugspam.Callek)
Whiteboard: [Will work with Jake on this after summit2013]
Reporter | ||
Comment 12•11 years ago
|
||
Hey John,
Please work with Jake (already needinfo'd) and coordinate with dmoore as well to properly allocate your resources here. (It would be good, imho, to have a permanent member of dcops also go through this process with Jake, so we don't lose the mindshare when your internship ends.)
Flags: needinfo?(bugspam.Callek) → needinfo?(dmoore)
Assignee | ||
Comment 13•11 years ago
|
||
(In reply to Justin Wood (:Callek) from comment #12)
> Hey John,
>
> Please work with Jake (already needinfo'd) and coord with dmoore as well to
> properly allocate your resources here. (It would be good imho, to have a
> permanent member of dcops also go through this process with Jake, so we
> don't lose the mindshare when your internship ends)
Will do. Thanks for the info!
Updated•11 years ago
|
Assignee: achavez → jpech
Comment 14•11 years ago
|
||
DCOps got a good 4 hours of training on panda therapy yesterday, so we should get this bug resolved soon and move on to a weekly "r/f and clone" schedule.
Flags: needinfo?(jwatkins)
Updated•11 years ago
|
Blocks: panda-0302
Updated•11 years ago
|
Blocks: panda-0294
Comment 15•11 years ago
|
||
The majority of the pandas are in the "ready" state for releng to proceed with testing. The pandas below are failing and will need further troubleshooting. If releng can close out the working pandas in the "block" list above, it will help me narrow down exactly which pandas need investigation (similar to tegra bugs). Thanks!
panda-0172 failed_pxe_booting vhua-Unable to read "preEnv.txt" from mmc 0:1 **
panda-0444 failed_pxe_booting vhua-23.533630] panic occurred, switching back to text console
panda-0479 failed_pxe_booting vhua-not in chassis
panda-0482 failed_pxe_booting vhua-not in chassis
panda-0638 failed_pxe_booting panda-android-4.0.4_v3.1
panda-0797 failed_pxe_booting android
panda-0081 failed_self_test dividehex-panda-intervention
panda-0173 failed_self_test
panda-0280 failed_self_test vhua-selftest.py[INFO]: test_preseed_file_integrity[FAILED] boot.scr :
panda-0720 failed_self_test vhua-selftest.py[INFO]: test_mmc_blk_dev[FAILED] /dev/mmcblk0 - No such file or directory (tried multiple SD cards)
panda-0678 locked_out android
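For anyone who wants to sort a list like the one above by state, here is a minimal sketch. It assumes each line is whitespace-separated as "device state comment"; the `group_by_state` helper is hypothetical and is not part of mozpool.

```python
from collections import defaultdict

# Hypothetical helper (not a mozpool function): groups device names by the
# state reported in the second column of each status line.
def group_by_state(lines):
    groups = defaultdict(list)
    for line in lines:
        # Split into device, state, and an optional free-form comment.
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        device, state = parts[0], parts[1]
        groups[state].append(device)
    return dict(groups)

# A few lines copied from the list above.
sample = [
    "panda-0479 failed_pxe_booting vhua-not in chassis",
    "panda-0482 failed_pxe_booting vhua-not in chassis",
    "panda-0173 failed_self_test",
    "panda-0678 locked_out android",
]

for state, devices in group_by_state(sample).items():
    print(state, devices)
```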
Comment 16•11 years ago
|
||
(In reply to Vinh Hua [:vinh] from comment #15)
> Majority of the pandas are in the "ready" state for releng to proceed with
> testing. The below pandas are failing and will need further
> troubleshooting. If releng can close out the working pandas in the "block"
> list above, then it will help me narrow down exactly which pandas need
> investigation (similar to tegra bugs). Thanks!
>
>
> panda-0172 failed_pxe_booting vhua-Unable to read "preEnv.txt" from mmc 0:1
Unable to read "preEnv.txt" from mmc 0:1 is a normal error message. The uboot loader should continue to load/netboot.
Did you try swapping out the sdcard on this? If you did, does it halt at that msg or continue booting?
> **
> panda-0444 failed_pxe_booting vhua-23.533630] panic occurred, switching
> back to text console
This sounds like a pandaboard hardware issue, and it would be interesting to see an entire serial console capture to better identify the deeper issue here. It might be something we can add to selftest to check for. I would suggest swapping the sdcard if you haven't. If you have, and it still continues, remove the panda board (and order a replacement if we don't have any spare boards).
> panda-0479 failed_pxe_booting vhua-not in chassis
> panda-0482 failed_pxe_booting vhua-not in chassis
These 2 panda boards were removed from service and should be replaced. I'll remove them from mozpool. See bug 836808.
> panda-0638 failed_pxe_booting panda-android-4.0.4_v3.1
SDcard swap didn't work here? If so, we can assume the pandaboard is dead and should be replaced.
> panda-0797 failed_pxe_booting android
Same here?
> panda-0081 failed_self_test dividehex-panda-intervention
What is the reason the self_test failed here? (See the device log.)
> panda-0173 failed_self_test
Same. Why did it fail? (see device log)
> panda-0280 failed_self_test vhua-selftest.py[INFO]:
> test_preseed_file_integrity[FAILED] boot.scr :
A boot.scr integrity check failure indicates an outdated preseed image and should be fixed by:
1.) force state to 'troubleshooting'
2.) please_image -> repair-boot
3.) please_self_test
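The three steps above can be written down as an ordered plan. This is only a sketch: the `repair_boot_steps` helper and the tuple layout are hypothetical, and it deliberately does not contact mozpool, since the request format is not shown in this bug.

```python
from typing import List, Tuple

# Hypothetical helper: returns the actions from the steps above, in order.
# Each tuple is (device, action, argument). How an action is actually
# submitted to mozpool is an assumption left to the operator.
def repair_boot_steps(device: str) -> List[Tuple[str, str, str]]:
    return [
        (device, "force_state", "troubleshooting"),  # step 1
        (device, "please_image", "repair-boot"),     # step 2
        (device, "please_self_test", ""),            # step 3
    ]

for step in repair_boot_steps("panda-0280"):
    print(step)
```

Keeping the order in one place makes it harder to run the self test before the reimage has been requested.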
> panda-0720 failed_self_test vhua-selftest.py[INFO]:
> test_mmc_blk_dev[FAILED] /dev/mmcblk0 - No such file or directory (tried
> multiple SD cards)
This test indicates a bad pandaboard. Remove and replace.
> panda-0678 locked_out android
I have no idea why (or who) locked_out this panda. Check with #releng or #ateam. There should always be a bug # in the comment of the panda that is locked_out for obvious reasons. If no one claims they have reserved it, force state to troubleshooting and then run a selftest.
Aside from the pandas listed here, I do think we should close this bug, migrate the list in c15 to a new recovery bug, and return the rest of the pandas to production. We really want to get into the habit of a weekly bug for DCOps to handle.
Callek: is it reasonable to do this sometime this week so new problem pandas don't get cluttered up here?
Flags: needinfo?(bugspam.Callek)
Reporter | ||
Comment 17•11 years ago
|
||
(In reply to Jake Watkins [:dividehex] from comment #16)
> Aside from the pandas listed here, I do think we should close this bug,
> migrate the list in c15 a new recovery bug and return the rest of the pandas
> to production. We really want to get into the habit of a weekly bug for
> DCOPs to handle.
>
> Callek: is it reasonable to do this sometime this week so new problem pandas
> don't get cluttered up here.
Indeed, I had already planned to do so today, and got caught up with a power outage at home --> doing so now.
Flags: needinfo?(dmoore)
Flags: needinfo?(bugspam.Callek)
Reporter | ||
Updated•11 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter | ||
Updated•11 years ago
|
Alias: panda-recovery
Updated•10 years ago
|
Product: mozilla.org → Infrastructure & Operations
Updated•10 years ago
|
No longer blocks: panda-0283