Closed
Bug 1223956
Opened 9 years ago
Closed 9 years ago
Change pvtbuilds NFS mount to a transitional volume
Categories
(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gcox, Assigned: cknowles)
References
Details
(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2149] )
When: next TCW, roughly 15 minutes in duration
System(s) affected: builds / treeherder / partner builds
Notifs: usual TCW comms
Point: cknowles, selenamarie-or-delegate
Plan: To help with the evacuation of product delivery, we are going to unmount the existing pvtbuilds NFS mount (living on soon-to-be-off-warranty hardware) and replace it with a same-named, smaller, empty pvtbuilds mount on supported hardware. This will buy time, past our warranty deadline, for legacy code to be migrated. "Unmount from all boxes, switch the volume on the filer, remount" covers the window; rollback is the same steps in reverse. The original volume will be kept, offline but recoverable, for ~1 week before being deleted.
Reporter
Updated • 9 years ago
Change Request: --- → ?
Updated • 9 years ago
Blocks: TCW-2015-11-21
Comment 1 • 9 years ago
Reviewed 11/18 and scheduled for the 11/21/2015 TCW.
Change Request: ? → approved
Assignee
Comment 2 • 9 years ago
Work completed on schedule; :selenamarie confirmed that things look good after the remount. Closing out.
Assignee
Updated • 9 years ago
Assignee: server-ops-webops → cknowles
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 3 • 9 years ago
Just wondering whether the remount was done with the same permissions. It seems we have quite a few errors like this:

Return code: 1
Failed to log stats. Exception = [Errno 185090050] _ssl.c:340: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
Return code: 1
rsync error: error in file IO (code 11) at main.c(587) [Receiver=3.0.9]
rsync: connection unexpectedly closed (9 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
Return code: 12
Unable to rsync /builds/slave/b2g_b2g-in_nexus-4_dep-0000000/build/upload to pvtbuilds.pvt.build.mozilla.org:/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-4/20151121143005!
Failed to upload /builds/slave/b2g_b2g-in_nexus-4_dep-0000000/build/upload to b2gbld@pvtbuilds.pvt.build.mozilla.org:/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-4/20151121143005!

http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/b2g-inbound-nexus-4/1448145005/b2g_b2g-inbound_nexus-4_dep-bm73-build1-build139.txt.gz

or

Cron <b2gbld@upload-cron> nice -n 19 find /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds -mindepth 2 -maxdepth 2 -not -wholename '*/mozilla-b2g30_v1_4-hamachi*' -not -wholename '*/*-flame*' -type d -mtime +20 -print0 | xargs -0 rm -rf
Cron Daemon <root@upload-cron.private.scl3.mozilla.com> 7:00 PM (3 hours ago) to release

rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json': Permission denied
rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/log_critical.log': Permission denied
rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/log_error.log': Permission denied
rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/log_fatal.log': Permission denied
rm: cannot remove `/mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/log_info.log': Permission denied
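To see which directories the upload-cron cleanup above is trying to delete, here is a minimal Python dry-run sketch of the selection logic of that find command. It is an illustration under stated assumptions, not existing tooling: `expired_build_dirs` is a hypothetical helper, `fnmatch` stands in for find's `-wholename`, and ages are measured from the moment the function runs.

```python
import fnmatch
import os
import time

# Hypothetical dry-run helper; mirrors the patterns passed to `-not -wholename`.
# fnmatch's "*" also matches "/", like find's -wholename.
EXCLUDED_PATTERNS = ["*/mozilla-b2g30_v1_4-hamachi*", "*/*-flame*"]
SECONDS_PER_DAY = 86400


def expired_build_dirs(root, days=20, now=None):
    """Return the directories exactly two levels below root that
    `find root -mindepth 2 -maxdepth 2 ... -type d -mtime +days` would
    print. Nothing is deleted."""
    now = time.time() if now is None else now
    matches = []
    for first in sorted(os.listdir(root)):
        first_path = os.path.join(root, first)
        # find does not follow symlinks by default, so skip linked dirs
        if os.path.islink(first_path) or not os.path.isdir(first_path):
            continue
        for second in sorted(os.listdir(first_path)):
            path = os.path.join(first_path, second)
            if os.path.islink(path) or not os.path.isdir(path):  # -type d
                continue
            if any(fnmatch.fnmatchcase(path, p) for p in EXCLUDED_PATTERNS):
                continue
            # -mtime +N: age in whole days (fraction dropped) must exceed N
            age_days = int((now - os.stat(path).st_mtime) // SECONDS_PER_DAY)
            if age_days > days:
                matches.append(path)
    return matches
```

Calling it on /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds would list the candidates without touching them. Note that removing a file requires write permission on the directory that contains it, so `rm` failures like the ones above most likely mean the containing logs directories are not writable by b2gbld.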
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter
Comment 4 • 9 years ago
The volume is exported from the filer with the same filer permissions and mounted with the same client permissions. However, it looks like the data copied over did not retain the ownership of the original (the mode bits match, but owner and group are now root):

A temporary, read-only copy of the old volume:
[root@pvtbuilds2.dmz.scl3 ~]# ls -l /tmp/qq/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json
-rw-rw-r-- 1 b2gbld b2gbld 5928 Oct 31 23:11 /tmp/qq/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json

The new prod volume:
[root@pvtbuilds2.dmz.scl3 ~]# ls -l /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json
-rw-rw-r-- 1 root root 5928 Oct 31 23:11 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-nexus-5-l-eng/20151031143003/logs/localconfig.json

I can't change these (well, I COULD, but I don't know what I'm doing there). Basically, you probably need some mass chowns.
Flags: needinfo?(sdeckelmann)
Comment 5 • 9 years ago
How was the data copied, such that it lost the permissions in the first place? If this is causing errors, can we match the file permissions from the old volume (is that the desired end state here?), either by using rsync (if it's some known, exact subset) or by using a script that looks at the files in the new mount point and matches the permissions from the old one?
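The second option, a script that walks the new mount point and copies owner, group and mode from the same relative path on the old volume, could look roughly like this Python sketch. It assumes the old volume is still readable somewhere (for example the temporary read-only copy under /tmp/qq from comment 4), that it runs as root so `os.chown` may assign any owner, and that the same relative path means the same file. `sync_ownership` is a hypothetical helper, not existing tooling.

```python
import os
import stat


def _relative_paths(root):
    """Yield every path under root, relative to root, starting with "." itself."""
    yield "."
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def sync_ownership(old_root, new_root, dry_run=True):
    """Hypothetical helper: give each entry under new_root the owner, group
    and permission bits of the entry at the same relative path under
    old_root. Returns the relative paths that differ; with dry_run=True
    nothing is changed."""
    changed = []
    for rel in _relative_paths(new_root):
        new_path = os.path.join(new_root, rel)
        try:
            old_st = os.lstat(os.path.join(old_root, rel))
        except FileNotFoundError:
            continue  # written after the copy: no original to match against
        new_st = os.lstat(new_path)
        if stat.S_ISLNK(old_st.st_mode) or stat.S_ISLNK(new_st.st_mode):
            continue  # chmod on a symlink itself is not portable; skip links
        old_mode = stat.S_IMODE(old_st.st_mode)
        wanted = (old_st.st_uid, old_st.st_gid, old_mode)
        if (new_st.st_uid, new_st.st_gid, stat.S_IMODE(new_st.st_mode)) == wanted:
            continue
        changed.append(rel)
        if not dry_run:
            os.chown(new_path, old_st.st_uid, old_st.st_gid)
            # chown may clear setuid/setgid bits, so restore the mode last
            os.chmod(new_path, old_mode)
    return changed
```

Running it first with the default `dry_run=True` shows the scope of the damage; entries created on the new volume after the switch are left untouched because they have no counterpart to copy from.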
Comment 6 • 9 years ago
Hi, this now impacts the mozilla-central, mozilla-inbound and b2g-inbound trees, at least the device builds, e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=3415635&repo=b2g-inbound:

23:49:28 INFO - rsync: mkdir "/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-flame-kk-eng/20151122215327" failed: Permission denied (13)

Raising this to blocker, since this is a perma-failure on the affected trees.
Severity: normal → blocker
Comment 7 • 9 years ago
Closed the affected trees due to mass perma-failures of the affected Buildbot device builds.
Comment 8 • 9 years ago
I'd like to suggest we run these commands to get the trees re-opened:

chown -R b2gbld:b2gbld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
chown -R b2gbld:b2gbld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly

There may be other issues, but that should cover the bulk of the immediate problem. It simply sets the group on the top level of the given directories and fixes the root:root ownership of everything within them.
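Before running recursive chowns on a production mount, it can help to preview what they would touch. The Python sketch below is a hypothetical dry-run helper (`needs_chown` is not an existing tool) that lists the directory itself and every entry below it whose owner or group differs from the target, roughly what `chown -R b2gbld:b2gbld` would change. It assumes the caller resolves the b2gbld IDs on the host.

```python
import os


def needs_chown(root, uid, gid):
    """Hypothetical dry-run helper: list root and every entry below it whose
    owner or group differs from uid/gid, i.e. roughly what
    `chown -R uid:gid root` would change. Uses lstat, so a symlink is judged
    by its own ownership rather than its target's."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    mismatched = []
    for path in paths:
        st = os.lstat(path)
        if (st.st_uid, st.st_gid) != (uid, gid):
            mismatched.append(path)
    return mismatched
```

On the host, the IDs could be looked up with `pwd.getpwnam("b2gbld").pw_uid` and `grp.getgrnam("b2gbld").gr_gid`, assuming a local b2gbld user and group exist there. Rerunning the helper after the chowns should return an empty list for each directory.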
Comment 10 • 9 years ago
============================= old permissions before change ===============================
[root@pvtbuilds2.dmz.scl3 ~]# ls -ld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
drwxr-s--- 38 b2gbld root 4096 Nov 22 00:48 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
[root@pvtbuilds2.dmz.scl3 ~]# ls -ld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
drwxr-s--- 20 b2gbld root 4096 Nov 20 19:28 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
[root@pvtbuilds2.dmz.scl3 ~]# chown -R b2gbld:b2gbld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
[root@pvtbuilds2.dmz.scl3 ~]# chown -R b2gbld:b2gbld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
==================== Permissions after change ========================
[root@pvtbuilds2.dmz.scl3 ~]# ls -ld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
drwxr-s--- 38 b2gbld b2gbld 4096 Nov 22 00:48 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/tinderbox-builds
[root@pvtbuilds2.dmz.scl3 ~]# ls -ld /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
drwxr-s--- 20 b2gbld b2gbld 4096 Nov 20 19:28 /mnt/pvt_builds/pvt/mozilla.org/b2gotoro/nightly
[root@pvtbuilds2.dmz.scl3 ~]#
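The mode string shared by both directories in the listing, drwxr-s---, can be decoded with Python's standard `stat` module. The `s` in the group execute position is the setgid bit: on Linux, new entries created in such a directory inherit the directory's group, so before the change new entries created directly in these directories would have been given group root. This is an illustrative sketch; the numeric mode is inferred from the string, not read from the filer.

```python
import stat

# Mode bits read off the `ls -ld` output (inferred from the string, not
# queried from the filer): directory type, setgid, rwx for the owner,
# r-x for the group, nothing for others.
DIR_MODE = stat.S_IFDIR | stat.S_ISGID | 0o750

print(oct(stat.S_IMODE(DIR_MODE)))  # 0o2750
print(stat.filemode(DIR_MODE))      # drwxr-s---
```

Because the chowns only changed owner and group, the mode string is identical before and after the change; only the group column moved from root to b2gbld.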
Reporter
Comment 13 • 9 years ago
[15:41:10] <gcox> Tomcat|Sheriffduty: Heya, bug 1223956 was marked a blocker overnight. Is it still blocking, did the chowns fix it, or are we still waiting to learn more?
[15:41:41] <Tomcat|Sheriffduty> gcox: oh its ok now, the fix fixed this
[15:42:13] <Tomcat|Sheriffduty> and trees are now open again
Severity: blocker → normal
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Flags: needinfo?(sdeckelmann)
Resolution: --- → FIXED
Blocks: 1227170
Updated • 8 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard