Closed
Bug 646056
Opened 13 years ago
Closed 13 years ago
Releng machines should use ntp.build.mozilla.org as their time server
Categories
(Release Engineering :: General, defect, P3)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: zandr, Assigned: bhearsum)
References
Details
(Whiteboard: [puppet][opsi][buildmasters][buildslaves])
Attachments
(4 files)
(deleted), patch | dustin: review+ | bhearsum: checked-in+
(deleted), patch | dustin: review+ | bhearsum: checked-in+
(deleted), text/plain
(deleted), patch | dustin: review+ | bhearsum: checked-in+
In bug 617414 we've been working to lock down internet access for test machines. In the logs, I've been specifically ignoring NTP because the use of pool.ntp.org means we have literally hundreds of hosts listed. The Macs appear to be hitting time.apple.com, and there's at least a little traffic to Microsoft. If we shut down internet access before this is resolved, we can leave port 123 open, but we really should use a local NTP server: ntp.build.mozilla.org.
Comment 1•13 years ago
There's a real mixture of behaviours here, depending on the age of the reference platform for various classes of machines. Some examples:
* bm-xserveN points at ntp1.bmo via /etc/ntpd.conf
* moz2-darwin10-slaveN points to time.apple.com and ntp1.bmo (if you modify ntp.conf, don't use the System Preferences to modify the server afterwards)
* moz2-darwin9-slaveN points to ntp1.bmo only
* moz2-linux-slaveN are getting info from dhcp (end of /etc/ntp.conf):
    # servers generated by /sbin/dhclient-script
    server 63.245.208.36
    server 63.245.208.37
    server 10.2.71.5
    server 127.127.1.0
    fudge 127.127.1.0 stratum 10

I'd be surprised if talos-r3* was modified from the default install behaviour. Bug 539278 is related.

Also, why do both ntp1.bmo and ntp.bmo point at machines in scl1 ? It used to be a border box in MPT (like moz2-linux-slaveN get now).
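Editorial note: auditing which servers each class of machine points at comes down to reading the `server` lines of its ntp.conf. The following is a minimal sketch of that check, not a tool used in this bug; the function name `ntp_servers` and the `SAMPLE` constant are hypothetical, and the sample text is the moz2-linux-slaveN excerpt quoted in this comment.

```python
def ntp_servers(conf_text):
    """Return the hosts named on `server` lines of an ntp.conf-style text."""
    servers = []
    for raw_line in conf_text.splitlines():
        # Drop anything after a '#' so commented-out servers are ignored.
        line = raw_line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "server":
            servers.append(fields[1])
    return servers


# Excerpt from the end of /etc/ntp.conf on moz2-linux-slaveN (from this comment).
SAMPLE = """\
# servers generated by /sbin/dhclient-script
server 63.245.208.36
server 63.245.208.37
server 10.2.71.5
server 127.127.1.0
fudge 127.127.1.0 stratum 10
"""

if __name__ == "__main__":
    for host in ntp_servers(SAMPLE):
        print(host)
```

Note that 127.127.1.0 is ntpd's local-clock pseudo-address rather than a network server, so an audit would probably want to treat it separately.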
Reporter | ||
Comment 2•13 years ago
(In reply to comment #1)
> Also, why do both ntp1.bmo and ntp.bmo point at machines in scl1 ? It used to
> be a border box in MPT (like moz2-linux-slaveN get now).

I didn't configure it, but ntp.b.m.o is maintained by IT as The Right Thing. At the moment, it's a round-robin of ntp1 and ntp2.infra.scl1, which are actually ns1 and ns2.infra.scl1. VMs on a lightly loaded cluster seem like a better choice to me than the ntp server on heavily loaded routers, and using that service name will let us change where that service comes from without having to touch all the machines.
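Editorial note: the round-robin described here means one service name resolves to several addresses, so clients only ever need to know the name. A minimal sketch of listing those addresses follows; `service_addresses` and its `resolver` parameter are hypothetical names, and the resolver is injectable only so the function can be exercised without touching DNS.

```python
import socket


def service_addresses(name, port=123, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses a service name resolves to."""
    infos = resolver(name, port, proto=socket.IPPROTO_UDP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Only meaningful on a network where this name resolves.
    print(service_addresses("ntp.build.mozilla.org"))
```

If the machines behind the name change, the output changes but nothing on the clients has to.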
Updated•13 years ago
Priority: -- → P3
Whiteboard: [puppet][opsi][buildmasters][buildslaves]
Reporter | ||
Comment 3•13 years ago
Now that DHCP will be serving option ntp-servers, we should use that. This will allow a local NTP server without having to rely on split horizon or search path.
Summary: Releng machines should use ntp.build.mozilla.org for time. → Releng machines should use follow DHCP option ntp-servers for time.
Comment 4•13 years ago
so we should revert the suggestion in https://bugzilla.mozilla.org/show_bug.cgi?id=646563#c0 to set PEERNTP=no in our dhcp configs?
Assignee | ||
Comment 5•13 years ago
(In reply to comment #4)
> so we should revert the suggestion in
> https://bugzilla.mozilla.org/show_bug.cgi?id=646563#c0 to set PEERNTP=no in
> our dhcp configs?

Google says yes.
Assignee: nobody → bhearsum
Assignee | ||
Comment 6•13 years ago
Once bug 646126 is fixed, I'll have a look and see which other classes of machines need updates.
Assignee | ||
Comment 7•13 years ago
This discussion seems to indicate that it's not possible for Windows to obey an NTP server: http://superuser.com/questions/147248/windows-clients-not-using-ntp-server-provided-via-dhcp. And by way of omission, http://support.microsoft.com/kb/121005 claims NTP servers aren't supported. So, looks like we're managing it ourselves on Windows. We've got OPSI in some places (XP test machines, build machines, maybe some Windows 7 ones?), which will easily manage this. For everything else, should be easy to do over ssh.
Assignee | ||
Comment 8•13 years ago
(In reply to comment #7)
> This discussion seems to indicate that it's not possible for Windows to obey
> an NTP server:
> http://superuser.com/questions/147248/windows-clients-not-using-ntp-server-provided-via-dhcp.
> And by way of omission, http://support.microsoft.com/kb/121005 claims NTP
> servers aren't supported.
>
> So, looks like we're managing it ourselves on Windows. We've got OPSI in
> some places (XP test machines, build machines, maybe some Windows 7 ones?),
> which will easily manage this. For everything else, should be easy to do
> over ssh.

Looks like the OS X DHCP client may not support it either. It's very hard to find information on it, but based on my googling of www.opensource.apple.com, it looks like this option is only referenced in a few header files:
http://www.google.ca/search?hl=en&client=firefox-a&hs=OJI&rls=org.mozilla%3Atl%3Aunofficial&q=DHCPTAG_NETWORK_TIME_PROTOCOL_SERVERS+site%3Awww.opensource.apple.com&aq=f&aqi=&aql=&oq=
http://www.google.ca/search?hl=en&client=firefox-a&hs=ygI&rls=org.mozilla%3Atl%3Aunofficial&q=dhcptag_network_time_protocol_servers_e+site%3Awww.opensource.apple.com&aq=f&aqi=&aql=&oq=

Based on that, I'm not going to spend any more time researching this, and will set them explicitly through Puppet on Mac.
Assignee | ||
Updated•13 years ago
Summary: Releng machines should use follow DHCP option ntp-servers for time. → Releng machines should use follow DHCP option ntp-servers for time (or use ntp.build.mozilla.org)
Assignee | ||
Comment 9•13 years ago
So, after doing a bit more research, it turns out we need to remove existing NTP servers from ntp.conf before DHCP will update that file. However, to do that, we'd have to manage ntp.conf completely, which means that at boot, DHCP would set the servers in ntp.conf, and then get overridden by Puppet very shortly afterwards.

Given that, I'm tossing out the idea of using the DHCP option and going with the simple plan of hardcoding ntp.build.mozilla.org everywhere. For Linux and Mac this will be managed by Puppet. For XP and Windows 2003, by OPSI. For Windows 7, it'll have to be done by hand over SSH.
Summary: Releng machines should use follow DHCP option ntp-servers for time (or use ntp.build.mozilla.org) → Releng machines should use ntp.build.mozilla.org as their time server
Assignee | ||
Comment 10•13 years ago
This patch syncs out new ntp.conf files everywhere. Mostly it's just re-arranging existing options, but it does make sure the server is "ntp.build.mozilla.org" everywhere, and in some places adds the "restrict 10.0.0.0 mask 255.0.0.0" line.

On Linux build machines we're actually not syncing the time currently, and I've kept that behaviour because I don't want to deal with potential issues with ntp + VMs here. These machines will get a useful ntp.conf, however, so turning it on later will be trivial.

Tested this across all types of machines that sync with Puppet.
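Editorial note: the patch itself is deleted, so the following is only a minimal sketch of the transformation this comment describes, assuming the goal is a single network `server` line plus the quoted `restrict` line. The function name `pin_ntp_server` and the choice to keep `127.127.*` reference-clock entries are assumptions, not something the patch is known to do.

```python
NTP_SERVER = "ntp.build.mozilla.org"
RESTRICT_LINE = "restrict 10.0.0.0 mask 255.0.0.0"


def pin_ntp_server(conf_text, server=NTP_SERVER, restrict=RESTRICT_LINE):
    """Rewrite ntp.conf text so that `server` is the only network time source."""
    kept = []
    for line in conf_text.splitlines():
        fields = line.split()
        is_server = len(fields) >= 2 and fields[0] == "server"
        # 127.127.x.x addresses are ntpd reference clocks (e.g. the local
        # clock), not network servers, so they are left in place.
        if is_server and not fields[1].startswith("127.127."):
            continue
        kept.append(line)
    kept.append("server " + server)
    if restrict not in (entry.strip() for entry in kept):
        kept.append(restrict)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    old = "server 63.245.208.36\nserver 127.127.1.0\nfudge 127.127.1.0 stratum 10\n"
    print(pin_ntp_server(old), end="")
```

Managing the whole file this way is what makes the DHCP approach from the previous comment unworkable: whatever dhclient writes would be replaced on the next sync.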
Attachment #535333 -
Flags: review?(dustin)
Assignee | ||
Comment 11•13 years ago
Tested on XP and 2003.
Attachment #535347 -
Flags: review?(dustin)
Assignee | ||
Comment 12•13 years ago
Updated•13 years ago
Attachment #535347 -
Flags: review?(dustin) → review+
Updated•13 years ago
Attachment #535333 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 13•13 years ago
Got bogged down with other work, planning to land all of these changes on Monday.
Assignee | ||
Updated•13 years ago
Attachment #535333 -
Flags: checked-in+
Assignee | ||
Comment 14•13 years ago
I tested the Puppet part on one of each type of slave, and it seems to be landing fine. Moving on to the OPSI and Windows 7 parts.
Assignee | ||
Comment 15•13 years ago
Comment on attachment 535347
OPSI package to set the time server

Landed, and set to deploy across the board on Windows 2003 build machines & XP test machines.
Attachment #535347 -
Flags: checked-in+
Assignee | ||
Comment 16•13 years ago
Turns out I can't deploy to Windows 7 over ssh, so I'll have to do it over VNC. Planning to do so in tomorrow morning's downtime, because it'll be much easier when I don't have to worry as much about breaking things.

Deployment will happen with these commands, in a cmd.exe started through "run as administrator":
    wget -O time.reg --no-check-certificate https://bugzilla.mozilla.org/attachment.cgi?id=535348
    reg import time.reg
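Editorial note: the attachment imported above has been deleted, so its contents are unknown. The sketch below only illustrates what a registry file of this kind could look like, assuming it sets the standard Windows Time service values `NtpServer` and `Type` under `W32Time\Parameters`. The function name `time_server_reg` and the `0x9` flag value are assumptions.

```python
W32TIME_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Services\W32Time\Parameters"
)


def time_server_reg(server="ntp.build.mozilla.org", flags="0x9"):
    """Return .reg file text pointing the Windows Time service at `server`.

    `flags` is appended to the server name as W32Time expects; 0x9 is an
    assumed value here, not one confirmed by this bug.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + W32TIME_KEY + "]",
        '"NtpServer"="%s,%s"' % (server, flags),
        '"Type"="NTP"',  # sync from the configured server, not the domain
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(time_server_reg())
```

The time service would typically need a restart or resync before a change like this takes effect; the bug does not say how that was handled.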
Assignee | ||
Comment 17•13 years ago
I've rolled out changes to all of the talos-r3-w7 machines except:
- 001, which is awaiting a re-image
- 011, 032, and 045, which are awaiting reboots

I've left a comment in bug 649835 about updating the NTP server after 001 gets re-imaged, and I'll take care of updating the other 3 when they come back from the reboot.
Assignee | ||
Comment 18•13 years ago
talos-r3-w7-032 and 045 are done. Just 001 and 011 left.
Assignee | ||
Comment 19•13 years ago
Dustin reminded me that the ref machines need doing, too. Earlier this morning I made sure talos-r3-xp-ref, win2k3sp2-ref (the master VMware image), and win32-ix-ref were up to date. Just a few minutes ago, I updated talos-r3-w7-ref. That's all of them.

(In reply to comment #18)
> talos-r3-w7-032 and 045 are done. Just 001 and 011 left.

001 is being re-imaged from the ref machine in bug 649835, so scratch that from the list. Only have 011 to worry about now.
Assignee | ||
Comment 20•13 years ago
A log capture in bug 646046 showed that the Windows machines are still hitting time.apple.com. Turns out they run AppleTimeSrv.exe, which is at fault. This service claims to keep time in sync when rebooting between OS X and Windows, and indeed, just rebooting a Windows machine with the service disabled leaves me with the correct time. I'm going to disable it by hand on the staging Windows test machines, and if they all still have the correct time next week, I'll roll that change out to the rest of them.
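Editorial note: "still have the correct time" can be checked by comparing a machine's clock against an NTP server. The following is a minimal sketch of the packet handling for such a check using plain SNTP, not the method used in this bug. All function names are hypothetical, and the offset it computes ignores network delay, so it is only good to roughly a round-trip time.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def build_sntp_request():
    """Return a 48-byte SNTP client request (LI=0, VN=3, Mode=3 -> 0x1B)."""
    return b"\x1b" + 47 * b"\0"


def transmit_time_unix(reply):
    """Extract the server's transmit timestamp from a reply, as Unix time."""
    if len(reply) < 48:
        raise ValueError("SNTP reply must be at least 48 bytes")
    # Bytes 40-47 hold the transmit timestamp: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def clock_offset(reply, local_unix_time):
    """Rough offset (server minus local) in seconds, ignoring network delay."""
    return transmit_time_unix(reply) - local_unix_time
```

To use it, the request would be sent over UDP to port 123 of the time server and the reply passed to `clock_offset` together with `time.time()`; a machine that drifted after AppleTimeSrv was disabled would show a growing offset.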
Assignee | ||
Comment 21•13 years ago
Finally got talos-r3-w7-011 updated. Only thing left to do is figure out if turning off AppleTimeSrv is safe, and if so, do it.
Assignee | ||
Comment 22•13 years ago
(In reply to comment #20)
> A log capture in bug 646046 showed that the Windows machines are still
> hitting time.apple.com. Turns out they run AppleTimeSrv.exe, which is at
> fault. This service claims to keep time in sync when rebooting between OS X
> and Windows, and indeed, just rebooting a Windows machine with the service
> disabled leaves me with the correct time. I'm going to disable it by hand on
> the staging Windows test machines, and if they all still have the correct
> time next week, I'll roll that change out to the rest of them.

All of the staging machines still have the correct time, so I'll update the OPSI package to disable this service everywhere, and manually disable it on Windows 7.
Assignee | ||
Comment 23•13 years ago
Attachment #538988 -
Flags: review?(dustin)
Comment 24•13 years ago
Comment on attachment 538988
disable appletimesrv service

+sc config AppleTimeSrv start= disabled
                              ^ intentional?

r+ on the assumption it worked for you..
Attachment #538988 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 25•13 years ago
(In reply to comment #24)
> Comment on attachment 538988
> disable appletimesrv service
>
> +sc config AppleTimeSrv start= disabled
>                               ^ intentional?
>
> r+ on the assumption it worked for you..

Yup, in fact, it's *required*: http://www.techrepublic.com/forum/discussions/47-171983
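Editorial note: the point of this exchange is that `sc config` takes `start=` and its value as two separate tokens. A minimal sketch of building that command for a script follows; the function name `sc_disable_command` is hypothetical, and this is not the OPSI package's own code.

```python
def sc_disable_command(service):
    """Return the argument list for disabling a Windows service with sc.

    "start=" and "disabled" are deliberately separate arguments: sc expects
    the option name (ending in '=') and its value as distinct tokens.
    """
    return ["sc", "config", service, "start=", "disabled"]


if __name__ == "__main__":
    # On a Windows machine with administrator rights this list could be
    # passed to subprocess.run(); here it is only printed.
    print(" ".join(sc_disable_command("AppleTimeSrv")))
```

Passing the arguments as a list avoids any doubt about where the space falls once the command line is assembled.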
Assignee | ||
Comment 26•13 years ago
Comment on attachment 538988
disable appletimesrv service

I landed this, and marked all of the XP machines for a re-install of the package.
Attachment #538988 -
Flags: checked-in+
Assignee | ||
Comment 27•13 years ago
I went through all the Windows 7 machines (as they became idle), and disabled the AppleTimeSrv service on them, too. As far as I know, we're all done here.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering