Closed
Bug 646076
Opened 14 years ago
Closed 11 years ago
set-up bouncer region/country/ip blocks for build network that only point to internal mirrors, and point build network machines at it
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: bhearsum, Unassigned)
References
Details
(Whiteboard: [bouncer][network])
In order to make it possible to do bug 617414 in a maintainable way, we need to change our releasetest tests to look at only internal mirrors, instead of all mirrors. By doing so, we avoid needing to whitelist all mirrors.
Reporter | ||
Comment 1•14 years ago
I've been trying to do this for the better part of the morning without success. Rather than continuing to bang my head and bug you guys on IRC, I'm going to toss this over for someone to grab when they have time.
The machine I've been trying to test with is 10.2.71.18.
So far, I have:
- Added a new Country "Mozilla Land!", which is part of the Stage region.
- Tried adding an IP block for this machine's external address (63.245.208.144), which was in Mozilla Land.
- Tried adding a second Mirror entry for dm-download02, that was *only* in the Stage region.
- Tried bumping up the rating on both of the dm-download02 Mirror entries.
Through all of the above, the machine continues to be redirected to various mirrors across the world.
At this point, I've reverted everything I've done, except the Country.
Assignee: bhearsum → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Summary: set-up bouncer region/country/ip blocks for build network that only point to internal mirrors → set-up bouncer region/country/ip blocks for build network that only point to internal mirrors, and point build network machines at it
Comment 2•14 years ago
Moving to Server Ops, as I know little of bouncer. cc: justdave
Assignee: server-ops-releng → server-ops
Component: Server Operations: RelEng → Server Operations
QA Contact: zandr → mrz
Updated•14 years ago
Assignee: server-ops → justdave
Comment 5•14 years ago
Any progress on this?
Comment 6•14 years ago
(In reply to comment #1)
> - Tried adding an IP block for this machine's external address
> (63.245.208.144), which was in Mozilla Land.
If you didn't also remove that IP from whatever block it was already in, that's probably why it failed.
Comment 7•14 years ago
(In reply to comment #6)
> (In reply to comment #1)
> > - Tried adding an IP block for this machine's external address
> > (63.245.208.144), which was in Mozilla Land.
>
> If you didn't also remove that IP from whatever block it was already in, that's
> probably why it failed.
Didn't follow, but after discussion in the RelEng meeting, nthomas will ping justdave offline.
Comment 8•14 years ago
Need to figure out the auth for https://tuxedo.stage.mozilla.com/ so that I can play with this in a safe place. I'll have to split any block containing the IP of interest in two.
fwenzel, what's the algorithm for converting 'Ip start addr' to 'Ip start' for an IP block in bouncer ?
Comment 9•14 years ago
(In reply to comment #8)
> fwenzel, what's the algorithm for converting 'Ip start addr' to 'Ip start' for
> an IP block in bouncer ?
n/m, after a few moments inspection it's
1.2.3.4 --> 1*256^3 + 2*256^2 + 3*256 + 4 = 16909060
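For anyone repeating this, the conversion above can be sketched in a few lines of Python. The function names are illustrative only and are not taken from Bouncer's code; the sketch just reproduces the arithmetic shown in this comment.

```python
def ip_to_int(addr: str) -> int:
    """Convert a dotted-quad IPv4 address to the integer form used for 'Ip start'."""
    octets = [int(part) for part in addr.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {addr!r}")
    value = 0
    for octet in octets:
        # Equivalent to a*256^3 + b*256^2 + c*256 + d, built up one octet at a time.
        value = value * 256 + octet
    return value


def int_to_ip(value: int) -> str:
    """Inverse of ip_to_int, handy for double-checking a stored integer."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"out of range for IPv4: {value}")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(ip_to_int("1.2.3.4"))  # 16909060, the worked example above
```

Running `int_to_ip` over the integers in an IP block is a quick way to confirm that the dotted address and the integer column agree.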
Comment 10•14 years ago
The staging tuxedo instance is no good because the DB is really stale, so I overcame my reticence to futz with production.
This is what I needed to make it work:
* picked linux-ix-slave03, which has IP 63.245.220.220 after traversing NAT to outside world
* in the geo-ip settings changed
Ip start addr    Ip start     Ip end addr      Ip end       Country
63.245.128.0     1073053696   63.246.14.221    1073090269   United States (US)
to
63.245.128.0     1073053696   63.245.220.119   1073077367   United States (US)
63.245.220.220   1073077468   63.245.220.220   1073077468   Mozilla Land! (ZZ)
63.245.220.221   1073077469   63.246.14.221    1073090269   United States (US)
* verified country 'Mozilla Land!' is in region 'Stage'
* On the 'Stage' region, set the 'GeoIP Throttle' to 100 (was 0), so all requests from this region go to Stage mirrors, which is just dm-download02
* waited a minute or so for propagation
* got a consistent response
$ curl -I "http://download.mozilla.org/?product=firefox-4.0.1&os=osx&lang=en-US"
[snip]
Location: http://dm-download02.mozilla.org/pub/mozilla.org/firefox/releases/4.0.1/mac/en-US/Firefox%204.0.1.dmg
Setting the GeoIP throttle back to 0 gives a random mirror, as does undoing the IP block changes. I have left bouncer as I found it (ie both changes reverted).
So does this bug become 'figure what space of addresses build machines can end up with' ? Dave mentioned that the geo-ip database needs updating, so we'll have to figure out how to persist the changes we need.
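The block split described in this comment can be sketched as follows. This is only an illustration of the idea, assuming blocks are inclusive integer ranges tagged with a country code; the `Block` tuple and `split_block` name are hypothetical and do not come from Bouncer.

```python
from typing import List, Tuple

# (ip_start, ip_end, country_code); both ends are inclusive.
Block = Tuple[int, int, str]


def split_block(block: Block, ip: int, new_country: str) -> List[Block]:
    """Carve a single address out of a block and assign it to new_country."""
    start, end, country = block
    if not start <= ip <= end:
        raise ValueError("address is outside the block")
    pieces: List[Block] = []
    if ip > start:
        pieces.append((start, ip - 1, country))  # everything below the address
    pieces.append((ip, ip, new_country))  # the one-address block
    if ip < end:
        pieces.append((ip + 1, end, country))  # everything above the address
    return pieces


us_block = (1073053696, 1073090269, "US")
for piece in split_block(us_block, 1073077468, "ZZ"):
    print(piece)
```

Note that this sketch always produces contiguous pieces, so the lower piece ends exactly one address below the carved-out one.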
Reporter | ||
Comment 11•14 years ago
Nick, thanks a bunch for figuring out the hard part here.
(In reply to comment #10)
> So does this bug become 'figure what space of addresses build machines can
> end up with' ? Dave mentioned that the geo-ip database needs updating, so
> we'll have to figure out how to persist the changes we need.
This summary sounds correct to me. I'll work on pushing this through.
Taking this back from Server Ops because it seems like most of the work is in my court, now.
Assignee: justdave → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Reporter | ||
Comment 12•14 years ago
It strikes me that one way to make this super easy to maintain is to have the build network resolve download.mozilla.org to an internal IP address. I have no idea if that's feasible or has negative consequences. Dave (or anyone, really), do you know if that's a reasonable option?
Comment 13•14 years ago
cc: Ravi to add the list of hide NATs for mtv1, scl1, sjc1.
Comment 14•14 years ago
Re: Comment 13
MTV1: 63.245.220.220/32
SJC1: 63.245.208.144/32
SCL1: 63.245.222.66/32
Reporter | ||
Comment 15•14 years ago
(In reply to comment #13)
> cc: Ravi to add the list of hide NATs for mtv1, scl1, sjc1.
(In reply to comment #14)
> Re: Comment 13
>
> MTV1: 63.245.220.220/32
> SJC1: 63.245.208.144/32
> SCL1: 63.245.222.66/32
Does this imply that all machines inside of the build network will appear to have one of these IP addresses when connecting to download.mozilla.org?
Reporter | ||
Updated•14 years ago
Assignee: nobody → bhearsum
Reporter | ||
Comment 16•14 years ago
(In reply to comment #15)
> (In reply to comment #13)
> > cc: Ravi to add the list of hide NATs for mtv1, scl1, sjc1.
>
> (In reply to comment #14)
> > Re: Comment 13
> >
> > MTV1: 63.245.220.220/32
> > SJC1: 63.245.208.144/32
> > SCL1: 63.245.222.66/32
>
> Does this imply that all machines inside of the build network will appear to
> have one of these IP addresses when connecting to download.mozilla.org?
Tossing back over the fence to get an answer to this.
Assignee: bhearsum → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
Comment 17•14 years ago
(In reply to comment #15)
> > MTV1: 63.245.220.220/32
> > SJC1: 63.245.208.144/32
> > SCL1: 63.245.222.66/32
>
> Does this imply that all machines inside of the build network will appear to
> have one of these IP addresses when connecting to download.mozilla.org?
Yes.
Reporter | ||
Comment 18•14 years ago
Thanks!
Assignee: server-ops → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Reporter | ||
Comment 19•14 years ago
(In reply to comment #10)
> * in the geo-ip settings changed
> Ip start addr Ip start Ip end addr Ip end Country
> 63.245.128.0 1073053696 63.246.14.221 1073090269 United States (US)
> to
> 63.245.128.0 1073053696 63.245.220.119 1073077367 United States (US)
> 63.245.220.220 1073077468 63.245.220.220 1073077468 Mozilla Land! (ZZ)
> 63.245.220.221 1073077469 63.246.14.221 1073090269 United States (US)
> * verified country 'Mozilla Land!' is in region 'Stage'
> * On the 'Stage' region, set the 'GeoIP Throttle' to 100 (was 0), so all
> requests from this region go to Stage mirrors, which is just dm-download02
I re-did these changes, and am now re-running a final verification builder from 4.0.1 to verify.
> So does this bug become 'figure what space of addresses build machines can
> end up with' ? Dave mentioned that the geo-ip database needs updating, so
> we'll have to figure out how to persist the changes we need.
The space of addresses that needs updating is small, thankfully, as confirmed by zandr/ravi/cshields. Maybe we could set up a Nagios check to ensure that the build network is getting correctly routed? Just on one host or a fake host of some sort, per colo.
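A rough sketch of what such a check might look like, assuming it runs on a host inside the build network and only needs to confirm that the 302 points at the expected mirror. The script name, function names, and argument order are made up for illustration; only the exit codes follow the usual Nagios plugin convention.

```python
import sys
from http.client import HTTPConnection, HTTPException
from urllib.parse import urlsplit

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def evaluate(status, location, expected_host):
    """Turn a Bouncer response into a (Nagios exit code, message) pair."""
    if status != 302 or not location:
        return UNKNOWN, f"UNKNOWN: expected a 302 with a Location header, got {status}"
    host = urlsplit(location).hostname
    if host == expected_host:
        return OK, f"OK: redirected to {host}"
    return CRITICAL, f"CRITICAL: redirected to {host}, expected {expected_host}"


def fetch_redirect(url, timeout=10):
    """Issue a HEAD request without following redirects (plain http only)."""
    parts = urlsplit(url)
    conn = HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def main(argv):
    """argv is [URL, EXPECTED_HOST]; returns the Nagios exit code."""
    if len(argv) != 2:
        print("UNKNOWN: usage: check_bouncer_redirect.py URL EXPECTED_HOST")
        return UNKNOWN
    try:
        status, location = fetch_redirect(argv[0])
    except (OSError, HTTPException) as exc:
        print(f"UNKNOWN: request failed: {exc}")
        return UNKNOWN
    code, message = evaluate(status, location, argv[1])
    print(message)
    return code
```

To run it as a plugin, end the file with `sys.exit(main(sys.argv[1:]))` and pass a download.mozilla.org URL plus the mirror hostname that the colo is supposed to be sent to.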
Component: Release Engineering → Server Operations
Reporter | ||
Updated•14 years ago
Component: Server Operations → Release Engineering
Reporter | ||
Updated•14 years ago
Assignee: nobody → bhearsum
Reporter | ||
Comment 20•14 years ago
(In reply to comment #19)
> (In reply to comment #10)
> > * in the geo-ip settings changed
> > Ip start addr Ip start Ip end addr Ip end Country
> > 63.245.128.0 1073053696 63.246.14.221 1073090269 United States (US)
> > to
> > 63.245.128.0 1073053696 63.245.220.119 1073077367 United States (US)
> > 63.245.220.220 1073077468 63.245.220.220 1073077468 Mozilla Land! (ZZ)
> > 63.245.220.221 1073077469 63.246.14.221 1073090269 United States (US)
> > * verified country 'Mozilla Land!' is in region 'Stage'
> > * On the 'Stage' region, set the 'GeoIP Throttle' to 100 (was 0), so all
> > requests from this region go to Stage mirrors, which is just dm-download02
>
> I re-did these changes, and am now re-running a final verification builder
> from 4.0.1 to verify.
The final verification I ran looked good, so I added the other two external IPs to Mozilla Land!. And as it turns out, the one-IP-only blocks seem to override the ranged ones (either that, or newer entries override older ones), so I'm a bit less concerned about getting busted here. Regardless, I think it's good to have Nagios checks for this; I'll be filing a bug on that shortly.
One strange thing I did notice is that when retrieving Firefox products (eg, http://download.mozilla.org/?product=firefox-4.0.1&os=osx&lang=en-US) the redirect always sent me to dm-download02, but when retrieving the nagios test product (http://download.mozilla.org/?product=nagios-test-product&os=none), I got sent to random mirrors. Not sure why this is, but not going to block on it, since Firefox products have been working correctly for the past 12 hours.
Reporter | ||
Comment 21•14 years ago
This continues to work as expected, resolving.
(In reply to comment #20)
> One strange thing I did notice is that when retrieving Firefox products (eg,
> http://download.mozilla.org/?product=firefox-4.0.1&os=osx&lang=en-US) the
> redirect always sent me to dm-download02, but when retrieving the nagios
> test product
> (http://download.mozilla.org/?product=nagios-test-product&os=none), I got
> sent to random mirrors. Not sure why this is, but not going to block on it,
> since Firefox products have been working correctly for the past 12 hours.
I'm going to guess that this is because Firefox is monitored by Sentry, and nagios-test-product isn't. Regardless, doesn't affect resolution of this bug.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 22•14 years ago
This doesn't seem to be working fully. For some reason, all of the Windows requests are getting pointed at 3crowd:
curl -I "http://download.mozilla.org/?product=firefox-5.0b3-partial-5.0b2&os=win&lang=af&force=1"
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pp-app-dist06
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Wed, 01 Jun 2011 16:51:26 GMT
Location: http://mozilla-crowdcache.3crowd.com/mozilla/firefox/releases/5.0b3/update/win32/af/firefox-5.0b2-5.0b3.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.8.84.200.1306947086368240; path=/; expires=Thu, 31-May-12 16:51:26 GMT
X-Powered-By: PHP/5.1.6
While all of the other ones are getting pointed at dm-download02:
curl -I "http://download.mozilla.org/?product=firefox-5.0b3-partial-5.0b2&os=linux64&lang=af&force=1"
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pp-app-dist02
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Wed, 01 Jun 2011 16:51:36 GMT
Location: http://dm-download02.mozilla.org/pub/mozilla.org/firefox/releases/5.0b3/update/linux-x86_64/af/firefox-5.0b2-5.0b3.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.8.84.200.1306947096991603; path=/; expires=Thu, 31-May-12 16:51:36 GMT
X-Powered-By: PHP/5.1.6
We've got plenty of uptake, and dm-download02 appears to have the Windows files, so I'm not sure what's going on here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 23•14 years ago
Not sure if it's relevant or not, but I did notice that if lang and os are omitted from the download.m.o query string, we default to win32/en-US. Eg:
curl -I "http://download.mozilla.org/?product=firefox-5.0b3-partial-5.0b2"
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pp-app-dist04
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Wed, 01 Jun 2011 17:04:10 GMT
Location: http://mozilla-crowdcache.3crowd.com/mozilla/firefox/releases/5.0b3/update/win32/en-US/firefox-5.0b2-5.0b3.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.8.84.200.1306947850717037; path=/; expires=Thu, 31-May-12 17:04:10 GMT
X-Powered-By: PHP/5.1.6
Reporter | ||
Comment 24•14 years ago
Interestingly, all the win32 requests are now going to dm-download02:
curl -I "http://download.mozilla.org/?product=firefox-5.0b3-partial-5.0b2&os=win&lang=af&force=1"
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pp-app-dist08
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Wed, 01 Jun 2011 19:29:25 GMT
Location: http://dm-download02.mozilla.org/pub/mozilla.org/firefox/releases/5.0b3/update/win32/af/firefox-5.0b2-5.0b3.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.8.84.200.1306956565400815; path=/; expires=Thu, 31-May-12 19:29:25 GMT
X-Powered-By: PHP/5.1.6
The only change between my previous comment and now is that we've got slightly higher uptake (4000 vs 2500) on Windows. I think Linux and Mac were both at around 4000 when I re-opened this, too, which makes me wonder if Bouncer isn't obeying the '100' throttle that's set for the Stage region when uptake is low.
Reporter | ||
Comment 25•13 years ago
I hit this again, when we had tons of uptake:
Running on mv-moz2-linux-ix-slave13.build.mozilla.org:
Using config file update.cfg
Using https://aus2.mozilla.org/update/1/Firefox/3.6.17/20110420140830/WINNT_x86-msvc/af/releasetest/update.xml?force=1
Calling <function run_with_timeout at 0xb7c9302c> with args: (['wget', '--no-check-certificate', '-q', '-O', 'update.xml', 'https://aus2.mozilla.org/update/1/Firefox/3.6.17/20110420140830/WINNT_x86-msvc/af/releasetest/update.xml?force=1'], 300, None, None, False, True), kwargs: {}, attempt #1
Executing: ['wget', '--no-check-certificate', '-q', '-O', 'update.xml', 'https://aus2.mozilla.org/update/1/Firefox/3.6.17/20110420140830/WINNT_x86-msvc/af/releasetest/update.xml?force=1']
Process stdio:
Process stderr:
Testing http://download.mozilla.org/?product=firefox-3.6.18-partial-3.6.17&os=win&lang=af&force=1
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pm-app-dist05
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Tue, 21 Jun 2011 02:51:22 GMT
Location: http://mozilla.ftp.halifax.rwth-aachen.de/mozilla/firefox/releases/3.6.18/update/win32/af/firefox-3.6.17-3.6.18.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.2.84.100.1308624682463410; path=/; expires=Wed, 20-Jun-12 02:51:22 GMT
X-Powered-By: PHP/5.1.6
HTTP/1.1 200 OK
Server: nginx/1.0.0
Date: Tue, 21 Jun 2011 02:51:22 GMT
Content-Type: application/octet-stream
Content-Length: 2193254
Last-Modified: Wed, 15 Jun 2011 09:42:42 GMT
Connection: close
Accept-Ranges: bytes
Uptake looked like this:
Product         OS                 Available  Total
Firefox-3.6.18  linux              79826      222115
Firefox-3.6.18  osx                71869      222115
Firefox-3.6.18  win                51807      222115
Firefox-3.6.18  opensolaris-i386   81179      222115
Firefox-3.6.18  opensolaris-sparc  81179      222115
Firefox-3.6.18  solaris-i386       81134      222115
Firefox-3.6.18  solaris-sparc      81134      222115
Comment 26•13 years ago
Could you clarify what the error is here? The 302 is to mozilla.ftp.halifax.rwth-aachen.de rather than 3crowd or dm-download02.
Reporter | ||
Comment 27•13 years ago
Based on the IP block/region modifications I did, I'm expecting all requests from within the build network to hit dm-download02, since it's the only member of the "stage" group.
Maybe I'm misunderstanding how this works, though?
Comment 28•13 years ago
More likely I spaced on reading the bug summary.
Reporter | ||
Comment 29•13 years ago
According to justdave, this is expected behaviour when a Region is overloaded:
[11:15] <bhearsum> justdave: will Bouncer redirect people to outside their assigned Region when that particular region is overloaded? i'm asking in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=646076 only working intermittently
[11:15] <justdave> it should, yes.
Comment 30•13 years ago
Hmm, how is 'overloaded' defined ?
Reporter | ||
Comment 31•13 years ago
I sent mail to relops and justdave about this bug, to try and figure out a solution. (I didn't think it was worthwhile to have a bunch of back and forth in here). Once we figure out what to do, I'll update this bug with that information.
Reporter | ||
Comment 32•13 years ago
Catlee, justdave, and I chatted about this today and I think we've got a path forward. Justdave said that the one mirror currently in the Stage region is backed by an NFS share that's subject to other load, which is probably why it's getting overloaded from time to time.
(In reply to comment #30)
> Hmm, how is 'overloaded' defined ?
According to justdave, "does not respond within 5 seconds".
Therefore, if we have a mirror that's _not_ on that NFS share or an otherwise poorly performing machine, we should be able to avoid redirection. Additionally, bug 613620 (which has just been picked up by Rik from webdev) should allow us to set up Bouncer to serve 503s instead of redirecting to external mirrors if we _do_ become overloaded; 503s are easy to detect and retry on.
So, I'm going to look at setting up a mirror inside of the Build network, and getting that tracking mozilla-prereleases & in Bouncer.
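If Bouncer does end up serving 503s in the overloaded case (that depends on bug 613620, so it is an assumption here), the retry side could look roughly like the sketch below. `fetch_with_retry` and its parameters are hypothetical names; `fetch` is any callable returning a `(status, location)` pair, injected so the logic can be tried without touching the network.

```python
import time


def fetch_with_retry(fetch, attempts=5, delay=30.0, sleep=time.sleep):
    """Call fetch() until it returns something other than a 503.

    Raises RuntimeError if every attempt comes back as a 503.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        status, location = fetch()
        if status != 503:
            return status, location
        if attempt < attempts:
            sleep(delay)  # give the internal mirror time to recover
    raise RuntimeError(f"still getting 503 after {attempts} attempts")


# Simulated responses: two overloaded replies, then a normal redirect.
responses = iter([(503, None), (503, None), (302, "http://dm-download02.mozilla.org/pub/")])
print(fetch_with_retry(lambda: next(responses), delay=0))
```

The attempt count and delay are placeholders; sensible values would depend on how long a mirror typically stays marked as overloaded.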
Reporter | ||
Comment 33•13 years ago
releng-mirror01 is all set up now, and I think I've got Bouncer configured correctly to point at it. I more or less started from scratch; here's what I did:
* Added a new region:
  Name: Build Network
  Priority: 5
  GeoIP Throttle: 100
  Mirrors: releng-mirror01.build.scl1.mozilla.com
* Added a new country:
  Code: ZZ
  Region: Build Network
  Country Name: Release Engineering
  Continent: NA
* Added a new mirror:
  Name: releng-mirror01.build.scl1.mozilla.com
  Base URL: http://releng-mirror01.build.scl1.mozilla.com/mozilla-prereleases
  Rating: 1
  Active: Yes
  Regions: Build Network
After waiting 5 or 10 minutes (to allow Bouncer/Sentry to catch up, I guess), things seem to be working, with machines within the build network getting redirected to releng-mirror01 100% of the time.
Dave or Nick, does the above look sane to you?
Reporter | ||
Comment 34•13 years ago
It looks like releng-mirror01, in its current form, may not even be able to keep up with the minuscule load that Sentry causes. Since I made the changes to Bouncer, Sentry has been marking it red every once in a while with:
Checking mirror releng-mirror01.build.scl1.mozilla.com ...
http://releng-mirror01.build.scl1.mozilla.com/mozilla-prereleases sent no response after 5 seconds! Checking recent history...
Comment 35•13 years ago
A couple of things come to mind here. Just thinking out loud...
Did these errors occur while the disk was full? I saw it fill up again recently.
This might be related to the link into scl1 being overloaded, so sentry thinks there's a problem.
Comment 36•13 years ago
(In reply to comment #33)
Looks fine to me, but could you add the IP details for ZZ for completeness ? Is that just comment #14 ? Would be great to have the full path for IP -> Country -> Region -> Mirror documented.
(In reply to comment #35)
> This might be related to the link into scl1 being overloaded, so sentry
> thinks there's a problem.
I think it's likely this is the problem. FYI, sentry has a 5 second timeout for a response.
Reporter | ||
Comment 37•13 years ago
(In reply to comment #35)
> A couple of things come to mind here. Just thinking out loud...
>
> Did these errors occur while the disk was full? I saw it fill up again
> recently.
Nope, I cleared that up prior to making the Bouncer changes.
(In reply to comment #36)
> (In reply to comment #35)
> > This might be related to the link into scl1 being overloaded, so sentry
> > thinks there's a problem.
>
> I think it's likely this is the problem. FYI, sentry has a 5 second timeout
> for a response.
Ah! Assuming this is true, things should look better once the P2P link is enabled?
Reporter | ||
Comment 38•13 years ago
(In reply to comment #33)
> releng-mirror01 is all set-up now, and I think I've got Bouncer configured
> correctly to point at it. I more or less started from scratch, here's what I
> did:
> * Added a new region:
> Name: Build Network
> Priority: 5
> GeoIP Throttle: 100
> Mirrors: releng-mirror01.build.scl1.mozilla.com
> * Added a new country:
> Code: ZZ
> Region: Build Network
> Country Name: Release Engineering
> Continent: NA
> * Added a new mirror:
> Name: releng-mirror01.build.scl1.mozilla.com
> Base URL: http://releng-mirror01.build.scl1.mozilla.com/mozilla-prereleases
> Rating: 1
> Active: Yes
> Regions: Build Network
Per your request, Nick, here's the IP Block configuration:
MTV1:
* IP Start: 1073077468 (63.245.220.220)
* IP End: 1073077468 (63.245.220.220)
* Country: ZZ
SJC1:
* IP Start: 1073074320 (63.245.208.144)
* IP End: 1073074320 (63.245.208.144)
* Country: ZZ
SCL1:
* IP Start: 1073077826 (63.245.222.66)
* IP End: 1073077826 (63.245.222.66)
* Country: ZZ
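To document the full IP -> Country -> Region -> Mirror path that was asked for, here is a small lookup sketch built from the values in this comment and comment 33. The data structures and the `lookup` function are illustrative assumptions, not Bouncer's schema, and the sketch ignores the GeoIP throttle and Sentry's health checks.

```python
import ipaddress

# (ip_start, ip_end, country_code), inclusive, as listed in the IP block config.
IP_BLOCKS = [
    (1073077468, 1073077468, "ZZ"),  # MTV1, 63.245.220.220
    (1073074320, 1073074320, "ZZ"),  # SJC1, 63.245.208.144
    (1073077826, 1073077826, "ZZ"),  # SCL1, 63.245.222.66
]
COUNTRY_TO_REGION = {"ZZ": "Build Network"}
REGION_TO_MIRRORS = {"Build Network": ["releng-mirror01.build.scl1.mozilla.com"]}


def lookup(addr):
    """Return (country, region, mirrors) for an address, or None if no block matches."""
    value = int(ipaddress.IPv4Address(addr))
    for start, end, country in IP_BLOCKS:
        if start <= value <= end:
            region = COUNTRY_TO_REGION[country]
            return country, region, REGION_TO_MIRRORS[region]
    return None


print(lookup("63.245.222.66"))
print(lookup("8.8.8.8"))  # an address outside the three blocks
```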
Reporter | ||
Comment 39•13 years ago
Let's see how things look after the P2P link is enabled.
Reporter | ||
Comment 40•13 years ago
Final verification for 6.0b5 hit releng-mirror01 for all of its tests, and passed. So, once it's more stable (hopefully after the P2P link is up) we should be all done here.
Reporter | ||
Comment 41•13 years ago
bug 677183 talks about the Bouncer changes I made affecting more than just the build network. I had assumed that the hide NATs were specific to the build network (though, now that I read back I don't see anything supporting that assumption). Can someone from IT confirm whether those IPs are for the whole colo, just the build network, or some other subset?
Assignee: bhearsum → nobody
Reporter | ||
Updated•13 years ago
Assignee: nobody → bhearsum
Reporter | ||
Comment 42•13 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #41)
> bug 677183 talks about the Bouncer changes I made affecting more than just
> the build network. I had assumed that the hide NATs were specific to the
> build network (though, now that I read back I don't see anything supporting
> that assumption). Can someone from IT confirm whether those IPs are for the
> whole colo, just the build network, or some other subset?
Throwing over the fence to get this answered.
Assignee: bhearsum → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Comment 43•13 years ago
Can someone from netops answer this? In particular, what set of internal hosts would appear at the IPs in comment 14, and would that include seamonkey's hosts? mtv1 desktop systems? Thanks!
Assignee: server-ops-releng → network-operations
Component: Server Operations: RelEng → Server Operations: Netops
QA Contact: zandr → mrz
Comment 44•13 years ago
And, secondarily, is it straightforward to assign the build vlan its own hide NATs so that bouncer can recognize its request sources as distinct from other Mozilla systems?
Comment 45•13 years ago
Comment 46•13 years ago
Re: Comment 44
Just to clarify, the build networks in all three locations are already configured with dedicated hide NATs. The IP addresses provided in Comment 14 are used by build networks only.
Reporter | ||
Comment 47•13 years ago
(In reply to Derek Moore from comment #46)
> Re: Comment 44
>
> Just to clarify, the build networks in all three locations are already
> configured with dedicated hide NATs. The IP addresses provided in Comment 14
> are used by build networks only.
Are these completely exclusive to the RelEng Build Network, or are they shared with the Community Build Network? bug 677183 hints that they are.
Comment 48•13 years ago
(In reply to Derek Moore from comment #46)
> Re: Comment 44
>
> Just to clarify, the build networks in all three locations are already
> configured with dedicated hide NATs. The IP addresses provided in Comment 14
> are used by build networks only.
Could we get a list of subnets associated with each NAT? I think we may need to adjust that list.
Comment 49•13 years ago
Re: Comment 47
The community build network (63.245.210.0/26) is not behind a hide NAT. Each machine is individually addressable.
Comment 50•13 years ago
Re: Comment 48
63.245.208.144 contains:
10.2.71.0/24 (sjc1 vlan 71)
10.2.90.0/23 (sjc1 vlan 90)
63.245.220.220 contains:
10.250.48.0/22 (mtv1 vlan 500)
63.245.222.66 contains:
10.12.40.0/22 (scl1 vlan 40)
10.12.47.0/24 (scl1 vlan 47)
10.12.48.0/22 (scl1 vlan 48)
10.12.75.0/24 (scl1 vlan 75)
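For reference, the mapping above can be checked mechanically with Python's `ipaddress` module. The `HIDE_NATS` table is copied from this comment; the `hide_nat_for` helper is just an illustrative name.

```python
import ipaddress

# Hide NAT address -> internal subnets behind it, as listed above.
HIDE_NATS = {
    "63.245.208.144": ["10.2.71.0/24", "10.2.90.0/23"],
    "63.245.220.220": ["10.250.48.0/22"],
    "63.245.222.66": [
        "10.12.40.0/22",
        "10.12.47.0/24",
        "10.12.48.0/22",
        "10.12.75.0/24",
    ],
}


def hide_nat_for(internal_addr):
    """Return the external address an internal host appears as, or None."""
    addr = ipaddress.ip_address(internal_addr)
    for nat, subnets in HIDE_NATS.items():
        if any(addr in ipaddress.ip_network(subnet) for subnet in subnets):
            return nat
    return None


print(hide_nat_for("10.2.71.18"))  # 63.245.208.144, the test machine from comment 1
```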
Comment 51•13 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=677183#c3 suggests that bouncer is throwing traffic from outside "Mozilla Land!" to this mirror as well. Is there a way to prevent that?
If that's the case, we might be doing this to users as well.
Reporter | ||
Comment 52•13 years ago
(In reply to Zandr Milewski [:zandr] from comment #51)
> https://bugzilla.mozilla.org/show_bug.cgi?id=677183#c3 suggests that bouncer
> is throwing traffic from outside "Mozilla Land!" to this mirror as well. Is
> there a way to prevent that?
I don't understand Bouncer well enough to tell you for sure, but based on my understanding it seems unlikely.
> If that's the case, we might be doing this to users as well.
Yes, indeed. I'm going to dig a bit for obvious answers, and revert if that yields nothing.
Is it possible to find out from Bouncer logs who has been getting redirected to releng-mirror01?
Component: Server Operations: Netops → Server Operations: RelEng
Reporter | ||
Comment 53•13 years ago
Didn't mean to change the component.
Component: Server Operations: RelEng → Server Operations: Netops
Comment 54•13 years ago
And so we are. (breaking users)
We need to turn this off and find another solution. I don't want to create a public mirror in scl1, which would be the other fix to this problem.
Reporter | ||
Comment 55•13 years ago
Backing out those changes right away.
Reporter | ||
Comment 56•13 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #55)
> Backing out those changes right away.
> MTV1:
> * IP Start: 1073077468 (63.245.220.220)
> * IP End: 1073077468 (63.245.220.220)
> * Country: ZZ
> SJC1:
> * IP Start: 1073074320 (63.245.208.144)
> * IP End: 1073074320 (63.245.208.144)
> * Country: ZZ
> SCL1:
> * IP Start: 1073077826 (63.245.222.66)
> * IP End: 1073077826 (63.245.222.66)
> * Country: ZZ
I switched these blocks back to US.
Based on what we know now here (that actual users have been redirected to this mirror) and bug 677183, I'm starting to suspect that for some Betas Sentry has decided that all of the other internal mirrors (pv-mirror01, dm-download02, etc.) are overloaded, and that it should send people to releng-mirror01 instead. Basically, the inverse of comment #34.
Reporter | ||
Comment 57•13 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #56)
> (In reply to Ben Hearsum [:bhearsum] from comment #55)
> > Backing out those changes right away.
>
>
> > MTV1:
> > * IP Start: 1073077468 (63.245.220.220)
> > * IP End: 1073077468 (63.245.220.220)
> > * Country: ZZ
> > SJC1:
> > * IP Start: 1073074320 (63.245.208.144)
> > * IP End: 1073074320 (63.245.208.144)
> > * Country: ZZ
> > SCL1:
> > * IP Start: 1073077826 (63.245.222.66)
> > * IP End: 1073077826 (63.245.222.66)
> > * Country: ZZ
>
> I switched these blocks back to US.
...and marked releng-mirror01 as "inactive".
Comment 58•13 years ago
To clarify from the discussion I just had with Zandr on IRC, bouncer's geoip has a throttle percent on each region. The throttle only affects the ORIGIN of the traffic, not the destination. The throttled percent of traffic ORIGINATING in a region will be sent to mirrors within that region, so a region set to 100% should always keep all of the traffic originating in that region going to mirrors that are allocated to that region. A region set to 50% (like North America, because we don't have enough mirror capacity for the traffic within North America) will serve 50% of the traffic originating in that region to mirrors allocated to that region, and the remaining 50% will get evenly spread across the entire global pool of mirrors (which would include your internal mirror).
Bug 613620 would partially fix this, but would only work if every single region we have explicitly gets a backup region fixed. To truly fix this, we'd need to have a destination throttle in addition to the origin throttle, or somesuch.
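The origin-only throttle described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Bouncer's code: mirrors are picked uniformly (real ratings and Sentry health are ignored), and the North America mirror names are made-up placeholders.

```python
import random

REGIONS = {
    "Build Network": {"throttle": 100, "mirrors": ["releng-mirror01"]},
    # Placeholder mirror names; only the 50% throttle comes from the discussion.
    "North America": {"throttle": 50, "mirrors": ["na-mirror-a", "na-mirror-b"]},
}


def pick_mirror(origin, regions, rng):
    """Pick a mirror for a request originating in the given region."""
    region = regions[origin]
    # rng.random() is in [0, 1), so a throttle of 100 always stays in-region
    # and a throttle of 0 never does.
    if rng.random() * 100 < region["throttle"]:
        return rng.choice(region["mirrors"])
    # The remainder is spread over the global pool, which includes every
    # region's mirrors, internal ones too.
    global_pool = [m for r in regions.values() for m in r["mirrors"]]
    return rng.choice(global_pool)


rng = random.Random(0)
picks = [pick_mirror("North America", REGIONS, rng) for _ in range(10000)]
leaked = picks.count("releng-mirror01")
print(f"{leaked} of {len(picks)} North America requests reached the internal mirror")
```

Under these assumptions, build network requests never leave their region, but a share of requests from any region with a throttle below 100 lands on the internal mirror, which is the behaviour that affected users.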
Comment 59•13 years ago
What would it take to run a private instance of bouncer?
Reporter | ||
Comment 60•13 years ago
(In reply to Zandr Milewski [:zandr] from comment #59)
> What would it take to run a private instance of bouncer?
This isn't a good option IMHO, for two reasons:
1) releasetest channel snippets would have to differ from the release channel ones -- which gives us less confidence that the release channel snippets are correct.
2) We wouldn't be testing the production Bouncer instance at all - which is a huge blind spot.
As a short term way to get bug 617414 in motion again this might be OK, but I wouldn't be comfortable doing this for an extended period of time.
Comment 61•13 years ago
per meeting with IT yesterday:
Another option we discussed was simply to run these tests on a machine that was not locked down, i.e. outside the build network. Would that pose issues for reporting results?
Reporter | ||
Comment 62•13 years ago
I'm not super keen on having these run outside of our main pool, but I could live with it until Bouncer lets us do the original plan.
Were you thinking of a Build Slave that is located outside of the build network, or something else? If the former, we'd have to poke a hole in the firewall.
Comment 63•13 years ago
That violates the goal of this process -- if there's a machine in the build network that has access to the outside world, then getting stuff out of the build network just involves getting that stuff to the machine with access first.
Reporter | ||
Comment 64•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #63)
> That violates the goal of this process -- if there's a machine in the build
> network that has access to the outside world, then getting stuff out of the
> build network just involves getting that stuff to the machine with access
> first.
In *my* mind, the idea is that we'd have a Build Slave that exists outside of the build network attached to a master inside of it. It's not great, but it's not subject to what you describe above.
Comment 65•13 years ago
If we go that route, we also need to run mozmill update testing (nightlies+releases) on the slave(s) outside the build network as well.
Reporter
Comment 66•13 years ago
(In reply to Aki Sasaki [:aki] from comment #65)
> If we go that route, we also need to run mozmill update testing
> (nightlies+releases) on the slave(s) outside the build network as well.
Nightly stuff should be OK because it only needs FTP, not Bouncer/mirrors. releasetest testing will hit this though, ugh :(.
Maybe it's worthwhile waiting for bug 613620. I just chatted with Rik, and he told me he's hoping to have it done by the end of the quarter.
Comment 67•13 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #64)
> (In reply to Dustin J. Mitchell [:dustin] from comment #63)
> > That violates the goal of this process -- if there's a machine in the build
> > network that has access to the outside world, then getting stuff out of the
> > build network just involves getting that stuff to the machine with access
> > first.
>
> In *my* mind, the idea is that we'd have a Build Slave that exists outside
> of the build network attached to a master inside of it. It's not great, but
> it's not subject to what you describe above.
Thus creating a route in through the firewalls. No different, IMO.
I think we're just going to have to wait on bug 613620 for this one. (the end of Q3 isn't that far away)
Updated•13 years ago
OS: Linux → All
Hardware: x86_64 → All
Comment 68•13 years ago
Where does this bug stand? It's owned by netops but I'm not clear what netops is to do.
Comment 69•13 years ago
Ravi- It's blocked by 613620. We can't do this without affecting normal users until that's fixed.
Updated•13 years ago
Reporter
Comment 70•13 years ago
I think this bug, which is about setting up the internal-only mirror, is a RelEng one. Actually turning off access to machines should be tracked in another bug blocking 617414.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED
Reporter
Comment 71•13 years ago
Augh, Bugzilla! This is not fixed yet.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 72•13 years ago
I can't follow this bug. What's the Netops action?
Comment 73•13 years ago
So in comment 69 Zandr said we were blocked on bug 613620 which is now closed. So I'll echo mrz and myself from comment 68 and ask what is left to do.
If there are action items can someone restate them?
Comment 74•13 years ago
(In reply to Ravi Pina [:ravi] from comment #73)
> So in comment 69 Zandr said we were blocked on bug 613620 which is now
> closed. So I'll echo mrz and myself from comment 68 and ask what is left to
> do.
>
> If there are action items can someone restate them?
bhearsum is driving this, and I'm sure he'll comment when he returns from vacation on Monday.
Reporter
Comment 75•13 years ago
With bug 613620 fixed, I think this part needn't involve IT anymore. All of RelEng has access to make adjustments to Bouncer mirrors, and I'm planning on doing so soon.
Assignee: network-operations → nobody
Component: Server Operations: Netops → Release Engineering: Automation (General)
QA Contact: mrz → catlee
Updated•13 years ago
Assignee: nobody → bhearsum
Comment 76•12 years ago
Do we have updates here? QA is still waiting on being able to use the internal mirrors for the Mozmill update tests. Do we have any ETA for when this bug can finally be solved?
Reporter
Comment 77•12 years ago
The required Bouncer code (bug 613620) has landed in staging. Right now we're waiting for a fully functioning staging Bouncer set-up (bug 750798) so we can test the new code. Once that's done we can push it to production and set up the production RelEng and QA networks to be restricted to the internal mirrors.
Depends on: 750798
Reporter
Updated•12 years ago
Assignee: bhearsum → nobody
Updated•12 years ago
Priority: -- → P3
Whiteboard: [bouncer][network]
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering
Reporter
Comment 78•11 years ago
I don't think we'll be doing this anymore - we've poked holes in the firewall instead.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 11 years ago
Resolution: --- → WONTFIX
Comment 79•11 years ago
I think WONTFIXing causes issues for bug 617414 and bug 498425 (which was WONTFIXed in favor of what seems like a dup bug 813629)
Reporter
Comment 80•11 years ago
(In reply to Aki Sasaki [:aki] from comment #79)
> I think WONTFIXing causes issues for bug 617414 and bug 498425 (which was
> WONTFIXed in favor of what seems like a dup bug 813629)
I don't think this is an issue for bug 617414 anymore. The only reason we needed this (AFAIK) was because we previously weren't willing to poke holes in the firewall to allow machines to access real mirrors. I'm pretty sure we're fine with/intend to do that now.
I don't think this affects bugs 498425 or 813629 because we've dropped the concept of "internal-only mirror". All files go straight to the CDN as soon as they hit the "releases" directory.
Assignee
Updated•7 years ago
Component: General Automation → General