Bug 1018450 (Closed). Opened 11 years ago, closed 10 years ago.

Please check flows between Domain Controllers dc6-9.releng.ad.mozilla.com (esp. dc8 to others and vice versa)

Categories

(Infrastructure & Operations Graveyard :: NetOps: DC ACL Request, task)

Platform: x86_64, Windows 7
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: q, Assigned: dcurado)


Details

We are having some communication trouble between dc8.releng.ad.mozilla.com (10.22.69.21) and wintry VLAN 244, and between dc8.releng.ad.mozilla.com and dc6.releng.ad.mozilla.com. Given our round-robin DNS setup, all releng DCs (dc6-dc9) need any:any communication between themselves and with all Windows VLANs. Can someone please review the ACLs?

Q
I've been informed that a workaround has been applied, so I don't believe this is as urgent anymore. Lowering the priority to stop the alerts to on-call.
Severity: major → normal
Here's my translation of the request. Please let me know if I got this wrong:

- I should make sure that hosts dc7.releng.ad.mozilla.com (10.12.69.19) and dc9.releng.ad.mozilla.com (10.12.69.20), which are in the DB security-zone in SCL1, can communicate with anything in the wintry security-zone in SCL1, which is VLAN 244. I checked the firewall policies, and only the default policies are in place, which allow ICMP ping and nothing else. So ????? We haven't changed anything in SCL1 as far as I know.
- Then I should check whether hosts dc6.releng.ad.mozilla.com (10.22.69.18) and dc8.releng.ad.mozilla.com (10.22.69.21), which are in the DB security-zone on fw1.scl3, can communicate with anything in fw1.scl1's wintry security-zone.

Here's what I found:

- Anything in the SCL1 DB security-zone (10.12.69.0/24) can communicate with anything in the SCL3 DB security-zone (10.22.69.0/24).
- Nothing in the SCL1 WINTRY security-zone (10.36.44.0/22) can send traffic to anything in SCL3.
- Nothing in the SCL1 WINTRY security-zone (10.36.44.0/22) can send traffic to anything in the SCL1 DB security-zone (10.12.69.0/24).

At this point, I am assuming that I must be translating the request incorrectly. Please restate your request with the most explicit details you can; i.e., do not assume I know what "all windows vlans" means. Writing firewall requests using IP addresses and ranges is very helpful.

Thanks,
Status: NEW → ASSIGNED
Flags: needinfo?(q)
Assignee: network-operations → dcurado
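The zone reasoning in the comment above boils down to mapping each host address to the security-zone whose subnet contains it. A minimal sketch using Python's standard `ipaddress` module; the `ZONES` table is hand-copied from the subnets named in that comment (it is not read from the firewalls), and `zone_of` is a hypothetical helper:

```python
import ipaddress
from typing import Optional

# Hypothetical zone table, copied by hand from the subnets named in the
# comment above; not pulled from the actual firewall configuration.
ZONES = {
    "SCL1 DB": ipaddress.ip_network("10.12.69.0/24"),
    "SCL3 DB": ipaddress.ip_network("10.22.69.0/24"),
    "SCL1 WINTRY": ipaddress.ip_network("10.36.44.0/22"),
}

def zone_of(ip: str) -> Optional[str]:
    """Return the first zone whose subnet contains ip, or None if none does."""
    addr = ipaddress.ip_address(ip)
    for name, net in ZONES.items():
        if addr in net:
            return name
    return None

DCS = {"dc6": "10.22.69.18", "dc7": "10.12.69.19",
       "dc8": "10.22.69.21", "dc9": "10.12.69.20"}

for host, ip in DCS.items():
    print(host, ip, zone_of(ip))
```

An address that falls in none of the listed subnets comes back as `None`, which is a quick way to spot a host whose zone has not been accounted for.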
Sorry to be vague, Dave. We had come up with a definition of "Windows VLANs" and "Releng Windows VLANs" when we were working with NetOps and OpSec on licensing. I will get a full list of VLANs and IPs here shortly, along with a better description of the flows.
Flags: needinfo?(q)
Yeah, I'm not trying to give you a hard time about it either; I just want to fix whatever the problem is. =-) What's weird is that there are no policies in place, as per comment #2. Thanks!
Here is the issue: all releng domain-joined machines need to communicate with releng.ad.mozilla.com. In a Windows DNS environment, this DNS entry gets weighted based on several factors to select a particular domain controller (DC). However, in our current DNS setup it is simply round-robined between all DC hosts. Therefore all the DCs need unrestricted communication with each other:

dc6.releng.ad.mozilla.com 10.22.8.141
dc7.releng.ad.mozilla.com 10.12.75.6
dc8.releng.ad.mozilla.com 10.22.8.141
dc9.releng.ad.mozilla.com 10.12.75.6

All of our client machines also need bidirectional communication with all the DCs. The clients do their heavy lifting on a local DC but may make lighter requests to any domain controller. So the following VLANs need unrestricted communication with each of the above DCs:

244 wintry releng.scl3 - 10.26.44.0/22
240 wintest releng.scl3 - 10.26.40.0/22
236 winbuild releng.scl3 - 10.26.36.0/22
40 winbuild scl1 - 10.12.40.0/22 (only for the next few weeks)

In addition, the above VLANs should also have the following communications already open (possibly from global rules, and in addition to standard DNS and DHCP rules etc.):

kms1.ad.mozilla.com 10.22.8.141 - port 1688 tcp/udp
wds1.releng.ad.mozilla.com 10.22.8.141 - any:any
wds2.releng.ad.mozilla.com 10.12.75.6 - any:any (only needed for a few more weeks)

Based on the symptoms last week, communication was failing from the wintry VLAN to dc8 and vice versa, with the local firewall turned off on both machines. This caused problems with updating and deploying machines in that VLAN.

Q
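The request above amounts to a full mesh between the DCs plus a bidirectional any:any pair between every listed VLAN and every DC. A sketch that enumerates those pairs with `itertools`; it assumes the DC addresses as they are confirmed later in this thread (10.22.69.x and 10.12.69.x), not the OOB addresses listed above, and the names `DCS`, `VLANS`, and `required_flows` are hypothetical:

```python
from itertools import permutations, product

# DC addresses as confirmed later in this thread (not the OOB addresses).
DCS = {
    "dc6": "10.22.69.18/32",
    "dc7": "10.12.69.19/32",
    "dc8": "10.22.69.21/32",
    "dc9": "10.12.69.20/32",
}

# Windows client VLANs listed in the request.
VLANS = {
    "244 wintry releng.scl3": "10.26.44.0/22",
    "240 wintest releng.scl3": "10.26.40.0/22",
    "236 winbuild releng.scl3": "10.26.36.0/22",
    "40 winbuild scl1": "10.12.40.0/22",
}

def required_flows():
    """Yield (source, destination) pairs that need any:any access."""
    # Full mesh between DCs: every ordered pair of distinct DCs.
    for a, b in permutations(DCS.values(), 2):
        yield (a, b)
    # Every VLAN to every DC, and the reverse direction as well.
    for vlan, dc in product(VLANS.values(), DCS.values()):
        yield (vlan, dc)
        yield (dc, vlan)

flows = list(required_flows())
print(len(flows))  # 4*3 DC pairs + 4*4*2 VLAN/DC pairs = 44
```

Listing the pairs explicitly makes it easier to check each one against the existing policies, rather than reasoning about "all windows vlans" in the abstract.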
round robin != any2any ;-) But let's make it easy for Windows: the last time I made a list of ports that need to be opened for AD to work, it was so long that you might as well open all ports; it's not much different. Our Windows licensing flows should be a nice guide here. If we deny port 3389 (both TCP and UDP) from clients to servers, then I would even say: let's open everything else, clients -> servers and servers -> clients, for the Windows VLANs.
Of course it's fine to have entire zones on both sides (so no address book maintenance has to be done later).
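The proposal above (deny TCP and UDP 3389 from clients to servers, then permit everything else in both directions) is an ordered rule list where the first matching rule wins, so the deny rules must come before the broad permit. A minimal sketch; `Rule`, `RULES`, `evaluate`, and the default-deny fallback are illustrative assumptions, not actual firewall configuration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    action: str              # "permit" or "deny"
    direction: str           # "client->server", "server->client", or "any"
    protocol: Optional[str]  # "tcp", "udp", or None for any protocol
    port: Optional[int]      # destination port, or None for any port

# Hypothetical rule set for the proposal: block RDP (3389) from clients to
# servers first, then open everything else in both directions.
RULES = [
    Rule("deny", "client->server", "tcp", 3389),
    Rule("deny", "client->server", "udp", 3389),
    Rule("permit", "any", None, None),
]

def evaluate(direction: str, protocol: str, port: int) -> str:
    """Return the action of the first matching rule (default deny)."""
    for r in RULES:
        if r.direction not in ("any", direction):
            continue
        if r.protocol is not None and r.protocol != protocol:
            continue
        if r.port is not None and r.port != port:
            continue
        return r.action
    return "deny"

print(evaluate("client->server", "tcp", 3389))  # deny
print(evaluate("client->server", "tcp", 445))   # permit
print(evaluate("server->client", "tcp", 3389))  # permit
```

If the permit rule were listed first, it would shadow the 3389 denies, which is the main ordering mistake to avoid with this kind of "block one thing, open the rest" policy.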
OK, breaking comment #5 into 2 parts.

Part 1:

> Therefore all the DCs need unrestricted communication with each other:
> dc6.releng.ad.mozilla.com 10.22.8.141
> dc7.releng.ad.mozilla.com 10.12.75.6
> dc8.releng.ad.mozilla.com 10.22.8.141
> dc9.releng.ad.mozilla.com 10.12.75.6

Wait, are these IP addresses correct? I get the following:

dc6.releng.ad.mozilla.com has address 10.22.69.18
dc7.releng.ad.mozilla.com has address 10.12.69.19
dc8.releng.ad.mozilla.com has address 10.22.69.21
dc9.releng.ad.mozilla.com has address 10.12.69.20

This also makes more sense given the policies currently in place: these two IP subnets can send anything back and forth. Please let me know?

Thanks
Flags: needinfo?(q)
For SCL1 (10.12.69.0/24) trying to get to SCL3 (10.22.69.0/24), we have the following policy in place:

Policy: ad-windows-all, action-type: permit, State: enabled, Index: 674, Scope Policy: 0
  Policy Type: Configured
  Sequence number: 1
  From zone: dc, To zone: db
  Source addresses:
    nagios1.private.releng.scl3: 10.26.75.30/32
    ad.db.scl1: 10.12.69.0/24
    ad.db.phx1: 10.8.69.0/24
  Destination addresses:
    ad: 10.22.69.0/24
  Application: any
    IP protocol: 0, ALG: 0, Inactivity timeout: 0
      Source port range: [0-0]
      Destination port range: [0-0]
DAMN! Those are the OOB IP addresses; it was a copy-paste error on my part. Your IP addresses are correct.
In the SCL3 (10.22.69.0/24) towards SCL1 (10.12.69.0/24) direction, we also have a policy in place that allows everything. So for part 1 from comment 5, we should be OK.

--------------------------------------------------------------------------

Policy: ad-windows-all, action-type: permit, State: enabled, Index: 103, Scope Policy: 0
  Policy Type: Configured
  Sequence number: 1
  From zone: vpn, To zone: db
  Source addresses:
    ad.db.scl3: 10.22.69.0/24
    ad.db.phx1: 10.8.69.0/24
  Destination addresses:
    ad: 10.12.69.0/24
  Application: any
    IP protocol: 0, ALG: 0, Inactivity timeout: 0
      Source port range: [0-0]
      Destination port range: [0-0]
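Taken together, the two `ad-windows-all` policies quoted above permit any application between 10.12.69.0/24 and 10.22.69.0/24 in both directions. A sketch that checks the cross-site DC pairs against the source and destination address books as transcribed from that output; it models only address matching (zones, applications, and policy ordering are left out), and `POLICIES`, `permitted_by`, and `site` are hypothetical names:

```python
import ipaddress
from itertools import permutations

# Address books transcribed from the two ad-windows-all policies above.
POLICIES = [
    {"index": 674,  # SCL1 -> SCL3 (plus PHX1 and nagios1 as extra sources)
     "src": ["10.26.75.30/32", "10.12.69.0/24", "10.8.69.0/24"],
     "dst": ["10.22.69.0/24"]},
    {"index": 103,  # SCL3 -> SCL1 (plus PHX1 as an extra source)
     "src": ["10.22.69.0/24", "10.8.69.0/24"],
     "dst": ["10.12.69.0/24"]},
]

def matches(ip, networks):
    """True if ip falls inside any of the given CIDR networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(n) for n in networks)

def permitted_by(src, dst):
    """Return the index of the first policy whose address books match, else None."""
    for p in POLICIES:
        if matches(src, p["src"]) and matches(dst, p["dst"]):
            return p["index"]
    return None

def site(ip):
    """The /24 an address lives in; same-/24 pairs never cross these policies."""
    return ipaddress.ip_network(ip + "/24", strict=False)

DCS = ["10.22.69.18", "10.12.69.19", "10.22.69.21", "10.12.69.20"]

for s, d in permutations(DCS, 2):
    if site(s) != site(d):
        print(f"{s} -> {d}: policy {permitted_by(s, d)}")
```

Every cross-site DC pair matches one of the two policies, but a wintry client such as 10.26.44.133 matches neither source list, which is consistent with the client-to-DC policies being the missing piece.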
OK, next issue...

> All of our client machines also need bidirectional communication with all the DCs. The clients do their
> heavy lifting on a local DC but may make lighter requests to any domain controller. So the following
> VLANs need unrestricted communication with each of the above DCs:
> 244 wintry releng.scl3 - 10.26.44.0/22
> 240 wintest releng.scl3 - 10.26.40.0/22
> 236 winbuild releng.scl3 - 10.26.36.0/22
> 40 winbuild scl1 - 10.12.40.0/22 (only for the next few weeks)

Currently there are no such policies in place. So this is basically where I got to on Saturday. If none of these policies are in place, how was this stuff ever working? While we puzzle over that, I'll start the process of creating these policies.
Thank you, Q, for our IRC PM, which got me going on this. It turns out Inventory for SCL1 is foobar'd. However, I believe I have the flows in place for SCL1. Now to start working on SCL3!
Flags: needinfo?(q)
Well, this exploded into a lot of security policies. I believe I put them all into place, but I have to admit I'm pretty confused: if all those policies are required to make stuff work, then I have no idea how any of it could have been working in the past.

Last part:

> In addition, the above VLANs should also have the following communications already open (possibly from
> global rules, and in addition to standard DNS and DHCP rules etc.):
> kms1.ad.mozilla.com 10.22.8.141 - port 1688 tcp/udp
> wds1.releng.ad.mozilla.com 10.22.8.141 - any:any
> wds2.releng.ad.mozilla.com 10.12.75.6 - any:any (only needed for a few more weeks)

The first two IP addresses are the same. Is that correct? For these three IPs, should all the Windows VLANs be able to get to them? And should they be able to get to all of the Windows VLANs?

Thanks
Flags: needinfo?(q)
Once again, I think those are the OOB IPs; my spreadsheet was mis-sorted. They are different machines.
Flags: needinfo?(q)
FYI: this is now blocking our ability to roll out new machines in scl3 (which are trying to talk to the domain controller in scl1). It may also be interfering with our ability to change configurations on other machines in scl3 (bug 1007981). Both of these are blocking Q2 project work.
Blocks: 1014703
Is there something that is not working? Can you tell me which part is not? Thanks.
Flags: needinfo?(arich)
Flags: needinfo?(arich) → needinfo?(q)
Let me nmap from the machines so I have a cohesive answer about what is open and what is not from the client perspective.
Flags: needinfo?(q)
So here is a simple breakdown of an example of our current blocker: a machine in the wintry scl3 VLAN 244 with the IP address 10.26.44.133 can't see any open ports (but does get an ICMP response) on the two DCs in scl1 with IP addresses:

10.12.69.19
10.12.69.20

Making this work would be a good start. Does this make sense?

Q
Flags: needinfo?(dcurado)
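The symptom above (ping works, no TCP ports answer) can also be reproduced without nmap by attempting plain TCP connections from the client. A minimal sketch with Python's standard `socket` module; the port list is an assumption (common ports a Windows client uses to reach a DC), and `tcp_open` and `probe` are hypothetical helpers:

```python
import socket

# Assumed subset of TCP ports a Windows client commonly uses to reach a DC
# (DNS, Kerberos, RPC endpoint mapper, LDAP, SMB, kpasswd, LDAPS, global catalog).
AD_TCP_PORTS = [53, 88, 135, 389, 445, 464, 636, 3268, 3269]

def tcp_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, reset, unreachable, or timed out; a firewall that silently
        # drops packets usually shows up here as a timeout.
        return False

def probe(hosts, ports=AD_TCP_PORTS):
    """Return {host: [ports that accepted a TCP connection]} for each host."""
    return {h: [p for p in ports if tcp_open(h, p)] for h in hosts}
```

Run from a wintry host, e.g. `probe(["10.12.69.19", "10.12.69.20"])`; an empty list for both DCs while ping still answers would match the blocker described above. Unlike nmap's SYN scan, this completes full handshakes, so it is slower but needs no special privileges.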
Q - I have added a policy in SCL1 that allows the flows that seem to be missing. Please let me know if that fixes the problem, and if any other flows are missing? Thanks.
Flags: needinfo?(dcurado) → needinfo?(q)
Looking better; let me iterate and make sure things work.

Scanning dc9.releng.ad.mozilla.com (10.12.69.20) [1000 ports]
Discovered open port 135/tcp on 10.12.69.20
Discovered open port 445/tcp on 10.12.69.20
Discovered open port 139/tcp on 10.12.69.20
Discovered open port 3389/tcp on 10.12.69.20
Discovered open port 49155/tcp on 10.12.69.20
Discovered open port 3269/tcp on 10.12.69.20
Discovered open port 3268/tcp on 10.12.69.20
Discovered open port 49159/tcp on 10.12.69.20
Discovered open port 464/tcp on 10.12.69.20
Discovered open port 5666/tcp on 10.12.69.20
Discovered open port 88/tcp on 10.12.69.20
Discovered open port 636/tcp on 10.12.69.20
Discovered open port 389/tcp on 10.12.69.20
Discovered open port 593/tcp on 10.12.69.20
Discovered open port 49158/tcp on 10.12.69.20
Discovered open port 49154/tcp on 10.12.69.20
Completed SYN Stealth Scan at 10:37, 4.06s elapsed (1000 total
Flags: needinfo?(q)
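To compare scans before and after a policy change, the "Discovered open port" lines can be pulled into a sorted list per host. A small sketch with Python's `re` module; it assumes the line format shown in the nmap output above (it also works if the lines were run together on one line), and `open_ports` is a hypothetical helper:

```python
import re
from collections import defaultdict

# Matches fragments such as "Discovered open port 445/tcp on 10.12.69.20".
PATTERN = re.compile(r"Discovered open port (\d+)/(tcp|udp) on (\S+)")

def open_ports(nmap_text):
    """Map each host to a sorted list of (port, protocol) tuples reported open."""
    found = defaultdict(set)
    for port, proto, host in PATTERN.findall(nmap_text):
        found[host].add((int(port), proto))
    return {host: sorted(ports) for host, ports in found.items()}

sample = """Discovered open port 135/tcp on 10.12.69.20
Discovered open port 445/tcp on 10.12.69.20
Discovered open port 3389/tcp on 10.12.69.20
Discovered open port 88/tcp on 10.12.69.20"""

ports = open_ports(sample)
print(ports)  # {'10.12.69.20': [(88, 'tcp'), (135, 'tcp'), (445, 'tcp'), (3389, 'tcp')]}
```

Diffing two such dictionaries shows exactly which ports a policy change opened or closed; note that 3389/tcp is reachable in this scan, which the earlier proposal to deny 3389 from clients to servers would close.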
OK, there were a lot of interruptions today, but I just completed another run through of checking for any missing policies. We should be OK, but please let me know if you still have any trouble? Thanks.
Flags: needinfo?(q)
Thanks, Dave, things look good so far. I still need to check the wintest VLAN, but wintry is much happier during the last move train.
Flags: needinfo?(q)
Going to close this, please re-open if there are problems found. Thanks -- Dave
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard