Closed Bug 1308980 Opened 8 years ago Closed 7 years ago

enable iptables for signing infra

Categories

(Release Engineering :: Release Automation: Other, defect, P2)


Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: mozilla, Assigned: sfraser)

References

Details

Attachments

(5 files)

We will likely want iptables on for most or all of our applicable infra at some point. However, for the moment, let's target signing servers. We can disable SMTP from non-127.0.0.1 on these boxes and limit ssh and nrpe to internal IP ranges, at the least.
We should also look at locking down outbound connections as much as possible. I imagine there are some easy wins there, like disabling outbound ssh?
Thanks for including me on this. I did a lot of thinking about this in Q3 on a general strategy to limit pivoting risk for services. Here's the TLDR...

1.) Go with a default DROP policy for the INPUT/OUTPUT chains; this means that nothing gets in/out without an explicit rule.
2.) Allow inbound SSH from admin hosts and the VPN network range (nothing else); this means that someone needs to own a privileged system before they can even talk to the signing server.
3.) Allow any other inbound services you need to support the application. If it's a pull operation, you don't need to open anything.
4.) Be very explicit with any outbound rules (eg. try to limit them to proxy or other support services like DNS and NTP).

I think if you are willing to subscribe to the above general principles, you'll have a really locked down policy in place. I have some Ansible scripts we used for the HTTP Observatory that I could share as a starting point here. I'm also happy to advise on iptables-related stuff if you're interested; I used to work with it a lot in a past life.
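As a concrete illustration, a minimal skeleton of those four principles might look like the sketch below. All the variables are hypothetical placeholders, and in practice the rules would be loaded atomically with iptables-restore so a remote session isn't cut off part-way through.

# Sketch only: default-DROP skeleton for principles 1-4 above.
# ADMIN_HOSTS, VPN_RANGE, DNS_SERVER and NTP_SERVER are placeholders.
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP
# loopback and already-established traffic
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED -j ACCEPT
# (2) inbound ssh from admin hosts and the VPN range only
iptables -A INPUT -p tcp -s "${ADMIN_HOSTS}" --dport 22 -m conntrack --ctstate NEW -j ACCEPT
iptables -A INPUT -p tcp -s "${VPN_RANGE}" --dport 22 -m conntrack --ctstate NEW -j ACCEPT
# (3) any other inbound services the application needs would go here
# (4) explicit outbound: support services only
iptables -A OUTPUT -p udp -d "${DNS_SERVER}" --dport 53 -j ACCEPT
iptables -A OUTPUT -p udp -d "${NTP_SERVER}" --dport 123 -j ACCEPT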
So with a locked down policy, we're looking at something like the below? It's missing some information, notably:
- the IP addresses of the taskcluster API
- the IP ranges where we store artifacts
- do we want to allow pings outbound for diagnostics?
- what other nagios checks are done that don't use nrpe?

# loopback is fine
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# established connections - do we want these rules, or specify by port as below?
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED -j ACCEPT

# inbound ssh
# openvpn2-pool1 10.22.232.0/22
# openvpn2-pool2 10.22.236.0/22
# openvpn2.corpdmz.scl3 10.22.72.132/32
# - source: :justdave
VPN_RANGES=10.22.232.0/22,10.22.236.0/22,10.22.72.132/32
# sysadmin admin hosts in scl3, source :justdave
ADMIN_RANGES=10.22.75.5,10.22.74.6,10.22.75.7
iptables -A INPUT -p tcp -s ${VPN_RANGES} --dport 22 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT -p tcp -s ${ADMIN_RANGES} --dport 22 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp --sport 22 -m conntrack --ctstate ESTABLISHED -j ACCEPT

# pinging the server is allowed (see nagios)
iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT
iptables -A OUTPUT -p icmp --icmp-type echo-reply -j ACCEPT

# allow outgoing dns connections
iptables -A OUTPUT -p udp -o eth0 --dport 53 -j ACCEPT
iptables -A INPUT -p udp -i eth0 --sport 53 -j ACCEPT

# allow outbound email to specific range
# smtp.scl3.mozilla.org
MAIL_TRANSPORT=63.245.214.155,63.245.214.156
iptables -A INPUT -i eth0 -p tcp --sport 25 -m state --state ESTABLISHED -j ACCEPT
iptables -A OUTPUT -o eth0 -p tcp --dport 25 -d ${MAIL_TRANSPORT} -m state --state NEW,ESTABLISHED -j ACCEPT

# nagios querying nrpe
NAGIOS_SERVER=
iptables -I INPUT -p tcp --dport 5666 -s ${NAGIOS_SERVER} -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -I OUTPUT -p tcp --sport 5666 -m state --state ESTABLISHED -j ACCEPT

# Taskcluster communication
TASKCLUSTER_ADMIN=
ARTIFACT_STORAGE=
iptables -I INPUT -p tcp --dport 443 -s ${TASKCLUSTER_ADMIN} -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -I OUTPUT -p tcp --sport 443 -m state --state ESTABLISHED -j ACCEPT

# default drop
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP
Simon: The lines where you have explicit inbound/outbound ESTABLISHED keywords (beyond the wildcard one near the top) end up being redundant. For example, I think this...

iptables -A INPUT -p tcp -s ${VPN_RANGES} --dport 22 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT -p tcp -s ${ADMIN_RANGES} --dport 22 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp --sport 22 -m conntrack --ctstate ESTABLISHED -j ACCEPT

...can safely be replaced with this, and have the same effect, because you already allow established traffic inbound and outbound higher in the ruleset...

iptables -A INPUT -p tcp -s ${VPN_RANGES} --dport 22 -m conntrack --ctstate NEW -j ACCEPT
iptables -A INPUT -p tcp -s ${ADMIN_RANGES} --dport 22 -m conntrack --ctstate NEW -j ACCEPT

There are a couple of occurrences that match that pattern which I think could be simplified. I'd also recommend that you restrict traffic groups to their respective internal servers (ie. DNS, Nagios NRPE). Overall, this looks pretty locked down; I like it.
Agreed about the established connections; in parallel to you writing that, we were having a similar discussion. It will definitely make life simpler when reading the rules.

It's looking increasingly like we won't be able to narrow down the outbound https traffic to anything more specific than 'all of AWS/S3', so the current plan is to just allow https outbound and make sure it's logged, as the number of such connections should be low.

When you say 'restrict traffic groups', do you mean, for example, making sure that DNS queries only go to our own resolvers? (I'm adding a note here that I forgot about NTP, and need to add it, too.)
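If that is the meaning, a sketch of restricting DNS and NTP to internal servers might look like the below. The addresses are hypothetical placeholders, not the real scl3 resolvers.

# Sketch: only allow DNS/NTP out to our own servers; placeholder addresses.
INTERNAL_DNS=10.0.0.2,10.0.0.3
INTERNAL_NTP=10.0.0.4
iptables -A OUTPUT -p udp -d ${INTERNAL_DNS} --dport 53 -m conntrack --ctstate NEW -j ACCEPT
iptables -A OUTPUT -p udp -d ${INTERNAL_NTP} --dport 123 -m conntrack --ctstate NEW -j ACCEPT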
I think this covers the puppet side. Looking for opinions on whether it looks reasonable. I will ask for a loaner and get the rules applied there to test them.
Attached patch iptables.diff (deleted) — Splinter Review
I think this captures most of the requirements, except the 'drop all outbound' - I can get that working with iptables commands, but not with puppet. Something goes awry.
Assignee: nobody → sfraser
Priority: -- → P2
Aki, what's your opinion of the diff above in Comment 7? Does it look appropriate?
Flags: needinfo?(aki)
I've barely touched iptables ever, other than turning it off way back in the day. So if it works and passes :claudijd's eyeball test, we're probably good.

"Works" depends on the machine type... signing servers will need to be able to accept token, signature, and download requests from the buildbot and signing subnet ranges, and may need to reach microsoft servers for authenticode signing? (Really fuzzy memory here.) Plus puppet. Signing scriptworkers will need to be able to speak to the signing servers, internal pypi/puppet, taskcluster, azure, s3... that may be it, or I may be forgetting something.

(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #7)
> Created attachment 8839418 [details] [diff] [review]
> iptables.diff
>
> I think this captures most of the requirements, except the 'drop all
> outbound' - I can get that working with iptables commands, but not with
> puppet. Something goes awry.

Does this patch affect puppet connectivity? Internal pypi, internal yum repos, puppet commands. Maybe this is better rephrased: "once you get it working with iptables commands, does `puppet agent --test` still work?"

I'm not sure if the goal here is to limit everything and then open what's needed, or to limit what we know we can limit without breaking things. I think I was leaning towards the latter, and comment #2 seems to imply the former. If we're going that route, we'll have to identify the ports and IP ranges we have to enable to keep things working for each host type before applying.

Are we able to log the traffic that gets through, so we can narrow down what network access is used during normal operation? That might help us identify what's needed without going through a break-everything-then-fix-things-one-by-one phase.
Flags: needinfo?(aki)
It shouldn't affect puppet connectivity at all - the loaner I've been testing on can still go outbound to all the usual places; we're just logging it now, which will help us identify normal traffic. The initial goal was to limit everything and then open what's needed, even outbound, but I'm struggling a bit with puppet and default chain rules, so rather than hang on waiting for a solution to appear, we can get this step done first.
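For reference, the kind of logging described here can be done with a non-terminating LOG rule placed ahead of the port-specific accepts; a minimal sketch (the prefix string is arbitrary):

# Sketch: log new outbound connections so normal traffic can be characterised.
# LOG is non-terminating, so matching packets continue down the chain and are
# still accepted or dropped by the later rules / chain policy.
iptables -A OUTPUT -m conntrack --ctstate NEW -j LOG --log-prefix "iptables-out: "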
Does this seem an appropriate set up for iptables, Jonathan?
Flags: needinfo?(jclaudius)
:sfraser - it's sort of painful to look at the rules in the puppet syntax; any chance you could apply this to a test host and send me the results of `iptables-save`? I'll probably be able to provide better analysis of it in that format. Also, when working on similar approaches in ansible, it's much nicer if the rules are set up as a flat file and then moved into place. It requires slightly more setup to handle CentOS/Ubuntu variances, but the readability factor goes way, way up.
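For context, `iptables-save` emits the whole ruleset as one flat file; a short, hypothetical excerpt of the format (not the actual signing-server rules):

# Hypothetical excerpt of `iptables-save` output, for illustration only.
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 10.22.232.0/22 -p tcp -m tcp --dport 22 -j ACCEPT
COMMIT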
Flags: needinfo?(jclaudius) → needinfo?(sfraser)
Do we have ansible? I've been using the approach the others have used, which is why it's using the puppet firewall module. The file does seem more reliable, but may surprise people. My loaner has disappeared again, so I'll see if I can get it back for the iptables output.
Flags: needinfo?(sfraser)
:sfraser - I guess what I was saying was that maybe there is a way to land files in puppet similar to a pattern I've used in ansible, and keep the iptables formatting, which is pretty universal. I get the sense the ops/config for this system is puppet through and through, and I'm not trying to challenge that. So maybe it's just asking puppet to make sure iptables is running, that the iptables rules file is plopped in the right directory location, and that puppet is told not to manage the config with the puppet firewall module?
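A sketch of that pattern in puppet, assuming a CentOS-style layout; the class name and paths are illustrative, not the actual patch:

# Sketch: puppet ensures iptables is installed and running, and ships the
# rules as a flat iptables-save file instead of using the firewall module.
class iptables {
    package { 'iptables':
        ensure => installed,
    }
    file { '/etc/sysconfig/iptables':
        source  => 'puppet:///modules/iptables/sysconfig.iptables',
        owner   => 'root',
        group   => 'root',
        mode    => '0600',
        require => Package['iptables'],
        notify  => Service['iptables'],
    }
    service { 'iptables':
        ensure => running,
        enable => true,
    }
}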
Attached patch iptables.diff (deleted) — Splinter Review
Have reworked puppet so it copies down a more standard file. Does this look more maintainable?
Attachment #8845900 - Flags: feedback?(jclaudius)
:sfraser - Very nice! I really appreciate the effort to make that happen. I'm able to spot a couple of issues right off because of the format change.

1.) You can probably delete the two lines below. Your OUTPUT chain is set to default ACCEPT and your INPUT chain is set to DROP, but you have an accept-ESTABLISHED/RELATED rule on the INPUT chain. You also shouldn't be expecting new inbound connections toward a signing box for NTP over TCP; they would only be outbound, and the replies would then be caught by the accept-ESTABLISHED/RELATED rule on the INPUT chain.

+-A INPUT -p tcp -m tcp --dport 123 -j ACCEPT <= remove me?
+-A OUTPUT -p tcp -m tcp --sport 123 -j ACCEPT <= remove me?

2.) You might consider binding your inbound UDP traffic for DNS/NTP to specific servers; this would prevent you from accidentally hosting NTP services locally on the signing instance and having them accessible to anyone.

+-A INPUT -p udp -m udp --dport 123 -j ACCEPT <= Bind to src server(s)?

3.) I believe the following lines might be redundant, because the default ACCEPT on your OUTPUT chain already makes this allowed traffic. Seeing these makes me wonder if perhaps this rule set started out as a default DROP and was then ported to a default ACCEPT on the OUTPUT chain. If you truly want a default ACCEPT on the OUTPUT chain, I would delete these lines. If you want a default DROP on the OUTPUT chain, let me know and I can re-evaluate the entire policy with that expectation. The difference between the two approaches is that the default ACCEPT means fewer rules but more default permissiveness outbound, whereas the default DROP requires us to be explicit about anything we want to allow outbound.

+-A OUTPUT -s 63.245.214.155/32 -o eth0 -p tcp -m tcp --dport 25 -m state --state NEW -j ACCEPT
+-A OUTPUT -s 63.245.214.156/32 -o eth0 -p tcp -m tcp --dport 25 -m state --state NEW -j ACCEPT
You're right that the default rule started out as DROP; I'd intended to leave the rules in the puppet config as allows in case someone changed that later on. This probably isn't needed, as the iptables rules file is easier to follow than the puppet config, so mistakes are less likely with later modification. I think we do eventually want a default DROP outbound, so if you have advice about that approach it would be helpful. For this stage, I will remove the NTP outbound, bind NTP inbound to specific servers, and comment out the SMTP outbound accepts (because this saves us having to look up the information again once we become more restrictive).
Attached patch iptables.diff (deleted) — Splinter Review
Updated with suggestions above.
Attachment #8846638 - Flags: review?(jlorenzo)
Comment on attachment 8846638 [details] [diff] [review]
iptables.diff

Review of attachment 8846638 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me! I spotted some minor things, which can be fixed in a follow-up. Like you said on IRC, I haven't looked too much at the iptables rules, as they were already discussed above.

::: modules/iptables/manifests/init.pp
@@ +2,5 @@
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +class iptables (
> +    $config = 'puppet:///modules/iptables/sysconfig.iptables'

Is this file missing from the patch? If not, I'd avoid defining a default value for now.

@@ +5,5 @@
> +class iptables (
> +    $config = 'puppet:///modules/iptables/sysconfig.iptables'
> +    ) {
> +    # compatibility check..
> +    case $operatingsystem {

What safety does this check add? I see `iptables::settings` and `packages::iptables` fail if we're not on CentOS/Ubuntu.

@@ +31,5 @@
> +    enable     => true,
> +    hasrestart => true,
> +    hasstatus  => true,
> +    require    => [
> +        Class['packages::iptables'],

I'm not sure, but I'd move this requirement to the file. I don't know if puppet is smart enough, but the file may be written before the package is installed, which can be reported as an error/a warning by a package manager.

@@ +32,5 @@
> +    hasrestart => true,
> +    hasstatus  => true,
> +    require    => [
> +        Class['packages::iptables'],
> +        File[$iptables::settings::conf]

This line is redundant with the notify at line 25.
https://docs.puppet.com/puppet/3.7/lang_relationships.html#relationship-metaparameters

::: modules/packages/manifests/iptables.pp
@@ +10,5 @@
> +        ensure => latest,
> +    }
> +}
> +CentOS: {
> +    case $::operatingsystemmajrelease {

TIL $::operatingsystemmajrelease :)
Attachment #8846638 - Flags: review?(jlorenzo) → review+
Attached patch iptables2.diff (deleted) — Splinter Review
Have made the puppet changes suggested by jlorenzo. aki: when would be a good time to apply this to the signing scriptworkers? Would you want to canary one of them first?
Flags: needinfo?(aki)
We should probably wait until later this week, when beta2 builds are out of the way. And yeah, apply to a single signing server and verify that it still works (inbound ssh at the least, to make sure we haven't locked ourselves out; possibly the scriptworker integration tests); we can then apply to all 4 and rerun some signing tasks. The signing servers use ports 9100, 9110, and 9120; will these outbound ports be blocked? https://hg.mozilla.org/build/puppet/file/tip/modules/signing_scriptworker/templates/passwords.json.erb
Flags: needinfo?(aki)
Outbound isn't going to be blocked at this stage. I'll get the puppet changes ready to just apply to worker 1, and go from there once the potential impact has lowered.
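When outbound does get locked down, the signing-server ports from the previous comment would need explicit allows; a hypothetical sketch, where SIGNING_SERVERS is a placeholder for the signing-server subnet:

# Sketch for a future default-DROP OUTPUT policy.
# SIGNING_SERVERS is a hypothetical placeholder range.
for port in 9100 9110 9120; do
    iptables -A OUTPUT -p tcp -d "${SIGNING_SERVERS}" --dport "${port}" -m conntrack --ctstate NEW -j ACCEPT
done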
We can do that via puppet env pinning [1], though other options are ok as well. [1] https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/HowTo/Set_up_a_user_environment#Pinning
Right now it looks like we have beta go-to-builds scheduled for Thursday, Monday, Thursday over the next couple weeks. Once we have a puppet patch, we can test in a puppet env, against a single signing scriptworker, outside of those windows, and then roll out.
There's a puppet patch as an attachment, above, and the 'sfraser' environment on releng-puppet2 has it applied already.
Chemspills got in the way this week. We can try for next week.
1) Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Syntax error at ','; expected '}' at /etc/puppet/environments/aki/modules/iptables/manifests/settings.pp:9 on node dev-linux64-ec2-aki.dev.releng.use1.mozilla.com
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

Fixed this by removing trailing commas (shakes fist at puppet).

2) modules/iptables/manifests/init.pp - ERROR: trailing whitespace found on line 9 - trailing_whitespace

This also breaks the travis tests.

3) Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for dev-linux64-ec2-aki.dev.releng.use1.mozilla.com
Error: /Stage[main]/Main/Resources[firewall]: Failed to generate additional resources using 'generate': Invalid address from IPAddr.new: conntrack
Info: Applying configuration version 'heads/macsigning'

which looks like https://ask.puppet.com/question/4461/firewall-invalid-address-from-ipaddrnew/ and https://tickets.puppetlabs.com/browse/MODULES-1552

Otherwise it seems to apply, and still allows incoming ssh (from the vpn) and the integration tests. However, I'm not sure how serious the IPAddr.new error is. :sfraser, can you address the above issues?
Flags: needinfo?(sfraser)
The first two are easy enough, and it sounds like you've got an environment with those changes in. The third, though: this iptables module isn't using the firewall module. It's just copying down a set of iptables rules and making sure iptables has reloaded. Which firewall components is your dev box getting?
Flags: needinfo?(sfraser) → needinfo?(aki)
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #28)
> First two are easy enough, and it sounds like you've got an environment with
> those changes in - the third, though. This iptables module isn't using the
> firewall module. It's just copying down a set of iptables rules and making
> sure iptables has reloaded. Which firewall components is your dev box
> getting?

I'm not entirely sure, but it looks like the only place 'conntrack' is specified in my puppet repo is the new modules/signing_scriptworker/files/sysconfig.iptables ... so I'd guess that's the culprit. Is this working on your loaner?
Flags: needinfo?(aki)
It was working, although the loaner has gone away again. The entries with conntrack all look like valid network or host addresses to me - can you spot an error with them?
The workaround at the bottom of https://ask.puppet.com/question/4461/firewall-invalid-address-from-ipaddrnew/ appears to say iptables-save was the culprit for them: "After a simple iptables -t nat --flush the module just works. So it seems that the ! in the iptables-save output is the problem..." I don't see an `!` in the iptables file, but it's possible something else is wrong. Could you take a look? I personally think it's better to fix these things before we roll out to production, rather than after.
I had a look and couldn't see anything wrong; I was hoping you could see something. I don't get the error when copying the file into place and restarting iptables, or when using iptables-save. "Error: /Stage[main]/Main/Resources[firewall]" indicates it's in the firewall module, which only seems to be included in the signingserver module - is that also being applied?
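One hedged guess: `Resources[firewall]` in the error path is puppet's resources metatype, which shows up when a manifest asks puppet to purge unmanaged firewall rules. That makes the puppetlabs-firewall module enumerate and parse the host's live iptables-save output (including these conntrack rules) even on hosts that never declare firewall resources themselves. The pattern to search the manifests for would be something like:

# Hypothetical pattern to look for: a purge declaration like this anywhere
# in the applied catalog makes the firewall module try to parse every
# existing iptables rule on the host.
resources { 'firewall':
    purge => true,
}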
Comment on attachment 8845900 [details] [diff] [review] iptables.diff Looks good to me, sorry for the delay.
Attachment #8845900 - Flags: feedback?(jclaudius) → feedback+
:dividehex, does the new fw / iptables work mean we can resolve this bug? Or is there still work to do?
Flags: needinfo?(jwatkins)
(In reply to Aki Sasaki [:aki] from comment #35)
> :dividehex, does the new fw / iptables work mean we can resolve this bug? Or
> is there still work to do?

:sfraser pointed me to this bug a while back, so I was able to use some of the ideas here. But overall, this is superseded by all the work done last quarter. At this time, almost all releng hosts are utilizing host-based firewalls with explicit ingress rules and a default deny policy, except for a few one-offs that are being handled in other ways (eg. decomm).

I think the only work that was not covered was defining egress rules with default deny. All datacenter hosts have an allow-all-out policy, while AWS has some limited outgoing policies. I'm not sure if achieving that is anywhere on the priority list at the moment, so I'm going to call this done for now. If that isn't the case, we should probably open a new bug, since there is an entirely new framework for defining firewall rules in puppet, and it covers both iptables and the OS X pf.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(jwatkins)
Resolution: --- → WORKSFORME