Kubernetes the not so hard way with Ansible - Harden the instances - (K8s v1.28)
Introduction
In part 1 I created nine hosts for etcd, the Kubernetes controller and worker nodes and gave a first introduction to Ansible configuration management. Before I install all the Kubernetes services, the hosts should be hardened a little bit and a firewall should be set up. While my Kubernetes roles by default try to bind all Kubernetes services to the VPN interface (I'll use WireGuard, but you can also use PeerVPN, OpenVPN or whatever VPN solution you want), it still makes sense to have a firewall in place just in case.
Prepare DNS
I'll use two different DNS entries for every host. As you may have noticed in part 1 of this tutorial, I have files like k8s-010101.i.example.com in the host_vars directory. k8s-010101.i.example.com is the DNS entry for the first etcd node (see previous blog post). The .i. in this hostname means internal. I'll use the WireGuard IP for this DNS record. All my Kubernetes Ansible roles will use the host names specified in Ansible's hosts file.
This ensures that all Kubernetes services communicate only via the VPN connection. It doesn't matter what kind of VPN software or VPC (like AWS VPC or Google Cloud Network) you use. Just make sure that the connection between the hosts is encrypted. In my case this is ensured by WireGuard, as mentioned. The important part here is that the host names you use in Ansible's hosts file point to the internal (WireGuard) IP of your VM. If you use AWS VPC or Google's Cloud Networking you don't really need a fully meshed VPN solution like WireGuard, as you can already configure such a VPC so that all the hosts can talk to each other. In this case you can skip installing WireGuard.
One hint about Hetzner Cloud Networks: just because they are private doesn't mean that they are secure 😉 Traffic is NOT encrypted in that case. So for secure communication you still need something like WireGuard. This might also be true for other providers, so please check their documentation. If you can ensure that all your traffic is encrypted via TLS, then a VPN might not be needed either.
I also have DNS records for the public IP of each host, e.g. k8s-010101.p.example.com. The .p. means public in my case. I use these DNS records to provide the value of Ansible's ansible_host variable. E.g. for the host k8s-010101.i.example.com there is a file host_vars/k8s-010101.i.example.com which contains (besides other entries) this entry:
ansible_host: "k8s-010101.p.example.com"
So all Ansible roles and playbooks will use the internal DNS entry. If Ansible needs to connect to the host k8s-010101.i.example.com via SSH, e.g. to manage it, it uses the value of ansible_host. That is the public DNS record of the host (k8s-010101.p.example.com in my case).
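To make the split between inventory name and connection target concrete, here is a tiny Python model of the rule described above. It is not Ansible's code. Ansible connects to ansible_host if that variable is set, and otherwise to the inventory hostname itself. The dictionary stands in for the host_vars file.

```python
def connection_target(inventory_hostname, host_vars):
    """Return the address Ansible would SSH to for a host (simplified model).

    Rule from the text: use ansible_host when it is set,
    otherwise fall back to the inventory hostname.
    """
    return host_vars.get("ansible_host", inventory_hostname)


# Illustrative content of host_vars/k8s-010101.i.example.com
host_vars = {"ansible_host": "k8s-010101.p.example.com"}

# Roles and playbooks refer to the internal name ...
inventory_hostname = "k8s-010101.i.example.com"
# ... but the SSH connection goes to the public DNS record.
print(connection_target(inventory_hostname, host_vars))  # k8s-010101.p.example.com
# A host without ansible_host is reached via its inventory name.
print(connection_target("k8s-010102.i.example.com", {}))  # k8s-010102.i.example.com
```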
One of the reasons for this is that I'll set up the WireGuard VPN via Ansible too, and it simply doesn't exist yet 😉 So I need the public IP of the host to manage it via Ansible. On the other hand, the Kubernetes-internal communication should only use the WireGuard VPN, and therefore I use the internal DNS records in Ansible's hosts file. My Ansible roles use these internal host names in various places. For example, they build the list of hosts that form the etcd cluster, or tell kube-proxy how to connect to kube-apiserver.
Here is an example of how host_vars/k8s-010101.i.example.com could look:
---
ansible_host: "k8s-010101.p.example.com"
wireguard_address: "10.0.11.2/24"
wireguard_port: 51820
wireguard_endpoint: "{{ ansible_host }}"
wireguard_persistent_keepalive: "30"
The wireguard_* variables are explained further below, so just skip them for a moment. Ansible uses the ansible_host value when it connects to that host via SSH, and that's the public IP of that host. WireGuard will also use this value to connect all the hosts together. Some other variables like ansible_port are already set in the all group of Ansible's hosts file, as you might remember from the previous blog post. So make sure you have configured all the settings for all the hosts in host_vars before proceeding. Of course you can also keep all the host variables in the hosts file, but it'll get pretty overloaded once more roles and variables are added. So I'll only specify the connection settings in the hosts file.
The IP addresses of the Virtual Machines' interfaces will be in the network range 192.168.11.0 - 192.168.11.255 and the WireGuard interface will be in 10.0.11.0 - 10.0.11.255. So if a host has the IP 192.168.11.2, it'll have a WireGuard interface (by default called wg0) with the IP 10.0.11.2. The last octet of both interface IPs is therefore always the same.
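The "same last octet" convention is easy to check mechanically. Below is a minimal Python sketch using the standard ipaddress module. The two network constants are the ranges given above, and wireguard_ip is a hypothetical helper, not part of any role.

```python
import ipaddress

VM_NET = ipaddress.ip_network("192.168.11.0/24")  # VM interface range
WG_NET = ipaddress.ip_network("10.0.11.0/24")     # WireGuard (wg0) range


def wireguard_ip(vm_ip):
    """Map a VM IP to its wg0 IP by keeping the host part (last octet)."""
    addr = ipaddress.ip_address(vm_ip)
    if addr not in VM_NET:
        raise ValueError(f"{vm_ip} is not in {VM_NET}")
    host_part = int(addr) - int(VM_NET.network_address)
    return str(WG_NET.network_address + host_part)


print(wireguard_ip("192.168.11.2"))  # 10.0.11.2
print(wireguard_ip("192.168.11.9"))  # 10.0.11.9
```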
Ansible’s hosts file and host variables
To make the Kubernetes nodes a little bit more secure, we'll add some basic security settings, and of course I'll use Ansible to deploy them. The Ansible directory layout best practice tells you to have separate inventory files for production and staging servers. Since I don't have a staging system I just have one hosts file that looks like this:
all:
  hosts:
    k8s-01[01:03][01:03].i.example.com:
    k8s-01-ansible-ctrl.i.example.com:
  vars:
    ansible_port: "22"
    ansible_user: "deploy"
    ansible_become: true
    ansible_become_method: "sudo"
    ansible_python_interpreter: "/usr/bin/python3"
vpn:
  hosts:
    k8s-01[01:03][01:03].i.example.com:
k8s_kubectl:
  hosts:
    k8s-01-ansible-ctrl.i.example.com:
      ansible_connection: "local"
k8s_ca:
  hosts:
    k8s-01-ansible-ctrl.i.example.com:
      ansible_connection: "local"
k8s_etcd:
  hosts:
    k8s-01[01:03]01.i.example.com:
k8s_controller:
  hosts:
    k8s-01[01:03]02.i.example.com:
k8s_worker:
  hosts:
    k8s-01[01:03]02.i.example.com:
    k8s-01[01:03]03.i.example.com:
k8s_all:
  children:
    k8s_etcd:
    k8s_controller:
    k8s_worker:
Note: ansible-inventory --list is a very handy command to see which hosts belong to which group and which variables are set per host.
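The k8s-01[01:03][01:03] style patterns in the hosts file above expand to concrete host names. The Python sketch below is a simplified model of that expansion and of how the k8s_all children combine. It is not Ansible's own inventory parser. It only handles numeric [start:end] ranges like the ones used here and keeps the zero padding of the start value.

```python
import itertools
import re

RANGE = re.compile(r"\[(\d+):(\d+)\]")


def expand(pattern):
    """Expand numeric [start:end] ranges in a host pattern (simplified)."""
    parts = RANGE.split(pattern)
    # split() with two groups yields: text, start, end, text, start, end, ..., text
    texts = parts[0::3]
    ranges = []
    for start, end in zip(parts[1::3], parts[2::3]):
        width = len(start)  # keep leading zeros, e.g. "01"
        ranges.append([str(n).zfill(width) for n in range(int(start), int(end) + 1)])
    hosts = []
    for combo in itertools.product(*ranges):
        name = texts[0]
        for value, text in zip(combo, texts[1:]):
            name += value + text
        hosts.append(name)
    return hosts


groups = {
    "k8s_etcd": expand("k8s-01[01:03]01.i.example.com"),
    "k8s_controller": expand("k8s-01[01:03]02.i.example.com"),
    "k8s_worker": expand("k8s-01[01:03]02.i.example.com")
    + expand("k8s-01[01:03]03.i.example.com"),
}
# k8s_all has no hosts of its own, only the union of its children.
k8s_all = sorted(set().union(*groups.values()))

print(len(expand("k8s-01[01:03][01:03].i.example.com")))  # 9
print(len(k8s_all))  # 9
print("k8s-010202.i.example.com" in groups["k8s_worker"])  # True
```

Note how the controller hosts (…02) show up in both k8s_controller and k8s_worker. The next paragraphs explain why.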
Adjust the file to your needs of course! Every group starts with the group name (e.g. vpn), which is used to classify the hosts, so to say. Then you have the hosts key and below it the hosts that belong to that group. If needed, you can also add variables for every host or host group. That is the case for the all, k8s_kubectl and k8s_ca groups.
The first entry is the host group all. all is one of the two default groups in Ansible that always exist. Since all my Virtual Machines will have the same SSH connection settings, all related settings are defined here. So all really affects ALL hosts in the hosts file. The variable settings below vars will be applied to ALL hosts, but they can be overridden per host or per host group further down if needed.
But I also wanted to have a group that only targets the Virtual Machines that compose the Kubernetes cluster (and only those hosts). For this purpose I've defined a host group called k8s_all. As you can see above, it has a children entry with the host groups that compose the Kubernetes cluster (for more information see Grouping groups: parent/child group relationships).
A special variable called ansible_python_interpreter is also set for the all group. It specifies the Python path on the target host. In the Ubuntu Cloud Images the Python interpreter is called python3 and not python by default. I also configured Ansible to connect to port 22 (for now) and to log in to the remote hosts as user deploy (change accordingly). To enable Ansible to execute tasks that require “root” permissions I specify ansible_become: true and ansible_become_method: "sudo". This of course implies that the deploy user must have sudo permissions and can run sudo without a password. For more information see How to build your inventory.
Some hosts also have ansible_connection: "local" set. This is because the tasks for these hosts should be executed on the Ansible Controller node, which is my laptop in my case.
Once the Virtual Machines are ready, the DNS entries are created, the host_vars are set correctly and the hosts file is in place, an Ansible ping should already work. This tests whether the SSH connection to the hosts works in general, whether Python works, and some other things. E.g.:
ansible -m ping all
k8s-010201.i.example.com | SUCCESS => {
"changed": false,
"ping": "pong"
}
k8s-010202.i.example.com | SUCCESS => {
"changed": false,
"ping": "pong"
}
k8s-010102.i.example.com | SUCCESS => {
"changed": false,
"ping": "pong"
}
...
Note: One might wonder why the controller hosts k8s-01[01:03]02.i.example.com are part of the k8s_worker group. As you'll see later, the controller nodes won't run “normal” workloads. But they need to be part of the “Kubernetes network” (there is no such thing actually 😉). There are situations where kube-apiserver needs to communicate with a webhook, for example. These hooks normally run as Pods. So the kube-apiserver, which runs on a controller node, needs to know how to reach that Pod. These Kubernetes networking needs are handled by Cilium in my case. That's the Kubernetes network plugin I'm using (there are many more, of course). To make kube-apiserver to webhook communication possible, we therefore need Cilium on the controller nodes too, not just on the worker nodes. But we'll make sure later that only important workloads like Cilium run on the controller nodes. This can be accomplished with so-called Taints.
As you can see, you can use ranges like k8s-01[01:03]... instead of specifying every node. Let's talk a little bit more about the host groups in the hosts file mentioned above:
The Ansible hosts groups
The host group vpn includes all hosts that will be part of my fully meshed WireGuard VPN. Fully meshed means that every host can talk to every other host in this group. This basically means all nine Virtual Machines.
The k8s_kubectl group (well, in this case there is only one host in the group…) is used to execute the Kubernetes control utility called kubectl later. Once kubectl is configured correctly (that happens later), you can run it directly in the shell on your workstation/laptop/continuous deployment host (e.g. Jenkins). You can also run it through Ansible. Ansible then either executes kubectl locally (if the shell or command module is used) or talks to the K8s API directly via Ansible's k8s module(s). As the Ansible Controller node is my laptop (where the ansible or ansible-playbook commands get executed), I added the setting ansible_connection: "local". kubectl searches for its configuration in $HOME/.kube/config by default (the k8s module also uses this config file by default). So if Ansible starts kubectl as user root, it will search in the wrong $HOME directory! That's why it is important to tell Ansible to execute kubectl as the user who owns $HOME/.kube/config. I will use the host specified in the k8s_kubectl group later. Of course, replace the hostname k8s-01-ansible-ctrl with the real hostname of your workstation, laptop, Chromebook, continuous deployment host or whatever (actually in my case the name doesn't matter that much, as the connection is local anyway).
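The "wrong $HOME" pitfall boils down to how the default kubeconfig path is derived. Here is a small Python model of it, assuming Linux. kubectl also honours the KUBECONFIG environment variable (a colon-separated list of files), which the setup in this post doesn't rely on. kubeconfig_paths is a hypothetical helper, not kubectl code.

```python
import posixpath


def kubeconfig_paths(env):
    """Return the kubeconfig file(s) kubectl would look at (simplified, Linux).

    If KUBECONFIG is set it is treated as a ':'-separated list of files;
    otherwise the default $HOME/.kube/config is used.
    """
    value = env.get("KUBECONFIG", "")
    if value:
        return [path for path in value.split(":") if path]
    return [posixpath.join(env["HOME"], ".kube", "config")]


# Same command, different users: only the HOME variable differs.
print(kubeconfig_paths({"HOME": "/home/user"}))  # ['/home/user/.kube/config']
print(kubeconfig_paths({"HOME": "/root"}))       # ['/root/.kube/config']
```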
Like the previous group, k8s_ca contains only k8s-01-ansible-ctrl. I'll create some certificates for Kubernetes in a later blog post. The generated certificates will be stored locally on my workstation (or the Ansible Controller node in general), and Ansible will copy them to the Kubernetes hosts as needed.
The k8s_etcd group contains the hosts that will run the etcd cluster, as you probably already guessed 😉 As mentioned already, for production you should place the etcd cluster on separate hosts, but for smaller Kubernetes clusters it's also fine to use the Kubernetes controller nodes for this purpose.
The hosts in the k8s_controller group run at least three important Kubernetes components: kube-apiserver, kube-scheduler and kube-controller-manager. The Ansible roles harden_linux and kubernetes_controller will be applied to these Kubernetes hosts.
I also defined a group called k8s_worker. That group contains the nodes that will run the container workloads and do the heavy work. The roles harden_linux, kubernetes_worker, cilium_kubernetes, cni, runc and containerd will be applied to these Kubernetes hosts. That's what this group is used for.
Here are example host_vars files for the hosts in Ansible's hosts file:
Ansible host file for etcd #1: host_vars/k8s-010101.i.example.com:
---
# Ansible
ansible_host: "k8s-010101.p.example.com" # IP: 192.168.11.2
# WireGuard
wireguard_address: "10.0.11.2/24"
wireguard_port: "51820"
wireguard_persistent_keepalive: "30"
wireguard_endpoint: "{{ ansible_host }}"
Ansible host file for etcd #2: host_vars/k8s-010201.i.example.com:
---
# Ansible
ansible_host: "k8s-010201.p.example.com" # IP: 192.168.11.5
# WireGuard
wireguard_address: "10.0.11.5/24"
wireguard_port: "51820"
wireguard_persistent_keepalive: "30"
wireguard_endpoint: "{{ ansible_host }}"
Ansible host file for etcd #3: host_vars/k8s-010301.i.example.com:
---
# Ansible
ansible_host: "k8s-010301.p.example.com" # IP: 192.168.11.8
# WireGuard
wireguard_address: "10.0.11.8/24"
wireguard_port: "51820"
wireguard_persistent_keepalive: "30"
wireguard_endpoint: "{{ ansible_host }}"
Ansible host file for K8s controller #1: host_vars/k8s-010102.i.example.com:
---
# Ansible
ansible_host: "k8s-010102.p.example.com" # IP: 192.168.11.3
# WireGuard
wireguard_address: "10.0.11.3/24"
wireguard_port: "51820"
wireguard_persistent_keepalive: "30"
wireguard_endpoint: "{{ ansible_host }}"
Ansible host file for K8s controller #2: host_vars/k8s-010202.i.example.com:
---
# Ansible
ansible_host: "k8s-010202.p.example.com" # IP: 192.168.11.6
# WireGuard
wireguard_address: "10.0.11.6/24"
wireguard_port: "51820"
wireguard_persistent_keepalive: "30"
wireguard_endpoint: "{{ ansible_host }}"
Ansible host file for K8s controller #3: host_vars/k8s-010302.i.example.com:
---
# Ansible
ansible_host: "k8s-010302.p.example.com" # IP: 192.168.11.9
# WireGuard
wireguard_address: "10.0.11.9/24"
wireguard_port: "51820"
wireguard_persistent_keepalive: "30"
wireguard_endpoint: "{{ ansible_host }}"
Ansible host file for Ansible Controller node (my laptop): host_vars/k8s-01-ansible-ctrl.i.example (only needed if you want to make your workstation part of the Kubernetes WireGuard network mesh):
---
# WireGuard
wireguard_address: "10.0.11.254/24"
wireguard_endpoint: ""
These are of course just examples. Adjust them to your needs! The WireGuard role uses wireguard_address to assign the specified IP and netmask to the wg0 interface (that's the default interface name). As you can see, all hosts will be part of the network 10.0.11.0/24 (the WireGuard VPN network for internal K8s communication). You'll see later what wireguard_endpoint is used for. ansible_host I already explained above.
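With many host_vars files, a typo in wireguard_address (wrong network or a duplicated IP) is easy to miss. The following Python sketch checks the example values from the files above. The check function is hypothetical. In practice you would load the YAML files instead of hard-coding the dictionary.

```python
import ipaddress

WG_NET = ipaddress.ip_network("10.0.11.0/24")

# wireguard_address values from the example host_vars files above.
wireguard_addresses = {
    "k8s-010101.i.example.com": "10.0.11.2/24",
    "k8s-010201.i.example.com": "10.0.11.5/24",
    "k8s-010301.i.example.com": "10.0.11.8/24",
    "k8s-010102.i.example.com": "10.0.11.3/24",
    "k8s-010202.i.example.com": "10.0.11.6/24",
    "k8s-010302.i.example.com": "10.0.11.9/24",
    "k8s-01-ansible-ctrl.i.example.com": "10.0.11.254/24",
}


def check(addresses):
    """Return a list of problems: wrong network or IP used twice."""
    problems = []
    seen = {}
    for host, value in addresses.items():
        iface = ipaddress.ip_interface(value)
        if iface.network != WG_NET:
            problems.append(f"{host}: {iface} not in {WG_NET}")
        if iface.ip in seen:
            problems.append(f"{host}: {iface.ip} already used by {seen[iface.ip]}")
        seen[iface.ip] = host
    return problems


print(check(wireguard_addresses))  # []
```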
I just want to mention that it is possible to use a dynamic inventory plugin for Ansible like hcloud_inventory or Scaleway inventory. In this case you just need to tag your instances and the dynamic inventory plugin will discover the hosts to use for a specific role/task.
To specify what Ansible should install/modify on the hosts I created a playbook file. You already saw it in the directory structure in part 1; it's called k8s.yml. It basically contains the host group names and the role that each host group (or host) has (e.g. being a Kubernetes controller or worker node).
Ansible playbook file
For the impatient, this is how the file should look at the end, when we're done with the whole tutorial:
---
-
  hosts: k8s_ca
  roles:
    -
      role: githubixx.cfssl
      tags: role-cfssl
    -
      role: githubixx.kubernetes_ca
      tags: role-kubernetes-ca
-
  hosts: k8s_kubectl
  roles:
    -
      role: githubixx.kubectl
      tags: role-kubectl
-
  hosts: vpn
  roles:
    -
      role: githubixx.ansible_role_wireguard
      tags: role-wireguard
-
  hosts: k8s_etcd
  roles:
    -
      role: githubixx.harden_linux
      tags: role-harden-linux
    -
      role: githubixx.etcd
      tags: role-etcd
-
  hosts: k8s_controller
  roles:
    -
      role: githubixx.harden_linux
      tags: role-harden-linux
    -
      role: githubixx.kubernetes_controller
      tags: role-kubernetes-controller
-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.harden_linux
      tags: role-harden-linux
    -
      role: githubixx.cni
      tags: role-cni
    -
      role: githubixx.runc
      tags: role-runc
    -
      role: githubixx.containerd
      tags: role-containerd
    -
      role: githubixx.cilium_kubernetes
      tags: role-cilium-kubernetes
    -
      role: githubixx.kubernetes_worker
      tags: role-kubernetes-worker
-
  hosts: traefik
  roles:
    - role: githubixx.traefik_kubernetes
      tags: role-traefik-kubernetes
-
  hosts: cert_manager
  roles:
    - role: githubixx.cert_manager_kubernetes
      tags: role-cert-manager-kubernetes
But using this file right from the start won't work, because Ansible would complain that a lot of roles aren't installed. So let's start with a playbook file k8s.yml that looks like this:
---
-
  hosts: k8s_etcd
  roles:
    -
      role: githubixx.harden_linux
      tags: role-harden-linux
-
  hosts: k8s_controller
  roles:
    -
      role: githubixx.harden_linux
      tags: role-harden-linux
-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.harden_linux
      tags: role-harden-linux
So for now the file only contains the githubixx.harden_linux role and the hosts it should be applied to.
Hardening Linux
As already mentioned, I created an Ansible role for hardening a Linux installation (see ansible-role-harden-linux). It applies some basic security settings. It's not perfect, but it'll secure the Kubernetes cluster quite a bit. Install the role e.g. via
ansible-galaxy install githubixx.harden_linux
and then include the role in the playbook (k8s.yml in my case) like in the example above.
In the example above you can see that Ansible should apply the role githubixx.harden_linux to all Kubernetes hosts, controller and worker alike (you really want to harden all hosts of the Kubernetes cluster). Hardening doesn't end here, of course. There are further things you can do, like installing rootkit and vulnerability scanners or an IDS (intrusion detection system). You can also collect and ship logs to other hosts for later analysis in case a host gets compromised. But that's not in the scope of that role and this blog post.
Regarding the syntax I used above: later, when there is not only one role but a few more, or during testing, it's very handy to apply only one role at a time. The syntax above makes that possible. If you only want to apply the harden_linux role you can run
ansible-playbook --tags=role-harden-linux k8s.yml
This will only run the harden_linux role on the specified hosts, as only that role has the tag role-harden-linux.
Note: Sometimes you only want to apply a role to one specific host, e.g. because you want to test it there before deploying it to all hosts. Another case could be that you want to upgrade node by node. That's possible with e.g.
ansible-playbook --tags=role-harden-linux --limit=k8s-010101.i.example.com k8s.yml
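How --tags and --limit narrow a run down can be modelled in a few lines of Python. This is a deliberately simplified sketch. The PLAYBOOK and GROUPS structures are hypothetical stand-ins, loosely based on the playbook above (the etcd role is taken from the final version). The real --limit accepts patterns, not just a single host name.

```python
# (hosts group, role, tag) entries, loosely based on the playbook above.
PLAYBOOK = [
    ("k8s_etcd", "githubixx.harden_linux", "role-harden-linux"),
    ("k8s_etcd", "githubixx.etcd", "role-etcd"),
    ("k8s_controller", "githubixx.harden_linux", "role-harden-linux"),
    ("k8s_worker", "githubixx.harden_linux", "role-harden-linux"),
]

GROUPS = {
    "k8s_etcd": ["k8s-010101.i.example.com", "k8s-010201.i.example.com",
                 "k8s-010301.i.example.com"],
    "k8s_controller": ["k8s-010102.i.example.com", "k8s-010202.i.example.com",
                       "k8s-010302.i.example.com"],
    "k8s_worker": ["k8s-010102.i.example.com", "k8s-010202.i.example.com",
                   "k8s-010302.i.example.com", "k8s-010103.i.example.com",
                   "k8s-010203.i.example.com", "k8s-010303.i.example.com"],
}


def plan(tags=None, limit=None):
    """List (host, role) pairs that would run for the given tags/limit."""
    runs = []
    for group, role, tag in PLAYBOOK:
        if tags and tag not in tags:
            continue  # role skipped: tag not selected
        for host in GROUPS[group]:
            if limit and host != limit:
                continue  # host skipped: not in --limit
            runs.append((host, role))
    return runs


print(plan(tags={"role-harden-linux"}, limit="k8s-010101.i.example.com"))
# [('k8s-010101.i.example.com', 'githubixx.harden_linux')]
```

Because the controller hosts are members of both k8s_controller and k8s_worker, they appear in two plays in this model.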
Ansible facts caching
This works fine as long as you don't need facts from other hosts. But the etcd role, e.g., does need facts from other hosts: Ansible needs to know the IPs of all hosts in the k8s_etcd group to render the etcd systemd service file. If you limit the run to one host, Ansible won't gather the facts of the other hosts, and the run will fail.
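Here is a small Python model of why a --limit run breaks. Rendering something like etcd's --initial-cluster value (name=peerURL pairs) needs an IP for every etcd member. If only one host's facts are known, the other members can't be resolved. Two details are assumptions about the etcd role, not taken from it: that it reads the wg0 address from the ansible_wg0 fact, and that it uses peer port 2380. The actual template may differ.

```python
ETCD_HOSTS = [
    "k8s-010101.i.example.com",
    "k8s-010201.i.example.com",
    "k8s-010301.i.example.com",
]


def initial_cluster(etcd_hosts, hostvars, iface="wg0", peer_port=2380):
    """Build an etcd --initial-cluster style value from per-host facts.

    hostvars maps inventory hostname -> facts dict; interface facts are
    read from the ansible_<iface> key as gathered by Ansible's setup module.
    """
    members = []
    for host in etcd_hosts:
        facts = hostvars.get(host)
        if facts is None:
            # This is what a --limit run without cached facts runs into.
            raise KeyError(f"no facts for {host}: gather or cache them first")
        ip = facts[f"ansible_{iface}"]["ipv4"]["address"]
        members.append(f"{host}=https://{ip}:{peer_port}")
    return ",".join(members)


# Facts as they might look for the three etcd hosts (IPs from host_vars).
hostvars = {
    host: {"ansible_wg0": {"ipv4": {"address": ip}}}
    for host, ip in zip(ETCD_HOSTS, ["10.0.11.2", "10.0.11.5", "10.0.11.8"])
}
print(initial_cluster(ETCD_HOSTS, hostvars))

# With --limit=k8s-010101... and no cache, only one host's facts are known.
only_one = {ETCD_HOSTS[0]: hostvars[ETCD_HOSTS[0]]}
try:
    initial_cluster(ETCD_HOSTS, only_one)
except KeyError as err:
    print("fails:", err)
```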
One possible workaround is to fetch the facts upfront and cache them. For this to work you need to adjust a few settings in ansible.cfg. So let's extend the file I already created before. So far it only contained the inventory setting:
[defaults]
inventory = hosts
gathering = smart
fact_caching = jsonfile
fact_caching_connection = ./factscache
fact_caching_timeout = 86400
factscache is a directory, so create it in the venv directory with mkdir factscache. If you now run
ansible -m setup all
Ansible will gather facts (like network addresses, disk information, RAM, CPU, and so on) of all hosts. It will store a file for every host (named after the host) in the directory you specified in fact_caching_connection and cache the entries for fact_caching_timeout seconds (in the example above that's one day). This is very useful. I recommend this workaround, as it saves quite some pain, especially during your first experiments.
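To illustrate what fact_caching_timeout means in practice, here is a small Python model of a per-host JSON cache. It assumes that an entry counts as expired once its file is older than the timeout. This is a sketch of the idea, not the code of Ansible's jsonfile cache plugin.

```python
import json
import os
import tempfile
import time

FACT_CACHING_TIMEOUT = 86400  # seconds, as in ansible.cfg above (24 * 60 * 60)


def cached_facts(cache_dir, host, timeout=FACT_CACHING_TIMEOUT, now=None):
    """Return cached facts for host, or None if missing or expired.

    Simplified model: one JSON file per host, named after the host; an
    entry is expired once the file is older than timeout seconds.
    A timeout of 0 is treated as "never expire" here.
    """
    path = os.path.join(cache_dir, host)
    if not os.path.exists(path):
        return None
    now = time.time() if now is None else now
    if timeout and now - os.path.getmtime(path) > timeout:
        return None
    with open(path) as fh:
        return json.load(fh)


with tempfile.TemporaryDirectory() as factscache:
    host = "k8s-010101.i.example.com"
    with open(os.path.join(factscache, host), "w") as fh:
        json.dump({"ansible_hostname": "k8s-010101"}, fh)
    print(cached_facts(factscache, host))  # {'ansible_hostname': 'k8s-010101'}
    # Simulate a lookup a bit more than one day later: the entry has expired.
    print(cached_facts(factscache, host, now=time.time() + FACT_CACHING_TIMEOUT + 60))  # None
```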
Run ansible -m setup all after you add a new host or make major changes like changing the IP address of a host. E.g. after installing the WireGuard role for the first time, which adds a new interface and IP address, we need to make Ansible aware of that “fact”.
Note: If you add a new host, it may have a different SSH port than the other hosts (if you changed the port via the harden_linux role). You can specify host-specific SSH settings like port and login user in the matching host file in Ansible's host_vars directory, or in the hosts inventory file as you already saw above. E.g. if you added the host k8s-010104.i.example.com and its host_vars file, you can temporarily define this Ansible variable:
ansible_port: 22
As you can see, k8s-010104 uses a different SSH port right after it was created and started. So you can now apply the harden_linux role to the fresh and unmodified node k8s-010104. Afterwards, remove or change this entry and extend the node range from k8s-01[01:03][01:03]... to k8s-01[01:03][01:04]..., because k8s-010104 now has the harden_linux role applied and should behave like the older nodes. Of course you can also use various parameters of the ansible-playbook command to achieve the same effect.
If you start a new host at Hetzner Cloud or Scaleway, you log in as root by default (and I guess that's true for some other providers too). That's normally not considered good practice, and that's one thing the role can change. In general the role will help me accomplish the following goals (some are optional):
- Change root password
- Add a regular user used for administration (e.g. for Ansible or login via SSH)
- Allow this regular user to execute commands via sudo
- Adjust APT update intervals
- Set up UFW firewall and allow only SSH access by default (add more ports/networks if you like)
- Adjust sysctl settings (/proc filesystem)
- Change SSH default port
- Disable SSH password authentication
- Disable SSH root login
- Install sshguard and adjust whitelist
- Delete files not needed
- Delete OS packages not needed
- Install optional OS packages
- Install and configure a Network Time Synchronization (NTP) service, e.g. openntpd, ntp or systemd-timesyncd
- Change systemd-resolved configuration
- Ensure files are absent
Ansible roles can be customized via variables. Let's briefly talk about the variables that need to be specified. The harden_linux role should be applied to all of my Kubernetes hosts. Some settings, like the firewall rules, will differ between the etcd and the worker nodes, while others are shared by all hosts. So the variables specific to etcd will go into group_vars/k8s_etcd.yml. Variables in the group_vars directory are applied to a group of hosts. In this example that's the host group k8s_etcd, the group of all etcd hosts in the Ansible hosts file above. I'll also have group_vars/k8s_controller.yml and group_vars/k8s_worker.yml to specify variables for those host groups. As you already saw above, there is also a group called k8s_all. It contains all etcd, K8s controller and worker nodes. Variables shared by all these hosts go into group_vars/k8s_all.yml.
So let's start filling these files. My Virtual Machines have no password set for the root user. Since I don't want one, I don't care about that setting in my case. But if you want to change a default root password, e.g., you can do so like this:
harden_linux_root_password: crypted_pw
This variable is optional, and the root password is only changed if it is set. group_vars/k8s_all.yml might be the right file for this variable, either because all hosts in that group have a root password you want to change, or because they have none and you want to add one. As you probably already guessed, this password needs to be encrypted (more precisely: hashed). The same is true for harden_linux_deploy_user_password (see below). Ansible won't encrypt the password for you. You can create an encrypted password e.g. with
mkpasswd --method=sha-512
Just to mention it: passwords and secrets in general can be stored and managed with ansible-vault, but that is out of scope for this tutorial.
The next variables are needed if you want to create a user that is able to run sudo without a password. So far I have always used the user that cloud-init created during host startup for Ansible. I'm going to change this now. I'll create a user and group called ansible just for that use case. It would be possible to set a password for that user with harden_linux_deploy_user_password (again, the password needs to be encrypted), but I won't set one for security reasons. If harden_linux_deploy_group and harden_linux_deploy_user are not set, no user will be created. I'll add these variables to group_vars/k8s_all.yml:
harden_linux_deploy_group: "ansible"
harden_linux_deploy_group_gid: "9999"
harden_linux_deploy_user: "ansible"
harden_linux_deploy_user_home: "/home/ansible"
harden_linux_deploy_user_shell: "/bin/bash"
harden_linux_deploy_user_uid: "9999"
harden_linux_deploy_user_public_keys:
  - /home/user/.ssh/id_ansible.pub
harden_linux_deploy_group specifies the group name for the user to create, and harden_linux_deploy_group_gid the group ID.
harden_linux_deploy_user specifies the user that Ansible should use to log in to the remote hosts. This user gets sudo permission, which Ansible needs to do its work.
harden_linux_deploy_user_public_keys specifies a list of public SSH key files (or just one file) that should be added to $HOME/.ssh/authorized_keys of the deploy user on the remote host. Specifying /home/user/.ssh/id_ansible.pub, e.g., means that the content of that local file is added to $HOME/.ssh/authorized_keys of the deploy user (user ansible in this case) on the remote host. "Local" here means the Ansible Controller node. Such an SSH key pair can be generated with e.g. ssh-keygen -o -a 100 -t ed25519 -f .ssh/id_ansible (you have to be in your $HOME directory).
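The effect described above is "add the key once, keep everything else". Below is a local Python model of that behavior, assuming each key file holds a single key line. It only illustrates the idempotent result. It is not how the role implements this on the remote host, and merge_public_keys is a hypothetical helper.

```python
import os
import tempfile


def merge_public_keys(key_files, authorized_keys):
    """Append the content of each public key file to authorized_keys once."""
    content = ""
    if os.path.exists(authorized_keys):
        with open(authorized_keys) as fh:
            content = fh.read()
    existing = {line.strip() for line in content.splitlines() if line.strip()}
    added = []
    for key_file in key_files:
        with open(key_file) as fh:
            key = fh.read().strip()
        if key and key not in existing:
            existing.add(key)
            added.append(key)
    if added:
        # Avoid gluing a new key onto a last line without trailing newline.
        prefix = "" if not content or content.endswith("\n") else "\n"
        with open(authorized_keys, "a") as fh:
            fh.write(prefix + "\n".join(added) + "\n")
    return added


with tempfile.TemporaryDirectory() as tmp:
    pub = os.path.join(tmp, "id_ansible.pub")
    with open(pub, "w") as fh:
        fh.write("ssh-ed25519 AAAAexample ansible\n")  # dummy key string
    target = os.path.join(tmp, "authorized_keys")
    print(merge_public_keys([pub], target))  # ['ssh-ed25519 AAAAexample ansible']
    print(merge_public_keys([pub], target))  # [] (already present)
```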
The following variables have defaults (for all possible settings see the role's defaults/main.yml file). Only change them if you need a different value.
The role changes some sshd settings by default:
harden_linux_sshd_settings:
  # Disable password authentication
  "^PasswordAuthentication": "PasswordAuthentication no"
  # Disable SSH root login
  "^PermitRootLogin": "PermitRootLogin no"
  # Disable tun(4) device forwarding
  "^PermitTunnel": "PermitTunnel no"
  # Set sshd port
  "^Port ": "Port 22"
Personally I always change the default SSH port, as lots of brute force attacks target this port (Note: this won't protect you completely, as there are scanners out there that can scan the whole internet in a few minutes…). So if you want to change the port setting, for example, you can do so:
harden_linux_sshd_settings_user:
  "^Port ": "Port 22222"
(Please notice the whitespace after “^Port”!) The role combines harden_linux_sshd_settings and harden_linux_sshd_settings_user, and the settings in harden_linux_sshd_settings_user take precedence. This means they override the ^Port setting/key in harden_linux_sshd_settings.
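The precedence rule is plain dictionary merging where later keys win. In Ansible this is usually written with the combine filter (defaults | combine(user)). I'm assuming the role does something equivalent. The Python equivalent shows the result:

```python
harden_linux_sshd_settings = {
    "^PasswordAuthentication": "PasswordAuthentication no",
    "^PermitRootLogin": "PermitRootLogin no",
    "^PermitTunnel": "PermitTunnel no",
    "^Port ": "Port 22",
}
harden_linux_sshd_settings_user = {
    "^Port ": "Port 22222",
}

# Keys from the _user dict win; all other defaults are kept unchanged.
effective = {**harden_linux_sshd_settings, **harden_linux_sshd_settings_user}

print(effective["^Port "])            # Port 22222
print(effective["^PermitRootLogin"])  # PermitRootLogin no
print(len(effective))                 # 4
```

The same merge logic applies to the harden_linux_ufw_defaults / harden_linux_ufw_defaults_user pair further down.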
As you may have noticed, all the keys in harden_linux_sshd_settings and harden_linux_sshd_settings_user begin with ^. That's because they are regular expressions (regex). One task of the role searches for a line in /etc/ssh/sshd_config, e.g. ^Port (where the ^ means “a line starting with …”), and replaces the line (if found) with e.g. Port 22222. This makes the playbook very flexible for adjusting settings in sshd_config (you can basically replace every setting). You'll see this pattern in other tasks too, and everything mentioned here applies there as well.
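Here is a Python sketch of the "search by regex, replace the line" idea. It is modelled on how Ansible's lineinfile module treats regexp/line: the last matching line is replaced, and if nothing matches, the line is appended. I haven't verified that the role's task behaves exactly like this. Note that the commented-out #Port 22 is not touched, because ^ requires the line to start with Port.

```python
import re


def apply_settings(lines, settings):
    """Apply {regex: replacement line} settings to a list of config lines.

    Simplified lineinfile-like model: the last matching line is replaced;
    if no line matches, the new line is appended.
    """
    lines = list(lines)
    for regexp, new_line in settings.items():
        pattern = re.compile(regexp)
        matches = [i for i, line in enumerate(lines) if pattern.search(line)]
        if matches:
            lines[matches[-1]] = new_line
        else:
            lines.append(new_line)
    return lines


sshd_config = [
    "#Port 22",  # commented out, so it does not match ^Port
    "Port 22",
    "PermitRootLogin yes",
]
settings = {"^Port ": "Port 22222", "^PermitRootLogin": "PermitRootLogin no"}
result = apply_settings(sshd_config, settings)
print(result)  # ['#Port 22', 'Port 22222', 'PermitRootLogin no']
# Running it again changes nothing, because the regexes match the new lines too.
print(apply_settings(result, settings) == result)  # True
```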
The role uses UFW - Uncomplicated Firewall and Ansible's ufw module to set up firewall rules. UFW is basically just a frontend for Linux iptables or nftables. Here are some defaults for the firewall:
harden_linux_ufw_defaults:
  "^IPV6": 'IPV6=yes'
  "^DEFAULT_INPUT_POLICY": 'DEFAULT_INPUT_POLICY="DROP"'
  "^DEFAULT_OUTPUT_POLICY": 'DEFAULT_OUTPUT_POLICY="ACCEPT"'
  "^DEFAULT_FORWARD_POLICY": 'DEFAULT_FORWARD_POLICY="DROP"'
  "^DEFAULT_APPLICATION_POLICY": 'DEFAULT_APPLICATION_POLICY="SKIP"'
  "^MANAGE_BUILTINS": 'MANAGE_BUILTINS=no'
  "^IPT_SYSCTL": 'IPT_SYSCTL=/etc/ufw/sysctl.conf'
  "^IPT_MODULES": 'IPT_MODULES="nf_conntrack_ftp nf_nat_ftp nf_conntrack_netbios_ns"'
These settings basically change the values in /etc/default/ufw. While these are good defaults, I need to change one of them to make Kubernetes networking work: DEFAULT_FORWARD_POLICY="ACCEPT". To override this default I add the following to group_vars/k8s_all.yml:
harden_linux_ufw_defaults_user:
  "^DEFAULT_FORWARD_POLICY": 'DEFAULT_FORWARD_POLICY="ACCEPT"'
As already mentioned above, this role also combines harden_linux_ufw_defaults and harden_linux_ufw_defaults_user, and the settings in harden_linux_ufw_defaults_user take precedence. This means they override the ^DEFAULT_FORWARD_POLICY setting in harden_linux_ufw_defaults.
Next I specify some firewall rules with harden_linux_ufw_rules. This is the default:
harden_linux_ufw_rules:
  - rule: "allow"
    to_port: "22"
    protocol: "tcp"
So by default only SSH access on port 22 is allowed. If you changed the SSH Port setting above to e.g. 22222, you also need a firewall rule that allows incoming traffic on that port. I also add a firewall rule for WireGuard (which uses port 51820/udp by default), which I'll use in a later blog post. These two rules would apply to all hosts, but that won't be true for other rules I'll add later. That's why I add the following settings to group_vars/k8s_etcd.yml and group_vars/k8s_worker.yml:
harden_linux_ufw_rules:
  - rule: "allow"
    to_port: "22222"
    protocol: "tcp"
  - rule: "allow"
    to_port: "51820"
    protocol: "udp"
You can add more settings to a rule, like interface, from_ip, and so on. Please have a look at the role's README (search for from_ip) for all possible settings.
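To see what such rule entries mean in firewall terms, the following Python sketch renders them as equivalent ufw command lines. The role itself works through Ansible's ufw module, not these commands. The helper is hypothetical and only handles the rule, to_port, protocol and optional from_ip keys.

```python
def ufw_command(rule):
    """Render a harden_linux_ufw_rules entry as an equivalent ufw CLI call."""
    if "from_ip" in rule:
        return (f'ufw {rule["rule"]} from {rule["from_ip"]} to any '
                f'port {rule["to_port"]} proto {rule["protocol"]}')
    return f'ufw {rule["rule"]} {rule["to_port"]}/{rule["protocol"]}'


harden_linux_ufw_rules = [
    {"rule": "allow", "to_port": "22222", "protocol": "tcp"},
    {"rule": "allow", "to_port": "51820", "protocol": "udp"},
]
for rule in harden_linux_ufw_rules:
    print(ufw_command(rule))
# ufw allow 22222/tcp
# ufw allow 51820/udp
```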
You can also allow hosts to communicate on specific networks (without port restrictions). E.g. I'll add the range I'll use later for Cilium, which handles Pod to Pod communication when the Pods are located on different hosts. kube-controller-manager's cluster-cidr setting is 10.200.0.0/16 by default. Cilium's IPAM will allocate /24 network blocks out of the 10.0.0.0/8 range (see Cilium's cluster-pool-ipv4-cidr and cluster-pool-ipv4-mask-size settings). Each node gets one of those /24 blocks, and the Pods running there get an IP out of that range. Allowing traffic for the 10.0.0.0/8 network range also covers the WireGuard VPN 10.0.11.0/24. I also add the “public” IP range of my Kubernetes nodes, which is 192.168.11.0/24. Well, it's not “public” in the sense that this range is reachable from the Internet: the “public” IPs of the VMs can only be reached from my internal network. As this should only be applied to the K8s worker nodes, I'll put the setting into group_vars/k8s_worker.yml:
harden_linux_ufw_allow_networks:
  - "10.0.0.0/8"
  - "192.168.11.0/24"
It probably also makes sense to add at least the IP of the Ansible Controller node here, if it has a static IP.
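Whether a given network is covered by these allowed ranges can be checked with Python's ipaddress module. The sketch below uses the two ranges from group_vars/k8s_worker.yml. It confirms that the WireGuard network, the 10.200.0.0/16 cluster-cidr and a VM IP fall inside them. It also shows how many /24 blocks a 10.0.0.0/8 pool holds.

```python
import ipaddress

# harden_linux_ufw_allow_networks from group_vars/k8s_worker.yml
allowed = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.168.11.0/24")]


def is_allowed(net):
    """True if the whole network (or single IP) lies inside an allowed range."""
    net = ipaddress.ip_network(net)
    return any(net.subnet_of(a) for a in allowed)


print(is_allowed("10.0.11.0/24"))     # True  (WireGuard VPN)
print(is_allowed("10.200.0.0/16"))    # True  (cluster-cidr)
print(is_allowed("192.168.11.5/32"))  # True  (a VM IP)
print(is_allowed("172.16.0.0/12"))    # False

# Number of /24 blocks in 10.0.0.0/8 (2 ** (24 - 8))
print(2 ** (24 - 8))  # 65536
```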
If you want to avoid problems with firewall rules blocking your Kubernetes traffic, you can start with more relaxed settings in a development environment and simply allow all three private IP ranges defined in RFC1918, e.g.:
harden_linux_ufw_allow_networks:
  - "10.0.0.0/8"
  - "172.16.0.0/12"
  - "192.168.0.0/16"
The next settings are some system variables (sysctl.conf / /proc filesystem). These settings are recommendations from Google, which uses them for its Google Compute Cloud OS images (see GCP - Requirements to build custom images and Configure security best practices). These are the default settings. If you are happy with them you don't have to do anything, but I recommend verifying that they work for your setup:
harden_linux_sysctl_settings:
  "net.ipv4.tcp_syncookies": 1 # Enable syn flood protection
  "net.ipv4.conf.all.accept_source_route": 0 # Ignore source-routed packets
  "net.ipv6.conf.all.accept_source_route": 0 # IPv6 - Ignore source-routed packets
  "net.ipv4.conf.default.accept_source_route": 0 # Ignore source-routed packets
  "net.ipv6.conf.default.accept_source_route": 0 # IPv6 - Ignore source-routed packets
  "net.ipv4.conf.all.accept_redirects": 0 # Ignore ICMP redirects
  "net.ipv6.conf.all.accept_redirects": 0 # IPv6 - Ignore ICMP redirects
  "net.ipv4.conf.default.accept_redirects": 0 # Ignore ICMP redirects
  "net.ipv6.conf.default.accept_redirects": 0 # IPv6 - Ignore ICMP redirects
  "net.ipv4.conf.all.secure_redirects": 1 # Ignore ICMP redirects from non-GW hosts
  "net.ipv4.conf.default.secure_redirects": 1 # Ignore ICMP redirects from non-GW hosts
  "net.ipv4.ip_forward": 0 # Do not allow traffic between networks or act as a router
  "net.ipv6.conf.all.forwarding": 0 # IPv6 - Do not allow traffic between networks or act as a router
  "net.ipv4.conf.all.send_redirects": 0 # Do not send ICMP redirects (only routers should)
  "net.ipv4.conf.default.send_redirects": 0 # Do not send ICMP redirects (only routers should)
  "net.ipv4.conf.all.rp_filter": 1 # Reverse path filtering - IP spoofing protection
  "net.ipv4.conf.default.rp_filter": 1 # Reverse path filtering - IP spoofing protection
  "net.ipv4.icmp_echo_ignore_broadcasts": 1 # Ignore ICMP broadcasts to avoid participating in Smurf attacks
  "net.ipv4.icmp_ignore_bogus_error_responses": 1 # Ignore bad ICMP errors
  "net.ipv4.icmp_echo_ignore_all": 0 # Do not ignore ICMP echo requests (keep answering ping)
  "net.ipv4.conf.all.log_martians": 1 # Log spoofed, source-routed, and redirect packets
  "net.ipv4.conf.default.log_martians": 1 # Log spoofed, source-routed, and redirect packets
  "net.ipv4.tcp_rfc1337": 1 # Implement RFC 1337 fix
  "kernel.randomize_va_space": 2 # Randomize addresses of mmap base, heap, stack and VDSO page
  "fs.protected_hardlinks": 1 # Provide protection from TOCTOU races
  "fs.protected_symlinks": 1 # Provide protection from TOCTOU races
  "kernel.kptr_restrict": 1 # Make locating kernel addresses more difficult
  "kernel.perf_event_paranoid": 2 # Disallow kernel profiling by unprivileged users
You can override every single setting. For Kubernetes we’ll override the following settings to allow packet forwarding between network interfaces, which is needed for the “pod network”. I’ll put these settings into group_vars/k8s_all.yml as all nodes have the WireGuard interface:
harden_linux_sysctl_settings_user:
  "net.ipv4.ip_forward": 1
  "net.ipv6.conf.default.forwarding": 1
  "net.ipv6.conf.all.forwarding": 1
One of the role’s tasks combines harden_linux_sysctl_settings and harden_linux_sysctl_settings_user, with the harden_linux_sysctl_settings_user settings taking precedence. Again, have a look at the role’s defaults/main.yml file for more information about the settings.
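The precedence rule can be illustrated with a plain dictionary merge in Python. This is only a sketch of the behaviour described above, not the role’s actual implementation. In Ansible such a merge is commonly done with the combine filter. The dictionaries are a subset of the defaults and overrides shown above. The sketch also shows two things about each key: it maps to a file below /proc/sys, with the dots becoming directory separators, and it can be written as a key = value line in sysctl.conf format. It assumes no key contains a dotted interface name, which holds for all keys above. The helper names are hypothetical.

```python
from pathlib import PurePosixPath

# Subset of the role defaults (harden_linux_sysctl_settings).
defaults = {
    "net.ipv4.tcp_syncookies": 1,
    "net.ipv4.ip_forward": 0,
    "net.ipv6.conf.all.forwarding": 0,
}

# Overrides from group_vars/k8s_all.yml (harden_linux_sysctl_settings_user).
user = {
    "net.ipv4.ip_forward": 1,
    "net.ipv6.conf.default.forwarding": 1,
    "net.ipv6.conf.all.forwarding": 1,
}


def merge_sysctl(base: dict, overrides: dict) -> dict:
    """Combine both dicts: keys in `overrides` win, keys only in `overrides` are added."""
    return {**base, **overrides}


def proc_path(key: str) -> str:
    """Map a sysctl key such as 'net.ipv4.ip_forward' to its file below /proc/sys."""
    return str(PurePosixPath("/proc/sys", *key.split(".")))


def render_sysctl_conf(settings: dict) -> str:
    """Render settings as sorted 'key = value' lines, the format of sysctl.conf files."""
    return "\n".join(f"{key} = {value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    merged = merge_sysctl(defaults, user)
    print(render_sysctl_conf(merged))
    print(proc_path("net.ipv4.ip_forward"))
```

After the merge, all three forwarding keys are 1, while untouched defaults such as net.ipv4.tcp_syncookies keep their value.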
If you want UFW (firewall) logging enabled, set the following (group_vars/k8s_all.yml is probably the best fit):
harden_linux_ufw_logging: 'on'
Possible values are on, off, low, medium, high and full.
And finally there are the sshguard settings. sshguard protects against brute-force attacks on SSH (and other services). To avoid locking yourself out for a while, you can add IPs or IP ranges to a whitelist. By default it basically contains only “localhost”:
harden_linux_sshguard_whitelist:
- "127.0.0.0/8"
- "::1/128"
I recommend adding at least your WireGuard range(s) here too. Also consider adding the IP of the host you administer the Kubernetes cluster from and/or the IP of the host you run Ansible on (e.g. a Jenkins host).
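This sketch finds which of your important hosts would not yet be exempt from blocking by checking their addresses against the whitelist. The WireGuard range 10.0.11.0/24 is the one used above. 192.168.11.50 stands in for a hypothetical admin or Ansible host. This is not how sshguard itself evaluates its whitelist. It only illustrates the CIDR matching.

```python
import ipaddress

# Default whitelist of the role plus the WireGuard range recommended above.
WHITELIST = ["127.0.0.0/8", "::1/128", "10.0.11.0/24"]


def not_whitelisted(addresses, whitelist=WHITELIST):
    """Return the addresses that are not covered by any whitelist entry."""
    networks = [ipaddress.ip_network(entry) for entry in whitelist]
    missing = []
    for address in addresses:
        ip = ipaddress.ip_address(address)
        # Only compare against networks of the same IP version (IPv4 vs. IPv6).
        if not any(ip.version == net.version and ip in net for net in networks):
            missing.append(address)
    return missing


if __name__ == "__main__":
    # 10.0.11.3: a WireGuard peer; 192.168.11.50: hypothetical admin/Ansible host.
    print(not_whitelisted(["127.0.0.1", "::1", "10.0.11.3", "192.168.11.50"]))
```

Here only 192.168.11.50 is reported, so that is the address you would still have to add.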
Now I can apply the role to the hosts:
ansible-playbook --tags=role-harden-linux k8s.yml
If I now run ansible -m ping k8s_all, it’ll fail. That’s because I changed the SSH port to 22222. Additionally, I want Ansible to use the user ansible to connect to the remote hosts, and I created a new SSH key pair just for that user. So I have to adjust the settings accordingly in Ansible’s hosts file, e.g.:
all:
  hosts:
    ...
  vars:
    ansible_port: "22222"
    ansible_user: "ansible"
    ansible_ssh_private_key_file: "/home/user/.ssh/id_ansible"
Now ansible -m ping k8s_all should work again as before.
If you add a new worker sometime in the future, you should apply this role first (and only this role) to the new host. At first, this new host might have different connection settings, e.g. if the SSH default port was changed. So add that host temporarily to Ansible’s hosts file and set the connection settings accordingly. After the role has been deployed, you can remove that temporary entry again and add the host to the group it belongs to. To limit the execution of the playbook to that host while it doesn’t have the final settings yet, run
ansible-playbook --tags=role-harden-linux --limit=host.i.example.com k8s.yml
(replace host.i.example.com with the actual hostname, of course).
SSH client settings
“But now I need to specify the port parameter every time I want to ssh to a host,” one may object. Don’t fear. Just create a file $HOME/.ssh/config and add, e.g.:
Host k8s-01* *.i.example.com *.p.example.com
  Port 22222
  User deploy
Of course you need to replace example.com in *.i.example.com and *.p.example.com with the domain name you used above. Now you can use SSH as you did before and don’t need to worry about the SSH port anymore.
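In ssh_config, Host patterns use * (zero or more characters) and ? (exactly one character) as wildcards. A block applies if the host name you type on the ssh command line matches any of its space-separated patterns. Python’s fnmatch module is a close approximation of this matching, although it additionally understands [...] character classes, which ssh patterns don’t. The sketch below uses it to check which host names would pick up Port 22222 and User deploy. The host names are examples and the helper name is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns from the "Host" line above.
HOST_PATTERNS = "k8s-01* *.i.example.com *.p.example.com".split()


def matches_host_block(hostname: str, patterns=HOST_PATTERNS) -> bool:
    """Approximate ssh_config matching: the block applies if any pattern matches."""
    return any(fnmatchcase(hostname, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ("k8s-010101", "k8s-010101.i.example.com", "k8s-020101.p.example.com", "example.com"):
        print(name, matches_host_block(name))
```

A short name like k8s-020101 would not match any of these patterns, so you might want to add more patterns if you use short names for other host groups.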
Now that the hosts are secured a little bit, head over to the next part of the tutorial! There I’ll install WireGuard and use it as a (not so poor man’s) replacement for AWS VPC or Google Cloud Networking.