Kubernetes the not so hard way with Ansible - The worker (updated for K8s v1.17.x)
2020-08-08
IMPORTANT NOTE:
I no longer use Flannel. I switched to Cilium instead, so this blog post might get outdated as I no longer maintain it. Besides flannel everything else stays the same, so you should be able to use my new blog post too even if you stay with flannel. flannel hasn't changed that much within the last two years, so I don't expect too much change in the future.
2020-04-05
- update k8s_release to 1.17.4
2019-11-14
- update k8s_release to 1.16.3
2019-09-12
- update k8s_release to 1.15.3
- removed deprecated --allow-privileged kubelet flag (see Node section in the K8s changelog)
2019-05-20
- update k8s_release to 1.14.2
- update k8s_cni_plugin_version to 0.7.5
- introduce k8s_cni_plugin_checksum variable to determine if the CNI plugin tarball has changed and needs to be unarchived
- update Docker to 18.09.6
- update CoreDNS to 1.4.0
2019-01-15
- update k8s_release to 1.13.2
2018-10-03
- Since Kubernetes v1.11 CoreDNS has reached General Availability (GA) for Kubernetes Cluster DNS as an alternative to the kube-dns addon. While you can still use kube-dns, I added the possibility to use CoreDNS as a kube-dns replacement in the text.
2018-09-30
- update k8s_release to 1.11.3 for Kubernetes v1.11.3
- Docker role now allows distributing a CA file for a Docker registry with a self-signed certificate
- kube-proxy now uses IPVS (Linux Virtual Server) to manage Kubernetes service IPs via the virtual server table in the Linux kernel. Use ipvsadm -Ln on the worker nodes to see which entries kube-proxy created.
This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Bootstrapping Kubernetes Workers.
To allow easy communication between the hosts and their services (etcd, API server, kubelet, …) we installed WireGuard VPN. This gives us a kind of unified and secure network for our Kubernetes hosts (similar to an AWS VPC or a Google Compute Engine VPC). Now we need something similar for the pods we want to run in our cluster (let's call it the pod network - it doesn't really exist as a distinct thing, as it is basically mostly routing and NAT/IPVS stuff, but the name makes things easier ;-) ). One part of this puzzle is flannel. flannel is a network fabric for containers, designed for Kubernetes.
First we need a big IP range for that. The default value in my flannel role ansible-role-flanneld is 10.200.0.0/16 (flannel_ip_range). This range is stored in etcd, which we already use for kube-apiserver. Out of that big IP range flannel will use a /24 subnet for every host where flanneld runs. Further, every pod on a worker node will get an IP address out of the /24 subnet which flannel uses for that specific host. On the flannel site you can find a diagram that shows pretty well how this works.
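To get a feel for these numbers, here's a small Python sketch using only the standard library ipaddress module (the range and prefix lengths match the defaults above):

```python
import ipaddress

# The big pod network out of which flannel carves per-host /24 subnets.
pod_network = ipaddress.ip_network("10.200.0.0/16")

# Every host running flanneld gets one /24 out of that range.
host_subnets = list(pod_network.subnets(new_prefix=24))
print(len(host_subnets))     # 256 -> room for up to 256 hosts
print(host_subnets[0])       # 10.200.0.0/24 (first host's subnet)
print(host_subnets[1])       # 10.200.1.0/24 (second host's subnet)

# A pod on the first host would get an address out of that host's /24, e.g.:
print(list(host_subnets[0].hosts())[1])  # 10.200.0.2
```

So the default /16 gives you room for 256 flanneld hosts, each with ~250 usable pod addresses.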
As already mentioned I created a role for installing flannel: ansible-role-flanneld. Install the role via
ansible-galaxy install githubixx.flanneld
We use the following settings (most of them are defaults anyway):
# The interface on which the K8s services should listen on. As all cluster
# communication should use the VPN interface the interface name is
# normally "wg0", "tap0" or "peervpn0".
k8s_interface: "wg0"
# Directory where the K8s certificates and other configuration are stored
# on the Kubernetes hosts.
k8s_conf_dir: "/var/lib/kubernetes"
# CNI network plugin directory
k8s_cni_conf_dir: "/etc/cni/net.d"
# The directory from where to copy the K8s certificates. By default this
# will expand to the user's LOCAL $HOME (the user that runs "ansible-playbook ...")
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"
etcd_bin_dir: "/usr/local/bin"
etcd_client_port: 2379
etcd_certificates:
- ca-etcd.pem
- ca-etcd-key.pem
- cert-etcd.pem
- cert-etcd-key.pem
flannel_version: "v0.10.0"
flannel_etcd_prefix: "/kubernetes-cluster/network"
flannel_ip_range: "10.200.0.0/16"
flannel_backend_type: "vxlan"
flannel_cni_interface: "cni0"
flannel_subnet_file_dir: "/run/flannel"
flannel_options_dir: "/etc/flannel"
flannel_bin_dir: "/usr/local/sbin"
flannel_cni_conf_file: "10-flannel"
flannel_cni_spec_version: "0.3.1"
flannel_systemd_restartsec: "5"
flannel_systemd_limitnofile: "40000"
flannel_systemd_limitnproc: "1048576"
# "ExecStartPre" directive in flannel's systemd service file. This command
# is executed before flannel service starts.
flannel_systemd_execstartpre: "/bin/mkdir -p {{flannel_subnet_file_dir}}"
# "ExecStartPost" directive in flannel's systemd service file. This command
# is executed after the flannel service is started. If you run in Hetzner cloud
# this may be important. In this case it changes the TX checksumming offload
# parameter for the "flannel.1" interface. It seems that there is a
# (kernel/driver) checksum offload bug with flannel vxlan encapsulation
# (all inside UDP) inside WireGuard encapsulation.
# flannel_systemd_execstartpost: "/sbin/ethtool -K flannel.1 tx off"
flannel_settings:
  "etcd-cafile": "{{k8s_conf_dir}}/ca-etcd.pem"
  "etcd-certfile": "{{k8s_conf_dir}}/cert-etcd.pem"
  "etcd-keyfile": "{{k8s_conf_dir}}/cert-etcd-key.pem"
  "etcd-prefix": "{{flannel_etcd_prefix}}"
  "iface": "{{k8s_interface}}"
  "public-ip": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "subnet-file": "{{flannel_subnet_file_dir}}/subnet.env"
  "ip-masq": "true"
  "healthz-ip": "0.0.0.0"
  "healthz-port": "0" # 0 = disable
flannel_cni_conf: |
  {
    "name": "{{flannel_cni_interface}}",
    "cniVersion": "{{flannel_cni_spec_version}}",
    "plugins": [
      {
        "type": "flannel",
        "delegate": {
          "hairpinMode": true,
          "isDefaultGateway": true
        }
      },
      {
        "type": "portmap",
        "capabilities": {
          "portMappings": true
        }
      }
    ]
  }
The settings for the flannel daemon defined in flannel_settings can be overridden by defining a variable called flannel_settings_user. You can also add additional settings by using this variable. E.g. to override the healthz-ip default value and add the kubeconfig-file setting, add the following settings to group_vars/all.yml or wherever it fits best for you:
flannel_settings_user:
  "healthz-ip": "1.2.3.4"
  "kubeconfig-file": "/etc/k8s/k8s.cfg"
Basically there should be no need to change any of the settings if you mostly used the default settings of my other roles so far. There are maybe two settings you may want to change. etcd-prefix is the path in etcd where flannel will store its config object. So with the default above, the whole path to the flannel config object in etcd would be /kubernetes-cluster/network/config. Next, flannel_ip_range is the big IP range I mentioned above. Don't make it too small! For every host flannel will choose a /24 subnet out of this range.
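For illustration, the config object flannel reads from etcd is a small JSON document. Below is a hedged Python sketch of roughly what ends up under that key when the role runs (values match the defaults above; check the role's tasks for the authoritative format):

```python
import json

# flannel expects its JSON config object at "<etcd-prefix>/config".
etcd_prefix = "/kubernetes-cluster/network"   # flannel_etcd_prefix
config_key = etcd_prefix + "/config"

# The values correspond to flannel_ip_range and flannel_backend_type.
config_value = json.dumps({
    "Network": "10.200.0.0/16",
    "Backend": {"Type": "vxlan"}
})

print(config_key)    # /kubernetes-cluster/network/config
print(config_value)
```

On a running cluster you could inspect the real object with etcdctl against that key.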
Next we extend our k8s.yml
playbook file and add the role e.g.:
- hosts: k8s:children
  roles:
    - role: githubixx.flanneld
      tags: role-kubernetes-flanneld
As you can see, flanneld will be installed on all nodes (the group k8s:children includes controller, worker and etcd hosts in my case). I decided to do so because I'll have Docker running on every host, so it makes sense to have one unified network setup for all hosts that run Docker. Be aware that flanneld needs to run BEFORE Docker (the Docker systemd unit file takes care of that for you, but for now it's important to install flannel before Docker)! Now you can apply the role to all specified hosts:
ansible-playbook --tags=role-kubernetes-flanneld k8s.yml
Now we need to install Docker on all of our nodes (you don't need Docker on the etcd hosts if you used separate nodes for etcd and controller). You can use whatever Ansible Docker playbook you want to install Docker (you should find quite a few out there ;-) ). I created my own because I wanted to use the official Docker binaries archive, the overlay FS storage driver and a custom systemd unit file. Be aware that you need to set a few options to make Docker work with the flannel overlay network. Also, we use Docker 18.09. At the time of writing this is the latest Docker version supported/recommended by Kubernetes v1.17.x. If you want to use my Docker playbook you can install it via
ansible-galaxy install githubixx.docker
The playbook has the following default variables:
docker_download_dir: "/opt/tmp"
docker_version: "18.09.6"
docker_user: "docker"
docker_group: "docker"
docker_uid: 666
docker_gid: 666
docker_bin_dir: "/usr/local/bin"
dockerd_settings:
  "host": "unix:///var/run/docker.sock"
  "log-level": "error"
  "storage-driver": "overlay"
  "iptables": "false"
  "ip-masq": "false"
  "bip": ""
  "mtu": "1472"
The settings for the dockerd daemon defined in dockerd_settings can be overridden by defining a variable called dockerd_settings_user. You can also add additional settings by using this variable. E.g. to override the mtu default value and add debug, add the following settings to group_vars/all.yml or wherever it fits best for you:
dockerd_settings_user:
  "mtu": "1450"
  "debug": ""
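If you wonder where MTU values like 1450 come from, here is some back-of-envelope arithmetic. The overhead numbers are the commonly cited ones, not measured on this setup, so verify with ip link on your own hosts:

```python
# Typical MTU arithmetic for flannel's vxlan backend.
ETHERNET_MTU = 1500
VXLAN_OVERHEAD = 50      # outer IP + UDP + VXLAN headers (IPv4)

# flannel vxlan directly on a standard 1500-byte link:
print(ETHERNET_MTU - VXLAN_OVERHEAD)   # 1450 -> the mtu override example above

# If the vxlan traffic additionally travels through WireGuard
# (wg0's default MTU is 1420), the usable inner MTU shrinks further:
WIREGUARD_MTU = 1420
print(WIREGUARD_MTU - VXLAN_OVERHEAD)  # 1370
```

Setting the Docker MTU too high causes fragmentation or silently dropped packets inside the overlay, which tends to show up as mysteriously hanging connections.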
There should be no need to change any of these default values besides maybe storage-driver. If you don't use my Docker role, pay attention to set at least the last four default settings mentioned above correctly.
Optional: If you run your own Docker registry, it may make sense to distribute the certificate authority file to your worker nodes to make sure that your worker nodes trust the SSL certificate that the registry offers (e.g. if you created a self-signed certificate). The role allows you to distribute the CA file:
# The directory from where to copy the Docker CA certificates. By default this
# will expand to the user's LOCAL $HOME (the user that runs "ansible-playbook ...")
# plus "/docker-ca-certificates". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "docker_ca_certificates_src_dir" will have a value of
# "/home/da_user/docker-ca-certificates".
docker_ca_certificates_src_dir: "{{ '~/docker-ca-certificates' | expanduser }}"
# The directory where the program "update-ca-certificates" searches for CA certificate
# files (besides other locations).
docker_ca_certificates_dst_dir: "/usr/local/share/ca-certificates"
As usual place the variables in group_vars/all.yml
if you want to change variables. Add the role to our playbook file k8s.yml
e.g.:
- hosts: k8s:children
  roles:
    - role: githubixx.docker
      tags: role-docker
A word about storage-driver: It makes sense to use a recent kernel for Docker in general. I recommend using a kernel >= 4.14.x if possible. Verify that you have the overlayfs filesystem available on your worker instances (execute cat /proc/filesystems | grep overlay; if you see output you should be fine). If the kernel module isn't compiled into the kernel you can normally load it via modprobe -v overlay (-v gives us a little more information). We'll configure Docker to use overlayfs by default because it's one of the best choices (Docker 1.13.x started to use overlayfs by default if available). But you can change the storage driver via the storage-driver setting if you like. Again: use a kernel >= 4.14.x if possible!
Now you can roll out the Docker role on all nodes using
ansible-playbook --tags=role-docker k8s.yml
In the Kubernetes control plane part we installed the Kubernetes API server, Scheduler and Controller Manager on the controller nodes. For the worker I've also prepared an Ansible role which installs the Kubernetes worker components. A Kubernetes worker node needs a kubelet and a kube-proxy daemon. The workers do the "real" work: they run the pods and the Docker containers. So in production, if you do real work, it won't hurt to choose bigger iron for the worker hosts ;-) The kubelet is responsible for creating a pod/container on a worker node if the scheduler has chosen that node to run a pod on. kube-proxy cares about routing: e.g. if a pod or a service is added, kube-proxy takes care of updating the routing rules in iptables (or IPVS on newer Kubernetes installations) accordingly.
The workers depend on the infrastructure we installed in the control plane. The role uses the following variables:
# The directory to store the K8s certificates and other configuration
k8s_conf_dir: "/var/lib/kubernetes"
# The directory to store the K8s binaries
k8s_bin_dir: "/usr/local/bin"
# K8s release
k8s_release: "1.17.4"
# The interface on which the K8s services should listen on. As all cluster
# communication should use a VPN interface the interface name is
# normally "wg0" (WireGuard),"peervpn0" (PeerVPN) or "tap0".
k8s_interface: "wg0"
# The directory from where to copy the K8s certificates. By default this
# will expand to the user's LOCAL $HOME (the user that runs "ansible-playbook ...")
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"
# Directory where kubeconfig for Kubernetes worker nodes and kube-proxy
# is stored among other configuration files. Same variable expansion
# rule applies as with "k8s_ca_conf_directory"
k8s_config_directory: "{{ '~/k8s/configs' | expanduser }}"
# K8s worker binaries to download
k8s_worker_binaries:
- kube-proxy
- kubelet
- kubectl
# Certificate/CA files for API server and kube-proxy
k8s_worker_certificates:
- ca-k8s-apiserver.pem
- ca-k8s-apiserver-key.pem
- cert-k8s-apiserver.pem
- cert-k8s-apiserver-key.pem
# Download directory for archive files
k8s_worker_download_dir: "/opt/tmp"
# Directory to store kubelet configuration
k8s_worker_kubelet_conf_dir: "/var/lib/kubelet"
# kubelet settings
k8s_worker_kubelet_settings:
  "config": "{{k8s_worker_kubelet_conf_dir}}/kubelet-config.yaml"
  "node-ip": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "container-runtime": "docker"
  "image-pull-progress-deadline": "2m"
  "kubeconfig": "{{k8s_worker_kubelet_conf_dir}}/kubeconfig"
  "network-plugin": "cni"
  "cni-conf-dir": "{{k8s_cni_conf_dir}}"
  "cni-bin-dir": "{{k8s_cni_bin_dir}}"
  "cloud-provider": ""
  "register-node": "true"
# kubelet configuration (kubelet-config.yaml)
k8s_worker_kubelet_conf_yaml: |
  kind: KubeletConfiguration
  apiVersion: kubelet.config.k8s.io/v1beta1
  address: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}
  authentication:
    anonymous:
      enabled: false
    webhook:
      enabled: true
    x509:
      clientCAFile: "{{k8s_conf_dir}}/ca-k8s-apiserver.pem"
  authorization:
    mode: Webhook
  clusterDomain: "cluster.local"
  clusterDNS:
    - "10.32.0.254"
  failSwapOn: true
  healthzBindAddress: "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  healthzPort: 10248
  runtimeRequestTimeout: "15m"
  serializeImagePulls: false
  tlsCertFile: "{{k8s_conf_dir}}/cert-k8s-apiserver.pem"
  tlsPrivateKeyFile: "{{k8s_conf_dir}}/cert-k8s-apiserver-key.pem"
# Directory to store kube-proxy configuration
k8s_worker_kubeproxy_conf_dir: "/var/lib/kube-proxy"
# kube-proxy settings
k8s_worker_kubeproxy_settings:
  "config": "{{k8s_worker_kubeproxy_conf_dir}}/kubeproxy-config.yaml"
k8s_worker_kubeproxy_conf_yaml: |
  kind: KubeProxyConfiguration
  apiVersion: kubeproxy.config.k8s.io/v1alpha1
  bindAddress: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}
  clientConnection:
    kubeconfig: "{{k8s_worker_kubeproxy_conf_dir}}/kubeconfig"
  healthzBindAddress: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}:10256
  mode: "ipvs"
  ipvs:
    minSyncPeriod: 0s
    scheduler: ""
    syncPeriod: 2s
  iptables:
    masqueradeAll: true
  clusterCIDR: "10.200.0.0/16"
# CNI network plugin settings
k8s_cni_dir: "/opt/cni"
k8s_cni_bin_dir: "{{k8s_cni_dir}}/bin"
k8s_cni_conf_dir: "/etc/cni/net.d"
k8s_cni_plugin_version: "0.7.5"
# SHA512 checksum (see https://github.com/containernetworking/plugins/releases)
k8s_cni_plugin_checksum: "1abfb567c13f87aab94efd6f2c6bb17f3581cbcce87bf8c6216510a92486cb44e6b8701fd3a6cb85273a97981feecbd0c4eb624eab0eca5a005fd85a7d95c284"
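The role compares a checksum like the one above against the downloaded CNI plugin tarball to decide whether it changed and needs to be unarchived again. If you want to verify a local download yourself, a small Python sketch (the file path below is just an example):

```python
import hashlib

def sha512_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA512 hex digest of a file, reading it in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage against a hypothetical local download:
# print(sha512_of("/opt/tmp/cni-plugins-amd64-v0.7.5.tgz"))
```

Compare the printed digest with the value published on the containernetworking/plugins release page before setting k8s_cni_plugin_checksum.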
The role will search for the certificates we created in the certificate authority part in the directory you specify in k8s_ca_conf_directory on the host you run Ansible on. The files used here are listed in k8s_worker_certificates. The Kubernetes worker binaries we need are listed in k8s_worker_binaries. The kubelet can use CNI (the Container Network Interface) to manage machine-level networking requirements. The CNI plugin archive we want to download is specified via k8s_cni_plugin_version and will be placed in k8s_cni_dir.
If you created a different VPN interface (e.g. peervpn0
) change k8s_interface
.
Now add an entry for your worker hosts into Ansible’s hosts
file e.g.:
[k8s_worker]
worker0[1:3].i.domain.tld
Install the role via
ansible-galaxy install githubixx.kubernetes_worker
Next add the role to k8s.yml
file e.g.:
- hosts: k8s_worker
  roles:
    - role: githubixx.kubernetes_worker
      tags: role-kubernetes-worker
Run the playbook via
ansible-playbook --tags=role-kubernetes-worker k8s.yml
Now that we’ve installed basically everything needed for running pods,deployments,services,… we should be able to do a sample deployment. On your laptop run:
kubectl run my-nginx --image=nginx --replicas=4 --port=80
This will deploy 4 pods running nginx. To get an overview of what's running, e.g. pods, services, deployments, …, run:
kubectl get all -o wide
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deploy/my-nginx 4 4 4 4 1m my-nginx nginx run=my-nginx
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
rs/my-nginx-5d69b5ff7 4 4 4 1m my-nginx nginx pod-template-hash=182561993,run=my-nginx
NAME READY STATUS RESTARTS AGE IP NODE
po/my-nginx-5d69b5ff7-66jgk 1/1 Running 0 1m 10.200.25.2 k8s-worker2
po/my-nginx-5d69b5ff7-kphsd 1/1 Running 0 1m 10.200.5.2 k8s-worker1
po/my-nginx-5d69b5ff7-mwcb6 1/1 Running 0 1m 10.200.5.3 k8s-worker1
po/my-nginx-5d69b5ff7-w888j 1/1 Running 0 1m 10.200.25.3 k8s-worker2
You should also be able to run curl on every worker and controller node to get the default page from one of the four nginx webservers. In the case above, curl http://10.200.25.2 should work on all nodes in our cluster (flanneld magic ;-) ).
You can output the workers' internal IPs and the pod CIDRs that were assigned to each host with:
kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}'
10.8.0.111 10.200.0.0/24
10.8.0.112 10.200.1.0/24
The IP addresses 10.8.0.111/112 are the addresses I assigned to the VPN interface (wg0 in my case) of worker01/02. That's important since all communication should travel through the VPN interfaces.
If you just want to see if the worker nodes are ready use:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
worker01 Ready <none> 618d v1.17.4 10.8.0.111 <none> Ubuntu 18.04.4 LTS 5.3.0-45-generic docker://18.9.6
worker02 Ready <none> 615d v1.17.4 10.8.0.112 <none> Ubuntu 18.04.4 LTS 5.3.0-45-generic docker://18.9.6
At the end we need one last important addon to enable service discovery / DNS resolution within the K8s cluster: kube-dns or CoreDNS. If you use K8s >= v1.11.x I recommend CoreDNS, as it is the default now.
CoreDNS
Skip to the next section if you want to use kube-dns. If you cloned the ansible-kubernetes-playbooks repository already, you'll find a coredns directory in there with a playbook file called coredns.yml. I've added a detailed README to the playbook repository, so please follow the instructions there to install CoreDNS. Please be aware that this playbook uses Ansible's k8s module, which was included in Ansible 2.6. The kube-dns playbook (see next section) also works with Ansible < 2.6.
kube-dns
Skip this if you already installed CoreDNS above. I'll keep this for reference at the moment, but you should go with CoreDNS whenever possible! Also, if you can't use Ansible >= 2.6 you can still use this playbook, as the CoreDNS playbook above needs Ansible >= 2.6.
If you cloned the ansible-kubernetes-playbooks repository already, you'll find a kubedns directory in there with a playbook file called kubedns.yml. So change directory to kubedns and adjust templates/kubedns.yaml.j2. There are basically only two settings you need to adjust if you haven't used my default settings: search for clusterIP and for cluster.local and change them accordingly. You should find the following entries:
clusterIP: 10.32.0.254
--domain=cluster.local
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
Now run
ansible-playbook kubedns.yml
to roll out the kube-dns deployment. If you run
kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide
you should see something like this:
NAME READY STATUS RESTARTS AGE IP NODE
kube-dns-6c857864fb-bp2kx 3/3 Running 0 32m 10.200.7.9 k8s-worker2
What’s next
There’re a lot more things that could/should be done now but running Sonobuoy could be a good next step. Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests in an accessible and non-destructive manner.
Also you may have a look at Velero
. Velero is a utility for managing disaster recovery, specifically for your Kubernetes cluster resources and persistent volumes.
Next up: Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik