Kubernetes the not so hard way with Ansible - The worker - (K8s v1.28)
This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Bootstrapping Kubernetes Workers.
General information
It makes sense to use a recent Linux kernel in general. Container runtimes like containerd and also Cilium (which comes later) benefit a lot from a recent kernel. I recommend using a kernel >=5.4 if possible. Ubuntu 20.04, e.g., provides a linux-image-5.15.0-83-generic package with kernel 5.15, or you can install the Hardware Enablement Stack (HWE) (linux-generic-hwe-20.04), which contains kernel 5.15 or even newer kernels. Ubuntu 20.04 already uses kernel 5.4 by default (which contains the wireguard module by default, btw.). As of writing this blog post there is already kernel 6.5 available for Ubuntu 22.04 (and that’s the one I use by installing the linux-generic-hwe-22.04-edge package).
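To quickly check whether a node’s kernel meets that recommendation, something like the following can be used. It’s a minimal sketch: kernel_at_least is a made-up helper name, and it only compares the leading numeric part of uname -r (e.g. 6.5.0 from 6.5.0-14-generic). sort -V is used instead of a plain string comparison because 5.15 has to count as newer than 5.4.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given kernel release (default: the
# running kernel) is at least the given minimum version (default: 5.4).
kernel_at_least() {
  local release="${1:-$(uname -r)}"
  local minimum="${2:-5.4}"
  # Keep only the numeric part before the first "-",
  # e.g. "6.5.0-14-generic" -> "6.5.0".
  local version="${release%%-*}"
  # sort -V orders version strings naturally (5.4 < 5.15). If the minimum
  # sorts first, the kernel is new enough.
  [ "$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n1)" = "$minimum" ]
}

# Example: kernel_at_least && echo "kernel is recent enough"
```

To see the kernel release of all worker nodes at once, ansible -m command -a 'uname -r' k8s_worker works as well.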
containerd, runc and CNI plugins
Before containerd, a lot of Kubernetes installations most probably used Docker as container runtime. But Docker/dockershim was deprecated with Kubernetes v1.20 and removed with Kubernetes v1.24. Behind the scenes Docker already used containerd. So in the end Docker was just an additional “layer” that is no longer needed for Kubernetes. containerd together with runc is, so to say, a replacement for Docker. I’ve written a blog post about how to migrate from Docker/dockershim to containerd: Kubernetes: Replace dockershim with containerd and runc.
A container runtime is needed to execute workloads that you deploy to Kubernetes. A workload is normally a Docker container image (which you build locally, on a Jenkins server or whatever build pipeline you have in place) which runs a webserver or any other service that listens on a port.
To make containerd work, runc and CNI plugins are needed. runc is a CLI tool for spawning and running containers on Linux according to the OCI specification. CNI, a Cloud Native Computing Foundation project, consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers, along with a number of supported plugins. CNI concerns itself only with network connectivity of containers and removing allocated resources when the container is deleted.
The defaults of these two roles should be reasonable. I’ll just override one default setting in group_vars/k8s_worker.yml:
cni_tmp_directory: "/opt/tmp/cni"
So let’s install these two roles:
ansible-galaxy install githubixx.runc
ansible-galaxy install githubixx.cni
Next I’m going to install containerd, which is (kinda) a modern replacement for Docker, with the help of my Ansible role for containerd. containerd is a container runtime that will be installed on each Kubernetes worker node in the cluster so that Pods (basically the workload distributed as container images) can run there.
So first install the Ansible role for containerd:
ansible-galaxy install githubixx.containerd
In general the default variables of this role should be just fine. Just make sure that, if you changed runc_bin_directory, you also adjust BinaryName in containerd_config.
For all variables the containerd role offers, please see default.yml.
As containerd is relevant for the K8s worker nodes, I’ll override two default variables in group_vars/k8s_worker.yml, e.g.:
containerd_tmp_directory: "/opt/tmp/containerd"
containerd_binary_directory: "/usr/local/sbin"
Also add the roles (runc, cni and containerd) to our playbook file k8s.yml, e.g.:
-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.cni
      tags: role-cni
    -
      role: githubixx.runc
      tags: role-runc
    -
      role: githubixx.containerd
      tags: role-containerd
If everything is in place, the roles can be deployed on all worker nodes. As I already mentioned previously, this also includes the controller nodes, because they need Cilium running too, which is deployed as Pods on every node (worker and controller hosts):
ansible-playbook --tags=role-runc k8s.yml
ansible-playbook --tags=role-cni k8s.yml
ansible-playbook --tags=role-containerd k8s.yml
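After the roles ran, a quick sanity check on a node can’t hurt. The following sketch (missing_bins is a hypothetical helper, not part of the roles) just reports which expected binaries are missing or not executable in a directory. /usr/local/sbin is the containerd_binary_directory set above; /opt/cni/bin is only an assumption for where the cni role puts the plugins, so please check the role’s defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report expected binaries that are missing or not
# executable in a directory. Returns 0 if all are present, 1 otherwise.
missing_bins() {
  local dir="$1"
  shift
  local bin missing=0
  for bin in "$@"; do
    if [ ! -x "${dir}/${bin}" ]; then
      echo "missing: ${dir}/${bin}"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage on a worker node:
#   missing_bins /usr/local/sbin containerd ctr   # containerd_binary_directory from above
#   missing_bins /opt/cni/bin loopback bridge     # assumed CNI plugin directory
```

Whether the containerd service is running can be checked with e.g. ansible -m command -a 'systemctl is-active containerd' k8s_worker (assuming the systemd unit is called containerd).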
Kubernetes worker
In Kubernetes control plane I installed Kubernetes kube-apiserver, kube-scheduler and kube-controller-manager on the controller nodes. For the worker I’ve also prepared an Ansible role which installs the Kubernetes worker components. The Kubernetes part of a worker node needs a kubelet and a kube-proxy daemon. The workers do the “real” work. They run the Pods (which are containers deployed via container images). So in production, and if you do real work, it won’t hurt to choose bigger iron for the worker hosts 😉
kubelet is responsible for creating a Pod/container on a worker node once the scheduler has chosen that node to run the Pod. kube-proxy takes care of the forwarding rules for Services. E.g. if a Pod or a Service was added, kube-proxy updates those rules with iptables (by default) or IPVS (which is the default in my roles).
The worker depends on the infrastructure that I installed in the control plane blog post. The role provides the following variables:
# The base directory for Kubernetes configuration and certificate files for
# everything worker nodes related. After the playbook is done this directory
# contains various sub-folders.
k8s_worker_conf_dir: "/etc/kubernetes/worker"
# All certificate files (Private Key Infrastructure related) specified in
# "k8s_worker_certificates" (see "vars/main.yml") will be stored here.
# Owner and group of this new directory will be "root". File permissions
# will be "0640".
k8s_worker_pki_dir: "{{ k8s_worker_conf_dir }}/pki"
# The directory to store the Kubernetes binaries (see "k8s_worker_binaries"
# variable in "vars/main.yml"). Owner and group of this new directory
# will be "root" in both cases. Permissions for this directory will be "0755".
#
# NOTE: The default directory "/usr/local/bin" normally already exists on every
# Linux installation with the owner, group and permissions mentioned above. If
# your current settings are different consider a different directory. But make sure
# that the new directory is included in your "$PATH" variable value.
k8s_worker_bin_dir: "/usr/local/bin"
# K8s release
k8s_worker_release: "1.28.5"
# The interface on which the Kubernetes services should listen on. As all cluster
# communication should use a VPN interface the interface name is
# normally "wg0" (WireGuard),"peervpn0" (PeerVPN) or "tap0".
#
# The network interface on which the Kubernetes worker services should
# listen on. That is:
#
# - kube-proxy
# - kubelet
#
k8s_interface: "eth0"
# The directory from where to copy the K8s certificates. By default this
# will expand to user's LOCAL $HOME (the user that runs "ansible-playbook ...")
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"
# The IP address or hostname of the Kubernetes API endpoint. This variable
# is used by "kube-proxy" and "kubelet" to connect to the "kube-apiserver"
# (Kubernetes API server).
#
# By default the first host in the Ansible group "k8s_controller" is
# specified here. NOTE: This setting is not fault tolerant! That means
# if the first host in the Ansible group "k8s_controller" is down
# the worker node and its workload continue working but the worker
# node doesn't receive any updates from Kubernetes API server.
#
# If you have a loadbalancer that distributes traffic between all
# Kubernetes API servers it should be specified here (either its IP
# address or the DNS name). But you need to make sure that the IP
# address or the DNS name you want to use here is included in the
# Kubernetes API server TLS certificate (see "k8s_apiserver_cert_hosts"
# variable of https://github.com/githubixx/ansible-role-kubernetes-ca
# role). If it's not specified you'll get certificate errors in the
# logs of the services mentioned above.
k8s_worker_api_endpoint_host: "{% set controller_host = groups['k8s_controller'][0] %}{{ hostvars[controller_host]['ansible_' + hostvars[controller_host]['k8s_interface']].ipv4.address }}"
# As above just for the port. It specifies on which port the
# Kubernetes API servers are listening. Again if there is a loadbalancer
# in place that distributes the requests to the Kubernetes API servers
# put the port of the loadbalancer here.
k8s_worker_api_endpoint_port: "6443"
# OS packages needed on a Kubernetes worker node. You can add additional
# packages at any time. But please be aware if you remove one or more from
# the default list your worker node might not work as expected or doesn't work
# at all.
k8s_worker_os_packages:
- ebtables
- ethtool
- ipset
- conntrack
- iptables
- iptstate
- netstat-nat
- socat
- netbase
# Directory to store kubelet configuration
k8s_worker_kubelet_conf_dir: "{{ k8s_worker_conf_dir }}/kubelet"
# kubelet settings
#
# If you want to enable the use of "RuntimeDefault" as the default seccomp
# profile for all workloads add these settings to "k8s_worker_kubelet_settings":
#
# "seccomp-default": ""
#
# Also see:
# https://kubernetes.io/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads
k8s_worker_kubelet_settings:
  "config": "{{ k8s_worker_kubelet_conf_dir }}/kubelet-config.yaml"
  "node-ip": "{{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}"
  "kubeconfig": "{{ k8s_worker_kubelet_conf_dir }}/kubeconfig"
# kubelet kubeconfig
k8s_worker_kubelet_conf_yaml: |
  kind: KubeletConfiguration
  apiVersion: kubelet.config.k8s.io/v1beta1
  address: {{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}
  authentication:
    anonymous:
      enabled: false
    webhook:
      enabled: true
    x509:
      clientCAFile: "{{ k8s_worker_pki_dir }}/ca-k8s-apiserver.pem"
  authorization:
    mode: Webhook
  clusterDomain: "cluster.local"
  clusterDNS:
    - "10.32.0.254"
  failSwapOn: true
  healthzBindAddress: "{{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}"
  healthzPort: 10248
  runtimeRequestTimeout: "15m"
  serializeImagePulls: false
  tlsCertFile: "{{ k8s_worker_pki_dir }}/cert-{{ inventory_hostname }}.pem"
  tlsPrivateKeyFile: "{{ k8s_worker_pki_dir }}/cert-{{ inventory_hostname }}-key.pem"
  cgroupDriver: "systemd"
  registerNode: true
  containerRuntimeEndpoint: "unix:///run/containerd/containerd.sock"
# Directory to store kube-proxy configuration
k8s_worker_kubeproxy_conf_dir: "{{ k8s_worker_conf_dir }}/kube-proxy"
# kube-proxy settings
k8s_worker_kubeproxy_settings:
  "config": "{{ k8s_worker_kubeproxy_conf_dir }}/kubeproxy-config.yaml"
k8s_worker_kubeproxy_conf_yaml: |
  kind: KubeProxyConfiguration
  apiVersion: kubeproxy.config.k8s.io/v1alpha1
  bindAddress: {{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}
  clientConnection:
    kubeconfig: "{{ k8s_worker_kubeproxy_conf_dir }}/kubeconfig"
  healthzBindAddress: {{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}:10256
  mode: "ipvs"
  ipvs:
    minSyncPeriod: 0s
    scheduler: ""
    syncPeriod: 2s
  iptables:
    masqueradeAll: true
  clusterCIDR: "10.200.0.0/16"
Make sure that k8s_interface: "wg0" is set if you use WireGuard. But it should already be set in group_vars/all.yml because it was also used by the Control Plane nodes. I’d also recommend extending k8s_worker_kubelet_settings by one setting: "seccomp-default": "". This enables the use of “RuntimeDefault” as the default seccomp profile for all workloads. In short: this profile blocks quite a few system calls, e.g. reboot. There is actually no need for a container to reboot a Kubernetes host 😉 So the system calls relevant for “normal” workloads are still allowed, while the ones such workloads don’t need are blocked. For more information see my Kubernetes upgrade notes: Enable default seccomp profile. Since Kubernetes v1.27 this feature is stable. Also see: Enable the use of RuntimeDefault as the default seccomp profile for all workloads. To enable seccomp-default, I’ll change the k8s_worker_kubelet_settings variable in group_vars/k8s_worker.yml accordingly by adding "seccomp-default": "":
k8s_worker_kubelet_settings:
  "config": "{{ k8s_worker_kubelet_conf_dir }}/kubelet-config.yaml"
  "node-ip": "{{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}"
  "kubeconfig": "{{ k8s_worker_kubelet_conf_dir }}/kubeconfig"
  "seccomp-default": ""
The role will search for the certificates I created in the K8s certificate authority blog post in the directory specified in k8s_ca_conf_directory on my Ansible Controller node. The files used here are listed in k8s_worker_certificates (see vars/main.yml).
The Kubernetes worker binaries needed are listed in k8s_worker_binaries (also defined in vars/main.yml).
The kubelet service can use CNI (the Container Network Interface) to manage machine level networking requirements. The CNI plugins needed were installed with the cni role mentioned above.
As you might remember, I installed HAProxy in the previous blog post. It was installed on all Kubernetes Controller and Worker nodes. kubelet and kube-proxy should also use HAProxy to connect to kube-apiserver for higher availability. So I’ll set the following variables in group_vars/k8s_worker.yml:
k8s_worker_api_endpoint_host: "127.0.0.1"
k8s_worker_api_endpoint_port: "16443"
Now I add an entry for the worker hosts (which also includes the controller nodes as mentioned above) to Ansible’s hosts file, e.g.:
k8s_worker:
  hosts:
    k8s-01[01:03]02.i.example.com:
    k8s-01[01:03]03.i.example.com:
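In case the [01:03] notation looks unfamiliar: that’s an Ansible inventory range, so the two lines above expand to six hosts. The authoritative way to see the result is ansible k8s_worker --list-hosts. Just to illustrate what happens, here is a simplified sketch (expand_host_range is a made-up name) that handles exactly one numeric range per pattern and keeps the zero padding. Ansible itself supports more, like alphabetic ranges and strides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand one numeric Ansible-style range such as
# "k8s-01[01:03]02.i.example.com" into individual host names.
# Only a single "[start:end]" range per pattern is supported.
expand_host_range() {
  local pattern="$1"
  local re='^(.*)\[([0-9]+):([0-9]+)](.*)$'
  if [[ ! "$pattern" =~ $re ]]; then
    echo "$pattern"   # no range: print the pattern unchanged
    return 0
  fi
  local prefix="${BASH_REMATCH[1]}" start="${BASH_REMATCH[2]}"
  local end="${BASH_REMATCH[3]}" suffix="${BASH_REMATCH[4]}"
  local width="${#start}" i
  # 10# forces base 10 so that "08" or "09" are not parsed as octal.
  for (( i = 10#$start; i <= 10#$end; i++ )); do
    # Pad the number to the width of "start" (e.g. 01, 02, 03).
    printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
  done
}

# Example: expand_host_range 'k8s-01[01:03]02.i.example.com'
```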
Then I install the role via:
ansible-galaxy install githubixx.kubernetes_worker
Next I add the role to k8s.yml by extending the roles list of the k8s_worker hosts, e.g.:
  hosts: k8s_worker
  roles:
    -
      role: githubixx.kubernetes_worker
      tags: role-kubernetes-worker
After that the role gets deployed on all worker nodes:
ansible-playbook --tags=role-kubernetes-worker k8s.yml
So by now it should already be possible to fetch the state of the worker nodes:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-010102 NotReady <none> 18m v1.28.5 10.0.11.3 <none> Ubuntu 22.04.3 LTS 6.5.0-14-generic containerd://1.7.12
k8s-010103 NotReady <none> 30s v1.28.5 10.0.11.4 <none> Ubuntu 22.04.3 LTS 6.5.0-14-generic containerd://1.7.12
k8s-010202 NotReady <none> 18m v1.28.5 10.0.11.6 <none> Ubuntu 22.04.3 LTS 6.5.0-14-generic containerd://1.7.12
k8s-010203 NotReady <none> 30s v1.28.5 10.0.11.7 <none> Ubuntu 22.04.3 LTS 6.5.0-14-generic containerd://1.7.12
k8s-010302 NotReady <none> 18m v1.28.5 10.0.11.9 <none> Ubuntu 22.04.3 LTS 6.5.0-14-generic containerd://1.7.12
k8s-010303 NotReady <none> 29s v1.28.5 10.0.11.10 <none> Ubuntu 22.04.3 LTS 6.5.0-14-generic containerd://1.7.12
The STATUS column now reports NotReady for all nodes. Looking at the logs on the worker nodes, there will be some errors like this:
ansible -m command -a 'journalctl -t kubelet -n 50' k8s_worker
...
May 13 11:40:40 worker01 kubelet[12132]: E0513 11:40:40.646202 12132 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
May 13 11:40:44 worker01 kubelet[12132]: W0513 11:40:44.981728 12132 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
...
This will be fixed next.
Cilium
What’s missing is the software that lets Pods on different hosts communicate with each other. Previously I used flannel. Flannel is a simple and easy way to configure a layer 3 network fabric designed for Kubernetes. But as time moves on other interesting projects pop up and one of them is Cilium.
That’s basically a one-stop solution for everything needed for Kubernetes networking. So there is no need to install additional software for Network Policies, e.g. Cilium brings API-aware network security filtering to Linux container frameworks like Docker and Kubernetes. Using a Linux kernel technology called eBPF, Cilium provides a simple and efficient way to define and enforce both network-layer and application-layer security policies based on container/pod identity. It really has everything: overlay networking, native routing, IPv4/v6 support, load balancing, direct server return (DSR), Gateway API support (replacement for Ingress), monitoring and troubleshooting, Hubble as an observability platform, network policies, CNI and libnetwork integration, and so on. The use of eBPF and XDP also makes it very fast, as most of the processing happens in the Linux kernel and not in userspace. Also the documentation is just great and of course there is also a blog.
Ok, enough Cilium praise 😉 Let’s install it. I prepared an Ansible Cilium role. Download it via:
ansible-galaxy install githubixx.cilium_kubernetes
The role uses the Cilium Helm chart in the background. So on the Ansible Controller node I need the Helm 3 binary installed. This is also true for some of my other roles coming up. There are at least three ways to install Helm:
- use your favorite package manager if your distribution includes helm in its repository (for Archlinux use sudo pacman -S helm, e.g.)
- or use one of the Ansible Helm roles (e.g. helm, which can be installed via ansible-galaxy role install -vr roles/githubixx.cilium_kubernetes/requirements.yml)
- or directly download the binary from Helm releases (https://github.com/helm/helm/releases) and put it into the /usr/local/bin/ directory, e.g.
Also make sure that the KUBECONFIG variable is set correctly. But this is something that I already did earlier in my blog posts.
The role does a few things on the Kubernetes nodes, but most tasks are executed on the Ansible Controller node, like installing the Cilium Helm chart, connecting to kube-apiserver to check the status of the Cilium deployment and stuff like that. By default the role “delegates” all tasks that need to connect to kube-apiserver to 127.0.0.1. This can be changed with the cilium_delegate_to variable. I’ll set this variable in group_vars/all.yml. In my case I’ll set it to k8s-01-ansible-ctrl.i.example.com, which is actually localhost 😉 But if I need to set some variables for this host, I can do so later (see further below).
I’ll now extend the playbook k8s.yml to specify that the cilium_kubernetes role should be applied to the hosts in the k8s_worker group:
-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.cilium_kubernetes
      tags: role-cilium-kubernetes
As mentioned above, the role delegates quite a few tasks to the Ansible Controller node. This also means that it’ll “delegate” the variables I set for this role. As I defined above that the cilium_kubernetes role should be applied to the k8s_worker hosts group, I need to define the variables for this role in group_vars/k8s_worker.yml, e.g.:
cilium_chart_version: "1.14.5"
cilium_etcd_enabled: "true"
cilium_etcd_interface: "{{ k8s_interface }}"
cilium_etcd_client_port: "2379"
cilium_etcd_nodes_group: "k8s_etcd"
cilium_etcd_secrets_name: "cilium-etcd-secrets"
cilium_etcd_cert_directory: "{{ k8s_ca_conf_directory }}"
cilium_etcd_cafile: "ca-etcd.pem"
cilium_etcd_certfile: "cert-cilium.pem"
cilium_etcd_keyfile: "cert-cilium-key.pem"
If your Kubernetes cluster isn’t that big, you can actually remove all cilium_etcd_* variables and just pin the Cilium Helm chart to a specific version by setting cilium_chart_version as above. Without etcd, Cilium stores its state in Kubernetes custom resources (CRDs). But since I’m adventurous I’ll run Cilium with the external etcd key-value store that I already use for my kube-apiserver. If you have very strong security requirements and a big cluster, it might make sense to have a separate etcd cluster just for Cilium (also see Installation with external etcd).
Regarding the cilium_etcd_* values: etcd is listening on the WireGuard interface only, as it’s part of the WireGuard mesh. So cilium_etcd_interface: "wg0" needs to be set, or you can do something like cilium_etcd_interface: "{{ k8s_interface }}", as k8s_interface is already set in group_vars/all.yml, and so we can keep that in sync. etcd daemons are listening on port 2379 by default. All etcd hosts are in Ansible’s k8s_etcd hosts group. The role will create a Kubernetes Secret named after the value specified in cilium_etcd_secrets_name. That Secret will contain the content of the certificate files specified in cilium_etcd_cafile, cilium_etcd_certfile and cilium_etcd_keyfile. Also make sure that cilium_etcd_cert_directory: "{{ k8s_ca_conf_directory }}" is set, as all certificate files created with the kubernetes_ca role earlier are stored there and the role needs some of them. The certificate files are needed to allow Cilium to connect to etcd.
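Since the role reads these files from cilium_etcd_cert_directory on the Ansible Controller node, a quick pre-flight check before running the playbook can save a failed run. A minimal sketch (check_cilium_etcd_certs is a hypothetical helper) that uses the file names from the cilium_etcd_* variables above and the default k8s_ca_conf_directory as defaults:

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that the etcd certificate files for Cilium
# exist and are readable. Usage: check_cilium_etcd_certs [DIR [FILE...]]
check_cilium_etcd_certs() {
  local cert_dir="${1:-$HOME/k8s/certs}"
  # Defaults mirror cilium_etcd_cafile, cilium_etcd_certfile, cilium_etcd_keyfile.
  local -a files=(ca-etcd.pem cert-cilium.pem cert-cilium-key.pem)
  if [ "$#" -gt 1 ]; then
    files=("${@:2}")
  fi
  local f rc=0
  for f in "${files[@]}"; do
    if [ ! -r "${cert_dir}/${f}" ]; then
      echo "not readable: ${cert_dir}/${f}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_cilium_etcd_certs ~/k8s/certs && echo "all certificates found"
```

After the role has run, the Secret should show up with kubectl, e.g. kubectl --namespace cilium get secret cilium-etcd-secrets (assuming it’s created in the cilium namespace where the Cilium Pods run).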
Besides the default variables you can also adjust the variables for the Helm chart. The default values are in cilium_values_default.yml.j2. But nothing is set in stone 😉 To use your own values, just create a file called cilium_values_user.yml.j2 and put it into the templates directory. Then this Cilium role will use that file to render the Helm values. You can use cilium_values_default.yml.j2 as a template or just start from scratch. As mentioned above, you can modify all settings for the Cilium Helm chart that differ from the default ones, which are located here.
To ensure that the correct Python version and KUBECONFIG variable are used on my Ansible Controller node, I’ll set ansible_python_interpreter in host_vars/k8s-01-ansible-ctrl.i.example.com to the python binary in my Python venv, e.g.:
ansible_python_interpreter: "/opt/scripts/ansible/k8s-01_vms/bin/python"
And KUBECONFIG will be set in the k8s.yml playbook file:
-
  hosts: k8s_worker
  environment:
    KUBECONFIG: "/opt/scripts/ansible/k8s-01_vms/kubeconfig/admin.kubeconfig"
  roles:
    ...
For further information see the README of the role, which also describes all variables. But in general, with the settings above in place, I should end up with a Kubernetes cluster that is already able to run some workload.
If you want to check what Kubernetes resources will be created and which configuration options are used, you can do so, e.g.:
ansible-playbook --tags=role-cilium-kubernetes --extra-vars cilium_template_output_directory="/tmp/cilium" k8s.yml
This won’t install the resources but will create a file /tmp/cilium/template.yml on the Ansible Controller node. You can inspect the file to check if you’re fine with all the resources and values.
Now Cilium can be installed on the worker nodes:
ansible-playbook --tags=role-cilium-kubernetes -e cilium_action=install k8s.yml
After a while there should be some Cilium pods running:
kubectl --namespace cilium get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cilium-2kwvz 1/1 Running 0 81s 10.0.11.10 k8s-010303 <none> <none>
cilium-57pgx 1/1 Running 0 81s 10.0.11.7 k8s-010203 <none> <none>
cilium-jrfz8 1/1 Running 0 81s 10.0.11.6 k8s-010202 <none> <none>
cilium-jxjws 1/1 Running 0 81s 10.0.11.9 k8s-010302 <none> <none>
cilium-operator-774db8f4cb-b2nzz 1/1 Running 0 81s 10.0.11.7 k8s-010203 <none> <none>
cilium-operator-774db8f4cb-q54pk 1/1 Running 0 81s 10.0.11.10 k8s-010303 <none> <none>
cilium-vb26s 1/1 Running 0 81s 10.0.11.4 k8s-010103 <none> <none>
cilium-xqk7v 1/1 Running 0 81s 10.0.11.3 k8s-010102 <none> <none>
You can also check the logs of the Pods with kubectl -n cilium --tail=500 logs cilium-...., e.g. Also kubectl get nodes -o wide should now show all nodes as Ready. You might also have noticed that the IP addresses are the WireGuard IPs I’ve assigned to the Kubernetes Controller and Worker nodes.
CoreDNS
To resolve Kubernetes cluster internal DNS entries (like *.cluster.local), which are also used for auto-discovery of services, CoreDNS can be used. And that’s also the one I cover here. For this I’ll create a directory called playbooks in my venv. Then I’ll change to that directory and clone ansible-kubernetes-playbooks:
git clone https://github.com/githubixx/ansible-kubernetes-playbooks
Then switch into the coredns directory. Basically you can install CoreDNS by just running ansible-playbook coredns.yml. By default this will install a CoreDNS configuration which is defined in configmap.yml.j2. DNS queries for the cluster.local zone will be answered by CoreDNS. Every other DNS zone will be forwarded to Cloudflare’s 1.1.1.1 or Quad9’s 9.9.9.9 DNS server. You can of course change that if you want and make further adjustments in the ConfigMap. There is also a second CoreDNS configuration: configmap_quad9_dot.yml.j2. That’s basically the same as the previous one but uses DoT (DNS over TLS). It uses Quad9’s TLS enabled DNS servers. If you want to use that one, you need to change templates/configmap.yml.j2 to templates/configmap_quad9_dot.yml.j2 in coredns/tasks/install.yml.
I’ve added a detailed README to the playbook repository. Please have a look there too for further information.
So to finally install CoreDNS use:
ansible-playbook coredns.yml
If you run kubectl --namespace kube-system get pods -o wide afterwards, you should see the CoreDNS servers running, e.g.:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-7fc847d54c-22dvf 1/1 Running 0 17s 10.0.5.30 k8s-010303 <none> <none>
coredns-7fc847d54c-bkj9r 1/1 Running 0 17s 10.0.0.22 k8s-010103 <none> <none>
In k8s_worker_kubelet_conf_yaml I set clusterDNS to "10.32.0.254". That IP is also specified in the CoreDNS Service. And you’ll also see it here:
kubectl --namespace kube-system get svc kube-dns -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-dns ClusterIP 10.32.0.254 <none> 53/UDP,53/TCP,9153/TCP 45h k8s-app=kube-dns
So if you’d like to have a different IP for the CoreDNS Service, you now know where to change it.
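If you change that IP, keep in mind that it has to lie inside the Service cluster IP range of kube-apiserver (its --service-cluster-ip-range flag), and the clusterDNS value of the kubelet has to match. Assuming that range is 10.32.0.0/16 (adjust to whatever you configured in the control plane post), a small bash sketch to check this (ip_in_cidr and ip_to_int are made-up helper names):

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a dotted IPv4 address into a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  # 10# avoids octal interpretation of octets with leading zeros.
  echo "$(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))"
}

# Hypothetical helper: succeed if IP lies inside CIDR (e.g. 10.32.0.0/16).
ip_in_cidr() {
  local ip="$1" cidr="$2"
  local net="${cidr%/*}" bits="${cidr#*/}"
  # Build the netmask, e.g. /16 -> 0xFFFF0000.
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  local ip_int net_int
  ip_int="$(ip_to_int "$ip")"
  net_int="$(ip_to_int "$net")"
  (( (ip_int & mask) == (net_int & mask) ))
}

# Example: ip_in_cidr 10.32.0.254 10.32.0.0/16 && echo "clusterDNS IP is in range"
```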
Make a test deployment
Now that I’ve installed basically everything needed for running Pods, Deployments, Services, and so on I should be able to do a sample deployment. So on my laptop I’ll run:
kubectl create namespace test
kubectl --namespace test apply -f https://k8s.io/examples/application/deployment.yaml
This will deploy two Pods running nginx. To get an overview of what’s running:
kubectl --namespace test get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-deployment-86dcfdf4c6-l4hv7 1/1 Running 0 83s 10.0.2.143 k8s-010203 <none> <none>
pod/nginx-deployment-86dcfdf4c6-rwjl7 1/1 Running 0 83s 10.0.0.41 k8s-010103 <none> <none>
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/nginx-deployment 2/2 2 2 83s nginx nginx:1.14.2 app=nginx
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/nginx-deployment-86dcfdf4c6 2 2 2 83s nginx nginx:1.14.2 app=nginx,pod-template-hash=86dcfdf4c6
Or kubectl --namespace test describe deployment nginx-deployment also does the job.
You should also be able to get the default nginx page from one of the two nginx webservers on every worker node. I use Ansible’s get_url module here, and one should see something similar to this (I truncated the output a bit):
ansible -m get_url -a "url=http://10.0.2.143 dest=/tmp/test.html" k8s_worker
k8s-010302.i.example.com | CHANGED => {
"changed": true,
"checksum_dest": null,
"checksum_src": "7dd71afcfb14e105e80b0c0d7fce370a28a41f0a",
"dest": "/tmp/test.html",
"elapsed": 0,
"gid": 0,
"group": "root",
"md5sum": "e3eb0a1df437f3f97a64aca5952c8ea0",
"mode": "0644",
"msg": "OK (612 bytes)",
"owner": "root",
"size": 612,
"src": "/home/ansible/.ansible/tmp/ansible-tmp-1705337744.24709-566702-78220146282609/tmp68i1ohfd",
"state": "file",
"status_code": 200,
"uid": 0,
"url": "http://10.0.2.143"
}
k8s-010103.i.example.com | CHANGED => {
...
}
This should give a valid result no matter on which node the page is fetched. Cilium “knows” on which node the Pod with the IP 10.0.2.143 is located, and the request gets routed accordingly. If you’re done, you can delete the nginx deployment again with kubectl --namespace test delete deployment nginx-deployment (but maybe wait a little bit, as the deployment is convenient for further testing…).
You can output the worker internal IPs and the Pod CIDRs that were assigned to each host with:
kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}'
10.0.11.3 10.200.0.0/24
10.0.11.4 10.200.4.0/24
10.0.11.6 10.200.2.0/24
10.0.11.7 10.200.3.0/24
10.0.11.9 10.200.1.0/24
10.0.11.10 10.200.5.0/24
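With more nodes it gets tedious to eyeball that list. The following sketch (check_pod_cidrs is a hypothetical helper) reads exactly that two-column output and complains about nodes without a podCIDR or about podCIDRs that were handed out twice:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<InternalIP> <podCIDR>" lines on stdin and
# report nodes without a podCIDR or podCIDRs assigned more than once.
# Returns 0 if everything looks fine, 1 otherwise.
check_pod_cidrs() {
  awk '
    NF == 0 { next }                                   # skip empty lines
    NF < 2  { print "no podCIDR: " $1; bad = 1; next }
    seen[$2]++ { print "duplicate podCIDR " $2 " on " $1; bad = 1 }
    END { exit bad }
  '
}

# Example:
#   kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}' | check_pod_cidrs
```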
The IP addresses 10.0.11.xxx are addresses I assigned to the WireGuard VPN interface (wg0 in my case) of the worker and controller nodes. That’s important since all communication should travel through the VPN interfaces.
If you just want to see if the worker nodes are ready, use:
kubectl get nodes -o wide
You should now see that STATUS changed from NotReady to Ready.
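kubectl can also block until all nodes report Ready, e.g. kubectl wait --for=condition=Ready nodes --all --timeout=300s. If you’d rather get a list of the nodes that aren’t Ready yet (e.g. in a script), a small awk-based sketch over kubectl get nodes --no-headers works too (all_nodes_ready is a made-up name). It only looks at the second column, so a cordoned node showing Ready,SchedulingDisabled is reported as well.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "kubectl get nodes --no-headers" output on stdin,
# print every node whose STATUS is not exactly "Ready" and return 1 if
# there is at least one such node.
all_nodes_ready() {
  awk '
    NF == 0 { next }
    $2 != "Ready" { print $1; found = 1 }
    END { exit found }
  '
}

# Example: kubectl get nodes --no-headers | all_nodes_ready && echo "all nodes Ready"
```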
If you want to test network connectivity, DNS and stuff like that a little bit, we can deploy a kind of debug container, which is just the slim version of a Debian Docker image, e.g.:
kubectl --namespace test run --attach testpod --rm --image=debian:stable-slim --restart=Never -- sh -c "sleep 14400"
It may take a moment until the container image is downloaded. In a second terminal run:
kubectl --namespace test exec -it testpod -- bash
After entering the container, install a few utilities:
apt-get update && apt-get install -y iputils-ping iproute2 dnsutils curl telnet
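While being inside the container anyway, it’s also a good moment to verify the seccomp-default setting from above: the Seccomp: field in /proc/self/status is 0 for disabled, 1 for strict and 2 for filter mode, and with the RuntimeDefault profile applied I’d expect 2. grep Seccomp /proc/self/status is enough for that; the following sketch (seccomp_mode is a hypothetical helper) just translates the number into a readable mode:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the seccomp mode of a process based on the
# "Seccomp:" field of a /proc/<pid>/status file (default: /proc/self/status).
seccomp_mode() {
  local status_file="${1:-/proc/self/status}"
  local value
  value="$(awk '$1 == "Seccomp:" { print $2 }' "$status_file")"
  case "$value" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;   # expected inside a container with RuntimeDefault
    *) echo "unknown"; return 1 ;;
  esac
}

# Example (inside the testpod): seccomp_mode
```

Pods that were already running before the kubelet setting was changed presumably keep their previous seccomp settings until they’re recreated.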
Now it should be possible to resolve the cluster IP of the kubernetes Service, which points to kube-apiserver (it should be 10.32.0.1 if you kept the default Service cluster IP range setting), e.g.:
root@testpod:/# dig +short kubernetes.default.svc.cluster.local
10.32.0.1
or
root@testpod:/# dig www.microsoft.com
; <<>> DiG 9.18.19-1~deb12u1-Debian <<>> www.microsoft.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39431
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 7e754940a38fd61a (echoed)
;; QUESTION SECTION:
;www.microsoft.com. IN A
;; ANSWER SECTION:
www.microsoft.com. 17 IN CNAME www.microsoft.com-c-3.edgekey.net.
www.microsoft.com-c-3.edgekey.net. 17 IN CNAME www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net.
www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net. 17 IN CNAME e13678.dscb.akamaiedge.net.
e13678.dscb.akamaiedge.net. 17 IN A 23.35.229.160
;; Query time: 292 msec
;; SERVER: 10.32.0.254#53(10.32.0.254) (UDP)
;; WHEN: Mon Jan 15 18:21:11 UTC 2024
;; MSG SIZE rcvd: 363
Or resolve the IP address of a pod (that’s one of the nginx
container deployed above into test
namespace):
root@debug-pod:/# dig +short 10-0-2-143.test.pod.cluster.local
10.0.2.143
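The name used in that query follows the pattern <IP with dashes>.<namespace>.pod.cluster.local. Building it by hand is a bit fiddly, so here is a tiny sketch (pod_dns_name is a made-up helper). Note that CoreDNS only answers these Pod A records if its kubernetes plugin has the pods option set to insecure or verified; since the query above worked, the CoreDNS configuration used here apparently has that enabled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the Pod A record name that CoreDNS answers,
# i.e. <ip-with-dashes>.<namespace>.pod.<cluster domain>.
pod_dns_name() {
  local ip="$1" namespace="$2" domain="${3:-cluster.local}"
  printf '%s.%s.pod.%s\n' "${ip//./-}" "$namespace" "$domain"
}

# Example (inside the testpod): dig +short "$(pod_dns_name 10.0.2.143 test)"
```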
In all cases the DNS query was resolved by CoreDNS at 10.32.0.254. So resolving external and internal cluster.local DNS queries works as expected. 10.32.0.254 is again a kind of load balancer IP. It’s assigned to a Kubernetes Service called kube-dns, as already mentioned above.
It should also be possible to fetch the default HTML site from the nginx deployment (output truncated):
curl http://10.0.2.143
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
...
If you’re done with testing you can delete the created resources (if not done already). E.g.:
kubectl --namespace test delete pod testpod
kubectl --namespace test delete deployments.apps nginx-deployment
kubectl delete namespaces test
Taint Kubernetes Controller nodes
There is one final thing to do: As mentioned previously, no “normal” workload should be executed on the Kubernetes Control Plane nodes. This can be achieved with the following task. It will add a so-called Taint to all Control Plane nodes k8s-01[01:03]02 (also see Well-Known Labels, Annotations and Taints). I’ll create a file playbooks/taint_controller.yml with the following content:
---
- name: Taint Kubernetes Controller nodes
  hosts: k8s-01-ansible-ctrl.i.example.com
  gather_facts: true
  tasks:
    - name: Taint Kubernetes control plane nodes
      kubernetes.core.k8s_taint:
        kubeconfig: "/opt/scripts/ansible/k8s-01_vms/kubeconfig/admin.kubeconfig"
        state: present
        name: "{{ hostvars[item]['inventory_hostname_short'] }}"
        taints:
          - effect: NoSchedule
            key: "node-role.kubernetes.io/control-plane"
      with_inventory_hostnames:
        - k8s_controller
The change can be applied with ansible-playbook playbooks/taint_controller.yml. The task will be executed on my Ansible Controller node k8s-01-ansible-ctrl.i.example.com. Once the node-role.kubernetes.io/control-plane:NoSchedule Taint is applied, only Pods that tolerate this Taint can be scheduled on the Control Plane nodes, and that includes the Cilium pods (they have a Toleration with operator: Exists, which basically allows them to run everywhere).
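To verify that the Taint actually landed on the Control Plane nodes, kubectl get nodes with a jsonpath expression can print each node name together with its Taint keys. The sketch below (nodes_missing_taint is a hypothetical helper) reads such output and prints every expected node that doesn’t carry the node-role.kubernetes.io/control-plane key. Nodes that don’t show up in the input at all aren’t reported.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<node> <taint keys...>" lines on stdin and print
# every node from the argument list that lacks the control-plane taint.
# Returns 0 if all listed nodes (that appear in the input) are tainted.
nodes_missing_taint() {
  local key="node-role.kubernetes.io/control-plane"
  local expected=" $* "
  awk -v key="$key" -v expected="$expected" '
    NF == 0 { next }
    index(expected, " " $1 " ") {
      has = 0
      for (i = 2; i <= NF; i++) if ($i == key) has = 1
      if (!has) { print $1; missing = 1 }
    }
    END { exit missing }
  '
}

# Example:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.taints[*].key}{"\n"}{end}' \
#     | nodes_missing_taint k8s-010102 k8s-010202 k8s-010302
```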
At this stage the Kubernetes cluster is basically fully functional 😄 But of course there is a lot more that could be done…
What’s next
There are a lot more things that could/should be done now, but running Sonobuoy could be a good next step. Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests (ensuring CNCF conformance) in an accessible and non-destructive manner. The tests can run quite long (about an hour), but starting them is as quick as (check if there is a newer version available):
cd /tmp
wget https://github.com/vmware-tanzu/sonobuoy/releases/download/v0.57.1/sonobuoy_0.57.1_linux_amd64.tar.gz
tar xvfz sonobuoy_0.57.1_linux_amd64.tar.gz
export KUBECONFIG=/opt/scripts/ansible/k8s-01_vms/kubeconfig/admin.kubeconfig
./sonobuoy run --wait
After that is done you can inspect the results:
results=$(./sonobuoy retrieve)
./sonobuoy results $results
./sonobuoy delete --wait
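Since the Sonobuoy version is hardcoded in the download commands above, a small helper that builds the release URL makes it easier to switch to a newer release. This sketch (sonobuoy_url is a made-up name) just follows the URL pattern used above; that other OS/architecture combinations (e.g. linux_arm64) follow the same naming is an assumption, so please check the release page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the Sonobuoy release tarball URL following the
# pattern .../download/v<version>/sonobuoy_<version>_<os>_<arch>.tar.gz
sonobuoy_url() {
  local version="${1#v}" os="${2:-linux}" arch="${3:-amd64}"
  printf 'https://github.com/vmware-tanzu/sonobuoy/releases/download/v%s/sonobuoy_%s_%s_%s.tar.gz\n' \
    "$version" "$version" "$os" "$arch"
}

# Example: wget "$(sonobuoy_url 0.57.1)"
```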
Also you may have a look at Velero. It’s a utility for managing disaster recovery, specifically for your Kubernetes cluster resources and persistent volumes.
You may also want to have some monitoring, e.g. by using Prometheus + Alertmanager and creating some nice Dashboards with Grafana. Also having a nice Kubernetes dashboard like Lens might be helpful.
Having centralized logs from containers and the Kubernetes nodes is also something very useful. For this Loki and again Grafana might be an option, but there are also various “logging stacks” like ELK (Elasticsearch, Logstash and Kibana) out there that could make life easier.
But I’ll do something completely different first 😉 Up until now nobody from the outside can access any service that runs on the Kubernetes cluster. For this something called Ingress is needed. So let’s continue with Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v3 and cert-manager (Part 1). In this blog post I’ll install the Traefik ingress controller and cert-manager to automatically fetch and renew TLS certificates from Let’s Encrypt.