Kubernetes the not so hard way with Ansible - Certificate authority (CA) - (K8s v1.28)
Introduction
This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Installing the Client Tools and Kubernetes The Hard Way - Provisioning a CA and Generating TLS Certificates.
Now that I’ve done some preparation for our Kubernetes cluster
I need a PKI (public key infrastructure) to secure the communication between the Kubernetes components.
Install kubectl
I’ll use CloudFlare’s CFSSL PKI toolkit to bootstrap certificate authorities and generate TLS certificates. ansible-role-cfssl will generate a few files for that purpose. You can generate the files on any host you want, but I’ll use a directory on my workstation that runs Ansible, because other roles need to copy a few of the generated files to the Kubernetes hosts later. So it makes sense to keep the files at a place where Ansible has access (but of course you can also use a network share or something like that).
First I install the most important Kubernetes utility called kubectl. I’ll configure it later; at the moment I just install it. I’ve created an Ansible role to install kubectl locally. Add the following content to Ansible’s hosts file:
k8s_kubectl:
hosts:
k8s-01-ansible-ctrl.i.example.com:
ansible_connection: local
k8s-01-ansible-ctrl is the hostname of my local workstation/laptop. Actually, if ansible_connection: local or ansible_host is specified, the hostname doesn’t really matter. You can even call it bob or sam 😉. But of course Ansible uses this name internally, and it also determines the name of the host variables file in host_vars. So in my case the Ansible host_vars file looks like this and has only one entry (host_vars/k8s-01-ansible-ctrl.i.example.com):
---
ansible_python_interpreter: "/opt/scripts/ansible/k8s-01_vms/bin/python"
This makes sure that the python
binary of my Python venv
environment will be used when tasks are executed on this host.
As already mentioned in the previous part, my workstation could be part of the WireGuard fully meshed network that connects every Kubernetes node to all the other nodes. That way I’d be able to access the Kubernetes API server (kube-apiserver) via VPN and wouldn’t need SSH forwarding or similar tricks to make kubectl work. But I decided against that and instead make kube-apiserver available to my internal network by binding the service to all network interfaces. The connection to kube-apiserver is encrypted via TLS anyway. Additionally, firewall rules can be applied so that only some hosts are allowed to connect to kube-apiserver.
Then install the role with
ansible-galaxy install githubixx.kubectl
The role has a few variables you can change if you like (normally not needed). Just add the variables and values you want to change to host_vars/k8s-01-ansible-ctrl.i.example.com (if that is the name of your workstation) or wherever fits best for you. To get an overview see the kubectl role homepage on GitHub.
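In case your k8s.yml playbook doesn’t contain an entry for this role yet, it could look like the following sketch (it assumes the k8s_kubectl host group defined above; the tag matches the one used in the deploy command):

```yaml
- hosts: k8s_kubectl
  roles:
    - role: githubixx.kubectl
      tags: role-kubectl
```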
To finally deploy the kubectl binary simply run
ansible-playbook --tags=role-kubectl k8s.yml
Setup cfssl
Next we add an additional entry to the Ansible hosts file:
k8s_ca:
hosts:
k8s-01-ansible-ctrl.i.example.com:
ansible_connection: local
k8s_ca (short for Kubernetes certificate authority) is an Ansible host group (in this case the group contains only one host). As you can see it’s again my workstation/laptop. It will store all certificate authority files.
Let’s install the cfssl role via
ansible-galaxy install githubixx.cfssl
Add
- hosts: k8s_ca
  roles:
    - role: githubixx.cfssl
      tags: role-cfssl
to your k8s.yml file. This adds the role githubixx.cfssl to the host group k8s_ca (which contains only one host in my case, as already mentioned). Have a look at the README file of that role for all variables you can change.
Now we can install the cfssl
binaries via
ansible-playbook --tags=role-cfssl k8s.yml
Setup certificate authorities
Next I generate the certificate authorities (CA) for etcd and Kubernetes to secure the communication between the services. DigitalOcean provides a good diagram of the Kubernetes operations flow in Using Vault as a Certificate Authority for Kubernetes. Have a look at the diagram to get a better understanding of the K8s communication workflow.
As always I’ve prepared an Ansible role to generate the certificate authorities and certificates. Install the role via
ansible-galaxy install githubixx.kubernetes_ca
Add the role to k8s.yml
:
- hosts: k8s_ca
  roles:
    - role: githubixx.kubernetes_ca
      tags: role-kubernetes-ca
As with the cfssl
role this role will also be applied to the Ansible k8s_ca
host (which is again my workstation/laptop as you may remember from above).
This role has quite a few variables. But that’s mainly information needed for the certificates, like the algorithm (algo) and key size used, country (C), location (L), organization (O), organizational unit (OU) or state (ST). You can read more about how and for what the certificates are used in How certificates are used by your cluster.
In contrast to Kelsey Hightower’s guide Provisioning a CA and Generating TLS Certificates, I create a different certificate authority for etcd and the Kubernetes API server (kube-apiserver). Since only the Kubernetes API server talks to etcd directly, it makes sense not to use the same CA for etcd and the Kubernetes API server to sign certificates. This adds an additional layer of security. All variables are documented at the kubernetes-ca role homepage on GitHub. So I’ll only discuss the important parts here.
I’ll put all variables for this role into group_vars/k8s_ca.yml
as most of them are just used by the Ansible Controller node. There are very few exceptions and I’ll mention them accordingly in the following text.
k8s_ca_conf_directory
specifies where to store the certificates. I created a directory certificates
in my Python venv
. So the value for this variable is /opt/scripts/ansible/k8s-01_vms/certificates
in my case. With k8s_ca_conf_directory_perm
, k8s_ca_file_perm
, k8s_ca_certificate_owner
and k8s_ca_certificate_group
you can specify who owns that directory and what permissions the directory and the files should have. Since k8s_ca_conf_directory
is used by a few roles that target different hosts I’ll put this variable into group_vars/all.yml
.
Some certificates need to include IP addresses and host names (also see All certificates). So this role needs to know the Ansible host group for the Kubernetes controllers (k8s_ca_controller_nodes_group), the host group for the Kubernetes workers (k8s_ca_worker_nodes_group) and for the etcd hosts (k8s_ca_etcd_nodes_group). By default the values are k8s_controller, k8s_worker and k8s_etcd, and those are the host groups I already used before. It also needs to know the interface name specified by k8s_interface. In my case it’s the WireGuard interface wg0. This way the role can figure out the IP addresses and host names involved and include this information in the certificates where necessary. As with k8s_ca_conf_directory, I’ll also put k8s_interface into group_vars/all.yml.
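For illustration only, the two variables in group_vars/all.yml could look like this (the directory path is the one from my Python venv example above; adjust it to your setup):

```yaml
# group_vars/all.yml
k8s_ca_conf_directory: "/opt/scripts/ansible/k8s-01_vms/certificates"
k8s_interface: "wg0"
```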
ca_etcd_expiry and ca_k8s_apiserver_expiry are set to 87600h by default. That’s ten years. It’s the time after which the root certificate authorities of etcd and kube-apiserver will expire.
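If you want a different lifetime you could override the defaults in group_vars/k8s_ca.yml, e.g. a hypothetical five years:

```yaml
# group_vars/k8s_ca.yml - hypothetical example: five years instead of ten
ca_etcd_expiry: "43800h"
ca_k8s_apiserver_expiry: "43800h"
```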
In general all these (ca|etcd|k8s)_*_csr_names_(c|l|ou|st)
settings are mostly for informational purposes. So for all *_csr_names_c
variables enter your country code e.g. US
, DE
, or whatever. Accordingly for *_csr_names_l
enter your location e.g. New York
, Berlin
, and so on. For *_csr_names_ou
enter an organizational unit like Engineering Department
and for *_csr_names_st
enter a state like California
, Bayern
, and so on. The default value for all *_csr_key_algo
variables is rsa
and for *_csr_key_size
it’s 2048
. These values should be just fine (also see Manage TLS Certificates in a Cluster). An alternative value for all *_csr_key_algo variables would be ecdsa. ecdsa (Elliptic Curve Digital Signature Algorithm) creates smaller keys and makes use of something called “Elliptic Curve Cryptography” (ECC). ecdsa has already been around for quite a few years (maybe 15+ years), and rsa is way older. So as long as you don’t use some rather dated software that needs to connect to kube-apiserver it should be fine to go with ecdsa. In this case the values for *_csr_key_size can be 256, 384 and 521 (the NIST P-521 curve). I’m using ecdsa and a key size of 384 for my certificates. For rsa the key size can be 2048 up to 8192 (you normally increase by a multiple of 1024, so 4096 should also be a valid value).
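To get a feeling for the size difference, here is a small self-contained sketch using plain openssl (independent of the Ansible role) that generates one key of each type and compares the file sizes:

```shell
# Generate an ECDSA key on the P-384 curve (corresponds to key algo "ecdsa", size 384)
openssl ecparam -name secp384r1 -genkey -noout -out ecdsa-key.pem

# Generate an RSA 2048 key for comparison (corresponds to key algo "rsa", size 2048)
openssl genrsa -out rsa-key.pem 2048

# The ECDSA key file is considerably smaller
wc -c ecdsa-key.pem rsa-key.pem
```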
Let’s talk about the values that shouldn’t be changed. First is k8s_admin_csr_names_o: "system:masters". The role will create a certificate for the “admin” user. That’s basically the very first and most powerful certificate/user. Note: There is actually no such thing as a Kubernetes user. It’s just a certificate that identifies you when you connect to kube-apiserver. But normally, just because you have a certificate doesn’t mean that you can do much 😉 That’s different for the “admin” certificate, as it specifies that you’re in the system:masters group, and that basically means superpowers. So it makes sense to keep this certificate in a secure place and create a new certificate/user right after the Kubernetes cluster is set up. For more information see User-facing roles.
Then there is k8s_worker_csr_names_o: "system:nodes". kubelet runs on every worker node. These processes need to be in the system:nodes group. They’ll also get a username of the form system:node:<nodeName>. For more information see Using Node Authorization.
Next there is k8s_kube_proxy_csr_cn: "system:kube-proxy", k8s_kube_proxy_csr_names_o: "system:node-proxier", k8s_scheduler_csr_cn: "system:kube-scheduler", k8s_scheduler_csr_names_o: "system:kube-scheduler", k8s_controller_manager_csr_cn: "system:kube-controller-manager" and k8s_controller_manager_csr_names_o: "system:kube-controller-manager". There we have a “common name” (cn) and an organization (o). The cn specifies the Default ClusterRole and o the Default ClusterRoleBinding. cn is basically the “who” and o the “what is allowed”. Kubernetes ships with some default ClusterRoles and ClusterRoleBindings. This way the services identify themselves accordingly and have the permissions they need to do their job. For more information see Core Components.
Note: Short version: It’s very important that the values of ca_k8s_apiserver_csr_cn and k8s_apiserver_csr_cn are different! Otherwise Python’s urllib3 won’t connect to kube-apiserver without using insecure-skip-tls-verify: true in kubeconfig.
Long version: Since urllib3 is also used by the requests library (and most probably a lot of others) for HTTP(S) connections, that means that e.g. Python’s kubernetes library, and therefore Ansible’s k8s_* modules, won’t work without issues. While Python <= 3.8 worked fine, this is no longer true with Python >= 3.9. For HTTPS connections Python uses OpenSSL, which issues an error if a self-signed certificate doesn’t contain the X509v3 Authority Key Identifier in the X509v3 extensions. Trying to connect to kube-apiserver with Ansible’s k8s_* modules will show an error like this (e.g. when trying to retrieve information about K8s namespaces):
Max retries exceeded with url: /api/v1/namespaces (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)')))
cfssl is used for creating the certificates in the githubixx.kubernetes_ca role. It’s written in Go, and the README states: As of Go 1.7, self-signed certificates will not include the AKI. (AKI = Authority Key Identifier). But as I figured out here, it is possible to get the AKI included in self-signed certificates if the issuer and subject common name are different. That makes OpenSSL as used by Python >= 3.9 happy again 😉 I wanted to write that down as it really took me hours to figure out. At the very end of this blog post I’ll also show a command to verify the certificates. But let’s continue with the more important stuff…
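To check whether a certificate contains that extension you can simply grep for it. A self-contained sketch using a throwaway self-signed certificate (inspect your ca-k8s-apiserver.pem the same way; whether the AKI shows up here depends on your OpenSSL version):

```shell
# Create a throwaway self-signed certificate (issuer == subject)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=test-ca" -keyout tmp-ca-key.pem -out tmp-ca.pem

# Look for the Authority Key Identifier in the X509v3 extensions
openssl x509 -noout -text -in tmp-ca.pem \
  | grep -A1 "Authority Key Identifier" || echo "no AKI present"
```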
etcd_cert_hosts contains a list of additional IP addresses or host names you’d like to include in the etcd certificate. As mentioned above, the certificate won’t only include the values specified here but also some automatically collected IP addresses and hostnames. If you intend to install a loadbalancer for the etcd services then you should also include the IP address and the DNS name of that loadbalancer. Otherwise you’ll most probably get certificate errors. Note: Currently the role doesn’t include the IPv6 addresses. So you might need to include them manually.
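A hypothetical example (the loadbalancer name and IP are made up; check the role’s defaults/main.yml for the actual default list):

```yaml
# group_vars/k8s_ca.yml
etcd_cert_hosts:
  - localhost
  - 127.0.0.1
  - etcd-lb.i.example.com   # hypothetical etcd loadbalancer DNS name
  - 192.168.10.5            # hypothetical etcd loadbalancer IP
```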
The same is basically true for k8s_apiserver_cert_hosts. Later I’ll install the haproxy loadbalancer on all nodes. For all services that need to connect to kube-apiserver, like kubelet, kube-scheduler, and so on, I’ll configure the loadbalancer as the target for kube-apiserver. So in case one kube-apiserver goes down, e.g. for maintenance, the loadbalancer can switch to one of the remaining two kube-apiservers. In my case haproxy will listen on localhost on every host. As you can see it’s already included in the default list. But if you have some hardware loadbalancer that handles your load balancing needs, then include that IP and DNS name too. Also, if you use Kubernetes as an OIDC provider you might want to include the IP and/or hostname you specify in the --service-account-issuer flag for kube-apiserver later (e.g. api.k8s-01.example.com). Note: Maybe you’re wondering what this IP 10.32.0.1 is all about. Actually it’s the first IP address of the IP range specified in the --service-cluster-ip-range option for kube-apiserver (that happens in one of the next blog posts). The default service cluster IP range is 10.32.0.0/16 (see kubernetes-controller role). E.g. if the Kubernetes cluster was successfully deployed and one executes ping kubernetes or ping kubernetes.default.svc.cluster.local (if you keep the default cluster.local domain) within a Pod, the commands will resolve to 10.32.0.1. So that’s basically the “internal” IP address of the kube-apiservers. Actually it’s a Service IP, which in the end is a load balancer managed by Kubernetes. That’s why it is important to have this IP in the list, and also the other kubernetes.default variants.
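Put together, an illustrative override could look like this (the external API hostname is hypothetical; the role’s defaults/main.yml shows the actual default list):

```yaml
# group_vars/k8s_ca.yml
k8s_apiserver_cert_hosts:
  - localhost
  - 127.0.0.1
  - 10.32.0.1
  - kubernetes
  - kubernetes.default
  - kubernetes.default.svc
  - kubernetes.default.svc.cluster.local
  - api.k8s-01.example.com   # hypothetical external endpoint, e.g. for --service-account-issuer
```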
And finally there is etcd_additional_clients. This list should contain all etcd clients that want to connect to the etcd cluster. The most important client is kube-apiserver of course. So you definitely want to keep k8s-apiserver-etcd in this list. But I’ll also generate certificates for Cilium and Traefik which I’ll install later (it’s documented in the role’s README). Cilium will be my solution for Kubernetes networking and Traefik for everything Ingress related (it allows external users to access your services running in the Kubernetes cluster). While you could also install a separate etcd cluster for each of these services, that’s additional effort of course. And since I already have one around it makes sense to use it 😉 Security-wise this might be an issue for some environments, as one needs to allow connections from Kubernetes worker nodes to the etcd cluster. So you have to decide if this is acceptable or not. In my case it’ll look like this:
etcd_additional_clients:
- k8s-apiserver-etcd
- traefik
- cilium
If you’re done with setting all variables, the CSRs and certificates can be generated via
ansible-playbook --tags=role-kubernetes-ca k8s.yml
This only runs the Ansible kubernetes_ca role which was tagged as role-kubernetes-ca. After running the role there will be quite a few files in k8s_ca_conf_directory. The filenames should give a good hint what the content of these files is and what a file is used for (also see the defaults/main.yml file of the role for more information). Here is an overview of which files you should at least get:
ca-etcd-config.json
ca-etcd.csr
ca-etcd-csr.json
ca-etcd-key.pem
ca-etcd.pem
ca-k8s-apiserver-config.json
ca-k8s-apiserver.csr
ca-k8s-apiserver-csr.json
ca-k8s-apiserver-key.pem
ca-k8s-apiserver.pem
cert-admin.csr
cert-admin-csr.json
cert-admin-key.pem
cert-admin.pem
cert-cilium.csr
cert-cilium-csr.json
cert-cilium-key.pem
cert-cilium.pem
cert-etcd-peer.csr
cert-etcd-peer-csr.json
cert-etcd-peer-key.pem
cert-etcd-peer.pem
cert-etcd-server.csr
cert-etcd-server-csr.json
cert-etcd-server-key.pem
cert-etcd-server.pem
cert-k8s-010102.i.example.com.csr
cert-k8s-010102.i.example.com-csr.json
cert-k8s-010102.i.example.com-key.pem
cert-k8s-010102.i.example.com.pem
cert-k8s-010103.i.example.com.csr
cert-k8s-010103.i.example.com-csr.json
cert-k8s-010103.i.example.com-key.pem
cert-k8s-010103.i.example.com.pem
cert-k8s-010202.i.example.com.csr
cert-k8s-010202.i.example.com-csr.json
cert-k8s-010202.i.example.com-key.pem
cert-k8s-010202.i.example.com.pem
cert-k8s-010203.i.example.com.csr
cert-k8s-010203.i.example.com-csr.json
cert-k8s-010203.i.example.com-key.pem
cert-k8s-010203.i.example.com.pem
cert-k8s-010302.i.example.com.csr
cert-k8s-010302.i.example.com-csr.json
cert-k8s-010302.i.example.com-key.pem
cert-k8s-010302.i.example.com.pem
cert-k8s-010303.i.example.com.csr
cert-k8s-010303.i.example.com-csr.json
cert-k8s-010303.i.example.com-key.pem
cert-k8s-010303.i.example.com.pem
cert-k8s-apiserver.csr
cert-k8s-apiserver-csr.json
cert-k8s-apiserver-etcd.csr
cert-k8s-apiserver-etcd-csr.json
cert-k8s-apiserver-etcd-key.pem
cert-k8s-apiserver-etcd.pem
cert-k8s-apiserver-key.pem
cert-k8s-apiserver.pem
cert-k8s-controller-manager.csr
cert-k8s-controller-manager-csr.json
cert-k8s-controller-manager-key.pem
cert-k8s-controller-manager.pem
cert-k8s-controller-manager-sa.csr
cert-k8s-controller-manager-sa-csr.json
cert-k8s-controller-manager-sa-key.pem
cert-k8s-controller-manager-sa.pem
cert-k8s-proxy.csr
cert-k8s-proxy-csr.json
cert-k8s-proxy-key.pem
cert-k8s-proxy.pem
cert-k8s-scheduler.csr
cert-k8s-scheduler-csr.json
cert-k8s-scheduler-key.pem
cert-k8s-scheduler.pem
cert-traefik.csr
cert-traefik-csr.json
cert-traefik-key.pem
cert-traefik.pem
For the curious ones 😉: If you would like to know what’s in the certificate files (the .pem files) you can use this command:
openssl x509 -noout -text -in cert-k8s-apiserver.pem
This will show you the “content” of the file in plain text. E.g. it’ll show you (in the case of cert-k8s-apiserver.pem) the X509v3 Subject Alternative Name. It contains a list of all IP addresses and host names that were included in this certificate. Especially for cert-k8s-apiserver.pem it is important to have all IP addresses and host names listed that you want to use later to connect to that service! For other .pem files it looks different. There the Subject key is the more relevant one. This also offers the possibility to check if everything needed is included in the certificate files before deploying them.
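You can also print just the SAN section with the -ext option (OpenSSL 1.1.1+). A self-contained sketch using a throwaway certificate with made-up SAN entries (run the same openssl x509 command against your cert-k8s-apiserver.pem):

```shell
# Create a throwaway certificate that carries a few SAN entries
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=kubernetes" \
  -addext "subjectAltName=IP:10.32.0.1,DNS:kubernetes,DNS:kubernetes.default.svc.cluster.local" \
  -keyout tmp-key.pem -out tmp-cert.pem

# Print only the Subject Alternative Name section
openssl x509 -noout -ext subjectAltName -in tmp-cert.pem
```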
And as promised above, here is a command that helps you verify whether OpenSSL is fine with your certificates. It checks if the certificate authority (CA) file and the certificate (normally the .pem) file match. E.g.:
openssl verify -verbose -x509_strict -CAfile ca-k8s-apiserver.pem cert-k8s-apiserver.pem
As mentioned above, if the ca_k8s_apiserver_csr_cn and k8s_apiserver_csr_cn values are the same you’ll get this error:
C=DE, ST=Bayern, L=The_Internet, O=Kubernetes, OU=BY, CN=kubernetes
error 18 at 0 depth lookup: self-signed certificate
error cert-k8s-apiserver.pem: verification failed
While the kubectl utility is fine with that, Python >= 3.9, OpenSSL and urllib3 are not 😉 But this can be fixed as mentioned above. Then just deploy the role again and it should be fine - hopefully.
That’s it for now. In the next chapter I’ll install the etcd
cluster and use the first CA and certificates that were generated in this part.