Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v2 and cert-manager (Part 2) [Updated for Traefik v2.10]
Introduction
In part 1 I installed Traefik proxy. So it's now basically possible to expose Kubernetes services to the Internet. But nowadays traffic should be encrypted whenever possible. And even if you don't think you need it, think about SEO: Google ranks sites with encrypted traffic higher, for example.
So cert-manager can be installed to automatically obtain TLS certificates from Let's Encrypt. The certificates can then be used by Traefik to enable TLS for an Ingress. cert-manager will also take care of renewing them before they expire.
Installation
As with Traefik I've also prepared an Ansible role to install cert-manager. It's available on Ansible Galaxy and can be installed via
ansible-galaxy install githubixx.cert_manager_kubernetes
or you can just clone the GitHub repository into your roles directory:
git clone https://github.com/githubixx/ansible-role-cert-manager-kubernetes roles/githubixx.cert_manager_kubernetes
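If you manage your roles with a requirements.yml, an entry like the following sketch should work too (pinning a version tag is optional; check the repository for available releases):

---
roles:
  - name: githubixx.cert_manager_kubernetes
    # optionally pin a release tag here, e.g.
    # version: "..."

Afterwards ansible-galaxy install -r requirements.yml fetches the role.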
Role requirements
Like the Traefik role it also needs Helm 3 and kubectl plus a properly configured KUBECONFIG. Under the hood the role uses the official Helm chart. Currently procedures like installing, updating/upgrading and deleting the cert-manager deployment are supported.
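A quick sanity check before running the role might look like this (assuming helm and kubectl are already installed and KUBECONFIG is exported):

helm version --short     # should report a v3.x client
kubectl get nodes        # verifies that KUBECONFIG points to the correct cluster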
Role configuration
So let's have a look at the available role variables:
# Helm chart version
cert_manager_chart_version: "v1.13.3"
# Helm release name
cert_manager_release_name: "cert-manager"
# Helm repository name
cert_manager_repo_name: "jetstack"
# Helm chart name
cert_manager_chart_name: "{{ cert_manager_repo_name }}/{{ cert_manager_release_name }}"
# Helm chart URL
cert_manager_chart_url: "https://charts.jetstack.io"
# Kubernetes namespace where cert-manager resources should be installed
cert_manager_namespace: "cert-manager"
# The following list contains the configurable parameters of the cert-manager
# Helm chart. For all possible values see:
# https://artifacthub.io/packages/helm/jetstack/cert-manager#configuration
# But for most users "installCRDs=true" should be sufficient.
# If true, CRD resources will be installed as part of the Helm chart.
# If enabled, uninstalling the chart will also delete the CRDs, causing all
# installed custom resources to be DELETED.
cert_manager_values:
  - installCRDs=true
  - global.leaderElection.namespace="{{ cert_manager_namespace }}"
# To install "ClusterIssuer" for Let's Encrypt (LE) "cert_manager_le_clusterissuer_options"
# needs to be defined. The variable contains a list of hashes and can be defined
# in "group_vars/all.yml" e.g.
#
# name: Defines the name of the "ClusterIssuer"
# email: Use a valid e-mail address to be alerted by LE in case a certificate
#        expires
# server: Hostname part of the LE URL
# private_key_secret_ref_name: Name of the secret which stores the private key
# solvers_http01_ingress_class: Value of "kubernetes.io/ingress.class" annotation.
#                               Depends on your ingress controller. Common values
#                               are "traefik" for Traefik or "nginx" for nginx.
#
# Besides "email" the following values can be used as is and will create valid
# "ClusterIssuer" for Let's Encrypt staging and production. Only "email" needs
# to be adjusted if Traefik is used as ingress controller. For other ingress
# controllers "solvers_http01_ingress_class" needs to be adjusted too. Currently
# only "ClusterIssuer" and "http01" solver is implemented. For definition also
# see "tasks/install-issuer.yml".
#
cert_manager_le_clusterissuer_options:
  - name: letsencrypt-prod
    email: insert@your-e-mail-address.here
    server: acme-v02
    private_key_secret_ref_name: letsencrypt-account-key
    solvers_http01_ingress_class: "traefik"
  - name: letsencrypt-staging
    email: insert@your-e-mail-address.here
    server: acme-staging-v02
    private_key_secret_ref_name: letsencrypt-staging-account-key
    solvers_http01_ingress_class: "traefik"
First check if you want to change any of the default values in defaults/main.yml. As usual those values can be overridden in host_vars or group_vars. Normally there is no need to change that much. Besides cert_manager_chart_version you might want to add a few options to cert_manager_values. It contains the configurable parameters of the cert-manager Helm chart. The list is submitted "as is" to the helm binary for the template, install or upgrade commands. In my case I'll add the following values in host_vars/k8s-01-ansible-ctrl.i.example.com:
cert_manager_values:
  - installCRDs=true
  - global.leaderElection.namespace="{{ cert_manager_namespace }}"

cert_manager_le_clusterissuer_options:
  - name: letsencrypt-prod
    email: <...>
    server: acme-v02
    private_key_secret_ref_name: letsencrypt-account-key
    solvers_http01_ingress_class: "traefik"
  - name: letsencrypt-staging
    email: <...>
    server: acme-staging-v02
    private_key_secret_ref_name: letsencrypt-staging-account-key
    solvers_http01_ingress_class: "traefik"
Setting installCRDs to true will cause the CRD resources (like certificates.cert-manager.io, certificaterequests.cert-manager.io, and so on) to be installed as part of the Helm chart. Note that with this enabled, uninstalling the chart also deletes the CRDs, causing all installed custom resources to be DELETED. global.leaderElection.namespace overrides the namespace used to store the ConfigMap for leader election. By default that's kube-system but I want to have everything in the cert-manager namespace.
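After cert-manager has been installed (see below) you can verify which CRDs the Helm chart shipped, e.g.:

kubectl get crds | grep cert-manager.io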
As I want cert-manager mainly for managing Let's Encrypt TLS certificates, I'll also specify a few settings in cert_manager_le_clusterissuer_options. If you want to do the same, just replace the placeholder <...> of email with your mail address. When installed (see below) this creates two ClusterIssuers, one for Let's Encrypt production and one for staging. During the first tests you should definitely use the staging one to not run into the rate limits of the production API. While the staging issuer doesn't generate valid certificates, it's good enough to test if the whole setup works in general. The configuration above will configure the ClusterIssuer to resolve ACME HTTP-01 challenges. As you can see solvers_http01_ingress_class is traefik. In short: when cert-manager requests a certificate from Let's Encrypt, Let's Encrypt will "call back" 😉 and verify over plain HTTP that you actually control the domain. So before requesting a Certificate make sure that the DNS entry/entries for a domain/hostname have the correct IP address set where Traefik can "intercept" the Let's Encrypt HTTP-01 challenge. cert-manager will create a temporary Ingress resource that Traefik serves just for this purpose (also see HTTP-01 challenge).
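It might be worth double-checking beforehand that the hostname really resolves to an IP where Traefik listens and that port 80 is reachable (www.domain.name is just a placeholder, of course):

dig +short www.domain.name
# should print the IP of a Traefik node or of the loadbalancer in front of them
curl -I http://www.domain.name/
# should be answered by Traefik (a 404 is fine as long as something responds)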
My cert_manager entry in Ansible's hosts file is just:
cert_manager:
  hosts:
    k8s-01-ansible-ctrl.i.example.com:
      ansible_connection: local
And in k8s.yml I'll add:
- hosts: cert_manager
  environment:
    KUBECONFIG: "/opt/scripts/ansible/k8s-01_vms/kubeconfig/admin.kubeconfig"
  roles:
    - role: githubixx.cert_manager_kubernetes
      tags: role-cert-manager-kubernetes
Render and verify YAML resources
The default action is to just render the Kubernetes resources YAML file after replacing all Jinja2 variables and the like (that means not specifying any value via --extra-vars action=... to ansible-playbook). To render just the resource manifests that WOULD be applied (nothing will be installed at this time) I execute the following command (you may set the ANSIBLE_STDOUT_CALLBACK=debug environment variable or stdout_callback = debug in ansible.cfg to get pretty printed output of the rendered YAML):
ansible-playbook --tags=role-cert-manager-kubernetes k8s.yml
Install cert-manager
If the rendered output contains everything you need, the role can be installed, which finally deploys cert-manager:
ansible-playbook --tags=role-cert-manager-kubernetes --extra-vars action=install k8s.yml
To check if everything was deployed use the usual kubectl commands like kubectl --namespace <cert_manager_namespace> get pods -o wide. E.g.:
kubectl --namespace cert-manager get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cert-manager-6cd94546c5-jlrgh 1/1 Running 0 41s 10.0.0.160 k8s-010103 <none> <none>
cert-manager-cainjector-58f65899d7-29g9p 1/1 Running 0 41s 10.0.0.42 k8s-010103 <none> <none>
cert-manager-webhook-58fd67545d-7rq96 1/1 Running 0 41s 10.0.5.150 k8s-010303 <none> <none>
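If you prefer to block until all cert-manager pods report ready, kubectl wait can be used, e.g.:

kubectl --namespace cert-manager wait pods --all --for=condition=Ready --timeout=120s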
Before the playbook finishes it waits for the first cert-manager-webhook Pod to become ready. In general wait until all cert-manager pods are ready before you try to get the first certificate. You can inspect the logs if there were any problems, e.g. (log output truncated):
kubectl --namespace cert-manager logs cert-manager-6cd94546c5-jlrgh
I0117 20:04:15.775236 1 controller.go:263] "cert-manager/controller/build-context: configured acme dns01 nameservers" nameservers=["10.32.0.254:53"]
W0117 20:04:15.775309 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0117 20:04:15.776391 1 controller.go:83] "cert-manager/controller: enabled controllers: [certificaterequests-approver certificaterequests-issuer-acme certificaterequests-issuer-ca certificaterequests-issuer-selfsigned certificaterequests-issuer-vault certificaterequests-issuer-venafi certificates-issuing certificates-key-manager certificates-metrics certificates-readiness certificates-request-manager certificates-revision-manager certificates-trigger challenges clusterissuers ingress-shim issuers orders]"
I0117 20:04:15.776639 1 controller.go:157] "cert-manager/controller: starting leader election"
I0117 20:04:15.776661 1 controller.go:104] "cert-manager/controller: starting metrics server" address="[::]:9402"
I0117 20:04:15.776684 1 controller.go:150] "cert-manager/controller: starting healthz server" address="[::]:9403"
I0117 20:04:15.776952 1 leaderelection.go:250] attempting to acquire leader lease cert-manager/cert-manager-controller...
I0117 20:04:15.797401 1 leaderelection.go:260] successfully acquired lease cert-manager/cert-manager-controller
...
Install ClusterIssuer
The role currently supports deploying ClusterIssuers for Let's Encrypt (LE) staging and production. As the name suggests, ClusterIssuers are meant to be used throughout the whole Kubernetes cluster, so you can use them in every namespace. The most relevant variable in this case is cert_manager_le_clusterissuer_options. I already configured it above, so everything is in place.
With the cert_manager_le_clusterissuer_options variable adjusted accordingly the ClusterIssuer can be installed:
ansible-playbook --tags=role-cert-manager-kubernetes --extra-vars action=install-issuer k8s.yml
After deploying the ClusterIssuers for the first time it takes a little while until they are ready. To figure out if they are ready kubectl can be used (in this case no namespace is needed as they're cluster-scoped):
kubectl get clusterissuer.cert-manager.io
NAME READY AGE
letsencrypt-prod True 10m
letsencrypt-staging True 11m
To get more information about a ClusterIssuer you can use e.g. kubectl describe clusterissuer.cert-manager.io letsencrypt-prod.
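You can also let kubectl block until an issuer becomes ready, e.g.:

kubectl wait clusterissuer letsencrypt-staging --for=condition=Ready --timeout=60s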
Request Let’s Encrypt certificate
Before a Certificate can be requested, again make sure that the DNS entry for the domain you want to get a certificate for points to one of the Traefik instances or to the loadbalancer IP that you might have placed "in front" of the Traefik instances.
Now a certificate can be issued. This happens outside of this Ansible role. E.g. to get a TLS certificate for the domain www.domain.name from the Let's Encrypt staging server (this one is only for testing and doesn't issue a valid certificate that browsers will accept), create a YAML file (e.g. domain-name.yaml) like this:
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-name
  namespace: namespace-name
spec:
  commonName: www.domain.name
  secretName: secret-name
  dnsNames:
    - www.domain.name
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
issuerRef.name: letsencrypt-staging points to the Let's Encrypt staging API. Before switching to the production API (letsencrypt-prod) make sure that staging works fine. The production API has rate limits, so if you experiment too much with that issuer Let's Encrypt might block you for a while. After changing the values to your needs, apply the file with kubectl apply -f domain-name.yaml.
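While the challenge is running, cert-manager creates a few intermediate resources that are quite handy for debugging (namespace-name being the namespace of the Certificate above):

kubectl -n namespace-name get certificates,certificaterequests,orders.acme.cert-manager.io,challenges.acme.cert-manager.io

Once the certificate has been issued the challenges are cleaned up again.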
If you request a (Cluster)Issuer or a Certificate you can watch the cert-manager logs to see what's going on, e.g. (in case you use a different namespace for cert-manager change the namespace accordingly):
kubectl -n cert-manager logs --tail=5 -f $(kubectl -n cert-manager get pods -l app=cert-manager --output=jsonpath='{.items..metadata.name}')
To get information about a Certificate this command can be used:
kubectl -n your-namespace get certificate cert-name -o json
Especially watch out for whether the Certificate is ready, e.g.:
kubectl -n your-namespace get certificate your-certificate -o json | jq '.status.conditions'
[
  {
    "lastTransitionTime": "2021-01-03T22:05:59Z",
    "message": "Certificate is up to date and has not expired",
    "reason": "Ready",
    "status": "True",
    "type": "Ready"
  }
]
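Or, without jq, simply wait for the Ready condition, e.g.:

kubectl -n your-namespace wait certificate your-certificate --for=condition=Ready --timeout=300s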
For more information also see the README of the role.
Configure IngressRoute
Now that the (staging) certificate is in place I can finally create an IngressRoute. IngressRoute is a Traefik-specific custom implementation of Ingress. The IngressRoute will use the certificate that cert-manager fetched from Let's Encrypt and stored as a Kubernetes Secret. E.g.:
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: www-domain-name
  namespace: namespace-name
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`www.domain.name`)
      services:
        - kind: Service
          name: service-name
          namespace: namespace-name
          passHostHeader: true
          port: 80
  tls:
    secretName: secret-name
This manifest specifies an IngressRoute called www-domain-name in namespace namespace-name. It's bound to the web and websecure Traefik entrypoints. There is also a Rule: it triggers if an incoming request wants to fetch a page from www.domain.name and forwards the request to a Service called service-name in namespace namespace-name. Finally a secretName called secret-name is specified. That's the reference to the Kubernetes Secret which stores the certificate requested above (the Certificate's spec.secretName).
If you now save the manifest to e.g. ingressroute.yml, it can be applied: kubectl apply -f ingressroute.yml. This will create the resource and you should see the IngressRoute in the Traefik dashboard (see the previous blog post on how to access it).
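To verify that Traefik actually serves the certificate you can use curl. With the staging issuer the certificate is signed by an untrusted CA, hence the -k flag to skip verification:

curl -kv https://www.domain.name 2>&1 | grep -E 'subject:|issuer:'
# with letsencrypt-staging the issuer line should mention Let's Encrypt's staging CA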
That’s it basically! 😄
What’s next
You probably already figured out that the whole setup is okay so far but not perfect. If you point your website's DNS record to one of the Traefik instances (which basically means to one of the Traefik DaemonSet members) and that host dies, you're out of business for a while. And if you use DNS round robin to distribute the requests across all Traefik nodes, you still have the problem that if one node fails you lose at least the requests going to that node. One solution to this problem could be a loadbalancer as already mentioned further above, e.g. Achieving High Availability with HAProxy and Keepalived: Building a Redundant Load Balancer. Of course a hardware loadbalancer is just fine too.
One option to solve the problem is MetalLB. Also see Configuring HA Kubernetes cluster on bare metal servers with GlusterFS & MetalLB. But with Cilium in place you actually already have a solution 😉 See Migrating from MetalLB to Cilium and How I moved from MetalLB to Cilium.
If you use Hetzner Cloud, hcloud-fip-controller is a possible option that might be sufficient for quite a few use cases. hcloud-fip-controller is a small controller to handle floating IP management in a Kubernetes cluster on Hetzner Cloud virtual machines. But it seems to be unmaintained by now.
There is also kube-vip. The kube-vip project provides High-Availability and load-balancing for both inside and outside a Kubernetes cluster.
Next up: Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes