Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v2 and cert-manager (Part 2) [Updated for Traefik v2.10]

In part 1 I installed Traefik proxy. So it’s now basically possible to expose Kubernetes services to the Internet. But nowadays traffic should be encrypted whenever possible. And even if you don’t think you need it, think about SEO: Google ranks sites with encrypted traffic higher, for example.

So cert-manager can be installed to automatically get TLS certificates from Let’s Encrypt. The certificates can then be used by Traefik to enable TLS for an Ingress. cert-manager also takes care of renewing them before they expire.

As with Traefik I’ve also prepared an Ansible role to install cert-manager. It’s available at Ansible Galaxy and can be installed via

bash

ansible-galaxy install githubixx.cert_manager_kubernetes

or you can just clone the GitHub repository into your roles directory:

bash

git clone https://github.com/githubixx/ansible-role-cert-manager-kubernetes roles/githubixx.cert_manager_kubernetes

As with the Traefik role you also need Helm 3 and kubectl plus a properly configured KUBECONFIG.
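
A quick sanity check on the Ansible Controller node before running the role doesn’t hurt. The kubeconfig path below is the one used later in this post, adjust it to your environment:

bash

# Helm 3 and kubectl need to be in the PATH of the user that runs Ansible
helm version --short
kubectl version --client

# KUBECONFIG must point to a kubeconfig with admin permissions for the cluster
export KUBECONFIG="/opt/scripts/ansible/k8s-01_vms/kubeconfig/admin.kubeconfig"
kubectl cluster-info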

Behind the scenes it uses the official Helm chart. Currently installing, updating/upgrading and deleting the cert-manager deployment are supported.
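
The role is driven by an action variable passed to ansible-playbook (more on that below). Assuming the action names follow the same pattern as the installation shown later in this post (please check the role’s README for the authoritative list), an upgrade or removal would look roughly like this:

bash

# Upgrade an existing cert-manager deployment (action name assumed,
# see the role's README)
ansible-playbook --tags=role-cert-manager-kubernetes --extra-vars action=upgrade k8s.yml

# Remove the cert-manager deployment again (action name assumed)
ansible-playbook --tags=role-cert-manager-kubernetes --extra-vars action=delete k8s.yml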

So let's have a look at the available role variables:

yaml

# Helm chart version
cert_manager_chart_version: "v1.13.3"

# Helm release name
cert_manager_release_name: "cert-manager"

# Helm repository name
cert_manager_repo_name: "jetstack"

# Helm chart name
cert_manager_chart_name: "{{ cert_manager_repo_name }}/{{ cert_manager_release_name }}"

# Helm chart URL
cert_manager_chart_url: "https://charts.jetstack.io"

# Kubernetes namespace where cert-manager resources should be installed
cert_manager_namespace: "cert-manager"

# The following list contains the configurable parameters of the cert-manager
# Helm chart. For all possible values see:
# https://artifacthub.io/packages/helm/jetstack/cert-manager#configuration
# But for most users "installCRDs=true" should be sufficient.
# If true, CRD resources will be installed as part of the Helm chart.
# If enabled, when uninstalling CRD resources will be deleted causing all
# installed custom resources to be DELETED.
cert_manager_values:
  - installCRDs=true
  - global.leaderElection.namespace="{{ cert_manager_namespace }}"

# To install "ClusterIssuer" for Let's Encrypt (LE) "cert_manager_le_clusterissuer_options"
# needs to be defined. The variable contains a list of hashes and can be defined
# in "group_vars/all.yml" e.g.
#
# name:   Defines the name of the "ClusterIssuer"
# email:  Use a valid e-mail address to be alerted by LE in case a certificate
#         expires
# server: Hostname part of the LE URL
# private_key_secret_ref_name:  Name of the secret which stores the private key
# solvers_http01_ingress_class: Value of "kubernetes.io/ingress.class" annotation.
#                               Depends on your ingress controller. Common values
#                               are "traefik" for Traefik or "nginx" for nginx.
#
# Besides "email" the following values can be used as is and will create valid
# "ClusterIssuer" for Let's Encrypt staging and production. Only "email" needs
# to be adjusted if Traefik is used as ingress controller. For other ingress
# controllers "solvers_http01_ingress_class" needs to be adjusted too. Currently
# only "ClusterIssuer" and "http01" solver is implemented. For definition also
# see "tasks/install-issuer.yml".
#
cert_manager_le_clusterissuer_options:
  - name: letsencrypt-prod
    email: insert@your-e-mail-address.here
    server: acme-v02
    private_key_secret_ref_name: letsencrypt-account-key
    solvers_http01_ingress_class: "traefik"
  - name: letsencrypt-staging
    email: insert@your-e-mail-address.here
    server: acme-staging-v02
    private_key_secret_ref_name: letsencrypt-staging-account-key
    solvers_http01_ingress_class: "traefik"

First check if you want to change any of the default values in defaults/main.yml. As usual those values can be overridden in host_vars or group_vars. Normally there is no need to change that much. Besides cert_manager_chart_version you might want to add a few options to cert_manager_values. It contains the configurable parameters of the cert-manager Helm chart. The list is submitted “as is” to the helm binary for the template, install or upgrade commands. In my case I’ll add the following values in host_vars/k8s-01-ansible-ctrl.i.example.com:

yaml

cert_manager_values:
  - installCRDs=true
  - global.leaderElection.namespace="{{ cert_manager_namespace }}"

cert_manager_le_clusterissuer_options:
  - name: letsencrypt-prod
    email: <...>
    server: acme-v02
    private_key_secret_ref_name: letsencrypt-account-key
    solvers_http01_ingress_class: "traefik"
  - name: letsencrypt-staging
    email: <...>
    server: acme-staging-v02
    private_key_secret_ref_name: letsencrypt-staging-account-key
    solvers_http01_ingress_class: "traefik"

Setting installCRDs to true will cause the CRD resources (like certificates.cert-manager.io, certificaterequests.cert-manager.io, and so on) to be installed as part of the Helm chart. Be aware that when uninstalling the chart these CRDs will be deleted too, which causes all installed custom resources to be DELETED as well. global.leaderElection.namespace overrides the namespace used for leader election. By default that’s kube-system but I want to have everything in the cert-manager namespace.

As I want cert-manager mainly for managing Let’s Encrypt TLS certificates, I’ll also specify a few settings in cert_manager_le_clusterissuer_options. If you want to do the same just replace the <...> placeholder of the email with your mail address. When installed (see below) this creates two ClusterIssuers, one for Let’s Encrypt production and one for staging. During first testing you should definitely use the staging one to not run into the rate limiting of the production one. While the staging issuer doesn’t generate certificates browsers will accept, it’s good enough to test if the whole setup works in general. The configuration above will configure the ClusterIssuers to resolve ACME HTTP-01 challenges. As you can see solvers_http01_ingress_class is traefik. In short: when cert-manager requests a certificate from Let’s Encrypt, Let’s Encrypt will “call back” 😉 and verify the HTTP-01 challenge. For this purpose cert-manager creates a temporary Ingress resource which Traefik picks up (also see HTTP-01 challenge). So before requesting a Certificate make sure that the DNS entry/entries for a domain/hostname point to an IP address where Traefik can answer the Let’s Encrypt HTTP-01 challenge.

My cert_manager entry in Ansible’s hosts file is just

yaml

cert_manager:
  hosts:
    k8s-01-ansible-ctrl.i.example.com:
      ansible_connection: local

And in k8s.yml I’ll add

yaml

-
  hosts: cert_manager
  environment:
    KUBECONFIG: "/opt/scripts/ansible/k8s-01_vms/kubeconfig/admin.kubeconfig"
  roles:
    -
      role: githubixx.cert_manager_kubernetes
      tags: role-cert-manager-kubernetes

The default action is to just render the Kubernetes resources YAML file after replacing all Jinja2 variables and the like (that means not specifying any value via --extra-vars action=... to ansible-playbook).

To render just the resource manifests that WOULD be applied (nothing will be installed at this time) I execute the following command (you can set the ANSIBLE_STDOUT_CALLBACK=debug environment variable or stdout_callback = debug in ansible.cfg to get a pretty printed output of the rendered YAML):

bash

ansible-playbook --tags=role-cert-manager-kubernetes k8s.yml
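
For example, to get the pretty printed output for just this run the debug callback can be enabled via the environment:

bash

# Render the manifests with Ansible's "debug" stdout callback enabled
ANSIBLE_STDOUT_CALLBACK=debug ansible-playbook --tags=role-cert-manager-kubernetes k8s.yml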

If the rendered output contains everything you need, the install action can be used which finally deploys cert-manager:

bash

ansible-playbook --tags=role-cert-manager-kubernetes --extra-vars action=install k8s.yml

To check if everything was deployed use the usual kubectl commands like kubectl --namespace <cert_manager_namespace> get pods -o wide. E.g.

bash

kubectl --namespace cert-manager get pods -o wide

NAME                                       READY   STATUS    RESTARTS   AGE   IP           NODE         NOMINATED NODE   READINESS GATES
cert-manager-6cd94546c5-jlrgh              1/1     Running   0          41s   10.0.0.160   k8s-010103   <none>           <none>
cert-manager-cainjector-58f65899d7-29g9p   1/1     Running   0          41s   10.0.0.42    k8s-010103   <none>           <none>
cert-manager-webhook-58fd67545d-7rq96      1/1     Running   0          41s   10.0.5.150   k8s-010303   <none>           <none>

Before the playbook finishes it waits for the cert-manager-webhook Pod to become ready. In general, wait until all cert-manager Pods are ready before you try to get the first certificate. You can inspect the logs if there were any problems, e.g. (log output truncated):

bash

kubectl --namespace cert-manager logs cert-manager-6cd94546c5-jlrgh

I0117 20:04:15.775236       1 controller.go:263] "cert-manager/controller/build-context: configured acme dns01 nameservers" nameservers=["10.32.0.254:53"]
W0117 20:04:15.775309       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0117 20:04:15.776391       1 controller.go:83] "cert-manager/controller: enabled controllers: [certificaterequests-approver certificaterequests-issuer-acme certificaterequests-issuer-ca certificaterequests-issuer-selfsigned certificaterequests-issuer-vault certificaterequests-issuer-venafi certificates-issuing certificates-key-manager certificates-metrics certificates-readiness certificates-request-manager certificates-revision-manager certificates-trigger challenges clusterissuers ingress-shim issuers orders]"
I0117 20:04:15.776639       1 controller.go:157] "cert-manager/controller: starting leader election"
I0117 20:04:15.776661       1 controller.go:104] "cert-manager/controller: starting metrics server" address="[::]:9402"
I0117 20:04:15.776684       1 controller.go:150] "cert-manager/controller: starting healthz server" address="[::]:9403"
I0117 20:04:15.776952       1 leaderelection.go:250] attempting to acquire leader lease cert-manager/cert-manager-controller...
I0117 20:04:15.797401       1 leaderelection.go:260] successfully acquired lease cert-manager/cert-manager-controller
...
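
Instead of polling kubectl get pods you can also let kubectl block until all cert-manager Pods report ready (just a convenience, the playbook already waits for the webhook Pod):

bash

# Block until all Pods in the cert-manager namespace are ready
# (or give up after two minutes)
kubectl --namespace cert-manager wait --for=condition=Ready pods --all --timeout=120s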

The role currently supports deploying ClusterIssuers for Let’s Encrypt (LE) staging and production. As the name suggests, ClusterIssuers are meant to be used throughout the whole Kubernetes cluster, so you can use them in every namespace. The most relevant variable in this case is cert_manager_le_clusterissuer_options. As I already configured it above, everything is in place.

With the cert_manager_le_clusterissuer_options variable adjusted accordingly, the ClusterIssuers can be installed:

bash

ansible-playbook --tags=role-cert-manager-kubernetes --extra-vars action=install-issuer k8s.yml

After deploying the ClusterIssuers for the first time it takes a little while until they are ready. To figure out if they are ready, kubectl can be used (in this case no namespace is needed as they’re cluster scoped):

bash

kubectl get clusterissuer.cert-manager.io

NAME                  READY   AGE
letsencrypt-prod      True    10m
letsencrypt-staging   True    11m

To get more information about a ClusterIssuer you can use kubectl describe clusterissuer.cert-manager.io letsencrypt-prod, for example.
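
The Status section of the describe output should contain the registered ACME account and a Ready condition. If you only care about the condition, a jsonpath query prints just that part:

bash

# Full status including the registered ACME account
kubectl describe clusterissuer.cert-manager.io letsencrypt-prod

# Just the conditions (type "Ready" with status "True" plus a short message)
kubectl get clusterissuer letsencrypt-prod -o jsonpath='{.status.conditions}'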

Before a Certificate can be requested, again make sure that the DNS entry for the domain you want to get a certificate for points to one of the Traefik instances or to the IP of the loadbalancer that you might have placed “in front” of the Traefik instances.
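
A simple check from outside the cluster is to compare the DNS answer with the expected IP address (www.domain.name is a placeholder, of course):

bash

# Should return the public IP of a Traefik instance or of the loadbalancer
# "in front" of the Traefik instances
dig +short www.domain.name

# Quick check that Traefik answers on port 80 which is needed for the
# HTTP-01 challenge (a 404 here is fine, it just proves Traefik responds)
curl -sI http://www.domain.name/.well-known/acme-challenge/test | head -n 1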

Now a certificate can be issued. This happens outside of this Ansible role. E.g. to get a TLS certificate for the domain www.domain.name from the Let’s Encrypt staging server (this one is only for testing and doesn’t issue a valid certificate that browsers will accept), create a YAML file (e.g. domain-name.yaml) like this:

yaml

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-name
  namespace: namespace-name
spec:
  commonName: www.domain.name
  secretName: secret-name
  dnsNames:
    - www.domain.name
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer

issuerRef.name: letsencrypt-staging points to the Let’s Encrypt staging API. Before switching to the production API via letsencrypt-prod, make sure that staging works fine. The production API has some rate limiting, so if you experiment too much with this issuer Let’s Encrypt might block you for a while. After changing the values to your needs, apply this file with kubectl apply -f domain-name.yaml.
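
cert-manager creates a few intermediate resources (CertificateRequest, Order and Challenge) while the certificate gets issued. Watching those together with the Certificate is usually the quickest way to see where things hang (namespace-name and domain-name.yaml are the placeholders from the manifest above):

bash

kubectl apply -f domain-name.yaml

# Watch the whole issuance chain in the namespace of the Certificate
kubectl -n namespace-name get certificate,certificaterequest,order,challenge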

If you request a (Cluster)Issuer or a Certificate you can watch the cert-manager logs to see what’s going on, e.g. (in case you use a different namespace for cert-manager change the namespace accordingly):

bash

kubectl -n cert-manager logs --tail=5 -f $(kubectl -n cert-manager get pods -l app=cert-manager --output=jsonpath='{.items..metadata.name}')

To get information about a Certificate this command can be used:

bash

kubectl -n your-namespace get certificate cert-name -o json

Especially check whether the Certificate is ready, e.g.:

bash

kubectl -n your-namespace get certificate your-certificate -o json | jq '.status.conditions'

[
  {
    "lastTransitionTime": "2021-01-03T22:05:59Z",
    "message": "Certificate is up to date and has not expired",
    "reason": "Ready",
    "status": "True",
    "type": "Ready"
  }
]

For more information also see the README of the role.

Now that the (staging) certificate is in place I can finally create an IngressRoute. IngressRoute is a Traefik-specific custom alternative to the Kubernetes Ingress resource. The IngressRoute will use the certificate that cert-manager fetched from Let’s Encrypt and stored as a Kubernetes Secret. E.g.:

yaml

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: www-domain-name
  namespace: namespace-name
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`www.domain.name`)
      services:
        - kind: Service
          name: service-name
          namespace: namespace-name
          passHostHeader: true
          port: 80
  tls:
    secretName: secret-name

This manifest specifies an IngressRoute called www-domain-name in namespace namespace-name. It’s bound to the web and websecure Traefik entrypoints. There is also a Rule. It’ll trigger if the incoming request wants to fetch a page from www.domain.name and will forward the request to a Service called service-name in namespace namespace-name. Finally secret-name is specified as the tls.secretName. That’s the Kubernetes Secret in which cert-manager stored the certificate, as defined in spec.secretName of the Certificate created above.

If you save the manifest e.g. to ingressroute.yml, it can be applied with kubectl apply -f ingressroute.yml. This will create the resource and you should see the IngressRoute in the Traefik dashboard (see the previous blog post on how to access it).
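
Once the IngressRoute is applied you can verify which certificate Traefik actually serves for the host. With the staging issuer the certificate is signed by the Let’s Encrypt staging CA and not trusted by browsers, hence --insecure:

bash

# Show subject and issuer of the certificate Traefik presents for the host
curl --insecure -vI https://www.domain.name 2>&1 | grep -E 'subject:|issuer:'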

That’s it basically! 😄

You probably already figured out that the whole setup is okay so far but not perfect. If you point your website's DNS record to one of the Traefik instances (which basically means to one of the Traefik DaemonSet members) and that host dies, you’re out of business for a while. And even if you use DNS round robin to distribute the requests across all Traefik nodes, you still lose at least the requests to a node that fails. One solution to this problem could be a loadbalancer as already mentioned further above, e.g. Achieving High Availability with HAProxy and Keepalived: Building a Redundant Load Balancer. Of course a hardware loadbalancer is just fine too.

One option to solve the problem is MetalLB. Also see Configuring HA Kubernetes cluster on bare metal servers with GlusterFS & MetalLB. But with Cilium in place you actually already have a solution 😉 See Migrating from MetalLB to Cilium and How I moved from MetalLB to Cilium.

If you use Hetzner Cloud, hcloud-fip-controller is a possible option that might be sufficient for quite a few use cases. It’s a small controller to handle floating IP management in a Kubernetes cluster on Hetzner Cloud virtual machines. But it seems to be no longer maintained.

There is also kube-vip. The kube-vip project provides high availability and load balancing both inside and outside a Kubernetes cluster.

Next up: Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes