Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v2 and cert-manager (Part 1) [Updated for Traefik v2.10]

This is an updated version of my older blog post which used Traefik v1.7 for Ingress and also for managing Let’s Encrypt TLS certificates. This new blog post uses Traefik v2.x and cert-manager for managing Let’s Encrypt TLS certificates.

If you followed my blog series Kubernetes the Not So Hard Way With Ansible so far, your Kubernetes installation can only handle internal requests. But most people want to make their services publicly available (public in the sense of the Internet or an internal network). Kubernetes Ingress is one of the options to make this work. More information is provided in the Kubernetes Ingress Resources documentation.

In short: typically, Services and Pods have IPs that are only routable by the cluster network. All traffic that ends up at an edge router is either dropped or forwarded elsewhere. An Ingress is a collection of rules that allow inbound connections to reach the cluster services. It can be configured to give services externally reachable URLs, load balance traffic, terminate SSL, offer name-based virtual hosting, and more. Users request ingress by POSTing the Ingress resource to the Kubernetes API server. An Ingress controller is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic in an HA manner.
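To make this a bit more concrete, here is a minimal sketch of such an Ingress. It assumes a hypothetical Service called my-app listening on port 80 in the default namespace and uses the placeholder host www.example.com. The kubernetes.io/ingress.class annotation value traefik matches the --providers.kubernetesingress.ingressclass=traefik argument that the role's default Helm values configure further below.

```yaml
# Hypothetical example: route requests for "www.example.com" to the
# Service "my-app" (port 80) in the "default" namespace.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: default
  annotations:
    # Tells Traefik to handle this Ingress (matches the
    # "--providers.kubernetesingress.ingressclass=traefik" argument).
    kubernetes.io/ingress.class: traefik
spec:
  rules:
    - host: www.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```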

Traefik is such a reverse proxy and load balancer. It supports several providers (Docker, Swarm mode, Kubernetes, Marathon, Consul, Etcd, Rancher, Amazon ECS, and a lot more) to manage its configuration automatically and dynamically. If you submit an Ingress object to the Kubernetes API server, Traefik, acting as Ingress controller, picks it up and configures the proxy accordingly.

In my previous blog post with Traefik v1 I also used Traefik to dynamically manage Let’s Encrypt TLS certificates to secure the communication between server and client. This is still possible but you need persistent storage to store the acme.json file or the Enterprise version of Traefik if you still want to use etcd as I did in my older blog post. Using the Enterprise Edition of Traefik wasn’t an option for me because the costs are just too high for a private person like me.

cert-manager has offered a way to manage Let’s Encrypt certificates for quite some time now. It also supports HashiCorp Vault, Venafi, self-signed and internal certificate authorities, so you get more flexibility. cert-manager installs its own resource types as custom resource definitions (CRDs), so you’ll get new Kubernetes objects like Issuer, Certificate and CertificateRequest. We’ll come back to these topics later.

I’m still using my Python venv as described in the previous blog posts, as well as the same Ansible hosts file and the same playbook k8s.yml. This blog post depends on the previous ones, so if you haven’t read them, please consult them if anything is missing.

I’ve prepared an Ansible role to install Traefik. It’s available at Ansible Galaxy and can be installed via

ansible-galaxy role install githubixx.traefik_kubernetes

or you can just clone the GitHub repository into your roles directory:

git clone https://github.com/githubixx/ansible-role-traefik-kubernetes roles/githubixx.traefik_kubernetes
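If you prefer to keep role dependencies in a file, Ansible's requirements file format can do the same job. This is a sketch assuming the conventional file name requirements.yml; the commented-out entry shows the Git-based alternative.

```yaml
# requirements.yml
roles:
  # Install the role from Ansible Galaxy (same as the
  # "ansible-galaxy role install" command above).
  - name: githubixx.traefik_kubernetes

  # Alternative: install straight from the GitHub repository.
  # - src: https://github.com/githubixx/ansible-role-traefik-kubernetes
  #   scm: git
  #   name: githubixx.traefik_kubernetes
```

It can then be installed with ansible-galaxy role install -r requirements.yml --roles-path roles (the roles path is just an example, use your own roles directory).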

You also need the Helm 3 binary installed on the Ansible Controller node (where ansible-playbook runs), but this was already needed for Cilium in the previous blog post. You can use your favorite package manager if your distribution includes Helm in its repository (e.g. on Arch Linux via sudo pacman -S helm), use one of the Ansible Helm roles (e.g. helm), or directly download the binary from Helm releases and put it into the /usr/local/bin/ directory.

kubectl should also be installed. It’s normally available via your package manager, or you can use my kubectl role. At the very least you need a properly configured KUBECONFIG that allows you to deploy all the needed resources to the Kubernetes cluster (also see Kubernetes the not so hard way with Ansible - Control plane).

Behind the scenes the role uses the official Traefik Helm chart. Currently it supports installing, updating/upgrading and deleting the Traefik deployment.

The provided default settings are optimized for a bare-metal, on-premise or otherwise self-hosted Kubernetes cluster where Traefik is the public entry point for the Kubernetes Services. While the configuration can of course be customized like with any Helm chart, the default settings will set up Traefik with the following configuration:

  • Traefik instances will be deployed as a DaemonSet (so it’ll run on every K8s worker node)
  • Traefik pods use hostPort
  • Traefik listens on port 80 on all interfaces of the host for incoming HTTP requests
  • Traefik listens on port 443 on all interfaces of the host for incoming HTTPS requests
  • Traefik dashboard is enabled but is not exposed to the public internet
  • TLS certificates are provided by cert-manager (see part 2)

So let’s first have a look at the default variables this role provides (all of which can be customized, of course):

# Helm chart version (uses Traefik v2.10)
traefik_chart_version: "23.1.0"

# Helm release name
traefik_release_name: "traefik"

# Helm repository name
traefik_repo_name: "traefik"

# Helm chart name
traefik_chart_name: "{{ traefik_repo_name }}/{{ traefik_release_name }}"

# Helm chart URL
traefik_chart_url: "https://helm.traefik.io/traefik"

# Kubernetes namespace where Traefik resources should be installed
traefik_namespace: "traefik"

# Directory that contains Helm chart values file. If you specify this
# variable Ansible will try to locate a file called "values.yml.j2" or
# "values.yaml.j2" in the specified directory (".j2" because you can
# use the usual Jinja2 template stuff there). The content of this file
# will be provided to "helm install/template" command as values file.
# By default the directory is the user's "$HOME" directory plus
# "/traefik/helm". If the task doesn't find such a file it uses
# the values in "templates/traefik_values_default.yml.j2" by default.
traefik_chart_values_directory: "{{ '~/traefik/helm' | expanduser }}"

# By default CRDs (CustomResourceDefinitions) are not installed. Set to
# "true" if CRDs should be installed. Also see:
# https://github.com/traefik/traefik-helm-chart/tree/master/traefik/crds
# The following CRDs will be installed:
#   - ingressroutes.traefik.containo.us
#   - ingressroutes.traefik.io
#   - ingressroutetcps.traefik.containo.us
#   - ingressroutetcps.traefik.io
#   - ingressrouteudps.traefik.containo.us
#   - ingressrouteudps.traefik.io
#   - middlewares.traefik.containo.us
#   - middlewares.traefik.io
#   - middlewaretcps.traefik.containo.us
#   - middlewaretcps.traefik.io
#   - serverstransports.traefik.containo.us
#   - serverstransports.traefik.io
#   - tlsoptions.traefik.containo.us
#   - tlsoptions.traefik.io
#   - tlsstores.traefik.containo.us
#   - tlsstores.traefik.io
#   - traefikservices.traefik.containo.us
#   - traefikservices.traefik.io
traefik_install_crds: false

# By default all tasks that need to communicate with the Kubernetes
# cluster are executed on your local host (127.0.0.1). But if that one
# doesn't have direct connection to this cluster or should be executed
# elsewhere this variable can be changed accordingly.
traefik_delegate_to: 127.0.0.1

# Shows the "helm" command that was executed if a task uses Helm to
# install, update/upgrade or delete such a resource.
traefik_helm_show_commands: false

# Without "action" variable defined this role will only render a file
# with all the resources that will be installed or upgraded. The rendered
# file with the resources will be called "template.yml" and will be
# placed in the directory specified below.
traefik_template_output_directory: "{{ '~/traefik/template' | expanduser }}"

If a newer Helm chart version is available, adjust traefik_chart_version accordingly. You can adjust the other variables too, but normally there is no need to do so (besides one…). If you want to install the Traefik resources into a different namespace, adjust traefik_namespace.

While you normally don’t need to adjust the variable values, you might consider setting traefik_install_crds to true. This installs and updates/upgrades the Traefik CustomResourceDefinitions (CRDs). They don’t change often, but a new Traefik release can change them (Traefik v2.10, for example, added the traefik.io API group next to the older traefik.containo.us group, which is why both appear in the list above). These CRDs are needed if you want to create a Kubernetes resource of e.g. kind: IngressRoute, which (besides others) is defined in those CRDs. If you want to install or upgrade the CRDs on your own, just keep the default value false.
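For illustration, a minimal IngressRoute could look like the sketch below. It assumes the CRDs are installed, a hypothetical Service my-app on port 80 in the default namespace, and the placeholder host www.example.com. Since Traefik v2.10 the traefik.io/v1alpha1 API version is the preferred one; traefik.containo.us/v1alpha1 is deprecated.

```yaml
# Hypothetical example; requires the Traefik CRDs
# (e.g. "traefik_install_crds: true").
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: my-app
  namespace: default
spec:
  entryPoints:
    # The "web" entry point defined in the Helm values (hostPort 80).
    - web
  routes:
    - match: Host(`www.example.com`)
      kind: Rule
      services:
        # Hypothetical Service that should receive the traffic.
        - name: my-app
          port: 80
```

Compared to a standard Ingress, the routing rule is written in Traefik's own rule syntax, and entry points, middlewares and other Traefik features can be referenced directly instead of via annotations.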

The role contains a default values file for the Helm chart in templates/traefik_values_default.yml.j2. Helm will use these values later to render the YAML manifests needed for Traefik like ServiceAccount, Deployment, and so on. So instead of creating the Kubernetes manifests manually as YAML files, Helm will render these templates with the values specified in templates/traefik_values_default.yml.j2. But these are details you normally don’t need to care about, as the Ansible role abstracts them away. If you want to use different values, just create a directory somewhere and create a file called values.yml.j2 or values.yaml.j2 in it. Set the value of traefik_chart_values_directory to this directory. You can also use templates/traefik_values_default.yml.j2 as a template and adjust it to your needs.

So let’s have a look at the values. Where needed, I’ll add comments in addition to the ones already in the templates/traefik_values_default.yml.j2 file:

# All possible Helm chart values here can be found at:
# https://github.com/traefik/traefik-helm-chart/blob/master/traefik/values.yaml

image:
  registry: docker.io
  repository: traefik
  tag: "2.10.7"
  pullPolicy: IfNotPresent

# These arguments are passed to Traefik's binary. For all options see:
# https://doc.traefik.io/traefik/reference/static-configuration/cli/
#
# First one sets log level accordingly.
#
# Second one sets value of "kubernetes.io/ingress.class"
# annotation to watch for. If a "standard" Kubernetes "Ingress" object
# is submitted to Kubernetes API server (instead of Traefik's own ingress
# implementation called "IngressRoute"), Traefik will handle these requests
# and route them accordingly.
additionalArguments:
  - "--log.level=INFO"
  - "--providers.kubernetesingress.ingressclass=traefik"

If you deploy the role for the first time, it may make sense to set log.level to DEBUG. In case of problems you might get more information about what’s going on. Change it back once you go into production to avoid excessive logging.
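A sketch of that change, assuming a custom values.yml.j2 in traefik_chart_values_directory. That file replaces the role's default values file entirely, so the excerpt below belongs into a full copy of templates/traefik_values_default.yml.j2. The defaults set the log level in two places (the --log.level argument and logs.general.level, which, as far as I can tell, the chart also passes to Traefik as --log.level), so it's best to change both to keep them consistent.

```yaml
# Excerpt of a custom "values.yml.j2" for a first rollout/troubleshooting.
additionalArguments:
  - "--log.level=DEBUG"
  - "--providers.kubernetesingress.ingressclass=traefik"

# Keep this in sync with the "--log.level" argument above.
logs:
  general:
    level: DEBUG
  access:
    enabled: true
```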

image.tag pins the Traefik version explicitly. Otherwise the default version specified in the Helm chart is used.

The value of providers.kubernetesingress.ingressclass becomes important later when cert-manager gets installed and an “http solver” is configured. cert-manager will manage all Let's Encrypt TLS certificates, and of course it needs to request one if it doesn’t exist yet or if it needs to be renewed. Let's Encrypt needs to verify that you own the domain for which you want a certificate. So cert-manager creates a certificate request accordingly, and Let's Encrypt will “call back” for verification. Before this happens, cert-manager has already created an Ingress to intercept this verification request. This Ingress object contains an annotation called kubernetes.io/ingress.class. If the value of this annotation is traefik, this is basically the “signal” for Traefik to handle this Ingress. So this value and the cert-manager http01 solver value need to match. I’ll come back to this when cert-manager gets configured.
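As a preview of part 2, this is roughly where the matching value ends up on the cert-manager side. It's a minimal sketch of a ClusterIssuer using the Let's Encrypt staging endpoint; the issuer name, e-mail address and Secret name are placeholders, not the final configuration.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  # Hypothetical name.
  name: letsencrypt-staging
spec:
  acme:
    # Let's Encrypt staging endpoint (switch to production once it works).
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Placeholder address for expiry notices.
    email: admin@example.com
    privateKeySecretRef:
      # Hypothetical Secret that stores the ACME account key.
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            # Becomes the "kubernetes.io/ingress.class" annotation of the
            # solver Ingress; must match the Traefik ingressclass argument.
            class: traefik
```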

# Global arguments passed to Traefik's binary.
# https://doc.traefik.io/traefik/reference/static-configuration/cli/
#
# First one disables periodical check if a new version has been released.
#
# Second one disables anonymous usage statistics. 
globalArguments:
  - "--global.checknewversion=false"
  - "--global.sendanonymoususage=false"

These two options should be quite obvious. We don’t need to check for newer versions, as updating/upgrading is managed by this role anyway: specify a newer version and deploy the role again.

# This creates the Traefik deployment. As "DaemonSet" is specified here
# this will create a Traefik instance on all Kubernetes worker nodes. If
# only a subset of nodes should be used specify "affinity", "nodeSelector"
# or "tolerations" accordingly. See link to Helm chart values above.
deployment:
  enabled: true
  kind: DaemonSet
  dnsPolicy: ClusterFirstWithHostNet

Instead of a Kubernetes Deployment I chose DaemonSet as deployment type. If you don’t have the luxury of running on AWS or Google Cloud, where you can use a Service of type LoadBalancer to make your services externally available, a DaemonSet is a good alternative.

First of all: what is a DaemonSet? A DaemonSet ensures that all (or some) nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created. For Traefik this means that exactly one Traefik pod will run on every worker node. As I only have a few worker nodes, that’s OK for me. If you have tens or hundreds of worker nodes, this probably doesn’t make much sense 😉 But that’s not a problem.

As already mentioned in the comment above, affinity, nodeSelector or tolerations can be used to deploy Traefik only on some nodes. It might even make sense to have a dedicated pool of nodes that only run Traefik pods, if you can afford it. Also, providers like Hetzner, DigitalOcean, Scaleway and so on offer load balancers that can be put “in front” of the Traefik instances. This way you can achieve high availability (HA): if one worker node goes down, the load balancer can shift traffic to the remaining nodes. For more information also see Assigning Pods to Nodes.
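A minimal sketch of the nodeSelector variant, assuming a hypothetical node label ingress-node=true that you first add to the nodes which should run Traefik. In the Traefik Helm chart nodeSelector is a top-level value, so this would go into a custom values.yml.j2.

```yaml
# Label the chosen nodes first (hypothetical label), e.g.:
#   kubectl label node k8s-010103 ingress-node=true
# The DaemonSet then only schedules Traefik pods on labeled nodes.
nodeSelector:
  ingress-node: "true"
```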

# Instructs Traefik to listen on various ports.
ports:
  # The name of this one can't be changed as it is used for the readiness
  # and liveness probes, but you can adjust its config to your liking.
  # To access the dashboard you can use "kubectl port-forward" e.g.:
  # kubectl -n traefik port-forward $(kubectl get pods --selector "app.kubernetes.io/name=traefik" --output=name -A | head -1) 9000:9000
  # Opening http://localhost:9000/dashboard/ should show the dashboard.
  traefik:
    port: 9000
    expose: false
    protocol: TCP
  # Unsecured incoming HTTP traffic. If you uncomment "redirectTo: websecure"
  # all traffic that arrives at this port will be redirected to "websecure"
  # entry point which means to the entry point that handles secure HTTPS traffic.
  # But be aware that this could be problematic for cert-manager.
  # Also "hostPort" is used. As "DaemonSet" was specified above that basically
  # means that Traefik pods will answer requests on port 80 and 443 on all
  # Kubernetes worker nodes. So if the hosts have a public IP and port 80/443
  # are not protected by firewall, Traefik is available for requests from the
  # Internet (what you normally want in case of Traefik ;-) ) For other
  # options see link above. These settings are useful for baremetal or
  # on-premise solutions with no further loadbalancer.
  web:
    port: 30080
    hostPort: 80
    expose: true
    protocol: TCP
    # redirectTo: websecure
  # Entry point for HTTPS traffic.
  websecure:
    port: 30443
    hostPort: 443
    expose: true
    protocol: TCP

If you really want to access the dashboard (the traefik port above) from outside of your cluster, create a secure ingress. That means the ingress should at least be TLS secured (which you can do with the help of cert-manager, which I’ll talk about later) and protected by the BasicAuth middleware.
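A minimal sketch of such a setup with Traefik's own CRDs, assuming the CRDs are installed. The host traefik.example.com, the Secret dashboard-auth-users (holding htpasswd-formatted credentials under the key users) and the TLS Secret traefik-dashboard-tls (e.g. issued by cert-manager later) are all hypothetical names.

```yaml
# Create the credentials Secret first (hypothetical names), e.g.:
#   htpasswd -nb admin 'change-me' > users
#   kubectl -n traefik create secret generic dashboard-auth-users --from-file=users
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: dashboard-auth
  namespace: traefik
spec:
  basicAuth:
    secret: dashboard-auth-users
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: dashboard
  namespace: traefik
spec:
  # Only reachable via HTTPS.
  entryPoints:
    - websecure
  routes:
    - match: Host(`traefik.example.com`) && (PathPrefix(`/dashboard`) || PathPrefix(`/api`))
      kind: Rule
      middlewares:
        - name: dashboard-auth
      services:
        # Traefik's internal service that serves the dashboard/API.
        - name: api@internal
          kind: TraefikService
  tls:
    secretName: traefik-dashboard-tls
```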

# These security settings are basically best practice to limit the attack
# surface as much as possible.
# The attack surface can be further limited with "seccomp", which is stable
# since Kubernetes v1.19 and allows limiting system calls to a bare minimum. See:
# https://kubernetes.io/docs/tutorials/clusters/seccomp/
securityContext:
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
  runAsGroup: 65532
  runAsNonRoot: true
  runAsUser: 65532

# All processes of the container are also part of this supplementary group ID.
podSecurityContext:
  fsGroup: 65532

The Kubernetes documentation page Configure a Security Context for a Pod or Container has more information about these settings. As Traefik will be your entry point from the Internet into your Kubernetes cluster, make sure to keep its permissions as low as possible. Restrict a Container’s Syscalls with Seccomp describes how to restrict the permissions of the Traefik binary/container even further.
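One low-effort step in that direction is the container runtime's default seccomp profile. This is a sketch assuming the chart passes podSecurityContext through to the Pod's securityContext (which is how I understand the chart template); RuntimeDefault is the standard Kubernetes profile type, not something Traefik-specific.

```yaml
podSecurityContext:
  fsGroup: 65532
  # Apply the container runtime's default seccomp profile to all
  # containers of the Traefik pod.
  seccompProfile:
    type: RuntimeDefault
```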

# Set log level of general log and enable access log.
logs:
  general:
    level: INFO
  access:
    enabled: true

# As Traefik web/websecure ports are exposed by "hostPort" a service isn't
# needed.
service:
  enabled: false

# CPU and RAM resource limits. These settings should also be set to
# reasonable values in case of a memory leak e.g.
resources:
  requests:
    cpu: "100m"
    memory: "50Mi"
  limits:
    cpu: "300m"
    memory: "150Mi"

But nothing is set in stone 😉 To use your own values, just create a file called values.yml.j2 or values.yaml.j2 and put it into the directory specified in traefik_chart_values_directory (which is $HOME/traefik/helm by default). This role will then use that file to render the Helm values. You can use templates/traefik_values_default.yml.j2 as a template or just start from scratch. As mentioned above, you can set any value of the Helm chart that should differ from the chart's defaults (see the values.yaml link at the top of the values file). And since the source template is just a Jinja2 template, you can of course use all the Ansible template magic.
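As a small example of that template magic, the resource settings could be driven by Ansible variables. The variable names below are hypothetical (they are not defined by the role) and could be set in host_vars or group_vars; the default() filter falls back to the values of the role's default template if they are not defined.

```yaml
# Excerpt of a custom "values.yml.j2" (hypothetical variables).
resources:
  requests:
    cpu: "{{ traefik_cpu_request | default('100m') }}"
    memory: "{{ traefik_memory_request | default('50Mi') }}"
  limits:
    cpu: "{{ traefik_cpu_limit | default('300m') }}"
    memory: "{{ traefik_memory_limit | default('150Mi') }}"
```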

After the values file is in place and the defaults/main.yml values are checked, the role can be deployed. Most of the role’s tasks are executed locally, because quite a few of them need to communicate with the Kubernetes API server or execute Helm commands.

So my traefik entry in Ansible’s hosts file will look like this:

traefik:
  hosts:
    k8s-01-ansible-ctrl.i.example.com:
      ansible_connection: local

k8s-01-ansible-ctrl.i.example.com is my Ansible Controller node which is my laptop. That hostname is just an Ansible internal name. That’s why I set ansible_connection: local. In k8s.yml I added

- 
  hosts: traefik
  roles:
    - role: githubixx.traefik_kubernetes
      tags: role-traefik-kubernetes

which I already mentioned in a previous post.

The default action of the role is to just render the Kubernetes resources YAML file after replacing all Jinja2 variables and the like. As you can see above, the role githubixx.traefik_kubernetes has the tag role-traefik-kubernetes assigned. Assuming that the values for the Helm chart should be rendered (nothing will be installed in this case) and the playbook is called k8s.yml, execute the following command:

ansible-playbook --tags=role-traefik-kubernetes k8s.yml

One of the final tasks is called TASK [githubixx.traefik_kubernetes : Write templates to file]. It allows you to check the YAML file before Traefik gets deployed. The file template.yml will be rendered into the directory specified in traefik_template_output_directory. This setting can be set temporarily like this:

ansible-playbook --tags=role-traefik-kubernetes --extra-vars traefik_template_output_directory="/tmp/traefik" k8s.yml

But I’ll put this variable together with traefik_install_crds (because I want to install the Traefik Custom Resource Definitions) into host_vars/k8s-01-ansible-ctrl.i.example.com. In a previous blog post I also already set ansible_python_interpreter. So the file will look like this:

ansible_python_interpreter: "/opt/scripts/ansible/k8s-01_vms/bin/python"

traefik_template_output_directory: "/tmp/traefik"
traefik_install_crds: true

If the rendered output contains everything you need, the role can be installed, which finally deploys Traefik:

ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=install k8s.yml

To check if everything was deployed use the usual kubectl commands like kubectl -n <traefik_namespace> get pods -o wide. E.g.:

kubectl --namespace traefik get pods -o wide

NAME            READY   STATUS    RESTARTS   AGE   IP           NODE         NOMINATED NODE   READINESS GATES
traefik-bpb4x   1/1     Running   0          20m   10.0.2.38    k8s-010203   <none>           <none>
traefik-gbwkd   1/1     Running   0          20m   10.0.5.180   k8s-010303   <none>           <none>
traefik-kt2lf   1/1     Running   0          20m   10.0.0.52    k8s-010103   <none>           <none>

To check whether Traefik delivers something on port 80, a simple curl request should output at least “something” 😉 E.g.:

ansible -m command -a "curl --silent http://{{ ansible_default_ipv4.address }}" k8s-010103.i.example.com
k8s-010103.i.example.com | CHANGED | rc=0 >>
404 page not found

HTTP 404 is fine in this case, as no Ingress resources are defined yet. Both port 80 and 443 should be accessible (see the firewall settings below).

If you enabled the Traefik dashboard you should be able to access that one too now. Create a port forwarding like in this example:

kubectl --namespace traefik port-forward $(kubectl get pods --selector "app.kubernetes.io/name=traefik" --output=name -A | head -1) 9000:9000

If you now open a browser you should be able to open http://localhost:9000/dashboard/ (the last / is important) and see the Traefik dashboard.

As new Traefik releases come out every few weeks or months, the role can also do updates/upgrades. This method can also be used to change existing values without upgrading the Traefik version, e.g. Check the Traefik releases before updating Traefik. Changes to the Helm chart can be found in its commit history.

If you want to upgrade Traefik or the Helm chart, you basically only need to change the traefik_chart_version variable, e.g. from 23.1.0 to 23.2.0. If only parameters should be changed, update the values accordingly.

So to do the Traefik update or to roll out the new values, run:

ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=upgrade k8s.yml

For more information about the Ansible role see the README.

I already adjusted the firewall settings in the Control plane blog post, so this is just a reminder, as the settings should already be in place. Opening ports 80 and 443 is not only needed for “normal” HTTP and HTTPS traffic but also to enable Let’s Encrypt to reach the Traefik instances for domain verification. For the harden_linux role, for example, that means extending the harden_linux_ufw_rules variable to end up with something like this (assuming you changed the SSHd port to 22222 as recommended; otherwise the value will be 22). As already mentioned, this variable is placed in group_vars/k8s_worker.yml:

harden_linux_ufw_rules:
  - rule: "allow"
    to_port: "22222"
    protocol: "tcp"
  - rule: "allow"
    to_port: "51820"
    protocol: "udp"
  - rule: "allow"
    to_port: "80"
    protocol: "tcp"
  - rule: "allow"
    to_port: "443"
    protocol: "tcp"
  - rule: "allow"
    to_port: "25"
    protocol: "tcp"

That’s it for part 1. In part 2 I’ll install cert-manager to get TLS certificates from Let’s Encrypt which allows to encrypt the website traffic.