Upgrade Traefik v2 to v3

In my blog posts Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v2 and cert-manager (part 1 / part 2) I showed how to install Traefik proxy as an Ingress controller for Kubernetes. Those blog posts were written for Traefik v2. But earlier this year a new major release of Traefik came out: v3. While Traefik v2 will still be supported with security updates for a while, it makes sense to at least start planning the upgrade and test everything.

Major releases normally mean breaking changes and a lot of headache 😉. But for the upgrade to Traefik v3 they claim it should be easy. So let’s see…

NOTE: This guide only covers Traefik running in a Kubernetes cluster! If you use Traefik in any other environment then this guide most probably isn’t for you.

This is what I’ve installed and what is relevant for the upgrade (your setup might differ of course and additional changes might be needed!):

  • A Kubernetes cluster running K8s v1.29.x with Traefik v2.11.x installed as Ingress controller. It generally makes sense to run the latest v2 release before starting the migration. Currently that’s Traefik v2.11.8.
  • Out of all the resources that Traefik offers I’m only using ingressroutes.traefik.containo.us and middlewares.traefik.containo.us. This needs to be changed before the upgrade (see below). If you use other Traefik resources like ingressroutetcps.traefik.containo.us you need to adjust them too.
  • I’m using my Ansible role ansible-role-traefik-kubernetes to install or upgrade Traefik. It uses the official Helm chart behind the scenes. So some of the information below should still be relevant even if you use the Helm chart directly instead of the Ansible role.

Before the upgrade to Traefik v3 can be started a few things need to be done.

Before you actually start planning the Traefik v3 upgrade make sure that you have some monitoring in place that watches your websites. Monitoring the traefik DaemonSet/Deployment also makes sense, and most probably a few other resources. But this depends on your needs.

In case the upgrade doesn’t work or you run into problems, make sure you have a plan to roll back to the old Traefik version! In general that shouldn’t be a problem as long as the old Traefik CRDs (Custom Resource Definitions) *.traefik.containo.us that were installed with Traefik v2 are still around. Deleting the old *.traefik.containo.us CRDs should really be the very last thing you do to finish the migration.

You definitely should read the following documentation:

The Custom Resource Definition (CRD) API group traefik.containo.us was deprecated and has been removed in Traefik v3. So before you upgrade to this release, make sure your Traefik resources are changed accordingly. Please use the API group traefik.io instead. E.g.:

  • ingressroutes.traefik.containo.us -> ingressroutes.traefik.io
  • ingressroutetcps.traefik.containo.us -> ingressroutetcps.traefik.io
  • ingressrouteudps.traefik.containo.us -> ingressrouteudps.traefik.io
  • middlewares.traefik.containo.us -> middlewares.traefik.io
  • middlewaretcps.traefik.containo.us -> middlewaretcps.traefik.io
  • serverstransports.traefik.containo.us -> serverstransports.traefik.io
  • tlsoptions.traefik.containo.us -> tlsoptions.traefik.io
  • tlsstores.traefik.containo.us -> tlsstores.traefik.io
  • traefikservices.traefik.containo.us -> traefikservices.traefik.io

While still running Traefik v2 I adjusted a few IngressRoutes that were still using the traefik.containo.us API group. E.g.:

yaml

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: example-com
  namespace: www-example-com
spec:
...

needs to be changed to the following and then applied:

yaml

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: example-com
  namespace: www-example-com
spec:
...

The same applies to a Middleware. This

yaml

apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: redirect-to-https
  namespace: www-example-com
spec:
  redirectScheme:
    scheme: https
    permanent: true

needs to be changed to

yaml

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: redirect-to-https
  namespace: www-example-com
spec:
  redirectScheme:
    scheme: https
    permanent: true
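If many manifests need the same change, the edit can be scripted. The following is only a sketch: the function name migrate_api_group is made up for illustration, and it assumes the manifests are plain YAML files where the API group appears on an apiVersion line (as in the examples above). It deliberately leaves every other line alone, so annotations or comments that mention the old group are not touched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a manifest from stdin and rewrite the deprecated
# Traefik API group on "apiVersion:" lines only. The optional leading "-"
# covers items inside a YAML list. All other lines pass through unchanged.
migrate_api_group() {
  sed -E 's#^([[:space:]-]*apiVersion:[[:space:]]*)traefik\.containo\.us/#\1traefik.io/#'
}
```

A call like migrate_api_group < ingressroute.yaml > ingressroute-new.yaml (the file names are placeholders) writes the converted manifest to a new file, which can be reviewed before it is applied with kubectl apply -f.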

NOTE: Even if all API groups are adjusted accordingly, do not delete the now obsolete traefik.containo.us CRDs yet! This should only be done after Traefik 3.x has been installed and you’ve verified that the IngressRoutes and the other Traefik resources are still working.
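To get an overview of which objects still exist under the old API group, something like the following sketch might help. The function names are made up for illustration. It assumes kubectl is configured for the cluster and relies on kubectl get crds -o name printing one customresourcedefinition.apiextensions.k8s.io/NAME entry per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter the output of "kubectl get crds -o name" (stdin)
# down to the names of CRDs in the old traefik.containo.us API group.
legacy_traefik_crds() {
  sed 's#.*/##' | grep -E '\.traefik\.containo\.us$' || true
}

# Hypothetical helper: list all objects of every old-group CRD in all namespaces.
# Needs cluster access; it is defined here but not called automatically.
list_legacy_traefik_objects() {
  kubectl get crds -o name | legacy_traefik_crds | while read -r crd; do
    echo "== ${crd}"
    kubectl get "${crd}" --all-namespaces
  done
}
```

Running list_legacy_traefik_objects prints one section per old CRD. Anything listed there is worth comparing against its traefik.io counterpart before moving on.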

As a reminder: The dynamic configuration can be updated while Traefik is running and contains all the routing information. E.g. an IngressRoute is something that can be changed dynamically. The Helm values more or less belong to the static configuration, i.e. the parameters that are provided to the Traefik binary during startup. You can get the current static configuration parameters by running kubectl -n traefik get daemonsets.apps traefik -o yaml | yq '.spec.template.spec.containers[0].args' (if you have the traefik DaemonSet in the traefik namespace). This uses the yq utility to parse the YAML output and display just the requested information.

No matter if you use my Traefik Ansible role or the Traefik Helm chart directly there are some changes needed in the Helm values.

In case of my role the Ansible variable traefik_default_path_matcher_syntax: v2 needs to be set! This is really important if you use IngressRoute resources, for example. Otherwise the path/host matching might not work anymore as some matchers have either been removed or had their syntax changed (see Dynamic Configuration Changes). With the traefik_default_path_matcher_syntax: v2 variable set OR with

yaml

core:
  defaultRuleSyntax: v2

in the Helm values.yaml file directly (or in traefik_values_default.yml.j2 if you use a custom version of this file), Traefik v3 is still able to work with the old syntax. This is one of the most important settings for the upgrade if you use IngressRoute. After the upgrade every IngressRoute can be migrated to the new v3 syntax, or you go straight to Gateway API for even more features and to be future-proof. Also check the rest of the Dynamic Configuration Changes documentation for other changes that might affect you.

A few other things changed in traefik_values_default.yml.j2 which you might need to adjust if you don’t use the defaults (or if you use Helm directly with a values.yaml file). Some of the changes are needed because the previous version of my Ansible role used Traefik Helm chart version 23.2.0 and was upgraded to version 31.1.1.

It makes sense to make a backup of the values file in case you need to downgrade again (or if you have the changes in Git you can also easily revert).

The Traefik version to use (as of writing this blog post it was 3.1.4) was changed of course:

yaml

image:
  tag: "3.1.4"

ports.traefik.expose: true was changed to ports.traefik.expose.default: true. Same for ports.web.expose and ports.websecure.expose. E.g.:

yaml

ports:
  traefik:
    port: 9000
    protocol: TCP
    expose:
      default: false
  web:
    # ...
    expose:
      default: false
  websecure:
    # ...
    expose:
      default: false
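If you maintain your own values file, a quick search can show whether it still uses the old boolean form of expose. This is just a sketch with a made-up function name. It works on plain text, not on parsed YAML, so it assumes the key and its value are on one line as in the old chart defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) every line from stdin where
# "expose" is still a plain boolean instead of a map with a "default" key.
find_boolean_expose() {
  grep -nE '^[[:space:]]*expose:[[:space:]]*(true|false)[[:space:]]*(#.*)?$' || true
}
```

For example, find_boolean_expose < values.yaml (the file name is a placeholder) prints nothing if the file already uses the new expose.default form.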

A gateway key was added to traefik_values_default.yml.j2 for better visibility. But it’s set to

yaml

gateway:
  enabled: false

because some further adjustments might be needed, e.g. for TLS support.

Default for updateStrategy changed from:

yaml

updateStrategy:
  type: RollingUpdate

to

yaml

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 0

If maxUnavailable is set to 1, maxSurge needs to be 0.

Also added:

yaml

providers:
  kubernetesCRD:
    # Load Kubernetes "IngressRoute" provider
    enabled: true
  kubernetesIngress:
    # Load Kubernetes "Ingress" provider
    enabled: true
  kubernetesGateway:
    # Enable Traefik "Gateway" provider for Gateway API
    enabled: false

# Create a default IngressClass for Traefik
ingressClass:
  enabled: true
  isDefaultClass: true

But these values are the defaults anyway. They were only added for better visibility.

One note regarding Helm: When I first tried to install the new Traefik version I got this error when my Ansible role tried to render the YAML manifests (or when I executed helm template ...):

plain

Error: chart requires kubeVersion: >=1.22.0-0 which is incompatible with Kubernetes v1.20.0

I had Helm version 3.16.4 installed which was the latest at that time. But no matter what I did I always got this error. So the workaround was to downgrade to Helm 3.14.3 (3.15.x had the same issues). Maybe it’s just me. But if you get this error you can try downgrading Helm.
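One possible explanation, which is an assumption and was not verified against the setup above: helm template renders the chart without asking the cluster for its version and falls back to a built-in default Kubernetes version unless --kube-version is given. If you call helm template yourself, passing the real cluster version may avoid the error without downgrading Helm. In the sketch below the function name, the release name traefik and the chart reference traefik/traefik are illustrative. The namespace and chart version are the ones mentioned in this post.

```shell
#!/usr/bin/env bash
# Kubernetes version reported to the chart; adjust to your cluster.
KUBE_VERSION="${KUBE_VERSION:-1.29.0}"

# Hypothetical wrapper: render the Traefik chart locally with an explicit
# Kubernetes version. Extra arguments (e.g. --values FILE) are passed through.
render_traefik() {
  helm template traefik traefik/traefik \
    --namespace traefik \
    --version 31.1.1 \
    --kube-version "${KUBE_VERSION}" \
    "$@"
}
```

Calling render_traefik --values values.yaml would then print the rendered manifests to stdout.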

Next I upgraded my Ansible Traefik role to the latest version. I’m using ansible-galaxy, but if you cloned the repository via git then git pull might work too. E.g.:

bash

ansible-galaxy role install --force githubixx.traefik_kubernetes

Now that everything is ready I finally upgraded Traefik (k8s.yml is the name of my Ansible playbook):

bash

ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=upgrade k8s.yml

kubectl -n traefik get pods -o wide should now indicate that the Traefik Pods are upgraded or about to be upgraded (the new Pods normally have a lower value in AGE column).

If you now have a look at the logs of one of the Traefik Pods you should see something like this (kubectl -n traefik logs traefik-xxxxx - replace traefik-xxxxx with the name of one of the Traefik Pods):

log

2024-09-29T19:59:54Z WRN v2 rules syntax is now deprecated, please use v3 instead...
2024-09-29T19:59:54Z INF Traefik version 3.1.4 built on 2024-09-19T13:47:17Z version=3.1.4
...
2024-09-29T19:59:54Z INF Starting provider aggregator aggregator.ProviderAggregator
2024-09-29T19:59:54Z INF Starting provider *traefik.Provider
2024-09-29T19:59:54Z INF Starting provider *ingress.Provider
2024-09-29T19:59:54Z INF ingress label selector is: "" providerName=kubernetes
2024-09-29T19:59:54Z INF Creating in-cluster Provider client providerName=kubernetes
2024-09-29T19:59:54Z INF Starting provider *crd.Provider
2024-09-29T19:59:54Z INF label selector is: "" providerName=kubernetescrd
2024-09-29T19:59:54Z INF Creating in-cluster Provider client providerName=kubernetescrd
2024-09-29T19:59:54Z INF Starting provider *acme.ChallengeTLSALPN

As you can see the very first line is WRN (warning): v2 rules syntax is now deprecated, please use v3 instead.... This is of course expected as I enabled the v2 path/host matcher syntax as you might remember. To get rid of this warning and to be able to switch to v3 path/host matcher syntax by default I now need to adjust my IngressRoute objects in Kubernetes.

Nevertheless, all IngressRoute and Ingress objects should still work, so all your websites should still be available. Check accordingly with your web browser, curl, or whatever tool you prefer. But of course you should monitor that anyway, and if something fails your monitoring system should tell you.

In case the upgrade failed for whatever reason it should be easy to downgrade to Traefik v2 as long as the old CRDs are still there as mentioned above. With ansible-galaxy this should download the last version of my Ansible Traefik role that supports Traefik v2:

bash

ansible-galaxy role install --force githubixx.traefik_kubernetes,6.1.0+23.2.0

Also, if you don’t use the default values file traefik_values_default.yml.j2 you need to revert the changes for Traefik v3 there too. If you made a copy of the old file it’s just a matter of copy&paste. Executing

bash

ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=upgrade k8s.yml

should then do the downgrade to Traefik v2. The same of course should also work if you use helm and a values.yaml file directly.

In my case the upgrade went pretty smoothly, so I went on to adjust the path/host matcher syntax in all IngressRoute objects as I want to get rid of the core.defaultRuleSyntax: v2 setting. Sooner or later Traefik won’t support the old syntax anymore, so it makes sense to adjust that right away.

So let’s have a look at this example:

yaml

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: example-com 
  namespace: www-example-com
spec:
  entryPoints:
    - web
  routes:
    - kind: Rule
      match: Host(`example.com`, `www.example.com`)
      services:
        - kind: Service
          name: nginx-example-com
          namespace: www-example-com
          passHostHeader: true
          port: 80

In general I can keep everything as is besides the value of match. The Configuration Details for Migrating from Traefik v2 to v3 mentions quite a few changes in Router Rule Matchers. Most matchers now only take a single value.
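To find the rules that still need attention, a plain text search over the manifests can help. The following is only a sketch: the function name is made up, it assumes the values are backtick-quoted as in the example above, and it only looks at the matcher names listed in the pattern (extend the list as needed).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) every line from stdin where
# one of the listed matchers is called with more than one backtick-quoted
# value, i.e. the v2 style that needs to be rewritten for v3.
find_multi_value_matchers() {
  grep -nE '(Host|HostSNI|Path|PathPrefix|Method)\(`[^`]*`[[:space:]]*,' || true
}
```

With cluster access, kubectl get ingressroutes.traefik.io --all-namespaces -o yaml | find_multi_value_matchers lists the candidate lines. Rules that already use one value per matcher, combined with ||, are not reported.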

In the example above there are now two possibilities to make it compatible with the v3 matcher rules. E.g. I can use logical OR like this:

yaml

match: Host(`example.com`) || Host(`www.example.com`) 

This syntax actually works with Traefik v2 and v3. So in this case it’d be good enough already and can be applied this way. Or I can use the next one which matches every subdomain of example.com (so the example above handles only two cases while the next one works with every subdomain):

yaml

match: HostRegexp(`^.+\.?example\.com$`)
syntax: v3

As you can see the second example explicitly states that Traefik should use the v3 matcher. I think it makes sense to add this property in general to every IngressRoute you’ve touched. That makes it easier to figure out which IngressRoutes already use the new syntax and which don’t.
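Before applying a HostRegexp rule it can be worth checking which host names the pattern really accepts. The sketch below tests the pattern from the example above against a few names. As an assumption, it uses grep -E (POSIX extended regular expressions) as a stand-in for Traefik’s Go regular expression engine. For a simple pattern like this one both should behave the same. The function name and the test host names are made up.

```shell
#!/usr/bin/env bash
# The pattern from the HostRegexp example above.
PATTERN='^.+\.?example\.com$'

# Return success if the given host name matches PATTERN.
# Assumption: grep -E stands in for Traefik's Go regexp engine.
host_matches() {
  printf '%s\n' "$1" | grep -Eq -- "$PATTERN"
}

for host in www.example.com blog.example.com example.com notexample.com; do
  if host_matches "$host"; then
    echo "${host}: match"
  else
    echo "${host}: no match"
  fi
done
```

With this pattern the bare example.com does not match, because .+ needs at least one character in front of it, while notexample.com does match, because the dot is optional. If only real subdomains should match, a stricter pattern such as ^.+\.example\.com$ might be the better choice, and the bare domain can be covered with an additional || Host(`example.com`).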

You can find more examples in the Traefik v3.1 Rule documentation. For comparison, see the Traefik v2.11 Routers documentation.

In my case I’ve now changed everything that was needed, so I can finally delete the old *.traefik.containo.us Custom Resource Definitions:

bash

kubectl delete crds \
  ingressroutes.traefik.containo.us \
  ingressroutetcps.traefik.containo.us \
  ingressrouteudps.traefik.containo.us \
  middlewares.traefik.containo.us \
  middlewaretcps.traefik.containo.us \
  serverstransports.traefik.containo.us \
  tlsoptions.traefik.containo.us \
  tlsstores.traefik.containo.us \
  traefikservices.traefik.containo.us

Check if any traefik.containo.us CRDs are left:

bash

kubectl get crds | grep -F traefik.containo.us

Now one final thing needs to be done. If you previously set traefik_default_path_matcher_syntax to v2 for the upgrade, set it to v3 now. If you use a values.yaml file, change:

yaml

core:
  defaultRuleSyntax: v2

to

yaml

core:
  defaultRuleSyntax: v3

In case of using my Ansible role the change can now be applied:

bash

ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=upgrade k8s.yml

After that’s done check the Traefik logs for any errors or warnings. If none of those occurred: You’re done! 😉
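For the log check, a small filter saves some scrolling. This is only a sketch with a made-up function name. It assumes the log format shown earlier in this post, where the second whitespace-separated field is the level (INF, WRN, and so on). The ERR, FTL and PNC level names are an assumption about how Traefik v3 abbreviates the more severe levels.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Traefik logs from stdin and keep only lines whose
# level field (second column) is a warning or worse.
traefik_log_problems() {
  awk '$2 == "WRN" || $2 == "ERR" || $2 == "FTL" || $2 == "PNC"'
}
```

Used as kubectl -n traefik logs traefik-xxxxx | traefik_log_problems (replace traefik-xxxxx with a real Pod name), an empty result means no warnings or errors were logged by that Pod.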