Upgrade Traefik v2 to v3
Introduction
In my blog posts Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v2 and cert-manager (part 1 / part 2) I showed how to install Traefik proxy as an Ingress controller for Kubernetes. Those blog posts were written for Traefik v2. But earlier this year a new major release of Traefik came out: v3. While Traefik v2 will still be supported with security updates for a while, it makes sense to at least start planning the upgrade and to test everything.
Major releases normally mean breaking changes and a lot of headache 😉. But for the upgrade to Traefik v3 they claim it should be easy. So let’s see…
NOTE: This guide only covers Traefik running in a Kubernetes cluster! If you use Traefik in any other environment then this guide most probably isn’t for you.
Starting point
This is what I’ve installed and what is relevant for the upgrade (your setup might differ of course and additional changes might be needed!):
- A Kubernetes cluster running K8s v1.29.x with Traefik v2.11.x installed as Ingress controller. It normally makes sense to have the latest release of the previous major version installed before starting the migration. Currently that’s Traefik v2.11.8.
- Out of all the resources that Traefik offers I’m only using ingressroutes.traefik.containo.us and middlewares.traefik.containo.us. These need to be changed before the upgrade (see below). If you use other Traefik resources like ingressroutetcps.traefik.containo.us you need to adjust them too.
- I’m using my Ansible role ansible-role-traefik-kubernetes to install or upgrade Traefik. It uses the official Helm chart behind the scenes. So some of the information below should still be relevant even if you use the Helm chart directly instead of the Ansible role.
Before the upgrade to Traefik v3 can be started a few things need to be done.
Monitoring and plan for downgrading
Before you actually start planning the Traefik v3 upgrade make sure that you have some monitoring in place that monitors your websites. Monitoring the traefik DaemonSet/Deployment also makes sense, and most probably a few other resources. But this depends on your needs.
In case the upgrade doesn’t work or you run into problems, make sure you have a plan to roll back to the old Traefik version! In general that shouldn’t be a problem as long as the old Traefik CRDs (Custom Resource Definitions) *.traefik.containo.us that were installed with Traefik v2 are still around. Deleting the old *.traefik.containo.us CRDs should really be the very last thing you do to finish the migration.
Some reading
Resources
You definitely should read the following documentation:
- Traefik 3.0 GA Has Landed: Here’s How to Migrate
- Traefik Helm Chart Change Log
- Traefik Proxy Change Log
- Configuration Details for Migrating from Traefik v2 to v3
- Migration Guide: From v2 to v3
- Traefik v3 minor migrations
Change the API Group
The Custom Resource Definition (CRD) API group traefik.containo.us was deprecated and has been removed in Traefik v3. So before you upgrade to this release, make sure your Traefik resources are changed accordingly. Please use the API group traefik.io instead. E.g.:
ingressroutes.traefik.containo.us -> ingressroutes.traefik.io
ingressroutetcps.traefik.containo.us -> ingressroutetcps.traefik.io
ingressrouteudps.traefik.containo.us -> ingressrouteudps.traefik.io
middlewares.traefik.containo.us -> middlewares.traefik.io
middlewaretcps.traefik.containo.us -> middlewaretcps.traefik.io
serverstransports.traefik.containo.us -> serverstransports.traefik.io
tlsoptions.traefik.containo.us -> tlsoptions.traefik.io
tlsstores.traefik.containo.us -> tlsstores.traefik.io
traefikservices.traefik.containo.us -> traefikservices.traefik.io
While still running Traefik v2 I adjusted a few IngressRoutes that were still using the traefik.containo.us API group. E.g.:
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: example-com
  namespace: www-example-com
spec:
  ...
needs to be changed to the following and then applied:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: example-com
  namespace: www-example-com
spec:
  ...
An example for Middleware. This
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: redirect-to-https
  namespace: www-example-com
spec:
  redirectScheme:
    scheme: https
    permanent: true
needs to be changed to
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: redirect-to-https
  namespace: www-example-com
spec:
  redirectScheme:
    scheme: https
    permanent: true
NOTE: Even if all API groups are adjusted accordingly, do not delete the now obsolete traefik.containo.us CRDs yet! Do that only after Traefik 3.x is installed and you have verified that the IngressRoutes and the other Traefik resources still work.
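If your Traefik resources live as YAML files in a repository, the API group change can be done in bulk. The following is only a minimal sketch under that assumption: the function name migrate_api_group and the directory layout are hypothetical, and it just rewrites the apiVersion line in local files. It does not apply anything to the cluster, so review the diff and apply the manifests yourself afterwards.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the deprecated Traefik API group in local manifest files.
# Assumption: manifests are plain *.yaml / *.yml files below the given directory.
migrate_api_group() {
  local dir="$1"
  # Find files that still reference the old group and rewrite them in place.
  # sed keeps a copy of every changed file with a .bak suffix.
  grep -rl --include='*.yaml' --include='*.yml' \
    'apiVersion: traefik.containo.us/' "$dir" | while IFS= read -r file; do
    sed -i.bak 's#apiVersion: traefik\.containo\.us/#apiVersion: traefik.io/#' "$file"
    echo "updated: $file"
  done
}
```

Calling it as migrate_api_group ./manifests (a hypothetical path) prints one line per changed file. The .bak copies make it easy to compare or revert before anything is applied with kubectl.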
Configuration changes: Adjust Helm values
As a reminder: The dynamic configuration can be updated while Traefik is running and contains all the routing information. E.g. IngressRoute is something that can be changed dynamically. The Helm values more or less belong to the static configuration, e.g. the parameters that are provided to the Traefik binary during startup. You can get the current static configuration parameters by running kubectl -n traefik get daemonsets.apps traefik -o yaml | yq '.spec.template.spec.containers[0].args' (if you have the traefik DaemonSet in the traefik namespace). This uses the yq utility to parse the YAML output and display just the requested information.
No matter if you use my Traefik Ansible role or the Traefik Helm chart directly there are some changes needed in the Helm values.
In case of my role the Ansible variable traefik_default_path_matcher_syntax: v2 needs to be set! This is really important, for example if you use IngressRoute. Otherwise the path/host matching might not work anymore as some matchers have been removed and the syntax of others was changed (see Dynamic Configuration Changes). With the traefik_default_path_matcher_syntax: v2 variable set, OR with
core:
  defaultRuleSyntax: v2
set in the Helm values.yaml file directly (or in traefik_values_default.yml.j2 if you use a custom version of this file), Traefik v3 is still able to work with the old syntax. This is actually already one of the most important settings for the upgrade if you use IngressRoute. After the upgrade every IngressRoute can be migrated to the new v3 syntax, or you can go straight to Gateway API for more features and a more future-proof setup. Also check the rest of the Dynamic Configuration Changes documentation for other changes that might affect you.
A few other changes happened in traefik_values_default.yml.j2 which you might need to adjust if you don’t use the defaults (or if you use Helm directly and the values.yaml file). Some of the changes are needed because the previous version of my Ansible role was using Traefik Helm chart version 23.2.0 and was upgraded to version 31.1.1.
It makes sense to make a backup of the values file in case you need to downgrade again (or if you’ve the changes in Git you can also easily revert).
The Traefik version to use (as of writing this blog post it was 3.1.4) was changed of course:
image:
  tag: "3.1.4"
ports.traefik.expose: true was changed to ports.traefik.expose.default: true. Same for ports.web.expose and ports.websecure.expose. E.g.:
ports:
  traefik:
    port: 9000
    protocol: TCP
    expose:
      default: false
  web:
    ...
    expose:
      default: false
  websecure:
    ...
    expose:
      default: false
The gateway key was added to traefik_values_default.yml.j2 for better visibility. But it’s set to
gateway:
  enabled: false
because some further adjustments might be needed, e.g. for TLS support.
The default for updateStrategy changed from:
updateStrategy:
  type: RollingUpdate
to
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 0
If maxUnavailable is set to 1, maxSurge needs to be 0.
Also added:
providers:
  kubernetesCRD:
    # Load Kubernetes "IngressRoute" provider
    enabled: true
  kubernetesIngress:
    # Load Kubernetes "Ingress" provider
    enabled: true
  kubernetesGateway:
    # Enable Traefik "Gateway" provider for Gateway API
    enabled: false
# Create a default IngressClass for Traefik
ingressClass:
  enabled: true
  isDefaultClass: true
These values are the defaults anyway. They were only added for better visibility.
Helm version
One note regarding Helm: When I first tried to install the new Traefik version I got this error when my Ansible role tried to render the YAML manifests (or if I executed helm template ...):
Error: chart requires kubeVersion: >=1.22.0-0 which is incompatible with Kubernetes v1.20.0
I had Helm version 3.16.4 installed which was the latest at that time. But no matter what I did I always got this error. So the workaround was to downgrade to Helm 3.14.3 (3.15.x had the same issues). Maybe it’s just me. But if you get this error you can try downgrading Helm.
Upgrade Ansible Traefik role
Next I upgraded my Ansible Traefik role to the latest version. I’m using ansible-galaxy, but if you cloned the repository via the git command then git pull might work too. E.g.:
ansible-galaxy role install --force githubixx.traefik_kubernetes
Now that everything is ready I finally upgraded Traefik (k8s.yml is the name of my Ansible playbook):
ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=upgrade k8s.yml
kubectl -n traefik get pods -o wide should now indicate that the Traefik Pods are upgraded or about to be upgraded (the new Pods normally have a lower value in the AGE column).
If you now have a look at the logs of one of the Traefik Pods you should see something like this (kubectl -n traefik logs traefik-xxxxx - replace traefik-xxxxx with the name of one of the Traefik Pods):
2024-09-29T19:59:54Z WRN v2 rules syntax is now deprecated, please use v3 instead...
2024-09-29T19:59:54Z INF Traefik version 3.1.4 built on 2024-09-19T13:47:17Z version=3.1.4
...
2024-09-29T19:59:54Z INF Starting provider aggregator aggregator.ProviderAggregator
2024-09-29T19:59:54Z INF Starting provider *traefik.Provider
2024-09-29T19:59:54Z INF Starting provider *ingress.Provider
2024-09-29T19:59:54Z INF ingress label selector is: "" providerName=kubernetes
2024-09-29T19:59:54Z INF Creating in-cluster Provider client providerName=kubernetes
2024-09-29T19:59:54Z INF Starting provider *crd.Provider
2024-09-29T19:59:54Z INF label selector is: "" providerName=kubernetescrd
2024-09-29T19:59:54Z INF Creating in-cluster Provider client providerName=kubernetescrd
2024-09-29T19:59:54Z INF Starting provider *acme.ChallengeTLSALPN
As you can see the very first line is WRN (warning): v2 rules syntax is now deprecated, please use v3 instead... This is of course expected as I enabled the v2 path/host matcher syntax, as you might remember. To get rid of this warning and to be able to switch to the v3 path/host matcher syntax by default I now need to adjust my IngressRoute objects in Kubernetes.
Nevertheless, all IngressRoute and Ingress objects should still work. So all your websites should still be available. Check accordingly with your web browser, curl, or whatever tool you prefer. But of course you should monitor that anyway, and if something fails your monitoring system should tell you.
Upgrade failed
In case the upgrade failed for whatever reason it should be easy to downgrade to Traefik v2 as long as the old CRDs are still there, as mentioned above. With ansible-galaxy this should download the last version of my Ansible Traefik role that supports Traefik v2:
ansible-galaxy role install --force githubixx.traefik_kubernetes,6.1.0+23.2.0
Also, if you don’t use the default values file traefik_values_default.yml.j2 you need to revert the changes for Traefik v3 there too. If you made a copy of the old file it’s just a matter of copy and paste. Executing
ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=upgrade k8s.yml
should then do the downgrade to Traefik v2. The same should of course also work if you use helm and a values.yaml file directly.
Adjust IngressRoute objects with old path/host matcher syntax
In my case the upgrade went pretty smoothly, so I now adjusted the path/host matcher syntax in all IngressRoute objects as I want to get rid of the core.defaultRuleSyntax: v2 setting. Sooner or later Traefik won’t support the old syntax anymore, so it makes sense to adjust that right away.
So let’s have a look at this example:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: example-com
  namespace: www-example-com
spec:
  entryPoints:
    - web
  routes:
    - kind: Rule
      match: Host(`example.com`, `www.example.com`)
      services:
        - kind: Service
          name: nginx-example-com
          namespace: www-example-com
          passHostHeader: true
          port: 80
In general I can keep everything as is besides the value of match. The Configuration Details for Migrating from Traefik v2 to v3 mentions quite a few changes in Router Rule Matchers. Most matchers now only take a single value.
In the example above there are now two possibilities to make it compatible with the v3 matcher rules. E.g. I can use logical OR like this:
match: Host(`example.com`) || Host(`www.example.com`)
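This syntax actually works with both Traefik v2 and v3. So in this case it would already be good enough and can be applied this way. Or I can use the next one, which is meant to match every subdomain of example.com (so the example above handles only two cases while the next one works with every subdomain). Note that, as written, the pattern needs at least one character in front of example.com, so it does not match the bare domain, and because the dot is optional it also matches a host like myexample.com: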
match: HostRegexp(`^.+\.?example\.com$`)
syntax: v3
As you can see the second example explicitly states that Traefik should use the v3 matcher. I think it makes sense to add this property in general to every IngressRoute you’ve touched, to make it easier to figure out which IngressRoutes are already using the new syntax and which aren’t.
You can find more examples in the Traefik v3.1 Rule documentation. For comparison, see the Traefik v2.11 Routers documentation.
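To find the rules that still need attention, something like the following might help. It is only a rough sketch: it assumes the IngressRoute manifests are available as local YAML files, the function name find_multi_value_matchers is made up for this example, and it only looks for Host, Path and PathPrefix matchers that are called with more than one value. Other v2-only constructs (e.g. the old HostRegexp placeholder syntax) are not detected.

```shell
#!/usr/bin/env bash
# Sketch: list matcher rules that still pass several values to one matcher,
# which the v3 rule syntax no longer accepts for these matchers.
# Assumption: manifests are *.yaml / *.yml files below the given directory.
find_multi_value_matchers() {
  local dir="$1"
  # A backtick-quoted first value directly followed by a comma is the
  # tell-tale sign, e.g. Host(`example.com`, `www.example.com`).
  grep -rnE --include='*.yaml' --include='*.yml' \
    '(Host|PathPrefix|Path)\(`[^`]*`,' "$dir" || true
}
```

The output is the usual grep format (file, line number, matching line), so every hit points straight at a match: line that needs to be split into separate matchers joined with ||.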
Delete traefik.containo.us CRDs
In my case I’ve now changed everything that was needed, so I can finally delete the old *.traefik.containo.us Custom Resource Definitions:
kubectl delete crds \
ingressroutes.traefik.containo.us \
ingressroutetcps.traefik.containo.us \
ingressrouteudps.traefik.containo.us \
middlewares.traefik.containo.us \
middlewaretcps.traefik.containo.us \
serverstransports.traefik.containo.us \
tlsoptions.traefik.containo.us \
tlsstores.traefik.containo.us \
traefikservices.traefik.containo.us
Check if any traefik.containo.us CRDs are left:
kubectl get crds | grep traefik.containo.us
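If you want this check to be usable in a script, a small wrapper with a proper exit status can help. This is just a sketch: the function name check_old_crds_gone is made up, and it expects the CRD names on stdin, e.g. from kubectl get crds -o name.

```shell
#!/usr/bin/env bash
# Sketch: read CRD names from stdin, print the ones that still belong to the
# old API group and return a non-zero exit status if any are left.
check_old_crds_gone() {
  local leftovers
  leftovers="$(grep -F '.traefik.containo.us' || true)"
  if [ -n "$leftovers" ]; then
    echo "$leftovers"
    return 1
  fi
  return 0
}

# Usage (needs access to the cluster):
#   kubectl get crds -o name | check_old_crds_gone && echo "old CRDs are gone"
```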
Switch to v3 matcher syntax
Now one final thing needs to be done. If you’ve previously set traefik_default_path_matcher_syntax to v2 for the upgrade, set it to v3 now. If you use a values.yaml file change:
core:
  defaultRuleSyntax: v2
to
core:
  defaultRuleSyntax: v3
In case of using my Ansible role the change can now be applied:
ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=upgrade k8s.yml
After that’s done check the Traefik logs for any errors or warnings. If none of those occurred: You’re done! 😉
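To make that log check a bit less tedious you can filter the output for the relevant levels. A minimal sketch, assuming the log format shown earlier (timestamp, then a three-letter level): WRN is taken from the log output above, while ERR is my assumption for the error level. The function name filter_warnings_and_errors is made up.

```shell
#!/usr/bin/env bash
# Sketch: keep only warning and error lines from Traefik log output on stdin.
# Assumption: lines look like "<timestamp> <LEVEL> <message>" as shown above.
filter_warnings_and_errors() {
  grep -E '^[^ ]+ (WRN|ERR) ' || true
}

# Usage (replace traefik-xxxxx with the name of one of the Traefik Pods):
#   kubectl -n traefik logs traefik-xxxxx | filter_warnings_and_errors
```

If this prints nothing, the Pod logged no warnings or errors in the format assumed here.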