Kubernetes upgrade notes: 1.22.x to 1.23.x
Introduction
If you used my Kubernetes the Not So Hard Way With Ansible blog posts to set up a Kubernetes (K8s) cluster, these notes might be helpful for you (and maybe for others who manage a K8s cluster on their own). I'll only mention changes that are either interesting for most K8s administrators anyway (even if they run a fully managed Kubernetes deployment) or relevant if you manage your own bare-metal/VM based on-prem Kubernetes deployment. I normally skip changes that are only relevant for GKE, AWS EKS, Azure or other cloud providers.
I have a general upgrade guide Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes that has worked quite well for me for the past K8s upgrades. So please read that guide if you want to know HOW the components are updated. This post is specifically about the 1.22.x to 1.23.x upgrade and WHAT was interesting for me.
First: As usual I don't update a production system before the .2 release of a new major version is available. In my experience the .0 and .1 releases are just too buggy (and to be honest, sometimes it's even better to wait for the .5 release ;-) ). Of course it is important to test new releases already in development or integration systems and report bugs!
Second: I only upgrade from the latest version of the former major release. In my case I was running 1.22.5, and at the time of writing this text 1.22.6 was the latest 1.22.x release. After reading the 1.22.x CHANGELOG to see if any important changes were made between 1.22.5 and 1.22.6, I didn't see anything that prevented me from updating and I didn't need to change anything.
BUT this time the 1.23.x CHANGELOG mentions a change which can already be deployed together with the upgrade to the latest 1.22.x version: for kube-scheduler the support for configuration file version v1beta1 was removed. Update configuration files to v1beta2 BEFORE upgrading to 1.23. So in kube-scheduler.yaml the line apiVersion: kubescheduler.config.k8s.io/v1beta1 needs to be changed to apiVersion: kubescheduler.config.k8s.io/v1beta2.
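For reference, a minimal kube-scheduler.yaml with the new API version could look like this sketch (the kubeconfig path is just a placeholder, keep whatever path your setup already uses):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  # Placeholder path - point this to your existing kube-scheduler kubeconfig
  kubeconfig: /var/lib/kube-scheduler/kube-scheduler.kubeconfig
leaderElection:
  leaderElect: true
```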
So I did the 1.22.5 to 1.22.6 upgrade first. If you use my Ansible roles that basically only means changing the k8s_release variable from 1.22.5 to 1.22.6 and deploying the changes for the control plane and worker nodes as described in my upgrade guide.
After that everything still worked as expected so I continued with the next step.
Here are a few links that might be interesting regarding new features in Kubernetes 1.23:
Kubernetes 1.23 CHANGELOG
Kubernetes 1.23: The next Frontier
Kubernetes 1.23 – What’s new? - SysDig blog
Since K8s 1.14 there are also searchable release notes available. You can specify the K8s version and a K8s area/component (e.g. kubelet, apiserver, ...) and immediately get an overview of what changed in that regard. Quite nice! :-)
As it is normally no problem to have a newer kubectl utility that is only one major version ahead of the server version, I also updated kubectl from 1.22.x to 1.23.x using my kubectl Ansible role.
Urgent Upgrade Notes
As always before a major upgrade read the Urgent Upgrade Notes! If you used my Ansible roles to install Kubernetes and used most of the default settings then there should be no need to adjust any settings. For the K8s 1.23 release I actually couldn't find any urgent notes that were relevant for my Ansible roles or my own on-prem setup. But nevertheless there are three notes that might be relevant for some people:
- If you log messages in JSON format: previously those messages were logged to stdout, now they are logged to stderr to not mess with the normal output. This may become relevant for command line tools like kubectl.
- Support for the seccomp annotations seccomp.security.alpha.kubernetes.io/pod and container.seccomp.security.alpha.kubernetes.io/[name] has been deprecated since 1.19 and will be dropped in 1.25.
- kube-log-runner is included in the release tarballs. It can be used to replace the deprecated --log-file parameter.
What’s New (Major Themes)
In What's New (Major Themes) I've found the following highlights that look most important to me:
- IPv4/IPv6 Dual-stack Networking graduates to GA: The use of dual-stack networking is not mandatory. Although clusters are enabled to support dual-stack networking, Pods and Services continue to default to single-stack. To use dual-stack networking: Kubernetes nodes need routable IPv4/IPv6 network interfaces, a dual-stack capable CNI network plugin is used, Pods are configured to be dual-stack and Services have their .spec.ipFamilyPolicy field set to either PreferDualStack or RequireDualStack (see the example Service after this list).
- HorizontalPodAutoscaler v2 graduates to GA
- Generic Ephemeral Volume feature graduates to GA
- Ephemeral Containers graduated to Beta (the feature flag EphemeralContainers is set to true by default now). This makes debugging pods quite easy, e.g. kubectl debug mypod -it --image=busybox. This creates an interactive debugging session in pod mypod and immediately attaches to it.
- Skip Volume Ownership change graduates to GA: The feature to configure volume permission and ownership change policy for Pods moved to GA in 1.23. This allows users to skip recursive permission changes on mount and speeds up the pod start up time.
- PodSecurity graduates to Beta: PodSecurity replaces the deprecated PodSecurityPolicy.
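To illustrate the dual-stack Service mentioned in the first bullet, here is a minimal sketch (the Service name, selector and port are made up for this example):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-dualstack-service   # example name
spec:
  # Ask for both an IPv4 and an IPv6 ClusterIP if the cluster supports it
  ipFamilyPolicy: PreferDualStack
  selector:
    app: my-app                # example selector
  ports:
    - port: 80
      protocol: TCP
```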
Deprecation
A few interesting things I’ve found in Deprecation:
- kube-scheduler: The flags --port and --address have no effect and will be removed in 1.24.
- kube-scheduler: --authorization-kubeconfig and --authentication-kubeconfig MUST be specified and correctly set to get authentication/authorization working. Normally that means setting the value of these parameters to the same value you supply to the kubeconfig parameter in kube-scheduler.yaml. This also requires --requestheader-client-ca-file to be set.
- Liveness/readiness probes to kube-scheduler MUST use HTTPS now, and the default port has been changed to 10259 (see the probe sketch after this list).
- Removed kubectl --dry-run empty default value and boolean values. kubectl --dry-run usage must be specified with --dry-run=(server|client|none).
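With my Ansible roles kube-scheduler runs as a systemd service, so there are no probes to adjust. But if you run kube-scheduler as a static Pod (e.g. a kubeadm-style setup), the liveness probe in the manifest would need to look roughly like this sketch:

```yaml
livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 10259    # the new default secure port
    scheme: HTTPS
  initialDelaySeconds: 10
  timeoutSeconds: 15
```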
API changes
A few interesting API Changes also took place of course:
- Ephemeral containers graduated to beta and are now available by default. So something like kubectl debug mypod -it --image=busybox now works out of the box (in this case it creates an interactive debugging session in pod mypod and immediately attaches to it). Already mentioned above but still: Very handy :-)
- In kubelet, log verbosity and flush frequency can also be configured via the configuration file and not just via command line flags (see the sketch after this list).
- kube-apiserver: The rbac.authorization.k8s.io/v1alpha1 API version is removed; use the rbac.authorization.k8s.io/v1 API, available since v1.8. The scheduling.k8s.io/v1alpha1 API version is removed; use the scheduling.k8s.io/v1 API, available since v1.14.
- kube-scheduler: support for configuration file version v1beta1 is removed. Update configuration files to v1beta2 BEFORE upgrading to 1.23. I already mentioned this in the introduction.
- Removed deprecated --seccomp-profile-root / seccompProfileRoot config.
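A short sketch of what the new kubelet logging option could look like in the kubelet configuration file (the verbosity value is just an example):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
logging:
  # Log verbosity can now be set here instead of (only) via the -v command line flag
  verbosity: 2
```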
Features
And finally the interesting Features:
- Added ability for kubectl wait to wait on arbitrary JSON paths.
- Adds the new [alpha] command kubectl events.
- kubectl will now provide shell completion choices for the --output/-o flag.
- The PodSecurity admission plugin has graduated to beta and is enabled by default. The admission configuration version has been promoted to pod-security.admission.config.k8s.io/v1beta1. See Pod Security Admission for usage guidelines (and the namespace example after this list).
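The most common way to use PodSecurity is via namespace labels. A minimal sketch (the namespace name and the chosen levels are just examples):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app   # example namespace
  labels:
    # Reject Pods that violate the "baseline" policy...
    pod-security.kubernetes.io/enforce: baseline
    # ...and additionally warn if a Pod doesn't fulfill the "restricted" policy
    pod-security.kubernetes.io/warn: restricted
```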
Docker/dockershim deprecation - last call! ;-)
Before I upgraded to Kubernetes 1.22 from 1.21 last time, I switched the container runtime from Docker/dockershim to containerd. I already mentioned this last time, but if you still run Docker/dockershim you should really migrate NOW ;-) Kubernetes 1.24 won't support it anymore! So I repeat it one last time: As you most probably know, Docker/dockershim is deprecated since K8s 1.20 and will be removed in 1.24. So I took the opportunity while upgrading to the latest 1.21 release (before upgrading to the latest 1.22 release) to remove Docker/dockershim and replace it with containerd and runc. The whole process is documented in its own blog post: Kubernetes: Replace dockershim with containerd and runc. So I recommend getting rid of Docker/dockershim with the upgrade to K8s 1.23 at the latest! But the earlier you do it the less pressure you have at the end ;-) And TBH with containerd in place it now looks like Pods are starting a little bit faster than before ;-)
CSI
If you use CSI then also check the CSI Sidecar Containers documentation. Every sidecar container has a matrix of which version you need at a minimum and maximum and which version is recommended for a given K8s version.
Nevertheless, if your K8s update to v1.23 worked fine I would recommend to also update the CSI sidecar containers sooner or later, because a) lots of changes happen ATM in this area and b) you might need the newer versions for the next K8s version anyway.
Update Kubernetes
In my case I needed to add three new settings for kube-scheduler (those are already added to my Ansible kubernetes-controller role): authentication-kubeconfig, authorization-kubeconfig and requestheader-client-ca-file needed to be added to k8s_scheduler_settings (see K8s Deprecations 1.23 above). The value for the first two is basically the same as for kubeconfig (which is the kube-scheduler.kubeconfig file). For requestheader-client-ca-file the value needs to be set to the same value as the already present root-ca-file setting. It points to the file of the certificate authority which kube-apiserver uses.
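A sketch of how the extended k8s_scheduler_settings could look - the file paths below are only placeholders, use the values your existing kubeconfig and root-ca-file settings already point to:

```yaml
k8s_scheduler_settings:
  "kubeconfig": "/var/lib/kube-scheduler/kube-scheduler.kubeconfig"
  # New: same value as "kubeconfig"
  "authentication-kubeconfig": "/var/lib/kube-scheduler/kube-scheduler.kubeconfig"
  "authorization-kubeconfig": "/var/lib/kube-scheduler/kube-scheduler.kubeconfig"
  # Same CA file that kube-apiserver uses (the value of the existing root-ca-file setting)
  "requestheader-client-ca-file": "/etc/kubernetes/ssl/ca-k8s-apiserver.pem"
```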
With the latest update from Kubernetes 1.22.5 to 1.22.6 I already changed kubescheduler.config.k8s.io/v1beta1 to kubescheduler.config.k8s.io/v1beta2 in kube-scheduler.yaml. If you haven't done so yet, now it needs to be done.
Enable default seccomp profile
This is optional: Introduced in Kubernetes 1.22 as an Alpha feature was Enable the use of RuntimeDefault as the default seccomp profile for all workloads. Normally I don't enable Alpha features, but as this feature increases workload security quite a bit I thought I'd give it a try.
If enabled, the kubelet will use the RuntimeDefault seccomp profile by default, which is defined by the container runtime, instead of using the Unconfined (seccomp disabled) mode. The default profiles aim to provide a strong set of security defaults while preserving the functionality of the workload. It is possible that the default profiles differ between container runtimes and their release versions, for example when comparing those from CRI-O and containerd.
So when you use Docker on your laptop e.g., you already use a default seccomp profile that comes with Docker. But if you use Docker, CRI-O or containerd with Kubernetes as container runtime, those default seccomp profiles are disabled. That means a process in a container can basically execute every syscall it wants to. But is there really a need for a container process to call the reboot syscall to reboot the host? Most probably not ;-) So besides the reboot syscall there are quite a few more which the default seccomp profiles of the various container runtimes don't allow to be called. Obviously that's a good thing from a security point of view ;-) It causes at least a few attacks to fail and makes it less likely that an attacker can escape a container that he/she might have hacked.
Before you enable this feature, of course test it in a development environment or on one of your K8s nodes with your current workload. There might be a process that could fail if you enable that feature. But as most container runtimes use quite a similar set of allowed syscalls as Docker does, chances are pretty high that if your container image works with a local Docker container on your laptop it will also run with containerd or CRI-O on a Kubernetes node. In the end Docker also uses containerd in the background.
That said, I've added the --feature-gates=SeccompDefault=true and --seccomp-default flags to the kubelet process which runs on every worker node and which is responsible for starting a container image on that node if the scheduler decided that the workload should be started there. This is not included in my Ansible Kubernetes Worker role by default as it is an Alpha feature. So in that case the variable k8s_worker_kubelet_settings needs to be extended, e.g.:
k8s_worker_kubelet_settings:
...
"feature-gates": "SeccompDefault=true"
"seccomp-default": ""
If you have upgraded one node and the kubelet there has been started with the above feature gate enabled, you can test which syscalls are blocked now. So let's assume there is a Kubernetes node called worker01 (kubectl get nodes gives you a list of all nodes) and that is the node with the feature gate enabled kubelet. We can now use the amicontained container to find out what container runtime is being used, what features are available and what syscalls are blocked. E.g.:
kubectl -n kube-system run -it --rm test --image=jess/amicontained --overrides='{"spec": { "nodeSelector": {"kubernetes.io/hostname": "worker01"}}}' -- sh
So the container image jess/amicontained will be started as container test in the kube-system namespace (because it's normally available in every K8s cluster, but you can use whatever namespace you want of course). And since we want to test the seccomp enabled node, we specify an override to force the launch of the container on worker01.
Running the amicontained command in the container should produce an output like this on an Ubuntu 20.04 host:
Container Runtime: not-found
Has Namespaces:
pid: true
user: false
AppArmor Profile: cri-containerd.apparmor.d (enforce)
Capabilities:
BOUNDING -> chown dac_override fowner fsetid kill setgid setuid setpcap net_bind_service net_raw sys_chroot mknod audit_write setfcap
Seccomp: filtering
Blocked Syscalls (61):
PTRACE SYSLOG SETPGID SETSID USELIB USTAT SYSFS VHANGUP PIVOT_ROOT _SYSCTL ACCT SETTIMEOFDAY MOUNT UMOUNT2 SWAPON SWAPOFF REBOOT SETHOSTNAME SETDOMAINNAME IOPL IOPERM CREATE_MODULE INIT_MODULE DELETE_MODULE GET_KERNEL_SYMS QUERY_MODULE QUOTACTL NFSSERVCTL GETPMSG PUTPMSG AFS_SYSCALL TUXCALL SECURITY LOOKUP_DCOOKIE CLOCK_SETTIME VSERVER MBIND SET_MEMPOLICY GET_MEMPOLICY KEXEC_LOAD ADD_KEY REQUEST_KEY KEYCTL MIGRATE_PAGES UNSHARE MOVE_PAGES PERF_EVENT_OPEN FANOTIFY_INIT NAME_TO_HANDLE_AT OPEN_BY_HANDLE_AT SETNS PROCESS_VM_READV PROCESS_VM_WRITEV KCMP FINIT_MODULE KEXEC_FILE_LOAD BPF USERFAULTFD PKEY_MPROTECT PKEY_ALLOC PKEY_FREE
Looking for Docker.sock
So let's compare this with worker02 which DOESN'T have the feature flag enabled:
Container Runtime: not-found
Has Namespaces:
pid: true
user: false
AppArmor Profile: cri-containerd.apparmor.d (enforce)
Capabilities:
BOUNDING -> chown dac_override fowner fsetid kill setgid setuid setpcap net_bind_service net_raw sys_chroot mknod audit_write setfcap
Seccomp: disabled
Blocked Syscalls (22):
MSGRCV SYSLOG SETPGID SETSID VHANGUP PIVOT_ROOT ACCT SETTIMEOFDAY SWAPON SWAPOFF REBOOT SETHOSTNAME SETDOMAINNAME INIT_MODULE DELETE_MODULE KEXEC_LOAD FANOTIFY_INIT OPEN_BY_HANDLE_AT FINIT_MODULE KEXEC_FILE_LOAD BPF USERFAULTFD
Looking for Docker.sock
So we’ve 61 vs 22 blocked syscalls now. That’s pretty nice 😄
From what I can tell so far, Tomcat, PostgreSQL, MySQL, Mattermost, Postfix, cert-manager, CoreDNS, Redis, Traefik, Apache and a few other Go and Python programs are working without issues with the seccomp default profile enabled.
But enough about the seccomp default profile... Now is the time to finally update the K8s controller and worker nodes to version 1.23.x as described in Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes.
That’s it for today! Happy upgrading! 😉