Kubernetes upgrade notes: 1.22.x to 1.23.x

If you used my Kubernetes the Not So Hard Way With Ansible blog posts to set up a Kubernetes (K8s) cluster, these notes might be helpful for you (and maybe for others who manage a K8s cluster on their own). I’ll only mention changes that might be relevant, either because they are interesting for most K8s administrators anyway (even if they run a fully managed Kubernetes deployment) or because they matter if you manage your own bare-metal/VM based on-prem Kubernetes deployment. I normally skip changes that are only relevant for GKE, AWS EKS, Azure or other cloud providers.

I have a general upgrade guide Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes that has worked quite well for me for the past few K8s upgrades. So please read that guide if you want to know HOW the components are updated. This post is specifically about the 1.22.x to 1.23.x upgrade and WHAT was interesting for me.

First: As usual I don’t update a production system before the .2 release of a new major version is out. In my experience the .0 and .1 releases are just too buggy (and to be honest, sometimes it’s even better to wait for the .5 release ;-) ). Of course it is still important to test new releases early in development or integration systems and report bugs!

Second: I only upgrade from the latest version of the former major release. In my case I was running 1.22.5, and at the time of writing this text 1.22.6 was the latest 1.22.x release. After reading the 1.22.x CHANGELOG to see if any important changes were made between 1.22.5 and 1.22.6, I didn’t see anything that prevented me from updating and I didn’t need to change anything.
BUT this time the 1.23.x CHANGELOG mentions a change which can be deployed together with the upgrade to the latest 1.22.x version: for kube-scheduler the support for configuration file version v1beta1 was removed. Update configuration files to v1beta2 BEFORE upgrading to 1.23. So in kube-scheduler.yaml, apiVersion: kubescheduler.config.k8s.io/v1beta1 needs to be changed to apiVersion: kubescheduler.config.k8s.io/v1beta2.
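
For reference, the relevant part of a kube-scheduler.yaml could then look something like this (the clientConnection block and the kubeconfig path are just an example and depend on your setup):

apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/var/lib/kube-scheduler/kube-scheduler.kubeconfig"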

So I did the 1.22.5 to 1.22.6 upgrade first. If you use my Ansible roles that basically only means changing the k8s_release variable from 1.22.5 to 1.22.6 and deploying the changes for the control plane and worker nodes as described in my upgrade guide.
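
E.g. (where exactly you set the variable depends on how you organize your Ansible group_vars/host_vars):

k8s_release: "1.22.6"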

After that everything still worked as expected so I continued with the next step.

Here are a few links that might be interesting regarding the new features in Kubernetes 1.23:

Kubernetes 1.23 CHANGELOG
Kubernetes 1.23: The next Frontier
Kubernetes 1.23 – What’s new? - SysDig blog

Since K8s 1.14 there are also searchable release notes available. You can specify the K8s version and a K8s area/component (e.g. kubelet, apiserver, …) and immediately get an overview of what changed there. Quite nice! :-)

As it is normally no problem to have a kubectl utility that is one version ahead of the server version, I also updated kubectl from 1.22.x to 1.23.x using my kubectl Ansible role.
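
A quick way to verify the version skew is to compare client and server version; at this point the client should already report 1.23.x while the API server still reports 1.22.x:

kubectl version --short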

As always before a major upgrade: read the Urgent Upgrade Notes! If you used my Ansible roles to install Kubernetes and kept most of the default settings, there should be no need to adjust anything. For the K8s 1.23 release I actually couldn’t find any urgent notes that were relevant for my Ansible roles or my own on-prem setup. But nevertheless there are three notes that might be relevant for some people:

  • If you log messages in JSON format: previously those messages were logged to stdout, now they go to stderr so they don’t mess with the normal output. This may become relevant for command line tools like kubectl.
  • Support for the seccomp annotations seccomp.security.alpha.kubernetes.io/pod and container.seccomp.security.alpha.kubernetes.io/[name] has been deprecated since 1.19 and will be dropped in 1.25.
  • kube-log-runner is included in the release tarballs. It can be used to replace the deprecated --log-file parameter.

In What’s New (Major Themes) I’ve found the following highlights that look most important to me:

  • IPv4/IPv6 Dual-stack Networking graduates to GA: The use of dual-stack networking is not mandatory. Although clusters are enabled to support dual-stack networking, Pods and Services continue to default to single-stack. To use dual-stack networking the Kubernetes nodes need routable IPv4/IPv6 network interfaces, a dual-stack capable CNI network plugin has to be used, Pods have to be configured as dual-stack and Services need their .spec.ipFamilyPolicy field set to either PreferDualStack or RequireDualStack (see the example Service manifest after this list).
  • HorizontalPodAutoscaler v2 graduates to GA
  • Generic Ephemeral Volume feature graduates to GA
  • Ephemeral Containers graduated to Beta (the feature flag EphemeralContainers is set to true by default now). This makes debugging pods quite easy, e.g. kubectl debug mypod -it --image=busybox creates an interactive debugging session in pod mypod and immediately attaches to it.
  • Skip Volume Ownership change graduates to GA: The feature to configure the volume permission and ownership change policy for Pods moved to GA in 1.23. This allows users to skip recursive permission changes on mount and speeds up pod startup time.
  • PodSecurity graduates to Beta: PodSecurity replaces the deprecated PodSecurityPolicy.
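
To give an idea what the dual-stack part looks like in practice, here is a minimal Service manifest with .spec.ipFamilyPolicy set (name, selector and port are just placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ipFamilyPolicy: PreferDualStack
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80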

A few interesting things I’ve found in the Deprecation section:

  • kube-scheduler: The flags --port and --address have no effect and will be removed in 1.24
  • kube-scheduler: --authorization-kubeconfig and --authentication-kubeconfig MUST be specified and correctly set to get authentication/authorization working. Normally that means setting these parameters to the same value you supply to the kubeconfig parameter in kube-scheduler.yaml. This also requires --requestheader-client-ca-file to be set.
  • liveness/readiness probes to kube-scheduler MUST use HTTPS now, and the default port has been changed to 10259
  • Removed the kubectl --dry-run empty default value and boolean values. kubectl --dry-run usage must now be specified as --dry-run=(server|client|none) (see the example after this list).
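
E.g. if you previously rendered manifests with a bare --dry-run, the command now has to spell out the mode explicitly:

kubectl create deployment nginx --image=nginx --dry-run=client -o yaml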

A few interesting API Changes also took place of course:

  • Ephemeral containers graduated to beta and are now available by default. So something like kubectl debug mypod -it --image=busybox now works out of the box (in this case it creates an interactive debugging session in pod mypod and immediately attaches to it). Already mentioned above but still: very handy :-)
  • In kubelet, log verbosity and flush frequency can also be configured via the configuration file and not just via command line flags.
  • kube-apiserver: The rbac.authorization.k8s.io/v1alpha1 API version is removed; use the rbac.authorization.k8s.io/v1 API, available since v1.8. The scheduling.k8s.io/v1alpha1 API version is removed; use the scheduling.k8s.io/v1 API, available since v1.14. (A quick way to check your manifests for these removed versions follows after this list.)
  • kube-scheduler: support for configuration file version v1beta1 is removed. Update configuration files to v1beta2 BEFORE upgrading to 1.23. I already mentioned this in the introduction.
  • Removed deprecated --seccomp-profile-root/seccompProfileRoot config.
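
Regarding the removed API versions mentioned above: if you keep your manifests in a Git repository or a directory on disk, a simple grep is usually enough to spot leftovers (the path is just a placeholder):

grep -rE "rbac.authorization.k8s.io/v1alpha1|scheduling.k8s.io/v1alpha1" /path/to/your/manifests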

And finally the interesting Features:

  • Added ability for kubectl wait to wait on arbitrary JSON path.
  • Adds new [alpha] command kubectl events
  • Kubectl will now provide shell completion choices for the --output/-o flag
  • The PodSecurity admission plugin has graduated to beta and is enabled by default. The admission configuration version has been promoted to pod-security.admission.config.k8s.io/v1beta1. See Pod Security Admission for usage guidelines (a minimal namespace example follows after this list).
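
To get a first impression of the PodSecurity admission plugin it is enough to label a namespace. E.g. the following enforces the baseline profile and only warns about violations of the restricted profile (the namespace name and the chosen levels are just an example):

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted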

Before I upgraded from Kubernetes 1.21 to 1.22 last time, I switched the container runtime from Docker/dockershim to containerd. I already mentioned this last time, but if you still run Docker/dockershim you should really migrate NOW ;-) Kubernetes 1.24 won’t support it anymore! So I repeat it one last time: as you most probably know, Docker/dockershim has been deprecated since K8s 1.20 and will be removed in 1.24. So I took the opportunity while upgrading to the latest 1.21 release (before upgrading to the latest 1.22 release) to remove Docker/dockershim and replace it with containerd and runc. The whole process is documented in its own blog post: Kubernetes: Replace dockershim with containerd and runc. So I recommend getting rid of Docker/dockershim with the upgrade to K8s 1.23 at the latest! But the earlier you do it, the less pressure you have at the end ;-) And TBH, with containerd in place it looks like Pods are now starting a little bit faster than before ;-)

If you use CSI then also check the CSI Sidecar Containers documentation. For every sidecar container there is a matrix showing which version you need at a minimum, which version is the maximum supported and which version is recommended for a given K8s version.
Nevertheless, if your K8s update to v1.23 worked fine I would recommend also updating the CSI sidecar containers sooner or later, because a) lots of changes are happening in this area at the moment and b) you might require the newer versions for the next K8s version anyway.

In my case I needed to add three new settings for kube-scheduler (they are already included in my Ansible kubernetes-controller role): authentication-kubeconfig, authorization-kubeconfig and requestheader-client-ca-file needed to be added to k8s_scheduler_settings (see K8s Deprecations 1.23). The value for the first two is basically the same as for kubeconfig (which is the kube-scheduler.kubeconfig file). For requestheader-client-ca-file the value needs to be set to the same value as the already present root-ca-file setting. It points to the certificate authority file which kube-apiserver uses.
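
In Ansible variable form that could look something like this (the file paths are only examples and depend on where your kubeconfig and CA file actually live in your setup):

k8s_scheduler_settings:
  ...
  "authentication-kubeconfig": "/var/lib/kube-scheduler/kube-scheduler.kubeconfig"
  "authorization-kubeconfig": "/var/lib/kube-scheduler/kube-scheduler.kubeconfig"
  "requestheader-client-ca-file": "/var/lib/kubernetes/ca.pem"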

With the latest update from Kubernetes 1.22.5 to 1.22.6 I already changed kubescheduler.config.k8s.io/v1beta1 to kubescheduler.config.k8s.io/v1beta2 in kube-scheduler.yaml. If you haven’t done so yet, it needs to be done now.

This is optional: Introduced in Kubernetes 1.22 as an Alpha feature was Enable the use of RuntimeDefault as the default seccomp profile for all workloads. Normally I don’t enable Alpha features, but as this feature increases workload security quite a bit, I thought I’d give it a try.
If enabled, the kubelet will use the RuntimeDefault seccomp profile by default, which is defined by the container runtime, instead of using the Unconfined (seccomp disabled) mode. The default profiles aim to provide a strong set of security defaults while preserving the functionality of the workload. It is possible that the default profiles differ between container runtimes and their release versions, for example when comparing those from CRI-O and containerd.
So when you used Docker on your laptop, for example, you already used the default seccomp profile that ships with Docker. But if you use Docker, CRI-O or containerd with Kubernetes as the container runtime, those default seccomp profiles are disabled. That means a process in a container can basically execute every syscall it wants to. But is there really a need for a container process to call the reboot syscall to reboot the host? Most probably not ;-) Besides the reboot syscall there are quite a few more which the default seccomp profiles of the various container runtimes don’t allow to be called. Obviously that’s a good thing from a security point of view ;-) It causes at least a few attacks to fail and makes it less likely that an attacker can escape a container that he/she might have compromised. Before you enable this feature, of course test it in a development environment or on one of your K8s nodes with your current workload. There might be a process that fails if you enable the feature. But as most container runtimes use quite a similar set of allowed syscalls as Docker does, chances are pretty high that if your container image works in a local Docker container on your laptop it will also run with containerd or CRI-O on a Kubernetes node. In the end Docker also uses containerd in the background.
That said, I’ve added the --feature-gates=SeccompDefault=true and --seccomp-default flags to the kubelet process which runs on every worker node and which is responsible for starting containers on that node if the scheduler decided the workload should run there. This is not included in my Ansible Kubernetes Worker role by default as it is an Alpha feature. So in that case the variable k8s_worker_kubelet_settings needs to be extended, e.g.:

k8s_worker_kubelet_settings:
  ...
  "feature-gates": "SeccompDefault=true"
  "seccomp-default": ""

Once you have upgraded one node and the kubelet there has been started with the above feature gate enabled, you can test which syscalls are blocked now. So let’s assume there is a Kubernetes node called worker01 (kubectl get nodes gives you a list of all nodes) and that this is the node running the kubelet with the feature gate enabled. We can now use the amicontained container to find out which container runtime is being used, which features are available and which syscalls are blocked. E.g.:

kubectl -n kube-system run -it --rm test --image=jess/amicontained --overrides='{"spec": { "nodeSelector": {"kubernetes.io/hostname": "worker01"}}}' -- sh

So the container image jess/amicontained will be started as a container called test in the kube-system namespace (because that namespace is normally available in every K8s cluster, but you can use whatever namespace you want of course). And since we want to test the seccomp-enabled node, we specify an override to force the launch of the container on worker01.

Running the amicontained command in the container should produce an output like this on an Ubuntu 20.04 host:

Container Runtime: not-found
Has Namespaces:
        pid: true
        user: false
AppArmor Profile: cri-containerd.apparmor.d (enforce)
Capabilities:
        BOUNDING -> chown dac_override fowner fsetid kill setgid setuid setpcap net_bind_service net_raw sys_chroot mknod audit_write setfcap
Seccomp: filtering
Blocked Syscalls (61):
        PTRACE SYSLOG SETPGID SETSID USELIB USTAT SYSFS VHANGUP PIVOT_ROOT _SYSCTL ACCT SETTIMEOFDAY MOUNT UMOUNT2 SWAPON SWAPOFF REBOOT SETHOSTNAME SETDOMAINNAME IOPL IOPERM CREATE_MODULE INIT_MODULE DELETE_MODULE GET_KERNEL_SYMS QUERY_MODULE QUOTACTL NFSSERVCTL GETPMSG PUTPMSG AFS_SYSCALL TUXCALL SECURITY LOOKUP_DCOOKIE CLOCK_SETTIME VSERVER MBIND SET_MEMPOLICY GET_MEMPOLICY KEXEC_LOAD ADD_KEY REQUEST_KEY KEYCTL MIGRATE_PAGES UNSHARE MOVE_PAGES PERF_EVENT_OPEN FANOTIFY_INIT NAME_TO_HANDLE_AT OPEN_BY_HANDLE_AT SETNS PROCESS_VM_READV PROCESS_VM_WRITEV KCMP FINIT_MODULE KEXEC_FILE_LOAD BPF USERFAULTFD PKEY_MPROTECT PKEY_ALLOC PKEY_FREE
Looking for Docker.sock

So let’s compare this with worker02 which DOESN’T have the feature flag enabled:

Container Runtime: not-found
Has Namespaces:
        pid: true
        user: false
AppArmor Profile: cri-containerd.apparmor.d (enforce)
Capabilities:
        BOUNDING -> chown dac_override fowner fsetid kill setgid setuid setpcap net_bind_service net_raw sys_chroot mknod audit_write setfcap
Seccomp: disabled
Blocked Syscalls (22):
        MSGRCV SYSLOG SETPGID SETSID VHANGUP PIVOT_ROOT ACCT SETTIMEOFDAY SWAPON SWAPOFF REBOOT SETHOSTNAME SETDOMAINNAME INIT_MODULE DELETE_MODULE KEXEC_LOAD FANOTIFY_INIT OPEN_BY_HANDLE_AT FINIT_MODULE KEXEC_FILE_LOAD BPF USERFAULTFD
Looking for Docker.sock

So we have 61 vs. 22 blocked syscalls now. That’s pretty nice 😄

From what I can tell so far, Tomcat, PostgreSQL, MySQL, Mattermost, Postfix, cert-manager, CoreDNS, Redis, Traefik, Apache and a few other Go and Python programs are working without issues with the seccomp default profile enabled.

But enough about the seccomp default profile… Now it’s finally time to update the K8s controller and worker nodes to version 1.23.x as described in Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes.

That’s it for today! Happy upgrading! 😉