Kubernetes the not so hard way with Ansible (at Scaleway) - Part 7 - The worker [updated for Kubernetes v1.9]

Installing Flannel, Docker, kubelet, kube-proxy and kube-dns

February 20, 2017

CHANGELOG

2018-01-06

  • update to Kubernetes v1.9.1
  • kubedns service template updated for Kubernetes 1.9
  • change defaults for k8s_ca_conf_directory and k8s_config_directory
  • introduce flexible parameter settings for kubelet via k8s_worker_kubelet_settings/k8s_worker_kubelet_settings_user variables
  • introduce flexible parameter settings for kube-proxy via k8s_worker_kubeproxy_settings/k8s_worker_kubeproxy_settings_user
  • add kube-proxy healthz-bind-address setting

2018-01-03

  • k8s_cluster_dns and k8s_cluster_domain variables are gone. The value for clusterIP (the Kubernetes cluster IP for kube-dns) is now hardcoded to 10.32.0.254. The same is true for k8s_cluster_domain, which is hardcoded to cluster.local. Review the kubedns/templates/kubedns.yaml.j2 template and adjust it accordingly if you don't use the default values.

2017-11-19

  • update to flannel 0.9.1

2017-10-10

  • update to flannel 0.9.0
  • flanneld config now uses VXLAN backend by default
  • add --healthz-ip and --healthz-port options to flanneld systemd service file
  • removed alsologtostderr option from systemd service file
  • use variable for flannel subnet directory
  • update CNI plugin to 0.6.0
  • variable local_cert_dir changed to k8s_ca_conf_directory / added k8s_ca_conf_directory
  • Docker update to 17.03.2-ce
  • added --masquerade-all to kube-proxy settings to avoid DNS problems
  • added healthz-bind-address and healthz-port option to kube-apiserver
  • added task to install several needed network packages
  • added missing default variable k8s_controller_manager_cluster_cidr
  • changed variable k8s_download_dir to k8s_worker_download_dir
  • a few fixes in the role
  • rename k8s_cni_plugins -> k8s_cni_plugin_version
  • removed k8s_kubelet_token as we now use RBAC (RBAC everywhere ;-) )

This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Bootstrapping Kubernetes Workers.

To allow easy communication between the hosts and their services (etcd, API server, kubelet, ...) we installed PeerVPN. This gives us a kind of unified and secure network for our Kubernetes hosts (similar to an AWS VPC or a Google Compute Engine VPC). Now we need the same for the pods we want to run in our cluster (let's call it the pod network). For this we use flannel, a network fabric for containers designed for Kubernetes.

First we need a big IP range for that. The default value in my flannel role ansible-role-flanneld is 10.200.0.0/16 (flannel_ip_range). This range is stored in etcd. Out of that big range flannel will lease a /24 subnet for every host where flanneld runs. Every pod on a worker node will then get an IP address out of the /24 subnet assigned to that host. On the flannel site you can see a diagram that shows pretty well how this works.
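Under the hood this network configuration ends up as a JSON object in etcd (the role seeds it for you). With the defaults used here it would look roughly like this; Network, SubnetLen and Backend are flannel's documented config keys, and SubnetLen defaults to 24:

```json
{
  "Network": "10.200.0.0/16",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan"
  }
}
```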

As already mentioned I created a role for installing flannel: ansible-role-flanneld. Install the role via

ansible-galaxy install githubixx.flanneld

The role has the following default settings:

# The interface on which the K8s services should listen on. As all cluster
# communication should use the PeerVPN interface the interface name is
# normally "tap0" or "peervpn0".
k8s_interface: "tap0"
# The directory to store the K8s certificates and other configuration
k8s_conf_dir: "/var/lib/kubernetes"
# CNI network plugin settings
k8s_cni_conf_dir: "/etc/cni/net.d"
# The directory from where to copy the K8s certificates. By default this
# will expand to user's LOCAL $HOME (the user that run's "ansible-playbook ..."
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"

etcd_conf_dir: "/etc/etcd"
etcd_bin_dir: "/usr/local/bin"
etcd_client_port: 2379
etcd_certificates:
  - ca-etcd.pem
  - ca-etcd-key.pem
  - cert-etcd.pem
  - cert-etcd-key.pem

flannel_version: "v0.9.1"
flannel_etcd_prefix: "/kubernetes-cluster/network"
flannel_ip_range: "10.200.0.0/16"
flannel_backend_type: "vxlan"
flannel_cni_name: "podnet"
flannel_subnet_file_dir: "/run/flannel"
flannel_options_dir: "/etc/flannel"
flannel_bin_dir: "/usr/local/sbin"
flannel_ip_masq: "true"
flannel_cni_conf_file: "10-flannel"
flannel_healthz_ip: "0.0.0.0"
flannel_healthz_port: "0" # 0 = disable

Basically there should be no need to change any of the settings if you've mostly used the default settings of my other roles so far. There are two settings you may want to change, though. flannel_etcd_prefix is the path in etcd where flannel will store its config object, so with the default above the full path to the flannel config object in etcd would be /kubernetes-cluster/network/config. flannel_ip_range is the big IP range I mentioned above. Don't make it too small! For every host flannel will choose a /24 subnet out of this range.

Next we extend our k8s.yml playbook file and add the role e.g.:

-
  hosts: k8s:children
  roles:
    -
      role: githubixx.flanneld
      tags: role-kubernetes-flanneld

As you can see flanneld will be installed on all nodes (the group k8s:children includes controller, worker and etcd in my case). I decided to do so because I'll have Docker running on every host, so it makes sense to have one unified network setup for all Docker daemons. Be aware that flanneld needs to run BEFORE Docker! Now you can apply the role to all specified hosts:

ansible-playbook --tags=role-kubernetes-flanneld k8s.yml
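Once flanneld is running on a host you can check which /24 subnet it leased: it writes the subnet information into flannel_subnet_file_dir (/run/flannel by default). The file contents look roughly like this (the subnet value is just an example from my setup; the FLANNEL_* variables are what flannel writes, and 1450 is the typical MTU with the VXLAN backend):

```ini
# /run/flannel/subnet.env (example values)
FLANNEL_NETWORK=10.200.0.0/16
FLANNEL_SUBNET=10.200.5.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
```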

Now we need to install Docker on all of our nodes (you don't need Docker on the etcd hosts if you used separate nodes for etcd and controller). You can use whatever Ansible Docker playbook you want to install Docker (you should find quite a few out there ;-) ). I created my own because I wanted to use the official Docker binaries archive, the overlay filesystem storage driver and a custom systemd unit file. Be aware that you need to set a few options to make Docker work with the flannel overlay network. Also we use Docker 17.03.2-ce; at the time of writing this is the latest Docker version supported by Kubernetes v1.9. If you want to use my Docker playbook you can install it via

ansible-galaxy install githubixx.docker

The playbook has the following default variables:

docker_download_dir: "/opt/tmp"

docker_version: "17.03.2-ce"
docker_user: "docker"
docker_group: "docker"
docker_uid: 666
docker_gid: 666
docker_bin_dir: "/usr/local/bin"
docker_storage_driver: "overlay"
docker_log_level: "error"
docker_iptables: "false"
docker_ip_masq: "false"
docker_bip: ""
docker_mtu: 1472

There should be no need to change any of these default values besides maybe docker_storage_driver. If you don't use my Docker role, pay attention to set at least the last four settings mentioned above correctly. As usual, place the variables in group_vars/k8s.yml if you want to change them. Add the role to our playbook file k8s.yml e.g.:

-
  hosts: k8s:children
  roles:
    -
      role: githubixx.docker
      tags: role-docker
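Why do the last four Docker settings matter? Docker must not manage its own bridge network with its own defaults; it has to use the subnet flannel leased for the host, and it must leave masquerading and iptables rules to flannel and kube-proxy. One common way to wire this together in a systemd unit is to source flannel's subnet file and hand its values to dockerd (a sketch; my role's actual unit file may differ in details):

```ini
# /etc/systemd/system/docker.service (sketch)
[Unit]
Description=Docker daemon using flannel's leased subnet
# flanneld must have written /run/flannel/subnet.env before dockerd starts
After=flanneld.service
Requires=flanneld.service

[Service]
# Pulls in FLANNEL_SUBNET and FLANNEL_MTU leased by flanneld
EnvironmentFile=/run/flannel/subnet.env
ExecStart=/usr/local/bin/dockerd \
  --bip=${FLANNEL_SUBNET} \
  --mtu=${FLANNEL_MTU} \
  --ip-masq=false \
  --iptables=false \
  --storage-driver=overlay
```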

A word about docker_storage_driver: since we use Ubuntu 16.04 at Scaleway we should already have a very recent kernel running (at the time of writing this blog post it was kernel 4.14.x on my VPS instance). It makes sense to use a recent kernel for Docker in general; I recommend a kernel >= 4.14.x if possible. Verify that you have the overlay filesystem available on your worker instances (execute cat /proc/filesystems | grep overlay; if you see output you should be fine). If the module isn't compiled into the kernel you can normally load it via modprobe -v overlay (-v gives us a little bit more information). We'll configure Docker to use overlayfs by default because it's one of the best choices (Docker 1.13.x started to use overlayfs by default if available). You can change the storage driver via the docker_storage_driver variable if you like. Again: use a kernel >= 4.14.x if possible!

Now you can roll out the Docker role on all nodes using

ansible-playbook --tags=role-docker k8s.yml

In part 6 we installed the Kubernetes API server, scheduler and controller manager on the controller nodes. For the workers I've also prepared an Ansible role which installs the Kubernetes worker components. The Kubernetes part of a worker node needs a kubelet and a kube-proxy daemon. The workers do the "real" work: they run the pods and the Docker containers. So in production, and if you do real work, it won't hurt to choose bigger iron for the worker hosts ;-) The kubelet is responsible for creating a pod/container on a worker node if the scheduler has chosen that node to run a pod on. The kube-proxy takes care of routing: e.g. if a pod or a service is added, kube-proxy updates the routing rules in iptables accordingly.

The workers depend on the infrastructure we installed in part 6. The role uses the following variables:

# The directory to store the K8s certificates and other configuration
k8s_conf_dir: "/var/lib/kubernetes"
# The directory to store the K8s binaries
k8s_bin_dir: "/usr/local/bin"
# K8s release
k8s_release: "1.9.1"
# The interface on which the K8s services should listen on. As all cluster
# communication should use the PeerVPN interface the interface name is
# normally "tap0" or "peervpn0".
k8s_interface: "tap0"

# The directory from where to copy the K8s certificates. By default this
# will expand to user's LOCAL $HOME (the user that run's "ansible-playbook ..."
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"
# Directory where kubeconfig for Kubernetes worker nodes and kube-proxy
# is stored among other configuration files. Same variable expansion
# rule applies as with "k8s_ca_conf_directory"
k8s_config_directory: "{{ '~/k8s/configs' | expanduser }}"

# K8s worker binaries to download
k8s_worker_binaries:
  - kube-proxy
  - kubelet
  - kubectl

# Certificate/CA files for API server and kube-proxy
k8s_worker_certificates:
  - ca-k8s-apiserver.pem
  - ca-k8s-apiserver-key.pem
  - cert-k8s-apiserver.pem
  - cert-k8s-apiserver-key.pem
  - cert-kube-proxy.pem
  - cert-kube-proxy-key.pem

# Download directory for archive files
k8s_worker_download_dir: "/opt/tmp"

# Directory to store kubelet configuration
k8s_worker_kubelet_conf_dir: "/var/lib/kubelet"

# kubelet settings
k8s_worker_kubelet_settings:
  "address": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "node-ip": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "allow-privileged": "true"
  "cluster-domain": "cluster.local"
  "cluster-dns": "10.32.0.254"
  "container-runtime": "docker"
  "docker": "unix:///var/run/docker.sock"
  "enable-custom-metrics": "true"
  "image-pull-progress-deadline": "2m"
  "kubeconfig": "{{k8s_worker_kubelet_conf_dir}}/kubeconfig"
  "register-node": "true"
  "runtime-request-timeout": "10m"
  "tls-cert-file": "{{k8s_conf_dir}}/cert-k8s-apiserver.pem"
  "tls-private-key-file": "{{k8s_conf_dir}}/cert-k8s-apiserver-key.pem"
  "serialize-image-pulls": "false"
  "cadvisor-port": "4194" # port or "0" to disable
  "healthz-bind-address": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "healthz-port": "10248"
  "cloud-provider": ""
  "network-plugin": "cni"
  "cni-conf-dir": "{{k8s_cni_conf_dir}}"
  "cni-bin-dir": "{{k8s_cni_bin_dir}}"
  "fail-swap-on": "false"

# Directory to store kube-proxy configuration
k8s_worker_kubeproxy_conf_dir: "/var/lib/kube-proxy"

# kube-proxy settings
k8s_worker_kubeproxy_settings:
  "bind-address": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "healthz-bind-address": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "proxy-mode": "iptables"
  "cluster-cidr": "10.200.0.0/16"
  "masquerade-all": "true"
  "kubeconfig": "{{k8s_worker_kubeproxy_conf_dir}}/kubeconfig"

# CNI network plugin settings
k8s_cni_dir: "/opt/cni"
k8s_cni_bin_dir: "{{k8s_cni_dir}}/bin"
k8s_cni_conf_dir: "/etc/cni/net.d"
k8s_cni_plugin_version: "0.6.0"

The role will search for the certificates we created in part 4 in the directory you specify in k8s_ca_conf_directory on the host you run Ansible on. The files used here are listed in k8s_worker_certificates. The Kubernetes worker binaries we need are listed in k8s_worker_binaries. The kubelet uses CNI (the Container Network Interface) to manage machine level networking requirements. The version of the CNI plugin archive to download is specified in k8s_cni_plugin_version, and the plugins will be placed in k8s_cni_dir.
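The two settings dictionaries above (k8s_worker_kubelet_settings and k8s_worker_kubeproxy_settings) are rendered into command line flags by the role's systemd templates, one --key=value pair per entry; that's what makes them flexible, since anything you put into the user variant ends up as an extra flag. A minimal Jinja2 sketch of how such an ExecStart line can be built (illustrative only, not the role's actual template):

```ini
# kubelet.service.j2 (sketch): every key/value pair in
# k8s_worker_kubelet_settings becomes a --key=value flag
[Service]
ExecStart={{ k8s_bin_dir }}/kubelet {% for key, value in k8s_worker_kubelet_settings.items() %}--{{ key }}={{ value }} {% endfor %}
```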

If you created a different PeerVPN interface (e.g. peervpn0) change k8s_interface.

Now add an entry for your worker hosts into Ansible’s hosts file e.g.:

[k8s_worker]
worker[1:3].your.tld

Install the role via

ansible-galaxy install githubixx.kubernetes-worker

Next add the role to k8s.yml file e.g.:

-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.kubernetes-worker
      tags: role-kubernetes-worker

Run the playbook via

ansible-playbook --tags=role-kubernetes-worker k8s.yml
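After the playbook ran you can check that the kubelet on a worker answers on its healthz endpoint (port 10248, bound to the PeerVPN address, as configured in k8s_worker_kubelet_settings above; replace the IP with your worker's PeerVPN address):

```
$ curl http://10.3.0.211:10248/healthz
ok
```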

Now that we've installed basically everything needed for running pods, deployments, services, ... we should be able to do a sample deployment. On your laptop run:

kubectl run my-nginx --image=nginx --replicas=4 --port=80

This will deploy 4 pods running nginx. To get an overview of what's running (pods, services, deployments, ...) run:

kubectl get all -o wide

NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS   IMAGES    SELECTOR
deploy/my-nginx   4         4         4            4           1m        my-nginx     nginx     run=my-nginx

NAME                    DESIRED   CURRENT   READY     AGE       CONTAINERS   IMAGES    SELECTOR
rs/my-nginx-5d69b5ff7   4         4         4         1m        my-nginx     nginx     pod-template-hash=182561993,run=my-nginx

NAME                          READY     STATUS    RESTARTS   AGE       IP            NODE
po/my-nginx-5d69b5ff7-66jgk   1/1       Running   0          1m        10.200.25.2   k8s-worker2
po/my-nginx-5d69b5ff7-kphsd   1/1       Running   0          1m        10.200.5.2    k8s-worker1
po/my-nginx-5d69b5ff7-mwcb6   1/1       Running   0          1m        10.200.5.3    k8s-worker1
po/my-nginx-5d69b5ff7-w888j   1/1       Running   0          1m        10.200.25.3   k8s-worker2

You should also be able to run curl on every node to get the default page from one of the four nginx webservers. In the case above curl http://10.200.25.2 should work on all nodes in our cluster (flanneld magic ;-) ).

You can output the workers' internal IPs and the pod CIDRs that were assigned to them with:

kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}'

10.3.0.211 10.200.0.0/24 
10.3.0.212 10.200.1.0/24 

The IP addresses 10.3.0.211 and 10.3.0.212 are the addresses I assigned to the PeerVPN interfaces of worker1 and worker2. That's important since all communication should travel through the PeerVPN interfaces.

If you just want to see if the worker nodes are ready use:

kubectl get nodes -o wide

NAME          STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION          CONTAINER-RUNTIME
k8s-worker1   Ready     <none>    44d       v1.9.1    <none>        Ubuntu 16.04.3 LTS   4.14.11-mainline-rev1   docker://17.3.2
k8s-worker2   Ready     <none>    44d       v1.9.1    <none>        Ubuntu 16.04.3 LTS   4.14.11-mainline-rev1   docker://17.3.2

Now finally we install KubeDNS to enable services in our K8s cluster to do DNS lookups of internal services (service discovery) in a predictable way. If you've already cloned the ansible-kubernetes-playbooks repository you'll find a kubedns directory in there with a playbook file called kubedns.yml. Change into the kubedns directory and adjust templates/kubedns.yaml.j2. There are basically only two settings you need to adjust if you haven't used my default settings: search for clusterIP and for cluster.local and change them accordingly. You should find the following entries:

clusterIP: 10.32.0.254
--domain=cluster.local
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A

Now run

ansible-playbook kubedns.yml

to roll out the KubeDNS deployment. If you run

kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide

you should see something like this:

NAME                        READY     STATUS    RESTARTS   AGE       IP           NODE
kube-dns-6c857864fb-bp2kx   3/3       Running   0          32m       10.200.7.9   k8s-worker2
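To verify that KubeDNS actually resolves cluster-internal names you can run a throwaway busybox pod and look up the kubernetes API service (the addresses below are examples; the server should be the kube-dns clusterIP 10.32.0.254, while the resolved IP depends on your service network):

```
$ kubectl run -it --rm dnstest --image=busybox --restart=Never -- nslookup kubernetes.default
Server:    10.32.0.254
Address 1: 10.32.0.254 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.32.0.1 kubernetes.default.svc.cluster.local
```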

There are a lot more things that could and should be done now, but running Heptio Sonobuoy could be a good next step. Heptio Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests in an accessible and non-destructive manner.

Next up: Kubernetes the Not So Hard Way With Ansible (at Scaleway) - Part 8 - Ingress with Traefik