gavd.co.uk

Frail and faltering follower of Jesus

Slaying the KuberPiDragon

By Gavin Davies

An anti-tutorial about me, a novice, building a Kubernetes cluster on Raspberry Pi by essentially catapulting myself headlong into a solid wall of incomprehension until the wall broke. Then I wondered why I did it.

Kubernetes is a beast - or so it seemed to me. A vast, sprawling, Lovecraftian horror of epic proportions, not to be comprehended by mortals lest it break their minds. Then, a month ago, I started a new job. One of the technologies used in my new company is Kubernetes, so I read Kubernetes: Up and Running from cover to cover. Thus fortified, I embarked upon my journey to slay the beast.

I suppose this is the worst possible way to approach things. Hooray!

I faffed around with Minikube at first, but I remembered a former colleague telling me he’d built a Raspberry Pi (RPi) cluster and run K8s on it. Finally - a use case for the Pi? (Like thousands of others, I bought a first-gen Pi, installed Linux on it, connected it to my TV and then couldn’t think of an application for it.)

So I spent an amount of money that I’m not yet prepared to disclose to my wife, and purchased a bunch of Pis, cables, a switch - all that physical-world gubbins that I barely understand.

Day 1: The initial (partial) success

I followed this tutorial from Mofizur Rahman.

Now, one problem with tutorials is that they’re written at a point in time. You can write the best tutorial of all time, but dependencies change and incompatibilities creep in.

However, I did get it working. After about 4 hours’ effort I had a very basic cluster set up. At this point I got overconfident.

I started building my own containers for it - I didn’t realise that the Pi was ARM-based, though, and that most existing images weren’t built for that architecture. I had to learn docker buildx to get my own “hello world” working on ARM.
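For the curious, the sort of invocation I mean looks roughly like this (the image name and platform list are illustrative, not my actual project):

# One-off: create a buildx builder that can cross-build
docker buildx create --use
# Build (and push) an image that includes 32-bit ARM for the Pi
docker buildx build --platform linux/arm/v7,linux/amd64 -t <registry>/hello-world:latest --push .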

So, I powered everything down and went to bed secure in the knowledge that I was ready to start properly learning about K8s. Bonzer!

Day 2: It doesn’t work any more

I got home from work, anxious to learn K8s, and powered up my cluster.

Nothing worked.

NOTHING.

The logs on the master node were full of errors that I didn’t understand. I flailed helplessly around for a while, but I could not get it working.

I went to bed with a sadface, feeling defeated.

Day 3: bang head against wall

There are thousands of Kubernetes (k8s) tutorials out there. There are dozens covering Kubernetes on a Raspberry Pi cluster.

I managed to bungle pretty much all of them.

I rebuilt my cluster a dozen times, re-etching the SD cards. I tried going down to Raspbian Stretch, up to Buster. I tried Docker 18, Docker 19, various versions of K8s.

I read logs. I googled error messages. So many containers, so many logs, /var/log/syslog, all kinds of gubbins. Could be anything. What’s the root cause? WHY WON’T kubeadm init WORK?!

The inevitable meltdown:

IS there a root cause? Am I just rubbish at this? Maybe I’m too old. Maybe my experience counts for nothing. Maybe K8s has a super-high-frequency message that only the under-25s can hear that tells you how to make it work. Maybe I’m a terrible father, a failure as a husband, maybe I shouldn’t be in the industry at all. What else could I do, though?! I ain’t good at anything else!

K8s wins.

Day 4: grinding for levels

Right, I ain’t going out like that.

I gave up on the tutorials and did it the hard way. I started learning about the order k8s boots in and what’s going on under the hood a bit. I don’t understand it all (who could?) but it was like I’d started to find some edge pieces in a 10-million-piece jigsaw puzzle.

Some of the error messages came into focus - I started looking into the network overlay, how containers work, various aspects of Docker and K8s, really trying to come at it from all angles.

Still unable to kubeadm init. Still can’t figure out which of the several dozen error messages is the root cause and which can safely be ignored.

I start thinking - it’s 30 degrees C in the UK right now. These Pis slow down a lot in the heat (or so I’ve heard) and they lack heatsinks. So, I order a bunch of heatsinks and go to bed defeated once more.

Day 5: BREAKTHROUGH!

I fitted my heatsinks, which made me relax a bit at least.

My thinking at this point was “k8s was starting but stuff timed out. After all, even when I had it working, it was mega mega slow. So, some service didn’t come up. Maybe also permissions or deps”.

I found a GitHub issue that talked about problems in line with my suspicions. I tried breaking kubeadm init into its individual phases, then intercepting the generated manifests and jacking up the retries and timeouts (the exact commands are in “What worked for me” below).

IT WORKED!

Oh man. Really? It’s up? Are you serious? I have a cluster?

I added the workers to the cluster. No problems!!! Still everything slow but it’s up!

Setback!

I added my hello container as a replica set. It worked, but only from the individual nodes - each node could only talk to its own pods. The service could be queried, but it seemed to pick a pod at random to route the request to, and it would time out unless that pod happened to be on the same node the request came from.
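For reference, the shape of what I was poking at was roughly this (a Deployment rather than a bare ReplicaSet, and the names and port are illustrative):

kubectl create deployment hello --image=<my arm hello image>
kubectl scale deployment hello --replicas=3
kubectl expose deployment hello --port=8080 --type=NodePort
kubectl get pods -o wide   # shows which node each pod landed on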

I dug into K8s networking and finally figured out that something - possibly Weave, possibly Raspbian - had set the iptables FORWARD policy to drop traffic between the nodes. I flipped it back to ACCEPT:

iptables -P FORWARD ACCEPT
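If you want to check whether this is your problem before blindly changing policies, the chain policy is visible with a plain listing (a quick check, not part of my original fix):

# The first line shows the chain policy - mine had ended up as DROP
sudo iptables -L FORWARD -n | head -n 1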

Finally! I can query services! It’s working! IT’S WORKING! Dearest wife, come and look at this! Come see! “What is it?” It’s Kubernetes! I’ve defeated it! “That’s nice, how much Red Bull have you had?”

What did I learn?

I don’t know, really. I mean, I have a working k8s cluster, and I can do things like pull cables out of some nodes and see the other ones compensate. But was this a good way to learn? It’s not like it’s made me an expert or anything.

Tutorials are amazing, but it’s seldom as simple as it seems. After all, you don’t know what you don’t know, and troubleshooting something as huge and complex as Kubernetes with little knowledge is super hard!

Still, once you’ve slain a beast, you can afford a few seconds to stand in triumph before you hoist up your greatsword and trudge on to seek out the next slavering monster threatening to render you obsolete.

What worked for me?

I built a cluster of 3 nodes with hard-set (DHCP-reserved) IPs:

Node   Role    IP
red    master  192.168.0.102
green  worker  192.168.0.101
blue   worker  192.168.0.100

My 'cluster'
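If you’d rather pin the addresses on the Pis themselves than reserve them in the router’s DHCP table, Raspbian’s dhcpcd.conf can do the same job - a sketch, with 192.168.0.1 as router/DNS being an assumption about your network:

# Append to /etc/dhcpcd.conf on the master ("red") - adjust per node
interface eth0
static ip_address=192.168.0.102/24
static routers=192.168.0.1
static domain_name_servers=192.168.0.1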

For each node:

####### SET THESE VARIABLES SOF #######
HOSTNAME=green
####### SET THESE VARIABLES EOF #######

# Install some tools
sudo apt-get -y install vim mlocate

# Host name
sudo hostnamectl --transient set-hostname $HOSTNAME
sudo hostnamectl --static set-hostname $HOSTNAME
sudo hostnamectl --pretty set-hostname $HOSTNAME
sudo sed -i "s/raspberrypi/$HOSTNAME/"  /etc/hosts

# Configure cgroup gubbins - NOT idempotent
sudo sh -c 'echo "`cat /boot/cmdline.txt` cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory" > /boot/cmdline.txt'
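# (Check, my addition: /boot/cmdline.txt must stay a single line or the Pi won't boot cleanly - worth eyeballing after the change)
cat /boot/cmdline.txt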

# Disable swap
sudo dphys-swapfile swapoff
sudo dphys-swapfile uninstall
sudo update-rc.d dphys-swapfile remove
sudo apt purge -y dphys-swapfile
sudo swapoff -a
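# (Check, my addition: swap should report 0 after this - kubelet won't start cleanly with swap enabled)
free -m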

# SSH only by key
mkdir -p ~/.ssh
echo 'ssh-rsa < my public key >' > ~/.ssh/authorized_keys
sudo sed -i  "s/^#PasswordAuthentication yes$/PasswordAuthentication no/" /etc/ssh/sshd_config

# Reboot Now
sudo reboot now

# Install Docker on Buster. get-docker script failed for me - I did fix it but this is simpler
# https://blog.alexellis.io/how-to-fix-docker-for-raspbian-buster/
wget https://download.docker.com/linux/debian/dists/buster/pool/stable/armhf/containerd.io_1.2.6-3_armhf.deb
wget https://download.docker.com/linux/debian/dists/buster/pool/stable/armhf/docker-ce-cli_18.09.7~3-0~debian-buster_armhf.deb
wget https://download.docker.com/linux/debian/dists/buster/pool/stable/armhf/docker-ce_18.09.7~3-0~debian-buster_armhf.deb
sudo dpkg -i containerd.io_1.2.6-3_armhf.deb
sudo dpkg -i docker-ce-cli_18.09.7~3-0~debian-buster_armhf.deb
sudo dpkg -i docker-ce_18.09.7~3-0~debian-buster_armhf.deb
sudo usermod pi -aG docker && newgrp docker
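# (Check, my addition: docker should now work without sudo; hello-world has an ARM build so it runs fine on the Pi)
docker run --rm hello-world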

sudo reboot now

# Configure the k8s apt repo and install kubeadm
sudo sh -c "echo 'deb http://apt.kubernetes.io/ kubernetes-xenial main' > /etc/apt/sources.list.d/kubernetes.list"
sudo sh -c "curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -"
sudo apt-get update
sudo apt-get install -y kubeadm
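# (Check, my addition: confirm the install and that the cgroup flags from earlier took effect - the memory line in /proc/cgroups should show enabled=1)
kubeadm version
grep memory /proc/cgroups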

Master node only

sudo kubeadm init phase certs all
sudo kubeadm init phase kubeconfig all
sudo kubeadm init phase control-plane all --pod-network-cidr 192.168.0.0/24
# Here's the bit that fixed everything for me
sudo sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g'             /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 40/g'            /etc/kubernetes/manifests/kube-apiserver.yaml
sudo kubeadm init --v=1 --skip-phases=certs,kubeconfig,control-plane --ignore-preflight-errors=all --pod-network-cidr 192.168.0.0/24  --token-ttl=0

mkdir -p $HOME/.kube ; sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config ; sudo chown $(id -u):$(id -g) $HOME/.kube/config

# WeaveNet network overlay
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
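Before joining the workers, it’s worth giving the control plane and Weave a few minutes to settle (everything is slow on the Pi) and checking that the master reports Ready and the kube-system pods are Running:

kubectl get nodes
kubectl get pods -n kube-system -o wide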

Now add the workers to the cluster

sudo kubeadm join 192.168.0.102:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<token>
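If you lose the join command, kubeadm can print a fresh one on the master - standard kubeadm, nothing specific to this setup:

sudo kubeadm token create --print-join-command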

Hardest reset of master I could manage

For tearing down everything on master, here’s a little script:

sudo kubeadm reset -f
sudo rm -rf /home/pi/.kube
sudo rm -rf /etc/kubernetes
sudo sh -c 'iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X'
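The workers get a similar, smaller teardown when rebuilding - a sketch mirroring the master script rather than something I’ve documented separately:

sudo kubeadm reset -f
sudo rm -rf /etc/kubernetes /etc/cni/net.d
sudo sh -c 'iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X'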