Kubernetes is a beast - or so it seemed to me: a vast, sprawling, Lovecraftian horror of epic proportions that can't be comprehended by mortals, lest it break their minds. I started a new job a month ago, and one of the technologies my new company uses is Kubernetes, so I read Kubernetes Up and Running from cover to cover. Thus fortified, I embarked upon my journey to slay the beast.
I suppose this is the worst possible way to approach things. Hooray!
I faffed around with Minikube at first, but then I remembered a former colleague telling me he'd built a Raspberry Pi (RPi) cluster and run K8s on it. Finally - a use case for the Pi? (Like thousands of others, I bought a first-gen Pi, installed Linux on it, connected it to my TV, and then couldn't think of an application for it.)
So I spent an amount of money that I'm not yet prepared to disclose to my wife and purchased a bunch of Pis, cables, a switch - all that physical-world gubbins that I barely understand.
Day 1: The initial (partial) success
I followed this tutorial from Mofizur Rahman.
Now, one problem with tutorials is that they're snapshots of a point in time. You can write the best tutorial of all time, but dependencies change and incompatibilities creep in.
However, I did get it working. After about 4 hours’ effort I had a very basic cluster set up. At this point I got overconfident.
I started building my own containers for it. I hadn't realised, though, that the Pi is ARM-based and that most existing images aren't built for that architecture. I had to learn docker buildx to get my own "hello world" running on ARM.
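For anyone hitting the same wall, here's a rough sketch of the buildx approach I mean. This assumes Docker 19.03+ with buildx available; the builder name and image tag are placeholders, not anything from my actual setup:

```shell
# One-time setup: register QEMU emulators so an amd64 machine can build ARM images
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Create and select a builder instance that supports multi-platform builds
docker buildx create --name pibuilder --use

# Build for 32-bit ARM (what Raspbian runs) and push to a registry
# "example/hello-arm" is a made-up image name - substitute your own
docker buildx build --platform linux/arm/v7 -t example/hello-arm:latest --push .
```

The `--push` matters because a multi-platform build can't always be loaded into the local Docker image store directly; pushing to a registry and pulling from the Pi sidesteps that.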
So, I powered everything down and went to bed secure in the knowledge that I was ready to start properly learning about K8s. Bonzer!
Day 2: It doesn’t work any more
I got home from work, anxious to learn K8s, and powered up my cluster.
The logs on the master node were full of errors that I didn’t understand. I flailed helplessly around for a while, but I could not get it working.
I went to bed with a sadface, feeling defeated.
Day 3: bang head against wall
There are thousands of Kubernetes (k8s) tutorials out there. There are dozens covering Kubernetes on a Raspberry Pi cluster specifically.
I managed to bungle pretty much all of them.
I rebuilt my cluster a dozen times, re-etching the SD cards. I tried going down to Raspbian Stretch, up to Buster. I tried Docker 18, Docker 19, various versions of K8s.
I read logs. I googled error messages. So many containers, so many logs, /var/log/syslog, all kinds of gubbins. Could be anything. What's the root cause? WHY WON'T kubeadm init WORK?!
The inevitable meltdown:
IS there a root cause? Am I just rubbish at this? Maybe I’m too old. Maybe my experience counts for nothing. Maybe K8s has a super high frequency message that only under 25s can hear that tells you how to make it work. Maybe I’m a terrible father, a failure as a husband, maybe I shouldn’t be in the industry at all. What else could I do, though?! I ain’t good at anything else!
Day 4: grinding for levels
Right, I ain’t going out like that.
I gave up on the tutorials and did it the hard way. I started learning about the order k8s boots up in and a bit of what goes on under the hood. I don't understand it all (who could?), but it was like finding some edge pieces in a ten-million-piece jigsaw puzzle.
Some of the error messages came into focus - I started looking into the network overlay, how containers work, various aspects of Docker and K8s, really trying to come at it from all angles.
Still unable to kubeadm init. Still can't figure out which of the several dozen error messages is the root cause and which can safely be ignored.
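In case it helps anyone else in the same hole, these are the kinds of commands I leaned on to dig through the noise - nothing exotic, just where the logs actually live on a kubeadm setup (the container ID is a placeholder you'd fill in from `docker ps`):

```shell
# kubelet runs as a systemd unit; its log is often where the real error hides
journalctl -u kubelet --no-pager | tail -50

# The control-plane components run as containers, even before the cluster is "up"
docker ps -a | grep kube
docker logs <container-id>   # e.g. the kube-apiserver container

# General system messages on Raspbian
tail -100 /var/log/syslog
```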
I started thinking: it's 30 degrees C in the UK right now. These Pis slow down a lot in the heat (or so I've heard), and mine lacked heatsinks. So I ordered a bunch of heatsinks and went to bed defeated once more.
Day 5: BREAKTHROUGH!
I fitted my heatsinks, which made me relax a bit at least.
My thinking at this point was: "k8s is starting, but stuff is timing out. After all, even when I had it working, it was mega, mega slow. So some service didn't come up in time. Maybe also permissions or dependencies."
I found a GitHub issue that described problems in line with my suspicions. I tried breaking kubeadm init into its individual phases, then intercepting the generated config files and jacking up the retries and timeouts.
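To make concrete what "jacking up the timeouts" means: kubeadm writes static pod manifests under /etc/kubernetes/manifests, and the apiserver's liveness probe defaults are tuned for real servers, not a sweltering Pi. Here's a minimal sketch of the kind of edit involved, run against a sample stanza rather than the real manifest - the "before" numbers are illustrative, since the actual defaults vary by Kubernetes version:

```shell
# A sample liveness-probe stanza shaped like the one in kube-apiserver.yaml
cat > /tmp/probe-demo.yaml <<'EOF'
livenessProbe:
  failureThreshold: 8
  initialDelaySeconds: 15
  timeoutSeconds: 15
EOF

# Loosen the probe so a slow apiserver isn't killed before it finishes starting
sed -i 's/initialDelaySeconds: [0-9]*/initialDelaySeconds: 240/' /tmp/probe-demo.yaml
sed -i 's/failureThreshold: [0-9]*/failureThreshold: 18/' /tmp/probe-demo.yaml
sed -i 's/timeoutSeconds: [0-9]*/timeoutSeconds: 40/' /tmp/probe-demo.yaml

cat /tmp/probe-demo.yaml
```

With the probe this forgiving, the kubelet gives the apiserver minutes, not seconds, to come up before restarting it - which is what a Pi needs.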
Oh man. Really? It’s up? Are you serious? I have a cluster?
I added the workers to the cluster. No problems!!! Still everything slow but it’s up!
I added my hello container as a replica set. It worked, but only from the individual nodes - each node could only talk to its own pods. You could query the service, but it seemed to pick a pod at random to route the request to, and it would time out unless that pod happened to be on the same node the request came from.
I dug into K8s networking and finally figured out that something - possibly Weave, possibly Raspbian - had set an iptables rule blocking forwarded traffic. I changed the default FORWARD policy to ACCEPT:
iptables -P FORWARD ACCEPT
Finally! I can query services! It's working! IT'S WORKING! Dearest wife, come and look at this! Come see! "What is it?" It's Kubernetes! I've defeated it! "That's nice, how much Red Bull have you had?"
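One practical caveat I'd flag for anyone copying that fix: a policy set with iptables -P doesn't survive a reboot. A hedged sketch of one way to make it stick on Raspbian - this assumes the iptables-persistent package, and there are certainly other approaches:

```shell
# Set the policy now, then persist the current ruleset across reboots
sudo iptables -P FORWARD ACCEPT
sudo apt-get install -y iptables-persistent
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
```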
What did I learn?
I don't know, really. I mean, I have a working k8s cluster, and I can do things like pull the cables out of some nodes and watch the others compensate. But was this a good way to learn? It's not like it's made me an expert or anything.
Tutorials are amazing, but it's seldom as simple as it seems. After all, you don't know what you don't know, and troubleshooting something as huge and complex as Kubernetes with little knowledge is super hard!
Still, once you’ve slain a beast, you can afford a few seconds to stand in triumph before you hoist up your greatsword and trudge on to seek out the next slavering monster threatening to render you obsolete.
What worked for me?
I built a cluster of three nodes with DHCP-reserved (hard-set) IPs:
For each node:
####### SET THESE VARIABLES SOF #######
HOSTNAME=green
####### SET THESE VARIABLES EOF #######

# Install some tools
sudo apt-get -y install vim mlocate

# Host name
sudo hostnamectl --transient set-hostname $HOSTNAME
sudo hostnamectl --static set-hostname $HOSTNAME
sudo hostnamectl --pretty set-hostname $HOSTNAME
sudo sed -i "s/raspberrypi/$HOSTNAME/" /etc/hosts

# Configure cgroup gubbins - NOT idempotent
sudo sh -c 'echo "`cat /boot/cmdline.txt` cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory" > /boot/cmdline.txt'

# Disable swap
sudo dphys-swapfile swapoff
sudo dphys-swapfile uninstall
sudo update-rc.d dphys-swapfile remove
sudo apt purge -y dphys-swapfile
sudo swapoff -a

# SSH only by key
mkdir -p ~/.ssh
echo 'ssh-rsa < my public key >' > ~/.ssh/authorized_keys
sudo sed -i "s/^#PasswordAuthentication yes$/PasswordAuthentication no/" /etc/ssh/sshd_config

# Reboot now
sudo reboot now

# Install Docker on Buster. The get-docker script failed for me - I did fix it but this is simpler
# https://blog.alexellis.io/how-to-fix-docker-for-raspbian-buster/
wget https://download.docker.com/linux/debian/dists/buster/pool/stable/armhf/containerd.io_1.2.6-3_armhf.deb
wget https://download.docker.com/linux/debian/dists/buster/pool/stable/armhf/docker-ce-cli_18.09.7~3-0~debian-buster_armhf.deb
wget https://download.docker.com/linux/debian/dists/buster/pool/stable/armhf/docker-ce_18.09.7~3-0~debian-buster_armhf.deb
sudo dpkg -i containerd.io_1.2.6-3_armhf.deb
sudo dpkg -i docker-ce-cli_18.09.7~3-0~debian-buster_armhf.deb
sudo dpkg -i docker-ce_18.09.7~3-0~debian-buster_armhf.deb
sudo usermod pi -aG docker && newgrp docker
sudo reboot now

# Configure k8s apt repo and install kubeadm
sudo sh -c "echo 'deb http://apt.kubernetes.io/ kubernetes-xenial main' > /etc/apt/sources.list.d/kubernetes.list"
sudo sh -c "curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -"
sudo apt-get update
sudo apt-get install -y kubeadm
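As for the "DHCP hard set IPs" bit, that lives on the router rather than on the Pis themselves. For reference, here's roughly what a static lease looks like if your DHCP server happens to run dnsmasq - the MAC addresses, hostnames, and IPs below are made up for illustration:

```
# /etc/dnsmasq.conf on the DHCP server - hypothetical MACs and addresses
dhcp-host=dc:a6:32:01:02:03,green,192.168.0.102
dhcp-host=dc:a6:32:04:05:06,blue,192.168.0.103
dhcp-host=dc:a6:32:07:08:09,red,192.168.0.104
```

Consumer routers usually expose the same idea as "address reservation" in their web UI.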
Master node only
sudo kubeadm init phase certs all
sudo kubeadm init phase kubeconfig all
sudo kubeadm init phase control-plane all --pod-network-cidr 192.168.0.0/24

# Here's the bit that fixed everything for me
sudo sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 40/g' /etc/kubernetes/manifests/kube-apiserver.yaml

sudo kubeadm init --v=1 --skip-phases=certs,kubeconfig,control-plane --ignore-preflight-errors=all --pod-network-cidr 192.168.0.0/24 --token-ttl=0

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# WeaveNet network overlay
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
Now add the workers to the cluster
sudo kubeadm join 192.168.0.102:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<token>
Hardest reset of master I could manage
For tearing down everything on master, here’s a little script:
sudo kubeadm reset -f
sudo rm -rf /home/pi/.kube
sudo rm -rf /etc/kubernetes
sudo sh -c 'iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X'