Upgrading an AKS cluster – how it works

I was talking to someone about the way AKS handles cluster upgrades, which is pretty easy to understand.  You have a cluster sized to N nodes, and you request an upgrade.  The first thing we do is add another node to the cluster.  We then work through the existing nodes one by one: take the node out of the cluster's available pool, update its version of Kubernetes, and put it back into the pool.  This happens for every node except the last one, which we just trash, leaving you with the extra node that was added at the beginning of the process.
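The sequence above can be sketched as a small simulation. This is only an illustration of the ordering described here, not how AKS implements it; the node names, the `simulate_upgrade` function and the dictionary-based "pool" are all made up for the example.

```python
def simulate_upgrade(pool, target_version):
    """Simulate the upgrade order described above.

    pool maps a (hypothetical) node name to its Kubernetes version.
    Returns the final pool and a list of the steps taken.
    """
    if not pool:
        raise ValueError("the cluster needs at least one node")

    pool = dict(pool)          # work on a copy, leave the input alone
    original = list(pool)      # the N nodes that existed before the upgrade
    events = []

    # Step 1: add one extra node that already runs the target version.
    extra = f"node-{len(original)}"   # hypothetical naming scheme
    pool[extra] = target_version
    events.append(f"add {extra} ({target_version})")

    # Step 2: every original node except the last is taken out of the
    # pool, upgraded, and put back.
    for name in original[:-1]:
        old_version = pool.pop(name)
        events.append(f"take {name} out of pool ({old_version})")
        pool[name] = target_version
        events.append(f"put {name} back in pool ({target_version})")

    # Step 3: the last original node is deleted rather than upgraded,
    # so the extra node from step 1 takes its place.
    last = original[-1]
    del pool[last]
    events.append(f"delete {last}")

    return pool, events


cluster = {"node-0": "1.12.4", "node-1": "1.12.4"}
final, log = simulate_upgrade(cluster, "1.12.6")
for step in log:
    print(step)
print(final)
```

The cluster ends up with the same node count it started with, and at every point in the simulation at least N-1 nodes remain in the pool.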

I’ve included the “kubectl get nodes -o wide -w” log below from a recent upgrade from 1.12.4 to 1.12.6 so you can see what I mean, and the timings that are involved.

It took around 3 minutes for the new node aks-agentpool-24883706-2 (of the correct 1.12.6 K8S version) to be added to my cluster.  It then took a further 4 minutes before the node aks-agentpool-24883706-0 had been upgraded.
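As a back-of-the-envelope sketch, those two timings can be extrapolated to larger clusters. This assumes the 3 and 4 minute figures from my upgrade stay constant per node, which is a guess rather than anything guaranteed, and it ignores the time taken to delete the last node, which I did not measure. The function name is made up for the example.

```python
def estimate_upgrade_minutes(node_count, add_node_minutes=3, per_node_minutes=4):
    """Rough upgrade duration: one extra node added, then every original
    node except the last is upgraded in turn.

    The default timings are the ones observed in this one upgrade and
    are assumptions, not guarantees.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return add_node_minutes + per_node_minutes * (node_count - 1)


for n in (2, 5, 10):
    print(n, "nodes: roughly", estimate_upgrade_minutes(n), "minutes")
```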

If you also look at the private IP address range that gets allocated to aks-agentpool-24883706-2, you can see why the formula for sizing your virtual network is important.  See: https://gordon.byers.me/azure/networking-basics-in-the-azure-kubernetes-service/
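To make the sizing point concrete, here is a minimal sketch. It assumes the cluster uses Azure CNI networking, where each node takes one IP for itself plus one for every pod it can host, and it follows the sizing formula from the Azure CNI documentation, which budgets for one node more than the cluster size to cover the extra node added during an upgrade. The function name and the default of 30 pods per node are assumptions for the example; check the linked post and your own cluster settings before relying on the numbers.

```python
def required_subnet_ips(node_count, max_pods_per_node=30):
    """IP addresses needed for the node subnet, assuming Azure CNI.

    (nodes + 1) accounts for the extra node added during an upgrade;
    each node needs one IP for itself plus one per pod it can host.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    nodes_with_surge = node_count + 1
    return nodes_with_surge + nodes_with_surge * max_pods_per_node


print(required_subnet_ips(2))    # 2-node cluster, 30 pods per node
print(required_subnet_ips(50))   # 50-node cluster, 30 pods per node
```

If the subnet only has room for exactly N nodes, the extra node added at the start of the upgrade has nowhere to get its addresses from.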