Monitoring APIs and Web Apps that run inside your Virtual Network with PaaS

I was with a customer last week who has a cluster full of APIs.  Literally.  

What they wanted was a way of regularly hitting some of the APIs and evaluating whether they’re working.  E.g. does it return a 200 status code, and if certain data is posted to the API, does the response match the expected result?  This is all pretty easy when the API is public facing (see my StatusCake post).  However, when the APIs are internally facing inside a virtual network, it’s a little harder. 

One option would be to deploy a monitoring service inside the cluster that tests itself, but having a resource monitor itself has never been a good idea.  Another would be the classic IaaS method: stand up a VM (or two) to monitor from.  That’s separate from the cluster, but it doesn’t feel like a particularly good answer as you now have to feed and water (patch and monitor) the VMs.

I’m a massive fan of Logic Apps for these kinds of workflows, but Logic Apps don’t connect into your Virtual Network.  You could use API Management or an ISE (Integration Service Environment: https://docs.microsoft.com/en-us/azure/logic-apps/connect-virtual-network-vnet-isolated-environment-overview) to gain access to the Virtual Network, but for a single ping monitor it’s hard to justify the overall cost of these services if they’re only supporting this one scenario. 

The solution I think makes the most sense is to code an Azure Function which you host inside an App Service Plan.  The App Service Plan has the capability to connect into a Virtual Network without actually being deployed inside it; it makes use of a point-to-site VPN to achieve this, a one-way connection from the App Service Plan into the Virtual Network.  I’m choosing a Function App because of the native bindings that are available. 

Mocking a sample API 

So, as a prerequisite to coding the function app, I needed an API to call.  It needed to be hosted in a private Virtual Network, so I decided that a .NET Core API hosted on a Windows 2016 VM would be about as good a choice as any.  I created an API that uses a random number to decide what kind of HTTP response code to return.  Here’s the code: https://github.com/Gordonby/Snippets/tree/master/APISamples/TheUnreliableAPI
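
The full code is at that link; the heart of it is a controller along these lines (a condensed sketch, class and route names illustrative):

using System;
using Microsoft.AspNetCore.Mvc;

namespace TheUnreliableApi.Controllers
{
    [Route("api/[controller]")]
    [ApiController]
    public class UnreliableController : ControllerBase
    {
        private static readonly Random Rng = new Random();

        // Roll a random number per request to decide which status code to return
        [HttpGet]
        public IActionResult Get()
        {
            int roll = Rng.Next(1, 11); // 1..10
            if (roll <= 7) return Ok("Everything is fine");             // ~70% healthy
            if (roll <= 9) return StatusCode(500, "Internal failure");  // ~20% 500s
            return StatusCode(503, "Service unavailable");              // ~10% 503s
        }
    }
}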

Creating the Function App / App Service Plan

The selling point of an Azure Function is commonly the serverless billing model, referred to as the Consumption plan.  However, in order to connect into a Virtual Network we need to use the static billing variant: the App Service Plan.  We also need to be on the Standard tier or higher to get the Virtual Network capability.
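
In az CLI terms that looks roughly like the following (resource names and location are placeholders):

az appservice plan create --name api-monitor-plan --resource-group monitor-rg --sku S1
az storage account create --name apimonitorsa --resource-group monitor-rg --sku Standard_LRS
az functionapp create --name api-monitor-func --resource-group monitor-rg --plan api-monitor-plan --storage-account apimonitorsa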

Coding the Function App 

The code required for the app is pretty simple: use HttpClient to make an HTTP call, then parse the response code and response body.  When an invalid response is received (an unexpected HTTP status code or response body), an exception is thrown, which is subsequently captured in Application Insights.
Here’s the code: https://github.com/Gordonby/Snippets/blob/master/VNetHttpPinger/Function1.cs
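
Stripped down, it amounts to something like this (a sketch; the private endpoint URL and expected body are illustrative):

using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class VNetHttpPinger
{
    private static readonly HttpClient Client = new HttpClient();

    // Fires every 5 minutes; any exception thrown here surfaces in Application Insights
    [FunctionName("VNetHttpPinger")]
    public static async Task Run([TimerTrigger("0 */5 * * * *")] TimerInfo timer, ILogger log)
    {
        HttpResponseMessage response = await Client.GetAsync("http://10.0.0.4/api/values");
        string body = await response.Content.ReadAsStringAsync();

        if (!response.IsSuccessStatusCode)
            throw new Exception($"Unexpected status code: {(int)response.StatusCode}");

        if (!body.Contains("Everything is fine"))
            throw new Exception("Response body didn't match the expected result");

        log.LogInformation("API responded as expected");
    }
}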

Connecting the Function App into the Virtual Network 

Follow the standard process for enabling access: https://docs.microsoft.com/en-us/azure/app-service/web-sites-integrate-with-vnet#enabling-vnet-integration

The key steps are (sketched with the az CLI just after this list):

  • Creating a gateway subnet inside your virtual network
  • Creating a VPN Gateway into the virtual network
  • Enabling the point-to-site configuration
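
Roughly, in az CLI terms (names and address ranges are placeholders; the client address pool must sit inside one of the private ranges listed just below, and the gateway can take 30+ minutes to provision):

az network vnet subnet create --resource-group monitor-rg --vnet-name api-vnet --name GatewaySubnet --address-prefix 10.1.255.0/27
az network public-ip create --resource-group monitor-rg --name api-vnet-gw-pip --allocation-method Dynamic
az network vnet-gateway create --resource-group monitor-rg --name api-vnet-gw --vnet api-vnet --public-ip-address api-vnet-gw-pip --gateway-type Vpn --vpn-type RouteBased --sku VpnGw1 --address-prefixes 172.16.100.0/24 --client-protocol SSTP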

The two things I’d suggest making sure you follow are:

  • If the gateway isn’t in the Basic SKU, then IKEv2 must be disabled in the point-to-site configuration and SSTP must be selected.
  • The point-to-site address space must be in one of the following address blocks:

10.0.0.0/8 – This means an IP address range from 10.0.0.0 to 10.255.255.255

172.16.0.0/12 – This means an IP address range from 172.16.0.0 to 172.31.255.255

192.168.0.0/16 – This means an IP address range from 192.168.0.0 to 192.168.255.255

The typical error message you’ll see from App Service if connectivity is present but the P2S config is incorrect is:

An attempt was made to access a socket in a way forbidden by its access permissions

Using App Insights for telemetry 

The benefit of using Application Insights for the telemetry is the jumpstart it gives you: you can very quickly view the API status and set up some basic alerting back into the business.

Upgrading an AKS cluster – how it works

I was talking to someone about the way AKS handles cluster upgrades, which is pretty easy to understand.  You have a cluster sized at N nodes, and you request an upgrade.  The first thing we do is add an extra node to the cluster.  Then, one by one, we take each node out of the available pool, upgrade its version of Kubernetes, and put it back into the pool.  This happens for every node except the last one, which we just trash, leaving you with the extra node that was added at the beginning of the process.

To show what I mean, and the timings involved, I watched “kubectl get nodes -o wide -w” during a recent upgrade from 1.12.4 to 1.12.6.
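
For reference, the upgrade itself is a single CLI call (resource group and cluster names are placeholders), with the node watch running in a second shell:

az aks upgrade --resource-group aks-rg --name my-aks --kubernetes-version 1.12.6
kubectl get nodes -o wide -w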

It took around 3 minutes for the new node aks-agentpool-24883706-2 (already on the target 1.12.6 K8s version) to be added to my cluster, and then a further 4 minutes before node aks-agentpool-24883706-0 had been upgraded.

If you also look at the private IP address range that gets allocated to aks-agentpool-24883706-2, you can see why the formula for sizing your virtual network is important.  See: https://gordon.byers.me/azure/networking-basics-in-the-azure-kubernetes-service/

Azure Kubernetes and Calico network policies

In this post I’m looking at Network Policies in AKS, which use Calico to restrict pod network communication in a number of different ways.

Preview feature

The first point to make about Network Policies in AKS is that they’re currently in preview.  New AKS features always hit preview first, typically for a couple of months, before going GA; this is a model that most people familiar with Azure will be happy with.  The official documentation on Network Policies is here: https://docs.microsoft.com/en-us/azure/aks/use-network-policies – where you can check the current status of the feature and read the preview caveats in more detail.

Network Enforcement through labels

Labels on pods drive the network policy enforcement.  Take a simple two-tier app like the Azure Vote sample: it has a front-end pod serving the HTML and JS, and a back-end pod that serves as the datastore.  The original yaml deployment file is here: https://raw.githubusercontent.com/Azure-Samples/azure-voting-app-redis/master/azure-vote-all-in-one-redis.yaml
I’ve added a couple of extra pod labels in this version of the deployment: https://raw.githubusercontent.com/Gordonby/Snippets/master/AKS/Azure-Vote-Labelled.yml
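
The gist of the change is a couple of extra labels on each pod template; for the front-end it’s along these lines (a fragment; see the linked yaml for the exact labels):

  template:
    metadata:
      labels:
        app: azure-vote-front
        role: frontend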

Deploying the app

Let’s start by deploying the app to our cluster and making sure it works.  I won’t go through cluster configuration and enabling Network Policies here, as that’s already covered extensively in the official documentation: https://docs.microsoft.com/en-us/azure/aks/use-network-policies

curl -O https://raw.githubusercontent.com/Gordonby/Snippets/master/AKS/Azure-Vote-Labelled.yml
kubectl create -f Azure-Vote-Labelled.yml


Once we have an external IP allocated, let’s test that it works.


Deny All

It’s important to make sure we’re operating with least privilege, so let’s start by denying all traffic with a network policy, then check the app again to make sure it’s broken.

curl -O https://raw.githubusercontent.com/Gordonby/Snippets/master/AKS/Deny-All.yml
kubectl create -f Deny-All.yml
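
The linked Deny-All.yml amounts to a policy like the sketch below: an empty podSelector matches every pod in the namespace, and with no ingress rules defined, no traffic is allowed in (check the repo for the exact definition):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress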


If you navigate to the front end now, you’ll see that the app is unreachable.  You can also take a look at the network policy definition with a kubectl command like the one below.

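Something like this, assuming the policy is named deny-all as in the sketch above:

kubectl get networkpolicy
kubectl get networkpolicy deny-all -o yaml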

Allowing Front-End access

Our app has a couple of labels, and I want to specifically allow access to any pod that has the label role: frontend.
This is going to save me time for every subsequent web app I want to add.

curl -O https://raw.githubusercontent.com/Gordonby/Snippets/master/AKS/NetworkPolicy-1-FrontEnd.yml
kubectl apply -f NetworkPolicy-1-FrontEnd.yml

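The policy follows the familiar web-allow-external pattern: select pods labelled role: frontend and allow ingress from any source (a sketch; the exact definition is in the file above):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: web-allow-external
spec:
  podSelector:
    matchLabels:
      role: frontend
  ingress:
  - {}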

Allowing Back-End access

For this network policy, I’m going to limit access to just the specific Azure Vote front-end pod.

curl -O https://raw.githubusercontent.com/Gordonby/Snippets/master/AKS/NetworkPolicy-2-BackEnd.yml
kubectl apply -f NetworkPolicy-2-BackEnd.yml

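This one selects the back-end pod and only admits traffic from pods carrying the front-end’s label; something along these lines (a sketch, assuming the sample app’s standard app labels; the policy name is illustrative):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: azure-vote-back
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: azure-vote-front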

Networking basics in the Azure Kubernetes Service

The Azure Kubernetes Service provides two network plugin options: kubenet, which was the first available option, and the Azure CNI (advanced networking).
The Azure CNI is the only networking option that provides support for capabilities like VNet peering and network policies – basically, most enterprise scenarios will require the Azure CNI.

There’s a really comprehensive guide to the Azure CNI here: https://docs.microsoft.com/en-us/azure/aks/configure-azure-cni
This post is intended to serve as an example and to emphasise several of the points in the official documentation.

Using an existing virtual network

I have a /23 VNet that offers 445 addresses. As each pod on each node takes an IP address from the virtual network, it’s pretty important to realise the limitations of using a small virtual network for your clusters. Where the virtual network is peered with other networks, including your on-prem network, this can often mean you’ll end up needing a larger network address range than you first thought.
From the Azure CNI documentation, there’s a pretty handy formula you can plug into Excel to see how many nodes and pods your address space will support, where A2 is the node count and B2 is the maximum pods per node. The +1 terms account for the extra node that AKS adds during an upgrade.
=(A2+1) + ((A2+1) * B2)

Nodes   Max pods per node   Addresses needed
50      30                  1581
10      100                 1111
6       62                  441

So you can see that in my /23 VNet, a suitable combination is 6 nodes with a maximum pod capacity of 62, using 441 of the 445 available addresses.

Once you’ve done this, you can create your cluster. The important piece to note at this point is that the service address range is a set of virtual IPs (VIPs) that Kubernetes assigns to internal services in your cluster.
When you select the service address range, you therefore need to ensure it doesn’t overlap with any other IP range you use, e.g. your VNet’s own address space or anything it’s peered with.
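
Pulling the sizing and the service range together, creating the cluster looks roughly like this (the subnet id is a placeholder, and the service CIDR is deliberately outside my VNet’s range):

az aks create --resource-group aks-rg --name my-aks \
  --network-plugin azure \
  --vnet-subnet-id <subnet-resource-id> \
  --node-count 6 \
  --max-pods 62 \
  --service-cidr 10.2.0.0/24 \
  --dns-service-ip 10.2.0.10 \
  --docker-bridge-address 172.17.0.1/16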

After the cluster has been created and you’ve provisioned some pods and services, you can see the IP addresses that have been allocated.
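
A quick way to see which addresses have come out of the subnet, versus the virtual service range:

kubectl get pods --all-namespaces -o wide
kubectl get svc --all-namespaces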

Using the edge Azure CLI in a CentOS VM

If you want to use the edge (developer, nightly) build of the Azure CLI, I can definitely recommend using an Azure VM.
Firstly, the advantages:

  • Using a developer AZ CLI build in isolation from your main work machine
  • VMs in Azure can use Managed Service Identity to easily authenticate with the control plane
  • You can run it on a really cheap B-series VM ($9/$18 a month!)

See here for the GitHub repo: https://github.com/Azure/azure-cli#edge-builds

VM Spec

I’m running a CentOS Standard_B1ms VM, which gives me 1 core, 2GB RAM and 800 IOPS, with a 32GB standard managed disk capped at 500 IOPS. This costs approx. $18/month for 24×7 compute and approximately $2/month for the disk.

NSG

I have two IPs allowed in on port 22 so I can SSH on. Everything else is locked down.

VM Updates

The VM is enrolled for update management, and as I’ve gone for the B-series it’s kept up 24×7, so I don’t need to worry about turning it on to get patched.

Managed Identity

The VM has a system-assigned managed identity in Azure Active Directory, and I have given it limited Contributor access to one resource group and Reader access to a few other resource groups.

Setup script

On a fresh CentOS VM, here’s what I run to install the CLI and log in to Azure:
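
Something along these lines does the job, based on the edge-builds instructions in the repo linked above (a sketch; prerequisite package names per the CLI install docs for RHEL/CentOS):

sudo yum install -y epel-release
sudo yum install -y gcc libffi-devel python-devel openssl-devel python-pip
pip install --user --upgrade --pre azure-cli --extra-index-url https://azurecliprod.blob.core.windows.net/edge
export PATH=$PATH:~/.local/bin   # pip --user installs az here
az login --identity              # authenticate using the VM's managed identity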

Whenever I log into the VM, I run this command to update to the latest build:
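
Which is just the same pip install again; it picks up whatever the newest nightly build is:

pip install --user --upgrade --pre azure-cli --extra-index-url https://azurecliprod.blob.core.windows.net/edge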

Next steps

I kinda like having a VM running all the time; I’m putting myself through a very basic IaaS experience using fundamental Azure components. I could make use of the Docker build of the AZ CLI and run it up on an Azure Container Instance on demand… Maybe next week, eh 🙂