Monitoring APIs and Web Apps that run inside your Virtual Network with PaaS

I was with a customer last week who has a cluster full of APIs.  Literally.

What they wanted was a way of regularly hitting some of the APIs and evaluating whether they’re working.  For example: is there a 200 status code, and if certain data is posted into the API, does the response match the expected result?  This is all pretty easy when the API is public facing (see my StatusCake post).  However, when the APIs are internal facing inside a virtual network it’s a little harder.

One option would be to deploy a monitoring service inside the cluster that tests itself, but having a resource monitor itself has never been a good idea.  Another way would be to use a very IaaS method and stand up a VM (or two) to monitor from.  It’s separate from the cluster, but doesn’t feel like a particularly good answer as you now have to feed and water (patch and monitor) the VMs.

I’m a massive fan of Logic Apps for this kind of workflow, but Logic Apps don’t connect into your Virtual Network.  You could use API Management or an ISE (Integration Service Environment: https://docs.microsoft.com/en-us/azure/logic-apps/connect-virtual-network-vnet-isolated-environment-overview) to gain access to the Virtual Network, but for a single ping monitor it’s hard to justify the overall cost of these services if they are just supporting this single scenario.

The solution I think makes the most sense is to code an Azure Function which you host inside an App Service Plan.  The App Service Plan has the capability to connect into a Virtual Network without actually being deployed inside it.  It makes use of a point-to-site VPN to achieve this.  It’s a one-way connection from the App Service Plan to the Virtual Network.  I’m choosing a Function App because of the native bindings that are available.

Mocking a sample API 

So as a prerequisite to coding the function app, I needed an API to call.  I needed something that would be hosted in a private Virtual Network, so decided that a .NET Core API hosted on a Windows Server 2016 VM would be about as good a choice as any.  I created an API that uses a random number to decide what kind of HTTP response code to return.  Here’s the code: https://github.com/Gordonby/Snippets/tree/master/APISamples/TheUnreliableAPI
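To give a feel for the idea without opening the repo, here is a minimal sketch of an unreliable API in Python.  It is not the .NET Core sample linked above: the status codes, their odds, the port and the names `pick_status`, `UnreliableHandler` and `serve` are all assumptions made for illustration.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical outcomes: the real sample may use different codes and odds.
STATUS_CODES = [200, 200, 200, 404, 500, 503]


def pick_status(rng: random.Random) -> int:
    """Use a random number to decide which HTTP status code to return."""
    return rng.choice(STATUS_CODES)


class UnreliableHandler(BaseHTTPRequestHandler):
    """Answers every GET with a randomly chosen status code."""

    rng = random.Random()

    def do_GET(self):
        status = pick_status(self.rng)
        body = b'{"status": "ok"}' if status == 200 else b'{"status": "error"}'
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the mock API until interrupted (not called automatically)."""
    HTTPServer(("0.0.0.0", port), UnreliableHandler).serve_forever()
```

Calling `serve()` on a VM inside the virtual network would give the monitor something flaky to point at.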

Creating the Function App / App Service Plan

The selling point of an Azure Function is commonly the serverless billing model, referred to as the Consumption plan.  However, in order to connect into a Virtual Network we need to use the static billing variant: the App Service Plan.  We also need to be using the Standard tier or higher to get the Virtual Network capability.

Coding the Function App 

The code required for the app is pretty simple: it uses HttpClient to make an HTTP call and then checks the response code and response body.  When an invalid response is received (an unexpected HTTP status code or response body), an exception is thrown, which is subsequently captured in Application Insights.
Here’s the code: https://github.com/Gordonby/Snippets/blob/master/VNetHttpPinger/Function1.cs
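The same check can be sketched in a few lines of Python.  This is not the C# function linked above, just an illustration of the logic under some assumptions: `InvalidResponseError`, `evaluate` and `ping` are hypothetical names, and "response body matches" is simplified to "contains an expected fragment".

```python
import urllib.error
import urllib.request
from typing import Optional


class InvalidResponseError(Exception):
    """Raised when the API answers, but not with what we expected."""


def evaluate(status: int, body: str, expected_status: int = 200,
             expected_fragment: Optional[str] = None) -> None:
    """Raise InvalidResponseError if the status or body is not as expected."""
    if status != expected_status:
        raise InvalidResponseError(f"expected {expected_status}, got {status}")
    if expected_fragment is not None and expected_fragment not in body:
        raise InvalidResponseError(f"body did not contain {expected_fragment!r}")


def ping(url: str, expected_fragment: Optional[str] = None,
         timeout: float = 10.0) -> None:
    """Call the URL once and evaluate the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; recover status and body so evaluate() decides.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    evaluate(status, body, expected_fragment=expected_fragment)
```

In the real function the unhandled exception is what ends up in Application Insights; here it simply propagates to whoever called `ping`.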

Connecting the Function App into the Virtual Network 

Follow the standard process for enabling access: https://docs.microsoft.com/en-us/azure/app-service/web-sites-integrate-with-vnet#enabling-vnet-integration

The key steps are:

  • Creating a gateway subnet inside your virtual network
  • Creating a VPN Gateway in the virtual network
  • Enabling point-to-site configuration

The two things I’d suggest making sure you follow are:

  • If the gateway isn’t in the Basic SKU, then IKEv2 must be disabled in the point-to-site configuration and SSTP must be selected.
  • The point-to-site address space must be in one of the following address blocks:

10.0.0.0/8 – This means an IP address range from 10.0.0.0 to 10.255.255.255

172.16.0.0/12 – This means an IP address range from 172.16.0.0 to 172.31.255.255

192.168.0.0/16 – This means an IP address range from 192.168.0.0 to 192.168.255.255
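If you want to sanity-check a candidate point-to-site address space before configuring the gateway, Python’s standard `ipaddress` module can do it.  This is a small sketch; `in_allowed_block` and the example ranges in the usage note are hypothetical.

```python
import ipaddress

# The three private address blocks listed above.
ALLOWED_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def in_allowed_block(cidr: str) -> bool:
    """Return True if the whole CIDR range sits inside one of the allowed blocks.

    Raises ValueError if the CIDR has host bits set (e.g. "10.0.0.1/8").
    """
    network = ipaddress.ip_network(cidr, strict=True)
    return any(network.subnet_of(block) for block in ALLOWED_BLOCKS)
```

For example, `in_allowed_block("172.16.201.0/24")` is accepted, while `in_allowed_block("172.32.0.0/24")` is rejected because it falls just outside the 172.16.0.0/12 block.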

The typical error message you’ll see from App Service if connectivity is present but the P2S config is incorrect is:

An attempt was made to access a socket in a way forbidden by its access permissions

Using App Insights for telemetry 

The benefit of using Application Insights for the telemetry is that it gives you a jump start: you can quickly view the API status and set up some basic alerting into the business.

 

 

Upgrading an AKS cluster – how it works

I was talking to someone about the way AKS handles cluster upgrades, which is pretty easy to understand.  You have a cluster which is sized to N nodes, and you request an upgrade.  The first thing we do is add another node to the cluster.  We then take the existing nodes out of the cluster’s available pool one at a time, update the version of Kubernetes on each, and put it back into the pool.  This happens for every node except the last one, which we just trash, leaving you with the extra node that was added at the beginning of the process.
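The sequence is easier to follow as a toy simulation.  This is only a model of the steps described above, not how AKS is implemented; `simulate_upgrade`, the node names and the event strings are all made up for illustration.

```python
def simulate_upgrade(node_names, old_version, new_version, extra_name="node-extra"):
    """Model the upgrade order: add one node, roll all but the last, delete the last.

    Returns the final pool (name -> version) and the ordered list of events.
    """
    pool = {name: old_version for name in node_names}
    events = []

    # Step 1: add an extra node that already runs the new version.
    pool[extra_name] = new_version
    events.append(f"add {extra_name} ({new_version})")

    # Step 2: every original node except the last leaves the pool,
    # is upgraded, and is put back.
    for name in node_names[:-1]:
        del pool[name]
        events.append(f"remove {name} from pool")
        pool[name] = new_version
        events.append(f"return {name} ({new_version})")

    # Step 3: the last original node is deleted rather than upgraded,
    # so the cluster ends up back at its original size.
    last = node_names[-1]
    del pool[last]
    events.append(f"delete {last}")

    return pool, events
```

Running it for a two-node cluster, such as `simulate_upgrade(["node-0", "node-1"], "1.12.4", "1.12.6", extra_name="node-2")`, ends with two nodes again, both on the new version, and with the extra node having replaced the last original one.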

I’ve included the “kubectl get nodes -o wide -w” log below from a recent upgrade from 1.12.4 to 1.12.6 so you can see what I mean, and the timings that are involved.

It took around 3 minutes for the new node aks-agentpool-24883706-2 (of the correct 1.12.6 K8S version) to be added to my cluster.  Then a subsequent 4 minutes before the node aks-agentpool-24883706-0 had been upgraded.

If you also look at the private IP Address range that gets allocated to aks-agentpool-24883706-2 you can see why the formula for sizing your virtual network is important.  See: https://gordon.byers.me/azure/networking-basics-in-the-azure-kubernetes-service/
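As a rough sketch of why that extra upgrade node matters for sizing, here is the minimum-address calculation, assuming Azure CNI networking and the formula from the Azure documentation: (nodes + 1) + ((nodes + 1) × maximum pods per node).  The linked post may present it differently, and `required_ips` is a hypothetical helper name.

```python
def required_ips(nodes: int, max_pods_per_node: int) -> int:
    """Minimum IP addresses for an Azure CNI cluster, per the documented formula.

    The "+ 1" accounts for the extra node that is added during an upgrade.
    Assumption: Azure CNI networking, where every pod gets an IP from the subnet.
    """
    upgrade_nodes = nodes + 1
    return upgrade_nodes + upgrade_nodes * max_pods_per_node
```

So a 50 node cluster with 30 pods per node needs `required_ips(50, 30)` addresses, which is 51 + (51 × 30) = 1581, before you allow for any future scale-out.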