Scheduled JSON post in Linux using cron and du – to Logic Apps and Power BI

This post digs into getting regular disk space data out of Linux and sending it to an Azure Logic App, which pushes it into Power BI for visualisation.

So in my last post, I set up an rsync server to take a load of files from my “on-premises” storage. I have a reasonable internet connection, but I'm impatient and was always checking progress by logging into my rsync server and checking directory sizes using the du command.
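The command in question looks something like this (the path is just an example):

```bash
# Human-readable sizes for the top-level directories under the backup target
du -h --max-depth=1 /data/backup
```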

-h gives you a human-readable disk size, and --max-depth=1 only reports on the top-level directories.

du output to JSON

So the first step is to take the output that I like and get it into a better format.
Here's what I came up with. It's a little hacky: to create valid JSON, I put an empty object at the end of the array to avoid messing around with removing the trailing comma.
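A minimal sketch along those lines (the target path and JSON field names are illustrative, not necessarily the originals):

```bash
#!/bin/bash
# Sketch: turn "du" output into a JSON document.
# TARGET_DIR and the field names are illustrative.

TARGET_DIR="/data/backup"

echo -n '{ "host": "'"$(hostname)"'", "sizes": ['

# du -k --max-depth=1 prints "size<TAB>path" for each top-level directory,
# finishing with a total line for the target directory itself
du -k --max-depth=1 "$TARGET_DIR" | while read -r size path; do
  echo -n '{ "path": "'"$path"'", "sizeKB": '"$size"' },'
done

# The empty object soaks up the trailing comma and keeps the JSON valid
echo '{}] }'
```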

This outputs valid JSON, which I can then curl out to a waiting Logic App, Power BI dataset, or just about any web service.
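For illustration, the output of the sketch above looks something like this (the host name, paths and sizes are made up):

```json
{ "host": "rsync01", "sizes": [
  { "path": "/data/backup/photos", "sizeKB": 52428800 },
  { "path": "/data/backup/music", "sizeKB": 10485760 },
  { "path": "/data/backup", "sizeKB": 62914560 },
  {}
] }
```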

Logic App

So I chose to post the data to an Azure Logic App. This way I can take advantage of its pre-built connectors for email/Power BI whenever I change my mind about what to do with the data.

I new up a Logic App in Azure, choosing the HTTP Request/Response trigger template and pasting in the JSON – it creates the schema for me and I'm ready to go.

[Image: logic app new]

Curling the JSON

So now that I've got a URL from Logic Apps to post to, I can create the curl command on my Linux box.
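Something along these lines does the job (the URL is a placeholder – use the HTTP POST URL shown on the Logic App's Request trigger, and disk-usage.json stands in for wherever the generated JSON was saved):

```bash
# Placeholder URL - paste in the one from the Logic App's Request trigger
LOGIC_APP_URL='https://prod-00.westeurope.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke?<sig>'

# Post the generated JSON with the correct content type
curl -s -X POST \
     -H 'Content-Type: application/json' \
     --data @disk-usage.json \
     "$LOGIC_APP_URL"
```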

Let's see if that worked by checking the Logic App run history.

Scheduling the curl

OK, so all works as planned. Let's get this reporting the data on a regular schedule.

First, let's put the code in a shell script file.
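Pulling the earlier pieces together, the script might look like this (the filename, paths and URL are illustrative placeholders):

```bash
#!/bin/bash
# report-disk-usage.sh - gather du output as JSON and post it to the Logic App.
# TARGET_DIR, the field names and the URL are placeholders.

TARGET_DIR="/data/backup"
LOGIC_APP_URL='https://prod-00.westeurope.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke?<sig>'

build_json() {
  echo -n '{ "host": "'"$(hostname)"'", "sizes": ['
  du -k --max-depth=1 "$TARGET_DIR" | while read -r size path; do
    echo -n '{ "path": "'"$path"'", "sizeKB": '"$size"' },'
  done
  echo '{}] }'
}

# Pipe the JSON straight into curl
build_json | curl -s -X POST \
  -H 'Content-Type: application/json' \
  --data-binary @- \
  "$LOGIC_APP_URL"
```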

Then let's schedule it to run every 30 minutes.
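The crontab entry can look like this (added via crontab -e; the script path matches the sketch above):

```bash
# Run every 30 minutes and keep a log for troubleshooting
*/30 * * * * /home/user1/report-disk-usage.sh >> /home/user1/report-disk-usage.log 2>&1
```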

Logic App run history – oh good, it's working every 30 minutes 🙂

Developing the Logic App

So up until this point the Logic App contains just the trigger that receives the data and does nothing with it.
Let's send the data over to Power BI so I can quickly check it from the mobile app whenever I'm interested.

First up, head to Power BI, click on a workspace and then add a dataset. I'm going to use a streaming dataset with the API option.
You provide the field names and that's it.

Next, we add a few more actions to the Logic App:
– Filter Array (so we're just working with the total-size item)
– Add rows to a Power BI dataset
– Use a simple calculation to work out the GB size from the KB size provided in the request (a sketch of the expression is below)
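For that last step, a workflow expression along these lines does the conversion inside the "Add rows" action (sizeKB is the field name from the earlier sketch, and 1048576 is 1024 × 1024 KB per GB):

```
div(float(item()?['sizeKB']), 1048576)
```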

Power BI Reports

So, once we have data going into a Power BI hybrid streaming dataset, we can craft some reports.

[Image: Streaming dashboard]

[Image: Data over time]

Troubleshooting rsync with ReadyNAS, Azure and CentOS

Years ago I bought a couple of Netgear ReadyNAS devices: a Duo, and subsequently a Duo v2. They're pretty basic, but they offered good Squeezebox support and a cheap way for me to store terabytes of data in a RAID config.

Both of the ReadyNAS devices support scheduled backup jobs to send their data elsewhere. I'd normally opt for the simplicity of CIFS/Samba, but my internet provider has decided to block those ports and the ReadyNAS devices don't allow you to use a non-standard port. Thus the only other way to get the job done is to use rsync.

My desired location for the data backup is Azure (naturally!) – ideally Azure Files, as the data will be most accessible to me over an SMB share, the same way I've always accessed my ReadyNAS devices.

Here's a run-down of the errors I received when doing this and how to get around them.

rsync: getaddrinfo: 873: Name or service not known

It turned out that my rsync daemon wasn’t listening correctly.

Check with something like this (assuming CentOS with systemd):
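```bash
# Is the rsync daemon running? (unit name may vary by distro)
systemctl status rsyncd

# Confirm something is actually listening on the rsync port (873)
ss -tlnp | grep 873
```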

The quick command to get it running (again assuming CentOS with systemd) is along the lines of:
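```bash
# Start the rsync daemon now and make sure it comes back after a reboot
sudo systemctl start rsyncd
sudo systemctl enable rsyncd
```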

rsync: failed to set times on “.” (in frontbut): Operation not permitted (1)

At first I thought this problem was because of the way I was mounting Azure Files, and that its filesystem didn't support setting the times. Most of the solutions on the web tell you to use the -O flag to omit updating directory times.

However, the actual cause was that the username my ReadyNAS was using was not the owner of the destination directory.
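A chown along these lines fixes it (the path and username are illustrative – match your own setup):

```bash
# Recursively hand the backup directory over to the rsync user
sudo chown -R user1:user1 /mnt/backup
```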

This changes the ownership of the directory (recursively) to user1. That user should match the username you are using on the ReadyNAS and in the rsyncd.conf file.

ERROR: The remote path must start with a module name not a /

Pretty easy one here. The path must only represent the module defined in the rsyncd.conf file – not the directory path.
[Image: ReadyNAS backup destination]

@ERROR: Unknown module ‘mnt’

I was having an issue whereby the config file I was editing wasn't being picked up by rsync, thanks to a typo: I was editing /etc/rsync.conf when it should have been /etc/rsyncd.conf.
Inside this configuration file are various module definitions (specifying the path etc.), and it's the module name that must be used as the remote path.
[Image: rsync module]

@ERROR: chroot failed

In your rsyncd.conf file, make sure the module (or the global section) has use chroot = false.

@ERROR: chdir failed

Ensure that the directory the module points at exists and that the rsync user has the correct permissions on it.

My final rsyncd.conf file
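Pulling the fixes above together, the file can look something like this (the module name, path and user are illustrative, not the exact original):

```
# /etc/rsyncd.conf - illustrative sketch
uid = user1
gid = user1
use chroot = false
read only = false

# The module name below is what the ReadyNAS must use as its remote path
[backup]
    path = /mnt/backup
    comment = ReadyNAS backup destination
```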

Linux Mint on Azure

Azure’s got a nice set of supported Linux distros.
The Azure Marketplace also has a pretty comprehensive selection where you can get started quickly with various Linux distros.
There are even a couple of nice templates in the Quickstart Templates GitHub repo that give you a Gnome or Xfce desktop installed, with RDP enabled.

I needed to get a desktop build of Linux Mint (17.3 Rosa) running in Azure. I'm writing this post to help you avoid some of the pitfalls that I experienced and worked around. I've also got a handy script file that makes the process a lot quicker.

The first thing to point out is that I'm using Hyper-V 10 (on my x64 Windows 10 build) to create the disk images locally before uploading them to Azure. You can use VirtualBox or another workstation hypervisor to build your disk image.

Which ISO?

The first choice you need to make is the right ISO image to use. I wasted a lot of time with x86 images. Don't. They don't work correctly in Azure: my Azure boot diagnostics showed the boot wouldn't progress past "Waiting for Network".
My Linux Mint testing (an Ubuntu-derived distro) didn't even show this message.

You need to download x64 ISO images only

Hyper-V settings

Before creating a VM, you should create the hard disk.

  • VHD, not VHDx
  • Fixed size

Next, create the VM.

  • Make sure you have your Virtual Switch created first as you’ll need network connectivity.
  • Generation 1 VM as Azure cannot accept a Generation 2 VM
  • Choose a small disk as this will speed up the upload later. I tend to go for 10GB.
  • Opt for 2048 MB of allocated memory (not dynamic)

Mint installation settings

Choose your own partition/disk config. The default setup will include a swap partition – which you don't need.

OS Configuration

As soon as Mint has installed and the VM has rebooted, you're ready to get it configured for Azure. There's a good guide on the Azure site, but I've shortened it down to a single script file, with various little tweaks that make the process a little easier. It's structured so that you can run it in blocks by copying and pasting into a terminal.

Here's my script file – inside your VM, navigate to it and download it for easy reference. Each script block has a comment.
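As a rough sketch of the kind of steps involved (based on the standard Azure guidance for Ubuntu-based images rather than the exact script linked above):

```bash
#!/bin/bash
# Rough sketch of the usual Azure preparation steps for an Ubuntu-based image.
# Run block by block inside the VM.

# Make sure SSH is available once the VM lands in Azure
sudo apt-get update
sudo apt-get install -y openssh-server

# Install the Azure Linux agent
sudo apt-get install -y walinuxagent

# Send boot output to the serial console so Azure boot diagnostics work:
# add the following to GRUB_CMDLINE_LINUX in /etc/default/grub, then rebuild grub.
#   console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300
sudo nano /etc/default/grub
sudo update-grub

# Final step before shutting the VM down: generalise the image
sudo waagent -deprovision+user -force
sudo shutdown -h now
```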

Sending the VHD to Azure

Once the VM is off, we'll want to do the final preparation before uploading into Azure Storage. Since my host environment is Windows, I use PowerShell. The first step is to make the disk fixed in size (so the VHD file takes up the full 10GB of disk space). Then we can upload it into Azure Storage.
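A sketch of those two steps in PowerShell (the paths, resource group and storage URL are placeholders; older Azure PowerShell modules name the upload cmdlet Add-AzureRmVhd rather than Add-AzVhd):

```powershell
# Convert the dynamically expanding disk to a fixed-size VHD
# (skip this if you created it as fixed size in the first place)
Convert-VHD -Path 'C:\VMs\mint.vhd' -DestinationPath 'C:\VMs\mint-fixed.vhd' -VHDType Fixed

# Upload the VHD as a page blob into an Azure storage account
# (resource group, storage account and container names are placeholders)
Add-AzVhd -ResourceGroupName 'my-rg' `
          -Destination 'https://mystorageaccount.blob.core.windows.net/vhds/mint.vhd' `
          -LocalFilePath 'C:\VMs\mint-fixed.vhd'
```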

Testing the VM Image

By far the quickest way to test the creation of a VM from a VHD is by using DevTest Labs. Each lab has its own storage account, so you'll need to copy your uploaded VHD into the lab's storage account first.