I’ve been using the Telegraf-InfluxDB-Chronograf-Kapacitor stack for a couple of months at home and at work, for monitoring the state of devices, processes and home automation.
Well, actually I’ve been using the Telegraf-InfluxDB-Grafana stack – I have no idea why they decided to create Chronograf as a fork of Grafana, but it really is pretty rubbish in comparison.
That said, overall the solution is brilliant – Telegraf is pretty good at grabbing stats from your servers, and is highly configurable (at least on Linux – the Windows version could do with some work). The only area that really lets it down is the inability to sum up stats when monitoring processes, so anything that spawns child processes tends to make a mess of the stats.
Influx is very easy to use – the line protocol lets you add data with a simple web request, which makes it very accessible: a simple bash script and some sed reformatting are enough to create a data dump. It seems pretty disk intensive, but I guess that’s always going to be the case with something writing datapoints every minute. Getting used to a timeseries database takes a bit of patience, and the options for querying are pretty limited, but it’s worth it for the performance and space saving. The only significant gap here is the handling of offsets – comparing a timeseries at two equivalent points in time (this week against last week, say) is a very clear use-case, and it’s surprising that it isn’t supported.
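To give a flavour of how little is involved in the line protocol, here is a minimal sketch in Python. It assumes the InfluxDB 1.x HTTP API (a POST to `/write` with a `db` parameter); the host, database, measurement and tag names are made up for illustration, and the formatter skips the escaping rules for spaces and commas, so treat it as a starting point rather than a complete client.

```python
from urllib.parse import urlencode
from urllib.request import Request


def format_field(value):
    """Render one field value in line protocol syntax."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):       # integers carry a trailing "i"
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'            # strings are double-quoted


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=val field=val [timestamp].

    Simplified: assumes names and tag values contain no spaces, commas or '='.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


def build_write_request(base_url, db, lines):
    """Prepare (but do not send) the POST for the assumed 1.x /write endpoint."""
    query = urlencode({"db": db, "precision": "ns"})
    body = "\n".join(lines).encode("utf-8")
    return Request(f"{base_url}/write?{query}", data=body, method="POST")


# Hypothetical host, database and measurement names, for illustration only.
point = to_line("cpu", {"host": "nas"}, {"load": 0.5, "procs": 3},
                1500000000000000000)
request = build_write_request("http://localhost:8086", "home", [point])
print(point)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send the point; that call is left out here so the sketch runs without a server.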
Then Grafana tops it off with flexible and powerful visualisation.
I’d recommend that anyone who is looking after any sort of IT system has a play around with it.