Create sum of multiple queries with InfluxDB

I have four singlestat panels that show the used space on different hosts (each host also has different type_instances):
The query for one of these singlestats is the following:
Question: Is there a way to create a fifth singlestat panel that shows the sum of the other four singlestats (the sum of all "storj_value" where type=shared)?

The InfluxDB query language does not currently support aggregations across measurements (e.g., JOINs). It is possible with Kapacitor, but that requires writing code so that new aggregated values for all the measurements are written back to the database, and those values then have to be queried separately.
The only option currently is to use an API that does have cross-metric function support, for example Graphite with an InfluxDB storage back-end such as InfluxGraph.
The two APIs are quite different (Influx's is query-language based, Graphite's is not), and tagged InfluxDB data will need to be mapped to a Graphite metric path via templates; see the InfluxGraph configuration examples.
After that, Graphite functions that act across series can be used; for the above question, sumSeries in particular.
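As a rough illustration only (the metric path depends entirely on how the InfluxGraph templates map the InfluxDB measurement and tags, so the path below is an assumption), the Graphite target could look something like:

sumSeries(storj.*.shared.value)

which collapses the matching per-host series into the single summed value the fifth panel needs.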

Related

How to send non-aggregated metrics to InfluxDB from a Spring Boot application?

I have a Spring Boot application that is under moderate load. I want to collect metric data for a few of the operations of my app. I am mainly interested in counters and timers.
I want to count the number of times a method was invoked (number of invocations over a window, for example over the last day, week, or month).
If the method produces an unexpected result, increase a failure count and publish a few tags with that metric.
I want to time a couple of expensive methods, i.e. see how much time a method took, and also publish a few tags with the metrics to get more context (a rough sketch of this kind of instrumentation follows below).
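A minimal sketch of that kind of instrumentation with Micrometer (the meter names, tag keys, and MyService class are made up for illustration):

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

public class MyService {
    private final MeterRegistry registry;

    public MyService(MeterRegistry registry) {
        this.registry = registry;
    }

    public void handleRequest(String customer) {
        // count every invocation, tagged with context
        Counter.builder("myservice.invocations")
                .tag("customer", customer)
                .register(registry)
                .increment();

        // time the expensive part with the same kind of tags
        Timer timer = Timer.builder("myservice.expensive.call")
                .tag("customer", customer)
                .register(registry);
        try {
            timer.record(this::expensiveCall);
        } catch (RuntimeException e) {
            // unexpected result: bump a failure counter with extra context
            Counter.builder("myservice.failures")
                    .tag("customer", customer)
                    .tag("error", e.getClass().getSimpleName())
                    .register(registry)
                    .increment();
            throw e;
        }
    }

    private void expensiveCall() { /* expensive work */ }
}

Whether those increments survive as individual points or get rolled up on the way to InfluxDB is exactly the aggregation problem described below.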
I have tried StatsD-SignalFx and Micrometer-InfluxDB, but both of these solutions have issues I could not solve.
StatsD aggregates the data over a flush window, and due to that aggregation the metric tags get mixed up. For example, if I send 10 events with different tag values in one flush window, the StatsD agent aggregates those events and publishes only one event with counter = 10, and then I am not sure which tag values it sends with the aggregated data.
The Micrometer-InfluxDB setup has its own problems, one of them being that Micrometer sends 0 values for counters when no new metric is produced, and for that fake (0-value) counter it reuses the tag values from the last valid (non-zero) counter.
I am not sure how, but Micrometer also seems to do some aggregation on the client side, in the MeterRegistry I believe, because I was getting a few counters with a value of 0.5 in InfluxDB.
Next, I am planning to explore Micrometer/StatsD + Telegraf + InfluxDB + Grafana to see if it suits my use case.
Questions:
How can I avoid metric aggregation until the data reaches the data store (InfluxDB)? I can do the required aggregation in Grafana.
Is there any standard solution to the problem I am trying to solve?
Any other suggestions or directions for my use case?

Cloud Bigtable multi-prefix scan in dataflow

UPDATE: it seems that the recently released org.apache.beam.sdk.io.hbase-2.6.0 includes the HBaseIO.readAll() API. I tested it in Google Cloud Dataflow, and it seems to be working. Are there any issues or pitfalls in using HBaseIO directly in a Google Cloud Dataflow setting?
BigtableIO.read takes PBegin as an input. I am wondering if there is anything like SpannerIO's readAll API, where the input to BigtableIO's read could be a PCollection of ReadOperations (e.g., Scan), producing a PCollection<Result> from those ReadOperations.
I have a use case where I need multiple prefix scans, each with a different prefix, and the number of rows sharing a prefix can be small (a few hundred) or large (a few hundred thousand). If nothing like ReadAll is already available, I am thinking about having a DoFn issue a 'limit' scan, and if the limit scan doesn't reach the end of the key range, splitting the remainder into smaller chunks. In my case the key space is uniformly distributed, so the number of remaining rows can be estimated well from the last scanned row (assuming all keys smaller than the last scanned key are returned by the scan).
Apologies if similar questions have been asked before.
HBaseIO is not compatible with the Bigtable HBase connector due to its region locator logic, and we haven't implemented the SplittableDoFn API for Bigtable yet.
How big are your rows? Are they small enough that scanning a few hundred thousand rows could be handled by a single worker?
If yes, then I'll assume that the expensive work you are trying to parallelize is further down in your pipeline. In this case, you can:
create a subclass of AbstractCloudBigtableTableDoFn
in the DoFn, use the provided client directly, issuing a scan for each prefix element
Each row resulting from the scan should be assigned a shard id and emitted as a KV(shard id, row). The shard id should be an incrementing int mod some small multiple of the number of workers.
Then do a GroupByKey after the custom DoFn to fan out the shards. It's important to do a GroupByKey to allow for fanout, otherwise a single worker will have to process all of the emitted rows for a prefix. (See the sketch of such a DoFn below.)
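As a rough sketch only (assuming the bigtable-hbase-beam AbstractCloudBigtableTableDoFn is constructed with a CloudBigtableConfiguration and exposes the HBase Connection via getConnection(); the table id and shard count are placeholders):

import com.google.cloud.bigtable.beam.AbstractCloudBigtableTableDoFn;
import com.google.cloud.bigtable.beam.CloudBigtableConfiguration;
import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
import org.apache.beam.sdk.values.KV;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Emits each scanned row keyed by a rotating shard id so a later GroupByKey can fan out.
public class PrefixScanFn extends AbstractCloudBigtableTableDoFn<String, KV<Integer, Result>> {
    private final String tableId;
    private final int numShards;   // e.g. a small multiple of the worker count
    private int nextShard = 0;     // rotating shard assignment

    public PrefixScanFn(CloudBigtableConfiguration config, String tableId, int numShards) {
        super(config);
        this.tableId = tableId;
        this.numShards = numShards;
    }

    @ProcessElement
    public void processElement(ProcessContext c) throws Exception {
        String prefix = c.element();
        Table table = getConnection().getTable(TableName.valueOf(tableId));
        Scan scan = new Scan().setRowPrefixFilter(Bytes.toBytes(prefix));
        try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result row : scanner) {
                c.output(KV.of(nextShard, row));
                nextShard = (nextShard + 1) % numShards;  // spread rows across shards
            }
        }
    }
}

The output PCollection needs a coder registered for Result (the Beam HBase module provides one), and the GroupByKey on the shard id follows this DoFn.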
If your rows are big and you need to split each prefix scan across multiple workers, then you will have to augment the above approach:
in main(), issue a SampleRowKeys request, which will give you rough split points
insert a step in your pipeline before the manual scanning DoFn to split the prefixes using the results from SampleRowKeys, i.e. if the prefix is a and SampleRowKeys contains 'ac', 'ap', 'aw', then the ranges it should emit are [a-ac), [ac-ap), [ap-aw), [aw-b); assign a shard id and group by it
feed the resulting ranges to the manual scan step from above (see the sketch of the splitting step below)
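A hedged sketch of that splitting step as plain Java (the helper class, method, and range representation are made up for illustration; in the pipeline each emitted range would also get a shard id):

import java.util.ArrayList;
import java.util.List;

public class PrefixSplitter {
    // prefixStart/prefixEnd bound the prefix range, e.g. "a" and "b" for prefix "a";
    // sampleRowKeys are the sorted keys returned by SampleRowKeys.
    static List<String[]> split(String prefixStart, String prefixEnd, List<String> sampleRowKeys) {
        List<String[]> ranges = new ArrayList<>();
        String start = prefixStart;
        for (String sample : sampleRowKeys) {
            // only samples strictly inside the prefix range become split points
            if (sample.compareTo(start) > 0 && sample.compareTo(prefixEnd) < 0) {
                ranges.add(new String[] {start, sample});   // [start, sample)
                start = sample;
            }
        }
        ranges.add(new String[] {start, prefixEnd});        // final chunk, e.g. [aw, b)
        return ranges;
    }
}

For example, split("a", "b", List.of("ac", "ap", "aw")) yields [a-ac), [ac-ap), [ap-aw), [aw-b), matching the ranges above.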

InfluxDB: query to calculate average of StatsD "executionTime" values

I'm sending metrics in StatsD format to Telegraf, which forwards them to InfluxDB 0.9.
I'm measuring execution times (of some event) from multiple hosts. The measurement is called "execTime", and the tag is "host". Once Telegraf gets these numbers, it calculates mean/upper/lower/count, and stores them in separate measurements.
Sample data looks like this in InfluxDB:
TIME   FIELD            HOST   VALUE
t1     execTime.count   VM1    3
t1     execTime.mean    VM1    15
t1     execTime.count   VM2    6
t1     execTime.mean    VM2    22
(So at time t1, there were 3 events on VM1, with mean execution time 15ms, and on VM2 there were 6 events, and the mean execution time was 22ms)
Now I want to calculate the mean execution time across both hosts at time t1, which is (3*15 + 6*22)/(3+6) ≈ 19.7 ms.
But since the count and mean values are in two different series, I can't simply use "select mean(value) from execTime.mean"
Do I need to change my schema, or can I do this with the current setup?
What I need is essentially a new series that is a combination of execTime.count and execTime.mean across all hosts. Instead of calculating this on the fly, the best approach seems to be to actually create that series along with the others.
So now I have two timer stats being generated on each host for each event:
1. one event with the actual hostname as the 'host' tag
2. a second event with the tag "host=all"
I can use the first set of series to check mean execution times per host, and the second series gives me the mean time across all hosts combined.
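On the dashboard, the combined mean can then be read straight from the aggregated series; a query along these lines should work (measurement and tag names as above, the Grafana time grouping is an assumption):

SELECT mean("value") FROM "execTime.mean" WHERE "host" = 'all' AND $timeFilter GROUP BY time($interval)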
It is possible to do mathematical operations on fields from two different series, provided both series are members of the same measurement. I suspect your schema is not optimal for your use case.
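For example, if count and mean were stored as two fields of a single measurement, something along these lines would express their product directly (field and measurement names are assumed, and this relies on an InfluxQL version that supports field math in SELECT):

SELECT "count" * "mean" AS "total_time" FROM "execTime"

Because the current schema keeps them in separate measurements, this kind of query is not available, which is why restructuring the schema (or pre-aggregating as above) is suggested.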

Grafana dynamically display new hosts added by collectd

How do I get Grafana to dynamically add graphs for newly added hosts? For example, I have a Grafana chart to display load average for existing hosts. When I add a new host, collectd sends the new host's metrics to InfluxDB, but every time I have to manually add one more graph in Grafana, which is not what I want. Is there a way to get Grafana to plot the new host's metrics automatically, without changing Grafana?
You have to make use of the Grafana HTTP API and update your dashboard by adding the new graph that you want. In practice this means that you have to:
use the API to fetch the JSON of the dashboard
edit this JSON, adding the extra code for the new panel that you want
use the API again to update the dashboard
The hierarchy is simple: a dashboard has rows, and rows have panels; you will probably have to add some JSON inside panels. Go check your dashboard's JSON and all of this will make sense to you (a sketch of the two API calls follows below).
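A rough sketch of those two calls using Java's HttpClient (the endpoints correspond to the classic slug-based Grafana dashboard API; the host, slug, and API key are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class UpdateDashboard {
    public static void main(String[] args) throws Exception {
        String grafana = "http://grafana.example.com";     // placeholder host
        String apiKey = System.getenv("GRAFANA_API_KEY");  // placeholder credentials
        HttpClient client = HttpClient.newHttpClient();

        // 1. fetch the dashboard JSON
        HttpRequest get = HttpRequest.newBuilder()
                .uri(URI.create(grafana + "/api/dashboards/db/my-dashboard"))
                .header("Authorization", "Bearer " + apiKey)
                .GET()
                .build();
        String dashboardJson = client.send(get, HttpResponse.BodyHandlers.ofString()).body();

        // 2. edit the JSON here: add the new panel to the desired row.
        //    The POST body must be of the form {"dashboard": {...}, "overwrite": true}.
        String updated = dashboardJson; // placeholder for the actual JSON manipulation

        // 3. push the modified dashboard back
        HttpRequest post = HttpRequest.newBuilder()
                .uri(URI.create(grafana + "/api/dashboards/db"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(updated))
                .build();
        client.send(post, HttpResponse.BodyHandlers.ofString());
    }
}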
You can use regexp patterns in InfluxDB 0.8 (see also the equivalent InfluxDB 0.9 docs) to match all your newly added hosts. InfluxDB regexps use the Golang syntax.
For example, to match all series whose names start with stats.cpu followed by a number:
series: /^stats\.cpu\d+/
select: avg(load)
However, this way you won't get a new plot for each newly added host, but rather a line for every host in the same plot.
You have to add a regex in your select clause:
SELECT mean(value) FROM /logstash.*.requests.count/ WHERE $timeFilter
GROUP BY time($interval)
The above query will automatically plot each series matching the regex, for all hosts, without changing Grafana. For example, given series such as:
logstash.ABC1.requests.count
logstash.ABC2.requests.count
logstash.ABC3.requests.count
When a host ABC4 is added and its metrics are shipped correctly, a new series will be plotted automatically.

How do I handle large amounts of logfile data for display in dynamic charts?

I have a lot of logfile data that I want to display in dynamic graphs, for basically arbitrary time periods, optionally filtered or aggregated by different columns (which I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, given that:
the time resolution should be variable from one second to a year
there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' in which the connection was created or finished
Are there best practices, or tools/plugins for Rails, that help handle this kind and amount of data? Are there database engines specifically tailored to this, or with helpful features (e.g. CouchDB indexes)?
EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered: running a query for each bucket and merging in the app (probably way too slow); GROUP BY timestamp/granularity (does not count connections correctly); preprocessing the data into rows at the smallest granularity and downsampling at query time (probably the best way).
I think you can use MySQL timestamps for this.
The way I solved it in the end was to pre-process the data into per-minute buckets, so there's one row for every event and minute. That makes selecting easy and fast enough, and it yields correct results. To get a different granularity, you can do integer arithmetic on the timestamp column: select floor(timestamp/factor)*factor and group by the same expression.
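As a concrete illustration (table and column names are made up, and the timestamp is assumed to be a Unix epoch integer; factor = 3600 rolls the per-minute rows up into hourly buckets):

SELECT floor(ts/3600)*3600 AS bucket, count(*) AS connections FROM connection_minutes GROUP BY bucket;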
