Take the median of a grouped set - influxdb

I am quite new to Flux and want to solve the following problem:
I have a bucket containing measurements that are generated by a worker service.
Each measurement belongs to a site and has an identifier (UUID). Each measurement contains three measurement points, each holding a value.
What I want to achieve is the following: create a graph/list/table of measurements for a specific site, aggregating the three measurement points of each measurement into their median value.
TL;DR:
Get all measurement points that belong to the specific site UUID.
As each measurement has a UUID and contains three measurement points, group by measurement and take the median for each measurement.
Return a result that only contains the median value for each measurement.
This does not work:
from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "lighthouse")
|> filter(fn: (r) => r["_field"] == "speedindex")
|> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
|> group(columns: ["measurement"])
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
This does not throw an error, but it of course does not take the median of the specific groups.
This is the result (simple table):

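If I understand your question correctly, you want a single number per group rather than one value per time window.
In that case, apply a bare aggregate such as |> mean() instead of aggregateWindow (swap in |> median() if you want the median rather than the mean):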
from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "lighthouse")
|> filter(fn: (r) => r["_field"] == "speedindex")
|> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
|> group(columns: ["measurement"])
|> mean()
|> yield(name: "mean")
The aggregateWindow function aggregates your values over (multiple) windows of time. The script you posted computes the mean over each v.windowPeriod (in this case 20 minutes).
I am not entirely sure what v.windowPeriod represents, but I usually use time literals for all times (including start and stop); I find it easier to understand how the query relates to the result that way.
On a side note: the yield function only names your result and allows a script to return multiple results; it does not compute anything.
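Since the question asks for the median rather than the mean, here is a minimal sketch of the same query using the built-in median() aggregate. It assumes that "measurement" really is the tag carrying each measurement's UUID (as the group() call in the question suggests) and that speedindex is stored as a float, which median() requires. Treat it as a starting point, not a verified solution for this bucket.

```flux
from(bucket: "test")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "lighthouse")
    |> filter(fn: (r) => r["_field"] == "speedindex")
    |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
    // Assumption: "measurement" is the tag that holds each measurement's UUID.
    |> group(columns: ["measurement"])
    // One output row per group. "exact_mean" computes the median exactly
    // instead of using the default t-digest estimate.
    |> median(method: "exact_mean")
    |> yield(name: "median")
```

Because median() is a bare aggregate, each group collapses to a single row and the _time column is dropped. If you need a timestamp on every row, median(method: "exact_selector") returns an actual input row instead.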

Related

Break down a curve showing accumulated consumption per day

I'm using InfluxDB 2 and I've got the following curve:
Instead of showing the total accumulated, I want to know the consumption per day.
from(bucket: "my-bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "kWh")
|> filter(fn: (r) => r["entity_id"] == "plug_energy")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: 1d, fn: sum, createEmpty: false)
I thought this would work, but it actually gives me the number of times the smart plug was turned on each day, not the energy consumed per day.
I have found this similar question, but it looks like a very complicated solution for something that should be simpler.
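One possible approach, sketched under the assumption that the "value" field is a running total that only ever increases: keep the last reading of each day, then subtract consecutive daily totals with difference(). This is an illustration of the idea, not a tested answer for this dataset.

```flux
from(bucket: "my-bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "kWh")
    |> filter(fn: (r) => r["entity_id"] == "plug_energy")
    |> filter(fn: (r) => r["_field"] == "value")
    // Keep one reading per day: the last accumulated total.
    |> aggregateWindow(every: 1d, fn: last, createEmpty: false)
    // Subtract the previous day's total to get that day's consumption.
    |> difference()
```

difference() drops the first row, because it has no predecessor. If the counter can reset to zero, difference(nonNegative: true) avoids negative values at the reset.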

How does InfluxDB sort and draw the line by default?

I have a simple query for fetching some data like:
from(bucket: "almon")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "binary")
|> filter(fn: (r) => r["_field"] == "duration_mili")
|> group()
|> yield(name: "mean")
and the graph I get is
What I don't understand is why the data points are sorted by _time but the line does not seem to follow that order. After exploring the data, it seems that the line is drawn in the order of the sorted tags. Why is that, and is it documented somewhere? What determines the order in which the line is drawn on the graph?
By default InfluxDB returns data grouped by measurement+tags and then within those groups it is sorted by _time.
Because you called group() you removed that default grouping, but that doesn't force a re-sorting, so you still have the data ordered by groups even though it's no longer separated by groups.
If you add |> sort(columns: ["_time"]) after your group() that should take care of things for you.
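Applied to the query from the question, that looks roughly like this (the result name "sorted" is just an illustrative choice):

```flux
from(bucket: "almon")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "binary")
    |> filter(fn: (r) => r["_field"] == "duration_mili")
    // Merge all series into a single table...
    |> group()
    // ...then restore chronological order across the merged rows.
    |> sort(columns: ["_time"])
    |> yield(name: "sorted")
```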

What is the query to select multiple columns and group by one of the columns in InfluxDB using Flux?

How do I write a similar query using Flux?
SELECT field_a,field_b from 'measurement' where field_a = 10 and group by field_b
I'm afraid that the InfluxQL above won't work, as InfluxDB currently supports only tags and time intervals in the GROUP BY clause, not fields. This can be inferred from the syntax of the GROUP BY clause (for more information, refer to the InfluxDB documentation).
Nevertheless, suppose you are grouping by a tag instead:
SELECT field_a, tag_b FROM "measurement" WHERE field_a = 10 GROUP BY tag_b
This is the equivalent Flux query:
from(bucket: "thisIsYourBucketInInfluxDBV2")
// start: 0 queries from all time, like an InfluxQL query without a time filter. Use it just as cautiously.
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "measurement" and r._field == "field_a" and r._value == 10)
// Group by the tag, as GROUP BY tag_b does in InfluxQL.
|> group(columns: ["tag_b"])
Here is a guide for you to migrate your InfluxQL to Flux.
You can query several fields using a regex. And you can group by fields if you pivot your result table using the schema.fieldsAsCols() function; that way, the result of the query has columns that have the names of the queried fields. See this query:
import "influxdata/influxdb/schema"
from(bucket: "yourBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "measurement")
|> filter(fn: (r) => r["_field"] =~ /^(field_a|field_b)$/)
|> aggregateWindow(every: v.windowPeriod, fn: first, createEmpty: false)
//|> group()
//|> sort(columns: ["_time"])
|> schema.fieldsAsCols()
|> filter(fn: (r) => r.field_a == 10)
|> group(columns: ["field_b"])
//|> max(column: "field_a")
|> yield()
Two remarks:
To make sure that you have only one table before you group by field_b, uncomment the lines |> group() and |> sort(columns: ["_time"]). The first ungroups the result, which is otherwise split into separate tables by the values of your tags (if you have any). The second sorts the ungrouped result by timestamp.
Since there is no aggregation in your initial query, the Flux query outputs several result tables, one for each distinct value of field_b. If you are, for example, interested in the max of field_a for every group, uncomment the line before |> yield().

InfluxDB Sum Messages per hour

I am writing counters to InfluxDB per time period (each value is the delta of newly seen messages of that type since the previous submission). I would like to sum the message counts over a time period to get messages per hour (or per some other period).
I have the query below, based on https://docs.influxdata.com/influxdb/cloud/reference/flux/stdlib/built-in/transformations/aggregates/sum/:
from(bucket: "ServiceStats")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "PolygonIoStream")
|> filter(fn: (r) => r["_field"] == "aggregatesCounter" or r["_field"] == "quotesCounter" or r["_field"] == "statusesCounter" or r["_field"] == "tradesCounter")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> Sum()
|> yield(name: "mean")
However, I get the error: runtime error #6:6-6:74: aggregateWindow: missing time column "_time"
I will be honest: this is my first query and I have quickly gotten out of my depth. Pointers much appreciated.
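A hedged sketch of one way to get per-hour totals: pass sum as the aggregate function to aggregateWindow rather than chaining a separate sum() call. A bare aggregate such as sum() collapses each table to a single row without a _time column, which is the kind of situation in which aggregateWindow reports a missing "_time" column. Flux identifiers are also case-sensitive, so the built-in is sum(), not Sum(). The result name "messages_per_hour" is an illustrative choice.

```flux
from(bucket: "ServiceStats")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "PolygonIoStream")
    |> filter(fn: (r) => r["_field"] == "aggregatesCounter" or r["_field"] == "quotesCounter" or r["_field"] == "statusesCounter" or r["_field"] == "tradesCounter")
    // Add up the submitted deltas inside each one-hour window, per field.
    |> aggregateWindow(every: 1h, fn: sum, createEmpty: false)
    |> yield(name: "messages_per_hour")
```

Change every: 1h to another duration (for example 1d) to get totals for a different period.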

How to create an InfluxDB alert for deviating from average hourly values?

I have been trying to find documentation on more complex Flux queries, but after days of searching I'm still lost. I want to calculate average values for each hour of the week and then, when new data comes in, check whether it deviates by x standard deviations from the average for that hour.
Basically, I want 24x7 values, each representing the mean/median for one hour of the week over the last year. Then I want to compare the last day's values for each hour against these averages and report an error. I do not understand how to calculate these averages. Is there some hidden extensive documentation on Flux?
I don't really need a full solution; some direction would be nice. For example, are there utility functions for this in the standard library?
EDIT: After some reading, it looks like all I need are the window and aggregateWindow functions, but I haven't yet found out exactly how to use them.
OK, so this is what worked for me. It needs some cleaning up, but it groups the values per hour and weekday and computes the mean of each group:
import "date"
tab1 = from(bucket: "qweqwe")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "asdasd")
|> filter(fn: (r) => r["_field"] == "reach")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
mapped = tab1
|> map(fn: (r) => ({ r with wd: string(v: date.weekDay(t: r._time)), h: string(v: date.hour(t: r._time)) }))
|> map(fn: (r) => ({ r with mapped_time: r.wd + " " + r.h }))
grouped = mapped
|> group(columns: ["mapped_time"], mode: "by")
|> mean()
|> group()
|> toInt()
|> yield()
