InfluxDB Sum Messages per hour

I am writing counters to InfluxDB per time period (the delta between each submission, i.e. the number of new messages of that type seen since the last write). I would like to sum the total count of messages over a time period to give messages per hour (or other time periods).
I have the query below, based on https://docs.influxdata.com/influxdb/cloud/reference/flux/stdlib/built-in/transformations/aggregates/sum/:
from(bucket: "ServiceStats")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "PolygonIoStream")
|> filter(fn: (r) => r["_field"] == "aggregatesCounter" or r["_field"] == "quotesCounter" or r["_field"] == "statusesCounter" or r["_field"] == "tradesCounter")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> Sum()
|> yield(name: "mean")
However I get the error: runtime error #6:6-6:74: aggregateWindow: missing time column "_time"
I will be honest: as this is my first query, I have quickly gotten out of my depth - pointers much appreciated.
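Two things stand out: Flux function names are lowercase (sum(), not Sum()), and summing the per-period deltas into hourly buckets can be done directly inside aggregateWindow. A sketch of that approach (untested against this schema):

```flux
from(bucket: "ServiceStats")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "PolygonIoStream")
  |> filter(fn: (r) =>
      r["_field"] == "aggregatesCounter" or
      r["_field"] == "quotesCounter" or
      r["_field"] == "statusesCounter" or
      r["_field"] == "tradesCounter")
  // sum the per-period deltas into fixed one-hour buckets
  |> aggregateWindow(every: 1h, fn: sum, createEmpty: false)
  |> yield(name: "sum_per_hour")
```

Using fn: sum inside aggregateWindow keeps the _time column intact, which a bare aggregate call after a windowing step does not.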

Related

Break down a curve showing accumulated consumption per day

I'm using InfluxDB 2 and I've got the following curve:
Instead of showing the total accumulated, I want to know the consumption per day.
from(bucket: "my-bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "kWh")
|> filter(fn: (r) => r["entity_id"] == "plug_energy")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: 1d, fn: sum, createEmpty: false)
I thought this would work, but it actually gives me the number of times the smart plug was turned on each day, not the energy consumed per day.
I have found this similar question, but it looks like a very complicated solution for something that should be simpler.
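Assuming the kWh value is a cumulative (monotonically increasing) counter, the consumption per day is the difference between the last reading of each day, not the sum of readings. A sketch under that assumption:

```flux
from(bucket: "my-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["entity_id"] == "plug_energy")
  |> filter(fn: (r) => r["_field"] == "value")
  // take the last accumulated reading of each day...
  |> aggregateWindow(every: 1d, fn: last, createEmpty: false)
  // ...then the day-to-day difference is the daily consumption
  |> difference(nonNegative: true)
```

nonNegative: true guards against counter resets (e.g. the plug being power-cycled) producing negative daily values.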

Compute time spent at a given location

I have saved my location in a bucket with latitude and longitude.
So far, I've been able to get the result to show nicely in a table with both latitude and longitude with the following code:
import "experimental/geo"
from(bucket: "home-assistant")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["entity_id"] == "maxime")
|> filter(fn: (r) => r["_field"] == "latitude" or r["_field"] == "longitude")
|> keep(columns: ["_time", "_value", "_field"])
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
Now, let's say I want to know how much time I've been home per day.
I feel like it should be doable, but seeing this issue, I'm suddenly not so sure about it.
What I need, I believe, is a mix of geo.filterRows (so that I can use a certain radius to designate my house) and maybe elapsed? But if that's the case, I haven't managed to wrap my head around it to get the expected result.
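One possible shape for this, building on the pivot above: keep only the points near home, then sum the time between consecutive "at home" points per day. This is a sketch with made-up home coordinates and a crude bounding box instead of the geo package, and it assumes the location is recorded often enough that the gap between consecutive at-home points approximates time spent at home:

```flux
import "math"

from(bucket: "home-assistant")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["entity_id"] == "maxime")
  |> filter(fn: (r) => r["_field"] == "latitude" or r["_field"] == "longitude")
  |> keep(columns: ["_time", "_value", "_field"])
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  // crude "at home" box around hypothetical coordinates (~100 m)
  |> filter(fn: (r) =>
      math.abs(x: r.latitude - 48.8566) < 0.001 and
      math.abs(x: r.longitude - 2.3522) < 0.001)
  // seconds between consecutive at-home points
  |> elapsed(unit: 1s)
  |> aggregateWindow(every: 1d, fn: sum, column: "elapsed", createEmpty: false)
```

geo.filterRows with a radius would be the more precise replacement for the bounding-box filter, but the box version avoids the geo package's expectations about field names.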

Take the median of a grouped set

I am quite new to Flux and want to solve an issue:
I got a bucket containing measurements, which are generated by a worker-service.
Each measurement belongs to a site and has an identifier (uuid). Each measurement contains three measurement points containing a value.
What I want to achieve now is the following: create a graph/list/table of measurements for a specific site and aggregate the median value of each of the three measurement points per measurement.
TL;DR:
Get all measurement points that belong to the specific site uuid.
As each measurement has a uuid and contains three measurement points, group by measurement and take the median for each measurement.
Return a result that only contains the median value for each measurement.
This does not work:
from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "lighthouse")
|> filter(fn: (r) => r["_field"] == "speedindex")
|> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
|> group(columns: ["measurement"])
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
This does not throw an error, but it of course does not take the median of the specific groups.
This is the result (simple table):
If I understand your question correctly, you want a single number to be returned.
In that case you'll want to use the |> mean() function:
from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "lighthouse")
|> filter(fn: (r) => r["_field"] == "speedindex")
|> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
|> group(columns: ["measurement"])
|> mean()
|> yield(name: "mean")
The aggregateWindow function aggregates your values over (multiple) windows of time. The script you posted computes the mean over each v.windowPeriod (in this case 20 minutes).
I am not entirely sure what v.windowPeriod represents, but I usually use time literals for all times (including start and stop); I find it easier to understand how the query relates to the result that way.
On a side note: the yield function only names your result and allows you to have multiple queries returning results; it does not compute anything.
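Since the question asks for the median specifically: Flux also provides a median() aggregate, so the grouped median would be (same structure as above, a sketch):

```flux
from(bucket: "test")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "lighthouse")
  |> filter(fn: (r) => r["_field"] == "speedindex")
  |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
  // group by the measurement uuid tag ("measurement" here, not "_measurement")
  |> group(columns: ["measurement"])
  |> median()
  |> yield(name: "median")
```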

Difference in performance or execution between single vs multiple, chained lambdas in Flux

In Influx Flux, is there a technical difference (like in execution or performance) between setting a filter operation in a single statement vs. using multiple, chained statements?
For example, the single statement:
from(bucket: "example-bucket")
|> range(start: -1h)
|> filter(fn: (r) =>
r._measurement == "example-measurement" and
r._field == "example-field" and
r.tag == "example-tag")
... versus using multiple, chained lambdas:
from(bucket: "example-bucket")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "example-measurement")
|> filter(fn: (r) => r._field == "example-field")
|> filter(fn: (r) => r.tag == "example-tag")
Perhaps both are executed identically, but I cannot find anything canonical in the docs on it, although the examples seem to prefer the first form.
I understand that a logical OR across fields doesn't translate to the second, chained form. Let's assume for this question it's all AND.

How do I "check" (alert on) an aggregate in InfluxDB 2.0 over a rolling window?

I want to raise an alarm when the count of a particular kind of event is less than 5 for the 3 hours leading up to the moment the check is evaluated, but I need to run this check every 15 minutes.
Since I need to check more frequently than the span of time I'm measuring, I can't do this based on my raw data (according to the docs, "[the schedule] interval matches the aggregate function interval for the check query"). But I figured I could use a task to transform my data into a form that would work.
I was able to aggregate the data in the way that I hoped via a flux query, and I even saved the resultant rolling count to a dashboard.
from(bucket: "myBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
(r._measurement == "measurementA"))
|> filter(fn: (r) =>
(r._field == "booleanAttributeX"))
|> window(
every: 15m,
period: 3h,
timeColumn: "_time",
startColumn: "_start",
stopColumn: "_stop",
createEmpty: true,
)
|> count()
|> yield(name: "count")
|> to(bucket: "myBucket", org: "myOrg")
This results in the following scatterplot.
My hope was that I could just copy-paste this as a new task and get my nice new aggregated dataset. After resolving a couple of legible syntax errors, I settled on the following task definition:
option v = {timeRangeStart: -12h, timeRangeStop: now()}
option task = {name: "blech", every: 15m}
from(bucket: "myBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
(r._measurement == "measurementA"))
|> filter(fn: (r) =>
(r._field == "booleanAttributeX"))
|> window(
every: 15m,
period: 3h,
timeColumn: "_time",
startColumn: "_start",
stopColumn: "_stop",
createEmpty: true,
)
|> count()
|> yield(name: "count")
|> to(bucket: "myBucket", org: "myOrg")
Unfortunately, I'm stuck on an error that I can't find any mention of anywhere: could not execute task run; Err: no time column detected: no time column detected.
If you could help me debug this task run error, or sidestep it by accomplishing this task in some other manner, I'll be very grateful.
I know I'm late here, but: the to function needs a _time column, and the count aggregate you are adding returns _start and _stop columns to indicate the time frame for the count, not a _time column.
You can solve this either by adding |> duplicate(column: "_stop", as: "_time") just before your to function, or by using the aggregateWindow function, which handles this for you:
|> aggregateWindow(every: 15m, fn: count)
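Putting that together, the whole task might look like this (a sketch, untested; the task name is made up). Note that aggregateWindow also accepts a period parameter, so the 3-hour rolling window evaluated every 15 minutes can be expressed directly:

```flux
option task = {name: "rolling-count", every: 15m}

from(bucket: "myBucket")
  |> range(start: -3h)
  |> filter(fn: (r) => r._measurement == "measurementA")
  |> filter(fn: (r) => r._field == "booleanAttributeX")
  // 3h rolling windows, advanced every 15m; aggregateWindow restores _time
  |> aggregateWindow(every: 15m, period: 3h, fn: count, createEmpty: true)
  |> to(bucket: "myBucket", org: "myOrg")
```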
References:
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/count
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/duplicate/
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/aggregatewindow/