InfluxDB - Aggregate data on state duration

I have a measurement in InfluxDB that keeps track of the status of a system. For example, consider the following measures:
03/22/18 00:00 STATUS_A
03/22/18 09:00 STATUS_B
03/22/18 13:00 STATUS_C
03/22/18 18:00 STATUS_B
03/22/18 19:00 STATUS_D
03/22/18 21:00 STATUS_A
What I need to do now is to derive how long the system was in each state every day. In the above example, the desired result is something like
STATUS_A 12h (from 00:00 to 09:00 and from 21:00 to 24:00)
STATUS_B 5h (from 09:00 to 13:00 and from 18:00 to 19:00)
STATUS_C 5h (from 13:00 to 18:00)
STATUS_D 2h (from 19:00 to 21:00)
I'm very new to the TICK stack, so I could be missing something quite elementary. I was thinking of using Kapacitor to create the aggregate result, but I don't really know how to obtain it.

You mentioned TICK stack so I assume you are using InfluxDB version 1.
It is feasible to report the total duration for a given state in InfluxDB, but only (easily) in Flux, not InfluxQL.
You could try these steps:
Enable Flux in v1.8 by setting flux-enabled = true in the [http] section of the configuration
Sample Flux could be:
from(bucket: "yourDatabaseName/autogen")
|> range(start: 2018-03-22T00:00:00Z, stop: 2018-03-23T00:00:00Z)
|> filter(fn: (r) => r._measurement == "yourMeasurementName")
|> stateDuration(fn: (r) => r._value == "STATUS_A", column: "state", unit: 1h)
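Note that stateDuration() yields a running duration within each consecutive stretch of the state, not a daily total. If you need the total time per state per day, one option (a sketch, assuming the status string is stored in _value; events.duration() comes from a contrib package) is to compute the time each point lasts until the next point and sum it per state:
import "contrib/tomhollingworth/events"

from(bucket: "yourDatabaseName/autogen")
|> range(start: 2018-03-22T00:00:00Z, stop: 2018-03-23T00:00:00Z)
|> filter(fn: (r) => r._measurement == "yourMeasurementName")
// duration (in minutes) until the next point; the last point runs until the range stop
|> events.duration(unit: 1m, columnName: "duration")
// total minutes per state for the day
|> group(columns: ["_value"])
|> sum(column: "duration")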
It is still not possible to do this in InfluxQL, though the community has been asking for it for a while. See more details here.

Related

InfluxDB notifications not running

Create a check. Schedule every 1m offset 8s. CRIT when value is above 20 -- which it is.
Create a notification endpoint: HTTP POST to nc running on http://127.0.0.1:8087 with no authentication.
while true; do cat 201.txt | nc -l -p 8087 -q 1 | dos2unix >> in.txt; echo -e "\n----" >> in.txt ; done
Where 201.txt is
HTTP/1.1 201 Created
Server: netcat
Content-Type: text/plain; charset=UTF-8

Created
Checked working from REST client.
Create a notification rule: run every 2m offset 21s when status is equal to CRIT and message to endpoint.
Nothing happens!
The check has a green tick next to it saying success.
The view history shows a graph with a red horizontal line at the threshold and a blue line for the data. There are two rows in the table below showing runs 90 and 100 minutes ago, and corresponding vertical red lines on the graph.
Suspiciously, when clicking through to the task, the red threshold band is not shown on the chart. Clicking to edit shows this definition:
import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"
data = from(bucket: "zigbee")
|> range(start: -1m)
|> filter(fn: (r) => r["room"] == "Outside")
|> filter(fn: (r) => r["_field"] == "temperature")
|> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
option task = {name: "Test outside 20", every: 1m, offset: 8s}
check = {_check_id: "09b167b262e86000", _check_name: "Test outside 20", _type: "threshold", tags: {}}
crit = (r) => r["temperature"] > 20.0
messageFn = (r) => "Check: ${ r._check_name } is: ${ r._level }. Value is ${r._value}"
data |> v1["fieldsAsCols"]() |> monitor["check"](data: check, messageFn: messageFn, crit: crit)
(I tried lengthening the aggregate window in case it was too small to get any data points, but it hasn't made a difference.)
The notification endpoint history has two entries, also from 90 and 100 minutes ago (new data points have been received since then), both with an orange triangle in the SENT column.
The notification rule has a green tick next to it and the same two history entries with orange triangles.
What's going on?!
The orange triangles are probably a red herring because no new ones are appearing -- nothing is triggering. Irritatingly, there appears to be no way to find out what the orange triangles mean.
Just checking through it again, there was an orange triangle next to the alert, but when I refreshed the page it went back to a green tick. There are no new rows in the history table.
Update
If I edit the task and then save it, the UI goes to a secret Task page that shows logs.
could not execute task run: runtime error #16:33-16:96: check: failed to evaluate map function: 14:71-14:82: interpolated expression produced a null value
That points at my message function: apparently there is no such thing as ${r._value}? Presumably v1.fieldsAsCols() pivots the value into a column named after the field (temperature here), so ${r._value} interpolates to null.
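If so, a messageFn that references the pivoted column instead (a guess on my part, under that assumption) would look like:
messageFn = (r) => "Check: ${r._check_name} is: ${r._level}. Value is ${r.temperature}"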

Combining two different sources with different timestamps in influxdb/flux

I have 2 measurements as follows:
metrics,app=app1 cpu=10 1654150510
metrics,app=app1 cpu=12 1654150512
metrics,app=app1 cpu=13 1654150514
metrics,app=app1 cpu=14 1654150516
metrics,app=app1 cpu=15 1654150519
The "metrics" measurement arrives about once every 2-3 seconds.
And the second one is:
http_requests,app=app1 response_time=12 1654150509
http_requests,app=app1 response_time=11 1654150510
http_requests,app=app1 response_time=15 1654150511
http_requests,app=app1 response_time=14 1654150512
http_requests,app=app1 response_time=13 1654150513
http_requests,app=app1 response_time=10 1654150514
http_requests,app=app1 response_time=12 1654150515
http_requests,app=app1 response_time=11 1654150516
http_requests,app=app1 response_time=13 1654150517
http_requests,app=app1 response_time=12 1654150518
The http_requests measurement arrives about once per second.
I want to combine the 2 metrics into a single table.
_time,value_cpu,value_response_time
1654150509,10,12
1654150510,10,11
1654150511,12,15
As the timestamps may differ, is there a way to combine them in Flux? Is fill() the way? I'm not sure whether timeShift() will help here, although I didn't understand it completely. I assume some sort of downsampling is needed (not sure how to do that in Flux either). Is there a way to match the measurements based on the closest time difference?
I.e., if response measurements came at time instances
1654150510,app=app1 response_time=10
1654150513,app=app1 response_time=12
1654150514,app=app1 response_time=11
1654150516,app=app1 response_time=13
and CPU came in at
1654150512,app=app1 cpu=20
1654150515,app=app1 cpu=30
Then resulting table is
_time,response_time,cpu
1654150510,10,
1654150513,12,20
1654150514,11,
1654150516,13,30
Each CPU value is joined to the response_time point with the closest timestamp (smallest absolute time difference).
How can this be achieved with Flux in InfluxDB?
I guess downsampling with aggregateWindow() and fill() could work, as sketched below.
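A minimal sketch of that idea (assuming the same bucket and measurements as in the query further below) could snap both series onto a common one-second grid and carry the last value forward:
from(bucket: "stackx")
|> range(start: -1d)
|> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
// snap every series onto a common 1s grid, keeping the last value in each window
|> aggregateWindow(every: 1s, fn: last, createEmpty: true)
// carry the previous value forward into empty windows
|> fill(usePrevious: true)
|> drop(columns: ["_measurement"])
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")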
An alternative way is to pivot and then fill missing values using the previous value. The advantage, at least from a performance point of view, is that when there is no record in either measurement at a given time, no new row filled with previous values is created.
With
from(bucket: "stackx")
|> range(start: -1d)
|> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
|> drop(columns: ["_measurement"]) // or remove from group other way
|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
|> sort(columns: ["_time"], desc: false)
|> fill(column: "cpu", usePrevious: true)
|> fill(column: "response_time", usePrevious: true)
the result would be
_time,app,cpu,response_time
2022-06-02T06:15:09Z,app1,,12
2022-06-02T06:15:10Z,app1,10,11
2022-06-02T06:15:11Z,app1,10,15
2022-06-02T06:15:12Z,app1,12,14
2022-06-02T06:15:13Z,app1,12,13
2022-06-02T06:15:14Z,app1,13,10
2022-06-02T06:15:15Z,app1,13,12
2022-06-02T06:15:16Z,app1,14,11
2022-06-02T06:15:17Z,app1,14,13
2022-06-02T06:15:18Z,app1,14,12
2022-06-02T06:15:19Z,app1,15,12

Improve InfluxDB response time

I am trying to improve the response time of Grafana and InfluxDB.
Hopefully someone can give me some support.
My setup is:
OS: Windows 10
Docker version 20.10.5
Docker-Container: Grafana v7.5.5
Docker-Container: InfluxDB Version 2.0.5
There are different Dashboards, with different numbers of panels.
One Grafana dashboard, for example, has 24 graph panels with a refresh rate of 500ms/1s/2s, which should update the panels quickly.
I know that a 500ms refresh rate is high, but the same behavior is seen with 1s, 2s, and 5s.
The queries use this scheme:
from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) =>
r._measurement == "DEVICE1" and
r.COMMON == "VALUE1"
)
|> set(key: "_field", value: "")
|> set(key: "COMMON", value: "DEVICE 1")
|> aggregateWindow(every: v.windowPeriod, fn: last)
What I observe is that some graphs update very slowly.
I have observed the traffic with the Chrome browser.
There are some request states, like "canceled" or "pending", that seem to slow down the response.
A second observation is that some queries take up to 500 ms to process.
So my main question is how I could improve the response time when a Grafana dashboard requests a query from InfluxDB.
Update:
With these lines before a query, different statistics (TotalDuration, CompileDuration, QueueDuration, ...) about the execution will be enabled:
import "profiler"
option profiler.enabledProfilers = ["query", "operator"]
Is it possible to plot these statistics?
Thanks in advance.
Source:
https://docs.influxdata.com/flux/v0.x/stdlib/profiler/

How can I query an aggregation of values grouped by hour of day from InfluxDB?

Question
I have an InfluxDB v1.8 running on a Raspberry Pi 2.
I have a measurement "gridElectricityMeter" that has (next to other fields not important here) a field "import" which contains the total reading of the energy meter in Watt-hours. The measurement receives a new value every 10 seconds.
I would like to create a Bar chart that shows the imported amount of energy by hour of day in some time frame to be specified (I am using Grafana to create the chart, so the time frame is set there).
E.g. if the raw data in InfluxDB looks like this:
time                    import
2021-01-01T00:00:00Z    0
2021-01-01T01:00:00Z    2
2021-01-01T01:20:00Z    8
2021-01-01T02:00:00Z    10
2021-01-02T00:00:00Z    10
2021-01-02T01:00:00Z    20
(In reality there are of course far more values, one per 10 seconds. A slight lack of precision because the timestamps are likely a few seconds off the full hour is acceptable.)
Then I want the following result:
hour   sum   explanation
0      12    2 (first day) + 10 (second day)
1      8     8 (first day)
2      0     0
3      0     0
...    ...   ...
Data should be filled with zero if unavailable (e.g. querying "today", of course there is no data for the time after "now").
What I have tried so far
As far as I understand, I won't be able to do this with InfluxQL, so here is what I tried using Flux. (Also, I think I understood that InfluxDB >= 2.0 is 64-bit only and thus won't run on a Raspberry Pi 2.)
Here is what I came up with:
import "date"
import "generate"
// generate a table with 24 entries (hours 0-23) and "sum=0":
initial = generate.from(
count: 24,
fn: (n) => n,
// start and stop are actually irrelevant, but they are required args
start: 2021-01-01T00:00:00Z,
stop: 2021-01-02T00:00:00Z,
)
|> map(fn: (r) => ({hour: r._value, sum: 0}))
// First group data by day and hour
data = from(bucket: "myDatabase")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "gridElectricityMeter" and r._field == "import")
|> map(fn: (r) => ({r with hour: date.hour(t: r._time), group: date.truncate(t: r._time, unit: 1h)}))
|> group(columns: ["group"])
// In each group we get the first&last point so that we get the first and last reading from the energy meter. The difference between those values is the energy imported.
first = data
|> first()
last = data
|> last()
// Calculate energy used in each group and then regroup by hour (and not day):
byhour = join(tables: {first: first, last: last}, on: ["group"])
|> map(fn: (r) => ({r with hour: r.hour_first, _value: r._value_last - r._value_first}))
|> group(columns: ["hour"])
// manual summing because of https://github.com/influxdata/flux/issues/2505
|> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r._value}), identity: {sum: 0})
// Fill hours we have no data for with zero:
union(tables: [byhour, initial])
|> group(columns: ["hour"])
|> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r.sum}), identity: {sum: 0})
|> group()
|> sort(columns: ["hour"])
It does seem to work and actually does what I want it to do. However, it seems ridiculously complicated, and it is also slow: querying data for a single day takes about 7 seconds. A single day contains 6 * 60 * 24 = 8640 data points in that measurement, which really does not sound like much to me; it should not be that slow.
Is there a better way of doing it?
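A shorter variant of the same first/last idea might use spread() (max minus min per window) instead of the join — a sketch, assuming the meter reading only ever increases, and without the zero-filling of missing hours:
import "date"

from(bucket: "myDatabase")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "gridElectricityMeter" and r._field == "import")
// energy per clock hour = max - min of the meter reading within that hour
|> aggregateWindow(every: 1h, fn: spread, timeSrc: "_start")
// regroup by hour of day and sum across days
|> map(fn: (r) => ({r with hour: date.hour(t: r._time)}))
|> group(columns: ["hour"])
|> sum()
|> group()
|> sort(columns: ["hour"])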

influxdb 2.0 storage measurements utilization

We have an InfluxDB 2.0 instance that is quickly growing in size (I suspect Docker metrics), so I would like to find out which measurements are "spamming".
TL;DR: how do we find the measurements that take up most of the disk space in InfluxDB 2.0?
Long version:
I know old InfluxDB had a stats _internal database with some relevant metrics. I found _monitoring in 2.0, but mine is kinda empty; there is only a write_errors measurement in it.
I tried a bunch of Flux queries like the one below, but I don't think they give what I'm looking for:
from(bucket: "telegraf")
|> range(start: -15m)
|> filter(fn: (r) => r._measurement == "docker_container_blkio")
|> count()
I tried InfluxQL queries via the v1 API, like:
SELECT COUNT(system) FROM telegraf
but that gave:
{
    "results": [
        {
            "statement_id": 0
        }
    ]
}
I played a bit with influxd inspect export-index; that would also be useful if it gave me some stats about the measurements.
I saw InfluxDB 1.8 had this: https://docs.influxdata.com/influxdb/v1.8/tools/influx_inspect/#report-disk
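Another angle that might help narrow it down (a sketch; series cardinality is only a rough proxy for disk usage, not actual bytes) is influxdb.cardinality() with a per-measurement predicate:
import "influxdata/influxdb"

// distinct series for one suspect measurement over the last 30 days
influxdb.cardinality(
bucket: "telegraf",
start: -30d,
predicate: (r) => r._measurement == "docker_container_blkio",
)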
