Combining two different sources with different timestamps in influxdb/flux - time-series

I have 2 measurements as follows:
metrics,app=app1 cpu=10 1654150510
metrics,app=app1 cpu=12 1654150512
metrics,app=app1 cpu=13 1654150514
metrics,app=app1 cpu=14 1654150516
metrics,app=app1 cpu=15 1654150519
The frequency of the "metrics" measurement is about once in 2/3 seconds.
And the second one is:
http_requests,app=app1 response_time=12 1654150509
http_requests,app=app1 response_time=11 1654150510
http_requests,app=app1 response_time=15 1654150511
http_requests,app=app1 response_time=14 1654150512
http_requests,app=app1 response_time=13 1654150513
http_requests,app=app1 response_time=10 1654150514
http_requests,app=app1 response_time=12 1654150515
http_requests,app=app1 response_time=11 1654150516
http_requests,app=app1 response_time=13 1654150517
http_requests,app=app1 response_time=12 1654150518
The frequency for http_requests is about 1 second.
I want to combine the 2 metrics into a single table.
_time,value_cpu,value_response_time
1654150509,10,12
1654150510,10,11
1654150511,12,15
As timestamps may be different, is there a way to combine them in flux? Is fill the way. I'm not sure if timeshift will help here. Although I didnt understand it completly. I assume some sort of downsampling is needed (not sure how to do that either in flux). Is there a way to mathch the measuerment based on the closest time differece?
IE...
if response measurements came at time instances
1654150510,app=app1 response_time=10
1654150513,app=app1 response_time=12
1654150514,app=app1 response_time=11
1654150516,app=app1 response_time=13
and CPU came in at
1654150512,app=app1 cpu=20
1654150515,app=app1 cpu=30
Then resulting table is
_time,response_time,cpu
1654150510,10,
1654150513,12,20
1654150514,11,
1654150516,13,30
The CPU value combines to the point with the closest timestamp (+/- difference)
How can this be achieved in flux in influxdb?

I guess downsampling with aggregateWindow and fill could work.
Alternative way is to pivot and then fill missing values using previous value. The advantage, at least from performance point of view, is that when there is no record in neither measurement at given time, no new row filled with previous values is created.
With
from(bucket: "stackx")
|> range(start: -1d)
|> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
|> drop(columns: ["_measurement"]) // or remove from group other way
|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
|> sort(columns: ["_time"], desc: false)
|> fill(column: "cpu", usePrevious: true)
|> fill(column: "response_time", usePrevious: true)
the result would be
_time,app,cpu,response_time
2022-06-02T06:15:09Z,app1,,12
2022-06-02T06:15:10Z,app1,10,11
2022-06-02T06:15:11Z,app1,10,15
2022-06-02T06:15:12Z,app1,12,14
2022-06-02T06:15:13Z,app1,12,13
2022-06-02T06:15:14Z,app1,13,10
2022-06-02T06:15:15Z,app1,13,12
2022-06-02T06:15:16Z,app1,14,11
2022-06-02T06:15:17Z,app1,14,13
2022-06-02T06:15:18Z,app1,14,12
2022-06-02T06:15:19Z,app1,15,12

Related

Storing numbers above 2e19

I am using InfluxDB and need to store very large numbers (uint256) with full precision (no floating point).
I am currently using strings to achieve this, but I thus loose the ability to perform arithmetic operations on these numbers, which is something I need to implement.
ex.
{ _measurement=transfer, _field=amount, _value=90000000000000000000001 }
{ _measurement=transfer, _field=amount, _value=12000000000000000000000 }
from(bucket: "xxx")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "transfer" and r._field == "amount")
|> toUint()
|> sum()
I get the following error: runtime error #4:6-4:13: toInt: failed to evaluate map function: cannot convert string "90000000000000000000001" to int due to invalid syntax
Is there a built-in solution like ClickHouse's uint256, or a third-party package to achieve this?
Unfortunately InfluxDB doesn't support big numbers yet. It stores all integers as signed int64 data types. The minimum and maximum valid values for int64 are -9223372036854775808 and 9223372036854775807. See more info here.

Improve InfluxDB response time

I try to improve the respond-time of Grafana and InfluxDB.
Hopefully someone can give me some support.
My setup is:
OS: Windows 10
Docker version 20.10.5
Docker-Container: Grafana v7.5.5
Docker-Container: InfluxDB Version 2.0.5
There are different Dashboards, with different numbers of panels.
One Granfa-Dashboard for example has 24 Graph-Panels,
with refesh-rate of 500ms,1s,2s which should updated the Graph-Panel fast.
I know that 500ms refesh-rate is high, but the same behavior is seen with 1s, 2s, 5s.
The queries use this scheme:
from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) =>
r._measurement == "DEVICE1" and
r.COMMON == "VALUE1"
)
|> set(key: "_field", value: "")
|> set(key: "COMMON", value: "DEVICE 1")
|> aggregateWindow(every: v.windowPeriod, fn: last)
What I observe is that some graph updates very slowly.
With the Chrome-Browser I have observed the traffic.
There are some states, like "canceled" or "pending", that seems to slow down the response.
A second observation is, that some queries require up to 500 ms to process.
So my main question is, how I could improve response time if a Grafana-Dashboard request a query from InfluxDB.
Update:
With these lines before a query, different statistics (TotalDuration, CompileDuration, QueueDuration,...) about the execution will be enabled:
import "profiler"
option profiler.enabledProfilers = ["query",
"operator"]
Is it possible to plot this statistics?
Thx in advance
Source:
https://docs.influxdata.com/flux/v0.x/stdlib/profiler/

How can I query an aggregation of values grouped by hour of day from InfluxDB?

Question
I have an InfluxDB v1.8 running on a Raspberry Pi 2.
I have a measurement "gridElectricityMeter" that has (next to other fields not important here) a field "import" which contains the total reading of the energy meter in Watt-hours. The measurement receives a new value every 10 seconds.
I would like to create a Bar chart that shows the imported amount of energy by hour of day in some time frame to be specified (I am using Grafana to create the chart, so the time frame is set there).
E.g. if the raw data in InfluxDB looks like this:
time
import
2021-01-01T00:00:00Z
0
2021-01-01T01:00:00Z
2
2021-01-01T01:20:00Z
8
2021-01-01T02:00:00Z
10
2021-01-02T00:00:00Z
10
2021-01-02T01:00:00Z
20
(In reality there are of course far more values, one value per 10 seconds. A slight lack of precision because the timestamps are likely a few seconds of the full hour may be accepted.)
Then I want the following result:
hour
sum
explanation
0
12
2 (first day) + 10 (second day)
1
8
8 (first day)
2
0
0
3
0
0
...
...
Data should be filled with zero if unavailable (e.g. querying "today", of course there is no data for the time after "now").
What I have tried so far
As far as I understand I won't be able to do this with InfluxQL. So here is what I tried using flux. Also I think I understood that InfluxDB >= 2.0 is 64-bit only and thus won't run on Raspberry Pi 2.
Here is what I came up with:
import "date"
import "generate"
// generate a table with 24 entries (hours 0-23) and "sum=0":
initial = generate.from(
count: 24,
fn: (n) => n,
// start and stop are actually irrelevant, but they are required args
start: 2021-01-01T00:00:00Z,
stop: 2021-01-02T00:00:00Z,
)
|> map(fn: (r) => ({hour: r._value, sum: 0}))
// First group data by day and hour
data = from(bucket: "myDatabase")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "gridElectricityMeter" and r._field == "import")
|> map(fn: (r) => ({r with hour: date.hour(t: r._time), group: date.truncate(t: r._time, unit: 1h)}))
|> group(columns: ["group"])
// In each group we get the first&last point so that we get the first and last reading from the energy meter. The difference between those values is the energy imported.
first = data
|> first()
last = data
|> last()
// Calculate energy used in each group and then regroup by hour (and not day):
byhour = join(tables: {first: first, last: last}, on: ["group"])
|> map(fn: (r) => ({r with hour: r.hour_first, _value: r._value_last - r._value_first}))
|> group(columns: ["hour"])
// manual summing because of https://github.com/influxdata/flux/issues/2505
|> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r._value}), identity: {sum: 0})
// Fill hours we have no data for with zero:
union(tables: [byhour, initial])
|> group(columns: ["hour"])
|> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r.sum}), identity: {sum: 0})
|> group()
|> sort(columns: ["hour"])
It does seem to work and actually do what I want it to do. However it seem ridiculously complicated. And also it is slow. Querying data for a single day takes about 7 seconds. A single day contains 66024=8640 data points in that measurements which really does not sound very much to me and it should not be that slow.
Is there a better way of doing it?

Modifying the series data using flux language in influx

I have a function which takes the 1st value from a masurement every 5mins.
ever5Mins1st = from(bucket: "Historian/oneday")
|> range(start: dashboardTime)
|> filter(fn: (r) =>
r._measurement == "InventoryStock" and
r._field =="Value" and
r.Location=="NYC"
)
|>window(every:5m)
|>first()
|>window(every:inf)
Now If I want to do a difference of two consecutive points , I can do by using difference function
ever5Mins1st
|>difference()
But what If I do I want a sum of every consecutive points or every 5 points.
I know I can write custom functions which recieves piped data. But do I need to write a for loop? Are there any examples of for loops and conditional statements?
// Function definition
add = (tables=<-) =>
tables
|> map(fn: (r) => //What should I do here, for loop?)
I don't believe this is possible at the moment. This would require a chunk function, to split a list into chunks of lists.
I'll add an issue to Flux and start to implement this functionality :+1:

InfluxDB - Aggregate data on state duration

I have a measurement in InfluxDB that keeps track of the status of a system. For example, consider the following measures:
03/22/18 00:00:00AM STATUS_A
03/22/18 09:00:00AM STATUS_B
03/22/18 13:00:00AM STATUS_C
03/22/18 18:00:00AM STATUS_B
03/22/18 19:00:00AM STATUS_D
03/22/18 21:00:00AM STATUS_A
What I need to do now is to derive how long the system was in each state every day. In the above example, the desired result is something like
STATUS_A 12h (from 00:00 to 09:00 and from 21:00 to 24:00)
STATUS_B 5h (from 09:00 to 13:00 and from 18:00 to 19:00)
STATUS_C 5h (from 13:00 to 18:00)
STATUS_D 2h (from 09:00 to 21:00)
I'm very new to the TICK stack, so I could be missing something quite elementary. I was thinking to use Kapacitor to create the aggregate result, but I don't really know how to obtain the result
You mentioned TICK stack so I assume you are using InfluxDB version 1.
It is feasible to report the total duration for a given state in InfluxDB but only in Flux not InfluxQL (easily).
You could try these steps:
Enable Flux in v1.8 with the configuration change here
Sample Flux could be:
from(bucket: "yourDatabaseName/autogen") |> range(start: 2018-03-20T00:00:00Z, stop: 2018-03-20T23:59:59Z) |> filter(fn: (r) => r._measurement == "yourMeasurementName") |> stateDuration(fn: (r) => r._value == "STATUS_A", column: "state", unit: 1h)
Again it's still not possible to do it yet in InfluxQL though the community has been waiting for this for a while. See more details here.

Resources