I have 2 measurements as follows:
metrics,app=app1 cpu=10 1654150510
metrics,app=app1 cpu=12 1654150512
metrics,app=app1 cpu=13 1654150514
metrics,app=app1 cpu=14 1654150516
metrics,app=app1 cpu=15 1654150519
The "metrics" measurement arrives about once every 2 to 3 seconds.
And the second one is:
http_requests,app=app1 response_time=12 1654150509
http_requests,app=app1 response_time=11 1654150510
http_requests,app=app1 response_time=15 1654150511
http_requests,app=app1 response_time=14 1654150512
http_requests,app=app1 response_time=13 1654150513
http_requests,app=app1 response_time=10 1654150514
http_requests,app=app1 response_time=12 1654150515
http_requests,app=app1 response_time=11 1654150516
http_requests,app=app1 response_time=13 1654150517
http_requests,app=app1 response_time=12 1654150518
The http_requests measurement arrives about once per second.
I want to combine the 2 metrics into a single table.
_time,value_cpu,value_response_time
1654150509,10,12
1654150510,10,11
1654150511,12,15
As the timestamps may differ, is there a way to combine them in Flux? Is fill the way to go? I'm not sure whether timeShift will help here, although I didn't understand it completely. I assume some sort of downsampling is needed (I'm not sure how to do that in Flux either). Is there a way to match the measurements based on the closest time difference?
For example:
if response measurements came at time instances
1654150510,app=app1 response_time=10
1654150513,app=app1 response_time=12
1654150514,app=app1 response_time=11
1654150516,app=app1 response_time=13
and CPU came in at
1654150512,app=app1 cpu=20
1654150515,app=app1 cpu=30
Then resulting table is
_time,response_time,cpu
1654150510,10,
1654150513,12,20
1654150514,11,
1654150516,13,30
Each CPU value is attached to the response point with the closest timestamp (earlier or later).
How can this be achieved with Flux in InfluxDB?
I guess downsampling with aggregateWindow and fill could work.
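A minimal sketch of that idea, assuming a bucket named "stackx", a 1-second grid, and last as the aggregate (all three are illustrative choices, not requirements): each series is resampled onto a common grid, gaps are carried forward, and the fields are pivoted into columns.

```flux
from(bucket: "stackx")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
    // resample every series onto a common 1s grid; empty windows yield null
    |> aggregateWindow(every: 1s, fn: last, createEmpty: true)
    // carry the previous value forward into the empty windows
    |> fill(usePrevious: true)
    // remove _measurement so both fields can land in the same output table
    |> drop(columns: ["_measurement"])
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
```

Note that this creates a row for every second in the range, whether or not any data arrived in that second, and that aggregateWindow labels each row with the end of its window by default.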
An alternative is to pivot and then fill missing values using the previous value. The advantage, at least from a performance point of view, is that when there is no record in either measurement at a given time, no new row filled with previous values is created.
With
from(bucket: "stackx")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
    |> drop(columns: ["_measurement"]) // or remove from group other way
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> sort(columns: ["_time"], desc: false)
    |> fill(column: "cpu", usePrevious: true)
    |> fill(column: "response_time", usePrevious: true)
the result would be
_time,app,cpu,response_time
2022-06-02T06:15:09Z,app1,,12
2022-06-02T06:15:10Z,app1,10,11
2022-06-02T06:15:11Z,app1,10,15
2022-06-02T06:15:12Z,app1,12,14
2022-06-02T06:15:13Z,app1,12,13
2022-06-02T06:15:14Z,app1,13,10
2022-06-02T06:15:15Z,app1,13,12
2022-06-02T06:15:16Z,app1,14,11
2022-06-02T06:15:17Z,app1,14,13
2022-06-02T06:15:18Z,app1,14,12
2022-06-02T06:15:19Z,app1,15,12
Question
I have an InfluxDB v1.8 running on a Raspberry Pi 2.
I have a measurement "gridElectricityMeter" that has (next to other fields not important here) a field "import" which contains the total reading of the energy meter in Watt-hours. The measurement receives a new value every 10 seconds.
I would like to create a Bar chart that shows the imported amount of energy by hour of day in some time frame to be specified (I am using Grafana to create the chart, so the time frame is set there).
E.g. if the raw data in InfluxDB looks like this:
time,import
2021-01-01T00:00:00Z,0
2021-01-01T01:00:00Z,2
2021-01-01T01:20:00Z,8
2021-01-01T02:00:00Z,10
2021-01-02T00:00:00Z,10
2021-01-02T01:00:00Z,20
(In reality there are of course far more values, one value per 10 seconds. A slight loss of precision is acceptable, since the timestamps are likely a few seconds off the full hour.)
Then I want the following result:
hour,sum,explanation
0,12,2 (first day) + 10 (second day)
1,8,8 (first day)
2,0,0
3,0,0
...,...
Data should be filled with zero if unavailable (e.g. querying "today", of course there is no data for the time after "now").
What I have tried so far
As far as I understand, I won't be able to do this with InfluxQL, so here is what I tried using Flux. I also understand that InfluxDB >= 2.0 is 64-bit only and thus won't run on a Raspberry Pi 2.
Here is what I came up with:
import "date"
import "generate"

// generate a table with 24 entries (hours 0-23) and "sum=0":
initial = generate.from(
    count: 24,
    fn: (n) => n,
    // start and stop are actually irrelevant, but they are required args
    start: 2021-01-01T00:00:00Z,
    stop: 2021-01-02T00:00:00Z,
)
    |> map(fn: (r) => ({hour: r._value, sum: 0}))

// First group data by day and hour
data = from(bucket: "myDatabase")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "gridElectricityMeter" and r._field == "import")
    |> map(fn: (r) => ({r with hour: date.hour(t: r._time), group: date.truncate(t: r._time, unit: 1h)}))
    |> group(columns: ["group"])

// In each group we get the first&last point so that we get the first and last reading from the energy meter. The difference between those values is the energy imported.
first = data
    |> first()
last = data
    |> last()

// Calculate energy used in each group and then regroup by hour (and not day):
byhour = join(tables: {first: first, last: last}, on: ["group"])
    |> map(fn: (r) => ({r with hour: r.hour_first, _value: r._value_last - r._value_first}))
    |> group(columns: ["hour"])
    // manual summing because of https://github.com/influxdata/flux/issues/2505
    |> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r._value}), identity: {sum: 0})

// Fill hours we have no data for with zero:
union(tables: [byhour, initial])
    |> group(columns: ["hour"])
    |> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r.sum}), identity: {sum: 0})
    |> group()
    |> sort(columns: ["hour"])
It does seem to work and does what I want. However, it seems ridiculously complicated, and it is also slow: querying data for a single day takes about 7 seconds. A single day contains 6*60*24 = 8640 data points in that measurement, which really does not sound like much to me, so it should not be that slow.
Is there a better way of doing it?
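One possible simplification, offered as an untested sketch rather than a definitive answer: take the difference between consecutive meter readings, sum those differences per clock hour, and then regroup by hour of day. It reuses the bucket, measurement and field names from the query above. It assumes the bundled Flux version supports the timeSrc parameter of aggregateWindow, and it does not zero-fill hours that have no data.

```flux
import "date"

from(bucket: "myDatabase")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "gridElectricityMeter" and r._field == "import")
    // energy imported between consecutive readings of the running total
    |> difference(nonNegative: true)
    // total per clock hour, labelled with the start of the hour
    |> aggregateWindow(every: 1h, fn: sum, timeSrc: "_start", createEmpty: false)
    // keep only the hour of day and the hourly total
    |> map(fn: (r) => ({hour: date.hour(t: r._time), _value: r._value}))
    // add up the same hour of day across all days in the range
    |> group(columns: ["hour"])
    |> sum()
    |> group()
    |> sort(columns: ["hour"])
```

Each difference is attributed to the later of the two readings, so the reading taken exactly on the hour counts toward the new hour. With one reading every 10 seconds, that is within the accepted loss of precision.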
We have an InfluxDB 2.0 instance that is quickly growing in size (we suspect Docker metrics), so I would like to find out which measurements are "spamming".
TLDR: how do we find the measurements that take up the most disk space on InfluxDB 2.0?
Long version:
I know the old InfluxDB had an _internal database with some relevant stats. I found _monitoring in 2.0, but mine is nearly empty; it only has a write_errors measurement.
I tried a bunch of Flux queries like the following, but I don't think they give what I'm looking for:
from(bucket: "telegraf")
    |> range(start: -15m)
    |> filter(fn: (r) => r._measurement == "docker_container_blkio")
    |> count()
I tried InfluxQL queries via the v1 API, like:
SELECT COUNT(system) FROM telegraf
but that gave:
{
    "results": [
        {
            "statement_id": 0
        }
    ]
}
I played a bit with influxd inspect export-index; it would also be useful if it gave me some stats about the measurements.
I saw that InfluxDB 1.8 had this: https://docs.influxdata.com/influxdb/v1.8/tools/influx_inspect/#report-disk
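As a rough proxy only (point counts are not disk usage, and this is a sketch rather than an official sizing method), the count query above can be generalized to rank every measurement in the bucket by the number of points written. The "telegraf" bucket comes from the query above; the 1-hour range is an arbitrary assumption.

```flux
from(bucket: "telegraf")
    |> range(start: -1h)
    // count per series first, so fields of different types never share a table
    |> count()
    // then add the per-series counts up per measurement
    |> group(columns: ["_measurement"])
    |> sum()
    // merge into one table and list the busiest measurements first
    |> group()
    |> sort(columns: ["_value"], desc: true)
```

Measurements near the top of the result write the most points in the chosen window, which makes them the first candidates to inspect.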
I have a function which takes the first value from a measurement every 5 minutes.
ever5Mins1st = from(bucket: "Historian/oneday")
    |> range(start: dashboardTime)
    |> filter(fn: (r) =>
        r._measurement == "InventoryStock" and
        r._field == "Value" and
        r.Location == "NYC"
    )
    |> window(every: 5m)
    |> first()
    |> window(every: inf)
If I want the difference between two consecutive points, I can use the difference function:
ever5Mins1st
    |> difference()
But what if I want the sum of consecutive points, or of every 5 points?
I know I can write custom functions that receive piped data. But do I need to write a for loop? Are there any examples of for loops and conditional statements?
// Function definition
add = (tables=<-) =>
tables
|> map(fn: (r) => //What should I do here, for loop?)
I don't believe this is possible at the moment. This would require a chunk function, to split a list into chunks of lists.
I'll add an issue to Flux and start to implement this functionality :+1:
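Until count-based chunking exists, a time-based window may serve as a stand-in here, because the points above are already spaced 5 minutes apart. The following is a sketch under that assumption: five consecutive points correspond to one 25-minute window, provided no 5-minute slot is empty. The name sumOf5 is hypothetical.

```flux
sumOf5 = from(bucket: "Historian/oneday")
    |> range(start: dashboardTime)
    |> filter(fn: (r) =>
        r._measurement == "InventoryStock" and
        r._field == "Value" and
        r.Location == "NYC"
    )
    // first value in every 5-minute window, as in ever5Mins1st
    |> aggregateWindow(every: 5m, fn: first, createEmpty: false)
    // each 25-minute window then holds five of those points; sum them
    |> aggregateWindow(every: 25m, fn: sum, createEmpty: false)
```

This groups by time rather than by count, so a missing 5-minute slot means the affected window sums fewer than five points.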
I have a measurement in InfluxDB that keeps track of the status of a system. For example, consider the following measures:
03/22/18 00:00:00 STATUS_A
03/22/18 09:00:00 STATUS_B
03/22/18 13:00:00 STATUS_C
03/22/18 18:00:00 STATUS_B
03/22/18 19:00:00 STATUS_D
03/22/18 21:00:00 STATUS_A
What I need to do now is to derive how long the system was in each state every day. In the above example, the desired result is something like
STATUS_A 12h (from 00:00 to 09:00 and from 21:00 to 24:00)
STATUS_B 5h (from 09:00 to 13:00 and from 18:00 to 19:00)
STATUS_C 5h (from 13:00 to 18:00)
STATUS_D 2h (from 19:00 to 21:00)
I'm very new to the TICK stack, so I could be missing something quite elementary. I was thinking of using Kapacitor to create the aggregate result, but I don't really know how to obtain it.
You mentioned the TICK stack, so I assume you are using InfluxDB version 1.
Reporting the total duration for a given state is feasible in InfluxDB, but only in Flux; there is no easy way to do it in InfluxQL.
You could try these steps:
Enable Flux in v1.8 by setting flux-enabled = true in the [http] section of the configuration file.
Sample Flux could be:
from(bucket: "yourDatabaseName/autogen")
    |> range(start: 2018-03-22T00:00:00Z, stop: 2018-03-22T23:59:59Z)
    |> filter(fn: (r) => r._measurement == "yourMeasurementName")
    |> stateDuration(fn: (r) => r._value == "STATUS_A", column: "state", unit: 1h)
Again, this is not yet possible in InfluxQL, though the community has been waiting for it for a while. See more details here.
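To total the time spent in every status at once, one option is the contributed events package. This is a sketch under assumptions: the Flux version in use must ship contrib/tomhollingworth/events (which may be newer than the Flux bundled with InfluxDB 1.8), and the measurement must hold a single field containing the status string. The bucket and measurement names are the same placeholders as above.

```flux
import "contrib/tomhollingworth/events"

from(bucket: "yourDatabaseName/autogen")
    |> range(start: 2018-03-22T00:00:00Z, stop: 2018-03-23T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "yourMeasurementName")
    // seconds from each status record to the next one
    // (the last record is measured against the end of the range)
    |> events.duration(unit: 1s)
    // one table per status, then total the seconds spent in it
    |> group(columns: ["_value"])
    |> sum(column: "duration")
```

Unlike stateDuration with a fixed predicate, this attributes each interval to the status that was active at its start, so every status gets its own total.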