Flux left join on empty table - InfluxDB

I'm looking to join two data streams together but receive the following error from Influx:
error preparing right side of join: cannot join on an empty table
I'm trying to build a query that compares a store's total sales this month to its total sales last month. If the store has no sales this month, I don't want it to show. Below is a basic example of my current query.
import "join"
lastMonth = from(bucket: "my-bucket")
|> range(start: 2022-10-01, stop: 2022-11-01)
|> filter(fn: (r) => r._measurement == "transaction")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group(columns: ["storeId"], mode: "by")
|> reduce(
fn: (r, accumulator) => ({
storeId: r.storeId,
amount: accumulator.amount + (r.totalAmount - r.refundAmount)
}),
identity: {
storeId: "",
amount: 0.0
}
)
from(bucket: "my-bucket")
|> range(start: 2022-11-01, stop: 2022-12-01)
|> filter(fn: (r) => r._measurement == "transaction")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group(columns: ["storeId"], mode: "by")
|> reduce(
fn: (r, accumulator) => ({
storeId: r.storeId,
amount: accumulator.amount + (r.totalAmount - r.refundAmount)
}),
identity: {
storeId: "",
amount: 0.0
}
)
|> join.left(
right: lastMonth,
on: (l, r) => l.storeId == r.storeId,
as: (l, r) => ({
storeId: l.storeId,
thisMonthAmount: l.amount,
lastMonthAmount: r.amount
})
)
How can I achieve this in Flux without encountering this issue?
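One possible workaround, sketched below and not tested against this schema, is to avoid join.left entirely: union() tolerates an empty input, so you can tag each stream with a period label, union the two streams, pivot the label into columns, and keep only rows that have a value for this month. The thisMonth name, the period column, and the final exists filter are my additions; lastMonth is the variable from the question.

thisMonth = from(bucket: "my-bucket")
    |> range(start: 2022-11-01, stop: 2022-12-01)
    |> filter(fn: (r) => r._measurement == "transaction")
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> group(columns: ["storeId"], mode: "by")
    |> reduce(
        fn: (r, accumulator) => ({
            storeId: r.storeId,
            amount: accumulator.amount + (r.totalAmount - r.refundAmount)
        }),
        identity: {storeId: "", amount: 0.0}
    )
    // label the stream so pivot() can turn the label into a column
    |> map(fn: (r) => ({r with period: "thisMonthAmount"}))

union(tables: [
    thisMonth,
    lastMonth |> map(fn: (r) => ({r with period: "lastMonthAmount"}))
])
    |> group(columns: ["storeId"])
    |> pivot(rowKey: ["storeId"], columnKey: ["period"], valueColumn: "amount")
    // stores with no sales this month produce no thisMonthAmount value
    |> filter(fn: (r) => exists r.thisMonthAmount)

Because neither union() nor pivot() requires both inputs to be non-empty, this should sidestep the "cannot join on an empty table" error.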

Related

InfluxDB 2.0 - Flux query: How to sum a column and use the sum for further calculations

I am new to the Flux query language (with InfluxDB 2) and can't find a solution for the following problem:
I have data with alternating true and false values.
I was able to calculate the time in seconds until the next change by using the events.duration function.
Now I want to calculate the total time and the time of all "false" events, and after that the percentage of all false events. I tried the following:
import "contrib/tomhollingworth/events"
total = from(bucket: "********")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "********")
|> filter(fn: (r) => r["Server"] == "********")
|> filter(fn: (r) => r["_field"] == "********")
|> filter(fn: (r) => r["DataNode"] == "********")
|> events.duration(
unit: 1s,
columnName: "duration",
timeColumn: "_time",
stopColumn: "_stop"
)
|> sum(column: "duration")
|> yield(name: "total")
downtime = from(bucket: "********")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "********")
|> filter(fn: (r) => r["Server"] == "********")
|> filter(fn: (r) => r["_field"] == "********")
|> filter(fn: (r) => r["DataNode"] == "********")
|> events.duration(
unit: 1s,
columnName: "duration",
timeColumn: "_time",
stopColumn: "_stop"
)
|> pivot(rowKey:["_time"], columnKey: ["_value"], valueColumn: "duration")
|> drop(columns: ["true"])
|> sum(column: "false")
|> yield(name: "downtime")
downtime_percentage = downtime.false / total.duration
With this I am getting the following error: error #44:23-44:31: expected {A with false:B} but found [C]
I also tried some variations but couldn't get it to work.
I guess I am getting some basic things wrong, but I couldn't figure it out yet. Let me know if you need more information.
I have found a way to solve my problem. Although I am sure there is a more elegant solution, I document my approach here; maybe it helps someone and we can improve it together.
import "contrib/tomhollingworth/events"
//Set time window in seconds (based on selected time)
time_window = int(v: v.timeRangeStart)/-1000000000
//Filter (IoT-)Data
data= from(bucket: "*******")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "*******")
|> filter(fn: (r) => r["Server"] == "*******")
|> filter(fn: (r) => r["Equipment"] == "*******")
|> filter(fn: (r) => r["DataNode"] == "******")
//Use events.duration to calculate the duration in seconds of each true/false event.
|> events.duration(
unit: 1s,
columnName: "duration",
timeColumn: "_time",
stopColumn: "_stop"
)
//Sum up the event times via "sum()" and save them as an array variable via "findColumn()". This is the only way to access the value later (As far as I know. please let me know if you know other ways!).
total_array = data
|> sum(column: "duration")
|> findColumn(
fn: (key) => key._field == "*******",
column: "duration",
)
//Calculate "missing time" in seconds in the time window, because the first event in the time window is missing.
missing_time = time_window - total_array[0]
//Create an array with the first event to determine if it is true or false
first_value_in_window = data
|> first()
|> findColumn(
fn: (key) => key._field == "*******",
column: "_value",
)
//Calculate the downtime by creating columns with the true and false values via pivot. Then sum up the column with the false values
downtime = data
|> map(fn: (r) => ({ r with duration_percentage: float(v: r.duration)/float(v: time_window) }))
|> pivot(rowKey:["_time"], columnKey: ["_value"], valueColumn: "duration_percentage")
|> map( fn: (r) => ({r with
downtime: if exists r.false then
r.false
else
0.0
}))
|> sum(column: "downtime")
//Create an array with the downtime so that this value can be accessed later on
downtime_array = downtime
|> findColumn(
fn: (key) => key._field == "PLS_Antrieb_laeuft",
column: "downtime",
)
//If the first value in the considered time window is true, then the remaining time in the time window (missing_time) was downtime. Write this value in the column "false_percentage_before_window".
//The total downtime is calculated from the previously calculated sum(downtime_array) and, if applicable, the downtime of the remaining time in the time window if the first value is true (first_value_in_window[0])
data
|> map( fn: (r) => ({r with
false_percentage_before_window: if first_value_in_window[0] then
float(v: missing_time)/float(v: time_window)
else
0.0
}))
|> map(fn: (r) => ({ r with _value: (downtime_array[0] + r.false_percentage_before_window) * 100.00 }))
|> first()
|> keep(columns: ["_value"])
|> yield(name: "Total Downtime")
This solution assumes that the true/false events strictly alternate.
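In reply to the comment above about findColumn() being the only way to access a value: depending on your Flux version, findRecord() can pull a single row out of a stream as a record, which reads a bit more directly than indexing into an array. A sketch under the same redacted names (the key predicate is the same assumption as in the findColumn() calls above):

// Sketch: extract the summed duration as a scalar via findRecord().
total_row = data
    |> sum(column: "duration")
    |> findRecord(fn: (key) => key._field == "*******", idx: 0)

// The scalar is then a plain record field, e.g.:
missing_time = time_window - total_row.duration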

Influxdb Flux query with custom window aggregate function

Could you please help me with the InfluxDB 2 Flux query syntax to build a windowed query with a custom aggregate function?
I went through the online docs, but they seem to lack examples of how to get at the actual window content (first and last records) from within a custom aggregate function, and they don't immediately describe the expected signature of custom functions.
I'd like to build a query with a sliding window that produces the difference between the first and the last value in each window. Something along these lines:
difference = (column, tables=<-) => ({ tables.last() - tables.first() })
from(bucket: "my-bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "simple")
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(every: 1mo, fn: difference, column: "_value", timeSrc: "_stop", timeDst: "_time", createEmpty: true)
    |> yield(name: "diff")
The syntax of the above example is obviously wrong, but hopefully you can understand what I'm trying to do.
Thank you!
I came up with the following; it works, at least syntactically:
from(bucket: "my-bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "simple")
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(
        every: 1mo,
        fn: (column, tables=<-) => tables
            |> reduce(
                identity: {first: -1.0, last: -1.0, diff: -1.0},
                fn: (r, acc) => ({
                    first: if acc.first < 0.0 then r._value else acc.first,
                    last: r._value,
                    // use the current value so the final row yields last - first
                    diff: if acc.first < 0.0 then 0.0 else r._value - acc.first
                })
            )
            |> drop(columns: ["first", "last"])
            |> set(key: "_field", value: column)
            |> rename(columns: {diff: "_value"})
    )
    |> yield(name: "diff")
The window is not really sliding, though. Here is the same for a sliding window:
from(bucket: "my-bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "simple")
    |> filter(fn: (r) => r["_field"] == "value")
    |> window(every: 1h, period: 1mo)
    |> reduce(
        identity: {first: -1.0, last: -1.0, diff: -1.0},
        fn: (r, acc) => ({
            first: if acc.first < 0.0 then r._value else acc.first,
            last: r._value,
            diff: if acc.first < 0.0 then 0.0 else r._value - acc.first
        })
    )
    |> duplicate(column: "_stop", as: "_time")
    |> drop(columns: ["first", "last"])
    |> rename(columns: {diff: "_value"})
    |> window(every: inf)
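Two notes on the above: window(every: inf) at the end merges the per-window tables back into a single table, and, depending on your Flux version, aggregateWindow also accepts a period parameter, which might express the sliding variant more compactly. A sketch, untested:

// Sketch: sliding window via aggregateWindow's period parameter,
// reusing the reduce-based custom aggregate from the first answer.
from(bucket: "my-bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "simple")
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(
        every: 1h,
        period: 1mo,
        fn: (column, tables=<-) => tables
            |> reduce(
                identity: {first: -1.0, last: -1.0, diff: -1.0},
                fn: (r, acc) => ({
                    first: if acc.first < 0.0 then r._value else acc.first,
                    last: r._value,
                    diff: if acc.first < 0.0 then 0.0 else r._value - acc.first
                })
            )
            |> drop(columns: ["first", "last"])
            |> set(key: "_field", value: column)
            |> rename(columns: {diff: "_value"})
    )
    |> yield(name: "diff")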

Optimising a query (Min, Max, First and Last)

First post here and new to InfluxDB.
I have been given the following query, and I have no clue how I can optimize it.
A period of 24 hours with 1m windows takes around 2.4 seconds at the moment (is this the expected amount of time?).
I suspect one of the reasons is that there are 4 tables (querying the same set of data) and 3 joins.
I have looked into the map function to try and reduce it to one table, but I can't seem to get it to work with the window.
bucketName = "${bucket}"
startTime = -${period}
interval = ${interval}
token = "${token}"

minPrice = from(bucket: bucketName)
    |> range(start: startTime, stop: now())
    |> filter(fn: (r) => r["_field"] == token)
    |> window(every: interval)
    |> min()
    |> duplicate(column: "_value", as: "low")
    |> keep(columns: ["low", "_start", "_stop"])

maxPrice = from(bucket: bucketName)
    |> range(start: startTime, stop: now())
    |> filter(fn: (r) => r["_field"] == token)
    |> window(every: interval)
    |> max()
    |> duplicate(column: "_value", as: "high")
    |> keep(columns: ["high", "_start", "_stop"])

openPrice = from(bucket: bucketName)
    |> range(start: startTime, stop: now())
    |> filter(fn: (r) => r["_field"] == token)
    |> window(every: interval)
    |> first()
    |> duplicate(column: "_value", as: "open")
    |> keep(columns: ["open", "_start", "_stop"])

closePrice = from(bucket: bucketName)
    |> range(start: startTime, stop: now())
    |> filter(fn: (r) => r["_field"] == token)
    |> window(every: interval)
    |> last()
    |> duplicate(column: "_value", as: "close")
    |> keep(columns: ["close", "_start", "_stop"])

highLowData = join(tables: {min: minPrice, max: maxPrice}, on: ["_start", "_stop"])
openCloseData = join(tables: {open: openPrice, close: closePrice}, on: ["_start", "_stop"])

join(tables: {highLow: highLowData, openClose: openCloseData}, on: ["_start", "_stop"])
I have managed to optimize it down to 0.7s by using a union rather than a join. However, now I'm faced with data that has empty fields: each row only populates the column produced by its sub-table (low, high, open, or close), leaving the others empty. The query is below:
startTime = -24h
breakDown = 1m
token = "tokenName"

all = from(bucket: "prices")
    |> range(start: startTime, stop: now())
    |> filter(fn: (r) => r["_field"] == token)
    |> window(every: breakDown)

lowPrice = all
    |> min()
    |> duplicate(column: "_value", as: "low")
    |> keep(columns: ["low", "_stop", "_start"])

highPrice = all
    |> max()
    |> duplicate(column: "_value", as: "high")
    |> keep(columns: ["high", "_stop", "_start"])

openPrice = all
    |> first()
    |> duplicate(column: "_value", as: "open")
    |> keep(columns: ["open", "_stop", "_start"])

closePrice = all
    |> last()
    |> duplicate(column: "_value", as: "close")
    |> keep(columns: ["close", "_stop", "_start"])

highLowData = union(tables: [lowPrice, highPrice])
openCloseData = union(tables: [openPrice, closePrice])

result = union(tables: [highLowData, openCloseData])
    |> yield(name: "Result")
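One way to avoid both the joins and the sparse union output might be to compute all four aggregates in a single pass with reduce(), so the data is only scanned once. A sketch under the same names as the query above; the count field is my addition, used to detect the first row of each window:

startTime = -24h
breakDown = 1m
token = "tokenName"

from(bucket: "prices")
    |> range(start: startTime, stop: now())
    |> filter(fn: (r) => r["_field"] == token)
    |> window(every: breakDown)
    |> reduce(
        identity: {open: 0.0, high: 0.0, low: 0.0, close: 0.0, count: 0},
        fn: (r, acc) => ({
            // the first row of the window seeds all four values
            open: if acc.count == 0 then r._value else acc.open,
            high: if acc.count == 0 or r._value > acc.high then r._value else acc.high,
            low: if acc.count == 0 or r._value < acc.low then r._value else acc.low,
            close: r._value,
            count: acc.count + 1
        })
    )
    |> drop(columns: ["count"])
    |> window(every: inf)
    |> yield(name: "ohlc")

This produces one row per window with open, high, low, and close columns side by side, so no join or union is needed afterwards.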

Counting boolean values in a Flux query

I have a bucket where one field is a boolean.
I'd like to count the number of true values and the number of false values for each hour.
from(bucket: "xxx")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> window(every: 1h)
|> filter(fn: (r) => r["_measurement"] == "xxx")
|> filter(fn: (r) => r["_field"] == "myBoolField")
|> group(columns: ["_stop"])
Because this is issued from a cron that runs every minute (more or less), this will give something like:
table  _start               _stop             _time             _value  otherfield1  otherfield2
0      2021-05-18T19:00:00  2021-05-18T20:00  2021-05-18T19:01  false   xxx          xxx
0      2021-05-18T19:00:00  2021-05-18T20:00  2021-05-18T19:02  true    xxx          xxx
0      2021-05-18T19:00:00  2021-05-18T20:00  2021-05-18T19:03  true    xxx          xxx
...
1      2021-05-18T20:00:00  2021-05-18T21:00  2021-05-18T20:01  false   xxx          xxx
1      2021-05-18T20:00:00  2021-05-18T21:00  2021-05-18T20:02  false   xxx          xxx
1      2021-05-18T20:00:00  2021-05-18T21:00  2021-05-18T20:03  false   xxx          xxx
...
Now, I'd like to count the total, the number of false, and the number of true for each hour (so for each table), but without losing/dropping the other fields.
So I'd like a structure like:
table  _stop             _value  nbFalse  nbTrue  otherfield1  otherfield2
0      2021-05-18T20:00  59      1        58      xxx          xxx
1      2021-05-18T21:00  55      4        51      xxx          xxx
I've tried many combinations of pivot, count, ... without success.
From my understanding, the correct way to do it is:
1. drop _start and _time
2. duplicate _value into nbTrue and nbFalse
3. re-aggregate by _stop to keep only true in nbTrue and false in nbFalse
4. count the three columns _value, nbTrue, and nbFalse
    |> drop(columns: ["_start", "_time"])
    |> duplicate(column: "_value", as: "nbTrue")
    |> duplicate(column: "_value", as: "nbFalse")
but I am stuck at step 3...
I didn't test it, but I have something similar to this in mind:
from(bucket: "xxx")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "xxx")
|> filter(fn: (r) => r["_field"] == "myBoolField")
|> aggregateWindow(
every: 1h,
fn: (column, tables=<-) => tables |> reduce(
identity: {true_count: 0.0},
fn: (r, accumulator) => ({
true_count:
if r._value == true then accumulator.true_count + 1.0
else accumulator.true_count + 0.0
})
)
)
I got this from the docs and adjusted it a bit; I think it should get you what you need.
The answer from @dskalec should work; I did not test it directly because, in the end, I needed to aggregate more than just the boolean field.
Here is my query; you can see that it uses the same aggregate+reduce (I just use a pivot beforehand to have more than one field to aggregate):
from(bucket: "rt")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "xxx")
|> pivot(
rowKey:["_time"],
columnKey: ["_field"],
valueColumn: "_value"
)
|> window(every: 1h, createEmpty: true)
|> reduce(fn: (r, accumulator) => ({
nb: accumulator.nb + 1,
field1: accumulator.field1 + r["field1"], //field1 is an int
field2: if r["field2"] then accumulator.field2 + 1 else accumulator.field2, //field2 is a boolean
}),
identity: {nb: 0, field1: 0, field2: 0}
)
|> duplicate(column: "_stop", as: "_time")
|> drop(columns: ["_start", "_stop"])
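To get back to the nbTrue/nbFalse shape the question asked for, the false count can be derived from the totals in a trailing map(); a small sketch, with column names following the query above (nbTrue and nbFalse are my additions):

    |> map(fn: (r) => ({r with
        nbTrue: r.field2,         // reduce() counted the true rows in field2
        nbFalse: r.nb - r.field2  // everything else in the window was false
    }))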

How to merge (join) two tables in a specific way in Grafana using InfluxDB flux query?

Grafana: 7.1.5
InfluxDB: 1.8
I currently have three separate table panels in Grafana where the only difference between each query is the time range (Year, Month, Day). I would like to combine these three tables into one, where the measurement's value is separated into three columns (one for each time range).
More explicitly, what I have currently is:
- Table1 columns: [Tag1+Tag2, _value] where _value is the units this year
- Table2 columns: [Tag1+Tag2, _value] where _value is the units this month
- Table3 columns: [Tag1+Tag2, _value] where _value is the units this day
What I want is:
Table Columns: [Tag1+Tag2, Table1_value (Year), Table2_value (Month), Table3_value (Day)]
These are my queries:
import "date"
thisYearSoFar = date.truncate(t: now(), unit: 1y)
thisMonthSoFar = date.truncate(t: now(), unit: 1mo)
thisDaySoFar = date.truncate(t: now(), unit: 1d)
from(bucket: "consumption")
|> range(start: thisYearSoFar, stop: now())
|> filter(fn: (r) => r._measurement == "stuff" and r._field == "units" and r._value > 0)
|> group(columns: ["datacenter","tenant"])
|> sum(column: "_value")
|> map(fn: (r) => ({r with _value: r._value / 4.0}))
from(bucket: "consumption")
|> range(start: thisMonthSoFar, stop: now())
|> filter(fn: (r) => r._measurement == "stuff" and r._field == "units" and r._value > 0)
|> group(columns: ["datacenter","tenant"])
|> sum(column: "_value")
|> map(fn: (r) => ({r with _value: r._value / 4.0}))
from(bucket: "consumption")
|> range(start: thisDaySoFar, stop: now())
|> filter(fn: (r) => r._measurement == "stuff" and r._field == "units" and r._value > 0)
|> group(columns: ["datacenter","tenant"])
|> sum(column: "_value")
|> map(fn: (r) => ({r with _value: r._value / 4.0}))
I've tried joining these tables in various ways, but nothing I've done gets me the single four-column table I'm looking for.
Does anyone have ideas on how to achieve this? Thanks!
I worked with a Flux developer who helped me come up with the solution:
import "date"
sum_over_range = (unit) =>
from(bucket: "consumption")
|> range(start: date.truncate(t: now(), unit: unit))
|> filter(fn: (r) => r._measurement == "stuff" and r._field == "units" and r._value > 0)
|> group(columns: ["datacenter", "tenant"])
|> sum()
|> map(fn: (r) => ({r with _value: r._value / 4.0, _field: string(v: unit), _time: 0}))
union(tables: [sum_over_range(unit: 1y), sum_over_range(unit: 1mo), sum_over_range(unit: 1d)
])
|> group(columns: ["datacenter", "tenant"])
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> drop(columns: ["_time", "_start", "_stop", "result"])
|> group()
Then, additionally in Grafana, I had to apply the 'Filter by name' transformation to hide the 'result' and 'table' columns that showed.
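If you'd rather avoid the Grafana transformation, whitelisting the columns in Flux might work as well; a sketch, assuming the pivoted columns end up named after string(v: unit), i.e. "1y", "1mo", and "1d":

    // Hypothetical final step: keep only the columns the panel should show.
    |> keep(columns: ["datacenter", "tenant", "1y", "1mo", "1d"])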
