How to join data with Flux (in the SQL sense)?

I have a measurement with sessions and a measurement with events in my InfluxDB. Each session has n events. I want to join the streams in order to group the events by session.AppVersion later. I use the following Flux script:
import "join"
import "array"
bucket = "test03"
// left table
events = from(bucket: bucket)
|> range(start: -600m)
|> filter(fn: (r) => r["_measurement"] == "event")
|> filter(fn: (r) => r._field == "SessionId")
// right table
sessions = from(bucket: bucket)
|> range(start: -60d)
|> filter(fn: (r) => r["_measurement"] == "session")
eventsWithSession = join.left(
left: events,
right: sessions,
on: (l, r) => l.SessionIdTag == r.SessionIdTag,
as: (l, r) => ({l with RSessionIdTag: r.SessionIdTag, AppVersion: r.AppVersion, App: r.App}),
)
eventsWithSession
|> yield(name: "debug")
Unfortunately, the columns r.AppVersion and r.App are empty.
If I use join.inner() instead, I get an empty result.
How can I fix that?
If I do the same with static values/rows, it works:
import "join"
import "array"
eventsTest = array.from(
rows: [
{SessionIdTag: "6ApWdNEtIkaoh2ZgkA69yA", Action: "ModA: save"},
],
)
sessionsTest = array.from(
rows: [
{SessionIdTag: "6ApWdNEtIkaoh2ZgkA69yA", App: "A", AppVersion: "1.1.1.1"},
],
)
eventsWithSessionTest = join.left(
left: eventsTest,
right: sessionsTest,
on: (l, r) => l.SessionIdTag == r.SessionIdTag,
as: (l, r) => ({l with RSessionIdTag: r.SessionIdTag, RAppVersion: r.AppVersion, RApp: r.App}),
)
eventsWithSessionTest
|> yield(name: "debug")
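A likely cause (assuming AppVersion and App are stored as fields of the session measurement): InfluxDB returns each field as separate rows of _field/_value pairs, so r.AppVersion and r.App do not exist as columns when the join's as-function runs. A sketch of a fix, untested against this schema, is to pivot the session fields into columns first, and to ungroup both sides so the join is not constrained by differing group keys:

```flux
import "join"

sessions = from(bucket: "test03")
    |> range(start: -60d)
    |> filter(fn: (r) => r["_measurement"] == "session")
    // Fields arrive as separate _field/_value rows; pivot them into columns
    // so r.AppVersion and r.App exist when the join's as-function runs.
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    // Ungroup so tables from both sides can actually be matched.
    |> group()
```

The events side may need the same pivot/group() treatment, since join matches tables by group key before matching rows.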

Related

Flux left join on empty table

I'm looking to join two data streams together but receive the following error from Influx:
error preparing right side of join: cannot join on an empty table
I'm trying to build a query that compares a store's total sales this month to last month. If the store has no sales this month, I don't want it to show. Below is a basic example of my current query.
import "join"
lastMonth = from(bucket: "my-bucket")
|> range(start: 2022-10-01, stop: 2022-11-01)
|> filter(fn: (r) => r._measurement == "transaction")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group(columns: ["storeId"], mode: "by")
|> reduce(
fn: (r, accumulator) => ({
storeId: r.storeId,
amount: accumulator.amount + (r.totalAmount - r.refundAmount)
}),
identity: {
storeId: "",
amount: 0.0
}
)
from(bucket: "my-bucket")
|> range(start: 2022-11-01, stop: 2022-12-01)
|> filter(fn: (r) => r._measurement == "transaction")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group(columns: ["storeId"], mode: "by")
|> reduce(
fn: (r, accumulator) => ({
storeId: r.storeId,
amount: accumulator.amount + (r.totalAmount - r.refundAmount)
}),
identity: {
storeId: "",
amount: 0.0
}
)
|> join.left(
right: lastMonth,
on: (l, r) => l.storeId == r.storeId,
as: (l, r) => ({
storeId: l.storeId,
thisMonthAmount: l.amount,
lastMonthAmount: r.amount
})
)
How can I achieve this in Flux without encountering this issue?
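One workaround (a sketch, untested against this schema) is to make sure the right side is never empty by unioning in a placeholder row built with array.from; the sentinel storeId will simply never match a real store in the join:

```flux
import "array"

// Hypothetical guard: "lastMonth" is the stream from the query above.
// The placeholder row keeps the right side of the join non-empty; its
// sentinel storeId ("") never matches a real store, so it exists only to
// avoid the "cannot join on an empty table" error.
placeholder = array.from(rows: [{storeId: "", amount: 0.0}])

// Ungroup lastMonth so both tables share the same (empty) group key.
lastMonthSafe = union(tables: [lastMonth |> group(), placeholder])
```

Passing right: lastMonthSafe to join.left should then let the join prepare even when last month has no data; left rows without a match keep a null lastMonthAmount, and stores with no sales this month still do not appear, since join.left only keeps left-side rows.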

InfluxDB 2.0 - Flux query: How to sum a column and use the sum for further calculations

I am new to the Flux query language (with InfluxDB 2) and can't find a solution for the following problem:
I have data with changing true and false values:
I was able to calculate the time in seconds until the next change by using the events.duration function:
Now I want to calculate the total time and the time of all "false" events, and after that the percentage of false events. I tried the following:
import "contrib/tomhollingworth/events"
total = from(bucket: "********")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "********")
|> filter(fn: (r) => r["Server"] == "********")
|> filter(fn: (r) => r["_field"] == "********")
|> filter(fn: (r) => r["DataNode"] == "********")
|> events.duration(
unit: 1s,
columnName: "duration",
timeColumn: "_time",
stopColumn: "_stop"
)
|> sum(column: "duration")
|> yield(name: "total")
downtime = from(bucket: "********")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "********")
|> filter(fn: (r) => r["Server"] == "********")
|> filter(fn: (r) => r["_field"] == "********")
|> filter(fn: (r) => r["DataNode"] == "********")
|> events.duration(
unit: 1s,
columnName: "duration",
timeColumn: "_time",
stopColumn: "_stop"
)
|> pivot(rowKey:["_time"], columnKey: ["_value"], valueColumn: "duration")
|> drop(columns: ["true"])
|> sum(column: "false")
|> yield(name: "downtime")
downtime_percentage = downtime.false / total.duration
With this I am getting the following error: error #44:23-44:31: expected {A with false:B} but found [C]
I also tried some variations but couldn't get it to work.
I guess I am getting some basic things wrong but I couldn't figure it out yet. Let me know if you need more information.
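For context, the error comes from downtime.false: downtime and total are streams of tables, not records, so their columns cannot be accessed directly. Flux only hands out scalar values via findColumn()/findRecord(). A minimal sketch of the pattern (using the column names from the query above):

```flux
// Streams of tables cannot be indexed like records. To use a single value
// in further calculations, extract it first, e.g. with findRecord():
total_seconds = (total
    |> findRecord(fn: (key) => true, idx: 0)).duration

downtime_seconds = (downtime
    |> findRecord(fn: (key) => true, idx: 0))["false"]

downtime_percentage = float(v: downtime_seconds) / float(v: total_seconds)
```

The self-answer below uses the equivalent findColumn() approach.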
I have found a way to solve my problem. Although I am sure there is a more elegant solution, I document my way here; maybe it helps someone and we can improve it together.
import "contrib/tomhollingworth/events"
//Set time window in seconds (based on selected time)
time_window = int(v: v.timeRangeStart)/-1000000000
//Filter (IoT-)Data
data= from(bucket: "*******")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "*******")
|> filter(fn: (r) => r["Server"] == "*******")
|> filter(fn: (r) => r["Equipment"] == "*******")
|> filter(fn: (r) => r["DataNode"] == "******")
//Use events.duration to calculate the duration in seconds of each true/false event.
|> events.duration(
unit: 1s,
columnName: "duration",
timeColumn: "_time",
stopColumn: "_stop"
)
//Sum up the event times via sum() and save them as an array variable via findColumn(). This is the only way I know to access the value later (please let me know if you know other ways!).
total_array = data
|> sum(column: "duration")
|> findColumn(
fn: (key) => key._field == "*******",
column: "duration",
)
//Calculate "missing time" in seconds in the time window, because the first event in the time window is missing.
missing_time = time_window - total_array[0]
//Create an array with the first event to determine if it is true or false
first_value_in_window = data
|> first()
|> findColumn(
fn: (key) => key._field == "*******",
column: "_value",
)
//Calculate the downtime by creating columns with the true and false values via pivot. Then sum up the column with the false values
downtime = data
|> map(fn: (r) => ({ r with duration_percentage: float(v: r.duration)/float(v: time_window) }))
|> pivot(rowKey:["_time"], columnKey: ["_value"], valueColumn: "duration_percentage")
|> map( fn: (r) => ({r with
downtime: if exists r.false then
r.false
else
0.0
}))
|> sum(column: "downtime")
//Create an array with the downtime so that this value can be accessed later on
downtime_array = downtime
|> findColumn(
fn: (key) => key._field == "PLS_Antrieb_laeuft",
column: "downtime",
)
//If the first value in the considered time window is true, then the remaining time in the time window (missing_time) was downtime. Write this value in the column "false_percentage_before_window".
//The total downtime is calculated from the previously calculated sum(downtime_array) and, if applicable, the downtime of the remaining time in the time window if the first value is true (first_value_in_window[0])
data
|> map( fn: (r) => ({r with
false_percentage_before_window: if first_value_in_window[0] then
float(v: missing_time)/float(v: time_window)
else
0.0
}))
|> map(fn: (r) => ({ r with _value: (downtime_array[0] + r.false_percentage_before_window) * 100.00 }))
|> first()
|> keep(columns: ["_value"])
|> yield(name: "Total Downtime")
This solution assumes that the true/false events only occur alternately.

Query last value in Flux

I'm trying to get the last value from some IoT sensors, and I actually achieved an intermediate result with the following Flux query:
from(bucket:"mqtt-bucket")
|> range(start:-10m )
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["thingy"] == "things/green-1/shadow/update"
or r["thingy"] == "things/green-3/shadow/update"
or r["thingy"] == "things/green-2/shadow/update")
|> filter(fn: (r) => r["_field"] == "data")
|> filter(fn: (r) => r["appId"] == "TEMP" or r["appId"] == "HUMID")
|> toFloat()
|> last()
The problem: I would like to get the last measured value independently of a time range.
I saw in the docs that there is no way to leave the range function unbounded. Maybe there is a workaround?
I just found this:
from(bucket: "stockdata")
|> range(start: 0)
|> filter(fn: (r) => r["_measurement"] == "nasdaq")
|> filter(fn: (r) => r["symbol"] == "OPEC/ORB")
|> last()

Counting boolean values in flux query

I have a bucket where one field is a boolean
I'd like to count the number of true and the number of false for each hour
from(bucket: "xxx")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> window(every: 1h)
|> filter(fn: (r) => r["_measurement"] == "xxx")
|> filter(fn: (r) => r["_field"] == "myBoolField")
|> group(columns: ["_stop"])
Because this is issued from a cron job that runs every minute (more or less), this will give something like:
table _start _stop _time _value otherfield1 otherfield2
0 2021-05-18T19:00:00 2021-05-18T20:00 2021-05-18T19:01 false xxx xxx
0 2021-05-18T19:00:00 2021-05-18T20:00 2021-05-18T19:02 true xxx xxx
0 2021-05-18T19:00:00 2021-05-18T20:00 2021-05-18T19:03 true xxx xxx
...
1 2021-05-18T20:00:00 2021-05-18T21:00 2021-05-18T20:01 false xxx xxx
1 2021-05-18T20:00:00 2021-05-18T21:00 2021-05-18T20:02 false xxx xxx
1 2021-05-18T20:00:00 2021-05-18T21:00 2021-05-18T20:03 false xxx xxx
...
Now, I'd like to count the total, the number of false values and the number of true values for each hour (so for each table), but without losing/dropping the other fields.
So I'd like a structure like:
table _stop _value nbFalse nbTrue otherfield1 otherfield2
0 2021-05-18T20:00 59 1 58 xxx xxx
1 2021-05-18T21:00 55 4 51 xxx xxx
I've tried many combinations of pivot, count, ... without success
From my understanding, the correct way to do it is:
drop _start and _time
duplicate _value into nbTrue and nbFalse
re-aggregate by _stop to keep only true in nbTrue and false in nbFalse
count the three columns _value, nbTrue and nbFalse
|> drop(columns: ["_start", "_time"])
|> duplicate(column: "_value", as: "nbTrue")
|> duplicate(column: "_value", as: "nbFalse")
but I am stuck at step 3...
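For what it's worth, steps 3 and 4 could be sketched like this (untested): convert the duplicated booleans to 0/1 with map(), then collapse each window with reduce(). Note that reduce() keeps only the group key plus the computed columns, so the other fields would still need to be pivoted in beforehand, as in the answers that follow:

```flux
    // nbTrue/nbFalse are the boolean copies from the duplicate() steps above.
    |> map(fn: (r) => ({r with
        nbTrue: if r.nbTrue then 1 else 0,
        nbFalse: if r.nbFalse then 0 else 1,
    }))
    // Collapse each 1h window: count rows and sum the 0/1 indicator columns.
    |> reduce(
        fn: (r, accumulator) => ({
            total: accumulator.total + 1,
            nbTrue: accumulator.nbTrue + r.nbTrue,
            nbFalse: accumulator.nbFalse + r.nbFalse,
        }),
        identity: {total: 0, nbTrue: 0, nbFalse: 0},
    )
```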
Didn't test it, but I have something similar to this in mind:
from(bucket: "xxx")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "xxx")
|> filter(fn: (r) => r["_field"] == "myBoolField")
|> aggregateWindow(
every: 1h,
fn: (column, tables=<-) => tables |> reduce(
identity: {true_count: 0.0},
fn: (r, accumulator) => ({
true_count:
if r._value == true then accumulator.true_count + 1.0
else accumulator.true_count + 0.0
})
)
)
I got this from the docs and adjusted it a bit; I think it should get you what you need.
The answer from @dskalec should work; I did not test it directly because, in the end, I needed to aggregate more than just the boolean field.
Here is my query; you can see that it uses the same aggregate+reduce combination (I just use a pivot before, to have more than one field to aggregate):
from(bucket: "rt")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "xxx")
|> pivot(
rowKey:["_time"],
columnKey: ["_field"],
valueColumn: "_value"
)
|> window(every: 1h, createEmpty: true)
|> reduce(fn: (r, accumulator) => ({
nb: accumulator.nb + 1,
field1: accumulator.field1 + r["field1"], //field1 is an int
field2: if r["field2"] then accumulator.field2 + 1 else accumulator.field2, //field2 is a boolean
}),
identity: {nb: 0, field1: 0, field2: 0}
)
|> duplicate(column: "_stop", as: "_time")
|> drop(columns: ["_start", "_stop"])

InfluxDB Flux - Getting last and first values as a column

I am trying to create two new columns with the first and last values using the last() and first() functions. However, the functions aren't working when I try to map the new columns. Here is the sample code below. Is this possible using Flux?
from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "price_info")
|> filter(fn: (r) => r["_field"] == "price")
|> map(fn: (r) => ({r with
open: last(float(v: r._value)),
close: first(float(v: r._value)),
}))
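For the record, first() and last() are stream transformations, not scalar functions, so they cannot be called on a value inside map(). One pattern (a sketch; first_value and last_value are names I made up, and mapping first to open and last to close may need swapping depending on intent) is to pull the scalars out with findRecord() and then map them onto every row:

```flux
data = from(bucket: "bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "price_info")
    |> filter(fn: (r) => r["_field"] == "price")

// first()/last() reduce a table stream to one row; findRecord() then
// extracts that row so its _value can be used as a scalar.
first_value = (data |> first() |> findRecord(fn: (key) => true, idx: 0))._value
last_value = (data |> last() |> findRecord(fn: (key) => true, idx: 0))._value

data
    |> map(fn: (r) => ({r with open: first_value, close: last_value}))
```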
I am not answering the question directly; however, it might help.
I wanted to perform some calculation between first and last. Here is my method; I have no idea if it is the right way to do it.
The idea is to create two tables, one with only the first value and the other with only the last value, then to perform a union between both.
data = from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "plop")
l = data
|> last()
|> map(fn:(r) => ({ r with _time: time(v: "2011-01-01T01:01:01.0Z") }))
f = data
|> first()
|> map(fn:(r) => ({ r with _time: time(v: "2010-01-01T01:01:01.0Z") }))
union(tables: [f, l])
|> sort(columns: ["_time"])
|> difference()
For an unknown reason I have to set artificial dates, just to be able to sort the values so that first comes before last.
Just a quick thank you. I was struggling with this as well. This is my code now:
First = from(bucket: "FirstBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["topic"] == "Counters/Watermeter 1")
|> filter(fn: (r) => r["_field"] == "Counter")
|> first()
|> yield(name: "First")
Last = from(bucket: "FirstBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["topic"] == "Counters/Watermeter 1")
|> filter(fn: (r) => r["_field"] == "Counter")
|> last()
|> yield(name: "Last")
union(tables: [First, Last])
|> difference()
The simple answer is to use join. (You may also use the old join(); when using the "new" join, remember to import "join".)
Example:
import "join"
balance_asset_gen = from(bucket: "telegraf")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "balance")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
balance_asset_raw = from(bucket: "telegraf")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "balance_raw")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
// In my example I merge two data sources but you may just use 1 data source
balances_merged = union(tables: [balance_asset_gen, balance_asset_raw])
|> group(columns:["_time"], mode:"by")
|> sum()
f = balances_merged |> first()
l = balances_merged |> last()
// Watch out, here we assume we work on single TABLE (we don't have groups/one group)
join.left(
left: f,
right: l,
on: (l, r) => l.my_tag == r.my_tag, // pick on what to merge e.g. l._measurement == r._measurement
as: (l, r) => ({
_time: r._time,
_start: l._time,
_stop: r._time,
_value: (r._value / l._value), // we can calculate new field
first_value: l._value,
last_value: r._value,
}),
)
|> yield()
