Join measurements to fieldkeys in InfluxDB using map and reduce functions - influxdb

In InfluxDB 2.x, I need to create a simplified data dictionary listing all of the measurements and their fields. I want to get all of the fieldKeys for all of the measurements within a bucket. The query should get all of the measurements for the bucket, then for each measurement get all of the fieldKeys and concat them into a large string separated with commas. The end result should have two columns with the measurement and list of fieldKeys like the following:
mem,"field1,field2,field3"
cpu,"field1,field2,field3,field4"
I can create the concatenated list of fieldKeys with the following function:
import "influxdata/influxdb/schema"
import "strings"
import "array"
getFieldsForMeasure = (measureName) => schema.measurementFieldKeys(bucket: "mybucket", measurement: measureName)
|> reduce(
fn: (r, accumulator) => ({sum: r._value + "," + accumulator.sum}),
identity: {sum: ""},
)
getFieldsForMeasure(measureName: "mem")
Putting that together with a map function and a query for the measurements looks like the following:
import "influxdata/influxdb/schema"
import "strings"
import "array"
getFieldsForMeasure = (measureName) => schema.measurementFieldKeys(bucket: "mybucket", measurement: measureName)
|> reduce(
fn: (r, accumulator) => ({sum: r._value + "," + accumulator.sum}),
identity: {sum: ""},
)
data = schema.measurements(bucket: "mybucket")
|> yield()
|> findColumn(
fn: (key) => key._field == "",
column: "_value",
)
data |> map(fn: (r) => ({fieldsList: string(v: getFieldsForMeasure(measureName: r._value))}))
Although, I get this error:
Error: failed to execute query: 400 Bad Request: error #20:1-20:5: expected stream[{A with _value: B}] but found [{A with _value: B}] (array) (argument tables)

This is because findColumn returns an array, while pipeline functions require stream input. It looks like you simply misplaced it: it belongs inside getFieldsForMeasure, where it extracts the concatenated field list as a scalar.
So the following should return the expected result:
import "influxdata/influxdb/schema"
import "strings"
import "array"
getFieldsForMeasure = (measureName) => schema.measurementFieldKeys(bucket: "mybucket", measurement: measureName)
|> reduce(
fn: (r, accumulator) => ({sum: r._value + "," + accumulator.sum}),
identity: {sum: ""},
)
|> findColumn(column: "sum", fn: (key) => true)
data = schema.measurements(bucket: "mybucket")
data
|> map(fn: (r) => ({r with fieldsList: string(v: getFieldsForMeasure(measureName: r._value)[0])}))
Output:
win_cpu Percent_User_Time,Percent_Processor_Time,Percent_Privileged_Time,Percent_Interrupt_Time,Percent_Idle_Time,Percent_DPC_Time,
win_disk Percent_Idle_Time,Percent_Free_Space,Percent_Disk_Write_Time,Percent_Disk_Time,Percent_Disk_Read_Time,Free_Megabytes,Current_Disk_Queue_Length,
win_mem ...

Related

Flux Query : join.inner() returns nothing if I don't limit my stream used

I'm having trouble understanding how to use the join.inner() function.
It seems I only get a result (and the correct one) if I apply the limit() function to the stream I want to use join.inner with.
If I don't limit this left stream, I get no error, just no result.
Is it because of how I build my left stream?
Do you have any idea what I am doing wrong here?
I am pretty new to InfluxDB and therefore to the Flux language, so it must be me.
Thank you all for your answers!
import "array"
import "join"
left =
from(bucket: "TestBucket")
|> range(start: 0)
|> filter(fn: (r) => r["_measurement"] == "TestMeasurement")
|> limit(n : 1000000000000000000)
|> group()
//|> yield(name: "LEFT")
right =
array.from(
rows: [
{arrayValue: "123", _time: 2023-02-07T12:00:00.000Z}, //This timestamp exists in the left stream
],
)
//|> yield(name: "RIGHT")
result = join.inner(
left: left,
right: right,
on: (l, r) => l._time == r._time, // I made sure that there is indeed a common time
as: (l, r) => ({l with rightValue: r.arrayValue}),
)
|> yield(name: "RESULT")
Ok, the solution was to group both the stream AND the array-backed table by the _time column:
|> group(columns: ["_time"])
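Put together, a minimal sketch of the corrected query (same bucket and measurement names as above; once both sides share the same group key, the huge limit() workaround should no longer be needed):

```flux
import "array"
import "join"

left = from(bucket: "TestBucket")
|> range(start: 0)
|> filter(fn: (r) => r["_measurement"] == "TestMeasurement")
|> group(columns: ["_time"]) // regroup the stream by _time

right = array.from(rows: [{arrayValue: "123", _time: 2023-02-07T12:00:00.000Z}])
|> group(columns: ["_time"]) // same group key on the array-backed table

join.inner(
left: left,
right: right,
on: (l, r) => l._time == r._time,
as: (l, r) => ({l with rightValue: r.arrayValue}),
)
|> yield(name: "RESULT")
```

The join package compares tables whose group keys match, which is why grouping both inputs identically makes the join produce rows.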

Dynamic Filtering using variable in filter function in Flux

Using the quantile function, I was able to get the 95th percentile value in a stream.
Now I want to filter for records which lie below the 95th percentile,
so I loop over my records and keep those below the percentile.
However, at this point I get an error.
Please find the code below:
percentile = totalTimeByDoc
|> filter(fn: (r) => r["documentType"] == "PurchaseOrder")
|> group(columns:["documentType"])
// |> yield()
|> quantile(column: "processTime", q: 0.95, method: "estimate_tdigest", compression: 9999.0)
|> limit(n: 1)
|> rename(columns: {processTime: "pt"})
This gives me the data:
0 PurchaseOrder 999
Now I try to loop over my records and filter:
percentile_filered = totalTimeByDoc
|> filter(fn: (r) => r["documentType"] == "PurchaseOrder")
|> filter(fn: (r) => r.processTime < percentile[0]["pt"])
|> yield()
Where totalTimeByDoc looks like this:
|0|PurchaseOrder|testpass22PID230207222747-1|1200|
|1|PurchaseOrder|testpass22PID230207222747-2|807|
|2|PurchaseOrder|testpass22PID230207222934-1|671|
|3|PurchaseOrder|testpass22PID230207222934-2|670|
I get the following error from the above query:
error #116:41-116:51: expected [{A with pt: B}] (array) but found stream[{A with pt: B}]
You are only missing the column extraction from the percentile stream. Have a look at Extract scalar values in the docs. In this case, you could do:
percentile = totalTimeByDoc
|> ...
|> rename(columns: {processTime: "pt"})
|> findColumn(fn: (key) => true, column: "pt")
percentile_filtered = totalTimeByDoc
|> filter(fn: (r) => r["documentType"] == "PurchaseOrder")
|> filter(fn: (r) => r.processTime < percentile[0])
|> yield()

Flux left join on empty table

I'm looking to join two data streams together but receive the following error from Influx:
error preparing right side of join: cannot join on an empty table
I'm trying to build a query which compares the total sales by a store this month compared to last month. If the store has no sales this month then I don't want it to show. Below is a basic example of my current query.
import "join"
lastMonth = from(bucket: "my-bucket")
|> range(start: 2022-10-01, stop: 2022-11-01)
|> filter(fn: (r) => r._measurement == "transaction")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group(columns: ["storeId"], mode: "by")
|> reduce(
fn: (r, accumulator) => ({
storeId: r.storeId,
amount: accumulator.amount + (r.totalAmount - r.refundAmount)
}),
identity: {
storeId: "",
amount: 0.0
}
)
from(bucket: "my-bucket")
|> range(start: 2022-11-01, stop: 2022-12-01)
|> filter(fn: (r) => r._measurement == "transaction")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group(columns: ["storeId"], mode: "by")
|> reduce(
fn: (r, accumulator) => ({
storeId: r.storeId,
amount: accumulator.amount + (r.totalAmount - r.refundAmount)
}),
identity: {
storeId: "",
amount: 0.0
}
)
|> join.left(
right: lastMonth,
on: (l, r) => l.storeId == r.storeId,
as: (l, r) => ({
storeId: l.storeId,
thisMonthAmount: l.amount,
lastMonthAmount: r.amount
})
)
How can I achieve this in Flux without encountering this issue?
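No accepted answer is included here; one common workaround (a hedged sketch, not verified against this schema) is to union a sentinel row into the right-hand stream so it is never empty, then discard any rows the sentinel produces. The empty storeId "" is an assumption for a value no real store uses:

```flux
import "array"
import "join"

// lastMonth = the October stream defined in the question ...

// Sentinel row so the right side of the join is never empty
sentinel = array.from(rows: [{storeId: "", amount: 0.0}])

lastMonthSafe = union(tables: [lastMonth |> group(), sentinel |> group()])
|> group()

// thisMonth = the November stream from the question, piped in as the left side
thisMonth
|> join.left(
right: lastMonthSafe,
on: (l, r) => l.storeId == r.storeId,
as: (l, r) => ({storeId: l.storeId, thisMonthAmount: l.amount, lastMonthAmount: r.amount}),
)
|> filter(fn: (r) => r.storeId != "") // drop the sentinel row
```

Both sides are regrouped with group() so their group keys match, which join.left requires.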

Append calculated field (percentage) and combine with results from different datasets, in Influx Flux

I'm struggling with an Influx 2 query in Flux on how to join and map data from two differents sets (tables) into a specific desired output.
My current Flux query is this:
data = from(bucket: "foo")
|> range(start:-1d)
|> filter(fn: (r) => r._measurement == "io")
|> filter(fn: (r) => r["device_id"] == "12345")
|> filter(fn: (r) => r._field == "status_id" )
// count the total points
totals = data
|> count(column: "_value")
|> toFloat()
|> set(key: "_field", value: "total_count")
// count the online points (i.e. status_id == 1)
onlines = data
|> filter(fn: (r) => r._value == 1)
|> count(column: "_value")
|> toFloat()
|> set(key: "_field", value: "online_count")
union(tables: [totals, onlines])
This returns as output:
[{'online_count': 58.0}, {'total_count': 60.0}]
I would like to have appended to this output a percentage calculated from this. Something like:
[{'online_count': 58.0}, {'total_count': 60.0}, {'availability': 0.96666667}]
I've tried combining this using .map(), but to no avail:
// It feels like map() is what I need, but I can't find the right
// combination with join()/union(), map(), set(), keep(), etc.
union(tables: [totals, onlines])
|> map(fn: (r) => ({ r with percentage_online: r.onlines.online_count / r.totals.total_count * 100 }))
How can I append the (calculated) percentage as new field 'availability' in this Flux query?
Or, alternatively, is there a different Flux query approach to achieve this outcome?
N.B. I am aware of the Calculate percentages with Flux article from the docs, but I can't get it working for this specific scenario. It is close, though.
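One approach (a sketch, untested against the bucket in the question) is to extract both counts as scalars with findColumn, build the availability record with array.from, and union it with the other results:

```flux
import "array"

// totals and onlines as defined in the question above ...

// Extract each single count as a scalar
total = (totals |> findColumn(fn: (key) => true, column: "_value"))[0]
online = (onlines |> findColumn(fn: (key) => true, column: "_value"))[0]

// Build a one-row table holding the computed ratio
availability = array.from(rows: [{_field: "availability", _value: online / total}])

union(tables: [totals, onlines, availability])
```

Because the counts were converted with toFloat(), the division yields the fractional availability (e.g. 58.0 / 60.0) rather than integer division.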

Aggregate Flux Query in Influxdb

I am new to InfluxDB. I am using InfluxDB 1.8+ and com.influxdb:influxdb-client-java:1.11.0. I have the measurement below:
stocks {
(tag) symbol: String
(field) price: Double
(field) volume: Long
(time) ts: Long
}
I am trying to query the measurement with a 15 min window. I have the below query
"from(bucket: \"test/autogen\")" +
" |> range(start: -12h)" +
" |> filter(fn: (r) => (r[\"_measurement\"] == \"$measurementName\" and r[\"_field\"] == \"volume\"))" +
" |> cumulativeSum(columns: [\"_value\"])" +
" |> window(every: 15m, period: 15m)"
I believe that the above query calculates the cumulative sum over the data and returns just the volume field. However, I want the entire measurement including price, symbol, and ts along with the cumulative sum of the volume in a single flux query. I am not sure how to do this. Any help is appreciated. Thanks.
Thanks to Ethan Zhang. Flux output tables use a vertical (column-wise) data layout for fields.
Note that the price and the volume fields are stored as two separate rows.
To achieve the result you can use a function called v1.fieldsAsCols() to convert the table from a vertical layout back to the horizontal layout. Here is a link to its documentation: https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/influxdb-v1/fieldsascols/
Hence the query can be rewritten as follows (sample query 1):
import "influxdata/influxdb/v1"

from(bucket: "test/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r["_measurement"] == "stocks")
|> v1.fieldsAsCols()
|> group()
|> cumulativeSum(columns: ["volume"])
|> window(every: 15m, period: 15m)
Another approach is using pivot (sample query 2):
from(bucket: "test/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r["_measurement"] == "stocks")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group()
|> cumulativeSum(columns: ["volume"])
|> window(every: 15m, period: 15m)