Join measurements to fieldkeys in InfluxDB using map and reduce functions - influxdb

In InfluxDB 2.x, I need to create a simplified data dictionary listing all of the measurements and their fields. I want to get all of the fieldKeys for all of the measurements within a bucket. The query should get all of the measurements for the bucket, then for each measurement get all of the fieldKeys and concat them into a large string separated with commas. The end result should have two columns with the measurement and list of fieldKeys like the following:
mem,"field1,field2,field3"
cpu,"field1,field2,field3,field4"
I can create the concatenated list of fieldKeys with the following function:
import "influxdata/influxdb/schema"
import "strings"
import "array"
getFieldsForMeasure = (measureName) => schema.measurementFieldKeys(bucket: "mybucket", measurement: measureName)
|> reduce(
fn: (r, accumulator) => ({sum: r._value + "," + accumulator.sum}),
identity: {sum: ""},
)
getFieldsForMeasure(measureName: "mem")
Putting that together with a map function and a query for the measurements looks like the following:
import "influxdata/influxdb/schema"
import "strings"
import "array"
getFieldsForMeasure = (measureName) => schema.measurementFieldKeys(bucket: "mybucket", measurement: measureName)
|> reduce(
fn: (r, accumulator) => ({sum: r._value + "," + accumulator.sum}),
identity: {sum: ""},
)
data = schema.measurements(bucket: "mybucket")
|> yield()
|> findColumn(
fn: (key) => key._field == "",
column: "_value",
)
data |> map(fn: (r) => ({fieldsList: string(v: getFieldsForMeasure(measureName: r._value))}))
Although, I get this error:
Error: failed to execute query: 400 Bad Request: error #20:1-20:5: expected stream[{A with _value: B}] but found [{A with _value: B}] (array) (argument tables)

This is because findColumn returns an array, while pipeline functions require stream input. It looks like you simply misplaced it: it belongs inside getFieldsForMeasure, where it extracts the concatenated field list as a scalar.
So the following should return the expected result:
import "influxdata/influxdb/schema"
import "strings"
import "array"
getFieldsForMeasure = (measureName) => schema.measurementFieldKeys(bucket: "mybucket", measurement: measureName)
|> reduce(
fn: (r, accumulator) => ({sum: r._value + "," + accumulator.sum}),
identity: {sum: ""},
)
|> findColumn(column: "sum", fn: (key) => true)
data = schema.measurements(bucket: "mybucket")
data
|> map(fn: (r) => ({r with fieldsList: string(v: getFieldsForMeasure(measureName: r._value)[0])}))
Output:
win_cpu Percent_User_Time,Percent_Processor_Time,Percent_Privileged_Time,Percent_Interrupt_Time,Percent_Idle_Time,Percent_DPC_Time,
win_disk Percent_Idle_Time,Percent_Free_Space,Percent_Disk_Write_Time,Percent_Disk_Time,Percent_Disk_Read_Time,Free_Megabytes,Current_Disk_Queue_Length,
win_mem ...

Related

Flux Query : join.inner() returns nothing if I don't limit my stream used

I'm having trouble understanding how to use the join.inner() function.
It seems I only get a result (and the correct one) if I apply the limit() function to the stream I want to use join.inner with.
If I don't limit this left stream, I get no error, just no result.
Is it because of how I build my left stream?
Do you have any idea what I am doing wrong here?
I am pretty new to InfluxDB and therefore to the Flux language, so it must be me.
Thank you all for your answers!
import "array"
import "join"
left =
from(bucket: "TestBucket")
|> range(start: 0)
|> filter(fn: (r) => r["_measurement"] == "TestMeasurement")
|> limit(n : 1000000000000000000)
|> group()
//|> yield(name: "LEFT")
right =
array.from(
rows: [
{arrayValue: "123", _time: 2023-02-07T12:00:00.000Z}, //This timestamp exists in the left stream
],
)
//|> yield(name: "RIGHT")
result = join.inner(
left: left,
right: right,
on: (l, r) => l._time == r._time, // I made sure that there is indeed a common time
as: (l, r) => ({l with rightValue: r.arrayValue}),
)
|> yield(name: "RESULT")
Ok, the solution was to group both the stream AND the array-backed table by the _time column:
|> group(columns: ["_time"])
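Put together, a minimal sketch of the corrected query (same bucket and measurement names as above; once both sides share the same group key, the huge limit() workaround should no longer be needed):

```flux
import "array"
import "join"

left = from(bucket: "TestBucket")
|> range(start: 0)
|> filter(fn: (r) => r["_measurement"] == "TestMeasurement")
|> group(columns: ["_time"]) // regroup the stream by _time

right = array.from(rows: [{arrayValue: "123", _time: 2023-02-07T12:00:00.000Z}])
|> group(columns: ["_time"]) // same group key on the array-backed table

join.inner(
left: left,
right: right,
on: (l, r) => l._time == r._time,
as: (l, r) => ({l with rightValue: r.arrayValue}),
)
|> yield(name: "RESULT")
```

The join package compares tables whose group keys match, which is why grouping both inputs identically makes the join produce rows.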

Dynamic Filtering using variable in filter function in Flux

Using the quantile function, I was able to get the 95th percentile value in a stream.
Now I want to filter for records which lie below the 95th percentile,
so I loop over my records and keep those below the percentile.
However, at this point I get an error.
Please find the code below:
percentile = totalTimeByDoc
|> filter(fn: (r) => r["documentType"] == "PurchaseOrder")
|> group(columns:["documentType"])
// |> yield()
|> quantile(column: "processTime", q: 0.95, method: "estimate_tdigest", compression: 9999.0)
|> limit(n: 1)
|> rename(columns: {processTime: "pt"})
This gives me the data:
0 PurchaseOrder 999
Now I try to loop over my records and filter:
percentile_filered = totalTimeByDoc
|> filter(fn: (r) => r["documentType"] == "PurchaseOrder")
|> filter(fn: (r) => r.processTime < percentile[0]["pt"])
|> yield()
Where totalTimeByDoc looks like this:
|0|PurchaseOrder|testpass22PID230207222747-1|1200|
|1|PurchaseOrder|testpass22PID230207222747-2|807|
|2|PurchaseOrder|testpass22PID230207222934-1|671|
|3|PurchaseOrder|testpass22PID230207222934-2|670|
I get the following error from the above query:
error #116:41-116:51: expected [{A with pt: B}] (array) but found stream[{A with pt: B}]
You are only missing the column extraction from the percentile stream. Have a look at Extract scalar values in the docs. In this case, you could do:
percentile = totalTimeByDoc
|> ...
|> rename(columns: {processTime: "pt"})
|> findColumn(fn: (key) => true, column: "pt")
percentile_filtered = totalTimeByDoc
|> filter(fn: (r) => r["documentType"] == "PurchaseOrder")
|> filter(fn: (r) => r.processTime < percentile[0])
|> yield()

Flux left join on empty table

I'm looking to join two data streams together but receive the following error from Influx:
error preparing right side of join: cannot join on an empty table
I'm trying to build a query which compares the total sales by a store this month compared to last month. If the store has no sales this month then I don't want it to show. Below is a basic example of my current query.
import "join"
lastMonth = from(bucket: "my-bucket")
|> range(start: 2022-10-01, stop: 2022-11-01)
|> filter(fn: (r) => r._measurement == "transaction")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group(columns: ["storeId"], mode: "by")
|> reduce(
fn: (r, accumulator) => ({
storeId: r.storeId,
amount: accumulator.amount + (r.totalAmount - r.refundAmount)
}),
identity: {
storeId: "",
amount: 0.0
}
)
from(bucket: "my-bucket")
|> range(start: 2022-11-01, stop: 2022-12-01)
|> filter(fn: (r) => r._measurement == "transaction")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group(columns: ["storeId"], mode: "by")
|> reduce(
fn: (r, accumulator) => ({
storeId: r.storeId,
amount: accumulator.amount + (r.totalAmount - r.refundAmount)
}),
identity: {
storeId: "",
amount: 0.0
}
)
|> join.left(
right: lastMonth,
on: (l, r) => l.storeId == r.storeId,
as: (l, r) => ({
storeId: l.storeId,
thisMonthAmount: l.amount,
lastMonthAmount: r.amount
})
)
How can I achieve this in Flux without encountering this issue?
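No accepted answer is included here; one common workaround (a hedged sketch, not verified against this schema) is to union a sentinel row into the right-hand stream so it is never empty, then discard any rows the sentinel produces. The empty storeId "" is an assumption for a value no real store uses:

```flux
import "array"
import "join"

// lastMonth = the October stream defined in the question ...

// Sentinel row so the right side of the join is never empty
sentinel = array.from(rows: [{storeId: "", amount: 0.0}])

lastMonthSafe = union(tables: [lastMonth |> group(), sentinel |> group()])
|> group()

// thisMonth = the November stream from the question, piped in as the left side
thisMonth
|> join.left(
right: lastMonthSafe,
on: (l, r) => l.storeId == r.storeId,
as: (l, r) => ({storeId: l.storeId, thisMonthAmount: l.amount, lastMonthAmount: r.amount}),
)
|> filter(fn: (r) => r.storeId != "") // drop the sentinel row
```

Both sides are regrouped with group() so their group keys match, which join.left requires.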

Append calculated field (percentage) and combine with results from different datasets, in Influx Flux

I'm struggling with an Influx 2 query in Flux on how to join and map data from two differents sets (tables) into a specific desired output.
My current Flux query is this:
data = from(bucket: "foo")
|> range(start:-1d)
|> filter(fn: (r) => r._measurement == "io")
|> filter(fn: (r) => r["device_id"] == "12345")
|> filter(fn: (r) => r._field == "status_id" )
// count the total points
totals = data
|> count(column: "_value")
|> toFloat()
|> set(key: "_field", value: "total_count")
// count the online points (i.e. status_id == 1)
onlines = data
|> filter(fn: (r) => r._value == 1)
|> count(column: "_value")
|> toFloat()
|> set(key: "_field", value: "online_count")
union(tables: [totals, onlines])
This returns as output:
[{'online_count': 58.0}, {'total_count': 60.0}]
I would like to have appended to this output a percentage calculated from this. Something like:
[{'online_count': 58.0}, {'total_count': 60.0}, {'availability': 0.96666667}]
I've tried combining this using .map(), but to no avail:
// It feels like map() is what I need, but I can't find the right
// combination with join()/union(), map(), set(), keep(), etc.
union(tables: [totals, onlines])
|> map(fn: (r) => ({ r with percentage_online: r.onlines.online_count / r.totals.total_count * 100 }))
How can I append the (calculated) percentage as new field 'availability' in this Flux query?
Or, alternatively, is there a different Flux query approach to achieve this outcome?
N.B. I am aware of the Calculate percentages with Flux article from the docs, but I can't get it working for this specific scenario. It is close, though.
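One approach (a sketch, untested against the bucket in the question) is to extract both counts as scalars with findColumn, build the availability record with array.from, and union it with the other results:

```flux
import "array"

// totals and onlines as defined in the question above ...

// Extract each single count as a scalar
total = (totals |> findColumn(fn: (key) => true, column: "_value"))[0]
online = (onlines |> findColumn(fn: (key) => true, column: "_value"))[0]

// Build a one-row table holding the computed ratio
availability = array.from(rows: [{_field: "availability", _value: online / total}])

union(tables: [totals, onlines, availability])
```

Because the counts were converted with toFloat(), the division yields the fractional availability (e.g. 58.0 / 60.0) rather than integer division.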

Aggregate Flux Query in Influxdb

I am new to InfluxDB. I am using InfluxDB 1.8+ and com.influxdb:influxdb-client-java:1.11.0. I have the measurement below:
stocks {
(tag) symbol: String
(field) price: Double
(field) volume: Long
(time) ts: Long
}
I am trying to query the measurement with a 15 min window. I have the below query
"from(bucket: \"test/autogen\")" +
" |> range(start: -12h)" +
" |> filter(fn: (r) => (r[\"_measurement\"] == \"$measurementName\" and r[\"_field\"] == \"volume\"))" +
" |> cumulativeSum(columns: [\"_value\"])" +
" |> window(every: 15m, period: 15m)"
I believe that the above query calculates the cumulative sum over the data and returns just the volume field. However, I want the entire measurement including price, symbol, and ts along with the cumulative sum of the volume in a single flux query. I am not sure how to do this. Any help is appreciated. Thanks.
Thanks to Ethan Zhang. Flux output tables use a vertical (column-wise) data layout for fields.
Note that the price and the volume fields are stored as two separate rows.
To achieve the result you can use a function called v1.fieldsAsCols() to convert the table from a vertical layout back to the horizontal layout. Here is a link to its documentation: https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/influxdb-v1/fieldsascols/
Hence the query can be rewritten as follows (sample query 1):
import "influxdata/influxdb/v1"

from(bucket: "test/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r["_measurement"] == "stocks")
|> v1.fieldsAsCols()
|> group()
|> cumulativeSum(columns: ["volume"])
|> window(every: 15m, period: 15m)
Another approach is using pivot (sample query 2):
from(bucket: "test/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r["_measurement"] == "stocks")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group()
|> cumulativeSum(columns: ["volume"])
|> window(every: 15m, period: 15m)