Aggregate Flux Query in InfluxDB

I am new to InfluxDB. I am using InfluxDB 1.8+ and com.influxdb:influxdb-client-java:1.11.0. I have the measurement below:
stocks {
(tag) symbol: String
(field) price: Double
(field) volume: Long
(time) ts: Long
}
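For illustration only, a single point of this measurement could look like this in line protocol (the symbol, price, and volume values are made up; the trailing number is a nanosecond timestamp):
stocks,symbol=AAPL price=132.05,volume=98765i 1609459200000000000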
I am trying to query the measurement with a 15-minute window. I have the query below:
"from(bucket: \"test/autogen\")" +
" |> range(start: -12h)" +
" |> filter(fn: (r) => (r[\"_measurement\"] == \"$measurementName\" and r[\"_field\"] == \"volume\"))" +
" |> cumulativeSum(columns: [\"_value\"])" +
" |> window(every: 15m, period: 15m)"
I believe that the above query calculates the cumulative sum over the data and returns just the volume field. However, I want the entire measurement, including price, symbol, and ts, along with the cumulative sum of the volume, in a single Flux query. I am not sure how to do this. Any help is appreciated. Thanks.

Thanks to Ethan Zhang. Flux output tables use a vertical data layout for fields: each field is returned as its own row rather than its own column.
Note that the price and the volume fields are therefore stored as two separate rows.
To achieve the result you want, you can use a function called v1.fieldsAsCols() to convert the table from the vertical layout back to a horizontal layout; it requires importing the "influxdata/influxdb/v1" package. Here is a link to its documentation: https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/influxdb-v1/fieldsascols/
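For illustration (with made-up values), the raw output stores each field in its own row:
_time                 _measurement  symbol  _field  _value
2021-01-01T00:00:00Z  stocks        AAPL    price   132.05
2021-01-01T00:00:00Z  stocks        AAPL    volume  98765
whereas after v1.fieldsAsCols() (or the pivot() shown below in sample query 2) each field becomes its own column:
_time                 _measurement  symbol  price   volume
2021-01-01T00:00:00Z  stocks        AAPL    132.05  98765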
Hence the query can be rewritten as follows (sample query 1):
import "influxdata/influxdb/v1"

from(bucket: "test/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r["_measurement"] == "stocks")
|> v1.fieldsAsCols()
|> group()
|> cumulativeSum(columns: ["volume"])
|> window(every: 15m, period: 15m)
Another approach is to use pivot() (sample query 2):
from(bucket: "test/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r["_measurement"] == "stocks")
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> group()
|> cumulativeSum(columns: ["volume"])
|> window(every: 15m, period: 15m)

Related

What does `|>` mean in TICKscript

Trying to write my first TICKscript to work out when two sensor values cross: if the outside temperature has changed from lower to higher than the inside temperature then I need to close the windows (and conversely).
Using the query builder in InfluxDB I'm getting this for the median of the temperature values inside the house over the last 15 minutes:
from(bucket: "zigbee")
|> range(start: -15m, stop: now())
|> filter(fn: (r) => r["room"] == "Kitchen" or r["room"] == "DiningRoom" or r["room"] == "Bed3" or r["room"] == "Bed1")
|> filter(fn: (r) => r["_field"] == "temperature")
|> group(columns: ["_measurement"])
|> aggregateWindow(every: 15m, fn: mean, createEmpty: false)
|> yield(name:"inside")
The syntax |> appears to be undocumented -- can you provide a reference?
Replacing |> with | breaks it.
It seems that group and aggregateWindow do not commute?
Presumably because aggregateWindow is forced to choose a single representative _time value for each window?
I think the plan is to
assign this to a stream,
copy and edit it to create a second stream shifted by 15 minutes,
create a second pair of streams for the outside temperature.
join all four streams and calculate a value indicating whether the inside and outside temperatures have crossed over.
Unless you have a better idea?
(Right now it's looking easier to import the data into SQL.)
Check the InfluxDB Flux language documentation for |>:
InfluxDB Pipe-forward operator
Looking at your Flux query:
from(bucket: "zigbee")
|> range(start: -15m, stop: now())
|> filter(fn: (r) => r["room"] == "Kitchen" or r["room"] == "DiningRoom" or r["room"] == "Bed3" or r["room"] == "Bed1")
|> filter(fn: (r) => r["_field"] == "temperature")
|> group(columns: ["_measurement"])
|> aggregateWindow(every: 15m, fn: mean, createEmpty: false)
|> yield(name:"inside")
You are taking data from the bucket "zigbee".
Data from the source is passed to the range() function with the pipe-forward |> operator.
The results from range() are passed to the next filter() function with another pipe-forward operator.
Etc.
So the data flows from one function to the next.
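As a minimal illustration, |> simply feeds the output stream of one function into the pipe-receive parameter of the next (named tables for range()), so the following two expressions should be equivalent (same bucket as your query):
from(bucket: "zigbee") |> range(start: -15m)
range(tables: from(bucket: "zigbee"), start: -15m)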
You can group by columns, but in your case the relevant column is the "room" tag key, if I understand your intentions correctly, so try:
|> group(columns: ["room"])
There is a difference between tag key values and measurement names - you should check the InfluxDB documentation to understand the data structure.
Flux data model documentation
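For reference, a sketch of the full query with the suggested grouping (same bucket, rooms, and window as above; note that this yields one mean per room rather than one combined mean across all rooms):
from(bucket: "zigbee")
|> range(start: -15m, stop: now())
|> filter(fn: (r) => r["room"] == "Kitchen" or r["room"] == "DiningRoom" or r["room"] == "Bed3" or r["room"] == "Bed1")
|> filter(fn: (r) => r["_field"] == "temperature")
|> group(columns: ["room"])
|> aggregateWindow(every: 15m, fn: mean, createEmpty: false)
|> yield(name: "inside")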
It's not TICKscript; it's something to do with InfluxDB that might be called Flux.
mean = from(bucket: "zigbee")
|> range(start: -5d, stop: now())
|> filter(fn: (r) => r["room"] == "Outside")
|> filter(fn: (r) => r["_measurement"] == "temperature")
|> aggregateWindow(every: 30m, fn: mean, createEmpty: false)
shift = mean
|> timeShift(duration: -3h)
j = join(tables: {mean: mean, shift: shift}, on: ["_time"])
|> map(fn: (r) => ({ r with diff: float(v: r._value_mean) - float(v: r._value_shift) }))
// yield contains 1 table with the required columns, but the UI doesn't understand it.
// The UI requires 1 table for each series.
j |> map(fn: (r) => ({_time: r._time, _value: r._value_mean})) |> yield(name: "mean")
j |> map(fn: (r) => ({_time: r._time, _value: r._value_shift})) |> yield(name: "shift")
j |> map(fn: (r) => ({_time: r._time, _value: r.diff})) |> yield(name: "diff")
In TICKscript, |> "declares a chaining method call which creates an instance of a new node and chains it to the node above it", as stated in the official documentation.

How to calculate sum of a column and group by quarter of months (time) in InfluxDB using Flux?

from(bucket: "bucket")
|> range(start: 2022-01-01T00:00:00Z, stop: 2022-12-31T00:00:00Z)
A few more steps are needed here:
Narrow down to the measurement you are interested in
Narrow down to the field (i.e. "column") that is going to be summed
Use an aggregateWindow() function to produce the recurring (quarterly) results
You could try the following in Flux:
from(bucket: "bucket")
|> range(start: 2022-01-01T00:00:00Z, stop: 2022-12-31T00:00:00Z)
|> filter(fn: (r) => r._measurement == "MeasurementName")
|> filter(fn: (r) => r._field == "FieldKey")
|> aggregateWindow(every : 3mo, fn : sum)
|> yield(name: "SomeResultSetName")
See more details here.

Work with non-table values, aka "A is not subtractable"

I see many similar questions but couldn't find a good match.
If we define a query and the result ought to be a single value, is there a Flux way to store it as such? Example:
total = from(bucket: "xxx")
|> range(start: 0)
|> filter(fn: (r) => ...)
|> keep(columns: ["_value"])
|> sum()
consumed = from(bucket: "xxx")
|> range(start: 0)
|> filter(fn: (r) => ...)
|> keep(columns: ["_value"])
|> last()
total - consumed
Results in
invalid: error #18:1-18:40: [A] is not Subtractable
I can think of other ways to solve similar issues, but this example made me question whether Flux actually supports working easily with single values or 1x1 relations.
Thanks
Not answering my original question but I want to provide the workaround I went with to solve this. I would still be interested in a more direct solution.
I've introduced a second column, then joined the two tables on that column:
total = from(bucket: "xxx")
|> range(start: 0)
|> filter(fn: (r) => ...)
|> keep(columns: ["_value"])
|> sum()
// Added:
|> map(fn: (r) => ({ age: "latest", _value:r._value }))
consumed = from(bucket: "xxx")
|> range(start: 0)
|> filter(fn: (r) => ...)
|> keep(columns: ["_value"])
|> last()
// Added:
|> map(fn: (r) => ({ age: "latest", _value:r._value }))
join(tables: {total: total, consumed: consumed}, on: ["age"])
|> map(fn: (r) => ({_value: r._value_total - r._value_consumed}))
In the query, total and consumed are streams of tables, not scalar values, which is why the subtraction fails. For how to extract and use scalar values, please see Extract scalar values in Flux.
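A minimal sketch of that approach, assuming a Flux version that provides findRecord() and array.from(), and reusing the total and consumed streams from the question (the import belongs at the top of the script):
import "array"

// Pull the single row out of each stream as a record.
totalRec = total |> findRecord(fn: (key) => true, idx: 0)
consumedRec = consumed |> findRecord(fn: (key) => true, idx: 0)

// Subtract the scalar values and wrap the result back into a one-row table.
array.from(rows: [{_value: totalRec._value - consumedRec._value}])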

Append calculated field (percentage) and combine with results from different datasets, in Influx Flux

I'm struggling with an Influx 2 query in Flux on how to join and map data from two different sets (tables) into a specific desired output.
My current Flux query is this:
data = from(bucket: "foo")
|> range(start:-1d)
|> filter(fn: (r) => r._measurement == "io")
|> filter(fn: (r) => r["device_id"] == "12345")
|> filter(fn: (r) => r._field == "status_id" )
// count the total points
totals = data
|> count(column: "_value")
|> toFloat()
|> set(key: "_field", value: "total_count")
// count the online points (i.e. status_id == 1)
onlines = data
|> filter(fn: (r) => r._value == 1)
|> count(column: "_value")
|> toFloat()
|> set(key: "_field", value: "online_count")
union(tables: [totals, onlines])
This returns as output:
[{'online_count': 58.0}, {'total_count': 60.0}]
I would like to have appended to this output a percentage calculated from this. Something like:
[{'online_count': 58.0}, {'total_count': 60.0}, {'availability': 0.96666667}]
I've tried combining this using .map(), but to no avail:
// It feels like map() is what I need, but I can't find the right
// combination with join()/union(), map(), set(), keep(), etc.
union(tables: [totals, onlines])
|> map(fn: (r) => ({ r with percentage_online: r.onlines.online_count / r.totals.total_count * 100 }))
How can I append the (calculated) percentage as a new field 'availability' in this Flux query?
Or, alternatively, is there a different Flux query approach to achieve this outcome?
N.B. I am aware of the Calculate percentages with Flux article from the docs, which I can't get working in this specific scenario. But it's close.
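One possible sketch (untested; it assumes both streams still share the _measurement and device_id columns in their group keys after count()): join the two streams, compute the ratio in map(), and union it back in as an availability field:
availability = join(tables: {t: totals, o: onlines}, on: ["_measurement", "device_id"])
|> map(fn: (r) => ({ _measurement: r._measurement, device_id: r.device_id, _field: "availability", _value: r._value_o / r._value_t }))

union(tables: [totals, onlines, availability])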

InfluxDB 2.0 multiple aggregations

I'm new to InfluxDB 2.0; I used InfluxDB 1.6 for a few months.
I'm trying to rewrite my queries in the Flux language and I have encountered a problem with multiple aggregations.
Earlier I used this query:
select
max(val_max) as val_max,
min(val_min) as val_min,
last(val_close) as val_close,
first(val_open) as val_open
from candle
where time > '{t}' and interval = '{interval}'
group by currency_id
What would this query look like in flux?
I had a similar query to rewrite before, which looked like this:
select
max(last_price) as val_max,
min(last_price) as val_min,
first(last_price) as val_open,
last(last_price) as val_close
from ticker
where time > '{t}'
group by currency_id
and after rewriting it to Flux it looks like this:
data = from(bucket: "{self.bucket}")
|> range(start: {min_time})
|> filter(fn: (r) =>
r._measurement == "ticker" and
r._field == "last_price"
)
|> group(columns: ["currency_id"])
val_max = data
|> max()
|> set(key: "_field", value: "val_max")
val_min = data
|> min()
|> set(key: "_field", value: "val_min")
val_open = data
|> first()
|> set(key: "_field", value: "val_open")
val_close = data
|> last()
|> set(key: "_field", value: "val_close")
union(tables: [val_max, val_min, val_open, val_close])
|> pivot(rowKey:["currency_id"], columnKey: ["_field"], valueColumn: "_value")
The difference is that I aggregate over a single field in one query and over multiple fields in the other.
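Following the same pattern, a sketch for the candle query might look like this (untested; {self.bucket}, {min_time}, and {interval} are the placeholders from your examples, interval is assumed to be a tag, and no set() calls are needed because the field names are already the ones wanted in the pivot):
data = from(bucket: "{self.bucket}")
|> range(start: {min_time})
|> filter(fn: (r) =>
r._measurement == "candle" and
r.interval == "{interval}"
)
|> group(columns: ["currency_id"])
val_max = data
|> filter(fn: (r) => r._field == "val_max")
|> max()
val_min = data
|> filter(fn: (r) => r._field == "val_min")
|> min()
val_open = data
|> filter(fn: (r) => r._field == "val_open")
|> first()
val_close = data
|> filter(fn: (r) => r._field == "val_close")
|> last()
union(tables: [val_max, val_min, val_open, val_close])
|> pivot(rowKey: ["currency_id"], columnKey: ["_field"], valueColumn: "_value")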
