Flux Query equivalent of InfluxDB query - influxdb

I am trying to create two queries that produce the same result, one querying an InfluxDB instance and the other querying InfluxDB Cloud.
The query below retrieves the mean road temperature for each day over the past 30 days, grouped by day, which lets me display the average road temperature per day in a graph.
'SELECT mean("value") FROM "' + 'road_temp' + '" WHERE "road_sensor_id"=\'' + str(road_sensor_id)
+ '\' AND time > now() - 30d GROUP BY time(1d)'
I'm trying to get the equivalent query for the cloud DB to retrieve the same data. I have tried the below query and variations of it so far.
from(bucket: "Cloud_Data")
|> range(start: -30d, stop: now())
|> filter(fn: (r) => r._measurement == \"{measurement}\" and r.road_sensor_id == \"{road_sensor_id}\")
|> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
|> group(columns:["r._time"])
The data from both queries is sent to the graph in the same way, and the cloud query produces the output shown below.
Is there something I am missing in the cloud query that is giving me the wrong result?
Thanks

You don't need that last group function call:
from(bucket: "Cloud_Data")
|> range(start: -30d, stop: now())
|> filter(fn: (r) => r._measurement == \"{measurement}\" and r.road_sensor_id == \"{road_sensor_id}\")
|> aggregateWindow(every: 1d, fn: mean, createEmpty: false) // this is good enough
// |> group(columns:["r._time"])
The aggregateWindow() function applies a selector (like min, max, median, etc.) or an aggregate function (mean, count, sum, etc.) to your data after windowing it by time, using the duration given in the every parameter.
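To make the windowing concrete, here is a minimal Python sketch (plain Python, not Flux, and not how InfluxDB implements it) of what aggregateWindow(every: 1d, fn: mean, createEmpty: false) computes, assuming windows aligned to UTC midnight. The function name daily_mean and the sample readings are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def daily_mean(points):
    """Average (timestamp, value) pairs per UTC calendar day.

    Days without points are simply absent, like createEmpty: false.
    Each output row is stamped with the end of its window, which is the
    default timestamp aggregateWindow() assigns (timeSrc: "_stop").
    """
    buckets = defaultdict(list)
    for ts, value in points:
        day_start = ts.astimezone(timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        buckets[day_start].append(value)
    return [
        (day_start + timedelta(days=1), mean(values))
        for day_start, values in sorted(buckets.items())
    ]


# Hypothetical road temperature readings (not taken from the question).
readings = [
    (datetime(2023, 1, 1, 6, 0, tzinfo=timezone.utc), 2.0),
    (datetime(2023, 1, 1, 18, 0, tzinfo=timezone.utc), 4.0),
    (datetime(2023, 1, 3, 12, 0, tzinfo=timezone.utc), 7.5),
]
result = daily_mean(readings)
```

The result is already one row per day, which is why no further group() call is needed after aggregateWindow().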

Related

Pivot two yields into a single table in InfluxDB

I want to combine two query results into a single table with InfluxDB on Home Assistant. After many hours of research I can't find what I would expect to be a simple solution, so any suggestions are appreciated. Here is my code:
from(bucket: "HomeAssistant/autogen")
|> range(start: 2022-11-01T00:00:00Z, stop: v.timeRangeStop)
|> filter(fn: (r) => r["entity_id"] == "total_energy")
|> filter(fn: (r) => r["_field"] == "total_kwh")
|> aggregateWindow(every: 24h, fn: max)
|> yield(name: "today")
|> timeShift(duration: 24h)
|> yield(name: "tomorrow")
The above creates two query results named "today" and "tomorrow", time-shifted 24 hours apart. I want to pivot both results into a single table with columns named "today" and "tomorrow", but I am struggling to find a simple syntax that avoids multiple queries with joins or dummy columns with labels. Is there a straightforward way to pivot these two yields into a single table?
You can merge two tables into one by using the group function.
When called with no parameters, this function should create one table out of multiple tables.
I have used it, for example, when a tag can have multiple values and I don't want a separate table for each possible value.
I don't know what your data looks like, so here is an example of mine:
from(bucket: "monitoring")
|> range(start: -24h, stop: now())
|> filter(fn: (r) => r["_measurement"] == "container_logs")
|> filter(fn: (r) => r["_field"] == "message")
|> drop(columns: ["_start", "_stop", "_field", "_measurement", "host", "hostname"])
|> group()
|> sort(columns: ["_time"], desc: false)
This would produce one table with columns _time, tag1, tag2, _value. Without group() I would get a table for each tag value.
Hope this helps you. Otherwise try to provide some test data.
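As a rough illustration of what the answer describes, the Python sketch below (plain Python, not Flux) concatenates several per-tag tables into one and sorts the rows by _time, which is approximately the effect of group() followed by sort(columns: ["_time"]). The helper name ungroup and the sample rows are hypothetical.

```python
def ungroup(tables):
    """Concatenate several tables (lists of row dicts) into one table
    and sort the rows by _time."""
    merged = [row for table in tables for row in table]
    return sorted(merged, key=lambda row: row["_time"])


# Hypothetical input: one table per value of the "container" tag.
tables = [
    [
        {"_time": 1, "container": "web", "_value": "started"},
        {"_time": 4, "container": "web", "_value": "ready"},
    ],
    [
        {"_time": 2, "container": "db", "_value": "started"},
    ],
]
single = ungroup(tables)
```

Note that this merges rows only; the tag stays a regular column and no new columns are created.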

Incomplete InfluxDB query result?

I have a bucket called "testdb" and I'm trying to execute some simple queries on it.
The structure of the bucket is very simple and (I think) not important for the purpose of this question.
If I execute the following query, I expect to get all the messages within the specified time range.
from(bucket: "testdb")
|> range(start: 2022-04-01T00:00:00Z, stop: 2022-04-28T00:00:00Z)
The result is composed of the following tables. The only difference between the tables is the topic tag. This made me think that my bucket contained only the "diagnostic" topic and its sub-topics.
Result from the query, only topics with "diagnostic" field are present
If I execute the following query, I expect to get the subset of the first query whose topics contain the "event" keyword. Since the query above does not return any table with the "event" keyword, I expect an empty result, but that is not what I get.
import "strings"
from(bucket: "testdb")
|> range(start: 2022-04-01T00:00:00Z, stop: 2022-04-28T00:00:00Z)
|> filter(fn: (r) => strings.containsStr(v: r.topic, substr: "event"))
Query result:
Result from the query only topics with "event" field are present
So, to sum up the question: why does the more general query not return both the "diagnostic" and the "event" records?
Thanks in advance for your patience.
Edit:
Even executing the query
import "strings"
from(bucket: "testdb")
|> range(start: 2022-04-01T00:00:00Z, stop: 2022-04-28T00:00:00Z)
|> filter(fn: (r) => strings.containsStr(v: r.topic, substr: "event") or strings.containsStr(v: r.topic, substr: "diagnostic"))
I get the same result as the first query (only "diagnostic" topics).

InfluxDB - limit query result by number of series using Flux

I'm trying to query my InfluxDB (1.8) using Flux and retrieve only 100 series. At first I thought the "limit" function would do it, but I found out that it only limits the number of records in each table (series), so the result can contain up to 100 records for each of the N series.
Then I tried a workaround:
from(bucket: "bucket")
|> range(start:1970-01-01T00:00:00Z)
|> filter(fn: (r) => (r["_measurement"] == "measurement" ))
|> group()
|> limit(n:100)
|> group(columns:["column1","column2"])
By doing so, I'm able to group all results into a single table and limit the results. However, it's not even close to what I need:
I'm retrieving only 100 points and also losing the ability to regroup by columns.
I know that it can be done with the InfluxQL "SLIMIT" clause.
Any thoughts on how I can achieve that using the Flux query language?
Thanks!
I had the same problem and indeed found no solution online.
After some tests I found a hacky solution that might help. As I understand InfluxDB, a single table can't contain several values of a tag that is part of the group key. So after grouping you have many tables, each with only a few values or even just one.
So what I did was to get rid of the tags without losing them, which seems a little hacky: copy the tag into _field, then drop the tag.
Here is an example:
from(bucket: "current")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "jobs")
|> filter(fn: (r) => r["_field"] == "DurationSum")
|> group(columns: ["jobName"]) // all durations - each jobName has its own table
|> last() // each table has only the last value
|> drop(columns: ["_start", "_stop", "_time"])
|> map(fn: (r) => ({ r with _field: r.jobName })) // hack: transfer the tag value into _field
|> drop(columns: ["jobName"]) // now there is only ONE table
|> sort(desc: true)
|> limit(n: 10)
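For comparison, the SLIMIT-like behaviour the question asks for (keep the first N series, with all of their points) can be sketched client-side in plain Python. This is only an illustration of the idea, not a Flux or InfluxDB API; limit_series, the tag keys, and the sample rows are hypothetical.

```python
def limit_series(rows, tag_keys, n):
    """Keep every row of the first n distinct series.

    A series is identified by the tuple of its tag values. Rows that
    belong to a series beyond the first n are discarded.
    """
    series = {}
    for row in rows:
        key = tuple(row.get(k) for k in tag_keys)
        if key not in series:
            if len(series) >= n:
                continue  # already have n series, skip this new one
            series[key] = []
        series[key].append(row)
    return series


# Hypothetical rows, as they might come back from a query.
rows = [
    {"jobName": "a", "_value": 1},
    {"jobName": "b", "_value": 2},
    {"jobName": "a", "_value": 3},
    {"jobName": "c", "_value": 4},
]
kept = limit_series(rows, ["jobName"], 2)
```

Unlike limit(n: 100) applied after group(), this keeps all points of the retained series instead of cutting the result off after a fixed number of records.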

Do different Retention Policies (RP) behave like tables when Querying with Flux?

I would like to query the same measurement across different retention policies into a single graph. Ideally, I'd like to do this in the query itself, as I'm working with Grafana.
According to Flux documentation, "Flux structures all data in tables. When data is streamed from data sources, Flux formats it as annotated comma-separated values (CSV), representing tables. Functions then manipulate or process them and output new tables."
Would different retention policies behave like different tables in this context? Would I be able to use the union() function in order to get what I want? Any insight would be greatly appreciated.
For something like this, you would use two separate from() calls and combine them with union or join. Check out the docs on union for a query example: https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/union/#examples
left = from(bucket: "database1/policy1")
|> range(start: 2018-05-22T19:53:00Z, stop: 2018-05-22T19:53:50Z)
|> filter(fn: (r) =>
r._field == "usage_guest" or
r._field == "usage_guest_nice"
)
|> drop(columns: ["_start", "_stop"])
right = from(bucket: "database1/policy2")
|> range(start: 2018-05-22T19:53:50Z, stop: 2018-05-22T19:54:20Z)
|> filter(fn: (r) =>
r._field == "usage_guest" or
r._field == "usage_idle"
)
|> drop(columns: ["_start", "_stop"])
union(tables: [left, right])
In this case, the bucket passed to the from function takes the form database_name/rp. See the docs on bucket naming conventions in 1.x: https://docs.influxdata.com/flux/v0.50/introduction/getting-started/#buckets
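If the query is assembled in client code, the database_name/rp naming can be wrapped in a small helper. The Python sketch below is only an illustration under assumptions: the function names, the measurement filter, and the default start value are made up, and the inputs are interpolated without escaping, so they are assumed to be trusted.

```python
def bucket_name(database, retention_policy):
    """Build the 1.x-style bucket name: database_name/rp."""
    return f"{database}/{retention_policy}"


def union_query(database, policies, measurement, start="-30d"):
    """Return Flux source that reads one measurement from several
    retention policies and combines the streams with union()."""
    streams = []
    for i, rp in enumerate(policies):
        streams.append(
            f't{i} = from(bucket: "{bucket_name(database, rp)}")\n'
            f"  |> range(start: {start})\n"
            f'  |> filter(fn: (r) => r._measurement == "{measurement}")\n'
            f'  |> drop(columns: ["_start", "_stop"])\n'
        )
    names = ", ".join(f"t{i}" for i in range(len(policies)))
    return "\n".join(streams) + f"\nunion(tables: [{names}])\n"


# Hypothetical database, policies, and measurement.
query = union_query("database1", ["policy1", "policy2"], "cpu")
```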

How to join two measurements in InfluxDB

I want to join two different measurements from the same database. How can I join two measurements of the same database on their timestamps in InfluxDB?
I have tried the following in InfluxDB:
select hosta.value + hostb.value
from cpu_load as hosta
inner join cpu_load as hostb
where hosta.host = 'hosta.influxdb.org' and hostb.host = 'hostb.influxdb.org';
But my question is: if the two measurements in the same database have different names, how can I join them, or how can I adapt the above query to my case?
Consider my use case below.
Assume my database is mydb
and there are two measurements, one named cpu and the other named network:
[
    {
        "measurement": "cpu",
        "tags": {
            "container": cont_name,
        },
        "fields": {
            "Value": 25.78,
        }
    }
]
[
    {
        "measurement": "network",
        "tags": {
            "container": cont_name,
        },
        "fields": {
            "Value": 96,
        }
    }
]
I want to merge the two measurements above and get a single measurement containing the information from both.
Is it possible to merge two measurements?
Another issue:
select s1.value from "cpu" as s1
produces the error below:
ERR: error parsing query: found AS, expected ; at line 1, char 39?
I also spent some time looking into this, and it seems that as of InfluxDB version 1.6 this is not possible:
Currently, there is no way to perform cross-measurement math or grouping. All data must be under a single measurement to query it together. InfluxDB is not a relational database and mapping data across measurements is not currently a recommended schema. See GitHub Issue #3552 for a discussion of implementing JOIN in InfluxDB.
– https://docs.influxdata.com/influxdb/v1.2/troubleshooting/frequently-asked-questions/#how-do-i-query-data-across-measurements
The issue is being tracked in GitHub here: https://github.com/influxdata/influxdb/issues/3552
Q: Is it possible to merge two measurements?
A: No. Cross-measurement joins are not possible through InfluxDB's query engine before v1.7.0, where you can use the new Flux query language to do so.
Reference:
https://docs.influxdata.com/flux/v0.7/guides/join
Flux allows you to join on any columns common between two data streams
and opens the door for operations such as cross-measurement joins and
math across measurements.
Prior to v1.7.0, the only way to achieve a cross-measurement join is to write a script or program, e.g. using an InfluxDB driver to read out the data and then merging the datasets yourself.
So this can be done in Flux, though it is not extremely straightforward.
The v variable is provided by Grafana according to the current view.
Explanation: First get both tables and resample them to the minimum required amount of data. Then set the _measurement column so that it matches in the two tables. Finally, union(...) the two tables and sort the union by _time. At this point, optionally aggregate again, and then pivot to get one row per time step.
The createEmpty: true in aggregateWindow(...) makes sure you have one record for each time step; fill(...) then fills in missing records with the previous ones. This makes it so you don't need to deal with null records that
This worked for me:
import "math"
import "internal/debug"
slow = from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
r._measurement == "slow"
)
|> map(fn: (r) => ({ r with _measurement: "fast"}))
|> aggregateWindow(every: v.windowPeriod, fn: first, createEmpty: true)
|> fill(usePrevious: true)
fast = from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
r._measurement == "fast"
)
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
|> fill(usePrevious: true)
union(tables: [fast, slow])
|> sort(columns: ["_time"])
|> aggregateWindow(every: v.windowPeriod, fn: first, createEmpty: false)
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
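The last two steps of that query (union, then pivot with rowKey ["_time"] and columnKey ["_field"]) can be pictured with a small Python sketch. It is plain Python rather than Flux, assumes both streams were already resampled to the same timestamps, and uses hypothetical field names (speed, temp) and values.

```python
def union_and_pivot(tables):
    """Merge long-format tables and pivot them to one row per _time,
    with one column per distinct _field value."""
    rows_by_time = {}
    for table in tables:
        for row in table:
            out = rows_by_time.setdefault(row["_time"], {"_time": row["_time"]})
            out[row["_field"]] = row["_value"]
    return [rows_by_time[t] for t in sorted(rows_by_time)]


# Hypothetical, already resampled streams from two measurements.
fast = [
    {"_time": 1, "_field": "speed", "_value": 10.0},
    {"_time": 2, "_field": "speed", "_value": 12.0},
]
slow = [
    {"_time": 1, "_field": "temp", "_value": 20.5},
    {"_time": 2, "_field": "temp", "_value": 20.5},
]
wide = union_and_pivot([fast, slow])
```

Each row of wide then holds the values of both measurements for one time step, which is the shape needed for math across measurements.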

Resources