I have a table/series like this:
Message MessageValue
--------------- ---------------------
property1 10
property2 9
property3 7
property2 22
I want to downsample property2's mean value every 10 minutes. How would I do something like this?
CREATE CONTINUOUS QUERY "cq_10m" ON "DatabaseName" BEGIN SELECT mean(SELECT MessageValue WHERE Message =property2 ) AS "mean_Property2" INTO "RetentionPolicyName"."downsampled_orders" FROM "TableName" GROUP BY time(10m) END
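It would look something like the query below. The CQ runs every 10 minutes and calculates the mean of "MessageValue" for each 10-minute window, restricted to points where "Message" is 'property2'. Because of RESAMPLE ... FOR 2h, each run recomputes the windows of the last two hours. The result is written to the field mean_Property2 in the measurement mean_Property2.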
CREATE CONTINUOUS QUERY "cq_10m" ON "dbName"
RESAMPLE EVERY 10m FOR 2h
BEGIN SELECT mean("MessageValue") AS mean_Property2 INTO
mean_Property2 FROM "retentionPolicyName"."measurementName" WHERE "Message"='property2'
GROUP BY time(10m) END
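To make the windowing concrete, here is a minimal Python sketch of what GROUP BY time(10m) with the WHERE filter computes. It is not InfluxDB code. The tuple layout, the timestamps, and the extra property2 reading at 700 seconds are assumptions made up for illustration, since the table in the question shows no time column.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 10 * 60  # corresponds to GROUP BY time(10m)


def downsample_mean(points, message):
    """Mean value per 10-minute window for one message name.

    points: iterable of (epoch_seconds, message, value) tuples
            (a hypothetical layout, not an InfluxDB structure).
    """
    buckets = defaultdict(list)
    for ts, msg, value in points:
        if msg != message:  # corresponds to WHERE "Message" = 'property2'
            continue
        window_start = ts - ts % WINDOW_SECONDS  # align to the window boundary
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Values from the question, with assumed timestamps; the last row is invented.
points = [
    (0, "property1", 10),
    (60, "property2", 9),
    (120, "property3", 7),
    (300, "property2", 22),
    (700, "property2", 4),
]
result = downsample_mean(points, "property2")
print(result)
```

Each key is the start of a window and each value is the mean of the property2 readings that fell into it, which is the shape of data the CQ writes as mean_Property2.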
Related
I created a continuous query to downsample readings from temperature sensors in my InfluxDB, so that hourly means are stored for a longer time. Readings from multiple sensors are in one table. After the query runs, the sensor's IP is missing.
Basic data looks like this:
> SELECT ip,tC FROM ht LIMIT 5
name: ht
time ip tC
---- -- --
1671057540000000000 192.168.0.83 21
1671057570000000000 192.168.0.83 21
1671057750000000000 192.168.0.17 21.38
The continuous query (simplified without CREATE ... END):
SELECT last(ip), mean("tC") AS "mean_temp" INTO "downsampled"."ht_downsampled" FROM "ht" GROUP BY time(1h),ip
The issue is that 'ip' ends up only as a tag, not as a value in the table, and is therefore missing from the table the query inserts into:
name: ht
tags: ip=192.168.0.17
time ip mean_temp mean_hum
---- -- --------- --------
1671055200000000000 21.47 42.75
1671058800000000000 21.39428571428571 48.785714285714285
1671062400000000000 21.314999999999998 51.625
Why is last(ip) not producing any value?
Can I get the value from the 'tags' into the table?
Is there a different approach to group data with a constant value?
Could you try selecting ip directly instead of last(ip), since you are already grouping by ip in the statement?
Sample code:
SELECT ip, mean("tC") AS "mean_temp" INTO "downsampled"."ht_downsampled" FROM "ht" GROUP BY time(1h), ip
What I'm trying to achieve is to only keep the latest of any given point identified by an ID, delete everything else.
The ID is a tag.
So let's say we have:
time ID ...
2022-06-28 18:29:00 id1 ...
2022-06-28 18:28:00 id1 ...
2022-06-28 18:27:00 id1 ...
2022-06-28 18:29:00 id2 ...
2022-06-28 18:28:00 id2 ...
2022-06-28 18:29:00 id3 ...
Would result in:
time ID ...
2022-06-28 18:29:00 id1 ...
2022-06-28 18:29:00 id2 ...
2022-06-28 18:29:00 id3 ...
Is that possible without having to do something like:
DELETE FROM "measurement" WHERE "ID" = '...' AND time < ...
This takes far too long to execute for all possible "duplicates". You also can't use OR in the WHERE clause of a DELETE statement. Multiple statements like specified here also take too long to execute.
Having ID as a tag has saved you a lot of time already.
Maybe we could think a little more about your scenario. Are you trying to keep the LATEST data for every INTERVAL? For example, your original sampling rate is 1 second (that is, data is fed into InfluxDB every second) and you would like to downsample it to every 5 seconds. That is, you take only the last record in each 5-second interval and drop the rest of the data.
If the above assumption is right, you could try:
First, run a continuous query to write your downsampled data into another measurement or database, per this doc:
CREATE CONTINUOUS QUERY "cq_basic" ON "someDatabase"
BEGIN
SELECT LAST("fieldA") AS "fieldA" INTO "latest_5sec_measurement" FROM "measurement" GROUP BY time(5s), "ID"
END
This CQ is like a cron job and will keep transforming the original data into the less granular one.
Then delete the obsolete data by time range in the original measurement, per this doc (InfluxDB loves you when you do this, as time-range deletion is the most efficient way):
DELETE FROM "measurement" WHERE time > '2022-06-27 18:29:00' AND time < '2022-06-28 18:29:00'
And you can set up another cron job to run this query as long as you double check the above CQ has been executed successfully.
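As a rough illustration of what the downsampling step keeps, here is a small Python sketch that retains only the last point per ID in each 5-second window. It only mimics the effect of LAST() with GROUP BY time(5s) and the ID tag. The function name, the tuple layout, and the sample values are assumptions for illustration.

```python
def last_per_window(points, window_seconds=5):
    """Keep only the last point per (ID, window).

    points: iterable of (epoch_seconds, point_id, value) tuples
            (a hypothetical layout for illustration).
    Returns {(point_id, window_start): (epoch_seconds, value)}.
    """
    kept = {}
    for ts, point_id, value in sorted(points):  # process in time order
        window_start = ts - ts % window_seconds
        # A later timestamp in the same window overwrites the earlier one.
        kept[(point_id, window_start)] = (ts, value)
    return kept


# Made-up sample data: id1 has points in two windows, id2 in one.
points = [
    (0, "id1", 1.0),
    (1, "id1", 1.1),
    (4, "id1", 1.4),
    (5, "id1", 1.5),
    (2, "id2", 2.2),
    (3, "id2", 2.3),
]
kept = last_per_window(points)
print(kept)
```

Everything not present in `kept` is what the time-range DELETE would remove from the original measurement once the downsampled copy exists.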
I'd like to calculate the delta values for a series of measurements stored in an InfluxDB. The values are readings from an electricity meter taken every 5 minutes. The values increase over time. Here is a subset of the data to give you an idea (commands shown below are executed in the InfluxDB CLI):
> SELECT "Haushaltstromzaehler - cnt" FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time Haushaltstromzaehler - cnt
---- --------------------------
2018-02-02T10:00:12.610811904Z 11725.638
2018-02-02T10:05:11.242021888Z 11725.673
2018-02-02T10:10:10.689827072Z 11725.707
2018-02-02T10:15:12.143326976Z 11725.736
2018-02-02T10:20:10.753357056Z 11725.768
2018-02-02T10:25:11.18448512Z 11725.803
2018-02-02T10:30:12.922032896Z 11725.837
2018-02-02T10:35:10.618788096Z 11725.867
2018-02-02T10:40:11.820355072Z 11725.9
2018-02-02T10:45:11.634203904Z 11725.928
2018-02-02T10:50:11.10436096Z 11725.95
2018-02-02T10:55:10.753853952Z 11725.973
Calculating the differences in the InfluxDB CLI is pretty straightforward with the difference() function. This gives me the electricity consumed within each 5-minute interval:
> SELECT difference("Haushaltstromzaehler - cnt") FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time difference
---- ----------
2018-02-02T10:05:11.242021888Z 0.03499999999985448
2018-02-02T10:10:10.689827072Z 0.033999999999650754
2018-02-02T10:15:12.143326976Z 0.02900000000045111
2018-02-02T10:20:10.753357056Z 0.0319999999992433
2018-02-02T10:25:11.18448512Z 0.03499999999985448
2018-02-02T10:30:12.922032896Z 0.033999999999650754
2018-02-02T10:35:10.618788096Z 0.030000000000654836
2018-02-02T10:40:11.820355072Z 0.03299999999944703
2018-02-02T10:45:11.634203904Z 0.028000000000247383
2018-02-02T10:50:11.10436096Z 0.02200000000084401
2018-02-02T10:55:10.753853952Z 0.02299999999922875
Where I struggle is getting this to work in a continuous query. Here is the command I used to setup the continuous query:
CREATE CONTINUOUS QUERY cq_Haushaltstromzaehler_cnt ON myhomedb
BEGIN
SELECT difference(sum("Haushaltstromzaehler - cnt")) AS "delta" INTO "Haushaltstromzaehler_delta" FROM "myhome_measurements" GROUP BY time(1h)
END
Looking in the InfluxDB log file I see that no data is written in the new 'delta' measurement from the continuous query execution:
...finished continuous query cq_Haushaltstromzaehler_cnt, 0 points(s) written...
After much troubleshooting and experimenting I now understand why no data is generated. Setting up a continuous query requires a GROUP BY time() clause. This in turn requires an aggregate function inside the difference() function. The problem is that the aggregate function returns only one value for the time period specified by GROUP BY time(). Obviously, the difference() function cannot calculate a difference from just one value. Essentially, the continuous query executes a command like this:
> SELECT difference(sum("Haushaltstromzaehler - cnt")) FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z' GROUP BY time(1h)
>
I'm now somewhat clueless as to how to make this work and appreciate any advice you might have.
Would using the last() aggregate function help? I have not tested this as a CQ yet.
Select difference(last(T1_Consumed)) AS T1_Delta, difference(last(T2_Consumed)) AS T2_Delta
from P1Data
where time >= 1551648871000000000 group by time(1h)
DIFFERENCE() calculates the delta from the "aggregated" value of the previous group, not within the current group.
So feel free to use a selector function there. Since your counters seem to be cumulative, LAST() should work well.
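To see why LAST() combined with DIFFERENCE() gives per-hour consumption for a cumulative counter, here is a small Python sketch of the same arithmetic. It is not InfluxDB code. The function name and the sample readings are made up for illustration.

```python
def hourly_deltas(readings):
    """Difference between the last counter value of consecutive hours.

    readings: list of (epoch_seconds, counter_value), sorted by time
              (a hypothetical layout for illustration).
    """
    last_in_hour = {}
    for ts, value in readings:
        # LAST() per GROUP BY time(1h) bucket: later readings overwrite earlier ones.
        last_in_hour[ts - ts % 3600] = value
    hours = sorted(last_in_hour)
    # DIFFERENCE(): current bucket's value minus the previous bucket's value.
    return {
        cur: last_in_hour[cur] - last_in_hour[prev]
        for prev, cur in zip(hours, hours[1:])
    }


# Made-up cumulative meter readings, two per hour.
readings = [
    (0, 100.0),
    (1800, 100.5),
    (3600, 101.0),
    (5400, 101.5),
    (7200, 102.0),
]
deltas = hourly_deltas(readings)
print(deltas)
```

Note that the first hour produces no delta because there is no previous bucket to subtract from. That matches the observation in the question: with only one bucket in the queried range, difference() has nothing to return.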
I have a series, disk, that contains a path (/mnt/disk1, /mnt/disk2, etc.) and the total space of a disk. It also includes free and used values. These values are updated at a specified interval. What I would like to do is query for the sum of the last() total of each path. I would also like to do the same for free and for used, to get an aggregate of the total size, free space, and used space of all the disks on my server.
I have a query here that will get me the last(total) of all the disks, grouped by its path (for distinction):
select last(total) as total from disk where path =~ /(mnt\/disk).*/ group by path
Currently, this returns 5 series, each containing 1 row (the latest) and the value of its total. I then want to take the sum of those series, but I cannot just wrap the last(total) into a sum() function call. Is there a way to do this that I am missing?
Carrying on from my comment above about nested functions.
Building a toy example:
CREATE DATABASE FOO
USE FOO
Assuming your data is updated less often than once a minute [1]:
CREATE CONTINUOUS QUERY disk_sum_total ON FOO
BEGIN
SELECT sum("total") AS "total_1m" INTO disk_1m_total FROM "disk"
GROUP BY time(1m)
END
Then push some values in:
INSERT disk,path=/mnt/disk1 total=30
INSERT disk,path=/mnt/disk2 total=32
INSERT disk,path=/mnt/disk3 total=33
And wait more than a minute. Then:
INSERT disk,path=/mnt/disk1 total=41
INSERT disk,path=/mnt/disk2 total=42
INSERT disk,path=/mnt/disk3 total=43
And wait a minute+ again. Then:
SELECT * FROM disk_1m_total
name: disk_1m_total
-------------------
time total_1m
1476015300000000000 95
1476015420000000000 126
The two values are 30+32+33=95 and 41+42+43=126.
From there, it's trivial to query:
SELECT last(total_1m) FROM disk_1m_total
name: disk_1m_total
-------------------
time last
1476015420000000000 126
Hope that helps.
[1] Picking a CQ interval smaller than the update interval prevents minor timing jitter from causing the same data to be summed twice for a given group. There might be some "zero update" intervals, but no "double counting" intervals. I typically run the query twice as fast as the updates. If the CQ sees no data for a window, nothing is written for that window, so last() will still give the correct answer. For example, I left the CQ running overnight and pushed no new data in: last(total_1m) gives the same answer, not zero for "no new data".
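For comparison, the number the question is after (the sum of the latest total per path) can be sketched in a few lines of Python on the client side. This is only an illustration of the arithmetic, not a replacement for the CQ. The function name, the tuple layout, and the timestamps are assumptions, while the totals are the ones from the toy example.

```python
def sum_of_last(points):
    """Sum the most recent total of every path.

    points: iterable of (epoch_seconds, path, total) in any order
            (a hypothetical layout for illustration).
    """
    latest = {}
    for ts, path, total in points:
        # Keep the newest reading seen so far for each path.
        if path not in latest or ts >= latest[path][0]:
            latest[path] = (ts, total)
    return sum(total for _, total in latest.values())


# Totals from the toy example above; timestamps are assumed.
points = [
    (0, "/mnt/disk1", 30),
    (0, "/mnt/disk2", 32),
    (0, "/mnt/disk3", 33),
    (120, "/mnt/disk1", 41),
    (120, "/mnt/disk2", 42),
    (120, "/mnt/disk3", 43),
]
total = sum_of_last(points)
print(total)
```

With the second batch of inserts being the latest for each path, this gives 41+42+43, the same value as last(total_1m) in the CQ output.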
I am trying to implement an EPL query that can pick up the avg for Time(t) & Time(t-1).
For example:
a) in the first 5 seconds (seconds 0-5) there are 2 events with an avg of 12
b) in the next 5 seconds (seconds 5-10) there are 3 events with an avg of 23, and in the EPL query that catches this information, I am able to also see the avg of 12 from the previous time window of the first 5 seconds
The idea I have is to stagger the objects/queries in such a way that the final EPL query has a snapshot of Time(t) & Time(t-1), as seen in the virtually created object ScoreInfoBeforeAfter. However, it's not working.
Any ideas would be greatly appreciated. Thanks.
// The object being published to the Esper stream:
class ScoreEvent { int score; ... }
Looks like the keyword prior is the solution.
http://esper.codehaus.org/esper-2.1.0/doc/reference/en/html/functionreference.html
See: Section 7.1.9
In terms of the example I described in the original post, here's the corresponding solution I found. It seems to be working correctly.
INSERT INTO ScoreInfo
SELECT
'ScoreInfo' as a_Label,
average AS curAvg,
prior(1, average) AS prevAvg
FROM
ScoreEvent.win:time_batch(5 sec).stat:uni(score);
SELECT
*
FROM
ScoreInfo.win:length(1);
..
And then it's nice, because you can do stuff like this:
SELECT
'GT curAvg > prevAvg' as a_Label,
curAvg,
prevAvg
FROM
ScoreInfo.win:length(1)
WHERE
curAvg > prevAvg;
SELECT
'LTE curAvg <= prevAvg' as a_Label,
curAvg,
prevAvg
FROM
ScoreInfo.win:length(1)
WHERE
curAvg <= prevAvg;
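For readers who don't have Esper at hand, here is a rough Python sketch of the pairing that prior(1, average) produces over 5-second batches: each batch's average next to the previous batch's average. It is not Esper code and it skips empty batches, which may differ from how time_batch behaves. The function name and the individual scores are made up so that the averages match the 12 and 23 from the question.

```python
from statistics import mean


def batch_averages(events, batch_seconds=5):
    """Pair each batch average with the average of the batch before it.

    events: iterable of (epoch_seconds, score) tuples
            (a hypothetical layout for illustration).
    Returns a list of (cur_avg, prev_avg); prev_avg is None for the
    first batch, standing in for the null that prior() would return.
    """
    batches = {}
    for ts, score in events:
        batches.setdefault(ts // batch_seconds, []).append(score)

    rows = []
    prev_avg = None
    for index in sorted(batches):
        cur_avg = mean(batches[index])
        rows.append((cur_avg, prev_avg))
        prev_avg = cur_avg  # becomes prevAvg for the next batch
    return rows


# Two events in seconds 0-5 (avg 12), three events in seconds 5-10 (avg 23).
events = [(1, 10), (3, 14), (6, 20), (7, 23), (9, 26)]
rows = batch_averages(events)
print(rows)
```

The second row carries both the current average and the previous one, which is what the curAvg > prevAvg and curAvg <= prevAvg queries above compare.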