influxdb: calculating duration of boolean events?

I have data in an influxdb database from a door sensor. This is a boolean sensor (either the door is open (value is false) or it is closed (value is true)), and the table looks like:
name: door
--------------
time value
1506026143659488953 true
1506026183699139512 false
1506026751433484237 true
1506026761473122666 false
1506043848850764808 true
1506043887602743375 false
I would like to calculate how long the door was open in a given period of time. The ELAPSED function gets me close, but I'm not sure how to either (a) restrict it to only those intervals for which the initial value is false, or (b) identify "open" intervals from the output of something like select elapsed(value, 1s) from door.
I was hoping I could do something like:
select elapsed(value, 1s), first(value) from door
But that doesn't get me anything useful:
name: door
--------------
time                 elapsed  first
0                             true
1506026183699139512  40
1506026751433484237  567
1506026761473122666  10
1506043848850764808  17087
1506043887602743375  38
I was hoping for something more along the lines of:
name: door
--------------
time                 elapsed  first
1506026183699139512  40       true
1506026751433484237  567      false
1506026761473122666  10       true
1506043848850764808  17087    false
1506043887602743375  38       true
Short of extracting the data myself and processing it in e.g. python, is there any way to do this via an influxdb query?

I came across this problem as well: I wanted to sum the durations of times for which a flag is on, which is a pretty common operation in signal processing and time-series libraries, but InfluxDB just doesn't seem to support it very well. I tried INTEGRAL() with a flag stored as 1, but it just didn't seem to give me correct values. In the end, I resorted to calculating the intervals in my data source, publishing those as a separate field in InfluxDB, and summing them up. It works much better that way.
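A minimal sketch of that pre-computation step, applied to the sample rows from the question (timestamps are nanoseconds since the epoch, and false means the door is open; everything else here is an illustrative assumption):
# Derive per-interval durations from the raw boolean transitions, so they
# could be written back to InfluxDB as a numeric field and summed there.
rows = [
    (1506026143659488953, True),
    (1506026183699139512, False),
    (1506026751433484237, True),
    (1506026761473122666, False),
    (1506043848850764808, True),
    (1506043887602743375, False),
]

open_seconds = 0.0
for (t_prev, v_prev), (t_cur, _) in zip(rows, rows[1:]):
    if v_prev is False:              # door was open during [t_prev, t_cur)
        open_seconds += (t_cur - t_prev) / 1e9

print(round(open_seconds))           # total seconds the door was open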

This is the closest I have found so far:
https://community.influxdata.com/t/storing-duration-in-influxdb/4669
The idea is to store the boolean event as 0 or 1, and to record each state change as two entries one unit of time apart. It would look something like this:
name: door
--------------
time value
1506026143659488953 1
1506026183699139511 1
1506026183699139512 0
1506026751433484236 0
1506026751433484237 1
1506026761473122665 1
1506026761473122666 0
1506043848850764807 0
1506043848850764808 1
1506043887602743374 1
1506043887602743375 0
It should then be possible to use a query like this:
SELECT integral(value) FROM "door" WHERE time > x and time < y
I'm new to Influx, so let me know if this is a bad way of doing things these days. I also haven't tested the example I've written here.
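As a rough sanity check of the encoding (done in Python rather than InfluxDB, and assuming integral() behaves roughly like a trapezoidal integral), integrating the 0/1 points above recovers the time spent in state 1:
# Trapezoidal integral over the duplicated-point 0/1 encoding shown above.
points = [
    (1506026143659488953, 1),
    (1506026183699139511, 1),
    (1506026183699139512, 0),
    (1506026751433484236, 0),
    (1506026751433484237, 1),
    (1506026761473122665, 1),
    (1506026761473122666, 0),
    (1506043848850764807, 0),
    (1506043848850764808, 1),
    (1506043887602743374, 1),
    (1506043887602743375, 0),
]

area_ns = 0.0
for (t0, v0), (t1, v1) in zip(points, points[1:]):
    area_ns += (t1 - t0) * (v0 + v1) / 2.0

print(area_ns / 1e9)   # approximate seconds during which value was 1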

I had this same problem. After running into this wall with InfluxDB and finding no clean solutions here or elsewhere, I ended up switching to TimescaleDB (PostgreSQL-based) and solving it with a SQL window function, using lag() to calculate the delta to the previous time value.
For the OP's dataset, a possible solution looks like this:
SELECT
"time",
("time" - lag("time") OVER (ORDER BY "time"))/1000000000 AS elapsed,
value AS first
FROM door
ORDER BY 1
OFFSET 1; -- omit the initial zero value
Input:
CREATE TEMPORARY TABLE "door" (time bigint, value boolean);
INSERT INTO "door" VALUES
(1506026143659488953, true),
(1506026183699139512, false),
(1506026751433484237, true),
(1506026761473122666, false),
(1506043848850764808, true),
(1506043887602743375, false);
Output:
        time         | elapsed | first
---------------------+---------+-------
 1506026183699139512 |      40 | f
 1506026751433484237 |     567 | t
 1506026761473122666 |      10 | f
 1506043848850764808 |   17087 | t
 1506043887602743375 |      38 | f
(5 rows)

Related

Detecting instance of data change

Is there a good way to write a query in InfluxDB that will show a change of state from the previous value? I am looking to query my database for the times where a server has turned off.
For example if I had the following database:
Time | Server_1_ON | Server_2_ON
-------------------------------------------------
2019-08-18T14:43:00Z | True | True
2019-08-18T14:43:05Z | True | True
2019-08-18T14:43:10Z | True | False
2019-08-18T14:43:15Z | True | False
2019-08-18T14:43:20Z | True | False
2019-08-18T14:43:25Z | True | True
2019-08-18T14:43:30Z | True | True
2019-08-18T14:43:35Z | True | False
I would want to be able to detect that server 2 had turned off twice, and return the two rows
2019-08-18T14:43:10Z | True | False
2019-08-18T14:43:35Z | True | False
I could achieve the same results by writing a query to
SELECT * WHERE "Server_2_ON" = False
and then filtering out duplicate results. But this is a multi-step process.
If this is not easily possible in influxdb, is there another database that is more suited to this style of query?
If your measurements were integer (1 to represent ON / 0 for OFF) instead of boolean, you could use the difference function.
To select any change in either measurement:
WHERE (DIFFERENCE(Server_1_ON) != 0
OR DIFFERENCE(Server_2_ON) != 0)
To select a change from on to off in either measurement:
WHERE (DIFFERENCE(Server_1_ON) = -1
OR DIFFERENCE(Server_2_ON) = -1)
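For reference, here is the same differencing idea sketched outside InfluxDB, using the sample series from the question with True/False stored as 1/0:
# Detect ON -> OFF transitions by differencing a 1/0-encoded series,
# which is what DIFFERENCE(Server_2_ON) = -1 is meant to express.
times = ["2019-08-18T14:43:00Z", "2019-08-18T14:43:05Z", "2019-08-18T14:43:10Z",
         "2019-08-18T14:43:15Z", "2019-08-18T14:43:20Z", "2019-08-18T14:43:25Z",
         "2019-08-18T14:43:30Z", "2019-08-18T14:43:35Z"]
server_2_on = [1, 1, 0, 0, 0, 1, 1, 0]

turn_offs = [times[i]
             for i in range(1, len(server_2_on))
             if server_2_on[i] - server_2_on[i - 1] == -1]

print(turn_offs)   # ['2019-08-18T14:43:10Z', '2019-08-18T14:43:35Z']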
Note that in InfluxDb v1.x it is not possible to cast from Boolean to Integer, so for this to work you will need to change the stored data type to int. See can I change a field’s data type?
"There is no way to cast a float or integer to a string or Boolean (or
vice versa). The simplest workaround is to begin writing the new data type to a different field in the same series.
In InfluxDb v2.0 (still alpha) it is possible to cast Boolean to Int (see INT function).
(I have just started to investigate InfluxDb. I don't like it so far, but it seems that's just me: according to this article on DZone it's currently the No. 1 time-series database.)

How do I properly transform missing datapoints as 0 in Prometheus?

We have an alert we want to fire based on the previous 5m of metrics (say, if it's above 0). However, if the metric is 0 it's not written to prometheus, and as such it's not returned for that time bucket.
The result is that we may have an example data-set of:
-60m | -57m | -21m | -9m | -3m <<< Relative Time
1 , 0 , 1 , 0 , 1 <<< Data Returned
which ultimately results in the alert firing every time the metric is above 0, not only when it's above 0 for 5m. I've tried writing our query with OR on() vector() appended to the end, but it does funny stuff with the returned dataset:
values:Array[12]
0:Array[1539021420,0.16666666666666666]
1:Array[1539021480,0]
2:Array[1539021540,0]
3:Array[1539021600,0]
4:Array[1539021660,0]
5:Array[1539021720,0]
6:Array[1539021780,0]
7:Array[1539021840,0]
8:Array[1539021900,0]
9:Array[1539021960,0]
10:Array[1539022020,0]
11:Array[1539022080,0]
For some reason it's putting the "real" data at the front of the array (even though my starting time is well before 1539021420) and continuing from that timestamp forward.
What is the proper way to have Prometheus return 0 for data-points which may not exist?
To be clear, this isn't an alertmanager question -- I'm using a different tool for alerting on this data.
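This doesn't address the PromQL side, but since the alerting happens in a separate tool, one workaround is to zero-fill the missing buckets there before evaluating the condition. A minimal sketch, assuming the query returns sparse (unix timestamp, value) pairs at a fixed 60-second step:
# Fill missing time buckets with 0, then evaluate "above 0 for the last 5m".
step = 60
start, end = 1539021420, 1539022080

returned = {1539021420: 0.1666, 1539021720: 1.0}   # example sparse result

filled = [(t, returned.get(t, 0.0)) for t in range(start, end + 1, step)]

fire = all(value > 0 for _, value in filled[-5:])  # last five 1-minute buckets
print(fire)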

How to predict at which day a value will be reached?

I have a table like the following:
Day | Number
1 | 2
2 | 2.5
3 | 3.5
4 | 5
5 | 7
6 | 8
7 | 10
8 | 11
9 | 13
10 | 15
11 | 12
Most of the time the numbers increase alongside the days, but sometimes they decrease instead. I would like to know on which day Number will reach a certain value; for example, on which day it will reach 100.
Is this possible? I've tried using the FORECAST function, but when I lower the value of Number and increase the number of days, the predicted day decreases, whereas if I understand correctly it should increase instead.
Step 1. Use FORECAST
You need to lock the first cells in the FORECAST formula:
=FORECAST(A17,B$2:B16,A$2:A16)
Use the $ sign to lock the second row.
Step 2. Find the position of a day in the forecast
Please try:
=INDEX(sort(A2:B,2,0), MATCH(C1,sort(B2:B,1,0),-1), 1)
where C1 = 100 (your number)
The formula will sort the range twice and will work even if the number does not grow as the days increase.
With your target Number in B13, please try in A13:
=FORECAST(B13,A$2:A12,B$2:B12)
You don't show the formula you tried, so what was wrong with it is just a guess, but maybe your xs and ys were 'the wrong way round'.
I would use the SLOPE function to calculate the day. The slope is the amount Number increases each day. When C2 is edited, this script will calculate the day for the number you enter. Enter the target number in C2, and in F2 put =SLOPE(B2:B12,A2:A12). The custom menu has a reset.
// Adds a custom "Reset" menu when the spreadsheet is opened.
function onOpen() {
  SpreadsheetApp.getActiveSpreadsheet().addMenu(
    'Reset', [
      { name: 'Reset', functionName: 'reset' }
    ]);
}

// When the target number in C2 is edited, step through the days and write
// the first day on which slope * day reaches the target into D2.
function onEdit(e) {
  var sheet = e.source.getSheetName();
  var cell = e.range.getA1Notation();          // the edited cell
  var s = e.source.getSheetByName("Sheet1");
  var find = s.getRange("C2").getValue();      // target number
  var slope = s.getRange("F2").getValue();     // =SLOPE(B2:B12,A2:A12)
  var answer = 0;
  if (sheet == "Sheet1" && cell == "C2") {
    for (var i = 0; i < 100; i++) {
      answer = answer + 1;
      var test = Math.round(slope * answer);
      if (test >= find) {
        s.getRange("D2").setValue(answer);
        break;
      }
    }
  }
}

// Clears the target (C2) and result (D2) cells via the custom menu.
function reset() {
  var s = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1");
  s.getRange("C2").setValue(0);
  s.getRange("D2").setValue(0);
}
Here is a test spreadsheet you can copy and try. https://docs.google.com/spreadsheets/d/1inmd2Lc1aOfVL7POKye6p2OQHsloJryISgdqNMowW90/edit?usp=sharing
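For comparison, the same slope-based estimate can be done outside Sheets. A minimal sketch using an ordinary least-squares fit to the sample data (the intercept term and the target of 100 are my own choices):
# Fit a line to (day, number) and solve for the day at which it reaches the target.
days = list(range(1, 12))
numbers = [2, 2.5, 3.5, 5, 7, 8, 10, 11, 13, 15, 12]

n = len(days)
mean_x = sum(days) / n
mean_y = sum(numbers) / n
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, numbers))
den = sum((x - mean_x) ** 2 for x in days)
slope = num / den
intercept = mean_y - slope * mean_x

target = 100
print((target - intercept) / slope)   # estimated day on which the fit reaches 100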

combined aggregation functions for more efficient plotting

My computing cluster monitoring data is stored in an InfluxDB database with the following shape (minus a few columns):
time number parti user
---- ------ ----- ----
2017-06-02T06:58:52.854866584Z 59 gr01 user01
2017-06-02T06:58:52.854866584Z 6 gr01 user02
2017-06-02T06:58:52.854866584Z 295 gr02 user03
2017-06-02T06:58:52.854866584Z 904 gr03 user04
Data points are recorded every 10 minutes. Right now I am plotting the sum for each "parti" with:
select sum(number) from status_logs where time > now() - 1h group by time(10m), parti
However, this becomes very slow when I show more than a few days due to the time(10m). I cannot use a varying time window because the sum() would not make sense anymore.
My question: would there be a way to take the average of the sum over a (variable) time window?
Thanks!
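To illustrate the intent (averaging the per-10-minute sums over a larger, adjustable window), here is a sketch using pandas, assuming the data has been pulled into a DataFrame with the columns shown above:
import pandas as pd

# df is assumed to have columns: time (datetime64), number, parti, user.
def mean_of_sums(df, window="1D"):
    # Sum `number` per parti in fixed 10-minute buckets...
    sums = (df.set_index("time")
              .groupby("parti")["number"]
              .resample("10min")
              .sum())
    # ...then average those sums per parti over the larger window.
    return (sums.reset_index()
                .set_index("time")
                .groupby("parti")["number"]
                .resample(window)
                .mean())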

Obtaining the quantity and proportion in SPSS 21

I have the data in a sav file
CODE | QUANTITY
------|----------
A | 1
B | 4
C | 1
F | 3
B | 3
D | 12
D | 5
I need to obtain the number of codes which have a quantity <= 3, and the proportion they represent of the total number of rows as a percentage, presenting a result like this:
<= 3 | PERCENTAGE
------|----------
4 | 57 %
All of this using SPSS syntax.
I would first convert the quantity value to a 0-1 variable, and then aggregate by code to the mean. This produces a nice second dataset to make a table. Example below.
data list free / Code (A1) Quantity (F2.0).
begin data
A 1
B 4
C 1
F 3
B 3
D 12
D 5
end data.
*convert to 0-1.
compute QuantityB3 = (Quantity LE 3).
*Aggregate.
DATASET DECLARE AggQuant.
AGGREGATE
/OUTFILE='AggQuant'
/BREAK=Code
/QuantityB3 = MEAN(QuantityB3).
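For comparison, the same 0/1-flag idea applied over all rows (rather than per code) gives the overall count and percentage the question asks for; a plain-Python sketch with the values from the question:
# Count rows with Quantity <= 3 and their share of all rows.
quantities = [1, 4, 1, 3, 3, 12, 5]

flags = [1 if q <= 3 else 0 for q in quantities]
count = sum(flags)
percentage = 100.0 * count / len(flags)

print(count, f"{percentage:.0f} %")   # 4  57 %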
I don't know how your question was migrated here, and I don't have the reputation here to add screenshots that would help you a lot. Anyhow, the procedure for your desired output is given below.
Go to Transform -> Count Values within Cases; a dialogue box opens. Write the name of the new variable, say "New", in Target Variable, then go to Define Values. In the new dialogue box, check the radio button "Range, LOWEST through value", put 3 in the box below, press Add, press Continue, and press OK. A new variable named "New" is created. Now go to Analyze -> Descriptive Statistics -> Frequencies; a new dialogue box opens. Move the "New" variable into Variable(s), press Statistics, check Percentile(s), write 100 in the box, press Add, then Continue and OK. You get the desired results.
