How do I upsample time series in tsdb - time-series

I want to upsample a time series in OpenTSDB. For example, suppose I have temperatures that are recorded at 8-hour intervals, e.g., at 1am, 9am and 5pm every day. I want a TSDB query to return an upsampling of these data, so that I get temperatures at 1am, 2am, 3am, ..., 5pm, 6pm, ..., midnight. I want the "missing" data to be filled in by linear interpolation, e.g.,
otemp(2am) = itemp(1am) + 1/8 * ( itemp(9am) - itemp(1am) )
where otemp is the output up-sampled result and itemp is the input time series.
The problem is that OpenTSDB only seems to be willing to linearly interpolate data in the context of a multi-time-series operation like "sum". Now, I can kludge the solution that I want by creating another time series "ctemp" (the "c" is for "clock") that records a temperature of 0 every hour, and then asking TSDB to give me the sum of this time series with the itemp series.
Am I misunderstanding OpenTSDB, and is there a way to do this without having to create the bogus "ctemp" series? Something reasonable like:
...?start=some_time&end=some_time&interval=1h&m=lerp:itemp
?
-- Mark

For comparison, in Axibase TSD, which runs on HBase, the interpolation can be performed using the WITH INTERPOLATE clause.
SELECT date_format(time, 'MMM-dd HH:mm') AS sample_time,
value
FROM temperature
WHERE entity = 'sensor'
AND datetime BETWEEN '2017-05-14T00:00:00Z' AND '2017-05-17T00:00:00Z'
WITH INTERPOLATE(1 HOUR)
Sample commands:
series e:sensor d:2017-05-14T01:00:00Z m:temperature=25
series e:sensor d:2017-05-14T09:00:00Z m:temperature=30
series e:sensor d:2017-05-14T17:00:00Z m:temperature=29
series e:sensor d:2017-05-15T01:00:00Z m:temperature=28
series e:sensor d:2017-05-15T09:00:00Z m:temperature=35
series e:sensor d:2017-05-15T17:00:00Z m:temperature=31
series e:sensor d:2017-05-16T01:00:00Z m:temperature=22
series e:sensor d:2017-05-16T09:00:00Z m:temperature=40
series e:sensor d:2017-05-16T17:00:00Z m:temperature=33
The result:
sample_time value
May-14 01:00 25.0000
May-14 02:00 25.6250
May-14 03:00 26.2500
May-14 04:00 26.8750
May-14 05:00 27.5000
...
Disclaimer: I work for Axibase.
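If the interpolation has to be done client-side instead, the formula from the question can be sketched in a few lines of Python. This is only an illustration: the function name upsample_linear is made up here and is not part of OpenTSDB or Axibase TSD, and the input is the first three sample values listed above.

```python
from datetime import datetime, timedelta


def upsample_linear(samples, step):
    """Linearly interpolate sorted (datetime, value) samples onto a regular grid.

    Hypothetical helper for illustration; needs at least two samples.
    """
    out = []
    i = 0
    t = samples[0][0]
    end = samples[-1][0]
    while t <= end:
        # Advance to the pair of samples that brackets t.
        while i < len(samples) - 2 and samples[i + 1][0] <= t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        frac = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
        out.append((t, v0 + frac * (v1 - v0)))
        t += step
    return out


itemp = [
    (datetime(2017, 5, 14, 1), 25),
    (datetime(2017, 5, 14, 9), 30),
    (datetime(2017, 5, 14, 17), 29),
]
hourly = upsample_linear(itemp, timedelta(hours=1))
for ts, value in hourly[:5]:
    print(ts.strftime("%b-%d %H:%M"), f"{value:.4f}")
```

The first five printed values are 25.0000, 25.6250, 26.2500, 26.8750 and 27.5000, the same as in the result table above.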

Related

Time function in sheets

I have data for 4000 employees in Google Sheets along with their shift timings (9-hour shifts) spread across 24 hours. I want to use a formula to find the most common times at which these employees are in the office (e.g., 09:00 to 18:00). The result buckets would be 09:00 to 11:00, 11:00 to 13:00, 13:00 to 15:00, 15:00 to 18:00, 18:00 to 22:00, 22:00 to 09:00.
I could have used this formula to derive the value:
=IF(AND(TIMEVALUE(A2)>=TIMEVALUE("09:00"), TIMEVALUE(A2)<=TIMEVALUE("11:00")), "09:00 to 11:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("11:00"), TIMEVALUE(A2)<=TIMEVALUE("13:00")), "11:00 to 13:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("13:00"), TIMEVALUE(A2)<=TIMEVALUE("15:00")), "13:00 to 15:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("15:00"), TIMEVALUE(A2)<=TIMEVALUE("18:00")), "15:00 to 18:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("18:00"), TIMEVALUE(A2)<=TIMEVALUE("22:00")), "18:00 to 22:00", "22:00 to 09:00")))))
But the problem is that the timings are stored as text, not in a time format.
Here's my take:
Suppose column A has clock-ins and column B has clock-outs. Let column D hold times starting at 00:00 and going up to 33:00 (9am the next day) in 5-minute (or 30, 60, etc.) increments.
Let column E be the number of employees who were in the office at the time given in column D.
We will define E to be =COUNTIFS($A$2:$A$9999,"<="&D2,$B$2:$B$9999,">="&D2).
Next, apply some conditional formatting to highlight the most busy times.
Note that you will need only the times of day, which it sounds like you have, but you will need to convert overnight shifts to not wrap around midnight.
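The same counting idea can be sketched in Python, including the text-to-time conversion and the unwrapping of overnight shifts. The function names and the three sample shifts are made up for illustration; times are handled as minutes since midnight, and the grid runs to 33:00 as in the answer.

```python
def to_minutes(text):
    """Convert an 'HH:MM' text value to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def occupancy(shifts, step=60, horizon=33 * 60):
    """Count how many shifts cover each grid time (same test as the COUNTIFS formula)."""
    intervals = []
    for clock_in, clock_out in shifts:
        start, end = to_minutes(clock_in), to_minutes(clock_out)
        if end < start:
            # Overnight shift: push the clock-out past midnight so it does not wrap.
            end += 24 * 60
        intervals.append((start, end))
    return {
        t: sum(1 for start, end in intervals if start <= t <= end)
        for t in range(0, horizon + 1, step)
    }


# Made-up sample data: (clock-in, clock-out) stored as text.
shifts = [("09:00", "18:00"), ("10:00", "19:00"), ("22:00", "07:00")]
counts = occupancy(shifts)
busiest = max(counts, key=counts.get)  # first grid time with the highest count
print(f"{busiest // 60:02d}:{busiest % 60:02d}", counts[busiest])
```

As with the spreadsheet version, an overnight shift is only counted on the extended part of the grid (24:00 to 33:00), not in the early hours of the first day.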

Doing hourly rate calculations for hour more than 24

I am recording the time I spend on a project in a Google Sheets spreadsheet. One column adds up the recorded time and outputs the total to a cell, say D40. The output looks like <hoursspent>:<minutesspent>:<secondsspent>. For example, 30:30:50 would mean that I have worked 30 hours, 30 minutes and 50 seconds on a project.
Now, I was using this formula to calculate my total invoice:
=(C41*HOUR(D40))+(C41*((Minute(D40)/60)))+(C41*((SECOND(D40)/3600)))
where cell C41 contains my hourly rate (say $50).
This works fine as long as the number of hours I have worked is less than 24. The moment the number of hours goes above 24, the HOUR function returns the value modulo 24, i.e., HOUR of a 30-hour duration returns 6.
How can I make this calculation generic so that it also works for values of more than 24 hours?
Try
=C41*D40*24
and change the format of the result to currency ($).
A duration is stored as a fraction of a day, and one hour is 1/24th of a day. That's why you can multiply by 24 to get hours, and then multiply by the rate.
Try the formula below:
=SUMPRODUCT(SPLIT(D40,":"),{C41,C41/60,C41/3600})
When you store a value as HH:mm:ss in a sheet, it is automatically formatted as a time, so it makes sense that HOUR wraps modulo 24.
That is why you can simply ignore HOUR. If you format a cell as currency (Format > Number > Currency) or any other plain number format, you can see, when you perform a numerical operation like multiplication, that a time like "30:30:50" is stored like a TIMEVALUE result, as a fraction of a day, here with a value over 1. Simply multiply that by 24, and then by your hourly rate, and you'll get your value, i.e.,
=D40 * C41 * 24
Just replace HOUR(D40) with INT(D40)*24+HOUR(D40)
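To see why multiplying by 24 works, here is a small Python sketch of the arithmetic. It assumes, as the answers describe, that the sheet stores a duration as a fraction of a day; the helper name duration_to_days is made up for illustration.

```python
def duration_to_days(text):
    """Convert 'H:MM:SS' text to a fraction of a day, the way a sheet stores durations."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) / 86400


days = duration_to_days("30:30:50")  # a bit more than 1.27 days
rate = 50                            # hourly rate from the question

# Equivalent of =D40 * C41 * 24
invoice = days * 24 * rate

# What HOUR() sees: only the hour-of-day part, wrapped at 24.
hour_component = int(days * 24) % 24
# Equivalent of INT(D40)*24 + HOUR(D40): whole days put back as hours.
total_hours = int(days) * 24 + hour_component

print(round(invoice, 2), hour_component, total_hours)
```

For 30:30:50 at $50 per hour this gives an invoice of about 1525.69, an hour component of 6, and 30 whole hours once the full day is added back.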

How to obtain time interval value reports from InfluxDB

Using InfluxDB: is there any way to build a time-bucketed report of a field value representing a state that persists over time? Ideally in the InfluxQL query language.
More specifically as an example: Say a measurement contains points that report changes in the light bulb state (On / Off). They could be 0s and 1s as in the example below, or any other value. For example:
time light
---- -----
2022-03-18T00:00:00Z 1
2022-03-18T01:05:00Z 0
2022-03-18T01:55:00Z 0
2022-03-18T02:30:00Z 1
2022-03-18T04:06:00Z 0
The result should be a listing of intervals indicating if this light was on or off during each time interval (e.g. hours), or what percentage of that time it was on. For the given example, the result if grouping hourly should be:
Hour             Value
----             -----
2022-03-18 00:00 1.00
2022-03-18 01:00 0.17
2022-03-18 02:00 0.50
2022-03-18 03:00 1.00
2022-03-18 04:00 0.10
Note that:
for 1am bucket, even if the light starts and ends in On state, it was On for only 10 over 60 minutes, so the value is low (10/60)
and more importantly the bucket from 3am to 4am has value "1" as the light was On since the last period, even if there was no change in this time period. This rules out usage of simple aggregation (e.g. MEAN) over a GROUP BY TIME(), as there would not be any way to know if an empty/missing bucket corresponds to an On or Off state as it only depends on the last reported value before that time bucket.
Is there a way to implement it in pure InfluxQL, without retrieving potentially big data sets (points) and iterating through them in a client?
I assume the raw data can be obtained with this query:
SELECT "light" FROM "test3" WHERE $timeFilter
where "test3" is your measurement name and $timeFilter is the from/to time period.
In this case we need a subquery that fills in the data. Let's take 1s as the grouping (resolution) interval:
SELECT last("light") as "filled_light" FROM "test3" WHERE $timeFilter GROUP BY time(1s) fill(previous)
This query gives us a 1/0 value every 1s. We will use it as a subquery.
NOTE: This approach does not know whether the light was on or off at the beginning of the $timeFilter period. It provides no data before the first hour that contains a point within $timeFilter.
In the next step, use the integral() function on the data from the subquery, like this:
SELECT integral("filled_light",1h) from (SELECT last("light") as "filled_light" FROM "test3" WHERE $timeFilter GROUP BY time(1s) fill(previous)) group by time(1h)
[Images: chart of the query result; table of the result data]
This is not a perfect way of getting it to work but I hope it resolves your problem.
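For comparison, the computation the query approximates (hold the last value, then integrate per bucket) can be written directly in Python. This is a client-side sketch, not InfluxQL, and the function name on_fraction_per_bucket is made up for illustration. It treats each point's value as persisting until the next point and weights it by the time it overlaps each bucket.

```python
from datetime import datetime, timedelta


def on_fraction_per_bucket(points, start, end, bucket):
    """Time-weighted average of a held state per bucket.

    points: sorted (datetime, state) pairs; each state persists until the next point.
    """
    results = []
    bucket_start = start
    while bucket_start < end:
        bucket_end = bucket_start + bucket
        weighted = 0.0
        for idx, (ts, state) in enumerate(points):
            # The state holds from this point until the next one (or the end of the range).
            seg_end = points[idx + 1][0] if idx + 1 < len(points) else end
            lo = max(ts, bucket_start)
            hi = min(seg_end, bucket_end)
            if hi > lo:
                weighted += state * (hi - lo).total_seconds()
        results.append((bucket_start, weighted / bucket.total_seconds()))
        bucket_start = bucket_end
    return results


points = [
    (datetime(2022, 3, 18, 0, 0), 1),
    (datetime(2022, 3, 18, 1, 5), 0),
    (datetime(2022, 3, 18, 1, 55), 0),
    (datetime(2022, 3, 18, 2, 30), 1),
    (datetime(2022, 3, 18, 4, 6), 0),
]
result = on_fraction_per_bucket(
    points, datetime(2022, 3, 18, 0), datetime(2022, 3, 18, 5), timedelta(hours=1)
)
fractions = [round(value, 2) for _, value in result]
print(fractions)
```

With the points exactly as listed in the question, this prints [1.0, 0.08, 0.5, 1.0, 0.1]. The 03:00 bucket is 1.0 even though it contains no points, which is the behaviour the question asks for. The 01:00 bucket comes out as 5/60 rather than the 0.17 in the question's table, which counts 10 minutes as on.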

ML.NET align multiple time series tables

I want to view the consumed energy every 5 minutes. I have two Energy counters which generate data with different time stamps.
Counter values from SolarPanel
DateTime           Counter
2021-01-01 8:00:01 123.0
2021-01-01 8:01:00 123.1
2021-01-01 8:05:55 123.4
2021-01-01 8:09:02 125.3
2021-01-01 8:11:55 126.9
Counter values from EnergyProvider
DateTime           Counter
2021-01-01 8:01:05 423.0
2021-01-01 8:01:22 427.5
2021-01-01 8:06:55 428.6
2021-01-01 8:09:33 431.8
2021-01-01 8:13:55 433.3
First, I want to resample the data so that both counters have the same timestamps.
SolarPanel (recalculated)
DateTime           Counter  Diff
2021-01-01 8:05:00 123.3441
2021-01-01 8:10:00 125.8364 2.4923
EnergyProvider (recalculated)
DateTime           Counter  Diff
2021-01-01 8:05:00 427.7570
2021-01-01 8:10:00 431.9546 4.1976
The energy produced by the SolarPanel between 8:05 and 8:10 is 2.4923 Wh.
The energy consumed from the net between 8:05 and 8:10 is 4.1976 Wh.
The total energy consumed = 6.69 Wh.
Is interpolation possible in ML.NET?
Is there a Diff function available?
Can two tables be joined by DateTime?
I've Googled the whole day to find anything useful, but no luck.
I'm not sure this is really a problem for Machine Learning. Perhaps a simple algorithm for measuring timespans and calculating the difference would be more appropriate.
You could put the data into a SQL DB and use a stored procedure, or you could do it directly in your language of choice:
Steps might be:
1. Create Lists (or arrays) of type DateTime and Double for both SolarPanel and EnergyProvider and add your data to them.
2. Create a start DateTime object with the time at which you want the sequence to start.
3. Create a List of DateTime objects using your start DateTime and incrementing by timespans using:
DateTime.AddMinutes(5)
4. Then iterate over your SolarPanel and EnergyProvider Lists and find the index of the values that fall between consecutive items in the list created in step 3. This can be done with something like:
for (int i = 0; i < myTimeSequence.Count; i++)
{
    // Offset between the recorded timestamp and the grid timestamp
    TimeSpan varTime = solarPanelDT[i] - myTimeSequence[i];
    int intMinutes = (int)varTime.TotalMinutes;
}
5. Use i to find the relevant values in your SolarPanel and EnergyProvider lists and then calculate the differences and sums.
You should now have Lists of your desired time sequence and the corresponding sums and differences.
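The resampling itself is plain linear interpolation between the two readings that bracket each 5-minute mark. A minimal Python sketch, using the SolarPanel readings from the question (the function name interpolate_at is made up for illustration; the same function could be applied to the second counter):

```python
from datetime import datetime


def interpolate_at(readings, when):
    """Linearly interpolate a counter value at `when` from sorted (datetime, value) readings."""
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        if t0 <= when <= t1:
            frac = (when - t0) / (t1 - t0)  # timedelta / timedelta gives a float
            return v0 + frac * (v1 - v0)
    raise ValueError("timestamp outside the recorded range")


solar = [
    (datetime(2021, 1, 1, 8, 0, 1), 123.0),
    (datetime(2021, 1, 1, 8, 1, 0), 123.1),
    (datetime(2021, 1, 1, 8, 5, 55), 123.4),
    (datetime(2021, 1, 1, 8, 9, 2), 125.3),
    (datetime(2021, 1, 1, 8, 11, 55), 126.9),
]

at_805 = interpolate_at(solar, datetime(2021, 1, 1, 8, 5))
at_810 = interpolate_at(solar, datetime(2021, 1, 1, 8, 10))
diff = at_810 - at_805
print(round(at_805, 4), round(at_810, 4), round(diff, 4))
```

This reproduces the 123.3441, 125.8364 and 2.4923 figures given in the question for the SolarPanel counter.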

Showing hourly average (histogramm) in grafana

Given a timeseries of (electricity) marketdata with datapoints every hour, I want to show a Bar Graph with all time / time frame averages for every hour of the data, so that an analyst can easily compare actual prices to all time averages (which hour of the day is most/least expensive).
We have CrateDB as the backend, which is used in Grafana just like a Postgres source.
SELECT
extract(HOUR from start_timestamp) as "time",
avg(marketprice) as value
FROM doc.el_marketprices
GROUP BY 1
ORDER BY 1
So my data basically looks like this
time value
23.00 23.19
22.00 25.38
21.00 29.93
20.00 31.45
19.00 34.19
18.00 41.59
17.00 39.38
16.00 35.07
15.00 30.61
14.00 26.14
13.00 25.20
12.00 24.91
11.00 26.98
10.00 28.02
9.00 28.73
8.00 29.57
7.00 31.46
6.00 30.50
5.00 27.75
4.00 20.88
3.00 19.07
2.00 18.07
1.00 19.43
0 21.91
After hours of fiddling around with Bar Graphs, Histogram Mode, the Heatmap Panel and much more, I am just not able to draw a simple hours-of-the-day histogram with this in Grafana. I would very much appreciate any advice on how to use any panel to get this accomplished.
Your query doesn't return correct time series data for Grafana, because the time field is not a valid timestamp:
- don't extract only the hour, but provide the full start_timestamp (I hope it is a timestamp data type and the value is in UTC)
- add a WHERE time condition using Grafana's macro $__timeFilter
- use Grafana's macro $__timeGroupAlias for hourly grouping
SELECT
$__timeGroupAlias(start_timestamp,1h,0),
avg(marketprice) as value
FROM doc.el_marketprices
WHERE $__timeFilter(start_timestamp)
GROUP BY 1
ORDER BY 1
This will give you data for a historical graph with hourly average values.
The required histogram may be tricky, but you can try to create a metric that holds the extracted hour, e.g.
SELECT
$__timeGroupAlias(start_timestamp,1h,0),
extract(HOUR from start_timestamp) as "metric",
avg(marketprice) as value
FROM doc.el_marketprices
WHERE $__timeFilter(start_timestamp)
GROUP BY 1
ORDER BY 1
And then visualize it as a histogram. Remember that Grafana is designed for time series data, so you need a proper timestamp (not only extracted hours; if necessary you can fake it), otherwise you will have a hard time visualizing non-time-series data in Grafana. This second query may not work properly, but it at least gives you the idea.
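To check what any of these queries should return, the hour-of-day average from the question can be computed outside Grafana. A minimal Python sketch with made-up prices (the function name and the sample rows are illustrative only):

```python
from collections import defaultdict
from datetime import datetime


def hour_of_day_averages(rows):
    """Average price per hour of day across all days in rows of (timestamp, price)."""
    buckets = defaultdict(list)
    for ts, price in rows:
        buckets[ts.hour].append(price)
    return {hour: sum(prices) / len(prices) for hour, prices in sorted(buckets.items())}


# Made-up sample data: two days, two hours each.
rows = [
    (datetime(2021, 1, 1, 0), 20.0),
    (datetime(2021, 1, 1, 1), 17.0),
    (datetime(2021, 1, 2, 0), 24.0),
    (datetime(2021, 1, 2, 1), 19.0),
]
averages = hour_of_day_averages(rows)
print(averages)
```

The result has one entry per hour of day (at most 24), which is the shape the bar chart in the question needs.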
