I want to view the consumed energy every 5 minutes. I have two energy counters which generate data with different timestamps.
Counter values from SolarPanel

DateTime              Counter
2021-01-01 8:00:01    123.0
2021-01-01 8:01:00    123.1
2021-01-01 8:05:55    123.4
2021-01-01 8:09:02    125.3
2021-01-01 8:11:55    126.9
Counter values from EnergyProvider

DateTime              Counter
2021-01-01 8:01:05    423.0
2021-01-01 8:01:22    427.5
2021-01-01 8:06:55    428.6
2021-01-01 8:09:33    431.8
2021-01-01 8:13:55    433.3
First, I want to resample the data so that both series have the same timestamps.
SolarPanel (recalculated)

DateTime              Counter     Diff
2021-01-01 8:05:00    123.3441
2021-01-01 8:10:00    125.8364    2.4923

EnergyProvider (recalculated)

DateTime              Counter     Diff
2021-01-01 8:05:00    427.7570
2021-01-01 8:10:00    431.9546    4.1976
The energy produced by the SolarPanel between 8:05 and 8:10 is 2.4923 Wh.
The energy consumed from the net between 8:05 and 8:10 is 4.1976 Wh.
The total energy consumed = 6.69 Wh.
Is interpolation possible in ML.NET?
Is there a Diff function available?
Can two tables by joined by DateTime?
I've Googled the whole day without finding anything useful.
I'm not sure this is really a problem for Machine Learning. Perhaps a simple algorithm for measuring timespans and calculating the difference would be more appropriate.
You could put the data into a SQL DB and use a stored procedure, or you could do it directly in your language of choice:
Steps might be:
Create Lists (or arrays) of type DateTime and double for both SolarPanel and EnergyProvider and add your data to them.
Create a start DateTime object with the time you want the sequence to start at.
Create a List of DateTime objects from your start DateTime, incrementing by 5-minute timespans using:
DateTime.AddMinutes(5)
Then iterate your SolarPanel and EnergyProvider Lists and find the index of values within consecutive items in the list created in step 3. This can be done something like:
for (int i = 0; i < myTimeSequence.Count; i++)
{
    TimeSpan varTime = solarPanelDT[i] - myTimeSequence[i];
    int intMinutes = (int)varTime.TotalMinutes;
}
Use i to find the relevant values in your SolarPanel and EnergyProvider lists and then calculate the difference of their sums.
Now you should have Lists of your desired time sequence and the corresponding sums and differences.
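The steps above can also be prototyped quickly in Python with pandas (a sketch assuming straight linear interpolation between readings; the 5-minute grid and variable names are illustrative, not part of the question):

```python
import pandas as pd

# Raw counter readings with irregular timestamps (values from the question)
solar = pd.Series(
    [123.0, 123.1, 123.4, 125.3, 126.9],
    index=pd.to_datetime([
        "2021-01-01 08:00:01", "2021-01-01 08:01:00", "2021-01-01 08:05:55",
        "2021-01-01 08:09:02", "2021-01-01 08:11:55",
    ]),
)
provider = pd.Series(
    [423.0, 427.5, 428.6, 431.8, 433.3],
    index=pd.to_datetime([
        "2021-01-01 08:01:05", "2021-01-01 08:01:22", "2021-01-01 08:06:55",
        "2021-01-01 08:09:33", "2021-01-01 08:13:55",
    ]),
)

def to_grid(counter: pd.Series) -> pd.Series:
    """Linearly interpolate a counter onto a common 5-minute grid."""
    grid = pd.date_range("2021-01-01 08:05", "2021-01-01 08:10", freq="5min")
    # Union the raw and grid timestamps, interpolate by elapsed time,
    # then keep only the grid points.
    dense = counter.reindex(counter.index.union(grid)).interpolate(method="time")
    return dense.loc[grid]

df = pd.DataFrame({"solar": to_grid(solar), "provider": to_grid(provider)})
df["solar_diff"] = df["solar"].diff()        # energy produced per interval
df["provider_diff"] = df["provider"].diff()  # energy taken from the net
df["total"] = df["solar_diff"] + df["provider_diff"]
print(df)
```

Joining the two tables by DateTime then amounts to nothing more than putting both interpolated series in the same DataFrame, as above.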
I am recording my time spent on a project in a Google spreadsheet. There is a column which adds up the recorded time and outputs the total to a cell, say D40. The output time looks like <hoursspent>:<minutesspent>:<secondsspent>. For example, 30:30:50 would mean that I have worked for 30 hours, 30 minutes and 50 seconds on a project.
Now, I was using this formula to calculate my total invoice
=(C41*HOUR(D40))+(C41*((Minute(D40)/60)))+(C41*((SECOND(D40)/3600)))
Where C41 cell contains my hourly rate (say $50).
This is working fine as long as the number of hours that I have worked is less than 24. The moment my number of hours goes above 24, the HOUR function returns the modulus value, i.e., HOUR of a 30-hour duration returns 6.
How can I make this calculation generic in a way that it could work on values of more than 24 hours too?
Try
=C41*D40*24
and change the format of the result to currency ($).
One hour is part of a day, as you know 1/24th of a day; that's why you can multiply by 24 to get hours, and then multiply by the rate.
Try the formula below:
=SUMPRODUCT(SPLIT(D40,":"),{C41,C41/60,C41/3600})
When you store a value as HH:mm:ss into an Excel sheet, it automatically formats it as a Time, so it makes sense that HOUR modulos by 24.
Which is why you can simply ignore it. If you have a cell that is formatted as currency (Format > Number > Currency) or any other normal number-like format, then you can see, when you perform a numerical operation like multiplication, that it stores times like "30:30:50" as if they were a TIMEVALUE with a value over 1. Simply multiply that by 24, and then by your hourly rate, and you'll get your value, i.e.,
=D40 * C41 * 24
Just replace HOUR(D40) with INT(D40)*24+HOUR(D40)
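For reference, the hours-plus-fractions arithmetic behind the SUMPRODUCT/SPLIT answer can be checked outside Sheets; a small Python sketch (the function name is mine):

```python
def invoice(duration: str, hourly_rate: float) -> float:
    """Invoice amount for an HH:MM:SS duration; hours may exceed 24."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hourly_rate * (hours + minutes / 60 + seconds / 3600)

# 30 hours, 30 minutes, 50 seconds at $50/hour
print(round(invoice("30:30:50", 50.0), 2))  # → 1525.69
```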
Given a time series of (electricity) market data with data points every hour, I want to show a bar graph with all-time / time-frame averages for every hour of the day, so that an analyst can easily compare actual prices to all-time averages (which hour of the day is most/least expensive).
We have CrateDB as the backend, which is used in Grafana just like a Postgres source.
SELECT
extract(HOUR from start_timestamp) as "time",
avg(marketprice) as value
FROM doc.el_marketprices
GROUP BY 1
ORDER BY 1
So my data basically looks like this
time value
23.00 23.19
22.00 25.38
21.00 29.93
20.00 31.45
19.00 34.19
18.00 41.59
17.00 39.38
16.00 35.07
15.00 30.61
14.00 26.14
13.00 25.20
12.00 24.91
11.00 26.98
10.00 28.02
9.00 28.73
8.00 29.57
7.00 31.46
6.00 30.50
5.00 27.75
4.00 20.88
3.00 19.07
2.00 18.07
1.00 19.43
0 21.91
After hours of fiddling around with bar graphs, histogram mode, the heatmap panel and much more, I am just not able to draw a simple hours-of-the-day histogram with this in Grafana. I would very much appreciate any advice on how to use any panel to get this accomplished.
- Your query doesn't return valid time series data for Grafana: the time field is not a valid timestamp. So don't extract only the hour; provide the full start_timestamp (assuming it is a timestamp data type and the values are in UTC).
- Add a WHERE time condition using Grafana's __timeFilter macro.
- Use Grafana's $__timeGroupAlias macro for hourly grouping.
SELECT
$__timeGroupAlias(start_timestamp,1h,0),
avg(marketprice) as value
FROM doc.el_marketprices
WHERE $__timeFilter(start_timestamp)
GROUP BY 1
ORDER BY 1
This will give you data for a historical graph with hourly average values.
The required histogram may be tricky, but you can try to create a metric which carries the extracted hour, e.g.
SELECT
$__timeGroupAlias(start_timestamp,1h,0),
extract(HOUR from start_timestamp) as "metric",
avg(marketprice) as value
FROM doc.el_marketprices
WHERE $__timeFilter(start_timestamp)
GROUP BY 1
ORDER BY 1
And then visualize it as a histogram. Remember that Grafana is designed for time series data, so you need a proper timestamp (not only extracted hours; eventually you can fake it), otherwise you will have a hard time visualizing non-time-series data in Grafana. This 2nd query may not work properly, but it at least gives you the idea.
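Outside Grafana, the hour-of-the-day averaging that the original query performs can be sketched in Python (the prices below are made up; this just mirrors GROUP BY extract(HOUR from start_timestamp)):

```python
import pandas as pd

# Made-up market prices with full timestamps (two days, two hours each)
prices = pd.DataFrame({
    "start_timestamp": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 01:00",
        "2021-01-02 00:00", "2021-01-02 01:00",
    ]),
    "marketprice": [20.0, 18.0, 24.0, 21.0],
})

# Equivalent of GROUP BY extract(HOUR from start_timestamp)
hourly_avg = (
    prices.groupby(prices["start_timestamp"].dt.hour)["marketprice"]
    .mean()
    .rename_axis("hour")
)
print(hourly_avg)
```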
I am working on an anomaly detection model and need help with identifying anomalies in data transfer. Example: if an employee is connected using VPN and we have the following data usage:
EMPID   date        Bytes_sent   Bytes_received
A123    Timestamp   222222       3333333
A123    Timestamp   444444       6666666
A123    Timestamp   99999999     88888888888
I want to flag row 3 as abnormal since the employee has been sending or receiving within a range and then there is a sudden jump. I want to keep track of the bytes sent and received in the recent days - meaning how his behavior is changing over the recent few days.
One way is to keep additional metrics for each observation. For Bytes_received:

- An indicator of whether the observation is an outlier. This is decided by whether the observed Bytes_received falls outside the last observed average plus or minus a multiple of the last observed SD, as described below.
- A running average over the last N non-outlying events.
- A standard deviation over the last N non-outlying events.

N is based on the number of observations you want to consider. You mentioned recent days, so you could set N = "recent days" * average events per day.
E.g.:

EMPID   date        Bytes_sent   Bytes_received   br-avg-last-N   br-sd-last-N   br-Outlier
A123    Timestamp   222222       3333333          3333333         2357022.368    FALSE
A123    Timestamp   444444       6666666          4999999.5       2356922.368    FALSE
A123    Timestamp   99999999     88888888888      N/A             N/A            TRUE
The Bytes_received outlier flag for row three is calculated as whether the observed Bytes_received falls outside the range defined by:
(last Bytes_received Average-Last-N) - 2*(last Bytes_received SD-Last-N) and (last Bytes_received Average-Last-N) + 2*(last Bytes_received SD-Last-N)
4999999.5 + 2 * 2356922.368 = 9713844.236; 9,713,844.236 < 88,888,888,888 -> TRUE
Two standard deviations cover roughly 95% of a normal distribution, i.e. extreme observations you will only see ~5% of the time. You can modify the threshold to your needs.
You can either do the same for Bytes_sent and use an 'or' condition for the outlier decision, or calculate the distance from a multi-dimensional running average (here X is Bytes_sent and Y is Bytes_received) and mark outliers based on extreme distances. (You'll need to track a running SD or another spread metric per dimension.)
This way you could also easily add dimensions: time-of-day anomalies, extreme differences between Bytes_sent and Bytes_received, etc.
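A rough sketch of this running-window scheme in Python (the window N, the 2-SD threshold, and the single-employee framing are simplifications to fit the 3-row example):

```python
import pandas as pd

N = 2  # window of recent non-outlying events (tiny, to fit the toy data)
K = 2  # number of standard deviations defining the outlier band

usage = pd.DataFrame({
    "empid": ["A123"] * 3,
    "bytes_received": [3333333, 6666666, 88888888888],
})

avg, sd, outlier = [], [], []
history = []  # last non-outlying values for this employee
for value in usage["bytes_received"]:
    if len(history) >= 2:
        recent = pd.Series(history[-N:])
        m, d = recent.mean(), recent.std()
        is_out = abs(value - m) > K * d
    else:
        m = d = None  # not enough history to judge yet
        is_out = False
    avg.append(m); sd.append(d); outlier.append(is_out)
    if not is_out:
        history.append(value)  # outliers don't contaminate the running stats

usage["br_avg_last_N"], usage["br_sd_last_N"], usage["br_outlier"] = avg, sd, outlier
print(usage)
```

Only the third row gets flagged: its value lies far above the band built from the two earlier observations.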
I want to upsample a time series in OpenTSDB. For example, suppose I have temperatures that are recorded at 8-hour intervals, e.g., at 1am, 9am and 5pm every day. I want to retrieve, via a TSDB query, an upsampling of these data so that I get temperatures at 1am, 2am, 3am, ..., 5pm, 6pm, ..., midnight. I want the "missing" data to be filled in by linear interpolation, e.g.,
otemp(2am) = itemp(1am) + 1/8 * ( itemp(9am) - itemp(1am) )
where otemp is the output up-sampled result and itemp is the input time series.
The problem is that OpenTSDB only seems willing to linearly interpolate data in the context of a multi-time-series operation like "sum". Now, I can kludge the solution I want by creating another time series "ctemp" (the "c" is for "clock") that records a temperature of 0 every hour, and then ask TSDB to give me the sum of this time series with the itemp's.
Am I misunderstanding OpenTSDB, or is there a way to do this without having to create the bogus "ctemp" series? Something reasonable like:
...?start=some_time&end=some_time&interval=1h&m=lerp:itemp
?
-- Mark
For comparison, in Axibase TSD, which runs on HBase, the interpolation can be performed using the WITH INTERPOLATE clause.
SELECT date_format(time, 'MMM-dd HH:mm') AS sample_time,
value
FROM temperature
WHERE entity = 'sensor'
AND datetime BETWEEN '2017-05-14T00:00:00Z' AND '2017-05-17T00:00:00Z'
WITH INTERPOLATE(1 HOUR)
Sample commands:
series e:sensor d:2017-05-14T01:00:00Z m:temperature=25
series e:sensor d:2017-05-14T09:00:00Z m:temperature=30
series e:sensor d:2017-05-14T17:00:00Z m:temperature=29
series e:sensor d:2017-05-15T01:00:00Z m:temperature=28
series e:sensor d:2017-05-15T09:00:00Z m:temperature=35
series e:sensor d:2017-05-15T17:00:00Z m:temperature=31
series e:sensor d:2017-05-16T01:00:00Z m:temperature=22
series e:sensor d:2017-05-16T09:00:00Z m:temperature=40
series e:sensor d:2017-05-16T17:00:00Z m:temperature=33
The result:
sample_time value
May-14 01:00 25.0000
May-14 02:00 25.6250
May-14 03:00 26.2500
May-14 04:00 26.8750
May-14 05:00 27.5000
...
Disclaimer: I work for Axibase.
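If the upsampling can instead happen client-side after querying the raw points, the linear interpolation from the question is short in Python (a sketch using the first three sample temperatures from the series commands above):

```python
import pandas as pd

# Temperatures every 8 hours (sample values from the commands above)
itemp = pd.Series(
    [25.0, 30.0, 29.0],
    index=pd.to_datetime(["2017-05-14 01:00", "2017-05-14 09:00", "2017-05-14 17:00"]),
)

# Upsample to an hourly grid; fill gaps by linear interpolation over time
hourly = pd.date_range(itemp.index[0], itemp.index[-1], freq="1h")
otemp = itemp.reindex(hourly).interpolate(method="time")
print(otemp.loc["2017-05-14 02:00"])  # 25 + 1/8 * (30 - 25) = 25.625
```

This reproduces the May-14 02:00 value of 25.6250 shown in the Axibase result.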
I am trying to calculate moving averages spanning 30 days (prior moving averages) using SPSS 20 for about 1200 stock tickers. I would like to use a loop like:
Calculate the 30-day moving average for a ticker, say AAAA or 0001, and save it as MA30AAAA or MA300001.
Take another ticker, say AAAB or 0002, and do as above.
Continue until all tickers are captured and their MAs are calculated and saved to new columns.
Do you think I can develop SPSS syntax for that?
If I try the following, I get error warnings. Please can you help me get a reasonably well-structured syntax to do my job.
There was a very similar question today on LinkedIn (see below for the answer).
- Assuming every date is present exactly once in your data, the syntax below will calculate moving 30-day totals and averages over each date + the preceding 29 dates.
- If fewer than 29 days precede some date, these new variables will not be calculated for that date. (IMHO, reporting them would be misleading.)
- The 2 new variables will each appear in one column, but with a few extra lines you can put each value into its own column if desired.
Kind regards,
Ruben
*Generate test data.
set seed 1.
input program.
loop #=1 to 60.
if #=1 date=date.dmy(21,11,2012).
if #>1 date=datesum(lag(date),1,"days").
end case.
end loop.
end file.
end input program.
if $casenum=1 price=100.
if $casenum ne 1 price=lag(price)+trunc(rv.normal(0,5)).
formats date(edate10).
execute.

*Compute moving total + average over the current and 29 preceding cases.
compute moving_total_30=price.
do repeat dif=1 to 29.
compute moving_total_30=moving_total_30+lag(price,dif).
end repeat.
compute moving_average_30=moving_total_30/30.
execute.
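For comparison, the same per-ticker moving average is short in Python with pandas (a sketch; the window of 3 and the toy prices only keep the example small — use 30 for the real 30-day average):

```python
import pandas as pd

# Hypothetical long-format data: one row per ticker per trading day
data = pd.DataFrame({
    "ticker": ["AAAA"] * 3 + ["AAAB"] * 3,
    "date": pd.to_datetime(["2012-11-21", "2012-11-22", "2012-11-23"] * 2),
    "price": [100.0, 102.0, 101.0, 50.0, 49.0, 51.0],
})

WINDOW = 3  # use 30 for the real 30-day moving average

# Rolling mean per ticker; rows without a full window stay NaN,
# matching the SPSS answer's choice not to report partial windows.
data["ma"] = (
    data.groupby("ticker")["price"]
    .transform(lambda s: s.rolling(WINDOW).mean())
)

# One column per ticker, MA30AAAA-style
wide = data.pivot(index="date", columns="ticker", values="ma").add_prefix("MA")
print(wide)
```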