Hourly time series - time-series

What should I do in order to apply an ARIMA model to Hourly time series below? My goal is to predict traffic volume. Thanks in advance
ID,Datetime,Traffic
0,1/2/22 7:00,105
1,1/2/22 8:00,226
2,1/2/22 9:00,264
3,1/2/22 10:00,238
4,1/2/22 11:00,219
5,1/2/22 12:00,180
6,1/2/22 13:00,135
7,1/2/22 14:00,140
8,1/2/22 15:00,62
9,1/2/22 16:00,48
10,2/2/22 7:00,82
11,2/2/22 8:00,242
12,2/2/22 9:00,254
13,2/2/22 10:00,227
14,2/2/22 11:00,212
15,2/2/22 12:00,142
16,2/2/22 13:00,155
17,2/2/22 14:00,148
18,2/2/22 15:00,77
19,2/2/22 16:00,51
I have tried many times but always get errors due to hourly dataset issues.

Related

how to select SpatRaster layers from their names?

I've got a SpatRaster of (150 x 150 x 1377) that shows temporal evolution of precipitations. Each layer is a given hour in a 2-month interval, but some hours are missing, and the dataset isn't continuous. The layers names are strings as "YYYYMMDDhhmm".
I need to find the mean value every three hours even on whole intervals or on missing-data intervals. On entire ones I want to average three data and on missing-data ones I would like to average two of them or, if two are missing, to select the unique value as the averaged one.
How can I use data names to select how to act?
I've already tried this code but I'm averaging on three continuous layers by index and not by hours. How can I convert names in DateTime form from "tidyverse" in order to use rollapply() to see if two steps back I find the DateTime I am expecting? Is there any other method to check this out?
HSAF=rast(c((paste0(resfolder, "HSAF_final1_5.tif")),(paste0(resfolder, "HSAF_final6_10.tif")),(paste0(resfolder, "HSAF_final11_15.tif")),
(paste0(resfolder, "HSAF_final16_20.tif")),(paste0(resfolder, "HSAF_final21_25.tif")),(paste0(resfolder, "HSAF_final26_30.tif")),
(paste0(resfolder, "HSAF_final31_N04.tif")),(paste0(resfolder, "HSAF_finalN05_N08.tif")),(paste0(resfolder, "HSAF_finalN09_N13.tif")),
(paste0(resfolder, "HSAF_finalN14_N18.tif")),(paste0(resfolder, "HSAF_finalN19_N23.tif")),(paste0(resfolder, "HSAF_finalN24_N28.tif")),
(paste0(resfolder, "HSAF_finalN29_N30.tif"))))
index=names(HSAF)
j=2
for (i in seq(1,3, by=3))
{third_el<- HSAF[index[i+j]]
second_el <- HSAF[index[i+j-1]]
first_el<- HSAF[index[i+j-2]]
newraster<- c(first_el, second_el, third_el)
newraster<- mean(newraster, filename=paste0(tempfile(), ".tif"))
names(newraster)<- paste0(index[i+j-2],index[i+j-1],index[i+j])
}
for (i in seq(4,1374 , by=3))
{ third_el<- HSAF[index[i+j]]
second_el <- HSAF[index[i+j-1]]
first_el<- HSAF[index[i+j-2]]
subraster<- c(first_el, second_el, third_el)
subraster<- mean(subraster, filename=paste0(tempfile(), ".tif"))
names(subraster)<- paste0(index[i+j-2],index[i+j-1],index[i+j])
add(newraster)<- subraster
}

InfluxDB: How to create a continuous query to calculate delta values?

I'd like to calculate the delta values for a series of measurements stored in an InfluxDB. The values are readings from an electricity meter taken every 5 minutes. The values increase over time. Here is subset of the data to give you an idea (commands shown below are executed in the InfluxDB CLI):
> SELECT "Haushaltstromzaehler - cnt" FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time Haushaltstromzaehler - cnt
---- --------------------------
2018-02-02T10:00:12.610811904Z 11725.638
2018-02-02T10:05:11.242021888Z 11725.673
2018-02-02T10:10:10.689827072Z 11725.707
2018-02-02T10:15:12.143326976Z 11725.736
2018-02-02T10:20:10.753357056Z 11725.768
2018-02-02T10:25:11.18448512Z 11725.803
2018-02-02T10:30:12.922032896Z 11725.837
2018-02-02T10:35:10.618788096Z 11725.867
2018-02-02T10:40:11.820355072Z 11725.9
2018-02-02T10:45:11.634203904Z 11725.928
2018-02-02T10:50:11.10436096Z 11725.95
2018-02-02T10:55:10.753853952Z 11725.973
Calculating the differences in the InfluxDB CLI is pretty straightforward with the difference() function. This gives me the electricity consumed within the 5 minutes intervals:
> SELECT difference("Haushaltstromzaehler - cnt") FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time difference
---- ----------
2018-02-02T10:05:11.242021888Z 0.03499999999985448
2018-02-02T10:10:10.689827072Z 0.033999999999650754
2018-02-02T10:15:12.143326976Z 0.02900000000045111
2018-02-02T10:20:10.753357056Z 0.0319999999992433
2018-02-02T10:25:11.18448512Z 0.03499999999985448
2018-02-02T10:30:12.922032896Z 0.033999999999650754
2018-02-02T10:35:10.618788096Z 0.030000000000654836
2018-02-02T10:40:11.820355072Z 0.03299999999944703
2018-02-02T10:45:11.634203904Z 0.028000000000247383
2018-02-02T10:50:11.10436096Z 0.02200000000084401
2018-02-02T10:55:10.753853952Z 0.02299999999922875
Where I struggle is getting this to work in a continuous query. Here is the command I used to setup the continuous query:
CREATE CONTINUOUS QUERY cq_Haushaltstromzaehler_cnt ON myhomedb
BEGIN
SELECT difference(sum("Haushaltstromzaehler - cnt")) AS "delta" INTO "Haushaltstromzaehler_delta" FROM "myhome_measurements" GROUP BY time(1h)
END
Looking in the InfluxDB log file I see that no data is written in the new 'delta' measurement from the continuous query execution:
...finished continuous query cq_Haushaltstromzaehler_cnt, 0 points(s) written...
After much troubleshooting and experimenting I now understand why no data is generated. Setting up a continuous query requires to use the GROUP BY time() statement. This in turn requires to use an aggregate function within the differences() function. The problem now is that the aggregate function returns only one value for the time period specified by GROUP BY time(). Obviously, the differences() function cannot calculate a difference from just one value. Essentially, continuous query executes a command like this:
> SELECT difference(sum("Haushaltstromzaehler - cnt")) FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z' GROUP BY time(1h)
>
I'm now somewhat clueless as to how to make this work and appreciate any advice you might have.
Does it help using the last aggregate function? Not tested this as a cq yet.
Select difference(last(T1_Consumed)) AS T1_Delta, difference(last(T2_Consumed)) AS T2_Delta
from P1Data
where time >= 1551648871000000000 group by time(1h)
DIFFERENCE() would calculate delta from the "aggregated" value taken from previous group, not within current group.
So fill free to use selector function there - since your counters seemed to be cumulative, LAST() should be working well.

Last period not output in Kapacitor recorded/replayed data

I’m trying to aggregate data over the previous week hourly, and compute summary statistics like median, mean, etc.
I’m using something like this:
var weekly_median = batch
|query('''SELECT median("duration") as week_median
FROM "db"."default"."profiling_metrics"''')
.period(1w)
.every(1h)
.groupBy(*)
.align()
|influxDBOut()
.database('default')
.measurement('summary_metrics')
The query works as expected, except that when recording and replaying data to test with
kapacitor record batch -task medians -past 30d
kapacitor replay -task medians -recording $rid -rec-time
the data is missing for the last period (1 week in this case). If I change the period to 1 day, all data is replayed except the final day’s worth.
Is this a problem with my tickscript, the way I’m recording data, or the way I’m replaying it?
I see, I need to do the aggregation in Kapacitor, not Influx. This seems to be a known issue, but finding documentation on it was tricky. https://github.com/influxdata/kapacitor/issues/1257 and https://github.com/influxdata/kapacitor/issues/1258 were helpful. The solution is to instead do something like:
var weekly_median = batch
|query('''SELECT "duration"
FROM "db"."default"."profiling_metrics"
WHERE "result" =~ /passed/''')
.period(1w)
.every(1h)
.groupBy(*)
.align()
|median('duration')
.as('week_median')
|influxDBOut()
.database('default')
.measurement('summary_metrics')

Not able to view measurement after creating continuous query

I have a measurement that has the following fields in InfluxDB. The measurement name is data.
Measurement name: data
Time Device Interface Metric Value
2016-10-11T19:00:00Z device1_name int_name In_bits 10
2016-10-11T19:00:00Z device2_name int_name Out_bits 5
.
.
.
I have created a continuous query as follows:
CREATE CONTINUOUS QUERY "test_query" ON "db_name" BEGIN SELECT sum("value") as Sumin INTO "data.copy" FROM "data" where metric = 'In_bits' GROUP BY time(15m), device END
After creating this, how do I see the results saved in new measurement data.copy? After creating this query, I am not able to find the new measurement that has been created. If I'm doing something wrong, I would like to get more input on this matter. Thanks!

Esper very simple context and aggregation

I have a quite simple problem to modelize and I don't have experience in Esper, so I may be headed the wrong way so I'd like some insight.
Here's the scenario: I have one stream of events "ParkingEvent", with two types of events "SpotTaken" and "SpotFree". So I have an Esper context both partitioned by id and bordered by a starting event of type "SpotTaken" and an end event of type "SpotFree". The idea is to monitor a parking spot with a sensor and then aggregate data to count the number of times the spot has been taken and also the time occupation.
That's it, no time window or whatsoever, so it seems quite simple but I struggle aggregating data. Here's the code I got so far:
create context ParkingSpotOccupation
context PartionBySource
partition by source from SmartParkingEvent,
context ContextBorders
initiated by SmartParkingEvent(
type = "SpotTaken") as startEvent
terminated by SmartParkingEvent(
type = "SpotFree") as endEvent;
#Name("measurement_occupation")
context ParkingSpotOccupation
insert into CreateMeasurement
select
e.source as source,
"ParkingSpotOccupation" as type,
{
"startDate", min(e.time),
"endDate", max(e.time),
"duration", dateDifferenceInSec(max(e.time), min(e.time))
} as fragments
from
SmartParkingEvent e
output
snapshot when terminated;
I got the same data for min and max so I'm guessing I'm doing somthing wrong.
When I'm using context.ContextBorders.startEvent.time and context.ContextBorders.endEvent.time instead of min and max, the measurement_occupation statement is not triggered.
Given that measurements have already been computed by the EPL that you provided, this counts the number of times the spot has been taken (and freed) and totals up the duration:
select source, count(*), sum(duration) from CreateMeasurement group by source

Resources