GluonTS: next-day forecast error and question (time series)

Good day!
I am trying to forecast 1 day into the future with GluonTS.
My dataset looks like this:
df:
Date    Volume
Jan1    100
...
June1   99
June2   105
June3   90
June4   NaN
How do I forecast 1 day into the future (June4)?
I have tried the following as an example:
test_data = ListDataset([{"start": df.index[0],
                          "target": df.Volume[:"June4"]}],
                        freq="D")
estimator = NBEATSEstimator(freq="D", prediction_length=1, context_length=5,
                            trainer=Trainer(epochs=60, ctx="gpu"))
predictor = estimator.train(training_data=test_data)
However, I get an error: 'Got NaN in first epoch. Try reducing initial learning rate.'
What should I do to forecast June4 if I have all previous data available (June3 and earlier)? What am I doing wrong?
Also, suppose I end the target at June3 instead (same dataset as above, which includes June3 and the June4 NaN value):
test_data = ListDataset([{"start": df.index[0],
                          "target": df.Volume[:"June3"]}],
                        freq="D")
estimator = NBEATSEstimator(freq="D", prediction_length=1, context_length=5,
                            trainer=Trainer(epochs=60, ctx="gpu"))
predictor = estimator.train(training_data=test_data)
The forecasts I get are very close to the June3 value.
Does it simply replicate the June3 value, or does it use June2 and earlier to predict 1 day into the future (June3)?
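A hedged sketch of one plausible fix, not confirmed in the thread: the error says the loss became NaN, and a likely cause is that the target passed to ListDataset ends with the NaN placeholder for June4. The snippet below uses only pandas, with a hypothetical year (2020) attached to the question's dates, to trim the series to its last observed value and build the entry one would pass to ListDataset. As far as I understand GluonTS, a trained predictor's predict() forecasts the prediction_length steps that follow the end of each target, so a target ending on June3 with prediction_length=1 should give a June4 forecast.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data mirroring the question; the year 2020 is an assumption.
idx = pd.date_range("2020-06-01", periods=4, freq="D")
df = pd.DataFrame({"Volume": [99.0, 105.0, 90.0, np.nan]}, index=idx)

# Keep values up to the last observed day, so the NaN placeholder for
# the day being forecast never reaches the training loss.
last_observed = df["Volume"].last_valid_index()
observed = df["Volume"].loc[:last_observed]

# With prediction_length=1, the forecast covers the day after the last observation.
forecast_date = last_observed + pd.Timedelta(days=1)

# Entry in the same shape as the one passed to ListDataset([...], freq="D").
entry = {"start": observed.index[0], "target": observed.to_numpy()}
print(forecast_date.date(), entry["target"])
```

The same trimmed entry can be used both for training and as the input to predict(), since the forecast window starts right after the last observed value.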

Related

RNN - aggregating predictions

I'm using an RNN to predict hourly electricity demand for an entire calendar day ahead, but am having trouble understanding how the prediction window works with an RNN/LSTM.
This is the basic architecture of the model:
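from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, GRU, Flatten, Dense

model = Sequential()
model.add(InputLayer((24, 6)))  # 24 timesteps, 6 variables of interest
model.add(GRU(128))             # returns only the last hidden state: shape (batch, 128)
model.add(Flatten())            # no-op here, since the GRU output is already 2-D
model.add(Dense(8, 'relu'))     # small hidden layer with ReLU activation
model.add(Dense(24, 'linear'))  # 24 outputs, one per hour of the next day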
I had processed my data like this:
import numpy as np

def df_to_vars(df, window_size=24):  # sliding windows of 24 timesteps
    df_as_np = df.to_numpy()
    X = []
    y = []
    for i in range(len(df_as_np) - window_size):
        row = [r for r in df_as_np[i:i + window_size]]  # window_size rows of all variables
        X.append(row)
        label = df_as_np[i + window_size][0]  # just the demand value of the next hour
        y.append(label)
    return np.array(X).squeeze(), np.array(y).squeeze()
Accordingly, the predictions are in the shape of:
array([[pred_1, pred_2, pred_3, ..., pred_22, pred_23, pred_24],
       [...],
       ...,
       [...]], dtype=float32)
However, I'm unsure how to aggregate these to get the mean prediction for a given hour (I then intend to aggregate further into 24-hour daily cumulative-sum buckets; if anyone knows an easier way to do this, I would also appreciate advice or links to resources), or, if this is not the right approach, how I should change the model architecture to output the results I want. I'm open to any and all suggestions and am currently stuck.
I tried the above code and was expecting an output of 24 hours into the future for each prediction, but am not sure how to interpret the results and use them for metric calculation and for producing forecasts.
Thank you kindly for all help.
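Note that df_to_vars above builds a single next-hour label per window, while the model has 24 outputs. A minimal numpy sketch, under the assumption that the goal is 24-hour-ahead labels: the hypothetical helper make_multistep_windows builds a 24-value label per window, and the hypothetical helper average_overlapping averages all predictions that target the same absolute timestep (row i of the predictions is assumed to start at timestep i + window_size).

```python
import numpy as np

def make_multistep_windows(values, window_size=24, horizon=24):
    """Hypothetical variant of df_to_vars: each X row holds `window_size` past
    timesteps (all variables); each y row holds the next `horizon` demand values
    (column 0), matching a Dense(24) output layer."""
    X, y = [], []
    for i in range(len(values) - window_size - horizon + 1):
        X.append(values[i:i + window_size])
        y.append(values[i + window_size:i + window_size + horizon, 0])
    return np.array(X), np.array(y)

def average_overlapping(preds, window_size=24):
    """Row i of `preds` forecasts absolute timesteps i + window_size onward.
    Average every forecast that targets the same timestep; NaN where none does."""
    n_rows, horizon = preds.shape
    length = window_size + n_rows + horizon - 1
    sums = np.zeros(length)
    counts = np.zeros(length)
    for i in range(n_rows):
        start = i + window_size
        sums[start:start + horizon] += preds[i]
        counts[start:start + horizon] += 1
    mean = np.full(length, np.nan)
    covered = counts > 0
    mean[covered] = sums[covered] / counts[covered]
    return mean
```

With X, y = make_multistep_windows(df.to_numpy()) and preds = model.predict(X), average_overlapping(preds) gives one averaged forecast per hour. If the covered span starts at midnight and is a whole number of days, reshaping it with .reshape(-1, 24) gives one row per day; .sum(axis=1) then yields daily totals and np.cumsum(..., axis=1) the within-day cumulative sums.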

Straight-line forecast with statsmodels ARIMA

I am new to ARIMA in statsmodels.
What is the best approach to applying ARIMA to a dataset like this?
The goal is to forecast the Value of the different types of gas.
I have run the Augmented Dickey-Fuller test and concluded that the data is stationary.
How do I get a more accurate forecast?
Date      T     RH     Gas  Value
6/2/2017  6.62  51.73  CO   845.23
6/2/2017  6.62  51.73  HC   626.34
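The rows interleave several gases in a single Value column, so if scaled_df keeps this long layout, one ARIMA fitted on scaled_df.Value would mix different series. A minimal pandas sketch, using only the two sample rows above and reading dates as day/month/year as in the question's test code, that reshapes the data into one column per gas so each series can get its own model:

```python
import pandas as pd

# The two sample rows from the question; real data would cover many dates.
df = pd.DataFrame({
    "Date": ["6/2/2017", "6/2/2017"],
    "T": [6.62, 6.62],
    "RH": [51.73, 51.73],
    "Gas": ["CO", "HC"],
    "Value": [845.23, 626.34],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# One column per gas, indexed by date: each column is a separate series.
wide = df.pivot_table(index="Date", columns="Gas", values="Value")
print(wide)
```

Each column (for example wide["CO"]) can then be passed to ARIMA separately.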
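# Initialising ARIMA model
import pandas as pd
# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13; this is its replacement
from statsmodels.tsa.arima.model import ARIMA

arima_model = ARIMA(scaled_df.Value, order=(2, 0, 1)).fit()
print(arima_model.summary())
start = len(df)
end = len(df) + len(test) - 1
test['Date'] = pd.to_datetime(test['Date'], format='%d/%m/%Y')
test.set_index('Date', inplace=True)
# typ='levels' belonged to the old API; with d = 0 the forecasts are already in levels
pred = arima_model.predict(start=start, end=end)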
I think it might be due to your training data being too large; try splitting it into smaller chunks.
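One general reason ARIMA forecasts often look like a straight line: for a stationary model (d = 0, as in order=(2,0,1)), multi-step point forecasts feed earlier forecasts back in with future shocks set to zero, so they decay geometrically toward the process mean. The MA(1) term contributes directly only to the one-step forecast. A minimal numpy sketch of the AR(2) recursion, leaving out the MA term and using made-up coefficients (phi1=0.5, phi2=0.2, mean 700) rather than fitted values:

```python
import numpy as np

def ar2_point_forecast(history, mu, phi1, phi2, steps):
    """Multi-step point forecasts of a hypothetical AR(2) process
    x_t - mu = phi1*(x_{t-1} - mu) + phi2*(x_{t-2} - mu) + e_t.
    Future shocks have expectation zero, so each step reuses earlier forecasts."""
    prev2, prev1 = history[-2] - mu, history[-1] - mu
    out = []
    for _ in range(steps):
        nxt = phi1 * prev1 + phi2 * prev2
        out.append(mu + nxt)
        prev2, prev1 = prev1, nxt
    return np.array(out)

forecast = ar2_point_forecast([900.0, 850.0], mu=700.0, phi1=0.5, phi2=0.2, steps=50)
print(forecast[:5], forecast[-1])
```

The first forecasts move away from the last observation, but after a few dozen steps the sequence is essentially flat at the mean, which is what a long forecast horizon shows as a straight line.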

Evaluate the output of AutoML results

How do I interpret the following results? Based on the AutoGluon summary, which algorithm is best to train?
*** Summary of fit() ***
Estimated performance of each model:
model score_val fit_time pred_time_val stack_level
19 weighted_ensemble_k0_l2 -0.035874 1.848907 0.002517 2
18 weighted_ensemble_k0_l1 -0.040987 1.837416 0.002259 1
16 CatboostClassifier_STACKER_l1 -0.042901 1559.653612 0.083949 1
11 ExtraTreesClassifierGini_STACKER_l1 -0.047882 7.307266 1.057873 1
...
...
0 RandomForestClassifierGini_STACKER_l0 -0.291987 9.871649 1.054538 0
The code to generate the above results:
import pandas as pd
from autogluon import TabularPrediction as task
from sklearn.datasets import load_digits

digits = load_digits()
savedir = "otto_models/"  # where to save trained models
train_data = pd.DataFrame(digits.data)
train_target = pd.DataFrame(digits.target)
# Both frames have a column named 0, so merge() suffixes them:
# "0_x" is the first pixel and "0_y" is the digit label
train_data = pd.merge(train_data, train_target, left_index=True, right_index=True)
label_column = "0_y"
predictor = task.fit(
    train_data=train_data,
    label=label_column,
    output_directory=savedir,
    eval_metric="log_loss",
    auto_stack=True,
    verbosity=2,
    visualizer="tensorboard",
)
results = predictor.fit_summary()  # display detailed summary of fit() process
Which algorithm seems to work in this case?
weighted_ensemble_k0_l2 is the best model in terms of validation score (score_val) because it has the highest value. You may wish to call predictor.leaderboard(test_data) to get the test score of each model.
Note that the scores are negative because AutoGluon always treats higher as better: if a metric such as log loss prefers lower values, AutoGluon flips its sign. A score_val of 0 would therefore be a perfect score in your case, since log loss is never negative.
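To illustrate the sign convention described above, a minimal numpy sketch of a negated multiclass log loss: perfect probabilities score exactly 0, and any uncertainty pushes the score below 0, so the model closest to 0 ranks first. This is an illustration only, not AutoGluon's own implementation; the clipping constant eps is an assumption.

```python
import numpy as np

def neg_log_loss(y_true, proba, eps=1e-15):
    """Negated multiclass log loss ('higher is better'): the mean log-probability
    assigned to the true class. eps avoids log(0) and is an arbitrary choice."""
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1.0)
    p_true = proba[np.arange(len(y_true)), y_true]
    return float(np.mean(np.log(p_true)))

y = [0, 1]
perfect = neg_log_loss(y, [[1.0, 0.0], [0.0, 1.0]])
unsure = neg_log_loss(y, [[0.9, 0.1], [0.2, 0.8]])
print(perfect, unsure)
```

The unsure model gets a negative score, so ranking by "higher is better" and ranking by "lower log loss is better" agree.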

Vowpal Wabbit: strange feature count

I have noticed that during training, vw reports a very large feature count in its log (much larger than my number of features).
I have tried to reproduce it with a small example:
simple.test:
-1 | 1 2 3
1 | 3 4 5
Then the command "vw simple.test" reports that 8 features were used. One of them is the constant feature, but what are the others? In my real example, the count reported by vw is about 10 times my actual number of features.
....
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = t
num sources = 1
average since example example current current current
loss last counter weight label predict features
finished run
number of examples = 2
weighted example sum = 2
weighted label sum = 3
average loss = 1.9179
best constant = 1.5
total feature number = 8 !!!!
The total feature number is the sum of the feature counts over all observed examples, so in your case it is 2*(3 + 1 constant) = 8. The number of features in the current example is shown in the "current features" column. Note that by default only examples 1, 2, 4, 8, ... (powers of two) are printed on screen. In general, examples can have different numbers of features.
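The arithmetic in the answer can be checked with a minimal Python sketch that sums per-example feature counts over simple VW text lines and adds one constant feature per example. It assumes plain space-separated features and no interaction options; the helper name total_feature_number is hypothetical and does not reproduce vw's own parser.

```python
def total_feature_number(lines):
    """Sum of per-example feature counts for simple VW text lines
    ('label | f1 f2 ...'), plus the constant feature for each example.
    Assumes no interactions and plain space-separated features."""
    total = 0
    for line in lines:
        _, *segments = line.split("|")
        n = 0
        for seg in segments:
            tokens = seg.split()
            # In '|name f1 f2', a token glued to the bar is a namespace name, not a feature
            if seg and not seg[0].isspace():
                tokens = tokens[1:]
            n += len(tokens)
        total += n + 1  # +1 for the constant feature
    return total

print(total_feature_number(["-1 | 1 2 3", "1 | 3 4 5"]))  # 8
```

Because the count grows with every example seen, it is expected to exceed the number of distinct features in any dataset with more than one example.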

How to use SGD for time series analysis

Is it possible to use stochastic gradient descent for time-series analysis?
My initial idea, given a series of (t, v) pairs where I want an SGD regressor to predict the v associated with t+1, would be to convert the date/time into an integer value, and train the regressor on this list using the hinge loss function. Is this feasible?
Edit: Below is example code using the SGD implementation in scikit-learn. However, it fails to predict even a simple linear time series: all it seems to do is compute the average of the training Y-values and use that as its prediction for the test Y-values. Is SGD just unsuitable for time-series analysis, or am I formulating this incorrectly?
from datetime import date
from sklearn.linear_model import SGDRegressor

# Build data: the feature is the date's ordinal, the target is a counter.
i = 0
training = []
for _ in range(12):
    i += 1
    training.append([[date(2012, 1, i).toordinal()], i])
testing = []
for _ in range(12):
    i += 1
    testing.append([[date(2012, 1, i).toordinal()], i])

clf = SGDRegressor(loss='huber')
print('Training...')
for epoch in range(20):
    try:
        print(epoch)
        clf.partial_fit(X=[X for X, _ in training], y=[y for _, y in training])
    except ValueError:
        break
print('Testing...')
for X, y in testing:
    p = clf.predict([X])[0]  # predict() expects a 2-D input: one row per sample
    print(y, p, abs(p - y))
SGDRegressor in scikit-learn is numerically unstable when the inputs are not scaled. For good results, it is highly recommended that you scale the input variable.
from datetime import date
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Build data.
s = date(2010, 1, 1).toordinal()
i = 0
training = []
for _ in range(1, 13):
    i += 1
    training.append([[s + i], i])
testing = []
for _ in range(13, 25):
    i += 1
    testing.append([[s + i], i])

scaler = StandardScaler()
X_train = scaler.fit_transform([X for X, _ in training])
# After training the SGD regressor, you will have to scale the test input variable accordingly.
clf = SGDRegressor()
clf.fit(X=X_train, y=[y for _, y in training])
print(clf.intercept_, clf.coef_)
print('Testing...')
for X, y in testing:
    p = clf.predict(scaler.transform([X]))
    print(X[0], y, p[0], abs(p[0] - y))
Here is the result:
[6.31706122] [3.35332573]
Testing...
733786 13 12.631164799851827 0.3688352001481725
733787 14 13.602565350686039 0.39743464931396133
733788 15 14.573965901520248 0.42603409847975193
733789 16 15.545366452354457 0.45463354764554254
733790 17 16.51676700318867 0.48323299681133136
733791 18 17.488167554022876 0.5118324459771237
733792 19 18.459568104857084 0.5404318951429161
733793 20 19.430968655691295 0.569031344308705
733794 21 20.402369206525506 0.5976307934744938
733795 22 21.373769757359714 0.6262302426402861
733796 23 22.34517030819392 0.6548296918060785
733797 24 23.316570859028133 0.6834291409718674
The method of choice for time series prediction depends on what you know about your time series. Whenever you choose a specific method for your task, you make implicit assumptions about the nature of your signal and the kind of system that generated it. Any method is always a model of the system: the more you know a priori about your signal and the system, the better you can model it.
If, for instance, your signal is stochastic in nature, ARMA processes or Kalman filters are usually a good choice. If those fail, other, more deterministic models might help, provided, of course, that you have some information about your system.
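Rather than feeding the date itself as the feature, one alternative in the autoregressive spirit of the ARMA models mentioned above is to regress the next value on the previous one. A minimal numpy sketch, assuming plain per-sample SGD on squared error with a standardized lag feature; the learning rate, epoch count, and the helper names fit_lag1_sgd and predict_next are arbitrary, hypothetical choices, not scikit-learn's implementation:

```python
import numpy as np

def fit_lag1_sgd(series, lr=0.05, epochs=200):
    """Fit v[t+1] ~ w * z[t] + b by per-sample SGD on squared error,
    where z[t] is the standardized previous value v[t]."""
    v = np.asarray(series, dtype=float)
    x, y = v[:-1], v[1:]
    mean, std = x.mean(), x.std()
    z = (x - mean) / std  # scaling keeps the gradient steps well conditioned
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for zi, yi in zip(z, y):
            err = (w * zi + b) - yi
            w -= lr * err * zi
            b -= lr * err
    return w, b, mean, std

def predict_next(series, w, b, mean, std):
    """One-step-ahead forecast from the last observed value."""
    return w * (series[-1] - mean) / std + b

series = list(range(1, 25))  # the same simple linear series as in the question
params = fit_lag1_sgd(series)
print(predict_next(series, *params))
```

Since each value is the previous value plus one, the lag model can represent the series exactly, and the one-step forecast after 24 comes out at 25 instead of the training average.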
