TCN model result seems like offset by one time step - machine-learning

I have been training a temperature dataset with TCN model. I Have tested it on small data available. I am training a bivariate TCN model with the data which I am giving input is Maximum temperature and minimum temperature to predict maximum temperature. I want to know if this is overfitting or if the graph is right Graph here
Below is my model
i = Input(shape=(lookback_window, 2))
m = TCN(nb_filters = 64)(i)
m = Dense(3, activation='linear')(m)
model = Model(inputs=[i], outputs=[m])
model.summary()
The summary of the model is given here

evaluate accuracy during epochs:
acc = sklearn.metrics.accuracy_score(y_true, y_pred)

Related

Discriminator's loss stuck at value = 1 while training conditional GAN

I am training a conditional GAN that generates image time series (similar to video prediction). I built a conditional GAN based on this paper. However, several probelms happened when I was training the cGAN.
Problems of training cGAN:
The discriminator's loss stucks at one.
It seems like the generator's loss is not effected by discriminator no matter how I adjust the hyper parameters related to the discriminator.
Training loss of discriminator
D_loss = (fake_D_loss + true_D_loss) / 2
fake_D_loss = Hinge_loss(D(G(x, z)))
true_D_loss = Hinge_loss(D(x, y))
The margin of hinge loss = 1
Training loss of generator
D_loss = -torch.mean(D(G(x,z))
G_loss = weighted MAE
Gradient flow of discriminator
Gradient flow of generator
Several settings of the cGAN:
The output layer of discriminator is linear sum.
The discriminator is trained twice per epoch while the generator is only trained once.
The number of neurons of the generator and discriminator are exactly the same as the paper.
I replaced the ReLU (original setting) to LeakyReLU to avoid nan.
I added gradient norm to avoid gradient vanishing problem.
Other hyper parameters are listed as follows:
Hyper parameters
Paper
Mine
number of input images
4
4
number of predicted images
18
10
batch size
16
16
opt_g, opt_d
Adam
Adam
lr_g
5e-5
5e-5
lr_d
2e-4
2e-4
The loss function I use for discriminator.
def HingeLoss(pred, validity, margin=1.):
if validity:
loss = F.relu(margin - pred)
else:
loss = F.relu(margin + pred)
return loss.mean()
The loss function for examining the validity of predicted image from generator.
def HingeLossG(pred):
return -torch.mean(pred)
I use the trainer of pytorch_lightning to train the model. The training codes I wrote are as follows.
def training_step(self, batch, batch_idx, optimizer_idx):
x, y = batch
x.requires_grad = True
if self.n_sample > 1:
pred = [self(x) for _ in range(self.n_sample)]
pred = torch.mean(torch.stack(pred, dim=0), dim=0)
else:
pred = self(x)
##### TRAIN DISCRIMINATOR #####
if optimizer_idx == 1:
true_D_loss = self.discriminator_loss(self.discriminator(x, y), True)
fake_D_loss = self.discriminator_loss(self.discriminator(x, pred.detach()), False)
D_loss = (fake_D_loss + true_D_loss) / 2
return D_loss
##### TRAIN GENERATOR #####
if optimizer_idx == 0:
G_loss = self.generator_loss(pred, y)
GD_loss = self.generator_d_loss(self.discriminator(x, pred.detach()))
train_G_loss = G_loss + GD_loss
return train_G_loss
I have several guesses of why these problems may occur:
Since the original model predicts 18 frames rather than 10 frames (my version), maybe the number of neurons in the original generator is too much for my case (predicting 10 frames), leading an exceedingly powerful generator that breaks the balance of training. However, I've tried to lower the learning rate of generator to 1e-5 (original 5e-5) or increase the training times of discriminator to 3 to 5 times. It seems that the loss curve of generator didn't much changed.
Various results of training cGAN
I have also adjust the weights of generator's loss, but the same problems still occurred.
The architecture codes of this model: https://github.com/hyungting/DGMR-pytorch

MLP Time Series Forecasting Model not learning the correct distribution

The goal is to build a model that can rank the stocks based on their Target Return.
The dataset I am using is structured the following way:
Date
Stock_Id
Volume
Open
Close
High
Low
Target
2022-01-04
8341
103422
2734
2742
2755
2730
0.0007304601899196
The data is chronological, and all the stocks are grouped by day. There are only 2000 stocks.
Using the Open, High, Low, Close, and Volume features, I was able to generate about 65+ new features using talib library
The Neural Net takes in all 74 total features and is meant to predict the Target number of each stock. The Target represents the rate of change of the stock between t+2 and t+1.
What I've done:
I have normalized the data accross time. This means that I take all of the time domain data for any given StockId and use it to compute mean and std in order to normalize the data.
The dataloader indexes into the dataset by day. This acts as a mini batch since when getitem is called a (2000, 74) tensor is returned.
The train dataloader is set to shuffle, so the indexing is random.
The loss function is meant to optimize for the highest returns all while keeping the standard deviation of those returns in check.
Because of the non chronological nature of the dataloader and the fact that I need all outputs to compute mean and std, the training loop using optim.SGD but acts as Gradient Descent.
During training, the loss converges very well, but the predictions are very off.
After training, I use the test set to view the accuracy of the prediction. This is the distribution of the predictions. And this is the distribution of the true targets.
The range of the Targets percentages are mainly between -15 and 15. Where as the predictions have a smaller range, -0.07 to -0.05.
I have tried running the model using different learning rates, changing the hidden layer sizes in the network, multiplying the targets by 100 to represent them as percentages, changing the timeperiod argument in many of the talib features. But I always get a model that poorly represents the data.
More Info:
The data is from 1300 trading days with each day containing data for 2000 stocks.
I have adjusted the Open, Close, High, Low prices before normalizing.
I have tried different neural network sizes. Having input layer length 74, and output being 1, with any arbitrary number of layers and sizes in between.
This is what the training loop looks like.
loss_lst = []
for epoch in range(50): # loop over the dataset multiple times
optimizer.zero_grad()
running_error = torch.tensor(0.0)
running_return = []
running_loss = torch.tensor(0.0)
for i, data in enumerate(trainloader, 0):
# get the inputs; data is a list of [inputs, labels]
inputs, labels, date = data
inputs = inputs.cuda().squeeze()
labels = labels.cuda().squeeze()
# forward pass
outputs = net(inputs).squeeze()
error = error_criterion(outputs, labels) # Error between prediction and label
running_error += error.cpu()
day_return = return_criterion(outputs,labels).cpu()
running_return.append(day_return) # Array of daily returns that will be used to compute the std for the loss function.
del outputs
del inputs
del labels
avg_error = running_error / len(trainloader)
std_return = torch.stack(running_return).cuda().std()
loss = (mean_hp * avg_error) + (std_hp * std_return)
loss.backward()
optimizer.step()
running_loss += loss.item()
loss_lst.append(loss.item())
print(f'[{epoch + 1}] loss: {running_loss:.10f}')
# loss_lst.append(sum(running_loss)/len(trainloader))
running_loss = 0.0
print('Finished Training')
Let me know if any additional clarifications are needed.

LSTM time series forecast for multiple time series' ( single model vs individual model performance)

I would like to use deep(stacked) lstm model to forecast the disk utilisation for all my clusters. But what I am experiencing is my individual model representing a single clusters- time series is giving more accurate performance, when compared to a single model representing all the time series'. Single model is a kind of averaging out the forecast though I am including the cluster-id as part of a feature passed to the model.
My training sequence logic goes as given below, where my time series
has [ 'utilisation','clusterID'] as two features.
for i in range(len(sequences)):
# find the end of this pattern
end_ix = i + n_steps_in
out_end_ix = end_ix + n_steps_out-1
# check if we are beyond the dataset
if out_end_ix > len(sequences):
break
# gather input and output parts of the pattern
seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix-1:out_end_ix, 0]
X.append(seq_x)
y.append(seq_y)
return array(X), array(y)```
- **Model**
trainX = trainX.reshape((trainX.shape[0], trainX.shape[1],features))
trainY = trainY.reshape((trainY.shape[0], trainY.shape[1]))
input = Input(batch_input_shape=(batch_size,look_back,features), name='input', dtype='float32')
optimizer = keras.optimizers.RMSprop(learning_rate=0.0001, rho=0.9, epsilon=None, decay=0.0)
model = Sequential()
model.add(Bidirectional(LSTM(800, return_sequences=True,return_state=False,input_shape=(look_back, features),stateful=stateful)))
model.add((LSTM(800, return_sequences=True,return_state=False,stateful=stateful)))
model.add(attention(return_sequences=False))
model.add(Dense(units=out_num))
model.compile(metrics=[rmse],run_eagerly=True,loss='mean_squared_error', optimizer='adam')
history = model.fit(trainX, trainY, batch_size=5000, epochs=100, verbose=2, validation_split=0.1)
- **and prediction logic as follows**
testX, testY = create_dataset(predictList.values,look_back,out_num)
reshaped_test=np.reshape(testX[0],(1,look_back,2))
futureStepPredict = model.predict(reshaped_test)
I was expecting an LSTM model can be trained on multiple timeseries at one time itself and can give individual forecast with almost the same accuracy as if it was considered as a single model based on the cluster-id provided as input, is this a wrong expectation?

PyTorch: Predicting future values with LSTM

I'm currently working on building an LSTM model to forecast time-series data using PyTorch. I used lag features to pass the previous n steps as inputs to train the network. I split the data into three sets, i.e., train-validation-test split, and used the first two to train the model. My validation function takes the data from the validation data set and calculates the predicted valued by passing it to the LSTM model using DataLoaders and TensorDataset classes. Initially, I've got pretty good results with R2 values in the region of 0.85-0.95.
However, I have an uneasy feeling about whether this validation function is also suitable for testing my model's performance. Because the function now takes the actual X values, i.e., time-lag features, from the DataLoader to predict y^ values, i.e., predicted target values, instead of using the predicted y^ values as features in the next prediction. This situation seems far from reality where the model has no clue of the real values of the previous time steps, especially if you forecast time-series data for longer time periods, say 3-6 months.
I'm currently a bit puzzled about tackling this issue and defining a function to predict future values relying on the model's values rather than the actual values in the test set. I have the following function predict, which makes a one-step prediction, but I haven't really figured out how to predict the whole test dataset using DataLoader.
def predict(self, x):
# convert row to data
x = x.to(device)
# make prediction
yhat = self.model(x)
# retrieve numpy array
yhat = yhat.to(device).detach().numpy()
return yhat
You can find how I split and load my datasets, my constructor for the LSTM model, and the validation function below. If you need more information, please do not hesitate to reach out to me.
Splitting and Loading Datasets
def create_tensor_datasets(X_train_arr, X_val_arr, X_test_arr, y_train_arr, y_val_arr, y_test_arr):
train_features = torch.Tensor(X_train_arr)
train_targets = torch.Tensor(y_train_arr)
val_features = torch.Tensor(X_val_arr)
val_targets = torch.Tensor(y_val_arr)
test_features = torch.Tensor(X_test_arr)
test_targets = torch.Tensor(y_test_arr)
train = TensorDataset(train_features, train_targets)
val = TensorDataset(val_features, val_targets)
test = TensorDataset(test_features, test_targets)
return train, val, test
def load_tensor_datasets(train, val, test, batch_size=64, shuffle=False, drop_last=True):
train_loader = DataLoader(train, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
val_loader = DataLoader(val, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
test_loader = DataLoader(test, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
return train_loader, val_loader, test_loader
Class LSTM
class LSTMModel(nn.Module):
def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
super(LSTMModel, self).__init__()
self.hidden_dim = hidden_dim
self.layer_dim = layer_dim
self.lstm = nn.LSTM(
input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
)
self.fc = nn.Linear(hidden_dim, output_dim)
def forward(self, x, future=False):
h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
out = out[:, -1, :]
out = self.fc(out)
return out
Validation (defined within a trainer class)
def validation(self, val_loader, batch_size, n_features):
with torch.no_grad():
predictions = []
values = []
for x_val, y_val in val_loader:
x_val = x_val.view([batch_size, -1, n_features]).to(device)
y_val = y_val.to(device)
self.model.eval()
yhat = self.model(x_val)
predictions.append(yhat.cpu().detach().numpy())
values.append(y_val.cpu().detach().numpy())
return predictions, values
I've finally found a way to forecast values based on predicted values from the earlier observations. As expected, the predictions were rather accurate in the short-term, slightly becoming worse in the long term. It is not so surprising that the future predictions digress over time, as they no longer depend on the actual values. Reflecting on my results and the discussions I had on the topic, here are my take-aways:
In real-life cases, the real values can be retrieved and fed into the model at each step of the prediction -be it weekly, daily, or hourly- so that the next step can be predicted with the actual values from the previous step. So, testing the performance based on the actual values from the test set may somewhat reflect the real performance of the model that is maintained regularly.
However, for predicting future values in the long term, forecasting, if you will, you need to make either multiple one-step predictions or multi-step predictions that span over the time period you wish to forecast.
Making multiple one-step predictions based on the values predicted the model yields plausible results in the short term. As the forecasting period increases, the predictions become less accurate and therefore less fit for the purpose of forecasting.
To make multiple one-step predictions and update the input after each prediction, we have to work our way through the dataset one by one, as if we are going through a for-loop over the test set. Not surprisingly, this makes us lose all the computational advantages that matrix operations and mini-batch training provide us.
An alternative could be predicting sequences of values, instead of predicting the next value only, say using RNNs with multi-dimensional output with many-to-many or seq-to-seq structure. They are likely to be more difficult to train and less flexible to make predictions for different time periods. An encoder-decoder structure may prove useful for solving this, though I have not implemented it by myself.
You can find the code for my function that forecasts the next n_steps based on the last row of the dataset X (time-lag features) and y (target value). To iterate over each row in my dataset, I would set batch_size to 1 and n_features to the number of lagged observations.
def forecast(self, X, y, batch_size=1, n_features=1, n_steps=100):
predictions = []
X = torch.roll(X, shifts=1, dims=2)
X[..., -1, 0] = y.item(0)
with torch.no_grad():
self.model.eval()
for _ in range(n_steps):
X = X.view([batch_size, -1, n_features]).to(device)
yhat = self.model(X)
yhat = yhat.to(device).detach().numpy()
X = torch.roll(X, shifts=1, dims=2)
X[..., -1, 0] = yhat.item(0)
predictions.append(yhat)
return predictions
The following line shifts values in the second dimension of the tensor by one so that a tensor [[[x1, x2, x3, ... , xn ]]] becomes [[[xn, x1, x2, ... , x(n-1)]]].
X = torch.roll(X, shifts=1, dims=2)
And, the line below selects the first element from the last dimension of the 3d tensor and sets that item to the predicted value stored in the NumPy ndarray (yhat), [[xn+1]]. Then, the new input tensor becomes [[[x(n+1), x1, x2, ... , x(n-1)]]]
X[..., -1, 0] = yhat.item(0)
Recently, I've decided to put together the things I had learned and the things I would have liked to know earlier. If you'd like to have a look, you can find the links down below. I hope you'll find it useful. Feel free to comment or reach out to me if you agree or disagree with any of the remarks I made above.
Building RNN, LSTM, and GRU for time series using PyTorch
Predicting future values with RNN, LSTM, and GRU using PyTorch

Problems with Naive Bayes implemented on Amazon fine food reviews dataset

cv accuracy cv accuracy graph test accuracy
I am trying to implement Naive bayes on fine food reviews dataset of amazon. Can you review the code and tell why there is such a big difference between cross validation accuracy and test accuracy?
Conceptually is there anything wrong with the below code?
#BOW()
from sklearn.feature_extraction.text import CountVectorizer
bow = CountVectorizer(ngram_range = (2,3))
bow_vect = bow.fit(X_train["F_review"].values)
bow_sparse = bow_vect.transform(X_train["F_review"].values)
X_bow = bow_sparse
y_bow = y_train
roc = []
accuracy = []
f1 = []
k_value = []
for i in range(1,50,2):
BNB =BernoulliNB(alpha =i)
print("************* for alpha = ",i,"*************")
x = (cross_validate(BNB, X_bow,y_bow, scoring = ['accuracy','f1','roc_auc'], return_train_score = False, cv = 10))
print(x["test_roc_auc"].mean())
print("-----c------break------c-------break-------c-----------")
roc.append(x['test_roc_auc'].mean())#This is the ROC metric
accuracy.append(x['test_accuracy'].mean())#This is the accuracy metric
f1.append(x['test_f1'].mean())#This is the F1 score
k_value.append(i)
#BOW Test prediction
BNB =BernoulliNB(alpha= 1)
BNB.fit(X_bow, y_bow)
y_pred = BNB.predict(bow_vect.transform(X_test["F_review"]))
print("Accuracy Score: ",accuracy_score(y_test,y_pred))
print("ROC: ", roc_auc_score(y_test,y_pred))
print("Confusion Matrix: ", confusion_matrix(y_test,y_pred))
Use one of the metric to find the optimal alpha value. Then train BernoulliNB on test data.
And don't consider Accuracy for performance measurement as it is prone to imbalanced dataset.
Before doing anything, please change values given in loop as mentioned by Kalsi in the comment.
Have alpha values as said above in a list
find maximum AUC value and its index.
Use the above index to find optimal alpha.

Resources