Horrible forecasting KPI scores (RMSE, MAE and MAPE) using RNN with three flavors ("LSTM", "GRU" and "RNN") - time-series

My project is trying to forecast COVID-19 total confirmed cases for 2021. This is an overview of the confirmed case data, which I use to train my RNN model.
[plot of the total confirmed case data]
The confirmed-case counts don't show any repeating pattern, but there have been research studies using RNNs and LSTMs on the same data (in fact, I use the same data source as them). This is the research study I drew inspiration from:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8523546/
This is the result I got:
beginning the training of the LSTM RNN:
Epoch 99: 100%|██████████| 29/29 [00:00<00:00, 33.10it/s, loss=0.115, v_num=logs, train_loss=0.0933, val_loss=0.466]
training of the LSTM RNN completed: 105.23 sec
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 6.48it/s]
LSTM :
MAPE : 199.3029
RMSPE : 0.6659
RMSE : 0.6763
-R squared : 15366746993394475597824.0000
se : 0.0871
beginning the training of the GRU RNN:
Epoch 99: 100%|██████████| 29/29 [00:00<00:00, 37.86it/s, loss=0.11, v_num=logs, train_loss=0.0911, val_loss=0.474]
training of the GRU RNN completed: 90.99 sec
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 3.91it/s]
GRU :
MAPE : 209.6602
RMSPE : 0.6771
RMSE : 0.6877
-R squared : 467975998695823638528.0000
se : 0.0871
beginning the training of the Vanilla RNN:
Epoch 99: 100%|██████████| 29/29 [00:00<00:00, 43.54it/s, loss=0.114, v_num=logs, train_loss=0.109, val_loss=0.461]
training of the Vanilla RNN completed: 79.32 sec
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 6.08it/s]
Vanilla :
MAPE : 200.5710
RMSPE : 0.6673
RMSE : 0.6778
-R squared : 115942252717688101559392358367232.0000
se : 0.0871
Also, here are my prediction plots. All three plots look exactly the same, just with different MAPE scores:
[prediction plots for the three models]
For packages, I use darts (installed via u8darts[all]).
For methodology, I set the parameters for my model as shown below. I followed this article by Heiko Oinen (the detailed code is further down the page):
https://medium.com/towards-data-science/temporal-loops-intro-to-recurrent-neural-networks-for-time-series-forecasting-in-python-b0398963dc1f
#Set up the models, run the models, plot and evaluate
EPOCH = 100

def run_RNN(flavor, ts, train, val):
    # set the model up
    model_RNN = RNNModel(
        model=flavor,
        model_name=flavor + str(" RNN"),
        input_chunk_length=12,
        training_length=20,
        hidden_dim=20,
        batch_size=16,
        n_epochs=EPOCH,
        dropout=0,
        optimizer_kwargs={'lr': 1e-3},
        log_tensorboard=True,
        random_state=42,
        force_reset=True
    )
    if flavor == "RNN": flavor = "Vanilla"
    # fit the model
    fit_it(model_RNN, train, val, flavor)
    # compute N predictions
    pred = model_RNN.predict(n=FC_N, future_covariates=covariates)
    # plot predictions vs actual
    plot_fitted(pred, ts, flavor)
    # print accuracy metrics
    res_acc = accuracy_metrics(pred, ts)
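For context, here is a sketch of how a darts RNNModel is typically fit and evaluated with the library's public API; my fit_it and accuracy_metrics helpers follow the article, so the scaling step and the name scaler below are illustrative, not necessarily my exact code:
# Illustrative sketch only; ts, train, val, covariates and FC_N are the same
# objects passed into run_RNN above.
from darts.dataprocessing.transformers import Scaler
from darts.metrics import mape, rmse

scaler = Scaler()                                   # min-max scaling by default
train_scaled = scaler.fit_transform(train)
val_scaled = scaler.transform(val)

model_RNN.fit(train_scaled,
              future_covariates=covariates,
              val_series=val_scaled,
              val_future_covariates=covariates)

pred_scaled = model_RNN.predict(n=FC_N, future_covariates=covariates)
pred = scaler.inverse_transform(pred_scaled)        # back to the original scale

print("MAPE :", mape(ts, pred))
print("RMSE :", rmse(ts, pred))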
I have even tried raising the epochs to 300, but the training loss and the loss shown during training didn't decrease any further.
I don't have much experience asking questions here; I'll try to articulate the problem and provide more detail if you have questions. Thank you so much for your help!

Related

LSTM regression model flat prediction

This is a time-series regression problem with battery capacity as the output and a single input variable, voltage; the relation is non-linear.
The LSTM model's prediction on the test data always returns a semi-flat line, probably the mean of the output variable in the training data.
This is an example of predicted vs test set output values, with the following model parameters:
(Window size: 10, batch size: 256, LSTM nodes: 16)
[figure: prediction of the test data]
The data had been normalized and down-sampled to 1 s and later to 3 s; the original sampling rate was 10 Hz.
I suspected the voltage fluctuation was the problem, but sampling at 3 seconds didn't result in a noticeable improvement.
Here are the data after being down-sampled to 3 seconds:
[figure: normalized training data; Y: SOC, X: voltage]
[figure: normalized test data; Y: SOC, X: voltage]
I've tried many changes in the model and learning parameters, as listed below, but the behavior stays the same.
That's why I think it's not a parameter-tuning issue; rather, the model is not learning at all.
LSTM layer: always single, followed by Dense with no options.
LSTM nodes: [4, 8, 16, 32]
Epochs: [16, 32, 64, 128]
Window size (input vector depth): [8, 32, 64, 128]
Batch size: [32, 64, 128, 256]
Learning rate: [0.0005, 0.0001, 0.001]
Optimizer: Adam, options: [none, clipnorm=1, clipvalue=0.5]
Model specification Code:
backend.clear_session()
model1 = Sequential()
model1.add(LSTM(16,input_shape=(win_sz, features_cnt) )) # stateless
model1.add(layers.Dense(1))
model1.summary()
Model training and validation Code:
n_epochs = 12
iterations = tr_samples_sh_cnt // batch_sz_tr
loss = tf.keras.losses.MeanAbsoluteError()
optimizer = tf.optimizers.Adam(learning_rate = 0.001)
loss_history = []

#tf.function
def train_model_on_batch():
    start = epoch * batch_sz_tr
    X_batch = df_feat_tr_3D[start:start+batch_sz_tr, :, :]
    y_batch = df_SOC_tr_2D[start:start+batch_sz_tr, :]
    with tf.GradientTape() as tape:
        current_loss = loss(model1(X_batch), y_batch)
    gradients = tape.gradient(current_loss, model1.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model1.trainable_variables))
    return current_loss

for epoch in range(n_epochs+1):
    for iteration in range(iterations):
        current_loss = train_model_on_batch()
    if epoch % 1 == 0:
        loss_history.append(current_loss.numpy())
        print("{}. \t\tLoss: {}".format(
            epoch, loss_history[-1]))

print('\nTraining complete.')
P_test = model1.predict(df_feat_test_3D)
After adding a sigmoid activation function in both the LSTM and Dense layers, a very small change was observed, but still far from a reasonable fit.
[figure: prediction of the test data after adding the activation function]
The problem was the activation function, as @Dr. Snoopy recommended.
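For reference, a minimal sketch of the kind of change that helped, assuming the SOC target is normalized to [0, 1] (an illustration, not the exact final model):
# Hypothetical sketch: bound the output with a sigmoid for a [0, 1] target.
from tensorflow.keras import Sequential, layers

model1 = Sequential()
model1.add(layers.LSTM(16, input_shape=(win_sz, features_cnt)))   # stateless LSTM, as before
model1.add(layers.Dense(1, activation='sigmoid'))                 # bounded output for normalized SOC
model1.compile(optimizer='adam', loss='mae')
model1.summary()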

Conv nets accuracy not changing while loss decreases

I'm training several CNNs to do image classification in TensorFlow. The training losses decrease normally. However, the test accuracy never changes throughout the whole training procedure, and it is very low (0.014), whereas the accuracy for random guessing would be about 0.003 (there are around 300 classes). One thing I've noticed is that only the models I applied batch norm to show this weird behavior. What could possibly be wrong here? The training set has 80,000 samples, in case you suspect this is caused by overfitting. Below is part of the code for evaluation:
Accuracy function:
correct_prediction = tf.equal(tf.argmax(Model(test_image), 1), tf.argmax(test_image_label, 0))
accuracy = tf.cast(correct_prediction, tf.float32)
The test_image is a batch with only one sample in it, while the test_image_label is a scalar.
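For comparison, an accuracy op evaluated over a full batch usually reduces over the batch axis; a minimal TF1-style sketch, where logits and labels are stand-ins for a [batch, n_classes] model output and integer class ids (not my actual tensor names):
# Sketch only: logits has shape [batch, n_classes], labels has shape [batch].
correct_prediction = tf.equal(tf.argmax(logits, axis=1), tf.cast(labels, tf.int64))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))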
Session:
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord, start=True)
    print('variables initialized')

    step = 0
    for epoch in range(epochs):
        sess.run(enqueue_train)
        print('epoch: %d' % epoch)
        if epoch % 5 == 0:
            save_path = saver.save(sess, savedir + "/Model")
        for batch in range(num_batch):
            if step % 400 == 0:
                summary_str = cost_summary.eval(feed_dict={phase: True})
                file_writer.add_summary(summary_str, step)
            else:
                sess.run(train_step, feed_dict={phase: True})
            step += 1
        sess.run(train_close)

        sess.run(enqueue_test)
        accuracy_vector = []
        for num in range(len(testnames)):
            accuracy_vector.append(sess.run(accuracy, feed_dict={phase: False}))
        mean_accuracy = sess.run(tf.divide(tf.add_n(accuracy_vector), len(testnames)))
        print("test accuracy %g" % mean_accuracy)
        sess.run(test_close)

    save_path = saver.save(sess, savedir + "/Model_final")
    coord.request_stop()
    coord.join(threads)
    file_writer.close()
The phase placeholder above indicates whether it is training or testing, for the batch norm layers.
Note that I also tried to calculate the accuracy on the training set, which reaches the minimal loss, and it gives the same poor accuracy. Please help me, I really appreciate it!

Dropout Tensorflow: Weight scaling at test time

Do I need to scale the weights at test time in TensorFlow, i.e. weights * keep_prob at testing, or does TensorFlow do it itself? If so, how?
During training my keep_prob is 0.5, and at test time it's 1.
Although the network is regularized, the accuracy is not as good as before regularization.
P.S. I'm classifying CIFAR-10.
n_nodes_h1=1000
n_nodes_h2=1000
n_nodes_h3=400
n_nodes_h4=100
classes=10
x=tf.placeholder('float',[None,3073])
y=tf.placeholder('float')
keep_prob = tf.placeholder(tf.float32)
batch_size=100
def neural_net(data):
    hidden_layer1 = {'weight': tf.Variable(tf.random_normal([3073, n_nodes_h1])),
                     'biases': tf.Variable(tf.random_normal([n_nodes_h1]))}
    hidden_layer2 = {'weight': tf.Variable(tf.random_normal([n_nodes_h1, n_nodes_h2])),
                     'biases': tf.Variable(tf.random_normal([n_nodes_h2]))}
    out_layer = {'weight': tf.Variable(tf.random_normal([n_nodes_h2, classes])),
                 'biases': tf.Variable(tf.random_normal([classes]))}

    l1 = tf.add(tf.matmul(data, hidden_layer1['weight']), hidden_layer1['biases'])
    l1 = tf.nn.relu(l1)

    #************DROPOUT*******************
    l1 = tf.nn.dropout(l1, keep_prob)

    l2 = tf.add(tf.matmul(l1, hidden_layer2['weight']), hidden_layer2['biases'])
    l2 = tf.nn.relu(l2)

    out = tf.matmul(l2, out_layer['weight']) + out_layer['biases']
    return out
That was the network.
iterations = 20
Train_loss = []
Test_loss = []

def train_nn(x):
    prediction = neural_net(x)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer().minimize(cost)
    epochs = iterations
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(epochs):
            e_loss = 0
            i = 0
            for _ in range(int(X_train.shape[0] / batch_size)):
                e_x = X_train[i:i+batch_size]
                e_y = y_hot_train[i:i+batch_size]
                i += batch_size
                _, c = sess.run([optimizer, cost], feed_dict={x: e_x, y: e_y, keep_prob: 0.5})
                e_loss += c
            print "Epoch: ", epoch, " Train loss= ", e_loss
            Train_loss.append(e_loss)
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print "Accuracy on test: ", accuracy.eval({x: X_test, y: y_hot_test, keep_prob: 1.})
        print "Accuracy on train:", accuracy.eval({x: X_train[0:2600], y: y_hot_train[0:2600], keep_prob: 1.})

train_nn(x)
Do I need something like this:
hidden_layer1['weight']*=keep_prob
#testing time
TensorFlow does it itself:
"With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0. The scaling is so that the expected sum is unchanged."
(from this page)
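A quick way to see this scaling (a small TF1-style check, not from the docs): with keep_prob = 0.5 the surviving activations come out multiplied by 2, and with keep_prob = 1.0 the op is an identity, so no manual weights * keep_prob scaling is needed at test time.
import tensorflow as tf

x = tf.ones([1, 10])
train_out = tf.nn.dropout(x, keep_prob=0.5)   # kept elements become 2.0, dropped elements become 0.0
test_out = tf.nn.dropout(x, keep_prob=1.0)    # identity: every element stays 1.0
with tf.Session() as sess:
    print(sess.run(train_out))
    print(sess.run(test_out))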

How to use a fixed validation set (not K-fold cross validation) in Scikit-learn for a decision tree classifier/random forest classifier?

I am new to machine learning and data science. Sorry if this is a very stupid question.
I see there is an inbuilt function for cross-validation but not for a fixed validation set. I have a dataset with 50,000 samples labeled with years from 1990 to 2010. I need to train different classifiers on 1990-2008 samples, then validate on 2009 samples, and test on 2010 samples.
EDIT:
After @Quan Tran's answer, I tried this. Is this how it should be?
# Fit a decision tree
estimator1 = DecisionTreeClassifier( max_depth = 9, max_leaf_nodes=9)
estimator1.fit(X_train, y_train)
print estimator1
# validate using validation set
acc = np.zeros((20,20)) # store accuracy
for i in range(20):
    for j in range(20):
        estimator1 = DecisionTreeClassifier(max_depth=i+1, max_leaf_nodes=j+2)
        estimator1.fit(X_valid, y_valid)
        y_pred = estimator1.predict(X_valid)
        acc[i, j] = accuracy_score(y_valid, y_pred)
best_mod = np.where(acc == acc.max())
print best_mod
print acc[best_mod]
# Predict target values
estimator1 = DecisionTreeClassifier(max_depth = int(best_mod[0]) + 1, max_leaf_nodes= int(best_mod[1]) + 2)
estimator1.fit(X_valid, y_valid)
y_pred = estimator1.predict(X_test)
confusion = metrics.confusion_matrix(y_test, y_pred)
TP = confusion[1, 1]
TN = confusion[0, 0]
FP = confusion[0, 1]
FN = confusion[1, 0]
# Classification Accuracy
print "======= ACCURACY ========"
print((TP + TN) / float(TP + TN + FP + FN))
print accuracy_score(y_valid, y_pred)
# store the predicted probabilities for class
y_pred_prob = estimator1.predict_proba(X_test)[:, 1]
# plot a ROC curve for y_test and y_pred_prob
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred_prob)
plt.plot(fpr, tpr)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.title('ROC curve for DecisionTreeClassifier')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.grid(True)
print("======= AUC ========")
print(metrics.roc_auc_score(y_test, y_pred_prob))
I get this output, which is not the best accuracy.
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=9,
max_features=None, max_leaf_nodes=9, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
(array([5]), array([19]))
[ 0.8489011]
======= ACCURACY ========
0.574175824176
0.538461538462
======= AUC ========
0.547632099893
In this case, there are three separate sets. The train set, the test set and the validation set.
The train set is used to fit the parameters of the classifier. For example:
clf = DecisionTreeClassifier(max_depth=2)
clf.fit(trainfeatures, labels)
The validation set is used to tune the hyperparameters of the classifier or to find the cutoff point for the training procedure. For example, in the case of a decision tree, max_depth is a hyperparameter. You will need to find a good set of hyperparameters by experimenting with different values (tuning) and comparing the performance measures (accuracy, precision, ...) on the validation set.
The test set is used to estimate the error rate on unseen data. After having the performance measures on the test set, the model must not be trained/tuned any further.
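A minimal sketch of that workflow with a fixed, year-based split (variable names such as X, y and years are assumptions about your data):
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Assumed: X holds the features, y the labels, and years the year of each sample.
X_train, y_train = X[years <= 2008], y[years <= 2008]
X_valid, y_valid = X[years == 2009], y[years == 2009]
X_test,  y_test  = X[years == 2010], y[years == 2010]

best_acc, best_params = -1.0, None
for depth in range(1, 21):
    for leaves in range(2, 22):
        clf = DecisionTreeClassifier(max_depth=depth, max_leaf_nodes=leaves)
        clf.fit(X_train, y_train)                              # always fit on the training years
        acc = accuracy_score(y_valid, clf.predict(X_valid))    # tune on 2009 only
        if acc > best_acc:
            best_acc, best_params = acc, (depth, leaves)

final = DecisionTreeClassifier(max_depth=best_params[0], max_leaf_nodes=best_params[1])
final.fit(X_train, y_train)                                    # never fit on the validation or test years
print("test accuracy:", accuracy_score(y_test, final.predict(X_test)))
If you prefer scikit-learn's built-in search utilities, PredefinedSplit lets you pass such a fixed split to GridSearchCV instead of K-fold cross-validation.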

Keras RNN loss does not decrease over epoch

I built an RNN using Keras. The RNN is used to solve a regression problem:
def RNN_keras(feat_num, timestep_num=100):
    model = Sequential()
    model.add(BatchNormalization(input_shape=(timestep_num, feat_num)))
    model.add(LSTM(input_shape=(timestep_num, feat_num), output_dim=512, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(LSTM(output_dim=128, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(TimeDistributed(Dense(output_dim=1, activation='relu')))  # sequence labeling

    rmsprop = RMSprop(lr=0.00001, rho=0.9, epsilon=1e-08)
    model.compile(loss='mean_squared_error',
                  optimizer=rmsprop,
                  metrics=['mean_squared_error'])
    return model
The whole process looks fine, but the loss stays exactly the same over epochs.
61267 in the training set
6808 in the test set
Building training input vectors ...
888 unique feature names
The length of each vector will be 888
Using TensorFlow backend.
Build model...
# Each batch has 1280 examples
# The training data are shuffled at the beginning of each epoch.
****** Iterating over each batch of the training data ******
Epoch 1/3 : Batch 1/48 | loss = 11011073.000000 | root_mean_squared_error = 3318.232910
Epoch 1/3 : Batch 2/48 | loss = 620.271667 | root_mean_squared_error = 24.904161
Epoch 1/3 : Batch 3/48 | loss = 620.068665 | root_mean_squared_error = 24.900017
......
Epoch 1/3 : Batch 47/48 | loss = 618.046448 | root_mean_squared_error = 24.859678
Epoch 1/3 : Batch 48/48 | loss = 652.977051 | root_mean_squared_error = 25.552946
****** Epoch 1: RMSD(training) = 24.897174
Epoch 2/3 : Batch 1/48 | loss = 607.372620 | root_mean_squared_error = 24.644049
Epoch 2/3 : Batch 2/48 | loss = 599.667786 | root_mean_squared_error = 24.487448
Epoch 2/3 : Batch 3/48 | loss = 621.368103 | root_mean_squared_error = 24.926300
......
Epoch 2/3 : Batch 47/48 | loss = 620.133667 | root_mean_squared_error = 24.901398
Epoch 2/3 : Batch 48/48 | loss = 639.971924 | root_mean_squared_error = 25.297264
****** Epoch 2: RMSD(training) = 24.897174
Epoch 3/3 : Batch 1/48 | loss = 651.519836 | root_mean_squared_error = 25.523636
Epoch 3/3 : Batch 2/48 | loss = 673.582581 | root_mean_squared_error = 25.952084
Epoch 3/3 : Batch 3/48 | loss = 613.930054 | root_mean_squared_error = 24.776562
......
Epoch 3/3 : Batch 47/48 | loss = 624.460327 | root_mean_squared_error = 24.988203
Epoch 3/3 : Batch 48/48 | loss = 629.544250 | root_mean_squared_error = 25.090448
****** Epoch 3: RMSD(training) = 24.897174
I do NOT think this is normal. Am I missing something?
UPDATE:
I found that all the predictions are always zero after all the epochs. This is why all the RMSDs are the same: the predictions are all identical, i.e. 0. I checked the training y; it contains only a few zeros, so it is not due to data imbalance.
So now I am wondering whether it is because of the layers and activations I am using.
Your RNN function seems to be OK.
The speed at which the loss decreases depends on the optimizer and the learning rate.
You are using a decay rate of 0.9 anyway, so try a bigger learning rate; it is going to decay at the 0.9 rate regardless.
Try out other optimizers with different learning rates (see the sketch below).
Other optimizers available with Keras: https://keras.io/optimizers/
Often, some optimizers work well on some data sets while others may fail.
Have you tried changing the activation function from relu to softmax?
Relu activation has a tendency to diverge. However, initializing the weights with an eigenmatrix may result in better convergence.
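A sketch of the optimizer suggestion above, swapping RMSprop for Adam with a larger learning rate (the exact value worth trying depends on your data):
from keras.optimizers import Adam

# Try a different optimizer / larger learning rate and compare the loss curves.
model.compile(loss='mean_squared_error',
              optimizer=Adam(lr=1e-3),
              metrics=['mean_squared_error'])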
Since you are using RNNs for a regression problem (not for classification), you should use a 'linear' activation at the last layer.
In your code,
model.add(TimeDistributed(Dense(output_dim=1, activation='relu'))) # sequence labeling
change to activation='linear' instead of 'relu'.
If that doesn't work, remove activation='relu' from the second layer as well.
Also, the learning rate for rmsprop usually ranges from 0.1 to 0.0001.
