How to specify the features to forecast in a multivariate time series using Encoder-Decoder LSTM?

I am working on multi-step forecasting of a multivariate time series using an Encoder-Decoder LSTM. Currently, every input feature is forecast.
I want to specify which features are forecast, without forecasting the others.
The reason is that I want to modify some of the features (for example, inject a different set of data) and then check whether the model picks up the modified series when forecasting the other (specified) features.
I am currently using the Keras framework to implement the Enc-Dec LSTM. Below is the code snippet:
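from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed
n_features = 8
n_steps_in, n_steps_out = 50, 100
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps_in, n_features)))
# RepeatVector and the output layer are assumed; they appear to be missing from the post
model.add(RepeatVector(n_steps_out))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer='adam', loss='mse')
# X: (samples, n_steps_in, n_features), y: (samples, n_steps_out, n_features)
train_history = model.fit(X, y, epochs=50, verbose=2, shuffle=False)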
Can someone help, please?
I have tried introducing another layer before the last layer and changing the value of n_features in the last layer, but this only sets the number of features to forecast. The problem is that I need to specify which features I am forecasting and which ones I am dropping from the prediction. The extra layer I added was:
model.add(TimeDistributed(Dense(50, activation='relu')))
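One possible way to control which features are forecast is to select the target columns when the training windows are built, so that X keeps all features and y keeps only the chosen ones. The sketch below is a minimal illustration with NumPy only; the helper `split_sequences` and the `target_idx` list are hypothetical and not taken from the post.

```python
import numpy as np

def split_sequences(data, n_steps_in, n_steps_out, target_idx):
    """Build (X, y) windows: X keeps every feature, y keeps only target_idx columns."""
    X, y = [], []
    last_start = len(data) - n_steps_in - n_steps_out
    for start in range(last_start + 1):
        mid = start + n_steps_in
        end = mid + n_steps_out
        X.append(data[start:mid, :])            # all 8 input features
        y.append(data[mid:end][:, target_idx])  # only the features to forecast
    return np.array(X), np.array(y)

# Synthetic stand-in for the real series: 200 time steps, 8 features
data = np.arange(200 * 8, dtype=float).reshape(200, 8)
target_idx = [0, 3]  # hypothetical choice: forecast features 0 and 3 only

X, y = split_sequences(data, n_steps_in=50, n_steps_out=100, target_idx=target_idx)
print(X.shape, y.shape)
```

With targets built this way, the final layer would need len(target_idx) units instead of n_features. Modified data can then be injected into the non-target columns of X at prediction time without those columns being forecast.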


Is there a way to increase the variance of model's prediction?

I generated about 12,000 random data points with numpy (values between 30 and 60) to create an artificial time series covering more than a year.
Now I am trying to fit an LSTM model to these data points and forecast from it.
This is the LSTM model I applied. The data is a single series, so n_features = 1; n_steps_in and n_steps_out are the parameters of the sequence-generation function, and I set both to 5. For the activation functions I tried relu for both layers, tanh for both layers, and tanh for the first with relu for the second (as shown here):
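import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed
X, y = split_sequences(data, n_steps_in, n_steps_out)
n_features = X.shape[2]
model = Sequential()
model.add(LSTM(200, activation='tanh', input_shape=(n_steps_in, n_features)))
# RepeatVector and the output layer are assumed; the post is truncated at this point
model.add(RepeatVector(n_steps_out))
model.add(LSTM(200, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
opt = keras.optimizers.Adam(learning_rate=0.05)
model.compile(optimizer=opt, loss='mse')
model.fit(X, y, epochs=n, batch_size=10, verbose=1,
          workers=4, use_multiprocessing=True, initial_epoch=0)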
I also tried smoothing the data points, since they are randomly distributed (within the predefined bounds), and then fitted the model to the smoothed data, but I still get similar results.
For example, this image shows both the smoothed training data and the model's forecast:
plt.plot(Training_data, 'g')
Every time, the models predict a straight line.
That is understandable: the data is a set of random numbers, so the model tends toward a mean value between the upper and lower limits of the data. Still, is there any way to produce a more realistic-looking forecast?
P.S. 1: I have also tried other models such as Prophet, SARIMA and ARIMA.
But I think I need a way to increase the variance of the predictions, which I have not been able to find.
P.S. 2: Sorry for the long question. I am new to deep learning, so I tried to explain in detail.
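The "straight line" behaviour can be checked without any neural network. The sketch below assumes the data really is independent uniform noise between 30 and 60, as described above, and compares two naive forecasts under MSE: always predicting the mean, and repeating the last observed value. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.uniform(30, 60, size=12000)  # independent random values, as in the question

actual = series[1:]
# Forecast 1: always predict the series mean (a flat line)
mean_forecast = np.full_like(actual, series.mean())
# Forecast 2: repeat the previous value (looks "realistic", varies as much as the data)
persistence_forecast = series[:-1]

mse_mean = np.mean((actual - mean_forecast) ** 2)
mse_persistence = np.mean((actual - persistence_forecast) ** 2)
print(f"MSE of constant-mean forecast:     {mse_mean:.1f}")
print(f"MSE of repeat-last-value forecast: {mse_persistence:.1f}")
```

For independent noise the flat forecast has roughly half the squared error of the wiggly one, so a model trained with an MSE loss is pushed toward the flat line. A forecast with more variance would look more like the data but would score worse on that loss.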

How to forecast one output with multiple features by LSTM model?

I am experimenting with stock time-series data and trying to predict the trend from multivariate features. Below is the sample dataset I have, which includes different technical indicators (moving average, Parabolic SAR, etc.) for each stock. Most online sources predict one stock from one feature at a time, such as the "Close" price. How can I use all the stocks' features to predict a single output, say the S&P's close price? I know it may not improve the prediction accuracy, but I am not sure what I am actually training right now and hope to gain more insight into the LSTM model.
Basically, I feed the whole dataset in and do the scaling and training. How can the prediction be restricted to one column?
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_feature_data = scaler.fit_transform(feature_data)
# training_set / testing_set: the train/test split of the scaled data (not shown in the post)
X_train, y_train = training_set[:, :-1], training_set[:, -1]
X_test, y_test = testing_set[:, :-1], testing_set[:, -1]
# reshape to (samples, timesteps=1, features), the 3D input an LSTM expects
X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))
model_lstm = Sequential()
model_lstm.add(LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model_lstm.add(LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model_lstm.add(LSTM(units=50, return_sequences=True))
model_lstm.add(LSTM(units=50, return_sequences=True))
model_lstm.add(Dense(units=1, activation='relu'))
This is a regression problem: you are predicting a single variable. So the last layers may look like this:
model_lstm.add(Dense(1))
model_lstm.compile(loss='mse', optimizer='rmsprop')
Also, you probably don't need return_sequences=True for every input token. With return_sequences=True, the output is a 3D tensor of shape (batch_size, num_tokens, num_features), which is not automatically flattened to the 2D (batch_size, num_features) input that the Dense(1) layer is expected to receive. Just use the output of the last LSTM step by setting return_sequences=False. Its value depends on the previous tokens, so you won't lose much information from them.
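The shape difference can be illustrated without building a model. The sketch below uses plain NumPy arrays as stand-ins for the LSTM outputs; the sizes 32, 10 and 50 are arbitrary values chosen for illustration.

```python
import numpy as np

batch_size, num_tokens, num_features = 32, 10, 50

# Stand-in for the output of LSTM(50, return_sequences=True):
# one 50-dimensional vector per time step
seq_output = np.zeros((batch_size, num_tokens, num_features))

# Stand-in for the output of LSTM(50, return_sequences=False):
# only the vector of the last time step is kept
last_output = seq_output[:, -1, :]

print(seq_output.shape)
print(last_output.shape)
```

The second shape, (batch_size, num_features), is the one that maps to a single prediction per sample once a one-unit output layer is applied.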
The whole model may look like this:
model_lstm = Sequential()
model_lstm.add(LSTM(50, return_sequences=False, input_shape=(X_train.shape[1], X_train.shape[2])))
model_lstm.add(Dense(1))
model_lstm.compile(loss='mse', optimizer='rmsprop')
If you want more layers, it becomes:
model_lstm = Sequential()
model_lstm.add(LSTM(64, return_sequences=True, dropout=0.5, input_shape=(X_train.shape[1], X_train.shape[-1])))
model_lstm.add(LSTM(32, return_sequences=False, dropout=0.5))
model_lstm.add(Dense(1))
model_lstm.compile(loss='mse', optimizer='rmsprop')
Actually, I am wondering how all the features, let's say 10 technical indicator features, can help predict the one price column.
I don't know if I understand you correctly, but that is what ML does: it tries to find correlations between the features and how they can be used to predict something. So maybe "DE30" has (or seems to have) an influence on the price and is therefore helpful. Was that your question?
From different online sources, most of them are predicting one stock with one feature like "Close" price a time
I guess that is for simplification; that is why they used only one feature.
Let me know if that was what you asked for.

How to improve accuracy with keras multi class classification?

I am trying to do multi-class classification with tf.keras. I have 20 labels in total and 63,952 data points, and I have tried the following code:
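from sklearn.preprocessing import LabelEncoder
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils

features = features.astype(float)
labels = df_test["label"].values
# encode the labels as integers, then one-hot encode them
encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(labels)
dummy_y = np_utils.to_categorical(encoded_Y)

def baseline_model():
    model = Sequential()
    model.add(Dense(50, input_dim=3, activation='relu'))
    model.add(Dense(40, activation='softmax'))
    model.add(Dense(30, activation='softmax'))
    model.add(Dense(20, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = baseline_model()  # assumed; the call that builds the model is not shown in the post
history = model.fit(features, dummy_y)  # the remaining fit arguments are cut off in the post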
I get very poor accuracy with this. How can I improve it?
softmax activations in the intermediate layers do not make any sense at all. Change all of them to relu and keep softmax only in the last layer.
Having done that, and should you still be getting unsatisfactory accuracy, experiment with different architectures (different numbers of layers and nodes) with a small number of epochs (say ~ 50), in order to get a feeling of how your model behaves, before going for a full fit with your 5,000 epochs.
You did not give us vital information, but here are some guidelines:
1. Reduce the number of Dense layers - you have a complicated model with a small amount of data (63k is somewhat small). You might be overfitting your training data.
2. Did you check that the test set has the same distribution as your training set?
3. Avoid using softmax in the middle Dense layers - softmax should be used only in the final layer; use sigmoid or relu instead.
4. Plot the loss as a function of epoch and check whether it decreases - you can then tell whether your learning rate is too high or too low.
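To see why softmax is a poor choice for hidden layers, here is a small NumPy sketch comparing what softmax and relu do to the same pre-activations. The functions and the two example rows are written for illustration and are not taken from the question.

```python
import numpy as np

def softmax(z):
    # subtract the row maximum for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relu(z):
    return np.maximum(z, 0.0)

# Two hypothetical rows of hidden-layer pre-activations with very different scales
pre_activations = np.array([[1.0, 2.0, 3.0, 4.0],
                            [10.0, 20.0, 30.0, 40.0]])

hidden_softmax = softmax(pre_activations)
hidden_relu = relu(pre_activations)

print(hidden_softmax.sum(axis=1))  # every row is forced to sum to 1
print(hidden_relu.sum(axis=1))     # the scale of each row is preserved
```

Because each softmax row must sum to 1, a hidden softmax layer passes on only a probability-like vector and discards the overall magnitude of its inputs. Stacking several such layers squeezes the signal repeatedly, which is consistent with the advice to keep softmax for the output layer only.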

Keras model accuracy not improving

I'm trying to train a neural network to predict the ratings of players in FIFA 18 by EA Sports (ratings are between 64 and 99). I'm using their players database, and I've processed the data into training_x, testing_x, training_y, testing_y. Each training sample is a numpy array containing 7 values: the first 6 are the player's stats (shooting, passing, dribbling, etc.) and the last is the player's position (which I mapped to 1-8, depending on the position). Each target value (training_y, testing_y) is a single integer between 64 and 99, representing that player's rating.
I've tried many different hyperparameters, including changing the activation functions to tanh and relu, and I've tried adding a batch normalization layer after the first dense layer (I thought that it might be useful since one of my features is very small and the other features are between 50-99), I've played around with the SGD optimizer (changed the learning rate, momentum, even tried changing the optimizer to Adam), tried different loss functions, added/removed dropout layers, and tried different regularizers for the weights of the model.
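from keras import optimizers, regularizers
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(64, input_shape=(7,)))  # remaining arguments are cut off in the post
# batch normalization?
model.add(Dense(64, kernel_regularizer=regularizers.l2(0.01)))  # remaining arguments cut off
model.add(Dense(32, kernel_regularizer=regularizers.l2(0.01)))  # remaining arguments cut off
model.add(Dense(1, activation='linear'))
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_absolute_error', metrics=['accuracy'], optimizer=sgd)
model.fit(training_x, training_y, epochs=50, batch_size=128, shuffle=True)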
When I train the model, the loss is always nan and the accuracy is always 0, even though I've tried adjusting a lot of different parameters. However, if I remove the last feature from my data, the position of the players, and update the input shape of the first dense layer, the model actually "trains" and ends up with around 6% accuracy no matter what parameters I change. In that case, I've found that the model only predicts 79 to be the player's rating. What am I doing inherently wrong?
You can try the following steps:
Use the mean squared error loss function.
Use Adam, which will help you converge faster, with a low learning rate such as 0.0001 or 0.001. Otherwise, try the RMSprop optimizer.
Use the default regularizers, that is, none.
Since this is a regression task, use an activation function such as ReLU in all layers except the output layer (including the input layer). Use a linear activation in the output layer.
As mentioned in the comments by @pooyan, normalize the features. See here. You can also try standardizing the features. Use whichever suits your data best.
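The standardization step can be sketched as follows. The array below is synthetic data shaped like the question's description (six stats in 50-99 plus a position code in 1-8), not the real FIFA data, and the scaling is done by hand with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for training_x: six stats in 50-99 plus a position code in 1-8
stats = rng.integers(50, 100, size=(1000, 6)).astype(float)
position = rng.integers(1, 9, size=(1000, 1)).astype(float)
training_x = np.hstack([stats, position])

# A NaN anywhere in the inputs is enough to make the loss nan, so check first
print("contains NaN:", np.isnan(training_x).any())

# Standardize each column to zero mean and unit variance using training statistics
mean = training_x.mean(axis=0)
std = training_x.std(axis=0)
training_x_scaled = (training_x - mean) / std
# Test data would reuse the same statistics: (testing_x - mean) / std

print(training_x_scaled.mean(axis=0).round(3))
print(training_x_scaled.std(axis=0).round(3))
```

After scaling, the position column is on the same scale as the stats, which is the situation the batch normalization layer in the question was meant to address. Note also that the 'accuracy' metric counts exact matches and is not informative for a regression target; the loss itself (MSE or MAE) is the number to watch.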

Non-linear multivariate time-series response prediction using RNN

I am trying to predict the hygrothermal response of a wall, given the interior and exterior climate. Based on literature research, I believe this should be possible with RNN but I have not been able to get good accuracy.
The dataset has 12 input features (time-series of exterior and interior climate data) and 10 output features (time-series of hygrothermal response), both containing hourly values for 10 years. This data was created with hygrothermal simulation software, there is no missing data.
Dataset features:
Dataset targets:
Unlike most time-series prediction problems, I want to predict the response for the full length of the input features time-series at each time-step, rather than the subsequent values of a time-series (eg financial time-series prediction). I have not been able to find similar prediction problems (in similar or other fields), so if you know of one, references are very welcome.
I think this should be possible with RNN, so I am currently using LSTM from Keras. Before training, I preprocess my data the following way:
Discard first year of data, as the first time steps of the hygrothermal response of the wall is influenced by the initial temperature and relative humidity.
Split into training and testing set. Training set contains the first 8 years of data, the test set contains the remaining 2 years.
Normalise training set (zero mean, unit variance) using StandardScaler from Sklearn. Normalise test set analogously using mean and variance from training set.
This results in: X_train.shape = (1, 61320, 12), y_train.shape = (1, 61320, 10), X_test.shape = (1, 17520, 12), y_test.shape = (1, 17520, 10)
As these are long time-series, I use stateful LSTM and cut the time-series as explained here, using the stateful_cut() function. I only have 1 sample, so batch_size is 1. For T_after_cut I have tried 24 and 120 (24*5); 24 appears to give better results. This results in X_train.shape = (2555, 24, 12), y_train.shape = (2555, 24, 10), X_test.shape = (730, 24, 12), y_test.shape = (730, 24, 10).
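The shape change from the cut can be reproduced with a plain reshape when there is only one sample. This is a sketch under that assumption; `cut_series` is a hypothetical helper and not the linked stateful_cut() function, whose implementation is not shown here.

```python
import numpy as np

def cut_series(a, T_after_cut):
    """Cut one long series of shape (1, T, features) into consecutive windows."""
    n_samples, T, n_feat = a.shape
    if n_samples != 1 or T % T_after_cut != 0:
        raise ValueError("expects one sample whose length is a multiple of T_after_cut")
    # window k holds time steps k*T_after_cut ... (k+1)*T_after_cut - 1
    return a.reshape(T // T_after_cut, T_after_cut, n_feat)

X_train = np.zeros((1, 61320, 12))
y_train = np.zeros((1, 61320, 10))

X_cut = cut_series(X_train, 24)
y_cut = cut_series(y_train, 24)
print(X_cut.shape, y_cut.shape)
```

With batch_size = 1 and shuffle=False, window k+1 is fed right after window k, which is what lets a stateful LSTM carry its state across the cuts.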
Next, I build and train the LSTM model as follows:
model = Sequential()
# (the layer definitions are missing from the post)
model.compile(loss='mean_squared_error', optimizer=Adam())
model.fit(X_train, y_train, epochs=100, batch_size=batch_size, verbose=2, shuffle=False)
Unfortunately, I don't get accurate prediction results; not even for the training set, thus the model has high bias.
The prediction results of the LSTM model for all targets
How can I improve my model? I have already tried the following:
Not discarding the first year of the dataset -> no significant difference
Differencing the input features time-series (subtract previous value from current value) -> slightly worse results
Up to four stacked LSTM layers, all with the same hyperparameters -> no significant difference in results but longer training time
Dropout layer after LSTM layer (though this is usually used to reduce variance and my model has high bias) -> slightly better results, but difference might not be statistically significant
Am I doing something wrong with the stateful LSTM? Do I need to try different RNN models? Should I preprocess the data differently?
Furthermore, training is very slow: about 4 hours for the model above. Hence I am reluctant to do an extensive hyperparameter gridsearch...
In the end, I managed to solve this the following way:
Using more samples to train instead of only 1 (I used 18 samples to train and 6 to test)
Keep the first year of data, as the output time-series for all samples have the same 'starting point' and the model needs this information to learn
Standardise both input and output features (zero mean, unit variance). I found this improved prediction accuracy and training speed
Use stateful LSTM as described here, but add reset states after epoch (see below for code). I used batch_size = 6 and T_after_cut = 1460. If T_after_cut is longer, training is slower; if T_after_cut is shorter, accuracy decreases slightly. If more samples are available, I think using a larger batch_size will be faster.
Use CuDNNLSTM instead of LSTM; this sped up training by a factor of 4!
I found that more units resulted in higher accuracy and faster convergence (shorter training time). I also found that the GRU is as accurate as the LSTM, though it converged faster for the same number of units.
Monitor validation loss during training and use early stopping
The LSTM model is built and trained as follows:
from keras import layers
from keras.models import Sequential
from keras.callbacks import Callback, EarlyStopping
from keras.optimizers import RMSprop

def define_reset_states_batch(nb_cuts):
    class ResetStatesCallback(Callback):
        def __init__(self):
            super().__init__()
            self.counter = 0
        def on_batch_begin(self, batch, logs={}):
            # reset states when nb_cuts batches are completed
            if self.counter % nb_cuts == 0:
                self.model.reset_states()
            self.counter += 1
        def on_epoch_end(self, epoch, logs={}):
            # reset states after each epoch
            self.model.reset_states()
    return ResetStatesCallback

model = Sequential()
# return_sequences and stateful are assumed from the text; the arguments are cut off in the post
model.add(layers.CuDNNLSTM(256, batch_input_shape=(batch_size, T_after_cut, features),
                           return_sequences=True, stateful=True))
model.add(layers.TimeDistributed(layers.Dense(targets, activation='linear')))
optimizer = RMSprop(lr=0.002)
model.compile(loss='mean_squared_error', optimizer=optimizer)
earlyStopping = EarlyStopping(monitor='val_loss', min_delta=0.005, patience=15, verbose=1, mode='auto')
ResetStatesCallback = define_reset_states_batch(nb_cuts)
model.fit(X_dev, y_dev, epochs=n_epochs, batch_size=n_batch, verbose=1, shuffle=False, validation_data=(X_eval, y_eval), callbacks=[ResetStatesCallback(), earlyStopping])
This gave me very satisfying accuracy (R2 over 0.98):
This figure shows the temperature (left) and relative humidity (right) in the wall over 2 years (data not used in training), prediction in red and true output in black. The residuals show that the error is very small and that the LSTM learns to capture the long-term dependencies to predict the relative humidity.
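For reference, the R2 score quoted above is usually computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that standard definition; the helper `r_squared` and the five example values are made up for illustration and are not the results from this post.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared prediction errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread of the true values
    return 1.0 - ss_res / ss_tot

# Illustrative values only (e.g. a few temperatures and their predictions)
y_true = np.array([20.0, 21.0, 23.0, 22.0, 24.0])
y_pred = np.array([20.1, 20.9, 22.8, 22.2, 24.1])

print(round(r_squared(y_true, y_pred), 3))
```

A value close to 1 means the residuals are small compared with the variation in the true output. For a multi-output model the score would be computed per target column.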
