How to forecast one output with multiple features by LSTM model? - machine-learning

I am playing with some stocks timeseries data and trying to predict the trend with multivariate features. Below is the sample dataset I have which including different technical indicators including moving average, Parabolic SAR etc for each stocks. From different online sources, most of them are predicting one stock with one feature like "Close" price a time. How can I make use all the stocks' features to predict one output let's say S&P's close price. I know it may not help boosting the prediction accuracy but I am not sure what I am training right now and hope having more insight on LSTM model.
Basically, I put the whole dataset in and do the scaling and training stuffs. How could the prediction being specified on one column?
Code:
scaler = MinMaxScaler(feature_range = (0,1))
scaled_feature_data = scaler.fit_transform(feature_data)
X_train, y_train = training_set[:, :-1], training_set[:, -1]
X_test, y_test = testing_set[:, :-1], testing_set[:, -1]
X_train = X_train.reshape((X_train.shape[0],1,X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0],1,X_test.shape[1]))
model_lstm.add(LSTM(50, return_sequences = True, input_shape = (X_train.shape[1], X_train.shape[2])))
Model:
model_lstm.add(LSTM(50, return_sequences = True, input_shape = (X_train.shape[1], X_train.shape[2])))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(units=50, return_sequences=True))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(units=50, return_sequences=True))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(units=50))
model_lstm.add(Dropout(0.2))
model_lstm.add(Dense(units=1, activation='relu'))

This is a regression problem. You are predicting one variable. So last layers may look like
model_lstm.add(Dense(1))
model_lstm.compile(loss='mse', optimizer='rmsprop')
Also, you probably don't need to return_sequences for each of input tokens. If return_sequences=true, then an output will be a matrix (actually (batch_size, num_tokens, num_features)), that cannot be automatically flattened to vector (batch_size, num_features) that is expected to be an input of Dense(1) layer. Just use an output of a last LSTM node. For this, set return_sequences=false. Its values depends on previous tokens, so you won't lose much information from them.
The whole model may look like this:
model_lstm = Sequential()
model_lstm.add(LSTM(50, return_sequences=False, input_shape=(X_train.shape[1], X_train.shape[2])))
model_lstm.add(Dropout(0.5))
model_lstm.add(Dense(1))
model_lstm.compile(loss='mse', optimizer='rmsprop')
If you want more layers it will become:
model_lstm = Sequential()
model_lstm.add(LSTM(64, return_sequences=True, dropout=0.5, input_shape=(X_train.shape[1], X_train.shape[-1])))
model_lstm.add(Dropout(0.5))
model_lstm.add(LSTM(32, return_sequences=False, dropout=0.5))
model_lstm.add(Dense(1))
model_lstm.compile(loss='mse', optimizer='rmsprop')

Actually I am wondering how can all features, lets say 10 technical indicator features, can help predict the one price column?
Don't know if i understand you correctly.But thats what a ML does, it tries to find corelations between the features and how they can be used to predict something. So maybe, the "DE30" has (or seems to have) a influence on the price and is therfore helpful. Was that your question?
rom different online sources, most of them are predicting one stock with one feature like "Close" price a time
I guess that for simplification. Therefore they used only one feature
Let my know if that was what you asked for..

Related

How to specify the features to forecast in a multivariate time series using Encoder-Decoder LSTM?

I am working on forecasting multi-steps of a multivariate time series using Encoder-Decoder Time Series. The forecasting is done autonomously i.e. each of the input feature is forecasted.
I want to specify the feature that will be forecasted without forecasting other features.
This is because I want to modify some of the features (like inject a set of different data) and make the prediction to see if the forecasting model is able to capture the data modified in a series to make the forecast of the other features (specified) accurately.
I am currently using keras framework for the implementation of the Enc-Dec LSTM. Below is the code snippet:
n_features = 8
n_steps_in, n_steps_out = 50, 100
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(50, activation-'relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer='adam', loss='mse')
train_history = model.fit(X, y,epochs=50, verbose=2, shuffle=False)
Can someone help, please?
I have tried introducing another layer before the last layer and change the value of n_features in the last layer, but this only reflects the number of feature to forecast. The problem is I need to specify which feature I am forecasting and which on I am dropping from the predicting process.
model.add(TimeDistributed(Dense(50, activation='relu')))

Is there a way to increase the variance of model's prediction?

I created a randomly generated(using numpy, between range 30 and 60) Data of about 12000 points (to
generate an artificial time-series data for more than a year in Time).
Now I am trying to fit that data points in an LSTM model and forecast
based upon that.
The LSTM model i applied,(here data is a single series so n_features = 1, and steps-in and out are for sequence-generation function for time-series, i took both equal to 5. Also the for the activation functions i tried all with both relu, both tanh and 1st tanh & 2nd relu (as shown here))
X, y = split_sequences(data, n_steps_in, n_steps_out)
n_features = X.shape[2]
model = Sequential()
model.add(LSTM(200, activation='tanh', input_shape=(n_steps_in,
n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(200, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
opt = keras.optimizers.Adam(learning_rate=0.05)
model.compile(optimizer=opt, loss='mse')
model.fit(X, y, epochs= n, batch_size=10, verbose=1,
workers=4, use_multiprocessing = True, initial_epoch = 0)
I also tried smoothening of the data-points as they are randomly
distributed (in the predefined boundaries).
and then applied the model on the smoothed data, but still i am getting similar results.
for e.g., In this image showing both the smoothed-training data and the forecasted-prediction from the model
plt.plot(Training_data, 'g')
plt.plot(Pred_Forecasts,'r')
Every time the models are giving straight lines in prediction.
and which is obvious since it is a set of random numbers so model tends to get to a mean value between the upper and lower limits of the data, but still is there any way to generate a somewhat real looking model.
P.S-1 - I have also tried applying different models like prophet, sarima, arima.
But i think i need to find a way to increase the Variance of the prediction, which i am unable to find.
PS-2 - Sorry for the long question i am new to deep-learning so i tried to explain more.

How to improve accuracy with keras multi class classification?

I am trying to do multi class classification with tf keras. I have total 20 labels and total data I have is 63952and I have tried the following code
features = features.astype(float)
labels = df_test["label"].values
encoder = LabelEncoder()
encoder.fit(labels)
encoded_Y = encoder.transform(labels)
dummy_y = np_utils.to_categorical(encoded_Y)
Then
def baseline_model():
model = Sequential()
model.add(Dense(50, input_dim=3, activation='relu'))
model.add(Dense(40, activation='softmax'))
model.add(Dense(30, activation='softmax'))
model.add(Dense(20, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
finally
history = model.fit(data,dummy_y,
epochs=5000,
batch_size=50,
validation_split=0.3,
shuffle=True,
callbacks=[ch]).history
I have a very poor accuray with this. How can I improve that ?
softmax activations in the intermediate layers do not make any sense at all. Change all of them to relu and keep softmax only in the last layer.
Having done that, and should you still be getting unsatisfactory accuracy, experiment with different architectures (different numbers of layers and nodes) with a short number of epochs (say ~ 50), in order to get a feeling of how your model behaves, before going for a full fit with your 5,000 epochs.
You did not give us vital information, but here are some guidelines:
1. Reduce the number of Dense layer - you have a complicated layer with a small amount of data (63k is somewhat small). You might experience overfitting on your train data.
2. Did you check that the test has the same distribution as your train?
3. Avoid using softmax in middle Dense layers - softmax should be used in the final layer, use sigmoid or relu instead.
4. Plot a loss as a function of epoch curve and check if it is reduces - you can then understand if your learning rate is too high or too small.

Why should we normalize data for deep learning in Keras?

I was testing some network architectures in Keras for classifying the MNIST dataset. I have implemented one that is similar to the LeNet.
I have seen that in the examples that I have found on the internet, there is a step of data normalization. For example:
X_train /= 255
I have performed a test without this normalization and I have seen that the performance (accuracy) of the network has decreased (keeping the same number of epochs). Why has this happened?
If I increase the number of epochs, the accuracy can reach the same level reached by the model trained with normalization?
So, the normalization affects the accuracy, or only the training speed?
The complete source code of my training script is below:
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.datasets import mnist
from keras.utils import np_utils
from keras.optimizers import SGD, RMSprop, Adam
import numpy as np
import matplotlib.pyplot as plt
from keras import backend as k
def build(input_shape, classes):
model = Sequential()
model.add(Conv2D(20, kernel_size=5, padding="same",activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(50, kernel_size=5, padding="same", activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(500))
model.add(Activation("relu"))
model.add(Dense(classes))
model.add(Activation("softmax"))
return model
NB_EPOCH = 4 # number of epochs
BATCH_SIZE = 128 # size of the batch
VERBOSE = 1 # set the training phase as verbose
OPTIMIZER = Adam() # optimizer
VALIDATION_SPLIT=0.2 # percentage of the training data used for
evaluating the loss function
IMG_ROWS, IMG_COLS = 28, 28 # input image dimensions
NB_CLASSES = 10 # number of outputs = number of digits
INPUT_SHAPE = (1, IMG_ROWS, IMG_COLS) # shape of the input
(X_train, y_train), (X_test, y_test) = mnist.load_data()
k.set_image_dim_ordering("th")
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
X_train = X_train[:, np.newaxis, :, :]
X_test = X_test[:, np.newaxis, :, :]
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
y_train = np_utils.to_categorical(y_train, NB_CLASSES)
y_test = np_utils.to_categorical(y_test, NB_CLASSES)
model = build(input_shape=INPUT_SHAPE, classes=NB_CLASSES)
model.compile(loss="categorical_crossentropy",
optimizer=OPTIMIZER,metrics=["accuracy"])
history = model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=NB_EPOCH, verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
model.save("model2")
score = model.evaluate(X_test, y_test, verbose=VERBOSE)
print('Test accuracy:', score[1])
Normalization is a generic concept not limited only to deep learning or to Keras.
Why to normalize?
Let me take a simple logistic regression example which will be easy to understand and to explain normalization.
Assume we are trying to predict if a customer should be given loan or not. Among many available independent variables lets just consider Age and Income.
Let the equation be of the form:
Y = weight_1 * (Age) + weight_2 * (Income) + some_constant
Just for sake of explanation let Age be usually in range of [0,120] and let us assume Income in range of [10000, 100000]. The scale of Age and Income are very different. If you consider them as is then weights weight_1 and weight_2 may be assigned biased weights. weight_2 might bring more importance to Income as a feature than to what weight_1 brings importance to Age. To scale them to a common level, we can normalize them. For example, we can bring all the ages in range of [0,1] and all incomes in range of [0,1]. Now we can say that Age and Income are given equal importance as a feature.
Does Normalization always increase the accuracy?
Apparently, No. It is not necessary that normalization always increases accuracy. It may or might not, you never really know until you implement. Again it depends on at which stage in you training you apply normalization, on whether you apply normalization after every activation, etc.
As the range of the values of the features gets narrowed down to a particular range because of normalization, its easy to perform computations over a smaller range of values. So, usually the model gets trained a bit faster.
Regarding the number of epochs, accuracy usually increases with number of epochs provided that your model doesn't start over-fitting.
A very good explanation for Normalization/Standardization and related terms is here.
In a nutshell, normalization reduces the complexity of the problem your network is trying to solve. This can potentially increase the accuracy of your model and speed up the training. You bring the data on the same scale and reduce variance. None of the weights in the network are wasted on doing a normalization for you, meaning that they can be used more efficiently to solve the actual task at hand.
As #Shridhar R Kulkarni says, normalization is a general concept and doesn’t only apply to keras.
It’s often applied as part of data preparation for ML learning models to change numeric values in the dataset to fit a standard scale without distorting the differences in their ranges. As such, normalization enhances the cohesion of entity types within a model by reducing the probability of inconsistent data.
However, not every other dataset and use case requires normalization, it’s primarily necessary when features have different ranges. You may use when;
You want to improve your model’s convergence efficiency and make
optimization feasible
When you want to make training less sensitive to scale features, you can better
solve coefficients.
Want to improve analysis from multiple models.
Normalization is not recommended when;
-Using decision tree models or ensembles based on them
-Your data is not normally distributed- you may have to use other data pre-
processing techniques
-If your dataset comprises already scaled variables
In some cases, normalization can improve performance. However, it is not always necessary.
The critical thing is to understand your dataset and scenario first, then you’ll know whether you need it or not. Sometimes, you can experiment to see if it gives you good performance or not.
Check out deepchecks and see how to deal with important data-related checks you come across in ML.
For example, to check duplicated data in your set, you can use the following code detailed code
from deepchecks.checks.integrity.data_duplicates import DataDuplicates
from deepchecks.base import Dataset, Suite
from datetime import datetime
import pandas as pd
I think there are some issue with the convergence of the optimizer function too. Here i show a simple linear regression. Three examples:
First with an array with small values and it works as expected.
Second an array with bigger values and the loss function explodes toward infinity, suggesting the need to normalize. And at the end in model 3 the same array as case two but it has been normalized and we get convergence.
github colab enabled ipython notebook
I've use the MSE optimizer function i don't know if other optimizers suffer the same issues.

Sequence labeling in Keras

I'm working on sentence labeling problem. I've done embedding and padding by myself and my inputs look like:
X_i = [[0,1,1,0,2,3...], [0,1,1,0,2,3...], ..., [0,0,0,0,0...], [0,0,0,0,0...], ....]
For every word in sentence I want to predict one of four classes, so my desired output should look like:
Y_i = [[1,0,0,0], [0,0,1,0], [0,1,0,0], ...]
My simple network architecture is:
model = Sequential()
model.add(LSTM(input_shape = (emb,),input_dim=emb, output_dim=hidden, return_sequences=True))
model.add(TimeDistributedDense(output_dim=4))
model.add(Activation('softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X_train, Y_train, batch_size=32, nb_epoch=3, validation_data=(X_test, Y_test), verbose=1, show_accuracy=True)
It shows approximately 95% while training, but when I'm trying to predict new sentences using trained model results are really bad. It looks like model just learnt some classes for first words and shows it every time. I think the problem can is:
Written by myself padding (zero vectors in the end of the sentence), can it make learning worse?
I should try to learn sentences of different length, without padding (if yes, can you help me how train such kind of a model in Keras?)
Wrong objective of learning, but I tried mean squared error, binary cross entropy and others, it doesn't change.
Something with TimeDistributedDense and softmax, I think, that I've got how it works, but still not 100% sure.
I'll be glad to see any hint or help regarding to this problem, thank you!
I personally think that you misunderstand what "sequence labeling" means.
Do you mean:
X is a list of sentences, each element X[i] is a word sequence of arbitrary length?
Y[i] is the category of X[i], and the one hot form of Y[i] is a [0, 1, 0, 0] like array?
If it is, then it's not a sequence labeling problem, it's a classification problem.
Don't use TimeDistributedDense, and if it is a multi-class classification problem, i.e., len(Y[i]) > 2, then use "categorical_crossentropy" instead of "binary_crossentropy"

Resources