Is it okay to use STATEFUL Recurrent NN (LSTM) for classification - machine-learning

I have a dataset C of 50,000 (binary) samples each of 128 features. The class label is also binary either 1 or -1. For instance, a sample would look like this [1,0,0,0,1,0, .... , 0,1] [-1]. My goal is to classify the samples based on the binary classes( i.e., 1 or -1). I thought to try using Recurrent LSTM to generate a good model for classification. To do so, I have written the following code using Keras library:
tr_C, ts_C, tr_r, ts_r = train_test_split(C, r, train_size=.8)
batch_size = 200
print('>>> Build STATEFUL model...')
model = Sequential()
model.add(LSTM(128, batch_input_shape=(batch_size, C.shape[1], C.shape[2]), return_sequences=False, stateful=True))
model.add(Dense(1, activation='softmax'))
print('>>> Training...')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(tr_C, tr_r,
batch_size=batch_size, epochs=1, shuffle=True,
validation_data=(ts_C, ts_r))
However, I am getting bad accuracy, not more than 55%. I tried to change the activation function along with the loss function hoping to improve the accuracy but nothing works. Surprisingly, when I use Multilayer Perceptron, I get very good accuracy around 97%. Thus, I start questioning if LSTM can be used for classification or maybe my code here has something missing or it is wrong. Kindly, I want to know if the code has something missing or wrong to improve the accuracy. Any help or suggestion is appreciated.

You cannot use softmax as an output when you have only a single output unit as it will always output you a constant value of 1. You need to either change output activation to sigmoid or set output units number to 2 and loss to categorical_crossentropy. I would advise the first option.

Related

How to improve accuracy with keras multi class classification?

I am trying to do multi class classification with tf keras. I have total 20 labels and total data I have is 63952and I have tried the following code
features = features.astype(float)
labels = df_test["label"].values
encoder = LabelEncoder()
encoder.fit(labels)
encoded_Y = encoder.transform(labels)
dummy_y = np_utils.to_categorical(encoded_Y)
Then
def baseline_model():
model = Sequential()
model.add(Dense(50, input_dim=3, activation='relu'))
model.add(Dense(40, activation='softmax'))
model.add(Dense(30, activation='softmax'))
model.add(Dense(20, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
finally
history = model.fit(data,dummy_y,
epochs=5000,
batch_size=50,
validation_split=0.3,
shuffle=True,
callbacks=[ch]).history
I have a very poor accuray with this. How can I improve that ?
softmax activations in the intermediate layers do not make any sense at all. Change all of them to relu and keep softmax only in the last layer.
Having done that, and should you still be getting unsatisfactory accuracy, experiment with different architectures (different numbers of layers and nodes) with a short number of epochs (say ~ 50), in order to get a feeling of how your model behaves, before going for a full fit with your 5,000 epochs.
You did not give us vital information, but here are some guidelines:
1. Reduce the number of Dense layer - you have a complicated layer with a small amount of data (63k is somewhat small). You might experience overfitting on your train data.
2. Did you check that the test has the same distribution as your train?
3. Avoid using softmax in middle Dense layers - softmax should be used in the final layer, use sigmoid or relu instead.
4. Plot a loss as a function of epoch curve and check if it is reduces - you can then understand if your learning rate is too high or too small.

Whats the output for Keras categorical_accuracy metrics?

I cant find proper description of metrics outputs.
For example if I use
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
then I get loss and accuracy tr_loss, tr_acc = model.train_on_batch(X, Y)
if I compile with metrics=['categorical_accuracy'] then I get 2 numbers as well,
but what are they ?
EDIT: I did this: print(model.metrics_names) and got: ['loss', 'categorical_accuracy']
The accuracy metric is actually a placeholder and keras chooses the appropriate accuracy metric for you, between binary_accuracy if you use binary_crossentropy loss, and categorical_accuracy if you use categorical_crossentropy loss.
So in this specific case, both metrics (accuracy and categorical_accuracy) are literally the same, and model.evaluate return loss and accuracy.
Could you please post the two numbers you mentioned?
I guess they are loss (categorical_crossentropy in your case) and metrics you added. (accuracy or categorical_accuracy as configured in your case).

Find fundamental frequency of signal

I have 1000 datasets, each of them consists 8000 amplitudes of signal and a label - the fundamental frequency of this signal. What is the best approach to build a neural network to predict fundamental frequency for newly provided signal?
For example:
Fundamental freq: 75.88206932 Hz
Snippet of data:
-9.609272558949627507e-02
-4.778297441391140543e-01
-2.434520972570237696e-01
-1.567176020112603263e+00
-1.020037056101358752e+00
-1.129608807811322446e+00
4.303651786855859918e-01
-3.936956061582048694e-01
-1.224883726737033163e+00
-1.776803300708089672e+00
The model I've created: (the training set shape: (600,8000,1))
model=Sequential()
model.add(Conv1D(filters=64, kernel_size=3, activation='tanh', \
input_shape=(data.shape[1],data.shape[2])))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=64, kernel_size=3, activation='tanh'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=64, kernel_size=3, activation='tanh'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(500, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(50, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["accuracy"])
But the model doesn't want to train. Accuracy ~ 0.0.
I do appreciate any advice.
What is the best approach to build a neural network to predict
fundamental frequency for newly provided signal?
That is a way too-broad question for SO, and consequently you should not really expect any sufficiently detailed meaningful answer.
That said, there are certain issues with your code, and rectifying them will arguably move you a step closer to achieving your end goal.
So, you are making a very fundamental mistake:
Accuracy is suitable only for classification problems; for regression (i.e. numeric prediction) ones, such as yours, accuracy is meaningless.
What's more, the fact is that Keras unfortunately will not "protect" you or any other user from putting such meaningless requests in your code, i.e. you will not get any error, or even a warning, that you are attempting something that does not make sense, such as requesting the accuracy in a regression setting; see my answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)? for more details and a practical demonstration.
So, here your performance metric is actually the same as your loss, i.e. the Mean Squared Error (MSE); you should go for making this quantity in your validation set as small as possible, and remove completely the metrics=['accuracy'] argument from the compilation of your model.
Additionally, nowadays we practically never use tanh activation for the hidden layers; you should try relu instead.
You might first FFT the data, either with or without a window, and then use the FFT magnitude vectors as ML training data vectors.

Keras model accuracy not improving

I'm trying to train a neural network to predict the ratings for players in FIFA 18 by easports (ratings are between 64-99). I'm using their players database (https://easports.com/fifa/ultimate-team/api/fut/item?page=1) and I've processed the data into training_x, testing_x, training_y, testing_y. Each of the training samples is a numpy array containing 7 values...the first 6 are the different stats of the player (shooting, passing, dribbling, etc) and the last value is the position of the player (which I mapped between 1-8, depending on the position), and each of the testing values is a single integer between 64-99, representing the rating of that player.
I've tried many different hyperparameters, including changing the activation functions to tanh and relu, and I've tried adding a batch normalization layer after the first dense layer (I thought that it might be useful since one of my features is very small and the other features are between 50-99), I've played around with the SGD optimizer (changed the learning rate, momentum, even tried changing the optimizer to Adam), tried different loss functions, added/removed dropout layers, and tried different regularizers for the weights of the model.
model = Sequential()
model.add(Dense(64, input_shape=(7,),
kernel_regularizer=regularizers.l2(0.01)))
//batch normalization?
model.add(Activation('sigmoid'))
model.add(Dense(64, kernel_regularizer=regularizers.l2(0.01),
activation='sigmoid'))
model.add(Dropout(0.3))
model.add(Dense(32, kernel_regularizer=regularizers.l2(0.01),
activation='sigmoid'))
model.add(Dense(1, activation='linear'))
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_absolute_error', metrics=['accuracy'],
optimizer=sgd)
model.fit(training_x, training_y, epochs=50, batch_size=128, shuffle=True)
When I train the model, the loss is always nan and the accuracy is always 0, even though I've tried adjusting a lot of different parameters. However, if I remove the last feature from my data, the position of the players, and update the input shape of the first dense layer, the model actually "trains" and ends up with around 6% accuracy no matter what parameters I change. In that case, I've found that the model only predicts 79 to be the player's rating. What am I doing inherently wrong?
You can try the following steps :
Use mean squared error loss function.
Use Adam which will help you converge faster with low learning rate like 0.0001 or 0.001. Otherwise, try using the RMSprop optimizer.
Use the default regularizers. That is none actually.
Since this is a regression task, use activation function like ReLU in all the layers except the output layer ( including the input layer ). Use linear activation in output layer.
As mentioned in the comments by #pooyan , normalize the features. See here. Even try standardizing the features. Use whichever suites the best.

Why is binary_crossentropy more accurate than categorical_crossentropy for multiclass classification in Keras?

I'm learning how to create convolutional neural networks using Keras. I'm trying to get a high accuracy for the MNIST dataset.
Apparently categorical_crossentropy is for more than 2 classes and binary_crossentropy is for 2 classes. Since there are 10 digits, I should be using categorical_crossentropy. However, after training and testing dozens of models, binary_crossentropy consistently outperforms categorical_crossentropy significantly.
On Kaggle, I got 99+% accuracy using binary_crossentropy and 10 epochs. Meanwhile, I can't get above 97% using categorical_crossentropy, even using 30 epochs (which isn't much, but I don't have a GPU, so training takes forever).
Here's what my model looks like now:
model = Sequential()
model.add(Convolution2D(100, 5, 5, border_mode='valid', input_shape=(28, 28, 1), init='glorot_uniform', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(100, 3, 3, init='glorot_uniform', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(100, init='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(100, init='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, init='glorot_uniform', activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=['accuracy'])
Short answer: it is not.
To see that, simply try to calculate the accuracy "by hand", and you will see that it is different from the one reported by Keras with the model.evaluate method:
# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0)
score[1]
# 0.99794011611938471
# Actual accuracy calculated manually:
import numpy as np
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98999999999999999
The reason it seems to be so is a rather subtle issue at how Keras actually guesses which accuracy to use, depending on the loss function you have selected, when you include simply metrics=['accuracy'] in your model compilation.
If you check the source code, Keras does not define a single accuracy metric, but several different ones, among them binary_accuracy and categorical_accuracy. What happens under the hood is that, since you have selected binary cross entropy as your loss function and have not specified a particular accuracy metric, Keras (wrongly...) infers that you are interested in the binary_accuracy, and this is what it returns.
To avoid that, i.e. to use indeed binary cross entropy as your loss function (nothing wrong with this, in principle) while still getting the categorical accuracy required by the problem at hand (i.e. MNIST classification), you should ask explicitly for categorical_accuracy in the model compilation as follows:
from keras.metrics import categorical_accuracy
model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=[categorical_accuracy])
And after training, scoring, and predicting the test set as I show above, the two metrics now are the same, as they should be:
sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000 == score[1]
# True
(HT to this great answer to a similar problem, which helped me understand the issue...)
UPDATE: After my post, I discovered that this issue had already been identified in this answer.
First of all, binary_crossentropy is not when there are two classes.
The "binary" name is because it is adapted for binary output, and each number of the softmax is aimed at being 0 or 1.
Here, it checks for each number of the output.
It doesn't explain your result, since categorical_entropy exploits the fact that it is a classification problem.
Are you sure that when you read your data there is one and only one class per sample? It's the only one explanation I can give.

Resources