I'm trying to design an LSTM network using Keras to combine word embeddings and other features in a binary classification setting. My test set contains 250 samples per class.
When I run my model using only the word embedding layers (the "model" layer in the code), I get an average F1 of around 0.67. When I create a new branch with the other features of fixed size that I compute separately ("branch2") and merge these with the word embeddings using "concat", the predictions all revert to a single class (giving perfect recall for that class), and average F1 drops to 0.33.
Am I adding in the features and training/testing incorrectly?
def create_model(embedding_index, sequence_features, optimizer='rmsprop'):
    # Branch 1: word embeddings
    model = Sequential()
    embedding_layer = create_embedding_matrix(embedding_index, word_index)
    model.add(embedding_layer)
    model.add(Convolution1D(nb_filter=32, filter_length=3, border_mode='same', activation='tanh'))
    model.add(MaxPooling1D(pool_length=2))
    model.add(Bidirectional(LSTM(100)))
    model.add(Dropout(0.2))
    model.add(Dense(2, activation='sigmoid'))

    # Branch 2: other features
    branch2 = Sequential()
    dim = sequence_features.shape[1]
    branch2.add(Dense(15, input_dim=dim, init='normal', activation='tanh'))
    branch2.add(BatchNormalization())

    # Merging branches to create final model
    final_model = Sequential()
    final_model.add(Merge([model,branch2], mode='concat'))
    final_model.add(Dense(2, init='normal', activation='sigmoid'))
    final_model.compile(loss='categorical_crossentropy', optimizer=optimizer,
                        metrics=['accuracy','precision','recall','fbeta_score','fmeasure'])
    return final_model
def run(input_train, input_dev, input_test, text_col, label_col, resfile, embedding_index):
    # Processing text and features
    data_train, labels_train, data_test, labels_test = vectorize_text(input_train, input_test, text_col,label_col)
    x_train, y_train = data_train, labels_train
    x_test, y_test = data_test, labels_test
    seq_train = get_sequence_features(input_train).as_matrix()
    seq_test = get_sequence_features(input_test).as_matrix()

    # Generating model
    filepath = lstm_config.WEIGHTS_PATH
    checkpoint = ModelCheckpoint(filepath, monitor='val_fmeasure', verbose=1, save_best_only=True, mode='max')
    callbacks_list = [checkpoint]
    model = create_model(embedding_index, seq_train)
    model.fit([x_train, seq_train], y_train, validation_split=0.33, nb_epoch=3, batch_size=100, callbacks=callbacks_list, verbose=1)

    # Evaluating
    scores = model.evaluate([x_test, seq_test], y_test, verbose=1)
    time.sleep(0.2)
    preds = model.predict_classes([x_test, seq_test])
    preds = to_categorical(preds)
    print(metrics.f1_score(y_true=y_test, y_pred=preds, average="micro"))
    print(metrics.f1_score(y_true=y_test, y_pred=preds, average="macro"))
    print(metrics.classification_report(y_test, preds))
Output:
Using Theano backend.
Found 2999999 word vectors.
Processing text dataset
Found 7165 unique tokens.
Shape of data tensor: (1996, 50)
Shape of label tensor: (1996, 2)
1996 train 500 test
Train on 1337 samples, validate on 659 samples
Epoch 1/3
1300/1337 [============================>.] - ETA: 0s - loss: 0.6767 - acc: 0.6669 - precision: 0.5557 - recall: 0.6815 - fbeta_score: 0.6120 - fmeasure: 0.6120
Epoch 00000: val_fmeasure im
1337/1337 [==============================] - 10s - loss: 0.6772 - acc: 0.6672 - precision: 0.5551 - recall: 0.6806 - fbeta_score: 0.6113 - fmeasure: 0.6113 - val_loss: 0.7442 - val_acc: 0.0000e+00 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_fbeta_score: 0.0000e+00 - val_fmeasure: 0.0000e+00
Epoch 2/3
1300/1337 [============================>.] - ETA: 0s - loss: 0.6634 - acc: 0.7269 - precision: 0.5819 - recall: 0.7292 - fbeta_score: 0.6462 - fmeasure: 0.6462
Epoch 00001: val_fmeasure di
1337/1337 [==============================] - 9s - loss: 0.6634 - acc: 0.7263 - precision: 0.5830 - recall: 0.7300 - fbeta_score: 0.6472 - fmeasure: 0.6472 - val_loss: 0.7616 - val_acc: 0.0000e+00 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_fbeta_score: 0.0000e+00 - val_fmeasure: 0.0000e+00
Epoch 3/3
1300/1337 [============================>.] - ETA: 0s - loss: 0.6542 - acc: 0.7354 - precision: 0.5879 - recall: 0.7308 - fbeta_score: 0.6508 - fmeasure: 0.6508
Epoch 00002: val_fmeasure di
1337/1337 [==============================] - 8s - loss: 0.6545 - acc: 0.7337 - precision: 0.5866 - recall: 0.7307 - fbeta_score: 0.6500 - fmeasure: 0.6500 - val_loss: 0.7801 - val_acc: 0.0000e+00 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_fbeta_score: 0.0000e+00 - val_fmeasure: 0.0000e+00
500/500 [==============================] - 0s
500/500 [==============================] - 1s
0.5
/usr/local/lib/python3.4/dist-packages/sklearn/metrics/classification.py:1074: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
0.333333333333
/usr/local/lib/python3.4/dist-packages/sklearn/metrics/classification.py:1074: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
             precision    recall  f1-score   support

          0       0.00      0.00      0.00       250
          1       0.50      1.00      0.67       250

avg / total       0.25      0.50      0.33       500
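One thing worth checking, given that every validation metric in the log is exactly 0: Keras takes the `validation_split` fraction from the end of the arrays, before any shuffling, so training data sorted by class can produce a validation set that contains a single class. The sketch below is only an illustration under that assumption. `shuffle_together` is a hypothetical helper and the arrays are toy stand-ins for `x_train`, `seq_train` and `y_train`; whether the real data is actually ordered by class is not stated in the question.

```python
import numpy as np

def shuffle_together(*arrays, seed=0):
    """Apply one random permutation to several arrays that share their first axis."""
    n = len(arrays[0])
    assert all(len(a) == n for a in arrays), "all arrays need the same number of rows"
    order = np.random.default_rng(seed).permutation(n)
    return tuple(a[order] for a in arrays)

# Hypothetical stand-ins for x_train, seq_train and y_train, sorted by class.
x = np.arange(24).reshape(12, 2)             # row i holds [2i, 2i + 1]
seq = np.arange(12.0).reshape(12, 1)         # row i holds [i]
y = np.array([[1, 0]] * 6 + [[0, 1]] * 6)    # one-hot labels, class 0 first

# Mimic validation_split=0.33: the last rows become the validation set.
split_at = int(len(y) * (1 - 0.33))
val_labels_unshuffled = y[split_at:].argmax(axis=1)
print("validation labels without shuffling:", val_labels_unshuffled)

# Shuffle both inputs and the labels with the same permutation before fit().
x_s, seq_s, y_s = shuffle_together(x, seq, y)
print("validation labels after shuffling:  ", y_s[split_at:].argmax(axis=1))
```

Shuffling all inputs with one shared permutation keeps the rows of the two branches and the labels aligned, which matters for a model that is fitted on `[x_train, seq_train]`.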
I'm building a neural network to classify doublets of 100*80 images into two classes.
My accuracy is capped at around 88% no matter what I try to do (add convolutional layers, dropouts...).
I've investigated the issue and found from the confusion matrix that my model is only making true negative and false positive predictions. I have no idea how this is possible and was wondering if anyone could help me.
Here is some of the code (I've used a really simple model architecture here):
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, shuffle = True)
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape = (100,80,2)))
model.add(keras.layers.Dense(5, activation = 'relu'))
model.add(keras.layers.Dense(1, activation = 'sigmoid'))
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs =10, batch_size= 200, validation_data = (X_test, y_test))
Output for training:
Epoch 1/10
167/167 [==============================] - 6s 31ms/step - loss: 0.6633 - accuracy: 0.8707 - val_loss: 0.6345 - val_accuracy: 0.8813
Epoch 2/10
167/167 [==============================] - 2s 13ms/step - loss: 0.6087 - accuracy: 0.8827 - val_loss: 0.5848 - val_accuracy: 0.8813
Epoch 3/10
167/167 [==============================] - 2s 13ms/step - loss: 0.5630 - accuracy: 0.8828 - val_loss: 0.5435 - val_accuracy: 0.8813
Epoch 4/10
167/167 [==============================] - 2s 13ms/step - loss: 0.5249 - accuracy: 0.8828 - val_loss: 0.5090 - val_accuracy: 0.8813
Epoch 5/10
167/167 [==============================] - 2s 12ms/step - loss: 0.4931 - accuracy: 0.8828 - val_loss: 0.4805 - val_accuracy: 0.8813
Epoch 6/10
167/167 [==============================] - 2s 13ms/step - loss: 0.4663 - accuracy: 0.8828 - val_loss: 0.4567 - val_accuracy: 0.8813
Epoch 7/10
167/167 [==============================] - 2s 14ms/step - loss: 0.4424 - accuracy: 0.8832 - val_loss: 0.4363 - val_accuracy: 0.8813
Epoch 8/10
167/167 [==============================] - 3s 17ms/step - loss: 0.4198 - accuracy: 0.8848 - val_loss: 0.4190 - val_accuracy: 0.8816
Epoch 9/10
167/167 [==============================] - 2s 15ms/step - loss: 0.3982 - accuracy: 0.8887 - val_loss: 0.4040 - val_accuracy: 0.8816
Epoch 10/10
167/167 [==============================] - 3s 15ms/step - loss: 0.3784 - accuracy: 0.8942 - val_loss: 0.3911 - val_accuracy: 0.8821
Out[85]:
<keras.callbacks.History at 0x7fe3ce8dedd0>
loss, accuracies = model1.evaluate(X_test, y_test)
261/261 [==============================] - 1s 2ms/step - loss: 0.3263 - accuracy: 0.8813
y_pred = model1.predict(X_test)
y_pred = (y_pred > 0.5)
confusion_matrix((y_test > 0.5), y_pred )
array([[   0,  990],
       [   0, 7353]])
First, check how imbalanced your data is.
For example, if your dataset contains 10 samples, of which 9 are class A and 1 is class B, the model can maximize its accuracy by always predicting class A; it would still reach 90% accuracy.
What you actually want is to penalize mistakes on the underrepresented class (class B here) much more heavily.
So if your data is indeed imbalanced, you can try changing the metric from ['accuracy'] to ['matthews_correlation'], e.g.
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['matthews_correlation'])
This metric penalizes mistakes on the underrepresented class far more than plain accuracy does. Keep in mind that a metric only changes what is reported, not what the optimizer minimizes, and that 'matthews_correlation' was a built-in metric name only in old Keras 1.x releases; in newer versions you would likely need to supply it as a custom metric.
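To make the imbalance concrete, here is a small NumPy sketch that reads the class counts off the confusion matrix in the question and derives "balanced" class weights from them. It assumes the training split has roughly the same class proportions as the test split, which the question does not confirm; the weighting formula (n_samples / (n_classes * samples_in_class)) is one common convention, not the only option.

```python
import numpy as np

# Confusion matrix from the question (rows: true class, columns: predicted class).
cm = np.array([[0, 990],
               [0, 7353]])

support = cm.sum(axis=1)                      # samples per true class: [990, 7353]
majority_baseline = support.max() / support.sum()

# "Balanced" weights: n_samples / (n_classes * samples_in_class).
weights = support.sum() / (len(support) * support)
class_weight = {label: float(w) for label, w in enumerate(weights)}

print(f"accuracy of always predicting the majority class: {majority_baseline:.4f}")
print("class_weight:", class_weight)
```

The baseline comes out at the 0.8813 seen in the training log, which is consistent with a model that always predicts the majority class. A dictionary like `class_weight` can be passed to `model.fit(..., class_weight=class_weight)` so that errors on the rare class contribute more to the loss.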
I am training a multitarget classification model with keras. My architecture is:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
imput_ = Input(shape=(X_train.shape[1]))
x = Dense(50, activation="relu")(imput_)
x = Dense(n_targets, activation="sigmoid", name="output")(x)
model = Model(imput_, x)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
Then I fit my model like this:
model.fit(X_train, y_train.toarray(), validation_data=(X_test, y_test.toarray()), epochs=5)
The fitting loss shows this:
Epoch 1/5
36/36 [==============================] - 1s 10ms/step - loss: 0.5161 - accuracy: 0.0614 - val_loss: 0.3365 - val_accuracy: 0.1434
Epoch 2/5
36/36 [==============================] - 0s 6ms/step - loss: 0.2761 - accuracy: 0.2930 - val_loss: 0.2429 - val_accuracy: 0.4560
Epoch 3/5
36/36 [==============================] - 0s 5ms/step - loss: 0.2255 - accuracy: 0.4435 - val_loss: 0.2187 - val_accuracy: 0.5130
Epoch 4/5
36/36 [==============================] - 0s 5ms/step - loss: 0.2037 - accuracy: 0.4800 - val_loss: 0.2040 - val_accuracy: 0.5199
Epoch 5/5
36/36 [==============================] - 0s 5ms/step - loss: 0.1876 - accuracy: 0.4996 - val_loss: 0.1929 - val_accuracy: 0.5250
<keras.callbacks.History at 0x7fe0a549ee10>
But then if I run:
from sklearn.metrics import accuracy_score
accuracy_score(np.round(model.predict(X_test)), y_test.toarray())
I got the following score:
0.07772020725388601
Shouldn't the score be equal to the val accuracy score in the last epoch?
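With that loss and activation function, the highest predicted probability for a sample may not exceed 0.5, in which case np.round turns every output for that sample into 0.
Try comparing the arg-max class instead:
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test.toarray(), axis=1)  # y_test is a sparse indicator matrix in the question
accuracy_score(y_true, y_pred)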
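The gap between the two numbers is easier to see when the different notions of "accuracy" are computed side by side. The sketch below uses made-up targets and sigmoid outputs (all values are hypothetical). scikit-learn's `accuracy_score` reports exact-match (subset) accuracy when given 2D label-indicator arrays; which definition Keras picks for `metrics=["accuracy"]` depends on the Keras version and the output shape, so treat the mapping to `val_accuracy` as an assumption.

```python
import numpy as np

# Toy multi-label targets and sigmoid outputs (hypothetical values).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])
y_prob = np.array([[0.45, 0.30, 0.10],   # right label ranked first, but below 0.5
                   [0.20, 0.60, 0.55],   # right label found, plus one false positive
                   [0.10, 0.20, 0.90],   # exactly right
                   [0.30, 0.60, 0.20]])  # wrong label ranked first

y_round = np.round(y_prob).astype(int)

# Exact-match (subset) accuracy: every label of a sample must be correct.
subset_acc = np.mean(np.all(y_round == y_true, axis=1))

# Arg-max accuracy: only the highest-scoring label is compared.
argmax_acc = np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_true, axis=1))

# Per-label accuracy: each (sample, label) cell counts separately.
per_label_acc = np.mean(y_round == y_true)

print(subset_acc, argmax_acc, per_label_acc)
```

The first row shows the situation described above: the correct label has the highest score, but rounding still produces an all-zero prediction, so the sample counts as wrong under exact-match accuracy.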
I am using an LSTM deep-learning model to classify my text. First I split the data into texts and labels using the pandas library, tokenize the texts, and then divide them into training and test sets. Whenever I run the code, I get different results, with accuracy varying from 80 to 100 percent.
Here is my code,
tokenizer = Tokenizer(num_words=MAX_NB_WORDS, filters='!"#$%&()*+,-./:;<=>?@[\]^_`{|}~',
                      lower=True)
tokenizer.fit_on_texts(trainDF['texts'])
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
X = tokenizer.texts_to_sequences(trainDF['texts'])
X = pad_sequences(X, maxlen=MAX_SEQUENCE_LENGTH)
print('Shape of data tensor:', X.shape)
Y = pd.get_dummies(trainDF['label'])
print('Shape of label tensor:', Y.shape)
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.10, random_state = 42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)
model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
variables_for_classification=6 #change it as per your number of categories
model.add(Dense(variables_for_classification, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
epochs = 5
batch_size = 64
history = model.fit(X_train, Y_train, epochs=epochs,
                    batch_size=batch_size, validation_split=0.1,
                    callbacks=[EarlyStopping(monitor='val_loss', patience=3,
                                             min_delta=0.0001)])
accr = model.evaluate(X_test,Y_test)
print('Test set\n Loss: {:0.3f}\n Accuracy: {:0.3f}'.format(accr[0],accr[1]))
Train on 794 samples, validate on 89 samples
Epoch 1/5
794/794 [==============================] - 19s 24ms/step - loss: 1.6401 - accuracy: 0.6297 - val_loss: 0.9098 - val_accuracy: 0.5843
Epoch 2/5
794/794 [==============================] - 16s 20ms/step - loss: 0.8365 - accuracy: 0.7166 - val_loss: 0.7487 - val_accuracy: 0.7753
Epoch 3/5
794/794 [==============================] - 16s 20ms/step - loss: 0.7093 - accuracy: 0.8401 - val_loss: 0.6519 - val_accuracy: 0.8652
Epoch 4/5
794/794 [==============================] - 16s 20ms/step - loss: 0.5857 - accuracy: 0.8829 - val_loss: 0.4935 - val_accuracy: 1.0000
Epoch 5/5
794/794 [==============================] - 16s 20ms/step - loss: 0.4248 - accuracy: 0.9345 - val_loss: 0.3512 - val_accuracy: 0.8652
99/99 [==============================] - 0s 2ms/step
Test set
Loss: 0.348
Accuracy: 0.869
In the last run, accuracy was 100 percent.
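Part of the run-to-run spread may simply be sampling noise: `random_state=42` fixes the train/test split, but weight initialization, dropout masks and batch order are not seeded in the code above, and both evaluation sets are small (89 validation and 99 test samples in the log). A rough, hedged way to gauge this is the binomial standard error of an accuracy estimate; `accuracy_standard_error` is a hypothetical helper, and the approximation ignores any dependence between samples.

```python
import math

def accuracy_standard_error(accuracy, n_samples):
    """Binomial standard error of an accuracy estimated on n_samples examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)

# Figures taken from the run above: 99 test samples, 89 validation samples.
test_se = accuracy_standard_error(0.869, 99)
val_se = accuracy_standard_error(0.8652, 89)

print(f"test accuracy 0.869 +/- {2 * test_se:.3f} (about two standard errors)")
print(f"validation accuracy 0.8652 +/- {2 * val_se:.3f}")
```

With sets this small, a swing of several percentage points between otherwise identical runs is not surprising, so fixing the random seeds and evaluating on more data are both worth trying before changing the model.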
From the code below, it looks like evaluating the roc with keras and with scikit actually makes a difference. Does anybody know an explanation?
import tensorflow as tf
from keras.layers import Dense, Input, Dropout
from keras import Sequential
import keras
from keras.constraints import maxnorm
from sklearn.metrics import roc_auc_score
# training data: X_train, y_train
# validation data: X_valid, y_valid
# Define the custom callback we will be using to evaluate roc with scikit
class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self,epoch, logs=None):
        y_pred = model.predict(X_valid)
        print("roc evaluated with scikit = ",roc_auc_score(y_valid, y_pred))
        return

# Define the model.
def model():
    METRICS = [
        tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.AUC(name='auc'),
    ]
    optimizer="adam"
    dropout=0.1
    init='uniform'
    nbr_features= vocab_size-1 #2500
    dense_nparams=256
    model = Sequential()
    model.add(Dense(dense_nparams, activation='relu', input_shape=(nbr_features,), kernel_initializer=init, kernel_constraint=maxnorm(3)))
    model.add(Dropout(dropout))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer,metrics = METRICS)
    return model
# instantiate the model
model = model()
# fit the model
history = model.fit(x=X_train, y=y_train, batch_size = 8, epochs = 8, verbose=1,validation_data = (X_valid,y_valid), callbacks=[MyCustomCallback()], shuffle=True, validation_freq=1, max_queue_size=10, workers=4, use_multiprocessing=True)
Output:
Train on 4000 samples, validate on 1000 samples
Epoch 1/8
4000/4000 [==============================] - 15s 4ms/step - loss: 0.7950 - accuracy: 0.7149 - auc: 0.7213 - val_loss: 0.7551 - val_accuracy: 0.7608 - val_auc: 0.7770
roc evaluated with scikit = 0.78766515781747
Epoch 2/8
4000/4000 [==============================] - 15s 4ms/step - loss: 0.0771 - accuracy: 0.8235 - auc: 0.8571 - val_loss: 1.0803 - val_accuracy: 0.8574 - val_auc: 0.8954
roc evaluated with scikit = 0.7795984218252997
Epoch 3/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0085 - accuracy: 0.8762 - auc: 0.9162 - val_loss: 1.2084 - val_accuracy: 0.8894 - val_auc: 0.9284
roc evaluated with scikit = 0.7705172905961992
Epoch 4/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0025 - accuracy: 0.8982 - auc: 0.9361 - val_loss: 1.1700 - val_accuracy: 0.9054 - val_auc: 0.9424
roc evaluated with scikit = 0.7808804338960933
Epoch 5/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0020 - accuracy: 0.9107 - auc: 0.9469 - val_loss: 1.1887 - val_accuracy: 0.9150 - val_auc: 0.9501
roc evaluated with scikit = 0.7811174659489438
Epoch 6/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0018 - accuracy: 0.9184 - auc: 0.9529 - val_loss: 1.2036 - val_accuracy: 0.9213 - val_auc: 0.9548
roc evaluated with scikit = 0.7822898825544409
Epoch 7/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0017 - accuracy: 0.9238 - auc: 0.9566 - val_loss: 1.2231 - val_accuracy: 0.9258 - val_auc: 0.9579
roc evaluated with scikit = 0.7817036742516923
Epoch 8/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0016 - accuracy: 0.9278 - auc: 0.9592 - val_loss: 1.2426 - val_accuracy: 0.9293 - val_auc: 0.9600
roc evaluated with scikit = 0.7817419052279585
As you may see, from epoch 2 onwards keras' and scikit's validation ROCs begin diverging. The same happens if I fit the model and then use keras' model.evaluate(X_valid, y_valid). Any help is greatly appreciated.
EDIT: testing the model on a separate test set, I get roc =0.76 so scikit seems to give the correct answer ( btw X_train has 4000 entries, X_valid has 1000 and test has 15000, quite an unconventional splitting but it is forced by external factors).
Also, suggestions on how to improve performance are equally appreciated.
EDIT2: To answer the reply by @arpitrathi, I modified the callback, but unfortunately without success:
class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self,epoch, logs=None):
        y_pred = model.predict_proba(X_valid)
        print("roc evaluated with scikit = ",roc_auc_score(y_valid, y_pred))
        return
model = model()
history = model.fit(x=X_trainl, y=y_train, batch_size = 8, epochs = 3, verbose=1,validation_data = (X_valid,y_valid), callbacks=[MyCustomCallback()], shuffle=True, validation_freq=1, max_queue_size=10, workers=4, use_multiprocessing=True)
Train on 4000 samples, validate on 1000 samples
Epoch 1/3
4000/4000 [==============================] - 20s 5ms/step - loss: 0.8266 - accuracy: 0.7261 - auc: 0.7409 - val_loss: 0.7547 - val_accuracy: 0.7627 - val_auc: 0.7881
roc evaluated with scikit = 0.7921764130168828
Epoch 2/3
4000/4000 [==============================] - 15s 4ms/step - loss: 0.0482 - accuracy: 0.8270 - auc: 0.8657 - val_loss: 1.0831 - val_accuracy: 0.8620 - val_auc: 0.9054
roc evaluated with scikit = 0.78525915504445
Epoch 3/3
4000/4000 [==============================] - 15s 4ms/step - loss: 0.0092 - accuracy: 0.8794 - auc: 0.9224 - val_loss: 1.2226 - val_accuracy: 0.8928 - val_auc: 0.9340
roc evaluated with scikit = 0.7705555215724655
Also, if I plot training and validation accuracy, I see that they both rapidly converge to 1. Is that strange?
The problem lies in the arguments that you passed to the sklearn function for roc_auc_score() calculation. You should use model.predict_proba() instead of model.predict().
def on_epoch_end(self,epoch, logs=None):
    y_pred = model.predict_proba(X_valid)
    print("roc evaluated with scikit = ",roc_auc_score(y_valid, y_pred))
    return
Sklearn and keras use different default parameters when computing AUC. Increasing the number of thresholds keras uses to compute AUC (i.e., increasing num_thresholds) can help the keras AUC better match the sklearn AUC.
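The effect of the threshold grid can be sketched in plain NumPy. `tf.keras.metrics.AUC` defaults to `num_thresholds=200`, whereas a rank-based AUC effectively uses every distinct score as a threshold. The two functions below are hypothetical helpers that only loosely mimic each approach (they are not the Keras or scikit-learn implementations), and the synthetic scores are an assumption chosen to resemble an over-confident model whose outputs cluster near 1.

```python
import numpy as np

def exact_auc(y_true, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly, ties count half."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def thresholded_auc(y_true, scores, num_thresholds=200):
    """Trapezoidal AUC computed on a fixed grid of evenly spaced thresholds in [0, 1]."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores)
    grid = np.linspace(0.0, 1.0, num_thresholds)
    # Sentinel endpoints so the curve starts at (1, 1) and ends at (0, 0).
    thresholds = np.concatenate(([-1e-7], grid[1:-1], [1.0 + 1e-7]))
    pred = scores[None, :] > thresholds[:, None]          # shape: (thresholds, samples)
    tpr = (pred & y_true).sum(axis=1) / y_true.sum()
    fpr = (pred & ~y_true).sum(axis=1) / (~y_true).sum()
    return float(np.sum((fpr[:-1] - fpr[1:]) * (tpr[:-1] + tpr[1:]) / 2.0))

# Synthetic, saturated sigmoid outputs: almost every score is above 0.99.
rng = np.random.default_rng(0)
logits = np.concatenate([rng.normal(12.0, 4.0, 500),    # positives
                         rng.normal(8.0, 4.0, 500)])    # negatives
labels = np.concatenate([np.ones(500), np.zeros(500)])
scores = 1.0 / (1.0 + np.exp(-logits))

exact = exact_auc(labels, scores)
coarse = thresholded_auc(labels, scores, num_thresholds=200)
fine = thresholded_auc(labels, scores, num_thresholds=5000)
print(f"exact: {exact:.3f}  200 thresholds: {coarse:.3f}  5000 thresholds: {fine:.3f}")
```

When most scores fall between the last two grid points, the coarse grid cannot separate them, and the approximation drifts away from the rank-based value; a finer grid narrows the gap. Whether this fully accounts for the divergence in the question is not established here.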
I am trying to train a model to solve a multi-class classification problem.
My problem is that the training accuracy and validation accuracy do not change across epochs, like this:
Train on 4642 samples, validate on 516 samples
Epoch 1/100
- 1s - loss: 1.7986 - acc: 0.4649 - val_loss: 1.7664 - val_acc: 0.4942
Epoch 2/100
- 1s - loss: 1.6998 - acc: 0.5017 - val_loss: 1.7035 - val_acc: 0.4942
Epoch 3/100
- 1s - loss: 1.6956 - acc: 0.5022 - val_loss: 1.7000 - val_acc: 0.4942
Epoch 4/100
- 1s - loss: 1.6900 - acc: 0.5022 - val_loss: 1.6954 - val_acc: 0.4942
Epoch 5/100
- 1s - loss: 1.6931 - acc: 0.5017 - val_loss: 1.7058 - val_acc: 0.4942
...
Epoch 98/100
- 1s - loss: 1.6842 - acc: 0.5022 - val_loss: 1.6995 - val_acc: 0.4942
Epoch 99/100
- 1s - loss: 1.6844 - acc: 0.5022 - val_loss: 1.6977 - val_acc: 0.4942
Epoch 100/100
- 1s - loss: 1.6838 - acc: 0.5022 - val_loss: 1.6934 - val_acc: 0.4942
My code with keras:
y_train = to_categorical(y_train, num_classes=11)
X_train, X_test, Y_train, Y_test = train_test_split(x_train, y_train,
                                                    test_size=0.1, random_state=42)
model = Sequential()
model.add(Dense(64, init='normal', activation='relu', input_dim=160))
model.add(Dropout(0.3))
model.add(Dense(32, init='normal', activation='relu'))
model.add(BatchNormalization())
model.add(Dense(11, init='normal', activation='softmax'))
model.summary()
print("[INFO] compiling model...")
model.compile(optimizer=keras.optimizers.Adam(lr=0.01, beta_1=0.9,
                                              beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
print("[INFO] training network...")
model.fit(X_train, Y_train, epochs=100, batch_size=32, verbose=2, validation_data = (X_test, Y_test))
Please help me. Thank you!
I had a similar problem once. For me, three things helped: making sure x_train did not have too many missing values (filling them with a value representing "unknown" or with the median), dropping columns that carried no information (every row had the same value), and normalizing the x_train data.
Example from my data/model,
# load data
x_main = pd.read_csv("glioma DB X.csv")
y_main = pd.read_csv("glioma DB Y.csv")
# fill with median (will have to improve later, not done yet)
fill_median =['Surgery_SBRT','df','Dose','Ki67','KPS','BMI','tumor_size']
x_main[fill_median] = x_main[fill_median].fillna(x_main[fill_median].median())
x_main['Neurofc'] = x_main['Neurofc'].fillna(2)
x_main['comorbid'] = x_main['comorbid'].fillna(int(x_main['comorbid'].median()))
# drop surgery
x_main = x_main.drop(['Surgery'], axis=1)
# normalize all x
x_main_normalized = x_main.apply(lambda x: (x-np.mean(x))/(np.std(x)+1e-10))
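The same three steps (median fill, dropping constant columns, standardizing) can also be sketched in NumPy so that they are reusable on a held-out split. This is an illustration with toy data, not the answer's actual pipeline: `fit_preprocessor` and `apply_preprocessor` are hypothetical helpers, and computing the statistics on the training rows only is a design choice made for this sketch.

```python
import numpy as np

def fit_preprocessor(x_train):
    """Learn fill values, informative columns and scaling statistics from training data."""
    medians = np.nanmedian(x_train, axis=0)
    filled = np.where(np.isnan(x_train), medians, x_train)
    keep = filled.std(axis=0) > 0            # drop columns where every row has the same value
    mean = filled[:, keep].mean(axis=0)
    std = filled[:, keep].std(axis=0)
    return medians, keep, mean, std

def apply_preprocessor(x, medians, keep, mean, std):
    """Fill missing values, drop constant columns and standardize with the learned statistics."""
    filled = np.where(np.isnan(x), medians, x)
    return (filled[:, keep] - mean) / (std + 1e-10)

# Hypothetical toy data: column 1 is constant, column 2 has a missing value.
x_train = np.array([[1.0, 5.0, 10.0],
                    [2.0, 5.0, np.nan],
                    [3.0, 5.0, 30.0],
                    [4.0, 5.0, 20.0]])
x_test = np.array([[2.5, 5.0, np.nan]])

params = fit_preprocessor(x_train)
x_train_scaled = apply_preprocessor(x_train, *params)
x_test_scaled = apply_preprocessor(x_test, *params)
print(x_train_scaled.shape, x_test_scaled)
```

Reusing the training statistics for the test rows keeps the two splits on the same scale and avoids leaking test information into preprocessing.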