Class imbalance in CNN model for image classification - machine-learning

I have a dataset with 5 classes, and below there is the number of images in each class:
Class1: 6427 Images
Class2: 12678 Images
Class3: 9936 Images
Class 4: 26077 Images
Class 5: 1635 Images
I run my first CNN model on this dataset and my model was overfitted as can be seen below:
The overfitting is obvious in these models, also the high sensitivity on label4. I have tried different ways like Augmentation and etc to fix this and finally I fixed it by using VGG16 model. But I still have the class imbalance, but the strange thing is that the class imbalance now is shifted to label 1 as can be seen below.
In order to fix the class imbalance I choose equal number of each class for example I choose 1600 images from each class and I ran my model on that but again I have high sensitivity on label1. I do not know what is the problem. I would like to mention that when I add Dropout layer I receive the result below for the confusion matrix which is very bad:
I add my first model which I get overfitting and high sensitivity on label 4 below:
os.chdir('.')
train_path = './train'
test_path = './test'
valid_path = './val'
train_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
.flow_from_directory(directory=train_path, target_size=(224,224), classes=['label1', 'label2','label3','label4','label5'], batch_size=32)
test_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
.flow_from_directory(directory=test_path, target_size=(224,224), classes=['label1', 'label2','label3','label4','label5'], batch_size=32)
valid_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
.flow_from_directory(directory=valid_path, target_size=(224,224), classes=['label1', 'label2','label3','label4','label5'], batch_size=32)
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu', padding= 'same', input_shape=(224,224,3)))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2,2), strides=2))
model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3),activation= 'relu', padding='same'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2,2), strides=2))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(5, activation='softmax'))
print(model.summary())
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x=train_batches, validation_data= valid_batches, epochs=10)
predictions = model.predict(x=test_batches, verbose=0)

Related

Combining Classification and Regression in a sequential way using MLP

I am looking for a way to do classification and regression sequentially?
For example, assuming samples have 3 input values and 1 output value. The model should first classify using the 3 input values and sequentially do the regression task using the classification output (i.e. classification has 3 input values from the original samples and regression has 4 input values (3 from the original samples + the classification output).
Below the architecture that I draw. However, not really sure about the part where the second input layer occurs. Could someone give advice or working examples for this application?
input1_classification = Input(shape=(3,))
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal'(input1_classification)
# classsfication
outputout_classification = Dense(2, activation='softmax')(hidden1)
# regression input
input1_regression =Input(shape=(5,))
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal'(out_classification)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras modelmodel.compile(loss['sparse_categorical_crossentropy','mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class,y_train_reg], epochs=150, batch_size=32, verbose=2)
All you need to do is to concatenate your original input with the output of classification and apply your regression model there, you do not specify "extra" inputs.
So it will become something among the lines of:
input1_classification = Input(shape=(3,))
# classsfication
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal'(input1_classification)
outputout_classification = Dense(2, activation='softmax')(hidden1)
# regression input
new_input = Concatenate(axis=1)([input1_classification, outputout_classification ])
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal'(new_input)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras modelmodel.compile(loss['sparse_categorical_crossentropy','mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class,y_train_reg], epochs=150, batch_size=32, verbose=2)

How to improve accuracy with keras multi class classification?

I am trying to do multi class classification with tf keras. I have total 20 labels and total data I have is 63952and I have tried the following code
features = features.astype(float)
labels = df_test["label"].values
encoder = LabelEncoder()
encoder.fit(labels)
encoded_Y = encoder.transform(labels)
dummy_y = np_utils.to_categorical(encoded_Y)
Then
def baseline_model():
model = Sequential()
model.add(Dense(50, input_dim=3, activation='relu'))
model.add(Dense(40, activation='softmax'))
model.add(Dense(30, activation='softmax'))
model.add(Dense(20, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
finally
history = model.fit(data,dummy_y,
epochs=5000,
batch_size=50,
validation_split=0.3,
shuffle=True,
callbacks=[ch]).history
I have a very poor accuray with this. How can I improve that ?
softmax activations in the intermediate layers do not make any sense at all. Change all of them to relu and keep softmax only in the last layer.
Having done that, and should you still be getting unsatisfactory accuracy, experiment with different architectures (different numbers of layers and nodes) with a short number of epochs (say ~ 50), in order to get a feeling of how your model behaves, before going for a full fit with your 5,000 epochs.
You did not give us vital information, but here are some guidelines:
1. Reduce the number of Dense layer - you have a complicated layer with a small amount of data (63k is somewhat small). You might experience overfitting on your train data.
2. Did you check that the test has the same distribution as your train?
3. Avoid using softmax in middle Dense layers - softmax should be used in the final layer, use sigmoid or relu instead.
4. Plot a loss as a function of epoch curve and check if it is reduces - you can then understand if your learning rate is too high or too small.

How to avoid overfitting on a simple feed forward network

Using the pima indians diabetes dataset I'm trying to build an accurate model using Keras. I've written the following code:
# Visualize training history
from keras import callbacks
from keras.layers import Dropout
tb = callbacks.TensorBoard(log_dir='/.logs', histogram_freq=10, batch_size=32,
write_graph=True, write_grads=True, write_images=False,
embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)
# Visualize training history
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu', name='first_input'))
model.add(Dense(500, activation='tanh', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
# Compile model
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
# Fit the model
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb])
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
After several tries, I've added dropout layers in order to avoid overfitting, but with no luck. The following graph shows that the validation loss and training loss gets separate at one point.
What else could I do to optimize this network?
UPDATE:
based on the comments I got I've tweaked the code like so:
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', kernel_regularizer=regularizers.l2(0.01),
activity_regularizer=regularizers.l1(0.01), activation='relu',
name='first_input')) # added regularizers
model.add(Dense(8, activation='relu', name='first_hidden')) # reduced to 8 neurons
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(5, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
Here are the graphs for 500 epochs
The first example gave a validation accuracy > 75% and the second one gave an accuracy of < 65% and if you compare the losses for epochs below 100, its less than < 0.5 for the first one and the second one was > 0.6. But how is the second case better?.
The second one to me is a case of under-fitting: the model doesnt have enough capacity to learn. While the first case has a problem of over-fitting because its training was not stopped when overfitting started (early stopping). If the training was stopped at say 100 epoch, it would be a far better model compared between the two.
The goal should be to obtain small prediction error in unseen data and for that you increase the capacity of the network till a point beyond which overfitting starts to happen.
So how to avoid over-fitting in this particular case? Adopt early stopping.
CODE CHANGES: To include early stopping and input scaling.
# input scaling
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Early stopping
early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=1, mode='auto')
# create model - almost the same code
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu', name='first_input'))
model.add(Dense(500, activation='relu', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer')))
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb, early_stop])
The Accuracy and loss graphs:
First, try adding some regularization (https://keras.io/regularizers/) like with this code:
model.add(Dense(12, input_dim=12,
kernel_regularizer=regularizers.l2(0.01),
activity_regularizer=regularizers.l1(0.01)))
Also, make sure to decrease your network size i.e. you don't need a hidden layer of 500 neurons - try just taking that out to decrease the representation power and maybe even another layer if it's still overfitting. Also, only use relu activation. Maybe also try increasing your dropout rate to something like 0.75 (although it's already high). You probably also don't need to run it for so many epochs - it will just begin to overfit after long enough.
For a dataset like the Diabetes one you can use a much simpler network. Try to reduce the neurons in your second layer. (Is there a specific reason why you chose tanh as the activation there?).
In addition you simply can add an EarlyStopping callback to your training: https://keras.io/callbacks/

How to calculate prediction uncertainty using Keras?

I would like to calculate NN model certainty/confidence (see What my deep model doesn't know) - when NN tells me an image represents "8", I would like to know how certain it is. Is my model 99% certain it is "8" or is it 51% it is "8", but it could also be "6"? Some digits are quite ambiguous and I would like to know for which images the model is just "flipping a coin".
I have found some theoretical writings about this but I have trouble putting this in code. If I understand correctly, I should evaluate a testing image multiple times while "killing off" different neurons (using dropout) and then...?
Working on MNIST dataset, I am running the following model:
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, Flatten, Dropout
model = Sequential()
model.add(Conv2D(128, kernel_size=(7, 7),
activation='relu',
input_shape=(28, 28, 1,)))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Dropout(0.20))
model.add(Flatten())
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(units=10, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy',
optimizer='sgd',
metrics=['accuracy'])
model.fit(train_data, train_labels, batch_size=100, epochs=30, validation_data=(test_data, test_labels,))
How should I predict with this model so that I get its certainty about predictions too? I would appreciate some practical examples (preferably in Keras, but any will do).
To clarify, I am looking for an example of how to get certainty using the method outlined by Yurin Gal (or an explanation of why some other method yields better results).
If you want to implement dropout approach to measure uncertainty you should do the following:
Implement function which applies dropout also during the test time:
import keras.backend as K
f = K.function([model.layers[0].input, K.learning_phase()],
[model.layers[-1].output])
Use this function as uncertainty predictor e.g. in a following manner:
def predict_with_uncertainty(f, x, n_iter=10):
result = numpy.zeros((n_iter,) + x.shape)
for iter in range(n_iter):
result[iter] = f(x, 1)
prediction = result.mean(axis=0)
uncertainty = result.var(axis=0)
return prediction, uncertainty
Of course you may use any different function to compute uncertainty.
Made a few changes to the top voted answer. Now it works for me.
It's a way to estimate model uncertainty. For other source of uncertainty, I found https://eng.uber.com/neural-networks-uncertainty-estimation/ helpful.
f = K.function([model.layers[0].input, K.learning_phase()],
[model.layers[-1].output])
def predict_with_uncertainty(f, x, n_iter=10):
result = []
for i in range(n_iter):
result.append(f([x, 1]))
result = np.array(result)
prediction = result.mean(axis=0)
uncertainty = result.var(axis=0)
return prediction, uncertainty
Your model uses a softmax activation, so the simplest way to obtain some kind of uncertainty measure is to look at the output softmax probabilities:
probs = model.predict(some input data)[0]
The probs array will then be a 10-element vector of numbers in the [0, 1] range that sum to 1.0, so they can be interpreted as probabilities. For example the probability for digit 7 is just probs[7].
Then with this information you can do some post-processing, typically the predicted class is the one with highest probability, but you can also look at the class with second highest probability, etc.
A simpler way is to set training=True on any dropout layers you want to run during inference as well (essentially tells the layer to operate as if it's always in training mode - so it is always present for both training and inference).
import keras
inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)
model = keras.Model(inputs, outputs)
Code above is from this issue.

Text classification using Keras: How to add custom features?

I'm writing a program to classify texts into a few classes. Right now, the program loads the train and test samples of word indices, applies an embedding layer and a convolutional layer, and classifies them into the classes. I'm trying to add handcrafted features for experimentation, as in the following code. The features is a list of two elements, where the first element consists of features for the training data, and the second consists of features for the test data. Each training/test sample will have a corresponding feature vector (i.e. the features are not word features).
model = Sequential()
model.add(Embedding(params.nb_words,
params.embedding_dims,
weights=[embedding_matrix],
input_length=params.maxlen,
trainable=params.trainable))
model.add(Convolution1D(nb_filter=params.nb_filter,
filter_length=params.filter_length,
border_mode='valid',
activation='relu'))
model.add(Dropout(params.dropout_rate))
model.add(GlobalMaxPooling1D())
# Adding hand-picked features
model_features = Sequential()
nb_features = len(features[0][0])
model_features.add(Dense(1,
input_shape=(nb_features,),
init='uniform',
activation='relu'))
model_final = Sequential()
model_final.add(Merge([model, model_features], mode='concat'))
model_final.add(Dense(len(citfunc.funcs), activation='softmax'))
model_final.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
print model_final.summary()
model_final.fit([x_train, features[0]], y_train,
nb_epoch=params.nb_epoch,
batch_size=params.batch_size,
class_weight=data.get_class_weights(x_train, y_train))
y_pred = model_final.predict([x_test, features[1]])
My question is, is this code correct? Is there any conventional way of adding features to each of the text sequences?
Try:
input = Input(shape=(params.maxlen,))
embedding = Embedding(params.nb_words,
params.embedding_dims,
weights=[embedding_matrix],
input_length=params.maxlen,
trainable=params.trainable)(input)
conv = Convolution1D(nb_filter=params.nb_filter,
filter_length=params.filter_length,
border_mode='valid',
activation='relu')(embedding)
drop = Dropout(params.dropout_rate)(conv)
seq_features = GlobalMaxPooling1D()(drop)
# Adding hand-picked features
nb_features = len(features[0][0])
other_features = Input(shape=(nb_features,))
model_final = merge([seq_features , other_features], mode='concat'))
model_final = Dense(len(citfunc.funcs), activation='softmax'))(model_final)
model_final = Model([input, other_features], model_final)
model_final.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
In this case - you are merging features from a sequence analysis with custom features directly - without squashing all custom features to 1 features using Dense.

Resources