How to avoid overfitting on a simple feed forward network

Using the pima indians diabetes dataset I'm trying to build an accurate model using Keras. I've written the following code:
# Visualize training history
from keras import callbacks
from keras.layers import Dense, Dropout
from keras.models import Sequential
import matplotlib.pyplot as plt
import numpy
# TensorBoard callback for logging metrics and histograms during training
tb = callbacks.TensorBoard(log_dir='/.logs', histogram_freq=10, batch_size=32,
                           write_graph=True, write_grads=True, write_images=False,
                           embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu', name='first_input'))
model.add(Dense(500, activation='tanh', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
# Compile model
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# Fit the model
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb])
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
After several tries, I added a dropout layer to avoid overfitting, but with no luck. The following graph shows that the validation loss and the training loss diverge at some point.
What else could I do to optimize this network?
UPDATE:
Based on the comments I got, I've tweaked the code like so:
from keras import regularizers
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01), activation='relu',
                name='first_input'))  # added regularizers
model.add(Dense(8, activation='relu', name='first_hidden'))  # reduced to 8 neurons
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(5, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
Here are the graphs for 500 epochs

The first example gave a validation accuracy above 75% and the second one gave an accuracy below 65%. If you compare the losses for epochs below 100, the loss is below 0.5 for the first one and above 0.6 for the second. So how is the second case better?
To me, the second one is a case of underfitting: the model doesn't have enough capacity to learn. The first case has an overfitting problem because training was not stopped when overfitting started (early stopping). If training had been stopped at, say, epoch 100, it would be by far the better model of the two.
The goal should be to obtain a small prediction error on unseen data, and for that you increase the capacity of the network up to the point beyond which overfitting starts to happen.
So how do you avoid overfitting in this particular case? Adopt early stopping.
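To make the idea concrete, here is a rough sketch in plain Python of what patience-based early stopping does with a sequence of validation losses. It only illustrates the logic and is not Keras's implementation; the function name early_stopping_epoch and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would be stopped.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never
    happens, the index of the last epoch is returned.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best validation loss: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement this epoch.
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses: improving at first, then drifting upward.
val_losses = [0.70, 0.60, 0.52, 0.50, 0.51, 0.53, 0.56, 0.60]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print("training would stop at epoch index", stop_epoch)
```

With a small patience the run ends shortly after the validation loss bottoms out, instead of continuing for the full 1000 epochs.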
CODE CHANGES: To include early stopping and input scaling.
from keras.callbacks import EarlyStopping
from sklearn.preprocessing import StandardScaler
# input scaling
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Early stopping
early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=1, mode='auto')
# create model - almost the same code
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu', name='first_input'))
model.add(Dense(500, activation='relu', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
# same compile step as in the question
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb, early_stop])
The Accuracy and loss graphs:

First, try adding some regularization (https://keras.io/regularizers/), for example with this code:
from keras import regularizers
model.add(Dense(12, input_dim=8,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
Also, reduce your network size: you don't need a hidden layer of 500 neurons. Try taking that layer out to decrease the representational power, and maybe remove another layer if it's still overfitting. Also, use only relu activations. You could also try increasing your dropout rate to something like 0.75 (although it's already high). You probably don't need to run it for so many epochs either; after long enough it will just begin to overfit.
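As a rough illustration of what those regularizers do, here is a plain-Python sketch of the penalty terms that get added to the loss: an L2 penalty is the factor times the sum of squared values, and an L1 penalty is the factor times the sum of absolute values. The weight values and the data loss below are made up for the example.

```python
def l2_penalty(values, factor=0.01):
    """L2 penalty: factor * sum of squared values."""
    return factor * sum(v * v for v in values)


def l1_penalty(values, factor=0.01):
    """L1 penalty: factor * sum of absolute values."""
    return factor * sum(abs(v) for v in values)


# Hypothetical layer weights and a hypothetical data loss.
weights = [0.5, -1.0, 2.0]
data_loss = 0.40

# The optimizer minimizes the data loss plus the penalty, so large
# weights make the total loss worse even if they fit the data well.
total_loss = data_loss + l2_penalty(weights)
print("L2 penalty:", l2_penalty(weights))
print("L1 penalty:", l1_penalty(weights))
print("total loss with L2:", total_loss)
```

Because larger weights cost more, the network is pushed toward smaller weights, which tends to reduce overfitting.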

For a dataset like the Diabetes one you can use a much simpler network. Try to reduce the number of neurons in your second layer. (Is there a specific reason why you chose tanh as the activation there?)
In addition, you can simply add an EarlyStopping callback to your training: https://keras.io/callbacks/

Related

Different results from binary and categorical crossentropy

I ran an experiment comparing binary_crossentropy and categorical_crossentropy. I am trying to understand the behavior of these two loss functions on the same problem.
I worked on a binary classification problem with this data.
In the first experiment, I used 1 neuron in the last layer with a sigmoid activation function and binary_crossentropy. I trained this model 10 times and took the average accuracy. The average accuracy is 74.12760416666666.
The code that I used for the first experiment is below.
total_acc = 0
for each_iter in range(0, 10):
    print(each_iter)
    X = dataset[:,0:8]
    y = dataset[:,8]
    # define the keras model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=32)
    # evaluate the keras model
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy*100))
    temp_acc = accuracy*100
    total_acc += temp_acc
    del model
In the second experiment, I used 2 neurons in the last layer with a softmax activation function and categorical_crossentropy. I converted my target y into categorical form, again trained this model 10 times, and took the average accuracy. The average accuracy is 66.92708333333334.
The code that I used for the second setting is below:
total_acc_v2 = 0
for each_iter in range(0, 10):
    print(each_iter)
    X = dataset[:,0:8]
    y = dataset[:,8]
    y = np_utils.to_categorical(y)
    # define the keras model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    # compile the keras model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=32)
    # evaluate the keras model
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy*100))
    temp_acc = accuracy*100
    total_acc_v2 += temp_acc
    del model
I think that these two experiments are identical and should give very similar results. What is the reason for this huge difference in accuracy?
The reason for this behaviour seems to be randomness. I ran your code and got an average accuracy of around 74 for the sigmoid model and around 74 for the softmax model.
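One way to see why the two setups should behave alike: a softmax over two outputs gives the same probability for class 1 as a sigmoid applied to the difference of the two logits. The following is a small plain-Python check of that identity; the logit values are arbitrary and the helper functions are written here just for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, as used by a single sigmoid output unit."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Softmax over a list of logits (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Arbitrary pairs of logits (z0 for class 0, z1 for class 1).
pairs = [(0.0, 1.5), (2.0, -1.0), (-0.3, -0.3)]
for z0, z1 in pairs:
    p_softmax = softmax([z0, z1])[1]   # probability of class 1 from softmax
    p_sigmoid = sigmoid(z1 - z0)       # same probability from a sigmoid
    print(z0, z1, round(p_softmax, 6), round(p_sigmoid, 6))
```

So the two output layers can represent the same predictions, and differences between runs are more plausibly due to random initialization and shuffling than to the loss function itself.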

Validation accuracy and validation loss almost remains constant in every epoch

I am making an autonomous farming robot for my final year project. I want it to move autonomously along the lanes inside farms. I am using only the images from a Raspberry Pi mounted at the front of my vehicle. I collect my data through the Pi and then send it to my computer for training.
Initially, I have only trained it to move in a straight line. As I have not used encoders on my motors, it may drift to one side, so I have to constantly give it feedback to stay on the right path.
A sample image is as follows (note that this is a black-and-white image): [image]
I have 836 images for training and 356 for validation. When I try to train it, my model accuracy does not improve much. I have tried different structures, from fully connected layers to different convolutional layers, but my training accuracy does not improve much, and most of the time the validation accuracy and validation loss remain the same.
I am confused about why this is so. Is it to do with my code, or should I apply computer vision techniques to the image so that the features are more prominently visible? What would be the best approach to tackle this problem?
My code is as follows:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
# fix dimension ordering issue
from keras import backend as K
import numpy as np
import glob
import pandas as pd
from sklearn.model_selection import train_test_split
K.set_image_dim_ordering('th')
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
def load_data(path):
    print("Loading training data...")
    training_data = glob.glob(path)[0]
    data=np.load(training_data)
    a=data['train']
    b=data['train_labels']
    s=np.concatenate((a, b), axis=1)
    data=pd.DataFrame(s)
    data=data.sample(frac=1)
    X = data.iloc[:,:-4]
    y=data.iloc[:,-4:]
    print("Image array shape: ", X.shape)
    print("Label array shape: ", y.shape)
    # normalize data
    # train validation split, 7:3
    return train_test_split(X, y, test_size=0.3)
data_path = "*.npz"
X_train,X_test,y_train,y_test=load_data(data_path)
# reshape to be [samples][channels][width][height]
X_train = X_train.values.reshape(X_train.shape[0], 1, 120, 320).astype('float32')
X_test = X_test.values.reshape(X_test.shape[0], 1, 120, 320).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0
# one hot encode outputs
num_classes = y_test.shape[1]
# define a simple CNN model
def baseline_model():
    model = Sequential()
    model.add(Conv2D(30, (5, 5), input_shape=(1, 120, 320), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=10)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))
Sample output (this is the best output, and it comes from the code above): [image]
I solved this problem by changing the structure of my algorithm and using NVIDIA's deep learning car algorithm. The algorithm is very robust and also applies some basic computer vision. You can easily find sample implementations for toy cars on Medium and YouTube.
This article was really helpful for me:
https://towardsdatascience.com/deeppicar-part-1-102e03c83f2c
Additionally, this resource was also very helpful:
https://zhengludwig.wordpress.com/projects/self-driving-rc-car/

Handwritten digits recognition with keras

I am trying to learn Keras. I see machine learning code for recognizing handwritten digits here (also given here). It seems to have the feedforward, SGD and backpropagation methods written from scratch. I just want to know whether it is possible to write this program using Keras. A starting step in that direction would be appreciated.
You can use this first to understand how an MLP works on the MNIST dataset: Keras MNIST tutorial. As you proceed, you can look into how a CNN works on the MNIST dataset.
I will describe a bit of the process of the Keras code that you have attached to your comment:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
# Step 1: Organize Data
batch_size = 128 # The 60k images are split into batches of 128; normally people use 100. It's up to you
num_classes = 10 # Size of your final layer: the digits 0 - 9 (10 classes)
epochs = 20 # 20 passes over the training set. You can increase or decrease this to see the change in accuracy. Normally MNIST accuracy peaks at around 10-20 epochs.
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data() # x_train - your training images, y_train - training labels; x_test - test images, y_test - test labels. MNIST comes with 60k training images and 10k test images.
x_train = x_train.reshape(60000, 784) # Each MNIST image is 28x28 pixels, so you are flattening each one into an array of 28x28 = 784 values. 60k train images
x_test = x_test.reshape(10000, 784) # Likewise, 10k test images
x_train = x_train.astype('float32') # Convert to floating point
x_test = x_test.astype('float32')
x_train /= 255 # For normalization. Each pixel has a 'degree' of darkness within the range 0-255, so you reduce that range to 0 - 1 for your neural network
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes) # One-hot encoding. The label 5 (for example) becomes [0, 0, 0, 0, 0, 1, 0, 0, 0, 0], which is the shape the final layer predicts.
y_test = keras.utils.to_categorical(y_test, num_classes)
# Step 2: Create MLP model
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,))) # First hidden layer, 512 neurons, relu activation, input is an array of 784 values
model.add(Dropout(0.2)) # During training, each output of the previous layer is 'switched off' with 20% probability
model.add(Dense(512, activation='relu')) # Second hidden layer, same as above
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax')) # Final layer, 10 neurons; softmax turns the outputs into a probability for each digit
model.summary()
# Step 3: Create model compilation
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
# 10 classes - categorical_crossentropy. If 2 classes, you can use binary_crossentropy; optimizer - RMSprop, you can change this to Adam, SGD, etc.; metrics - accuracy
# Step 4: Train model
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
# Training happens here. Train batch by batch for 20 epochs, then validate your result on the test set.
# Step 5: See results on your test data
score = model.evaluate(x_test, y_test, verbose=0)
# Prints out scores
print('Test loss:', score[0])
print('Test accuracy:', score[1])
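If the one-hot encoding step and the softmax output are unclear, this small plain-Python sketch shows the idea without Keras: a label becomes a vector with a single 1, and a prediction is read back by taking the position of the largest probability. The helper names and the probability values are made up for illustration.

```python
def to_one_hot(label, num_classes):
    """Turn an integer class label into a one-hot list."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def predicted_class(probabilities):
    """Return the index of the largest probability (the argmax)."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# The digit 5 encoded for a 10-class problem.
encoded = to_one_hot(5, 10)
print(encoded)

# A hypothetical softmax output for one image: most of the mass is on index 5.
probs = [0.01, 0.01, 0.02, 0.03, 0.02, 0.85, 0.02, 0.02, 0.01, 0.01]
print(predicted_class(probs))
```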

Layer Counting with Keras Deep Learning

I am working on my first deep-learning project: counting layers in an image with a convolutional neural network.
After fixing tons of errors, I could finally train my model. However, I am getting 0 accuracy; after the 2nd epoch it just stops because it is not learning anything.
The input will be a 1200 x 100 image of layers and the output will be an integer.
If anyone could look over my model and suggest a tip, that would be awesome.
Thanks.
import keras
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD
model = Sequential()
model.add(Convolution2D(32, 5, 5, activation='relu', input_shape=(1,1200,100)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, 5, 5, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1, activation='relu'))
batch_size = 1
epochs = 10
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(sgd, loss='poisson', metrics=['accuracy'])
earlyStopping=keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
history = model.fit(xtrain, ytrain, batch_size=batch_size, nb_epoch=epochs, validation_data=validation, callbacks=[earlyStopping], verbose=1)
There are quite a few things to criticise:
1. A 1200*100 image (I assume those are pixels) is quite big for a CNN. In ImageNet competitions, images are typically 224*224 or 299*299.
2. Why don't you use a linear or sigmoid activation on the last layer?
3. Did you normalize your outputs to between 0 and 1? To normalize, just divide your outputs by their maximum, and multiply predictions by the same number when using your CNN after training.
4. Don't use this with small data, it is unnecessary:
earlyStopping=keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
5. Switch to Adam and lower the learning rate to 0.001.
Your data isn't actually big, so it should work. Your problem is probably in the normalization of your outputs/inputs, so check those.
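The output normalization suggested above could look roughly like this in plain Python: divide the targets by their maximum before training, then multiply predictions by the same number afterwards. The layer counts and the prediction value are invented for the example, and the function names are just illustrative.

```python
def fit_target_scale(targets):
    """Use the largest training target as the scaling constant."""
    return float(max(targets))


def scale_targets(targets, scale):
    """Map the targets into the range 0..1 for training."""
    return [t / scale for t in targets]


def unscale_prediction(prediction, scale):
    """Map a network output in 0..1 back to the original units."""
    return prediction * scale


# Hypothetical layer counts used as training targets.
counts = [3, 7, 12, 5]
scale = fit_target_scale(counts)
scaled = scale_targets(counts, scale)
print(scaled)

# A hypothetical network output of 0.5 corresponds to 6 layers.
restored = round(unscale_prediction(0.5, scale))
print(restored)
```

Remember to keep the scaling constant from the training set around, since the same value has to be used at prediction time.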

Why is binary_crossentropy more accurate than categorical_crossentropy for multiclass classification in Keras?

I'm learning how to create convolutional neural networks using Keras. I'm trying to get a high accuracy for the MNIST dataset.
Apparently categorical_crossentropy is for more than 2 classes and binary_crossentropy is for 2 classes. Since there are 10 digits, I should be using categorical_crossentropy. However, after training and testing dozens of models, binary_crossentropy consistently outperforms categorical_crossentropy significantly.
On Kaggle, I got 99+% accuracy using binary_crossentropy and 10 epochs. Meanwhile, I can't get above 97% using categorical_crossentropy, even using 30 epochs (which isn't much, but I don't have a GPU, so training takes forever).
Here's what my model looks like now:
model = Sequential()
model.add(Convolution2D(100, 5, 5, border_mode='valid', input_shape=(28, 28, 1), init='glorot_uniform', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(100, 3, 3, init='glorot_uniform', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(100, init='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(100, init='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, init='glorot_uniform', activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=['accuracy'])
Short answer: it is not.
To see that, simply try to calculate the accuracy "by hand", and you will see that it is different from the one reported by Keras with the model.evaluate method:
# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0)
score[1]
# 0.99794011611938471
# Actual accuracy calculated manually:
import numpy as np
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98999999999999999
The reason it seems to be so is a rather subtle issue with how Keras actually guesses which accuracy to use, depending on the loss function you have selected, when you simply include metrics=['accuracy'] in your model compilation.
If you check the source code, Keras does not define a single accuracy metric, but several different ones, among them binary_accuracy and categorical_accuracy. What happens under the hood is that, since you have selected binary cross entropy as your loss function and have not specified a particular accuracy metric, Keras (wrongly...) infers that you are interested in the binary_accuracy, and this is what it returns.
To avoid that, i.e. to indeed use binary cross entropy as your loss function (nothing wrong with this, in principle) while still getting the categorical accuracy required by the problem at hand (i.e. MNIST classification), you should ask explicitly for categorical_accuracy in the model compilation as follows:
from keras.metrics import categorical_accuracy
model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=[categorical_accuracy])
And after training, scoring, and predicting the test set as I show above, the two metrics now are the same, as they should be:
sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000 == score[1]
# True
(HT to this great answer to a similar problem, which helped me understand the issue...)
UPDATE: After my post, I discovered that this issue had already been identified in this answer.
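To make the difference between the two metrics concrete, here is a small plain-Python sketch (not the Keras source) of both on one-hot labels: the binary version scores every output element against a 0.5 threshold, while the categorical version only checks whether the argmax matches. The labels and predictions are made up.

```python
def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted argmax matches the true class."""
    hits = 0
    for t, p in zip(y_true, y_pred):
        if t.index(max(t)) == p.index(max(p)):
            hits += 1
    return hits / len(y_true)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual output elements on the right side of the threshold."""
    hits = 0
    count = 0
    for t, p in zip(y_true, y_pred):
        for ti, pi in zip(t, p):
            if (pi > threshold) == (ti == 1):
                hits += 1
            count += 1
    return hits / count


# Three samples, four classes; the second sample is misclassified.
y_true = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
y_pred = [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]

print("categorical accuracy:", categorical_accuracy(y_true, y_pred))
print("binary accuracy:", binary_accuracy(y_true, y_pred))
```

Even the misclassified sample gets two of its four elements "right" under the binary metric (the zeros that stay below the threshold), which is why the binary figure comes out higher than the real classification accuracy.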
First of all, binary_crossentropy is not only for the case where there are two classes.
The "binary" in the name is there because it is designed for binary outputs: each number of the softmax output is aimed at being 0 or 1.
Here, it checks each number of the output separately.
That doesn't explain your result, though, since categorical_crossentropy exploits the fact that it is a classification problem.
Are you sure that when you read your data there is one and only one class per sample? It's the only explanation I can give.

Resources