Different results from binary and categorical crossentropy - machine-learning

I ran an experiment comparing the use of binary_crossentropy and categorical_crossentropy, trying to understand the behavior of these two loss functions on the same problem.
I worked on a binary classification problem with this data.
In the first experiment, I used 1 neuron in the last layer with a sigmoid activation function and binary_crossentropy. I trained this model 10 times and took the average accuracy. The average accuracy is 74.12760416666666.
The code that I used for the first experiment is below.
from keras.models import Sequential
from keras.layers import Dense

# dataset is assumed to be loaded elsewhere as a NumPy array with
# 8 feature columns and a binary target in column 8
total_acc = 0
for each_iter in range(0, 10):
    print(each_iter)
    X = dataset[:, 0:8]
    y = dataset[:, 8]
    # define the keras model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=32)
    # evaluate the keras model
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy * 100))
    temp_acc = accuracy * 100
    total_acc += temp_acc
    del model
In the second experiment, I used 2 neurons in the last layer with a softmax activation function and categorical_crossentropy. I converted my target `y` to categorical, and again I trained this model 10 times and took the average accuracy. The average accuracy is 66.92708333333334.
The code that I used for the second setting is below:
from keras.utils import np_utils

total_acc_v2 = 0
for each_iter in range(0, 10):
    print(each_iter)
    X = dataset[:, 0:8]
    y = dataset[:, 8]
    y = np_utils.to_categorical(y)
    # define the keras model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    # compile the keras model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=32)
    # evaluate the keras model
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy * 100))
    temp_acc = accuracy * 100
    total_acc_v2 += temp_acc
    del model
I think that these two experiments are identical and should give very similar results. What is the reason for this huge difference in accuracy?

It seems the reason for this behaviour is randomness. I ran your code and got around 74 average accuracy for the sigmoid model, and around 74 for the softmax model as well.
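If you want to rule out run-to-run randomness when comparing the two setups, you can seed both NumPy and the backend before each run. Below is a minimal sketch of that idea (assuming a TensorFlow 2 backend; on TF1 the call is tf.set_random_seed). build_model is a hypothetical function that returns a freshly compiled model for whichever of the two setups you want to test.
import numpy as np
import tensorflow as tf

def run_experiment(build_model, X, y, n_runs=10, epochs=150):
    accs = []
    for i in range(n_runs):
        np.random.seed(i)      # seed NumPy (shuffling, any NumPy-based initialization)
        tf.random.set_seed(i)  # seed the backend (weight initialization)
        model = build_model()  # hypothetical: returns a compiled Keras model
        model.fit(X, y, epochs=epochs, batch_size=32, verbose=0)
        _, acc = model.evaluate(X, y, verbose=0)
        accs.append(acc * 100)
    return np.mean(accs)
With the same seeds per iteration, the sigmoid/binary_crossentropy and softmax/categorical_crossentropy runs start from comparable conditions, so any remaining gap is more likely to be real than noise.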

Related

Confusion Matrix for Binary Classification with NN

I'm building 2 neural network models (binary classification) for my finals. I'm using a confusion matrix for evaluation, but it always gives me only one label as output.
Can anyone tell me where I made a mistake? Here's the code:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix

# Start neural network
network = Sequential()
# Add fully connected layer with a ReLU activation function
network.add(Dense(units=2, activation='relu', input_shape=(2,)))
# Add fully connected layer with a ReLU activation function
network.add(Dense(units=4, activation='relu'))
# Add fully connected layer with a sigmoid activation function
network.add(Dense(units=1, activation='sigmoid'))
# Compile neural network
network.compile(loss='binary_crossentropy',  # Cross-entropy
                optimizer='rmsprop',         # Root Mean Square Propagation
                metrics=['accuracy'])        # Accuracy performance metric
# Train neural network
history = network.fit(X_train,               # Features
                      y_train,               # Target vector
                      epochs=3,              # Number of epochs
                      verbose=1,             # Print description after each epoch
                      batch_size=10,         # Number of observations per batch
                      validation_data=(X_val, y_val))  # Data for evaluation
y_pred = network.predict(X_test)
# y_test = y_test.astype(int).tolist()
y_pred = np.argmax(y_pred, axis=1).tolist()
cm = confusion_matrix(y_test, y_pred)
print(cm)
I've got 93% accuracy from the validation, but the confusion matrix gave me this output:
Your network has a single sigmoid output, which means that to get your classification you should round the result, not take an argmax.
y_pred = np.round(y_pred).tolist()
If you were to print your y_pred you would notice that it is an N x 1 matrix, so argmax returns 0 for every single row (as that is the only column there is).
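To see the difference, here is a tiny standalone sketch with made-up sigmoid outputs (the values are purely illustrative):
import numpy as np

# Hypothetical sigmoid outputs for 4 samples: shape (4, 1)
y_pred = np.array([[0.1], [0.8], [0.4], [0.9]])

print(np.argmax(y_pred, axis=1))  # [0 0 0 0] -- only one column, so argmax is always 0
print(np.round(y_pred).ravel())   # [0. 1. 0. 1.] -- the intended class labels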

Combining Classification and Regression in a sequential way using MLP

I am looking for a way to do classification and regression sequentially.
For example, assume samples have 3 input values and 1 output value. The model should first classify using the 3 input values and then do the regression task using the classification output (i.e. the classification takes the 3 input values from the original samples, and the regression takes 4 input values: the 3 original values plus the classification output).
Below is the architecture that I drew. However, I'm not really sure about the part where the second input layer occurs. Could someone give advice or a working example for this application?
input1_classification = Input(shape=(3,))
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(input1_classification)
# classification
out_classification = Dense(2, activation='softmax')(hidden1)
# regression input
input1_regression = Input(shape=(5,))
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(out_classification)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras model
model.compile(loss=['sparse_categorical_crossentropy', 'mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class, y_train_reg], epochs=150, batch_size=32, verbose=2)
All you need to do is concatenate your original input with the output of the classification and apply your regression model there; you do not specify "extra" inputs.
So it will become something along the lines of:
from keras.layers import Input, Dense, Concatenate
from keras.models import Model

input1_classification = Input(shape=(3,))
# classification
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(input1_classification)
out_classification = Dense(2, activation='softmax')(hidden1)
# regression input: the original features concatenated with the classification output
new_input = Concatenate(axis=1)([input1_classification, out_classification])
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(new_input)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras model
model.compile(loss=['sparse_categorical_crossentropy', 'mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class, y_train_reg], epochs=150, batch_size=32, verbose=2)

Handwritten digits recognition with keras

I am trying to learn Keras. I saw machine learning code for recognizing handwritten digits here (also given here). It has feedforward, SGD and backpropagation methods written from scratch. I just want to know whether it is possible to write this program using Keras. A starting step in that direction would be appreciated.
You can start with the Keras MNIST tutorial to understand how an MLP works on the MNIST dataset first. As you proceed, you can look into how a CNN works on the same dataset.
I will describe a bit of the process in the Keras code that you attached to your comment.
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

# Step 1: Organize Data
batch_size = 128 # This splits the 60k training images into batches of 128; normally people use 100. It's up to you.
num_classes = 10 # Your final layer. Basically number 0 - 9 (10 classes)
epochs = 20 # 20 'runs'. You can increase or decrease to see the change in accuracy. Normally MNIST accuracy peaks at around 10-20 epochs.
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data() # x_train - your training images, y_train - training labels; x_test - test images, y_test - test labels. Keras gives 60k train images and 10k test images.
x_train = x_train.reshape(60000, 784) # Each MNIST image is 28x28 pixels. So you are flattening into a 28x28 = 784 array. 60k train images
x_test = x_test.reshape(10000, 784) # Likewise, 10k test images
x_train = x_train.astype('float32') # For float numbers
x_test = x_test.astype('float32')
x_train /= 255 # For normalization. Each image has a 'degree' of darkness within the range of 0-255, so you want to reduce that range to 0 - 1 for your Neural Network
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes) # One-hot encoding. So when your NN is trained, your prediction for 5(example) will look like this [0000010000] (Final layer).
y_test = keras.utils.to_categorical(y_test, num_classes)
# Step 2: Create MLP model
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,))) # First hidden layer: 512 neurons, relu activation, input is the 784-element array
model.add(Dropout(0.2)) # During training, each neuron in this layer has a 20% probability of being 'switched off'
model.add(Dense(512, activation='relu')) # Same as above
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax')) # Final layer: 10 neurons; softmax turns the outputs into a probability distribution over the 10 classes
model.summary()
# Step 3: Create model compilation
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
# 10 classes - categorical_crossentropy. If 2 classes, you can use binary_crossentropy; optimizer - RMSprop, you can change this to ADAM, SGD, etc...; metrics - accuracy
# Step 4: Train model
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
# Training happens here. Train in batches of 128 for 20 epochs, then validate the result on the test set.
# Step 5: See results on your test data
score = model.evaluate(x_test, y_test, verbose=0)
# Prints out scores
print('Test loss:', score[0])
print('Test accuracy:', score[1])
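As a next step, you can turn the trained model's softmax outputs into an actual digit prediction; a minimal sketch, taking the first test image as an example:
import numpy as np

probs = model.predict(x_test[:1])           # shape (1, 10): softmax probabilities for one image
predicted_digit = np.argmax(probs, axis=1)  # index of the highest probability = predicted digit
print('Predicted digit:', predicted_digit[0])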

Layer Counting with Keras Deep Learning

I am working on my first deep-learning project: counting layers in an image with a convolutional neural network.
After fixing tons of errors, I could finally train my model. However, I am getting 0 accuracy; after the 2nd epoch it just stops because it is not learning anything.
The input will be a 1200 x 100 image of layers and the output will be an integer.
If anyone can look over my model and suggest a tip, that would be awesome.
Thanks.
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD

model = Sequential()
model.add(Conv2D(32, (5, 5), activation='relu', input_shape=(1, 1200, 100)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1, activation='relu'))
batch_size = 1
epochs = 10
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(sgd, loss='poisson', metrics=['accuracy'])
earlyStopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
history = model.fit(xtrain, ytrain, batch_size=batch_size, epochs=epochs, validation_data=validation, callbacks=[earlyStopping], verbose=1)
There are quite a few things to criticise:
1. A 1200x100 image (I assume those are pixels) is quite big for a CNN. In ImageNet competitions, images are typically 224x224 or 299x299.
2. Why don't you use a linear or sigmoid activation on the last layer?
3. Did you normalize your outputs between 0 and 1? Normalize them: divide your outputs by their maximum, and multiply predictions by the same number when using your CNN after training.
4. Don't use early stopping with this little data, it's unnecessary:
earlyStopping=keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
5. Lower your learning rate to 0.001 and use Adam.
Your data isn't actually big, so this should work; your problem is probably in the normalization of your outputs/inputs, so check them. A sketch applying these suggestions is below.
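A minimal sketch applying the suggestions above: a linear output instead of relu, targets scaled to [0, 1], mse instead of poisson, and Adam with lr=0.001. The channels_last input shape (1200, 100, 1) is an assumption; adjust it if your images are stored channels_first.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam

y_max = ytrain.max()          # scale the integer layer counts to [0, 1]
ytrain_norm = ytrain / y_max  # remember to scale any validation targets the same way

model = Sequential()
model.add(Conv2D(32, (5, 5), activation='relu', input_shape=(1200, 100, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1, activation='linear'))  # linear output for a count/regression target

model.compile(optimizer=Adam(lr=0.001), loss='mse')
model.fit(xtrain, ytrain_norm, batch_size=8, epochs=10)
# Multiply predictions by y_max afterwards to recover the layer count.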

How to avoid overfitting on a simple feed forward network

Using the Pima Indians diabetes dataset, I'm trying to build an accurate model using Keras. I've written the following code:
# Visualize training history
from keras import callbacks
from keras.layers import Dropout
tb = callbacks.TensorBoard(log_dir='/.logs', histogram_freq=10, batch_size=32,
                           write_graph=True, write_grads=True, write_images=False,
                           embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)
# Visualize training history
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu', name='first_input'))
model.add(Dense(500, activation='tanh', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
# Compile model
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# Fit the model
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb])
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
After several tries, I added dropout layers in order to avoid overfitting, but with no luck. The following graph shows that the validation loss and training loss diverge at one point.
What else could I do to optimize this network?
UPDATE:
Based on the comments I got, I've tweaked the code like so:
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01), activation='relu',
                name='first_input'))  # added regularizers
model.add(Dense(8, activation='relu', name='first_hidden')) # reduced to 8 neurons
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(5, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
Here are the graphs for 500 epochs
The first example gave a validation accuracy > 75% and the second one gave an accuracy < 65%; if you compare the losses for epochs below 100, the loss is less than 0.5 for the first one and greater than 0.6 for the second. So how is the second case better?
The second one, to me, is a case of under-fitting: the model doesn't have enough capacity to learn. The first case has a problem of over-fitting, because its training was not stopped when overfitting started (no early stopping). If the training had been stopped at, say, epoch 100, it would be the far better model of the two.
The goal should be to obtain a small prediction error on unseen data, and for that you increase the capacity of the network up to the point beyond which overfitting starts to happen.
So how do you avoid over-fitting in this particular case? Adopt early stopping.
CODE CHANGES: To include early stopping and input scaling.
from sklearn.preprocessing import StandardScaler
from keras.callbacks import EarlyStopping

# input scaling
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Early stopping
early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=1, mode='auto')
# create model - almost the same code
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu', name='first_input'))
model.add(Dense(500, activation='relu', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb, early_stop])
The Accuracy and loss graphs:
First, try adding some regularization (https://keras.io/regularizers/) like with this code:
model.add(Dense(12, input_dim=8,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
Also, make sure to decrease your network size, i.e. you don't need a hidden layer of 500 neurons - try just taking that out to decrease the representation power, and maybe even another layer if it's still overfitting. Also, only use relu activations. Maybe also try increasing your dropout rate to something like 0.75 (although it's already high). You probably also don't need to run it for so many epochs - it will just begin to overfit after long enough. A reduced-size sketch is below.
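A minimal sketch of a reduced network along these lines (the layer sizes are illustrative, not prescriptive):
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu',
                kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(0.5))
model.add(Dense(8, activation='relu'))  # much smaller than the original 500-neuron layer
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])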
For a dataset like the diabetes one you can use a much simpler network. Try to reduce the number of neurons in your second layer. (Is there a specific reason why you chose tanh as the activation there?)
In addition, you can simply add an EarlyStopping callback to your training: https://keras.io/callbacks/
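A minimal sketch of wiring that callback in (restore_best_weights only exists in newer Keras versions; omit it on older ones):
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1, restore_best_weights=True)
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10,
                    verbose=0, callbacks=[early_stop])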
