Keras accuracy metrics differ from manual computation


I am working on a binary classification problem in Keras. The loss function is binary_crossentropy and the metric is metrics=['accuracy']. Since the two classes are imbalanced, I pass class_weight='auto' when fitting the model to the training set.
To check performance, I print the accuracy with
print GNN.model.test_on_batch([test_sample_1, test_sample_2], test_label)[1]
The output is 0.973. However, I get a different result when I compute the prediction accuracy with the following lines:
predict_label = GNN.model.predict([test_sample_1, test_sample_2])
rounded = predict_label.round(1)
print (rounded == test_label).sum()/float(rounded.shape[0])
which is 0.953.
So I am wondering how metrics=['accuracy'] evaluates model performance and why the two results differ.
For details, the model definition is below.
input_size = self.n_feature
encoder_size = 2000
dropout_rate = 0.5
X1 = Input(shape=(input_size, ), name='input_1')
X2 = Input(shape=(input_size, ), name='input_2')
encoder = Sequential()
encoder.add(Dropout(dropout_rate, input_shape=(input_size, )))
encoder.add(Dense(encoder_size, activation='tanh'))
encoded_1 = encoder(X1)
encoded_2 = encoder(X2)
merged = concatenate([encoded_1, encoded_2])
comparer = Sequential()
comparer.add(Dropout(dropout_rate, input_shape=(encoder_size * 2, )))
comparer.add(Dense(500, activation='relu'))
comparer.add(Dropout(dropout_rate))
comparer.add(Dense(200, activation='relu'))
comparer.add(Dropout(dropout_rate))
comparer.add(Dense(1, activation='sigmoid'))
Y = comparer(merged)
model = Model(inputs=[X1, X2], outputs=Y)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
self.model = model
I train the model with
self.hist = self.model.fit(
    x=[train_sample_1, train_sample_2],
    y=train_label,
    class_weight='auto',
    validation_split=0.1,
    batch_size=batch_size,
    epochs=epochs,
    callbacks=callbacks)
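One difference between the two computations can be sketched in plain Python. As far as I know, Keras' binary accuracy thresholds the predicted probability at 0.5, whereas predict_label.round(1) rounds to one decimal place, so a prediction such as 0.8 is never equal to the label 1. The helper names and the sample values below are made up for illustration; this is not the Keras implementation and it may not be the only cause of the gap.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    preds = [1 if p > threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def one_decimal_match_rate(y_true, y_prob):
    """Mimics (predict_label.round(1) == test_label): rounds to ONE decimal place."""
    return sum(int(round(p, 1) == t) for p, t in zip(y_prob, y_true)) / len(y_true)


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.97, 0.02, 0.80, 0.96, 0.30]

print(binary_accuracy(y_true, y_prob))         # all five are on the correct side of 0.5
print(one_decimal_match_rate(y_true, y_prob))  # 0.80 and 0.30 no longer match their labels
```

Under that assumption, rounding to zero decimals (round() or round(0)) in the manual computation would be the closer analogue of the metric.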

Related

Poor predictions on second dataset from trained LSTM model

I've trained an LSTM model with 8 features and 1 output. I split one dataset into two separate files: I train and test on the first half, then use the trained model to predict the second half. The model predicts the training and test sets from the first half pretty well (RMSE of around 5-7), but when I predict on the second half I get very poor predictions (RMSE of around 50-60). How can I get my trained model to predict well on outside datasets?
dataset at this link
file = r'/content/drive/MyDrive/only_force_pt1.csv'
df = pd.read_csv(file)
df.head()
X = df.iloc[:, 1:9]
y = df.iloc[:,9]
print(X.shape)
print(y.shape)
plt.figure(figsize = (20, 6), dpi = 100)
plt.plot(y)
WINDOW_LEN = 50
def window_size(size, inputdata, targetdata):
    X = []
    y = []
    i = 0
    while (i + size) <= len(inputdata) - 1:
        X.append(inputdata[i: i+size])
        y.append(targetdata[i+size])
        i += 1
    assert len(X) == len(y)
    return (X, y)
X_series, y_series = window_size(WINDOW_LEN, X, y)
print(len(X))
print(len(X_series))
print(len(y_series))
X_train, X_val, y_train, y_val = train_test_split(np.array(X_series),np.array(y_series),test_size=0.3, shuffle = True)
X_val, X_test,y_val, y_test = train_test_split(np.array(X_val),np.array(y_val),test_size=0.3, shuffle = False)
n_timesteps, n_features, n_outputs = X_train.shape[1], X_train.shape[2],1
[verbose, epochs, batch_size] = [1, 300, 32]
input_shape = (n_timesteps, n_features)
model = Sequential()
# LSTM
model.add(LSTM(64, input_shape=input_shape, return_sequences = False))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
#model.add(Dropout(0.2))
model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dense(1, activation='relu'))
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience = 30, verbose =1, mode = 'auto')
model.summary()
model.compile(loss = 'mse', optimizer = Adam(learning_rate = 0.001), metrics=[tf.keras.metrics.RootMeanSquaredError()])
history = model.fit(X_train, y_train, batch_size = batch_size, epochs = epochs, verbose = verbose, validation_data=(X_val,y_val), callbacks = [earlystopper])
Second dataset:
tests = r'/content/drive/MyDrive/only_force_pt2.csv'
df_testing = pd.read_csv(tests)
X_testing = df_testing.iloc[:4038,1:9]
torque = df_testing.iloc[:4038,9]
print(X_testing.shape)
print(torque.shape)
plt.figure(figsize = (20, 6), dpi = 100)
plt.plot(torque)
X_testing = X_testing.to_numpy()
X_testing_series, y_testing_series = window_size(WINDOW_LEN, X_testing, torque)
X_testing_series = np.array(X_testing_series)
y_testing_series = np.array(y_testing_series)
scores = model.evaluate(X_testing_series, y_testing_series, verbose =1)
X_prediction = model.predict(X_testing_series, batch_size = 32)

If your model works well on the training data but performs badly on held-out data, it has not learned the "true" relationship between the input and output variables; it has simply memorized the outputs that go with your training inputs. There are several things you can do about this:
Typically you would use 80% of your data for training and 20% for testing. This presents more data to the model, which should help it learn more of the true underlying function.
If your model is too complex, some of its neurons will just be used to memorize input-output pairs. Try reducing the complexity of the model (fewer layers, fewer neurons) so that the remaining capacity is used to learn rather than memorize.
Look into more detail on training performance here
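The 80/20 split suggested above could look like the sketch below. It keeps the windows in their original order instead of shuffling them, which is my own assumption rather than something the answer states: with overlapping sliding windows, a shuffled split can put nearly identical windows into both sets and make the held-out score look better than it is. The function name split_in_order is hypothetical.

```python
def split_in_order(samples, targets, train_fraction=0.8):
    """Split paired sequences at a single cut point, preserving their order."""
    assert len(samples) == len(targets)
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:], targets[:cut], targets[cut:]


# Stand-in data: ten "windows" and their targets (illustrative values only).
windows = list(range(10))
targets = [w * 10 for w in windows]

X_train, X_test, y_train, y_test = split_in_order(windows, targets)
print(len(X_train), len(X_test))  # sizes of the two parts
```

The same slicing works on NumPy arrays such as np.array(X_series).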

How to improve Training and Test accuracy

I am pretty new to deep learning. I am experimenting with fine-tuning pretrained models on my own dataset, but I am not able to improve the training and test accuracy. Both losses hover around 62 from the beginning of training to the end. I am using Xception as the pretrained model, combined with GlobalAveragePooling2D, a dense layer, and dropout of 0.2.
The dataset consists of 3522 training images and 881 test images, each belonging to 2 classes. I cannot add any more images; this is the maximum I could collect. I tried ImageDataGenerator, but it did not help. Images of the two classes look somewhat similar. Given these constraints, can I increase the accuracy?
Code:
base_model = Xception(include_top=False, weights='imagenet')
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation="relu")(x)
x = Dropout(0.2)(x)
predictions = Dense(2, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
num_training_img=3522
num_test_img=881
stepsPerEpoch = num_training_img/batch_size
validationSteps= num_test_img/batch_size
history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=stepsPerEpoch,
    epochs=20,
    validation_data=test_data_gen,
    validation_steps=validationSteps
)
layer_num = len(model.layers)
for layer in model.layers[:129]:
    layer.trainable = False
for layer in model.layers[129:]:
    layer.trainable = True
# update the weights
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='binary_crossentropy', metrics=['accuracy'])
num_training_img=3522
num_test_img=881
stepsPerEpoch = num_training_img/batch_size
validationSteps= num_test_img/batch_size
history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=stepsPerEpoch,
    epochs=20,
    validation_data=test_data_gen,
    validation_steps=validationSteps
)

You should make the layers non-trainable before creating the model.
base_model = Xception(include_top=False, weights='imagenet')
for layer in base_model.layers:
    layer.trainable = False
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation="relu")(x)
x = Dropout(0.2)(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
Your last layer has 2 units, which suggests that softmax is a better fit.
predictions = Dense(2, activation='softmax')(x)
Try the Adam optimizer and change the loss to categorical_crossentropy.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
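To illustrate why softmax suits a 2-unit output layer, here is a minimal sketch in plain Python (not Keras code). The two logit values are arbitrary. Sigmoid scores each unit independently, so the two outputs need not sum to 1, while softmax normalizes them into a single distribution over the two classes, which is what categorical_crossentropy with one-hot labels expects.

```python
import math


def sigmoid(z):
    """Element-wise logistic function; each output is scored independently."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Normalize logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0]  # arbitrary example values for the two output units

sigmoid_out = [sigmoid(z) for z in logits]
softmax_out = softmax(logits)

print(sigmoid_out, sum(sigmoid_out))  # independent scores; the sum is not 1
print(softmax_out, sum(softmax_out))  # a distribution over the two classes
```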

Tackling the validation accuracy for a small validation set

I am training a neural network to classify between two classes. I got the training set from the internet and generated a small dataset from our internal experiments. I want the model to learn from the internet data (about 100k samples, used as the training set) and to use our own data (600 samples) as the validation set, so that a model learned on the open domain can be adjusted to our domain.
I trained the model and ran a couple of experiments, but I am getting only around 60% accuracy on the validation set. I need suggestions on how to improve the validation accuracy.
Experiment No. 1
Code :
model_without_emb = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=160, weights=[embeddings_matrix], trainable=True),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv1D(64, 5, activation='relu'),
    tf.keras.layers.MaxPooling1D(pool_size=4),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation='sigmoid')])
model_without_emb.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model_without_emb.summary()
es = EarlyStopping(monitor='val_acc', mode='max', verbose=1, patience=25)
num_epochs = 50
history_without_emb = model_without_emb.fit(train_sequences, train_labels,
    epochs=num_epochs, validation_data=(test_sequences, test_labels),
    verbose=1, callbacks=[es])
Experiment 2 : (reducing the number of units)
Code :
model_without_emb = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=160, weights=[embeddings_matrix], trainable=False),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Conv1D(6, 3, activation='relu'),
    tf.keras.layers.MaxPooling1D(pool_size=4),
    tf.keras.layers.LSTM(6),
    tf.keras.layers.Dense(1, activation='sigmoid')])
opt = tf.compat.v1.keras.optimizers.Adam(learning_rate=0.000005,
    beta_1=0.9, beta_2=0.999, epsilon=1e-07, decay=0.0, amsgrad=True)
model_without_emb.compile(loss='binary_crossentropy', optimizer=opt,
    metrics=['accuracy'])
Experiment 3 ( removing Convolution layer)
Code :
model_without_emb = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=160, weights=[embeddings_matrix], trainable=False),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1, activation='sigmoid')])
opt = tf.compat.v1.keras.optimizers.Adam(learning_rate=0.000005,
    beta_1=0.9, beta_2=0.999, epsilon=1e-07, decay=0.0, amsgrad=True)
model_without_emb.compile(loss='binary_crossentropy',
    optimizer=opt, metrics=['accuracy'])

CIFAR-10 test set classification accuracy different on PyTorch and Keras

I’ve made a custom CNN in PyTorch for classifying the 10 classes of the CIFAR-10 dataset. My classification accuracy on the test set is 45.739%. This is very low, and I thought it was because my model is not very deep, but I implemented the same model in Keras and the classification accuracy comes out to 78.92% on the test set. There is no problem in Keras, so I think I'm missing something in my PyTorch program.
I have used the same model architecture, strides, padding, dropout rate, optimizer, loss function, learning rate, batch size, and number of epochs in both PyTorch and Keras. Despite that, the difference in classification accuracy is still huge, so I cannot decide how to debug my PyTorch program further.
For now I suspect three things. In Keras I use the categorical cross-entropy loss (one-hot label vectors), and in PyTorch I use the standard cross-entropy loss (scalar index labels); can this be a problem? If not, then I suspect either my training loop or the code that calculates the classification accuracy in PyTorch. I have attached both programs below and would be grateful for any suggestions.
My program in Keras:
#================Function that defines the CNN model===========
def CNN_model():
    model = Sequential()
    model.add(Conv2D(32,(3,3),activation='relu',padding='same', input_shape=(size,size,channels))) #SAME PADDING
    model.add(Conv2D(32,(3,3),activation='relu')) #VALID PADDING
    model.add(MaxPooling2D(pool_size=(2,2))) #VALID PADDING
    model.add(Dropout(0.25))
    model.add(Conv2D(64,(3,3),activation='relu', padding='same')) #SAME PADDING
    model.add(Conv2D(64,(3,3),activation='relu')) #VALID PADDING
    model.add(MaxPooling2D(pool_size=(2,2))) #VALID PADDING
    model.add(Dropout(0.25))
    model.add(Conv2D(128,(3,3),activation='relu', padding='same')) #SAME PADDING
    model.add(Conv2D(128,(3,3),activation='relu')) #VALID PADDING
    model.add(MaxPooling2D(pool_size=(2,2),name='feature_extractor_layer')) #VALID PADDING
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(512, activation='relu', name='second_last_layer'))
    model.add(Dropout(0.25))
    model.add(Dense(10, activation='softmax', name='softmax_layer')) #10 nodes in the softmax layer
    model.summary()
    return model
#=====Main program starts here========
#get_train_data() and get_test_data() are my own custom functions to get CIFAR-10 dataset
images_train, labels_train, class_train = get_train_data(0,10)
images_test, labels_test, class_test = get_test_data(0,10)
model = CNN_model()
model.compile(loss='categorical_crossentropy', #loss function of the CNN
              optimizer=Adam(lr=1.0e-4), #Optimizer
              metrics=['accuracy']) #'accuracy' metric is to be evaluated
#images_train and images_test contain images and
#class_train and class_test contains one hot vectors labels
model.fit(images_train,class_train,
          batch_size=128,
          epochs=50,
          validation_data=(images_test,class_test),
          verbose=1)
scores=model.evaluate(images_test,class_test,verbose=0)
print("Accuracy: "+str(scores[1]*100)+"% \n")
My program in PyTorch:
#========DEFINE THE CNN MODEL=====
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3,1,1)#SAME PADDING
        self.conv2 = nn.Conv2d(32,32,3,1,0)#VALID PADDING
        self.pool1 = nn.MaxPool2d(2,2) #VALID PADDING
        self.drop1 = nn.Dropout2d(0.25) #DROPOUT OF 0.25
        self.conv3 = nn.Conv2d(32,64,3,1,1)#SAME PADDING
        self.conv4 = nn.Conv2d(64,64,3,1,0)#VALID PADDING
        self.pool2 = nn.MaxPool2d(2,2)#VALID PADDING
        self.drop2 = nn.Dropout2d(0.25) #DROPOUT OF 0.25
        self.conv5 = nn.Conv2d(64,128,3,1,1)#SAME PADDING
        self.conv6 = nn.Conv2d(128,128,3,1,0)#VALID PADDING
        self.pool3 = nn.MaxPool2d(2,2)#VALID PADDING
        self.drop3 = nn.Dropout2d(0.25) #DROPOUT OF 0.25
        self.fc1 = nn.Linear(128*2*2, 512)#128*2*2 IS OUTPUT DIMENSION AFTER THE PREVIOUS LAYER
        self.drop4 = nn.Dropout(0.25) #DROPOUT OF 0.25
        self.fc2 = nn.Linear(512,10) #10 output nodes
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool1(x)
        x = self.drop1(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool2(x)
        x = self.drop2(x)
        x = F.relu(self.conv5(x))
        x = F.relu(self.conv6(x))
        x = self.pool3(x)
        x = self.drop3(x)
        x = x.view(-1,2*2*128) #FLATTENING OPERATION 2*2*128 IS OUTPUT AFTER THE PREVIOUS LAYER
        x = F.relu(self.fc1(x))
        x = self.drop4(x)
        x = self.fc2(x) #LAST LAYER DOES NOT NEED SOFTMAX BECAUSE THE LOSS FUNCTION WILL TAKE CARE OF IT
        return x
#=======FUNCTION TO CONVERT INPUT AND TARGET TO TORCH TENSORS AND LOADING INTO GPU======
def PrepareInputDataAndTargetData(device,images,labels,batch_size):
    #GET MINI BATCH OF TRAINING IMAGES AND RESHAPE THE TORCH TENSOR FOR CNN PROCESSING
    mini_batch_images = torch.tensor(images)
    mini_batch_images = mini_batch_images.view(batch_size,3,32,32)
    #GET MINI BATCH OF TRAINING LABELS, TARGET SHOULD BE IN LONG FORMAT SO CONVERT THAT TOO
    mini_batch_labels = torch.tensor(labels)
    mini_batch_labels = mini_batch_labels.long()
    #FEED THE INPUT DATA AND TARGET LABELS TO GPU
    mini_batch_images = mini_batch_images.to(device)
    mini_batch_labels = mini_batch_labels.to(device)
    return mini_batch_images,mini_batch_labels
#==========MAIN PROGRAM==========
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#get_train_data() and get_test_data() are my own custom functions to get CIFAR-10 dataset
Images_train, Labels_train, Class_train = get_train_data(0,10)
Images_test, Labels_test, Class_test = get_test_data(0,10)
net = Net()
net = net.double() #https://discuss.pytorch.org/t/runtimeerror-expected-object-of-scalar-type-double-but-got-scalar-type-float-for-argument-2-weight/38961
print(net)
#MAP THE MODEL ONTO THE GPU
net = net.to(device)
#CROSS ENTROPY LOSS FUNCTION AND ADAM OPTIMIZER
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=1e-4)
#PREPARE THE DATALOADER
#Images_train contains images and Labels_trains contains indices i.e. 0,1,...,9
dataset = TensorDataset( Tensor(Images_train), Tensor(Labels_train) )
trainloader = DataLoader(dataset, batch_size= 128, shuffle=True)
#START TRAINING THE CNN MODEL FOR 50 EPOCHS
for epoch in range(0,50):
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs = torch.tensor(inputs).double()
        inputs = inputs.view(len(inputs),3,32,32) #RESHAPE THE IMAGES
        labels = labels.long() #MUST CONVERT LABEL TO LONG FORMAT
        #MAP THE INPUT AND LABELS TO THE GPU
        inputs=inputs.to(device)
        labels=labels.to(device)
        #FORWARD PROP, BACKWARD PROP, PARAMETER UPDATE
        optimizer.zero_grad()
        outputs = net.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    #CALCULATE CLASSIFICATION ACCURACY ON ALL 10 CLASSES
    with torch.no_grad():
        Images_class,Labels_class = PrepareInputDataAndTargetData(device,Images_test,Labels_test,len(Images_test))
        network_outputs = net.forward(Images_class)
        correct = (torch.argmax(network_outputs.data,1) == Labels_class.data).float().sum()
        acc = float(100.0*(correct/len(Images_class)))
        print("Accuracy is: "+str(acc)+"\n")
    del Images_class
    del Labels_class
    del network_outputs
    del correct
    del acc
    torch.cuda.empty_cache()
print("Done\n")
I am not fully aware of how the core backend works in either library, but I would expect the classification accuracy of a given model to be almost the same regardless of the library.
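One thing that may be worth checking, offered as a guess rather than a diagnosis: the Keras model takes channels-last input (input_shape=(size,size,channels)), while the PyTorch code calls view(..., 3, 32, 32) on arrays from the same loader. Assuming get_train_data returns channels-last arrays in both programs (the post does not confirm this), view only reinterprets the existing memory order and does not move the channel axis, so the network would see scrambled images. The plain-Python sketch below shows the difference on a tiny 2x2 RGB image; the helper names are hypothetical.

```python
def flatten_hwc(image):
    """Flatten a height x width x channels nested list in memory order."""
    return [value for row in image for pixel in row for value in pixel]


def view_as_chw(flat, channels, height, width):
    """Reinterpret flat data as channels x height x width WITHOUT moving values."""
    return [[[flat[(c * height + h) * width + w] for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


def permute_hwc_to_chw(image):
    """Move the channel axis to the front, keeping each pixel's values together."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[h][w][c] for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


# A 2x2 image in which every pixel is (R=1, G=2, B=3).
image = [[[1, 2, 3], [1, 2, 3]],
         [[1, 2, 3], [1, 2, 3]]]

viewed = view_as_chw(flatten_hwc(image), 3, 2, 2)
permuted = permute_hwc_to_chw(image)

print(viewed[0])    # the "red" plane mixes R, G and B values
print(permuted[0])  # the red plane contains only R values
```

If the assumption holds, the PyTorch analogue would be to call permute(0, 3, 1, 2) on an (N, 32, 32, 3) tensor instead of view.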

Why does the output change after I perform cross-validation?

I have built a neural network for performing regression. However, if I perform cross-validation before making predictions, the output changes. Below are the graphs with and without cross-validation.
With Cross Validation
Without Cross Validation
The code that I use for cross validation
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import KFold, cross_val_score
epoch = 5000
n_cols = X_train.shape[1]
def baseline_model():
    model = Sequential()
    model.add(Dense(3, activation='sigmoid', input_shape=(n_cols,)))
    model.add(Dense(1, activation = 'linear'))
    model.compile(optimizer='adam', loss='mse')
    return model
estimator = KerasRegressor(build_fn=baseline_model, epochs=epoch, batch_size=16, verbose = 0)
kfold = KFold(n_splits=5)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)
print("Results: %.10f (%.10f) MSE" % (results.mean(), results.std()))
print("RMSE:", np.sqrt(abs(results.mean())))
print(results)
The code that I use for prediction
epoch = 5000
n_cols = X_train.shape[1]
def modelling():
    model = Sequential()
    model.add(Dense(4, activation='tanh', input_shape=(n_cols,)))
    model.add(Dense(1, activation = 'linear'))
    model.compile(optimizer='adam', loss='mse')
    return model
model = modelling()
history = model.fit(X_train, y_train, epochs= epoch, validation_split = 0.3, batch_size= 16, verbose = 0)
I am using Keras with the TensorFlow backend.

That is the essence of cross-validation. Instead of a single evaluation, it yields the mean and standard deviation of several evaluations. In your example you use a 5-split KFold, which means the model is trained on 4/5 of the training data and tested on the remaining 1/5, five times, with a different fifth held out each time.
Cross-validation is used to check that your model is not overfitting.
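The 4/5 versus 1/5 partitioning described above can be sketched in plain Python. This is an illustration of the idea for contiguous, unshuffled folds, not scikit-learn's implementation, and the function name kfold_indices is hypothetical.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train_indices, test_indices) for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # spread any remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# With 10 samples and 5 splits, each fold trains on 8 samples and tests on 2.
folds = list(kfold_indices(10, n_splits=5))
for train, test in folds:
    print(len(train), len(test), test)
```

Each sample appears in exactly one test fold, so the five scores returned by cross_val_score come from five separately trained models, none of which is the model fitted later for prediction.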
