Pytorch can not save torchvision models pretrained weights - save

I try to fine tune the torchvision models for some classification tasks. When I trained the inception code and saved the best accuracy model, some strange things happened. This is simplified train and test code:
net = torchvision.models.inception_v3(pretrained=True)
for epoch in range(60):
for data in trainLoader:
....
for data in testLoader:
....
if test_acc > best_acc:
best_acc = test_acc
best_model_state_dict = copy.deepcopy(net.state_dict())
best_epoch = epoch
state = {
"epoch": best_epoch,
"model_state": best_model_state_dict,
"best_acc": best_acc
}
save_path = "inception_model/{}_inceptionv3_{}_classroomClassification.pkl".format(str(best_acc), exp_i)
torch.save(state, save_path)
When I seperately test the same data using the code below, the accuracy is lower than the value recorded.
net = torchvision.models.inception_v3()
net.aux_logits = False
net.fc = nn.Linear(2048, 6)
model_path = r"D:\aha\0.8785714285714286_inceptionv3_2_classroomClassification.pkl"
state = torch.load(model_path)
net.load_state_dict(state["model_state"])
net.eval().cpu()
conf_mat = np.zeros([6, 6])
test_running_loss = 0.0
test_acc = 0.0
for data in tqdm(testLoader):
img_feature, label = data
img_feature = Variable(img_feature).cpu()
net.eval()
img_feature = Variable(img_feature).cpu()
label = Variable(label).long().cpu()
out = net(img_feature)
pred = torch.max(out.data, 1)[1]
test_acc += torch.sum(pred == label).item()
test_acc = test_acc / len(testData)
print(test_acc)
But when I change net = torchvision.models.inception_v3() into net = torchvision.models.inception_v3(pretrained=True), the accuracy is changed back. This seems like that pytorch can not save the pretrained weights.

Related

Accuracy value goes up and down on the training process

After training the network I noticed that accuracy goes up and down. Initially I thought it is caused by the learning rate, but it is set to quite small value. Please check the screenshot attached.
Plot Accuracy Screenshot
My network (in Pytorch) looks as follow:
class Network(nn.Module):
def __init__(self):
super(Network,self).__init__()
self.layer1 = nn.Sequential(
nn.Conv2d(3,16,kernel_size=3),
nn.ReLU(),
nn.MaxPool2d(2)
)
self.layer2 = nn.Sequential(
nn.Conv2d(16,32, kernel_size=3),
nn.ReLU(),
nn.MaxPool2d(2)
)
self.layer3 = nn.Sequential(
nn.Conv2d(32,64, kernel_size=3),
nn.ReLU(),
nn.MaxPool2d(2)
)
self.fc1 = nn.Linear(17*17*64,512)
self.fc2 = nn.Linear(512,1)
self.relu = nn.ReLU()
self.sigmoid = nn.Sigmoid()
def forward(self,x):
out = self.layer1(x)
out = self.layer2(out)
out = self.layer3(out)
out = out.view(out.size(0),-1)
out = self.relu(self.fc1(out))
out = self.fc2(out)
out = torch.sigmoid(out)
return out
I am using RMSprop as optimizer and BCELoss as criterion. The learning rate is set to 0.001
Here is the training process:
epochs = 15
itr = 1
p_itr = 100
model.train()
total_loss = 0
loss_list = []
acc_list = []
for epoch in range(epochs):
for samples, labels in train_loader:
samples, labels = samples.to(device), labels.to(device)
optimizer.zero_grad()
output = model(samples)
labels = labels.unsqueeze(-1)
labels = labels.float()
loss = criterion(output, labels)
loss.backward()
optimizer.step()
total_loss += loss.item()
scheduler.step()
if itr%p_itr == 0:
pred = torch.argmax(output, dim=1)
correct = pred.eq(labels)
acc = torch.mean(correct.float())
print('[Epoch {}/{}] Iteration {} -> Train Loss: {:.4f}, Accuracy: {:.3f}'.format(epoch+1, epochs, itr, total_loss/p_itr, acc))
loss_list.append(total_loss/p_itr)
acc_list.append(acc)
total_loss = 0
itr += 1
My dataset is quite small - 2000 train and 1000 validation (binary classification 0/1). I wanted to do the 80/20 split but I was asked to keep it like that. I was thinking that the architecture might be too complex for such a small dataset.
Any hits what may cause such jumps in the training process?
Your code here is wrong: pred = torch.argmax(output, dim=1)
This line using for multiclass classification with Cross-Entropy Loss.
Your task is binary classification so the pred values are wrong. Change to:
if itr%p_itr == 0:
pred = torch.round(output)
....
You can change your optimizer to Adam, SGD, or RMSprop to find the suitable optimizer that helps your model coverage faster.
Also change the forward() function:
def forward(self,x):
out = self.layer1(x)
out = self.layer2(out)
out = self.layer3(out)
out = out.view(out.size(0),-1)
out = self.relu(self.fc1(out))
out = self.fc2(out)
return self.sigmoid(out) #use your forward is ok, but this cleaner

Accuracy changes drastically on changing from CPU to GPU

So I was working on the SIIM Melanoma classification data on Kaggle. While training a number of networks I found that when I trained them on CPU, accuracy seemed to be appropriate, around 0.75. On switching to GPU, the accuracy would oscillate in and around 0.5. What should I do about this? Here's a code snippet of the training loop. Model trained finally was resnext50.
import cv2
device = "cpu"
import torch.nn.functional as F
epochs=3
#model = torch.load("model.pt")
model.cpu()
#model.cuda()
print("======== Training for ", epochs, "epochs=============")
for epoch in range(epochs):
total_loss = 0
model.train()
print("Training.......")
print("======== EPOCH #",epoch,"=================")
tmp_acc = 0
for i,batch in enumerate(train_loader):
img,label = batch["images"],batch["labels"]
#img = img.permute(0,3,1,2)
#img = torch.Tensor(img)
label = label.type(torch.FloatTensor)
img,label = img.to(device),label.to(device)
model.zero_grad()
op = model(img)
label_cpu = label.cpu().numpy()
op = F.sigmoid(op)
output = op.detach().cpu().numpy()
tmp_acc += accuracy_score(output,label_cpu)
loss = criterion(op,label)
total_loss = loss.item()
loss.backward()
adam.step()
if(i%10==0 and i>0):
print("STEP: ",i, "of steps ",len(train_loader))
print("Current loss: ",total_loss/i)
print("Training Accuracy ",tmp_acc/i)
avg_loss = total_loss/len(train_loader)
print("The loss after ",epoch," epochs is ",avg_loss)
print("OP",op)
print("Label",label_cpu)
torch.save(model.state_dict(),"/kaggle/working/model.pt")

Transfer Learning using Keras and vgg16 on small dataset

I have to build a neural network that can recognize the face of 15 people. I'm using keras. My dataset is composed of 300 total images and is divided into Training, Validation and Test. For each of the 15 people I have the following subdivision:
Training: 13
Validation: 3
Test: 4
Since I couldn't build an efficient neural network from scratch, I also believe because my dataset is very small, I'm trying to solve my problem by doing transfer learning. I used the vgg16 network. In the training and validation phase I get good results but when I run the tests the results are disastrous.
I don't know what my problem is. Here is the code I used:
img_width, img_height = 256, 256
train_data_dir = 'dataset_biometria/face/training_set'
validation_data_dir = 'dataset_biometria/face/validation_set'
nb_train_samples = 20
nb_validation_samples = 20
batch_size = 16
epochs = 5
model = applications.VGG19(weights = "imagenet", include_top=False, input_shape = (img_width, img_height, 3))
for layer in model.layers:
layer.trainable = False
#Adding custom Layers
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.4)(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(15, activation="softmax")(x)
# creating the final model
model_final = Model(input = model.input, output = predictions)
# compile the model
model_final.compile(loss = "categorical_crossentropy", optimizer = optimizers.SGD(lr=0.0001, momentum=0.9), metrics=["accuracy"])
# Initiate the train and test generators with data Augumentation
train_datagen = ImageDataGenerator(
rescale = 1./255,
horizontal_flip = True,
fill_mode = "nearest",
zoom_range = 0.3,
width_shift_range = 0.3,
height_shift_range=0.3,
rotation_range=30)
test_datagen = ImageDataGenerator(
rescale = 1./255,
horizontal_flip = True,
fill_mode = "nearest",
zoom_range = 0.3,
width_shift_range = 0.3,
height_shift_range=0.3,
rotation_range=30)
train_generator = train_datagen.flow_from_directory(
train_data_dir,
target_size = (img_height, img_width),
batch_size = batch_size,
class_mode = "categorical")
validation_generator = test_datagen.flow_from_directory(
validation_data_dir,
target_size = (img_height, img_width),
class_mode = "categorical")
# Save the model according to the conditions
checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=10, verbose=1, mode='auto')
# Train the model
model_final.fit_generator(
train_generator,
samples_per_epoch = nb_train_samples,
epochs = epochs,
validation_data = validation_generator,
nb_val_samples = nb_validation_samples,
callbacks = [checkpoint, early])
model('model_face_classification.h5')
I also tried to train some layers instead of not training any, as in the example below:
for layer in model.layers[:10]:
layer.trainable = False
I also tried changing the number of epochs, batch size, nb_validation_samples, nb_validation_sample.
Unfortunately the result has not changed, in the testing phase my network cannot correctly recognize faces.
Without seeing the actual results or errors I can not say what the problem is here.
Definitely, small dataset is a problem, but there are many ways to get around it.
You can use image augmentation to increase the samples. You can refer augement.py.
But instead of modifying your above network, there is a really cool model : siamese network/one-shot learning. It does not need too many pics and the accuracies are great.
Therefore you can see below links to get some help :
Facial-Recognition-Using-FaceNet-Siamese-One-Shot-Learning
Face-recognition-using-deep-learning

very large value of loss in AlexNet

Actually I am using AlexNet to classify my images in 2 groups , I am feeding images to the model in a batch of 60 images and the loss which I am getting after every batch is 6 to 7 digits large (for ex. 1428529.0) , here I am confused that why my loss is such a large value because on MNIST dataset the loss which I got was very small as compared to this. Can anyone explain me why I am getting such a large loss value.
Thanks in advance ;-)
Here is the code :-
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import os
img_size = 227
num_channels = 1
img_flat_size = img_size * img_size
num_classes = 2
drop = 0.5
x = tf.placeholder(tf.float32,[None,img_flat_size])
y = tf.placeholder(tf.float32,[None,num_classes])
drop_p = tf.placeholder(tf.float32)
def new_weight(shape):
return tf.Variable(tf.random_normal(shape))
def new_bias(size):
return tf.Variable(tf.random_normal(size))
def new_conv(x,num_input_channels,filter_size,num_filters,stride,padd="SAME"):
shape = [filter_size,filter_size,num_input_channels,num_filters]
weight = new_weight(shape)
bias = new_bias([num_filters])
conv = tf.nn.conv2d(x,weight,strides=[1,stride,stride,1],padding=padd)
conv = tf.nn.bias_add(conv,bias)
return tf.nn.relu(conv)
def new_max_pool(x,k,stride):
max_pool = tf.nn.max_pool(x,ksize=[1,k,k,1],strides=[1,stride,stride,1],padding="VALID")
return max_pool
def flatten_layer(layer):
layer_shape = layer.get_shape()
num_features = layer_shape[1:4].num_elements()
flat_layer = tf.reshape(layer,[-1,num_features])
return flat_layer,num_features
def new_fc_layer(x,num_input,num_output):
weight = new_weight([num_input,num_output])
bias = new_bias([num_output])
fc_layer = tf.matmul(x,weight) + bias
return fc_layer
def lrn(x, radius, alpha, beta, bias=1.0):
"""Create a local response normalization layer."""
return tf.nn.local_response_normalization(x, depth_radius=radius,
alpha=alpha, beta=beta,
bias=bias)
def AlexNet(x,drop,img_size):
x = tf.reshape(x,shape=[-1,img_size,img_size,1])
conv1 = new_conv(x,num_channels,11,96,4,"VALID")
max_pool1 = new_max_pool(conv1,3,2)
norm1 = lrn(max_pool1, 2, 2e-05, 0.75)
conv2 = new_conv(norm1,96,5,256,1)
max_pool2 = new_max_pool(conv2,3,2)
norm2 = lrn(max_pool2, 2, 2e-05, 0.75)
conv3 = new_conv(norm2,256,3,384,1)
conv4 = new_conv(conv3,384,3,384,1)
conv5 = new_conv(conv4,384,3,256,1)
max_pool3 = new_max_pool(conv5,3,2)
layer , num_features = flatten_layer(max_pool3)
fc1 = new_fc_layer(layer,num_features,4096)
fc1 = tf.nn.relu(fc1)
fc1 = tf.nn.dropout(fc1,drop)
fc2 = new_fc_layer(fc1,4096,4096)
fc2 = tf.nn.relu(fc2)
fc2 = tf.nn.dropout(fc2,drop)
out = new_fc_layer(fc2,4096,2)
return out #, tf.nn.softmax(out)
def read_and_decode(tfrecords_file, batch_size):
'''read and decode tfrecord file, generate (image, label) batches
Args:
tfrecords_file: the directory of tfrecord file
batch_size: number of images in each batch
Returns:
image: 4D tensor - [batch_size, width, height, channel]
label: 1D tensor - [batch_size]
'''
# make an input queue from the tfrecord file
filename_queue = tf.train.string_input_producer([tfrecords_file])
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
img_features = tf.parse_single_example(
serialized_example,
features={
'label': tf.FixedLenFeature([], tf.int64),
'image_raw': tf.FixedLenFeature([], tf.string),
})
image = tf.decode_raw(img_features['image_raw'], tf.uint8)
##########################################################
# you can put data augmentation here, I didn't use it
##########################################################
# all the images of notMNIST are 28*28, you need to change the image size if you use other dataset.
image = tf.reshape(image, [227, 227])
label = tf.cast(img_features['label'], tf.int32)
image_batch, label_batch = tf.train.batch([image, label],
batch_size= batch_size,
num_threads= 1,
capacity = 6000)
return tf.reshape(image_batch,[batch_size,227*227*1]), tf.reshape(label_batch, [batch_size])
pred = AlexNet(x,drop_p,img_size) #pred
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred,labels=y))
optimiser = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(loss)
correct_pred = tf.equal(tf.argmax(pred,1),tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32))
cost = tf.summary.scalar('loss',loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
merge_summary = tf.summary.merge_all()
summary_writer = tf.summary.FileWriter('./AlexNet',graph = tf.get_default_graph())
tf_record_file = 'train.tfrecords'
x_val ,y_val = read_and_decode(tf_record_file,20)
y_val = tf.one_hot(y_val,depth=2,on_value=1,off_value=0)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
x_val = x_val.eval()
y_val = y_val.eval()
epoch = 2
for i in range(epoch):
_, summary= sess.run([optimiser,merge_summary],feed_dict={x:x_val,y:y_val,drop_p:drop})
summary_writer.add_summary(summary,i)
loss_a,accu = sess.run([loss,accuracy],feed_dict={x:x_val,y:y_val,drop_p:1.0})
print "Epoch "+str(i+1) +', Minibatch Loss = '+ \
"{:.6f}".format(loss_a) + ', Training Accuracy = '+ \
'{:.5f}'.format(accu)
print "Optimization Finished!"
tf_record_file1 = 'test.tfrecords'
x_v ,y_v = read_and_decode(tf_record_file1,10)
y_v = tf.one_hot(y_v,depth=2,on_value=1,off_value=0)
coord1 = tf.train.Coordinator()
threads1 = tf.train.start_queue_runners(coord=coord1)
x_v = sess.run(x_v)
y_v = sess.run(y_v)
print "Testing Accuracy : "
print sess.run(accuracy,feed_dict={x:x_v,y:y_v,drop_p:1.0})
coord.request_stop()
coord.join(threads)
coord1.request_stop()
coord1.join(threads1)
Take a look a what a confusion matrix is. It is a performance evaluator. In addition, you should compare your precision versus your recall. Precision is the accuracy of your positive predictions and recall is the ratio of positive instances that are correctly detected by the classifier. By combining both precision and recall, you get the F_1 score which is keep in evaluating the problems of your model.
I would suggest you pick up the text Hands-On Machine Learning with Scikit-Learn and TensorFlow. It is a truly comprehensive book and covers what I describe above in more detail.

value prediction with tensorflow and python

I have a data set which contains a list of stock prices. I need to use the tensorflow and python to predict the close price.
Q1: I have the following code which takes the first 2000 records as training and 2001 to 20000 records as test but I don't know how to change the code to do the prediction of the close price of today and 1 day later??? Please advise!
#!/usr/bin/env python2
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
def feature_scaling(input_pd, scaling_meathod):
if scaling_meathod == 'z-score':
scaled_pd = (input_pd - input_pd.mean()) / input_pd.std()
elif scaling_meathod == 'min-max':
scaled_pd = (input_pd - input_pd.min()) / (input_pd.max() -
input_pd.min())
return scaled_pd
def input_reshape(input_pd, start, end, batch_size, batch_shift, n_features):
temp_pd = input_pd[start-1: end+batch_size-1]
output_pd = map(lambda y : temp_pd[y:y+batch_size], xrange(0, end-start+1, batch_shift))
output_temp = map(lambda x : np.array(output_pd[x]).reshape([-1]), xrange(len(output_pd)))
output = np.reshape(output_temp, [-1, batch_size, n_features])
return output
def target_reshape(input_pd, start, end, batch_size, batch_shift, n_step_ahead, m_steps_pred):
temp_pd = input_pd[start+batch_size+n_step_ahead-2: end+batch_size+n_step_ahead+m_steps_pred-2]
print temp_pd
output_pd = map(lambda y : temp_pd[y:y+m_steps_pred], xrange(0, end-start+1, batch_shift))
output_temp = map(lambda x : np.array(output_pd[x]).reshape([-1]), xrange(len(output_pd)))
output = np.reshape(output_temp, [-1,1])
return output
def lstm(input, n_inputs, n_steps, n_of_layers, scope_name):
num_layers = n_of_layers
input = tf.transpose(input,[1, 0, 2])
input = tf.reshape(input,[-1, n_inputs])
input = tf.split(0, n_steps, input)
with tf.variable_scope(scope_name):
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=n_inputs)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers)
output, state = tf.nn.rnn(cell, input, dtype=tf.float32) yi1
output = output[-1]
return output
feature_to_input = ['open price', 'highest price', 'lowest price', 'close price','turnover', 'volume','mean price']
feature_to_predict = ['close price']
feature_to_scale = ['volume']
sacling_meathod = 'min-max'
train_start = 1
train_end = 1000
test_start = 1001
test_end = 20000
batch_size = 100
batch_shift = 1
n_step_ahead = 1
m_steps_pred = 1
n_features = len(feature_to_input)
lstm_scope_name = 'lstm_prediction'
n_lstm_layers = 1
n_pred_class = 1
learning_rate = 0.1
EPOCHS = 1000
PRINT_STEP = 100
read_data_pd = pd.read_csv('./stock_price.csv')
temp_pd = feature_scaling(input_pd[feature_to_scale],sacling_meathod)
input_pd[feature_to_scale] = temp_pd
train_input_temp_pd = input_pd[feature_to_input]
train_input_nparr = input_reshape(train_input_temp_pd,
train_start, train_end, batch_size, batch_shift, n_features)
train_target_temp_pd = input_pd[feature_to_predict]
train_target_nparr = target_reshape(train_target_temp_pd, train_start, train_end, batch_size, batch_shift, n_step_ahead, m_steps_pred)
test_input_temp_pd = input_pd[feature_to_input]
test_input_nparr = input_reshape(test_input_temp_pd, test_start, test_end, batch_size, batch_shift, n_features)
test_target_temp_pd = input_pd[feature_to_predict]
test_target_nparr = target_reshape(test_target_temp_pd, test_start, test_end, batch_size, batch_shift, n_step_ahead, m_steps_pred)
tf.reset_default_graph()
x_ = tf.placeholder(tf.float32, [None, batch_size, n_features])
y_ = tf.placeholder(tf.float32, [None, 1])
lstm_output = lstm(x_, n_features, batch_size, n_lstm_layers, lstm_scope_name)
W = tf.Variable(tf.random_normal([n_features, n_pred_class]))
b = tf.Variable(tf.random_normal([n_pred_class]))
y = tf.matmul(lstm_output, W) + b
cost_func = tf.reduce_mean(tf.square(y - y_))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_func)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
init = tf.initialize_all_variables()
with tf.Session() as sess:
sess.run(init)
for ii in range(EPOCHS):
sess.run(train_op, feed_dict={x_:train_input_nparr, y_:train_target_nparr})
if ii % PRINT_STEP == 0:
cost = sess.run(cost_func, feed_dict={x_:train_input_nparr, y_:train_target_nparr})
print 'iteration =', ii, 'training cost:', cost
Very simply, prediction (a.k.a. scoring or inference) comes from running the input through only the forward pass, and collecting the score for each input vector. It's the same process flow as testing. The difference is the four stages of model use:
Train: learn from the training data set; adjust weights as needed.
Test: evaluate the model's performance; if accuracy has converged, stop training.
Validate: evaluate the accuracy of the trained model. If it doesn't meet acceptance criteria, change something and start over with the training.
Predict: you've passed validation -- release the model for use by the intended application.
All four steps follow the same forward logic flow; training includes back-propagation; the others do not. Simply follow the forward-only process, and you'll get the result form you need.
I worry about your data partition: only 10% for training, 90% for testing, and none for validation. A more typical split is 50-30-20, or something in that general area.
Q-1 : You should change your LSTM parameter to return a sequence of size two which will be prediction for that day and the day after.
Q-2 it's clearly that your model is underfitting the data, which is so obvious with your 10% train 90% test data ! You should more equilibrated ratio as suggested in the previous answer.

Resources