How to use GPU while training a model? - machine-learning

I am running code to train a ResNet model in a Kaggle notebook. I have chosen GPU as the accelerator, so I haven't made any mistake there. I am training the model using the following code:
model.cuda()
for epoch in range(10):
    model.train(True)
    trainloss=0
    for x,y in trainloader:
        x,y=x.cuda(),y.cuda()
        yhat=model(x)
        optimizer.zero_grad()
        loss=criterion(yhat,y)
        loss.backward()
        optimizer.step()
        trainloss+=loss.item()
    print('Epoch {} Loss: {}'.format(epoch,(trainloss/len(trainloader.dataset))))
    model.eval()
    testcorrect=0
    with torch.no_grad():
        for test_x,test_y in testloader:
            test_x,test_y=test_x.cuda(),test_y.cuda()
            yhat=model(test_x)
            _,z=yhat.max(1)
            testcorrect+=(test_y==z).sum().item()
    print('Model Accuracy: ',(testcorrect/len(testloader.dataset)))
Network Code:
model=torchvision.models.resnet18(pretrained=True)
num_ftrs=model.fc.in_features
model.fc=nn.Sequential(nn.Linear(num_ftrs,1000),
                       nn.ReLU(),
                       nn.Linear(1000,2)
                       )
As you can see, I have called the .cuda() function on both my model and the tensors (inside the training part as well as the validation part). However, the GPU usage shown for the Kaggle notebook is 0% while my CPU usage is up to 99%. Am I missing any code that is required to train the model using the GPU?

It might be that your model doesn't give the GPU enough work. Try to make your network more GPU-hungry, e.g. introduce a linear layer with a large number of neurons, to double-check that you then see increased GPU usage. I also noticed that the usage measurement is delayed a bit, so maybe you give the GPU some work that it finishes in a fraction of a second, and the GPU usage bar never gets a chance to move above 0%.
Maybe you could share the actual network you're using?
I can see GPU usage going to 100% in a Kaggle notebook with a toy example like this (notice the 2500 x 2500 linear layer here):
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
trainloader = [(torch.Tensor(np.random.randn(1000, 5)), torch.Tensor([1.0] * 1000))] * 1000
model = nn.Sequential(nn.Linear(5, 2500), nn.Linear(2500, 1500), nn.Linear(1500, 1))
model.cuda()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.)
criterion = lambda x,y : ((x-y)**2).mean()
for epoch in range(10):
    for x,y in trainloader:
        x,y=x.cuda(),y.cuda()
        yhat=model(x)
        optimizer.zero_grad()
        loss=criterion(yhat,y)
        loss.backward()
        optimizer.step()
    print(epoch)
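As a quick sanity check (not part of the original answer), you can also confirm that the model and a batch of data actually live on the GPU; this sketch assumes the model and trainloader defined above:

import torch

print(torch.cuda.is_available())        # should print True when the GPU accelerator is enabled
print(next(model.parameters()).device)  # should print cuda:0 after model.cuda()
x, y = next(iter(trainloader))          # works for the toy list above or for a real DataLoader
print(x.cuda().device)                  # the batch tensor can be moved to cuda:0 as well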

Related

ResNet50 torchvision implementation gives low accuracy on CIFAR-10

I am new to Deep Learning and PyTorch. I am using the resnet-50 model in the torchvision module on cifar10. I have imported the CIFAR-10 dataset from torchvision. The accuracy is very low on testing and I have tried configuring the classification layers but there is no change in the accuracy. Is there something wrong with my code? Am I making a mistake in calculating the accuracy?
import torchvision
import torch
import torch.nn as nn
from torch import optim
import os
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import numpy as np
from collections import OrderedDict
import matplotlib.pyplot as plt
transformations=transforms.Compose([transforms.ToTensor(),transforms.Normalize([0.485, 0.456, 0.406],[0.229, 0.224, 0.225])])
trainset=torchvision.datasets.CIFAR10(root='./CIFAR10',download=True,transform=transformations,train=True)
testset=torchvision.datasets.CIFAR10(root='./CIFAR10',download=True,transform=transformations,train=False)
trainloader=DataLoader(dataset=trainset,batch_size=4)
testloader=DataLoader(dataset=testset,batch_size=4)
inputs,labels=next(iter(trainloader))
labels=labels.float()
inputs.size()
print(labels.type())
resnet=torchvision.models.resnet50(pretrained=True)
if torch.cuda.is_available():
    resnet=resnet.cuda()
    inputs,labels=inputs.cuda(),torch.Tensor(labels).cuda()
outputs=resnet(inputs)
outputs.size()
for param in resnet.parameters():
    param.requires_grad=False
numft=resnet.fc.in_features
print(numft)
resnet.fc=torch.nn.Sequential(nn.Linear(numft,1000),nn.ReLU(),nn.Linear(1000,10))
resnet.cuda()
resnet.train(True)
optimizer=torch.optim.SGD(resnet.parameters(),lr=0.001,momentum=0.9)
criterion=nn.CrossEntropyLoss()
for epoch in range(5):
    resnet.train(True)
    trainloss=0
    correct=0
    for x,y in trainloader:
        x,y=x.cuda(),y.cuda()
        optimizer.zero_grad()
        yhat=resnet(x)
        loss=criterion(yhat,y)
        loss.backward()
        optimizer.step()
        trainloss+=loss.item()
    print('Epoch: {} Loss: {}'.format(epoch,(trainloss/len(trainloader))))

accuracy=[]
running_corrects=0.0
for x_test,y_test in testloader:
    x_test,y_test=x_test.cuda(),y_test.cuda()
    yhat=resnet(x_test)
    _,z=yhat.max(1)
    running_corrects += torch.sum(y_test == z)
    accuracy.append(running_corrects/len(testloader))
print(running_corrects/len(testloader))
accuracy=max(accuracy)
print(accuracy)
OUTPUT AFTER TRAINING/TESTING
Epoch: 0 Loss: 1.9808503997325897
Epoch: 1 Loss: 1.7917569598436356
Epoch: 2 Loss: 1.624434965057373
Epoch: 3 Loss: 1.4082191940283775
Epoch: 4 Loss: 1.1343850775527955
tensor(1.1404, device='cuda:0')
tensor(1.1404, device='cuda:0')
A couple of observations:
You may want to fine-tune the learning rate, number of epochs, and batch size. For example, you are currently training the model for only five epochs, which might not be sufficient to achieve high accuracy; you can try a larger number of epochs.
Have you tried adapting the backbone (feature extractor) to the CIFAR-10 dataset by setting param.requires_grad=True? The original model is trained on ImageNet, so it might need to adapt to CIFAR-10.
Before evaluation/testing you may want to set resnet.train(False) or resnet.eval() to let the model know that you are in eval mode. Furthermore, you may want to evaluate your model under the scope of no_grad() by using with torch.no_grad():, which will speed up inference and reduce memory usage.
[CIFAR-10 is a balanced dataset, so this is an optional (EDA) task here.] Have you checked the class distribution of CIFAR-10, i.e. whether it is an imbalanced dataset or not? If it is imbalanced, you may want to employ weighted cross entropy for your loss calculation. There are other strategies to tackle class imbalance, such as over-sampling or under-sampling.
Regarding test accuracy, you need to divide the total number of correct predictions by the total number of samples in the dataset, len(testloader.dataset), instead of len(testloader). If you want the accuracy in the range [0, 100], just multiply by 100. You can print the test accuracy for each epoch to check how it changes, whereas you are currently showing only the maximum accuracy. A minimal sketch of such an evaluation loop follows below.
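For reference, here is a minimal sketch of an evaluation loop that applies the points above (eval mode, no_grad(), and dividing by len(testloader.dataset)); it assumes the resnet, testloader, and imports from the question:

resnet.eval()                          # switch to eval mode before testing
running_corrects = 0
with torch.no_grad():                  # no gradients needed for evaluation
    for x_test, y_test in testloader:
        x_test, y_test = x_test.cuda(), y_test.cuda()
        yhat = resnet(x_test)
        _, z = yhat.max(1)
        running_corrects += (y_test == z).sum().item()
test_accuracy = 100.0 * running_corrects / len(testloader.dataset)  # percentage over all test samples
print('Test accuracy: {:.2f}%'.format(test_accuracy))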

I want to converge the keras model with just FC-layer

I have a Keras model with just an FC layer (Dense). My training images are 227*227, and there are 100 classes with 1 training image per class. I would like to overfit and get 100% training accuracy.
Issue:
I have tried babysitting the model hyperparameters, but it is not converging to 100% train accuracy, even though it is just an FC layer.
Here's my code:
X_train, y_train = ...
# Create a Keras Model
model = Sequential()
model.add(Dense(100, input_dim=input_dim, activation='softmax',
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
# Callback and training
csv_logger = CSVLogger('training_log_v1.csv')
model.fit(X_train, y_train, epochs=10000, batch_size=100, callbacks=[csv_logger])
Here's the plot for the above code.
I have run different hyperparameter experiments with 10K to 20K epochs. After some epochs the loss stops decreasing, and there is no improvement in train accuracy.
I have tried playing with different optimizers (and their hyperparameters) as well as regularization. There aren't many hyperparameters to play with here except the optimizer and the regularizers, right?
If anyone can help me get the model to converge, that would be great. Thank you!
I was able to overfit. These are the hyperparameters I used for overfitting in this experiment:
Class: 100
Samples_per_class: 1
kernel_regularizer=regularizers.l2(0.01)
activity_regularizer=regularizers.l1(0.01)
Op: Adam
lr: 0.00001
Epochs set to: 50000
batch_size: 256
I got 99% train accuracy at around 12K epochs, and the loss continued decreasing until around 25K epochs.
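For completeness, a minimal sketch of how those settings could be wired up in Keras (the input shape and the commented-out data loading are my assumptions, not taken from the original post):

from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers
from keras.optimizers import Adam

# 100 classes, 1 sample per class; flattened 227*227 RGB inputs (shape assumed here)
input_dim = 227 * 227 * 3

model = Sequential()
model.add(Dense(100, input_dim=input_dim, activation='softmax',
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
model.compile(optimizer=Adam(lr=0.00001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(X_train, y_train, epochs=50000, batch_size=256)  # X_train, y_train as in the question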

Can a Neural Network learn a simple interpolation?

I've tried to train a 2-layer neural network on a simple linear interpolation of a discrete function. I've tried lots of different learning rates as well as different activation functions, and it seems like nothing is being learned!
I've literally spent the last 6 hours trying to debug the following code, but there doesn't seem to be a bug! What's the explanation?
from torch.utils.data import Dataset
import os
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
import random

LOW_X=255
MID_X=40000
HIGH_X=200000
LOW_Y=torch.Tensor([0,0,1])
MID_Y=torch.Tensor([0.2,0.5,0.3])
HIGH_Y=torch.Tensor([1,0,0])
BATCH_SIZE=4

def x_to_tensor(x):
    if x<=MID_X:
        return LOW_Y+(x-LOW_X)*(MID_Y-LOW_Y)/(MID_X-LOW_X)
    if x<=HIGH_X:
        return MID_Y+(x-MID_X)*(HIGH_Y-MID_Y)/(HIGH_X-MID_X)
    return HIGH_Y

class XYDataset(Dataset):
    LENGTH=10000
    def __len__(self):
        return self.LENGTH
    def __getitem__(self, idx):
        x=random.randint(LOW_X,HIGH_X)
        y=x_to_tensor(x)
        return x,y

class Interpolate(nn.Module):
    def __init__(self, num_outputs,hidden_size=10):
        super(Interpolate, self).__init__()
        self.hidden_size=hidden_size
        self.x_to_hidden = nn.Linear(1, hidden_size)
        self.hidden_to_out = nn.Linear(hidden_size,num_outputs)
        self.activation = nn.Tanh()  # I have tried Sigmoid and Relu activations as well
        self.softmax=torch.nn.Softmax(dim=1)
    def forward(self, x):
        out = self.x_to_hidden(x)
        out = self.activation(out)
        out = self.hidden_to_out(out)
        out = self.softmax(out)
        return out

dataset=XYDataset()
trainloader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=4)
criterion= nn.MSELoss()

def train_net(net,epochs=10,lr=5.137871216190041e-05,l2_regularization=2.181622809797563e-12):
    optimizer= optim.Adam(net.parameters(),lr=lr,weight_decay=l2_regularization)
    net.train(True)
    running_loss=0.0
    for epoch in range(epochs):
        for i,data in enumerate(trainloader):
            inputs,targets=data
            inputs,targets=torch.FloatTensor(inputs.float()).view(-1,1),torch.FloatTensor(targets.float())
            optimizer.zero_grad()
            outputs=net(inputs)
            loss=criterion(outputs,targets)
            loss.backward()
            optimizer.step()
            running_loss+=loss.item()
            if (len(trainloader)*epoch+i)%200==199:
                running_loss=running_loss/(200*BATCH_SIZE)
                print('[%d,%5d] loss: %.6f ' % (epoch+1,i+1,running_loss))
                running_loss=0.0

for i in range(-11,3):
    net=Interpolate(num_outputs=3)
    train_net(net,lr=10**i,epochs=1)
    print('for learning rate {} net output on low x is {}'.format(i,net(torch.Tensor([255]).view(-1,1))))
Although your problem is quite simple, it is poorly scaled: x ranges from 255 to 200K. This poor scaling leads to numerical instability and overall makes the training process unnecessarily unstable.
To overcome this technical issue, you simply need to scale your inputs to the [-1, 1] (or [0, 1]) range; a small sketch of such scaling follows below.
Note that this scaling is quite ubiquitous in deep learning: images are scaled to the [-1, 1] range (see, e.g., torchvision.transforms.Normalize).
To better understand the importance of scaled responses, you can look into the mathematical analysis done in this paper.
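A minimal sketch of what that scaling could look like for the code above (the helper name and the exact target range are my assumptions, not part of the original answer):

LOW_X, HIGH_X = 255, 200000   # same constants as in the question

def scale_x(x):
    # map x from [LOW_X, HIGH_X] to [-1, 1] before feeding it to the network
    return 2.0 * (x - LOW_X) / (HIGH_X - LOW_X) - 1.0

# e.g. inside train_net, right before the forward pass:
#     inputs = scale_x(inputs)
#     outputs = net(inputs)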
You can perform a simple interpolation with a NN; however, you have to consider the following.
I would recommend these settings:
For the activation function: for a simple interpolation, an identity activation function turns the NN into a linear regressor, which may generalize well. You can also consider the Rectified Linear Unit (ReLU) for big data and Logistic/Tanh for regular-sized data as other options.
For large amounts of data, I would select an iterative optimizer for the weights, such as plain gradient descent or Adam. On the other hand, if you have little data, I would use a quasi-Newton method such as LBFGS, since you will get a good approximation of the weights in a reasonably lower computational time.
Vary the number of neurons in each layer and the number of layers, using batch learning, to seek better approximations. A rough illustration of these settings is sketched below.
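Purely as an illustration of those recommendations (this uses scikit-learn's MLPRegressor rather than the PyTorch code from the question, and the toy data here is made up):

import numpy as np
from sklearn.neural_network import MLPRegressor

# toy 1-D piecewise-linear data, with inputs already scaled to [0, 1]
x = np.linspace(0.0, 1.0, 1000).reshape(-1, 1)
y = np.interp(x.ravel(), [0.0, 0.2, 1.0], [0.0, 0.5, 0.3])

# small dataset: identity activation (behaves like a linear regressor) + LBFGS solver
reg = MLPRegressor(hidden_layer_sizes=(10,), activation='identity',
                   solver='lbfgs', max_iter=2000, random_state=0)
reg.fit(x, y)
print(reg.predict([[0.1], [0.5]]))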

Why should we normalize data for deep learning in Keras?

I was testing some network architectures in Keras for classifying the MNIST dataset. I have implemented one that is similar to the LeNet.
I have seen that in the examples that I have found on the internet, there is a step of data normalization. For example:
X_train /= 255
I have performed a test without this normalization and I have seen that the performance (accuracy) of the network has decreased (keeping the same number of epochs). Why has this happened?
If I increase the number of epochs, can the accuracy reach the same level as that of the model trained with normalization?
So, does normalization affect the accuracy, or only the training speed?
The complete source code of my training script is below:
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.datasets import mnist
from keras.utils import np_utils
from keras.optimizers import SGD, RMSprop, Adam
import numpy as np
import matplotlib.pyplot as plt
from keras import backend as k

def build(input_shape, classes):
    model = Sequential()
    model.add(Conv2D(20, kernel_size=5, padding="same",activation='relu',input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Conv2D(50, kernel_size=5, padding="same", activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Flatten())
    model.add(Dense(500))
    model.add(Activation("relu"))
    model.add(Dense(classes))
    model.add(Activation("softmax"))
    return model

NB_EPOCH = 4  # number of epochs
BATCH_SIZE = 128  # size of the batch
VERBOSE = 1  # set the training phase as verbose
OPTIMIZER = Adam()  # optimizer
VALIDATION_SPLIT = 0.2  # percentage of the training data used for evaluating the loss function
IMG_ROWS, IMG_COLS = 28, 28  # input image dimensions
NB_CLASSES = 10  # number of outputs = number of digits
INPUT_SHAPE = (1, IMG_ROWS, IMG_COLS)  # shape of the input

(X_train, y_train), (X_test, y_test) = mnist.load_data()
k.set_image_dim_ordering("th")
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
X_train = X_train[:, np.newaxis, :, :]
X_test = X_test[:, np.newaxis, :, :]
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
y_train = np_utils.to_categorical(y_train, NB_CLASSES)
y_test = np_utils.to_categorical(y_test, NB_CLASSES)

model = build(input_shape=INPUT_SHAPE, classes=NB_CLASSES)
model.compile(loss="categorical_crossentropy",
              optimizer=OPTIMIZER, metrics=["accuracy"])
history = model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=NB_EPOCH, verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
model.save("model2")
score = model.evaluate(X_test, y_test, verbose=VERBOSE)
print('Test accuracy:', score[1])
Normalization is a generic concept not limited only to deep learning or to Keras.
Why to normalize?
Let me take a simple logistic regression example which will be easy to understand and to explain normalization.
Assume we are trying to predict if a customer should be given loan or not. Among many available independent variables lets just consider Age and Income.
Let the equation be of the form:
Y = weight_1 * (Age) + weight_2 * (Income) + some_constant
Just for the sake of explanation, let Age usually be in the range [0, 120], and let us assume Income is in the range [10000, 100000]. The scales of Age and Income are very different. If you use them as-is, the weights weight_1 and weight_2 may end up biased: weight_2 might give Income more importance as a feature than weight_1 gives to Age. To bring them to a common scale, we can normalize them. For example, we can bring all ages into the range [0, 1] and all incomes into the range [0, 1]. Now we can say that Age and Income are given equal importance as features. A small sketch of this min-max scaling is shown below.
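As a minimal sketch of that min-max scaling (the column values here are made up purely for illustration):

import numpy as np

age = np.array([22, 35, 58, 70], dtype=float)
income = np.array([15000, 42000, 90000, 30000], dtype=float)

# min-max normalization: bring both features into the [0, 1] range
age_norm = (age - age.min()) / (age.max() - age.min())
income_norm = (income - income.min()) / (income.max() - income.min())

print(age_norm)     # roughly [0.   0.27 0.75 1.  ]
print(income_norm)  # roughly [0.   0.36 1.   0.2 ]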
Does Normalization always increase the accuracy?
Apparently, no. It is not necessary that normalization always increases accuracy. It may or may not; you never really know until you implement it. Again, it depends on at which stage of your training you apply normalization, whether you apply normalization after every activation, etc.
As the range of the values of the features gets narrowed down to a particular range because of normalization, it's easy to perform computations over that smaller range of values. So, usually the model gets trained a bit faster.
Regarding the number of epochs, accuracy usually increases with number of epochs provided that your model doesn't start over-fitting.
A very good explanation for Normalization/Standardization and related terms is here.
In a nutshell, normalization reduces the complexity of the problem your network is trying to solve. This can potentially increase the accuracy of your model and speed up the training. You bring the data on the same scale and reduce variance. None of the weights in the network are wasted on doing a normalization for you, meaning that they can be used more efficiently to solve the actual task at hand.
As @Shridhar R Kulkarni says, normalization is a general concept and doesn't only apply to Keras.
It's often applied as part of data preparation for ML models, to change numeric values in the dataset to fit a standard scale without distorting the differences in their ranges. As such, normalization enhances the cohesion of entity types within a model by reducing the probability of inconsistent data.
However, not every dataset and use case requires normalization; it's primarily necessary when features have different ranges. You may use it when:
- You want to improve your model's convergence efficiency and make optimization feasible.
- You want to make training less sensitive to the scale of features, so you can better solve for the coefficients.
- You want to improve analysis across multiple models.
Normalization is not recommended when:
- You are using decision tree models or ensembles based on them.
- Your data is not normally distributed (you may have to use other data pre-processing techniques).
- Your dataset already comprises scaled variables.
In some cases, normalization can improve performance. However, it is not always necessary.
The critical thing is to understand your dataset and scenario first, then you’ll know whether you need it or not. Sometimes, you can experiment to see if it gives you good performance or not.
Check out deepchecks and see how to deal with important data-related checks you come across in ML.
For example, to check for duplicated data in your set, you can use the DataDuplicates check; the relevant imports are:
from deepchecks.checks.integrity.data_duplicates import DataDuplicates
from deepchecks.base import Dataset, Suite
from datetime import datetime
import pandas as pd
I think there is also an issue with the convergence of the optimizer. Here I show a simple linear regression with three examples:
First, an array with small values, and it works as expected.
Second, an array with bigger values, where the loss function explodes toward infinity, suggesting the need to normalize. Finally, in model 3, the same array as in case two, but normalized, and we get convergence.
github colab enabled ipython notebook
I've used the MSE loss function; I don't know whether other loss functions or optimizers suffer from the same issue. A minimal sketch reproducing the effect follows below.
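A minimal sketch that reproduces the effect described above (this is not the linked notebook; the data and the tiny model here are made up for illustration):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def final_loss(x, y, epochs=100):
    # single-unit linear regression trained with plain SGD and MSE loss
    model = Sequential([Dense(1, input_dim=1)])
    model.compile(optimizer='sgd', loss='mse')
    history = model.fit(x.reshape(-1, 1), y, epochs=epochs, verbose=0)
    return history.history['loss'][-1]

x_small = np.linspace(0, 1, 200)
y_small = 2 * x_small + 1
x_large = np.linspace(0, 1e5, 200)
y_large = 2 * x_large + 1
x_scaled = x_large / x_large.max()          # min-max scale the large inputs
y_scaled = y_large / y_large.max()          # and the targets

print('small inputs :', final_loss(x_small, y_small))    # converges
print('large inputs :', final_loss(x_large, y_large))    # loss typically explodes to inf/nan
print('scaled inputs:', final_loss(x_scaled, y_scaled))  # converges again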

Accuracy remains zero while training LSTM in keras

I am trying to train an LSTM, but the training accuracy remains zero in every epoch.
I have transformed the data into multivariate time-series data and reshaped it to three dimensions.
I have also normalised the data using MinMaxScaler.
I have tried numbers of epochs from 5 to 50 and batch sizes from 25 to 200.
I have tried data samples from 1000000 down to 1000, but none of it is working.
Every time, I am getting a training accuracy of zero.
Can anyone help me understand this or suggest some more experiments?
My network is below.
from keras.layers.core import Dense,Activation,Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
from keras.layers import Flatten
model = Sequential()
model.add(LSTM(50,return_sequences=True, input_shape=(X_train_values.shape[1], X_train_values.shape[2])))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation('linear'))
model.compile(loss='mse',optimizer='rmsprop',metrics=['accuracy'])
history = model.fit(X_train_values, y_train.values,epochs=25, batch_size=30, verbose=2, shuffle=False)
Me too. I'm a student from China. When I train an LSTM model, the model's accuracy is very close to zero, but the predictions and the test values are very close to each other.
