System freezes while running TensorFlow model training code

I am trying to train a model for handwritten digits. My data consists of images, each of size 80x80, so I get 6400 features as input.
While training the model following the code at https://www.kaggle.com/kakauandme/digit-recognizer/tensorflow-deep-nn, my system hangs after 200 iterations with a training accuracy of 0.06.
Why is this happening? I don't get any errors. My system just freezes.
Also, how do I set the pooling and convolution layer parameters? Is the problem related to these parameters?
PS: I'm not using a GPU.
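For context, conv/pool parameters in a TF1-style digit-recognizer net are usually set like the sketch below (a generic illustration with hypothetical names, not necessarily the linked kernel's exact code). With 80x80 inputs, each 2x2 max-pool halves the spatial size (80 -> 40 -> 20 -> 10), which matters when sizing the fully connected layer:

import tensorflow as tf

def conv_pool(x, w, b):
    # stride-1 convolution with 'SAME' padding keeps the spatial size
    conv = tf.nn.relu(tf.nn.conv2d(x, w, strides=[1, 1, 1, 1],
                                   padding='SAME') + b)
    # 2x2 max-pooling with stride 2 halves the height and width
    return tf.nn.max_pool(conv, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')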

Related

Test accuracy vs Training time on Weka

From what I know, test accuracy should increase as training time increases (up to some point), but experimenting with Weka yielded the opposite. I am wondering if I misunderstood something.
I used diabetes.arff for classification, with 70% for training and 30% for testing. I used the MultilayerPerceptron classifier and tried training times of 100, 500, 1000, 3000, 5000, and 10000.
Here are my results:
Training time    Accuracy
100              75.2174 %
500              75.2174 %
1000             74.7826 %
3000             72.6087 %
5000             70.4348 %
10000            68.6957 %
What can be the reason for this? Thank you!
You got a very nice example of overfitting.
Here is a short explanation of what happened:
Your model (it doesn't matter whether it is a multilayer perceptron, a decision tree, or literally anything else) can fit the training data in two ways.
The first is generalization: the model tries to find patterns and trends and uses them to make predictions. The second is memorizing the exact data points of the training dataset.
Imagine a computer vision task: classify images into two categories, humans vs. trucks. A good model will find common features that are present in the human pictures but not in the truck pictures (smooth curves, skin-colored surfaces). This is generalization. Such a model will be able to handle new pictures pretty well. A bad, overfitted model will just remember the exact images, the exact pixels, of the training dataset and will have no idea what to do with new images in the test set.
What can you do to prevent overfitting?
There are a few common approaches to dealing with overfitting:
Use simpler models. With fewer parameters, it will be harder for the model to memorize the dataset.
Use regularization. Constrain the weights of the model and/or use dropout in your perceptron.
Stop the training early. Split your training data once more, so you have three parts: training, dev, and test. Then train your model on the training data only and stop when the error on the dev set stops decreasing (a sketch follows below).
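To make the third point concrete, here is a minimal Keras sketch of early stopping (the question uses Weka, but the idea is the same in any framework; model, X_train, and y_train are assumed to exist):

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',           # watch the dev-set error
    patience=5,                   # tolerate 5 epochs without improvement
    restore_best_weights=True)    # roll back to the best dev-set epoch

model.fit(X_train, y_train,
          validation_split=0.2,   # carve a dev set out of the training data
          epochs=1000,            # upper bound; training stops earlier
          callbacks=[early_stop])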
A good starting point for reading about overfitting is Wikipedia: https://en.wikipedia.org/wiki/Overfitting

Noisy validation loss in Keras when using fit_generator

Any idea why our training loss is smooth while our validation loss is so noisy (see the link) across epochs? We are implementing a deep learning model for diabetic retinopathy detection (binary classification) using the dataset of fundus photographs provided by this Kaggle competition. We are using Keras 2.0 with the TensorFlow backend.
As the dataset is too big to fit in memory, we are using fit_generator, with ImageDataGenerator randomly taking images from the training and validation folders:
# TRAIN THE MODEL
model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // training_batch_size,
    epochs=int(config['training']['epochs']),
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_batch_size,
    class_weight=None)
Our CNN architecture is VGG16 with dropout = 0.5 in the last two fully connected layers, batch normalization only before the first fully connected layer, and data augmentation (consisting of flipping the images horizontally and vertically). Our training and validation samples are normalized using the training set mean and standard deviation. The batch size is 32. Our output activation is a sigmoid and the loss function is binary_crossentropy. You can find our implementation on GitHub.
It definitely has nothing to do with overfitting, as we tried a highly regularized model and the behavior was much the same. Is it related to the sampling from the validation set? Has anyone had a similar problem before?
Thanks!!
I would look at the following, in this order:
a bug in the validation_generator implementation (including steps: does it go through all the pictures reserved for validation?)
in the validation_generator, do not use augmentation (reason: an augmentation might be bad, i.e. not learnable, and at training time the model may achieve a good score only by hard-coding relationships that do not generalize)
change the train/val split to 50/50
compute the validation loss at the end of each epoch via a custom callback; the same loss function, evaluated over the full validation set in one pass, can give more accurate results for certain non-standard models (see the sketch below)
If none of the above gives a smoother validation loss curve, then my next assumption would be that this is just the way it is, and I might need to work on the model architecture.
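To illustrate the fourth point, a hedged sketch of such a callback (Keras 2.0-era API; validation_generator and its step count are assumed to exist):

from keras.callbacks import Callback

class FullValidationLoss(Callback):
    def __init__(self, val_generator, val_steps):
        super(FullValidationLoss, self).__init__()
        self.val_generator = val_generator
        self.val_steps = val_steps

    def on_epoch_end(self, epoch, logs=None):
        # evaluate the whole validation set in one pass at epoch end
        result = self.model.evaluate_generator(self.val_generator,
                                               steps=self.val_steps)
        # returns [loss, metrics...] when metrics are compiled
        loss = result[0] if isinstance(result, list) else result
        print('epoch %d: full validation loss = %.4f' % (epoch + 1, loss))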

Shuffling batches of data in training neural networks

I have 6.5 GB of training data for my GRU network. I intend to split the training time, i.e. pause and resume training, since I am using a laptop. I'm assuming it will take days to train my neural net on the whole 6.5 GB, so I'll be pausing the training and resuming at some other time.
Here's my question: if I shuffle the batches of training data, will the neural net remember which data has already been used for training?
Please note that I'm using the global_step parameter of tf.train.Saver().save.
Thank you very much in advance!
I would advise you to save your model at certain epochs. Say you have 80 epochs; it would be wise to save the model every 20 epochs (20, 40, 60), though this depends on the capacity of your laptop. The reason is that after one epoch, your network will have seen the whole training set. If your whole dataset can't be processed in a single epoch, I would advise randomly sampling the training set from the whole dataset. The whole point of shuffling is to let the network generalize over the whole dataset, and it is usually done per batch, when selecting the training set, or when starting a new training epoch. As for your main question: it is definitely fine to shuffle batches when training and resuming. Shuffling batches ensures that the gradients are calculated over the whole batch instead of over one image.
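Since the question mentions global_step, here is a minimal TF 1.x sketch of the pause/resume pattern (train_op, global_step, and the checkpoint path are assumed/illustrative):

import tensorflow as tf

saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('checkpoints/')
    if ckpt:
        saver.restore(sess, ckpt)                    # resume where you paused
    else:
        sess.run(tf.global_variables_initializer())  # fresh start
    for _ in range(1000):                            # training steps this session
        _, step = sess.run([train_op, global_step])
        if step % 500 == 0:
            # global_step is appended to the checkpoint filename
            saver.save(sess, 'checkpoints/model', global_step=global_step)

Note that the Saver only restores the weights and the step counter; which examples the net has already seen is not tracked, which is why resuming with freshly shuffled batches is fine.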

How to load a reference .caffemodel for training

I'm using AlexNet to train on my own dataset.
The example code in Caffe comes with:
bvlc_reference_caffenet.caffemodel
solver.prototxt
train_val.prototxt
deploy.prototxt
When I train with the following command:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt
I'd like to start with the weights given in bvlc_reference_caffenet.caffemodel.
My questions are
How do I do that?
Is it a good idea to start from those weights? Would this converge faster? Would it be bad if my data are vastly different from the ImageNet dataset?
1.
In order to use existing .caffemodel weights for fine-tuning, you need to use the --weights command-line argument:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt --weights=models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
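If you drive training from Python instead, pycaffe's copy_from does the same thing as --weights (a sketch; paths as in the question):

import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
solver = caffe.SGDSolver('models/bvlc_reference_caffenet/solver.prototxt')
# copies weights for layers whose names match the pretrained net;
# renamed layers (e.g. a new final classifier) are initialized fresh
solver.net.copy_from('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
solver.solve()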
2.
In most cases fine-tuning a net is a recommended practice, even when the input images are quite different from "imagenet" photos.
However, you should note that when the original weights you are about to use were trained, some (very reasonable) assumptions were made. You should decide whether these assumptions still hold for your task.
For instance, most nets were trained with simple data augmentation using each image and its horizontal flip. However, if your task is to distinguish between images and their flipped versions, you will find it very difficult to fine-tune.

ResNet - NaN during classification (not training) on same dataset used for validation

Some very quick background:
Training on Ubuntu 16.04 LTS with the latest Caffe branch on a GTX 1080
Classification on OSX
I'm wondering if anyone can shed light on this issue. I'm fine-tuning ResNet-50 using Caffe from here. I'm experimenting with a small dataset: approximately 1248 images for training and 1656 for validation. I go through the entire validation set each time I test during training.
Now, during training, I am not seeing any NaNs or anything else that would indicate gradient blow-up: no NaNs, infs, or anything strange. But when I load the net and try to classify the same images used for validation during training, NaNs show up. Any idea why? Could it be the difference between the GPU and the CPU architectures in carrying out the calculations in training vs. classification?
[EDIT: I'm not doing any preprocessing]
[EDIT2: It seems stochastic. If I run the same image through again, the network will either properly output a probability or output NaN again. My current workaround is to run the same image through the network until a non-NaN shows up, but sometimes this takes a while.]
Maybe a different type of preprocessing. You get NaN output for NaN input, so if your preprocessing somehow divides by zero (normalization?), this can occur.
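One way to test this hypothesis is to guard the preprocessing and assert on the inputs right before the forward pass (a hedged sketch; normalize stands in for whatever the deploy pipeline actually does):

import numpy as np

def normalize(img, mean, std):
    std = np.where(std == 0, 1.0, std)   # avoid division by zero
    x = (img - mean) / std
    # a NaN/inf here would explain NaN outputs at classification time
    assert np.isfinite(x).all(), 'non-finite values in preprocessed input'
    return x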
