I used a pre-trained GoogLeNet and then fine-tuned it on my dataset for a binary classification problem. The validation set gives a "loss3/top1" accuracy of 98.5%, but when I evaluate the performance on my evaluation dataset it gives me 50% accuracy. Whatever changes I made in train_val.prototxt, I made the same changes in deploy.prototxt, and I am not sure what changes I should make in these lines:
name: "GoogleNet"
layer {
name: "data"
type: "input"
top: "data"
input_param { shape: { dim:10 dim:3 dim:224 dim:224 } }
}
Any suggestions???
You do not need to change anything further in your deploy.prototxt*; what you do need to change is the way you feed the data to the net. You must transform your evaluation images in exactly the same way you transformed your training/validation images.
See, for example, how classifier.py puts the input images through a properly initialized caffe.io.Transformer class.
The "Input" layer you have in the prototxt is merely a declaration for caffe to allocate memory according to an input blob of shape 10-by-3-by-224-by-224.
* Of course, you must verify that train_val.prototxt and deploy.prototxt are exactly the same (apart from the input layer(s) and loss layer(s)). In particular, make sure the layer names are identical, since caffe uses layer names to assign weights from the 'caffemodel' file to the actual parameters it loads; mismatching names will cause caffe to use random weights for some of the layers.
I am trying to train an image classifier on an unbalanced training set. To cope with the class imbalance, I want to weight either the classes or the individual samples. Weighting the classes does not seem to work, and somehow I was not able to find a way to specify sample weights for my setup. Below you can read how I load and encode the training data and the two approaches that I tried.
Training data loading and encoding
My training data is stored in a directory structure where each image is placed in the subfolder corresponding to its class (I have 32 classes in total). Since the training data is too big to load into memory all at once, I use image_dataset_from_directory to describe the data as a TF Dataset:
train_ds = keras.preprocessing.image_dataset_from_directory(
    training_data_dir,
    batch_size=batch_size,
    image_size=img_size,
    label_mode='categorical')
I use label_mode 'categorical', so that the labels are described as a one-hot encoded vector.
I then prefetch the data:
train_ds = train_ds.prefetch(buffer_size=buffer_size)
Approach 1: specifying class weights
In this approach I try to specify the class weights via the class_weight argument of fit:
model.fit(
    train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds,
    class_weight=class_weights
)
For each class we compute a weight that is inversely proportional to the number of training samples for that class. This is done as follows (before the train_ds.prefetch() call described above):
class_num_training_samples = {}
for f in train_ds.file_paths:
    class_name = f.split('/')[-2]
    if class_name in class_num_training_samples:
        class_num_training_samples[class_name] += 1
    else:
        class_num_training_samples[class_name] = 1

max_class_samples = max(class_num_training_samples.values())
class_weights = {}
for i in range(0, len(train_ds.class_names)):
    class_weights[i] = max_class_samples / class_num_training_samples[train_ds.class_names[i]]
What I am not sure about is whether this solution works, because the keras documentation does not specify the keys for the class_weights dictionary in case the labels are one-hot encoded.
I tried training the network this way but found that the weights had no real influence on the resulting network: when I looked at the distribution of predicted classes for each individual class, I could recognize the distribution of the overall training set, where for every class the dominant classes are predicted most often. Running the same training without any class weights led to similar results. So the weights do not seem to have any influence in my case.
Is this because specifying class weights does not work for one-hot encoded labels, or is this because I am probably doing something else wrong (in the code I did not show here)?
Approach 2: specifying sample weight
As an attempt to come up with a different (in my opinion less elegant) solution I wanted to specify the individual sample weights via the sample_weight argument of the fit method. However from the documentation I find:
[...] This argument is not supported when x is a dataset, generator, or keras.utils.Sequence instance, instead provide the sample_weights as the third element of x.
This is indeed the case in my setup, where train_ds is a dataset. Now I am really having trouble finding documentation from which I can derive how to modify train_ds so that it has a third element containing the weights. I thought the map method of a dataset could be useful, but the solution I came up with is apparently not valid:
train_ds = train_ds.map(lambda img, label: (img, label, class_weights[np.argmax(label)]))
Does anyone have a solution that may work in combination with a dataset loaded by image_dataset_from_directory?
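For reference, the "third element" pattern from the quoted documentation can be expressed with TensorFlow ops instead of the numpy lookup above (a sketch only, not verified against this exact setup; weight_tensor is an assumed helper built from the class_weights dict computed earlier):

import tensorflow as tf

# Per-class weights as a constant tensor; index i holds the weight of class i.
weight_tensor = tf.constant([class_weights[i] for i in range(len(class_weights))],
                            dtype=tf.float32)

# Map each (image, one-hot label) batch to (image, label, per-sample weight).
# tf.argmax recovers the class index from the one-hot vector; tf.gather looks up its weight.
train_ds = train_ds.map(
    lambda img, label: (img, label, tf.gather(weight_tensor, tf.argmax(label, axis=-1))))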
I am using caffe with the HDF5 layer. It will read my hdf5list.txt as
/home/data/file1.h5
/home/data/file2.h5
/home/data/file3.h5
In each file*.h5, I have 10,000 images, so I have about 30,000 images in total. In each iteration, I use a batch size of 10, as in this setting:
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "./hdf5list.txt"
    batch_size: 10
    shuffle: true
  }
  include {
    phase: TRAIN
  }
}
Using caffe, its output looks like
Iterations 10, loss=100
Iterations 20, loss=90
...
My question is: how do I compute the number of epochs with respect to the loss? That is, I want to plot a graph where the x-axis is the number of epochs and the y-axis is the loss.
Related link: Epoch vs iteration when training neural networks
If you want to do this for just the current problem, it is super easy. Note that
Epoch_index = floor((iteration_index * batch_size) / (# data_samples))
Now, in solver.cpp, find the line where Caffe prints Iteration ..., loss = .... Compute the epoch index using the formula above and print that too. You are done. Do not forget to recompile Caffe.
If you want to modify Caffe so that it always shows the epoch index, then you will first need to compute the data size from all your HDF5 files. By glancing at the Caffe HDF5 layer code, I think you can get the number of data samples from hdf_blobs_[0]->shape(0). You should add this up over all HDF5 files and use that number in solver.cpp.
The variable hdf_blobs_ is defined in layers/hdf5_data_layer.cpp. I believe it is populated by the functions in util/hdf5.cpp. I think this is how the flow goes:
In layers/hdf5_data_layer.cpp, the hdf5 filenames are read from the text file.
Then a function LoadHDF5FileData attempts to load the hdf5 data into blobs.
Inside LoadHDF5FileData, the blob variable hdf_blobs_ is declared and populated by the functions in util/hdf5.cpp.
Inside util/hdf5.cpp, the function hdf5_load_nd_dataset first calls hdf5_load_nd_dataset_helper, which reshapes the blobs accordingly. I think this is where you get the dimensions of your data for one hdf5 file. Iterating over multiple hdf5 files is done in the void HDF5DataLayer<Dtype>::Next() function in layers/hdf5_data_layer.cpp, so that is where you need to add up the data dimensions received earlier.
Finally, you need to figure out how to pass them back till solver.cpp.
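If you only need the epoch axis for plotting, and do not want to touch the Caffe sources, a rough external sketch in Python is also possible (the dataset key 'data' inside the HDF5 files and the iteration/loss pairs below are assumptions about your files and log):

import h5py

# Total number of samples across all HDF5 files listed in hdf5list.txt.
total_samples = 0
with open('hdf5list.txt') as f:
    for line in f:
        path = line.strip()
        if not path:
            continue
        with h5py.File(path, 'r') as h5:
            total_samples += h5['data'].shape[0]   # assumes the images are stored under the key 'data'

batch_size = 10
iterations = [10, 20]        # iteration indices taken from the Caffe log
losses = [100.0, 90.0]       # corresponding loss values from the log

# Fractional epoch index: (iteration_index * batch_size) / (# data_samples)
epochs = [it * batch_size / float(total_samples) for it in iterations]
# 'epochs' is the x-axis and 'losses' the y-axis of the plot.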
I'm trying to reshape the size of a convolution layer of a caffemodel (This is a follow-up question to this question). Although there is a tutorial on how to do net surgery, it only shows how to copy weight parameters from one caffemodel to another of the same size.
Instead, I need to add a new channel (all 0) to my convolution filter, so that its shape changes from (64x3x3x3) to (64x4x3x3).
Say the convolution layer is called 'conv1'. This is what I tried so far:
import caffe

# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('../models/train.prototxt',
                '../models/train.caffemodel',
                caffe.TRAIN)
Now I can perform this:
net.blobs['conv1'].reshape(64, 4, 3, 3)
net.save('myNewTrainModel.caffemodel')
But the saved model does not seem to have changed. I've read that the actual weights of the convolution are stored in net.params['conv1'][0].data rather than in net.blobs, but I can't figure out how to reshape the net.params object. Does anyone have an idea?
As you well noted, net.blobs does not store the learned parameters/weights, but rather stores the result of applying the filters/activations on the net's input. The learned weights are stored in net.params. (see this for more details).
AFAIK, you cannot directly reshape net.params and add a channel.
What you can do is define two nets, deploy_trained_net_with_3ch.prototxt and deploy_empty_net_with_4ch.prototxt. The two files can be almost identical, apart from the input shape definition and the first layer's name.
Then you can load both nets to python and copy the relevant part:
net3ch = caffe.Net('deploy_trained_net_with_3ch.prototxt', 'train.caffemodel', caffe.TEST)
net4ch = caffe.Net('deploy_empty_net_with_4ch.prototxt', 'train.caffemodel', caffe.TEST)
Since all layer names are identical (apart from conv1), net4ch.params will be assigned the weights of train.caffemodel. As for the first layer, you can now manually copy the relevant part:
net4ch.params['conv1_4ch'][0].data[:,:3,:,:] = net3ch.params['conv1'][0].data[...]
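If you want the new fourth channel to start out as all zeros, as described in the question, you can set it explicitly (a one-line sketch; otherwise that channel keeps whatever initialization deploy_empty_net_with_4ch.prototxt specifies):

net4ch.params['conv1_4ch'][0].data[:, 3, :, :] = 0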
and finally:
net4ch.save('myNewTrainModel.caffemodel')
In Caffe, when you are defining your inputs for the NN in the protobuf file, you can input "data" and "label". I'm guessing label contains the expected output for the training data (what is normally considered the y values in the Machine Learning literature).
My problem is that in the caffe.proto file, label is defined as a scalar (int or long). With data, at least, I can set it to a numpy array, because it takes String values. If I'm training for more than one prediction output, how could I pass it as an array?
Or am I mistaken? What is label? What is it for? And how can I pass the y values to caffe?
The basic use case of caffe used to be image classification: assigning a single integer label per input image. Thus, the "datum" data structure reserves space for a 4D float array (batches of 3-channel images) and an integer "label" per image in the batch.
This restriction can be easily overcome using HDF5 input data layer.
See e.g., this answer.
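For example, a minimal sketch of writing vector-valued labels to HDF5 with h5py (the file names and the 4-dimensional label are illustrative; the arrays must be float, and the dataset keys must match the top names of your HDF5Data layer):

import h5py
import numpy as np

# 100 samples of 3x224x224 images with a 4-dimensional target vector each (illustrative shapes).
X = np.zeros((100, 3, 224, 224), dtype=np.float32)
y = np.zeros((100, 4), dtype=np.float32)

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=X)    # matches top: "data" in the HDF5Data layer
    f.create_dataset('label', data=y)   # a vector label per sample instead of a single scalar

# List the file in the text file given as the layer's 'source':
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')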
How can I use caffe convnet to detect facial expressions?
I have an image dataset, Cohn-Kanade, and I want to train a caffe convnet with this dataset. Caffe has a documentation site, but it does not explain how to train on my own data, only how to use pre-trained models.
Can someone teach me how to do it?
Caffe supports multiple formats for the input data (HDF5/lmdb/leveldb). It's just a matter of picking the one you feel most comfortable with. Here are a couple of options:
caffe/build/tools/convert_imageset:
convert_imageset is one of the command line tools you get from building caffe.
Usage is along the lines of:
specifying a list of image and label pairs in a text file, one row per pair;
specifying where the images are located;
choosing a backend db (format); the default, lmdb, should be fine.
You need to write up a text file where each line starts with the filename of the image followed by a scalar label (e.g. 0, 1, 2,...)
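For instance, a sketch that generates such a list file from a per-class folder layout (the folder and output names are assumptions about your setup):

import os

root = 'CK_images'          # assumed root folder with one subfolder per class
classes = sorted(os.listdir(root))

with open('train_list.txt', 'w') as out:
    for label, class_name in enumerate(classes):
        class_dir = os.path.join(root, class_name)
        for fname in sorted(os.listdir(class_dir)):
            # convert_imageset expects: <path relative to the image root> <integer label>
            out.write('%s/%s %d\n' % (class_name, fname, label))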
Construct your lmdb in python using Caffe's Datum class:
This requires building caffe's python interface. Here you write some python code that:
iterates through a list of images,
loads each image into a numpy array,
constructs a caffe Datum object,
assigns the image data to the Datum object,
sets the Datum's label member to the AU class from your CK dataset, if that is what you want your network to classify,
writes the Datum object to the db and moves on to the next image (a condensed sketch of these steps is given below).
Here's a code snippet of converting images to an lmdb from a blog post by Gustav Larsson. In his example he constructs an lmdb of images and label pairs for image classification.
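Here is that condensed sketch (assuming caffe's Python interface, the lmdb package, and OpenCV are installed; image_label_pairs, the file names, and map_size are illustrative assumptions):

import lmdb
import cv2
import caffe

image_label_pairs = [('img_0001.png', 0), ('img_0002.png', 3)]   # illustrative (path, label) list

# map_size is an upper bound on the db size in bytes; pick something generous.
env = lmdb.open('train_lmdb', map_size=1 << 30)
with env.begin(write=True) as txn:
    for idx, (path, label) in enumerate(image_label_pairs):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)              # CK+ images are grayscale
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels, datum.height, datum.width = 1, img.shape[0], img.shape[1]
        datum.data = img.tobytes()                                # raw uint8 pixel data
        datum.label = int(label)
        txn.put(('%08d' % idx).encode('ascii'), datum.SerializeToString())
env.close()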
Loading the lmdb into your network:
This is done exactly as in the LeNet example, with a Data layer at the beginning of the network prototxt that describes the LeNet model:
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
The source field is where you point caffe to the location of the lmdb you just created.
Something more related to performance, and not critical to getting this to work, is specifying how to normalize the input features. This is done through the transform_param field. CK+ has fixed-size images, so there is no need for resizing. One thing you do need, though, is to normalize the grayscale values, e.g. by mean subtraction: compute the mean grayscale intensity over your CK+ dataset and subtract it via the transform_param settings (mean_value), possibly combined with a scale as in the LeNet example above.
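A rough sketch of computing that mean (the image folder is a placeholder; assumes the images can be read as grayscale with OpenCV):

import os
import cv2
import numpy as np

image_dir = 'CK_images'     # assumed folder containing the CK+ images (possibly in subfolders)
total, count = 0.0, 0
for dirpath, _, filenames in os.walk(image_dir):
    for fname in filenames:
        if not fname.lower().endswith(('.png', '.jpg', '.jpeg')):
            continue
        img = cv2.imread(os.path.join(dirpath, fname), cv2.IMREAD_GRAYSCALE)
        total += float(np.sum(img))
        count += img.size

print('mean grayscale intensity: %.2f' % (total / count))   # use e.g. as mean_value in transform_param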