I'm trying to use the MINST Caffe example via the C++ API, but I'm having a bit of trouble working out how to restructure the network prototxt file I'll deploy after training. I've trained and tested the model with the original file (lenet_train_test.prototxt), but when I want to deploy it and make predictions like in the C++ and OpenCV example, I realise I have to modify the input section to make it similar to the deploy.prototxt file they have.
Can I replace the information in the training and testing layers of the lenet_train_test.prototxt with this section of the deploy.prototxt file?
name: "CaffeNet"
input: "data"
input_shape {
dim: 10
dim: 3
dim: 227
dim: 227
}
The images I'll be passing for classification to the network will be grayscale and 24*24 pixels, and I'll also want to scale it like was done with the MINST dataset, so could I modify the section to this?
name: "CaffeNet"
input: "data"
input_shape {
dim: 10
dim: 1
dim: 24
dim: 24
}
transform_param {
scale: 0.00390625
}
I'm not entirely sure what the "dim: 10" is coming from though.
In order to "convert" you train_val prototxt to a deploy one you remove the input data layers (reading your train/val data) and replacing them with the declaration
name: "CaffeNet"
input: "data"
input_shape {
dim: 10
dim: 1
dim: 24
dim: 24
}
Note that the deploy prototxt does not have two phases for train and test only a single flavor.
Replacing the input data layer with this declaration basically tells caffe that you are responsible of supplying the data, and the net should allocate space for inputs of this size.
Regarding scale: once you deploy your net, the net has no control over the inputs - it does not read the data for you as the input data layers in the train_val net. Therefore, you'll have to scale the input data yourself before feeding it to the network. You can use the DataTransformer class to help you transform your input blobs in the same way they were transformed during training.
Regarding the first dim: 10: every Blob (i.e., data/parameters storage unit) in caffe has 4 dimensions: batch-size, channels, height and width. This parameter actually means the net should allocate space for batches of 10 inputs at a time.
The "magic" number 10 comes from the way googlenet and other competitors in ILSVRC challenge used to classify images: they classified 10 crops from each image and averaged the outputs to produce better classification results.
Related
I will try to explain what the problem is.
I have 5 materials, each composed of 3 different minerals of a set of 10 different minerals. For each material I have measured the inensity vs wavelength. And each Intensity vs wavelength vector can be mapped into a binary vector of ones and zeros corresponding to the minerals the material is composed of.
So material 1 has an intensity of [0.51 0.53 0.57 0.68...... ] measured at different wavelengths [470 480 490 500 510 ......] and a binary vector
[1 0 0 0 1 0 0 1 0 0]
and so on for each material.
For each material I have 5000 examples, so 25000 examples for all. Each example will have a 'similar' intensity vs wavelength behaviour but will give the 'same' binary vector.
I want to design a NN classifier so that if I give it as an input the intensity vs wavelength, it gives me the corresponding binary vector.
The intensity vs wavelength has a length of 450 so I will have 450 units in the input layer
the binary vector has a length of 10, so 10 output neurons
the hidden layer/s will have as a beginning 200 neurons.
Can I simly design a NN classifier this way, and would it solve the problem, or I need something else?
You can do that, however, be aware to use the right cost and output layer activation functions. In your case, you should use sigmoid units for your outer layer and binary-cross-entropy as a cost function.
Another way to go about this would be to use one-hot encoding so that you can use normal multi-class classification (will probably not make sense since your output is probably sparse).
I'm trying to build a CNN to classify dogs.In fact , my data set consists of 5 classes of dogs. I've 50 images of dogs splitted into 40 images for training and 10 for testing.
I've trained my network based on AlexNet pretrained model over 100,000 and 140,000 iterations but the accuracy is always between 20 % and 30 %.
In fact, I have adapted AlexNet to my problem as following : I changed the name of last fully connected network and num_output to 5. Also , I ve changed the name of the first fully connected layer (fc6).
So why this model failed even I' ve used data augmentation (cropping )?
Should I use a linear classification on top layer of my network since I have a little bit of data and similar to AlexNet dataset ( as mentioned here transfer learning) or my data set is very different of original data set of AlexNet and I should train linear classifier in earlier network ?
Here is my solver :
net: "models/mymodel/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 200000
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "models/mymodel/my_model_alex_net_train"
solver_mode: GPU
Although you haven't given us much debugging information, I suspect that you've done some serious over-fitting. In general, a model's "sweet spot" for training is dependent on epochs, not iterations. Single-node AlexNet and GoogleNet, on an ILSVRC-style of data base, train in 50-90 epochs. Even if your batch size is as small as 1, you've trained for 2500 epochs with only 5 classes. With only 8 images per class, the AlexNet topology is serious overkill and is likely adapted to each individual photo.
Consider this: you have only 40 training photos, but 96 kernels in the first convolution layer and 256 in the second. This means that your model can spend over 2 kernels in conv1 and 6 in conv 2 for each photograph! You get no commonality of features, no averaging ... instead of edge detection generalizing to finding faces, you're going to have dedicated filters tuned to the individual photos.
In short, your model is trained to find "Aunt Polly's dog on a green throw rug in front of the kitchen cabinet with a patch of sun to the left." It doesn't have to learn to discriminate a basenji from a basset, just to recognize whatever is randomly convenient in each photo.
In my Caffe 'train.prototxt' I'm doing some input data transformation, like this:
transform_param {
mirror: true
crop_size: 321
mean_value: 104 # Red ?
mean_value: 116 # Blue ?
mean_value: 122 # Green ?
}
Now I want to store a modified version of my input images such that certain image regions are set to those mean values. The rational is that those regions are then set to 0 during mean subtraction. However I don't know what the order of channels is that caffe expects in such a prototxt file and I couldn't look it up in the caffe code either.
Does someone now whether the 3 values given above are in RGB or BGR order?
(I'm not sure since caffe is using opencv internally which stores images in the unusual BGR format)
https://groups.google.com/forum/#!topic/caffe-users/9opH6AW3Irw (answer by Evan Shelhamer):
[Mean] values are BGR for historical reasons -- the original CaffeNet training lmdb was made with image processing by OpenCV which defaults to BGR order.
I am trying to parameterise a 1D conv net via Torch.
Let's say I have a Tensor called data that is of dimensions 10 x 512, in that there are 10 rows and 512 columns. As such, I want to implement a single 3-layer stack of a TemporalConvolution layer, followed by ReLU, followed by TemporalMaxPooling. My classification problem is binary, and there is a corresponding labels tensor, which is 10 x 1. Let us assume that there is already written a feval to iterate through each row in both data and labels.
As such, the problem is to construct a net that can map from 512 columns down to 1 column
Adapted from the documentation:
...
model = nn.Sequential()
model:add(nn.TemporalConvolution(inputFrameSize, outputFrameSize, kW, [dW]))
model:add(nn.ReLU())
model:add(nn.TemporalMaxPooling(kW2, [dW2])
...
criterion = nn.BCECriterion()
...
I have parameterised it as follows, but the following doesn't work : /
TemporalConvolution(512,1,3,1)
ReLU())
TemporalMaxPooling(3, 1)
It throws the error: 2D or 3D(batch mode) tensor expected. As a result I tried to reshape data before passing it to the net:
data = data:resize(1, 100, 512)
But this throws the error: invalid input frame size.
I can see that the error concerns the shape of the data coming into the conv net and of course the parameterisation too. I am further confused by this post here which seems to suggest that inputFrameSize of TemporalConvolution should be set to 10 not 512.
Any guidance would be appreciated, as to how to build a 1D conv net.
P.S. I have tested the script with a logisticRegression model, and that runs, so the issue is purely with the conv net architecture / the shape of the data coming into it.
I guess you misunderstand the meaning of inputFrameSize, which is not the seqlen of your input but n_channels (e.g. for 512*512 RGB images in 2d-convlution, the inputFrameSize should be 3 not 512).
How can I use caffe convnet to detect facial expressions?
I have a image dataset, Cohn Kanade, and I want to train caffe convnet with this dataset. Caffe has a documentation site, but its not explain how to train my own data. Just with pre trained data.
Can someone teach me how to do it?
Caffe supports multiple formats for the input data (HDF5/lmdb/leveldb). It's just a matter of picking the one you feel most comfortable with. Here are a couple of options:
caffe/build/tools/convert_imageset:
convert_imageset is one of the command line tools you get from building caffe.
Usage is along the lines of:
specifying a list of images and label pairs in a text file. 1 row per pair.
specifying where the images are located.
Choosing a backend db (which format). Default is lmdb which should be fine.
You need to write up a text file where each line starts with the filename of the image followed by a scalar label (e.g. 0, 1, 2,...)
Construct your lmdb in python using Caffe's Datum class:
This requires building caffe's python interface. Here you write some python code that:
iterates through a list of images
loads the images into a numpy array.
Constructs a caffe Datum object
Assigns the image data to the Datum object.
The Datum class has a member called label you can set it to the AU class from your CK dataset, if that is what you want your network to classify.
Writes the Datum object to the db and moves on to the next image.
Here's a code snippet of converting images to an lmdb from a blog post by Gustav Larsson. In his example he constructs an lmdb of images and label pairs for image classification.
Loading the lmdb into your network:
This is done exactly like in the LeNet example. This Data layer at the beginning of the network prototxt that describes the LeNet model.
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
The source field is where you point caffe to the location of the lmdb you just created.
Something more related to performance and not critical to getting this to work is specifying how to normalize the input features. This is done through the transform_param field. CK+ has fixed size images, so no need for resizing. One thing you do need though is normalize the grayscale values. You can do this through mean subtraction. A simple of doing this is to replace the value of transform_param:scale with the mean value of the gray scale intensities in your CK+ dataset.