For batch image classification I've used the code from the question "Modifying the Caffe C++ prediction code for multiple inputs". However, I noticed a function that, for some reason, splits the image into its channels.
I assume this works for a usual CNN architecture, but does it work for a Siamese architecture? I suspect not, since classification in C++ doesn't work correctly for me.
Can someone explain how I should change the code for a Siamese architecture (which splits the image by channels and feeds those channels to different layers; that's the point), or at least how memory storage for the input works, so I can figure it out myself?
Blob<float>* input_layer = net_->input_blobs()[0];
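On the memory-layout part of the question: as far as I know, a Caffe input blob stores its data as one contiguous float array in N x C x H x W (num, channels, height, width) order, which is the arithmetic `Blob::offset(n, c, h, w)` performs. The sketch below reproduces that index computation in plain C++; the name `nchwOffset` is ours, not Caffe's. It shows that each channel of each image is its own contiguous H x W plane, so if a Siamese input stacks two images along the channel axis, the first image's channel planes simply come before the second image's.

```cpp
#include <cassert>
#include <cstddef>

// Index of element (n, c, h, w) in a contiguous N x C x H x W float buffer,
// i.e. the same arithmetic as Caffe's Blob::offset(n, c, h, w).
std::size_t nchwOffset(int n, int c, int h, int w,
                       int channels, int height, int width) {
    assert(n >= 0 && c >= 0 && c < channels);
    assert(h >= 0 && h < height && w >= 0 && w < width);
    return ((static_cast<std::size_t>(n) * channels + c) * height + h) * width + w;
}
```

For example, with 6 channels of 4x5 pixels, channel 1 of image 0 starts 20 floats in (one plane later) and image 1 starts 120 floats in, so splitting along channels just hands each branch a different set of planes.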
Figured it out: the problem was in an OpenCV call in Classifier::Preprocess:
cv::split(img[i], channels);
cv::split only works as expected if the cv::Mat in img was formed as a 2D image with channels. If, as in my case, it was formed differently, with dimensions like (channels, width, height) or (width, height, channels), cv::split will not split the cv::Mat the way you expect, so I replaced this step with a different implementation.
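One way to replace `cv::split` without depending on how the `cv::Mat` was shaped is to copy the pixels yourself from interleaved (height, width, channels) order into the planar (channels, height, width) order the input blob expects. This is a sketch, not necessarily the replacement used above; it assumes the source is a contiguous float buffer in HWC order (for instance the data of a continuous `CV_32FC3` Mat), and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder one image from interleaved HWC layout (all channels of a pixel
// stored together, as OpenCV does for multi-channel Mats) to planar CHW
// layout (one contiguous H x W plane per channel).
std::vector<float> hwcToChw(const std::vector<float>& hwc,
                            int height, int width, int channels) {
    assert(hwc.size() == static_cast<std::size_t>(height) * width * channels);
    std::vector<float> chw(hwc.size());
    for (int h = 0; h < height; ++h)
        for (int w = 0; w < width; ++w)
            for (int c = 0; c < channels; ++c)
                chw[(static_cast<std::size_t>(c) * height + h) * width + w] =
                    hwc[(static_cast<std::size_t>(h) * width + w) * channels + c];
    return chw;
}
```

The planar result can then be copied into `input_layer->mutable_cpu_data()` at the position of that image's slot in the batch.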
So far I have trained my neural network on the MNIST data set (from this tutorial). Now I want to test it by feeding my own images into it.
I've processed the image using OpenCV by making the dimensions 28x28 pixels, turning it into grayscale, and using adaptive thresholding. Where do I proceed from here?
To the network, an 'image' is a 28x28 array of values from 0 to 1, so not really an image in the usual sense. Just greyscaling your original image will not make it fit for input. You have to go through the following steps:
Load your image into your programming language, with 784 RGB values representing the pixels
For each RGB value, take the average of r, g and b. Then divide this value by 255. You will now have the greyscale value of that pixel, a value between 0 and 1.
Replace the RGB values with the greyscale values
You will now have an image which looks like this (see the right array):
So you must do everything in your programming language. If you just greyscale an image with a photo editor, the pixels will still be stored as r, g, b values.
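The steps above can be sketched as follows, assuming the 28x28 image has already been loaded into a flat buffer of 8-bit interleaved r, g, b values (784 pixels x 3 bytes); the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert 28x28 interleaved RGB bytes into 784 greyscale values in [0, 1]:
// average r, g and b for each pixel, then divide by 255.
std::vector<float> rgbToMnistInput(const std::vector<std::uint8_t>& rgb) {
    const std::size_t kPixels = 28 * 28;
    assert(rgb.size() == kPixels * 3);
    std::vector<float> grey(kPixels);
    for (std::size_t i = 0; i < kPixels; ++i) {
        const float avg = (rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2]) / 3.0f;
        grey[i] = avg / 255.0f;
    }
    return grey;
}
```

A white pixel becomes 1.0, a black pixel 0.0, and a pure red pixel about 0.333.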
You can use libraries like PIL or skimage, which let you load the data into NumPy arrays in Python and also support many image operations such as greyscaling and scaling.
After you have processed the image and read the data into numpy array you can then feed this to your network.
Can Keras deal with input images of different sizes? For example, in a fully convolutional neural network, the input images can have any size. However, we need to specify the input shape when we create a network in Keras. So how can we use Keras to handle different input sizes without resizing the input images to the same size? Thanks for any help.
Yes.
Just change your input shape to shape=(n_channels, None, None).
Where n_channels is the number of channels in your input image.
I'm using the Theano backend, though, so if you are using TensorFlow you might have to change it to (None, None, n_channels).
You should use:
input_shape=(1, None, None)
None in a shape denotes a variable dimension. Note that not all layers will work with such variable dimensions, since some layers require shape information (such as Flatten).
https://github.com/fchollet/keras/issues/1920
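To see why Flatten in particular needs fixed shapes: a convolution's output size follows its input size, so the number of values Flatten emits (and hence the input size of any Dense layer after it) changes with the image. The sketch uses the standard output-size formula for a convolution or pooling window without dilation; the single 3x3 layer with 32 filters is a made-up example.

```cpp
#include <cassert>

// Spatial output size along one axis of a convolution/pooling window
// without dilation: floor((in + 2*pad - kernel) / stride) + 1.
int convOutSize(int in, int kernel, int stride, int pad) {
    assert(stride > 0 && in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel) / stride + 1;
}

// Number of values Flatten would produce after one hypothetical 3x3,
// stride-1, unpadded convolution with `filters` output channels.
int flattenedSize(int height, int width, int filters) {
    return filters * convOutSize(height, 3, 1, 0) * convOutSize(width, 3, 1, 0);
}
```

A 28x28 input gives 32 x 26 x 26 = 21632 values, while a 64x48 input gives 32 x 62 x 46 = 91264, so a Dense weight matrix sized for one cannot accept the other. Convolutional layers themselves have no such problem, because their weights depend only on kernel size and channel counts.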
For example, using Keras's functional API, your input layer would be:
For an RGB dataset
inp = Input(shape=(3,None,None))
For a greyscale dataset
inp = Input(shape=(1,None,None))
Supporting arbitrarily sized input arrays with the same computational kernels poses many challenges: e.g. on a GPU you need to know how large a buffer to reserve and, to a lesser extent, how much to unroll your loops. This is the main reason Keras requires constant input shapes; variable-sized inputs are too painful to deal with.
This more commonly occurs when processing variable-length sequences like sentences in NLP. The common approach is to establish an upper bound on the size (and crop longer sequences), and then pad the sequences with zeros up to this size.
(You could also include masking on zero values to skip computations on the padded areas, except that the convolutional layers in Keras might still not support masked inputs...)
I'm not sure whether, for 3D data structures, the overhead of padding is prohibitive; if you start getting memory errors, the easiest workaround is to reduce the batch size. Let us know about your experience with applying this trick to images!
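The crop-and-pad approach carried over to images could look like this: a sketch assuming single-channel images stored row-major in a flat float buffer, with the fixed target size chosen as the upper bound. It also returns a 0/1 mask marking the real pixels, in case you want to experiment with masking; the names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct PaddedImage {
    std::vector<float> pixels;  // targetH x targetW, zero-padded
    std::vector<float> mask;    // 1 where a source pixel was copied, 0 on padding
};

// Crop a height x width image to at most targetH x targetW (keeping the
// top-left corner) and zero-pad it up to exactly targetH x targetW.
PaddedImage cropAndPad(const std::vector<float>& img, int height, int width,
                       int targetH, int targetW) {
    assert(img.size() == static_cast<std::size_t>(height) * width);
    const std::size_t n = static_cast<std::size_t>(targetH) * targetW;
    PaddedImage out{std::vector<float>(n, 0.0f), std::vector<float>(n, 0.0f)};
    const int rows = std::min(height, targetH);
    const int cols = std::min(width, targetW);
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x) {
            const std::size_t dst = static_cast<std::size_t>(y) * targetW + x;
            out.pixels[dst] = img[static_cast<std::size_t>(y) * width + x];
            out.mask[dst] = 1.0f;
        }
    return out;
}
```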
Just use None when specifying the input shape. But I still do not know how to pass differently shaped images to the fit function.
I modified the MNIST example, and when I train it with my 3 image classes it reaches an accuracy of 91%. However, when I modify the C++ example with a deploy prototxt file and a labels file and test it on some images, it predicts the second class (1 circle) with a probability of 1.0 no matter which image I give it, even images that were used in the training set. I've tried a dozen images and it consistently predicts that one class.
To clarify: in the C++ example I modified, I did scale the image to be predicted just as the images were scaled in the training stage:
img.convertTo(img, CV_32FC1);
img = img * 0.00390625;
If that was the right thing to do, then it makes me wonder if I've done something wrong with the output layers that calculate probability in my deploy_arch.prototxt file.
I think you have forgotten to scale the input image at classification time, as can be seen in line 11 of the train_test.prototxt file. You should probably multiply by that factor somewhere in your C++ code, or alternatively use a Caffe layer to scale the input (look into the ELTWISE or POWER layers for this).
EDIT:
After a conversation in the comments, it turned out that the image mean was mistakenly being subtracted in the classification.cpp file whereas it was not being subtracted in the original training/testing pipeline.
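The fix amounts to making classification-time preprocessing identical to the training pipeline. A minimal sketch, assuming greyscale 8-bit pixels and the 0.00390625 (= 1/256) scale from the question; it applies the transform in the (pixel - mean) * scale order that, as far as I know, Caffe's data transformer uses. The `mean` argument is illustrative and should stay at 0 when, as here, training did not subtract a mean.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Apply the same transform as the training data layer: (pixel - mean) * scale.
// With mean = 0 this is exactly img * 0.00390625 from the question.
std::vector<float> preprocess(const std::vector<std::uint8_t>& pixels,
                              float scale = 0.00390625f, float mean = 0.0f) {
    assert(scale > 0.0f);
    std::vector<float> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels)
        out.push_back((static_cast<float>(p) - mean) * scale);  // same order as training
    return out;
}
```

Printing a few values from this function next to what the training net actually receives is a quick way to catch a mismatch such as an extra mean subtraction.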
Are your train classes balanced?
You may end up with a network stuck predicting the one majority class.
To find the issue, I suggest printing the predictions on training images during training and comparing them with the predictions of the forward (C++) example on the same training images, across the different classes.
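A quick way to check the balance question: count the labels per class and compare the reported accuracy with the trivial "always predict the most frequent class" baseline. If that baseline is already close to the reported accuracy, the network may simply have learned to output one class. A sketch with hypothetical function names, assuming integer labels in [0, numClasses).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Number of samples per class.
std::vector<int> classCounts(const std::vector<int>& labels, int numClasses) {
    std::vector<int> counts(numClasses, 0);
    for (int label : labels) {
        assert(label >= 0 && label < numClasses);
        ++counts[label];
    }
    return counts;
}

// Accuracy obtained by always predicting the most frequent class.
double majorityBaseline(const std::vector<int>& labels, int numClasses) {
    assert(!labels.empty());
    const std::vector<int> counts = classCounts(labels, numClasses);
    const int best = *std::max_element(counts.begin(), counts.end());
    return static_cast<double>(best) / labels.size();
}
```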
I'm implementing, for the first time, software for object detection in static images. My first goal is to detect simple circles; then I'll move on to more complex objects. Unfortunately, it seems I have a problem when validating my classifier.
My choice was to use a HOG descriptor (using OpenCV) and an SVM as the classifier (using svmlight). The code compiles and runs, but something seems odd to me, probably concerning the SVM.
I have:
a training set composed of 5 images (48x48px) of different circles and 5 images (48x48px) of non-circles (I know these are too few to get a solid classifier, but for now it's just to test that everything works)
a test set composed of 4 images (48x48px, with circles as big as the ones used for training) and 1 much bigger image (765x600px) with circles of multiple sizes and other geometric shapes.
What happens is that:
the circles in the 48x48 test images are not detected, even though the test set includes some images that were used in the training phase.
in the 765x800 image (which contains circles of various sizes), the circles that are the same size as those in the training set, or bigger, are correctly identified.
I'm using the following parameters:
hog: winSize=48x48px, winStride=4x4px, cellSize=4px, blockSize=8px, blockStride=4x4px
classifier: SVM regression with a linear kernel and C=0.01 (RBF results are worse than linear).
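One sanity check on these parameters is the descriptor length they imply, since the weight vector trained with svmlight has to match it (as far as I know, OpenCV's `HOGDescriptor::setSVMDetector` accepts a vector of exactly the descriptor size, or the descriptor size plus one bias term). The sketch computes the length from the usual block/cell arithmetic for square windows and blocks; `nbins = 9` is an assumption (OpenCV's default, not stated above).

```cpp
#include <cassert>

// Length of a HOG descriptor for one square detection window:
// (#block positions) x (#cells per block) x (#orientation bins).
int hogDescriptorSize(int winSize, int blockSize, int blockStride,
                      int cellSize, int nbins) {
    assert((winSize - blockSize) % blockStride == 0);
    assert(blockSize % cellSize == 0);
    const int blocksPerAxis = (winSize - blockSize) / blockStride + 1;
    const int cellsPerBlockAxis = blockSize / cellSize;
    return blocksPerAxis * blocksPerAxis *
           cellsPerBlockAxis * cellsPerBlockAxis * nbins;
}
```

With the parameters above this gives 11 x 11 block positions x 4 cells x 9 bins = 4356 values per 48x48 window.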
This is the API call that performs the detection, with the parameters I'm using:
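vector<Rect> found;
double hitThreshold = 0.;     // threshold on the SVM decision value (distance to the hyperplane)
Size win_stride(4, 4);        // winStride from the parameters above
Size padding(Size(32, 32));
double scale = 1.05;          // factor between successive pyramid levels
int groupThreshold = 2;       // grouping of overlapping hits; 0 disables grouping
hog.detectMultiScale(testImg, found, hitThreshold, win_stride, padding, scale, groupThreshold);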
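It may help to check how many scales `detectMultiScale` actually evaluates. As far as I know it only shrinks the image (scale factors 1, 1.05, 1.05^2, ...), stopping once the shrunken image no longer fits one `winSize` window, with the number of levels capped by `HOGDescriptor::nlevels` (64 by default). The sketch below is our model of that loop, not OpenCV code. It is consistent with circles at least as large as the training circles being found in the big image, and it shows that a 48x48 test image is examined at a single scale (whether the hits there then survive grouping with groupThreshold = 2 is a separate thing to check).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Number of pyramid levels a sliding-window detector evaluates when it
// repeatedly shrinks the image by scale0 and stops once the image is smaller
// than the detection window (never less than one level).
int pyramidLevels(int imgW, int imgH, int winW, int winH,
                  double scale0, int nlevels = 64) {
    assert(winW > 0 && winH > 0 && scale0 > 0.0);
    double scale = 1.0;
    int levels = 0;
    for (; levels < nlevels; ++levels) {
        if (std::lround(imgW / scale) < winW ||
            std::lround(imgH / scale) < winH || scale0 <= 1.0)
            break;          // image at this scale no longer fits a window
        scale *= scale0;    // next, smaller pyramid level
    }
    return std::max(levels, 1);
}
```

For a 765x600 image with a 48x48 window and scale 1.05 this gives 52 levels; for a 48x48 image it gives one.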
Is there any reason why the circles in the 48x48px images are not detected while the circles in the bigger image are? I expect the 48x48px images to be correctly classified in order to validate the classifier. I only added the bigger image when nothing was detected in the 48x48px images.
Besides, what seems even stranger is that the 48x48px test set contains some images from the training set, and I think they must be identified; instead they are not! (I know the training set and the test set must be different, but I did that when nothing was detected.)
This is my first experience with HOG descriptors and SVMs, so it might not work because of a configuration error or the choice of images.
Any help is welcome!
Thanks in advance :)
I'm trying to add noise and blur functions to my CUDA project, and after quite some research I've hit a stumbling block. I've read up on the Gaussian blur matrix, but I'm still having trouble getting a working piece of code that can blur certain parts of an image. I've managed to get a form of noise to show.
I'd appreciate any help, either an explanation of how to implement a Gaussian (or simpler) blur, or a bit of code that implements blurring.
Gratefully appreciated!
Gaussian blur is a separable filter, so you can apply the 1D kernel first to all the rows in your ROI and then to the columns of the blurred rows.
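A CPU reference of this separable approach can be handy for validating a CUDA version against: build a normalised 1D Gaussian kernel, blur every row, then blur the columns of that result. This is plain C++, not a CUDA kernel; clamping at the borders and the 3-sigma radius are our choices, and the names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalised 1D Gaussian kernel of length 2*radius+1.
std::vector<float> gaussianKernel(float sigma, int radius) {
    assert(sigma > 0.0f && radius >= 0);
    std::vector<float> k(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        k[i + radius] = std::exp(-(i * i) / (2.0f * sigma * sigma));
        sum += k[i + radius];
    }
    for (float& v : k) v /= sum;  // weights sum to 1, so flat areas stay flat
    return k;
}

// Separable blur of a row-major width x height single-channel image:
// horizontal pass into tmp, then vertical pass into out (borders clamped).
std::vector<float> gaussianBlur(const std::vector<float>& img, int width,
                                int height, float sigma) {
    assert(img.size() == static_cast<std::size_t>(width) * height);
    const int r = static_cast<int>(std::ceil(3.0f * sigma));
    const std::vector<float> k = gaussianKernel(sigma, r);
    std::vector<float> tmp(img.size()), out(img.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int i = -r; i <= r; ++i) {
                const int xx = std::min(std::max(x + i, 0), width - 1);
                acc += k[i + r] * img[static_cast<std::size_t>(y) * width + xx];
            }
            tmp[static_cast<std::size_t>(y) * width + x] = acc;
        }
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int i = -r; i <= r; ++i) {
                const int yy = std::min(std::max(y + i, 0), height - 1);
                acc += k[i + r] * tmp[static_cast<std::size_t>(yy) * width + x];
            }
            out[static_cast<std::size_t>(y) * width + x] = acc;
        }
    return out;
}
```

To blur only an ROI, run the same two passes on the ROI's sub-rectangle; in CUDA each pass maps naturally to one thread per output pixel.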
The tricky part with CUDA is that this is a neighbourhood operation, so typically you will need to have each block overlap by half the kernel size in order to get the required neighbourhood pixels into shared memory.
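The overlap can be made concrete with a little index arithmetic: for a block that writes blockW x blockH output pixels with a filter of radius r, the block has to stage a (blockW + 2r) x (blockH + 2r) tile in shared memory, starting r pixels up and to the left of its own output region. A host-side sketch with hypothetical names, assuming float pixels and edge clamping:

```cpp
#include <cassert>
#include <cstddef>

struct TileLayout {
    int tileW, tileH;   // pixels staged per block, including the halo
    std::size_t bytes;  // shared memory needed for float pixels
};

// Shared-memory tile for a block that writes blockW x blockH output pixels
// with a filter of the given radius: its own region plus `radius` pixels of
// overlap (halo) on every side.
TileLayout haloTile(int blockW, int blockH, int radius) {
    assert(blockW > 0 && blockH > 0 && radius >= 0);
    TileLayout t{blockW + 2 * radius, blockH + 2 * radius, 0};
    t.bytes = static_cast<std::size_t>(t.tileW) * t.tileH * sizeof(float);
    return t;
}

// Global coordinate (along one axis) that tile index i of block b loads,
// clamped to the image so border blocks replicate the edge pixels.
int tileSourceCoord(int b, int blockDim, int radius, int i, int imageDim) {
    const int g = b * blockDim - radius + i;
    return g < 0 ? 0 : (g >= imageDim ? imageDim - 1 : g);
}
```

In a kernel, each thread would load one or more tile pixels using these coordinates, call `__syncthreads()`, and then read its whole neighbourhood from shared memory.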
FYI, these are two separate questions and should be asked separately on this site.
Regarding the blur: for large blur kernels (strong blurs), the best approach is to take the FFT of the image and of a Gaussian kernel image, multiply the two results (complex multiplication), and take the inverse FFT of the product. You will have to implement an FFT-Shift function yourself, and if you are using color images, you will have to split the image into a separate buffer per channel.
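The FFT-Shift mentioned here just swaps the quadrants of a 2D buffer so that the element at (0, 0), the zero-frequency term, moves to the centre. A CPU sketch for a row-major buffer, rotating each axis by floor(n/2) as the usual `fftshift` convention does; it works for real or complex element types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 2D fftshift of a row-major width x height buffer: rotate both axes by
// floor(n/2) so that element (0, 0) ends up at the centre.
template <typename T>
std::vector<T> fftShift(const std::vector<T>& in, int width, int height) {
    assert(in.size() == static_cast<std::size_t>(width) * height);
    std::vector<T> out(in.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const int ys = (y + height / 2) % height;
            const int xs = (x + width / 2) % width;
            out[static_cast<std::size_t>(ys) * width + xs] =
                in[static_cast<std::size_t>(y) * width + x];
        }
    return out;
}
```

If the Gaussian kernel image is built with its peak in the centre, the peak has to be moved back to (0, 0) before the FFT, otherwise the blurred result comes out translated. For even sizes this same function does that, because shifting twice by n/2 is the identity; for odd sizes the inverse shift rotates by the other amount.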
For small blur kernels (gentle blurs), the simplest approach is, for each pixel in the result image, to sum the nearby pixels in the source image weighted by a Gaussian function.
Regarding the noise: the easiest approach is to load a pre-generated image of pseudo-random numbers into CUDA after transforming it from uniformly distributed random numbers to normally distributed random numbers. E.g. this question.
Then a correctly sized region of the random image should be multiplied by the noise sigma and added to the source image to produce the result.
Last time I checked, there was no random-buffer generation solution for CUDA; however, that was a few years ago.
Update: CUDA now has cuRAND, so you should be able to generate random numbers instead of using a pre-generated random buffer.
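The uniform-to-normal step can be done with the Box-Muller transform (our choice here; the linked question may use a different method). A CPU sketch that turns pairs of uniform samples into standard normal samples and then adds sigma-scaled noise to an image; `std::mt19937` stands in for the pre-generated random buffer, and all names are illustrative. With cuRAND, the host API's `curandGenerateNormal` can produce normally distributed samples directly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Box-Muller: map pairs of uniform samples in (0, 1] to pairs of
// independent standard normal samples.
std::vector<float> uniformToNormal(const std::vector<float>& uniform) {
    assert(uniform.size() % 2 == 0);
    const float kTwoPi = 6.28318530718f;
    std::vector<float> normal(uniform.size());
    for (std::size_t i = 0; i < uniform.size(); i += 2) {
        const float u1 = uniform[i], u2 = uniform[i + 1];
        assert(u1 > 0.0f && u1 <= 1.0f);  // log(0) is undefined
        const float r = std::sqrt(-2.0f * std::log(u1));
        normal[i] = r * std::cos(kTwoPi * u2);
        normal[i + 1] = r * std::sin(kTwoPi * u2);
    }
    return normal;
}

// result = source + sigma * noise, pixel by pixel.
std::vector<float> addGaussianNoise(const std::vector<float>& src,
                                    const std::vector<float>& normalNoise,
                                    float sigma) {
    assert(normalNoise.size() >= src.size());
    std::vector<float> out(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        out[i] = src[i] + sigma * normalNoise[i];
    return out;
}

// Stand-in for a pre-generated random buffer: n uniform samples in (0, 1].
std::vector<float> makeUniformBuffer(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);  // 32-bit integers in [0, 2^32 - 1]
    std::vector<float> u(n);
    for (float& v : u)
        v = static_cast<float>((gen() + 1.0) / 4294967296.0);  // (0, 1]
    return u;
}
```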