CNN for detecting objects within an arbitrary boundary - machine-learning

I want to detect whether there are objects within a boundary.
The boundary is a white rectangle and is easily identifiable (for a human).
However, the position of the boundary is not fixed.
The objects are small, and usually only 1 or 2 are present in the boundary - but they are visible.
The sample images are labelled only with 1 if any object is in the boundary and 0 if not. In particular, I don't have the boundary as a label.
What is a good architecture for a classifier of such images? Are layers of CNN + MaxPooling my best bet?
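Stacked convolution + max-pooling layers are indeed a reasonable starting point for this kind of weakly-labelled classification, since pooling makes the prediction tolerant to where the boundary sits. As a minimal numpy sketch of just the two building blocks (not a trained model; in practice you would use a framework):

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D cross-correlation of a single-channel image x with kernel k."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def maxpool2(x):
    """2x2 max pooling with stride 2 (trims odd edges)."""
    H, W = x.shape
    x = x[:H - H % 2, :W - W % 2]
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

img = np.random.rand(64, 64)   # stand-in for an input image
k = np.random.randn(3, 3)      # one filter (random here, learned in practice)
feat = maxpool2(np.maximum(conv2d(img, k), 0.0))  # conv, ReLU, 2x2 pool
assert feat.shape == (31, 31)
```

A full classifier would stack several such blocks, end in a global pooling plus a dense sigmoid layer, and train on the 0/1 labels; with only image-level labels, class-activation maps can later show which region triggered the prediction.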

Related

Extract letters from image

I have a binary image (only 0 and 255 pixels) like the one below.
I want to extract bounding boxes around the letters, such as A, B, C and D. The image is large (around 4000x4000) and the letters can be quite small (like B and D above). Moreover, the characters are broken; that is, there are gaps of black pixels within the outline of a character (such as A below).
The image also has white noise, like streaks of white lines, scattered around the image.
What I have tried -
Extracting contours - The issue is that, for broken characters (like "A"), multiple disconnected contours are obtained for a character. I am not able to obtain a contour for the entire character.
Dilation to join edges - This solves the disconnected contours (for large characters) to a certain extent. However, with dilation, I lose a lot of information about smaller characters which now appear like blocks of white pixels.
I thought of clustering similar pixels but am not able to come up with a well defined solution.
I would appreciate any ideas! Thanks.
How about this procedure?
Object detection (e.g. a HOG-based detector): gives you candidate objects
Resize obtained objects to equal size (e.g. 28x28 like MNIST dataset)
Character classification (e.g. SVM, kNN, deep learning)
The details of each step are up to you.
+) Search for an example of MNIST recognition. MNIST is a handwritten-digit dataset, and there are lots of examples for it (even for noisy MNIST).
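For the broken-character and streak-noise issues specifically, morphological closing plus connected-component labelling is one concrete way to realise the detection and resizing steps above. A sketch using scipy.ndimage; the thresholds here are illustrative, not tuned for a real 4000x4000 scan:

```python
import numpy as np
from scipy import ndimage

def extract_character_boxes(img, close_iters=2, min_area=20):
    """Return (row, col, height, width) boxes of white regions.

    Closing bridges small gaps inside broken characters; regions
    smaller than min_area are discarded as streak noise."""
    mask = ndimage.binary_closing(img > 0, structure=np.ones((3, 3)),
                                  iterations=close_iters)
    labels, _ = ndimage.label(mask)
    boxes = []
    for sl in ndimage.find_objects(labels):
        h, w = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
        if h * w >= min_area:
            boxes.append((sl[0].start, sl[1].start, h, w))
    return boxes

def crop_to_mnist_size(img, box, size=28):
    """Crop one box and resample it to size x size for the classifier."""
    r, c, h, w = box
    patch = img[r:r + h, c:c + w].astype(float)
    return ndimage.zoom(patch, (size / h, size / w), order=1)
```

The trade-off the question describes is real: more closing iterations bridge wider gaps but blur small characters together, so `close_iters` and `min_area` need tuning against the actual character and streak sizes.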

Are convolutional layers necessary for Deep Q Networks?

I'm currently trying to build a deep Q network to play the classic Snake game. I designed the game so that the state space is confined to a 20 x 20 matrix, with 1s representing squares occupied by the body, 2 representing the square occupied by the head, and 5 representing a square occupied by food. Given that the state space is relatively small, would it be feasible for the network input to be a 400-dimensional vector instead of a raw image?
You could try a 400-dimensional vector and look at the performance of the agent's learning. If it doesn't improve, then try a CNN. But in my opinion, a 400-dimensional vector should work.
Also, make sure that you normalise the state inputs to [0, 1] or [-1, 1] before feeding them to the neural network.
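As a sketch of what that looks like, assuming the 20x20 grid encoding described in the question (cell values 0/1/2/5) and a small fully-connected Q-network; the layer sizes are arbitrary:

```python
import numpy as np

BODY, HEAD, FOOD = 1, 2, 5

def encode_state(grid):
    """Flatten the 20x20 grid to a 400-dim vector scaled into [0, 1]."""
    return grid.astype(np.float32).ravel() / FOOD  # FOOD=5 is the max cell value

def q_forward(state, W1, b1, W2, b2):
    """One forward pass of a 2-layer MLP Q-network: one Q-value per action."""
    h = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (400, 64)), np.zeros(64)
W2, b2 = rng.normal(0, 0.1, (64, 4)), np.zeros(4)   # 4 actions: up/down/left/right

grid = np.zeros((20, 20))
grid[5, 5], grid[5, 6], grid[10, 10] = HEAD, BODY, FOOD
q = q_forward(encode_state(grid), W1, b1, W2, b2)
```

One design note: since 1/2/5 are categorical codes rather than magnitudes, a one-hot encoding per cell (three binary planes, 1200 inputs) is a common alternative to plain scaling.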

Find rotational degree of an object by Two Dimensional Cross Correlation

[Background]
I am learning about two-dimensional cross-correlation (2DCC) to find out how it can be applied to my current project, which is to develop an efficient method for finding the displacement and rotation of an object on a 2D surface.
To do this, I ran the following experiment in my software environment.
[Experiment]
I have one 1024x1024 image with a duck located at its center.
1024x1024 image with a duck located at center
I have another 1024x1024 image with the duck not at its center and rotated.
1024x1024 image with the duck at (752, 336), rotated 123°
Then I applied two-dimensional cross-correlation to the two images.
I got the following amplitude, and the peak index of the cross-correlation result almost matches the duck's displacement in my 2nd image.
amplitude result of 2D Cross Correlation
On the other hand, the phase value at the index of the amplitude peak does not match the duck's rotation in my 2nd image. The expected phase value is 123° as seen in the 2nd image, but the actual value is very small.
phase result of 2D Cross Correlation
[Question]
Is two-dimensional cross-correlation the right approach to recover the rotation angle of the duck in this situation?
The cross-correlation only yields a shift. What it does is compare the two images at all possible translations.
You computed the cross-correlation through the Fourier domain, and obtained a result that has very small imaginary values. These are the result of numerical inaccuracies, and should be ignored. The cross-correlation of two real-valued signals (or images) is a real-valued signal (or image).
To find the rotation, you should probably look into the Fourier-Mellin transform.
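To make the first point concrete, here is a small numpy sketch of recovering the translation (only) via FFT-based cross-correlation; as noted above, the imaginary part of the result is numerical noise and carries no rotation information:

```python
import numpy as np

def find_shift(ref, moved):
    """Estimate the circular shift d such that moved is ref translated by d."""
    cc = np.fft.ifft2(np.conj(np.fft.fft2(ref)) * np.fft.fft2(moved))
    cc = cc.real  # imaginary part is ~1e-16 numerical noise for real images
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    # indices past N/2 correspond to negative shifts
    return tuple(p if p <= n // 2 else p - n for p, n in zip(peak, cc.shape))
```

Recovering rotation needs an extra step: the Fourier-Mellin approach resamples the spectrum magnitude to log-polar coordinates, where rotation (and scale) become shifts that this same correlation can then find.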

Bounding box estimation in neural network output

I am working on a convolutional neural network to identify objects like animals, vehicles, trees, etc. One of the classes for detection is auto. When I gave the image to the network, it predicted auto, but I also need to draw a bounding box around the object. When I tried the sliding-window approach, I got a lot of bounding boxes, and I need only one. How do I find the most appropriate bounding box for an object after the network's prediction? Don't we need some method to localise the objects within a large image? That is what I want.
My final layer is a logistic regression function, which predicts only 1 or 0. I don't know how to turn that prediction into a probability score; if I had a probability score for each box, it would be easy to find the most appropriate one. Please suggest some methods for this. Thanks in advance. All answers are welcome.
INPUT, OUTPUT AND EXPECTED OUTPUT
It's not clear if you have a single object in your input image or several. Your example shows one.
If you have ONE object, here are some options to consider for combining the bounding boxes:
Keep the most distant ones: keep the top, bottom, right, and left boundaries that are most distant from the center of all the bounding boxes.
Keep the average ones: e.g. take all the top boundaries and keep their average location. Repeat with all the bottom, right, and left boundaries.
Keep the median ones: same as the average, but keep the median of each boundary instead.
Keep the bounding box with the largest activation: you're using logistic regression as the final step, so find the input that goes into that logistic layer and keep the bounding box with the largest input to it. (The sigmoid of that input is exactly the per-box probability score you were missing, so you can rank boxes by it.)
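The average, median, and most-distant options can be sketched in a few lines of numpy, assuming each detection is an (x1, y1, x2, y2) box around the same single object:

```python
import numpy as np

def fuse_boxes(boxes, mode="median"):
    """Fuse sliding-window detections (x1, y1, x2, y2) for a single object.

    mode: 'median' | 'mean' | 'extreme' (keep the most distant boundaries)."""
    b = np.asarray(boxes, dtype=float)
    if mode == "mean":
        return tuple(b.mean(axis=0))
    if mode == "median":
        return tuple(np.median(b, axis=0))
    if mode == "extreme":
        # outermost boundary in each direction
        return (b[:, 0].min(), b[:, 1].min(), b[:, 2].max(), b[:, 3].max())
    raise ValueError(mode)
```

Of the three, the median is usually the safest: one stray window far from the object shifts the mean and stretches the extreme box, but barely moves the median.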

Opencv Haar Cascade training / detection for simple objects

I am planning on making cascade detectors for a white cup, a red ball, and a blue puck. Given how simple these objects are in shape, I was wondering whether any training parameters need to differ from those used for complex objects such as cars/faces? Also, the positive training images show the objects in different lighting conditions, including instances where the objects are under shadow.
For negative training images I noticed the image sizes may vary. However, positive images MUST be a fixed size.
I plan on using 100x100 positive images to detect the objects from 20-30 feet, and 200x200 positive images to detect them when I am within 5 ft or directly overhead (approx. 3 ft off the ground). Does this mean I will have to train 6 different XMLs, i.e. 2 for each object, one trained at 100x100 and one at 200x200?
Short answer: Yes
Long answer: probably.
Think about it like this: the classifier builds up a set of features from the positive images and then uses these to decide whether a detection window matches. If you drastically change the viewing angle, you will need a different classifier.
Let me illustrate with pictures:
If at 20ft away your cup looks like this:
with the associated background/lighting etc., then it will take a very different classifier if your cup instead looks like this (maybe 5 ft away, but at a different angle):
Now, with all that said, if you only have larger and smaller versions of your cup, then you may only need one. However, you will need a different classifier for each object (cup/ball/puck).
Images not mine - Taken from Google
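On the two window sizes: since detection with `detectMultiScale` already scans the image across scales, one cascade per object trained at a single small window size (24x24 is the usual choice rather than 100x100/200x200) often covers both distances, which matches the "you may only need one" point above. A hypothetical command sketch with OpenCV's training tools (file names and sample counts are placeholders):

```shell
# One cascade per object class; repeat for the ball and the puck.
opencv_createsamples -info cup_positives.txt -num 1000 -vec cup.vec -w 24 -h 24
opencv_traincascade -data cup_cascade/ -vec cup.vec -bg negatives.txt \
    -numPos 900 -numNeg 1800 -numStages 15 -w 24 -h 24
```

The -w/-h passed to opencv_traincascade must match the size the .vec file was created with; that fixed size is the positive-image constraint noted in the question.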
