Fully Connected Layers for segmentation with corner points - machine-learning

I am working on a segmentation problem; for simplicity I am using rectangular masks. As a starting point, I use the vectorized (flattened) images as the input data and the corner points of the rectangle, [x_min, y_min, x_max, y_max], as the output vector.
I am trying to use a fully connected NN with 10 hidden layers (I also tried changing this; the results seem the same), but it does not converge. I see that the loss function decreases from 3000 to 160 but then stops there.
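For reference, here is a minimal sketch of the setup described above (PyTorch; the image size, layer width, optimizer-free loss call, and normalization are assumptions, not the original code):

    import torch
    import torch.nn as nn

    class BoxRegressor(nn.Module):
        def __init__(self, in_dim=64 * 64, hidden=256, n_hidden=10):
            super().__init__()
            layers, d = [], in_dim
            for _ in range(n_hidden):
                layers += [nn.Linear(d, hidden), nn.ReLU()]
                d = hidden
            layers.append(nn.Linear(d, 4))    # [x_min, y_min, x_max, y_max]
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x)

    model = BoxRegressor()
    x = torch.rand(8, 64 * 64)                # batch of flattened images
    y = torch.rand(8, 4)                      # corners scaled to [0, 1]
    loss = nn.MSELoss()(model(x), y)
    # Scaling both the pixels and the target coordinates to [0, 1] usually
    # makes this kind of regression much easier to converge.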
I would appreciate comments from those of you with experience!
Thanks

Related

How to normalize position of the elements on the picture [OpenCV]

I am currently working on a program that could help at my work. I'm trying to use machine learning for classification purposes. The problem is that I don't have enough samples for training the model, and augmentation is something I'm trying to avoid because of hardware limitations (not enough RAM) both on my company laptop and on Google Colab. So I decided to try to somehow normalize the position of the elements so that the differences would be visible to the machine even without a large number of different samples. Unfortunately, I'm now struggling with how to normalize those pictures.
Element 1a:
Element 1b:
Element 2a:
Element 2b:
Elements 1a and 1b are the same type, and 2a and 2b are the same type. Is there a way to somehow normalize the position for those pictures (something like a "position 0") which would help the algorithm to see the differences between them? I've tried using cv2.minAreaSquare to get the bounding square, rotating the elements and cropping the unneeded area, but unfortunately those elements can have different widths, so after scaling them down the contours are deformed unevenly. Then I tried to find the symmetry axis and use it to do a proper crop after rotation, but the results still didn't meet my expectations. I was thinking of adding more normalization points like this:
Normalization Points:
And using these points to normalize the position of the rest of my elements, but the perspective transform takes only 4 points, and with 4 points it's also not a very good methodology. Maybe you know a way to move those elements so they end up in the same positions.
Seeing the images, I believe that the transformation between two pictures is either an isometry (translation + rotation) or a similarity (translation + rotation + scaling). These can be determined with just two points. (Perspective takes four points but I think that this is overkill.)
But for good accuracy, you must make sure that the points are found reliably and precisely. In the first place, you need to guess which features of the shapes are repeatable from one sample to the next.
For example, you might estimate that the straight edges are always in the same relative position. In such a case, I would recommend finding two points on each of several edges, drawing a line through them, and finding the intersections between those lines.
In the illustration, you find edge points along the red profiles, and from them you draw the green lines. They intersect in the yellow points.
For increased accuracy, you can use a least-squares approach to find a best fit on more than two points.
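As a rough sketch of the similarity fit suggested above, assuming OpenCV in Python; the point coordinates and the file name are placeholders, and the real points would be the yellow intersection points found on each element:

    import cv2
    import numpy as np

    # Corresponding reference points in the sample image and in the
    # "position 0" reference image (placeholder values).
    src = np.array([[120, 340], [410, 335], [265, 90]], dtype=np.float32)
    dst = np.array([[100, 300], [400, 300], [250, 60]], dtype=np.float32)

    # Similarity transform (rotation + uniform scale + translation),
    # least-squares fit over all point pairs; two pairs are the minimum.
    M, inliers = cv2.estimateAffinePartial2D(src, dst)

    img = cv2.imread("element_1a.png")        # placeholder file name
    aligned = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))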

Traffic Signal Detection with Cascade Classifier with OpenCV

I am working on " Controlling Raspberry Pi's GPIO pins according to change in traffic lights (Red, Green, Yellow)". Right now, I am focusing only on Traffic light detection part. For that, I am using Cascade classifier for Haar features.
I have 2000 negative sample images, which I have converted to grayscale and reshaped to 120 X 120. Also, I have ONE positive image of traffic signal (40 X 120), from which I am generating 2000 positive samples. And finally, I am training my classifier using 2000 positive samples and 1000 negative samples with 10 stages.
My output for some test images looks like following:
Output 1
output 2
Output 3
Image from which I created positive samples:
positive image
I have some questions/doubts and need some suggestions to improve or modify my classifier.
1) Do I need to use more than one image as the positive image to create samples?
2) Why am I not able to detect all the traffic signals in the above images?
3) Am I doing anything wrong with the image shape or anything else?
4) Please correct me on this point if I am wrong: to draw a rectangle over a traffic signal, I am using the cv2.rectangle function with constant height/width parameters, and that's the ONLY reason why it draws a big rectangle regardless of how near/far my traffic signal is in the image! Any suggestions on how to change this size dynamically?
Thank you.
To me, it looks like your classifier has not learned enough.
1) I strongly suggest taking 20-50 sample images of traffic lights instead of one. You can still generate thousands of samples from them for training.
2) Most likely because of inadequate training, but you should also check the parameters of the detection stage. What minimum and maximum sizes have you set for detection?
3) You don't have to reshape or resize the image, so that should not be a problem.
4) The detector returns the position (x, y) and the size (width, height) of every object it detected, so you should be able to set the rectangle size dynamically instead of using a constant width and height. Please refer to the OpenCV Haar face detection example in the language of your choice.
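As a short sketch of how the detections' own sizes can be used when drawing, assuming a trained cascade saved as 'traffic_light.xml' (the file names and parameter values below are placeholders to be tuned):

    import cv2

    cascade = cv2.CascadeClassifier("traffic_light.xml")   # placeholder path
    img = cv2.imread("test.jpg")                            # placeholder image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # minSize / maxSize bound the detector's search window and are worth tuning.
    detections = cascade.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=5,
        minSize=(20, 60), maxSize=(80, 240))

    # Each detection carries its own width and height, so the rectangle
    # automatically shrinks or grows with the distance to the traffic light.
    for (x, y, w, h) in detections:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)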

Find angle from set of images

I have a set of 2000 plane images similar to the image below. The plane has a different angle in every image. The image size is 512x512, and every image always shows this same plane.
My goal is to find the angle for an image which is not in the set.
So far I have tried:
Harris corner detection, but in every image Harris gives me a different number of points, even for images with very similar positions.
Hough line transform to find the longest line and get its inclination to the X axis.
Correlation - this method gives the best results, but it takes a really long time and the angles are only rough.
A neural network trained with back propagation on the Harris points and Hough line transform output, but without any success.
I also have the 3D object in an STP file, but I have no idea how to use it to solve my problem.
It would be nice to get any suggestion of a method, article, or example.
In my experience, a convolutional neural network (CNN) will help you a great deal here, and it will perform well at detecting angles.
But here is the trade-off: depending on how you define the output and on the number of layers (no more than three should be enough), the training can be very costly. For example, you could have a single output that gives a real number indicating the angle. Training this is still costly, but that is normal for CNNs. However, if you want 360 outputs (one for each angle in a 360-degree system), the training will be a very painful and unpleasantly long experience; the performance could be better, but not significantly.
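A minimal sketch of the single-output regression variant described above, assuming PyTorch and 512x512 grayscale inputs (the layer sizes are illustrative, not a recommendation):

    import torch
    import torch.nn as nn

    # Small CNN that regresses a single real-valued angle per image.
    model = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=5, stride=2), nn.ReLU(),
        nn.Conv2d(8, 16, kernel_size=5, stride=2), nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 1),                     # one output: the predicted angle
    )

    x = torch.rand(4, 1, 512, 512)            # batch of grayscale plane images
    angle_pred = model(x)                     # shape (4, 1)
    # Train with a regression loss such as nn.MSELoss() against the known angles;
    # predicting sin/cos of the angle instead avoids the 0/360 degree wrap-around.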
(I wanted to write this as a comment to your question first, but I don't have enough reputation to do that yet, sorry.)

How should I set up my input neurons to receive my input

I need to be able to determine if a shape was drawn correctly or incorrectly.
I have sample data for the shape that holds both the shape and the order of the pixels (denoted by the color of each pixel).
For example, you can see the downsampled image and the color variation.
I'm having trouble figuring out the network I need to define that will accept this kind of input for training.
Should I convert the downsampled image to a matrix and use that as input? Let's say my image is 64x64; I would need 64x64 input neurons (and that's if I ignore the color of the pixels, I think). Is that a feasible solution?
If you have any guidance, I could use it :)
Here is an example.
It is a binarized 4x4 image of the letter c. You can concatenate either the rows or the columns; I am concatenating by columns, as shown in the figure. Each pixel is then mapped to an input neuron (16 input neurons in total). In the output layer, I have 26 outputs, one for each of the letters a to z.
Note that, for simplicity, I did not connect all nodes from layer i to layer i+1 in the figure; in practice every node should be connected to the next layer.
At the output layer, I highlight the node for c to indicate that, for this training instance, c is the target label. The expected input and output vectors are listed at the bottom of the figure.
If you want to keep the color intensities, e.g., R/G/B, then you have to triple the number of inputs: each pixel is replaced with three neurons.
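A small sketch of this encoding in NumPy (the 4x4 pattern below is only illustrative):

    import numpy as np

    img = np.array([[0, 1, 1, 1],
                    [0, 1, 0, 0],
                    [0, 1, 0, 0],
                    [0, 1, 1, 1]])            # binarized 4x4 "letter c"

    x = img.flatten(order="F")                # concatenate by columns: 16 inputs

    y = np.zeros(26)                          # one output node per letter a..z
    y[ord("c") - ord("a")] = 1                # target label for this instance: c

    # With R/G/B intensities kept, each pixel contributes three inputs,
    # i.e. 48 inputs for a 4x4 color image instead of 16.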
Hope this helps. For further reading, I strongly suggest the deep learning tutorial by Andrew Ng, UFLDL. It's the state of the art for this kind of image recognition problem. In the tutorial exercises, you will get intensive practice preprocessing images and working with many image-processing engineering tricks, together with the interesting deep learning algorithms, end to end.

Which algorithm to choose for object detection?

I am interested in detecting a single object, more precisely a fire extinguisher, which has no intra-class variability (all fire extinguishers look the same). The application is supposed to be real-time, i.e., a robot explores the environment, and whenever it sees the object of interest it should be able to detect it and give its pixel coordinates.
My question is: which algorithm would be a good choice for this task?
1. Is this a classification problem, and should we use features (SIFT/SURF etc.) + BoW + SVM?
2. Some other solution (no idea yet).
Any kind of input will be appreciated.
Thanks.
(P.S. Bear with me, I am a newbie to computer vision and Stack Overflow.)
Update 1:
Height varies: all are mounted on the wall, but at different heights. I tried SIFT features and BoW, but it is expensive to extract BoW descriptors at test time. Moreover, I have no idea how to locate the object (pixel coordinates) inside the image after it has been classified as positive.
Update 2:
I finally used SIFT + BoW + SVM and am able to classify the object. But using this technique, I only get output in terms of whether the object is present in the scene or not.
How can I detect the object, i.e., get its bounding box or centre? What approach is compatible with the above method for achieving these results?
Thank you all.
I would suggest using color as the main feature to look for, and trying other features only as needed. The fire extinguisher red is very distinctive and should not occur too often elsewhere in an office environment. Other, more computationally expensive tests can then be performed only in regions of the right color.
Here is a good tutorial for color detection that also explains how to find good thresholds for your desired color.
I would suggest the following approach:
denoise your image with a median filter
convert the image to HSV format (Hue, Saturation, Value)
select pixels close to that particular shade of red with InRange()
Now you have a binary image that contains only the red pixels.
count the number of red pixels with CountNonZero()
If that number is too small, abort
remove noise from the binary image by morphological opening / closing
find contours of all blobs in your picture with findContours or the CvBlob library
check if there are blobs of the correct width, correct height and correct width/height ratio
since your fire extinguishers are vertical cylinders, the width/height ratio will be constant from every angle. The width and height will of course vary somewhat with distance to the camera.
if the width and height do not match, abort
repeat these steps to find the black-colored part on the bottom of the extinguisher,
abort if there is no black region with correct width/height below the red region
(perhaps also repeat these steps for the metallic top and the yellow rectangle)
These tests should all be very fast. If they are too slow, you could reduce the resolution of your input images.
Depending on your environment, it is possible that this is already a robust enough test. If not, you can proceed with SIFT/SURF feature matching, but only in a small region around the blobs of the correct color. You also do not necessarily have to do that for every frame; every n-th frame should be enough for confirmation.
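A condensed sketch of the color-filtering steps above, assuming OpenCV 4.x in Python; the file name, HSV thresholds, pixel count, and ratio limits are placeholders to be tuned for your camera:

    import cv2
    import numpy as np

    img = cv2.imread("frame.jpg")                        # placeholder frame
    img = cv2.medianBlur(img, 5)                         # denoise
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    # Fire-extinguisher red wraps around hue 0, so combine two ranges.
    mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))

    if cv2.countNonZero(mask) < 500:                     # too few red pixels
        raise SystemExit("no candidate region")

    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if h > 0 and 0.25 < w / h < 0.5:                 # rough cylinder ratio
            print("candidate extinguisher at", (x, y, w, h))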
This is an old question, but I would still like to recommend the YOLO algorithm for solving this problem.
YOLO fits this scenario very well.
