How to get the position (x, y) and number of particular objects or shapes in a hand-drawn image?

First, I've been learning image processing, neural networks, etc. by myself for just a couple of weeks, so I'm really new and far from being a pro. Sorry for my bad English.
There's an image (a photo of my drawing), and I want to get the coordinates of each object/shape (a black dot) and the number written next to it; the number indicates the sequence number of the dot.
How do I get this? How do I detect the dots? Shape recognition for the dots, handwriting recognition for the numbers, and then segmentation to get the positions? Or should I use template matching? But every dot has a slightly different shape because it is hand drawn. Should I use a neural network? In an NN the input usually contains every pixel in order to recognize a character, right? Can I feed a picture of a character or a drawn dot into each input to recognize my whole picture?
I'm very new, so I really need your advice; correct me if I'm wrong! Please tell me what I should learn, what I should do, and what I should use.
Thank you very much. :'D

This is a difficult problem that can't be solved with a quick fix.
Here is how I would approach it:
Get a better picture. Your image is very noisy and is taken in low light with high ISO. Use a better camera and better lighting conditions so you can get the background to be as white as possible and the dots as black as possible. Try to maximize the contrast.
Threshold the image so that all the background is white and the dots and numbers are black. Maybe you could apply some erosion and/or dilation to help connect the dark edges together.
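For illustration, a rough OpenCV/Python sketch of this step (the file name, kernel size and the use of Otsu's method are just assumptions; note that the polarity is inverted so the ink becomes white, which OpenCV's contour functions prefer):

    import cv2
    import numpy as np

    gray = cv2.imread("drawing.jpg", cv2.IMREAD_GRAYSCALE)
    # Otsu picks the global threshold automatically; INV makes ink white (255), paper black (0)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # closing (dilate then erode) reconnects strokes that noise has broken apart
    kernel = np.ones((3, 3), np.uint8)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)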
Detect the rectangle somehow and set your work area to be inside the rectangle (crop the rest of the image so that you are left with the area inside the rectangle). You could do this by detecting the contours in the image and then the contour that has the largest area is the rectangle (because it's the largest object in the image). Of course, this is not the only way. See this: OpenCV find contours
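Continuing the sketch above, finding the largest contour and cropping to it might look like this (OpenCV 4.x return signature assumed):

    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    rect = max(contours, key=cv2.contourArea)          # largest contour = the drawn rectangle
    x, y, w, h = cv2.boundingRect(rect)
    # crop to the inside, shaving a small margin so the rectangle's border itself is excluded
    work_area = binary[y + 5:y + h - 5, x + 5:x + w - 5]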
Once you are left with only the dots, circles and numbers, you need to find a way to detect them and discriminate between them. You could again find all contours (or maybe you've found them all in the previous step). You need to figure out a way to tell whether a certain contour is a circle, a filled circle (dot) or a number. This is a problem in its own right. Maybe you could count the white/black pixels in the contour's bounding box: dots have more black pixels than circles and numbers. You also need to do something about numbers that touch dots (like the number 5 in your image).
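A very crude sketch of that pixel-counting idea (the 0.6 fill ratio and the minimum-size filter are guesses that would need tuning):

    contours, _ = cv2.findContours(work_area, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        bx, by, bw, bh = cv2.boundingRect(c)
        if bw * bh < 20:                                       # skip tiny specks of noise
            continue
        roi = work_area[by:by + bh, bx:bx + bw]
        fill_ratio = cv2.countNonZero(roi) / float(bw * bh)    # ink pixels / bounding-box area
        kind = "dot" if fill_ratio > 0.6 else "circle or digit"
        print(kind, "at", (bx + bw // 2, by + bh // 2))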
Once you know what is a dot, circle or number you could use an OCR library (Tesseract or any other OCR lib) to try and recognize the numbers. You could also use a neural network library (maybe trained with the MNIST dataset) to recognize the digits. A good one would be a convolutional neural network similar to LeNet-5.
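If you go the Tesseract route, a minimal sketch with the pytesseract wrapper could look like the following, reusing the per-contour crop roi from the previous sketch. Keep in mind that Tesseract is trained on printed text, so for handwritten digits the MNIST-trained network is likely to work better:

    import pytesseract

    digit = 255 - roi                               # Tesseract prefers dark text on a light background
    text = pytesseract.image_to_string(
        digit, config="--psm 10 -c tessedit_char_whitelist=0123456789"
    )                                               # psm 10 = treat the image as a single character
    print("recognized digit:", text.strip())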
As you can see, this is a problem that requires many different steps to solve, and many different components are involved. The steps I suggested might not be the best, but with some work I think it can be solved.

Related

Recognition and counting of books from side using OpenCV

Just wishing to receive some ideas on how I can solve this problem.
To give a clearer picture, here are examples of some of the images we are looking at:
I have tried thresholding (e.g. Otsu), blob detection, etc. However, I am still unable to segment out the books and count them properly. Hardcovers are easy, of course, as the covers clearly separate the books, but with softcovers I have not been able to count the number of books successfully.
Does anybody have any suggestions on what I can do? Any help will be greatly appreciated. Thanks.
I ran a Sobel edge detector and used the Hough transform to detect lines on the last image, and it seemed to work okay for me. You can then link the edges in the output of the Sobel edge detector and count the number of horizontal lines. Or you can do the same on the lines detected by the Hough transform.
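For illustration, a rough OpenCV/Python sketch of that pipeline (the thresholds, kernel size and line-length parameters are guesses, and nearby segments would still have to be merged before the count is meaningful):

    import cv2
    import numpy as np

    gray = cv2.imread("books.jpg", cv2.IMREAD_GRAYSCALE)
    # vertical gradient (dy=1) responds to the horizontal book edges
    sobel = cv2.Sobel(gray, cv2.CV_8U, 0, 1, ksize=3)
    _, edges = cv2.threshold(sobel, 50, 255, cv2.THRESH_BINARY)

    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=gray.shape[1] // 3, maxLineGap=20)
    horizontal = 0
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            if abs(y2 - y1) < 5:                    # nearly horizontal
                horizontal += 1
    print("roughly", horizontal, "horizontal line segments")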
You can further narrow down the area of interest by converting the image into a binary image. The outputs of all of these operators can be seen in the following figure (I couldn't upload an image so I had to host it here): http://www.pictureshoster.com/files/v34h8hvvv1no4x13ng6c.jpg
Refer to http://www.mathworks.com/help/images/analyzing-images.html#f11-12512 for some more useful examples on how to do edge, line and corner detection.
Hope this helps.
I think that #audiohead's recommendation is good, but you should be careful when applying the Hough transform to images that have the library's stamp, as it might be confused with another book (you can see that the letters form break lines that will be detected by Sobel).
Consider first applying an edge-preserving smoothing algorithm such as a bilateral filter. When tuned correctly (the kernel settings), it can avoid these sorts of problems.
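In OpenCV this is a one-liner; the diameter and the two sigma values below are only starting points and need tuning:

    import cv2

    img = cv2.imread("books.jpg")
    # edge-preserving smoothing: suppresses fine texture (e.g. stamp lettering) while keeping cover edges sharp
    smoothed = cv2.bilateralFilter(img, 9, 75, 75)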
A Different Solution That Might Work (But can be slow)
Here is a different approach that is based on a pixel-marking strategy.
a) Based on some very dark threshold, mark all black pixels as visited.
b) While there are unvisited pixels: pick the next unvisited pixel and apply a region-growing algorithm (http://en.wikipedia.org/wiki/Region_growing) while marking its pixels with a unique number. At this stage you will need to analyse the geometric shape that the region is forming. A good criterion for detecting a book is that the region forms some kind of rectangle where width >> height. This will detect a book and mark all its pixels with the unique number.
Once there are no more unvisited pixels, the number of unique labels is the number of books, and for each pixel in your image you now know which book it belongs to.
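A slow but literal Python sketch of this marking strategy (the darkness threshold, the width >> height test and the minimum area are all guesses):

    import cv2
    import numpy as np
    from collections import deque

    gray = cv2.imread("books.jpg", cv2.IMREAD_GRAYSCALE)
    h, w = gray.shape
    visited = gray < 60              # step (a): very dark pixels start out as "visited"
    region_id = np.zeros((h, w), np.int32)
    current_id, books = 0, 0

    for y in range(h):
        for x in range(w):
            if visited[y, x]:
                continue
            # step (b): grow a region from this unvisited pixel (4-connected flood fill)
            current_id += 1
            queue = deque([(y, x)])
            visited[y, x] = True
            xs, ys = [x], [y]
            while queue:
                cy, cx = queue.popleft()
                region_id[cy, cx] = current_id
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not visited[ny, nx]:
                        visited[ny, nx] = True
                        xs.append(nx)
                        ys.append(ny)
                        queue.append((ny, nx))
            rw, rh = max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
            if rw > 3 * rh and rw * rh > 500:    # wide, flat region: likely the page block of one book
                books += 1

    print("estimated number of books:", books)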
Do you have to keep the books this way? If you can turn the books so that their backs face the camera, then I think you can get more information from the different colors used by different books. The lines from the Hough transform or edge detection will also be more prominent this way.
There exist more sophisticated methods that are much better at contour detection and segmentation; you can have a look at them here (they are quite slow, however): http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html
Once you get the ultrametric contour map, you can perform some computation on it to count the number of books.
I would try a completely different approach; with paperbacks, the covers are medium-dark lines whilst the rest (assuming white pages) is fairly white and "bloomed", so I'd try to thicken up the dark edges to make them easy to detect. That would give you edges akin to working with hardbacks, which you say you've already done.
I'd try something like an erosion to thicken up the edges. This would be a nice, fast operation.
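In OpenCV, eroding the grayscale image takes the local minimum, so the dark cover lines grow thicker; a two-line sketch (the kernel size is a guess):

    import cv2
    import numpy as np

    gray = cv2.imread("books.jpg", cv2.IMREAD_GRAYSCALE)
    thickened = cv2.erode(gray, np.ones((5, 5), np.uint8))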

Which algorithm to choose for object detection?

I am interested in detecting a single object, more precisely a fire extinguisher, which has no inter-class variability (all fire extinguishers look the same). However, the application is supposed to be real-time, i.e. a robot is exploring the environment, and whenever it sees the object of interest it should be able to detect it and give its pixel coordinates.
My question is: which algorithm would be a good choice for this task?
1. Is this a classification problem, and should we use features (SIFT/SURF etc.) + BoW + SVM?
2. Some other solution (no idea yet).
Any kind of input will be appreciated.
Thanks.
(P.S. Bear with me, I am a newbie to computer vision and Stack Overflow.)
Update 1:
The height varies: all are mounted on the wall, but at different heights. I tried SIFT features and BoW, but it is expensive to extract BoW descriptors at test time. Moreover, I have no idea how to locate the object (pixel coordinates) inside the image after it has been classified as positive.
Update 2:
I finally used SIFT + BoW + SVM and am able to classify the object. But using this technique, I only get output in terms of whether the object is present in the scene or not.
How can I detect the object, i.e. get the bounding box or centre of the object? What approach is compatible with the above method for achieving these results?
Thank you all.
I would suggest using color as the main feature to look for, and only try other features as needed. The fire extinguisher red is very distinctive, and should not occur too often elsewhere in an office environment. Other, more computationally expensive tests can then be performed only in regions of the right color.
Here is a good tutorial for color detection that also explains how to find good thresholds for your desired color.
I would suggest the following approach:
denoise your image with a median filter
convert the image to HSV format (Hue, Saturation, Value)
select pixels close to that particular shade of red with InRange()
Now you have a binary image that contains only the pixels that are red.
count the number of red pixels with CountNonZero()
If that number is too small, abort
remove noise from the binary image by morphological opening / closing
find contours of all blobs in your picture with findContours or the CvBlob library
check if there are blobs of the correct width, correct height and correct width/height ratio
since your fire extinguishers are vertical cylinders, the width/height ratio will be constant from every angle. The width and height will of course vary somewhat with distance to the camera.
if the width and height do not match, abort
repeat these steps to find the black-colored part on the bottom of the extinguisher,
abort if there is no black region with correct width/height below the red region
(perhaps also repeat these steps for the metallic top and the yellow rectangle)
These tests should all be very fast. If they are too slow, you could reduce the resolution of your input images.
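A minimal sketch of these color tests in OpenCV/Python (the HSV bounds, pixel count and aspect-ratio limits are all guesses that depend on your camera; OpenCV 4.x findContours signature assumed):

    import cv2
    import numpy as np

    frame = cv2.imread("frame.jpg")                 # BGR image from the robot's camera
    frame = cv2.medianBlur(frame, 5)                # denoise
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # red wraps around the hue axis in OpenCV (0-180), so combine two ranges
    mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))

    if cv2.countNonZero(mask) < 500:                # too few red pixels: abort early
        print("no candidate")
    else:
        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove speckle noise
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # close small holes

        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            if h > 0 and 0.3 < w / float(h) < 0.6:  # roughly the shape of an upright extinguisher
                print("candidate centre at", (x + w // 2, y + h // 2))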
Depending on your environment, it is possible that this is already a robust enough test. If not, you can proceed with SIFT/SURF feature matching, but only in a small region around the blobs with the correct color. You also do not necessarily have to do that for each frame; every n-th frame should be enough for confirmation.
This is an old question, but I would still like to recommend the YOLO algorithm for solving this problem.
YOLO fits this scenario very well.

Detecting multiple shapes in a picture and calculate the middle

This question can be answered in any programming language, because I would like some help with the algorithms, but I prefer Delphi. I have the task of detecting and counting multiple shapes (between 1 and N, mostly circular or elliptical) in random pictures, calculating their middles, and returning those as coordinates in the picture. The middle of each shape can have a filling (but it doesn't matter). The shapes are at least 1 pixel away from each other. None of the shapes will blend into another shape or into the corner of the picture.
The background of the picture always has the same color, which actually doesn't matter, because the borders/frames of the shapes are always a different color from the background. This makes it easy to detect the shapes. I was thinking about going pixel by pixel, collecting the coordinates, and then drawing an invisible rectangle/square around every shape to calculate its middle. I also heard about scanline approaches, but I don't think that would be faster in this case. So my question is, how can I calculate:
How many shapes are in the picture.
The (more or less) exact middle of each of them.
A few pictures to visualize the task:
This is a picture with random shapes (mostly closed circles).
As you can see they are apart from each other just fine.
Then I could easily draw/calculate an imaginary rectangle/square around every shape and calculate the middle of it like that:
After I have the rectangles/squares, I can easily calculate the middle.
How do I start?
P.S.: I've drawn some circles in MS Paint. I should add that all shapes are CLOSED, which makes it possible to flood fill EVERY shape in the picture with no problems!
Thank you for your help.
Calculate MSER (Maximally stable extremal regions) for the image. I can't explain that algorithm here. You can refer to the Maximally stable extremal regions article for more information about the algorithm.
That will give you the centroids too.
This algorithm is available as built-in functions in OpenCV and MATLAB 2012b.
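In OpenCV's Python bindings this is roughly the following (note that detectRegions often returns nested or duplicate regions, so some de-duplication may still be needed):

    import cv2

    gray = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)           # each region is an Nx2 array of (x, y) pixel coordinates
    for pts in regions:
        cx, cy = pts.mean(axis=0)                   # centroid of the region
        print("shape centre at", (int(cx), int(cy)))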
Another method I can think of, possibly simpler than the previous one, is to apply a connected-components algorithm and count the number of objects. More information on this can be found in the book Digital Image Processing by Gonzalez and Woods.
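A sketch of the connected-components route, in OpenCV/Python for brevity (the same idea carries over to Delphi). It directly gives both the count and the centres, and it assumes the shape outlines are darker than the background, hence the inverted Otsu threshold:

    import cv2

    gray = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    print("number of shapes:", num - 1)             # label 0 is the background
    for i in range(1, num):
        cx, cy = centroids[i]
        print("shape", i, "middle at", (int(cx), int(cy)))

For outline-only shapes the centroid of the outline still lands close to the middle of a circle or ellipse; flood filling first (which the poster says is always possible) makes it exact.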

Finding the Bounding area of hand

I have an image of a hand that was detected using this link. It's hand detection using the HSV color space.
Now I face a problem: I need to get the enclosing area / draw bounding lines good enough to determine the hand area, then fill the enclosing area and subtract it from the original to remove the hand.
So far I have tried blurring the image to reduce noise, dilating the image, closing holes, etc., which seems to be overkill. I have tried contours, and that seems to be the best approach so far. I was trying to get the (largest) convex hull, and I ended up with the following after testing with different thresholds.
The inaccuracies can be seen at the thumb, where the hull straightens; it should be curved. I am trying to figure out the location of the hand so as to identify the region covered by it; I am then going to subtract it to remove the hand from the original image. That is what I want to achieve.
Is there a better approach to this?
Any ideas or suggestions are greatly appreciated.
The original and detected images are as follows.
Instead of the convex hull, consider using the alpha hull, which can better follow the contours of a shape by allowing concavities.
This site has a nice summary of alpha shapes: "Everything You Always Wanted to Know About Alpha Shapes But Were Afraid to Ask" by François Bélair.
http://cgm.cs.mcgill.ca/~godfried/teaching/projects97/belair/alpha.html
As David mentioned in his post, consider thresholding using HSV (or HSI) color space rather than on RGB or grayscale. If you can allow for longer processing time, you can use an algorithm such as Mean Shift to segment trickier images like yours. OpenCV has an implementation of Mean Shift, and the book Learning OpenCV provides a concise description of the algorithm.
Image Segmentation using Mean Shift explained
In any case, a standard global binarization threshold doesn't appear to be helping much. Consider using a dynamic threshold; a local/dynamic (adaptive) threshold is implemented in OpenCV, from what I recall.
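For illustration, both suggestions as a short OpenCV/Python sketch (the spatial/color radii and the adaptive-threshold block size are placeholders that need tuning):

    import cv2

    img = cv2.imread("hand.jpg")                     # BGR color image
    # edge-preserving mean-shift segmentation (can be slow)
    shifted = cv2.pyrMeanShiftFiltering(img, sp=15, sr=30)

    gray = cv2.cvtColor(shifted, cv2.COLOR_BGR2GRAY)
    # local (dynamic) threshold: each pixel is compared to the mean of its 31x31 neighbourhood
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 31, 5)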
Assuming you want to identify the hand area rather than the area the convex hull gives, and that the background of the application is at least roughly a single color, I would apply an HSV threshold to identify the background instead of the hand, if possible. Or maybe an adaptive threshold if the lighting is not consistent. I believe this is what many applications do.
If the background can't be fixed, the segmentation is not an easy problem to solve, as you have to take care of shadows and palm lines.

Shape/Pattern Matching Approach in Computer Vision

I am currently facing what is, in my opinion, a rather common problem that should be quite easy to solve, but so far all my approaches have failed, so I am turning to you for help.
I think the problem is best explained with some illustrations. I have some patterns like these two:
I also have an image like this (it will probably be better in practice, because the photo this one originated from was quite poorly lit):
(Note how the template was scaled to roughly fit the size of the image.)
The ultimate goal is a tool that determines whether the user shows a thumbs-up/thumbs-down gesture, and also some angles in between. So I want to match the patterns against the image and see which one resembles the picture the most (or, to be more precise, the angle the hand is showing). I know the direction in which the thumb points in each pattern, so if I find the pattern that looks identical I also have the angle.
I am working with OpenCV (with Python bindings) and have already tried cvMatchTemplate and MatchShapes, but so far it's not working reliably.
I can only guess why MatchTemplate failed, but I think a smaller pattern with a smaller white area fits entirely into the white area of the picture, thus creating the best matching score even though it's obvious that they don't really look the same.
Are there some methods hidden in OpenCV I haven't found yet, or is there a known algorithm for this kind of problem that I should reimplement?
Happy New Year.
A few simple techniques could work:
After binarization and segmentation, find Feret's diameter of the blob (a.k.a. the farthest distance between points, or the major axis).
Find the convex hull of the point set, flood fill it, and treat it as a connected region. Subtract the original blob (with the thumb) from it. The difference will be the area between the thumb and the fist, and the position of that area relative to the center of mass should give you an indication of the rotation.
Use a watershed algorithm on the distances of each point to the blob edge. This can help identify the connected thin region (the thumb).
Fit the largest circle (or largest inscribed polygon) within the blob. Dilate this circle or polygon until some fraction of its edge overlaps the background. Subtract this dilated figure from the original image; only the thumb will remain.
If the size of the hand is consistent (or relatively consistent), then you could also perform N morphological erode operations until the thumb disappears, then N dilate operations to grow the fist back to approximately its original size. Subtract this fist-only blob from the original blob to get the thumb blob. Then use the thumb blob's direction (Feret's diameter) and/or its center of mass relative to the fist blob's center of mass to determine the direction.
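A minimal OpenCV/Python sketch of that last, morphological variant (the number of erosions n has to be tuned by hand, and the input is assumed to be a clean binary hand mask):

    import cv2
    import numpy as np

    blob = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)     # binary mask: hand = white
    _, blob = cv2.threshold(blob, 127, 255, cv2.THRESH_BINARY)

    kernel = np.ones((3, 3), np.uint8)
    n = 15                                            # tune until the thumb just disappears
    fist = cv2.dilate(cv2.erode(blob, kernel, iterations=n), kernel, iterations=n)
    thumb = cv2.subtract(blob, fist)                  # what remains is (mostly) the thumb

    m_fist = cv2.moments(fist, binaryImage=True)
    m_thumb = cv2.moments(thumb, binaryImage=True)
    if m_thumb["m00"] == 0 or m_fist["m00"] == 0:
        print("erosion count n needs tuning")
    else:
        fx, fy = m_fist["m10"] / m_fist["m00"], m_fist["m01"] / m_fist["m00"]
        tx, ty = m_thumb["m10"] / m_thumb["m00"], m_thumb["m01"] / m_thumb["m00"]
        # image y grows downward, so negate it: 90 = thumb up, -90 = thumb down
        angle = np.degrees(np.arctan2(-(ty - fy), tx - fx))
        print("thumb direction (degrees):", angle)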
Techniques to find critical points (regions of strong direction change) are trickier. At the simplest, you might use corner detectors and then check the distance from one corner to another to identify the place where the inner edge of the thumb meets the fist.
For more complex methods, look into papers about shape decomposition by authors such as Kimia, Siddiqi, and Xiaofeng Mi.
MatchTemplate seems like a good fit for the problem you describe. In what way is it failing for you? If you are actually masking the thumbs-up/thumbs-down/thumbs-in-between signs as nicely as you show in your sample image then you have already done the most difficult part.
MatchTemplate does not include rotation and scaling in the search space, so you should generate more templates from your reference image at all rotations you'd like to detect, and you should scale your templates to match the general size of the found thumbs up/thumbs down signs.
[edit]
The result array from MatchTemplate contains a score that specifies how well the template fits the image at each location. If you use CV_TM_SQDIFF then the lowest value in the result array is the location of the best fit; if you use CV_TM_CCORR or CV_TM_CCOEFF then it is the highest value. If your scaled and rotated template images all have the same number of white pixels, then you can compare the best-fit values found for the different template images, and the template image with the best fit overall is the one you want to select.
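As a sketch, a brute-force loop over rotated copies of one reference pattern could look like this (the file names, rotation step and the use of TM_CCOEFF_NORMED are assumptions, and the template is assumed to be pre-scaled to the image):

    import cv2

    image = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)      # the binarized picture
    base = cv2.imread("thumb_template.png", cv2.IMREAD_GRAYSCALE)  # one reference pattern

    h, w = base.shape
    best_val, best_angle, best_loc = -2.0, None, None
    for angle in range(0, 360, 15):                 # brute force over rotations
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        template = cv2.warpAffine(base, M, (w, h))
        result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val > best_val:
            best_val, best_angle, best_loc = max_val, angle, max_loc

    print("best rotation:", best_angle, "degrees, top-left corner at", best_loc)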
There are tons of rotation/scaling independent detection functions that could conceivably help you, but normalizing your problem to work with MatchTemplate is by far the easiest.
For the more advanced stuff, check out SIFT, Haar feature-based classifiers, or one of the other detectors available in OpenCV.
I think you can get excellent results if you just compute the two points that have the furthest shortest path going through white. The direction in which the thumb is pointing is just the direction of the line that joins the two points.
You can do this easily by sampling points on the white area and using Floyd-Warshall.
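A rough SciPy sketch of that idea (the sample count and the neighbourhood radius are guesses; connecting nearby samples only approximates "paths through white", since a short hop can still cross a thin dark gap):

    import cv2
    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse.csgraph import floyd_warshall

    mask = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE) > 127   # white = hand
    ys, xs = np.nonzero(mask)
    idx = np.random.choice(len(xs), size=min(300, len(xs)), replace=False)
    pts = np.column_stack([xs[idx], ys[idx]]).astype(float)          # sampled points on the white area

    d = cdist(pts, pts)                        # pairwise Euclidean distances
    graph = np.where(d < 15, d, np.inf)        # connect only nearby samples so paths stay (roughly) inside the blob
    dist = floyd_warshall(graph, directed=False)

    dist[~np.isfinite(dist)] = -1              # ignore disconnected pairs
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    p1, p2 = pts[i], pts[j]
    angle = np.degrees(np.arctan2(-(p2[1] - p1[1]), p2[0] - p1[0]))  # image y grows downward
    print("endpoints:", p1, p2, "direction (degrees):", angle)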
