I apologize in advance for the lack of code, but I have been looking at this problem for some days now and couldn't proceed much.
I am trying to find a method that, given a template shape and a figure, finds the most fitting (edges-wise) placement of such shape in the (binarized) figure.
For example, in the picture above, I've highlighted 2 placement for the triangle on the right (I show 2, but there are probably 5). The blue one covers more "perimeter" and thus should be preferred.
The rotations are limited to steps of 45 degrees and no scaling is involved, so I don't think I need some particularly rotation/size invariant methods.
For now, I have obtained some results using opencv template matching, by first extracting the edges with canny and matching the resulting contour, but it is far from precise and I am afraid this would not work for more complex figures. Also by using canny I lose information about the binarized foreground and background, while the placement must be only inside the figure.
Any idea would be appreciated because I really could not find much in the literature (also I am quite inexpert in the field and probably lacking the proper terminology for effective searches), thank you in advance
Related
I've been trying to extract hand-drawn circles from a document for a while now but every attempt I make doesn't have the level of consistency I need.
Process Album
The problem I keep coming up against is when 2 "circles" are too close they become a single contour, ruining my attempt to detect if a contour is curved. I'm sure there must be a better way to extract these circles, but their imperfection and inconsistency are really stumping me.
I've tried many other ways to single out the curves, the most accurate of which being:
Rather than use dilation to bridge the gap between the segmented contours, find the endpoints and attempt to continue the curve until it hits another contour.
Problem: I can't effectively find the turning points of the contour, otherwise this would be my preferable method
I apologize if this question is deemed "too specific", but I feel like Computer Vision stuff like this can always be applied elsewhere.
Thanks ahead of time for any and all help, I'm about at the end of my rope here.
EDIT: I've just realized the album wasn't working correctly, I think it should be fixed now though.
It looks like a very challenging problem so it is very likely that the things I am going to write wouldn't work very well in practice.
In order to ease the problem, I would probably try to remove as much of other stuff from the image as possible.
If the template of the document is always the same, it might be worth trying to remove horizontal and vertical lines along with grayed areas. For example, given the empty template, substract it from the document that you are processing. Probably, it might be possible to get rid of the text also. This would result in an image with only parts of hand drawn circles.
On such image, detecting circles or ellipses with hough transform might give some results (although shapes might be far from circles or ellipses).
Just wish to receive some ideas on I can solve this problem.
For a clearer picture, here are examples of some of the image that we are looking at:
I have tried looking into thresholding it, like otsu, blobbing it, etc. However, I am still unable to segment out the books and count them properly. Hardcover is easy of course, as the cover clearly separates the books, but when it comes to softcover, I have not been able to successfully count the number of books.
Does anybody have any suggestions on what I can do? Any help will be greatly appreciated. Thanks.
I ran a sobel edge detector and used Hough transform to detect lines on the last image and it seemed to be working okay for me. You can then link the edges on the output of the sobel edge detector and then count the number of horizontal lines. Or, you can do the same on the output of the lines detected using Hough.
You can further narrow down the area of interest by converting the image into a binary image. The outputs of all of these operators can be seen in following figure ( I couldn't upload an image so had to host it here) http://www.pictureshoster.com/files/v34h8hvvv1no4x13ng6c.jpg
Refer to http://www.mathworks.com/help/images/analyzing-images.html#f11-12512 for some more useful examples on how to do edge, line and corner detection.
Hope this helps.
I think that #audiohead's recommendation is good but you should be careful when applying the Hough transform for images that will have the library's stamp as it might confuse it with another book (You can see that the letters form some break-lines that will be detected by sobel).
Consider to apply first an edge preserving smoothing algorithm such as a Bilateral Filter. When tuned correctly (setting of the Kernels) it can avoid these such of problems.
A Different Solution That Might Work (But can be slow)
Here is a different approach that is based on pixel marking strategy.
a) Based on some very dark threshold, mark all black pixels as visited.
b) While there are unvisited pixels: Pick the next unvisited pixel and apply a region-growing algorithm http://en.wikipedia.org/wiki/Region_growing while marking its pixels with a unique number. At this stage you will need to analyse the geometric shape that this region is forming. A good criteria to detecting a book is that the region is creating some form of a rectangle where width >> height. This will detect a book and mark all its pixels to the unique number.
Once there are no more unvisited pixels, the number of unique numbers is the number of books you will have + For each pixel on your image you will now to which book does it belongs.
Do you have to keep the books this way? If you can change the books to face back side to the camera then I think you can get more information about the different colors used by different books.The lines by Hough transform or edge detection will be more prominent this way.
There exist more sophisticated methods which are much better in contour detection and segmentation, you can have a look at them here, however it is quite slow, http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html
Once you get the ultrametric contour map, you can perform some computation on them to count the number of books
I would try a completely different approach; with paperbacks, the covers are medium-dark lines whilst the rest of the (assuming white pages) are fairly white and "bloomed", so I'd try to thicken up the dark edges to make them easy to detect, then that would give the edges akin to working with hardbacks which you say you've done.
I'd try something like an erosion to thicken up the edges. This would be a nice, fast operation.
I have the image of hand that was detected using this link. Its hand detection using HSV color space.
Now I face a problem: I need to get the enclosing area/draw bounding lines possible enough to determine the hand area, then fill the enclosing area and subtract it from the original to remove the hand.
I have thus so far tried to blurring the image to reduce noise, dilating the image, closing holes, etc. that seem to be an overdose. I have tried contours, and that seem to be the best approach so far. I was trying to get the convex hull (largest) and I ended up with the following after testing with different thresholds.
The inaccuracies can be seen with the thumb were the hull straightens. It must be curved. I am trying to figure out the location of the hand so to identify the region being covered by the hand. Going to subtract it to remove the hand from the original image. That is what I want to achieve.
Is there a better approach to this?
And ideas suggestions greatly appreciated.
Original and detected are as follows
Instead of the convex hull, consider using the alpha hull, which can better follow the contours of a shape by allowing concavities.
This site has a nice summary of alpha shapes: "Everything You Always Wanted to Know About Alpha Shapes But Were Afraid to Ask" by François Bélair.
http://cgm.cs.mcgill.ca/~godfried/teaching/projects97/belair/alpha.html
As David mentioned in his post, consider thresholding using HSV (or HSI) color space rather than on RGB or grayscale. If you can allow for longer processing time, you can use an algorithm such as Mean Shift to segment trickier images like yours. OpenCV has an implementation of Mean Shift, and the book Learning OpenCV provides a concise description of the algorithm.
Image Segmentation using Mean Shift explained
In any case, a standard binarization threshold doesn't appear to be helping much. Consider using a dynamic threshold; at least local/dynamic threshold is implemented for contours in OpenCV, from what I recall.
Assuming you want to identify hand area instead of the area convex hull gives and background of the application is at least in same color, I would apply hsv-threshold to identify background instead of hand if possible. Or maybe adaptive threshold if light distribution is not consistent. I believe this is what many applications do
If background can't be fixed, the segmentation is not an easy problem to resolve as you should take care of shadows and palm lines.
I am currently facing a, in my opinion, rather common problem which should be quite easy to solve but so far all my approached have failed so I am turning to you for help.
I think the problem is explained best with some illustrations. I have some Patterns like these two:
I also have an Image like (probably better, because the photo this one originated from was quite poorly lit) this:
(Note how the Template was scaled to kinda fit the size of the image)
The ultimate goal is a tool which determines whether the user shows a thumb up/thumbs down gesture and also some angles in between. So I want to match the patterns against the image and see which one resembles the picture the most (or to be more precise, the angle the hand is showing). I know the direction in which the thumb is showing in the pattern, so if i find the pattern which looks identical I also have the angle.
I am working with OpenCV (with Python Bindings) and already tried cvMatchTemplate and MatchShapes but so far its not really working reliably.
I can only guess why MatchTemplate failed but I think that a smaller pattern with a smaller white are fits fully into the white area of a picture thus creating the best matching factor although its obvious that they dont really look the same.
Are there some Methods hidden in OpenCV I havent found yet or is there a known algorithm for those kinds of problem I should reimplement?
Happy New Year.
A few simple techniques could work:
After binarization and segmentation, find Feret's diameter of the blob (a.k.a. the farthest distance between points, or the major axis).
Find the convex hull of the point set, flood fill it, and treat it as a connected region. Subtract the original image with the thumb. The difference will be the area between the thumb and fist, and the position of that area relative to the center of mass should give you an indication of rotation.
Use a watershed algorithm on the distances of each point to the blob edge. This can help identify the connected thin region (the thumb).
Fit the largest circle (or largest inscribed polygon) within the blob. Dilate this circle or polygon until some fraction of its edge overlaps the background. Subtract this dilated figure from the original image; only the thumb will remain.
If the size of the hand is consistent (or relatively consistent), then you could also perform N morphological erode operations until the thumb disappears, then N dilate operations to grow the fist back to its original approximate size. Subtract this fist-only blob from the original blob to get the thumb blob. Then uses the thumb blob direction (Feret's diameter) and/or center of mass relative to the fist blob center of mass to determine direction.
Techniques to find critical points (regions of strong direction change) are trickier. At the simplest, you might also use corner detectors and then check the distance from one corner to another to identify the place when the inner edge of the thumb meets the fist.
For more complex methods, look into papers about shape decomposition by authors such as Kimia, Siddiqi, and Xiaofing Mi.
MatchTemplate seems like a good fit for the problem you describe. In what way is it failing for you? If you are actually masking the thumbs-up/thumbs-down/thumbs-in-between signs as nicely as you show in your sample image then you have already done the most difficult part.
MatchTemplate does not include rotation and scaling in the search space, so you should generate more templates from your reference image at all rotations you'd like to detect, and you should scale your templates to match the general size of the found thumbs up/thumbs down signs.
[edit]
The result array for MatchTemplate contains an integer value that specifies how well the fit of template in image is at that location. If you use CV_TM_SQDIFF then the lowest value in the result array is the location of best fit, if you use CV_TM_CCORR or CV_TM_CCOEFF then it is the highest value. If your scaled and rotated template images all have the same number of white pixels then you can compare the value of best fit you find for all different template images, and the template image that has the best fit overall is the one you want to select.
There are tons of rotation/scaling independent detection functions that could conceivably help you, but normalizing your problem to work with MatchTemplate is by far the easiest.
For the more advanced stuff, check out SIFT, Haar feature based classifiers, or one of the others available in OpenCV
I think you can get excellent results if you just compute the two points that have the furthest shortest path going through white. The direction in which the thumb is pointing is just the direction of the line that joins the two points.
You can do this easily by sampling points on the white area and using Floyd-Warshall.
I need to automatically align an image B on top of another image A in such a way, that the contents of the image match as good as possible.
The images can be shifted in x/y directions and rotated up to 5 degrees on z, but they won't be distorted (i.e. scaled or keystoned).
Maybe someone can recommend some good links or books on this topic, or share some thoughts how such an alignment of images could be done.
If there wasn't the rotation problem, then I could simply try to compare rows of pixels with a brute-force method until I find a match, and then I know the offset and can align the image.
Do I need AI for this?
I'm having a hard time finding resources on image processing which go into detail how these alignment-algorithms work.
So what people often do in this case is first find points in the images that match then compute the best transformation matrix with least squares. The point matching is not particularly simple and often times you just use human input for this task, you have to do it all the time for calibrating cameras. Anyway, if you want to fully automate this process you can use feature extraction techniques to find matching points, there are volumes of research papers written on this topic and any standard computer vision text will have a chapter on this. Once you have N matching points, solving for the least squares transformation matrix is pretty straightforward and, again, can be found in any computer vision text, so I'll assume you got that covered.
If you don't want to find point correspondences you could directly optimize the rotation and translation using steepest descent, trouble is this is non-convex so there are no guarantees you will find the correct transformation. You could do random restarts or simulated annealing or any other global optimization tricks on top of this, that would most likely work. I can't find any references to this problem, but it's basically a digital image stabilization algorithm I had to implement it when I took computer vision but that was many years ago, here are the relevant slides though, look at "stabilization revisited". Yes, I know those slides are terrible, I didn't make them :) However, the method for determining the gradient is quite an elegant one, since finite difference is clearly intractable.
Edit: I finally found the paper that went over how to do this here, it's a really great paper and it explains the Lucas-Kanade algorithm very nicely. Also, this site has a whole lot of material and source code on image alignment that will probably be useful.
for aligning the 2 images together you have to carry out image registration technique.
In matlab, write functions for image registration and select your desirable features for reference called 'feature points' using 'control point selection tool' to register images.
Read more about image registration in the matlab help window to understand properly.