I am currently looking for a way to fit a simple shape (e.g. a T or an L shape) to a 2D point cloud. What I need as a result is the position and orientation of the shape.
I have been looking at a couple of approaches but most seem very complicated and involve building and learning a sample database first. As I am dealing with very simple shapes I was hoping that there might be a simpler approach.
By saying you don't want to do any training I am guessing that you mean you don't want to do any feature matching; feature matching is used to make good guesses about the pose (location and orientation) of the object in the image, and would be applicable along with RANSAC to your problem for guessing and verifying good hypotheses about object pose.
The simplest approach is template matching, but this may be too computationally complex (it depends on your use case). In template matching you simply loop over the possible locations of the object and its possible orientations and possible scales and check how well the template (a cloud that looks like an L or a T at that location and orientation and scale) matches (or you sample possible locations orientations and scales randomly). The checking of the template could be made fairly fast if your points are organised (or you organise them by e.g. converting them into pixels).
If this is too slow there are many methods for making template matching faster and I would recommend to you the Generalised Hough Transform.
Here, before starting the search for templates you loop over the boundary of the shape you are looking for (T or L) and for each point on its boundary you look at the gradient direction and then the angle at that point between the gradient direction and the origin of the object template, and the distance to the origin. You add that to a table (Let us call it Table A) for each boundary point and you end up with a table that maps from gradient direction to the set of possible locations of the origin of the object. Now you set up a 2D voting space, which is really just a 2D array (let us call it Table B) where each pixel contains a number representing the number of votes for the object in that location. Then for each point in the target image (point cloud) you check the gradient and find the set of possible object locations as found in Table A corresponding to that gradient, and then add one vote for all the corresponding object locations in Table B (the Hough space).
This is a very terse explanation but knowing to look for Template Matching and Generalised Hough transform you will be able to find better explanations on the web. E.g. Look at the Wikipedia pages for Template Matching and Hough Transform.
You may need to :
1- extract some features from the image inside which you are looking for the object.
2- extract another set of features in the image of the object
3- match the features (it is possible using methods like SIFT)
4- when you find a match apply RANSAC algorithm. it provides you with transformation matrix (including translation, rotation information).
for using SIFT start from here. it is actually one of the best source-codes written for SIFT. It includes RANSAC algorithm and you do not need to implement it by yourself.
you can read about RANSAC here.
Two common ways for detecting the shapes (L, T, ...) in your 2D pointcloud data would be using OpenCV or Point Cloud Library. I'll explain steps you may take for detecting those shapes in OpenCV. In order to do that, you can use the following 3 methods and the selection of the right method depends on the shape (Size, Area of the shape, ...):
Hough Line Transformation
Template Matching
Finding Contours
The first step would be converting your point to a grayscale Mat object, by doing that you basically make an image of your 2D pointcloud data and so you can use other OpenCV functions. Then you may smooth the image in order to reduce the noises and the result would be somehow a blurry image which contains real edges, if your application does not need real-time processing, you can use bilateralFilter. You can find more information about smoothing here.
The next step would be choosing the method. If the shape is just some sort of orthogonal lines (such as L or T) you can use Hough Line Transformation in order to detect the lines and after detection, you can loop over the lines and calculate the dot product of the lines (since they are orthogonal the result should be 0). You can find more information about Hough Line Transformation here.
Another way would be detecting your shape using Template Matching. Basically, you should make a template of your shape (L or T) and use it in matchTemplate function. You should consider that the size of the template you want to use should be in the order of your image, otherwise you may resize your image. More information about the algorithm can be found here.
If the shapes include areas you can find contours of the shape using findContours, it will give you the number of polygons which are around your shape you want to detect. For instance, if your shape is L, it would have polygon which has roughly 6 lines. Also, you can use some other filters along with findContours such as calculating the area of the shape.
Related
I have a non-planar object with 9 points with known dimensions in 3D i.e. length of all sides is known. Now given a 2D projection of this shape, I want to reconstruct the 3D model of it. I basically want to retrieve the shape of this object in the real world i.e. angles between different sides in 3D. For eg: given all the dimensions of every part of the table and a 2D image, I'm trying to reconstruct its 3D model.
I've read about homography, perspective transform, procrustes and fundamental/essential matrix so far but haven't found a solution that'll apply here. I'm new to this, so might have missed out something. Any direction on this will be really helpful.
In your question, you mention that you want to achieve this using only a single view of the object. In that case, homographies or Essential/Fundamental matrices wont help you, because these require at least two views of the scene to make sense. If you don't have any priors on the shape of the objects that you want to reconstruct, the key information that you'll be missing is (relative) depth, and in that case I think those are the two possible solutions:
Leverage a learning algorithm. There is a rich literature on 6dof object pose estimation with deep networks, see this paper for example. You wont have to deal with depth directly if you use those since those networks are trained end to end to estimate a pose in SO(3).
Add many more images and use a dense photometric SLAM/SFM pipeline, such as elastic fusion. However, in that case you will need to segment the resulting models since the estimation they produce is of the entire environment, which can be difficult depending on the scene.
However, as you mentioned in your comment, it is possible to reconstruct the model up to scale if you have very strong priors on its geometry. In the case of a planar object (a cuboid will just be an extension of that), you can use this simple algorithm (that is more or less what they do here, there are other methods but I find them a bit messy, equation-wise):
//let's note A,B,C,D the rectangle in 3d that we are after, such that
//AB is parellel with CD. Let's also note a,b,c,d their respective
//reprojections in the image, i.e. a=KA where K is the calibration matrix, and so on.
1) Compute the common vanishing point of AB and CD. This is just the intersection
of ab and cd in the image plane. Let's call it v_1.
2) Do the same for the two other edges, i.e bc and da. Let's call this
vanishing point v_2.
3) Now, you can compute the vanishing line, which will just be
crossproduct(v_1, v_2), i.e. the line going through both v_1 and v_2. This gives
you the orientation of your plane. Let's write its normal N.
5) All you need to find now is the boundaries of the rectangle. To do
that, just consider any plane with normal N that doesn't go through
the camera center. Now find the intersections of K^{-1}a, K^{-1}b,
K^{-1}c, K^{-1}d with that plane.
If you need a refresher on vanishing points and lines, I suggest you take a look at pages 213 and 216 of Hartley-Zisserman's book.
I have a processed binary image of dimension 300x300. This processed image contains few object(person or vehicle).
I also have another RGB image of the same scene of dimensiion 640x480. It is taken from a different position
note : both cameras are not the same
I can detect objects to some extent in the first image using background subtraction. I want to detect corresponding objects in the 2nd image. I went through opencv functions
getAffineTransform
getPerspectiveTransform
findHomography
estimateRigidTransform
All these functions require corresponding points(coordinates) in two images
In the 1st binary image, I have only the information that an object is present,it does not have features exactly similar to second image(RGB).
I thought conventional feature matching to determine corresponding control points which could be used to estimate the transformation parameters is not feasible because I think I cannot determine and match features from binary and RGB image(am I right??).
If I am wrong, what features could I take, how should I proceed with Feature matching, find corresponding points, estimate the transformation parameters.
The solution which I tried more of Manual marking to estimate transformation parameters(please correct me if I am wrong)
Note : There is no movement of both cameras.
Manually marked rectangles around objects in processed image(binary)
Noted down the coordinates of the rectangles
Manually marked rectangles around objects in 2nd RGB image
Noted down the coordinates of the rectangles
Repeated above steps for different samples of 1st binary and 2nd RGB images
Now that I have some 20 corresponding points, I used them in the function as :
findHomography(src_pts, dst_pts, 0) ;
So once I detect an object in 1st image,
I drew a bounding box around it,
Transform the coordinates of the vertices using the above found transformation,
finally draw a box in 2nd RGB image with transformed coordinates as vertices.
But this doesnt mark the box in 2nd RGB image exactly over the person/object. Instead it is drawn somewhere else. Though I take several sample images of binary and RGB and use several corresponding points to estimate the transformation parameters, it seems that they are not accurate enough..
What are the meaning of CV_RANSAC and CV_LMEDS option, ransacReprojecThreshold and how to use them?
Is my approach good...what should I modify/do to make the registration accurate?
Any alternative approach to be used?
I'm fairly new to OpenCV myself, but my suggestions would be:
Seeing as you have the objects identified in the first image, I shouldn't think it would be hard to get keypoints and extract features? (or maybe you have this already?)
Identify features in the 2nd image
Match the features using OpenCV FlannBasedMatcher or similar
Highlight matching features in 2nd image or whatever you want to do.
I'd hope that because all your features in the first image should be positives (you know they are the features you want), then it'll be relatively straight forward to get accurate matches.
Like I said, I'm new to this so the ideas may need some elaboration.
It might be a little late to answer this and the asker might not see this, but if the 1st image is originally a grayscale then this could be done:
1.) 2nd image ----> grayscale ------> gray2ndimg
2.) Point to Point correspondences b/w gray1stimg and gray2ndimg by matching features.
I'm creating a part scanner in C that pulls all possibilities for scanned parts as images in a directory. My code currently fetches all images from that directory and dumps them into a vector. I then produce groups of contours for all the images. The program then falls into a while loop where it constantly grabs images from a webcam, and generates contours for those as well. I have set up a jig for the part to rest on, so orientation and size are not a concern, however I don't want to have to calibrate the machine, so there may be movement between the template images and the part images taken.
What is the best way to compare the contours? I have tried several methods including matchTemplate without contours, but if you take a look at the two parts below, you can see that these two are very close to each other, so matchShapes and matchTemplate can't distinguish between them the way I was using them. I'm also not sure how to use cvMatchShapes. It works with just loading the images directly into match shapes, but the results are inconclusive. I think that contours is the way to go, I'm just not sure of how to go about implementing the comparison phase. Any help would be great.
You can view the templates here: http://www.cryogendesign.com/partDetection.html"
If you are ready for do-it-yourself, one approach could be to compute a "distance image" (assign every pixel the smallest Euclidean distance to the contour taken as the reference). See http://en.wikipedia.org/wiki/Distance_transform.
Using this distance image, you can quickly compute the average distance of a new contour to the reference one (for every contour pixel, get the distance from the distance image). The average distance gives you an indication of the goodness-of-fit and will let you find the best match to a set of reference templates.
If the parts have some moving freedom, the situation is a bit harder: before computing the average distance, you must fit the new contour to the reference one. You will need to apply a suitable transform (translation, rotation, possibly scaling), and find the parameters that will minimize... the average distance.
You can calculate the chamfer distance between the two contours:
T and E are the set of edges of the template and the image and x is the point of reference where you start to compare the two set of edges. So for each x you get a different value.
DT is the distance transform of an image. Matlab provides the algorithm here.
If you want a more detailed version of how to calculate the chamfer distance, take a look here.
I am currently facing a, in my opinion, rather common problem which should be quite easy to solve but so far all my approached have failed so I am turning to you for help.
I think the problem is explained best with some illustrations. I have some Patterns like these two:
I also have an Image like (probably better, because the photo this one originated from was quite poorly lit) this:
(Note how the Template was scaled to kinda fit the size of the image)
The ultimate goal is a tool which determines whether the user shows a thumb up/thumbs down gesture and also some angles in between. So I want to match the patterns against the image and see which one resembles the picture the most (or to be more precise, the angle the hand is showing). I know the direction in which the thumb is showing in the pattern, so if i find the pattern which looks identical I also have the angle.
I am working with OpenCV (with Python Bindings) and already tried cvMatchTemplate and MatchShapes but so far its not really working reliably.
I can only guess why MatchTemplate failed but I think that a smaller pattern with a smaller white are fits fully into the white area of a picture thus creating the best matching factor although its obvious that they dont really look the same.
Are there some Methods hidden in OpenCV I havent found yet or is there a known algorithm for those kinds of problem I should reimplement?
Happy New Year.
A few simple techniques could work:
After binarization and segmentation, find Feret's diameter of the blob (a.k.a. the farthest distance between points, or the major axis).
Find the convex hull of the point set, flood fill it, and treat it as a connected region. Subtract the original image with the thumb. The difference will be the area between the thumb and fist, and the position of that area relative to the center of mass should give you an indication of rotation.
Use a watershed algorithm on the distances of each point to the blob edge. This can help identify the connected thin region (the thumb).
Fit the largest circle (or largest inscribed polygon) within the blob. Dilate this circle or polygon until some fraction of its edge overlaps the background. Subtract this dilated figure from the original image; only the thumb will remain.
If the size of the hand is consistent (or relatively consistent), then you could also perform N morphological erode operations until the thumb disappears, then N dilate operations to grow the fist back to its original approximate size. Subtract this fist-only blob from the original blob to get the thumb blob. Then uses the thumb blob direction (Feret's diameter) and/or center of mass relative to the fist blob center of mass to determine direction.
Techniques to find critical points (regions of strong direction change) are trickier. At the simplest, you might also use corner detectors and then check the distance from one corner to another to identify the place when the inner edge of the thumb meets the fist.
For more complex methods, look into papers about shape decomposition by authors such as Kimia, Siddiqi, and Xiaofing Mi.
MatchTemplate seems like a good fit for the problem you describe. In what way is it failing for you? If you are actually masking the thumbs-up/thumbs-down/thumbs-in-between signs as nicely as you show in your sample image then you have already done the most difficult part.
MatchTemplate does not include rotation and scaling in the search space, so you should generate more templates from your reference image at all rotations you'd like to detect, and you should scale your templates to match the general size of the found thumbs up/thumbs down signs.
[edit]
The result array for MatchTemplate contains an integer value that specifies how well the fit of template in image is at that location. If you use CV_TM_SQDIFF then the lowest value in the result array is the location of best fit, if you use CV_TM_CCORR or CV_TM_CCOEFF then it is the highest value. If your scaled and rotated template images all have the same number of white pixels then you can compare the value of best fit you find for all different template images, and the template image that has the best fit overall is the one you want to select.
There are tons of rotation/scaling independent detection functions that could conceivably help you, but normalizing your problem to work with MatchTemplate is by far the easiest.
For the more advanced stuff, check out SIFT, Haar feature based classifiers, or one of the others available in OpenCV
I think you can get excellent results if you just compute the two points that have the furthest shortest path going through white. The direction in which the thumb is pointing is just the direction of the line that joins the two points.
You can do this easily by sampling points on the white area and using Floyd-Warshall.
What are the ways in which to quantify the texture of a portion of an image? I'm trying to detect areas that are similar in texture in an image, sort of a measure of "how closely similar are they?"
So the question is what information about the image (edge, pixel value, gradient etc.) can be taken as containing its texture information.
Please note that this is not based on template matching.
Wikipedia didn't give much details on actually implementing any of the texture analyses.
Do you want to find two distinct areas in the image that looks the same (same texture) or match a texture in one image to another?
The second is harder due to different radiometry.
Here is a basic scheme of how to measure similarity of areas.
You write a function which as input gets an area in the image and calculates scalar value. Like average brightness. This scalar is called a feature
You write more such functions to obtain about 8 - 30 features. which form together a vector which encodes information about the area in the image
Calculate such vector to both areas that you want to compare
Define similarity function which takes two vectors and output how much they are alike.
You need to focus on steps 2 and 4.
Step 2.: Use the following features: std() of brightness, some kind of corner detector, entropy filter, histogram of edges orientation, histogram of FFT frequencies (x and y directions). Use color information if available.
Step 4. You can use cosine simmilarity, min-max or weighted cosine.
After you implement about 4-6 such features and a similarity function start to run tests. Look at the results and try to understand why or where it doesnt work. Then add a specific feature to cover that topic.
For example if you see that texture with big blobs is regarded as simmilar to texture with tiny blobs then add morphological filter calculated densitiy of objects with size > 20sq pixels.
Iterate the process of identifying problem-design specific feature about 5 times and you will start to get very good results.
I'd suggest to use wavelet analysis. Wavelets are localized in both time and frequency and give a better signal representation using multiresolution analysis than FT does.
Thre is a paper explaining a wavelete approach for texture description. There is also a comparison method.
You might need to slightly modify an algorithm to process images of arbitrary shape.
An interesting approach for this, is to use the Local Binary Patterns.
Here is an basic example and some explanations : http://hanzratech.in/2015/05/30/local-binary-patterns.html
See that method as one of the many different ways to get features from your pictures. It corresponds to the 2nd step of DanielHsH's method.