I read about image features on Wikipedia and I am still confused about what exactly they are.
The term is explained in a way that doesn't clear up my confusion.
1. They represent a class (an edge is one feature and a boundary is another)
2. They represent an instance of a class (each detected edge is a feature)
Suppose I detect all the corners of an object and put them in an array, say A.
Did I get only one feature, or did I get len(A) features?
Each feature is an individual "interesting" point or area in the image, with "interesting" depending on which algorithm is used to find features. In your example, you'd have len(A) features, each of the corners being one.
I'm trying to make a program that can take an image of a dartboard and read the score. So far I can get the position of each dart by comparing it to a model image as you can see here:
However, this only works if the input image is practically the same. In this other case the board is seen from a slightly different perspective, so I was thinking maybe I can transform the image to match the model image and then do the process that you can see above.
So my question is: How can I transform this last image to match the shape and perspective of the model dart board with OpenCV?
The dart board is basically planar. Thus, you can model the wanted transformation by a homography. Now you can perform a simple feature extraction and matching like here or, if speed is less important, use an intensity-based parametric alignment algorithm (more accurate).
However, as already mentioned in the comments, it will not be as simple afterwards. The dart flights will (depending on the distortion) most likely cover an area of your board which does not coincide with the actual score. Actually, even with a frontal view it is difficult to say.
I assume you will have to find the point at which each dart sticks in the board. Furthermore, I think this will be easier with a view from a certain angle. Maybe you can fit line segments just in the area where you detected a difference beforehand.
I don't think comparing an image with the model that was captured using a different subject with a different angle is a good idea. There should be lots of small differences even after perfectly matching them geometrically - like shades, lighting, color differences, etc.
I would just capture an image every time the game begins (the reference) and extract the features (straight lines seem good enough). Then, after the game, capture another image, subtract the reference, and do blob analysis to find the darts.
I want to design an algorithm that finds matches between images of the same apartment when it is listed by different real estate agents.
The photos are taken at roughly the same time, so the interior of the rooms should not change much, but of course every agent takes different pictures from different angles, etc.
(TL;DR: an apartment goes up for sale, different real estate agents come in and take their own pictures, and I want to know whether the pictures from the various agents show the same place.)
I know that the choice of image processing and recognition algorithms depends heavily on the use case, so could you point me in the right direction given my use case?
You can actually use Clarifai's Custom Training API endpoint, fairly simple and straightforward. All you would have to do is train the initial image and then compare the second to it. If the probability is high, it is likely the same apartment. For example:
In JavaScript, declaring a positive looks like this:
clarifai.positive('http://example.com/apartment1.jpg', 'firstapartment', callback);
And a negative is:
clarifai.negative('http://example.com/notapartment1.jpg', 'firstapartment', callback);
You don't necessarily have to provide a negative, but it can only help. Then, when you are comparing images to the first apartment, you do:
clarifai.predict('http://example.com/someotherapartment.jpg', 'firstapartment', callback);
This will give you a probability regarding the likeness of the photo to what you've trained ('firstapartment'). This API is basically doing machine learning without the hassle of the actual machine. Clarifai's API also has a tagging input that is extremely accurate with some basic tags. The API is free for a certain number of calls/month. Definitely worth it to check out for this case.
As user Shaked mentioned in a comment, this is a difficult problem. Even if you knew the position and orientation of each camera in space, and also the characteristics of each camera, it wouldn't be a trivial problem to match the images.
A "bag of words" (BoW) approach may be of use here. Rather than try to identify specific objects and/or deduce the original 3D scene, you determine what "feature descriptors" can distinguish objects from one another in your image sets.
Imagine you could describe the two images by the relative locations of textures and colors:
horizontal-ish line segments at far left
red blob near center left
green clumpy thing at bottom left
bright round object near top left
then for a reasonably constrained set of images (e.g. photos just within a certain zip code), you may be able to yield a good match between the two images above.
The Wikipedia article on BoW may look a bit daunting, but I think if you hunt around you'll find an article that describes "bag of words" for image processing clearly. I've seen a very good demo of a BoW approach used to identify objects such as boats and delivery vans in arbitrary video streams, and it worked impressively well. I wish I had a copy of the presentation to pass along.
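As a toy illustration of the bag-of-words idea, the sketch below turns each image's descriptors into a histogram over a small "visual vocabulary" and compares histograms. The 2-D descriptors and the three-word vocabulary are invented for readability. Real descriptors (e.g. SIFT) have 128 dimensions, and the vocabulary would normally come from k-means clustering over many training descriptors.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of nearest visual words for one image."""
    # Pairwise squared distances between descriptors (N, D) and words (K, D).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical vocabulary and per-image descriptors.
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
room_a = np.array([[0.5, 0.2], [9.5, 0.1], [9.8, -0.3], [0.1, 9.9]])
room_a_again = np.array([[10.2, 0.4], [0.3, -0.1], [9.9, 0.2], [-0.2, 10.1]])
room_b = np.array([[0.1, 9.7], [0.2, 10.3], [-0.1, 9.9], [0.4, 0.3]])

h_a, h_a2, h_b = (bow_histogram(d, vocabulary)
                  for d in (room_a, room_a_again, room_b))

same = cosine_similarity(h_a, h_a2)   # two photos of the same room
other = cosine_similarity(h_a, h_b)   # photos of different rooms
```

Note that the histogram ignores where in the image each word occurs, which is what makes the approach tolerant to changes in viewpoint.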
If you don't expect the scene to change much, you could try the standard first step of any structure-from-motion algorithm to establish a notion of similarity between a pair of images. A pair of images is considered similar if the number of matching image features that also satisfy the geometric constraint of the scene exceeds a threshold. For a general scene, that geometric constraint is given by a fundamental matrix F computed from a subset of the matching features.
Here are the steps. I have inserted the OpenCV method for each step, but you could write your own methods too:
Read the pair of images. Use img = cv2.imread(filename).
Use SIFT/SURF to detect image features/descriptors in both images.
sift = cv2.SIFT_create()  # OpenCV >= 4.4; older contrib builds use cv2.xfeatures2d.SIFT_create()
kp, des = sift.detectAndCompute(img,None)
Match features using the descriptors.
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)  # SIFT/SURF descriptors are float vectors; NORM_HAMMING is for binary descriptors such as ORB
matches = bf.match(des1,des2)
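Use RANSAC to compute the fundamental matrix. Here pts1 and pts2 are the coordinates of the matched keypoints, e.g. pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]) and pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]), with kp1 and kp2 being the keypoints of the two images.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3, 0.99)
mask marks the inliers. Simply count them to determine whether the number of matches satisfying the geometric constraint is large enough.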
CAUTION: In case of a planar scene, we use homography instead of a fundamental matrix and the steps described above work out pretty nicely because homography takes a point to a corresponding point in the other image. However, Fundamental matrix takes a point to the corresponding epipolar line in the other image, which makes the entire process a bit less stable. So I would recommend trying these steps a few more times with a little bit of jitter to the feature locations and collating the evidence over more than one trial to make the decision. You can also use more advanced steps to introduce robustness to this process but only if the steps described above don't yield the results you need.
Hi! I'm kinda new to OpenCV and Image processing. I've tried following approaches until now, but I believe there's gotta be a better approach.
1). Finding the color range (HSV) manually using GColor2/GIMP/a trackbar from a reference image which contains a single fruit (banana) with a white background. Then I used inRange(), findContour(), drawContour() on both the reference banana image & the target image (fruit platter) and matchShapes() to compare the contours in the
It works fine as long as the chosen color range is appropriate. (See 2nd image). But since these fruits don't have a uniform solid color, this didn't seem like an ideal approach to me. I don't want to hard-code the color range (Scalar values) inside inRange().
2). Manual thresholding and contour matching.
Same issue as (1). Don't wanna hard-code the threshold value.
3). OTSU thresholding and canny edge detection.
Doesn't work well for banana, apple and lemon.
4). Dynamically finding colors. I used the cropped banana reference image. Calculated the mean & standard deviation of the image.
Don't know how to ignore the white background pixels in my mean/std-dev calculation without looping through each x,y pixels. Any suggestions on this are welcome.
5). Haar Cascade training gives inaccurate results. (See the image below). I believe proper training might give better results. But not interested in this for now.
Other approaches I’m considering:
6). Using floodfill to find all the connected pixels and calculating the average and standard deviation of the same.
Haven't been successful in this. Not sure how to get all the connected pixels. I dumped the mask (imwrite) and got the banana (from the reference banana image) in black & white form. Any suggestions on this are welcome.
7). Hist backprojection:- not sure how it would help me.
8). K-Means , not tried yet. Let me know, if it’s better than step
9). meanshift/camshift → not sure whether it will help. Suggestions are welcome.
10). feature detection -- SIFT/SURF -- not tried yet.
Any help, tips, or suggestions will be highly appreciated.
Answers to such generic questions (object detection), especially to ones like this that are very active research topics, essentially boil down to a matter of preference. That said, of the 10 "approaches" you mentioned, feature detection/extraction is probably the one deserving the most attention, as it's the fundamental building block of a variety of computer vision problems, including but not limited to object recognition/detection.
A very simple but effective approach you can try is the Bag-of-Words model, very commonly used in early attempts at fast object detection, with all global spatial relationship information lost.
A more recent trend in object detection research, from what I have observed in annual computer vision conference proceedings, is to encode each object as a graph that stores feature descriptors in the nodes and spatial relationship information in the edges. Part of the global information is thus preserved, since we can now match not only the distance between feature descriptors in feature space but also the spatial distance between them in image space.
One common pitfall specific to the problem you described is that the homogeneous texture of banana and apple skins may not yield a healthy distribution of features, so most features you detect will lie at the intersections of (most commonly) 3 or more objects, which are not generally regarded as "good" features. For this reason I suggest looking into superpixel object recognition approaches (Just Google it. Seriously.). The mathematical model of the class "Apple" or "Banana" is then a block of interconnected superpixels stored in a graph, with each edge storing spatial relationship information and each node storing information about the color distribution etc. of the neighborhood covered by the superpixel. Recognition then becomes a (partial) graph matching problem, or a problem for probabilistic graphical models, both of which have a large body of existing research.
Does OpenCV have an implementation of shape context matching? I've only found the matchShapes() function, which does not work for me. I want to get a set of corresponding features from shape context matching. Is it a good idea to compare and find the rotation and displacement of a detected contour in two different images?
Some example code would also be very helpful.
I want to detect, for example, a pink square, and in the second case a pen. Other examples could be squares with some holes, stars, etc.
The basic steps of image processing are:
Image Acquisition > Preprocessing > Segmentation > Representation > Recognition
What you are asking for seems to lie within the representation part of this general pipeline. You want some features that describe the objects you are interested in, right? Before sharing what I've done for simple hand-gesture recognition, I would like you to consider what you actually need. A lot of the time, keeping it simple makes things much easier. Consider a fixed color on your objects, and consider background subtraction (these two mainly tie into preprocessing and segmentation). As for representation, which features are you interested in, and can you do without some of them?
My project group and I have taken a simple approach to preprocessing and segmentation, choosing a green glove for our hand. Here's an example of the glove, camera and detection on the screen:
We thresholded the convexity defects so that only the defects between fingers are found, and we calculated the aspect ratio of a rotated rectangular bounding box to see how square our blob is. With only four different hand gestures chosen, we are able to distinguish them using just these two features.
The functions and measurements we used are all described in the OpenCV documentation on structural analysis, and how to access values in vectors (which we did a lot) is covered in the documentation for vectors in C++.
I hope you can use the train of thought put into this; if you want more specific info I'll be happy to comment, Enjoy.
I have a simple photograph that may or may not include a logo image. I'm trying to identify whether a picture includes the logo shape or not. The logo (rectangular shape with a few extra features) could be of various sizes and could have multiple occurrences. I'd like to use Computer Vision techniques to identify the location of these logo occurrences. Can someone point me in the right direction (algorithm, technique?) that can be used to achieve this goal?
I'm quite a novice in Computer Vision, so any direction would be much appreciated.
Practical issues
Since you need a scale-invariant method (that's the proper jargon for "could be of various sizes") SIFT (as mentioned in Logo recognition in images, thanks overrider!) is a good first choice, it's very popular these days and is worth a try. You can find here some code to download. If you cannot use Matlab, you should probably go with OpenCV. Even if you end up discarding SIFT for some reason, trying to make it work will teach you a few important things about object recognition.
General description and lingo
This section is mostly here to introduce you to a few important buzzwords, by describing a broad class of object detection methods, so that you can go and look these things up. Important: there are many other methods that do not fall in this class. We'll call this class "feature-based detection".
So first you go and find features in your image. These are characteristic points of the image (corners and line crossings are good examples) that have a lot of invariances: whatever reasonable processing you do to your image (scaling, rotation, brightness change, adding a bit of noise, etc.), it will not change the fact that there is a corner at a certain point. "Pixel value" or "vertical lines" are bad features. Sometimes a feature will include some numbers (e.g. the prominence of a corner) in addition to a position.
Then you do some clean-up, like remove features that are not strong enough.
Then you go to your database. That's something you've built in advance, usually by taking several nice and clean images of whatever you are trying to find, running your feature detection on them, cleaning things up, and arranging them in some data structure for your next stage:
Look-up. You have to take a bunch of features from your image and try to match them against your database: do they correspond to an object you are looking for? This is pretty non-trivial, since on the face of it you have to consider all subsets of the bunch of features you've found, which is exponential. So there are all kinds of smart hashing techniques to do it, like the Hough transform and geometric hashing.
Now you should do some verification. You have found some places in the image which are suspect: it's probable that they contain your object. Usually, you know what is the presumed size, orientation, and position of your object, and you can use something simple (like a convolution) to check if it's really there.
You end up with a bunch of probabilities, basically: for a few locations, how probable it is that your object is there. Here you do some outlier detection. If you expect only 1-2 occurrences of your object, you'll look for the largest probabilities that stand out, and take only these points. If you expect many occurrences (like face detection on a photo of a bunch of people), you'll look for very low probabilities and discard them.
That's it, you are done!