OpenCV template matching -> get exact location? - opencv

I have opencv installed and working on my iphone (big thanks to this community). I'm doing template matching with it. It does find the object in the captured image. However, the exact location seems to be hard to tell.
Please take a look at the following video (18 seconds):
http://www.youtube.com/watch?v=PQnXNZMqpsU
As you can see in the video, it does find the template in the image. But when I move the camera a bit further away, the found template is positioned somewhere inside that square, which makes it hard to tell the exact location of the found object.
The square that you see is basically the found x,y location of the template plus the width,height of the actual template image.
So basically my question is: is there a way to find the exact location of the found template image? Currently it can be at any location inside that square, with no real way to tell exactly where.

It seems that you're not well pleased with your template matching algorithm :)
In short, there are some ways to improve it, but I would recommend trying something else. If your images are always as simple as in the video, you can use thresholding, contour finding, blob detection, etc. They are simple and fast.
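As a rough illustration of the thresholding route, here is a minimal NumPy-only sketch. It assumes a single bright object on a dark background; the function name `locate_bright_blob` and the threshold value are made up for the example. In OpenCV itself you would more likely reach for `cv2.threshold` and `cv2.findContours`.

```python
import numpy as np

def locate_bright_blob(gray, thresh=128):
    """Return (x, y, w, h, cx, cy) covering all pixels brighter than `thresh`.

    Assumes there is only one bright object in the frame; returns None if
    no pixel passes the threshold.
    """
    mask = gray > thresh
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    # bounding box (top-left corner, width, height) plus the centroid
    return (int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1),
            float(xs.mean()), float(ys.mean()))

# Synthetic frame: a 15x10 white rectangle on a black background.
frame = np.zeros((60, 80), dtype=np.uint8)
frame[20:30, 50:65] = 255

box = locate_bright_blob(frame)
print(box)
```

Unlike the template-matching box, this box shrinks and grows with the object, so its centroid stays on the object when the camera moves back.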
For a more demanding environment, you may try feature matching. Look for SIFT, SURF, ORB, or other ways to describe your objects with features. Actually, ORB was specifically designed to be fast enough for the limited power of mobile phones.
Try this sample in the OpenCV samples/cpp/ folder:
matching_to_many_images.cpp
And check this detailed answer on how to use feature detectors:
Detecting if an object from one image is in another image with OpenCV

Template matching (cvMatchTemplate()) is not invariant to scale and rotation. When you move the phone back, the image appears smaller, and the template "match" is just the place with the best match score, though it is not really a true match.
If you want scale and/or rotation invariance you will have to try non-template matching methods such as those using 2D-feature descriptors.
Check out the OpenCV samples for examples of how to do this.
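To make the scale problem concrete, here is a small NumPy-only sketch (not the OpenCV implementation). It computes a zero-mean normalized cross-correlation, which is roughly what the normalized coefficient method of cvMatchTemplate scores, and runs it on two synthetic scenes. The helper name `match_template_zncc` and the random test images are assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def match_template_zncc(image, templ):
    """Return (x, y, score) of the best zero-mean normalized cross-correlation."""
    windows = sliding_window_view(image.astype(np.float64), templ.shape)
    t = templ.astype(np.float64) - templ.mean()
    w = windows - windows.mean(axis=(2, 3), keepdims=True)
    num = (w * t).sum(axis=(2, 3))
    den = np.sqrt((w ** 2).sum(axis=(2, 3)) * (t ** 2).sum())
    scores = np.where(den > 1e-12, num / np.maximum(den, 1e-12), 0.0)
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(x), int(y), float(scores[y, x])

rng = np.random.default_rng(0)
templ = rng.random((16, 16))

# Scene 1: the object appears at the template's own size.
near = rng.random((64, 64))
near[20:36, 30:46] = templ
x, y, score = match_template_zncc(near, templ)
box = (x, y, templ.shape[1], templ.shape[0])  # top-left corner plus template size

# Scene 2: the camera moved back, so the object is half the size in the image.
far = rng.random((64, 64))
far[20:28, 30:38] = templ[::2, ::2]
fx, fy, far_score = match_template_zncc(far, templ)

print(box, round(score, 3))
print((fx, fy), round(far_score, 3))
```

The second call still returns a location, because there is always a maximum, but the score is far from 1. Checking the score against a threshold is a cheap way to tell a real hit from "the least bad place".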

Related

Adjust drone images to Google Earth

I want to use OpenCV to automatically adjust Google Earth photo overlays (photos directly embedded in the landscape) to Google Earth's 3D environment. Photos already have location, FOV, and orientation as metadata, so only a small adjustment between the real image and the image rendered by Google Earth is required.
I have tried using feature detectors like SIFT, AKAZE or FAST, but most of the images are of forested areas and there are not many clear features to link (they all find trees to be the most prominent features, and that is not good). Besides, these algorithms will look everywhere in the image, and I only want to focus on small translations and scale changes.
I have tried brute force by translating one of the images a few pixels and computing the overall difference between the images, in order to find a best-fit translation. Again, no great result.
Finally, I have tried obtaining the contours of both (Canny algorithm) to use as key points in the images. But so far I find it too hard to find parameters that would work in all the images.
I am fairly new to OpenCV and I think I am just missing something obvious. Can anybody give me some hints or ideas I can try?
Thanks
Direct methods can be applied if the images only differ by slight translation or rotation.
This may be similar to the brute-force method you described.
Here's a link to a short paper describing some of these methods:
http://pages.cs.wisc.edu/~dyer/ai-qual/irani-visalg00.pdf
Hope this helps.
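For what it's worth, here is a minimal sketch of one gradient-based step in the spirit of those direct methods, using NumPy only. It assumes the two images differ by a pure translation well under a pixel or two and that the image content is smooth; `estimate_translation` and the synthetic sine-wave scene are invented for the example. Larger shifts would need an image pyramid (coarse to fine) and repeated steps, and scale is not handled here.

```python
import numpy as np

def estimate_translation(ref, moved):
    """One least-squares step for (dx, dy) with moved(x, y) ~ ref(x - dx, y - dy)."""
    gy, gx = np.gradient(ref)   # derivatives along rows (y) and columns (x)
    it = moved - ref            # intensity difference between the two images
    # Drop border pixels, where np.gradient falls back to one-sided differences.
    gx, gy, it = gx[1:-1, 1:-1], gy[1:-1, 1:-1], it[1:-1, 1:-1]
    # Linearized brightness constancy: gx*dx + gy*dy = -it, solved in least squares.
    A = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    b = -np.array([np.sum(gx * it), np.sum(gy * it)])
    dx, dy = np.linalg.solve(A, b)
    return float(dx), float(dy)

ys, xs = np.mgrid[0:64, 0:64].astype(np.float64)
a = 2 * np.pi / 32

def scene(x, y):
    # Smooth synthetic "image" so that the linearization holds.
    return np.sin(a * x) + np.cos(a * y)

ref = scene(xs, ys)
moved = scene(xs - 0.4, ys + 0.3)   # ref shifted by dx = 0.4, dy = -0.3

dx, dy = estimate_translation(ref, moved)
print(round(dx, 2), round(dy, 2))
```

Because every pixel contributes to the estimate, this does not depend on finding distinctive keypoints, which is the appeal for low-texture scenes like forest canopy.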

Structure from Motion (SfM) in a tunnel-like structure?

I have a very specific application in which I would like to try structure from motion to get a 3D representation. For now, all the software/code samples I have found for structure from motion are like this: "A fixed object that is photographed from all angles to create the 3D model". This is not my case.
In my case, the camera is moving down the middle of a corridor and looking forward. Sometimes the camera can look in other directions (left, right, up, down). The camera will never go back or look back; it always moves forward. Since the corridor is small, almost everything is visible (no hidden spots). The corridor can be very long sometimes.
I have tried this software and it doesn't work in my particular case (but it's fantastic for normal use). Can anybody suggest a library/software/tool/paper that could target my specific needs? Or have you ever needed to implement something like that? Any help is welcome!
Thanks!
What kind of corridors are you talking about and what kind of precision are you aiming for?
A priori, I don't see why your corridor would not be a fixed object photographed from different angles. The quality of your reconstruction might suffer if you only look forward and you can't get many different views of the scene, but standard methods should still work. Are you sure that the programs you used aren't failing because of your picture quality, arrangement or other reasons?
If you have to do the reconstruction yourself, I would start by
1) Calibrating your camera
2) Undistorting your images
3) Matching feature points in subsequent image pairs
4) Extracting a 3D point cloud for each image pair
You can then orient the point clouds with respect to one another, for example via ICP between two subsequent clouds. More sophisticated methods might not yield much difference if you don't have any closed loops in your dataset (as your camera is only moving forward).
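To give an idea of the alignment step, here is a NumPy sketch of the rigid fit that sits inside each ICP iteration (the SVD-based Kabsch solution). It assumes the point correspondences between the two clouds are already known; real ICP re-estimates them by nearest-neighbour search on every iteration. The function name `rigid_align` and the synthetic clouds are made up for the example.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rotation R and translation t with dst ~ src @ R.T + t.

    src and dst are (N, 3) arrays whose rows correspond to each other.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: rotate a random cloud by 10 degrees about z and move it forward.
rng = np.random.default_rng(1)
cloud_a = rng.random((50, 3))
angle = np.deg2rad(10.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, 0.0, 2.0])
cloud_b = cloud_a @ R_true.T + t_true

R, t = rigid_align(cloud_a, cloud_b)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

Chaining the (R, t) of each consecutive pair gives the camera path along the corridor, with the caveat that small errors accumulate when there is no loop closure.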
OpenCV and the Point Cloud Library should be everything you need for these steps. Visualization might be more of a hassle, but the pretty pictures are what you pay for in commercial software after all.
Edit (2017/8): I haven't worked on this in the meantime, but I feel like this answer is missing some pieces. If I had to answer it today, I would definitely suggest looking into the keyword monocular SLAM, which has recently seen a lot of activity, not least because of drones with cameras. Notably, LSD-SLAM is open source and may not be as vulnerable to feature-deprived views, as it operates directly on the intensity. There even seem to be approaches combining inertial/odometry sensors with the image matching algorithms.
Good luck!
FvD is right in the sense that your corridor is a static object. Your scenario is the same as moving around an object and taking images from multiple views. Your views are just not arranged to provide a 360 degree view of the object.
I see you mentioned in your previous comment that the data is coming from a video? In that case, the problem could very well be the camera calibration. A camera calibration tells the SfM algorithm about the internal parameters of the camera (focal length, principal point, lens distortion, etc.). In the absence of knowledge about these, the bundler in VSfM uses information from the EXIF data of the image. However, I don't think video stores any EXIF information (not 100% sure). As a result, I think the entire algorithm is running with bad focal length information and cannot solve for the orientation.
Can you extract a few frames from the video and see if there is any EXIF information?
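For reference, this is how the internal parameters mentioned above fit together in the usual pinhole model. It is only a sketch: the lens and sensor numbers are hypothetical, lens distortion is ignored, and the principal point is assumed to be the image center. A proper calibration would estimate all of these from images of a known pattern.

```python
import numpy as np

def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in millimetres to pixels."""
    return focal_mm * image_width_px / sensor_width_mm

def intrinsic_matrix(f_px, width, height):
    """Pinhole K assuming square pixels and the principal point at the image center."""
    return np.array([[f_px, 0.0, width / 2.0],
                     [0.0, f_px, height / 2.0],
                     [0.0, 0.0, 1.0]])

# Hypothetical camera: 4.0 mm lens, 4.8 mm wide sensor, 1920x1080 video frames.
f_px = focal_length_px(4.0, 4.8, 1920)
K = intrinsic_matrix(f_px, 1920, 1080)

# Project a 3D point given in camera coordinates (metres) to pixel coordinates.
point_cam = np.array([0.5, 0.25, 2.0])
u, v, w = K @ point_cam
pixel = (u / w, v / w)
print(f_px, pixel)
```

If the focal length fed to the reconstruction is off by a large factor, every projected pixel is off with it, which is why a missing EXIF value can derail the whole solve.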

Face Authentication

My project is Face Authentication.
System Description: My input is only one image (taken when the user logs in for the first time), and using that image the system should authenticate the user whenever they log in to the application. The authentication images may differ from the first input image: different illumination conditions, different distance from the camera, and -10 to 10 degrees of variation in pose. The same camera (e.g. an iPad) is used in all cases.
1) Authentication images are stored each time the user logs in. How can I make use of these images to enhance the accuracy of the system?
2) When a new image comes in, I need to select the closest image(s) (and not all stored images) from the image repository and use them for authentication, to reduce the time. How can I label an image by illumination/distance from the camera automatically?
3) How should I make my system perform decently under changes in illumination and distance from the camera?
Please, can anyone suggest good algorithms/papers/open-source code for my questions above?
Though it sounds like a research project, I would be extremely grateful if I get any response from someone.
For this task I think you should take a look at OpenCV's Face Recognition API. The API is basically able to identify the structure of a face (within certain limitations, of course) and provide you with the coordinates of the region of the image that contains the face.
Having to deal with just the face in my opinion reduces the need to deal with different background colours which I think is something you do not really need.
Once you have the image of the face, you could scale it up/down to have a uniform size and also change the colour of the image to grey scale. Lastly, I would consider feeding all this information to an Artificial Neural Network since these are able to deal with inconsistencies with the input. This will allow you to increase your knowledge base each time a user logs in.
I'm pretty sure there are other ways to go about this. I would recommend taking a look at Google Scholar to find papers that deal with this matter, for more information and quite possibly other ways to achieve what you are after. Also, keep in mind that with some luck you might find an open source project that already does most of what you are after.
If you really have a database of photographs of faces, you could probably use that to enhance the features of OpenCV face detection. The way faces are recognized is by comparing the principal components of the picture with those of the face examples in OpenCV database.
Check out:
How to create Haar Cascade (xml) for using with OpenCV?
Seeing that, you could also try to do your own Principal Component Analysis on every picture of a recognized face (use OpenCV face detection for that -> black out everything except the face; OpenCV gives you the position and size of the face). Compare the PCA to the ones in your database and match it to the closest. Of course, this would work best with a fairly big database, so at the beginning there could be wrong matches.
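A minimal sketch of that "PCA, then closest match" idea, in NumPy only. It assumes the faces have already been detected, cropped, converted to grayscale, resized to the same size and flattened into vectors; the random arrays below merely stand in for such images, and the function names are invented for the example.

```python
import numpy as np

def fit_pca(faces, n_components):
    """faces: (n_samples, n_pixels). Returns the mean face and the top components."""
    mean = faces.mean(axis=0)
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:n_components]

def closest_match(probe, gallery, mean, components):
    """Index of the gallery face nearest to `probe` in PCA space, and its distance."""
    gallery_codes = (gallery - mean) @ components.T
    probe_code = components @ (probe - mean)
    dists = np.linalg.norm(gallery_codes - probe_code, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

rng = np.random.default_rng(5)
gallery = rng.random((5, 32 * 32))                        # 5 stored "faces", 32x32 each
probe = gallery[3] + 0.05 * rng.standard_normal(32 * 32)  # noisy new image of face 3

mean, components = fit_pca(gallery, n_components=4)
idx, dist = closest_match(probe, gallery, mean, components)
print(idx, round(dist, 3))
```

For authentication rather than identification, you would compare `dist` against a threshold tuned on your own data instead of just taking the nearest entry.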
I think creating your own OpenCV haarcascade would be the best way to go.
Good Luck!

Approach for finding some patterns in the image captured from camera using opencv [duplicate]

Closed 10 years ago.
Possible Duplicate:
matchTemplate opencv not working as shown in opencv document
I have posted some questions about this earlier but still cannot find a solution.
My requirement is to create a paper-scanning app.
The camera takes a picture, and I have to detect whether predefined patterns appear in the captured image.
I tried it with matchTemplate (OpenCV) but could not get it to work.
Since the image is captured from a camera, the pattern in the captured image may be smaller or bigger than the pattern image.
Will matchTemplate work properly in this case? If not, what other approach should I try?
Template matching won't work across different scales (sizes). To handle that, you can do a multiscale search: basically, run the template matching on several rescaled versions of the input image. Another option is to train an OpenCV Haar cascade to detect the template; it has built-in multiscale detection.
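A rough sketch of such a multiscale search, written with NumPy only so that it runs without OpenCV. The nearest-neighbour `resize_nearest`, the scoring helper `zncc_best` and the list of scales are all assumptions for the example; with OpenCV you would use its own resize and matchTemplate calls, and a finer set of scales.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def zncc_best(image, templ):
    """Best zero-mean normalized cross-correlation as (x, y, score)."""
    windows = sliding_window_view(image, templ.shape)
    t = templ - templ.mean()
    w = windows - windows.mean(axis=(2, 3), keepdims=True)
    den = np.sqrt((w ** 2).sum(axis=(2, 3)) * (t ** 2).sum())
    scores = np.where(den > 1e-12, (w * t).sum(axis=(2, 3)) / np.maximum(den, 1e-12), 0.0)
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(x), int(y), float(scores[y, x])

def resize_nearest(img, scale):
    """Nearest-neighbour rescale of a 2D array."""
    h, w = img.shape
    new_h, new_w = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[np.ix_(rows, cols)]

def multiscale_match(image, templ, scales):
    """Return (score, scale, x, y) in original image coordinates, or None."""
    best = None
    for s in scales:
        scaled = resize_nearest(image, s)
        if scaled.shape[0] < templ.shape[0] or scaled.shape[1] < templ.shape[1]:
            continue  # template no longer fits at this scale
        x, y, score = zncc_best(scaled, templ)
        if best is None or score > best[0]:
            # map the hit back to coordinates of the unscaled image
            best = (score, s, int(round(x / s)), int(round(y / s)))
    return best

rng = np.random.default_rng(6)
small = rng.random((8, 8))
templ = np.kron(small, np.ones((2, 2)))   # 16x16 pattern image
scene = rng.random((48, 48))
scene[10:18, 20:28] = small               # pattern appears at half size in the photo

score, scale, x, y = multiscale_match(scene, templ, (0.5, 1.0, 1.5, 2.0))
print(round(score, 3), scale, (x, y))
```

The winning scale also tells you how large the pattern is in the photo: here the template's 16 pixels divided by the scale of 2.0.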
A template of the wrong size is a standard problem with template matching. Since I don't see any example code, it is not easy to tell where the real problem in your question lies. Did you try different thresholds in the algorithm?
On the theoretical side, there are two main problems for feature extraction: size (distance) and rotation (object orientation). The generalized Hough transform could be a solution.

How can I compare images of the same origin that were cropped?

Suppose I have an image file/URL, and I want my software to search it within a set of up to 100 images (or at least in that order of magnitude). The target image that the software should find should be the "same" image as the given image, but it should still be able to "forgive" slight processing on either of them (the two images may have been cropped differently, or they were compressed differently).
The question is: is this a feasible task, given that I won't have any of the images before the search takes place (i.e., there won't be any indexing prior to the search)? Is it likely to work in subsecond time (remember that the compare set is quite small)? And if feasible, which tools can I use for this task? This could be software components or even an online service (I can live with that for a proof of concept). Can OpenSURF help me here?
To focus my question further - I'm not asking which algorithms to use, at this point I would rather use an existing tool/API/service.
The target image that the software should find should be the "same" image as the given image, but it should still be able to "forgive" slight processing on either of them.
If "slight processing" doesn't involve rotation, but only "cropping", then simple cross-correlation should work. If there could be perspective correction, rotation, or lens distortion correction, then things are more complicated.
I think this method is quite forgiving to slight color corrections. Anyway, you can always convert both images to grayscale and compare grayscale versions if you want.
To focus my question further - I'm not asking which algorithms to use, at this point I would rather use an existing tool/API/service.
You can start with cvMatchTemplate from the OpenCV library (the link points to the C version of the API, but it's also available for C++ and Python). Use the cropped image as a template, and look for it in all your images.
If the images you compare have dark features on light backgrounds, you may benefit from using CV_TM_CCOEFF or CV_TM_CCOEFF_NORMED methods. They both subtract the average over the template area from both images. Normalized methods (CV_TM_*_NORMED) generally work better but are slower than their non-normalized counterparts.
You may consider doing some preprocessing on the images before the cross-correlation. If you normalize them first, the cross-correlation will be less sensitive to slight brightness/contrast modification. If you detect edges first, as suggested by #misha, you'll lose color/lightness information, but the results for contour overlapping will be much better.
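To show why the mean-subtracted, normalized variants matter, here is a NumPy-only comparison of plain cross-correlation against a zero-mean normalized score, which is roughly what CV_TM_CCORR and CV_TM_CCOEFF_NORMED compute. The synthetic scene (a dark object next to a bright, featureless region) and the helper names are assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def plain_correlation(image, templ):
    """Sum of image * template over every window position."""
    windows = sliding_window_view(image, templ.shape)
    return (windows * templ).sum(axis=(2, 3))

def zero_mean_normalized(image, templ):
    """Correlation after subtracting the means, divided by both norms."""
    windows = sliding_window_view(image, templ.shape)
    t = templ - templ.mean()
    w = windows - windows.mean(axis=(2, 3), keepdims=True)
    den = np.sqrt((w ** 2).sum(axis=(2, 3)) * (t ** 2).sum())
    return np.where(den > 1e-12, (w * t).sum(axis=(2, 3)) / np.maximum(den, 1e-12), 0.0)

def argmax_xy(scores):
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(x), int(y)

rng = np.random.default_rng(2)
templ = 0.5 * rng.random((8, 8))      # a fairly dark object
scene = 0.1 * rng.random((32, 48))    # dark, slightly noisy background
scene[:, 32:] = 1.0                   # bright, featureless region
scene[10:18, 5:13] = templ            # the object itself, top-left at x=5, y=10

plain_hit = argmax_xy(plain_correlation(scene, templ))
zncc_hit = argmax_xy(zero_mean_normalized(scene, templ))
print(plain_hit, zncc_hit)
```

The plain score is pulled toward the bright region simply because bright pixels produce large products, while the zero-mean normalized score lands on the object.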
jetxee set you off on the right track. However, if you simply use template matching, you can run into problems where the background interferes with your template matching result. For example, if your template is a building and your background is primarily light (e.g. desert sand), then the template matching will fail because the lighter background will always return a higher cross-correlation than the darker template. Here is an example of this problem.
The way you solve it is the same as what is in the link:
1) Perform edge detection on both your template and the target image.
2) Throw the original template and image away.
3) Perform template detection using the edge-detected template and the edge-detected target image.
As far as forgiving slight processing, the edge detection step will take care of that. As long as the edges in the two images are not modified significantly (blurred, optically distorted), the approach will work.
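A tiny sketch of why the edge step forgives this kind of processing. It uses a plain gradient magnitude as a crude stand-in for a real edge detector such as Canny or Sobel, and a global brightness change as the "slight processing"; both choices, and the name `edge_map`, are assumptions for the example.

```python
import numpy as np

def edge_map(gray):
    """Gradient magnitude as a crude edge detector (a stand-in for Canny or Sobel)."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy)

rng = np.random.default_rng(3)
original = rng.integers(0, 200, size=(32, 32)).astype(np.float64)
brightened = original + 40.0   # e.g. a global brightness correction

edges_a = edge_map(original)
edges_b = edge_map(brightened)

# The raw images differ everywhere, but their edge maps coincide.
print(np.allclose(original, brightened), np.allclose(edges_a, edges_b))
```

The edge maps then take the place of the raw template and target image in the matching step.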
I know you are not looking specifically for algorithms, but nonetheless, let me suggest the following which can accomplish exactly what you are trying to do, very efficiently...
For cropped versions of the same image, including rotation, the Fourier-Mellin transform or a log-polar transform (watch out for the artsy semi-nude drawing - good source however) will give you the translation, rotation and scale coefficients between the two images, allowing you to determine what operations were needed to go from one to the other.
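The Fourier-Mellin approach starts from the fact that the magnitude of the Fourier transform does not change when an image is translated; rotation and scale are then recovered by resampling that magnitude on a log-polar grid, where they turn into plain shifts. The sketch below only checks the first property, and only for a circular shift; a real crop changes the image content at the borders, so the equality becomes approximate. The test image is random data made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
image = rng.random((64, 64))
shifted = np.roll(image, shift=(7, -12), axis=(0, 1))   # circular translation

mag_a = np.abs(np.fft.fft2(image))
mag_b = np.abs(np.fft.fft2(shifted))

# Translation only changes the phase of the spectrum, not its magnitude.
same_magnitude = bool(np.allclose(mag_a, mag_b))
print(same_magnitude)
```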
