Adjust drone images to Google Earth - OpenCV

I want to use OpenCV to automatically adjust Google Earth photo overlays (photos directly embedded in the landscape) to Google Earth's 3D environment. The photos already have location, FOV, and orientation as metadata, so only a small adjustment between the real image and the image rendered by Google Earth is required.
I have tried using feature detectors like SIFT, AKAZE, or FAST, but most of the images are of forested areas and there are not many clear features to link (they all find trees to be the most prominent features, and that is not good). Besides, these algorithms look everywhere in the image, and I only want to focus on small translations and scale changes.
I have tried brute force, translating one of the images a few pixels at a time and computing the overall difference between the images in order to find a best-fit translation. Again, no great result.
Finally, I have tried obtaining the contours of both images (Canny algorithm) to use as keypoints. But so far I find it too hard to find parameters that work for all the images.
I am fairly new to OpenCV and I think I am just missing something obvious. Can anybody give me some hints or ideas I can try?
Thanks

Direct methods can be applied if the images only differ by slight translation or rotation.
This is a similar approach to the brute-force method you described.
Here's a link to a short paper describing some of these methods:
http://pages.cs.wisc.edu/~dyer/ai-qual/irani-visalg00.pdf
Hope this helps.

Related

How to match images with unknown rotation differences

I have a collection of about 3000 images taken from a camera suspended from a weather balloon in flight. The camera points in a different direction in each image but is generally aimed down, so all the images share a significant area (40-50%) with the previous image, but at a slightly different scale and rotated by an arbitrary (and not consistent) amount. The image metadata includes a timestamp, so I know with certainty the correct order of the images and the elapsed time between each.
I want to process these images into a single video. If I simply string them together it will be great for making people seasick, but won't really capture the amazingness of the set :)
The specific part I need help with is finding the rotation of the image from the previous image. Is there a library somewhere that can identify regions of overlap between two images when the images themselves are rotated relative to each other? If I can find 2-3 common points (or more), I can do the remaining calculations to determine the amount of rotation and the offset so I can put them together correctly. Alternately, if there is a library that calculates both of those things for me, that would be even better.
I can do this in any language, with a slight preference for either Java or Python. The data is in Hadoop, so Java is the most natural language, but I can use scripting languages as well if necessary.
Since I'm new to image processing, I don't even know where to start. Any help is greatly appreciated!
For a problem like this you could look into SIFT. This algorithm detects local features in images. OpenCV has an implementation of it, you can read about it here.
You could also try SURF, which is a similar type of algorithm. OpenCV also has this implemented, you can read about that here.

Structure from Motion (SfM) in a tunnel-like structure?

I have a very specific application in which I would like to try structure from motion to get a 3D representation. So far, all the software/code samples I have found for structure from motion follow the same pattern: "a fixed object that is photographed from all angles to create the 3D model". This is not my case.
In my case, the camera is moving in the middle of a corridor and looking forward. Sometimes the camera looks in another direction (left, right, up, down). The camera never goes back or looks back; it always moves forward. Since the corridor is small, almost everything is visible (no hidden spots). The corridor can sometimes be very long.
I have tried this software and it doesn't work in my particular case (but it's fantastic with normal use). Can anybody suggest a library/software/tool/paper that targets my specific needs? Or have you ever needed to implement something like that? Any help is welcome!
Thanks!
What kind of corridors are you talking about and what kind of precision are you aiming for?
A priori, I don't see why your corridor would not be a fixed object photographed from different angles. The quality of your reconstruction might suffer if you only look forward and you can't get many different views of the scene, but standard methods should still work. Are you sure that the programs you used aren't failing because of your picture quality, arrangement or other reasons?
If you have to do the reconstruction yourself, I would start by
1) Calibrating your camera
2) Undistorting your images
3) Matching feature points in subsequent image pairs
4) Extracting a 3D point cloud for each image pair
You can then orient the point clouds with respect to one another, for example via ICP between two subsequent clouds. More sophisticated methods might not yield much difference if you don't have any closed loops in your dataset (as your camera is only moving forward).
OpenCV and the Point Cloud Library should be everything you need for these steps. Visualization might be more of a hassle, but the pretty pictures are what you pay for in commercial software after all.
Edit (2017/8): I haven't worked on this in the meantime, but I feel like this answer is missing some pieces. If I had to answer it today, I would definitely suggest looking into the keyword monocular SLAM, which has recently seen a lot of activity, not least because of drones with cameras. Notably, LSD-SLAM is open source and may not be as vulnerable to feature-deprived views, as it operates directly on the intensity. There even seem to be approaches combining inertial/odometry sensors with the image matching algorithms.
Good luck!
FvD is right in the sense that your corridor is a static object. Your scenario is the same as moving around an object and taking images from multiple views. Your views are just not arranged to provide a 360-degree view of the object.
I see you mentioned in your previous comment that the data is coming from a video? In that case, the problem could very well be the camera calibration. A camera calibration tells the SfM algorithm about the internal parameters of the camera (focal length, principal point, lens distortion, etc.). In the absence of knowledge about these, the bundler in VSfM uses information from the EXIF data of the image. However, I don't think video stores any EXIF information (not 100% sure). As a result, I think the entire algorithm is running with bad focal length information and cannot solve for the orientation.
Can you extract a few frames from the video and see if there is any EXIF information?

Face Authentication

My project is Face Authentication.
System description: my input is only one image (taken when the user logs in for the first time), and using that image the system should authenticate the user on every subsequent login. The authentication images may differ from the first input image -- different illumination conditions, different distance from the camera, and -10 to +10 degrees of pose variation. The same camera (e.g. an iPad) is used in all cases.
1) Authentication images are stored each time the user logs in. How can I make use of these images to enhance the accuracy of the system?
2) When a new image comes in, I need to select the closest image(s) (not all stored images) from the image repository to use for authentication, to reduce time. How can I label an image based on illumination/distance from camera automatically?
3) How should I make my system perform decently under changes in illumination and distance from the camera?
Please, can anyone suggest good algorithms/papers/open-source code for the above questions?
Though it sounds like a research project, I would be extremely grateful if I get any response from someone.
For this task I think you should take a look at OpenCV's Face Recognition API. The API is basically able to identify the structure of a face (within certain limitations, of course) and provide you with the coordinates of the region of the image in which the face is located.
Working with just the face, in my opinion, removes the need to deal with different backgrounds, which I think is something you do not really need.
Once you have the image of the face, you could scale it up/down to have a uniform size and also change the colour of the image to grey scale. Lastly, I would consider feeding all this information to an Artificial Neural Network since these are able to deal with inconsistencies with the input. This will allow you to increase your knowledge base each time a user logs in.
I'm pretty sure there are other ways to go around this. I would recommend taking a look at Google Scholar to try and find papers which deal with this matter for more information and quite possible other ways to achieve what you are after. Also, keep in mind that with some luck you might also find some open source project which already does most of what you are after.
If you really have a database of photographs of faces, you could probably use that to enhance the features of OpenCV face detection. The way faces are recognized is by comparing the principal components of the picture with those of the face examples in OpenCV database.
Check out:
How to create Haar Cascade (xml) for using with OpenCV?
Given that, you could also try to do your own Principal Component Analysis on every picture of a recognized face (use OpenCV face detection for that -> black out everything except the face; OpenCV gives you the position and size of the face). Compare the PCA to the ones in your database and match it to the closest. Of course, this would work best with a fairly big database, so at the beginning there could be wrong matches.
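A minimal, NumPy-only sketch of that PCA comparison (my own illustration; OpenCV's contrib module also ships a ready-made Eigenfaces recognizer):

```python
import numpy as np

def build_pca(gallery, k=10):
    """Fit a PCA ('eigenface') basis to flattened face patches.
    gallery: NxD array, one flattened grayscale face per row."""
    mean = gallery.mean(axis=0)
    X = gallery - mean
    # SVD of the centered data gives the principal directions directly
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return mean, Vt[:k]  # mean face and k x D component basis

def nearest_face(probe, gallery, mean, basis):
    """Index of the gallery face whose PCA projection is closest."""
    g = (gallery - mean) @ basis.T
    p = (probe - mean) @ basis.T
    return int(np.argmin(np.linalg.norm(g - p, axis=1)))
```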
I think creating your own OpenCV haarcascade would be the best way to go.
Good Luck!

OpenCV template matching -> get exact location?

I have opencv installed and working on my iphone (big thanks to this community). I'm doing template matching with it. It does find the object in the captured image. However, the exact location seems to be hard to tell.
Please take a look at the following video (18 seconds):
http://www.youtube.com/watch?v=PQnXNZMqpsU
As you can see in the video, it does find the template in the image. But when I move the camera a bit further away, the found template is positioned somewhere inside that square. That makes it hard to tell the exact location of the found object.
The square that you see is basically the found x,y location of the template plus the width,height of the actual template image.
So basically my question is: is there a way to find the exact location of the found template image? Because currently it can be at any location inside that square. No real way to tell the exact location...?
It seems that you're not well-pleased with your template matching algorithm :)
Shortly, there are some ways to improve it, but I would recommend you to try something else. If your images are always as simple as in the video, you can use thresholding, contour finding, blob detection, etc. They are simple and fast.
For a more demanding environment, you may try feature matching. Look for SIFT, SURF, ORB, or other ways to describe your objects with features. Actually, ORB was specifically designed to be fast enough for the limited power of mobile phones.
Try this sample in the OCV samples/cpp/ folder
matching_to_many_images.cpp
And check this detailed answer on how to use feature detectors:
Detecting if an object from one image is in another image with OpenCV
Template matching (cvMatchTemplate()) is not invariant to scale and rotation. When you move the phone back, the image appears smaller, and the template "match" is just the place with the best match score, though it is not really a true match.
If you want scale and/or rotation invariance you will have to try non-template matching methods such as those using 2D-feature descriptors.
Check out the OpenCV samples for examples of how to do this.

How to align two different pictures in such a way, that they match as close as possible?

I need to automatically align an image B on top of another image A in such a way that the contents of the images match as closely as possible.
The images can be shifted in x/y directions and rotated up to 5 degrees on z, but they won't be distorted (i.e. scaled or keystoned).
Maybe someone can recommend some good links or books on this topic, or share some thoughts how such an alignment of images could be done.
If it weren't for the rotation, I could simply compare rows of pixels with a brute-force method until I find a match; then I'd know the offset and could align the images.
Do I need AI for this?
I'm having a hard time finding resources on image processing which go into detail how these alignment-algorithms work.
So what people often do in this case is first find points in the images that match, then compute the best transformation matrix with least squares. The point matching is not particularly simple, and oftentimes you just use human input for this task; you have to do it all the time when calibrating cameras. Anyway, if you want to fully automate this process, you can use feature extraction techniques to find matching points. There are volumes of research papers written on this topic, and any standard computer vision text will have a chapter on it. Once you have N matching points, solving for the least-squares transformation matrix is pretty straightforward and, again, can be found in any computer vision text, so I'll assume you have that covered.
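For the rigid (rotation + translation) case in this question, that least-squares transform even has a closed-form solution (the Kabsch/Procrustes method). A minimal NumPy sketch:

```python
import numpy as np

def fit_rigid_2d(src, dst):
    """Least-squares rigid (rotation + translation) transform mapping
    the Nx2 point set `src` onto `dst` (Kabsch/Procrustes method)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)            # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
    R = Vt.T @ np.diag([1, d]) @ U.T
    t = cd - R @ cs
    return R, t  # such that R @ p + t maps src points onto dst
</imports```

With the 2-3 common points mentioned above (ideally more, to average out noise), this gives both the rotation and the offset in one shot.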
If you don't want to find point correspondences, you could directly optimize the rotation and translation using steepest descent. The trouble is that this is non-convex, so there are no guarantees you will find the correct transformation. You could do random restarts, simulated annealing, or any other global optimization trick on top of this; that would most likely work. I can't find any references to this problem, but it's basically a digital image stabilization algorithm. I had to implement it when I took computer vision, but that was many years ago; here are the relevant slides though -- look at "stabilization revisited". Yes, I know those slides are terrible; I didn't make them :) However, the method for determining the gradient is quite an elegant one, since finite differences are clearly intractable.
Edit: I finally found the paper that went over how to do this here, it's a really great paper and it explains the Lucas-Kanade algorithm very nicely. Also, this site has a whole lot of material and source code on image alignment that will probably be useful.
For aligning the two images you have to carry out an image registration technique.
In MATLAB, write functions for image registration and select your desired reference features ('feature points') using the Control Point Selection Tool to register the images.
Read more about image registration in the MATLAB help window to understand it properly.
