I want to automatically align stereo images for best fit. The images are taken only a few inches apart and under the same lighting conditions, but there may be slight rotations and shifts. Can someone describe the best approach for this sort of task? I assume one could use template matching. Would a brute-force approach work just fine? Would it be better to detect features in each image and try to align the features? I am not looking to rectify the images, just to find the closest alignment. Any direction on methodology would be much appreciated.
Edit: Basically this is similar to creating a panorama, except that I do not need to warp the image.
I want to use OpenCV to automatically adjust Google Earth photo overlays (photos embedded directly in the landscape) to Google Earth's 3D environment. The photos already have location, FOV, and orientation as metadata, so only a small adjustment between the real image and the image rendered by Google Earth is required.
I have tried feature detectors like SIFT, AKAZE, and FAST, but most of the images are of forested areas and there are not many clear features to match (they all find trees to be the most prominent features, which is not good). Besides, these algorithms look everywhere in the image, and I only want to consider small translations and scale changes.
I have tried brute force: translating one of the images by a few pixels at a time and computing the overall difference between the images, in order to find the best-fit translation. Again, no great results.
Finally, I have tried obtaining the contours of both images (Canny algorithm) to use as key points, but so far I find it too hard to choose parameters that work for all the images.
I am fairly new to OpenCV and I think I am just missing something obvious. Can anybody give me some hints or ideas I could try?
Thanks
Direct methods can be applied if the images differ only by a slight translation or rotation.
This is a similar approach to the brute-force method you described.
Here's a link to a short paper describing some of these methods:
http://pages.cs.wisc.edu/~dyer/ai-qual/irani-visalg00.pdf
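If OpenCV is an option, its findTransformECC function is one ready-made implementation of such a direct, intensity-based alignment. A minimal sketch, assuming grayscale input and placeholder filenames (depending on the OpenCV version, optional inputMask/gaussFiltSize arguments may also be accepted):

```python
import cv2
import numpy as np

# Load the two stereo frames as grayscale (filenames are placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# A Euclidean motion model covers small shifts plus a slight rotation.
warp_mode = cv2.MOTION_EUCLIDEAN
warp_matrix = np.eye(2, 3, dtype=np.float32)

# Stop after 200 iterations or when the correlation change drops below 1e-6.
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)

# Direct (intensity-based) alignment: maximizes the ECC between the images.
cc, warp_matrix = cv2.findTransformECC(left, right, warp_matrix, warp_mode, criteria)

# Warp the right image onto the left one with the recovered transform.
aligned = cv2.warpAffine(right, warp_matrix, (left.shape[1], left.shape[0]),
                         flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```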
Hope this helps.
I want to get a panoramic view of cylindrical objects without using special cameras.
The idea was to capture many images from different views, cut out the center of each, and join the centers together, but I got bad results.
Maybe somebody knows the best solution for this purpose? Maybe it would be better to work from video?
Hugin is a great, configurable, free cross-platform tool for stitching panoramic images. You can definitely use it for your task.
If you want to create your own tool for this purpose, you may find it useful to read about Hugin's toolchain workflow to see what steps may be needed to achieve good results.
A possible workflow might be:
1. Take images.
2. Correct the projection depending on lens parameters.
3. Find and verify control points on image pairs (possible algorithms: SIFT, SURF).
4. Geometric optimisation (shift, 3D rotation, etc.).
5. Photometric optimisation (exposure values, vignetting, white balance).
6. Stitch and blend the output (cut the centers and join them smoothly together).
You may skip some steps depending on your image-capturing conditions. The more similar the images are (same camera and cylinder positions, same lighting, etc.), the less image correction you will need.
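If you would rather prototype this in OpenCV than in Hugin, the high-level Stitcher class bundles most of these steps (control-point matching, geometric and photometric optimisation, blending). A rough sketch, assuming OpenCV 4.x and placeholder filenames:

```python
import cv2

# Load the overlapping views of the cylinder (filenames are placeholders).
images = [cv2.imread(f) for f in ["view1.jpg", "view2.jpg", "view3.jpg"]]

# The Stitcher pipeline finds features, matches control points, optimises
# the camera parameters and blends the result, roughly steps 3-6 above.
stitcher = cv2.Stitcher.create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(images)

if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)
else:
    print("Stitching failed with status", status)
```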
I am trying to build a solution that can differentiate between a 3D textured surface with a height of around 200 microns and a regular text print.
The following image is a textured surface. The black color here is the base surface.
A regular text print would be a 2D print of the same 3D textured surface.
[EDIT]
My initial thought about solving this problem looks like this:
The general idea is that images of a 3D object shot from different angles will be less correlated with each other than images of a 2D object shot under similar conditions.
One possible way to verify this could be:
1. Take 2 images with enough light around (camera flash). These should be shot at as large an angle from the object plane as possible, say one with the camera at 45 degrees on the left side and the other at the same angle on the right side.
2. Extract the ROI and perspective-correct both images.
3. Compute the GLCM of the composite of these 2 images. If the contrast of the GLCM is low, it would be a 3D surface; otherwise a 2D print.
Please pardon the language; suggestions for edits are welcome.
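For step 3, a rough sketch of computing the GLCM contrast, assuming scikit-image is available and that "composite" means something like the average of the two perspective-corrected ROIs (that averaging is my assumption, not part of the original idea):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # greycomatrix/greycoprops in scikit-image < 0.19

def glcm_contrast(patch, levels=64):
    """Mean GLCM contrast of an 8-bit grayscale patch."""
    # Quantise to fewer grey levels so the co-occurrence matrix stays small.
    quantised = (patch.astype(np.float32) / 256.0 * levels).astype(np.uint8)
    glcm = graycomatrix(quantised, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return graycoprops(glcm, "contrast").mean()

# "composite": the two perspective-corrected ROIs combined, e.g. averaged.
# composite = ((roi_left.astype(np.float32) + roi_right.astype(np.float32)) / 2).astype(np.uint8)
# contrast = glcm_contrast(composite)
# The value would then be thresholded as in step 3 above to decide 3D texture vs 2D print.
```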
If you can get another image with
a different angle, or
a sharper angle, or
a different lighting condition,
you may get a result. However, using two images taken from different angles with a calibrated camera gives you stereo vision, which would solve your problem more easily.
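To illustrate the stereo suggestion: once the camera pair is calibrated and the views rectified, a simple block matcher yields a disparity map whose local variation reflects surface height. A rough sketch, assuming already-rectified grayscale images and placeholder filenames:

```python
import cv2

# Rectified left/right views of the surface (placeholder filenames; the
# rectification itself needs calibration via cv2.stereoCalibrate/stereoRectify).
left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

# Simple block matcher; numDisparities must be a multiple of 16, blockSize odd.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)

# A raised 3D texture should show local disparity variation that a flat
# 2D print does not; inspect e.g. the disparity range inside the ROI.
print(disparity.min(), disparity.max())
```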
This is a pretty complex problem and there is no plug-and-play solution for it. Using light (structured or laser) or shadow to detect a height of 0.2 mm will almost surely not work with an acceptable degree of confidence, no matter how many photos you take. (This is just my personal intuition; in computer vision we verify whether something works by actually testing it.)
GLCM is a nice feature for describing texture, but as far as I know it is used to check whether there is a pattern in the texture, so I believe it would also give a positive response for 2D printed text if there is some kind of repeating pattern.
I would let the computer learn what is text and what is texture. Just extract a large amount of 3D and 2D data and use a machine-learning engine to learn which is which. If the feature space is rich enough, it may find a way to differentiate one from the other that our human minds would not. The feature space should consist of edge and colour features.
If the system environment is stable and controlled, this approach will work especially well, since the training data will be very similar to the testing data.
For this problem, I'd start by computing colour and edge features (local image pixel sums over different edge and colour channels) and try a boosted classifier. Boosted classifiers aren't the state of the art in machine learning, but they are good at not overfitting (meaning you can feed in as much data as you want) and will most likely work in a stable environment.
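A minimal sketch of that last suggestion using OpenCV's built-in boosted trees; the specific features (per-channel pixel sums plus a Sobel edge-magnitude sum) and the placeholder training files are my assumptions, not a prescribed recipe:

```python
import cv2
import numpy as np

def extract_features(img):
    """Simple colour + edge features: per-channel pixel sums and a Sobel edge-magnitude sum."""
    img = cv2.resize(img, (128, 128))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    edge_sum = float(np.sum(np.sqrt(gx * gx + gy * gy)))
    colour_sums = [float(np.sum(img[:, :, c])) for c in range(3)]
    return np.array(colour_sums + [edge_sum], dtype=np.float32)

# Placeholder training set: label 1 = 3D texture, 0 = 2D print.
train_paths = ["texture_01.png", "texture_02.png", "print_01.png", "print_02.png"]
train_labels = [1, 1, 0, 0]

samples = np.vstack([extract_features(cv2.imread(p)) for p in train_paths])
labels = np.array(train_labels, dtype=np.int32)

# Boosted decision trees from OpenCV's ml module.
boost = cv2.ml.Boost_create()
boost.train(samples, cv2.ml.ROW_SAMPLE, labels)

# Classify a new image (placeholder filename).
_, prediction = boost.predict(extract_features(cv2.imread("query.png")).reshape(1, -1))
print("predicted label:", prediction.ravel()[0])
```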
Hope this helps,
Good luck.
I'd like to check whether one feature (the lower part of the "RAI") appears at the top of the given images. Occasionally there may be some noise at the left or right of the images affecting the detection, such as vertical lines, or the image to be checked may be a totally different image without the feature (RAI).
What would be the best way to achieve my expectation?
I prefer to use OpenCV. Should I use ORB, FREAK, BRISK, or SURF for the detection? I'm not sure whether this is the right kind of case for them. I wonder if there are any other good choices, and I'd appreciate some help. Thanks in advance; this problem has bothered me for quite a long time.
We as humans can recognize these two images as the same image:
For a computer, it is easy to recognize these two images if they are the same size, so we would need a preprocessing step such as scaling before recognition; but if we look closely at the scaling process, we see that it is not an efficient approach.
Now, could you help me find a way to convert images into representations that do not depend on size or pixel location, to use as input for a recognition method?
Thanks in advance.
I have several ideas:
1. Threshold the image at several colour levels. This way you get large areas of the same colour. The shapes of those areas can be traced as curves, which is just math; do this for both the larger and the smaller image and check whether the curves match (see the sketch after this list).
2. Try to define key spots in the image. I don't know exactly how this works, but you can look up face-detection algorithms: such an algorithm contains a mathematical model of how a face should look. If you define enough objects in such an algorithm, you can detect multiple objects in the images and check whether they match in the same spots.
3. You could also check whether the Predator (TLD) algorithm accepts images of multiple sizes. If so, your problem is solved.
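For the first idea, OpenCV's matchShapes compares traced contour shapes via Hu moments, which are already invariant to scale (as well as translation and rotation), so the two sizes never have to be equalised. A rough sketch, assuming OpenCV 4.x and placeholder filenames:

```python
import cv2

def largest_contour(path):
    """Threshold the image and return its largest outer contour."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

a = largest_contour("small.png")
b = largest_contour("large.png")

# matchShapes compares Hu moments of the contours; smaller values mean
# more similar shapes, regardless of the original image sizes.
score = cv2.matchShapes(a, b, cv2.CONTOURS_MATCH_I1, 0.0)
print("shape distance:", score)
```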
It looks like you assume that the human brain recognizes images in a computationally efficient way, which is not really true; that algorithm is so complicated that we have not yet figured it out. A large part of your brain is devoted to processing visual data.
When it comes to software, there are some scale- (or affine-) invariant algorithms. One such approach is the LeNet-5 neural network.