Determine camera rotation between two 360x180 equirectangular panoramic images - opencv

I have n frames of 360x180 panoramic images. I'd like to determine the camera's rotation based on the comparison between two sequential images. For this project it's safe to assume that all features visible in the images are at infinity.
I am new (today) to OpenCV and definitely need to do more reading. I have an app that will find the KeyPoints using either SIFT or SURF, but am unsure of how to continue from here.
Thanks

To find the rotation between two images you need to know the orientation of both, and therefore the pose. To calculate the camera pose you need to find the homography transformation from keypoint matches.
Imagine you know the orientation and position of the first frame, because you choose them arbitrarily. You have the keypoints extracted by SIFT. From here, the next steps are:
1- Extract keypoints from the next frame.
2- Find matches of the keypoints on both frames.
3- Use RANSAC to find the best set of inliers among those matches for the next step.
4- Use DLT (Direct Linear Transform) with that set; it uses 4 matches to find the homography between the images.
5- Once you have the homography, you can extract the pose, and the rotation.
OpenCV has functions for all of these steps except extracting the pose from the homography.

Related

Homography when camera translation (for stitching)

I have a camera with which I take 2 captures. I want to reconstitute the 2 images into one image.
I only translate the camera and take images of a flat TV screen. I heard homography only works when the camera does a rotation.
What should I do when I only have a translation?
Because you are imaging a planar surface (in your case a TV screen), all images of it with a perspective camera will be related by homographies. This is the same if your camera is translating and/or rotating. Therefore to stitch different images of the surface, you don't need to do any 3D geometry processing (essential matrix computation/triangulation etc.).
To solve your problem you need to do the following:
You determine the homographies between your images. Because you only have two images you can select the first one as the 'source' and the second one as the 'target', and compute the homography from target to source. This is classically done with feature detection and robust homography fitting. Let's denote this homography by the 3x3 matrix H.
You warp your target image to your source using H. You can do this in OpenCV with the warpPerspective method.
Merge your source and warped target using a blending function.
An open source project for doing exactly these steps is here.
If your TV lacks distinct features or there is lots of background clutter, the method for estimating H might not be highly robust. If this is the case you could manually click four or more correspondences on the TV in the target and source images, and compute H using OpenCV's findHomography method. Note that your correspondences cannot be completely arbitrary. Specifically, there should not be three correspondences that are collinear (in which case H cannot be computed). They should also be clicked as accurately as possible because errors will affect the final stitch and cause ghosting artefacts.
An important caveat is if your camera has significant lens distortion. In this case your images will not be related by homographies. You can deal with this by performing a calibration of your camera using OpenCV, and then you need to pre-process your images to undo the lens distortion (using OpenCV's undistort method).

Expand homography matrix for distortion

I have two sets of corresponding matches between which I want to compute the homography matrix. However, I found that the transformation between these points cannot be modeled using just the homography matrix. I noticed this because some lines in the original set of points are not represented as lines in the second set.
For example:
The previous example is very extreme; in reality the distortion is much less than that. It usually appears because the first set of points was extracted from an image taken by a scanner, while the other set was extracted from a photo taken by a mobile phone.
The Question:
The Question: How can I expand or generalize the homography matrix to cover this case? In other words, I want a non-line-preserving transformation model to use instead of the homography matrix. Any suggestions?
P.S. The OpenCV library is preferred if there is something ready to use.
EDIT:
Eliminating the distortion may not be an option for me because the photos are somewhat complex and I do not always have the same camera; plus, I am supposed to deal with images from unknown sources (the back-end is separated from the front-end). However, I have a reference which is planar and a query which has a perspective + distortion effect, which I want to correct after I have found the corresponding pair matches.
It would be better if you had provided some examples of your images, so that we can understand your case better. From the description it seems that you are dealing with camera distortion.
Typical approach is to perform camera calibration once, then undistort each frame and finally work with images where straight lines look straight. All of these tasks are possible with OpenCV, consider the link above.
In case you cannot perform camera calibration to estimate the distortion, there isn't much you can do. Try to calculate and apply the homography on unrectified images; if the cameras don't have wide-angle lenses this should look OK (consider this case for example).

What are keypoints in image processing?

When using OpenCV for example, algorithms like SIFT or SURF are often used to detect keypoints. My question is what actually are these keypoints?
I understand that they are some kind of "points of interest" in an image. I also know that they are scale invariant and are circular.
Also, I found out that they have orientation, but I couldn't understand what this actually is. Is it an angle between the radius and something? Can you give some explanation? I think what I need first is something simpler, and after that it will be easier to understand the papers.
Let's tackle each point one by one:
My question is what actually are these keypoints?
Keypoints are the same thing as interest points. They are spatial locations, or points in the image, that define what is interesting or what stands out in the image. Interest point detection is actually a subset of blob detection, which aims to find interesting regions or spatial areas in an image. The reason keypoints are special is that no matter how the image changes... whether the image rotates, shrinks/expands, is translated (all of these would be an affine transformation by the way...) or is subject to distortion (i.e. a projective transformation or homography), you should be able to find the same keypoints in this modified image when comparing with the original image. Here's an example from a post I wrote a while ago:
Source: module' object has no attribute 'drawMatches' opencv python
The image on the right is a rotated version of the left image. I've also only displayed the top 10 matches between the two images. If you take a look at the top 10 matches, these are points that we probably would want to focus on that would allow us to remember what the image was about. We would want to focus on the face of the cameraman as well as the camera, the tripod and some of the interesting textures on the buildings in the background. You see that these same points were found between the two images and these were successfully matched.
Therefore, what you should take away from this is that these are points in the image that are interesting and that they should be found no matter how the image is distorted.
I understand that they are some kind of "points of interest" in an image. I also know that they are scale invariant and are circular.
You are correct. Scale invariant means that no matter how you scale the image, you should still be able to find those points.
Now we are going to venture into the descriptor part. What makes keypoints different between frameworks is the way you describe these keypoints. These are what are known as descriptors. Each keypoint that you detect has an associated descriptor that accompanies it. Some frameworks only do a keypoint detection, while other frameworks are simply a description framework and they don't detect the points. There are also some that do both - they detect and describe the keypoints. SIFT and SURF are examples of frameworks that both detect and describe the keypoints.
Descriptors are primarily concerned with both the scale and the orientation of the keypoint. We've nailed down the keypoint concept, but we need the descriptor part if our purpose is to match keypoints between different images. Now, what you mean by "circular"... that correlates with the scale the point was detected at. Take for example this image, taken from the VLFeat Toolbox tutorial:
You see that any points that are yellow are interest points, but some of these points have a different circle radius. These deal with scale. How interest points work in a general sense is that we decompose the image into multiple scales. We check for interest points at each scale, and we combine all of these interest points together to create the final output. The larger the "circle", the larger the scale was that the point was detected at. Also, there is a line that radiates from the centre of the circle to the edge. This is the orientation of the keypoint, which we will cover next.
Also, I found out that they have orientation but I couldn't understand what it actually is. Is it an angle between the radius and something?
Basically, when they talk about the orientation of keypoints, what they really mean is that they search a pixel neighbourhood surrounding the keypoint and figure out how this pixel neighbourhood is oriented, or what direction the patch is oriented in. It depends on which descriptor framework you look at, but the general gist is to detect the most dominant orientation of the gradient angles in the patch. This is important for matching so that you can match keypoints together. Take a look at the first figure I have with the two cameramen: one is rotated while the other isn't. If you take a look at some of those points, how do we figure out how one point matches with another? We can easily identify that the top of the cameraman as an interest point matches with the rotated version because we take a look at the points that surround the keypoint and see what orientation all of those points are in... and from there, that's how the orientation is computed.
Usually when we want to detect keypoints, we just take a look at the locations. However, if you want to match keypoints between images, then you definitely need the scale and the orientation to facilitate this.
I'm not as familiar with SURF, but I can tell you about SIFT, which SURF is based on. I provided a few notes about SURF at the end, but I don't know all the details.
SIFT aims to find highly-distinctive locations (or keypoints) in an image. The locations are not merely 2D locations on the image, but locations in the image's scale space, meaning they have three coordinates: x, y, and scale. The process for finding SIFT keypoints is:
blur and resample the image with different blur widths and sampling rates to create a scale-space
use the difference of gaussians method to detect blobs at different scales; the blob centers become our keypoints at a given x, y, and scale
assign every keypoint an orientation by calculating a histogram of gradient orientations for every pixel in its neighborhood and picking the orientation bin with the highest number of counts
assign every keypoint a 128-dimensional feature vector based on the gradient orientations of pixels in 16 local neighborhoods
Step 2 gives us scale invariance, step 3 gives us rotation invariance, and step 4 gives us a "fingerprint" of sorts that can be used to identify the keypoint. Together they can be used to match occurrences of the same feature at any orientation and scale in multiple images.
SURF aims to accomplish the same goals as SIFT but uses some clever tricks in order to increase speed.
For blob detection, it uses the determinant of Hessian method. The dominant orientation is found by examining the horizontal and vertical responses to Haar wavelets. The feature descriptor is similar to SIFT, looking at orientations of pixels in 16 local neighborhoods, but results in a 64-dimensional vector.
SURF features can be calculated up to 3 times faster than SIFT features, yet are just as robust in most situations.
For reference:
A good SIFT tutorial
An introduction to SURF

Whether the SIFT is rotation invariant feature or not opencv

I want to write a code in opencv that proves whether the SIFT is rotation invariant feature or not.
Assuming that the image has one keypoint at the center of the image, I want to calculate the keypoint descriptor (magnitude and direction). I want to ask: what is a keypoint? Is it a location in the image?
I searched for simple tutorial or code to know what to do but I didn't find something simple.
A keypoint is an interesting point in your image. These points are usually found when you have a change in intensity, for example, at the edges between two objects in the image. A keypoint encodes, among other things, the location of the point in the image. SIFT will then extract a local feature descriptor for your keypoint which you can then use for image matching.
Scale Invariant Feature Transform (SIFT) is scale invariant, as the acronym says. It is also designed to be rotation invariant: each keypoint is assigned a dominant orientation, and its descriptor is computed relative to that orientation. In practice the invariance is approximate, and the experiment below will show how well it holds up. SURF uses the same idea (via Haar-wavelet responses) and is faster to compute, but can still be problematic for real-time applications.
SIFT: http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
SURF: http://www.vision.ee.ethz.ch/~surf/papers.html
Example code: Trying to match two images using sift in OpenCv, but too many matches
To test your SIFT code out, you could create a black 512x512 image in OpenCV with three equally spaced white points along its width. Then rotate the image by small angles, measure the angle, and check the feature matches. As you do this, you may find that for large rotations some of the feature matches are thrown off.

position recognition of simple fiducial in image

I don't need a working solution but I'm looking for somebody who can push me into the right direction with some useful hints/links:
I have an image with a fiducial in it (it can be e.g. a cross, a dot, or some other simple geometry). The image source itself is lit in a way that a human would not like the resulting image, but the contrast for the fiducial is very good. Next, I have a clear geometric description of that fiducial (in a vector data format) and a nominal position for it.
Now I want OpenCV to find the fiducial in the image and return its real, current position (and rotation, for fiducials where this is possible).
How can this be done with OpenCV? The tutorials I found always use complex patterns like faces and pictures that are not optimised for the fiducial detection itself, therefore they all use very complicated learning/description methods.
Depending on your fiducial you can use different methods. A very common method, already implemented in OpenCV is SIFT, which finds scale invariant robust points in an image. The way to proceed is:
Run SIFT on your fiducial offline. This generates keypoints to be tracked.
Run SIFT in real time (or a faster detector such as FAST, computing SIFT descriptors at its corners) to find keypoints in the scene.
Use a matcher (a FLANN matcher, for example) to find which keypoints found in the image correspond to the fiducial.
Run findHomography() on the matched points. From the resulting 3x3 homography matrix H, you can obtain the camera pose.
There are more approaches; this is the one I like, and it is quite up-to-date and fast.
