Matlab Camera Calibration - Correct lens distortion - image-processing

In the Computer Vision System Toolbox for Matlab there are three interpolation methods that can be used when correcting lens distortion.
The documentation says: "Interpolation method for the function to use on the input image. The interp input interpolation method can be the string 'nearest', 'linear', or 'cubic'."
My question is: what is the difference between 'nearest', 'linear', and 'cubic', and which one is used in the methods of Zhang and of Heikkila and Silven?

I can't access the page at the link you wrote in your question (it asks for a username and password), so I assume your linked page has the same content as the page http://www.mathworks.it/it/help/vision/ref/undistortimage.html, which I quote here:
J = undistortImage(I,cameraParameters,interp) removes lens distortion from the input image, I and specifies the
interpolation method for the function to use on the input image.
Input Arguments
I — Input image
cameraParameters — Object for storing camera parameters
interp — Interpolation method
'linear' (default) | 'nearest' | 'cubic'
Interpolation method for the function to use on
the input image. The interp input interpolation method can be the
string, 'nearest', 'linear', or 'cubic'.
Furthermore, I assume you are referring to these papers:
ZHANG, Zhengyou. A flexible new technique for camera calibration. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2000, 22.11: 1330-1334.
HEIKKILA, Janne; SILVEN, Olli. A four-step camera calibration procedure with implicit image correction. In: Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on. IEEE, 1997. p. 1106-1112.
I have searched for the word "interpolation" in the two PDF documents (Zhang; Heikkila and Silven) and I did not find any direct statement about the interpolation method they used.
To my knowledge, in general, a camera calibration method is concerned with estimating the intrinsic, extrinsic and lens distortion parameters (all of these parameters are inside the input argument cameraParameters of Matlab's undistortImage function); the interpolation method is part of a different problem, i.e. the problem of "geometric image transformations".
I quote from OpenCV's page on Geometric Image Transformations (I have slightly modified the original, omitting some details and adding some definitions; I assume you are working with grey-level images):
The functions in this section perform various geometrical
transformations of 2D images. They do not change the image content but
deform the pixel grid and map this deformed grid to the destination
image. In fact, to avoid sampling artifacts, the mapping is done in
the reverse order, from destination to the source. That is, for each
pixel (x, y) of the destination image, the functions compute
coordinates of the corresponding “donor” pixel in the source image and
copy the pixel value:
dst(x,y) = src(f_x(x,y), f_y(x,y))
where
dst(x,y) is the grey value of the pixel located at row x and column y in the destination image
src(x,y) is the grey value of the pixel located at row x and column y in the source image
f_x is a function that maps the row x and the column y to a new row; it uses only the coordinates, not the grey level.
f_y is a function that maps the row x and the column y to a new column; it uses only the coordinates, not the grey level.
The actual implementations of the geometrical transformations, from
the most generic remap() and to the simplest and the fastest resize()
, need to solve two main problems with the above formula:
• Extrapolation of non-existing pixels. Similarly to the filtering
functions described in the previous section, for some (x,y) , either
one of f_x(x,y) , or f_y(x,y) , or both of them may fall outside of
the image. In this case, an extrapolation method needs to be used.
OpenCV provides the same selection of extrapolation methods as in the
filtering functions. In addition, it provides the method
BORDER_TRANSPARENT . This means that the corresponding pixels in the
destination image will not be modified at all.
• Interpolation of pixel
values. Usually f_x(x,y) and f_y(x,y) are floating-point numbers. This
means that <f_x, f_y> can be either an affine or
perspective transformation, or radial lens distortion correction, and
so on. So, a pixel value at fractional coordinates needs to be
retrieved. In the simplest case, the coordinates can be just rounded
to the nearest integer coordinates and the corresponding pixel can be
used. This is called a nearest-neighbor interpolation. However, a
better result can be achieved by using more sophisticated
interpolation methods, where a polynomial function is fit into some
neighborhood of the computed pixel (f_x(x,y), f_y(x,y)), and then the
value of the polynomial at (f_x(x,y), f_y(x,y)) is taken as the
interpolated pixel value. In OpenCV, you can choose between several
interpolation methods. See resize() for details.
For a "soft" introduction see also for example Cambridge in colour - DIGITAL IMAGE INTERPOLATION.
So let's say you need the grey level of the pixel at x=20.2, y=14.7. Since x and y are numbers with a nonzero fractional part, you need to "invent" (compute) the grey level in some way. In the simplest case ('nearest' interpolation) you just say that the grey level at (20.2, 14.7) is the grey level you retrieve at (20, 15); it is called "nearest" because 20 is the nearest integer to 20.2 and 15 is the nearest integer to 14.7.
In (bi)'linear' interpolation you compute the value at (20.2, 14.7) as a combination of the grey levels of the four pixels at (20,14), (20,15), (21,14), (21,15); for the details on how to compute the combination see the Wikipedia page, which has a numeric example.
The (bi)'cubic' interpolation considers a combination of sixteen pixels in order to compute the value at (20.2, 14.7); see the Wikipedia page.
I suggest you try all three methods with the same input image and compare the differences in the output images; a minimal sketch follows.
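If it helps, here is a minimal OpenCV/Python sketch of the same idea (your question is about Matlab's undistortImage, but the interpolation flag plays exactly the same role in OpenCV's remap): the undistortion maps are computed once and the image is remapped with each of the three methods. The camera matrix, distortion coefficients and file names below are made-up placeholders, not values from your calibration.

import cv2
import numpy as np

img = cv2.imread('distorted.png', cv2.IMREAD_GRAYSCALE)
h, w = img.shape

# Hypothetical calibration results (replace with your own camera parameters).
K = np.array([[800.0, 0.0, w / 2.0],
              [0.0, 800.0, h / 2.0],
              [0.0,   0.0,     1.0]])
dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])   # k1, k2, p1, p2, k3

# Build the undistortion maps once, then remap with different interpolations.
map_x, map_y = cv2.initUndistortRectifyMap(K, dist, None, K, (w, h), cv2.CV_32FC1)

for name, flag in [('nearest', cv2.INTER_NEAREST),
                   ('linear',  cv2.INTER_LINEAR),
                   ('cubic',   cv2.INTER_CUBIC)]:
    out = cv2.remap(img, map_x, map_y, interpolation=flag)
    cv2.imwrite('undistorted_%s.png' % name, out)

Comparing the three output images (especially along high-contrast edges) makes the differences between the methods visible.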

The interpolation method is actually independent of the camera calibration. Any time you apply a geometric transformation to an image, such as rotation, resizing, or distortion compensation, the pixels in the new image correspond to points that fall between the pixels of the old image, so you have to interpolate their values somehow.
'nearest' means you simply use the value of the nearest pixel.
'linear' means you use bilinear interpolation. The new pixel's value is a weighted sum of the values of the neighboring pixels in the input image, where closer pixels receive larger weights (a small sketch follows below).
'cubic' means you use a bi-cubic interpolation, which is more complicated than bi-linear, but may give you a smoother image.
A good description of these interpolation methods is given in the documentation for the interp2 function.
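As a complement, here is a minimal NumPy sketch of the bilinear weighted sum at a fractional coordinate (an illustration of the idea, not Matlab's or OpenCV's actual implementation); img is assumed to be a 2D grayscale array indexed as img[row, col].

import numpy as np

def bilinear(img, x, y):
    # Grey level at fractional (row, col) = (x, y) via bilinear interpolation.
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = x - x0, y - y0                      # fractional parts, in [0, 1)
    # Each neighbour is weighted by the area of the opposite sub-rectangle,
    # so closer neighbours get larger weights.
    return ((1 - fx) * (1 - fy) * img[x0, y0] +
            (1 - fx) * fy       * img[x0, y1] +
            fx       * (1 - fy) * img[x1, y0] +
            fx       * fy       * img[x1, y1])

img = np.arange(30 * 30, dtype=float).reshape(30, 30)
print(bilinear(img, 20.2, 14.7))   # the example coordinate used in the other answer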
And finally, just to clarify, the undistortImage function is in the Computer Vision System Toolbox.

Related

Map one camera's colour profile to another

I have two cameras (A & B) with which I've taken photos of a calibration scene, then corrected distortion and used feature mapping to get a pixel-precise registration, resulting in the following:
As you can see, the colour response is quite different. What I would like to do now is take a new photo with A and answer the question: what would it look like if instead I had used camera B?
Is there some existing technique or algorithm to convert between the colour spaces/profiles of two cameras like this?
From the image you provided it is not too hard to segment it into small squares. After that, take the mean (or, even better, the median) of each square in both images. Now you have 2*m*n values: MeansReference (m*n) and MeansQuery (m*n). Using a linear color correction matrix C,
you can construct this linear system:
MeansReference[i][j]= C * MeansQuery[i][j]
Where:
MeansReference[i][j] is a vector (3*1) of the color (R,G,B) of the square [i,j] in the Reference image.
MeansQuery[i][j] is a vector (3*1) of the color (R,G,B) of the square [i,j] in the Query image.
C is the 3*3 Matrix (a11,a12,... ,a33)
Now, for each (i, j) you get 3 linear equations (one each for R, G, B). Since there are 9 unknowns (a11 ... a33) you need at least 9 equations, which means at least 3 squares (each square provides 3 equations). However, the more equations you construct, the more accurate the result.
How do you solve a linear system with more equations than unknowns? Use batch least-squares estimation (batch LSE). You can find the details in the book Neuro-Fuzzy and Soft Computing by Jang, Sun and Mizutani, or in many online sources.
After you find the 9 unknowns you have a color correction matrix. Just apply it to any image from the new camera and you will get an image that looks like it was taken by the old camera. If you want the opposite, apply C^-1 instead. A least-squares sketch is given below.
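If you prefer a ready-made recipe, here is a rough NumPy sketch of the same fit done with an ordinary least-squares solver instead of the batch-LSE formulation (the solution is the same); means_query and means_ref are assumed to be the per-square mean colors stacked as N x 3 arrays of (R, G, B) values, and the example data are synthetic.

import numpy as np

def fit_color_correction(means_query, means_ref):
    # Solve means_ref[i] ~= C @ means_query[i] for the 3x3 matrix C in the
    # least-squares sense: lstsq solves means_query @ X ~= means_ref, so C = X.T
    X, _, _, _ = np.linalg.lstsq(means_query, means_ref, rcond=None)
    return X.T

# Synthetic example: 24 squares (e.g. a color checker), colors in [0, 255].
means_query = np.random.rand(24, 3) * 255
C_true = np.array([[1.10, 0.05, 0.00],
                   [0.02, 0.90, 0.03],
                   [0.00, 0.04, 1.20]])
means_ref = means_query @ C_true.T

C = fit_color_correction(means_query, means_ref)

# Apply C to every pixel of a new image from camera A (H x W x 3 array).
img = np.random.rand(4, 4, 3) * 255
corrected = (img.reshape(-1, 3) @ C.T).reshape(img.shape).clip(0, 255)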
Good Luck!

direction on image pattern description and representation

I have a basic question regarding pattern learning, or pattern representation. Assume I have a complex pattern of this form; could you please provide me with some research directions or concepts that I can follow to learn how to represent (mathematically describe) these forms of patterns? In general the pattern does not have a closed contour, nor can it be represented with analytical objects like boxes, circles, etc.
By "mathematically describe" I'm assuming you mean deriving from the image a vector of values that represents the content of the image. In computer vision/image processing we call this an "image descriptor".
There are several image descriptors that could be applied to pixel-based data of the form you showed, which appears to be one value per pixel, i.e. greyscale images.
One approach is to perform "spatial gridding", where you divide the image up into a regular grid of a constant size, e.g. a 4x4 grid. You then average the pixel values within each cell of the grid and concatenate these values to form a 16-element vector; this coarsely describes the pixel distribution of the image (a small sketch follows).
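As a rough illustration, a spatial-gridding descriptor can be computed in a few lines of NumPy (this is only a sketch; the grid size of 4 is an arbitrary choice and the image is assumed to be a 2D grayscale array):

import numpy as np

def grid_descriptor(img, grid=4):
    # Average the pixel values in each cell of a grid x grid layout and
    # concatenate them into a (grid*grid,) vector.
    h, w = img.shape
    img = img[:h - h % grid, :w - w % grid]          # crop to a multiple of the grid
    cells = img.reshape(grid, img.shape[0] // grid,
                        grid, img.shape[1] // grid)
    return cells.mean(axis=(1, 3)).ravel()           # 16 values for grid=4

img = np.random.rand(128, 96)                        # placeholder image
print(grid_descriptor(img).shape)                    # (16,)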
Another approach would be to use "image moments", which are 2D statistical moments. The central moment of order (i, j) is
mu_ij = sum over x = 1..W and y = 1..H of (x - mu_x)^i * (y - mu_y)^j * f(x,y)
where f(x,y) is the pixel value at coordinates (x,y), W and H are the image width and height, and mu_x and mu_y are the average x and y. The values i and j select the order of moment you want to compute. Various orders of moment can be combined in different ways; for example, the seven "Hu moments" are particular combinations of (normalized central) image moments.
The cool thing about the Hu moments is that you can translate, scale, rotate or flip the image and you still get the same 7 values, which makes this a robust (translation-, scale- and rotation-invariant) image descriptor.
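In OpenCV, for instance, the Hu moments are available directly; here is a short Python sketch (the file name is a placeholder, and the log transform at the end is just the usual trick to bring the values to a comparable range):

import cv2
import numpy as np

img = cv2.imread('pattern.png', cv2.IMREAD_GRAYSCALE)
m = cv2.moments(img)               # raw, central and normalized central moments
hu = cv2.HuMoments(m).ravel()      # the 7 Hu invariants

hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu_log)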
Hope this helps as a general direction to read more in.

Documentation of CvStereoBMState for disparity calculation with cv::StereoBM

The application of Konolige's block matching algorithm is not sufficiently explained in the OpenCV documentation. The parameters of CvStereoBMState influence the accuracy of the disparities calculated by cv::StereoBM. However, those parameters are not documented. I will list those parameters below and describe what I understand. Maybe someone can add a description of the parameters which are unclear.
preFilterType: Determines, which filter is applied on the image before the disparities are calculated. Can be CV_STEREO_BM_XSOBEL (Sobel filter) or CV_STEREO_BM_NORMALIZED_RESPONSE (maybe differences to mean intensity???)
preFilterSize: Window size of the prefilter (width = height of the window, negative value)
preFilterCap: Clips the output to [-preFilterCap, preFilterCap]. What happens to the values outside the interval?
SADWindowSize: Size of the compared windows in the left and in the right image, where the sums of absolute differences are calculated to find corresponding pixels.
minDisparity: The smallest disparity, which is taken into account. Default is zero, should be set to a negative value, if negative disparities are possible (depends on the angle between the cameras views and the distance of the measured object to the cameras).
numberOfDisparities: The disparity search range [minDisparity, minDisparity+numberOfDisparities].
textureThreshold: Calculate the disparity only at locations, where the texture is larger than (or at least equal to?) this threshold. How is texture defined??? Variance in the surrounding window???
uniquenessRatio: Cited from calib3d.hpp: "accept the computed disparity d* only if SAD(d) >= SAD(d*)*(1 + uniquenessRatio/100.) for any d != d*+/-1 within the search range."
speckleRange: Unsure.
trySmallerWindows: ???
roi1, roi2: Calculate the disparities only in these regions??? Unsure.
speckleWindowSize: Unsure.
disp12MaxDiff: Unsure, but a comment in calib3d.hpp says, that a left-right check is performed. Guess: Pixels are matched from the left image to the right image and from the right image back to the left image. The disparities are only valid, if the distance between the original left pixel and the back-matched pixel is smaller than disp12MaxDiff.
speckleWindowSize and speckleRange are parameters for the function cv::filterSpeckles. Take a look at OpenCV's documentation.
cv::filterSpeckles is used to post-process the disparity map. It replaces blobs of similar disparities (the difference of two adjacent values does not exceed speckleRange) whose size is less or equal speckleWindowSize (the number of pixels forming the blob) by the invalid disparity value (either short -16 or float -1.f).
The parameters are better described in the Python tutorial on depth map from stereo images. The parameters seem to be the same.
texture_threshold: filters out areas that don't have enough texture
for reliable matching
Speckle range and size: Block-based matchers
often produce "speckles" near the boundaries of objects, where the
matching window catches the foreground on one side and the background
on the other. In this scene it appears that the matcher is also
finding small spurious matches in the projected texture on the table.
To get rid of these artifacts we post-process the disparity image with
a speckle filter controlled by the speckle_size and speckle_range
parameters. speckle_size is the number of pixels below which a
disparity blob is dismissed as "speckle." speckle_range controls how
close in value disparities must be to be considered part of the same
blob.
Number of disparities: How many pixels to slide the window over.
The larger it is, the larger the range of visible depths, but more
computation is required.
min_disparity: the offset from the x-position
of the left pixel at which to begin searching.
uniqueness_ratio:
Another post-filtering step. If the best matching disparity is not
sufficiently better than every other disparity in the search range,
the pixel is filtered out. You can try tweaking this if
texture_threshold and the speckle filtering are still letting through
spurious matches.
prefilter_size and prefilter_cap: The pre-filtering
phase, which normalizes image brightness and enhances texture in
preparation for block matching. Normally you should not need to adjust
these.
Also check out this ROS tutorial on choosing stereo parameters.
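For reference, here is a hedged OpenCV/Python sketch that sets the parameters discussed above on a StereoBM matcher; the numeric values are only plausible starting points (not recommendations), and left/right are assumed to be rectified grayscale images.

import cv2

left = cv2.imread('left_rect.png', cv2.IMREAD_GRAYSCALE)
right = cv2.imread('right_rect.png', cv2.IMREAD_GRAYSCALE)

bm = cv2.StereoBM_create(numDisparities=64, blockSize=15)   # blockSize corresponds to SADWindowSize
bm.setPreFilterType(cv2.StereoBM_PREFILTER_XSOBEL)
bm.setPreFilterSize(9)
bm.setPreFilterCap(31)
bm.setMinDisparity(0)
bm.setTextureThreshold(10)
bm.setUniquenessRatio(15)
bm.setSpeckleWindowSize(100)   # blobs smaller than this are invalidated ...
bm.setSpeckleRange(32)         # ... if their internal disparity variation stays within this range
bm.setDisp12MaxDiff(1)         # left-right consistency check

disparity = bm.compute(left, right)   # fixed-point disparities, scaled by 16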

How to apply box filter on integral image? (SURF)

Assume that I have a grayscale (8-bit) image and an integral image created from that same image.
The image resolution is 720x576. According to the SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The first octave uses filters of 9x9, 15x15, 21x21 and 27x27 pixels. The second octave uses filters of 15x15, 27x27, 39x39 and 51x51 pixels. The third octave uses filters of 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large, and I guess 720x576 is big enough (right??!!), a fourth octave is added: 51x51, 99x99, 147x147 and 195x195. These octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are: What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) do I apply the filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = Dxx*Dyy - (0.9*Dxy)^2 and
det(HessianApprox) = Dxx*Dyy - 0.81*Dxy^2
Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second order derivatives.)
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common Image Analysis approach for regions-of-interest detection that is called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being to mimic what happens in the visual cortex of the human brain.
The drawback to difference of Gaussians (DoG) is the computation time that is too expensive to be applied to large image areas.
In order to bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (i.e., two Gaussian blurs with different sigmas) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average of all the image values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and on its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-rightmost pixel of the integral image thus holds the sum of all the original pixels.
Then, you just need to notice that the box blur is proportional to the sum of all the pixels inside a given rectangle (not necessarily originating at the top-leftmost pixel of the image) and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C and D are shortcuts for IntegralImagePixelAt(A) (respectively B, C, D).
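Here is a minimal NumPy sketch of this, with an extra leading row/column of zeros so the formula needs no special cases at the image border (the random image is just a stand-in):

import numpy as np

img = np.random.randint(0, 256, size=(576, 720)).astype(np.int64)

# ii[r, c] = sum of img[:r, :c] (exclusive), hence the zero padding.
ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, height, width):
    # Sum of img[top:top+height, left:left+width] using 4 lookups.
    A = ii[top, left]                       # top-left corner
    B = ii[top, left + width]               # top-right corner
    C = ii[top + height, left]              # bottom-left corner
    D = ii[top + height, left + width]      # bottom-right corner
    return A + D - B - C

# e.g. the box-blur value (mean) over a 9x9 window centred at row 100, column 200:
print(box_sum(ii, 100 - 4, 200 - 4, 9, 9) / 81.0)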
Integral images in SURF
SURF is not using box blurs of sizes 9x9, etc. directly.
What it uses instead is several orders of Gaussian derivatives, or Haar-like features.
Let's take an example. Suppose you are to compute the 9x9 filters output. This corresponds to a given sigma, hence a fixed scale/octave.
The sigma being fixed, you center your 9x9 window on the pixel of interest. Then, you compute the output of the 2nd order Gaussian derivative in each direction (horizontal, vertical, diagonal). Fig. 1 in the paper gives you an illustration of the vertical and diagonal filters.
The Hessian determinant
There is a factor that takes the scale differences into account. Following the paper, the determinant is approximated as:
Det = Dxx*Dyy - (0.9*Dxy)^2
Since 0.9^2 = 0.81, this is the same as Det = Dxx*Dyy - 0.81*Dxy^2, so the two expressions you found are one and the same approximation.
Look at page 17 of this document
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you have written code for a normal 2D Gaussian convolution, you can simply use the box filter as the kernel and convolve it with the original image (not the integral image); the result will be the same as with the integral-image approach, which is just a faster way to evaluate box filters.

SIFT matches and recognition?

I am developing an application where I am using SIFT + RANSAC and a homography to find an object (OpenCV, C++/Java). The problem I am facing is that when there are many outliers RANSAC performs poorly.
For this reason I would like to try what the author of SIFT said works pretty well: voting.
I have read that we should vote in a 4-dimensional feature space, where the 4 dimensions are:
Location [x, y] (some call it translation)
Scale
Orientation
While with OpenCV it is easy to get a match's scale and orientation with:
cv::KeyPoint::octave
cv::KeyPoint::angle
I am having a hard time understanding how I can calculate the location.
I have found an interesting slide where with only one match we are able to draw a bounding box:
But I don't get how I could draw that bounding box with just one match. Any help?
You are looking for the largest set of matched features that fit a geometric transformation from image 1 to image 2. In this case, it is the similarity transformation, which has 4 parameters: translation (dx, dy), scale change ds, and rotation d_theta.
Let's say you have matched two features: f1 from image 1 and f2 from image 2. Let (x1,y1) be the location of f1 in image 1, let s1 be its scale, and let theta1 be its orientation. Similarly you have (x2,y2), s2, and theta2 for f2.
The translation between two features is (dx,dy) = (x2-x1, y2-y1).
The scale change between two features is ds = s2 / s1.
The rotation between two features is d_theta = theta2 - theta1.
So, dx, dy, ds, and d_theta are the dimensions of your Hough space. Each bin corresponds to a similarity transformation.
Once you have performed Hough voting, and found the maximum bin, that bin gives you a transformation from image 1 to image 2. One thing you can do is take the bounding box of image 1 and transform it using that transformation: apply the corresponding translation, rotation and scaling to the corners of the image. Typically, you pack the parameters into a transformation matrix, and use homogeneous coordinates. This will give you the bounding box in image 2 corresponding to the object you've detected.
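A rough NumPy sketch of this 4D voting, assuming you already have matched keypoint lists kp1/kp2 and DMatch objects from OpenCV (the bin sizes are arbitrary choices, not values from the SIFT paper):

import numpy as np
from collections import Counter

def similarity_votes(kp1, kp2, matches, pos_bin=30.0, scale_bin=0.5, angle_bin=30.0):
    votes = Counter()
    for m in matches:                            # cv2.DMatch objects
        p1, p2 = kp1[m.queryIdx], kp2[m.trainIdx]
        dx = p2.pt[0] - p1.pt[0]                 # translation
        dy = p2.pt[1] - p1.pt[1]
        ds = p2.size / p1.size                   # scale change
        dth = (p2.angle - p1.angle) % 360.0      # rotation, in degrees
        bin_id = (int(round(dx / pos_bin)),
                  int(round(dy / pos_bin)),
                  int(round(np.log2(ds) / scale_bin)),   # scale is binned logarithmically
                  int(round(dth / angle_bin)))
        votes[bin_id] += 1
    return votes

# best_bin, count = similarity_votes(kp1, kp2, matches).most_common(1)[0]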
When using the Hough transform, you create a signature storing the displacement vectors of every feature from the template centroid (either (w/2,h/2) or with the help of central moments).
E.g. for 10 SIFT features found on the template, their relative positions according to template's centroid is a vector<{a,b}>. Now, let's search for this object in a query image: every SIFT feature found in the query image, matched with one of template's 10, casts a vote to its corresponding centroid.
votemap(feature.x - a, feature.y - b) += 1, where (a, b) is the displacement vector stored for this particular feature.
If several of those features successfully cast votes at (approximately) the same point (clustering is essential), you have found an object instance.
Signature and voting are reverse procedures. Let's assume a stored displacement V = (-20, -10). During the search in the novel image, when a match is found, we read its scale and orientation relative to the template feature and cast the vote accordingly: the stored displacement is scaled by the relative size and rotated by the relative orientation. E.g. if the query feature is at half the size and rotated by -10 degrees, the predicted centroid lies at the feature position plus 0.5 * R(-10°) * V, where R(.) is the 2D rotation matrix.
To complete Dima's answer, one needs to add that the 4D Hough space is quantized into a (possibly small) number of 4D boxes, where each box corresponds to the similarity transformation given by its center.
Then, for each possible similarity obtained via a tentative matching of features, add 1 to the corresponding box (or cell) in the 4D space. The output similarity is given by the cell with the most votes.
In order to compute the transform from one match, just use Dima's formulas from his answer. For several pairs of matches, you may need a least-squares fit.
Finally, the transform can be applied with the function cv::warpPerspective(), where the third line of the perspective matrix is set to [0,0,1].
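For example, here is a small Python sketch of packing (dx, dy, ds, d_theta) into such a 3x3 matrix and using it to warp image 1 and its corners (the numeric parameters are placeholders):

import cv2
import numpy as np

def similarity_matrix(dx, dy, ds, d_theta_deg):
    t = np.deg2rad(d_theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[ds * c, -ds * s, dx],
                     [ds * s,  ds * c, dy],
                     [0.0,     0.0,    1.0]])   # third row [0, 0, 1]

img1 = cv2.imread('image1.png')
H = similarity_matrix(dx=40.0, dy=-12.0, ds=1.3, d_theta_deg=15.0)
warped = cv2.warpPerspective(img1, H, (img1.shape[1], img1.shape[0]))

# Bounding box in image 2: transform the corners of image 1.
h, w = img1.shape[:2]
corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], dtype=float).T
proj = H @ corners
box_in_image2 = (proj[:2] / proj[2]).T          # the 4 corner points in image 2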
