Bilinear Interpolation from wikipedia - opencv

I've tried reading the bilinear interpolation on the Wikipedia page https://en.wikipedia.org/wiki/Bilinear_interpolation and I have implemented one of the algorithms and I wanted to know if I'm doing it right or not.
This is the algorithm I implemented from the page:
This what I tried to implement in code:
for(int i = 0; i < w; i++) {
for( int j = 0; j < h; j++) {
new_img(i,j,0) = old_img(i,j,0)*(1-i)(1-j) + old_img(i+1,j,0)*i*(1-j) + old_img(i,j+1)*(1-i)*j + old_img(i+1,j+1,0)*i*j;
}
}
Is that how to implement it?

The bilinear interpolation deals with upsampling an imgae. It try to estimate the value of an array (e.g. an image) between the known values. For example, if i have an image and I want to estimate its value in the location (10.5, 24.5), the value will be a weighted average of the values in the four neighbors: (10,24), (10,25), (11,24), (11,25) which are known. The formula is the equation you posted in your question.
This is the python code:
scale_factor = 0.7
shape = np.array(image.shape[:2])
new_shape = np.ceil(shape * scale_factor).astype(int)
grid = np.meshgrid(np.linspace(0, 1, new_shape[1]), np.linspace(0, 1, new_shape[0]))
grid_mapping = [i * s for i, s in zip(grid, shape[::-1])]
resized_image = np.zeros(tuple(new_shape))
for x_new_im in range(new_shape[1]):
for y_new_im in range(new_shape[0]):
mapping = [grid_mapping[i][y_new_im, x_new_im] for i in range(2)]
x_1, y_1 = np.floor(mapping).astype(int)
try:
resized_image[y_new_im, x_new_im] = do_interpolation(f=image[y_1: y_1 + 2, x_1: x_1 + 2], x=mapping[0] - x_1, y=mapping[1] - y_1)
except ValueError:
pass
def do_interpolation(f, x, y):
return np.array([1 - x, x]).dot(f).dot([[1 - y], [y]])
Explanation:
new_shape = np.ceil(shape * scale_factor).astype(int) - calculate the new image shape after rescale.
grid = np.meshgrid(np.linspace(0, 1, new_shape[1]), np.linspace(0, 1, new_shape[0]))
grid_mapping = [i * s for i, s in zip(grid, shape[::-1])] - generate a x,y grid, when x goes from 0 to the width of the resized image and y goes from 0 to the hight of the new image. Now, we have the inverse mapping from the new image to the original one.
In the for loop we look at every pixel in the resized image and where is its source in the original image. The source will not be an integer but float (a fraction).
Now we take the four pixels in the original image that are around the mapped x and y. For example, if the mapped x,y is in (17.3, 25.7) we will take the four values of the original image in the pixels: (17, 25), (17, 26), (18, 25), (18, 26). Then we will apply the equation you bring.
note:
I add a try-except because I did not want to deal with boundaries, but you can edit the code to do it.

Related

Obtain sigma of gaussian blur between two images

Suppose I have an image A, I applied Gaussian Blur on it with Sigam=3 So I got another Image B. Is there a way to know the applied sigma if A,B is given?
Further clarification:
Image A:
Image B:
I want to write a function that take A,B and return Sigma:
double get_sigma(cv::Mat const& A,cv::Mat const& B);
Any suggestions?
EDIT1: The suggested approach doesn't work in practice in its original form(i.e. using only 9 equations for a 3 x 3 kernel), and I realized this later. See EDIT1 below for an explanation and EDIT2 for a method that works.
EDIT2: As suggested by Humam, I used the Least Squares Estimate (LSE) to find the coefficients.
I think you can estimate the filter kernel by solving a linear system of equations in this case. A linear filter weighs the pixels in a window by its coefficients, then take their sum and assign this value to the center pixel of the window in the result image. So, for a 3 x 3 filter like
the resulting pixel value in the filtered image
result_pix_value = h11 * a(y, x) + h12 * a(y, x+1) + h13 * a(y, x+2) +
h21 * a(y+1, x) + h22 * a(y+1, x+1) + h23 * a(y+1, x+2) +
h31 * a(y+2, x) + h32 * a(y+2, x+1) + h33 * a(y+2, x+2)
where a's are the pixel values within the window in the original image. Here, for the 3 x 3 filter you have 9 unknowns, so you need 9 equations. You can obtain those 9 equations using 9 pixels in the resulting image. Then you can form an Ax = b system and solve for x to obtain the filter coefficients. With the coefficients available, I think you can find the sigma.
In the following example I'm using non-overlapping windows as shown to obtain the equations.
You don't have to know the size of the filter. If you use a larger size, the coefficients that are not relevant will be close to zero.
Your result image size is different than the input image, so i didn't use that image for following calculation. I use your input image and apply my own filter.
I tested this in Octave. You can quickly run it if you have Octave/Matlab. For Octave, you need to load the image package.
I'm using the following kernel to blur the image:
h =
0.10963 0.11184 0.10963
0.11184 0.11410 0.11184
0.10963 0.11184 0.10963
When I estimate it using a window size 5, I get the following. As I said, the coefficients that are not relevant are close to zero.
g =
9.5787e-015 -3.1508e-014 1.2974e-015 -3.4897e-015 1.2739e-014
-3.7248e-014 1.0963e-001 1.1184e-001 1.0963e-001 1.8418e-015
4.1825e-014 1.1184e-001 1.1410e-001 1.1184e-001 -7.3554e-014
-2.4861e-014 1.0963e-001 1.1184e-001 1.0963e-001 9.7664e-014
1.3692e-014 4.6182e-016 -2.9215e-014 3.1305e-014 -4.4875e-014
EDIT1:
First of all, my apologies.
This approach doesn't really work in the practice. I've used the filt = conv2(a, h, 'same'); in the code. The resulting image data type in this case is double, whereas in the actual image the data type is usually uint8, so there's loss of information, which we can think of as noise. I simulated this with the minor modification filt = floor(conv2(a, h, 'same'));, and then I don't get the expected results.
The sampling approach is not ideal, because it's possible that it results in a degenerated system. Better approach is to use random sampling, avoiding the borders and making sure the entries in the b vector are unique. In the ideal case, as in my code, we are making sure the system Ax = b has a unique solution this way.
One approach would be to reformulate this as Mv = 0 system and try to minimize the squared norm of Mv under the constraint squared-norm v = 1, which we can solve using SVD. I could be wrong here, and I haven't tried this.
Another approach is to use the symmetry of the Gaussian kernel. Then a 3x3 kernel will have only 3 unknowns instead of 9. I think, this way we impose additional constraints on v of the above paragraph.
I'll try these out and post the results, even if I don't get the expected results.
EDIT2:
Using the LSE, we can find the filter coefficients as pinv(A'A)A'b. For completion, I'm adding a simple (and slow) LSE code.
Initial Octave Code:
clear all
im = double(imread('I2vxD.png'));
k = 5;
r = floor(k/2);
a = im(:, :, 1); % take the red channel
h = fspecial('gaussian', [3 3], 5); % filter with a 3x3 gaussian
filt = conv2(a, h, 'same');
% use non-overlapping windows to for the Ax = b syatem
% NOTE: boundry error checking isn't performed in the code below
s = floor(size(a)/2);
y = s(1);
x = s(2);
w = k*k;
y1 = s(1)-floor(w/2) + r;
y2 = s(1)+floor(w/2);
x1 = s(2)-floor(w/2) + r;
x2 = s(2)+floor(w/2);
b = [];
A = [];
for y = y1:k:y2
for x = x1:k:x2
b = [b; filt(y, x)];
f = a(y-r:y+r, x-r:x+r);
A = [A; f(:)'];
end
end
% estimated filter kernel
g = reshape(A\b, k, k)
LSE method:
clear all
im = double(imread('I2vxD.png'));
k = 5;
r = floor(k/2);
a = im(:, :, 1); % take the red channel
h = fspecial('gaussian', [3 3], 5); % filter with a 3x3 gaussian
filt = floor(conv2(a, h, 'same'));
s = size(a);
y1 = r+2; y2 = s(1)-r-2;
x1 = r+2; x2 = s(2)-r-2;
b = [];
A = [];
for y = y1:2:y2
for x = x1:2:x2
b = [b; filt(y, x)];
f = a(y-r:y+r, x-r:x+r);
f = f(:)';
A = [A; f];
end
end
g = reshape(A\b, k, k) % A\b returns the least squares solution
%g = reshape(pinv(A'*A)*A'*b, k, k)

How to find number from image in OCR?

I'm trying to get the number contours from an image.
Original image is in number_img:
After I've used the following code:
gray = cv2.cvtColor(number_img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (1, 1), 0)
ret, thresh = cv2.threshold(blur, 70, 255, cv2.THRESH_BINARY_INV)
img2, contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
area = cv2.contourArea(c)
[x, y, w, h] = cv2.boundingRect(c)
if (area > 50 and area < 1000):
[x, y, w, h] = cv2.boundingRect(c)
cv2.rectangle(number_img, (x, y), (x + w, y + h), (0, 0, 255), 2)
Since there are small boxes in between, I tried to limit with height:
if (area > 50 and area < 1000) and h > 50:
[x, y, w, h] = cv2.boundingRect(c)
cv2.rectangle(number_img, (x, y), (x + w, y + h), (0, 0, 255), 2)
What other ways should I do to get the best contours of number to do OCR?
Thanks.
Just tried in Matlab. Hopefully you can adapt the code to OpenCV and tweak some parameters. It is not clear the right most blob is a number or not.
img1 = imread('DSYEW.png');
% first we can convert the grayscale image you provided to a binary
% (logical) image. It is always the best option in image preprocessing.
% Here I used the threshold .28 based on your image. But you may change it
% for a general solution.
img = im2bw(img1,.28);
% Then we can use the Matlab 'regionprops' command to identify the
% individual blobs in binary image. 'regionprops' gives us an output, the
% Area of the each blob.
s = regionprops(imcomplement(img));
% Now as you did, we can filter out the bounding boxes with an area
% threshold. I used 350 originally. But it can be changed for a better
% output.
s([s.Area] < 350) = [];
% Now we draw each bounding box on the image.
figure; imshow(img);
for k = 1 : length(s)
bb = s(k).BoundingBox;
rectangle('Position', [bb(1),bb(2),bb(3),bb(4)],...
'EdgeColor','r','LineWidth',2 )
end
Output image:
Update 1:
Just changed the area parameter in the above code as follows. Unfortunately I don't have Python OpenCV in my Mac. But, it is all about tweaking the parameters in you code.
s([s.Area] < 373) = [];
Output image:
Update 2:
Numbers 3 and 4 in the above figure were detected as one digit. If you look at carefully you are see that 3 and 4 are connected with each other, and that is why above code detected it as a single digit. So I used the imdilate function to get rid of that. Next, in your code, even the white holes inside some digits were detected as digits. To eliminate that we can fill the holes using imfill in Matlab.
Updated code:
img1 = imread('TCXeuO9.png');
img = im2bw(img1,.28);
img = imcomplement(img);
img = imfill(img,'holes');
img = imcomplement(img);
se = strel('line',2,90);
img = imdilate(img, se);
s = regionprops(imcomplement(img));
s([s.Area] < 330) = [];
figure; imshow(img);
for k = 1 : length(s)
bb = s(k).BoundingBox;
rectangle('Position', [bb(1),bb(2),bb(3),bb(4)],...
'EdgeColor','r','LineWidth',2 )
end
Output image:

OpenCV : wrapPerspective on whole image

I'm detecting markers on images captured by my iPad. Because of that I want to calculate translations and rotations between them, I want to change change perspective on images these image, so it would look like I'm capturing them directly above markers.
Right now I'm using
points2D.push_back(cv::Point2f(0, 0));
points2D.push_back(cv::Point2f(50, 0));
points2D.push_back(cv::Point2f(50, 50));
points2D.push_back(cv::Point2f(0, 50));
Mat perspectiveMat = cv::getPerspectiveTransform(points2D, imagePoints);
cv::warpPerspective(*_image, *_undistortedImage, M, cv::Size(_image->cols, _image->rows));
Which gives my these results (look at the right-bottom corner for result of warpPerspective):
As you probably see result image contains recognized marker in left-top corner of the result image. My problem is that I want to capture whole image (without cropping) so I could detect other markers on that image later.
How can I do that? Maybe I should use rotation/translation vectors from solvePnP function?
EDIT:
Unfortunatelly changing size of warped image don't help much, because image is still translated so left-top corner of marker is in top-left corner of image.
For example when I've doubled size using:
cv::warpPerspective(*_image, *_undistortedImage, M, cv::Size(2*_image->cols, 2*_image->rows));
I've recieved these images:
Your code doesn't seem to be complete, so it is difficult to say what the problem is.
In any case the warped image might have completely different dimensions compared to the input image so you will have to adjust the size paramter you are using for warpPerspective.
For example try to double the size:
cv::warpPerspective(*_image, *_undistortedImage, M, 2*cv::Size(_image->cols, _image->rows));
Edit:
To make sure the whole image is inside this image, all corners of your original image must be warped to be inside the resulting image. So simply calculate the warped destination for each of the corner points and adjust the destination points accordingly.
To make it more clear some sample code:
// calculate transformation
cv::Matx33f M = cv::getPerspectiveTransform(points2D, imagePoints);
// calculate warped position of all corners
cv::Point3f a = M.inv() * cv::Point3f(0, 0, 1);
a = a * (1.0/a.z);
cv::Point3f b = M.inv() * cv::Point3f(0, _image->rows, 1);
b = b * (1.0/b.z);
cv::Point3f c = M.inv() * cv::Point3f(_image->cols, _image->rows, 1);
c = c * (1.0/c.z);
cv::Point3f d = M.inv() * cv::Point3f(_image->cols, 0, 1);
d = d * (1.0/d.z);
// to make sure all corners are in the image, every position must be > (0, 0)
float x = ceil(abs(min(min(a.x, b.x), min(c.x, d.x))));
float y = ceil(abs(min(min(a.y, b.y), min(c.y, d.y))));
// and also < (width, height)
float width = ceil(abs(max(max(a.x, b.x), max(c.x, d.x)))) + x;
float height = ceil(abs(max(max(a.y, b.y), max(c.y, d.y)))) + y;
// adjust target points accordingly
for (int i=0; i<4; i++) {
points2D[i] += cv::Point2f(x,y);
}
// recalculate transformation
M = cv::getPerspectiveTransform(points2D, imagePoints);
// get result
cv::Mat result;
cv::warpPerspective(*_image, result, M, cv::Size(width, height), cv::WARP_INVERSE_MAP);
I implemented littleimp's answer in python in case anyone needs it. It should be noted that this will not work properly if the vanishing points of the polygons are falling within the image.
import cv2
import numpy as np
from PIL import Image, ImageDraw
import math
def get_transformed_image(src, dst, img):
# calculate the tranformation
mat = cv2.getPerspectiveTransform(src.astype("float32"), dst.astype("float32"))
# new source: image corners
corners = np.array([
[0, img.size[0]],
[0, 0],
[img.size[1], 0],
[img.size[1], img.size[0]]
])
# Transform the corners of the image
corners_tranformed = cv2.perspectiveTransform(
np.array([corners.astype("float32")]), mat)
# These tranformed corners seems completely wrong/inverted x-axis
print(corners_tranformed)
x_mn = math.ceil(min(corners_tranformed[0].T[0]))
y_mn = math.ceil(min(corners_tranformed[0].T[1]))
x_mx = math.ceil(max(corners_tranformed[0].T[0]))
y_mx = math.ceil(max(corners_tranformed[0].T[1]))
width = x_mx - x_mn
height = y_mx - y_mn
analogy = height/1000
n_height = height/analogy
n_width = width/analogy
dst2 = corners_tranformed
dst2 -= np.array([x_mn, y_mn])
dst2 = dst2/analogy
mat2 = cv2.getPerspectiveTransform(corners.astype("float32"),
dst2.astype("float32"))
img_warp = Image.fromarray((
cv2.warpPerspective(np.array(image),
mat2,
(int(n_width),
int(n_height)))))
return img_warp
# image coordingates
src= np.array([[ 789.72, 1187.35],
[ 789.72, 752.75],
[1277.35, 730.66],
[1277.35,1200.65]])
# known coordinates
dst=np.array([[0, 1000],
[0, 0],
[1092, 0],
[1092, 1000]])
# Create the image
image = Image.new('RGB', (img_width, img_height))
image.paste( (200,200,200), [0,0,image.size[0],image.size[1]])
draw = ImageDraw.Draw(image)
draw.line(((src[0][0],src[0][1]),(src[1][0],src[1][1]), (src[2][0],src[2][1]),(src[3][0],src[3][1]), (src[0][0],src[0][1])), width=4, fill="blue")
#image.show()
warped = get_transformed_image(src, dst, image)
warped.show()
There are two things you need to do:
Increase the size of the output of cv2.warpPerspective
Translate the warped source image such that the center of the warped source image matches with the center of cv2.warpPerspective output image
Here is how code will look:
# center of source image
si_c = [x//2 for x in image.shape] + [1]
# find where center of source image will be after warping without comepensating for any offset
wsi_c = np.dot(H, si_c)
wsi_c = [x/wsi_c[2] for x in wsi_c]
# warping output image size
stitched_frame_size = tuple(2*x for x in image.shape)
# center of warping output image
wf_c = image.shape
# calculate offset for translation of warped image
x_offset = wf_c[0] - wsi_c[0]
y_offset = wf_c[1] - wsi_c[1]
# translation matrix
T = np.array([[1, 0, x_offset], [0, 1, y_offset], [0, 0, 1]])
# translate tomography matrix
translated_H = np.dot(T.H)
# warp
stitched = cv2.warpPerspective(image, translated_H, stitched_frame_size)

OpenCV Mat per-element operation: vector-matrix multiplication

I is an mxn matrix and each element of I is a 1x3 vector (I is a 3-channel Mat image actually).
M is a 3x3 matrix.
J is an matrix having the same dimension as I and is computed as follows: each element of J is the vector-matrix product of the corresponding (i.e. having the same coordinates) element of I and M.
I.e. if v1(r1,g1,b1) is an element of I and v2(r2,g2,b2) is its corresponding element of J, then v2 = v1 * M (this is a vector-matrix product, not a per-element product).
Question: How to compute J efficiently (in terms of speed)?
Thank you for your help.
As far as I know, the most efficient way to implement such an operation is as follows:
Reshape I from mxnx3 to (m·n)x3, let's call it I'
Calculate J' = I' * M
Reshape J' from (m·n)x3 to mxnx3, this is the J we wanted
The idea is to stack each pixel-wise operation pi'·M into one single operation P'·M, where P is the 3x(m·n) matrix containing each pixel in columns (hence P' holds one pixel per row. It's just a convention, really).
Here is a code sample written in c++:
// read some image
cv::Mat I = cv::imread("image.png"); // rows x cols x 3
// some matrix M, that modifies each pixel
cv::Mat M = (cv::Mat_<float>(3, 3) << 0, 0, 0,
0, .5, 0,
0, 0, .5); // 3 x 3
// remember old dimension
uint8_t prevChannels = I.channels;
uint32_t prevRows = I.rows;
// reshape I
uint32_t newRows = I.rows * I.cols;
I = I.reshape(1, newRows); // (rows * cols) x 3
// compute J
cv::Mat J = I * M; // (rows * cols) x 3
// reshape to original dimensions
J = J.reshape(prevChannels, prevRows); // rows x cols x 3
OpenCV provides an O(1) reshaping operation.
Thus performance depends solely on matrix multiplication, which I expect to be as efficient as possible in a computer vision library.
To further enhance performance, you might want to take a look at matrix multiplication using the ocl and gpu modules.

Sum of each column opencv

In Matlab, If A is a matrix, sum(A) treats the columns of A as vectors, returning a row vector of the sums of each column.
sum(Image); How could it be done with OpenCV?
Using cvReduce has worked for me. For example, if you need to store the column-wise sum of a matrix as a row matrix you could do this:
CvMat * MyMat = cvCreateMat(height, width, CV_64FC1);
// Fill in MyMat with some data...
CvMat * ColSum = cvCreateMat(1, MyMat->width, CV_64FC1);
cvReduce(MyMat, ColSum, 0, CV_REDUCE_SUM);
More information is available in the OpenCV documentation.
EDIT after 3 years:
The proper function for this is cv::reduce.
Reduces a matrix to a vector.
The function reduce reduces the matrix to a vector by treating the
matrix rows/columns as a set of 1D vectors and performing the
specified operation on the vectors until a single row/column is
obtained. For example, the function can be used to compute horizontal
and vertical projections of a raster image. In case of REDUCE_MAX and
REDUCE_MIN , the output image should have the same type as the source
one. In case of REDUCE_SUM and REDUCE_AVG , the output may have a
larger element bit-depth to preserve accuracy. And multi-channel
arrays are also supported in these two reduction modes.
OLD:
I've used ROI method: move ROI of height of the image and width 1 from left to right and calculate means.
Mat src = imread(filename, 0);
vector<int> graph( src.cols );
for (int c=0; c<src.cols-1; c++)
{
Mat roi = src( Rect( c,0,1,src.rows ) );
graph[c] = int(mean(roi)[0]);
}
Mat mgraph( 260, src.cols+10, CV_8UC3);
for (int c=0; c<src.cols-1; c++)
{
line( mgraph, Point(c+5,0), Point(c+5,graph[c]), Scalar(255,0,0), 1, CV_AA);
}
imshow("mgraph", mgraph);
imshow("source", src);
EDIT:
Just out of curiosity, I've tried resize to height 1 and the result was almost the same:
Mat test;
cv::resize(src,test,Size( src.cols,1 ));
Mat mgraph1( 260, src.cols+10, CV_8UC3);
for(int c=0; c<test.cols; c++)
{
graph[c] = test.at<uchar>(0,c);
}
for (int c=0; c<src.cols-1; c++)
{
line( mgraph1, Point(c+5,0), Point(c+5,graph[c]), Scalar(255,255,0), 1, CV_AA);
}
imshow("mgraph1", mgraph1);
cvSum respects ROI, so if you move a 1 px wide window over the whole image, you can calculate the sum of each column.
My c++ got a little rusty so I won't provide a code example, though the last time I did this I used OpenCVSharp and it worked fine. However, I'm not sure how efficient this method is.
My math skills are getting rusty too, but shouldn't it be possible to sum all elements in columns in a matrix by multiplying it by a vector of 1s?
For an 8 bit greyscale image, the following should work (I think).
It shouldn't be too hard to expand to different image types.
int imgStep = image->widthStep;
uchar* imageData = (uchar*)image->imageData;
uint result[image->width];
memset(result, 0, sizeof(uchar) * image->width);
for (int col = 0; col < image->width; col++) {
for (int row = 0; row < image->height; row++) {
result[col] += imageData[row * imgStep + col];
}
}
// your desired vector is in result

Resources