How does INTER_LINEAR interpolation in OpenCV resize() work?

I am trying to figure out how the OpenCV resize() function calculates linear interpolation when we set fx=2 and fy=1. I have written the following minimal working example:
import cv2
import numpy as np
pattern_img = np.zeros((6, 6), np.uint8)
pattern_img[:, 0::2] = 255
pattern_img_x2 = cv2.resize(pattern_img, None, fx=2, fy=1, interpolation=cv2.INTER_LINEAR)
If we look at the first row of pattern_img and pattern_img_x2, we get:
pattern_img[0, :]
> array([255, 0, 255, 0, 255, 0], dtype=uint8)
pattern_img_x2[0, :]
> array([255, 191, 64, 64, 191, 191, 64, 64, 191, 191, 64, 0], dtype=uint8)
I cannot figure out how the numbers 191 and 64 are calculated. I know that it implements the bilinear algorithm, but since we set fy=1 this should reduce to simple linear interpolation along the x-axis. Still, I cannot see how resize() computes those interpolated values. Could anybody help me understand the algorithm behind it?

This has to do with pixel "grids".
Is 0,0 the center of the first pixel, or the top left corner of it? Where are the corners of a pixel? A common question in computer graphics.
Interpolation adds another complication. Does a pixel define its whole square area? Then you get nearest neighbor interpolation. Or does it merely define the center point? Then, anything in between is undefined, technically, and interpolation gets to decide how to fill the space.
In OpenCV generally, pixel centers are at integer coordinates. That means the first pixel's top left corner sits at (-0.5, -0.5), so that's where the picture's top left corner starts.
Now, if you were to sample with fx=1, i.e. an identity transformation, you'd start at -0.5, which should be the left edge of a pixel, and the output pixel has a width of 1, so the first output pixel spans -0.5 to +0.5, and its center is at 0.0.
Since you want fx=2, your output pixels are 0.5 wide. You still start at -0.5, and your output pixels span... -0.5 to 0.0, 0.0 to +0.5, 0.5 to 1.0, 1.0 to 1.5...
And their centers sit at -0.25, +0.25, +0.75, +1.25, ...
And that is how you get those 1/4 and 3/4 values. 64 is one quarter of 255, 191 is three quarters of 255. And that's also why the first output pixel is 255. It sits to the left of the first input pixel, so that is its only support and determines 100% of its value.
You could "index-shift" this all so it is a little easier to visualize. Then the picture's top left pixel's top left corner is at (0,0), and the pixel extends to (1,1), with the center at (0.5,0.5). The output pixel grid lies accordingly, top left pixel going from 0 to 0.5 with center at 0.25, its neighbor to the right spanning 0.5 to 1.0, center at 0.75, and so on.
If you want to have full control over this madness, construct your own affine transformation (I'd recommend working with 3x3 matrices, easy to compose/matrix-multiply) and then use warpAffine. It'll take integer coordinates for the output, transform them using your matrix (it implicitly inverts it), and looks the resulting coordinates up in the source image, including interpolation in the source image space.
I made a little graphic here. Black squares are input pixels, black dots their centers. Red squares and dots are the output pixels and their centers. You see, if you sample at the red dot positions, you'll sit at one or three quarters between input pixel centers.
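To check this numerically, here is a small sketch in plain NumPy (not OpenCV's actual fixed-point code, just the coordinate mapping described above) that samples the first row at those output pixel centers and reproduces the values from the question:
import numpy as np

row = np.array([255, 0, 255, 0, 255, 0], dtype=np.float64)
fx = 2.0

out = []
for dst_x in range(int(len(row) * fx)):
    src_x = (dst_x + 0.5) / fx - 0.5        # output pixel center in input coordinates
    x0 = int(np.floor(src_x))
    frac = src_x - x0
    # replicate the border pixel when the sample falls outside the image
    left = row[min(max(x0, 0), len(row) - 1)]
    right = row[min(max(x0 + 1, 0), len(row) - 1)]
    out.append(int(round(left * (1 - frac) + right * frac)))

print(out)   # [255, 191, 64, 64, 191, 191, 64, 64, 191, 191, 64, 0]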

Related

Where exactly does the bounding box start or end?

In OpenCV and object detection models, a bounding box is represented as 4 numbers, e.g. x,y,width,height or x1,y1,x2,y2.
These numbers seem to be ill-defined, but it's fine when the resolution is high.
But when the image has very low resolution, e.g. 8x8, a one-pixel error can cause things to go very wrong.
So I want to know, what exactly does it mean when you say that a bounding box has x1=0, x2=100?
Specifically, these are the confusions I want to clear up:
Does the bounding box border occupy the 0th pixel, or does it surround the 0th pixel (i.e. its border is at x=-1)?
Where is the exact end of the bounding box? If the image has shape=(8,8), would the end be at 7 or 8?
If you want to represent a bounding box that occupies the entire image, what should its values be?
So I think the right question is: how do I think about bounding boxes intuitively so that these points are not confusing?
OK. After many days of working with bounding boxes, I now have my own intuition about how to think about bounding box coordinates.
I divide coordinates into 2 categories: continuous and discrete. The mental problems usually arise when you try to convert between them.
Suppose the image has width=100, height=100. Then a continuous point has x,y coordinates that can take any real value in the range [0,100].
It means that points like (0,0), (0.5,7.1) and (39.83,99.9999) are valid points.
Now you can convert a continuous point to a discrete point on the image by taking the floor of the number. E.g. (5.5, 8.9) gets mapped to pixel (5,8) on the image. It's very important to understand that you should not use the ceiling or rounding operation to convert it to the discrete version. Suppose you have a continuous point (0.9,0.9): this point lies inside the (0,0) pixel, so it belongs to the (0,0) pixel, not the (1,1) pixel.
From this foundation, let's try to answer my question:
So I want to know, what exactly does it mean when you say that a bounding box has x1=0, x2=100?
It means that continuous point 1 has x value = 0, and continuous point 2 has x value = 100. A continuous point has zero size. It's not a pixel.
Does the bounding box border occupy the 0th pixel, or does it surround the 0th pixel (i.e. its border is at x=-1)?
In continuous space, the bounding box border occupies zero space; the border is infinitesimally thin. But when we want to draw it onto an image, the border will be at least 1 pixel thick. So if we have a continuous point (0,0), it will occupy the 0th pixel of the image, but theoretically it represents an infinitely thin border along the left and top sides of the 0th pixel.
Where is the exact end of the bounding box? If the image has shape=(8,8), would the end be at 7 or 8?
The biggest x,y value you can have is 7.999..., but when converted to the discrete version you will be left with 7, which represents the last pixel.
If you want to represent a bounding box that occupies the entire image, what should its values be?
You should represent bounding box coordinates in continuous space instead of discrete space because of the extra precision. That means the largest bounding box starts at (0,0) and ends at (100,100). But if you want to draw this box, you need to convert it to the discrete version and draw the bounding box from pixel (0,0) to pixel (99,99).
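A small sketch of that conversion (the helper name and the epsilon trick for the exclusive end are my own, just to make the rule concrete):
import numpy as np

def to_discrete(x1, y1, x2, y2):
    # the end coordinate is exclusive, so subtract a tiny epsilon before flooring
    eps = 1e-9
    return (int(np.floor(x1)), int(np.floor(y1)),
            int(np.floor(x2 - eps)), int(np.floor(y2 - eps)))

# the full-image box of a 100x100 image in continuous coordinates...
print(to_discrete(0, 0, 100, 100))   # (0, 0, 99, 99): first and last pixel indices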
In OpenCV the bounding rectangle can be defined in several ways. One way is by its top-left corner and bottom-right corner; the constructor Rect(Point pt1, Point pt2) takes exactly those two points, while Rect(int x, int y, int width, int height) takes the top-left corner plus a size. The rectangle starts exactly on that pixel and coordinate. For subpixel rectangles there are also variants holding floating point coordinates.
So I want to know, what exactly does it mean when you say that a bounding box has x1=0, x2=100?
That means the top-left corner's x-coordinate is 0 and the bottom-right x-coordinate is 100.
Does the bounding box border occupy the 0th pixel, or does it surround the 0th pixel (i.e. its border is at x=-1)?
The border starts exactly on the 0th pixel, meaning that a rectangle with a width and height of 1 px is, when drawn, just a single dot (1 px).
Where is the exact end of the bounding box? If the image has shape=(8,8), would the end be at 7 or 8?
The end would be at 7, see below.
If you want to represent a bounding box that occupies the entire image, what should its values be?
Let's take an image of size 100x100. The rectangle around the whole image, defined by two points, would be Rect(Point(0,0), Point(99,99)); defined by a starting point and size, it would be Rect(0, 0, 100, 100).
The basic thing to know is that an image of size X,Y has its minimum top-left coordinate at (0,0) and its maximum bottom-right coordinate at (X-1,Y-1).
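As a quick sanity check in Python (a sketch with a made-up 100x100 image), drawing that around-the-image rectangle with cv2.rectangle shows that both corner pixels lie on the border:
import cv2
import numpy as np

img = np.zeros((100, 100), np.uint8)
# top-left (0,0), bottom-right (99,99), thickness 1
cv2.rectangle(img, (0, 0), (99, 99), 255, 1)

print(img[0, 0], img[99, 99])     # 255 255 -> both corner pixels are on the border
print(img[50, 0], img[50, 99])    # 255 255 -> left and right edges sit in columns 0 and 99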

Identifying imperfect shapes with noisy backgrounds with OpenCV

I am trying to identify a rectangle underwater in a noisy environment. I implemented Canny to find the edges and drew the found edges using cv2.circle. From here, I am trying to identify the imperfect rectangle in the image (the black one below the long rectangle that covers the top of the frame).
I have attempted multiple solutions, including thresholds, blurs and resizing the image to detect the rectangle. Below is the barebones code that just draws the identified edges.
import numpy as np
import cv2
import imutils
img_text = 'img5.png'
img = cv2.imread(img_text)
original = img.copy()
min_value = 50
max_value = 100
# draw image and return coordinates of drawn pixels
image = cv2.Canny(img, min_value, max_value)
indices = np.where(image != 0)
coordinates = zip(indices[1], indices[0])
for point in coordinates:
    cv2.circle(original, point, 1, (0, 0, 255), -1)
cv2.imshow('original', original)
cv2.waitKey(0)
cv2.destroyAllWindows()
Where the output displays this:
output
From here I want to be able to separately detect just the rectangle and draw another rectangle on top of the output in green, but I haven't been able to find a way to detect the original rectangle on its own.
For your specific image, I obtained quite good results with a simple thresholding on the blue channel.
image = cv2.imread("test.png")
t, img = cv2.threshold(image[:,:,0], 80, 255, cv2.THRESH_BINARY)
In order to adapt the threshold, I propose a simple way of varying the threshold until you get one component. I have also implemented the rectangle drawing:
def find_square(image):
    markers = 0
    threshold = 10
    while np.amax(markers) == 0:
        threshold += 5
        t, img = cv2.threshold(image[:,:,0], threshold, 255, cv2.THRESH_BINARY_INV)
        _, markers = cv2.connectedComponents(img)
    kernel = np.ones((5,5), np.uint8)
    img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
    img = cv2.morphologyEx(img, cv2.MORPH_DILATE, kernel)
    nonzero = cv2.findNonZero(img)
    x, y, w, h = cv2.boundingRect(nonzero)
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2)
    cv2.imshow("image", image)
And the results on the provided example images:
The idea behind this approach is based on the observation that most of the information is in the blue channel. If you separate the image into its channels, you will see that in the blue channel the dark square has the best contrast. It is also the darkest region of this channel, which is why thresholding works. The problem remains the threshold setting. Based on the above intuition, we are looking for the lowest threshold that will bring up something (and hope that it will be the square). What I did is simply increase the threshold gradually until something appears.
Then I applied some morphology operations to eliminate other small points that may appear after thresholding and to make the square look a bit bigger (the edges of the square are lighter, and therefore not the entire square is captured). Then it was a matter of drawing the rectangle.
The code can be made much nicer (and more efficient) by doing some statistical analysis on the histogram. Simply compute the threshold such that 5% (or some percentage) of the pixels are darker. You may also need to do a connected component analysis to keep only the biggest blob.
Also, my usage of connectedComponents is very poor and inefficient. Again, code written in a hurry to prove the concept.
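For what it's worth, a rough sketch of that statistical variant (the file name and the 5% figure are assumptions): pick the threshold from the histogram with np.percentile, then keep only the biggest connected component.
import cv2
import numpy as np

image = cv2.imread("test.png")
blue = image[:, :, 0]
t = np.percentile(blue, 5)                                  # ~5% of pixels are darker than t
_, mask = cv2.threshold(blue, t, 255, cv2.THRESH_BINARY_INV)

n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
if n > 1:
    biggest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])    # label 0 is the background
    mask = np.uint8(labels == biggest) * 255
    x, y, w, h = cv2.boundingRect(cv2.findNonZero(mask))
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)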

Can't determine document edges from camera with OpenCV

I need to find the edges of a document that is in the user's hands.
1) Original image from camera:
2) Then I convert the image to grayscale:
3) Then I blur it:
4) Then I find edges in the image using Canny:
5) And apply dilate:
As you can see in the last image, the contour around the map is torn and the contour is not fully determined. What is my error, and how do I solve the problem in order to determine the outline of the document completely?
This is the code showing how I do it:
final Mat mat = new Mat();
sourceMat.copyTo(mat);
//convert the image to grayscale
Imgproc.cvtColor(mat, mat, Imgproc.COLOR_BGR2GRAY);
//blur to enhance edge detection
Imgproc.GaussianBlur(mat, mat, new Size(5, 5), 0);
if (isClicked) saveImageFromMat(mat, "blur", "blur");
//detect edges; Canny produces a black and white (8 bit) edge image
int thresh = 128;
Imgproc.Canny(mat, mat, thresh, thresh * 2);
//dilate helps to connect nearby line segments
Imgproc.dilate(mat, mat,
        Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(3, 3)),
        new Point(-1, -1),
        2,
        1,
        new Scalar(1));
If someone is holding the document, you cannot see the edge that is behind the user's hand. So any method for detecting the outline of the document must be robust to some missing parts of the edge.
I suggest using a variant of the Hough transform to detect the document. The Wikipedia article about the Hough transform makes it sound quite scary (as Wikipedia often does with mathematical subjects), but don't be discouraged; it is actually not too difficult to understand or implement.
The original Hough transform detected straight lines in images. As explained in this OpenCV tutorial, any straight line in an image can be defined by 2 parameters: an angle θ and a distance r of the line from the origin. So you quantize these 2 parameters, and create a 2D array with one cell for every possible line that could be present in your image. (The finer the quantization you use, the larger the array you will need, but the more accurate the position of the found lines will be.) Initialize the array to zeros. Then, for every pixel that is part of an edge detected by Canny, you determine every line (θ,r) that the pixel could be part of, and increment the corresponding bin. After processing all pixels, you will have, for each bin, a count of how many pixels were detected on the line corresponding to that bin. Counts which are high enough probably represent real lines in the image, even if parts of the line are missing. So you just scan through the bins to find bins which exceed the threshold.
OpenCV contains Hough detectors for straight lines and circles, but not for rectangles. You could either use the line detector and check for 4 lines that form the edges of your document, or you could write your own Hough detector for rectangles, perhaps using the paper Jung 2004 for inspiration. Rectangles have at least 5 degrees of freedom (2D position, scale, aspect ratio, and rotation angle), and the memory requirement for a 5D array obviously grows quickly. But since the range of each parameter is limited (i.e., the document's aspect ratio is known, and you can assume the document will be well centered and not rotated much), it is probably feasible.
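A minimal sketch of the line-based variant with OpenCV's built-in detector might look like this (the file name and the Canny/Hough thresholds are assumptions you would tune; picking the 4 lines that actually form the document is left out):
import cv2
import numpy as np

img = cv2.imread("document.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(gray, 128, 256)

# 1 pixel resolution for r, 1 degree for theta; the threshold is the minimum
# number of edge pixels that must vote for a line
lines = cv2.HoughLines(edges, 1, np.pi / 180, 150)

if lines is not None:
    for r, theta in lines[:, 0]:
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * r, b * r
        p1 = (int(x0 + 2000 * (-b)), int(y0 + 2000 * a))
        p2 = (int(x0 - 2000 * (-b)), int(y0 - 2000 * a))
        cv2.line(img, p1, p2, (0, 255, 0), 2)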

openCv Find coordinates of edges/contours

Let's say I have the following image, where there is a folder with a white label on it.
What I want is to detect the coordinates of the end points of the folder and the white paper on it (both rectangles).
Using the coordinates, I want to know the exact place of the paper on the folder.
GIVEN: The inner white paper rectangle is always going to be of a fixed size, so maybe we can use this knowledge somewhere?
I am new to OpenCV and am trying to find some guidance on how I should approach this problem.
Problem statement: We cannot rely on a color-based solution, since this is just an example and the colors of both the folder and the rectangular paper can change.
There can be other noisy papers too, but one thing is given: the overall folder and the big rectangular paper will always be the biggest two rectangles at any given time.
I have tried OpenCV Canny for edge detection and it looks like this image.
Now how can I find the coordinates of the outer rectangle and the inner rectangle?
For this image, there are three dominant colors: (1) the background (yellow), (2) the folder (blue), (3) the paper (white). Using the color info may help; I analyzed it in RGB and HSV like this:
As you can see (second row, third cell), the regions can be easily separated in H (HSV) if you find the folder mask first.
My steps:
(1) find the folder region mask in HSV using inRange(hsv, (80, 10, 20), (150, 255, 255))
(2) find contours on the mask and filter them by width and height
Here is the result:
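In code, those two steps might look roughly like this (the file name and the size limits are assumptions you would adapt to your images):
import cv2

img = cv2.imread("folder.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# (1) folder region mask in HSV
mask = cv2.inRange(hsv, (80, 10, 20), (150, 255, 255))

# (2) contours on the mask, filtered by width and height
# ([-2] picks the contours for both the OpenCV 3.x and 4.x return signatures)
contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w > 100 and h > 100:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)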
Related:
Choosing the correct upper and lower HSV boundaries for color detection with `cv::inRange` (OpenCV)
How to define a threshold value to detect only green colour objects in an image :Opencv
You can opt for Adaptive Threshold (https://docs.opencv.org/3.4/d7/d4d/tutorial_py_thresholding.html):
Obtain the hue channel of the image.
Perform adaptive thresholding with a certain block size. I used a block size of 15 on the image at half its size.
This is invariant to color as you expected. Now you can go ahead and extract what you need!!
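A minimal sketch of those steps (the file name is an assumption, and ADAPTIVE_THRESH_MEAN_C with C=2 is just one reasonable choice to tune):
import cv2

img = cv2.imread("folder.jpg")
img = cv2.resize(img, None, fx=0.5, fy=0.5)            # work at half size, as above

hue = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:, :, 0]    # hue channel
mask = cv2.adaptiveThreshold(hue, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                             cv2.THRESH_BINARY, 15, 2)  # block size 15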
This solution helps to identify the white paper region of the image.
This is the full code for the solution:
import cv2
import numpy as np
image = cv2.imread('stack2.jpg', -1)
paper = cv2.resize(image, (500, 500))
ret, thresh_gray = cv2.threshold(cv2.cvtColor(paper, cv2.COLOR_BGR2GRAY),
                                 200, 255, cv2.THRESH_BINARY)
# OpenCV 3.x returns (image, contours, hierarchy); in OpenCV 4.x it is (contours, hierarchy)
image, contours, hier = cv2.findContours(thresh_gray, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
for c in contours:
    area = cv2.contourArea(c)
    rect = cv2.minAreaRect(c)
    box = cv2.boxPoints(rect)
    # convert all floating point coordinates to int
    box = np.int0(box)
    # draw a green rotated rectangle around sufficiently large contours
    if area > 500:
        cv2.drawContours(paper, [box], 0, (0, 255, 0), 1)
        print([box])
cv2.imshow('paper', paper)
cv2.imwrite('paper.jpg', paper)
cv2.waitKey(0)
First, using a manual threshold (200), you can detect the paper in the image.
ret, thresh_gray = cv2.threshold(cv2.cvtColor(paper, cv2.COLOR_BGR2GRAY), 200, 255, cv2.THRESH_BINARY)
After that you should find the contours and get the minAreaRect(). Then you should get the coordinates of that rectangle (box) and draw it.
rect = cv2.minAreaRect(c)
box = cv2.boxPoints(rect)
box = np.int0(box)
cv2.drawContours(paper, [box], 0, (0, 255, 0),1)
In order to avoid small white regions of the image, you can compute area = cv2.contourArea(c) and only call drawContours() when area > 500.
final output:
Console output gives coordinates for the white paper.
console output:
[array([[438, 267],
[199, 256],
[209, 60],
[447, 71]], dtype=int64)]

Where to center the kernel when using FFTW for image convolution?

I am trying to use FFTW for image convolution.
At first, just to test whether the system was working properly, I performed the fft, then the inverse fft, and got the exact same image returned.
Then, as a small step forward, I used the identity kernel (i.e., kernel[0][0] = 1 while all the other components equal 0). I took the component-wise product between the image and the kernel (both in the frequency domain), then did the inverse fft. Theoretically I should get the identical image back, but the result I got is not even close to the original image. I suspect this has something to do with where I center my kernel before I fft it into the frequency domain (since I put the "1" at kernel[0][0], it basically means that I centered the positive part at the top left). Could anyone enlighten me about what goes wrong here?
For each dimension, the indexes of samples should be from -n/2 ... 0 ... n/2 -1, so if the dimension is odd, center around the middle. If the dimension is even, center so that before the new 0 you have one sample more than after the new 0.
E.g. -4, -3, -2, -1, 0, 1, 2, 3 for a width/height of 8 or -3, -2, -1, 0, 1, 2, 3 for a width/height of 7.
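For concreteness, a tiny helper (my own, not part of any library) that produces those centered index ranges:
def centered_indexes(n):
    # one extra sample before 0 when n is even, symmetric when n is odd
    return list(range(-(n // 2), (n + 1) // 2))

print(centered_indexes(8))   # [-4, -3, -2, -1, 0, 1, 2, 3]
print(centered_indexes(7))   # [-3, -2, -1, 0, 1, 2, 3]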
The FFT is relative to the middle, in its scale there are negative points.
In memory the points are 0...n-1, but the FFT treats them as -floor(n/2)...ceil(n/2)-1: memory index 0 corresponds to -floor(n/2) and n-1 to ceil(n/2)-1, matching the ranges above.
The identity matrix is a matrix of zeros with 1 in the 0,0 location (the center - according to above numbering). (In the spatial domain.)
In the frequency domain the identity matrix should be a constant (all real values 1 or 1/(N*M) and all imaginary values 0).
If you do not receive this result, then the identity matrix might need padding differently (to the left and down instead of around all sides) - this may depend on the FFT implementation.
Center each dimension separately (this is an index centering, no change in actual memory).
You will probably need to pad the image (after centering) to a whole power of 2 in each dimension (2^n * 2^m where n doesn't have to equal m).
Pad relative to FFT's 0,0 location (to center, not corner) by copying existing pixels into a new larger image, using center-based-indexes in both source and destination images (e.g. (0,0) to (0,0), (0,1) to (0,1), (1,-2) to (1,-2))
Assuming your FFT uses regular floating point cells and not complex cells, the complex image has to be of size 2*ceil(n/2) * 2*ceil(m/2) even if you don't need a whole power of 2 (since it has half the samples, but the samples are complex).
If your image has more than one color channel, you will first have to reshape it, so that the channel are the most significant in the sub-pixel ordering, instead of the least significant. You can reshape and pad in one go to save time and space.
Don't forget the FFTSHIFT after the IFFT. (To swap the quadrants.)
The result of the IFFT is 0...n-1. You have to take pixels floor(n/2)+1..n-1 and move them before 0...floor(n/2).
This is done by copying pixels to a new image, copying floor(n/2)+1 to memory-location 0, floor(n/2)+2 to memory-location 1, ..., n-1 to memory-location floor(n/2), then 0 to memory-location ceil(n/2), 1 to memory-location ceil(n/2)+1, ..., floor(n/2) to memory-location n-1.
When you multiply in the frequency domain, remember that the samples are complex (one cell real then one cell imaginary) so you have to use a complex multiplication.
The result might need dividing by N^2*M^2, where N is the size of n after padding (and likewise for M and m). You can tell by (a) looking at the frequency-domain values of the identity matrix, or (b) comparing the result to the input.
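To see the index bookkeeping in action without FFTW, here is a NumPy sketch (NumPy's ifft2 already divides by the number of samples, which FFTW does not): the 3x3 identity kernel is placed with its 1 at the center of a padded array, shifted so that the center lands at index (0,0), and the FFT-based convolution then returns the image unchanged. For a real (non-circular) convolution you would still pad both image and kernel first, as described above.
import numpy as np

img = np.random.rand(7, 8)

# 3x3 identity kernel with the 1 at its center
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0

# pad the kernel to the image size with the kernel's center at the array center...
padded = np.zeros_like(img)
cy, cx = padded.shape[0] // 2, padded.shape[1] // 2
padded[cy - 1:cy + 2, cx - 1:cx + 2] = kernel
# ...then move that center to index (0, 0) before the forward FFT
padded = np.fft.ifftshift(padded)

out = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded)).real
print(np.allclose(out, img))   # True: the identity kernel gives the image back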
I think that your understanding of the identity kernel may be off. An identity kernel should have the 1 at the center of the 2D kernel, not at the 0,0 position.
For example, for a 3 x 3, you have yours set up as follows:
1, 0, 0
0, 0, 0
0, 0, 0
It should be
0, 0, 0
0, 1, 0
0, 0, 0
Check this out also
What is the "do-nothing" convolution kernel
also look here, at the bottom of page 3.
http://www.fmwconcepts.com/imagemagick/digital_image_filtering.pdf
I took the component-wise product between the image and kernel in frequency domain, then did the inverse fft. Theoretically I should be able to get the identical image back.
I don't think that doing a forward transform with a non-fft kernel, and then an inverse fft transform should lead to any expectation of getting the original image back, but perhaps I'm just misunderstanding what you were trying to say there...
