I need help in figuring out the algorithm/implementation OpenCV is using for image-downsampling with non-linear scaling factors.
I know the question was already ask a few times, but most answers seem to not match OpenCV's implementation (for instance, this answer is not correct when using OpenCV: https://math.stackexchange.com/questions/48903/2d-array-downsampling-and-upsampling-using-bilinear-interpolation).
Minimal problem formulation:
I want to downsample an image of resolution 4x4 to an image of resolution 3x3 using bilinear interpolation. I am interested in the interpolation coefficients.
Example in python:
img = np.asarray([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]]).astype(np.float32)
img_resized = cv2.resize(img, (3, 3), 0, 0, cv2.INTER_LINEAR).astype(np.float32)
print(img)
# [[ 1. 2. 3. 4.]
# [ 5. 6. 7. 8.]
# [ 9. 10. 11. 12.]
# [13. 14. 15. 16.]]
print(img_resized)
# [[ 1.8333333 3.1666667 4.5 ]
# [ 7.166667 8.5 9.833333 ]
# [12.5 13.833333 15.166666 ]]
Interpolation coefficients:
After a lot of trial-and-error, I figured out the interpolation coefficients OpenCV is using for this specific case.
For the corner points of the 3x3 image:
1.8333333 = 25/36 * 1 + 5/36 * 2 + 5/36 * 5 + 1/36 * 6
4.5000000 = 25/36 * 4 + 5/36 * 3 + 5/36 * 8 + 1/36 * 7
12.5000000 = 25/36 * 13 + 5/36 * 9 + 5/36 * 14 + 1/36 * 10
15.1666666 = 25/36 * 16 + 5/36 * 15 + 5/36 * 12 + 1/36 * 11
For the middle points of the 3x3 image:
8.5 = 1/4 * 6 + 1/4 * 7 + 1/4 * 10 + 1/4 * 11
For the remaining 4 points of the 3x3 image:
3.1666667 = 5/12 * 2 + 5/12 * 3 + 1/12 * 6 + 1/12 * 7
7.1666667 = 5/12 * 5 + 5/12 * 9 + 1/12 * 6 + 1/12 * 10
9.8333333 = 5/12 * 8 + 5/12 * 12 + 1/12 * 7 + 1/12 * 11
13.833333 = 5/12 * 14 + 5/12 * 15 + 1/12 * 10 + 1/12 * 11
Question:
Can someone please help me make sense of these interpolation coefficients? How are they calculated? I tried to read the source of the cv::resize() function, but it did not help me a lot :S
After playing around with various test cases, I think I know the answer to how OpenCV chooses the sample point locations. As #ChrisLuengo has pointed out in a comment, OpenCV seems to not apply a low-pass filter before downsampling, but uses (bi-)linear interpolation only.
(Possible) Solution:
Let's assume we have a 5x5 image, which pixel positions are represented with the blue circles in the graphic below. We now want to downsample it to a 3x3, or a 4x4 image, and need to find the sample positions of the new downsampled image in the original image grid.
It appears to be that OpenCV uses pixel distance of 1 for the original image grid, and a pixel distance of (OLD_SIZE / NEW_SIZE), thus here 5/3 and 5/4, for the new image grid. Moreover, it aligns both grids at the center point. Thus, OpenCV's deterministic sampling algorithms can be visualized as follows:
Visualization 5x5 to 3x3:
Visualization 5x5 to 4x4:
Sample Code (Python 2.7):
import numpy as np
import cv2
# 1. H_W is the height & width of the original image, using uniform H/W for this example
# resized_H_W is the height & width of the resized image, using uniform H/W for this example
H_W = 5
resized_H_W = 4
# 2. Create original image & Get OpenCV resized image:
img = np.zeros((H_W, H_W)).astype(np.float32)
counter = 1
for i in range(0, H_W):
for j in range(0, H_W):
img[i, j] = counter
counter += 1
img_resized_opencv = cv2.resize(img, (resized_H_W, resized_H_W), 0, 0, cv2.INTER_LINEAR).astype(np.float32)
# 3. Get own resized image:
img_resized_own = np.zeros((resized_H_W, resized_H_W)).astype(np.float32)
for i in range(0, resized_H_W):
for j in range(0, resized_H_W):
sample_x = (1.0 * H_W) / 2.0 - 0.50 + (i - (1.0 * resized_H_W - 1.0) / 2.0) * (1.0 * H_W) / (1.0 * resized_H_W)
sample_y = (1.0 * H_W) / 2.0 - 0.50 + (j - (1.0 * resized_H_W - 1.0) / 2.0) * (1.0 * H_W) / (1.0 * resized_H_W)
pixel_top_left = img[int(np.floor(sample_x)), int(np.floor(sample_y))]
pixel_top_right = img[int(np.floor(sample_x)), int(np.ceil(sample_y))]
pixel_bot_left = img[int(np.ceil(sample_x)), int(np.floor(sample_y))]
pixel_bot_right = img[int(np.ceil(sample_x)), int(np.ceil(sample_y))]
img_resized_own[i, j] = (1.0 - (sample_x - np.floor(sample_x))) * (1.0 - (sample_y - np.floor(sample_y))) * pixel_top_left + \
(1.0 - (sample_x - np.floor(sample_x))) * (sample_y - np.floor(sample_y)) * pixel_top_right + \
(sample_x - np.floor(sample_x)) * (1.0 - (sample_y - np.floor(sample_y))) * pixel_bot_left + \
(sample_x - np.floor(sample_x)) * (sample_y - np.floor(sample_y)) * pixel_bot_right
# 4. Print results:
print "\n"
print "Org. image: \n", img
print "\n"
print "Resized image (OpenCV): \n", img_resized_opencv
print "\n"
print "Resized image (own): \n", img_resized_own
print "\n"
print "MSE between OpenCV <-> Own: ", np.mean(np.square(img_resized_opencv - img_resized_own))
print "\n"
Disclaimer:
This is just my theory that I tested via ~10 test cases. I do not claim that this is 100% true.
Related
I'm interested in calculating macro f1-score by macro precision and recall manually. But the results aren't equal. What is the difference in the final formula between f1 and f1_new in code?
from sklearn.metrics import precision_score, recall_score, f1_score
y_true = [0, 1, 0, 1, 0 , 1, 1, 0]
y_pred = [0, 1, 0, 0, 1 , 1, 0, 0]
p = precision_score(y_true, y_pred, average='macro')
r = recall_score(y_true, y_pred, average='macro')
f1_new = (2 * p * r) / (p + r) # 0.6291390728476821
f1 = f1_score(y_true, y_pred, average='macro') # 0.6190476190476191
print(f1_new == f1)
# False
The f1_score is calculated in scikit-learn as follows:
all_positives = 4
all_negatives = 4
true_positives = 2
true_negatives = 3
true_positive_rate = true_positives/all_positives = 2/4
true_negative_rate = true_negatives/all_negatives = 3/4
pred_positives = 3
pred_negatives = 5
positive_predicted_value = true_positives/pred_positives = 2/3
negative_predicted_value = true_negatives/pred_negatives = 3/5
f1_score_pos = 2 * true_positive_rate * positive_predicted_value / (true_positive_rate + positive_predicted_value)
= 2 * 2/4 * 2/3 / (2/4 + 2/3)
f1_score_neg = 2 * true_negative_rate * negative_predicted_value / (true_negative_rate + negative_predicted_value)
= 2 * 3/4 * 3/5 / (3/4 + 3/5)
f1 = average(f1_score_pos, f1_score_neg)
= 2/4 * 2/3 / (2/4 + 2/3) + 3/4 * 3/5 / (3/4 + 3/5)
= 0.6190476190476191
This matches the definition given in the documentation for the 'macro' parameter of skicit-learn's f1_score: Calculate metrics for each label, and find their unweighted mean. This definition also applies to precision_score and recall_score.
Your manual calculation of the F1-score is as follows:
precision = average(positive_predicted_value, negative_predicted_value)
= average(2/3, 3/5)
= 19/30
recall = average(true_positive_rate, true_negative_rate)
= average(2/4, 3/4)
= 5/8
f1_new = 2 * precision * recall / (precision + recall)
= 2 * 19/30 * 5/8 / (19/30 + 5/8)
= 0.6291390728476821
In fact, the general formula F1 = 2 * (precision * recall) / (precision + recall) as presented in the docs is only valid for average='binary' and average='micro', but not for average='macro' and average='weighted'. In that sense, as it is currently presented in scikit-learn, the formula is misleading as it suggests that it holds irrespective of the chosen parameters, which is not the case.
I am trying to implement Harris corner detection algorithm from the scratch. The output of this algorithm should supposed to get one single pixel representing the corner, but in my code I am getting multiple pixel representing the corner. This is may be because of not implemented the final part of the algorithm that is non-maximum suppression. This I could not able to implement because I did not understand it properly. How to implement this? Along with this I am also trying to find the coordinates of these corner, how to do this with out using cv2 library?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as im
# 1. Before doing any operations convert the image into gray scale image
img = im.imread('OD6.jpg')
plt.imshow(img)
plt.show()
# split
R=img[:,:,0]
G=img[:,:,1]
B=img[:,:,2]
M,N=R.shape
gray_img=np.zeros((M,N), dtype=int);
for i in range(M):
for j in range(N):
gray_img[i, j]=(R[i, j]*0.2989)+(G[i, j]*0.5870)+(B[i, j]*0.114);
plt.imshow(gray_img, cmap='gray')
plt.show()
# 2. Applying sobel filter to get the gradients of the images in x and y directions
sobelx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype = np.float)
sobely = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype = np.float)
sobelxImage = np.zeros((M,N))
sobelyImage = np.zeros((M,N))
sobelGrad = np.zeros((M,N))
image = np.pad(gray_img, (1,1), 'edge')
for i in range(1, M-1):
for j in range(1, N-1):
gx = (sobelx[0][0] * image[i-1][j-1]) + (sobelx[0][1] * image[i-1][j]) + \
(sobelx[0][2] * image[i-1][j+1]) + (sobelx[1][0] * image[i][j-1]) + \
(sobelx[1][1] * image[i][j]) + (sobelx[1][2] * image[i][j+1]) + \
(sobelx[2][0] * image[i+1][j-1]) + (sobelx[2][1] * image[i+1][j]) + \
(sobelx[2][2] * image[i+1][j+1])
gy = (sobely[0][0] * image[i-1][j-1]) + (sobely[0][1] * image[i-1][j]) + \
(sobely[0][2] * image[i-1][j+1]) + (sobely[1][0] * image[i][j-1]) + \
(sobely[1][1] * image[i][j]) + (sobely[1][2] * image[i][j+1]) + \
(sobely[2][0] * image[i+1][j-1]) + (sobely[2][1] * image[i+1][j]) + \
(sobely[2][2] * image[i+1][j+1])
sobelxImage[i-1][j-1] = gx
sobelyImage[i-1][j-1] = gy
g = np.sqrt(gx * gx + gy * gy)
sobelGrad[i-1][j-1] = g
sobelxyImage = np.multiply(sobelxImage, sobelyImage)
# 3 Apply gaussian filter along x y and xy
size=3;# define the filter size
sigma=1; # define the standard deviation
size = int(size) // 2
xx, yy = np.mgrid[-size:size+1, -size:size+1]
normal = 1 / (2.0 * np.pi * sigma**2)
gg = np.exp(-((xx**2 + yy**2) / (2.0*sigma**2))) * normal
gaussx =gg
gaussy =gg
gaussxImage = np.zeros((M,N))
gaussyImage = np.zeros((M,N))
gaussxyImage = np.zeros((M,N))
gaussresult = np.zeros((M,N))
gaussimagex = np.pad(sobelxImage, (1,1), 'edge')
gaussimagey = np.pad(sobelyImage, (1,1), 'edge')
gaussimagexy = np.pad(sobelxyImage, (1,1), 'edge')
for i in range(1, M-1):
for j in range(1, N-1):
ggx = (gaussx[0][0] * gaussimagex[i-1][j-1]) + (gaussx[0][1] *gaussimagex[i-1][j]) + \
(gaussx[0][2] * gaussimagex[i-1][j+1]) + (gaussx[1][0] * gaussimagex[i][j-1]) + \
(gaussx[1][1] * gaussimagex[i][j]) + (gaussx[1][2] * gaussimagex[i][j+1]) + \
(gaussx[2][0] * gaussimagex[i+1][j-1]) + (gaussx[2][1] * gaussimagex[i+1][j]) + \
(gaussx[2][2] * gaussimagex[i+1][j+1])
ggy = (gaussy[0][0] * gaussimagey[i-1][j-1]) + (gaussy[0][1] * gaussimagey[i-1][j]) + \
(gaussy[0][2] * gaussimagey[i-1][j+1]) + (gaussy[1][0] * gaussimagey[i][j-1]) + \
(gaussy[1][1] * gaussimagey[i][j]) + (gaussy[1][2] * gaussimagey[i][j+1]) + \
(gaussy[2][0] * gaussimagey[i+1][j-1]) + (gaussy[2][1] * gaussimagey[i+1][j]) + \
(gaussy[2][2] * gaussimagey[i+1][j+1])
crossgg = (gg[0][0] * gaussimagexy[i-1][j-1]) + (gg[0][1] * gaussimagexy[i-1][j]) + \
(gg[0][2] * gaussimagexy[i-1][j+1]) + (gg[1][0] * gaussimagexy[i][j-1]) + \
(gg[1][1] * gaussimagexy[i][j]) + (gg[1][2] * gaussimagexy[i][j+1]) + \
(gg[2][0] * gaussimagexy[i+1][j-1]) + (gg[2][1] * gaussimagexy[i+1][j]) + \
(gg[2][2] * gaussimagexy[i+1][j+1])
gaussxImage[i-1][j-1] = ggx
gaussyImage[i-1][j-1] = ggy
gaussxyImage[i-1][j-1] = crossgg
blur = np.sqrt(ggx * ggx + ggy * ggy)
gaussresult[i-1][j-1] = blur
det = gaussxImage *gaussyImage - (gaussxyImage ** 2)
alpha = 0.04
trace = alpha * (gaussxImage +gaussyImage) ** 2
#finding the harris response
R = det - trace
# applying threshold
for i in range(1, M-1):
for j in range(1, N-1):
if R[i][j] > 140:
R[i][j]==0
else:
R[i][j]==255
f, ax1 = plt.subplots(1, 1, figsize=(5,5))
ax1.set_title('corners')
ax1.imshow(R, cmap="gray")
First of all, there are a couple of bugs in your code:
The R[i][j]==0 part in the final thresholding loop should be R[i][j]=0. Note thought that you don't have to go through a loop, you can just do something like R[R>thresh]=255 etc.
If I'm not mistaken, the R values that corresponds to corners in Harris' algorithm are the large positive ones. When I run your code, I get R values that are negative for edges and corners, so I suspect that there is a bug somewhere there.
At this point, I don't think that the main issue in your code is non-maxima suppression, but in case it still is, here is a quick explanation of non maxima suppression and the paper that we discussed in the comments:
Basically, the idea of non-maximal suppression is very simple: if the (corner) response of a point x is not the highest in a neighborhood (that you are free to define depending on your needs), then you don't keep it. In your case, it will probably be simply sufficient to compare the response of each of your detected interest points with the response of its closest neighbors and keep them only if they are higher with respect to all of them. As for the paper that we discussed, its aim is to suppress keypoints (that are not local maxima) in a way that results in a more uniform spatial distribution. Let S be the keypoints lits, sorted in decreasing order of corner response. The idea is to assign each of them to a "suppression radius", that is, a radius in which those points wont be considered a local maximum. As S[0] has the highest corner response in the image, it will never be suppressed, so you can set its radius of suppression radius_list[0]=inf. Next, let's look at S[1]. As the list S is sorted, the only point with highest response than S[1] is S[0], and from that, it follows that the radius at which S[1] stops being a local maximum is Dist(S[1], S[2]). That is, once we include S[0] in the local neighborhood of S[1], since response[S[0]]>response[S[1]], S[0] will become the maximum in that neighborhood. Note that as you continue like this, the radii that you consider will become smaller and smaller. Once you have computed radius_list, assuming you need N feature points, you will just select the N points that have the highest radius_list values. In pseudo-code:
#let S be the keypoints, sorted in decreasing corner response order
#Assume you want only to keep N keypoints at the end
radius=zeros(len(S))
radius[0]=inf
for i in range(len(S[1:])):
candidate_radii=[]
for j in range(0,i):
if response[i]<response[j]*some_const:#you can set some_const to something in [0.9,1]
candidate_radii.append(image_space_dist(S[i],S[j]))
radius[i]=min(candidate_radii)
sorted_indexes = argsort(radius)
kept_points = S[sorted_indexes][:N]
Hope this helps.
Based on my understanding, CNN output size for 1D is
output_size = (input_size - kernel_size + 2*padding)//stride + 1
Refer to PyTorch DQN Tutorial. In the tutorial, it uses 0 padding, which is fine. However, it computes the output size as follows:
def conv2d_size_out(size, kernel_size = 5, stride = 2):
return (size - (kernel_size - 1) - 1) // stride + 1
It the above a mistake or is there something I missed?
No, it's not a mistake because
size - (kernel_size - 1) - 1 = size - kernel_size + 2 * 0
with 0 as padding
(it's not code, its an equation sorry for the formatting)
I think the tutorial is using the formula for the output size from the official document which is
output_size = ((input_size + 2 * padding - dialation * (kernel_size - 1) - 1) // stride + 1
official doc for conv1d
I am creating a rails application which is like a game. So it has points and levels. For example: to become level one the user has to get atleast 100 points and again for level two the user has to reach level 2 the user has to collect 200 points. The level difference changes after every 10 levels i.e., The difference between each level changes after 10 levels always. By that I mean the difference in points between level one and two is 100 and the difference in points in level 11 and 12 is 150 and so on. There is no upper bound for levels.
Now my question is let's say a user's total points is 3150 and just got updated to 3155. What's the optimal solution to find the current level and update it if needed?
I can get a solution using while loops and again looping inside it which will give a result in O(n^2). I need something better.
I think this code works but I'm not sure if this is the best way to go about it
def get_level(points)
diff = 100
sum = 0
level = -1
current_level = 0
while level.negative?
10.times do |i|
current_level += 1
sum += diff
if points > sum
next
elsif points <= sum
level = current_level
break
end
end
diff += 50
end
puts level
end
I wrote a get_points function (it should not be difficult). Then based on it get_level function in which it was necessary to solve the quadratic equation to find high value, and then calc low.
If you have any questions, let me know.
Check output here.
#!/usr/bin/env python3
import math
def get_points(level):
high = (level + 1) // 10
low = (level + 1) % 10
high_point = 250 * high * high + 750 * high # (3 + high) * high // 2 * 500
low_point = (100 + 50 * high) * low
return low_point + high_point
def get_level(points):
# quadratic equation
a = 250
b = 750
c = -points
d = b * b - 4 * a * c
x = (-b + math.sqrt(d)) / (2 * a)
high = int(x)
remainder = points - (250 * high * high + 750 * high)
low = remainder // (100 + 50 * high)
level = high * 10 + low
return level
def main():
for l in range(0, 40):
print(f'{l:3d} {get_points(l - 1):5d}..{get_points(l) - 1}')
for level, (l, r) in (
(1, (100, 199)),
(2, (200, 299)),
(9, (900, 999)),
(10, (1000, 1149)),
(11, (1150, 1299)),
(19, (2350, 2499)),
(20, (2500, 2699)),
):
for p in range(l, r + 1): # for in [l, r]
assert get_level(p) == level, f'{p} {l}'
if __name__ == '__main__':
main()
Why did you set the value of a=250 and b = 750? Can you explain that to me please?
Let's write out every 10 level and the difference between points:
lvl - pnt (+delta)
10 - 1000 (+1000 = +100 * 10)
20 - 2500 (+1500 = +150 * 10)
30 - 4500 (+2000 = +200 * 10)
40 - 7000 (+2500 = +250 * 10)
Divide by 500 (10 levels * 50 difference changes) and received an arithmetic progression starting at 2:
10 - 2 (+2)
20 - 5 (+3)
30 - 9 (+4)
40 - 14 (+5)
Use arithmetic progression get points formula for level = k * 10 equal to:
sum(x for x in 2..k+1) * 500 =
(2 + k + 1) * k / 2 * 500 =
(3 + k) * k * 250 =
250 * k * k + 750 * k
Now we have points and want to find the maximum high such that point >= 250 * high^2 + 750 * high, i. e. 250 * high^2 + 750 * high - points <= 0. Value a = 250 is positive and branches of the parabola are directed up. Now we find the solution of quadratic equation 250 * high^2 + 750 * high - points = 0 and discard the real part (is high = int(x) in python script).
Lets say I want to manually calculate the gradient update with respect to the Kullback-Liebler divergence loss, say on a VAE (see an actual example from pytorch sample documentation here):
KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
where the logvar is (for simplicitys sake, ignoring activation functions and multiple layers etc.) basically a single layer transformation from a 400 dim feature vector into a 20 dim one:
self.fc21 = nn.Linear(400, 20)
logvar = fc21(x)
I'm just not mathematically understanding how you take the gradient of this, with respect to the weight vector for fc21. Mathematically I thought this would look like:
KL = -.5sum(1 + Wx + b - m^2 - e^{Wx + b})
dKL/dW = -.5 (x - e^{Wx + b}x)
where W is the weight matrix of the fc21 layer. But here this result isn't in the same shape as W (20x400). Like, x is just a 400 feature vector. So how would I perform SGD on this? Does x just broadcast to the second term, and if so why? I feel like I'm just missing some mathematical understanding here...
Let's simplify the example a bit and assume a fully connected layer of input shape 3 and output shape 2, then:
W = [[w1, w2, w3], [w4, w5, w6]]
x = [x1, x2, x3]
y = [w1*x1 + w2*x2 + w3*x3, w4*x1 + w5*x2 + w6*x3]
D_KL = -0.5 * [ 1 + w1*x1 + w2*x2 + w3*x3 + w4*x1 + w5*x2 + w6*x3 + b - m^2 + e^(..)]
grad(D_KL, w1) = -0.5 * [x1 + x1* e^(..)]
grad(D_KL, w2) = -0.5 * [x2 + x2* e^(..)]
...
grad(D_KL, W) = [[grad(D_KL, w1), grad(D_KL, w2), grad(D_KL,w3)],
[grad(D_KL, w4), grad(D_KL, w5), grad(D_KL,w6)]
]
This generalizes for higher order tensors of any dimensionality. Your differentiation is wrong in treating x and W as scalars rather than taking element-wise partial derivatives.