Background:
Thanks for your attention! I am learning the basics of 2D convolution, linear algebra, and PyTorch, and I have run into an implementation problem concerning the pseudo-inverse of the convolution operator. Specifically, I have no idea how to implement it efficiently. Please see the problem statement below for details; any help, tip, or suggestion is welcome.
The Original Problem:
I have an image feature x with shape [b,c,h,w] and a 3x3 convolution kernel K with shape [c,c,3,3], so that y = K * x. How can I implement the corresponding pseudo-inverse on y in an efficient way?
In other words, given y = K * x = Ax, how can I implement x_hat = (A^+)y?
I suspect the answer involves torch.fft, but I still have no idea how to implement it, and I do not know whether an implementation already exists.
import torch
import torch.nn.functional as F
c = 32
K = torch.randn(c, c, 3, 3)
x = torch.randn(1, c, 128, 128)
y = F.conv2d(x, K, padding=1)
print(y.shape)
# How to implement pseudo-inverse for y = K * x in an efficient way?
Some of My Efforts:
I know that 2D convolution is a linear operator, i.e. it is equivalent to multiplication by a matrix. We could actually write out the matrix form of the convolution and compute its pseudo-inverse, but I think this type of operation will be inefficient, and I have no idea how to implement it in an efficient way.
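To make that dense-matrix route concrete, here is a minimal sketch on a tiny feature map (sizes and names are illustrative only); it also shows why the approach does not scale, since A has (c*h*w)^2 entries:
import torch
import torch.nn.functional as F
# Tiny example only: materialize the conv2d operator as a dense matrix A
# and pseudo-invert it directly. Hopeless at c=32, h=w=128.
c, h, w = 2, 4, 4
K = torch.randn(c, c, 3, 3)
def apply_conv(v):
    return F.conv2d(v.view(1, c, h, w), K, padding=1).flatten()
# Build A column by column by pushing unit basis vectors through the convolution.
basis = torch.eye(c * h * w)
A = torch.stack([apply_conv(basis[i]) for i in range(c * h * w)], dim=1)
x = torch.randn(c * h * w)
y = A @ x
x_hat = torch.linalg.pinv(A) @ y
print(torch.max(torch.abs(x_hat - x)))  # small when A is well-conditioned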
According to Wikipedia, the pseudo-inverse satisfies A A^+ A = A; in particular A(A_pinv(x)) = x for any x in the range of A, where A is the convolution operator and A_pinv is its pseudo-inverse.
(Thanks again for reading such a long post!)
This takes the problem to another level.
Convolution is a linear operation, so you can determine the matrix of the operator and either solve a least-squares problem directly [1] or compute the pseudo-inverse, as you mentioned, and then apply it to different outputs to predict a projection of the input.
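For completeness, the least-squares route on a materialized operator could look like the following one-liner (a sketch only, reusing the tiny dense A and y from the sketch in the question above; it does not apply to the full 128x128 problem):
x_hat = torch.linalg.lstsq(A, y.unsqueeze(-1)).solution.squeeze(-1)  # minimizes ||A x - y||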
I am changing your code to use padding=0.
import torch
import torch.nn.functional as F
# your code
c = 32
K = torch.randn(c, c, 3, 3)
x = torch.randn(4, c, 128, 128)
y = F.conv2d(x, K, bias=torch.zeros((c,)))  # padding=0: y has shape [4, c, 126, 126]
Also, as you already suggested, the convolution can be computed as ifft(fft(h)*fft(x)). However, the conv2d function actually computes a cross-correlation, so you have to conjugate the filter, leading to ifft(conj(fft(h))*fft(x)). You also have to apply this along two axes, and you have to make sure the FFTs are calculated with the same size; since the data is real, we can use the multi-dimensional real FFT. To be complete, conv2d works on multiple channels, so we have to calculate sums of convolutions. Since the FFT is linear, we can simply compute those sums in the frequency domain using einsum.
s = x.shape[-2:]  # FFT size: the full 128x128 input, so the cyclic result is not cropped
K_f = torch.fft.rfftn(K, s)
x_f = torch.fft.rfftn(x, s)
y_f = torch.einsum('jkxy,ikxy->ijxy', K_f.conj(), x_f)
y_hat = torch.fft.irfftn(y_f, s)
Except for the borders it should be accurate (remember FFT computes a cyclic convolution).
print(torch.max(torch.abs(y_hat[:, :, :-2, :-2] - y)))  # agrees on the "valid" region
Now, notice the pattern jk,ik->ij in the einsum: it means y_f[i,j] = sum_k conj(K_f[j,k]) * x_f[i,k], i.e. y_f = x_f # K_f.conj().T if # is the matrix product over the first two dimensions. So to invert this operation we can interpret the first two dimensions as matrices. The function pinv computes pseudo-inverses on the last two axes, so in order to use it we have to permute the axes. If we right-multiply the output by the pseudo-inverse of the transposed K_f, we should invert this operation.
s = 128,128
K_f = torch.fft.rfftn(K, s)
K_f_inv = torch.linalg.pinv(K_f.T).T
y_f = torch.fft.rfftn(y_hat, s)
x_f = torch.einsum('jkxy,ikxy->ijxy', K_f_inv.conj(), y_f)
x_hat = torch.fft.irfftn(x_f, s)
print(torch.mean((x - x_hat)**2) / torch.mean((x)**2))
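As a sanity check of the "channels as per-frequency matrices" view described above (my own sketch, reusing K_f and K_f_inv from the snippet), each frequency's channel block of K_f_inv should be exactly the pseudo-inverse of the corresponding block of K_f:
Af = K_f.permute(2, 3, 0, 1)      # [h, w_f, c, c]
Ai = K_f_inv.permute(2, 3, 0, 1)  # [h, w_f, c, c]
print(torch.max(torch.abs(Ai - torch.linalg.pinv(Af))))  # ~0 up to round-off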
Notice that I am using the full (cyclic) convolution, but conv2d actually crops the image. Let's apply that:
k = 3  # kernel size: conv2d with padding=0 drops the last k-1 rows and columns
y_hat[:,:,128-(k-1):,:] = 0
y_hat[:,:,:,128-(k-1):] = 0
Repeating the calculation, you will see that the reconstructed input is no longer accurate, so you have to be careful about what you do with your convolution; but in situations where you can get this to work, it will indeed be efficient.
s = 128,128
K_f = torch.fft.rfftn(K, s)
K_f_inv = torch.linalg.pinv(K_f.T).T
y_f = torch.fft.rfftn(y_hat, s)
x_f = torch.einsum('jkxy,ikxy->ijxy', K_f_inv.conj(), y_f)
x_hat = torch.fft.irfftn(x_f, s)
print(torch.mean((x - x_hat)**2) / torch.mean((x)**2))
I have gone through the code below and would like to know how I can count the outlier and inlier points after using RANSAC. Could you point me to a good code example of how this is done?
Second question: which feature matching approach is better, BFMatcher.knnMatch() with a ratio test, or bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True) with shortest distance? Is there any reference for this comparison?
# BFMatcher with default params
bf = cv.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
# Apply ratio test
good_matches = []
for m, n in matches:
    if m.distance < 0.75 * n.distance:
        good_matches.append([m])
# Draw matches
img3 = cv.drawMatchesKnn(img1, kp1, img2, kp2, good_matches, None, flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
cv.imwrite('matches.jpg', img3)
# Select good matched keypoints
ref_matched_kpts = np.float32([kp1[m[0].queryIdx].pt for m in good_matches])
sensed_matched_kpts = np.float32([kp2[m[0].trainIdx].pt for m in good_matches])
# Compute homography
H, status = cv.findHomography(sensed_matched_kpts, ref_matched_kpts, cv.RANSAC, 5.0)
Count number of outliers and inliers
# status is an Nx1 array of 1 (inlier) / 0 (outlier) flags returned by cv.findHomography
num_inliers = int(np.sum(status))                # number of detected inliers
num_outliers = len(status) - num_inliers         # number of detected outliers
inlier_ratio = num_inliers / float(len(status))  # inlier ratio = inliers / total matches
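If you also need the inlier and outlier points themselves (a small sketch, reusing ref_matched_kpts and status from the code above):
inlier_mask = status.ravel() == 1
inlier_pts = ref_matched_kpts[inlier_mask]     # points kept by RANSAC
outlier_pts = ref_matched_kpts[~inlier_mask]   # points rejected by RANSAC
print(len(inlier_pts), "inliers,", len(outlier_pts), "outliers")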
Feature Matching Algorithm
For float descriptors from sparse feature algorithms such as SIFT or SURF, BFMatcher.knnMatch() with the ratio test is preferred, while bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True) is meant for binary descriptors (ORB, BRISK, etc.). My suggestion would be to try both on your project and investigate which one works better.
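As a rough sketch of the two setups being compared (des1/des2 stand for SIFT descriptors and des1_orb/des2_orb for ORB descriptors; the variable names are illustrative, not taken from the posts above):
# Float descriptors (SIFT/SURF): L2 norm + ratio test
bf_ratio = cv.BFMatcher()
knn = bf_ratio.knnMatch(des1, des2, k=2)
good = [m for m, n in knn if m.distance < 0.75 * n.distance]
# Binary descriptors (ORB/BRISK): Hamming norm + cross-check
bf_cross = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
matches = sorted(bf_cross.match(des1_orb, des2_orb), key=lambda m: m.distance)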
I'm trying to write an app for wild leopard classification and conservation in South Asia. The main challenge is to identify individual leopards by the spot pattern on their foreheads.
The current approach I am using is:
Store the known leopard forehead images as a base list
Get the user-provided leopard image and crop the forehead of the leopard
Pre-process the images with the bilateral filter to reduce the noise
Identify the keypoints using the SIFT algorithm
Use FLANN matcher to get KNN matches
Select good matches based on the ratio threshold
Sample code:
# Pre-Process & reduce noise.
img1 = cv.bilateralFilter(baseImg, 9, 75, 75)
img2 = cv.bilateralFilter(userImage, 9, 75, 75)
detector = cv.xfeatures2d_SIFT.create()  # in OpenCV >= 4.4 this is cv.SIFT_create()
keypoints1, descriptors1 = detector.detectAndCompute(img1, None)
keypoints2, descriptors2 = detector.detectAndCompute(img2, None)
# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50) # or pass empty dictionary
matcher = cv.FlannBasedMatcher(index_params, search_params)
knn_matches = matcher.knnMatch(descriptors1, descriptors2, 2)
allmatchpointcount = len(knn_matches)
ratio_thresh = 0.7
good_matches = []
for m, n in knn_matches:
    if m.distance < ratio_thresh * n.distance:
        good_matches.append(m)
goodmatchpointcount = len(good_matches)
print("Good match count : ", goodmatchpointcount)
matchsuccesspercentage = goodmatchpointcount/allmatchpointcount*100
print("Match percentage : ", matchsuccesspercentage)
Problems I have with this approach:
The method has a medium-low success rate and tends to break when there is a new user image.
The user images are sometimes taken from different angles where some key patterns are not visible or warped.
The user image quality affects the match result significantly.
I appreciate any suggestions to get this improved in any manner.
Sample Images
Base Image
The image above is matched to the one below (incorrect pattern matched):
More sample images as requested.
I have the following programming problem in Torch.
I have a table made of two Tensors:
require 'nn'
N = 4
aaaTensor = torch.randn(N)
bbbTensor = torch.randn(N)
thisTable = {aaaTensor, bbbTensor}
I would like to compute the cosine distance for each pair of single values of aaaTensor and bbbTensor:
the cosine distance between aaaTensor[1] and bbbTensor[1]
the cosine distance between aaaTensor[2] and bbbTensor[2]
...
the cosine distance between aaaTensor[N] and bbbTensor[N]
And I don't know how to do this.
If I use the nn.CosineDistance() module (link), it will compute the general cosine distance between aaaTensor and bbbTensor:
cosine = nn.CosineDistance()
cosine:forward{aaaTensor, bbbTensor}
0.7185
[torch.DoubleTensor of size 1]
I want to have N=4 outputs.
How could I implement this one-by-one cosine distance computation?
Thanks
The documentation says nn.CosineDistance() accepts batches, so (although the cosine distance between single values does not make much sense, since it is always +1 or -1) you can do it like this:
require 'nn'
N = 4
aaaTensor = torch.randn(N,1)
bbbTensor = torch.randn(N,1)
thisTable = {aaaTensor, bbbTensor}
cosine = nn.CosineDistance()
cosine:forward{aaaTensor, bbbTensor}
Not drawing matches. OpenCV 3.0, fully updated Ubuntu. The code runs, but it doesn't show any matches. The test region is cut and copied directly from the image it should match.
import numpy as np
import cv2
cv2.ocl.setUseOpenCL(False)
img1 = cv2.imread('images/ingrassroi.png',0)
img2 = cv2.imread('images/ingrass.png',0)
img3 = img1.copy()
# Initiate ORB detector
orb = cv2.ORB_create()
# compute the descriptors with ORB
kp1, des1 = orb.detectAndCompute(img1,None)
kp2, des2 = orb.detectAndCompute(img2,None)
# create BFMatcher object
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
# Match descriptors.
matches = bf.match(des1,des2)
# Sort them in the order of their distance.
matches = sorted(matches, key = lambda x:x.distance)
# Draw first 10 matches.
img3 = cv2.drawMatches(img1,kp1,img2,kp2,matches[:10],None, flags=2)
cv2.imshow("Matches",img3)
cv2.waitKey(-1)
Turns out that the image to match against was too small: no keypoints were being found in the train image. I enlarged the area cropped from the test image, and it then found matches and correctly identified the cigarette butt in the image. Just an FYI: if you decide to try the FLANN-based matcher from the same page of the OpenCV Python tutorials, be sure to define FLANN_INDEX_LSH = 6.
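For reference, a FLANN/LSH setup for ORB descriptors could look like the sketch below (the index parameters are the ones suggested in the OpenCV tutorials; treat them as a starting point rather than verified values, and reuse des1/des2 from the code above):
FLANN_INDEX_LSH = 6
index_params = dict(algorithm=FLANN_INDEX_LSH,
                    table_number=6,       # number of hash tables
                    key_size=12,          # bits per hash key
                    multi_probe_level=1)  # neighbouring buckets to probe
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
knn_matches = flann.knnMatch(des1, des2, k=2)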