I'm reading the source code of OpenCV: cvProjectPoints2 and cvRodrigues2.
In cvProjectPoints2, the Jacobian matrix is first obtained using cvRodrigues2( &_r, &matR, &_dRdr ); and then used to calculate the partial derivatives of the pixels w.r.t. the rvec (axis-angle representation).
if( dpdr_p )
{
    double dx0dr[] =
    {
        X*dRdr[0] + Y*dRdr[1] + Z*dRdr[2],
        X*dRdr[9] + Y*dRdr[10] + Z*dRdr[11],
        X*dRdr[18] + Y*dRdr[19] + Z*dRdr[20]
    };
    double dy0dr[] =
    {
        X*dRdr[3] + Y*dRdr[4] + Z*dRdr[5],
        X*dRdr[12] + Y*dRdr[13] + Z*dRdr[14],
        X*dRdr[21] + Y*dRdr[22] + Z*dRdr[23]
    };
    double dz0dr[] =
    {
        X*dRdr[6] + Y*dRdr[7] + Z*dRdr[8],
        X*dRdr[15] + Y*dRdr[16] + Z*dRdr[17],
        X*dRdr[24] + Y*dRdr[25] + Z*dRdr[26]
    };
    for( j = 0; j < 3; j++ )
    {
        double dxdr = z*(dx0dr[j] - x*dz0dr[j]);
        double dydr = z*(dy0dr[j] - y*dz0dr[j]);
        double dr2dr = 2*x*dxdr + 2*y*dydr;
        double dcdist_dr = k[0]*dr2dr + 2*k[1]*r2*dr2dr + 3*k[4]*r4*dr2dr;
        double dicdist2_dr = -icdist2*icdist2*(k[5]*dr2dr + 2*k[6]*r2*dr2dr + 3*k[7]*r4*dr2dr);
        double da1dr = 2*(x*dydr + y*dxdr);
        double dmxdr = fx*(dxdr*cdist*icdist2 + x*dcdist_dr*icdist2 + x*cdist*dicdist2_dr +
                           k[2]*da1dr + k[3]*(dr2dr + 2*x*dxdr));
        double dmydr = fy*(dydr*cdist*icdist2 + y*dcdist_dr*icdist2 + y*cdist*dicdist2_dr +
                           k[2]*(dr2dr + 2*y*dydr) + k[3]*da1dr);
        dpdr_p[j] = dmxdr;
        dpdr_p[dpdr_step+j] = dmydr;
    }
    dpdr_p += dpdr_step*2;
}
The shape of dRdr is 3x9, and from how the indices of dRdr are used:
X*dRdr[0] + Y*dRdr[1] + Z*dRdr[2], //-> dx0dr1
X*dRdr[9] + Y*dRdr[10] + Z*dRdr[11], //-> dx0dr2
X*dRdr[18] + Y*dRdr[19] + Z*dRdr[20] //-> dx0dr3
the Jacobian matrix seems to be:
dR1/dr1, dR2/dr1, ..., dR9/dr1,
dR1/dr2, dR2/dr2, ..., dR9/dr2,
dR1/dr3, dR2/dr3, ..., dR9/dr3,
But to my knowledge the Jacobian matrix should be of shape 9x3, since it contains the derivatives of R(1~9) w.r.t. r(1~3):
dR1/dr1, dR1/dr2, dR1/dr3,
dR2/dr1, dR2/dr2, dR2/dr3,
...
...
dR9/dr1, dR9/dr2, dR9/dr3,
As the docs of cvRodrigues2 say:
jacobian – Optional output Jacobian matrix, 3x9 or 9x3, which is a
matrix of partial derivatives of the output array components with
respect to the input array components.
So am I misunderstanding the code & docs? Or is the code using another convention? Or is it a bug (not likely...)?
If you look up the docs:
src – Input rotation vector (3x1 or 1x3) or rotation matrix (3x3).
dst – Output rotation matrix (3x3) or rotation vector (3x1 or 1x3), respectively.
jacobian – Optional output Jacobian matrix, 3x9 or 9x3, which is a matrix of partial derivatives of the output array components with respect to the input array components.
As you can see, you can switch the source and destination places (mathematically the result is exactly the transpose), but the code does not account for it.
Therefore you indeed get a transposed Jacobian, because you swapped the first arguments' places (relative to the default places for their types). Switch them again, and you'll get the normal Jacobian!
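If you want to double-check which layout your build actually returns, a quick finite-difference sketch (my own verification idea, not from the original code) works regardless of whether the binding hands back 3x9 or 9x3:
import cv2
import numpy as np

# rvec -> R, plus the analytic Jacobian (comes back as 3x9 or 9x3).
rvec = np.array([0.1, -0.2, 0.3], dtype=np.float64)
R, jac = cv2.Rodrigues(rvec)

# Numerical Jacobian d(vec(R))/dr as 9x3, with vec(R) in row-major order
# (the same ordering the C code uses for dRdr).
eps = 1e-6
fd = np.zeros((9, 3))
for i in range(3):
    r_plus = rvec.copy()
    r_plus[i] += eps
    R_plus, _ = cv2.Rodrigues(r_plus)
    fd[:, i] = (R_plus - R).reshape(-1) / eps

# Transpose if needed so both are 9x3; a tiny difference confirms the layout.
j = jac if jac.shape == (9, 3) else jac.T
print("max |analytic - numerical|:", np.abs(j - fd).max())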
I am trying to implement the Harris corner detection algorithm from scratch. The output of this algorithm should be a single pixel representing each corner, but in my code I am getting multiple pixels per corner. This may be because I have not implemented the final part of the algorithm, non-maximum suppression, which I could not implement because I did not understand it properly. How do I implement it? Along with this, I am also trying to find the coordinates of these corners. How can I do that without using the cv2 library?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as im
# 1. Before doing any operations convert the image into gray scale image
img = im.imread('OD6.jpg')
plt.imshow(img)
plt.show()
# split
R=img[:,:,0]
G=img[:,:,1]
B=img[:,:,2]
M,N=R.shape
gray_img=np.zeros((M,N), dtype=int);
for i in range(M):
    for j in range(N):
        gray_img[i, j]=(R[i, j]*0.2989)+(G[i, j]*0.5870)+(B[i, j]*0.114);
plt.imshow(gray_img, cmap='gray')
plt.show()
# 2. Applying sobel filter to get the gradients of the images in x and y directions
sobelx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobely = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
sobelxImage = np.zeros((M,N))
sobelyImage = np.zeros((M,N))
sobelGrad = np.zeros((M,N))
image = np.pad(gray_img, (1,1), 'edge')
for i in range(1, M-1):
    for j in range(1, N-1):
        gx = (sobelx[0][0] * image[i-1][j-1]) + (sobelx[0][1] * image[i-1][j]) + \
             (sobelx[0][2] * image[i-1][j+1]) + (sobelx[1][0] * image[i][j-1]) + \
             (sobelx[1][1] * image[i][j]) + (sobelx[1][2] * image[i][j+1]) + \
             (sobelx[2][0] * image[i+1][j-1]) + (sobelx[2][1] * image[i+1][j]) + \
             (sobelx[2][2] * image[i+1][j+1])
        gy = (sobely[0][0] * image[i-1][j-1]) + (sobely[0][1] * image[i-1][j]) + \
             (sobely[0][2] * image[i-1][j+1]) + (sobely[1][0] * image[i][j-1]) + \
             (sobely[1][1] * image[i][j]) + (sobely[1][2] * image[i][j+1]) + \
             (sobely[2][0] * image[i+1][j-1]) + (sobely[2][1] * image[i+1][j]) + \
             (sobely[2][2] * image[i+1][j+1])
        sobelxImage[i-1][j-1] = gx
        sobelyImage[i-1][j-1] = gy
        g = np.sqrt(gx * gx + gy * gy)
        sobelGrad[i-1][j-1] = g
sobelxyImage = np.multiply(sobelxImage, sobelyImage)
# 3 Apply gaussian filter along x y and xy
size=3;# define the filter size
sigma=1; # define the standard deviation
size = int(size) // 2
xx, yy = np.mgrid[-size:size+1, -size:size+1]
normal = 1 / (2.0 * np.pi * sigma**2)
gg = np.exp(-((xx**2 + yy**2) / (2.0*sigma**2))) * normal
gaussx =gg
gaussy =gg
gaussxImage = np.zeros((M,N))
gaussyImage = np.zeros((M,N))
gaussxyImage = np.zeros((M,N))
gaussresult = np.zeros((M,N))
gaussimagex = np.pad(sobelxImage, (1,1), 'edge')
gaussimagey = np.pad(sobelyImage, (1,1), 'edge')
gaussimagexy = np.pad(sobelxyImage, (1,1), 'edge')
for i in range(1, M-1):
    for j in range(1, N-1):
        ggx = (gaussx[0][0] * gaussimagex[i-1][j-1]) + (gaussx[0][1] * gaussimagex[i-1][j]) + \
              (gaussx[0][2] * gaussimagex[i-1][j+1]) + (gaussx[1][0] * gaussimagex[i][j-1]) + \
              (gaussx[1][1] * gaussimagex[i][j]) + (gaussx[1][2] * gaussimagex[i][j+1]) + \
              (gaussx[2][0] * gaussimagex[i+1][j-1]) + (gaussx[2][1] * gaussimagex[i+1][j]) + \
              (gaussx[2][2] * gaussimagex[i+1][j+1])
        ggy = (gaussy[0][0] * gaussimagey[i-1][j-1]) + (gaussy[0][1] * gaussimagey[i-1][j]) + \
              (gaussy[0][2] * gaussimagey[i-1][j+1]) + (gaussy[1][0] * gaussimagey[i][j-1]) + \
              (gaussy[1][1] * gaussimagey[i][j]) + (gaussy[1][2] * gaussimagey[i][j+1]) + \
              (gaussy[2][0] * gaussimagey[i+1][j-1]) + (gaussy[2][1] * gaussimagey[i+1][j]) + \
              (gaussy[2][2] * gaussimagey[i+1][j+1])
        crossgg = (gg[0][0] * gaussimagexy[i-1][j-1]) + (gg[0][1] * gaussimagexy[i-1][j]) + \
                  (gg[0][2] * gaussimagexy[i-1][j+1]) + (gg[1][0] * gaussimagexy[i][j-1]) + \
                  (gg[1][1] * gaussimagexy[i][j]) + (gg[1][2] * gaussimagexy[i][j+1]) + \
                  (gg[2][0] * gaussimagexy[i+1][j-1]) + (gg[2][1] * gaussimagexy[i+1][j]) + \
                  (gg[2][2] * gaussimagexy[i+1][j+1])
        gaussxImage[i-1][j-1] = ggx
        gaussyImage[i-1][j-1] = ggy
        gaussxyImage[i-1][j-1] = crossgg
        blur = np.sqrt(ggx * ggx + ggy * ggy)
        gaussresult[i-1][j-1] = blur
det = gaussxImage *gaussyImage - (gaussxyImage ** 2)
alpha = 0.04
trace = alpha * (gaussxImage +gaussyImage) ** 2
#finding the harris response
R = det - trace
# applying threshold
for i in range(1, M-1):
    for j in range(1, N-1):
        if R[i][j] > 140:
            R[i][j]==0
        else:
            R[i][j]==255
f, ax1 = plt.subplots(1, 1, figsize=(5,5))
ax1.set_title('corners')
ax1.imshow(R, cmap="gray")
First of all, there are a couple of bugs in your code:
The R[i][j]==0 part in the final thresholding loop should be R[i][j]=0. Note though that you don't have to go through a loop; you can just do something like R[R>thresh]=255 etc.
If I'm not mistaken, the R values that correspond to corners in Harris' algorithm are the large positive ones. When I run your code, I get R values that are negative for edges and corners, so I suspect that there is a bug somewhere there.
At this point, I don't think that the main issue in your code is non-maxima suppression, but in case it still is, here is a quick explanation of non maxima suppression and the paper that we discussed in the comments:
Basically, the idea of non-maximal suppression is very simple: if the (corner) response of a point x is not the highest in a neighborhood (that you are free to define depending on your needs), then you don't keep it. In your case, it will probably be sufficient to compare the response of each of your detected interest points with the responses of its closest neighbors and keep a point only if its response is higher than all of them (a minimal sketch of this simpler version is given right after the pseudo-code below).

As for the paper that we discussed, its aim is to suppress keypoints (that are not local maxima) in a way that results in a more uniform spatial distribution. Let S be the keypoint list, sorted in decreasing order of corner response. The idea is to assign each of them a "suppression radius", that is, a radius within which that point won't be considered a local maximum. As S[0] has the highest corner response in the image, it will never be suppressed, so you can set its suppression radius to radius_list[0]=inf. Next, let's look at S[1]. As the list S is sorted, the only point with a higher response than S[1] is S[0], and from that it follows that the radius at which S[1] stops being a local maximum is Dist(S[1], S[0]): once we include S[0] in the local neighborhood of S[1], since response[S[0]]>response[S[1]], S[0] becomes the maximum in that neighborhood. Note that as you continue like this, the radii that you consider become smaller and smaller.

Once you have computed radius_list, assuming you need N feature points, you just select the N points that have the highest radius_list values. In pseudo-code:
import numpy as np

# let S be the keypoints, sorted in decreasing corner response order
# assume you want to keep only N keypoints at the end
radius = np.zeros(len(S))
radius[0] = np.inf
for i in range(1, len(S)):
    candidate_radii = []
    for j in range(0, i):
        if response[i] < response[j]*some_const:  # you can set some_const to something in [0.9, 1]
            candidate_radii.append(image_space_dist(S[i], S[j]))
    # if no earlier (stronger) point suppresses S[i], its radius is unbounded
    radius[i] = min(candidate_radii) if candidate_radii else np.inf
sorted_indexes = np.argsort(radius)[::-1]  # largest suppression radii first
kept_points = S[sorted_indexes][:N]
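And in case the simpler neighborhood-comparison version is enough for you, here is a minimal sketch (my own addition, assuming R is the Harris response map computed in your code) that keeps a pixel only if it is the maximum of its 3x3 neighborhood, and also returns the corner coordinates without cv2:
import numpy as np

def simple_nms(R, thresh):
    """Keep (i, j) only if R[i, j] exceeds thresh and is the 3x3 local maximum."""
    corners = []
    M, N = R.shape
    for i in range(1, M - 1):
        for j in range(1, N - 1):
            window = R[i - 1:i + 2, j - 1:j + 2]
            if R[i, j] > thresh and R[i, j] == window.max():
                corners.append((i, j))  # (row, col) coordinates of a detected corner
    return corners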
Hope this helps.
I'm doing an LSTM cell implementation from scratch and I was thinking of implementing the computation of the gates with multiple neural layers instead of just using the single-layer version: sigmoid(dot(W,concat(a_prev,xt)) + b). I can't seem to find any literature on it. Does it work? Can it converge?
This is the standard LSTM cell forward propagation code that I learned in Andrew Ng's Deep Learning course:
concat = np.zeros((n_a + n_x, m))
concat[: n_a, :] = a_prev
concat[n_a :, :] = xt

ft = sigmoid(np.dot(Wf, concat) + bf)    # forget gate
it = sigmoid(np.dot(Wi, concat) + bi)    # update (input) gate
cct = np.tanh(np.dot(Wc, concat) + bc)   # candidate cell state
c_next = ft * c_prev + it * cct
ot = sigmoid(np.dot(Wo, concat) + bo)    # output gate
a_next = ot * np.tanh(c_next)

# Compute prediction of the LSTM cell
yt_pred = softmax(np.dot(Wy, a_next) + by)
This is the LSTM cell I want to use:
concat = np.zeros((n_a + n_x, m))
concat[: n_a, :] = a_prev
concat[n_a:, :] = xt
ft1 = sigmoid(np.dot(Wf1, concat) + bf1)
ft2 = sigmoid(np.dot(Wf2, ft1) + bf2)
it1 = sigmoid(np.dot(Wi1, concat) + bi1)
it2 = sigmoid(np.dot(Wi2, it1) + bi2)
cct1 = np.tanh(np.dot(Wc1, concat) + bc1)
cct2 = np.tanh(np.dot(Wc2, cct1) + bc2)
c_next = ft2 * c_prev + it2 * cct2
ot1 = sigmoid(np.dot(Wo1, concat) + bo1)
ot2 = sigmoid(np.dot(Wo2, ot1) + bo2)
a_next = ot2 * np.tanh(c_next)
# Compute prediction of the LSTM cell
yt_pred1 = softmax(np.dot(Wy1, a_next) + by1)
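For completeness, both snippets assume sigmoid and softmax helpers that are not shown; minimal NumPy definitions (my addition, not part of the course code) would be:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # column-wise softmax, numerically stabilized
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)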
I am given 4 camera extrinsic parameter matrices, and I wrote some code to display those cameras and their vector systems in 3D.
Here is the code:
def plot_cameras(views):
    fig = plt.figure()
    ax = fig.gca(projection='3d')
    ax.set_aspect('equal')

    for name, view in views.items():
        #for name, view in {'test_cam': 0}.items():
        m = view.cam.m
        #m = Camera.make_lookat_m(
        #    colvec([10, 10, 10]),
        #    colvec( [0,0,0] ),
        #    colvec([0, 0, -1])
        #    )
        r = m[:3, :3].copy()
        r_t = r.T
        t = m[:3, 3].copy()

        pos = -r_t.dot(t)
        x_cam, y_cam, z_cam = pos  # Camera pose

        u = 100*r_t[:, 0]
        v = 100*r_t[:, 1]
        w = 100*r_t[:, 2]  # Camera u,v,w vectors

        ax.text(x_cam, y_cam, z_cam, name)

        ax.plot3D(
            [x_cam, x_cam + u[0]],
            [y_cam, y_cam + u[1]],
            [z_cam, z_cam + u[2]],
            color='red')

        ax.plot3D(
            [x_cam, x_cam + v[0]],
            [y_cam, y_cam + v[1]],
            [z_cam, z_cam + v[2]],
            color='green')

        ax.plot3D(
            [x_cam, x_cam + w[0]],
            [y_cam, y_cam + w[1]],
            [z_cam, z_cam + w[2]],
            color='blue')

    ax.plot3D(
        [0],
        [0],
        [0],
        color='red',
        marker='*')

    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')

    set_axes_equal(ax)
    plt.show()
I have 4 cameras at 0, +-25 and +90 degrees of the target.
I am told that these cameras are in the OpenCV convention, but my function clearly shows they are in the OpenGL convention (looking down the negative z axis).
Am I properly decomposing the camera matrix and extracting the vectors?
If so, is there a way of transforming my OpenGL-style camera matrices into OpenCV-style?
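One possible answer to the last question, as a hedged sketch (my own illustration, not from the original post): the two conventions differ by a 180-degree rotation about the camera's X axis (OpenCV looks down +Z with +Y down, OpenGL looks down -Z with +Y up), so left-multiplying the extrinsic [R|t] by diag(1, -1, -1) converts one into the other. The function name here is hypothetical:
import numpy as np

def gl_to_cv_extrinsic(m_gl):
    """Convert a 3x4 (or 4x4) OpenGL-style world-to-camera extrinsic to OpenCV style."""
    flip = np.diag([1.0, -1.0, -1.0])   # flip the camera's Y and Z axes
    m_cv = m_gl.copy().astype(float)
    m_cv[:3, :3] = flip @ m_gl[:3, :3]
    m_cv[:3, 3] = flip @ m_gl[:3, 3]
    return m_cv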
Let's say I want to manually calculate the gradient update with respect to the Kullback-Leibler divergence loss, say on a VAE (see an actual example from the PyTorch sample documentation here):
KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
where logvar is (for simplicity's sake, ignoring activation functions and multiple layers etc.) basically a single-layer transformation from a 400-dim feature vector into a 20-dim one:
self.fc21 = nn.Linear(400, 20)
logvar = fc21(x)
I'm just not mathematically understanding how you take the gradient of this with respect to the weight matrix of fc21. Mathematically I thought this would look like:
KL = -.5 * sum(1 + Wx + b - m^2 - e^{Wx + b})
dKL/dW = -.5 * (x - e^{Wx + b} * x)
where W is the weight matrix of the fc21 layer. But this result isn't the same shape as W (20x400); x is just a 400-dimensional feature vector. So how would I perform SGD on this? Does x just broadcast to the second term, and if so why? I feel like I'm just missing some mathematical understanding here...
Let's simplify the example a bit and assume a fully connected layer of input shape 3 and output shape 2, then:
W = [[w1, w2, w3], [w4, w5, w6]]
x = [x1, x2, x3]
b = [b1, b2]
y = [y1, y2] = [w1*x1 + w2*x2 + w3*x3 + b1, w4*x1 + w5*x2 + w6*x3 + b2]
D_KL = -0.5 * [ (1 + y1 - m1^2 - e^y1) + (1 + y2 - m2^2 - e^y2) ]
grad(D_KL, w1) = -0.5 * [x1 - x1 * e^y1]
grad(D_KL, w2) = -0.5 * [x2 - x2 * e^y1]
...
grad(D_KL, w4) = -0.5 * [x1 - x1 * e^y2]
...
grad(D_KL, W) = [[grad(D_KL, w1), grad(D_KL, w2), grad(D_KL, w3)],
                 [grad(D_KL, w4), grad(D_KL, w5), grad(D_KL, w6)]]
This generalizes to higher-order tensors of any dimensionality. Your differentiation is wrong in that it treats x and W as scalars rather than taking element-wise partial derivatives.
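If it helps, here is a small check I would do (hypothetical 3-to-2 shapes as above, not part of the original answer) comparing the element-wise formula against PyTorch autograd:
import torch

torch.manual_seed(0)
W = torch.randn(2, 3, requires_grad=True)   # weights of the fc21-like layer
b = torch.randn(2)
x = torch.randn(3)
mu = torch.randn(2)

logvar = W @ x + b
KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
KLD.backward()

# dKLD/dW[k, j] = -0.5 * (1 - exp(logvar[k])) * x[j]  -> an outer product, shape (2, 3)
with torch.no_grad():
    manual = -0.5 * (1 - logvar.exp()).unsqueeze(1) * x.unsqueeze(0)
print(torch.allclose(W.grad, manual))  # True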
Hello, I have recently been working on image de-blurring, and I just want to know whether I can break the standard image degradation model (for an image of a traffic signal where different vehicles are moving in different directions)
g(x,y) = H[f(x,y)] + n(x,y)
into something like this:
g1(x,y) = H1[f1(x,y)] + n(x,y) ;
g2(x,y) = H2[f2(x,y)] + n(x,y) ;
g3(x,y) = H3[f3(x,y)] + n(x,y) ;
...
gm(x,y) = Hm[fm(x,y)] + n(x,y)
Here I am assuming that the whole image is degraded by different degradation functions, and the same noise is added to the different parts of the image.
Here f1(x,y) + f2(x,y) + ... + fm(x,y) = f(x,y).
Please suggest the correct concept, and tell me if I am going about this the wrong way.
I found the answer to this question with slight changes...
We can decompose the image degradation model
g(x,y) = H[f(x,y)] + n(x,y)
as
g(x,y) = H[f1(x,y)] + H[f2(x,y)] + H[f3(x,y)] + ... + H[fn(x,y)]
using the concept of linearity: if
g(x,y) = H[k1*f1(x,y) + k2*f2(x,y)]
       = k1*H[f1(x,y)] + k2*H[f2(x,y)]
and if k1 = k2 = 1, then
g(x,y) = H[f1(x,y)] + H[f2(x,y)]
and similarly we can get the following form:
g(x,y) = H[f1(x,y)] + H[f2(x,y)] + H[f3(x,y)] + ... + H[fn(x,y)]
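As a small numerical sketch of that linearity property (my own illustration, assuming H is a linear, shift-invariant blur, i.e. convolution with a kernel h), one can check that H[f1 + f2] = H[f1] + H[f2]:
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
f1 = rng.random((32, 32))
f2 = rng.random((32, 32))
h = np.ones((5, 5)) / 25.0                 # simple box-blur degradation kernel

lhs = convolve2d(f1 + f2, h, mode='same')  # H[f1 + f2]
rhs = convolve2d(f1, h, mode='same') + convolve2d(f2, h, mode='same')
print(np.allclose(lhs, rhs))               # True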