Can I decompose the image deblurring model? - image-processing

Hello, I have recently started working on image deblurring. I want to know whether I can break up the standard image degradation model (for an image of a traffic signal where different vehicles are moving in different directions)
g(x,y) = H[f(x,y)] + n(x,y)
into region-wise models like this:
g1(x,y) = H1[f1(x,y)] + n(x,y) ;
g2(x,y) = H2[f2(x,y)] + n(x,y) ;
g3(x,y) = H3[f3(x,y)] + n(x,y) ;
gm(x,y) = Hm[fm(x,y)] + n(x,y)
Here I am assuming that different parts of the image are degraded by different degradation functions, and that the same noise is added to each part of the image.
Here f1(x,y) + f2(x,y) + ... + fm(x,y) = f(x,y).
Please suggest the correct concept, and tell me if I am going the wrong way.

I found the answer to this question, with slight changes.
We can decompose the image degradation model
g(x,y) = H[f(x,y)] + n(x,y)
as
g(x,y) = H[f1(x,y)] + H[f2(x,y)] + H[f3(x,y)] + ... + H[fm(x,y)] + n(x,y)
using the concept of linearity. If H is linear, then for constants k1 and k2
H[k1*f1(x,y) + k2*f2(x,y)] = k1*H[f1(x,y)] + k2*H[f2(x,y)]
and if k1 = k2 = 1, then
H[f1(x,y) + f2(x,y)] = H[f1(x,y)] + H[f2(x,y)]
Applying this repeatedly to f(x,y) = f1(x,y) + f2(x,y) + ... + fm(x,y) gives the form
g(x,y) = H[f1(x,y)] + H[f2(x,y)] + H[f3(x,y)] + ... + H[fm(x,y)] + n(x,y)
Note that this uses one linear H for every part; the noise term n(x,y) is added once to the sum.
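The linearity argument above can be checked numerically. The following is a minimal sketch, assuming H is a circular convolution with a small motion-blur kernel; the function name blur, the kernels, and the random test images are illustrative choices and are not from the original post. It also shows that blurring each part with a different kernel generally does not equal one blur applied to the whole image.

```python
import numpy as np

def blur(image, kernel):
    """Circular convolution via the FFT: one example of a linear operator H."""
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel  # zero-pad the kernel to the image size
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

rng = np.random.default_rng(0)
f1 = rng.random((16, 16))  # hypothetical "part 1" of the scene
f2 = rng.random((16, 16))  # hypothetical "part 2" of the scene

horizontal = np.ones((1, 5)) / 5.0  # assumed horizontal motion blur
vertical = np.ones((5, 1)) / 5.0    # assumed vertical motion blur

# Same H on every part: H[f1 + f2] equals H[f1] + H[f2].
whole = blur(f1 + f2, horizontal)
parts = blur(f1, horizontal) + blur(f2, horizontal)
linear_ok = np.allclose(whole, parts)

# Different H per part: H1[f1] + H2[f2] is a different image in general.
mixed = blur(f1, horizontal) + blur(f2, vertical)
same_as_single_blur = np.allclose(mixed, whole)

print(linear_ok, same_as_single_blur)
```

In this sketch the per-part model with different kernels is still a sum of linear terms, but it can no longer be collapsed into a single H applied to f.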


Should I exit my gradient descent loop as soon as the cost increases?

I'm trying to learn machine learning so I'm taking a course and currently studying gradient descent for linear regression. I just learned that if the learning rate is small enough, the value returned by the cost function should continuously decrease until convergence. When I imagine this being done in a loop of code, it seems like I could just keep track of what the cost was in the previous iteration and exit the loop if the new cost is greater than the previous, since this tells us the learning rate is too large. I'd like to hear opinions since I'm new to this, but in an effort to not make this question primarily opinion-based my main question is this: Would there be anything wrong with this method of detecting a learning rate that needs to be decreased? I'd appreciate an example of when this method would fail, if possible.
In the example below, we vary the learning rate eta = 10^k with k = -6, -5, -4, ..., -1:
import math
import numpy as np

# Rosenbrock function; its minimum is f(1, 1) = 0
def f(x):
    return 100 * (x[0] * x[0] - x[1]) ** 2 + (x[0] - 1) ** 2

# Analytic gradient of f
def df(x):
    a = x[0] * x[0] - x[1]
    ret = np.zeros(2)
    ret[0] = 400 * a * x[0] + 2 * (x[0] - 1)
    ret[1] = -200 * a
    return ret

for k in range(-6, 0):
    eta = math.pow(10.0, k)
    print("eta: " + str(eta))
    x = -np.ones(2)
    for iter in range(1000000):
        fx = f(x)
        if fx < 1e-10:
            print("  solved after " + str(iter) + " iterations; f(x) = " + str(f(x)))
            break
        if fx > 1e10:
            print("  divergence detected after " + str(iter) + " iterations; f(x) = " + str(f(x)))
            break
        g = df(x)
        x -= eta * g
        if iter == 999999:
            print("  not solved; f(x) = " + str(f(x)))
For too small learning rates, the optimization is very slow and the problem is not solved within the iteration budget.
For too large learning rates, the optimization process becomes unstable and diverges very quickly. The learning rate must be "just right" for the optimization process to work well.
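The check proposed in the question (compare the new cost with the previous one) can also be used to shrink the learning rate instead of exiting. The following is a rough sketch of that idea on the same test function, assuming full-batch, noise-free gradients; the function name descend_with_halving and the halving factor 0.5 are illustrative choices, not a recommendation from the answer above.

```python
import numpy as np

def rosenbrock(x):
    return 100.0 * (x[0] * x[0] - x[1]) ** 2 + (x[0] - 1.0) ** 2

def rosenbrock_grad(x):
    a = x[0] * x[0] - x[1]
    return np.array([400.0 * a * x[0] + 2.0 * (x[0] - 1.0), -200.0 * a])

def descend_with_halving(x0, eta, max_iter=10000):
    """Gradient descent that rejects any step which increases the cost."""
    x = np.array(x0, dtype=float)
    fx = rosenbrock(x)
    for _ in range(max_iter):
        candidate = x - eta * rosenbrock_grad(x)
        f_candidate = rosenbrock(candidate)
        if f_candidate > fx:
            eta *= 0.5  # cost went up: keep x, retry with a smaller step
            continue
        x, fx = candidate, f_candidate  # cost did not go up: accept the step
    return x, fx, eta

start = [-1.0, -1.0]
initial_cost = rosenbrock(np.array(start))
x_final, final_cost, final_eta = descend_with_halving(start, eta=1.0)
print(initial_cost, final_cost, final_eta)
```

With stochastic (mini-batch) gradients the cost can rise on a single step even when the learning rate is reasonable, so a strict "exit on any increase" rule would stop too early in that setting.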

Are multiple layers in LSTM gates useful?

I'm implementing an LSTM cell from scratch, and I was thinking of computing the gates with multiple neural layers instead of the usual single-layer version: sigmoid(dot(W,concat(a_prev,xt)) + b). I can't seem to find any literature on it. Does it work? Can it converge?
This is the standard LSTM cell forward propagation code that I learned in Andrew Ng's Deep Learning course:
# Stack a_prev and xt into one input matrix
concat = np.zeros((n_a + n_x, m))
concat[: n_a, :] = a_prev
concat[n_a :, :] = xt
ft = sigmoid(np.dot(Wf, concat) + bf)
it = sigmoid(np.dot(Wi, concat) + bi)
cct = np.tanh(np.dot(Wc, concat) + bc)
c_next = ft * c_prev + it * cct
ot = sigmoid(np.dot(Wo, concat) + bo)
a_next = ot * np.tanh(c_next)
# Compute prediction of the LSTM cell
yt_pred = softmax(np.dot(Wy, a_next) + by)
This is the LSTM cell I want to use:
# Stack a_prev and xt into one input matrix
concat = np.zeros((n_a + n_x, m))
concat[: n_a, :] = a_prev
concat[n_a:, :] = xt
# Each gate now has two layers; the second layer takes the first layer's output
ft1 = sigmoid(np.dot(Wf1, concat) + bf1)
ft2 = sigmoid(np.dot(Wf2, ft1) + bf2)
it1 = sigmoid(np.dot(Wi1, concat) + bi1)
it2 = sigmoid(np.dot(Wi2, it1) + bi2)
cct1 = np.tanh(np.dot(Wc1, concat) + bc1)
cct2 = np.tanh(np.dot(Wc2, cct1) + bc2)
c_next = ft2 * c_prev + it2 * cct2
ot1 = sigmoid(np.dot(Wo1, concat) + bo1)
ot2 = sigmoid(np.dot(Wo2, ot1) + bo2)
a_next = ot2 * np.tanh(c_next)
# Compute prediction of the LSTM cell
yt_pred1 = softmax(np.dot(Wy1, a_next) + by1)
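To make the shape requirements of such a two-layer gate concrete, here is a small self-contained sketch. The helper two_layer_gate, the sizes, and the random weights are assumptions for illustration only; it checks only that the forward pass runs and that the gate output matches the shape of c_prev, and says nothing about whether training converges.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_gate(concat, W1, b1, W2, b2):
    """Gate computed by two stacked sigmoid layers instead of one."""
    hidden = sigmoid(W1 @ concat + b1)  # first layer sees [a_prev; xt]
    return sigmoid(W2 @ hidden + b2)    # second layer sees the first layer's output

n_a, n_x, m = 4, 3, 5  # hidden size, input size, batch size (arbitrary)
rng = np.random.default_rng(0)

a_prev = rng.standard_normal((n_a, m))
xt = rng.standard_normal((n_x, m))
c_prev = rng.standard_normal((n_a, m))
concat = np.vstack([a_prev, xt])

# First layer maps (n_a + n_x) -> n_a; second layer must map n_a -> n_a
# so that the gate can be multiplied element-wise with c_prev.
Wf1 = rng.standard_normal((n_a, n_a + n_x)) * 0.1
bf1 = np.zeros((n_a, 1))
Wf2 = rng.standard_normal((n_a, n_a)) * 0.1
bf2 = np.zeros((n_a, 1))

ft2 = two_layer_gate(concat, Wf1, bf1, Wf2, bf2)
gated = ft2 * c_prev
print(ft2.shape, gated.shape)
```

The width of the first layer is a free choice in this sketch; only the output width of the last layer is fixed by the cell state.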

Calculating Gradient Update

Let's say I want to manually calculate the gradient update with respect to the Kullback-Leibler divergence loss, say on a VAE (see an actual example from the PyTorch sample documentation here):
KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
where logvar is (for simplicity's sake, ignoring activation functions, multiple layers, etc.) basically a single-layer transformation from a 400-dim feature vector into a 20-dim one:
self.fc21 = nn.Linear(400, 20)
logvar = fc21(x)
I'm just not mathematically understanding how you take the gradient of this, with respect to the weight vector for fc21. Mathematically I thought this would look like:
KL = -.5sum(1 + Wx + b - m^2 - e^{Wx + b})
dKL/dW = -.5 (x - e^{Wx + b}x)
where W is the weight matrix of the fc21 layer. But here this result isn't in the same shape as W (20x400). Like, x is just a 400 feature vector. So how would I perform SGD on this? Does x just broadcast to the second term, and if so why? I feel like I'm just missing some mathematical understanding here...
Let's simplify the example a bit and assume a fully connected layer with input size 3 and output size 2, then:
W = [[w1, w2, w3], [w4, w5, w6]]
x = [x1, x2, x3]
y = [y1, y2] = [w1*x1 + w2*x2 + w3*x3 + b1, w4*x1 + w5*x2 + w6*x3 + b2]
D_KL = -0.5 * [(1 + y1 - m1^2 - e^(y1)) + (1 + y2 - m2^2 - e^(y2))]
grad(D_KL, w1) = -0.5 * [x1 - x1 * e^(y1)]
grad(D_KL, w2) = -0.5 * [x2 - x2 * e^(y1)]
grad(D_KL, w4) = -0.5 * [x1 - x1 * e^(y2)]
grad(D_KL, W) = [[grad(D_KL, w1), grad(D_KL, w2), grad(D_KL, w3)],
[grad(D_KL, w4), grad(D_KL, w5), grad(D_KL, w6)]]
The gradient therefore has the same 2x3 shape as W: entry (i, j) is -0.5 * (1 - e^(y_i)) * x_j, which is the outer product of the vector -0.5 * (1 - e^(y)) with x.
This generalizes for higher order tensors of any dimensionality. Your differentiation is wrong in treating x and W as scalars rather than taking element-wise partial derivatives.
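The element-wise result can be checked against a numerical gradient. The following is a minimal NumPy sketch for the simplified 3-input, 2-output layer, assuming mu does not depend on W; the function names kld and kld_grad_W and the random values are illustrative and not part of the PyTorch example.

```python
import numpy as np

def kld(W, b, x, mu):
    """KL term with logvar = W x + b, as in the simplified example."""
    logvar = W @ x + b
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))

def kld_grad_W(W, b, x):
    """Analytic gradient: entry (i, j) is -0.5 * (1 - exp(logvar_i)) * x_j."""
    logvar = W @ x + b
    return -0.5 * np.outer(1.0 - np.exp(logvar), x)

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3)) * 0.1
b = rng.standard_normal(2) * 0.1
x = rng.standard_normal(3)
mu = rng.standard_normal(2)

analytic = kld_grad_W(W, b, x)

# Central finite differences, one weight at a time.
numeric = np.zeros_like(W)
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = eps
        numeric[i, j] = (kld(W + step, b, x, mu) - kld(W - step, b, x, mu)) / (2 * eps)

print(analytic.shape, np.max(np.abs(analytic - numeric)))
```

Because analytic has the same shape as W, a plain SGD step in this sketch would be W -= learning_rate * analytic.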

Computation of Jacobian matrix in cvRodrigues2

I'm reading source code of opencv: cvProjectPoints2, cvRodrigues2.
In cvProjectPoints2, the Jacobian matrix is first obtained by calling cvRodrigues2( &_r, &matR, &_dRdr ); and is then used to calculate the partial derivatives of the pixel coordinates w.r.t. rvec (the axis-angle representation).
if( dpdr_p )
{
    double dx0dr[] =
    {
        X*dRdr[0] + Y*dRdr[1] + Z*dRdr[2],
        X*dRdr[9] + Y*dRdr[10] + Z*dRdr[11],
        X*dRdr[18] + Y*dRdr[19] + Z*dRdr[20]
    };
    double dy0dr[] =
    {
        X*dRdr[3] + Y*dRdr[4] + Z*dRdr[5],
        X*dRdr[12] + Y*dRdr[13] + Z*dRdr[14],
        X*dRdr[21] + Y*dRdr[22] + Z*dRdr[23]
    };
    double dz0dr[] =
    {
        X*dRdr[6] + Y*dRdr[7] + Z*dRdr[8],
        X*dRdr[15] + Y*dRdr[16] + Z*dRdr[17],
        X*dRdr[24] + Y*dRdr[25] + Z*dRdr[26]
    };
    for( j = 0; j < 3; j++ )
    {
        double dxdr = z*(dx0dr[j] - x*dz0dr[j]);
        double dydr = z*(dy0dr[j] - y*dz0dr[j]);
        double dr2dr = 2*x*dxdr + 2*y*dydr;
        double dcdist_dr = k[0]*dr2dr + 2*k[1]*r2*dr2dr + 3*k[4]*r4*dr2dr;
        double dicdist2_dr = -icdist2*icdist2*(k[5]*dr2dr + 2*k[6]*r2*dr2dr + 3*k[7]*r4*dr2dr);
        double da1dr = 2*(x*dydr + y*dxdr);
        double dmxdr = fx*(dxdr*cdist*icdist2 + x*dcdist_dr*icdist2 + x*cdist*dicdist2_dr +
                           k[2]*da1dr + k[3]*(dr2dr + 2*x*dxdr));
        double dmydr = fy*(dydr*cdist*icdist2 + y*dcdist_dr*icdist2 + y*cdist*dicdist2_dr +
                           k[2]*(dr2dr + 2*y*dydr) + k[3]*da1dr);
        dpdr_p[j] = dmxdr;
        dpdr_p[dpdr_step+j] = dmydr;
    }
    dpdr_p += dpdr_step*2;
}
The shape of dRdr is 3*9, and from how the indices of dRdr are used:
X*dRdr[0] + Y*dRdr[1] + Z*dRdr[2], //-> dx0dr1
X*dRdr[9] + Y*dRdr[10] + Z*dRdr[11], //-> dx0dr2
X*dRdr[18] + Y*dRdr[19] + Z*dRdr[20] //-> dx0dr3
the Jacobian matrix seems to be:
dR1/dr1, dR2/dr1, ..., dR9/dr1,
dR1/dr2, dR2/dr2, ..., dR9/dr2,
dR1/dr3, dR2/dr3, ..., dR9/dr3,
But to my knowledge the Jacobian matrix should be of shape 9*3, since it's derivatives of R(1~9) w.r.t r(1~3):
dR1/dr1, dR1/dr2, dR1/dr3,
dR2/dr1, dR2/dr2, dR2/dr3,
...,
dR9/dr1, dR9/dr2, dR9/dr3,
As the docs of cvRodrigues2 say:
jacobian – Optional output Jacobian matrix, 3x9 or 9x3, which is a
matrix of partial derivatives of the output array components with
respect to the input array components.
So am I misunderstanding the code & docs? Or is the code using other convention? Or is it a bug (not likely...)?
If you look up the docs:
src – Input rotation vector (3x1 or 1x3) or rotation matrix (3x3).
dst – Output rotation matrix (3x3) or rotation vector (3x1 or 1x3), respectively.
jacobian – Optional output Jacobian matrix, 3x9 or 9x3, which is a matrix of partial derivatives of the output array components with respect to the input array components.
As you can see, the source and destination can swap roles (mathematically, the Jacobian is then exactly the transpose), but the code does not account for this.
So you have indeed got a transposed Jacobian, because the first two arguments are in swapped positions relative to the default for their types. Swap them back and you'll get the usual Jacobian!
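The index pattern from the quoted C code can be checked numerically without calling OpenCV. The following is a sketch, assuming the 3x9 layout inferred in the question (row i holds the derivatives of the row-major flattened R with respect to r_i); rodrigues and jacobian_3x9 are hypothetical helper names, and the Jacobian here is a finite-difference approximation, not OpenCV's analytic one.

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    r = np.asarray(r, dtype=float)
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def jacobian_3x9(r, eps=1e-6):
    """Row i = d(R flattened row-major)/d r_i, by central differences."""
    r = np.asarray(r, dtype=float)
    J = np.zeros((3, 9))
    for i in range(3):
        step = np.zeros(3)
        step[i] = eps
        J[i] = (rodrigues(r + step) - rodrigues(r - step)).ravel() / (2 * eps)
    return J

r = np.array([0.1, -0.2, 0.3])
X, Y, Z = 1.0, 2.0, 3.0

J39 = jacobian_3x9(r)
J93 = J39.T            # the 9x3 textbook layout is the transpose
dRdr = J39.ravel()     # flat array indexed like the C code

# Same index pattern as dx0dr[] in the quoted code.
dx0dr = np.array([
    X * dRdr[0] + Y * dRdr[1] + Z * dRdr[2],
    X * dRdr[9] + Y * dRdr[10] + Z * dRdr[11],
    X * dRdr[18] + Y * dRdr[19] + Z * dRdr[20],
])

# Direct derivative of x0 = (R @ [X, Y, Z])[0] with respect to each r_j.
P = np.array([X, Y, Z])
eps = 1e-6
dx0dr_direct = np.array([
    ((rodrigues(r + eps * e) @ P)[0] - (rodrigues(r - eps * e) @ P)[0]) / (2 * eps)
    for e in np.eye(3)
])

print(J39.shape, J93.shape, np.allclose(dx0dr, dx0dr_direct))
```

Under that layout assumption, the indices 0..2, 9..11 and 18..20 pick the derivatives of the first row of R with respect to r1, r2 and r3 respectively.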

finding the rth term of a sequence

The question asks for a possible formula for the rth term.
I was able to solve two of the questions, but I can't solve the rest. I'm studying A-levels, and I think there must be a common rule or an easy way to solve sequence problems. I never understood sequences well enough; they are just that hard for me.
6 18 54 162
I solved this one with 2*3^r.
4 7 12 19
This one with r^2+3.
4 12 24 40 60
I have tried many approaches for this one but can't find the answer. There are not many marks for it, so there should be an easy method, but I can't see it. Please help.
Here's a formula in R for the sequence, with n starting at 0:
g <- function(n) 6*n + 2*n^2 + 4
g(0:4)
[1] 4 12 24 40 60
Here is one way to derive it. First, recognize that the sequence is quadratic, because its differences (8, 12, 16, 20) form an arithmetic sequence, i.e. they are linear.
Then note that g(x + 1) = g(x) + 8 + 4x. Write g(x) = a*x^2 + b*x + c.
g(x+1) = a*(x+1)^2 + b*(x+1) + c = g(x) + 8 + 4x = a*x^2 + b*x + c + 8 + 4x
a*x^2 + 2*a*x + a + b*x + b + c = a*x^2 + b*x + c + 8 + 4x
2*a*x + a + b = 8 + 4x
As this holds for all x, it must be that 2*a*x = 4x, so a = 2. Thus
4x + 2 + b = 8 + 4x
So b = 6. With these known, c is determined by g(0) = c = 4.
If the terms are numbered from r = 1 instead, substitute x = r - 1 to get 2*(r-1)^2 + 6*(r-1) + 4 = 2*r^2 + 2*r = 2*r*(r+1).
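The same difference method can be written as a short routine. This is a sketch in Python rather than R, with a hypothetical helper name quadratic_from_terms; it assumes the sequence has a constant second difference and lets the caller choose whether the first term has index 0 or 1.

```python
def quadratic_from_terms(terms, first_index=0):
    """Fit a*n^2 + b*n + c to a sequence whose second differences are constant."""
    first = [q - p for p, q in zip(terms, terms[1:])]
    second = [q - p for p, q in zip(first, first[1:])]
    if len(set(second)) != 1:
        raise ValueError("second differences are not constant; not a quadratic")
    a = second[0] / 2  # the second difference of a*n^2 + b*n + c is 2a
    n0 = first_index
    # g(n0 + 1) - g(n0) = a*(2*n0 + 1) + b
    b = first[0] - a * (2 * n0 + 1)
    c = terms[0] - a * n0 ** 2 - b * n0
    return a, b, c

from_zero = quadratic_from_terms([4, 12, 24, 40, 60], first_index=0)
from_one = quadratic_from_terms([4, 12, 24, 40, 60], first_index=1)
other = quadratic_from_terms([4, 7, 12, 19], first_index=1)
print(from_zero, from_one, other)
```

For a geometric sequence such as 6 18 54 162 the second differences are not constant, so the routine raises an error; there the constant ratio between terms is the clue instead.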
