This is the function I have for calculating image derivatives. Please help me understand this code, as I am new to this field. If anyone could give me some links to understand the concept, I'd be grateful. Some doubts that I have:
Why are we using ndgrid here?
What are the directions 'x', 'y', 'xx', ('xy', 'yx'), 'yy' here?
And how and why does the formula for this Gaussian change according to the direction?
Why are we using imfilter at the end?
function D = calc_image_derivatives(I, sigma, direction)
[x, y] = ndgrid(floor(-3*sigma):ceil(3*sigma), floor(-3*sigma):ceil(3*sigma));
switch (direction)
    case 'x'
        DGauss = -(x./(2*pi*sigma^4)) .* exp(-(x.^2 + y.^2)/(2*sigma^2));
    case 'y'
        DGauss = -(y./(2*pi*sigma^4)) .* exp(-(x.^2 + y.^2)/(2*sigma^2));
    case 'xx'
        DGauss = 1/(2*pi*sigma^4) * (x.^2/sigma^2 - 1) .* exp(-(x.^2 + y.^2)/(2*sigma^2));
    case {'xy', 'yx'}
        DGauss = 1/(2*pi*sigma^6) * (x .* y) .* exp(-(x.^2 + y.^2)/(2*sigma^2));
    case 'yy'
        DGauss = 1/(2*pi*sigma^4) * (y.^2/sigma^2 - 1) .* exp(-(x.^2 + y.^2)/(2*sigma^2));
end
D = imfilter(I, DGauss, 'conv', 'replicate');
This code calculates various directional derivatives of the image: 'x'/'y' are the first derivatives and 'xx'/'yy'/'xy' are the second derivatives. The digital filter used for differentiation is a 2D Gaussian of standard deviation sigma, differentiated by the appropriate partial derivative (for example, in the case 'xx', the Gaussian is differentiated twice with respect to x), which is why the formula changes with the direction. From your question, I'm not sure you're familiar with the notion of a partial derivative; you can Google it.
ndgrid is used to create grid matrices; this is a very commonly used approach in MATLAB. Perhaps you know the function meshgrid: it is the same, except that ndgrid can also create grid matrices of higher dimensions.
imfilter is used to filter the image with the digital filter (by default imfilter computes a correlation, but the 'conv' option here makes it a true convolution). The result is an estimate of the required derivative.
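If it helps, here is a rough NumPy/SciPy sketch of the same idea (an illustration of the MATLAB code above, not part of it; function and variable names are mine): np.mgrid plays the role of ndgrid, the kernel formulas are the Gaussian's partial derivatives exactly as above, and scipy.ndimage.convolve with mode='nearest' stands in for imfilter(..., 'conv', 'replicate').

import numpy as np
from scipy import ndimage

def calc_image_derivatives_py(I, sigma, direction):
    """NumPy/SciPy sketch mirroring the MATLAB function above."""
    # np.mgrid plays the role of ndgrid: x and y hold the coordinates of
    # every sample of the kernel support (roughly +/- 3*sigma).
    r = int(np.ceil(3 * sigma))
    x, y = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))     # plain 2-D Gaussian
    if direction == 'x':                            # dG/dx
        DGauss = -(x / (2 * np.pi * sigma**4)) * g
    elif direction == 'y':                          # dG/dy
        DGauss = -(y / (2 * np.pi * sigma**4)) * g
    elif direction == 'xx':                         # d2G/dx2
        DGauss = (x**2 / sigma**2 - 1) / (2 * np.pi * sigma**4) * g
    elif direction in ('xy', 'yx'):                 # d2G/dxdy
        DGauss = (x * y) / (2 * np.pi * sigma**6) * g
    elif direction == 'yy':                         # d2G/dy2
        DGauss = (y**2 / sigma**2 - 1) / (2 * np.pi * sigma**4) * g
    else:
        raise ValueError("unknown direction: " + direction)
    # Convolving the image with the differentiated Gaussian estimates the
    # corresponding image derivative; 'nearest' mimics imfilter's 'replicate'.
    return ndimage.convolve(np.asarray(I, dtype=float), DGauss, mode='nearest')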
I am trying to implement convolution by hand in Julia. I'm not too familiar with image processing or Julia, so maybe I'm biting off more than I can chew.
Anyway, when I apply this method with a 3x3 edge filter edge = [0 -1 0; -1 4 -1; 0 -1 0] as convolve(img, edge), I get an error saying that my values exceed the allowed range for the RGBA type.
Code
function convolve(img::Matrix{<:Any}, kernel)
    (half_kernel_w, half_kernel_h) = size(kernel) .÷ 2
    (width, height) = size(img)
    cpy_im = copy(img)
    for row ∈ 1+half_kernel_h:height-half_kernel_h
        for col ∈ 1+half_kernel_w:width-half_kernel_w
            from_row, to_row = row .+ (-half_kernel_h, half_kernel_h)
            from_col, to_col = col .+ (-half_kernel_h, half_kernel_h)
            cpy_im[row, col] = sum((kernel .* RGB.(img[from_row:to_row, from_col:to_col])))
        end
    end
    cpy_im
end
Error (original)
ArgumentError: element type FixedPointNumbers.N0f8 is an 8-bit type representing 256 values from 0.0 to 1.0, but the values (-0.0039215684f0, -0.007843137f0, -0.007843137f0, 1.0f0) do not lie within this range.
See the READMEs for FixedPointNumbers and ColorTypes for more information.
I am able to identify a simple case where such an error may occur (a white pixel surrounded by all black pixels, or vice versa). I tried "fixing" this by attempting to follow the advice here from another Stack Overflow question, but I get more errors to the effect of Math on colors is deliberately undefined in ColorTypes, but see the ColorVectorSpace package.
Code attempting to apply solution from the other SO question
function convolve(img::Matrix{<:Any}, kernel)
    (half_kernel_w, half_kernel_h) = size(kernel) .÷ 2
    (width, height) = size(img)
    cpy_im = copy(img)
    for row ∈ 1+half_kernel_h:height-half_kernel_h
        for col ∈ 1+half_kernel_w:width-half_kernel_w
            from_row, to_row = row .+ [-half_kernel_h, half_kernel_h]
            from_col, to_col = col .+ [-half_kernel_h, half_kernel_h]
            cpy_im[row, col] = sum((kernel .* RGB.(img[from_row:to_row, from_col:to_col] ./ 2 .+ 128)))
        end
    end
    cpy_im
end
Corresponding error
MethodError: no method matching +(::ColorTypes.RGBA{Float32}, ::Int64)
Math on colors is deliberately undefined in ColorTypes, but see the ColorVectorSpace package.
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:591
+(!Matched::T, ::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at int.jl:87
+(!Matched::ChainRulesCore.AbstractThunk, ::Any) at ~/.julia/packages/ChainRulesCore/a4mIA/src/tangent_arithmetic.jl:122
Now, I can try using convert etc., but when I look at the big picture, I start to wonder what the idiomatic way of solving this problem in Julia is. And that is my question. If you had to implement convolution by hand from scratch, what would be a good way to do so?
EDIT:
Here is an implementation that works, though it may not be idiomatic
function convolve(img::Matrix{<:Any}, kernel)
    (half_kernel_h, half_kernel_w) = size(kernel) .÷ 2
    (height, width) = size(img)
    cpy_im = copy(img)
    # println(Dict("width" => width, "height" => height, "half_kernel_w" => half_kernel_w, "half_kernel_h" => half_kernel_h, "row range" => 1+half_kernel_h:(height-half_kernel_h), "col range" => 1+half_kernel_w:(width-half_kernel_w)))
    for row ∈ 1+half_kernel_h:(height-half_kernel_h)
        for col ∈ 1+half_kernel_w:(width-half_kernel_w)
            from_row, to_row = row .+ (-half_kernel_h, half_kernel_h)
            from_col, to_col = col .+ (-half_kernel_w, half_kernel_w)
            vals = Dict()
            for method ∈ [red, green, blue, alpha]
                x = sum((kernel .* method.(img[from_row:to_row, from_col:to_col])))
                if x > 1
                    x = 1
                elseif x < 0
                    x = 0
                end
                vals[method] = x
            end
            cpy_im[row, col] = RGBA(vals[red], vals[green], vals[blue], vals[alpha])
        end
    end
    cpy_im
end
First of all, the error
Math on colors is deliberately undefined in ColorTypes, but see the ColorVectorSpace package.
should direct you to read the docs of the ColorVectorSpace package, where you will learn that using ColorVectorSpace will now enable math on RGB types. (The absence of default support is deliberate, because the way the image-processing community treats RGB is colorimetrically wrong. But everyone has agreed not to care, hence the ColorVectorSpace package.)
Second,
ArgumentError: element type FixedPointNumbers.N0f8 is an 8-bit type representing 256 values from 0.0 to 1.0, but the values (-0.0039215684f0, -0.007843137f0, -0.007843137f0, 1.0f0) do not lie within this range.
indicates that you're trying to write negative entries with an element type, N0f8, that can't support such values. Instead of cpy_im = copy(img), consider something like cpy_im = [float(c) for c in img] which will guarantee a floating-point representation that can support negative values.
Third, I would recommend avoiding steps like RGB.(img...) when nothing about your function otherwise addresses whether images are numeric, grayscale, or color. Fundamentally the only operations you need are scalar multiplication and addition, and it's better to write your algorithm generically leveraging only those two properties.
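To illustrate the first two points in a language-neutral way, here is a minimal NumPy sketch (not Julia, and only an illustration): the image is converted to a plain floating-point array before accumulating, so negative intermediate values are representable, and only the interior "valid" region is written. Like the original code, it does not flip the kernel, so strictly speaking it computes a correlation.

import numpy as np

def convolve_naive(img, kernel):
    """Naive filtering of a 2-D image; border pixels are left untouched."""
    img = np.asarray(img, dtype=float)       # float accumulation: negatives are fine
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    hh, hw = kh // 2, kw // 2
    out = img.copy()
    for r in range(hh, img.shape[0] - hh):
        for c in range(hw, img.shape[1] - hw):
            patch = img[r - hh:r + hh + 1, c - hw:c + hw + 1]
            out[r, c] = np.sum(kernel * patch)
    return out

# edge = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])
# filtered = convolve_naive(np.random.rand(32, 32), edge)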
Tim Holy's answer above is correct - keep things simple and avoid relying on third-party packages when you don't need to.
I might point out that another option you may not have considered is to use a different algorithm. What you are implementing is the naive method, whereas many convolution routines use different algorithms for different sizes, such as im2col and Winograd (you can look these two up; I have a website that covers the idea behind both here).
The im2col routine might be worth doing, as essentially you can break the routine into several pieces:
Unroll all 'regions' of the image that will be dot-multiplied with the filter/kernel, and stack them together into a single matrix.
Do a matrix-multiply with the unrolled input and filter/kernel.
Roll the output back into the correct shape.
It might be more complicated overall, but each part is simpler, so you may find this easier to do. A matrix multiply routine is definitely quite easy to implement. For 1x1 (single-pixel) convolutions where the image and filter have the same ordering (i.e. NCHW images and FCHW filter) the first and last steps are trivial as essentially no rolling/unrolling is necessary.
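As a sketch of the im2col idea (again in NumPy rather than Julia, single-channel only; the helper names are mine):

import numpy as np

def im2col(img, kh, kw):
    """Stack every kh-by-kw patch of img as one column of a matrix."""
    H, W = img.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, out_h * out_w))
    idx = 0
    for r in range(out_h):
        for c in range(out_w):
            cols[:, idx] = img[r:r + kh, c:c + kw].ravel()
            idx += 1
    return cols, (out_h, out_w)

def conv_im2col(img, kernel):
    cols, out_shape = im2col(img, *kernel.shape)
    # One matrix multiply replaces all the per-pixel dot products;
    # the flat result is then rolled back into the output image shape.
    return (kernel.ravel() @ cols).reshape(out_shape)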
A final word of advice: start simpler and add the code to handle edge cases afterwards; convolutions are definitely fiddly to work with.
Hope this helps!
I need to solve the following equation:
I know the matrix G; how can I find the matrix p subject to ||p|| = 1?
Currently I am solving it in OpenCV as follows:
Mat w, u, EigenVectors;
SVD::compute(A, w, u, EigenVectors);
Mat p = EigenVectors.row(EigenVectors.rows-1);
I want to know how I can ensure the condition ||p|| = 1.
Also, what is the significance and meaning of the other rows/columns of EigenVectors (transposed)?
I believe you can use cv::SVD::solveZ(). It finds a unit-length solution x of a singular linear system A * x = 0.
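The NumPy equivalent is essentially a one-liner (a sketch, with my own function name): the rows of Vt returned by the SVD are orthonormal, so the row corresponding to the smallest singular value is already a unit-length minimizer of ||A*p||, which is why no extra normalization is needed.

import numpy as np

def null_unit_vector(A):
    """Unit-norm p minimizing ||A @ p|| (the analogue of cv::SVD::solveZ)."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]   # right singular vector of the smallest singular value;
                    # ||p|| = 1 automatically because Vt is orthonormal

# A = np.random.rand(10, 4)
# p = null_unit_vector(A); print(np.linalg.norm(p))   # -> 1.0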
Looks like you need to use the Lagrange multipliers method.
As far as I know, OpenCV has no ready-to-use tools for that.
Good example for MATLAB: Lagrange Multipliers
I am trying to find a simple algorithm to find the correspondence between two sets of 2D points (registration). One set contains the template of an object I'd like to find and the second set mostly contains points that belong to the object of interest, but it can be noisy (missing points as well as additional points that do not belong to the object). Both sets contain roughly 40 points in 2D. The second set is a homography of the first set (translation, rotation and perspective transform).
I am interested in finding an algorithm for registration in order to get the point-correspondence. I will be using this information to find the transform between the two sets (all of this in OpenCV).
Can anyone suggest an algorithm, library or small bit of code that could do the job? As I'm dealing with small sets, it does not have to be super optimized. Currently, my approach is a RANSAC-like algorithm:
Choose 4 random points from set 1 and from set 2.
Compute transform matrix H (using openCV getPerspective())
Warp the 1st set of points using H and test how well they align with the 2nd set of points.
Repeat 1-3 N times and choose the best transform according to some metric (e.g. sum of squares).
Any ideas? Thanks for your input.
With Python you can use the Open3D library, which is very easy to install in Anaconda. For your purpose ICP should work fine, so we'll use classical ICP, which minimizes the point-to-point distances between closest points in every iteration. Here is the code to register two clouds:
import numpy as np
import open3d as o3d
# Parameters:
initial_T = np.identity(4) # Initial transformation for ICP
distance = 0.1  # The threshold distance used for searching correspondences
                # (closest points between clouds). I'm setting it to 10 cm.
# Read your point clouds:
source = o3d.io.read_point_cloud("point_cloud_1.xyz")
target = o3d.io.read_point_cloud("point_cloud_0.xyz")
# Define the type of registration:
type = o3d.pipelines.registration.TransformationEstimationPointToPoint(False)
# "False" means rigid transformation, scale = 1
# Define the number of iterations (I'll use 100):
iterations = o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration = 100)
# Do the registration:
result = o3d.pipelines.registration.registration_icp(source, target, distance, initial_T, type, iterations)
result is an object holding four things: the transformation T (4x4), two metrics (RMSE and fitness) and the set of correspondences.
To access the transformation:
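For example (a sketch; these are the standard fields of Open3D's RegistrationResult):

T = result.transformation                  # 4x4 homogeneous transformation matrix
print(result.fitness, result.inlier_rmse)  # overlap ratio and RMSE of the inliers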
I have used it a lot with 3D clouds obtained from Terrestrial Laser Scanners (TLS) and from robots (Velodyne LIDAR).
With MATLAB:
We'll use point-to-point ICP again, because your data is 2D. Here is a minimal example with two point clouds randomly generated inside a triangle shape:
% Triangle vertices:
V1 = [-20, 0; -10, 10; 0, 0];
V2 = [-10, 0; 0, 10; 10, 0];
% Create clouds and show the pair:
points = 5000;
N1 = criar_nuvem_triangulo(V1, points);
N2 = criar_nuvem_triangulo(V2, points);
pcshowpair(N1, N2)
% Register pair N1->N2 and show:
[T, N1_transformed, RMSE] = pcregistericp(N1, N2, 'Metric', 'pointToPoint', 'MaxIterations', 100);
pcshowpair(N1_transformed, N2)
"criar_nuvem_triangulo" is a function to generate random point clouds inside a triangle:
function [cloud] = criar_nuvem_triangulo(V, N)
% Function which creates a 2D point cloud in triangle format using random points
% Parameters: V = triangle vertices (3x2 matrix) | N = number of points
t = sqrt(rand(N, 1));
s = rand(N, 1);
P = (1 - t) * V(1, :) + bsxfun(@times, ((1 - s) * V(2, :) + s * V(3, :)), t);
points = [P, zeros(N, 1)];
cloud = pointCloud(points);
end
results:
You may just use cv::findHomography. It is a RANSAC-based approach around cv::getPerspectiveTransform.
auto H = cv::findHomography(srcPoints, dstPoints, CV_RANSAC,3);
Where 3 is the reprojection threshold.
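In Python the call looks like this (a sketch with made-up placeholder data; in practice srcPoints and dstPoints are your two N x 2 arrays of candidate correspondences, in matching order):

import numpy as np
import cv2

# Placeholder data: dstPoints is srcPoints mapped through a known homography
# plus a little noise; replace both arrays with your real point sets.
srcPoints = (np.random.rand(40, 2) * 100).astype(np.float32)
H_true = np.array([[1.0,  0.05, 10.0],
                   [0.02, 1.0,  -5.0],
                   [1e-4, 0.0,   1.0]], dtype=np.float32)
dstPoints = cv2.perspectiveTransform(srcPoints.reshape(-1, 1, 2), H_true).reshape(-1, 2)
dstPoints += (np.random.randn(*dstPoints.shape) * 0.5).astype(np.float32)

H, mask = cv2.findHomography(srcPoints, dstPoints, cv2.RANSAC, 3.0)
inliers = mask.ravel().astype(bool)   # which correspondences RANSAC kept as inliers

Note that findHomography still assumes the two arrays are ordered correspondences; it is robust to outliers among them, not to a completely unknown matching.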
One traditional approach to solving your problem is to use a point-set registration method when you don't have matching-pair information. Point-set registration is similar to the method you are talking about. You can find a MATLAB implementation here.
Thanks
I am working on replicating a neural network. I'm trying to get an understanding of how the standard layer types work. In particular, I'm having trouble finding a description anywhere of how cross-channel normalisation layers behave on the backward-pass.
Since the normalization layer has no parameters, I could guess two possible options:
The error gradients from the next (i.e. later) layer are passed backwards without doing anything to them.
The error gradients are normalized in the same way the activations are normalized across channels in the forward pass.
I can't think of a reason why you'd do one over the other based on any intuition, which is why I'd like some help on this.
EDIT1:
The layer is a standard layer in caffe, as described here http://caffe.berkeleyvision.org/tutorial/layers.html (see 'Local Response Normalization (LRN)').
The layer's implementation in the forward pass is described in section 3.3 of the alexNet paper: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
EDIT2:
I believe the forward and backward pass algorithms are described in both the Torch library here: https://github.com/soumith/cudnn.torch/blob/master/SpatialCrossMapLRN.lua
and in the Caffe library here: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lrn_layer.cpp
Please could anyone who is familiar with either/both of these translate the method for the backward pass stage into plain english?
It uses the chain rule to propagate the gradient backwards through the local response normalization layer. It is somewhat similar to a nonlinearity layer in this sense (which also doesn't have trainable parameters on its own, but does affect gradients going backwards).
From the code in Caffe that you linked to I see that they take the error in each neuron as a parameter, and compute the error for the previous layer by doing the following:
First, on the forward pass they cache a so-called scale, which is computed (in terms of the AlexNet paper, see the formula in section 3.3) as:
scale_i = k + alpha / n * sum(a_j ^ 2)
Here and below, sum is the sum indexed by j, going from max(0, i - n/2) to min(N, i + n/2)
(note that in the paper they do not normalize by n, so I assume this is something that Caffe does differently from AlexNet). The forward pass is then computed as b_i = a_i * scale_i ^ -beta.
To backward propagate the error, let's say that the error coming from the next layer is be_i, and the error that we need to compute is ae_i. Then ae_i is computed as:
ae_i = scale_i ^ -beta * be_i - (2 * alpha * beta / n) * a_i * sum(be_j * b_j / scale_j)
Since you are planning to implement it manually, I will also share two tricks that Caffe uses in its code that make the implementation simpler:
When you compute the addends for the sum, allocate an array of size N + n - 1, and pad it with n/2 zeros on each end. This way you can compute the sum from i - n/2 to i + n/2, without caring about going below zero and beyond N.
You don't need to recompute the sum on each iteration. Instead, compute the addends in advance (a_j^2 for the forward pass, be_j * b_j / scale_j for the backward pass), then compute the sum for i = 0, and then for each consecutive i just add addend[i + n/2] and subtract addend[i - n/2 - 1]; this gives you the value of the sum for the new value of i in constant time.
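Put together, an unoptimized NumPy sketch of both passes at a single spatial location might look like this (my own illustration of the formulas above, using the Caffe-style alpha/n scaling and AlexNet's usual hyperparameters; the O(N*n) inner sums are exactly what the running-sum trick in point 2 speeds up):

import numpy as np

def lrn_forward(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Cross-channel LRN at one spatial location; a holds one value per channel."""
    N = a.size
    scale = np.empty(N)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N, i + n // 2 + 1)   # local channel window
        scale[i] = k + alpha / n * np.sum(a[lo:hi] ** 2)
    b = a * scale ** (-beta)
    return b, scale

def lrn_backward(a, b, scale, be, n=5, alpha=1e-4, beta=0.75):
    """be = dL/db from the next layer; returns ae = dL/da for the previous one."""
    N = a.size
    ae = np.empty(N)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N, i + n // 2 + 1)
        ae[i] = (scale[i] ** (-beta) * be[i]
                 - (2 * alpha * beta / n) * a[i]
                 * np.sum(be[lo:hi] * b[lo:hi] / scale[lo:hi]))
    return ae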
Of course, you can also either print the variables to observe how they change, or step through in debug mode to see how the errors change while passing through the net.
I have an alternative formulation of the backward pass and I don't know if it is equivalent to Caffe's:
So Caffe's is:
ae_i = scale_i ^ -beta * be_i - (2 * alpha * beta / n) * a_i * sum(be_j * b_j / scale_j)
by differentiating the original expression
b_i = a_i/(scale_i^-b)
I get
ae_i = scale_i ^ -b * be_i - (2 * alpha * beta / n) * a_i * be_i*sum(ae_j)/scale_i^(-b-1)
I am trying to extract Rotation matrix and Translation vector from the essential matrix.
SVD svd(E,SVD::MODIFY_A);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W( 0, -1, 0,
           1,  0, 0,
           0,  0, 1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; //or svd_u * Mat(W) * svd_vt;
Mat_<double> t = svd_u.col(2); //or -svd_u.col(2)
However, when I use R and T (e.g. to obtain rectified images), the result does not seem to be right (black images or other obviously wrong outputs), even though I tried the different combinations of possible R and T.
I suspect E. According to the textbooks, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w, which is supposed to be diag(1, 1, 0) (at least up to scale), is not. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and the calculation of R and T done the right way?
If the calculation is right, why are the internal constraints of the essential matrix not satisfied by the results?
If everything about E, R, and T is fine, why are the rectified images obtained from them not correct?
I get E from the fundamental matrix, which I believe to be correct: the epipolar lines I draw on both the left and right images all pass through the corresponding points (for all 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
Thanks!
I see two issues.
First, discounting the negligible value of the third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100. Sounds small, but translated into pixel error it may be an altogether larger amount.
So I'd start by replacing E with the ideal one after SVD decomposition: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with the 3D points in front of the camera). You may be hitting one of the non-physical ones. See http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E .
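For illustration, here is a NumPy sketch of both steps (my own, not taken from OpenCV's source): project E onto an ideal essential matrix, then enumerate the four candidate (R, t) pairs; picking the right one still needs a cheirality check (triangulated points in front of both cameras). Newer OpenCV versions also ship cv::decomposeEssentialMat and cv::recoverPose, which do essentially this for you.

import numpy as np

def decompose_essential(E):
    U, _, Vt = np.linalg.svd(E)
    # Make sure the factors yield proper rotations (det = +1)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    Er = U @ np.diag([1.0, 1.0, 0.0]) @ Vt        # "ideal" essential matrix
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]
    # Four candidates; only one puts the triangulated 3D points in front
    # of both cameras (the cheirality test, not shown here).
    return Er, [(R1, t), (R1, -t), (R2, t), (R2, -t)]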