Vectorizing distance to several points on Octave (Matlab) - vectorization

I'm writing a k-means algorithm. At each step, I want to compute the distance of my n points to k centroids, without a for loop, and for d dimensions.
The problem is I have a hard time splitting on my number of dimensions with the Matlab functions I know. Here is my current code, with x being my n 2D-points and y my k centroids (also 2D-points of course), and with the points distributed along dimension 1, and the spatial coordinates along the dimension 2:
dist = #(a,b) (a - b).^2;
dx = bsxfun(dist, x(:,1), y(:,1)'); % x is (n,1) and y is (1,k)
dy = bsxfun(dist, x(:,2), y(:,2)'); % so the result is (n,k)
dists = dx + dy; % contains the square distance of each points to the k centroids
[_,l] = min(dists, [], 2); % we then argmin on the 2nd dimension
How to vectorize furthermore ?
First edit 3 days later, searching on my own
Since asking this question I made progress on my own towards vectorizing this piece of code.
The code above runs in approximately 0.7 ms on my example.
I first used repmat to make it easy to do broadcasting:
dists = permute(permute(repmat(x,1,1,k), [3,2,1]) - y, [3,2,1]).^2;
dists = sum(dists, 2);
[~,l] = min(dists, [], 3);
As expected it is slightly slower since we replicate the matrix, it runs at 0.85 ms.
From this example it was pretty easy to use bsxfun for the whole thing, but it turned out to be extremely slow, running in 150 ms so more than 150 times slower than the repmat version:
dist = #(a, b) (a - b).^2;
dists = permute(bsxfun(dist, permute(x, [3, 2, 1]), y), [3, 2, 1]);
dists = sum(dists, 2);
[~,l] = min(dists, [], 3);
Why is it so slow ? Isn't vectorizing always an improvement on speed, since it uses vector instructions on the CPU ? I mean of course simple for loops could be optimized to use it aswell, but how can vectorizing make the code slower ? Did I do it wrong ?
Using a for loop
For the sake of completeness, here's the for loop version of my code, surprisingly the fastest running in 0.4 ms, not sure why..
for i=1:k
dists(:,i) = sum((x - y(i,:)).^2, 2);
endfor
[~,l] = min(dists, [], 2);

Note: This answer was written when the question was also tagged MATLAB. Links to Octave documentation added after the MATLAB tag was removed.
You can use the pdist2MATLAB/Octave function to calculate pairwise distances between two sets of observations.
This way, you offload the bother of vectorization to the people who wrote MATLAB/Octave (and they have done a pretty good job of it)
X = rand(10,3);
Y = rand(5,3);
D = pdist2(X, Y);
D is now a 10x5 matrix where the i, jth element is the distance between the ith X and jth Y point.
You can pass it the kind of distance you want as the third argument -- e.g. 'euclidean', 'minkowski', etc, or you could pass a function handle to your custom function like so:
dist = #(a,b) (a - b).^2;
D = pdist2(X, Y, dist);
As saastn mentions, pdist2(..., 'smallest', k) makes things easier in k-means. This returns just the smallest k values from each column of pdist2's result. Octave doesn't have this functionality, but it's easily replicated using sort()MATLAB/Octave.
D_smallest = sort(D);
D_smallest = D_smallest(1:k, :);

Related

Gradient descent on linear regression not converging

I have implemented a very simple linear regression with gradient descent algorithm in JavaScript, but after consulting multiple sources and trying several things, I cannot get it to converge.
The data is absolutely linear, it's just the numbers 0 to 30 as inputs with x*3 as their correct outputs to learn.
This is the logic behind the gradient descent:
train(input, output) {
const predictedOutput = this.predict(input);
const delta = output - predictedOutput;
this.m += this.learningRate * delta * input;
this.b += this.learningRate * delta;
}
predict(x) {
return x * this.m + this.b;
}
I took the formulas from different places, including:
Exercises from Udacity's Deep Learning Foundations Nanodegree
Andrew Ng's course on Gradient Descent for Linear Regression (also here)
Stanford's CS229 Lecture Notes
this other PDF slides I found from Carnegie Mellon
I have already tried:
normalizing input and output values to the [-1, 1] range
normalizing input and output values to the [0, 1] range
normalizing input and output values to have mean = 0 and stddev = 1
reducing the learning rate (1e-7 is as low as I went)
having a linear data set with no bias at all (y = x * 3)
having a linear data set with non-zero bias (y = x * 3 + 2)
initializing the weights with random non-zero values between -1 and 1
Still, the weights (this.b and this.m) do not approach any of the data values, and they diverge into infinity.
I'm obviously doing something wrong, but I cannot figure out what it is.
Update: Here's a little bit more context that may help figure out what my problem is exactly:
I'm trying to model a simple approximation to a linear function, with online learning by a linear regression pseudo-neuron. With that, my parameters are:
weights: [this.m, this.b]
inputs: [x, 1]
activation function: identity function z(x) = x
As such, my net will be expressed by y = this.m * x + this.b * 1, simulating the data-driven function that I want to approximate (y = 3 * x).
What I want is for my network to "learn" the parameters this.m = 3 and this.b = 0, but it seems I get stuck at a local minima.
My error function is the mean-squared error:
error(allInputs, allOutputs) {
let error = 0;
for (let i = 0; i < allInputs.length; i++) {
const x = allInputs[i];
const y = allOutputs[i];
const predictedOutput = this.predict(x);
const delta = y - predictedOutput;
error += delta * delta;
}
return error / allInputs.length;
}
My logic for updating my weights will be (according to the sources I've checked so far) wi -= alpha * dError/dwi
For the sake of simplicity, I'll call my weights this.m and this.b, so we can relate it back to my JavaScript code. I'll also call y^ the predicted value.
From here:
error = y - y^
= y - this.m * x + this.b
dError/dm = -x
dError/db = 1
And so, applying that to the weight correction logic:
this.m += alpha * x
this.b -= alpha * 1
But this doesn't seem correct at all.
I finally found what's wrong, and I'm answering my own question in hopes it will help beginners in this area too.
First, as Sascha said, I had some theoretical misunderstandings. It may be correct that your adjustment includes the input value verbatim, but as he said, it should already be part of the gradient. This all depends on your choice of the error function.
Your error function will be the measure of what you use to measure how off you were from the real value, and that measurement needs to be consistent. I was using mean-squared-error as a measurement tool (as you can see in my error method), but I was using a pure-absolute error (y^ - y) inside of the training method to measure the error. Your gradient will depend on the choice of this error function. So choose only one and stick with it.
Second, simplify your assumptions in order to test what's wrong. In this case, I had a very good idea what the function to approximate was (y = x * 3) so I manually set the weights (this.b and this.m) to the right values and I still saw the error diverge. This means that weight initialization was not the problem in this case.
After searching some more, my error was somewhere else: the function that was feeding data into the network was mistakenly passing a 3 hardcoded value into the predicted output (it was using a wrong index in an array), so the oscillation I saw was because of the network trying to approximate to y = 0 * x + 3 (this.b = 3 and this.m = 0), but because of the small learning rate and the error in the error function derivative, this.b wasn't going to get near to the right value, making this.m making wild jumps to adjust to it.
Finally, keep track of the error measurement as your network trains, so you can have some insight into what's going on. This helps a lot to identify a difference between simple overfitting, big learning rates and plain simple mistakes.

Simple registration algorithm for small sets of 2D points

I am trying to find a simple algorithm to find the correspondence between two sets of 2D points (registration). One set contains the template of an object I'd like to find and the second set mostly contains points that belong to the object of interest, but it can be noisy (missing points as well as additional points that do not belong to the object). Both sets contain roughly 40 points in 2D. The second set is a homography of the first set (translation, rotation and perspective transform).
I am interested in finding an algorithm for registration in order to get the point-correspondence. I will be using this information to find the transform between the two sets (all of this in OpenCV).
Can anyone suggest an algorithm, library or small bit of code that could do the job? As I'm dealing with small sets, it does not have to be super optimized. Currently, my approach is a RANSAC-like algorithm:
Choose 4 random points from set 1 and from set 2.
Compute transform matrix H (using openCV getPerspective())
Warp 1st set of points using H and test how they aligned to the 2nd set of points
Repeat 1-3 N times and choose best transform according to some metric (e.g. sum of squares).
Any ideas? Thanks for your input.
With python you can use Open3D librarry, wich is very easy to install in Anaconda. To your purpose ICP should work fine, so we'll use the classical ICP, wich minimizes point-to-point distances between closest points in every iteration. Here is the code to register 2 clouds:
import numpy as np
import open3d as o3d
# Parameters:
initial_T = np.identity(4) # Initial transformation for ICP
distance = 0.1 # The threshold distance used for searching correspondences
(closest points between clouds). I'm setting it to 10 cm.
# Read your point clouds:
source = o3d.io.read_point_cloud("point_cloud_1.xyz")
target = o3d.io.read_point_cloud("point_cloud_0.xyz")
# Define the type of registration:
type = o3d.pipelines.registration.TransformationEstimationPointToPoint(False)
# "False" means rigid transformation, scale = 1
# Define the number of iterations (I'll use 100):
iterations = o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration = 100)
# Do the registration:
result = o3d.pipelines.registration.registration_icp(source, target, distance, initial_T, type, iterations)
result is a class with 4 things: the transformation T(4x4), 2 metrict (rmse and fitness) and the set of correspondences.
To acess the transformation:
I used it a lot with 3D clouds obteined from Terrestrial Laser Scanners (TLS) and from robots (Velodiny LIDAR).
With MATLAB:
We'll use the point-to-point ICP again, because your data is 2D. Here is a minimum example with two point clouds random generated inside a triangle shape:
% Triangle vértices:
V1 = [-20, 0; -10, 10; 0, 0];
V2 = [-10, 0; 0, 10; 10, 0];
% Create clouds and show pair:
points = 5000
N1 = criar_nuvem_triangulo(V1,points);
N2 = criar_nuvem_triangulo(V2,points);
pcshowpair(N1,N2)
% Registrate pair N1->N2 and show:
[T,N1_tranformed,RMSE]=pcregistericp(N1,N2,'Metric','pointToPoint','MaxIterations',100);
pcshowpair(N1_tranformed,N2)
"criar_nuvem_triangulo" is a function to generate random point clouds inside a triangle:
function [cloud] = criar_nuvem_triangulo(V,N)
% Function wich creates 2D point clouds in triangle format using random
% points
% Parameters: V = Triangle vertices (3x2 Matrix)| N = Number of points
t = sqrt(rand(N, 1));
s = rand(N, 1);
P = (1 - t) * V(1, :) + bsxfun(#times, ((1 - s) * V(2, :) + s * V(3, :)), t);
points = [P,zeros(N,1)];
cloud = pointCloud(points)
end
results:
You may just use cv::findHomography. It is a RANSAC-based approach around cv::getPerspectiveTransform.
auto H = cv::findHomography(srcPoints, dstPoints, CV_RANSAC,3);
Where 3 is the reprojection threshold.
One traditional approach to solve your problem is by using point-set registration method when you don't have matching pair information. Point set registration is similar to method you are talking about.You can find matlab implementation here.
Thanks

OpenCV: Essential Matrix Decomposition

I am trying to extract Rotation matrix and Translation vector from the essential matrix.
<pre><code>
SVD svd(E,SVD::MODIFY_A);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W(0,-1,0,
1,0,0,
0,0,1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; //or svd_u * Mat(W) * svd_vt;
Mat_<double> t = svd_u.col(2); //or -svd_u.col(2)
</code></pre>
However, when I am using R and T (e.g. to obtain rectified images), the result does not seem to be right(black images or some obviously wrong outputs), even so I used different combination of possible R and T.
I suspected to E. According to the text books, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w which is supposed to be diag(1, 1, 0) [at least in term of a scale], is not so. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and calculation of R and T done in a right way?
If the calculation is right, why the internal rules of essential matrix are not satisfied by the results?
If everything about E, R, and T is fine, why the rectified images obtained by them are not correct?
I get E from fundamental matrix, which I suppose to be right. I draw epipolar lines on both the left and right images and they all pass through the related points (for all the 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
Thanks!
I see two issues.
First, discounting the negligible value of the third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100 . Sounds small, but translated in terms of pixel error it may be an altogether larger amount.
So I'd start by replacing E with the ideal one after SVD decomposition: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with 3D points in front of the camera). You may be hitting one of non-physical ones. See http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E .

Need a specific example of U-Matrix in Self Organizing Map

I'm trying to develop an application using SOM in analyzing data. However, after finishing training, I cannot find a way to visualize the result. I know that U-Matrix is one of the method but I cannot understand it properly. Hence, I'm asking for a specific and detail example how to construct U-Matrix.
I also read an answer at U-matrix and self organizing maps but it only refers to 1 row map, how about 3x3 map? I know that for 3x3 map:
m(1) m(2) m(3)
m(4) m(5) m(6)
m(7) m(8) m(9)
a 5x5 matrix must me created:
u(1) u(1,2) u(2) u(2,3) u(3)
u(1,4) u(1,2,4,5) u(2,5) u(2,3,5,6) u(3,6)
u(4) u(4,5) u(5) u(5,6) u(6)
u(4,7) u(4,5,7,8) u(5,8) u(5,6,8,9) u(6,9)
u(7) u(7,8) u(8) u(8,9) u(9)
but I don't know how to calculate u-weight u(1,2,4,5), u(2,3,5,6), u(4,5,7,8) and u(5,6,8,9).
Finally, after constructing U-Matrix, is there any way to visualize it using color, e.g. heat map?
Thank you very much for your time.
Cheers
I don't know if you are still interested in this but I found this link
http://www.uni-marburg.de/fb12/datenbionik/pdf/pubs/1990/UltschSiemon90
which explains very speciffically how to calculate the U-matrix.
Hope it helps.
By the way, the site were I found the link has several resources referring to SOMs I leave it here in case anyone is interested:
http://www.ifs.tuwien.ac.at/dm/somtoolbox/visualisations.html
The essential idea of a Kohonen map is that the data points are mapped to a
lattice, which is often a 2D rectangular grid.
In the simplest implementations, the lattice is initialized by creating a 3D
array with these dimensions:
width * height * number_features
This is the U-matrix.
Width and height are chosen by the user; number_features is just the number
of features (columns or fields) in your data.
Intuitively this is just creating a 2D grid of dimensions w * h
(e.g., if w = 10 and h = 10 then your lattice has 100 cells), then
into each cell, placing a random 1D array (sometimes called "reference tuples")
whose size and values are constrained by your data.
The reference tuples are also referred to as weights.
How is the U-matrix rendered?
In my example below, the data is comprised of rgb tuples, so the reference tuples
have length of three and each of the three values must lie between 0 and 255).
It's with this 3D array ("lattice") that you begin the main iterative loop
The algorithm iteratively positions each data point so that it is closest to others similar to it.
If you plot it over time (iteration number) then you can visualize cluster
formation.
The plotting tool i use for this is the brilliant Python library, Matplotlib,
which plots the lattice directly, just by passing it into the imshow function.
Below are eight snapshots of the progress of a SOM algorithm, from initialization to 700 iterations. The newly initialized (iteration_count = 0) lattice is rendered in the top left panel; the result from the final iteration, in the bottom right panel.
Alternatively, you can use a lower-level imaging library (in Python, e.g., PIL) and transfer the reference tuples onto the 2D grid, one at a time:
for y in range(h):
for x in range(w):
img.putpixel( (x, y), (
SOM.Umatrix[y, x, 0],
SOM.Umatrix[y, x, 1],
SOM.Umatrix[y, x, 2])
)
Here img is an instance of PIL's Image class. Here the image is created by iterating over the grid one pixel at a time; for each pixel, putpixel is called on img three times, the three calls of course corresponding to the three values in an rgb tuple.
From the matrix that you create:
u(1) u(1,2) u(2) u(2,3) u(3)
u(1,4) u(1,2,4,5) u(2,5) u(2,3,5,6) u(3,6)
u(4) u(4,5) u(5) u(5,6) u(6)
u(4,7) u(4,5,7,8) u(5,8) u(5,6,8,9) u(6,9)
u(7) u(7,8) u(8) u(8,9) u(9)
The elements with single numbers like u(1), u(2), ..., u(9) as just the elements with more than two numbers like u(1,2,4,5), u(2,3,5,6), ... , u(5,6,8,9) are calculated using something like the mean, median, min or max of the values in the neighborhood.
It's a nice idea calculate the elements with two numbers first, one possible code for that is:
for i in range(self.h_u_matrix):
for j in range(self.w_u_matrix):
nb = (0,0)
if not (i % 2) and (j % 2):
nb = (0,1)
elif (i % 2) and not (j % 2):
nb = (1,0)
self.u_matrix[(i,j)] = np.linalg.norm(
self.weights[i //2, j //2] - self.weights[i //2 +nb[0], j // 2 + nb[1]],
axis = 0
)
In the code above the self.h_u_matrix = self.weights.shape[0]*2 - 1 and self.w_u_matrix = self.weights.shape[1]*2 - 1 are the dimensions of the U-Matrix. With that said, for calculate the others elements it's necessary obtain a list with they neighboors and apply a mean for example. The following code implements that's idea:
for i in range(self.h_u_matrix):
for j in range(self.w_u_matrix):
if not (i % 2) and not (j % 2):
nodelist = []
if i > 0:
nodelist.append((i-1,j))
if i < 4:
nodelist.append((i+1, j))
if j > 0:
nodelist.append((i,j -1))
if j < 4:
nodelist.append((i,j+1))
meanlist = [self.u_matrix[u_node] for u_node in nodelist]
self.u_matrix[(i,j)] = np.mean(meanlist)
elif (i % 2) and (j % 2):
meanlist = [
(i - 1, j),
(i + 1, j),
(i, j - 1),
(i, j + 1)]
self.u_matrix[(i,j)] = np.mean(meanlist)

Cascaded Hough Transform in OpenCV

Is it possible to perform a Cascaded Hough Transform in OpenCV? I understand its just a HT followed by another one. The problem I'm facing is that the values returned are always rho and theta and never in y-intercept form.
Is it possible to convert these values back to y-intercept and split them into sub-spaces so I can detect vanishing points?
Or is it just better to program an implementation of HT myself in, say, Python?
you could try to populate the Hough domain with m and c parameters instead, so that y = mx + c can be re-written as c = y - mx so instead of the usual rho = x cos(theta) + y sin(theta), you have c = y - mx
normally, you'd go through the thetas and calculate the rho, then you increment the accumulator value for that pair of rho and theta. Here, you'd go through the value of m and calculate the values of c, then accumulate that m,c element in the accumulator. The bin with the most votes would be the right m,c
// going through the image looking for edge pixels
for (i = 0;i<numrows;i++)
{
for (j = 0;j<numcols;j++)
{
if (img[i*numcols + j] > 1)
{
for (n = first_m;n<last_m;n++)
{
index = i - n * j;
accum[n][index]++;
}
}
}
}
I guess where this becomes ineffective is that its hard to define the step size for going through m as they should technically go from -infinity to infinity so you'd kind of have trouble. yeah, so much for Hough transform in terms of m,c. Lol
I guess you could go the other way and isolate m so it would be m = (y-c)/x so that now, you cycle through a bunch of y values that make sense and its much more manageable though it's still hard to define your accumulator matrix because m still has no limit. I guess you could limit the values of m that you would be interested in looking for.
Yeah, much more sense to go with rho and theta and convert them into y = mx + c and then even making a brand new image and re-running the hough transform on it.
I don't think OpenCV can perform cascaded hough transforms. You should convert them to xy space yourself. This article might help you:
http://aishack.in/tutorials/converting-lines-from-normal-to-slopeintercept-form/

Resources