I'm looking for algorithms to do curve fitting from tabular XY data to a Gaussian function (a.k.a. bell curve). By googling I can find few gaussian fitting algos for Matlab, here are couple of them:
https://ccrma.stanford.edu/~jos/sasp/Fitting_Gaussian_Data.html
http://jila.colorado.edu/bec/BEC_for_everyone/matlabfitting.htm
One seems to use 'polyfit' function of Matlab for the job.
Anyone seen readily made algo for Lua language (either gaussian or polyfit)? If not, I would greatly appreciate one's help for creating/porting such algorithm as it would probably consume a day with my limited Lua skills.
Here's my attempt to extract gaussian fit out of noisy measurement data.
require 'gsl'
require 'math'
--x=x coordinates, y=y coordinates
--clip=relative clip/ignore level 0..1 (i.e 0.1 removes values below 10% of max amplitide)
--removeoffset=set to true if y data offset should be removed
function gaussianFit( x, y, clip, removeoffset )
local xx = {}
local yy = {}
local yoffset=0
if removeoffset==nil or removeoffset==false then
else --remove y data offset
yoffset=gsl.Vector(y):min()
end
local ymax=gsl.Vector(y):max()-yoffset
--pick only data points that has y coord larger than clip level
for i=1,#x do
if (y[i]-yoffset) > (clip*ymax) then
table.insert(xx, x[i])
table.insert(yy, math.log(y[i]-yoffset))
end
end
local xvect = gsl.Vector(xx)
local yvect = gsl.Vector(yy)
--fit to polynomial
local poly3 = gsl.fit.poly(3) -- a third degree polynomial
local fit = gsl.lsfit({xvect, poly3}, yvect, nil, "fmulti") -- fits xx and yy with poly3
--convert to gauss coeffs
local A2=fit:coeffs()[3]
local A1=fit:coeffs()[2]
local A0=fit:coeffs()[1]
local sigma=math.sqrt(-1/(2*A2))
local mu=A1*math.pow(sigma,2)
local A=math.exp(A0+math.pow(mu,2)/(2*math.pow(sigma,2)))
return sigma, mu, A
end
xx={1, 2, 3, 4, 5, 6, 7, 8, 9}
yy={1, 2, 4, 6, 4, 3, 2, 1, 1}
sigma,mu,A=gaussianFit(xx,yy,0.1,false)
print(sigma.." "..mu.." ".. A)
--prints 2.2829275461334 4.6387484511153 4.201115115886
You can rearrange the equation to a linear form, and then use the methods described in Linear Regression by Paul Bourke (a little down the page).
If you need, I can demonstrate the rearranging process for you.
If you really need, I can provide an implementation of the line-of-best-fit algorithm in Lua.
Related
I'm writing a k-means algorithm. At each step, I want to compute the distance of my n points to k centroids, without a for loop, and for d dimensions.
The problem is I have a hard time splitting on my number of dimensions with the Matlab functions I know. Here is my current code, with x being my n 2D-points and y my k centroids (also 2D-points of course), and with the points distributed along dimension 1, and the spatial coordinates along the dimension 2:
dist = #(a,b) (a - b).^2;
dx = bsxfun(dist, x(:,1), y(:,1)'); % x is (n,1) and y is (1,k)
dy = bsxfun(dist, x(:,2), y(:,2)'); % so the result is (n,k)
dists = dx + dy; % contains the square distance of each points to the k centroids
[_,l] = min(dists, [], 2); % we then argmin on the 2nd dimension
How to vectorize furthermore ?
First edit 3 days later, searching on my own
Since asking this question I made progress on my own towards vectorizing this piece of code.
The code above runs in approximately 0.7 ms on my example.
I first used repmat to make it easy to do broadcasting:
dists = permute(permute(repmat(x,1,1,k), [3,2,1]) - y, [3,2,1]).^2;
dists = sum(dists, 2);
[~,l] = min(dists, [], 3);
As expected it is slightly slower since we replicate the matrix, it runs at 0.85 ms.
From this example it was pretty easy to use bsxfun for the whole thing, but it turned out to be extremely slow, running in 150 ms so more than 150 times slower than the repmat version:
dist = #(a, b) (a - b).^2;
dists = permute(bsxfun(dist, permute(x, [3, 2, 1]), y), [3, 2, 1]);
dists = sum(dists, 2);
[~,l] = min(dists, [], 3);
Why is it so slow ? Isn't vectorizing always an improvement on speed, since it uses vector instructions on the CPU ? I mean of course simple for loops could be optimized to use it aswell, but how can vectorizing make the code slower ? Did I do it wrong ?
Using a for loop
For the sake of completeness, here's the for loop version of my code, surprisingly the fastest running in 0.4 ms, not sure why..
for i=1:k
dists(:,i) = sum((x - y(i,:)).^2, 2);
endfor
[~,l] = min(dists, [], 2);
Note: This answer was written when the question was also tagged MATLAB. Links to Octave documentation added after the MATLAB tag was removed.
You can use the pdist2MATLAB/Octave function to calculate pairwise distances between two sets of observations.
This way, you offload the bother of vectorization to the people who wrote MATLAB/Octave (and they have done a pretty good job of it)
X = rand(10,3);
Y = rand(5,3);
D = pdist2(X, Y);
D is now a 10x5 matrix where the i, jth element is the distance between the ith X and jth Y point.
You can pass it the kind of distance you want as the third argument -- e.g. 'euclidean', 'minkowski', etc, or you could pass a function handle to your custom function like so:
dist = #(a,b) (a - b).^2;
D = pdist2(X, Y, dist);
As saastn mentions, pdist2(..., 'smallest', k) makes things easier in k-means. This returns just the smallest k values from each column of pdist2's result. Octave doesn't have this functionality, but it's easily replicated using sort()MATLAB/Octave.
D_smallest = sort(D);
D_smallest = D_smallest(1:k, :);
I'm in a situation where I need to train a model to predict a scalar value, and it's important to have the predicted value be in the same direction as the true value, while the squared error being minimum.
What would be a good choice of loss function for that?
For example:
Let's say the predicted value is -1 and the true value is 1. The loss between the two should be a lot greater than the loss between 3 and 1, even though the squared error of (3, 1) and (-1, 1) is equal.
Thanks a lot!
This turned out to be a really interesting question - thanks for asking it! First, remember that you want your loss functions to be defined entirely of differential operations, so that you can back-propagation though it. This means that any old arbitrary logic won't necessarily do. To restate your problem: you want to find a differentiable function of two variables that increases sharply when the two variables take on values of different signs, and more slowly when they share the same sign. Additionally, you want some control over how sharply these values increase, relative to one another. Thus, we want something with two configurable constants. I started constructing a function that met these needs, but then remembered one you can find in any high school geometry text book: the elliptic paraboloid!
The standard formulation doesn't meet the requirement of sign agreement symmetry, so I had to introduce a rotation. The plot above is the result. Note that it increases more sharply when the signs don't agree, and less sharply when they do, and that the input constants controlling this behaviour are configurable. The code below is all that was needed to define and plot the loss function. I don't think I've ever used a geometric form as a loss function before - really neat.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
def elliptic_paraboloid_loss(x, y, c_diff_sign, c_same_sign):
# Compute a rotated elliptic parabaloid.
t = np.pi / 4
x_rot = (x * np.cos(t)) + (y * np.sin(t))
y_rot = (x * -np.sin(t)) + (y * np.cos(t))
z = ((x_rot**2) / c_diff_sign) + ((y_rot**2) / c_same_sign)
return(z)
c_diff_sign = 4
c_same_sign = 2
a = np.arange(-5, 5, 0.1)
b = np.arange(-5, 5, 0.1)
loss_map = np.zeros((len(a), len(b)))
for i, a_i in enumerate(a):
for j, b_j in enumerate(b):
loss_map[i, j] = elliptic_paraboloid_loss(a_i, b_j, c_diff_sign, c_same_sign)
fig = plt.figure()
ax = fig.gca(projection='3d')
X, Y = np.meshgrid(a, b)
surf = ax.plot_surface(X, Y, loss_map, cmap=cm.coolwarm,
linewidth=0, antialiased=False)
plt.show()
From what I understand, your current loss function is something like:
loss = mean_square_error(y, y_pred)
What you could do, is to add one other component to your loss, being this a component that penalizes negative numbers and does nothing with positive numbers. And you can choose a coefficient for how much you want to penalize it. For that, we can use like a negative shaped ReLU. Something like this:
Let's call "Neg_ReLU" to this component. Then, your loss function will be:
loss = mean_squared_error(y, y_pred) + Neg_ReLU(y_pred)
So for example, if your result is -1, then the total error would be:
mean_squared_error(1, -1) + 1
And if your result is 3, then the total error would be:
mean_squared_error(1, -1) + 0
(See in the above function how Neg_ReLU(3) = 0, and Neg_ReLU(-1) = 1.
If you want to penalize more the negative values, then you can add a coefficient:
coeff_negative_value = 2
loss = mean_squared_error(y, y_pred) + coeff_negative_value * Neg_ReLU
Now the negative values are more penalized.
The ReLU negative function we can build it like this:
tf.nn.relu(tf.math.negative(value))
So summarizing, in the end your total loss will be:
coeff = 1
Neg_ReLU = tf.nn.relu(tf.math.negative(y))
total_loss = mean_squared_error(y, y_pred) + coeff * Neg_ReLU
I'm trying to understand RNNs and I would like to find a simple example that actually shows the one hot vectors and the numerical operations. Preferably conceptual since actual code may make it even more confusing. Most examples I google just show boxes with loops coming out of them and its really difficult to understand what exactly is going on. In the rare case where they do show the vectors its still difficult to see how they are getting the values.
for example I don't know where the values are coming from in this picture https://i1.wp.com/karpathy.github.io/assets/rnn/charseq.jpeg
If the example could integrate LSTMs and other popular extensions that would be cool too.
In the simple RNN case, a network accepts an input sequence x and produces an output sequence y while a hidden sequence h stores the network's dynamic state, such that at timestep i: x(i) ∊ ℝM, h(i) ∊ ℝN, y(i) ∊ ℝP the real valued vectors of M/N/P dimensions corresponding to input, hidden and output values respectively. The RNN changes its state and omits output based on the state equations:
h(t) = tanh(Wxh ∗ [x(t); h(t-1)]), where Wxh a linear map: ℝM+N ↦ ℝN, * the matrix multiplication and ; the concatenation operation. Concretely, to obtain h(t) you concatenate x(t) with h(t-1), you apply matrix multiplication between Wxh (of shape (M+N, N)) and the concatenated vector (of shape M+N) , and you use a tanh non-linearity on each element of the resulting vector (of shape N).
y(t) = sigmoid(Why * h(t)), where Why a linear map: ℝN ↦ ℝP. Concretely, you apply matrix multiplication between Why (of shape (N, P)) and h(t) (of shape N) to obtain a P-dimensional output vector, on which the sigmoid function is applied.
In other words, obtaining the output at time t requires iterating through the above equations for i=0,1,...,t. Therefore, the hidden state acts as a finite memory for the system, allowing for context-dependent computation (i.e. h(t) fully depends on both the history of the computation and the current input, and so does y(t)).
In the case of gated RNNs (GRU or LSTM), the state equations get somewhat harder to follow, due to the gating mechanisms which essentially allow selection between the input and the memory, but the core concept remains the same.
Numeric Example
Let's follow your example; we have M = 4, N = 3, P = 4, so Wxh is of shape (7, 3) and Why of shape (3, 4). We of course do not know the values of either W matrix, so we cannot reproduce the same results; we can still follow the process though.
At timestep t<0, we have h(t) = [0, 0, 0].
At timestep t=0, we receive input x(0) = [1, 0, 0, 0]. Concatenating x(0) with h(0-), we get [x(t); h(t-1)] = [1, 0, 0 ..., 0] (let's call this vector u to ease notation). We apply u * Wxh (i.e. multiplying a 7-dimensional vector with a 7 by 3 matrix) and get a vector v = [v1, v2, v3], where vi = Σj uj Wji = u1 W1i + u2 W2i + ... + u7 W7i. Finally, we apply tanh on v, obtaining h(0) = [tanh(v1), tanh(v2), tanh(v3)] = [0.3, -0.1, 0.9]. From h(0) we can also get y(0) via the same process; multiply h(0) with Why (i.e. 3 dimensional vector with a 3 by 4 matrix), get a vector s = [s1, s2, s3, s4], apply sigmoid on s and get σ(s) = y(0).
At timestep t=1, we receive input x(1) = [0, 1, 0, 0]. We concatenate x(1) with h(0) to get a new u = [0, 1, 0, 0, 0.3, -0.1, 0.9]. u is again multiplied with Wxh, and tanh is again applied on the result, giving us h(1) = [1, 0.3, 1]. Similarly, h(1) is multiplied by Why, giving us a new s vector on which we apply the sigmoid to obtain σ(s) = y(1).
This process continues until the input sequence finishes, ending the computation.
Note: I have ignored bias terms in the above equations because they do not affect the core concept and they make notation impossible to follow
I need to perform a custom spatial convolution in Torch. Rather than simply multiplying each input pixel by a weight for that pixel and adding them together with the filter's bias to form each output pixel, I need to do a more complex mathematical function to the input pixels before adding them together.
I know how to do this, but I do not know a GOOD way to do this. The best way I've come up with is to take the full input tensor, create a bunch of secondary tensors that are "views" of the original without allocating additional memory, putting those into a Replicate layer (the output filter count being the replication count), and feeding that into a ParallelTable layer containing a bunch of regular layers that have their parameters shared between filters.
The trouble is, even though this is fine memory-wise with a very manageable overhead, we're talking inputwidth^inputheight^inputdepth^outputdepth mini-networks, here. Maybe there's some way to create massive "long and tall" networks that work on the entire replicated input set at once, but how do I create layers that are partially-connected (like convolutions) instead of fully-connected?
I would have liked to just use inheritance to create a special copy of the regular SpatialConvolution "class" and modify it, but I can't even try because it's implemented in an external C library. I can't just use regular layers before a regular SpatialConvolution layer because I need to do my math with different weights and biases for each filter (shared between applications of the same filter to different input coordinates).
Good question. You made me give some serious thought.
Your approach has a flaw: it does not allow to take advantage of vectorized computations since each mini-network works independently.
My idea is as follows:
Suppose network's input and output are 2D tensors. We can produce (efficiently, without memory copying) an auxiliary 4D tensor
rf_input (kernel_size x kernel_size x output_h x output_w)
such that rf_input[:, :, k, l] is a 2D tensor of size kernel_size x kernel_size containing a receptive field which output[k, l] will be gotten from. Then we iterate over positions inside the kernel rf_input[i, j, :, :] getting pixels at position (i, j) inside all receptive fields and computing their contribution to each output[k, l] at once using vectorization.
Example:
Let our "convolving" function be, for example, a product of tangents of sums:
Then its partial derivative w.r.t. the input pixel at position (s,t) in its receptive field is
Derivative w.r.t. weight is the same.
At the end, of course, we must sum up gradients from different output[k,l] points. For example, each input[m, n] contributes to at most kernel_size^2 outputs as a part of their receptive fields, and each weight[i, j] contributes to all output_h x output_w outputs.
Simple implementation may look like this:
require 'nn'
local CustomConv, parent = torch.class('nn.CustomConv', 'nn.Module')
-- This module takes and produces a 2D map.
-- To work with multiple input/output feature maps and batches,
-- you have to iterate over them or further vectorize computations inside the loops.
function CustomConv:__init(ker_size)
parent.__init(self)
self.ker_size = ker_size
self.weight = torch.rand(self.ker_size, self.ker_size):add(-0.5)
self.gradWeight = torch.Tensor(self.weight:size()):zero()
end
function CustomConv:_get_recfield_input(input)
local rf_input = {}
for i = 1, self.ker_size do
rf_input[i] = {}
for j = 1, self.ker_size do
rf_input[i][j] = input[{{i, i - self.ker_size - 1}, {j, j - self.ker_size - 1}}]
end
end
return rf_input
end
function CustomConv:updateOutput(_)
local output = torch.Tensor(self.rf_input[1][1]:size())
-- Kernel-specific: our kernel is multiplicative, so we start with ones
output:fill(1)
--
for i = 1, self.ker_size do
for j = 1, self.ker_size do
local ker_pt = self.rf_input[i][j]:clone()
local w = self.weight[i][j]
-- Kernel-specific
output:cmul(ker_pt:add(w):tan())
--
end
end
return output
end
function CustomConv:updateGradInput_and_accGradParameters(_, gradOutput)
local gradInput = torch.Tensor(self.input:size()):zero()
for i = 1, self.ker_size do
for j = 1, self.ker_size do
local ker_pt = self.rf_input[i][j]:clone()
local w = self.weight[i][j]
-- Kernel-specific
local subGradInput = torch.cmul(gradOutput, torch.cdiv(self.output, ker_pt:add(w):tan():cmul(ker_pt:add(w):cos():pow(2))))
local subGradWeight = subGradInput
--
gradInput[{{i, i - self.ker_size - 1}, {j, j - self.ker_size - 1}}]:add(subGradInput)
self.gradWeight[{i, j}] = self.gradWeight[{i, j}] + torch.sum(subGradWeight)
end
end
return gradInput
end
function CustomConv:forward(input)
self.input = input
self.rf_input = self:_get_recfield_input(input)
self.output = self:updateOutput(_)
return self.output
end
function CustomConv:backward(input, gradOutput)
gradInput = self:updateGradInput_and_accGradParameters(_, gradOutput)
return gradInput
end
If you change this code a bit:
updateOutput:
output:fill(0)
[...]
output:add(ker_pt:mul(w))
updateGradInput_and_accGradParameters:
local subGradInput = torch.mul(gradOutput, w)
local subGradWeight = torch.cmul(gradOutput, ker_pt)
then it will work exactly as nn.SpatialConvolutionMM with zero bias (I've tested it).
I am trying to extract Rotation matrix and Translation vector from the essential matrix.
<pre><code>
SVD svd(E,SVD::MODIFY_A);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W(0,-1,0,
1,0,0,
0,0,1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; //or svd_u * Mat(W) * svd_vt;
Mat_<double> t = svd_u.col(2); //or -svd_u.col(2)
</code></pre>
However, when I am using R and T (e.g. to obtain rectified images), the result does not seem to be right(black images or some obviously wrong outputs), even so I used different combination of possible R and T.
I suspected to E. According to the text books, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w which is supposed to be diag(1, 1, 0) [at least in term of a scale], is not so. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and calculation of R and T done in a right way?
If the calculation is right, why the internal rules of essential matrix are not satisfied by the results?
If everything about E, R, and T is fine, why the rectified images obtained by them are not correct?
I get E from fundamental matrix, which I suppose to be right. I draw epipolar lines on both the left and right images and they all pass through the related points (for all the 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
Thanks!
I see two issues.
First, discounting the negligible value of the third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100 . Sounds small, but translated in terms of pixel error it may be an altogether larger amount.
So I'd start by replacing E with the ideal one after SVD decomposition: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with 3D points in front of the camera). You may be hitting one of non-physical ones. See http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E .