While messing around with noise outside of Roblox, I realized that Perlin/simplex noise does not like negative inputs. Remembering that Roblox has a noise function, I tried it there and found that negative numbers do work nicely with Roblox's math.noise(). Does anybody know how they made this work, or how to get negative numbers to work with Perlin/simplex noise in general?
The simplex noise implementation I am using (copied from here, but changed to implement the bitwise AND operation manually):
local function bit_and(a, b) -- bitwise AND, implemented with arithmetic
    -- Note: this only works for non-negative integers; if a or b is negative,
    -- the loop condition fails immediately and the function returns 0.
    local p, c = 1, 0
    while a > 0 and b > 0 do
        local ra, rb = a % 2, b % 2
        if (ra + rb) > 1 then
            c = c + p
        end
        a = (a - ra) / 2
        b = (b - rb) / 2
        p = p * 2
    end
    return c
end
-- 2D simplex noise
local grad3 = {
    {1,1,0},{-1,1,0},{1,-1,0},{-1,-1,0},
    {1,0,1},{-1,0,1},{1,0,-1},{-1,0,-1},
    {0,1,1},{0,-1,1},{0,1,-1},{0,-1,-1}
}
local p = {151,160,137,91,90,15,
131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23,
190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,
88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,
77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244,
102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,
135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,
5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,
223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9,
129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,
251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,
49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254,
138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180}
local perm = {}
for i = 0, 511 do
    perm[i + 1] = p[bit_and(i, 255) + 1]
end
local function dot(g, ...)
    local v = {...}
    local sum = 0
    for i = 1, #v do
        sum = sum + v[i] * g[i]
    end
    return sum
end
local noise = {}
function noise.produce(xin, yin)
    local n0, n1, n2 -- Noise contributions from the three corners
    -- Skew the input space to determine which simplex cell we're in
    local F2 = 0.5 * (math.sqrt(3.0) - 1.0)
    local s = (xin + yin) * F2 -- Hairy factor for 2D
    local i = math.floor(xin + s)
    local j = math.floor(yin + s)
    local G2 = (3.0 - math.sqrt(3.0)) / 6.0
    local t = (i + j) * G2
    local X0 = i - t -- Unskew the cell origin back to (x,y) space
    local Y0 = j - t
    local x0 = xin - X0 -- The x,y distances from the cell origin
    local y0 = yin - Y0
    -- For the 2D case, the simplex shape is an equilateral triangle.
    -- Determine which simplex we are in.
    local i1, j1 -- Offsets for second (middle) corner of simplex in (i,j) coords
    if x0 > y0 then
        i1 = 1
        j1 = 0 -- lower triangle, XY order: (0,0)->(1,0)->(1,1)
    else
        i1 = 0
        j1 = 1 -- upper triangle, YX order: (0,0)->(0,1)->(1,1)
    end
    -- A step of (1,0) in (i,j) means a step of (1-c,-c) in (x,y), and
    -- a step of (0,1) in (i,j) means a step of (-c,1-c) in (x,y), where
    -- c = (3-sqrt(3))/6
    local x1 = x0 - i1 + G2 -- Offsets for middle corner in (x,y) unskewed coords
    local y1 = y0 - j1 + G2
    local x2 = x0 - 1 + 2 * G2 -- Offsets for last corner in (x,y) unskewed coords
    local y2 = y0 - 1 + 2 * G2
    -- Work out the hashed gradient indices of the three simplex corners
    local ii = bit_and(i, 255)
    local jj = bit_and(j, 255)
    local gi0 = perm[ii + perm[jj + 1] + 1] % 12
    local gi1 = perm[ii + i1 + perm[jj + j1 + 1] + 1] % 12
    local gi2 = perm[ii + 1 + perm[jj + 1 + 1] + 1] % 12
    -- Calculate the contribution from the three corners
    local t0 = 0.5 - x0 * x0 - y0 * y0
    if t0 < 0 then
        n0 = 0.0
    else
        t0 = t0 * t0
        n0 = t0 * t0 * dot(grad3[gi0 + 1], x0, y0) -- (x,y) of grad3 used for 2D gradient
    end
    local t1 = 0.5 - x1 * x1 - y1 * y1
    if t1 < 0 then
        n1 = 0.0
    else
        t1 = t1 * t1
        n1 = t1 * t1 * dot(grad3[gi1 + 1], x1, y1)
    end
    local t2 = 0.5 - x2 * x2 - y2 * y2
    if t2 < 0 then
        n2 = 0.0
    else
        t2 = t2 * t2
        n2 = t2 * t2 * dot(grad3[gi2 + 1], x2, y2)
    end
    -- Add contributions from each corner to get the final noise value.
    -- The result is scaled to return values in the interval [-1,1].
    return 70.0 * (n0 + n1 + n2)
end
return noise
The Lua variant that Roblox uses, Luau, has actually been open source since November 2021; you can find it here. The math library lives in a file called lmathlib.cpp, which contains the math.noise function along with the internal functions used to compute it: perlin (the main function), grad, lerp, and fade. It's fairly involved and I can't explain every detail myself, but I have converted it into Lua here.
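The key detail that lets negative inputs work is how each coordinate is split into an integer lattice cell and a fractional offset before hashing. Here is a minimal sketch of that step in plain Lua (not Roblox's exact source; lattice_index is just a name used for illustration):

-- Minimal sketch of the classic Perlin hashing step.
local function lattice_index(x)
    local xflr = math.floor(x)  -- rounds toward negative infinity, e.g. math.floor(-2.3) == -3
    local xi = xflr % 256       -- Lua's % with a positive divisor is always in [0, 255],
                                -- so negative cells wrap into the permutation table correctly
    local xf = x - xflr         -- fractional part, always in [0, 1)
    return xi, xf
end

print(lattice_index(-2.3))  --> 253   0.7

-- By contrast, the handmade bit_and() in the question returns 0 whenever its
-- first argument is negative, so every negative cell hashes to the same gradients.

In the C++ source the same wrap is done by masking the floored value with & 255, which gives the same result for negative integers under two's complement.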
I'm trying to implement a regression NN that has 3 layers (1 input, 1 hidden, and 1 output layer with a continuous result). As a basis I took a classification NN from the coursera.org class, but changed the cost function and gradient calculation to fit a regression problem rather than a classification one.
My nnCostFunction now is:
function [J grad] = nnCostFunctionLinear(nn_params, ...
                                         input_layer_size, ...
                                         hidden_layer_size, ...
                                         num_labels, ...
                                         X, y, lambda)
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
m = size(X, 1);
a1 = X;
a1 = [ones(m, 1) a1];
a2 = a1 * Theta1';
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
Y = y;
J = 1/(2*m)*sum(sum((a3 - Y).^2))
th1 = Theta1;
th1(:,1) = 0; %set bias = 0 in reg. formula
th2 = Theta2;
th2(:,1) = 0;
t1 = th1.^2;
t2 = th2.^2;
th = sum(sum(t1)) + sum(sum(t2));
th = lambda * th / (2*m);
J = J + th; %regularization
del_3 = a3 - Y;
t1 = del_3'*a2;
Theta2_grad = 2*(t1)/m + lambda*th2/m;
t1 = del_3 * Theta2;
del_2 = t1 .* a2;
del_2 = del_2(:,2:end);
t1 = del_2'*a1;
Theta1_grad = 2*(t1)/m + lambda*th1/m;
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
Then I use this function with the fmincg algorithm, but fmincg stops after the first few iterations. I think my gradient is wrong, but I can't find the error.
Can anybody help?
If I understand correctly, your first block of code (shown below) -
m = size(X, 1);
a1 = X;
a1 = [ones(m, 1) a1];
a2 = a1 * Theta1';
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
Y = y;
is to get the output a(3) at the output layer.
Ng's slides on neural networks use the following configuration to calculate a(3). It's different from what your code does.
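In the same vectorized Octave style as your code (sigmoid being the course's sigmoid.m), the forward pass from the class is:

a1 = [ones(m, 1) X];          % add bias column
a2 = sigmoid(a1 * Theta1');   % hidden layer: apply the activation g
a2 = [ones(m, 1) a2];
a3 = sigmoid(a2 * Theta2');   % output layer: hypothesis h(x)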
In the middle/output layers, you are not applying the activation function g, e.g., a sigmoid function.
In terms of the cost function J without regularization terms, Ng's slides give the following formula:
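Written in Octave notation, it is the cross-entropy:

% Unregularized classification cost from the class (cross-entropy):
J = (1/m) * sum(sum( -Y .* log(a3) - (1 - Y) .* log(1 - a3) ));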
I don't understand why you can compute it using:
J = 1/(2*m)*sum(sum((a3 - Y).^2))
because you are not including the log function at all.
Mikhaill, I've been playing with a NN for continuous regression as well, and had similar issues at some point. The best thing to do here would be to test the gradient computation against a numerical calculation before running the model. If that's not correct, fmincg won't be able to train the model. (Btw, I discourage you from using the numerical gradient for actual training, as it is much slower.)
Taking into account that you took this idea from Ng's Coursera class, I'll sketch a possible solution for you to try, using the same Octave notation.
% Cost function without regularization.
J = 1/(2*m) * sum((a3 - Y).^2);

% In case it's needed, the regularization term is added (i.e. for training).
if (reg == true)
    J = J + lambda/(2*m) * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
endif

% Derivatives are computed for layers 2 and 3.
d3 = a3 - Y;
d2 = d3 * Theta2(:,2:end);

% Theta gradients are computed without regularization.
Theta1_grad = (d2' * a1) ./ m;
Theta2_grad = (d3' * a2) ./ m;

% Regularization is added to the gradient computation.
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + (lambda/m) .* Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + (lambda/m) .* Theta2(:,2:end);

% Unroll gradients.
grad = [Theta1_grad(:) ; Theta2_grad(:)];
Note that, since you have taken out all the sigmoid activations, the network is linear and the derivative calculation becomes quite simple: d3 is just (a3 - Y) and no sigmoidGradient factor appears in d2, which simplifies the original code.
Next steps:
1. Check this code to see whether it makes sense for your problem.
2. Use gradient checking to test the gradient calculation (a minimal sketch follows after this list).
3. Finally, use fmincg and check whether you get different results.
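For step 2, if you don't have the course's checkNNGradients script at hand, a minimal numerical check looks roughly like this (costFunc is assumed to be a function handle returning [J, grad] for an unrolled parameter vector nn_params; both names are placeholders):

% Central-difference numerical gradient, compared against the analytical one.
epsilon = 1e-4;
numgrad = zeros(size(nn_params));
for k = 1:numel(nn_params)
    perturb = zeros(size(nn_params));
    perturb(k) = epsilon;
    J_plus  = costFunc(nn_params + perturb);
    J_minus = costFunc(nn_params - perturb);
    numgrad(k) = (J_plus - J_minus) / (2 * epsilon);
end
[J0, grad] = costFunc(nn_params);
% The relative difference should be very small (e.g. < 1e-9) if grad is correct.
disp(norm(numgrad - grad) / norm(numgrad + grad));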
Try including the sigmoid function when computing the second-layer (hidden layer) values, and avoid the sigmoid when calculating the target (output) value.
function [J grad] = nnCostFunction1(nnParams, ...
                                    inputLayerSize, ...
                                    hiddenLayerSize, ...
                                    numLabels, ...
                                    X, y, lambda)
Theta1 = reshape(nnParams(1:hiddenLayerSize * (inputLayerSize + 1)), ...
                 hiddenLayerSize, (inputLayerSize + 1));
Theta2 = reshape(nnParams((1 + (hiddenLayerSize * (inputLayerSize + 1))):end), ...
                 numLabels, (hiddenLayerSize + 1));
Theta1Grad = zeros(size(Theta1));
Theta2Grad = zeros(size(Theta2));
m = size(X,1);
a1 = [ones(m, 1) X]';
z2 = Theta1 * a1;
a2 = sigmoid(z2);
a2 = [ones(1, m); a2];
z3 = Theta2 * a2;
a3 = z3;
Y = y';
r1 = lambda / (2 * m) * sum(sum(Theta1(:, 2:end) .* Theta1(:, 2:end)));
r2 = lambda / (2 * m) * sum(sum(Theta2(:, 2:end) .* Theta2(:, 2:end)));
J = 1 / ( 2 * m ) * (a3 - Y) * (a3 - Y)' + r1 + r2;
delta3 = a3 - Y;
delta2 = (Theta2' * delta3) .* sigmoidGradient([ones(1, m); z2]);
delta2 = delta2(2:end, :);
Theta2Grad = 1 / m * (delta3 * a2');
Theta2Grad(:, 2:end) = Theta2Grad(:, 2:end) + lambda / m * Theta2(:, 2:end);
Theta1Grad = 1 / m * (delta2 * a1');
Theta1Grad(:, 2:end) = Theta1Grad(:, 2:end) + lambda / m * Theta1(:, 2:end);
grad = [Theta1Grad(:) ; Theta2Grad(:)];
end
Normalize the inputs before passing them to nnCostFunction.
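For example, a simple zero-mean, unit-variance normalization (mu, sigma, and X_norm are just local names used here):

mu = mean(X);
sigma = std(X);
sigma(sigma == 0) = 1;                                   % avoid division by zero for constant features
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma); % normalize each column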
In accordance with the Week 5 Lecture Notes guideline for a linear-system NN, you should make the following changes in the initial code (a sketch applying them follows this list):
Remove num_labels or make it 1 (in reshape() as well)
No need to convert y into a logical matrix
For a2, replace the sigmoid() function with tanh()
In the d2 calculation, replace sigmoidGradient(z2) with (1 - tanh(z2).^2)
Remove the sigmoid from the output layer (a3 = z3)
Replace the cost function in the unregularized portion with a linear one: J = (1/(2*m))*sum((a3-y).^2)
Create predictLinear(): use the predict() function as a basis, replace sigmoid with tanh() for the first-layer hypothesis, remove the second sigmoid for the second-layer hypothesis, remove the line with the max() function, and use the output of the hidden-layer hypothesis as the prediction result
Verify your nnCostFunctionLinear() on the test case from the lecture notes
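Putting those changes together, the modified forward pass, cost, and deltas would look roughly like this (a sketch only, using the variable names from nnCostFunctionLinear above):

a1 = [ones(m, 1) X];
z2 = a1 * Theta1';
a2 = [ones(m, 1) tanh(z2)];
a3 = a2 * Theta2';                          % no sigmoid on the output layer
J  = (1/(2*m)) * sum((a3 - y).^2);          % linear (squared-error) cost, unregularized
d3 = a3 - y;
d2 = (d3 * Theta2(:, 2:end)) .* (1 - tanh(z2).^2);
Theta1_grad = (d2' * a1) / m;
Theta2_grad = (d3' * a2) / m;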