I read about Concat layer on Caffe website. However, I don't know if my understanding of it is right.
Let's say that as an input I have two layers that can be described as W1 x H1 x D1 and W2 x H2 x D2, where W is the width, H is height and D is depth.
Thus, as I understand with Axis set to 0 output will be (W1 + W2) x (H1 + H2) x D, where D = D1 = D2.
With Axis set to 1 output will be W x H x (D1 + D2), where H = H1 = H2 and W = W1 = W2.
Is my understanding correct? If no I would be grateful for an explanation.
I'm afraid you are a bit off...
Look at this caffe.help.
Usually, data in caffe is stored in 4D "blobs": BxCxHxW (that is, batch size by channel/feature/depth by height by width).
Now if you have two blobs B1xC1xH1xW1 and B2xC2xH2xW2 you can concatenate them along axis: 1 (along channel dimension) to form an output blob with C=C1+C2. This is only possible iff B1==B2 and H1==H2 and W1==W2, resulting with B1x(C1+C2)xH1xW1
Related
Without specifying units, I can express area and volume and have Maxima show the relationship:
(%i1) areaNoUnits: area = width * length$
(%i2) volumeNoUnits: volume = area * height$
(%i3) volumeNoUnits, areaNoUnits;
(%o3) volume = height length width
(%i4) subst(areaNoUnits, volumeNoUnits);
(%o4) volume = height length width
Now I want to specify units so I will use the ezunits package.
The ` (backtick) operator is the building block of ezunits:
An expression a ` b represents a dimensional quantity, with a indicating a nondimensional quantity and b indicating the dimensional units.
When I add units to the area and volume expressions, evaluation and substitution do not work:
(%i1) load ("ezunits")$
(%i2) areaWithUnits: area ` m^2 = (width ` m) * (length ` m);
2 2
(%o2) area ` m = length width ` m
(%i3) volumeWithUnits: volume ` m^3 = (area ` m^2) * (height ` m);
3 3
(%o3) volume ` m = area height ` m
(%i4) volumeWithUnits, areaWithUnits;
3 3
(%o4) volume ` m = area height ` m
(%i5) subst(areaWithUnits, volumeWithUnits);
3 3
(%o5) volume ` m = area height ` m
The expected output is:
volumeWithUnits, areaWithUnits;
3 3
volume ` m = height length width ` m
I do not see a function in the ezunits package to do evaluation or substitution. What is the right way to do this?
I would phrase it like this:
(%i2) load (ezunits) $
(%i3) width: W ` m;
(%o3) W ` m
(%i4) length: L ` m;
(%o4) L ` m
(%i5) area: width * length;
2
(%o5) L W ` m
(%i6) height: H ` m;
(%o6) H ` m
(%i7) volume: area * height;
3
(%o7) H L W ` m
I wrote each part as conceptualname: symbolforquantity ` unit and then wrote just conceptualname in further calculations, instead of conceptualname ` unit.
The substitution you tried in %i5 didn't work because subst is a purely formal substitution -- if there isn't a literal subexpression which is the same as the substituted-for expression, it doesn't match; subst doesn't look for rearrangements or factorizations which could help make a match. There are ways to work around that, so it might be possible to make your original formulation work, but I think it's better overall to sidestep the problem and work with conceptualname and symbolforquantity ` unit.
To say a little about what more one could do with expressions like %o7 above. There are at least two ways to replace symbols H, L, and W with specific values. One is to call subst:
(%i2) load (ezunits) $
(%i3) volume: H*L*W ` m^3;
3
(%o3) H L W ` m
(%i4) subst ([L = 20, W = %pi], volume);
3
(%o4) 20 %pi H ` m
Another is to make use of ev.
(%i5) ev (volume, L = 20, W = %pi);
3
(%o5) 20 %pi H ` m
Note that at the input prompt, something, someflags, somevalues is equivalent to ev(something, someflags, somevalues).
(%i6) volume, L = 20, W = %pi;
3
(%o6) 20 %pi H ` m
This is just a convenience. Within a function, one has to say ev(...); the shorter syntax isn't understood there.
ev is often convenient, but it's generally simpler to predict what the result is going to be with subst instead.
minΣ(||xi-Xci||^2+ λ||ci||),
s.t cii = 0,
where X is a matrix of shape d * n and C is of the shape n * n, xi and ci means a column of X and C separately.
X is known here and based on X we want to find C.
Usually with a loss like that you need to vectorize it, instead of working with columns:
loss = X - tf.matmul(X, C)
loss = tf.reduce_sum(tf.square(loss))
reg_loss = tf.reduce_sum(tf.square(C), 0) # L2 loss for each column
reg_loss = tf.reduce_sum(tf.sqrt(reg_loss))
total_loss = loss + lambd * reg_loss
To implement the zero constraint on the diagonal of C, the best way is to add it to the loss with another constant lambd2:
reg_loss2 = tf.trace(tf.square(C))
total_loss = total_loss + lambd2 * reg_loss2
I am trying to implement logistic regression with gradient descent,
I get my Cost function j_theta for the number of iterations and fortunately my j_theta is decreasing when plotted j_theta against the number of iteration.
The data set I use is given below:
x=
1 20 30
1 40 60
1 70 30
1 50 50
1 50 40
1 60 40
1 30 40
1 40 50
1 10 20
1 30 40
1 70 70
y= 0
1
1
1
0
1
0
0
0
0
1
The code that I managed to write for logistic regression using Gradient descent is:
%1. The below code would load the data present in your desktop to the octave memory
x=load('stud_marks.dat');
%y=load('ex4y.dat');
y=x(:,3);
x=x(:,1:2);
%2. Now we want to add a column x0 with all the rows as value 1 into the matrix.
%First take the length
[m,n]=size(x);
x=[ones(m,1),x];
X=x;
% Now we limit the x1 and x2 we need to leave or skip the first column x0 because they should stay as 1.
mn = mean(x);
sd = std(x);
x(:,2) = (x(:,2) - mn(2))./ sd(2);
x(:,3) = (x(:,3) - mn(3))./ sd(3);
% We will not use vectorized technique, Because its hard to debug, We shall try using many for loops rather
max_iter=50;
theta = zeros(size(x(1,:)))';
j_theta=zeros(max_iter,1);
for num_iter=1:max_iter
% We calculate the cost Function
j_cost_each=0;
alpha=1;
theta
for i=1:m
z=0;
for j=1:n+1
% theta(j)
z=z+(theta(j)*x(i,j));
z
end
h= 1.0 ./(1.0 + exp(-z));
j_cost_each=j_cost_each + ( (-y(i) * log(h)) - ((1-y(i)) * log(1-h)) );
% j_cost_each
end
j_theta(num_iter)=(1/m) * j_cost_each;
for j=1:n+1
grad(j) = 0;
for i=1:m
z=(x(i,:)*theta);
z
h=1.0 ./ (1.0 + exp(-z));
h
grad(j) += (h-y(i)) * x(i,j);
end
grad(j)=grad(j)/m;
grad(j)
theta(j)=theta(j)- alpha * grad(j);
end
end
figure
plot(0:1999, j_theta(1:2000), 'b', 'LineWidth', 2)
hold off
figure
%3. In this step we will plot the graph for the given input data set just to see how is the distribution of the two class.
pos = find(y == 1); % This will take the postion or array number from y for all the class that has value 1
neg = find(y == 0); % Similarly this will take the position or array number from y for all class that has value 0
% Now we plot the graph column x1 Vs x2 for y=1 and y=0
plot(x(pos, 2), x(pos,3), '+');
hold on
plot(x(neg, 2), x(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('y1 marks in subject 2')
legend('pass', 'Failed')
plot_x = [min(x(:,2))-2, max(x(:,2))+2]; % This min and max decides the length of the decision graph.
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
hold off
%%%%%%% The only difference is In the last plot I used X where as now I use x whose attributes or features are featured scaled %%%%%%%%%%%
If you view the graph of x1 vs x2 the graph would look like,
After I run my code I create a decision boundary. The shape of the decision line seems to be okay but it is a bit displaced. The graph of the x1 vs x2 with decision boundary is given below:
![enter image description here][2]
Please suggest me where am I going wrong ....
Thanks:)
The New Graph::::
![enter image description here][1]
If you see the new graph the coordinated of x axis have changed ..... Thats because I use x(feature scalled) instead of X.
The problem lies in your cost function calculation and/or gradient calculation, your plotting function is fine. I ran your dataset on the algorithm I implemented for logistic regression but using the vectorized technique because in my opinion it is easier to debug.
The final values I got for theta were
theta =
[-76.4242,
0.8214,
0.7948]
I also used alpha = 0.3
I plotted the decision boundary and it looks fine, I would recommend using the vectorized form as it is easier to implement and to debug in my opinion.
I also think your implementation of gradient descent is not quite correct. 50 iterations is just not enough and the cost at the last iteration is not good enough. Maybe you should try to run it for more iterations with a stopping condition.
Also check this lecture for optimization techniques.
https://class.coursera.org/ml-006/lecture/37
I'm trying to implement a regression NN that has 3 layers (1 input, 1 hidden and 1 output layer with a continuous result). As a basis I took a classification NN from coursera.org class, but changed the cost function and gradient calculation so as to fit a regression problem (and not a classification one):
My nnCostFunction now is:
function [J grad] = nnCostFunctionLinear(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
m = size(X, 1);
a1 = X;
a1 = [ones(m, 1) a1];
a2 = a1 * Theta1';
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
Y = y;
J = 1/(2*m)*sum(sum((a3 - Y).^2))
th1 = Theta1;
th1(:,1) = 0; %set bias = 0 in reg. formula
th2 = Theta2;
th2(:,1) = 0;
t1 = th1.^2;
t2 = th2.^2;
th = sum(sum(t1)) + sum(sum(t2));
th = lambda * th / (2*m);
J = J + th; %regularization
del_3 = a3 - Y;
t1 = del_3'*a2;
Theta2_grad = 2*(t1)/m + lambda*th2/m;
t1 = del_3 * Theta2;
del_2 = t1 .* a2;
del_2 = del_2(:,2:end);
t1 = del_2'*a1;
Theta1_grad = 2*(t1)/m + lambda*th1/m;
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
Then I use this func in fmincg algorithm, but in firsts iterations fmincg end it's work. I think my gradient is wrong, but I can't find the error.
Can anybody help?
If I understand correctly, your first block of code (shown below) -
m = size(X, 1);
a1 = X;
a1 = [ones(m, 1) a1];
a2 = a1 * Theta1';
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
Y = y;
is to get the output a(3) at the output layer.
Ng's slides about NN has the below configuration to calculate a(3). It's different from what your code presents.
in the middle/output layer, you are not doing the activation function g, e.g., a sigmoid function.
In terms of the cost function J without regularization terms, Ng's slides has the below formula:
I don't understand why you can compute it using:
J = 1/(2*m)*sum(sum((a3 - Y).^2))
because you are not including the log function at all.
Mikhaill, I´ve been playing with a NN for continuous regression as well, and had a similar issues at some point. The best thing to do here would be to test gradient computation against a numerical calculation before running the model. If that´s not correct, fmincg won´t be able to train the model. (Btw, I discourage you of using numerical gradient as the time involved is much bigger).
Taking into account that you took this idea from Ng´s Coursera class, I´ll implement a possible solution for you to try using the same notation for Octave.
% Cost function without regularization.
J = 1/2/m^2*sum((a3-Y).^2);
% In case it´s needed, regularization term is added (i.e. for Training).
if (reg==true);
J=J+lambda/2/m*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));
endif;
% Derivatives are computed for layer 2 and 3.
d3=(a3.-Y);
d2=d3*Theta2(:,2:end);
% Theta grad is computed without regularization.
Theta1_grad=(d2'*a1)./m;
Theta2_grad=(d3'*a2)./m;
% Regularization is added to grad computation.
Theta1_grad(:,2:end)=Theta1_grad(:,2:end)+(lambda/m).*Theta1(:,2:end);
Theta2_grad(:,2:end)=Theta2_grad(:,2:end)+(lambda/m).*Theta2(:,2:end);
% Unroll gradients.
grad = [Theta1_grad(:) ; Theta2_grad(:)];
Note that, since you have taken out all the sigmoid activation, the derivative calculation is quite simple and results in a simplification of the original code.
Next steps:
1. Check this code to understand if it makes sense to your problem.
2. Use gradient checking to test gradient calculation.
3. Finally, use fmincg and check you get different results.
Try to include sigmoid function to compute second layer (hidden layer) values and avoid sigmoid in calculating the target (output) value.
function [J grad] = nnCostFunction1(nnParams, ...
inputLayerSize, ...
hiddenLayerSize, ...
numLabels, ...
X, y, lambda)
Theta1 = reshape(nnParams(1:hiddenLayerSize * (inputLayerSize + 1)), ...
hiddenLayerSize, (inputLayerSize + 1));
Theta2 = reshape(nnParams((1 + (hiddenLayerSize * (inputLayerSize + 1))):end), ...
numLabels, (hiddenLayerSize + 1));
Theta1Grad = zeros(size(Theta1));
Theta2Grad = zeros(size(Theta2));
m = size(X,1);
a1 = [ones(m, 1) X]';
z2 = Theta1 * a1;
a2 = sigmoid(z2);
a2 = [ones(1, m); a2];
z3 = Theta2 * a2;
a3 = z3;
Y = y';
r1 = lambda / (2 * m) * sum(sum(Theta1(:, 2:end) .* Theta1(:, 2:end)));
r2 = lambda / (2 * m) * sum(sum(Theta2(:, 2:end) .* Theta2(:, 2:end)));
J = 1 / ( 2 * m ) * (a3 - Y) * (a3 - Y)' + r1 + r2;
delta3 = a3 - Y;
delta2 = (Theta2' * delta3) .* sigmoidGradient([ones(1, m); z2]);
delta2 = delta2(2:end, :);
Theta2Grad = 1 / m * (delta3 * a2');
Theta2Grad(:, 2:end) = Theta2Grad(:, 2:end) + lambda / m * Theta2(:, 2:end);
Theta1Grad = 1 / m * (delta2 * a1');
Theta1Grad(:, 2:end) = Theta1Grad(:, 2:end) + lambda / m * Theta1(:, 2:end);
grad = [Theta1Grad(:) ; Theta2Grad(:)];
end
Normalize the inputs before passing it in nnCostFunction.
In accordance with Week 5 Lecture Notes guideline for a Linear System NN you should make following changes in the initial code:
Remove num_lables or make it 1 (in reshape() as well)
No need to convert y into a logical matrix
For a2 - replace sigmoid() function to tanh()
In d2 calculation - replace sigmoidGradient(z2) with (1-tanh(z2).^2)
Remove sigmoid from output layer (a3 = z3)
Replace cost function in the unregularized portion to linear one: J = (1/(2*m))*sum((a3-y).^2)
Create predictLinear(): use predict() function as a basis, replace sigmoid with tanh() for the first layer hypothesis, remove second sigmoid for the second layer hypothesis, remove the line with max() function, use output of the hidden layer hypothesis as a prediction result
Verify your nnCostFunctionLinear() on the test case from the lecture note
I'm experimenting with the ImageTransformation function to try to make anamorphic versions of images, but with limited progress so far. I'm aiming for the results you get using the image reflected in a cylindrical mirror, where the image curves around the central mirror for about 270 degrees. The wikipedia article has a couple of neat examples (and I borrowed Holbein's skull from them too).
i = Import["../Desktop/Holbein_Skull.jpg"];
i = ImageResize[i, 120]
f[x_, y_] := {(2 (y - 0.3) Cos [1.5 x]), (2 (y - 0.3) Sin [1.5 x])};
ImageTransformation[i, f[#[[1]], #[[2]]] &, Padding -> White]
But I can't persuade Mathematica to show me the entire image, or to bend it correctly. The anamorphic image should wrap right round the mirror placed "inside" the centre of the image, but it won't. I found suitable values for constants by putting it inside a manipulate (and turning the resolution down :). I'm using the formula:
x1 = a(y + b) cos(kx)
y1 = a(y + b) sin(kx)
Any help producing a better result would be greatly appreciated!
In ImageTransformation[f,img], the function f is such that a point {x,y} in the resulting image corresponds to f[{x,y}] in img. Since the resulting image is basically the polar transformation of img, f should be the inverse polar transformation, so you could do something like
anamorphic[img_, angle_: 270 Degree] :=
Module[{dim = ImageDimensions[img], rInner = 1, rOuter},
rOuter = rInner (1 + angle dim[[2]]/dim[[1]]);
ImageTransformation[img,
Function[{pt}, {ArcTan[-#2, #1] & ## pt, Norm[pt]}],
DataRange -> {{-angle/2, angle/2}, {rInner, rOuter}},
PlotRange -> {{-rOuter, rOuter}, {-rOuter, rOuter}},
Padding -> White
]
]
The resulting image looks something like
anamorphic[ExampleData[{"TestImage", "Lena"}]]
Note that you can a similar result with ParametricPlot and TextureCoordinateFunction, e.g.
anamorphic2[img_Image, angle_: 270 Degree] :=
Module[{rInner = 1,rOuter},
rOuter = rInner (1 + angle #2/#1 & ## ImageDimensions[img]);
ParametricPlot[{r Sin[t], -r Cos[t]}, {t, -angle/2, angle/2},
{r, rInner, rOuter},
TextureCoordinateFunction -> ({#3, #4} &),
PlotStyle -> {Opacity[1], Texture[img]},
Mesh -> None, Axes -> False,
BoundaryStyle -> None,
Frame -> False
]
]
anamorphic2[ExampleData[{"TestImage", "Lena"}]]
Edit
In answer to Mr.Wizard's question, if you don't have access to ImageTransformation or Texture you could transform the image data by hand by doing something like
anamorph3[img_, angle_: 270 Degree, imgWidth_: 512] :=
Module[{data, f, matrix, dim, rOuter, rInner = 1.},
dim = ImageDimensions[img];
rOuter = rInner (1 + angle #2/#1 & ## dim);
data = Table[
ListInterpolation[#[[All, All, i]],
{{rOuter, rInner}, {-angle/2, angle/2}}], {i, 3}] &#ImageData[img];
f[i_, j_] := If[Abs[j] <= angle/2 && rInner <= i <= rOuter,
Through[data[i, j]], {1., 1., 1.}];
Image#Table[f[Sqrt[i^2 + j^2], ArcTan[i, -j]],
{i, -rOuter, rOuter, 2 rOuter/(imgWidth - 1)},
{j, -rOuter, rOuter, 2 rOuter/(imgWidth - 1)}]]
Note that this assumes that img has three channels. If the image has fewer or more channels, you need to adapt the code.