what input x maximize activation function in an autoencoder hidden layer? - machine-learning

Hi, while reading Stanford's Machine Learning material on autoencoders, I found a formula that I couldn't prove by myself. Link to Material
The question is:
"What input image x would cause a_i to be maximally activated?"
Screenshot of the question and context:
Many thanks for your answers in advance!

While this can be rigorously solved using the KKT conditions and Lagrange multipliers, there is a more intuitive way to figure the result out. I assume that f(.) is a monotone increasing, sigmoid-type nonlinearity (ReLU is also valid). So, finding the maximum of w1x1+...+w100x100 + b under the constraint (x1)^2+...+(x100)^2 <= 1 is equivalent to finding the maximum of f(w1x1+...+w100x100 + b) under the same constraint.
Note that g = w1x1+...+w100x100 + b is a linear function of the x terms (name it g, so we can refer to it later). So, the direction of largest increase at any point (x1,...,x100) in the domain of that function is the same everywhere: the gradient. The gradient is simply (w1,w2,...,w100) at any point in the domain, which means that if we go in the direction of (w1,w2,...,w100), independent of where we start, we obtain the largest increase in the function. To make things simpler and to allow us to visualize, assume that we are in the R^2 space and the function is w1x1 + w2x2 + b:
The optimum x1 and x2 are constrained to lie in or on the circle C: (x1)^2 + (x2)^2 = 1. Assume that we are at the origin (0,0). If we go in the direction of the gradient (blue arrow) (w1,w2), we attain the largest value of the function where the blue arrow intersects the circle. That intersection has the coordinates c*(w1,w2) with c^2(w1^2 + w2^2) = 1, where c is a scalar coefficient. c is easily solved as c = 1 / sqrt(w1^2 + w2^2). Then at the intersection we have x1 = w1/sqrt(w1^2 + w2^2) and x2 = w2/sqrt(w1^2 + w2^2), which is the solution we seek. This extends in the same way to the 100-dimensional case.
You may ask why we started at the origin and not at some other point in the circle. Note that the red line is perpendicular to the gradient vector and the function is constant along that line. Draw that (u1,u2) line, preserving its orientation, anywhere you like, as long as it intersects the circle C. Then choose any point on the line that lies within the circle. On the (u1,u2) line you start at the same value of the function g wherever you are. Then, as you go in the (w1,w2) direction, the longest path taken within the circle always goes through the origin, which means it is the path along which the function g increases the most.
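As a quick numerical sanity check of that closed-form answer x = w / ||w||, here is a minimal numpy sketch (the sigmoid, the random weight vector w, and the bias b are illustrative assumptions, not values from the course material):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=100)   # illustrative hidden-unit weights
b = 0.1                    # illustrative bias

# Closed-form maximizer under ||x|| <= 1: x* = w / ||w||
x_star = w / np.linalg.norm(w)

# Compare against many random unit-norm inputs; none should beat x_star
best_random = max(
    sigmoid(w @ (v / np.linalg.norm(v)) + b)
    for v in rng.normal(size=(1000, 100))
)
print(sigmoid(w @ x_star + b), ">=", best_random)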

Related

Hough Transform Accumulator to Cartesian

I'm studying a course on vision systems and one of the questions posed was:
For the accumulator shown:
Determine the most likely r,θ combination representing the straight line of the greatest strength in the original image.
From my understanding of the accumulator this would be r = 60, θ = 150 as the 41 votes is the highest number of votes in this cluster of large votes. Am I correct with this combination?
And hence calculate the equation of this line in the form y = mx + c
I'm not sure of the conversion steps required to convert the r = 60, θ = 150 to y = mx + c with the information given since r = 60, θ = 150 denotes 1 point on the line.
State the resolution of your answer and give your reasoning
I assume the resolution has to do with some of the steps in the accumulation and not the actual resolution of the original image, since that's irrelevant to the edges detected in the image.
Any guidance on the above 3 points would be greatly appreciated!
Yes, this is correct.
This is asking you what the slope and intercept of the line are, given r and theta. r and theta are not one point on the line; they are one point of the accumulator. r and theta describe a line using the line equation in polar coordinates: r = x*cos(theta) + y*sin(theta). This is the cool thing about the Hough transform: every line in one space (i.e. image space) can be described by a point in another space (r, theta). This could be done with m and b from the line equation y = mx + b, but as we all know, m is undefined for vertical lines. This is the reason the polar line equation is used. It is important to note that the (r, theta) produced by the HT describes the segment from the origin out to (and perpendicular to) the actual line in the image. This means your image line y = mx + b will be orthogonal to that segment. The wiki article on the HT describes this well and shows examples. I would recommend drawing a diagram of your r and theta extending to a line like this:
Then use trig to get two points on the red line. Two points are enough to give you m and b from the line equation.
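If you prefer to skip the diagram, the conversion can also be done directly from the polar form (a small sketch in Python; it assumes theta is measured in degrees and that sin(theta) is nonzero, i.e. the line is not vertical):
import math

r, theta_deg = 60, 150            # accumulator peak from the question
theta = math.radians(theta_deg)

# From r = x*cos(theta) + y*sin(theta), solve for y = m*x + c
m = -math.cos(theta) / math.sin(theta)
c = r / math.sin(theta)
print(f"y = {m:.3f}x + {c:.3f}")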
I'm not entirely sure what "resolution" refers to in this context. But it does seem like your line estimator will have some precision loss since r is every 20 mm and theta is every 15 degrees. Perhaps it is asking what degree of error you could get given an accumulator of this resolution.

Geometric representation of Perceptrons (Artificial neural networks)

I am taking this course on Neural networks in Coursera by Geoffrey Hinton (not current).
I have a very basic doubt on weight spaces.
https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides%2Flec2.pdf
Page 18.
If I have a weight vector (bias is 0) of [w1=1, w2=2] and training cases {1,2,-1} and {2,1,1},
where I guess {1,2} and {2,1} are the input vectors, how can they be represented geometrically?
I am unable to visualize it. Why does a training case give a plane which divides the weight space into 2? Could somebody explain this in a coordinate system of 3 dimensions?
The following is the text from the ppt:
1.Weight-space has one dimension per weight.
2.A point in the space has a particular setting for all the weights.
3.Assuming that we have eliminated the threshold, each training case can be represented as a hyperplane through the origin.
My doubt is in the third point above. Kindly help me understand.
It's probably easier to explain if you look deeper into the math. Basically, a single layer of a neural net performs some function on your input vector, transforming it into a different vector space.
You don't want to jump right into thinking of this in 3-dimensions. Start smaller, it's easy to make diagrams in 1-2 dimensions, and nearly impossible to draw anything worthwhile in 3 dimensions (unless you're a brilliant artist), and being able to sketch this stuff out is invaluable.
Let's take the simplest case, where you're taking in an input vector of length 2. You have a weight vector of dimension 2x1, which implies an output vector of length one (effectively a scalar).
In this case it's pretty easy to imagine that you've got something of the form:
input = [x, y]
weight = [a, b]
output = ax + by
If we assume that weight = [1, 3], we can see, and hopefully intuit that the response of our perceptron will be something like this:
With the behavior being largely unchanged for different values of the weight vector.
It's easy to imagine then, that if you're constraining your output to a binary space, there is a plane, maybe 0.5 units above the one shown above that constitutes your "decision boundary".
As you move into higher dimensions this becomes harder and harder to visualize, but if you imagine that that plane shown isn't merely a 2-d plane, but an n-d plane or a hyperplane, you can imagine that this same process happens.
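To make that picture concrete, here is a small numpy sketch of the 2-d case (the weight [1, 3] and the 0.5 threshold are just the illustrative values used in this answer):
import numpy as np

weight = np.array([1.0, 3.0])     # the illustrative weight vector from above
threshold = 0.5                   # the hypothetical decision threshold

# Evaluate output = a*x + b*y over a small grid of inputs
xs, ys = np.meshgrid(np.linspace(-2, 2, 5), np.linspace(-2, 2, 5))
output = weight[0] * xs + weight[1] * ys

# The "decision boundary" is where the output plane crosses the threshold
decisions = (output > threshold).astype(int)
print(decisions)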
Since actually creating the hyperplane requires either the input or output to be fixed, you can think of giving your perceptron a single training value as creating a "fixed" [x,y] value. This can be used to create a hyperplane. Sadly, this cannot effectively be visualized, as 4-d drawings are not really feasible in a browser.
Hope that clears things up, let me know if you have more questions.
I have encountered this question on SO while preparing a large article on linear combinations (it's in Russian, https://habrahabr.ru/post/324736/). It has a section on the weight space and I would like to share some thoughts from it.
Let's take a simple case of linearly separable dataset with two classes, red and green:
The illustration above is in the data space X, where samples are represented by points and the weight coefficients constitute a line. It can be conveyed by the following formula:
w^T * x + b = 0
But we can rewrite it the other way around, making the x component a vector of coefficients and w a vector of variables:
x^T * w + b = 0
because the dot product is symmetric. Now it can be visualized in the weight space in the following way:
where red and green lines are the samples and blue point is the weight.
The set of possible weights is limited to the area below (shown in magenta):
which could be visualized in dataspace X as:
Hope it clarifies dataspace/weightspace correlation a bit. Feel free to ask questions, will be glad to explain in more detail.
The "decision boundary" for a single-layer perceptron is a plane (hyperplane),
where n in the image is the weight vector w, in your case w = {w1=1, w2=2} = (1,2), and the direction specifies which side is the right side. n is orthogonal (90 degrees) to the plane.
A plane always splits a space into 2 naturally (extend the plane to infinity in each direction).
You can also try to input different values into the perceptron and find where the response is zero (only on the decision boundary).
Recommend you read up on linear algebra to understand it better:
https://www.khanacademy.org/math/linear-algebra/vectors_and_spaces
For a perceptron with 1 input & 1 output layer, there can only be 1 LINEAR hyperplane. And since there is no bias, the hyperplane won't be able to shift along an axis, so it will always pass through the origin. However, if there is a bias, it may no longer pass through the origin.
I think the reason a training case can be represented as a hyperplane is because...
Let's say
[j,k] is the weight vector and
[m,n] is the training-input
training-output = jm + kn
Given that a training case in this perspective is fixed and the weights vary, the training input (m, n) becomes the coefficients and the weights (j, k) become the variables.
Just as in any text book where z = ax + by is a plane,
training-output = jm + kn is also a plane defined by training-output, m, and n.
The equation of a plane passing through the origin is written in the form:
ax + by + cz = 0
If a=1, b=2, c=3, the equation of the plane can be written as:
x + 2y + 3z = 0
So, in XYZ space the equation is x + 2y + 3z = 0.
Now, in the weight space every dimension represents a weight. So, if the perceptron has 10 weights, the weight space will be 10-dimensional.
Equation of the perceptron: ax + by + cz <= 0 ==> Class 0
ax + by + cz > 0 ==> Class 1
In this case a, b & c are the weights and x, y & z are the input features.
In the weight space, a, b & c are the variables (axes).
So, for every training example, e.g. (x,y,z) = (2,3,4), a hyperplane is formed in the weight space whose equation is:
2a + 3b + 4c = 0
passing through the origin.
I hope now you understand it.
Consider we have 2 weights. So w = [w1, w2]. Suppose we have input x = [x1, x2] = [1, 2]. If you use the weight to do a prediction, you have z = w1*x1 + w2*x2 and prediction y = z > 0 ? 1 : 0.
Suppose the label for the input x is 1. Thus, we hope y = 1, and thus we want z = w1*x1 + w2*x2 > 0. Consider vector multiplication, z = (w ^ T)x. So we want (w ^ T)x > 0. The geometric interpretation of this expression is that the angle between w and x is less than 90 degrees. For example, the green vector is a candidate for w that would give the correct prediction of 1 in this case. Actually, any vector that lies on the same side, with respect to the line w1 + 2 * w2 = 0, as the green vector would give the correct solution. However, if it lies on the other side, as the red vector does, then it would give the wrong answer.
However, suppose the label is 0. Then the case would just be the reverse.
The above case gives the intuition and illustrates the 3 points in the lecture slide. The training case x determines the plane, and depending on the label, the weight vector must lie on one particular side of the plane to give the correct answer.
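A tiny numpy sketch of that side-of-the-plane test (it uses the input x = [1, 2] from the example; the two candidate weight vectors are made up purely for illustration):
import numpy as np

x = np.array([1.0, 2.0])    # training input; defines the plane w1 + 2*w2 = 0 in weight space
label = 1

def gives_correct_answer(w, x, label):
    # The weight vector is correct iff it lies on the right side of the plane w . x = 0
    prediction = 1 if w @ x > 0 else 0
    return prediction == label

print(gives_correct_answer(np.array([1.0, 1.0]), x, label))    # same side as the "green" vector -> True
print(gives_correct_answer(np.array([-3.0, 0.5]), x, label))   # other side, like the "red" vector -> False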

to measure Contact Angle between 2 edges in an image

I need to find the contact angle between 2 edges in an image using OpenCV and Python. Can anybody suggest how to find it? If not code, please let me know an algorithm for the same.
Simplify edge A and B into a line equation (using only the few last pixels)
Get the line equations of the two lines (form y = mx + b)
Get the angle orientations of the two lines θ=atan|1/m|
Subtract the two angles from each other
Make sure to handle the special case of infinite slope, and also do some simple math to keep Final_Angle in (0, pi), as sketched below.
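A minimal Python sketch of those steps (the edge pixel coordinates here are made up for illustration; in practice you would take the last few pixels of each detected edge, and a vertical edge would need the infinite-slope special case mentioned above):
import numpy as np

# Hypothetical end pixels of edge A and edge B
edge_a = np.array([(0, 0), (1, 1), (2, 2), (3, 3)], dtype=float)
edge_b = np.array([(3, 3), (4, 2), (5, 1), (6, 0)], dtype=float)

def line_angle(points):
    # Fit y = m*x + b to the points and return the line's orientation in radians
    m, b = np.polyfit(points[:, 0], points[:, 1], 1)
    return np.arctan(m)

contact_angle = abs(line_angle(edge_a) - line_angle(edge_b))
print(np.degrees(contact_angle))   # 90.0 for these made-up edges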

How to determine curvature of a cubic bezier path at an end point

I've been working on a project where I use Bezier paths to draw the curves I need. Each basic shape in my project consists of three cubic Bezier curves arranged end to end so that the slopes match where they meet.
The basic problem I need to solve is whether the compound curve made of the three Bezier curves intersects with itself. After thinking about it for a while, I've figured out that given the constraints of the curves, I can simplify the task to something else:
The curvature of each of the three Bezier paths should be opposite in curvature direction relative to the curve it's abutted to. In other words, there should be an inflection point where one bezier curve abuts to another one. If that is not the case, I want to reject the parameter set that generated the curves and select a different set.
In any case, my basic question is how to detect whether there is an inflection point where the curves abut each other.
In the illustration, each of the three Bezier curves is shown using a different color. The left black curve curves in the opposite direction from the red curve at the point where they meet, but the right black curve curves in the same direction. There is an inflection point where the red and left black curve meet but not where the red and right black curve meet.
Edit:
Below, I've added another image, showing the polygon enclosing the Bezier path. The crossing lines of the polygon, shown for the black curve, test for an inflection point, not a loop. I'm guessing that one curve intersecting another can be tested by checking whether the enclosing polygons intersect, as illustrated by the red and blue curves.
P.S. Since there has been some question about the constraints, I will list some of them here:
The leftmost point and the rightmost point have the same y value.
The x value of the control point of the leftmost point is less than the x value of the control point of the rightmost point. This keeps the black and blue curves from intersecting each other.
The slope at the leftmost and rightmost points is within about +/- 10 degrees of horizontal.
The intersection of the black and red curves, and the intersection of the red and blue curves, divide the full curve into approximate thirds. I don't have exact numbers, but a sample bound would be that the x value of the left end of the red curve is somewhere between 25% and 40% of the x value of the rightmost point.
The y value of the points of intersection are +/- some small fraction of the overall width.
The slope at the intersections is > 0.6 and < 3.0 (positive or negative).
The equation for curvature is moderately simple. You only need the sign of the curvature, so you can skip a little math. You are basically interested in the sign of the cross product of the first and second derivatives.
This simplification only works because the curves join smoothly. Without equal tangents a more complex test would be needed.
The sign of curvature of curve P:
ax = P[1].x - P[0].x; // a = P1 - P0
ay = P[1].y - P[0].y;
bx = P[2].x - P[1].x - ax; // b = P2 - P1 - a
by = P[2].y - P[1].y - ay;
cx = P[3].x - P[2].x - bx*2 - ax; // c = P3 - P2 - 2b - a
cy = P[3].y - P[2].y - by*2 - ay;
bc = bx*cy - cx*by;
ac = ax*cy - cx*ay;
ab = ax*by - bx*ay;
r = ab + ac*t + bc*t*t;
Note that r is the cross product of the first and second derivative and the sign indicate the direction of curvature. Calculate r at t=1 on the left curve and at t=0 on the right curve. If the product is negative, then there is an inflection point.
If you have curves U, V and W where U is the left black, V is the middle red, and W is the right black, then calculate bc, ac, and ab above for each. The following test will be true if both joins are inflection points:
(Uab+Uac+Ubc)*(Vab) < 0 && (Vab+Vac+Vbc)*(Wab) < 0
The equation for curvature has a denominator that I have ignored. It does not affect the sign of curvature and would only be zero if the curve was a line.
Math summary:
// start with the classic bezier curve equation
P = (1-t)^3*P0 + 3*(1-t)^2*t*P1 + 3*(1-t)*t^2*P2 + t^3*P3
// convert to polynomial
P = P0 + 3*t*(P1 - P0) + 3*t^2*(P2 - 2*P1 + P0) + t^3*(P3 - 3*P2 + 3*P1 - P0)
// rename the terms to a,b,c
P = P0 + 3at + 3btt + cttt
// find the first and second derivatives
P' = 3a + 6bt + 3ctt
P" = 6b + 6ct
// and the cross product after some reduction
P' x P" = ab + act + bctt
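For reference, here is a direct Python translation of the test above (a sketch; each curve is assumed to be a list of four (x, y) control point tuples, and the joins are assumed to be C1-continuous as required above):
def cross_terms(P):
    # Return (ab, ac, bc) for a cubic Bezier given as four (x, y) control points
    ax, ay = P[1][0] - P[0][0], P[1][1] - P[0][1]
    bx, by = P[2][0] - P[1][0] - ax, P[2][1] - P[1][1] - ay
    cx, cy = P[3][0] - P[2][0] - 2 * bx - ax, P[3][1] - P[2][1] - 2 * by - ay
    return ax * by - bx * ay, ax * cy - cx * ay, bx * cy - cx * by

def both_joins_inflect(curve_u, curve_v, curve_w):
    # True if both the U-V and the V-W joins are inflection points
    Uab, Uac, Ubc = cross_terms(curve_u)
    Vab, Vac, Vbc = cross_terms(curve_v)
    Wab = cross_terms(curve_w)[0]
    # curvature sign at t=1 of the left curve times the sign at t=0 of the right curve
    return (Uab + Uac + Ubc) * Vab < 0 and (Vab + Vac + Vbc) * Wab < 0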
One of the deterministic ways to check if a Bezier curve has a double point or self-intersection is to calculate the inversion equation and evaluate its root, since the inversion equation of a Bezier curve is always zero at the point of self-intersection.
This is detailed in an example from T. W. Sederberg's course notes. Then, for detecting whether two Bezier curves (red and black in the question) intersect each other, there are a few methods: binary subdivision (easier to implement), Bezier clipping (very good balance of efficiency and code complexity), and implicitization (not worth it).
But a more efficient way to do it is probably to subdivide the curves into small line segments and find the intersection. Here is one way to do it, especially when you are interested in knowing whether the path self-intersects but not in an accurate point of intersection.
If you have well-defined assumptions about the locations of the CVs of the piecewise Bezier curves (or poly-bezier, as @Pomax mentioned above), you can try the curvature-based method you mention in your question.
Here is a plot of the curvature on a path similar to the one in the question. Under similar circumstances, it seems this could give you the solution you are looking for. Also, in the case of a cusp, it seems this quick-and-dirty method still works!
@Victor Engel: Thank you for the clarification. Now the solution is even easier.
In short, look at the torques of both curves. If they pull together, an "inflexion" occurs; if they are opposed, the "curvature" continues.
Remark: the words "inflexion" and "curvature" have a more intuitive nature here than a strict mathematical meaning, hence the quotation marks.
As before, a set of points P0,P1,P2,P3 defines the first cubic Bezier curve, and Q0,Q1,Q2 and Q3 the second.
Moreover, 4 vectors a, b, u, v will be useful.
a = P2P3
b = P1P2
u = Q0Q1
v = Q1Q2
I will omit the C1-continuity check; it's already done by you. (This means P3 = Q0 and a x u = 0.)
Finally, the torques Tp and Tq will appear.
Tp = b x a
("x" means a vector product, but in the plane Tp is treated like a simple number, not a vector.)
if Tp=0 {very rarely vectors a, b can be parallel.}
b = P0P1
Tp = b x a
if Tp=0
No! It can't be! Who straightened the curve???
STOP
endif
endif
Tq = u x v
if Tq=0 {also vectors u, v can be parallel.}
v = Q2Q3
Tq = u x v
if Tq=0
Oh no! What happened to my curve???
STOP
endif
endif
Now the final test:
if Tp*Tq < 0 then Houston! We have AN "INFLEXION"!
else WE CONTINUE THE TURN!
Suppose you have 4 points: P0, P1, P2 and P3, which describes a cubic Bezier's curve (cBc for short) . P0 and P3 are starting and ending points, P1 and P2 are directional points.
When P0, P1 and P2 are collinear, then cBc has an inflection point at P0.
When P1, P2 and P3 are collinear, then cBC has an inflection point at P3.
Remarks.
1) Use the vector product (P0P1 x P1P2 and P1P2 x P2P3 respectively) to check the collinearity. The result should be a zero-length vector.
2) Avoid the situation where P0, P1, P2 and P3 are all collinear; the cBc they create isn't a curve in the common sense.
Corollary.
When P1=P2, cBc has inflection points at both ends.
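A small Python sketch of that collinearity test (it treats a 2D cross product whose magnitude is below a tolerance as the "zero-length vector" above; the tolerance value is an arbitrary choice):
def cross2(u, v):
    return u[0] * v[1] - u[1] * v[0]

def inflects_at_ends(P0, P1, P2, P3, eps=1e-9):
    # Endpoint-inflection conditions described above for a cubic Bezier
    p0p1 = (P1[0] - P0[0], P1[1] - P0[1])
    p1p2 = (P2[0] - P1[0], P2[1] - P1[1])
    p2p3 = (P3[0] - P2[0], P3[1] - P2[1])
    at_p0 = abs(cross2(p0p1, p1p2)) < eps   # P0, P1, P2 collinear
    at_p3 = abs(cross2(p1p2, p2p3)) < eps   # P1, P2, P3 collinear
    return at_p0, at_p3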

Math/OpenGL ES: Draw 3D bezier curve of varying width

I've been working on a problem for several weeks and have reached a point that I'd like to make sure I'm not overcomplicating my approach. This is being done in OpenGL ES 2.0 on iOS, but the principles are universal, so I don't mind the answers being purely mathematical in form. Here's the rundown.
I have 2 points in 3D space along with a control point that I am using to produce a bezier curve with the following equation:
B(t) = (1 - t)^2 * P0 + 2(1 - t)t * P1 + t^2 * P2
The start/end points are being positioned at dynamic coordinates on a fairly large sphere, so x/y/z varies greatly, making a static solution not so practical. I'm currently rendering the points using GL_LINE_STRIP. The next step is to render the curve using GL_TRIANGLE_STRIP and control the width relative to height.
According to this quick discussion, a good way to solve my problem would be to find points that are parallel to the curve along both sides, taking into account its direction. I'd like to create 3 curves in total, pass in the indices to create a bezier curve of varying width, and then draw it.
There's also talk of interpolation and using a Loop-Blinn technique that seem to solve the specific problems of their respective questions. I believe that those solutions, however, might be too complex for what I'm going after. I'm also not interested in bringing textures into the mix. I prefer that the triangles are just drawn using the colors I'll calculate later on in my shaders.
So, before I go into more reading on trilinear interpolation, Catmull-Rom splines, the Loop-Blinn paper, or explore sampling further, I'd like to make sure which direction is most likely to be the best bet. I think I can say the problem in its most basic form is to take a point in 3D space and find two parallel points alongside it that take into account the direction the next point will be plotted.
Thank you for your time and if I can provide anything further, let me know and I'll do my best to add it.
This answer does not (as far as I see) favor one of the methods you mentioned in your question, but is what I would do in this situation.
I would calculate the normalized normal (or binormal) of the curve. Let's say I take the normalized normal and have it as a function of t (N(t)). With this I would write a helper function to calculate the offset point P:
P(t, o) = B(t) + o * N(t)
Where o means the signed offset of the curve in normal direction.
Given this function one would simply calculate the points to the left and right of the curve by:
Points = [P(t, -w), P(t, w), P(t + s, -w), P(t + s, w)]
Where w is the width of the curve you want to achieve.
Then connect these points via two triangles.
For use in a triangle strip this would mean the indices:
0 1 2 3
Edit
To do some work with the curve one would generally calculate the Frenet frame.
This is a set of 3 vectors (Tangent, Normal, Binormal) that gives the orientation in a curve at a given parameter value (t).
The Frenet frame is given by:
unit tangent = B'(t) / || B'(t) ||
unit binormal = (B'(t) x B''(t)) / || B'(t) x B''(t) ||
unit normal = unit binormal x unit tangent
In this example x denotes the cross product of two vectors and || v || means the length (or norm) of the enclosed vector v.
As you can see you need the first (B'(t)) and the second (B''(t)) derivative of the curve.
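Putting both parts together, here is a rough numpy sketch for the quadratic B(t) from the question (the function and variable names are mine; note that the binormal is undefined where B'(t) and B''(t) are parallel, e.g. for collinear control points):
import numpy as np

def bezier(p0, p1, p2, t):
    # B(t) = (1-t)^2*P0 + 2(1-t)t*P1 + t^2*P2
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

def bezier_d1(p0, p1, p2, t):
    return 2 * (1 - t) * (p1 - p0) + 2 * t * (p2 - p1)

def bezier_d2(p0, p1, p2):
    return 2 * (p2 - 2 * p1 + p0)   # constant for a quadratic curve

def offset_points(p0, p1, p2, t, w):
    # Offset the curve by +/- w along the Frenet normal at parameter t
    d1 = bezier_d1(p0, p1, p2, t)
    d2 = bezier_d2(p0, p1, p2)
    tangent = d1 / np.linalg.norm(d1)
    binormal = np.cross(d1, d2)
    binormal = binormal / np.linalg.norm(binormal)
    normal = np.cross(binormal, tangent)
    b = bezier(p0, p1, p2, t)
    return b - w * normal, b + w * normal
Sampling t in small steps and appending the two offset points per sample yields exactly the 0 1 2 3 ... vertex ordering for GL_TRIANGLE_STRIP described above.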
