What is b / ||w|| in support vector machine? - machine-learning

In the picture of SVM from Wikipedia, at the lower left corner - pointed by the red arrow, there's b / ||w||. How is that calculated? In other words, why is the line in the picture b / ||w||? Thanks.

The line represents the affine subspace of points x, whose scalar product with the weight vector w yields b. As w is typically not a unit vector (i.e. need not have length 1), one has to divide b by the norm (the "length") of w to get the actual distance from the origin.
More precisely: imagine a vector x starting at the origin and reaching a point on the red line, and let u be the unit vector in the direction of w, i.e. u = w / ||w||. Then <x,u> * u is the projection of x onto u, and its length |<x,u>| is the distance of the red line from the origin. The scalar product <x,w> (written as x*w in the graphic) equals b for every point on the line (that is how b is defined), and <x,w> = ||w|| * <x,u>. So to get the distance from the origin, one has to divide by ||w||: <x,u> = b / ||w||.
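As a quick numeric check of the argument above, here is a minimal sketch in plain Python. The values of w and b are made up for illustration; the sketch only assumes the line is the set of points x with <x,w> = b.

```python
import math

def distance_from_origin(w, b):
    """Distance from the origin to the hyperplane {x : <w, x> = b}."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(b) / norm_w

w = [3.0, 4.0]   # illustrative weight vector, ||w|| = 5
b = 10.0         # illustrative offset

# The point of the hyperplane closest to the origin lies along w:
# x = (b / ||w||^2) * w satisfies <w, x> = b.
norm_sq = sum(wi * wi for wi in w)
closest = [b * wi / norm_sq for wi in w]

dist = distance_from_origin(w, b)
closest_norm = math.sqrt(sum(c * c for c in closest))
on_line = sum(wi * ci for wi, ci in zip(w, closest))
```

Here the closest point is (1.2, 1.6), its length equals b / ||w||, and its scalar product with w gives back b.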

Related

How to calculate *VC dimension* of H in this case?

Let H denote the set of axis-parallel rectangles in Rn. Each rectangle defines a binary classifier that assigns label +1 to points inside the rectangle and label -1 to points outside the rectangle. Each rectangle is defined by an interval [ai, bi] in each dimension 1 ≤ i ≤ n. A point x = (x1, ..., xn) ∈ Rn is in the rectangle if ai ≤ xi ≤ bi for 1 ≤ i ≤ n.
What is the VC dimension of H? Justify your answer.
It's typed using LaTex; this is the original text.
From my understanding, the VC dimension is the largest integer d such that there exists a sample of size d that can be shattered by the hypothesis set H. But how can we calculate this when H is the set of rectangles?
For an axis-aligned rectangle in Rn, the VC dimension is (at least) 2*n.
Consider first a rectangle in R1, which is just a pair of min/max values (a1, b1) along the x1 axis (not actually a rectangle). This pair of values has VC dimension of 2 because for any two points in R1, you can set (a1, b1) such that one, both, or neither of the two points is between them. But when you add a third point along that axis, there is no way you can include the two outer points in the range (a1, b1) without also including the middle point.
To make it easier to visualize, suppose the two points lie at x1 = -2 and x1 = 2, respectively. You can shatter the set by defining your rectangle bounds (a1, b1) as
(-3, 3) -> both points included
(-3, 1) -> first point only
(-1, 3) -> second point only
(-1, 1) -> neither point included
Now suppose you add an additional dimension so your space is in R2 (i.e., now it is a true rectangular region). Add two more points and let the x1 coordinates of the new points be 0 so they will always be included along the first dimension (for the rectangles defined above) and set the x2 coordinates of the new points to -2 and 2, respectively. Also, set the x2 coordinates of the first two points to 0.
When you set (a1, b1, a2, b2) to (-3, 3, -3, 3) you will include all 4 points. And you can exclude any point by simply reducing the magnitude of the corresponding bound from 3 to 1. Since you can include any subset of the 4 points by changing the R2 rectangle's corresponding bounds, the VC dimension of the R2 rectangle is at least 4.
It should be fairly obvious that you can repeat this procedure for an arbitrary number of dimensions. Each time you add a new dimension xi, set the xi coordinate of all points to 0 except for the two new points you add for that dimension (they will be at xi = -2 and xi = 2, respectively).
Since you can shatter 2 additional points for every dimension, then for an axis-aligned rectangle in Rn, the VC dimension will be at least 2*n.
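The construction above can be checked by brute force for small n. This is only a sketch of the lower-bound argument: the function names are made up for illustration, and it verifies that these particular 2*n points are shattered, not that no larger set can be.

```python
from itertools import product

def construction_points(n):
    """2n points: for each axis i, one point at -2 and one at +2, zeros elsewhere."""
    points = []
    for i in range(n):
        for value in (-2, 2):
            p = [0] * n
            p[i] = value
            points.append(tuple(p))
    return points

def rectangle_for(points, labels, n):
    """Bounds (lower, upper) of a rectangle meant to contain exactly the points labelled True."""
    lower = [-3] * n
    upper = [3] * n
    for p, keep in zip(points, labels):
        if keep:
            continue
        axis = next(i for i, c in enumerate(p) if c != 0)
        if p[axis] < 0:
            lower[axis] = -1   # pull the lower bound in past the point at -2
        else:
            upper[axis] = 1    # pull the upper bound in past the point at +2
    return lower, upper

def inside(p, lower, upper):
    return all(lo <= c <= hi for c, lo, hi in zip(p, lower, upper))

def is_shattered(n):
    """True if every +/- labelling of the 2n points is realised by some rectangle."""
    points = construction_points(n)
    for labels in product([False, True], repeat=len(points)):
        lower, upper = rectangle_for(points, labels, n)
        if [inside(p, lower, upper) for p in points] != list(labels):
            return False
    return True
```

Calling is_shattered(n) for n = 1, 2, 3 enumerates all 2^(2n) labellings and confirms that each one is produced by a rectangle with bounds drawn from {-3, -1, 1, 3}.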

Implementing convolutional neural network backprop in ArrayFire (gradient calculation)

I modified equation 9.12 in http://www.deeplearningbook.org/contents/convnets.html to center the MxN convolution kernel.
That gives the following expression (take it on faith for now) for the gradient, assuming 1 input and 1 output channel (to simplify):
dK(krow, kcol) = sum(G(row, col) * V(row+krow-M/2, col+kcol-N/2); row, col)
To read the above, the single element of dK at krow, kcol is equal to the sum over all of the rows and cols of the product of G times a shifted V. Note G and V have the same dimensions. We will define going outside V to result in a zero.
For example, in one dimension, if G is [a b c d], V is [w x y z], and M is 3, then the first sum is dot (G, [0 w x y]), the second sum is dot (G, [w x y z]), and the third sum is dot (G, [x y z 0]).
ArrayFire has a shift operation, but it does a circular shift, rather than a shift with zero insertion. Also, the kernel sizes MxN are typically small, e.g., 7x7, so it seems a more optimal implementation would read in G and V once only, and accumulate over the kernel.
For that 1D example, we would read in a and w,x and start with [a*0 aw ax]. Then we read in b,y and add [bw bx by]. Then read in c,z and add [cx cy cz]. Then read in d and finally add [dy dz d*0].
Is there a direct way to compute dK in ArrayFire? I can't help but think this is some kind of convolution, but I've been unable to wrap my head around what the convolution would look like.
Ah so. For a 3x3 dK array, I use unwrap to convert my MxN input arrays to two MxN column vectors. Then I do 9 dot products of shifted subsets of the two column vectors. No, that doesn't work since the shift is in 2 dimensions.
So I need to create intermediate arrays of 1 x (MxN) and (MxN) x 9 in size, where each column of the latter is a shifted MxN window of the original with a pad border of zeros of size 1, and then do a matrix multiply.
Hmm, that requires too much memory (sometimes.) So the final solution is to do a gfor over the output 3x3, and for each loop, do a dot product of the unwrapped-once G and the unwrapped-repeatedly V.
Agreed?
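For checking an ArrayFire implementation against known values, a slow reference version can help. The sketch below is plain Python, not ArrayFire, and simply evaluates the dK formula from the question with out-of-range V treated as zero; the function name and the use of integer division for M/2 and N/2 are assumptions.

```python
def kernel_gradient(G, V, M, N):
    """Reference dK for one input and one output channel.

    dK[krow][kcol] = sum over (row, col) of
        G[row][col] * V[row + krow - M//2][col + kcol - N//2],
    where V is taken to be zero outside its bounds.
    G and V are lists of rows with identical dimensions.
    """
    rows, cols = len(G), len(G[0])
    dK = [[0.0] * N for _ in range(M)]
    for krow in range(M):
        for kcol in range(N):
            total = 0.0
            for row in range(rows):
                r = row + krow - M // 2
                if r < 0 or r >= rows:
                    continue  # shifted row falls outside V: contributes zero
                for col in range(cols):
                    c = col + kcol - N // 2
                    if 0 <= c < cols:
                        total += G[row][col] * V[r][c]
            dK[krow][kcol] = total
    return dK

# The 1D example from the question, with numbers in place of a..d and w..z.
G = [[1.0, 2.0, 3.0, 4.0]]   # a b c d
V = [[5.0, 6.0, 7.0, 8.0]]   # w x y z
dK = kernel_gradient(G, V, 1, 3)
```

With these numbers the three entries are dot(G, [0 w x y]), dot(G, [w x y z]) and dot(G, [x y z 0]), matching the sums described in the question.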

Ortho projection of 3D points with a vector

I have 3D points and I need to make a 2D orthographic projection of them onto a plane that is defined by the origin and a normal n. The meaning of this is basically looking at the points from the top (given the vertical vector). How can I do it?
What I'm thinking is:
1. project point P onto the 3D plane: P - (P dot n) * n
2. look at the 3D plane from the "back" with respect to the normal (not sure how to define this)
3. do an ortho projection using max-min coordinates of the points in the plane to define the clipping
I am working with iOS.
One way to do this would be to:
1. rotate the coordinate system so that the plane of interest lies in the x-y plane, and the normal vector n is aligned with the z-axis
2. project the points onto the x-y plane by setting their z-components to 0
Set up the coordinate transformation
There are infinitely many solutions to this problem since we can always rotate a solution in the x-y plane to get another valid solution.
To fix this, let's choose a vector v lying in the plane that will line up with the x-axis after the transformation. Any vector will do; let's take the vector in the plane with coordinates x=1 and y=0.
Since our plane intersects the origin, its equation is:
x*n1 + y*n2 + z*n3 = 0
z = -(x*n1 + y*n2)/n3
After substituting x=1, y=0, we see that
v = [1 0 -n1/n3]
We also need to make sure v is normalized, so set
v = v/sqrt(v1*v1 + v2*v2 + v3*v3)
EDIT: The above method will fail in cases where n3=0. An alternative method to find v is to take a point P1 from our point set that is not a scalar multiple of n and calculate v = P1 - (P1 dot n) * n, which is the projection of P1 into the plane (with n normalized), and then normalize v. Just keep searching through your points until you find one whose projection has nonzero length, and this is guaranteed to work.
Now we need a vector u that will line up with the y-axis after the transformation. We get this from the cross product of n and v:
u = n cross v
If n and v are normalized, then u is automatically normalized.
Next, create the matrix
M = [ v1 v2 v3 ]
    [ u1 u2 u3 ]
    [ n1 n2 n3 ]
Transform the points
Now given a 3 by N array of points P, we just follow the two steps above
P_transformed = M*P
P_plane = set the third row of P_transformed to zero
The x-y coordinates of P_plane are now a 2D coordinate system in the plane.
If you need to get the 3D spatial coordinates back, just do the reverse transformation with P_space = M_transpose*P_plane.
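The whole procedure can be sketched in a few lines of plain Python. The function names are made up for illustration, and one detail differs from the text: when n3 is (close to) zero, this sketch projects the z-axis into the plane to get v instead of searching the point set, which is an assumption chosen to keep the example self-contained.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    length = math.sqrt(dot(a, a))
    return [c / length for c in a]

def plane_basis(n):
    """Rows v, u, n of the matrix M for a plane through the origin with normal n."""
    n = normalize(n)
    if abs(n[2]) > 1e-9:
        # Vector in the plane with x=1, y=0, as in the text.
        v = [1.0, 0.0, -n[0] / n[2]]
    else:
        # Fallback (assumption): n3 is ~0, so the z-axis is nearly
        # perpendicular to n and its projection into the plane is nonzero.
        z = [0.0, 0.0, 1.0]
        v = [z[i] - dot(z, n) * n[i] for i in range(3)]
    v = normalize(v)
    u = cross(n, v)   # unit length because n and v are orthonormal
    return [v, u, n]

def project_to_plane(points, n):
    """2D coordinates (along v, along u) of each 3D point; the n-component is dropped."""
    M = plane_basis(n)
    return [(dot(M[0], p), dot(M[1], p)) for p in points]
```

For n = [0, 0, 1] this reduces to dropping the z-coordinate, so project_to_plane([[1, 2, 3]], [0, 0, 1]) gives the 2D point (1, 2).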

Why is weight vector orthogonal to decision plane in neural networks

I am a beginner in neural networks. I am learning about perceptrons.
My question is: why is the weight vector perpendicular to the decision boundary (hyperplane)?
I consulted many books. All of them mention that the weight vector is perpendicular to the decision boundary, but none of them say why.
Can anyone give me an explanation or reference to a book?
The weights are simply the coefficients that define a separating plane. For the moment, forget about neurons and just consider the geometric definition of a plane in N dimensions:
w1*x1 + w2*x2 + ... + wN*xN - w0 = 0
You can also think of this as being a dot product:
w*x - w0 = 0
where w and x are both length-N vectors. This equation holds for all points on the plane. Recall that we can multiply the above equation by a constant and it still holds so we can define the constants such that the vector w has unit length. Now, take out a piece of paper and draw your x-y axes (x1 and x2 in the above equations). Next, draw a line (a plane in 2D) somewhere near the origin. w0 is simply the perpendicular distance from the origin to the plane and w is the unit vector that points from the origin along that perpendicular. If you now draw a vector from the origin to any point on the plane, the dot product of that vector with the unit vector w will always be equal to w0 so the equation above holds, right? This is simply the geometric definition of a plane: a unit vector defining the perpendicular to the plane (w) and the distance (w0) from the origin to the plane.
Now our neuron is simply representing the same plane as described above but we just describe the variables a little differently. We'll call the components of x our "inputs", the components of w our "weights", and we'll call the distance w0 a bias. That's all there is to it.
Getting a little beyond your actual question, we don't really care about points on the plane. We really want to know which side of the plane a point falls on. While w*x - w0 is exactly zero on the plane, it will have positive values for points on one side of the plane and negative values for points on the other side. That's where the neuron's activation function comes in but that's beyond your actual question.
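To make the "which side of the plane" idea concrete, here is a small sketch in plain Python. The weights, bias and test points are made-up values, and the hard threshold at zero stands in for the activation function.

```python
import math

def signed_distance(w, w0, x):
    """Signed distance of x from the plane w*x - w0 = 0 (w need not be unit length)."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) - w0) / norm_w

def classify(w, w0, x):
    """1 for points on the positive side of the plane (or on it), 0 otherwise."""
    return 1 if signed_distance(w, w0, x) >= 0 else 0

w = [3.0, 4.0]    # illustrative weights
w0 = 5.0          # illustrative bias

on_plane = [3.0, -1.0]       # 3*3 + 4*(-1) - 5 = 0
positive_side = [3.0, 4.0]   # 3*3 + 4*4 - 5 = 20
negative_side = [0.0, 0.0]   # -5
```

Dividing by ||w|| turns w*x - w0 into a true distance (4 and -1 for the two off-plane points here), but the sign, and therefore the class, is the same with or without the normalization.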
Intuitively, in a binary problem the weight vector points in the direction of the '1'-class, while the '0'-class is found when pointing away from the weight vector. The decision boundary should thus be drawn perpendicular to the weight vector.
See the image for a simplified example: You have a neural network with only 1 input which thus has 1 weight. If the weight is -1 (the blue vector), then all negative inputs will become positive, so the whole negative spectrum will be assigned to the '1'-class, while the positive spectrum will be the '0'-class. The decision boundary in a 2-axis plane is thus a vertical line through the origin (the red line). Simply said it is the line perpendicular to the weight vector.
Let's go through this example with a few values. The output of the perceptron is class 1 if the sum of all inputs * weights is larger than 0 (the default threshold); otherwise, if the sum is smaller than the threshold of 0, the class is 0. Suppose your input has value 1. The weight applied to this single input is -1, so 1 * -1 = -1, which is less than 0. The input is thus assigned class 0 (NOTE: class 0 and class 1 could just as well have been called class A and class B; don't confuse them with the input and weight values). Conversely, if the input is -1, then input * weight is -1 * -1 = 1, which is larger than 0, so the input is assigned to class 1. If you try every input value, you will see that all the negative values in this example have an output larger than 0, so all of them belong to class 1. All positive values will have an output smaller than 0 and will therefore be classified as class 0. Draw the line which separates all positive and negative input values (the red line) and you will see that this line is perpendicular to the weight vector.
Also note that the weight vector is only used to modify the inputs to fit the wanted output. What would happen without a weight vector? An input of 1, would result in an output of 1, which is larger than the threshold of 0, so the class is '1'.
The second image on this page shows a perceptron with 2 inputs and a bias. The first input has the same weight as my example, while the second input has a weight of 1. The corresponding weight vector together with the decision boundary are thus changed as seen in the image. Also the decision boundary has been translated to the right due to an added bias of 1.
Here is a viewpoint from a more fundamental linear algebra/calculus standpoint:
The general equation of a plane is Ax + By + Cz = D (can be extended for higher dimensions). The normal vector can be extracted from this equation: [A B C]; it is the vector orthogonal to every other vector that lies on the plane.
Now if we have a weight vector [w1 w2 w3], then we classify x as positive when w^T * x >= 0 and as negative when w^T * x < 0. WLOG, we can also use a threshold d and test w^T * x >= d. Now, do you see where I am going with this?
The weight vector is the same as the normal vector from the first section. And as we know, this normal vector (and a point) define a plane: which is exactly the decision boundary. Hence, because the normal vector is orthogonal to the plane, then so too is the weight vector orthogonal to the decision boundary.
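A small numeric check of this argument, in plain Python with made-up numbers: pick any points that satisfy w^T * x = d; the difference of two such points is a vector lying in the plane, and its dot product with w comes out as zero.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

w = [2.0, -1.0, 3.0]   # illustrative weight (normal) vector
d = 6.0                # illustrative threshold

# Three points on the decision boundary w^T * x = d.
p1 = [3.0, 0.0, 0.0]
p2 = [0.0, -6.0, 0.0]
p3 = [0.0, 0.0, 2.0]

# Differences of points on the plane are vectors lying in the plane.
in_plane_1 = subtract(p2, p1)
in_plane_2 = subtract(p3, p1)

orthogonal_1 = dot(w, in_plane_1)   # w . (p2 - p1) = d - d = 0
orthogonal_2 = dot(w, in_plane_2)   # w . (p3 - p1) = d - d = 0
```

The zero is not a coincidence of the chosen numbers: w . (p - q) = w . p - w . q = d - d for any two boundary points p and q.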
Start with the simplest form, ax + by = 0, weight vector is [a, b], feature vector is [x, y]
Then y = (-a/b)x is the decision boundary with slope -a/b
The weight vector has slope b/a
If you multiply those two slopes together, result is -1
This proves decision boundary is perpendicular to weight vector
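The slope argument can be checked with exact arithmetic. In this sketch the values of a and b are arbitrary nonzero integers; the dot-product check at the end is an addition that also covers the cases a = 0 or b = 0, where one of the slopes is undefined.

```python
from fractions import Fraction

def slopes(a, b):
    """Slopes of the boundary ax + by = 0 and of the weight vector [a, b] (a, b nonzero)."""
    boundary_slope = Fraction(-a, b)
    weight_slope = Fraction(b, a)
    return boundary_slope, weight_slope

a, b = 2, 3
boundary_slope, weight_slope = slopes(a, b)
slope_product = boundary_slope * weight_slope

# Equivalent check without slopes: [b, -a] points along the boundary,
# and its dot product with the weight vector [a, b] is always zero.
direction_dot_weight = b * a + (-a) * b
```

For a = 2 and b = 3 the slopes are -2/3 and 3/2, and their product is exactly -1.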
Although the question was asked 2 years ago, I think many students will have the same doubts. I reached this answer because I asked the same question.
Now, just think of X, Y (a Cartesian coordinate system is a coordinate system that specifies each point uniquely in a plane by a pair of numerical coordinates, which are the signed distances from the point to two fixed perpendicular directed lines [from Wikipedia]).
If Y = 3X, in geometry Y is perpendicular to X, then let w = 3, then Y = wX, w = Y/X and if we want to draw the relation between X, w we will have two perpendicular lines just like when we draw the relation between X, Y. So always think of the w-coefficient as perpendicular to X and Y.

Calculating homography matrix using arbitrary known geometrical relations

I am using OpenCV for an optical measurement system. I need to carry out a perspective transformation between two images captured by a digital camera. In the field of view of the camera I placed a set of markers (which lie in a common plane), which I use as corresponding points in both images. Using the markers' positions I can calculate the homography matrix. The problem is that the measured object, whose images I actually want to transform, is positioned at a small distance from the markers and parallel to the markers' plane. I can measure this distance.
My question is, how to take that distance into account when calculating the homography matrix, which is necessary to perform the perspective transformation.
In my solution it is a strong requirement not to use the measured object points for calculation of homography (and that is why I need other markers in the field of view).
Please let me know if the description is not precise.
The figure shows an example image.
The red rectangle is the measured object. It is physically placed a small distance behind the circular markers.
I capture images of the object from different camera positions. The measured object can deform between acquisitions. Using the circular markers, I want to transform the object's images to the same coordinates. I can measure the distance between the object and the markers, but I do not know how I should modify the homography matrix so that it works on the measured object (instead of the markers).
This question is quite old, but it is interesting and it might be useful to someone.
First, here is how I understood the problem presented in the question:
You have two images I1 and I2 acquired by the same digital camera at two different positions. These images both show a set of markers which all lie in a common plane pm. There is also a measured object, whose visible surface lies in a plane po parallel to the marker's plane but with a small offset. You computed the homography Hm12 mapping the markers positions in I1 to the corresponding markers positions in I2 and you measured the offset dm-o between the planes po and pm. From that, you would like to calculate the homography Ho12 mapping points on the measured object in I1 to the corresponding points in I2.
A few remarks on this problem:
First, notice that a homography is a relation between image points, whereas the distance between the markers' plane and the object's plane is a distance in world coordinates. Using the latter to infer something about the former requires a metric estimation of the camera poses, i.e. you need to determine the Euclidean and up-to-scale relative position & orientation of the camera for each of the two images. The Euclidean requirement implies that the digital camera must be calibrated, which should not be a problem for an "optical measurement system". The up-to-scale requirement implies that the true 3D distance between two given 3D points must be known. For instance, you need to know the true distance l0 between two arbitrary markers.
Since we only need the relative pose of the camera for each image, we may choose to use a 3D coordinate system centered and aligned with the coordinate system of the camera for I1. Hence, we will denote the projection matrix for I1 by P1 = K1 * [ I | 0 ]. Then, we denote the projection matrix for I2 (in the same 3D coordinate system) by P2 = K2 * [ R2 | t2 ]. We will also denote by D1 and D2 the coefficients modeling lens distortion respectively for I1 and I2.
As a single digital camera acquired both I1 and I2, you may assume that K1 = K2 = K and D1 = D2 = D. However, if I1 and I2 were acquired with a long delay between the acquisitions (or with a different zoom, etc), it will be more accurate to consider that two different camera matrices and two sets of distortion coefficients are involved.
Here is how you could approach such a problem:
The steps in order to estimate P1 and P2 are as follows:
1. Estimate K1, K2 and D1, D2 via calibration of the digital camera
2. Use D1 and D2 to correct images I1 and I2 for lens distortion, then determine the marker positions in the corrected images
3. Compute the fundamental matrix F12 (mapping points in I1 to epilines in I2) from the corresponding markers positions and infer the essential matrix E12 = K2^T * F12 * K1
4. Infer R2 and t2 from E12 and one point correspondence (see this answer to a related question). At this point, you have an affine estimation of the camera poses, but not an up-to-scale one since t2 has unit norm.
5. Use the measured distance l0 between two arbitrary markers to infer the correct norm for t2.
6. For the best accuracy, you may refine P1 and P2 using a bundle adjustment, with K1 and ||t2|| fixed, based on the corresponding marker positions in I1 and I2.
At this point, you have an accurate metric estimation of the camera poses P1 = K1 * [ I | 0 ] and P2 = K2 * [ R2 | t2 ]. Now, the steps to estimate Ho12 are as follows:
1. Use D1 and D2 to correct images I1 and I2 for lens distortion, then determine the marker positions in the corrected images (same as 2. above, no need to re-do that) and estimate Hm12 from these corresponding positions
2. Compute the 3x1 vector v describing the markers' plane pm by solving this linear equation: Z * Hm12 = K2 * ( R2 - t2 * v^T ) * K1^-1 (see HZ00 chapter 13, result 13.5 and equation 13.2 for a reference on that), where Z is a scaling factor. Comparing with equation 13.2, v = n / dm, so infer the distance to origin dm = 1 / ||v|| and the normal n = v / ||v||, which describe the markers' plane pm in 3D.
3. Since the object plane po is parallel to pm, they have the same normal n. Hence, you can infer the distance to origin do for po from the distance to origin dm for pm and from the measured plane offset dm-o, as follows: do = dm ± dm-o (the sign depends on the relative position of the planes: positive if pm is closer to the camera for I1 than po, negative otherwise).
4. From n and do describing the object plane in 3D, infer the homography Ho12 = K2 * ( R2 - t2 * n^T / do ) * K1^-1 (see HZ00 chapter 13, equation 13.2)
The homography Ho12 maps points on the measured object in I1 to the corresponding points in I2, where both I1 and I2 are assumed to be corrected for lens distortion. If you need to map points from and to the original distorted image, don't forget to use the distortion coefficients D1 and D2 to transform the input and output points of Ho12.
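The last two steps can be sketched in plain Python (not OpenCV). Everything here is illustrative: the function names are made up, the camera matrix and pose are invented numbers, and the plane is assumed to satisfy n . X + d = 0 in the coordinate system of the first camera, with the unit normal n pointing from the plane toward that camera so that d > 0 is the distance to the origin.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def offset_plane_distance(d_markers, offset, object_is_farther):
    """do = dm + offset if the object plane is farther from camera 1, else dm - offset."""
    return d_markers + offset if object_is_farther else d_markers - offset

def plane_homography(K1, K2, R, t, n, d):
    """H = K2 * (R - t * n^T / d) * K1^-1 for the plane n . X + d = 0."""
    inner = [[R[i][j] - t[i] * n[j] / d for j in range(3)] for i in range(3)]
    return matmul(matmul(K2, inner), inverse3(K1))

def apply_homography(H, point):
    x, y = point
    u, v, w = (H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3))
    return (u / w, v / w)

def project(K, R, t, X):
    """Pinhole projection of the 3D point X with P = K * [R | t] (no distortion)."""
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    p = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return (p[0] / p[2], p[1] / p[2])

# Made-up setup: markers at depth 5, object plane 0.5 behind them,
# second camera translated by one unit along x.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t2 = [1.0, 0.0, 0.0]
n = [0.0, 0.0, -1.0]
d_object = offset_plane_distance(5.0, 0.5, object_is_farther=True)
H_object = plane_homography(K, K, I3, t2, n, d_object)

X = [2.0, 3.0, 5.5]                       # a point on the object plane
x1 = project(K, I3, [0.0, 0.0, 0.0], X)   # its image in I1
x2 = project(K, I3, t2, X)                # its image in I2
mapped = apply_homography(H_object, x1)
```

In this synthetic setup mapped agrees with x2, which is a cheap sanity check to run on real calibration data as well: project a few known 3D points with both camera matrices and compare against the homography.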
The reference I used:
[HZ00] "Multiple View Geometry in Computer Vision", by R. Hartley and A. Zisserman, 2000.
