Can we use a combination of two polynomials (degree n and n+1) in regression to fit a curve for a wall filter? - image-processing

I am trying to remove clutter (noise, tissue interference) from a color flow ultrasound image of blood vessels - what is technically called a wall filter. What I tried is to use a single polynomial regression of degree 2 to fit the clutter space and remove it from the original data (the color flow matrix data), and I also used Legendre polynomials to achieve the same goal. I am getting different results for degree 2 and degree 3. Degree 2 gives a somewhat good result, but degree 3 gives a messed-up result. I heard there is a way to combine these two polynomials with certain weights to achieve a better fit.
Here is what I tried in MATLAB:
for k = 1:m*n                                    %%% m and n being the dimensions of the data
    i = mod(k-1, m) + 1;                         % row
    j = (k - i)/m + 1;                           % column
    s = squeeze(IQ_sample(i,j,:));               % slow-time samples at this pixel - IQ_sample is my 3D data
    p = polyfit(t(:), s, 2);                     % polynomial curve fitting, degree 2 used here
    IQ_filtered(i,j,:) = reshape(s - polyval(p, t(:)), 1, 1, []);   % remove the clutter from the data - filtering
end
I did something similar with the Legendre polynomials. I am wondering if there is a way to fit two polynomials, hoping to get a better fit and thereby a better filter. I may be mixing up some concepts as I am a newcomer to this area; I appreciate your insight on this and related tips on wall filters using polynomial regression, and thank you.
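For reference, a minimal sketch of the Legendre version for one pixel (inside the same k loop as above; the rescaling of t to [-1, 1] is an assumption, and the basis polynomials are written out explicitly to avoid toolbox dependencies):
% A sketch of the per-pixel Legendre fit. Assumes t is a column vector rescaled to [-1, 1].
s  = squeeze(IQ_sample(i,j,:));        % slow-time samples at pixel (i,j)
P0 = ones(size(t));                    % Legendre polynomials written out explicitly
P1 = t;
P2 = (3*t.^2 - 1)/2;
P3 = (5*t.^3 - 3*t)/2;
A  = [P0 P1 P2 P3];                    % design matrix: one column per basis function
c  = A \ s;                            % least-squares weight for each basis function
clutter = A*c;                         % clutter estimate (drop P3 for a degree-2 fit)
IQ_filtered(i,j,:) = reshape(s - clutter, 1, 1, []);   % wall-filtered signal
(Note that stacking both bases into one regression spans the same space as a single degree-3 fit; a weighted blend would instead mix the two separate clutter estimates, e.g. clutter = 0.5*clutter2 + 0.5*clutter3.)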

Related

Unable to understand transition from one equation to another in linear regression (from Ch. 2, The Elements of Statistical Learning)

While reading about linear regression in Ch. 2 of the book "The Elements of Statistical Learning", I came across 2 equations and failed to understand how the 2nd was derived from the 1st.
Background:
How do we fit the linear model to a set of training data? There are
many different methods, but by far the most popular is the method of
least squares. In this approach, we pick the coefficients β to minimize the
residual sum of squares
RSS(β) = sum_{i=1..N} (y_i - x_i^T β)^2     (Equation 1)
RSS(β) is a quadratic function of the parameters, and hence its minimum
always exists, but may not be unique. The solution is easiest to characterize
in matrix notation. We can write
RSS(β) = (y - Xβ)^T (y - Xβ)     (Equation 2)
where X is an N × p matrix with each row an input vector, and y is an
N-vector of the outputs in the training set.
1st equation: RSS(β) = sum_{i=1..N} (y_i - x_i^T β)^2
2nd equation: RSS(β) = (y - Xβ)^T (y - Xβ)
I got it. The RHS of the 2nd equation is the matrix form; to get back the 1st equation, you expand the product, which involves transposing one part of the RHS of the 2nd equation (this is how the matrix multiplication works out).
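As a quick numeric sanity check that the two forms are the same quantity (a sketch with made-up data):
% Numeric check that the summation form and the matrix form of RSS agree
X = randn(5, 3);                             % N = 5 samples, p = 3 inputs (one sample per row)
y = randn(5, 1);                             % N-vector of outputs
beta = randn(3, 1);                          % arbitrary coefficients
rss_sum    = sum((y - X*beta).^2);           % 1st equation: sum of squared residuals
rss_matrix = (y - X*beta)' * (y - X*beta);   % 2nd equation: matrix form
% rss_sum and rss_matrix agree up to floating-point rounding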

What is the correct way to average several rotation matrices?

I get many rotation vectors from pose estimation over many frames (while the camera is stationary) and I want the most accurate measure. Theoretically, can I average rotation vectors/matrices/other kinds of data, or is that wrong?
In addition, how can I tell when a rotation vector/matrix is an outlier (i.e. very different from all the others and maybe a miscalculation)? For example, for the translation I see the difference in centimeters in every entry and can set an intuitive threshold. Is there a similar way for rotations?
One way, if you want to average rotations that are 'close', is to seek, in analogy with the mean of, say, numbers, the value that minimises the 'dispersion'. For numbers x[], the mean is what minimises
disp = Sum{ i | sqr( x[i]-mean)}
So for rotations R[] we can seek a rotation Q to minimise
disp = Sum{ i | Tr( (R[i]-Q)'*(R[i]-Q))}
where Tr is the trace and ' denotes transpose. Note that writing things this way does not change what we are trying to minimise, it just makes the algebra easier.
That particular measure of dispersion leads to a practical way of computing Q:
a/ compute the 'mean matrix' of the rotations
M = Sum{ i | R[i] } /N
b/ take the SVD of that
M = U*D*V'
c/ compute the rotation closest to M
Q = U*V'
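A minimal MATLAB sketch of those three steps, assuming the N rotations are stored in a 3x3xN array Rs (a name chosen here for illustration):
% Chordal mean of rotation matrices stored in a 3x3xN array Rs
M = mean(Rs, 3);                  % a/ element-wise mean of the rotations
[U, ~, V] = svd(M);               % b/ SVD of that mean
Q = U * V';                       % c/ the rotation closest to M in the Frobenius sense
if det(Q) < 0                     % guard: force a proper rotation (det = +1) rather than a reflection
    Q = U * diag([1 1 -1]) * V';
end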
You should not average rotation matrices, especially not when you use the term "most accurate". But let's go back to the start: matrix multiplications, i.e. rotations, do not commute. ABC != BAC != CBA ... the outcomes can be as dramatically far apart as imaginable.
As far as the outliers go: use quaternions instead of rotation matrices. Firstly, the number of calculation steps can be minimised, leading to higher performance, and there are tons of implementations of that online. And secondly, by building Euclidean norms on the quaternions, you get a good measure for outliers.
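As an illustration of thresholding outliers, here is a sketch that measures the rotation angle to the chordal-mean rotation Q from the previous answer (rather than quaternion norms, though the intuition is the same; the 10-degree threshold is arbitrary):
% Angular distance (in degrees) of each rotation from the mean rotation Q
N = size(Rs, 3);
ang = zeros(N, 1);
for i = 1:N
    Rrel = Rs(:,:,i)' * Q;                                  % relative rotation between sample i and the mean
    ang(i) = acosd(max(min((trace(Rrel) - 1)/2, 1), -1));   % its rotation angle, clamped for numerical safety
end
outliers = ang > 10;                                         % flag anything more than 10 degrees from the mean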

Explain difference between opencv's template matching methods in non-mathematical way

I'm trying to use OpenCV to find some template in images. While OpenCV has several template matching methods, I have a lot of trouble understanding the difference between them, and when to use which, just by looking at their mathematical equations:
CV_TM_SQDIFF
CV_TM_SQDIFF_NORMED
CV_TM_CCORR
CV_TM_CCORR_NORMED
CV_TM_CCOEFF
Can someone explain the major differences between all these methods in a non-mathematical way?
The general idea of template matching is to give each location in the target image I, a similarity measure, or score, for the given template T. The output of this process is the image R.
Each element in R is computed from the template, which spans over the ranges of x' and y', and a window in I of the same size.
Now, you have two windows and you want to know how similar they are:
CV_TM_SQDIFF - Sum of Squared Differences (or SSD):
Simple Euclidean distance (squared):
Take every pair of pixels and subtract
Square the difference
Sum all the squares
CV_TM_SQDIFF_NORMED - SSD Normed
This is rarely used in practice, but the normalization part is similar in the next methods.
The numerator term is the same as above, but divided by a factor, computed from the
- square root of the product of:
sum of the squared template pixels
sum of the squared image-window pixels
CV_TM_CCORR - Cross Correlation
Basically, this is a dot product:
Take every pair of pixels and multiply
Sum all products
CV_TM_CCOEFF - Cross Coefficient
Similar to Cross Correlation, but the mean of each window is subtracted first, so it effectively compares the windows' covariances (which I find hard to explain without math, but I would refer to MathWorld or MathWorks for some examples).
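To make the scores concrete, here is a tiny sketch comparing a template against a single same-size window (the two 4x4 matrices are arbitrary illustrative data):
% Scores of a template T against one same-size image window W
T = magic(4);  W = magic(4)';                                      % two arbitrary 4x4 windows
sqdiff        = sum((T(:) - W(:)).^2);                             % CV_TM_SQDIFF: sum of squared differences
sqdiff_normed = sqdiff / sqrt(sum(T(:).^2) * sum(W(:).^2));        % CV_TM_SQDIFF_NORMED
ccorr         = sum(T(:) .* W(:));                                 % CV_TM_CCORR: dot product of the windows
ccoeff        = sum((T(:) - mean(T(:))) .* (W(:) - mean(W(:))));   % CV_TM_CCOEFF: window means removed first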

Margin for optimal decision plane

For a given dataset of 2-D input data, we apply the SVM learning
algorithm and achieve an optimal decision plane:
H(x) = x1 + 2*x2 + 3
What is the margin of this SVM?
I've been looking at this for hours trying to work out how to answer this. I think it's meant to be relatively simple but I've been searching through my learning material and cannot find how I'm meant to answer this.
I'd appreciate some help on the steps I should use to solve this.
Thanks.
It is impossible to calculate the margin with only the optimal decision plane given. You should give the support vectors, or at least samples from the classes.
Anyway, you can follow these steps:
1- Calculate the Lagrange multipliers (alphas). I don't know which environment you work in, but you can use MATLAB's Quadratic Programming Solver, quadprog(); it is not hard to use.
2- Find the support vectors. Remember, only the alphas of the support vectors are non-zero (the alphas of all other samples are zero), so this is how you find the support vectors of the classes.
3- Calculate the w vector, which is orthogonal to the optimal hyperplane. You can use the summation below to calculate this vector:
w = sum_i alpha(i) * y(i) * phi(x(i))
where,
alpha(i): the alphas (Lagrange multipliers) of the support vectors;
y(i): the labels of the samples (say -1 or +1);
phi(): the feature mapping associated with the kernel function;
x(i): the support vectors.
4- Take one support vector from each class, let's say SV1 from class 1 and SV2 from class 2. Now you can calculate the margin using vector projection and the dot product:
margin = < (SV1 - SV2), w > / norm(w)
where,
<(SV1 - SV2), w> : dot product of vector (SV1 - SV2) and vector w
norm(w) : norm of vector w
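A tiny numeric sketch of step 4, using made-up support vectors that happen to be consistent with the plane in the question (linear kernel assumed, with the canonical scaling H(SV) = +/-1):
% Step 4 with hypothetical support vectors for the plane H(x) = x1 + 2*x2 + 3
w   = [1; 2];                           % normal vector of the plane
SV1 = [0; -1];                          % hypothetical support vector with H(SV1) = +1
SV2 = [0; -2];                          % hypothetical support vector with H(SV2) = -1
margin = dot(SV1 - SV2, w) / norm(w)    % projection onto the unit normal, about 0.894 (= 2/norm(w))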

Geometric representation of Perceptrons (Artificial neural networks)

I am taking this course on Neural networks in Coursera by Geoffrey Hinton (not current).
I have a very basic doubt on weight spaces.
https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides%2Flec2.pdf
Page 18.
If I have a weight vector (bias is 0) of [w1=1, w2=2] and training cases {1,2,-1} and {2,1,1},
where I guess {1,2} and {2,1} are the input vectors, how can this be represented geometrically?
I am unable to visualize it. Why does a training case give a plane which divides the weight space into 2? Could somebody explain this with coordinate axes in 3 dimensions?
The following is the text from the ppt:
1.Weight-space has one dimension per weight.
2.A point in the space has a particular setting for all the weights.
3.Assuming that we have eliminated the threshold, each training case can be represented as a hyperplane through the origin.
My doubt is in the third point above. Kindly help me understand.
It's probably easier to explain if you look deeper into the math. Basically, what a single layer of a neural net does is perform some function on your input vector, transforming it into a different vector space.
You don't want to jump right into thinking of this in 3-dimensions. Start smaller, it's easy to make diagrams in 1-2 dimensions, and nearly impossible to draw anything worthwhile in 3 dimensions (unless you're a brilliant artist), and being able to sketch this stuff out is invaluable.
Let's take the simplest case, where you're taking in an input vector of length 2 and you have a weight vector of dimension 2x1, which implies an output vector of length one (effectively a scalar).
In this case it's pretty easy to imagine that you've got something of the form:
input = [x, y]
weight = [a, b]
output = ax + by
If we assume that weight = [1, 3], we can see, and hopefully intuit that the response of our perceptron will be something like this:
With the behavior being largely unchanged for different values of the weight vector.
It's easy to imagine then, that if you're constraining your output to a binary space, there is a plane, maybe 0.5 units above the one shown above that constitutes your "decision boundary".
As you move into higher dimensions this becomes harder and harder to visualize, but if you imagine that that plane shown isn't merely a 2-d plane, but an n-d plane or a hyperplane, you can imagine that this same process happens.
Since actually creating the hyperplane requires either the input or output to be fixed, you can think of giving your perceptron a single training value as creating a "fixed" [x,y] value. This can be used to create a hyperplane. Sadly, this cannot effectively be visualized, as 4-D drawings are not really feasible in a browser.
Hope that clears things up, let me know if you have more questions.
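If it helps, a rough MATLAB sketch of that picture (the weight [1, 3] and the 0.5 threshold are the values used in the description above):
% A rough sketch of the response surface described above, with weight = [1, 3]
[x, y] = meshgrid(-2:0.1:2, -2:0.1:2);    % grid of input values
output = 1*x + 3*y;                        % output = a*x + b*y, a plane through the origin
surf(x, y, output);                        % the perceptron's response surface
hold on
contour3(x, y, output, [0.5 0.5], 'k');    % the 0.5 threshold mentioned above: the decision boundary
hold off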
I have encountered this question on SO while preparing a large article on linear combinations (it's in Russian, https://habrahabr.ru/post/324736/). It has a section on the weight space and I would like to share some thoughts from it.
Let's take a simple case of linearly separable dataset with two classes, red and green:
The illustration above is in the dataspace X, where samples are represented by points and the weight coefficients constitute a line. It can be conveyed by the following formula:
w^T * x + b = 0
But we can rewrite it the other way around, making the x component a vector of coefficients and w a vector of variables:
x^T * w + b = 0
because the dot product is symmetric. Now it can be visualized in the weight space the following way:
where red and green lines are the samples and blue point is the weight.
Moreover, the possible weights are limited to the area below (shown in magenta):
which could be visualized in dataspace X as:
Hope it clarifies dataspace/weightspace correlation a bit. Feel free to ask questions, will be glad to explain in more detail.
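If it helps to reproduce that weight-space picture, here is a rough sketch with two hypothetical samples drawn as lines (the sample values and the bias b = -1 are made up for illustration):
% Two made-up samples drawn as lines in the (w1, w2) weight space, bias b fixed at -1
b  = -1;
x1 = [1; 2];                                    % a sample from the first class
x2 = [2; 1];                                    % a sample from the second class
w1 = linspace(-3, 3, 200);
plot(w1, -(x1(1)*w1 + b)/x1(2), 'r'); hold on   % the line x1'*w + b = 0
plot(w1, -(x2(1)*w1 + b)/x2(2), 'g');           % the line x2'*w + b = 0
plot(1, 0.5, 'b.', 'MarkerSize', 20);           % one candidate weight vector, shown as a point
hold off; xlabel('w1'); ylabel('w2');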
The "decision boundary" for a single layer perceptron is a plane (hyper plane)
where n in the image is the weight vector w, in your case w={w1=1,w2=2}=(1,2) and the direction specifies which side is the right side. n is orthogonal (90 degrees) to the plane)
A plane always splits a space into 2 naturally (extend the plane to infinity in each direction)
You can also try to input different values into the perceptron and find where the response is zero (it is zero only on the decision boundary).
Recommend you read up on linear algebra to understand it better:
https://www.khanacademy.org/math/linear-algebra/vectors_and_spaces
For a perceptron with 1 input & 1 output layer, there can only be 1 LINEAR hyperplane. And since there is no bias, the hyperplane cannot shift away from the origin, so all such hyperplanes share the origin. However, if there is a bias, they may no longer share a common point.
I think the reason why a training case can be represented as a hyperplane is because...
Let's say
[j,k] is the weight vector and
[m,n] is the training input
training-output = jm + kn
Given that a training case in this perspective is fixed and the weights vary, the training input (m, n) becomes the coefficient and the weights (j, k) become the variables.
Just as in any textbook where z = ax + by is a plane,
training-output = jm + kn is also a plane, this time in the space of the weights j and k (and the output), with m and n as the coefficients.
The equation of a plane passing through the origin is written in the form:
ax + by + cz = 0
If a=1, b=2, c=3, the equation of the plane can be written as:
x + 2y + 3z = 0
So, in the XYZ space, the equation is: x + 2y + 3z = 0
Now, in the weight space, every dimension represents a weight. So, if the perceptron has 10 weights, the weight space will be 10-dimensional.
Equation of the perceptron: ax + by + cz <= 0 ==> Class 0
ax + by + cz > 0 ==> Class 1
In this case, a, b & c are the weights and x, y & z are the input features.
In the weight space, a, b & c are the variables (the axes).
So, for every training example, e.g. (x,y,z) = (2,3,4), a hyperplane would be formed in the weight space whose equation would be:
2a + 3b + 4c = 0
passing through the origin.
I hope you understand it now.
Consider we have 2 weights. So w = [w1, w2]. Suppose we have input x = [x1, x2] = [1, 2]. If you use the weight to do a prediction, you have z = w1*x1 + w2*x2 and prediction y = z > 0 ? 1 : 0.
Suppose the label for the input x is 1. Thus, we hope y = 1, and thus we want z = w1*x1 + w2*x2 > 0. Consider vector multiplication, z = (w^T)x. So we want (w^T)x > 0. The geometric interpretation of this expression is that the angle between w and x is less than 90 degrees. For example, the green vector is a candidate for w that would give the correct prediction of 1 in this case. Actually, any vector that lies on the same side, with respect to the line of w1 + 2 * w2 = 0, as the green vector would give the correct solution. However, if it lies on the other side as the red vector does, then it would give the wrong answer.
However, suppose the label is 0. Then the case would just be the reverse.
The above case gives the intuition to understand and just illustrates the 3 points in the lecture slide. The training case x determines the plane and, depending on the label, the weight vector must lie on one particular side of that plane to give the correct answer.
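A tiny sketch of that check with the question's numbers (weight vector [1, 2] and training cases {1,2,-1} and {2,1,1}), assuming label +1 requires w'*x > 0 and label -1 requires w'*x < 0:
% Which side of each training case's weight-space plane does w lie on?
w      = [1; 2];               % current weight vector (bias = 0)
X      = [1 2; 2 1];           % one training input per row
labels = [-1; 1];              % desired outputs of the two training cases
z = X * w;                     % z(i) = w' * x_i; each x_i defines the plane w' * x_i = 0 in weight space
correct = (sign(z) == labels)  % here only the second case is satisfied, so w = (1,2) misclassifies the first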
