Computing SNR with discrete random variables - signal-processing

Suppose:
The signal S is a random variable taking the values {0, 1, 2, 3} with probabilities {0.3, 0.2, 0.1, 0.4}.
The noise N is a random variable taking the values {-2, -1, 0, 1, 2} with probabilities {0.1, 0.1, 0.6, 0.1, 0.1}.
The SNR is given by (Power of S)/(Power of N) = ((Amplitude of S)/(Amplitude of N))^2.
My question is: how is the amplitude of a random variable computed? Is it:
The Root-Mean-Square (RMS) amplitude?
The Variance?

It's E[S^2]/E[N^2], where E is the expectation operator. The "amplitude" in the SNR formula is the RMS amplitude, so the power is the mean square E[X^2], not the variance; the two coincide only for a zero-mean variable.
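For the distributions above: E[S^2] = 0.3*0 + 0.2*1 + 0.1*4 + 0.4*9 = 4.2 and E[N^2] = 0.1*4 + 0.1*1 + 0.6*0 + 0.1*1 + 0.1*4 = 1.0, so SNR = 4.2 (about 6.2 dB). A quick check in Python (a sketch; the variable names are mine):

import numpy as np

# Signal and noise distributions from the question
s_vals  = np.array([0, 1, 2, 3])
s_probs = np.array([0.3, 0.2, 0.1, 0.4])
n_vals  = np.array([-2, -1, 0, 1, 2])
n_probs = np.array([0.1, 0.1, 0.6, 0.1, 0.1])

# Power = mean square = E[X^2]
power_s = np.sum(s_probs * s_vals**2)  # 4.2
power_n = np.sum(n_probs * n_vals**2)  # 1.0

snr = power_s / power_n                # 4.2
print(snr, 10 * np.log10(snr))         # 4.2, ~6.23 dB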

SVD not giving Rotation and Translation matrix from stereo essential matrix

I am trying to extract rotation and translation from the fundamental matrix. I used the intrinsic matrices to get the essential matrix, but the SVD is not giving the expected results. So I composed an essential matrix from known rotation and translation matrices, ran my SVD code to recover them, and found that the SVD code is wrong.
I created the essential matrix using rotation and translation matrices:
import numpy as np

R = np.array([[ 0.99965657,  0.02563432, -0.00544263],
              [-0.02596087,  0.99704732, -0.07226806],
              [ 0.00357402,  0.07238453,  0.9973704 ]])
T = [-0.1679611706725666, 0.1475313058767286, -0.9746915198833979]
# Skew-symmetric cross-product matrix of T
tx = np.array([[0, -T[2], T[1]], [T[2], 0, -T[0]], [-T[1], T[0], 0]])
E = R.dot(tx)
# E:
# [[-0.02418259,  0.97527093,  0.15178621],
#  [-0.96115177, -0.01316561,  0.16363519],
#  [-0.21769595, -0.16403593,  0.01268507]]
Now I try to get them back using SVD:
U, S, V = np.linalg.svd(E)
diag_110 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]])
newE = U.dot(diag_110).dot(V.T)
U, S, V = np.linalg.svd(newE)
W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
Z = np.array([[0, 1, 0], [-1, 0, 0], [0, 0, 0]])
R1 = U.dot(W).dot(V.T)
R2 = U.dot(W.T).dot(V.T)
T = U.dot(Z).dot(U.T)
T = [T[1, 0], -T[2, 0], T[2, 1]]
'''
Output:
R1 : [[-0.99965657, -0.00593909,  0.02552386],
      [ 0.02596087, -0.35727319,  0.93363906],
      [-0.00357402, -0.93398105, -0.35730468]]
R2 : [[-0.90837444, -0.20840016, -0.3625262 ],
      [ 0.26284261,  0.38971602, -0.8826297 ],
      [-0.32522244,  0.89704559,  0.29923163]]
T  : [-0.1679611706725666, 0.1475313058767286, -0.9746915198833979]
'''
What is wrong with the SVD code? I referred to the code here and here.
Your R1 output is a left-handed, axis-permuted version of your initial (ground-truth) rotation matrix: notice that the first column is the negative of the ground truth's, the second and third columns are swapped, and the determinant of R1 is approximately -1 (i.e. it is a left-handed frame).
This happens because the SVD returns unitary matrices U and V with no guaranteed parity, and on top of that you multiplied by an axis-permutation matrix W. It is up to you to flip or permute the axes so that the rotation has the correct handedness. You do so by enforcing constraints from the images and the scene, and the known order of the cameras (i.e. knowing which camera is the left one).
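A minimal sketch of the usual handedness fix (my code, not part of the original answer). Note also that np.linalg.svd already returns V transposed, so it should be used directly rather than transposed again:

import numpy as np

def decompose_essential(E):
    # np.linalg.svd returns (U, S, Vt); Vt is already V transposed
    U, S, Vt = np.linalg.svd(E)
    W = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
    R1 = U.dot(W).dot(Vt)
    R2 = U.dot(W.T).dot(Vt)
    # Enforce right-handedness: a proper rotation has det = +1
    if np.linalg.det(R1) < 0:
        R1 = -R1
    if np.linalg.det(R2) < 0:
        R2 = -R2
    t = U[:, 2]  # translation direction, recoverable only up to sign and scale
    return R1, R2, t

The correct (R, t) pair among the four combinations (R1, +/-t), (R2, +/-t) is the one that places triangulated points in front of both cameras (the cheirality check).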

What is the difference between an Embedding Layer and a Dense Layer?

The docs for an Embedding Layer in Keras say:
Turns positive integers (indexes) into dense vectors of fixed size. e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
I believe this could also be achieved by encoding the inputs as one-hot vectors of length vocabulary_size, and feeding them into a Dense Layer.
Is an Embedding Layer merely a convenience for this two-step process, or is something fancier going on under the hood?
An embedding layer is faster, because it is essentially the equivalent of a dense layer that makes simplifying assumptions.
Imagine a word-to-embedding layer with these weights:
w = [[0.1, 0.2, 0.3, 0.4],
     [0.5, 0.6, 0.7, 0.8],
     [0.9, 0.0, 0.1, 0.2]]
A Dense layer will treat these like actual weights with which to perform matrix multiplication. An embedding layer will simply treat these weights as a list of vectors, each vector representing one word; the 0th word in the vocabulary is w[0], 1st is w[1], etc.
For an example, use the weights above and this sentence:
[0, 2, 1, 2]
A naive Dense-based net needs to convert that sentence to a 1-hot encoding
[[1, 0, 0],
 [0, 0, 1],
 [0, 1, 0],
 [0, 0, 1]]
then do a matrix multiplication
[[1 * 0.1 + 0 * 0.5 + 0 * 0.9,  1 * 0.2 + 0 * 0.6 + 0 * 0.0,  1 * 0.3 + 0 * 0.7 + 0 * 0.1,  1 * 0.4 + 0 * 0.8 + 0 * 0.2],
 [0 * 0.1 + 0 * 0.5 + 1 * 0.9,  0 * 0.2 + 0 * 0.6 + 1 * 0.0,  0 * 0.3 + 0 * 0.7 + 1 * 0.1,  0 * 0.4 + 0 * 0.8 + 1 * 0.2],
 [0 * 0.1 + 1 * 0.5 + 0 * 0.9,  0 * 0.2 + 1 * 0.6 + 0 * 0.0,  0 * 0.3 + 1 * 0.7 + 0 * 0.1,  0 * 0.4 + 1 * 0.8 + 0 * 0.2],
 [0 * 0.1 + 0 * 0.5 + 1 * 0.9,  0 * 0.2 + 0 * 0.6 + 1 * 0.0,  0 * 0.3 + 0 * 0.7 + 1 * 0.1,  0 * 0.4 + 0 * 0.8 + 1 * 0.2]]
=
[[0.1, 0.2, 0.3, 0.4],
 [0.9, 0.0, 0.1, 0.2],
 [0.5, 0.6, 0.7, 0.8],
 [0.9, 0.0, 0.1, 0.2]]
However, an Embedding layer simply looks at [0, 2, 1, 2] and takes the weights of the layer at indices zero, two, one, and two to immediately get
[w[0],
 w[2],
 w[1],
 w[2]]
=
[[0.1, 0.2, 0.3, 0.4],
 [0.9, 0.0, 0.1, 0.2],
 [0.5, 0.6, 0.7, 0.8],
 [0.9, 0.0, 0.1, 0.2]]
So it's the same result, just obtained in a hopefully faster way.
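You can verify the equivalence directly in numpy (a sketch using the weights and sentence above):

import numpy as np

w = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 0.0, 0.1, 0.2]])
sentence = np.array([0, 2, 1, 2])

one_hot = np.eye(3)[sentence]  # Dense-style: explicit one-hot matrix
dense_out = one_hot @ w        # matrix multiplication
embed_out = w[sentence]        # Embedding-style: plain row lookup

assert np.allclose(dense_out, embed_out)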
The Embedding layer does have limitations:
The input needs to be integers in [0, vocab_length).
No bias.
No activation.
However, none of those limitations should matter if you just want to convert an integer-encoded word into an embedding.
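To see the lookup behavior in Keras itself, here is a sketch (assuming the tf.keras API) that loads the weights w from above into an Embedding layer:

import numpy as np
import tensorflow as tf

w = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 0.0, 0.1, 0.2]], dtype=np.float32)

emb = tf.keras.layers.Embedding(input_dim=3, output_dim=4)
emb.build((None,))    # create the weight variable
emb.set_weights([w])  # load our matrix

# Rows 0, 2, 1, 2 of w, exactly as in the worked example
print(emb(np.array([0, 2, 1, 2])).numpy())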
Mathematically, the difference is this:
An embedding layer performs a select (gather) operation. In Keras, this layer is equivalent to:
K.gather(self.embeddings, inputs)       # just one weight matrix
A dense layer performs a dot-product operation, plus an optional bias and activation:
outputs = matmul(inputs, self.kernel)   # a kernel matrix
outputs = bias_add(outputs, self.bias)  # a bias vector
return self.activation(outputs)         # an activation function
You can emulate an embedding layer with a fully-connected layer via one-hot encoding, but the whole point of dense embeddings is to avoid the one-hot representation. In NLP, the vocabulary size can be on the order of 100k (sometimes even a million). On top of that, sequences of words usually have to be processed in batches, and a batch of sequences of word indices is far cheaper to handle than a batch of sequences of one-hot vectors. In addition, the gather operation itself is faster than a matrix dot-product, in both the forward and the backward pass.
To add some detail to the top-voted answer:
An embedding layer is generally used to reduce sparse one-hot input vectors to denser representations.
An embedding layer is much like a table lookup: when the table is small, the lookup is fast.
When the table is large, the lookup becomes slower; in that case, a dense layer is sometimes used instead as a dimension reducer applied to the one-hot input.

How can I compute a POLY_FIT with a known coefficient?

Let's take this example: we use X and Y data corresponding to the known polynomial f(x) = 0.25 - x + x^2. Using POLY_FIT to compute a second-degree polynomial fit returns the exact coefficients (to within machine accuracy).
; Define an 11-element vector of independent variable data:
X = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
; Define an 11-element vector of dependent variable data:
Y = [0.25, 0.16, 0.09, 0.04, 0.01, 0.00, 0.01, 0.04, 0.09, $
0.16, 0.25]
; Define a vector of measurement errors:
measure_errors = REPLICATE(0.01, 11)
; Compute the second degree polynomial fit to the data:
result = POLY_FIT(X, Y, 2, MEASURE_ERRORS=measure_errors, $
SIGMA=sigma)
; Print the coefficients:
PRINT, 'Coefficients: ', result
PRINT, 'Standard errors: ', sigma
This example prints:
Coefficients: 0.250000 -1.00000 1.00000
Standard errors: 0.00761853 0.0354459 0.0341395
That works as expected, but let's say I already know the coefficient 0.25. How can I pass that coefficient to POLY_FIT? Or do I have to use some other fitting function?
source: http://www.exelisvis.com/docs/POLY_FIT.html
Have you looked at MPFIT? It has keywords to pass information about the parameters. Docs.
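If I remember the interface correctly, MPFIT takes a PARINFO structure whose FIXED field holds a parameter constant at its starting value. Alternatively, with the constant term fixed, the remaining coefficients are still an ordinary linear least-squares problem: move the known term to the left-hand side and fit the rest. A sketch of that idea in Python/numpy (the same algebra carries over to IDL):

import numpy as np

x = np.linspace(0.0, 1.0, 11)
y = np.array([0.25, 0.16, 0.09, 0.04, 0.01, 0.00,
              0.01, 0.04, 0.09, 0.16, 0.25])

c0 = 0.25  # known constant coefficient, held fixed
# Fit y - c0 = c1*x + c2*x^2: the design matrix has no constant column
A = np.column_stack([x, x**2])
c1, c2 = np.linalg.lstsq(A, y - c0, rcond=None)[0]
print(c0, c1, c2)  # 0.25, approximately -1.0 and 1.0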

Cosine similarity when one of vectors is all zeros

How to express the cosine similarity ( http://en.wikipedia.org/wiki/Cosine_similarity )
when one of the vectors is all zeros?
v1 = [1, 1, 1, 1, 1]
v2 = [0, 0, 0, 0, 0]
When we calculate according to the classic formula we get division by zero:
Let d1 = [0, 0, 0, 0, 0, 0]
Let d2 = [1, 1, 1, 1, 1, 1]
Cosine Similarity(d1, d2) = dot(d1, d2) / (||d1|| * ||d2||)
dot(d1, d2) = (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) = 0
||d1|| = sqrt((0)^2 + (0)^2 + (0)^2 + (0)^2 + (0)^2 + (0)^2) = 0
||d2|| = sqrt((1)^2 + (1)^2 + (1)^2 + (1)^2 + (1)^2 + (1)^2) = 2.44948974278
Cosine Similarity(d1, d2) = 0 / (0 * 2.44948974278) = 0 / 0
I want to use this similarity measure in a clustering application, and I will often need to compare such vectors, including [0, 0, 0, 0, 0] vs. [0, 0, 0, 0, 0]. Do you have any experience with this?
Since this is a similarity (not a distance) measure, should I special-case it as
d([1, 1, 1, 1, 1]; [0, 0, 0, 0, 0]) = 0
d([0, 0, 0, 0, 0]; [0, 0, 0, 0, 0]) = 1
And what about
d([1, 1, 1, 0, 0]; [0, 0, 0, 0, 0]) = ? etc.
If you have 0 vectors, cosine is the wrong similarity function for your application.
Cosine distance is essentially equivalent to squared Euclidean distance on L_2 normalized data. I.e. you normalize every vector to unit length 1, then compute squared Euclidean distance.
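To see that equivalence concretely: for unit vectors a and b, ||a - b||^2 = ||a||^2 + ||b||^2 - 2*(a . b) = 2 - 2*cos(a, b). A quick numpy check (my example, not the answerer's):

import numpy as np

rng = np.random.default_rng(0)
a = rng.random(5)
b = rng.random(5)
a /= np.linalg.norm(a)  # L2-normalize
b /= np.linalg.norm(b)

cos_sim = a @ b
sq_euclid = np.sum((a - b) ** 2)
assert np.isclose(sq_euclid, 2 - 2 * cos_sim)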
The other benefit of cosine is performance: computing it on very sparse, high-dimensional data is faster than Euclidean distance. It benefits from sparsity quadratically, not just linearly.
While you obviously can try to hack the similarity to be 0 when exactly one is zero, and maximal when they are identical, it won't really solve the underlying problems.
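Such a hack might look like this (a sketch of the special-casing described above, not a recommendation):

import numpy as np

def cosine_similarity_patched(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 and nb == 0:
        return 1.0  # treat two zero vectors as identical
    if na == 0 or nb == 0:
        return 0.0  # exactly one zero vector: maximally dissimilar
    return float(np.dot(a, b) / (na * nb))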
Don't choose the distance by what you can easily compute.
Instead, choose the distance such that the result has a meaning on your data. If the value is undefined, you don't have a meaning...
Sometimes, it may work to discard constant-0 data as meaningless data anyway (e.g. analyzing Twitter noise, and seeing a Tweet that is all numbers, no words). Sometimes it doesn't.
It is undefined.
Suppose you replace your zero vector with some nonzero vector C, multiply it by epsilon > 0, and let epsilon run to zero. For every epsilon > 0, cos(epsilon*C, d) = (epsilon*C . d) / (epsilon*||C||*||d||) = (C . d) / (||C||*||d||), so the limit depends on the direction of C. The function therefore has no continuous extension at the zero vector.

Kalman Filter : some doubts

I have several questions.
In the example given in the OpenCV documentation:
/* generate measurement */
cvMatMulAdd( kalman->measurement_matrix, state, measurement, measurement );
Is this correct?
In the tutorial An Introduction to the Kalman Filter by Welch and Bishop, Equation 1.2 says measurement = H*state + measurement noise.
These don't seem to be the same.
I was trying to implement tracking of a single bouncing ball. I tried the following (please point out if I am doing it incorrectly).
For the measurement I measure two things: (a) x and (b) y of the centroid of the ball.
I am only listing the lines that differ from the example in the OpenCV documentation.
CvKalman* kalman = cvCreateKalman( 5, 2, 0 );  /* 5 state dims, 2 measurement dims */
/* Transition matrix for the state [x, y, vx, vy, g]:
   x += vx, y += vy, vy += g */
const float A[] = { 1, 0, 1, 0, 0,
                    0, 1, 0, 1, 0,
                    0, 0, 1, 0, 0,
                    0, 0, 0, 1, 1,
                    0, 0, 0, 0, 1 };
CvMat* state = cvCreateMat( 5, 1, CV_32FC1 );
CvMat* measurement = cvCreateMat( 2, 1, CV_32FC1 );
/* Initialize the state of the Kalman filter */
state->data.fl[0] = mean_c;                /* x position */
state->data.fl[1] = mean_r;                /* y position */
state->data.fl[2] = mean_c - prev_mean_c;  /* x velocity */
state->data.fl[3] = mean_r - prev_mean_r;  /* y velocity */
state->data.fl[4] = 9.81;                  /* gravity    */
After initialization, this is the line that crashes:
cvMatMulAdd( kalman->transition_matrix, state,
             kalman->process_noise_cov, state );
In that line the example just reuses the measurement variable to store the noise; see the preceding line:
cvRandArr( &rng, measurement, CV_RAND_NORMAL, cvRealScalar(0), cvRealScalar(sqrt(kalman->measurement_noise_cov->data.fl[0])) );
So cvMatMulAdd( H, state, measurement, measurement ) computes H*state + noise, which is exactly Equation 1.2.
You should change the dimensions of the H matrix as well. It must be 2 by 5 so that H*state + measurement noise can be computed (the state is 5 by 1 and the measurement is 2 by 1). You probably get the error in the line
memcpy( cvkalman->measurement_matrix->data.fl, H, sizeof(H) );
because in the initial example cvkalman->measurement_matrix and H are allocated as 4 by 4 matrices, and you shrank cvkalman->measurement_matrix to 2 by 5 only: copying all 16 floats of the 4 by 4 H into a 10-float matrix overruns the buffer.
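For concreteness, here is the shape argument in Python/numpy (a sketch with made-up numbers): with the 5-element state [x, y, vx, vy, g] and a 2-element measurement of the centroid, H is 2 by 5 and simply picks out the first two state components:

import numpy as np

# State: [x, y, vx, vy, g]; measurement: centroid (x, y)
H = np.array([[1., 0., 0., 0., 0.],
              [0., 1., 0., 0., 0.]])  # 2 x 5

state = np.array([120.0, 80.0, 2.0, -1.5, 9.81])  # made-up values
noise = np.random.normal(0.0, 0.1, size=2)

measurement = H @ state + noise  # shape (2,): H*state + measurement noise
print(measurement)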
