Problems with meanStdDev OpenCV

I am trying to write a thresholding function that takes my threshold parameters as arguments. For that I need to use meanStdDev.
Mat structElem = dst(Range(i - radius, i + radius), Range(j - radius, j + radius));
meanStdDev(structElem, mean, stdev);
double threshValue = mean[0] * stdMean[0] + stdMean[1] * stdev[0] + stdMean[2];
Here mean and stdev are Scalars, while stdMean is an array that I use to compute the threshold value. The funny thing is that when I try to do the same with 8-bit images, everything works.

The documentation of meanStdDev says that
"The function meanStdDev calculates the mean and the standard deviation M of array elements independently for each channel and returns it via the output parameters"
....
"results can be stored in Scalar_ 's."
So mean and stdev are Scalar values.
For color images, split the image into its channels and calculate and apply a threshold for each channel independently:
mean[0] is the mean value of the first channel
mean[1] is the mean value of the second channel
and so on.
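A minimal sketch of that per-channel approach, written with the OpenCV Python bindings for brevity; the stdMean coefficients and the helper name are illustrative placeholders, not taken from the question:

import cv2

def per_channel_threshold(img_bgr, stdMean=(1.0, 0.5, 0.0)):
    out_channels = []
    for ch in cv2.split(img_bgr):              # one single-channel plane per color channel
        mean, stdev = cv2.meanStdDev(ch)       # 1x1 arrays for a single-channel input
        thresh = mean[0][0] * stdMean[0] + stdMean[1] * stdev[0][0] + stdMean[2]
        _, binary = cv2.threshold(ch, thresh, 255, cv2.THRESH_BINARY)
        out_channels.append(binary)
    return cv2.merge(out_channels)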

Related

Difference between absdiff and normal subtraction in OpenCV

I am currently planning on training a binary image classification model. The images I want to train on are the difference between two original pictures. In other words, for each data entry, I start out with 2 pictures, take their difference, and then label that difference as a 0 or 1. My question is: what is the best way to find this difference? I know about cv2.absdiff and normal subtraction of images - which is the most effective way to go about this?
About the data: The images I'm training on are screenshots that usually are the same but may have small differences. I found that normal subtraction seems to show the differences less than absdiff.
This is the code I use for absdiff:
import cv2
import numpy as np

diff = cv2.absdiff(img1, img2)                 # per-pixel absolute difference
mask = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
th = 1
imask = mask > th                              # pixels that differ by more than th
canvas = np.zeros_like(img2, np.uint8)
canvas[imask] = img2[imask]
And then this for normal subtraction:
def extract_diff(self, imageA, imageB, image_name, path):
    subtract = imageB.astype(np.float32) - imageA.astype(np.float32)
    mask = cv2.inRange(np.abs(subtract), (30, 30, 30), (255, 255, 255))
    th = 1
    imask = mask > 1
    canvas = np.zeros_like(imageA, np.uint8)
    canvas[imask] = imageA[imask]
Thanks!
A difference can be negative or positive.
For some number types, such as uint8 (unsigned 8-bit int), which can't be negative (have no sign), a negative value wraps around and the value would make no sense anymore. Other types can be signed (e.g. floats, signed ints), so a negative value can be represented correctly.
That's why cv.absdiff exists. It always gives you absolute differences, and those are okay to represent in an unsigned type.
Example with numbers: a = 4, b = 6. a - b should be -2, right?
That value, as a uint8, will wrap around to become 0xFE, or 254 in decimal. The 254 value has some relation to the true -2 difference, but it also incorporates the range of values of the data type (8 bits: 256 values), so it's really just "code".
cv.absdiff would give you the absolute value of the difference (-2), which is 2.
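A quick NumPy/OpenCV check of that behaviour, with values chosen to match the a = 4, b = 6 example:

import cv2
import numpy as np

a = np.array([[4]], dtype=np.uint8)
b = np.array([[6]], dtype=np.uint8)

print(a - b)               # [[254]] -> NumPy wraps the -2 around
print(cv2.subtract(a, b))  # [[0]]   -> OpenCV saturates instead of wrapping
print(cv2.absdiff(a, b))   # [[2]]   -> absolute difference, always representable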

Backpropagation algorithm through cross-channel local response normalization (LRN) layer

I am working on replicating a neural network. I'm trying to get an understanding of how the standard layer types work. In particular, I'm having trouble finding a description anywhere of how cross-channel normalisation layers behave on the backward-pass.
Since the normalization layer has no parameters, I could guess two possible options:
The error gradients from the next (i.e. later) layer are passed backwards without doing anything to them.
The error gradients are normalized in the same way the activations are normalized across channels in the forward pass.
I can't think of a reason why you'd do one over the other based on any intuition, hence why I'd like some help on this.
EDIT1:
The layer is a standard layer in caffe, as described here http://caffe.berkeleyvision.org/tutorial/layers.html (see 'Local Response Normalization (LRN)').
The layer's implementation in the forward pass is described in section 3.3 of the alexNet paper: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
EDIT2:
I believe the forward and backward pass algorithms are described in both the Torch library here: https://github.com/soumith/cudnn.torch/blob/master/SpatialCrossMapLRN.lua
and in the Caffe library here: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lrn_layer.cpp
Please could anyone who is familiar with either/both of these translate the method for the backward pass stage into plain English?
It uses the chain rule to propagate the gradient backwards through the local response normalization layer. It is somewhat similar to a nonlinearity layer in this sense (which also doesn't have trainable parameters on its own, but does affect gradients going backwards).
From the code in Caffe that you linked to, I see that they take the error in each neuron as a parameter, and compute the error for the previous layer by doing the following:
First, on the forward pass they cache a so-called scale, which is computed (in terms of the AlexNet paper, see the formula in section 3.3) as:
scale_i = k + alpha / n * sum(a_j ^ 2)
Here and below, sum is a sum indexed by j that runs from max(0, i - n/2) to min(N, i + n/2)
(note that in the paper they do not normalize by n, so I assume this is something that Caffe does differently than AlexNet). The forward pass is then computed as b_i = a_i * scale_i ^ -beta.
To backward propagate the error, let's say that the error coming from the next layer is be_i, and the error that we need to compute is ae_i. Then ae_i is computed as:
ae_i = scale_i ^ -b * be_i - (2 * alpha * beta / n) * a_i * sum(be_j * b_j / scale_j)
Since you are planning to implement it manually, I will also share two tricks that Caffe uses in their code that makes the implementation simpler:
When you compute the addends for the sum, allocate an array of size N + n - 1, and pad it with n/2 zeros on each end. This way you can compute the sum from i - n/2 to i + n/2, without caring about going below zero and beyond N.
You don't need to recompute the sum on each iteration. Instead, compute the addends in advance (a_j ^ 2 for the forward pass, be_j * b_j / scale_j for the backward pass), then compute the sum for i = 0, and then for each consecutive i just add addend[i + n/2] and subtract addend[i - n/2 - 1]; this gives you the value of the sum for the new value of i in constant time.
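A minimal NumPy sketch of those two tricks for the forward-pass scale. The parameter defaults and the assumption that a holds the activations over N channels at one spatial location are mine, not Caffe's actual code:

import numpy as np

def lrn_scale(a, n=5, k=2.0, alpha=1e-4):
    N = a.shape[0]
    # Trick 1: pad the per-channel addends with n/2 zeros on each end so the
    # window never runs off the ends of the array.
    addend = np.zeros(N + n - 1)
    addend[n // 2 : n // 2 + N] = a ** 2
    scale = np.empty(N)
    # Trick 2: keep a running window sum instead of recomputing it for every i.
    window = addend[:n].sum()
    for i in range(N):
        scale[i] = k + (alpha / n) * window
        if i + 1 < N:
            window += addend[i + n] - addend[i]
    return scale  # the forward pass is then b_i = a_i * scale_i ** -beta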
Of course, you can also either print the variables to observe how they change, or run the net in debug mode to see how the errors change while passing through it.
I have an alternative formulation of the backward pass and I don't know if it is equivalent to Caffe's.
Caffe's is:
ae_i = scale_i ^ -b * be_i - (2 * alpha * beta / n) * a_i * sum(be_j * b_j / scale_j)
by differentiating the original expression
b_i = a_i/(scale_i^-b)
I get
ae_i = scale_i ^ -b * be_i - (2 * alpha * beta / n) * a_i * be_i*sum(ae_j)/scale_i^(-b-1)

OpenCV: Essential Matrix Decomposition

I am trying to extract Rotation matrix and Translation vector from the essential matrix.
SVD svd(E,SVD::MODIFY_A);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W(0, -1, 0,
          1,  0, 0,
          0,  0, 1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; //or svd_u * Mat(W) * svd_vt;
Mat_<double> t = svd_u.col(2); //or -svd_u.col(2)
However, when I use R and T (e.g. to obtain rectified images), the result does not seem to be right (black images or some obviously wrong outputs), even though I tried the different combinations of possible R and T.
I suspect E. According to the textbooks, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w, which is supposed to be diag(1, 1, 0) [at least up to scale], is not. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and the calculation of R and T done in the right way?
If the calculation is right, why are the internal constraints of the essential matrix not satisfied by the results?
If everything about E, R, and T is fine, why are the rectified images obtained from them not correct?
I get E from the fundamental matrix, which I assume to be right. I draw epipolar lines on both the left and right images and they all pass through the related points (for all the 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
Thanks!
I see two issues.
First, discounting the negligible value of the third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100 . Sounds small, but translated in terms of pixel error it may be an altogether larger amount.
So I'd start by replacing E with the ideal one after SVD decomposition: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with the 3D points in front of the camera). You may be hitting one of the non-physical ones. See http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E .
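A hedged sketch of both fixes with the Python bindings: project E onto the ideal essential matrix, then let cv2.recoverPose run the cheirality check and pick the physically valid (R, t) out of the four candidates. E, the matched points pts1 and pts2, and the camera matrix K are assumed to already exist; recoverPose is available from OpenCV 3.0 on:

import cv2
import numpy as np

U, S, Vt = np.linalg.svd(E)
E_ideal = U @ np.diag([1.0, 1.0, 0.0]) @ Vt   # enforce the (s, s, 0) singular values

# recoverPose keeps the (R, t) combination that puts the points in front of both cameras.
retval, R, t, mask = cv2.recoverPose(E_ideal, pts1, pts2, K)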

how to get r,g,b value using opencv2.3 [duplicate]

Possible Duplicate:
OpenCV rgb value for cv::Point in cv::Mat
As you know, in MATLAB it's easy to get the r/g/b values using r = image(:,:,1).
But in OpenCV (before 2.2) we must use pointers like this:
IplImage* img=cvCreateImage(cvSize(640,480),IPL_DEPTH_32F,3);
((float *)(img->imageData + i*img->widthStep))[j*img->nChannels + 0]=111; // B
((float *)(img->imageData + i*img->widthStep))[j*img->nChannels + 1]=112; // G
((float *)(img->imageData + i*img->widthStep))[j*img->nChannels + 2]=113; // R
But now that OpenCV 2.3 is out, it's easy to get the pixel value of a single-channel image like this:
Mat image;
int pixel = image.at<uchar>(row,col);
So I just wonder: is there also an easy way to get the r, g, b pixel values of a multichannel image, just like in MATLAB? Any help will be appreciated =)
For C++ interface you can do:
Vec3f pixel = image.at<Vec3f>(row, col);
int b = pixel[0];
int g = pixel[1];
int r = pixel[2];
As vasile said, getting a cell as a Vec3 gives you the pixel with easy access to its r, g, b components. This is the simplest solution in OpenCV, since it stores the pixels interleaved ("BGRBGRBGR..." by default), while MATLAB stores them planar ("RRR...GGG...BBB...").
To get a specific channel like in MATLAB you can use cvSplit (or cv::split in the C++ API); this function splits the image into its 3-4 channels so you can access a channel like in MATLAB. In the provided links you can also find a reference for the opposite function, merge.
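A short sketch of the same two access patterns with the Python bindings; the file name and the coordinates are just placeholders:

import cv2

img = cv2.imread("image.png")        # loaded as a BGR image by default
row, col = 10, 20                    # arbitrary example coordinates

b, g, r = img[row, col]              # one pixel, similar to MATLAB indexing
blue, green, red = cv2.split(img)    # whole channels, like image(:,:,1) in MATLAB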

Need a specific example of U-Matrix in Self Organizing Map

I'm trying to develop an application that uses a SOM to analyze data. However, after finishing training, I cannot find a way to visualize the result. I know that the U-Matrix is one of the methods, but I cannot understand it properly. Hence, I'm asking for a specific and detailed example of how to construct a U-Matrix.
I also read an answer at U-matrix and self organizing maps but it only refers to a 1-row map; what about a 3x3 map? I know that for a 3x3 map:
m(1) m(2) m(3)
m(4) m(5) m(6)
m(7) m(8) m(9)
a 5x5 matrix must be created:
u(1) u(1,2) u(2) u(2,3) u(3)
u(1,4) u(1,2,4,5) u(2,5) u(2,3,5,6) u(3,6)
u(4) u(4,5) u(5) u(5,6) u(6)
u(4,7) u(4,5,7,8) u(5,8) u(5,6,8,9) u(6,9)
u(7) u(7,8) u(8) u(8,9) u(9)
but I don't know how to calculate the u-weights u(1,2,4,5), u(2,3,5,6), u(4,5,7,8) and u(5,6,8,9).
Finally, after constructing the U-Matrix, is there any way to visualize it using color, e.g. as a heat map?
Thank you very much for your time.
Cheers
I don't know if you are still interested in this, but I found this link
http://www.uni-marburg.de/fb12/datenbionik/pdf/pubs/1990/UltschSiemon90
which explains very specifically how to calculate the U-Matrix.
Hope it helps.
By the way, the site where I found the link has several resources on SOMs; I leave it here in case anyone is interested:
http://www.ifs.tuwien.ac.at/dm/somtoolbox/visualisations.html
The essential idea of a Kohonen map is that the data points are mapped to a
lattice, which is often a 2D rectangular grid.
In the simplest implementations, the lattice is initialized by creating a 3D
array with these dimensions:
width * height * number_features
This is the U-matrix.
Width and height are chosen by the user; number_features is just the number
of features (columns or fields) in your data.
Intuitively this is just creating a 2D grid of dimensions w * h
(e.g., if w = 10 and h = 10 then your lattice has 100 cells), then
into each cell, placing a random 1D array (sometimes called "reference tuples")
whose size and values are constrained by your data.
The reference tuples are also referred to as weights.
How is the U-matrix rendered?
In my example below, the data comprises rgb tuples, so the reference tuples
have a length of three, and each of the three values must lie between 0 and 255.
It's with this 3D array ("lattice") that you begin the main iterative loop.
The algorithm iteratively positions each data point so that it is closest to others similar to it.
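A minimal sketch of that main iterative loop (random initialization, best-matching-unit search, neighborhood update). The learning-rate and radius schedules are illustrative assumptions, not taken from the answer above:

import numpy as np

def train_som(data, w=10, h=10, iterations=700, lr=0.5, radius=3.0):
    n_features = data.shape[1]
    lattice = np.random.uniform(0, 255, size=(h, w, n_features))  # random reference tuples
    ys, xs = np.mgrid[0:h, 0:w]
    for it in range(iterations):
        sample = data[np.random.randint(len(data))]
        # Best-matching unit: the cell whose reference tuple is closest to the sample.
        dist = np.linalg.norm(lattice - sample, axis=2)
        by, bx = np.unravel_index(dist.argmin(), dist.shape)
        # Gaussian neighborhood around the BMU, shrinking as training progresses.
        sigma = radius * (1 - it / iterations) + 1e-3
        influence = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2 * sigma ** 2))
        lattice += lr * (1 - it / iterations) * influence[..., None] * (sample - lattice)
    return lattice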
If you plot it over time (iteration number) then you can visualize cluster
formation.
The plotting tool I use for this is the brilliant Python library Matplotlib,
which plots the lattice directly, just by passing it into the imshow function.
Below are eight snapshots of the progress of a SOM algorithm, from initialization to 700 iterations. The newly initialized (iteration_count = 0) lattice is rendered in the top left panel; the result from the final iteration, in the bottom right panel.
Alternatively, you can use a lower-level imaging library (in Python, e.g., PIL) and transfer the reference tuples onto the 2D grid, one at a time:
for y in range(h):
    for x in range(w):
        img.putpixel((x, y), (
            SOM.Umatrix[y, x, 0],
            SOM.Umatrix[y, x, 1],
            SOM.Umatrix[y, x, 2]))
Here img is an instance of PIL's Image class, and the image is created by iterating over the grid one pixel at a time; for each pixel, putpixel is called on img once, with a 3-tuple whose values correspond to the rgb reference tuple at that cell.
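As a side note, if SOM.Umatrix is already a NumPy uint8 array of shape (h, w, 3), a hedged shortcut is to let PIL build the image in one call instead of per-pixel putpixel:

from PIL import Image
import numpy as np

img = Image.fromarray(SOM.Umatrix.astype(np.uint8), mode="RGB")
img.save("som_lattice.png")          # "som_lattice.png" is just an example filename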
From the matrix that you create:
u(1) u(1,2) u(2) u(2,3) u(3)
u(1,4) u(1,2,4,5) u(2,5) u(2,3,5,6) u(3,6)
u(4) u(4,5) u(5) u(5,6) u(6)
u(4,7) u(4,5,7,8) u(5,8) u(5,6,8,9) u(6,9)
u(7) u(7,8) u(8) u(8,9) u(9)
The elements with single numbers, like u(1), u(2), ..., u(9), as well as the elements with four numbers, like u(1,2,4,5), u(2,3,5,6), ..., u(5,6,8,9), are calculated using something like the mean, median, min or max of the values in their neighborhood.
It's a good idea to calculate the elements with two numbers first; one possible way to code that is:
# Fill in the "two-number" cells: the distance between two horizontally or
# vertically adjacent weight vectors.
for i in range(self.h_u_matrix):
    for j in range(self.w_u_matrix):
        nb = (0, 0)
        if not (i % 2) and (j % 2):      # even row, odd column -> horizontal neighbour
            nb = (0, 1)
        elif (i % 2) and not (j % 2):    # odd row, even column -> vertical neighbour
            nb = (1, 0)
        self.u_matrix[(i, j)] = np.linalg.norm(
            self.weights[i // 2, j // 2] - self.weights[i // 2 + nb[0], j // 2 + nb[1]],
            axis=0
        )
In the code above, self.h_u_matrix = self.weights.shape[0] * 2 - 1 and self.w_u_matrix = self.weights.shape[1] * 2 - 1 are the dimensions of the U-Matrix. With that done, to calculate the other elements you need to collect a list of their neighbors and apply, for example, a mean. The following code implements that idea:
# Fill in the remaining cells by averaging the already-computed neighbouring
# u-values (a median, min or max would also work, as mentioned above).
for i in range(self.h_u_matrix):
    for j in range(self.w_u_matrix):
        if not (i % 2) and not (j % 2):
            # "Single-number" cells u(1), ..., u(9): average the adjacent edge values.
            nodelist = []
            if i > 0:
                nodelist.append((i - 1, j))
            if i < self.h_u_matrix - 1:   # 4 in the 3x3 example
                nodelist.append((i + 1, j))
            if j > 0:
                nodelist.append((i, j - 1))
            if j < self.w_u_matrix - 1:   # 4 in the 3x3 example
                nodelist.append((i, j + 1))
            meanlist = [self.u_matrix[u_node] for u_node in nodelist]
            self.u_matrix[(i, j)] = np.mean(meanlist)
        elif (i % 2) and (j % 2):
            # "Four-number" cells like u(1,2,4,5): average the four surrounding edge values.
            nodelist = [
                (i - 1, j),
                (i + 1, j),
                (i, j - 1),
                (i, j + 1)]
            meanlist = [self.u_matrix[u_node] for u_node in nodelist]
            self.u_matrix[(i, j)] = np.mean(meanlist)
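Finally, for the heat-map visualization asked about in the question, a minimal matplotlib sketch (u_matrix stands for the finished 2D array computed above, self.u_matrix in the snippets):

import matplotlib.pyplot as plt

plt.imshow(u_matrix, cmap="hot", interpolation="nearest")
plt.colorbar(label="distance between neighbouring units")
plt.show()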
