I have just started studying the Viola-Jones face detection algorithm to design a face recognition system. Of all the things I understood, I am confused by the phrase "sum of pixels". Does it mean the sum of the colors at the given pixels, or the sum of the distances of the given pixels?
Generally if you see something like that they're talking about the value of a pixel (its intensity). According to OpenCV, the value of a pixel is calculated as 0.299 R + 0.587 G + 0.114 B. This is how OpenCV converts to grayscale, btw. So when they talk about the sum of pixels, they're likely talking about the sum of the pixel values in a given region (i.e. for a 3x3 region, take the value of each pixel and sum them up).
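For instance, a minimal sketch with OpenCV in Python (the filename is just a placeholder):

import cv2
import numpy as np

# Load an image and convert it to grayscale (OpenCV loads BGR by default).
img = cv2.imread("face.jpg")  # hypothetical filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# "Sum of pixels" over a 3x3 region whose top-left corner is (r, c):
r, c = 10, 10
region_sum = int(np.sum(gray[r:r+3, c:c+3]))
print(region_sum)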
The sum of pixels is the cumulative sum of the pixel values along the two dimensions of the image (this is the "integral image" that Viola-Jones relies on).
Please check the cumsum function in Matlab.
For example:
I = cumsum(cumsum(double(image)),2)
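For reference, a rough NumPy equivalent of that Matlab line (assuming image is a 2-D grayscale array):

import numpy as np

# Integral image: cumulative sum down the rows, then across the columns.
I = np.cumsum(np.cumsum(image.astype(np.float64), axis=0), axis=1)

# The sum of pixels in any rectangle then takes only four lookups, e.g.
# for rows r1..r2 and columns c1..c2 (with r1, c1 > 0):
# rect_sum = I[r2, c2] - I[r1-1, c2] - I[r2, c1-1] + I[r1-1, c1-1]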
Check out this link for some good info on the Viola-Jones face detection technique
Good Luck!
In this Distill article (https://distill.pub/2017/feature-visualization/) in footnote 8 authors write:
The Fourier transform decorrelates spatially, but a correlation will still exist
between colors. To address this, we explicitly measure the correlation between colors
in the training set and use a Cholesky decomposition to decorrelate them.
I have trouble understanding how to do that. I understand that for an arbitrary image I can calculate a correlation matrix by interpreting the image's shape as [channels, width*height] instead of [channels, height, width]. But how do I take the whole dataset into account? It can be averaged over, but that doesn't have anything to do with Cholesky decomposition.
Inspecting the code confuses me even more (https://github.com/tensorflow/lucid/blob/master/lucid/optvis/param/color.py#L24). There's no code for calculating correlations, but there's a hard-coded version of the matrix (and the decorrelation happens by matrix multiplication with this matrix). The matrix is named color_correlation_svd_sqrt, which has svd in it, and SVD isn't mentioned anywhere else. Also, the matrix there is non-triangular, which means it can't have come from a Cholesky decomposition.
Clarifications on any points I've mentioned would be greatly appreciated.
I figured out the answer to your question here: How to calculate the 3x3 covariance matrix for RGB values across an image dataset?
In short, you calculate the RGB covariance matrix over the image dataset and then do the following calculation:
import torch

U, S, V = torch.svd(dataset_rgb_cov_matrix)
epsilon = 1e-10
svd_sqrt = U @ torch.diag(torch.sqrt(S + epsilon))  # '@' is matrix multiplication
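For context, here is a minimal sketch of how dataset_rgb_cov_matrix itself could be computed (the images tensor and variable names are my assumptions, not lucid's actual code): flatten every image to [3, H*W] pixels, pool the pixels across the dataset, and take the covariance of the three channels.

import torch

# images: assumed tensor of shape [N, 3, H, W] holding the dataset
pixels = images.permute(1, 0, 2, 3).reshape(3, -1)    # [3, N*H*W]
pixels = pixels - pixels.mean(dim=1, keepdim=True)    # center each channel
dataset_rgb_cov_matrix = pixels @ pixels.t() / (pixels.shape[1] - 1)  # [3, 3]

Any matrix M with M M^T equal to the covariance works for (de)correlating colors; U diag(sqrt(S)) from the SVD is one such "square root", which is presumably why the hard-coded matrix is called color_correlation_svd_sqrt and need not be triangular like a Cholesky factor.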
I know that we take a 16x16 window of "in-between" pixels around the key point. We split that window into sixteen 4x4 windows. From each 4x4 window, we generate a histogram of 8 bins, each bin corresponding to 0-44 degrees, 45-89 degrees, etc. Gradient orientations from the 4x4 window are put into these bins. This is done for all sixteen 4x4 blocks. Finally, we normalize the 128 values we get.
Where do they get their value?
But I don't understand where the 128 numbers get their values from. Do they refer to the corresponding magnitudes of the orientation values, or what?
I would be grateful if anyone could describe a numerical example. Regards!
In SIFT (Scale-Invariant Feature Transform), the 128-dimensional feature vector is made up of a 4x4 grid of samples per window with an 8-direction histogram per sample -- 4x4x8 = 128.
For an illustrated guide see A Short introduction to descriptors, and in particular this image, showing 8-direction measurements (cardinal and inter-cardinal) embedded in each of the 4x4 grid squares (center image) and then a histogram of directions (right image):
From your question I believe you are also unclear on what the information inside the descriptor is -- it is called Histograms of Oriented Gradients (HOG). For further reading, Wikipedia has an overview of HOG gradient computation:
Each pixel within the cell casts a weighted vote for an orientation-based histogram channel based on the values found in the gradient computation.
Everything is built on those per-pixel "votes".
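To make the voting concrete, here is a rough numerical sketch (not OpenCV's exact implementation; the Gaussian weighting and bin interpolation of real SIFT are omitted) of how 128 numbers arise from the gradient magnitudes and orientations in a 16x16 patch:

import numpy as np

# mag, ori: assumed 16x16 arrays of gradient magnitude and orientation
# (orientation in degrees, [0, 360), already relative to the keypoint's
# dominant orientation).
def sift_descriptor(mag, ori):
    desc = []
    for bi in range(4):                 # 4x4 grid of cells...
        for bj in range(4):
            hist = np.zeros(8)          # ...with an 8-bin histogram each
            for i in range(4):          # 4x4 pixels per cell
                for j in range(4):
                    y, x = 4 * bi + i, 4 * bj + j
                    b = int(ori[y, x] // 45) % 8   # 45-degree bins
                    hist[b] += mag[y, x]           # magnitude-weighted vote
            desc.extend(hist)
    desc = np.asarray(desc)             # 16 cells x 8 bins = 128 values
    return desc / (np.linalg.norm(desc) + 1e-12)   # final normalization

So each of the 128 values is an accumulated, magnitude-weighted vote for one of 8 directions in one of the 16 cells, which answers the question above.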
I found on the internet that the Laplacian method is quite a good technique to compute the sharpness of an image. I was trying to implement it in OpenCV 2.4.10. How can I get the sharpness measure after applying the Laplacian function? Below is the code:
Mat src_gray, dst;
int kernel_size = 3;
int scale = 1;
int delta = 0;
int ddepth = CV_16S;
GaussianBlur( src, src, Size(3,3), 0, 0, BORDER_DEFAULT );
/// Convert the image to grayscale
cvtColor( src, src_gray, CV_RGB2GRAY );
/// Apply Laplace function
Mat abs_dst;
Laplacian( src_gray, dst, ddepth, kernel_size, scale, delta, BORDER_DEFAULT );
//compute sharpness
??
Can someone please guide me on this?
Possible duplicate of: Is there a way to detect if an image is blurry?
so your focus measure is:
cv::Laplacian(src_gray, dst, CV_64F);   // Laplacian response of the grayscale image

cv::Scalar mu, sigma;
cv::meanStdDev(dst, mu, sigma);         // mean and standard deviation of the response

double focusMeasure = sigma.val[0] * sigma.val[0];   // variance = (std dev)^2
Edit #1:
Okay, so a well-focused image is expected to have sharper edges, so image gradients are instrumental in determining a reliable focus measure. Given an image gradient, the focus measure pools the data at each point into a single value.
The use of second derivatives is one technique for passing the high spatial frequencies, which are associated with sharp edges. As a second-derivative operator we use the Laplacian operator, which is approximated using the standard 3x3 mask:

 0  1  0
 1 -4  1
 0  1  0
To pool the data at each point, we use two methods. The first is the sum of all the absolute values, leading to the following focus measure:

Σ_m Σ_n |L(m, n)|

where L(m, n) is the convolution of the input image I(m, n) with the mask L. The second method computes the variance of the absolute values, providing a new focus measure given by:

Σ_m Σ_n (|L(m, n)| − L̄)²

where L̄ (L overline) is the mean of the absolute values.
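A minimal Python/OpenCV sketch of the two pooling methods (the function names are mine, and cv2.Laplacian stands in for the convolution with the mask above):

import cv2
import numpy as np

def lap_sum(gray):
    # first measure: sum of the absolute Laplacian responses
    return np.sum(np.abs(cv2.Laplacian(gray, cv2.CV_64F)))

def lap_var(gray):
    # second measure: spread of the absolute responses around their mean
    # (np.var divides by the number of pixels; the formula above is the
    # unnormalized sum, so the two differ only by a constant factor)
    return np.var(np.abs(cv2.Laplacian(gray, cv2.CV_64F)))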
For more information, read the article:
J.L. Pech-Pacheco, G. Cristobal, J. Chamorro-Martinez, J. Fernandez-Valdivia, "Diatom autofocusing in brightfield microscopy: a comparative study", 15th International Conference on Pattern Recognition, 2000, Volume 3.
Not exactly the answer, but I got a formula using an intuitive approach that worked in the wild.
I'm currently working on a script to detect multiple faces in a picture of a crowd, using mtcnn, which worked very well; however, it also detected many faces so blurry that you couldn't properly call them faces.
Example image:
Faces detected:
Matrix of detected faces:
mtcnn detected about 123 faces, but many of them bore little resemblance to a face. In fact, many faces look more like a stain than anything else...
So I was looking for a way of 'filtering out' those blurry faces. I tried the Laplacian filter and the FFT way of filtering I found in this answer, but I got inconsistent and poor filtering results.
I turned my research to computer vision topics, and finally tried to implement an 'intuitive' way of filtering using the following principle:
The blurrier an image is, the fewer 'edges' it has.
If we compare a crisp image with a blurred version of the same image, blurring tends to 'soften' any edges or adjacent contrasting regions. Based on that principle, I looked for a way of weighting edges and then a simple way of 'measuring' the result to get a confidence value.
I took advantage of Canny detection in OpenCV and then applied the mean value of the result (Python):
import cv2
import numpy as np

def getBlurValue(image):
    canny = cv2.Canny(image, 50, 250)
    return np.mean(canny)
Canny returns a 2D array of the same size as the image. I selected thresholds of 50 and 250, but they can be changed depending on your image and scenario.
Then I took the average value of the Canny result (definitely a formula to be improved if you know what you're doing).
When an image is blurred, the result tends toward zero, while a crisp image gives a positive value that gets higher the crisper the image is.
This value depends on the images and the thresholds, so it is not a universal solution for every scenario; however, a better value can be achieved by normalizing the result and averaging over all the faces (I need to do more work on that subject).
In the example, the values are in the range 0-27.
I averaged all the faces and got a blur value of about 3.7.
If I keep only the images scoring above 3.7:
I am left with mostly crisp faces:
That consistently gave me better results than the other tests.
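As a usage sketch on top of getBlurValue (face_crops is an assumed list of face crops cut out of the photo using the mtcnn bounding boxes; 3.7 is just the threshold that worked for this photo):

# Keep only the faces that score above the blur threshold.
threshold = 3.7
sharp_faces = [face for face in face_crops if getBlurValue(face) > threshold]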
Okay, you got me. This is a tricky way of estimating a blurriness value inside the same image space. But I hope people can take advantage of these findings and apply what I learned in their own projects.
I am not able to understand the formula

E(u, v) = Σ_{x,y} w(x, y) [I(x + u, y + v) − I(x, y)]²

What do W (the window) and intensity mean in the formula?
I found this formula in opencv doc
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.html
For a grayscale image, the intensity level (0-255) tells you how bright the pixel is. I hope you already know about that.
So, here is the explanation of your formula:
Aim: We want to find those points which have maximum variation in terms of intensity level in all directions, i.e. the points which are very distinctive in a given image.
I(x,y): This is the intensity value of the current pixel which you are processing at the moment.
I(x+u,y+v): This is the intensity of another pixel which lies at a distance of (u,v) from the current pixel (mentioned above) which is located at (x,y) with intensity I(x,y).
I(x+u,y+v) - I(x,y): This equation gives you the difference between the intensity levels of two pixels.
W(u,v): You don't compare the current pixel with other pixels at arbitrary positions. You prefer to compare the current pixel with its neighbors, so you choose some values for "u" and "v", as you do when applying a Gaussian mask/mean filter etc. So, basically, W(u,v) represents the window over which you compare the intensity of the current pixel with that of its neighbors.
This link explains all your doubts.
For visualizing the algorithm, consider the window function as a BoxFilter, Ix as the Sobel derivative along the x-axis, and Iy as the Sobel derivative along the y-axis.
http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/sobel_derivatives/sobel_derivatives.html will be useful for understanding the final equations in the linked document.
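To make that visualization concrete, here is a minimal Python/OpenCV sketch of the Harris response built from Sobel derivatives and a box-filter window (a simplified version of what cv2.cornerHarris computes; k = 0.04 is the usual empirical constant):

import cv2
import numpy as np

def harris_response(gray, block_size=3, k=0.04):
    gray = np.float64(gray)
    # Image gradients Ix, Iy via Sobel
    Ix = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    Iy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    # The window function W as a box filter over the structure-tensor entries
    Sxx = cv2.boxFilter(Ix * Ix, -1, (block_size, block_size))
    Syy = cv2.boxFilter(Iy * Iy, -1, (block_size, block_size))
    Sxy = cv2.boxFilter(Ix * Iy, -1, (block_size, block_size))
    # R = det(M) - k * trace(M)^2; large R marks a corner
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace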
I want an algorithm for recognizing multiple shapes (especially rectangles and squares) in a picture. I am using C#, so I am looking for solutions in C#.
Check AForge.NET:
http://www.aforgenet.com/forum/
If you are looking for a library that does a lot of image processing for you, there is always OpenCV. I think it is C++ though.
You can use the circularity measure as a first approach, which is very easy to compute:
C = p²/a, where p is the perimeter (the length of the shape's border) and a is the shape's area.
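A minimal sketch of the circularity measure (in Python with OpenCV for brevity; Emgu CV, a C# wrapper for OpenCV, exposes the same contour calls):

import cv2

# contour: assumed to come from cv2.findContours on a binarized image
def circularity(contour):
    p = cv2.arcLength(contour, True)   # perimeter of the closed contour
    a = cv2.contourArea(contour)
    return p * p / a if a > 0 else float("inf")

# A circle scores the minimum possible value, 4*pi (about 12.57); a square
# scores 16, and a rectangle scores higher the more elongated it is.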
To know how to read/write pixels quickly, take a look here
Alternatively, look for the shape signature algorithm available in Rafael Gonzalez's book. In this algorithm you compute the center of the object using central moments, then you compute the distance between the center and each border pixel. You'll end up with a 1D signal where peaks represent larger distances from the center. For a square you get 4 symmetric peaks, while for a rectangle you get 2 big peaks and 2 smaller ones.
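A minimal sketch of the signature computation (again Python/OpenCV; counting the peaks is left as a simple post-processing step):

import cv2
import numpy as np

def shape_signature(contour):
    # Centroid of the shape from the contour moments
    m = cv2.moments(contour)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    # Distance from the centroid to every border pixel -> a 1D signal;
    # 4 symmetric peaks suggest a square, 2 big + 2 smaller peaks a rectangle.
    pts = contour.reshape(-1, 2).astype(np.float64)
    return np.hypot(pts[:, 0] - cx, pts[:, 1] - cy)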