I want to implement an OpenCV version of VL_PHOW() (MATLAB source code) from VLFeat. In a few words, it is dense SIFT computed at multiple scales (increasing the SIFT descriptor bin size) to make it scale invariant.
However, the authors suggest applying a Gaussian kernel to improve the results. In particular, the Magnif parameter describes it:
Magnif (default 6): The image is smoothed by a Gaussian kernel of standard deviation SIZE / MAGNIF. Note that, in the standard SIFT descriptor, the magnification value is 3; here the default is 6 as it seems to perform better in applications.
And this is the relevant matlab code:
% smooth the image to the appropriate scale based on the size
% of the SIFT bins
sigma = opts.sizes(si) / opts.magnif ;
ims = vl_imsmooth(im, sigma) ;
My question is: how can I implement this in OpenCV? The equivalent function in OpenCV seems to be GaussianBlur, but I can't figure out how to represent the code above in terms of this function.
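For reference, here is a minimal sketch of how that line could be expressed with cv2.GaussianBlur in Python. The bin sizes, the magnif value and the random placeholder image are assumptions for illustration; passing ksize=(0,0) makes OpenCV derive the kernel size from sigma, which is roughly what vl_imsmooth does with a single sigma argument (the exact kernel support may still differ slightly):
import cv2
import numpy as np

# Placeholder input and PHOW-style parameters (not from the original post).
im = np.random.rand(480, 640).astype(np.float32)
sizes = [4, 6, 8, 10]   # SIFT bin sizes
magnif = 6.0

for size in sizes:
    sigma = size / magnif
    # ksize=(0,0) lets OpenCV compute the kernel size from sigma.
    ims = cv2.GaussianBlur(im, (0, 0), sigmaX=sigma, sigmaY=sigma)
    # ... run dense SIFT on ims with this bin size ...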
Related
I have an image (cv::Mat, type CV_32F) representing a grid-sampled height function. The grid has a constant raster (dx, dy) per pixel.
I would like to estimate its gradient magnitude. Using OpenCV's Sobel filter, I approximate derivatives like this:
dfdx = cv2.Sobel(zz, cv2.CV_32F, 1, 0, ksize=3, scale=?)
dfdy = cv2.Sobel(zz, cv2.CV_32F, 0, 1, ksize=3, scale=?)
gradMag=np.sqrt(dfdx**2+dfdy**2)
The scale parameter is barely documented, but looking into the source, it is used to multiply the derivative kernel, i.e. the (-1, 0, 1) of the finite differences. Using the 3x3 Sobel kernel, I assumed the scale should then be 1/(2*dx) or 1/(2*dy) (a central finite-difference scheme) to obtain derivatives in true scale, but that does not seem to be the case: I tested this on a synthetic image of a hemisphere with different rasters and did not get consistent results.
How is scale supposed to be used to incorporate raster dimensions, thus getting real derivative estimates?
Scale must equal 0.25; see: OpenCV's Sobel filter - why does it look so bad, especially compared to Gimp?
The normalization divisor for kernels can be calculated by the following formula:
f = max(abs(sumNegative), abs(sumPositive))
where sumNegative is the sum of negative values in the kernel and sumPositive the sum of positive values in the kernel.
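For the 3x3 Sobel kernel this gives f = 4, hence the 0.25. To also account for the physical grid spacing you still have to divide by the central-difference step of 2*dx, so a combined scale of 1/(8*dx) should give derivatives in true units. A small sketch (the spacing and the placeholder height field are assumptions):
import cv2
import numpy as np

dx = dy = 0.1                                        # hypothetical raster spacing
zz = np.random.rand(256, 256).astype(np.float32)     # placeholder height grid

# 3x3 Sobel = (1,2,1)/4 smoothing times a central difference over 2*dx,
# so the combined normalization is 1 / (4 * 2 * dx) = 1 / (8 * dx).
dfdx = cv2.Sobel(zz, cv2.CV_32F, 1, 0, ksize=3, scale=1.0 / (8.0 * dx))
dfdy = cv2.Sobel(zz, cv2.CV_32F, 0, 1, ksize=3, scale=1.0 / (8.0 * dy))
gradMag = np.sqrt(dfdx**2 + dfdy**2)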
I am trying to blur a ROI in an image using a Gaussian filter and the ImageJ software.
I am getting the desired result with a blur radius of 9 in ImageJ.
Now I am trying to write the corresponding OpenCV C++ application to do the same operations I did with ImageJ.
The GaussianBlur signature in OpenCV is as follows:
C++: void GaussianBlur(InputArray src, OutputArray dst, Size ksize, double sigmaX, double sigmaY=0, int borderType=BORDER_DEFAULT )
What sigmaX and sigmaY correspond to the ImageJ blur radius of 9?
I tried many resources such as:
Blur Radius
but I am not getting the same results with OpenCV.
Could you please elaborate on how the results are "not the same"?
The blur radius in ImageJ is defined as "'Radius' means the radius of decay to exp(-0.5) ~ 61%, i.e. the standard deviation sigma of the Gaussian" (coming from ImageJ documentation : https://imagej.nih.gov/ij/developer/api/ij/plugin/filter/GaussianBlur.html#GaussianBlur--)
I see no reason why it should not be implemented the same way in OpenCV.
However, I also observe these differences between ImageJ and OpenCV gaussian blur.
While for the moment I have no solution to make them exactly the same, I managed to get them closer, and I can see one potential difference and one certain difference in the implementations:
Kernel size (potential difference):
Are you aware that the kernel size and the Gaussian radius are two different things? The kernel size is the size of the kernel applied to the image (3*3, 5*5, etc.), but inside this kernel a Gaussian of any radius can theoretically exist. However, the kernel size is often chosen such that at the kernel borders the Gaussian function has decayed to about zero.
That being said, ImageJ automatically chooses the kernel size for you depending on the radius you choose, in order to fulfill the "Gaussian decays to zero at the borders" condition. The OpenCV function also does that if you set sigma to your desired radius and ksize to zero. The question is: do they both do it the same way?
ImageJ's implementation of this is trickier than you might think : "In ImageJ, the size of the kernel actually used depends on the accuracy
needed: With sigma=1, for 16-bit and float images the kernel is 9 pixels
wide (which gives 9x9 for a 2D image), but for 8-bit or RGB images it is
only 7 pixels wide because there is no need for a very high accuracy if
there are only 256 different values. For large values of sigma, the situation is more complex: For sigma >=8, the data are first downscaled, then the Gaussian Blur is applied, and interpolation is used for upscaling to the original number of data points. The downscaling and interpolation algorithms are specially designed for best accuracy.", etc etc (coming from the "ImageJ forum", I can't post the link since I don't have enough reputation, but just google this quote if you want the source)
I do not know whether OpenCV performs such operations or computes the kernel size differently, thus giving different results (I could not find this out with Google).
Borders (difference for sure): As you probably know, the Gaussian filter goes over every pixel in the image and computes a new value for this pixel based on its neighbors. But what about the pixels close to the borders, where the Gaussian kernel is wider than their distance from the image border? How do algorithms handle that? By inspecting my images more closely, I found that the main differences between the OpenCV implementation and the ImageJ one were at the border pixels.
Well, it turns out ImageJ and OpenCV handle these pixels differently:
ImageJ's Gaussian: "Like all convolution operations in ImageJ, it assumes that out-of-image pixels have a value equal to the nearest edge pixel." (from the same ImageJ doc as above).
OpenCV, however, lets you choose other options, and the default one, called BORDER_DEFAULT in the OpenCV call, is BORDER_REFLECT_101 (http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_core/py_basic_ops/py_basic_ops.html) (at least I think it is; it is the default border for another method that uses borders, so I would assume it is also the default for the Gaussian blur). BORDER_REFLECT_101 sort of "mirrors" the borders (gfedcb|abcdefgh, see the link).
To get closer to ImageJ's behavior (aaaaaaaa|abcdefgh), pass borderType=BORDER_REPLICATE instead of the default. With this, I get closer results between the two implementations (though not exactly the same; I will keep investigating and edit my answer if I find more clues).
[Note: I am working in Python 2.7 (not C++) and OpenCV 3, but I don't think that has an impact on this problem.]
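To put the two points above together, this is roughly the call I would try in order to mimic an ImageJ blur of radius 9 (the placeholder image is an assumption, and results may still differ slightly because of the kernel-size question discussed above):
import cv2
import numpy as np

img = (np.random.rand(256, 256) * 255).astype(np.uint8)   # stand-in for your test image

# ImageJ's "radius" is the Gaussian's standard deviation, so pass it as sigma.
# ksize=(0,0) lets OpenCV derive the kernel size from sigma, and BORDER_REPLICATE
# imitates ImageJ's "out-of-image pixels equal the nearest edge pixel" rule.
blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=9, sigmaY=9,
                           borderType=cv2.BORDER_REPLICATE)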
I have implemented an image blending method for seamless blending in plain C++. Now I want to port this code to the GPU (using OpenGL ES 2 shaders for mobile devices). Basically, the method creates Gaussian and Laplacian pyramids for each image, which are then combined from the lowest resolution up to the top (see also the paper "The Laplacian Pyramid as a Compact Image Code" by Burt et al., 1983).
My problem is that the Laplacian pyramid levels can have negative values, but my devices do not support float or integer texture types (e.g. via the OES_texture_float extension).
I have already looked for papers dealing with GPU-based pyramids but have not found anything really useful.
How can I implement such a pyramid efficiently for a GPU?
Is it possible to calculate a Gaussian/Laplacian pyramid level without iterating through the preceding levels?
EDIT
It seems as if there is no "good" way to calculate Laplacian pyramids completely on the GPU when the device supports neither signed/float texture types (for instance via ARB_texture_float) nor types larger than a byte for image data in the [0..255] range, except by using two passes (one for signs, one for values). My Laplacian pyramid runs perfectly on GPUs with the ARB_texture_float extension, but without the extension (and with some adjustments to compress the range) the pyramid comes out "wrong" due to the range compression.
The safest way to implement a Laplacian pyramid when your textures are unsigned integers is to store two pyramids: one that contains the magnitude (absolute value) of the Laplacian and another that stores the sign of the pixel at that location.
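As a rough CPU-side illustration of that split-and-reconstruct idea (NumPy here rather than a shader, and all names are mine), it would look something like this:
import numpy as np

laplacian = np.random.randn(64, 64).astype(np.float32) * 20   # placeholder level with negative values

# Store the level in two unsigned 8-bit "textures": magnitude and sign.
magnitude = np.clip(np.abs(laplacian), 0, 255).astype(np.uint8)
sign = (laplacian >= 0).astype(np.uint8)                       # 1 = non-negative, 0 = negative

# Reconstruct the signed values when the level is needed again.
restored = magnitude.astype(np.float32) * np.where(sign == 1, 1.0, -1.0)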
Yes. Any level in a Gaussian or Laplacian pyramid has a closed form solution based on the sigma value that you want to compute. Consider the base case of a LoG pyramid computed at intervals of sigma = (2/3). The first level of the pyramid has sigma 2/3 and is produced simply by convolving with a 5x5 LoG filter with sigma 2/3. The second convolution with the same filter produces an LoG image with sigma 4/3, and finally the third has sigma 6/3, or 2, so we subsample the image to produce the next integer level of the pyramid. If you want to compute the LoG of an image at sigma 2, the levels at sigma 2/3 and 4/3 are not necessary - simply subsample the image one time and convolve with an LoG filter with sigma 1.
If you want to compute the LoG at sigma = 20, quad-subsample the image (16 pixel blocks become 1 pixel) to give you a sigma 16 image, then convolve once with a sigma 4/3 LoG filter.
I'm trying to implement a Gaussian blur with the use of the FFT and found the following recipe here:
This means that you can take the Fourier transform of the image and the filter, multiply the (complex) results, and then take the inverse Fourier transform.
I've got a kernel K (a 7x7 matrix) and an image I (a 512x512 matrix).
I do not understand how to multiply K by I.
Is the only way to do that to make K as big as I (512x512)?
Yes, you do need to make K as big as I by padding it with zeros. Also, after padding, but before you take the FFT of the kernel, you need to translate it with wraparound, such that the center of the kernel (the peak of the Gaussian) is at (0,0). Otherwise, your filtered image will be translated. Alternatively, you can translate the resulting filtered image once you are done.
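A small NumPy sketch of that padding-plus-wraparound step (the 512x512 random image, the 7x7 kernel and its sigma are placeholders):
import cv2
import numpy as np

image = np.random.rand(512, 512).astype(np.float32)
k1d = cv2.getGaussianKernel(7, 1.5)           # 7x1 Gaussian, sigma 1.5
kernel = (k1d @ k1d.T).astype(np.float32)     # 7x7 Gaussian kernel

# Zero-pad the kernel to the image size, then wrap it around so that the
# kernel centre (3,3) lands at (0,0); otherwise the output is translated.
padded = np.zeros_like(image)
padded[:7, :7] = kernel
padded = np.roll(padded, (-3, -3), axis=(0, 1))

# Pointwise multiplication in the frequency domain == (circular) convolution.
blurred = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))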
Another point: for small kernels not using the FFT may actually be faster. A 2D Gaussian kernel is separable, meaning that you can separate it into two 1D kernels for x and y. Then instead of a 2D convolution, you can do two 1D convolutions in x and y directions in the spatial domain. For smaller kernels that may end up being faster than doing the convolution in the frequency domain using the FFT.
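For comparison, a separable spatial-domain version is just two 1D passes, e.g. (again with placeholder sizes):
import cv2
import numpy as np

image = np.random.rand(512, 512).astype(np.float32)
k1d = cv2.getGaussianKernel(7, 1.5)
# sepFilter2D convolves with the 1D kernel along rows, then along columns.
blurred = cv2.sepFilter2D(image, -1, k1d, k1d)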
If you are comfortable with pixel shaders, and if the FFT is not your main goal here but convolution with a Gaussian blur kernel is, then I can recommend my tutorial on what convolution is.
When applying a Gaussian blur to an image, typically the sigma is a parameter (examples include Matlab and ImageJ).
How does one know what sigma should be? Is there a mathematical way to figure out an optimal sigma? In my case, I have some objects in images that are bright compared to the background, and I need to find them computationally. I am going to apply a Gaussian filter to make the centers of these objects even brighter, which hopefully facilitates finding them. How can I determine the optimal sigma for this?
There's no formula to determine it for you; the optimal sigma will depend on image factors - primarily the resolution of the image and the size of your objects in it (in pixels).
Also, note that Gaussian filters aren't actually meant to brighten anything; you might want to look into contrast maximization techniques - sounds like something as simple as histogram stretching could work well for you.
edit: More explanation - sigma basically controls how "fat" your kernel function is going to be; higher sigma values blur over a wider radius. Since you're working with images, bigger sigma also forces you to use a larger kernel matrix to capture enough of the function's energy. For your specific case, you want your kernel to be big enough to cover most of the object (so that it's blurred enough), but not so large that it starts overlapping multiple neighboring objects at a time - so actually, object separation is also a factor along with size.
Since you mentioned MATLAB - you can take a look at various gaussian kernels with different parameters using the fspecial('gaussian', hsize, sigma) function, where hsize is the size of the kernel and sigma is, well, sigma. Try varying the parameters to see how it changes.
I use this convention as a rule of thumb: if k is the size of the kernel, then sigma = (k - 1)/6. This is because roughly 99.7% of the mass of a Gaussian pdf lies within ±3 sigma, i.e. within a width of 6 sigma. For example, a 9x9 kernel gives sigma = (9 - 1)/6 ≈ 1.33.
You have to find a min/max of a function G(X, sigma), where X is the set of your observations (in your case, your image's grayscale values). This function can be anything that maintains the "order" of the intensities of the image; for example, this can be done with the first derivative of the image (as G):
fil = fspecial('sobel');
im = imfilter(I, fil);
imagesc(im);
colormap(gray);
This gives you the first derivative of the image. Now you want to find the best sigma by maximizing G(X, sigma): you try a few sigmas (say, in increasing order) until you reach the sigma that makes G maximal. This can also be done with the second derivative.
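A possible Python/OpenCV rendering of that search (my own sketch, not the answerer's code: the synthetic test image, the sigma range and the use of the mean gradient magnitude as G are all assumptions):
import cv2
import numpy as np

# Synthetic stand-in for an image with a bright object; use your own image instead.
img = np.zeros((128, 128), np.uint8)
cv2.circle(img, (64, 64), 10, 255, -1)
img = img.astype(np.float32)

best_sigma, best_score = None, -np.inf
for sigma in np.arange(0.5, 6.0, 0.5):              # arbitrary search range
    smoothed = cv2.GaussianBlur(img, (0, 0), sigma)
    gx = cv2.Sobel(smoothed, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(smoothed, cv2.CV_32F, 0, 1, ksize=3)
    score = np.sqrt(gx**2 + gy**2).mean()           # G(X, sigma): mean gradient magnitude
    if score > best_score:
        best_sigma, best_score = sigma, score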
Given that the central value of the kernel equals 1, the kernel size that guarantees the outermost value to be less than a given limit (e.g. 1/100) can be computed as follows:
#include <cmath>

double limit = 1.0 / 100.0;  // the outermost kernel value must fall below this
// sigma is assumed to be given; solve exp(-r^2 / (2*sigma^2)) < limit for r.
int size = static_cast<int>(2 * std::ceil(std::sqrt(-2.0 * sigma * sigma * std::log(limit))));
if (size % 2 == 0)  // force an odd kernel size
{
    size++;
}
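For instance, with sigma = 2 and limit = 1/100 this gives size = 15 (2 * ceil(6.07) = 14, rounded up to the next odd number).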