GPU-based Laplacian Pyramid - image-processing

I have implemented a seamless image blending method in plain C++. Now I want to port this code to the GPU (using OpenGL ES 2 shaders for mobile devices). Basically, the method creates Gaussian and Laplacian pyramids for each image, which are then combined from the lowest resolution to the top (see also the paper "The Laplacian Pyramid as a Compact Image Code" by Burt and Adelson, 1983).
My problem is that the Laplacian pyramid levels can have negative values, but my devices do not support float or integer type textures (e.g. via the ARB_texture_float extension).
I already looked for papers dealing with GPU-based pyramids but found nothing really useful.
How can I implement such a pyramid efficiently for a GPU?
Is it possible to calculate a Gaussian/Laplacian pyramid level without iterating through the preceding levels?
Regards,
EDIT
It seems there is no "good" way to calculate Laplacian pyramids completely on GPUs that support neither signed types (for instance via ARB_texture_float) nor types larger than a byte when the image's data range is [0..255], except by using two passes (one for signs, one for values). My Laplacian pyramid runs perfectly on GPUs with the ARB_texture_float extension, but without the extension (and with some adjustments to compress the range) the pyramid comes out "wrong" due to range compression.

The safest way for you to implement a Laplacian pyramid if your textures are unsigned integers is to store two pyramids: one that contains the magnitude (absolute value) of the Laplacian and another that stores the sign of the pixel at that location.
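The two-pyramid idea can be sketched on the CPU with NumPy before porting it to shaders. This is only an illustration, assuming 8-bit input so that every Laplacian value lies in [-255, 255]; the function names `split_signed` and `merge_signed` are hypothetical and not part of any library.

```python
import numpy as np

def split_signed(level):
    """Split a signed Laplacian level into two unsigned 8-bit images."""
    level = np.asarray(level, dtype=np.int16)
    # Magnitude fits in a byte because the input is assumed to lie in [-255, 255].
    magnitude = np.clip(np.abs(level), 0, 255).astype(np.uint8)
    # Sign image: 1 where the Laplacian is negative, 0 elsewhere.
    sign = (level < 0).astype(np.uint8)
    return magnitude, sign

def merge_signed(magnitude, sign):
    """Rebuild the signed level from the magnitude and sign images."""
    mag = magnitude.astype(np.int16)
    return np.where(sign == 1, -mag, mag)

level = np.array([[-255, -3, 0], [7, 128, 255]], dtype=np.int16)
mag, sgn = split_signed(level)
restored = merge_signed(mag, sgn)
```

On a GPU the two images would live in two unsigned textures (or two channels of one texture), and the merge step would run in the shader that collapses the pyramid.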
Yes. Any level in a Gaussian or Laplacian pyramid has a closed form solution based on the sigma value that you want to compute. Consider the base case of a LoG pyramid computed at intervals of sigma = (2/3). The first level of the pyramid has sigma 2/3 and is produced simply by convolving with a 5x5 LoG filter with sigma 2/3. The second convolution with the same filter produces an LoG image with sigma 4/3, and finally the third has sigma 6/3, or 2, so we subsample the image to produce the next integer level of the pyramid. If you want to compute the LoG of an image at sigma 2, the levels at sigma 2/3 and 4/3 are not necessary - simply subsample the image one time and convolve with an LoG filter with sigma 1.
If you want to compute the LoG at sigma = 20, quad-subsample the image (16 pixel blocks become 1 pixel) to give you a sigma 16 image, then convolve once with a sigma 4/3 LoG filter.

Related

How bilinear interpolation works when down scaling?

I can clearly understand how bilinear interpolation works when upscaling an image (fill in the values using the 4 nearest neighbours), but I can't understand how it works when downscaling an image. It would mean a lot to me if someone could clarify this.
Scaling an image requires mapping pixels from the input to pixels on the output. If those pixel coordinates don't map to an integer, interpolation is required to estimate what the pixel value would have been. The "Bi" part of bilinear means it's linear interpolation applied in two dimensions independently. If for example output pixel 2,3 needs to come from input coordinates 1.5,7.2 you would interpolate in the X direction by taking 0.5 of each of the pixels at 1.0 and 2.0, then interpolate in the Y direction by taking 0.8 of the pixel at 7.0 and 0.2 of the pixel at 8.0. Usually these operations are combined into a single set of equations, but they can be applied separately if needed.
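The worked example above (input coordinates 1.5, 7.2) can be written out as a small NumPy sketch. The helper name `bilinear_sample` and the synthetic test image are assumptions made for illustration; the image is indexed as img[y, x].

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img (indexed as img[y, x]) at fractional coordinates (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    # Clamp the right/bottom neighbours so edge pixels stay in bounds.
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along X on the two rows, then along Y between the rows.
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

# Synthetic image where img[y, x] = 10 * y + x, so the exact answer is known.
img = np.arange(100, dtype=np.float64).reshape(10, 10)
value = bilinear_sample(img, 1.5, 7.2)
```

Here the X weights are 0.5 and 0.5, and the Y weights are 0.8 for row 7 and 0.2 for row 8, matching the description above. The same function is used for downscaling; only the spacing of the sampled coordinates changes.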
Bilinear is a poor choice for downscaling because it leads to aliasing artifacts. This is when you attempt to create spatial frequencies that are beyond the Nyquist sampling limit, and high frequency detail turns into low frequency artifacts. You can minimize this by blurring the image before you downscale it. Or you can choose an interpolation algorithm that incorporates some low pass filtering.

Approximating true heightmap gradient magnitude with opencv's Sobel filter

I have an image (cv::Mat, type CV_32F) representing grid-sampled height function. The grid has constant raster (dx,dy) per pixel.
I would like to estimate its gradient magnitude. Using OpenCV's Sobel filter, I approximate derivatives like this:
dfdx=cv2.Sobel(zz,cv2.CV_32F,1,0,ksize=3,scale=?)
dfdy=cv2.Sobel(zz,cv2.CV_32F,0,1,ksize=3,scale=?)
gradMag=np.sqrt(dfdx**2+dfdy**2)
The scale parameter is barely documented, but looking into the source, it is used to multiply derivative kernels, i.e. the (-1,0,1) for finite differences. Using the 3x3 Sobel kernel, I assumed the scale should then be 1/(2*dx) or 1/(2*dy) (finite differences scheme) to obtain derivatives in true scale, but that does not seem to be the case: I was testing this on a synthetic image of a hemisphere with different rasters but not getting consistent results.
How is scale supposed to be used to incorporate raster dimensions, thus getting real derivative estimates?
Scale must be equal to 0.25; see: OpenCV's Sobel filter - why does it look so bad, especially compared to Gimp?
The normalization divisor for kernels can be calculated by the following formula:
f = max(abs(sumNegative), abs(sumPositive))
where sumNegative is the sum of negative values in the kernel and sumPositive the sum of positive values in the kernel.
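As a rough check of the formula, the divisor can be computed for the 3x3 Sobel x-kernel and tried on a synthetic ramp with a known slope. This sketch uses plain NumPy instead of cv2.Sobel, and the raster spacing `dx`, the slope, and the helper `correlate_at` are assumptions chosen for illustration. In this experiment the response divided by f equals the difference across two pixels, which suggests that a further division by 2*dx is needed to get a derivative in physical units.

```python
import numpy as np

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)

# Normalization divisor from the formula above.
sum_negative = sobel_x[sobel_x < 0].sum()
sum_positive = sobel_x[sobel_x > 0].sum()
f = max(abs(sum_negative), abs(sum_positive))

def correlate_at(img, kernel, row, col):
    """Apply a 3x3 kernel (no flipping) centred on one interior pixel."""
    patch = img[row - 1:row + 2, col - 1:col + 2]
    return float((patch * kernel).sum())

dx = 0.25      # assumed raster spacing along x
slope = 3.0    # true derivative of the synthetic height function
xs = np.arange(8) * dx
ramp = np.tile(slope * xs, (8, 1))   # height = slope * x, constant along y

raw = correlate_at(ramp, sobel_x, 4, 4)
two_pixel_difference = raw / f        # equals h(x + dx) - h(x - dx)
derivative = two_pixel_difference / (2 * dx)
```

With these numbers f is 4 and the recovered derivative matches the slope, which would correspond to an overall scale of 0.25 / (2 * dx) for this kernel.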

Relation between imageJ blur radius and OpenCV Gaussian Blur sigma

I am trying to blur a ROI in an image using Gaussian filter and imageJ software.
I am getting the desired result with blur radius as 9 in imageJ.
Now I am trying to write the corresponding OpenCV C++ application to do same operations which I did with imageJ.
The Gaussian Blur signature in openCV is as below:
C++: void GaussianBlur(InputArray src, OutputArray dst, Size ksize, double sigmaX, double sigmaY=0, int borderType=BORDER_DEFAULT )
What is the sigmaX and sigmaY corresponding to ImageJ blur radius of 9?
I tried many resources such as:
Blur Radius
but I am not getting the same results with OpenCV.
Could you please elaborate on how the results are "not the same" ?
The blur radius in ImageJ is defined as "'Radius' means the radius of decay to exp(-0.5) ~ 61%, i.e. the standard deviation sigma of the Gaussian" (coming from ImageJ documentation : https://imagej.nih.gov/ij/developer/api/ij/plugin/filter/GaussianBlur.html#GaussianBlur--)
I see no reason why it should not be implemented the same way in OpenCV.
However, I also observe these differences between ImageJ and OpenCV gaussian blur.
While for the moment I have no solution to make these absolutely the same, I managed to get them closer, and can see one potential difference and one difference for sure in implementation :
Kernel size (potential difference) :
Are you aware that kernel size and Gaussian radius are two different things? Kernel size is the size of the kernel applied to the image (3*3, 5*5 etc.), but inside this kernel a Gaussian with any radius can theoretically exist. However, kernel size is often chosen such that on the kernel borders, the Gaussian function has decayed to about zero.
This being said, ImageJ automatically chooses the kernel for you depending on the radius you choose, in order to fulfill the "Gaussian decays to zero on borders" condition. The OpenCV function also does that if you set sigma to your desired radius and ksize to zero. The question is "do they both do it the same way?".
ImageJ's implementation of this is trickier than you might think: "In ImageJ, the size of the kernel actually used depends on the accuracy needed: With sigma=1, for 16-bit and float images the kernel is 9 pixels wide (which gives 9x9 for a 2D image), but for 8-bit or RGB images it is only 7 pixels wide because there is no need for a very high accuracy if there are only 256 different values. For large values of sigma, the situation is more complex: For sigma >=8, the data are first downscaled, then the Gaussian Blur is applied, and interpolation is used for upscaling to the original number of data points. The downscaling and interpolation algorithms are specially designed for best accuracy.", etc. (coming from the "ImageJ forum"; I can't post the link since I don't have enough reputation, but just google this quote if you want the source)
I do not know if OpenCV does such operations or if it computes the kernel size differently, thus giving different results. (couldn't find it with Google).
Borders (difference for sure) : As you probably know, the gaussian filter goes over every pixel in the image and computes a new value for this pixel based on its neighbors. But what about the pixels close to the borders, where the gaussian kernel is wider than their distance from the image's border ? How do algorithms handle it ? By inspecting my images closer, I found that the main differences between the OCV implementation and the IJ one were on the border pixels.
Well it turns out ImageJ and OpenCV handle these pixels differently :
ImageJ Gaussian: "Like all convolution operations in ImageJ, it assumes that out-of-image pixels have a value equal to the nearest edge pixel." (from the same ImageJ doc as above).
However, OpenCV lets you choose other options, and the default one, called BORDER_DEFAULT in the OpenCV call, is BORDER_REFLECT_101 (http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_core/py_basic_ops/py_basic_ops.html) (at least I think it is; it is the default border for another method using borders, so I would think it is also the default border for the Gaussian). BORDER_REFLECT_101 sort of "mirrors" the borders (gfedcb|abcdefgh, see link).
To get closer to ImageJ (aaaaaaaa|abcdefgh), pass BORDER_REPLICATE as the borderType instead of BORDER_DEFAULT. With this, I get closer results between the two implementations (though not exactly the same; I will keep investigating and edit my answer if I find more clues).
[Note : I am working in Python2.7 (not C++) and OpenCV 3, but I don't think it has an impact on this problem]
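The two border conventions can be visualised without OpenCV by padding a row of labelled pixels with NumPy. In this sketch np.pad with mode "edge" stands in for the replicate behaviour (aaa|abcdefgh) and mode "reflect" stands in for the mirrored behaviour without repeating the edge pixel (dcb|abcdefgh); the mapping of these modes to the OpenCV constants is stated here as an assumption of the illustration, not something taken from the OpenCV source.

```python
import numpy as np

letters = "abcdefgh"
idx = np.arange(len(letters))

# Pad three pixels on the left only, to mimic the "xxx|abcdefgh" notation.
replicate_idx = np.pad(idx, (3, 0), mode="edge")
reflect101_idx = np.pad(idx, (3, 0), mode="reflect")

replicate = "".join(letters[i] for i in replicate_idx)
reflect101 = "".join(letters[i] for i in reflect101_idx)
```

Any pixel whose kernel window reaches past the border sees different neighbours under the two conventions, which is consistent with the differences being concentrated near the image edges.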

How to apply box filter on integral image? (SURF)

Assuming that I have a grayscale (8-bit) image and assume that I have an integral image created from that same image.
Image resolution is 720x576. According to the SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The first octave uses filters with 9x9, 15x15, 21x21 and 27x27 pixels. The second octave uses filters with 15x15, 27x27, 39x39 and 51x51 pixels. The third octave uses filters with 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large and I guess 720x576 is big enough (right??!!), a fourth octave is added, 51x51, 99x99, 147x147 and 195x195. These octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are: What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) to apply filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = DxxDyy − (0.9Dxy)^2
and
det(HessianApprox) = DxxDyy − (0.81Dxy)^2
Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second order derivatives).
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common Image Analysis approach for regions-of-interest detection that is called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being to mimic what happens in the visual cortex of the human brains.
The drawback of difference of Gaussians (DoG) is that its computation is too expensive to be applied to large image areas.
In order to bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (or equivalently, apply a Gaussian blur) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average value of all the images values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and to its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-right pixel of the integral image holds the sum of all the original pixels.
Then, you just need to note that the box blur needs the sum of all the pixels inside a given rectangle (not necessarily starting at the top-leftmost pixel of the image) and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C, D is a shortcut for IntegralImagePixelAt(A) (B, C, D respectively).
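The corner formula can be checked numerically with a short NumPy sketch. It assumes the convention described above, in which the top-left integral value is 0, implemented here by adding one extra row and column of zeros; the names `integral_image` and `box_sum` and the random test image are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Integral image with a leading row/column of zeros: s[r, c] = sum(img[:r, :c])."""
    s = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    s[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return s

def box_sum(s, top, left, height, width):
    """Sum of the original pixels in a rectangle, from four integral-image lookups."""
    a = s[top, left]                    # A: top left
    b = s[top, left + width]            # B: top right
    c = s[top + height, left]           # C: bottom left
    d = s[top + height, left + width]   # D: bottom right
    return int(a + d - b - c)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(576, 720)).astype(np.uint8)
s = integral_image(img)
total = box_sum(s, 100, 200, 9, 9)      # one 9x9 box
```

The cost is four lookups regardless of the box size, which is why the larger 27x27 or 99x99 filters are no more expensive than the 9x9 one. Dividing the sum by height * width would give the box average.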
Integral images in SURF
SURF is not using box blurs of sizes 9x9, etc. directly.
What it uses instead is several orders of Gaussian derivatives, or Haar-like features.
Let's take an example. Suppose you are to compute the 9x9 filters output. This corresponds to a given sigma, hence a fixed scale/octave.
The sigma being fixed, you center your 9x9 window on the pixel of interest. Then, you compute the output of the 2nd order Gaussian derivative in each direction (horizontal, vertical, diagonal). The Fig. 1 in the paper gives you an illustration of the vertical and diagonal filters.
The Hessian determinant
There is a factor to take into account the scale differences. Let's believe the paper that the determinant is equal to:
Det = DxxDyy - (0.9 * Dxy)^2.
Finally, the determinant is given by: Det = DxxDyy - 0.81*Dxy^2.
Look at page 17 of this document
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you already have code for an ordinary 2D Gaussian convolution, just use the box filter in place of the Gaussian kernel and apply it to the original image rather than the integral image. The results of this method will be the same as those of the one you asked about.

gaussian blur with FFT

I'm trying to implement a Gaussian blur using the FFT and found the following recipe:
"This means that you can take the Fourier transform of the image and the filter, multiply the (complex) results, and then take the inverse Fourier transform."
I've got a kernel K, a 7x7 matrix, and an image I, a 512x512 matrix.
I do not understand how to multiply K by I.
Is the only way to do that by making K as big as I (512x512) ?
Yes, you do need to make K as big as I by padding it with zeros. Also, after padding, but before you take the FFT of the kernel, you need to translate it with wraparound, such that the center of the kernel (the peak of the Gaussian) is at (0,0). Otherwise, your filtered image will be translated. Alternatively, you can translate the resulting filtered image once you are done.
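The padding and wraparound steps can be sketched with NumPy's FFT routines. This is a minimal illustration under the sizes from the question (7x7 kernel, 512x512 image); the function names and the sigma value are assumptions, and the result is a circular convolution, so content near the borders wraps around.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5):
    """Normalized 2D Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def fft_blur(image, kernel):
    """Circular convolution of image with kernel via the FFT."""
    kh, kw = kernel.shape
    # Zero-pad the kernel to the image size.
    padded = np.zeros(image.shape, dtype=np.float64)
    padded[:kh, :kw] = kernel
    # Shift with wraparound so the kernel centre lands on index (0, 0).
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    spectrum = np.fft.fft2(image) * np.fft.fft2(padded)
    return np.real(np.fft.ifft2(spectrum))

# A single bright pixel: blurring it should reproduce the kernel around it.
image = np.zeros((512, 512))
image[100, 200] = 1.0
kernel = gaussian_kernel(7, 1.5)
blurred = fft_blur(image, kernel)
```

Leaving out the np.roll step would shift the whole output by three pixels in each direction, which is the translation mentioned above.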
Another point: for small kernels not using the FFT may actually be faster. A 2D Gaussian kernel is separable, meaning that you can separate it into two 1D kernels for x and y. Then instead of a 2D convolution, you can do two 1D convolutions in x and y directions in the spatial domain. For smaller kernels that may end up being faster than doing the convolution in the frequency domain using the FFT.
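The separability argument can be illustrated as two passes of 1D convolution, one along rows and one along columns. The helper names and the sigma value below are assumptions for the sketch, and np.convolve with mode "same" treats out-of-image samples as zero.

```python
import numpy as np

def gaussian_1d(size=7, sigma=1.5):
    """Normalized 1D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def separable_blur(image, g):
    """Blur by convolving every row with g, then every column with g."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="same"), 0, rows)

g = gaussian_1d()
image = np.zeros((32, 32))
image[16, 16] = 1.0            # impulse, so the output shows the effective 2D kernel
blurred = separable_blur(image, g)
```

For a 7x7 kernel this needs 7 + 7 multiplications per pixel instead of 49, and the impulse response equals the outer product of g with itself, i.e. the full 2D Gaussian.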
If you are comfortable with pixel shaders, and if the FFT is not your main goal here but convolution with a Gaussian blur kernel is, then I can recommend my tutorial on what convolution is.
regards.
