I've implemented a gaussian blur fragment shader in GLSL. I understand the main concepts behind all of it: convolution, separation of x and y using linearity, multiple passes to increase radius...
I still have a few questions though:
What's the relationship between sigma and radius?
I've read that sigma is equivalent to radius, I don't see how sigma is expressed in pixels. Or is "radius" just a name for sigma, not related to pixels?
How do I choose sigma?
Considering I use multiple passes to increase sigma, how do I choose a good sigma to obtain the sigma I want at any given pass? If the resulting sigma is equal to the square root of the sum of the squares of the sigmas and sigma is equivalent to radius, what's an easy way to get any desired radius?
What's the good size for a kernel, and how does it relate to sigma?
I've seen most implementations use a 5x5 kernel. This is probably a good choice for a fast implementation with decent quality, but is there another reason to choose another kernel size? How does sigma relate to the kernel size? Should I find the best sigma so that coefficients outside my kernel are negligible and just normalize?
What's the relationship between sigma and radius?
I think your terms here are interchangeable depending on your implementation. For most glsl implementations of Gaussian blur they use the sigma value to define the amount of blur. In the Gaussian blur definition the radius can be considered the 'blur radius'. These terms are in pixel space.
How do I choose sigma?
This will define how much blur you want, which corresponds to the size of the kernel to be used in the convolution. Bigger values will result in more blurring.
The NVidia implementation uses a kernel size of int(sigma*3).
You may experiment using a smaller kernel size with higher values of sigma for performance considerations. These are free parameters to experiment with, which define how many pixels to use for modulation and how much of the corresponding pixel to include in the result.
What's the good size for a kernel, and how does it relate to sigma?
Based on the sigma value you will want to choose a corresponding kernel size. The kernel size will determine how many pixels to sample during the convolution and the sigma will define how much to modulate them by.
You may want to post some code for a more detailed explanation. NVidia has a pretty good chapter on how to build a Gaussian Kernel.
Look at Example 40-1.
Related
I know usually people prefer to choose the odd number as the size of Gaussian Filtering, and since the image made of discrete pixels, we can always locate the central pixel.
But what if the size is an even number? There will lead to several questions:
how will the Gaussian filter be, should it be symmetric or asymmetric?
what if the size number equals to 2?
Thank you.
There really is no such choice to be made.
A Gaussian filtering kernel that is shifted will result in a smoothing + a shift of the image. If you want a filter that doesn’t shift the image, the filter must have the origin of the Gaussian at the origin of the kernel, typically the central pixel of an odd-sized kernel.
Once we have established that, using an even-sized filter must lead to an asymetrical kernel. It is not really desirable to have an asymmetrical smoothing filter (unless we’re talking about adaptive filtering) because the asymmetry introduces a bias.
So, we’re stuck with an odd-sized filter. An even-sized filter will introduce either a bias or a shift of half a pixel.
A 2-pixel kernel cannot be a Gaussian filter because it takes at least 5 samples to represent a Gaussian kernel with sufficient detail for it to present the positive aspects of the Gaussian filter. With fewer samples, the filter will not behave like a Gaussian filter.
For more information about Gaussian filtering, I recommend that you read this blog post that I wrote 10 years ago.
In Convolutional Neural Network (CNN), a filter is select for weights sharing. For example, in the following pictures, a 3x3 window with the stride (distance between adjacent neurons) 1 is chosen.
So my question is: How to choose the window size? If I use 4x4 with the stride being 2, how much difference will it cause? Thanks a lot in advance!
There's no definite answer to this: filter size is one of hyperparameters you generally need to tune. However, there're some useful observations, that may help you. It's often preferred to choose smaller filters, but have greater number of those.
Example: four 5x5 filters have 100 parameters (ignoring bias), while 10 3x3 filters have 90 parameters. Through the larger of filters you still can capture the variety of features in the image, but with fewer parameters. More on this here.
Modern CNNs go even further with this idea and choose consecutive 3x1 and 1x3 convolutional layers. This reduces the number of parameters even more, but doesn't affect the performance. See the evolution of inception network.
The choice of stride is also important, but it affects the tensor shape after the convolution, hence the whole network. The general rule is to use stride=1 in usual convolutions and preserve the spatial size with padding, and use stride=2 when you want to downsample the image.
I know about Gaussian, varaince, image blurring and i think that i understood the concept of variance at Gaussian blur but still i am not 100% sure.
I just want to know the role of sigma or variance at Gaussian smoothing. I mean, what happens by increasing the value of sigma for the same window size..and why it happens?
It would be really helpful if somebody provide some nice literature about it. (I already tried few but couldn't find what i am looking for)
Major confusion:
Higher frequency-> details (e.g. noise),
Lower Frequency-> kind of overview of the image.
By increasing sigma, we are allowing some higher frequencies....so we should get more detailed with increasing frequency but the case is opposite, when we increase sigma, the image becomes more blurry.
I think it should be done in the following steps, first from the signal processing point of view:
Gaussian Filter is a low pass filter. Low pass filters as their names imply pass low frequencies - keeping low frequencies. So when we look at the image in the frequency domain the highest frequencies happen in the edges(places that there is a high change in intensity and each intensity value corresponds to a specific visible frequency).
The role of sigma in the Gaussian filter is to control the variation
around its mean value. So as the Sigma becomes larger the more variance allowed around mean and as the Sigma becomes smaller the less variance allowed around mean.
Filtering in the spatial domain is done through convolution. it simply
means that we apply a kernel on every pixel in the image. The law exists for kernels. Their sum has to be zero.
Now putting all together! When we apply a Gaussian filter to an image, we are doing a low pass filtering. But as you know this happen in the discrete domain(image pixels). So we have to quantize our Gaussian filter in order to make a Gaussian kernel. In the quantization step, as the Gaussian filter(GF) has a small sigma it has the steepest pick. So the more weights will be focused in the center and the less around it.
In the sense of natural image statistics! The scientists in this field of studies showed that our vision system is a kind of Gaussian filter in the responses to the images. see for example take a look at a broad scene! don't pay attention to a specific point! so you see a broad scene with lots things in it. but the details are not clear! Now see a specific point in that seen. you see more details that previously you didn't. This is the Sigma appear here. when you increase the sigma you are looking to the broad scene without paying attention to the details exits. and when you decrease the value you will get more details.
I think Wikipedia can help more than me, Low Pass Filters, Guassian Blur
Put simply, increasing the sigma terms will cast a broader net over the neighboring pixels and decrease the impact of the pixels nearest the pixel of interest, e.g. it makes a blurrier image.
I´m trying to make an implementation of Gaussian blur for a school project.
I need to make both a CPU and a GPU implementation to compare performance.
I am not quite sure that I understand how Gaussian blur works. So one of my questions is
if I have understood it correctly?
Heres what I do now:
I use the equation from wikipedia http://en.wikipedia.org/wiki/Gaussian_blur to calculate
the filter.
For 2d I take RGB of each pixel in the image and apply the filter to it by
multiplying RGB of the pixel and the surrounding pixels with the associated filter position.
These are then summed to be the new pixel RGB values.
For 1d I apply the filter first horizontally and then vetically, which should give
the same result if I understand things correctly.
Is this result exactly the same result as when the 2d filter is applied?
Another question I have is about how the algorithm can be optimized.
I have read that the Fast Fourier Transform is applicable to Gaussian blur.
But I can't figure out how to relate it.
Can someone give me a hint in the right direction?
Thanks.
Yes, the 2D Gaussian kernel is separable so you can just apply it as two 1D kernels. Note that you can't apply these operations "in place" however - you need at least one temporary buffer to store the result of the first 1D pass.
FFT-based convolution is a useful optimisation when you have large kernels - this applies to any kind of filter, not just Gaussian. Just how big "large" is depends on your architecture, but you probably don't want to worry about using an FFT-based approach for anything smaller than, say, a 49x49 kernel. The general approach is:
FFT the image
FFT the kernel, padded to the size of the image
multiply the two in the frequency domain (equivalent to convolution in the spatial domain)
IFFT (inverse FFT) the result
Note that if you're applying the same filter to more than one image then you only need to FFT the padded kernel once. You still have at least two FFTs to perform per image though (one forward and one inverse), which is why this technique only becomes a computational win for large-ish kernels.
I do not understand what a convolution kernel is and how I would apply a convolution matrix to pixels in an image (I am talking about doing a Gaussian Blur operation on an image).
Also could I get an explanation on how to create a kernel for a Gaussian Blur operation?
I am reading this article but I cannot seem to understand how things are done...
Thanks to anyone who takes time to explain this to me :),
ExtremeCoder
The basic idea is that the new pixels of the image are created by an weighted average of the pixels close to it (imagine drawing a circle around the pixel).
For each pixel in the image you are going to create a little square around the pixel. Lets say you take the 8 neighbors next to a pixel (including diagonals even though do not matter here), and we perform a weighted average to get the middle pixel.
In the Gaussian blur case it breaks down to two one dimensional operations. For each pixel take the some amount of pixels next to a pixel in the row direction only. Multiply the pixel values time the weights computed from the Gaussian distribution (or if you are doing this for an visual effect and not for a scientific reason, the weights can anything that looks good) and sum them up. Another way to look at it is the pixel make a vector and the weights make a vector and your are taking the dot product. Repeat this process in the column direction as a separate pass.
A convolution kernel is a matrix of values that specify how the neighborhood of a pixel contribute to that pixel's state in the final image. There's a fair description of the basics here. A gaussian blur is a convolution function that uses a really ugly (you've seen the wikipedia page) function to compute a convolution kernel to pass over the image. You'll find an example kernel for a gaussian in that wikipedia page.
The point of all the math in there is to produce a soft blur that resembles the scatter pattern produced by a mesh screen placed between the viewer and the image. You can think of the 'size' (the standard deviation) of the gaussian as being related to the distance between the image and the screen.
Here's an awesome tool, if you don't want to calculate it all by yourself (like me):
http://www.embege.com/gauss/
EDIT
Since the link seems to be broken now, here's a link to archive.org:
http://web.archive.org/web/20150217075657/http://www.embege.com/gauss