I can understand how bilinear interpolation works when upscaling an image (filling in values from the 4 nearest neighbours), but I can't understand how it works when downscaling an image. It would mean a lot to me if someone could clarify this for me.
Scaling an image requires mapping each output pixel back to a location in the input. If that location doesn't fall on integer coordinates, interpolation is required to estimate what the pixel value would have been there. The "bi" part of bilinear means that linear interpolation is applied in two dimensions independently. If, for example, output pixel (2,3) needs to come from input coordinates (1.5, 7.2), you would interpolate in the X direction by taking 0.5 of each of the pixels at 1.0 and 2.0, then interpolate in the Y direction by taking 0.8 of the pixel at 7.0 and 0.2 of the pixel at 8.0. Usually these operations are combined into a single set of equations, but they can be applied separately if needed.
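Here is a minimal plain-C++ sketch of that calculation; the function name, the row-major float buffer and the clamping at the right/bottom edge are assumptions of this sketch, not anything from the answer itself.

#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: sample a single-channel image at a fractional (x, y),
// interpolating along x between the two neighbouring columns, then along y
// between the two neighbouring rows. Assumes 0 <= x < width, 0 <= y < height.
float bilinearSample(const std::vector<float>& img, int width, int height,
                     float x, float y)
{
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    int x1 = std::min(x0 + 1, width - 1);   // clamp at the image edge
    int y1 = std::min(y0 + 1, height - 1);
    float fx = x - x0;   // 0.5 for x = 1.5
    float fy = y - y0;   // 0.2 for y = 7.2

    auto at = [&](int cx, int cy) { return img[cy * width + cx]; };

    // interpolate along x on the two bracketing rows, then along y
    float top    = (1.0f - fx) * at(x0, y0) + fx * at(x1, y0);
    float bottom = (1.0f - fx) * at(x0, y1) + fx * at(x1, y1);
    return (1.0f - fy) * top + fy * bottom;
}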
Bilinear is a poor choice for downscaling because it leads to aliasing artifacts. Aliasing occurs when the image contains spatial frequencies above the Nyquist limit of the new, lower sampling rate; that high-frequency detail folds back into low-frequency artifacts. You can minimize this by blurring the image before you downscale it, or by choosing an interpolation algorithm that incorporates some low-pass filtering.
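To make the "blur first, then downscale" suggestion concrete, here is a rough OpenCV sketch. The file names, the scale factor and the blur sigma are placeholders, and cv::INTER_AREA is shown only as one example of a resampling mode that already incorporates low-pass (area-averaging) behaviour.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("input.png");   // placeholder file name
    if (src.empty()) return 1;

    double scale = 0.25;                      // downscale to a quarter of the size

    // Option 1: low-pass filter first, then bilinear resize.
    // The sigma here is a rough heuristic tied to the scale factor, not a tuned value.
    cv::Mat blurred, downBilinear;
    cv::GaussianBlur(src, blurred, cv::Size(0, 0), 0.5 / scale);
    cv::resize(blurred, downBilinear, cv::Size(), scale, scale, cv::INTER_LINEAR);

    // Option 2: let the resampler do the filtering (area averaging).
    cv::Mat downArea;
    cv::resize(src, downArea, cv::Size(), scale, scale, cv::INTER_AREA);

    cv::imwrite("down_bilinear.png", downBilinear);
    cv::imwrite("down_area.png", downArea);
    return 0;
}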
I am looking for a workflow that would clean (and possibly straighten) old and badly scanned images of musical scores (like the one below).
I've tried denoising, Hough filters, and ImageMagick geometry filters, and I'm struggling to identify the series of filters that removes the scanner noise/bias.
Just some quick ideas:
Remove grayscale noise: Apply a low pass filter on the intensities (keep the darks), since the music is darker than much of the noise. The remaining noise is mostly vertical lines.
Rotate the image: Sum the grayscale values for each column of the image; you'll get a vector with the total pixel lightness in each column. Use gradient descent or a search over the rotation of the image (within some bounds like +/-15 deg) to maximize the variance of that vector. The idea here is that the vertical noise lines indicate the vertical alignment, so we want the columns of the image to line up with these noise lines (= maximized variance); see the sketch after this list.
Remove vertical line noise: After rotation, take the median value of each column. The greater the distance (squared difference) of a pixel from that median darkness, the more confident we can be that it shows its true color (e.g. a pure white or black pixel where the vertical noise was gray). Since the noise is non-white, you could try blending this distance with the whiteness of the median for an alternative confidence metric. Ideally, I think you'd train some 7x7x2 convolution filter (the 2 channels being pixel value and distance from the median) to estimate the true value of each pixel; that would be the most minimal machine-learning approach, without using a full-fledged NN. However, given your lack of training data, we'll have to come up with our own heuristic for what the true pixel value is. You will likely need to play around with it, but here's what I think might work:
Set some threshold of confidence; above that threshold we take the value as is. Below the threshold, set to white (the binary expected pixel value for the entire page).
For all values below threshold, take the max confidence value within a +/-2 pixels L1 distance (e.g. 5x5 convolution) as that pixel's value. Seems like features are separated by at least 2 pixels, but for lower resolutions that window size may need to be adjusted. Since white pixels may end up being more confident overall, you could experiment with prioritizing darker pixels (increase their confidence somehow).
Clamp the image contrast and maybe run another low pass filter.
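Here is a rough OpenCV sketch of the rotation-search step above. The grid search over angles (instead of gradient descent), the 0.25-degree step and the file name are assumptions made just to keep the example short.

#include <opencv2/opencv.hpp>
#include <iostream>

// Search a range of rotation angles for the one that maximizes the variance
// of the per-column brightness sums (columns then align with the vertical lines).
double findBestRotation(const cv::Mat& gray)
{
    double bestAngle = 0.0, bestVar = -1.0;
    cv::Point2f center(gray.cols / 2.0f, gray.rows / 2.0f);

    for (double angle = -15.0; angle <= 15.0; angle += 0.25) {
        cv::Mat rot = cv::getRotationMatrix2D(center, angle, 1.0);
        cv::Mat rotated;
        cv::warpAffine(gray, rotated, rot, gray.size(),
                       cv::INTER_LINEAR, cv::BORDER_REPLICATE);

        cv::Mat colSums;                                   // 1 x cols vector of column sums
        cv::reduce(rotated, colSums, 0, cv::REDUCE_SUM, CV_64F);

        cv::Scalar mean, stddev;
        cv::meanStdDev(colSums, mean, stddev);
        double var = stddev[0] * stddev[0];
        if (var > bestVar) { bestVar = var; bestAngle = angle; }
    }
    return bestAngle;
}

int main()
{
    cv::Mat img = cv::imread("score.png", cv::IMREAD_GRAYSCALE); // placeholder file name
    if (img.empty()) return 1;
    std::cout << "Best rotation: " << findBestRotation(img) << " degrees\n";
    return 0;
}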
I would like to know the difference between contrast stretching and histogram equalization.
I have tried both using OpenCV and observed the results, but I still have not understood the main differences between the two techniques. Any insights would be much appreciated.
Let's define contrast first.
Contrast is a measure of the “range” of an image, i.e. how spread out its intensities are. It has many formal definitions; one famous one is Michelson’s:
contrast = (Imax - Imin) / (Imax + Imin)
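As a quick illustration (the file name is a placeholder), Michelson's definition can be computed directly from the minimum and maximum intensities:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat gray = cv::imread("input.png", cv::IMREAD_GRAYSCALE); // placeholder input
    if (gray.empty()) return 1;

    double iMin, iMax;
    cv::minMaxLoc(gray, &iMin, &iMax);

    // Michelson contrast: 0 for a flat image, close to 1 when the full range is used
    double denom = iMax + iMin;
    double contrast = denom > 0 ? (iMax - iMin) / denom : 0.0;
    std::cout << "Michelson contrast: " << contrast << std::endl;
    return 0;
}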
Contrast is strongly tied to an image’s overall visual quality. Ideally, we’d like images to use the entire range of values available to them.
Contrast stretching and histogram equalisation have the same goal: making the image use the entire range of values available to it.
But they use different techniques.
Contrast stretching works like a mapping: it maps the minimum intensity in the image to the minimum value of the range (e.g. 84 ==> 0) and, in the same way, the maximum intensity in the image to the maximum value of the range (e.g. 153 ==> 255).
This is why contrast stretching is unreliable: if the image already contains even two pixels with intensities 0 and 255, it does nothing at all.
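A minimal sketch of that linear mapping with OpenCV (the helper name is made up and a single-channel 8-bit input is assumed):

#include <opencv2/opencv.hpp>

// Hypothetical helper: map [minVal, maxVal] of the input onto [0, 255].
cv::Mat stretchContrast(const cv::Mat& gray)
{
    double minVal, maxVal;
    cv::minMaxLoc(gray, &minVal, &maxVal);
    if (maxVal <= minVal) return gray.clone();   // flat image: nothing to stretch

    // out = (gray - minVal) * 255 / (maxVal - minVal), e.g. 84 -> 0, 153 -> 255
    cv::Mat out;
    gray.convertTo(out, CV_8U, 255.0 / (maxVal - minVal),
                   -minVal * 255.0 / (maxVal - minVal));
    return out;
}

cv::normalize(src, dst, 0, 255, cv::NORM_MINMAX) does essentially the same thing in a single call.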
However, a better approach is histogram equalisation, which uses the probability distribution of the intensities. You can learn the steps here.
I came across the following points after some reading.
Contrast stretching is all about increasing the difference between the maximum intensity value in an image and the minimum one. All the rest of the intensity values are spread out over this range.
Histogram equalization is about modifying the intensity values of all the pixels in the image such that the histogram is "flattened" (in reality, the histogram can't be exactly flattened, there would be some peaks and some valleys, but that's a practical problem).
In contrast stretching, there exists a one-to-one relationship of the intensity values between the source image and the target image i.e., the original image can be restored from the contrast-stretched image.
However, once histogram equalization is performed, there is no way of getting back the original image.
In Histogram equalization, you want to flatten the histogram into a uniform distribution.
In contrast stretching, you manipulate the entire range of intensity values, much as you do in normalization.
Contrast stretching is a linear normalization that stretches an arbitrary interval of the intensities of an image and fits it to another arbitrary interval (usually the target interval is the possible minimum and maximum of the image, like 0 and 255).
Histogram equalization is a nonlinear normalization that stretches the regions of the histogram with highly abundant intensities and compresses the regions with less abundant intensities.
I think that contrast stretching broadens the histogram of the image intensity levels, so that the input's intensity range is mapped onto the full intensity range.
Histogram equalization, on the other hand, maps all of the pixels to the full range according to the cumulative distribution function, or probability.
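A sketch of that CDF mapping for an 8-bit grayscale image follows; this is the simple textbook mapping, and cv::equalizeHist normalizes its CDF slightly differently, so the results will not be byte-identical.

#include <opencv2/opencv.hpp>

// Equalize an 8-bit grayscale image by mapping each level through the
// normalized cumulative distribution function (CDF) of its histogram.
cv::Mat equalizeViaCdf(const cv::Mat& gray)
{
    CV_Assert(gray.type() == CV_8UC1);

    // 256-bin histogram
    int hist[256] = {0};
    for (int r = 0; r < gray.rows; ++r)
        for (int c = 0; c < gray.cols; ++c)
            hist[gray.at<uchar>(r, c)]++;

    // cumulative distribution, normalized to [0, 1]
    double cdf[256];
    double total = static_cast<double>(gray.total());
    double running = 0.0;
    for (int i = 0; i < 256; ++i) {
        running += hist[i];
        cdf[i] = running / total;
    }

    // map each pixel to its CDF value scaled to [0, 255]
    cv::Mat out(gray.size(), CV_8UC1);
    for (int r = 0; r < gray.rows; ++r)
        for (int c = 0; c < gray.cols; ++c)
            out.at<uchar>(r, c) =
                cv::saturate_cast<uchar>(cdf[gray.at<uchar>(r, c)] * 255.0);
    return out;
}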
Contrast is the difference between maximum and minimum pixel intensity.
Both methods are used to enhance contrast; more precisely, they adjust the image intensities to enhance contrast.
During histogram equalization the overall shape of the histogram changes, whereas in contrast stretching the overall shape of the histogram remains the same.
I'm trying to translate a Photoshop setting for sharpening images to GraphicsMagick. I found this helpful article:
https://redskiesatnight.com/2005/04/06/sharpening-using-image-magick/
The problem is that if I use the Photoshop-equivalent values explained in the article in GraphicsMagick, the images are not as sharp and clear as in Photoshop.
For example, I use these settings in Photoshop:
Strength: 500%
Radius: 2.0 Pixel
Threshold: 8
In the article the parameters are explained like this:
The radius parameter
The radius parameter specifies (official documentation)
“the radius of the Gaussian, in pixels, not counting the center pixel”
Unsharp masking, like many other image-processing filters, is a
convolution kernel operation. The filter processes the image pixel by
pixel. For each pixel it examines a block of pixels surrounding it
(the kernel) and does some calculations on them to render the output
pixel value. The radius parameter determines which pixels surrounding
the center pixel will be considered in the convolution kernel: (think
of a circle) the larger the radius, the more pixels that will need to
be processed for each individual pixel.
Image Magick’s radius is similar to the same parameter in Photoshop,
GIMP and other image editors. In practical terms, it affects the size
of the “halos” created to increase contrast along the edges in the
image, increasing acutance and thus apparent sharpness.
How do you know how big of a radius to use? It depends on your output
target resolution, for one thing. It also depends on your personal
preferences, as well as the specific needs of the image at hand. As
far as the resolution issue goes, the GIMP User Manual recommends that
unsharp mask radius be set as follows:
radius = (output ppi / 30) * 0.2
Which is very similar to another commonly found rule of thumb:
radius = output ppi / 150
So for a monitor with 72 PPI resolution, you’d use a radius of approximately 0.5; if you’re targeting a printer at 300 PPI you’d use a value of 2.0. Use these as a starting point;
different images have different sharpening requirements, and
individual preference is also a consideration. [Aside: there are a few
postings around the net (including some referenced in this article)
that suggest that Image Magick accepts, but does not honor, fractional
radii; that is, if you specify a radius of 0.5 or 1.2 it is rounded,
or defaults to an integer, or is silently ignored, etc. This is not
true, at least as of version 5.4.7, which is the one that I am using
as I write this article. You can easily see for yourself by doing
something like the following:
$ convert -unsharp 1.2x1.2+5+0 test.tif testo1.tif
$ convert -unsharp 1.4x1.4+5+0 test.tif testo2.tif
$ composite -compose difference testo1.tif testo2.tif diff.tif
$ display diff.tif
you can also load
them into the GIMP or Photoshop into different layers and change the
blend mode to “Difference”; the resulting image is not black (you may
need to look closely for a 0.2 difference in radius). No, this
mistaken impression likely comes from the fact that there is a
relationship between the radius and sigma parameters, and if you do
not specify sigma properly in relation to the radius, the radius may
indeed be changed, or at least not work as expected. Read on for more
on this.]
Please note that the default radius (if you do not specify anything)
is 0, a special value which tells the unsharp mask algorithm to
“select an appropriate value for the radius”!
The sigma parameter
The sigma parameter specifies (official documentation)
“the standard deviation of the Gaussian, in pixels”
This is the most confusing parameter of the four, probably because it
is “invisible” in other implementations of unsharp masking, and it is
most sparsely documented. The best explanation I have found for it
came from a google search that unearthed an archived mailing list
thread which had the following snippet:
Comparing the results of
convert -unsharp 1.2x1+4+0 test test1.2x1+4+0
and
convert -unsharp 30x1+4+0 test test30x1+4+0
results in no significant differences but the latter takes approx. 50 times
longer to complete.
That is not surprising. A radius of 30 involves on the order of 61x61
input pixels in the convolution of each output pixel. A radius of 1.2
involves 3x3 or 5x5 pixels.
Please can anybody give me any hints, what 'sigma' means?
It describes the relative weight of pixels as a function of their
distance from the center of the convolution kernel. For small sigma,
the outer pixels have little weight. Another important clue comes from
the documentation for the -unsharp option to convert (emphasis mine):
The -unsharp option sharpens an image. We convolve the image with a
Gaussian operator of the given radius and standard deviation (sigma).
For reasonable results, radius should be larger than sigma. Use a
radius of 0 to have the method select a suitable radius.
Combining the two clues provides some good insight: sigma is a
parameter that gives you some control over how the strength (given by
the amount parameter) of the sharpening is “graduated” or lessened as
you radiate away from a given pixel at the center of the convolution
matrix to the limit defined by the radius. My testing confirms this
inferred conclusion, namely that a bigger sigma causes more pronounced
sharpening for a given radius. That is why the poster in the mailing
list question (above) did not see any significant difference in the
sharpening even though he was using an amount of 400% (!!) and a
threshold of 0%; with a sigma of only 1.0, the strength of the filter
falls off too rapidly to be noticed despite the large difference in
radius between the two invocations. This is also why the man page says
“for reasonable results, radius should be larger than sigma”; if it is
not, then the sigma parameter does not have a graduated effect, as
designed, to “soften” the halos toward their edges; instead it simply
applies the amount evenly to the edge of the radius (which may be what
you want in some circumstances). A general rule of thumb for choosing
sigma might be:
if radius < 1, then sigma = radius; else sigma = sqrt(radius)
Summary: choose your radius first, then choose a sigma smaller than or equal to that. Experimentation will yield the best results. Please note that
the default sigma (if you do not specify anything) is 1.0. This is the
main culprit for why most people don’t see as much effect with Image
Magick’s unsharp mask operator as they do with other implementations
of unsharp mask if they are using a larger radius: unless you bump up
this parameter you are not getting the full benefit of the larger
radius!
[Aside: you might be wondering what happens if sigma is specified
larger than the radius. The answer, as the documentation states, is
that the result may not be “reasonable”. In my testing, the usual
result is that the sharpening is extended at the specified amount to
the edge of the specified radius, and larger values of sigma have
little if any effect. In some cases (e.g. for radius < 1) specifying a
larger sigma increased the effective radius (e.g. to 1); this may be
the result of a “sanity check” on the parameters in the code. In any
case, keep in mind that the algorithm is designed for sigma to be less
than or equal to the radius, and results may be unexpected if used
otherwise.]
The amount parameter
The amount parameter specifies (official documentation)
“the percentage of the difference between the original and the blur
image that is added back into the original”
The amount parameter can be thought of as the “strength” of the
sharpening: higher values increase contrast in the halos more than
lower ones. Very large values may cause highlights on halos to blow
out or shadows on them to block up. If this happens, consider using a
lower amount with a higher radius instead.
amount should be specified as a decimal value indicating the
percentage. So, for example, if in Photoshop you would use an amount
of 170 (170%), in Image Magick you would use 1.7.
Please note that the default amount (if you do not specify anything)
is 1.0 (i.e. 100%).
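To make the roles of sigma, amount and threshold concrete, here is a rough OpenCV sketch of the unsharp-mask idea described above; it is not from the article. A single-channel 8-bit input is assumed, amount is a fraction (e.g. 1.7 for 170%), and the threshold here is in 0-255 pixel levels rather than Image Magick's fraction-of-MaxRGB convention; the threshold parameter itself is described in the next section.

#include <opencv2/opencv.hpp>

// sharpened = original + amount * (original - blurred), applied only where
// the difference between original and blurred exceeds the threshold.
cv::Mat unsharpMask(const cv::Mat& src, double sigma, double amount, int threshold)
{
    CV_Assert(src.type() == CV_8UC1);   // assumption: grayscale 8-bit input

    cv::Mat blurred;
    cv::GaussianBlur(src, blurred, cv::Size(0, 0), sigma);

    // weak-edge mask: where the local difference is below the threshold,
    // keep the original pixel instead of sharpening it
    cv::Mat diff;
    cv::absdiff(src, blurred, diff);
    cv::Mat lowContrast = diff < threshold;

    cv::Mat sharpened = src * (1.0 + amount) + blurred * (-amount);
    src.copyTo(sharpened, lowContrast);
    return sharpened;
}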
The threshold parameter
The threshold parameter specifies (official documentation)
“as a fraction of MaxRGB, needed to apply the difference amount”
The threshold specifies a minimum amount of difference between the
center pixel vs. surrounding pixels in the convolution kernel
necessary to apply the local contrast enhancement. Increasing this
value causes the algorithm to become less sensitive to differences
that may define edges. Specifying a positive threshold is often used
to avoid sharpening smooth areas that may contain noise (e.g. an area
of blue sky). If you have a noisy image, strongly consider raising the
threshold, or using some kind of smart sharpening technique instead.
The threshold parameter should be specified as a decimal value
indicating this percentage. This is different than GIMP or Photoshop,
which both specify the threshold in actual pixel levels between 0 and
the maximum (for 8-bit images, 255).
Please note that the default threshold (if you do not specify
anything) is 0.05 (i.e. 5%; this corresponds to a threshold of .05 *
255 = 12-13 in Photoshop). Photoshop uses a default threshold of 0
(i.e. no threshold) and the unsharp masking is applied evenly
throughout the image. If that is what you want you will need to
specify a 0.0 value for Image Magick’s threshold. This is undoubtedly
another source of confusion regarding Image Magick’s sharpening
algorithm.
So I did it like that and came up with this command:
gm convert file1.jpg -unsharp 2x1.41+5+0.03 file1_2x1.41+5+0.03.jpg
But as I said, the images do not get as sharp as they do in Photoshop. We also experimented with a lot of other values, but without good results. So is it possible to do Photoshop-style sharpening with GraphicsMagick, or is it just not a good library for this? The main problem with just using Photoshop for sharpening is that we want to improve the images on our Linux server, and Photoshop does not run well on Linux.
I have implemented an image blending method for seamless blending in plain C++. Now I want to port this code to the GPU (using OpenGL ES 2 shaders for mobile devices). Basically, the method creates Gaussian and Laplacian pyramids for each image, which are then combined from the lowest resolution up to the top (see also the paper "The Laplacian Pyramid as a Compact Image Code" by Burt et al., 1983).
My problem is that the Laplacian pyramid levels can have negative values, but my devices do not support float or integer texture types (e.g. via the ARB_texture_float extension).
I have already looked for papers dealing with GPU-based pyramids, but did not find anything really useful.
How can I implement such a pyramid efficiently for a GPU?
Is it possible to calculate a Gaussian/Laplacian pyramid level without iterating through the preceding levels?
Regards,
EDIT
It seems as if there is no "good" way to calculate Laplacian pyramids completely on GPUs that lack support for signed/float texture types (for instance ARB_texture_float) or for types larger than a byte when the image's data range is [0..255], except by using two passes (one for the signs, one for the values). My Laplacian pyramid runs perfectly on GPUs with the ARB_texture_float extension, but without the extension (and with some adjustments to compress the range) the pyramid comes out "wrong" due to the range compression.
The safest way for you to implement a Laplacian pyramid if your textures are unsigned integers is to store two pyramids - one pyramid that contains the gradient magnitude of the Laplacian and another pyramid that stores the sign of the pixel at that location.
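A CPU-side OpenCV sketch of that two-pyramid encoding for a single level follows; the function name is made up, a single-channel 8-bit level is assumed, and on the device both outputs would be uploaded as ordinary byte textures and recombined in the shader.

#include <opencv2/opencv.hpp>

// Hypothetical helper: build one Laplacian level and split it into an
// unsigned magnitude image plus a sign mask, so both fit in 8-bit textures.
void laplacianLevelAsUnsigned(const cv::Mat& level,      // 8-bit, single channel
                              cv::Mat& magnitude8u,      // |L|, range 0..255
                              cv::Mat& signMask)         // 255 where L < 0, else 0
{
    cv::Mat down, up;
    cv::pyrDown(level, down);
    cv::pyrUp(down, up, level.size());

    // compute the signed Laplacian in 16-bit to keep the negative values
    cv::Mat level16, up16;
    level.convertTo(level16, CV_16S);
    up.convertTo(up16, CV_16S);
    cv::Mat lap = level16 - up16;           // roughly in [-255, 255]

    cv::Mat absLap = cv::abs(lap);
    absLap.convertTo(magnitude8u, CV_8U);   // magnitude pyramid level
    signMask = (lap < 0);                   // sign pyramid level (CV_8U mask)
}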
Yes. Any level in a Gaussian or Laplacian pyramid has a closed form solution based on the sigma value that you want to compute. Consider the base case of a LoG pyramid computed at intervals of sigma = (2/3). The first level of the pyramid has sigma 2/3 and is produced simply by convolving with a 5x5 LoG filter with sigma 2/3. The second convolution with the same filter produces an LoG image with sigma 4/3, and finally the third has sigma 6/3, or 2, so we subsample the image to produce the next integer level of the pyramid. If you want to compute the LoG of an image at sigma 2, the levels at sigma 2/3 and 4/3 are not necessary - simply subsample the image one time and convolve with an LoG filter with sigma 1.
If you want to compute the LoG at sigma = 20, quad-subsample the image (16 pixel blocks become 1 pixel) to give you a sigma 16 image, then convolve once with a sigma 4/3 LoG filter.
I have a question about the Convolve function in OpenCV using GPU acceleration.
The convolutions are roughly 3.5x faster when using the GPU
when running:
convolve(src_32F, kernel, cresult, false, cbuffer);
However, the image borders are missing (in cresult).
The result is excellent otherwise, though (kernel size is 60x60).
thanks
This is the way convolution works.
It calculates the value of every pixel as a weighted average of the surrounding ones. So, if the kernel takes into account 30 pixels on each side, convolution is not defined for any pixel that is closer than 30 pixels to the image border.
In the CPU implementation of filtering functions, those missing pixels are supplemented with bogus values based on a given strategy (copy, mirror, blank, etc).
What you can do is manually pad your matrix into a bigger matrix with the desired border values, filter the big one, and then crop it back. For that you can use the gpu::copyMakeBorder() function.
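A rough sketch of that pad-filter-crop pattern with the old gpu module follows; the helper name and the replicate border mode are assumptions, and the final crop offset may need adjusting depending on how your OpenCV version sizes the convolve output.

#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

// Hypothetical helper: pad by half the kernel size, convolve on the GPU,
// then crop back so the border pixels of the original image get defined values.
cv::Mat convolveWithBorder(const cv::Mat& src32F, const cv::Mat& kernel32F)
{
    int padY = kernel32F.rows / 2;
    int padX = kernel32F.cols / 2;

    cv::gpu::GpuMat d_src(src32F), d_padded, d_kernel(kernel32F), d_result;
    cv::gpu::copyMakeBorder(d_src, d_padded, padY, padY, padX, padX,
                            cv::BORDER_REPLICATE);
    cv::gpu::convolve(d_padded, d_kernel, d_result);

    cv::Mat result;
    d_result.download(result);

    // Crop the central region matching the original size; the exact offset
    // depends on how the convolve output is sized in your OpenCV version.
    int offX = (result.cols - src32F.cols) / 2;
    int offY = (result.rows - src32F.rows) / 2;
    return result(cv::Rect(offX, offY, src32F.cols, src32F.rows)).clone();
}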