I would like to replace the Discrete Cosine Transform in the JPEG format with the Hadamard Transform, but I don't know which stages have to be added, dropped, or changed in the original algorithm.
As I understand it, the JPEG algorithm without the Huffman coding is as follows:
Image division into 8x8 non-overlapping blocks;
Each block is level-shifted by subtracting 128 from it;
DCT on each block to transform it to the frequency domain. Here I want to use the Hadamard transform instead;
Quantization by quality factor;
Reordering of each block in zig-zag pattern;
Removing the trailing zeroes and inserting an EOB (End-Of-Block) symbol.
My guess is that the zig-zag reordering will no longer move the coefficients with the highest energy concentration to the head of the vector (so that all the zeroes end up trailing), and hence has to be changed.
The level shift, which is used to reduce the dynamic range of the DCT coefficients (giving greater precision), may also have to be changed.
The answer may be in the JPEG-XR format, which uses the HT instead of the DCT, but it will take a while before I can get a copy of it and understand all the mathematics behind it.
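To make the stage I want to swap concrete, here is a minimal sketch of the block-transform step with an orthonormal Walsh-Hadamard transform in place of the DCT. This is only my own illustration (assuming NumPy/SciPy are available), not code from any standard:

import numpy as np
from scipy.linalg import hadamard

# Orthonormal 8x8 Walsh-Hadamard matrix: H @ H.T == identity.
H = hadamard(8) / np.sqrt(8)

block = np.random.randint(0, 256, (8, 8)).astype(float)
shifted = block - 128.0                 # level shift, as in baseline JPEG
coeffs = H @ shifted @ H.T              # forward 2D transform (Hadamard instead of DCT)
# ...quantization and coefficient reordering would operate on `coeffs` here...
restored = H.T @ coeffs @ H + 128.0     # inverse transform + undo the level shift
assert np.allclose(restored, block)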
You should look at the standard called JPEG-XR.
It uses the Hadamard Transform instead of the DCT.
There's also an open source implementation of it.
Good Luck.
I'm trying to translate a Photoshop setting for sharpening images to GraphicsMagick. For that I found this helpful article:
https://redskiesatnight.com/2005/04/06/sharpening-using-image-magick/
The problem is that if I use the Photoshop-equivalent values explained in the article in GraphicsMagick, the images are not as sharp and clear as in Photoshop.
For example, I use these settings in Photoshop:
Strength: 500%
Radius: 2.0 pixels
Threshold: 8
In the article the parameters are explained like this:
The radius parameter
The radius parameter specifies (official documentation)
“the radius of the Gaussian, in pixels, not counting the center pixel”
Unsharp masking, like many other image-processing filters, is a
convolution kernel operation. The filter processes the image pixel by
pixel. For each pixel it examines a block of pixels surrounding it
(the kernel) and does some calculations on them to render the output
pixel value. The radius parameter determines which pixels surrounding
the center pixel will be considered in the convolution kernel: (think
of a circle) the larger the radius, the more pixels that will need to
be processed for each individual pixel.
Image Magick’s radius is similar to the same parameter in Photoshop,
GIMP and other image editors. In practical terms, it affects the size
of the “halos” created to increase contrast along the edges in the
image, increasing acutance and thus apparent sharpness.
How do you know how big of a radius to use? It depends on your output
target resolution, for one thing. It also depends on your personal
preferences, as well as the specific needs of the image at hand. As
far as the resolution issue goes, the GIMP User Manual recommends that
unsharp mask radius be set as follows:
radius = (output ppi / 30) * 0.2
Which is very similar to another commonly found rule of thumb:
radius = output ppi / 150
So for a monitor with 72 PPI resolution, you'd use a radius of approximately 0.5; if you're targeting a printer at 300 PPI you'd use a value of 2.0. Use these as a starting point;
different images have different sharpening requirements, and
individual preference is also a consideration. [Aside: there are a few
postings around the net (including some referenced in this article)
that suggest that Image Magick accepts, but does not honor, fractional
radii; that is, if you specify a radius of 0.5 or 1.2 it is rounded,
or defaults to an integer, or is silently ignored, etc. This is not
true, at least as of version 5.4.7, which is the one that I am using
as I write this article. You can easily see for yourself by doing
something like the following:
$ convert -unsharp 1.2x1.2+5+0 test.tif testo1.tif
$ convert -unsharp 1.4x1.4+5+0 test.tif testo2.tif
$ composite -compose difference testo1.tif testo2.tif diff.tif
$ display diff.tif
You can also load
them into the GIMP or Photoshop into different layers and change the
blend mode to “Difference”; the resulting image is not black (you may
need to look closely for a 0.2 difference in radius). No, this
mistaken impression likely comes from the fact that there is a
relationship between the radius and sigma parameters, and if you do
not specify sigma properly in relation to the radius, the radius may
indeed be changed, or at least not work as expected. Read on for more
on this.]
Please note that the default radius (if you do not specify anything)
is 0, a special value which tells the unsharp mask algorithm to
“select an appropriate value for the radius”!
The sigma parameter
The sigma parameter specifies (official documentation)
“the standard deviation of the Gaussian, in pixels”
This is the most confusing parameter of the four, probably because it
is “invisible” in other implementations of unsharp masking, and it is
most sparsely documented. The best explanation I have found for it
came from a google search that unearthed an archived mailing list
thread which had the following snippet:
Comparing the results of
convert -unsharp 1.2x1+4+0 test test1.2x1+4+0
and
convert -unsharp 30x1+4+0 test test30x1+4+0
results in no significant differences but the latter takes approx. 50 times
longer to complete.
That is not surprising. A radius of 30 involves on the order of 61x61
input pixels in the convolution of each output pixel. A radius of 1.2
involves 3x3 or 5x5 pixels.
Can anybody please give me any hints on what 'sigma' means?
It describes the relative weight of pixels as a function of their
distance from the center of the convolution kernel. For small sigma,
the outer pixels have little weight. Another important clue comes from
the documentation for the -unsharp option to convert (emphasis mine):
The -unsharp option sharpens an image. We convolve the image with a
Gaussian operator of the given radius and standard deviation (sigma).
For reasonable results, radius should be larger than sigma. Use a
radius of 0 to have the method select a suitable radius.
Combining the two clues provides some good insight: sigma is a
parameter that gives you some control over how the strength (given by
the amount parameter) of the sharpening is “graduated” or lessened as
you radiate away from a given pixel at the center of the convolution
matrix to the limit defined by the radius. My testing confirms this
inferred conclusion, namely that a bigger sigma causes more pronounced
sharpening for a given radius. That is why the poster in the mailing
list question (above) did not see any significant difference in the
sharpening even though he was using an amount of 400% (!!) and a
threshold of 0%; with a sigma of only 1.0, the strength of the filter
falls off too rapidly to be noticed despite the large difference in
radius between the two invocations. This is also why the man page says
“for reasonable results, radius should be larger than sigma”; if it is
not, then the sigma parameter does not have a graduated effect, as
designed, to “soften” the halos toward their edges; instead it simply
applies the amount evenly to the edge of the radius (which may be what
you want in some circumstances). A general rule of thumb for choosing
sigma might be:
if radius < 1, then sigma = radius, else sigma = sqrt(radius)
Summary: choose your radius first, then choose a sigma smaller than or equal to
that. Experimentation will yield the best results. Please note that
the default sigma (if you do not specify anything) is 1.0. This is the
main culprit for why most people don’t see as much effect with Image
Magick’s unsharp mask operator as they do with other implementations
of unsharp mask if they are using a larger radius: unless you bump up
this parameter you are not getting the full benefit of the larger
radius!
[Aside: you might be wondering what happens if sigma is specified
larger than the radius. The answer, as the documentation states, is
that the result may not be “reasonable”. In my testing, the usual
result is that the sharpening is extended at the specified amount to
the edge of the specified radius, and larger values of sigma have
little if any effect. In some cases (e.g. for radius < 1) specifying a
larger sigma increased the effective radius (e.g. to 1); this may be
the result of a “sanity check” on the parameters in the code. In any
case, keep in mind that the algorithm is designed for sigma to be less
than or equal to the radius, and results may be unexpected if used
otherwise.]
The amount parameter
The amount parameter specifies (official documentation)
“the percentage of the difference between the original and the blur
image that is added back into the original”
The amount parameter can be thought of as the “strength” of the
sharpening: higher values increase contrast in the halos more than
lower ones. Very large values may cause highlights on halos to blow
out or shadows on them to block up. If this happens, consider using a
lower amount with a higher radius instead.
amount should be specified as a decimal value indicating the
percentage. So, for example, if in Photoshop you would use an amount
of 170 (170%), in Image Magick you would use 1.7.
Please note that the default amount (if you do not specify anything)
is 1.0 (i.e. 100%).
The threshold parameter
The threshold parameter specifies (official documentation)
“as a fraction of MaxRGB, needed to apply the difference amount”
The threshold specifies a minimum amount of difference between the
center pixel vs. surrounding pixels in the convolution kernel
necessary to apply the local contrast enhancement. Increasing this
value causes the algorithm to become less sensitive to differences
that may define edges. Specifying a positive threshold is often used
to avoid sharpening smooth areas that may contain noise (e.g. an area
of blue sky). If you have a noisy image, strongly consider raising the
threshold, or using some kind of smart sharpening technique instead.
The threshold parameter should be specified as a decimal value
indicating this percentage. This is different than GIMP or Photoshop,
which both specify the threshold in actual pixel levels between 0 and
the maximum (for 8-bit images, 255).
Please note that the default threshold (if you do not specify
anything) is 0.05 (i.e. 5%; this corresponds to a threshold of .05 *
255 = 12-13 in Photoshop). Photoshop uses a default threshold of 0
(i.e. no threshold) and the unsharp masking is applied evenly
throughout the image. If that is what you want you will need to
specify a 0.0 value for Image Magick’s threshold. This is undoubtedly
another source of confusion regarding Image Magick’s sharpening
algorithm.
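Putting the article's rules together, here is a small helper (my own sketch in Python, not from the article) that converts Photoshop-style unsharp settings into a -unsharp argument using the sigma rule of thumb, the amount/100 conversion, and the threshold/255 conversion described above:

import math

def unsharp_argument(strength_percent, radius_px, threshold_level, max_level=255):
    """Build a -unsharp radiusXsigma+amount+threshold string from
    Photoshop-style settings, using the article's rules of thumb."""
    sigma = radius_px if radius_px < 1 else math.sqrt(radius_px)  # sigma rule of thumb
    amount = strength_percent / 100.0                             # e.g. 500% -> 5.0
    threshold = threshold_level / max_level                       # e.g. 8/255 -> ~0.031
    return "{:g}x{:.2f}+{:g}+{:.3f}".format(radius_px, sigma, amount, threshold)

# Photoshop: Strength 500%, Radius 2.0 px, Threshold 8
print(unsharp_argument(500, 2.0, 8))    # -> 2x1.41+5+0.031
# gm convert input.jpg -unsharp 2x1.41+5+0.031 output.jpg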
So I did it like that and came up with this command:
gm convert file1.jpg -unsharp 2x1.41+5+0.03 file1_2x1.41+5+0.03.jpg
But like I said, the images do not get as sharp as in Photoshop. We also experimented with a lot of other values, but without good results. So is it possible to do Photoshop-style sharpening with GraphicsMagick? Or is it just not a good library? The main problem with just using Photoshop for sharpening is that we want to improve the images on our Linux server, and Photoshop does not run well on Linux.
I've been working with the Discrete Wavelet Transform; I'm new to this theory. I want to access and modify the wavelet coefficients of the decomposed image. Are those wavelet coefficients simply the pixel values of the decomposed image in a 2D DWT?
This, for example, is the result of a DWT decomposition:
So, when I want to access and modify the wavelet coefficients, can I just iterate through the pixel values of the above image? Thank you for your help.
No. The image is merely illustrative.
The image you are looking at does not exactly correspond to the original coefficients. The original wavelet coefficients are real numbers; what you are looking at are their absolute values quantized into a range from 0 to 255.
It is not true that the coefficients were calculated as pairwise sums and differences of the input samples. The coefficients were calculated using two complementary filters. See the description here. The essential point is that the displayed coefficients were adjusted for visualization, so it is no longer possible to synthesize the original image from them. If you need to synthesize the image, you cannot use the pixels of the referenced image.
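If you want to work with the actual coefficients, operate on the output of the transform itself rather than on the displayed image. A minimal sketch, assuming Python with the PyWavelets package (my own example, not tied to whatever tool produced your image):

import numpy as np
import pywt

image = np.random.rand(256, 256)              # stand-in for your grayscale image

# Single-level 2D DWT: cA = approximation, (cH, cV, cD) = detail sub-bands.
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')

# These arrays hold the real-valued coefficients you can modify directly,
# e.g. a crude hard threshold on the detail sub-bands:
thr = 0.1
cH, cV, cD = (np.where(np.abs(c) > thr, c, 0.0) for c in (cH, cV, cD))

# Synthesize the image back from the (modified) coefficients.
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), 'haar')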
I'm getting all pixels' RGB values into
R=[],
G=[],
B=[]
arrays from the picture. They are arrays containing 8-bit [0-255] values, and I need to use the Fourier Transform to compress the image with a lossy method.
Fourier Transform:
X_k = sum_{n=0}^{N-1} x_n * e^(-j*2*pi*k*n/N)
N will be the number of pixels and n is the array index i. What will be k, and what about the imaginary unit j?
Can I implement this equation in a programming language and get a compressed image file?
Or do I need to apply the transform to a different value instead of RGB?
First off, yes, you should convert from RGB to a luminance space, such as YCbCr. The human eye has higher resolution in luminance (Y) than in the color channels, so you can decimate the colors much more than the luminance for the same level of loss. It is common to begin by reducing the resolution of the Cb and Cr channels by a factor of two in both directions, reducing the size of the color channels by a factor of four. (Look up Chroma Subsampling.)
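A minimal sketch of this first step (my own illustration, assuming NumPy and the standard JFIF conversion constants):

import numpy as np

def rgb_to_ycbcr(rgb):
    """HxWx3 array of 0-255 floats -> Y, Cb, Cr planes (JFIF constants)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def subsample(channel):
    """Halve the resolution in both directions by averaging 2x2 blocks."""
    h, w = channel.shape
    c = channel[:h - h % 2, :w - w % 2]
    return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rgb = np.random.randint(0, 256, (64, 64, 3)).astype(float)
y, cb, cr = rgb_to_ycbcr(rgb)
cb_small, cr_small = subsample(cb), subsample(cr)   # 1/4 of the chroma samples each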
Second, you should use a discrete cosine transform (DCT), which is effectively the real part of the discrete Fourier transform of the samples shifted over one-half step. What is done in JPEG is to break the image up into 8x8 blocks for each channel and do a DCT on every column and row of each block. Then the DC component is in the upper left corner, and the AC components increase in frequency as you go down and to the right. You can use whatever block size you like, though the overall computation time of the DCT will go up with the size, and the artifacts from the lossy step will have a broader reach.
Now you can make it lossy by quantizing the resulting coefficients, more so in the higher frequencies. The result will generally have lots of small and zero coefficients, which is then highly compressible with run-length and Huffman coding.
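Putting the last two steps together, here is a rough sketch (my own, assuming SciPy; the quantization matrix below is just a toy divisor that grows with frequency, not an actual JPEG table):

import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):
    return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(b):
    return idct(idct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

# Toy quantization matrix: coarser quantization toward higher frequencies.
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing='ij')
Q = 10.0 + 4.0 * (u + v)

channel = np.random.randint(0, 256, (64, 64)).astype(float) - 128.0   # level shift
quantized = np.zeros_like(channel)
for i in range(0, channel.shape[0], 8):
    for j in range(0, channel.shape[1], 8):
        quantized[i:i+8, j:j+8] = np.round(dct2(channel[i:i+8, j:j+8]) / Q)  # lossy step

# `quantized` is mostly zeros in the high frequencies and compresses well with
# run-length and Huffman coding; idct2(quantized[i:i+8, j:j+8] * Q) restores a block.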
Speaking of 1D discrete denoising via variational calculus, I would like to know how to handle the length of the smoothness term, given that it should be N-1 while the length of the data term is N. Here is the equation:
E = 0;
for i = 1:n
    E = E + (u(i) - f(i))^2 + lambda*(u(i+1) - u(i))^2;
end
E is the cost of the current u in the optimization process
f is the given (noisy) image
u is the output (denoised) image
n is the length of the 1D vectors
lambda >= 0 is the weight of the smoothness term in the optimization process (described around minute 13 of the video)
Here the lengths of the first and second terms mismatch. How can this be resolved?
More importantly, I would like to use a system of linear equations to solve this problem.
This is nowhere near my cup of tea, but I think you are referring to the fact that:
u(i+1) - u(i) accesses the next pixel, making the smoothness term work only on a resolution 1 pixel smaller than the original image f.
In graphics and filtering this is usually resolved in two ways:
#1: use a default value for pixels outside the image resolution
You can set a default or neutral (for the process) color for those pixels (like black), use the color of the closest neighbor inside the image resolution, or interpolate the missing pixels (bilinear, bicubic, ...).
I think the first choice is not suitable for your denoising technique.
#2: change the resolution of the output image
Usually after some filtering techniques (via FIR, etc.) the result is 1 pixel smaller than the input, which resolves the missing-data problem. In your case it looks like your resulting u image should be 1 pixel bigger than the input image f while computing the cost function.
So either enlarge it via #1, and when the optimization is done you can crop it back to the original size.
Or virtually crop f down by one pixel (just say n' = n - 1) before computing the cost function, so you avoid access violations (and you can also restore it back after the optimization...).
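Regarding the linear-system part of the question: with the quadratic smoothness term, setting the gradient of E to zero gives (I + lambda*D'*D)*u = f, where D is the (n-1) x n forward-difference matrix, so the length mismatch disappears once the normal equations are formed. A minimal sketch of this, assuming NumPy/SciPy:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def denoise_1d(f, lam):
    """Minimize sum((u - f)^2) + lam * sum((u[i+1] - u[i])^2) exactly,
    by solving the tridiagonal normal equations (I + lam * D' * D) u = f."""
    n = len(f)
    # D is the (n-1) x n forward-difference operator: (D u)[i] = u[i+1] - u[i]
    D = sp.diags([-np.ones(n - 1), np.ones(n - 1)], offsets=[0, 1], shape=(n - 1, n))
    A = sp.identity(n) + lam * (D.T @ D)
    return spla.spsolve(A.tocsc(), f)

# Example: a noisy step signal
f = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * np.random.randn(100)
u = denoise_1d(f, lam=5.0)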
In the Computer Vision System Toolbox for Matlab there are three types of interpolation methods used to correct lens distortion.
Interpolation method for the function to use on the input image. The interp input interpolation method can be the string, 'nearest', 'linear', or 'cubic'.
My question is: what is the difference between 'nearest', 'linear', and 'cubic'? And which one is implemented in the "Zhang" and "Heikkila, J, and O. Silven" methods?
I can't access the page at the link you wrote in your question (it asks for a username and password), so I assume your linked page has the same contents as the page http://www.mathworks.it/it/help/vision/ref/undistortimage.html, which I quote here:
J = undistortImage(I,cameraParameters,interp) removes lens distortion from the input image, I and specifies the
interpolation method for the function to use on the input image.
Input Arguments
I — Input image
cameraParameters — Object for storing camera parameters
interp — Interpolation method
'linear' (default) | 'nearest' | 'cubic'
Interpolation method for the function to use on
the input image. The interp input interpolation method can be the
string, 'nearest', 'linear', or 'cubic'.
Furthermore, I assume you are referring to these papers:
ZHANG, Zhengyou. A flexible new technique for camera calibration. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2000, 22.11: 1330-1334.
HEIKKILA, Janne; SILVEN, Olli. A four-step camera calibration procedure with implicit image correction. In: Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on. IEEE, 1997. p. 1106-1112.
I have searched for the word "interpolation" in the two PDF documents (Zhang, and Heikkila and Silven) and I did not find any direct statement about the interpolation method they used.
To my knowledge, in general, a camera calibration method is concerned with how to estimate the intrinsic, extrinsic and lens distortion parameters (all these parameters are inside the input argument cameraParameters of Matlab's undistortImage function); the interpolation method is part of a different problem, i.e. the problem of "Geometric Image Transformations".
I quote from the OpenCV page Geometric Image Transformation (I have slightly modified the original, omitting some details and adding some definitions; I assume you are working with a grey-level image):
The functions in this section perform various geometrical
transformations of 2D images. They do not change the image content but
deform the pixel grid and map this deformed grid to the destination
image. In fact, to avoid sampling artifacts, the mapping is done in
the reverse order, from destination to the source. That is, for each
pixel (x, y) of the destination image, the functions compute
coordinates of the corresponding “donor” pixel in the source image and
copy the pixel value:
dst(x,y) = src(f_x(x,y), f_y(x,y))
where
dst(x,y) is the grey value of the pixel located at row x and column y in the destination image
src(x,y) is the grey value of the pixel located at row x and column y in the source image
f_x is a function that maps the row x and the column y to a new row, it just uses coordinates and not the grey level.
f_y is a function that maps the row x and the column y to a new column, it just uses coordinates and not the grey level.
The actual implementations of the geometrical transformations, from
the most generic remap() to the simplest and fastest resize(), need
to solve two main problems with the above formula:
• Extrapolation of non-existing pixels. Similarly to the filtering
functions described in the previous section, for some (x,y) , either
one of f_x(x,y) , or f_y(x,y) , or both of them may fall outside of
the image. In this case, an extrapolation method needs to be used.
OpenCV provides the same selection of extrapolation methods as in the
filtering functions. In addition, it provides the method
BORDER_TRANSPARENT . This means that the corresponding pixels in the
destination image will not be modified at all.
• Interpolation of pixel
values. Usually f_x(x,y) and f_y(x,y) are floating-point numbers. This
means that <f_x, f_y> can be either an affine or
perspective transformation, or radial lens distortion correction, and
so on. So, a pixel value at fractional coordinates needs to be
retrieved. In the simplest case, the coordinates can be just rounded
to the nearest integer coordinates and the corresponding pixel can be
used. This is called a nearest-neighbor interpolation. However, a
better result can be achieved by using more sophisticated
interpolation methods, where a polynomial function is fit into some
neighborhood of the computed pixel (f_x(x,y), f_y(x,y)), and then the
value of the polynomial at (f_x(x,y), f_y(x,y)) is taken as the
interpolated pixel value. In OpenCV, you can choose between several
interpolation methods. See resize() for details.
For a "soft" introduction see also for example Cambridge in colour - DIGITAL IMAGE INTERPOLATION.
So let's say you need the grey level of the pixel at x=20.2, y=14.7; since x and y are numbers with a fractional part different from zero, you will need to "invent" (compute) the grey level in some way. In the simplest case ('nearest' interpolation) you just say that the grey level at (20.2,14.7) is the grey level you retrieve at (20,15); it is called "nearest" because 20 is the nearest integer value to 20.2 and 15 is the nearest integer value to 14.7.
In the (bi)'linear' interpolation you will compute the value at (20.2,14.7) with a combination of the grey levels of the four pixels at (20,14), (20,15), (21,14), (21,15); for the details on how to compute the combination see the Wikipedia page which has a numeric example.
The (bi)'cubic' interpolation considers the combination of sixteen pixels in order to compute the value at (20.2,14.7); see the Wikipedia page.
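To make the 'nearest' and (bi)'linear' cases at (20.2, 14.7) concrete, here is a small sketch of my own (assuming a grey-level image stored as a 2D NumPy array indexed as img[row, column]):

import numpy as np

def interp_nearest(img, x, y):
    """Grey level at fractional (x, y): just take the nearest pixel."""
    return img[int(round(x)), int(round(y))]

def interp_bilinear(img, x, y):
    """Weighted combination of the four pixels surrounding (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[x0,     y0    ] +
            (1 - dx) * dy       * img[x0,     y0 + 1] +
            dx       * (1 - dy) * img[x0 + 1, y0    ] +
            dx       * dy       * img[x0 + 1, y0 + 1])

img = np.arange(40.0 * 40.0).reshape(40, 40)
print(interp_nearest(img, 20.2, 14.7))    # value at (20, 15)
print(interp_bilinear(img, 20.2, 14.7))   # blend of (20,14), (20,15), (21,14), (21,15)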
I suggest you try all three methods with the same input image and see the differences in the output image.
Interpolation method is actually independent of the camera calibration. Any time you apply a geometric transformation to an image, such as rotation, re-sizing, or distortion compensation, the pixels in the new image will correspond to points between the pixels of the old image. So you have to interpolate their values somehow.
'nearest' means you simply use the value of the nearest pixel.
'linear' means you use bi-linear interpolation. The new pixel's value is a weighted sum of the values of the neighboring pixels in the input image, where the weights depend on the distances (closer pixels get larger weights).
'cubic' means you use a bi-cubic interpolation, which is more complicated than bi-linear, but may give you a smoother image.
A good description of these interpolation methods is given in the documentation for the interp2 function.
And finally, just to clarify, the undistortImage function is in the Computer Vision System Toolbox.