Fast Fourier Transformation - Rounding Errors - image-processing

I am transforming an image to a frequency spectrum, convolving it with a kernel, then inverse-transforming it back.
How can I handle the rounding errors that occur during the transformation? For example, when I transform an image and then immediately transform it back, I get an average PSNR of 127. (The pixels are floats between 0.0 and 1.0.)
Is it possible to calculate the errors and correct them?

Short answer: If you want less rounding error, then you need a more accurate number format. Also, you cannot calculate the error.
More accurate number formats include:
x87 80-bit extended precision (long double)
Fixed point with BigInteger
BigDecimal
Also, isn't a PSNR of 127 dB very good already?
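To put the 127 dB figure in perspective, here is a minimal sketch in Python with NumPy (an assumption; the question does not name a language or library). It measures the round-trip PSNR of a random test image with values in [0.0, 1.0] and converts a PSNR back into an RMS error. The helper names `psnr` and `psnr_to_rmse` are made up for this illustration.

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """PSNR in dB for images whose values span [0, peak]."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_to_rmse(psnr_db, peak=1.0):
    """Invert the PSNR formula to get the RMS error."""
    return peak * 10.0 ** (-psnr_db / 20.0)

# Hypothetical test image: uniform random values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((64, 64))

# Forward and inverse 2-D FFT; NumPy computes this in double precision here.
round_trip = np.fft.ifft2(np.fft.fft2(image)).real

print("round-trip PSNR (dB):", psnr(image, round_trip))
print("RMS error at 127 dB:", psnr_to_rmse(127.0))
```

A PSNR of 127 dB corresponds to an RMS error of roughly 4.5e-7 on a 0.0 to 1.0 scale. That is far below the 1/255 (about 3.9e-3) step of an 8-bit image, so the error disappears once the result is quantised back to 8 bits.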

Related

Interpreting unsigned short depth map values

I'm trying to test my algorithm on the lineMOD object detection dataset. According to the author, the depth values are stored as unsigned short values. I've managed to load the depth values into a cv::Mat but I would like to convert them to the typical float representation [0,1].
At first I assumed that I just have to divide by the maximum unsigned short, but this doesn't seem to be the case, since the maximum value I find is 3399 while there are a lot of zeros in the depth map. I suppose the zeros mean that the specific pixel is a point that is too far for the depth camera to detect.
Is it possible that these unsigned shorts represent millimeters? If not, how should I convert the depth values before applying the transforms that generate the point cloud?
I guess the pixel values are not millimeters but some relative values, because it is easier for a depth camera to produce relative depth values than accurate millimeter values. The values might not even be linear.
You may try a few options:
Consult the author to fully understand what the depth value means, then do the conversion accordingly.
Find the actual pixel range within a single image, or across all of your images, say [534, 4399], scale it to [0.1, 1.0], and set the zeros to 0.0.
Simply scale the full range of unsigned short, [0, 65535], to [0.0, 1.0].
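The last two options can be sketched as follows, in Python with NumPy rather than the cv::Mat from the question (an assumption made to keep the example short). The function names and the choice to treat every zero as "no measurement" are illustrative, not something the dataset author has confirmed.

```python
import numpy as np

def normalize_depth(depth, low=0.1, high=1.0):
    """Scale the non-zero depth values to [low, high]; zeros stay 0.0."""
    depth = np.asarray(depth, dtype=np.float64)
    out = np.zeros_like(depth)
    valid = depth > 0              # assumption: 0 means "no measurement"
    if not valid.any():
        return out
    d_min = depth[valid].min()
    d_max = depth[valid].max()
    if d_max == d_min:             # flat map: avoid dividing by zero
        out[valid] = high
        return out
    out[valid] = low + (high - low) * (depth[valid] - d_min) / (d_max - d_min)
    return out

def normalize_full_range(depth):
    """Scale the full unsigned short range [0, 65535] to [0.0, 1.0]."""
    return np.asarray(depth, dtype=np.float64) / 65535.0

# Hypothetical 2x2 depth map using the example range from the answer.
sample = np.array([[0, 534], [4399, 2000]], dtype=np.uint16)
print(normalize_depth(sample))
print(normalize_full_range(sample))
```

Mapping the valid range to start at 0.1 rather than 0.0 keeps the nearest valid pixel distinguishable from the zero pixels that carry no measurement.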

How bilinear interpolation works when down scaling?

I clearly understand how bilinear interpolation works when upscaling an image: the new values are filled in from the 4 nearest neighbours. But I can't understand how it works when downscaling an image. It would mean a lot to me if someone could clarify.
Scaling an image requires mapping pixels from the input to pixels on the output. If those pixel coordinates don't map to an integer, interpolation is required to estimate what the pixel value would have been. The "Bi" part of bilinear means it's linear interpolation applied in two dimensions independently. If for example output pixel 2,3 needs to come from input coordinates 1.5,7.2 you would interpolate in the X direction by taking 0.5 of each of the pixels at 1.0 and 2.0, then interpolate in the Y direction by taking 0.8 of the pixel at 7.0 and 0.2 of the pixel at 8.0. Usually these operations are combined into a single set of equations, but they can be applied separately if needed.
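The example above, sampling input coordinates (1.5, 7.2), can be sketched in Python with NumPy (an assumption; the thread names no language). The function `bilinear_sample` is a hypothetical helper that expects coordinates inside the image and applies the X interpolation first, then the Y interpolation.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img (indexed [row, column]) at fractional coordinates (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    # Clamp the right/bottom neighbours so the last row and column still work.
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0        # fractional parts, e.g. 0.5 and 0.2
    # Linear interpolation along X on the two neighbouring rows...
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    # ...then along Y between those two results.
    return (1 - fy) * top + fy * bottom

# Test image whose value is 10 * row + column, so it is linear in x and y.
ys, xs = np.mgrid[0:10, 0:10]
img = 10.0 * ys + xs

# Weights: 0.5 / 0.5 in X, 0.8 on row 7 and 0.2 on row 8 in Y.
print(bilinear_sample(img, 1.5, 7.2))  # approximately 73.5
```

Because the test image is linear in both directions, bilinear interpolation reproduces it up to floating-point rounding, which makes the weights easy to check by hand.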
Bilinear is a poor choice for downscaling because it leads to aliasing artifacts. This is when you attempt to create spatial frequencies that are beyond the Nyquist sampling limit, and high frequency detail turns into low frequency artifacts. You can minimize this by blurring the image before you downscale it. Or you can choose an interpolation algorithm that incorporates some low pass filtering.
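A small sketch of the aliasing problem, in Python with NumPy (an assumption). It uses a box filter, that is, averaging each block of input pixels, as the simplest possible low-pass step before decimation. This is only one of several possible filters, and both function names are illustrative.

```python
import numpy as np

def decimate(img, factor):
    """Keep every factor-th pixel with no filtering at all."""
    return img[::factor, ::factor]

def box_downscale(img, factor):
    """Average each factor x factor block (box low-pass) while downscaling."""
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    cropped = img[:h, :w].astype(np.float64)   # drop rows/cols that do not fit
    blocks = cropped.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# 8x8 image of one-pixel-wide vertical stripes: columns alternate 0.0 and 1.0.
stripes = np.tile([0.0, 1.0], (8, 4))

print(decimate(stripes, 2))       # every sample lands on a 0.0 column
print(box_downscale(stripes, 2))  # every block averages to 0.5
```

Plain decimation turns the stripes into a solid black image, while the averaged version gives the mid-grey a viewer would perceive from a distance. Sampling at fractional positions with bilinear weights suffers in a similar way once the scale factor is large, because each output pixel still reads at most four input pixels.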

JPEG algorithm - Replacing DCT with Hadamard Transform

I would like to replace the Discrete Cosine Transform in the JPEG format with the Hadamard Transform, but I don't know which stages have to be added, dropped or changed in the original algorithm.
As I understand it the JPEG algorithm without the Huffman coding is as follows:
Image division into 8x8 non-overlapping blocks;
Each block is level-shifted by subtracting 128 from it;
DCT on each block to frequency domain. Here I want to use Hadamard instead;
Quantization by quality factor;
Reordering of each block in zig-zag pattern;
Removing the trailing zeroes and inserting EOB symbol (End-Of-Block);
My guess is that the zig-zag reordering will no longer move all the frequencies with the highest energy concentration to the head of the vector, so not all the zeroes will be trailing; hence the reordering has to be changed.
Also, the level shift, which is used to reduce the range of the DCT coefficients (giving greater precision), may have to be changed.
The answer may be in the JPEG-XR format, which uses the HT instead of the DCT, but it will take a while before I can get a copy of it and understand all the mathematics behind it.
You should look at the standard called JPEG-XR.
It uses Hadamard Transform instead of DCT.
There's also an open source implementation of it.
Good Luck.
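As a rough illustration of the pipeline in the question with the DCT step swapped for a Hadamard transform, here is a sketch in Python with NumPy (an assumption). It is not the JPEG-XR transform and not a JPEG codec: the single uniform quantisation step and all function names are hypothetical, and zig-zag reordering and entropy coding are left out.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

H8 = hadamard(8)   # symmetric, and H8 @ H8 equals 8 * identity

def forward(block):
    """Level shift by 128, then 2-D Hadamard transform of one 8x8 block."""
    return H8 @ (block - 128.0) @ H8.T / 8.0

def inverse(coeffs):
    """Inverse 2-D Hadamard transform, then undo the level shift."""
    return H8 @ coeffs @ H8.T / 8.0 + 128.0

def quantize(coeffs, step=16.0):
    """Uniform quantisation with one hypothetical step size for all coefficients."""
    return np.round(coeffs / step)

def dequantize(levels, step=16.0):
    return levels * step

# Hypothetical 8x8 block of 8-bit pixel values.
rng = np.random.default_rng(1)
block = rng.integers(0, 256, size=(8, 8)).astype(np.float64)

coeffs = forward(block)
restored = inverse(dequantize(quantize(coeffs)))
print(np.abs(restored - block).max())   # error introduced by quantisation only
```

The Sylvester construction does not order the rows by sequency (the number of sign changes, the Hadamard analogue of frequency), which is consistent with the guess in the question that the zig-zag scan would need a different ordering.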

what is the PSD unit by using FFT method

I'm just doing a power spectral density analysis of a signal in time domain. I'm following the fft method described in :
http://www.mathworks.com/support/tech-notes/1700/1702.html
It gives the real physical unit for the PSD. However, the unit is "power"; does that mean "V^2/Hz"?
If I take 10*log10(power) or 10*log10(V^2/Hz), do I get the unit of "dB/Hz"?
Then how can I convert it to dBm/MHz?
It depends on the unit of your timeseries. Often we think of this as just "amplitude", but if your timeseries is a series of voltage amplitude vs. time, then your PSD estimate will be Volts^2/Hz. This is because the PSD is the Fourier Transform of the autocorrelation of your original signal: The autocorrelation has units of Volts^2, and running it through the Fourier Transform decomposes these units over frequency, instead of time, resulting in units of Volts^2/Hz. This is commonly referred to as Watts/Hz, but the conversion from Volts^2 to Watts is not very physically meaningful, as W = V^2/R.
10*log10(power) will result in a unit of dB/Hz, but remember that decibels are always a comparison between two power levels; you are quantifying a ratio of powers. A better definition of decibels is 10*log10(P1/P0), as explained here. If you simply plug a PSD bin estimate into this equation, you are setting your PSD bin to P1 and implicitly comparing it to a P0 value of 1. This may be what you want, and it may not be. For visualization purposes, this is fairly typical, but if you have a standard reference power you should be comparing to, you should use that for P0 instead.
Assuming that you are attempting to plot a dB power spectral density estimate, converting from Hz to MHz simply means rescaling the x-axis of your frequency graph. Remember that 1 MHz is just 1 million Hz, so for example 240000 Hz = 0.24 MHz.
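The units can be checked numerically. The sketch below, in Python with NumPy (an assumption; the question follows a MathWorks note), computes a plain one-sided periodogram without windowing or averaging, which is not necessarily the exact procedure of the linked note. If the samples are in volts, the result is in V^2/Hz, and summing it over frequency gives back the mean-square value of the signal in V^2. The function names and the test signal are illustrative.

```python
import numpy as np

def periodogram_psd(x, fs):
    """One-sided periodogram; units are (units of x)^2 per Hz."""
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    spectrum = np.fft.rfft(x)
    psd = np.abs(spectrum) ** 2 / (fs * n)
    # Fold the negative frequencies in: double everything except DC
    # (and except the Nyquist bin, which exists only when n is even).
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd

def to_db(value, reference=1.0):
    """10*log10(P1/P0); the default reference P0 = 1 is the implicit choice."""
    return 10.0 * np.log10(value / reference)

# Hypothetical signal: a 1 V amplitude, 50 Hz sine sampled at 1 kHz for 1 s.
fs = 1000.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 50.0 * t)

freqs, psd = periodogram_psd(x, fs)
bin_width = fs / x.size                  # Hz per bin
total_power = np.sum(psd) * bin_width    # integrates V^2/Hz back to V^2

print(freqs[np.argmax(psd)], total_power)   # peak near 50 Hz, power near 0.5 V^2
print(to_db(psd.max()))                     # dB relative to 1 V^2/Hz
```

The total of about 0.5 V^2 matches the mean-square value of a unit-amplitude sine, which is a quick sanity check that the scaling, and therefore the V^2/Hz unit, is right.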
EDIT
The point brought up by mtrw is a very valid one; if you are dealing with large amounts of data and are averaging FFT vectors, I highly suggest the Multitaper method; it's a much more statistically sound method of sacrificing frequency resolution for greater confidence on your PSD estimate.
If you have a PSD in W/Hz, e.g. 100 W/Hz, then you have 50 dBm/Hz. dB/Hz is often used vaguely and generically instead of dBm/Hz. Audacity uses dB as shorthand for dBFS (not dBFS/Hz, because it is computing a DFT, and discrete frequencies use a power spectrum and not a density). A digital signal that reaches 50% of the maximum level has an amplitude of −6 dBFS, which is 6 dB below full scale. This corresponds to removing the MSB, hence the 6 dB/bit figure (50% of maximum level is 25% of maximum power, and 1/4 is −6 dB).
dBm is the logarithmic ratio of the power with respect to 1mW, you divide the power by 1mW to get a unitless ratio, and then take the logarithm to get dB units, which in this case makes more sense to be clarified as dBm.
dBc/Hz is the ratio with respect to the carrier power, which is a ratio of two dBm/Hz values, meaning you subtract them and you get dBc/Hz; you get the same result if you divide the two linear power levels in W and then convert the ratio to dB (or more appropriately dBc).
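These conversions are short enough to write out. The sketch below uses Python (an assumption) and hypothetical function names. The 50 ohm default is only a common convention, so substitute the actual load resistance, and the dBm/MHz helper is a pure unit change: one MHz is 10^6 Hz, so the same density expressed per MHz is 60 dB higher.

```python
import math

def watts_to_dbm(power_w):
    """Power, or power density per Hz, relative to 1 mW, in dB."""
    return 10.0 * math.log10(power_w / 1e-3)

def v2_per_hz_to_w_per_hz(psd_v2_hz, resistance_ohm=50.0):
    """W = V^2 / R; the 50 ohm default is an assumption, not a given."""
    return psd_v2_hz / resistance_ohm

def dbm_per_hz_to_dbm_per_mhz(level_dbm_hz):
    """Same density per MHz: 10*log10(1e6) = 60 dB higher."""
    return level_dbm_hz + 60.0

def dbc(level_dbm, carrier_dbm):
    """Level relative to the carrier: subtract the two dBm values."""
    return level_dbm - carrier_dbm

print(watts_to_dbm(100.0))               # about 50 (dBm/Hz for 100 W/Hz)
print(dbm_per_hz_to_dbm_per_mhz(50.0))   # 110.0
print(dbc(-90.0, 10.0))                  # -100.0
```

Subtracting two dBm values gives the same number as dividing the two linear powers and converting the ratio to dB, which is why the carrier-relative figure needs no reference of its own.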
dB-Hz is a logarithmic measure of bandwidth with respect to 1 Hz, and dBJ is a measure of spectral density as a logarithmic ratio to 1 joule, seeing as W/Hz is indeed J.
Power spectral density is a density function, so you need to integrate it to get the actual quantity, like the line integral of a V/m electric field, or a probability density expressed as probability per unit x. This does not make sense for discrete quantities; there the power spectrum is used instead, akin to a probability mass function. If you see dB (which should be used for the discrete frequency domain) instead of dBm/Hz then it's wrong, but if you see it instead of dBm then it's right, as long as it's made clear what the reference is.

Recommendable way of rounding up currency values

First of all see the following problem:
SetRoundMode(rmUp) and rounding “round” values like 10, results in 10,0001.
I need to round currency values up, so 0.8205 becomes 0.83, but the SimpleRoundTo behavior displayed above is giving me some headaches.
How can I round currency values up in a safe way?
You can use the Ceil function:
newvalue := Ceil(oldvalue * 100) / 100;
Note that rounding 0.8205 up to 0.83, and also rounding 0.8305 up to 0.84, will result in an upward bias on average in your rounding. The default rounding mode is banker's rounding, which rounds ties towards the nearest even digit to avoid a directional bias.
This is particularly important if there is a double-entry nature to your calculations. Rounding with a directional bias can result in a mismatch on either side.
Using SetRoundMode changes the FPU control word. Be aware that this FPU mode rounding is applied to floating-point operations in situations that might not be obvious when thinking in terms of the Currency type, which is a fixed-point type (scaled 64-bit integer). A small imprecision in intermediate floating-point calculations, such as 82.000000000000001, will end up rounding up even when the value as Currency is anticipated to be 82.00. Changing the thread-global rounding mode is only to be done with caution.
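The same pitfall can be reproduced outside Delphi. The sketch below uses Python and its standard `decimal` module (an assumption; the thread itself is about Delphi) to contrast rounding up in binary floating point with rounding up in exact decimal arithmetic. The helper names are made up for this example.

```python
import math
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_EVEN

CENT = Decimal("0.01")

def round_up(amount):
    """Round a decimal string up (towards +infinity) to whole cents."""
    return Decimal(amount).quantize(CENT, rounding=ROUND_CEILING)

def round_half_even(amount):
    """Banker's rounding to whole cents: ties go to the even digit."""
    return Decimal(amount).quantize(CENT, rounding=ROUND_HALF_EVEN)

print(round_up("0.8205"))         # 0.83
print(round_up("10"))             # 10.00
print(round_half_even("0.825"))   # 0.82

# Binary floating point: 1.1 * 100 evaluates to 110.00000000000001,
# so the ceiling jumps a whole cent even though 1.10 is already "round".
print(math.ceil(1.1 * 100) / 100)  # 1.11
print(round_up("1.1"))             # 1.10
```

Passing the amounts as strings keeps them exact. Building a `Decimal` from a float would carry the binary representation error into the decimal value.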
You're doing it wrong.
Don't use floats to represent important types like time and money!
Use integers that represent the highest precision you need. For example, use an integer that represents 1000ths of a cent. Then you can pass 82050 around, and only when you finally need to display it as a string do you do the rounding, using integer calculations.
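A minimal sketch of that approach in Python (an assumption; the thread is about Delphi), with amounts held as integer thousandths of a cent. All names are hypothetical, and the helpers assume non-negative amounts.

```python
UNITS_PER_CENT = 1000   # one unit is a thousandth of a cent

def round_up_to_cents(units):
    """Ceiling division: any remainder pushes the result up one cent."""
    return -(-units // UNITS_PER_CENT)

def round_half_up_to_cents(units):
    """Round to the nearest cent, with exact halves going up."""
    return (units + UNITS_PER_CENT // 2) // UNITS_PER_CENT

def format_cents(cents):
    """Format a non-negative integer number of cents as dollars.cents."""
    dollars, remainder = divmod(cents, 100)
    return f"{dollars}.{remainder:02d}"

amount = 82050   # $0.8205 expressed in thousandths of a cent

print(format_cents(round_up_to_cents(amount)))       # 0.83
print(format_cents(round_half_up_to_cents(amount)))  # 0.82
print(format_cents(round_half_up_to_cents(82500)))   # 0.83
```

Everything stays in integer arithmetic until the final formatting step, so no floating-point representation error can tip a value over a rounding boundary.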
To actually answer your question, $0.8205 should not be rounded up. $0.825 should be.

Resources