In libjpeg I am unable to locate the 8x8 DCT matrix. If I am not wrong, this matrix is always constant for an 8x8 block. It must contain 1/sqrt(8) in the first row, but where is this matrix?
In an actual JPEG implementation, the DCT matrix is usually factored down to its Gaussian Normal Form. That gives a series of matrix multiplications. However, in the normal form, these only involve operations on the diagonal and values adjacent to the diagonal. Most of the values in the normalized matrices are zero so you can omit them.
That transforms the DCT into a series of 8 parallel operations.
This book describes a couple of ways the matrix operations can be transformed:
http://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=pd_bxgy_b_img_y
This book describes a tensor approach that is theoretically more efficient but tends not to be so in implementation
http://www.amazon.com/JPEG-Compression-Standard-Multimedia-Standards/dp/0442012721/ref=pd_bxgy_b_img_y
It doesn't. Or maybe it's somewhere in a sneaky place, but it doesn't really matter. Real implementations of DCT don't work that way, they're very specialized pieces of code that have all the constants hardcoded into them, and they look nothing like a matrix multiplication. It is occasionally useful to view the transform as a matrix multiplication from a theoretical standpoint, but it can be implemented much more efficiently.
For the DCT in libjpeg, see for example the file jfdctflt.c (or one of its friends).
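For reference, the textbook matrix the question has in mind can be written down directly. The following is a minimal NumPy sketch of the orthonormal 8x8 DCT-II matrix, not code taken from libjpeg (whose routines hardcode their constants and use their own scaling); the function name dct_matrix and the sample block are made up for illustration.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: row k is the frequency, column x the sample."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)        # DC row: every entry is 1/sqrt(8) when n = 8
    return c

C = dct_matrix(8)
block = np.arange(64, dtype=float).reshape(8, 8)  # made-up sample block
coeffs = C @ block @ C.T       # forward 2-D DCT as two matrix products
restored = C.T @ coeffs @ C    # inverse 2-D DCT (C is orthogonal)
```

The first row is indeed constant at 1/sqrt(8), and because the matrix is orthogonal its transpose is its inverse. This form is handy for checking results, while the fast routines compute the same transform (up to scaling) with far fewer multiplications.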
Hi guys, I've been thinking about this question:
I know that we use the Fourier transform to get into the frequency domain to process an image.
My textbook says that when we are done processing the image in the Fourier domain, we have to invert the transform to get the processed image.
The textbook says to take the real part of the inverse.
However, when I go through the OpenCV tutorial, both the OpenCV and the NumPy versions eventually use the magnitude (for OpenCV) or np.abs (for NumPy).
For OpenCV, the inverse returns two channels which contain the real and imaginary components. When I took the real part of the inverse, I got a totally weird image.
Could somebody explain the meaning behind all of this:
Why use magnitude or abs to get the processed image?
What's wrong with the textbook instruction (take the real part of the inverse)?
The textbook is right, the tutorial is wrong.
A real-valued image has complex conjugate symmetry in the Fourier domain. This means that the FFT of the image will have a specific symmetry. Any processing that you do must preserve this symmetry if you want the inverse transform to remain real-valued. If you do this processing wrong, then the inverse transform will be complex-valued, and probably nonsensical.
If you preserve the symmetry in the Fourier domain properly, then the imaginary component of the inverse transform will be nearly zero (likely different from zero because of numerical imprecision). Discarding this imaginary component is the correct thing to do. Computing the magnitude will yield the same result, except all negative values will become positive (note some filters are meant to produce negative values, such as derivative filters), and at an increased computational cost.
For example, a convolution is a multiplication in the Fourier domain. The filter in the Fourier domain must be real-valued and symmetric around the origin. Often people will confuse where the origin is in the Fourier domain, and multiply by a filter that seems symmetric, but is actually shifted with respect to the origin, making it not symmetric. This shift introduces a phase change of the inverse transform (see the shift property of the Fourier transform). The magnitude of the inverse transform is not affected by the phase change, so taking the magnitude of this inverse transform yields an output that sort of looks OK, except if one expects to see negative values in the filter result. It would have been better to correctly understand the FFT algorithm, create a properly symmetric filter in the Fourier domain, and simply keep the real part of the inverse transform.
Nonetheless, some filters are specifically designed to break the symmetry and yield a complex-valued filter output. For example the Gabor filter has an even (symmetric) component and an odd (anti-symmetric) component. The even component yields a real-valued output, the odd component yields an imaginary-valued output. In this case, it is the magnitude of the complex value that is of interest. Likewise, a quadrature filter is specifically meant to produce a complex-valued output. From this output, the analytic signal (or its multi-dimensional extension, the monogenic signal), both the magnitude and the phase are of interest, for example as used in the phase congruency method of edge detection.
Looking at the linked tutorial, it is the line
fshift[crow-30:crow+30, ccol-30:ccol+30] = 0
which generates the Fourier-domain filter and applies it to the image (it is equivalent to multiplying by a filter with 1s and 0s). This tutorial correctly computes the origin of the Fourier domain (though for Python 3 you would use crow,ccol = rows//2 , cols//2 to get the integer division). But the filter above is not symmetric around that origin. In Python, crow-30:crow+30 indicates 30 pixels to the left of the origin, and only 29 pixels to the right (the right bound is not included!). The correct filter would be:
fshift[crow-30:crow+30+1, ccol-30:ccol+30+1] = 0
With this filter, the inverse transform is purely real (imaginary component has values in the order of 1e-13, which is numerical errors). Thus, it is now possible (and correct) to replace img_back = np.abs(img_back) with img_back = np.real(img_back).
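The difference between the two slices can be checked numerically. This is a small sketch on a synthetic random image (an assumption, since the tutorial's image is not available here); the helper name high_pass is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((128, 128))          # synthetic stand-in for a real image
rows, cols = img.shape
crow, ccol = rows // 2, cols // 2     # origin of the Fourier domain after fftshift

def high_pass(image, lo, hi):
    """Zero fshift[crow-lo:crow+hi, ccol-lo:ccol+hi] and invert the transform."""
    fshift = np.fft.fftshift(np.fft.fft2(image))
    fshift[crow - lo:crow + hi, ccol - lo:ccol + hi] = 0
    return np.fft.ifft2(np.fft.ifftshift(fshift))

asym = high_pass(img, 30, 30)         # tutorial's slice: 30 before, 29 after the origin
sym = high_pass(img, 30, 31)          # corrected slice: 30 on each side of the origin

max_imag_asym = np.abs(asym.imag).max()
max_imag_sym = np.abs(sym.imag).max()
print(max_imag_asym, max_imag_sym)
```

With the symmetric slice the imaginary part should be at the level of floating-point rounding, so np.real is safe; with the asymmetric slice it is clearly non-zero.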
I have an image where I need to detect an object as fast as possible. I also know that I only need to detect the object closest to the center.
AFAIK Opencv's MatchTemplate works somewhat like this (pseudocode):
for (x in width):
    for (y in height):
        value = calcSimilarity(inputImage, searchedImage, x, y)
        matched[x][y] = value
After that, I have to loop through the resulting image and find the point closest to the center, which is all quite a waste.
So I'm wondering if I can do something like:
coordsGen = new CoordsGen() // a class that generates specific coords for me
while (!coordsGen.stop):
    x, y = coordsGen.next()
    value = calcSimilarity(inputImage, searchedImage, x, y)
    if (value > threshold):
        return x, y
Basically what I need here is the calcSimilarity function. This would allow me to optimize the process greatly.
There are many choices of similarity scoring methods for template matching in general.*
OpenCV has 3 available template matching modes:
Sum of square differences (Euclidean distance)
Cross-correlation
Pearson correlation coefficient
And in OpenCV each of those three have normed/scaled versions as well:
Normalized sum of square differences
Normalized cross-correlation
Normalized Pearson correlation coefficient
You can see the actual formulas used in the OpenCV docs under TemplateMatchModes though these agree with the general formulas you can find everywhere for the above methods.
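To make the per-position score concrete, here is a rough NumPy sketch of a normalized correlation coefficient evaluated at a single position, combined with the center-first visiting order from the question. It is written from the general formula, not copied from OpenCV's implementation; the names calc_similarity and find_nearest_match and the 0.99 threshold are assumptions for illustration.

```python
import numpy as np

def calc_similarity(image, template, x, y):
    """Normalized correlation coefficient between the template and the window at (x, y)."""
    h, w = template.shape
    window = image[y:y + h, x:x + w].astype(float)
    t = template.astype(float) - template.mean()
    d = window - window.mean()
    denom = np.sqrt((t * t).sum() * (d * d).sum())
    return 0.0 if denom == 0 else float((t * d).sum() / denom)

def find_nearest_match(image, template, threshold=0.99):
    """Visit top-left positions in order of distance from the center; stop at the first hit."""
    h, w = template.shape
    n_y, n_x = image.shape[0] - h + 1, image.shape[1] - w + 1
    ys, xs = np.divmod(np.arange(n_y * n_x), n_x)   # every valid top-left position
    cy, cx = (n_y - 1) / 2.0, (n_x - 1) / 2.0       # position that centers the window
    order = np.argsort((ys - cy) ** 2 + (xs - cx) ** 2, kind="stable")
    for y, x in zip(ys[order], xs[order]):
        if calc_similarity(image, template, int(x), int(y)) > threshold:
            return int(x), int(y)
    return None

rng = np.random.default_rng(1)
scene = rng.random((40, 40))           # synthetic test image
patch = scene[12:20, 25:33].copy()     # template cut out at x=25, y=12
found = find_nearest_match(scene, patch)
```

In the worst case (no match, or a match far from the center) this still visits every position, and each visit is a plain Python call, so it is only likely to pay off when a match is usually found early.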
You can code the template matching yourself instead of using OpenCV. However, note that OpenCV is optimized for these operations and in general is blazing fast at template matching. OpenCV uses a DFT to perform some of these computations to reduce the computational load. For example, see:
Why is opencv's Template Matching ... so fast?
OpenCV Sum of squared differences speed
You can also use OpenCV's minMaxLoc() to find the min/maximum value instead of looping through yourself. Also, you didn't specify how you're accessing your values but not all lookup methods are as fast as others. See How to scan images to see the fastest Mat access operations. Spoiler: raw pointers.
The main speedup your optimization would look to give you is early termination of the function. However, I don't think you'll achieve faster times in general by coding it yourself, unless there's a significantly smaller subset of the original image that the template is usually in.
A better method to reduce search time if your images are very big would be to use a pyramid resolution approach. Basically, make copies of the template and search images at 1/2 the original size, 1/2 of that, 1/2 of that, and so on. Then you start the template matching on a small 1/16 or whatever sized image and find the general location of the template. Then you do the same for the next image size up, but you only search a small subset around where your template was at the previous scale. Then each time you grow the image size closer to the original, you're only looking for small differences of a few pixels to nail down the position more accurately. The general location is first found with the smallest scaled image, which only takes a fraction of the time to find compared to the original image size, and then you simply refine it by scaling up.
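A rough coarse-to-fine sketch of that idea in plain NumPy follows. It uses a brute-force sum of squared differences and 2x2 block averaging for the downscaling; both are simplifications chosen for illustration, and the names halve, best_ssd and pyramid_match are made up. The demo data is random noise, so the template is deliberately cut at an offset that is a multiple of 4 to keep the coarse grids aligned; smoother real images usually tolerate misalignment better, which is what the search margin is for.

```python
import numpy as np

def halve(a):
    """Downsample by averaging 2x2 blocks (an odd trailing row/column is dropped)."""
    h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
    return a[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def best_ssd(image, template, y_range, x_range):
    """Exhaustive sum-of-squared-differences search over the given top-left ranges."""
    h, w = template.shape
    best = (np.inf, 0, 0)
    for y in y_range:
        for x in x_range:
            d = image[y:y + h, x:x + w] - template
            best = min(best, (float((d * d).sum()), y, x))
    return best[1], best[2]

def pyramid_match(image, template, levels=2, margin=2):
    """Coarse-to-fine search; returns (y, x) of the best match at full resolution."""
    images, templates = [image.astype(float)], [template.astype(float)]
    for _ in range(levels):
        images.append(halve(images[-1]))
        templates.append(halve(templates[-1]))
    # Full search only on the smallest level.
    img, tpl = images[-1], templates[-1]
    y, x = best_ssd(img, tpl,
                    range(img.shape[0] - tpl.shape[0] + 1),
                    range(img.shape[1] - tpl.shape[1] + 1))
    # On each larger level, search only a small window around the doubled estimate.
    for img, tpl in zip(images[-2::-1], templates[-2::-1]):
        max_y, max_x = img.shape[0] - tpl.shape[0], img.shape[1] - tpl.shape[1]
        ys = range(max(0, 2 * y - margin), min(max_y, 2 * y + margin) + 1)
        xs = range(max(0, 2 * x - margin), min(max_x, 2 * x + margin) + 1)
        y, x = best_ssd(img, tpl, ys, xs)
    return y, x

rng = np.random.default_rng(2)
scene = rng.random((64, 64))
tpl = scene[24:40, 36:52].copy()       # 16x16 template at y=24, x=36
location = pyramid_match(scene, tpl)
```

Only the smallest level is searched exhaustively; each finer level evaluates at most (2 * margin + 1)^2 positions.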
* Note that OpenCV doesn't include other template matching methods which you may see elsewhere. In particular, OpenCV has a sum of square differences but no sum of absolute distances method. Phase differences are also used as a similarity metric, but don't exist in OpenCV. Either way, cross-correlation and sum of square differences are both extremely common in image processing and unless you have a special image domain, should work fine.
I have a bunch of gray-scale images decomposed into superpixels. Each superpixel in these images has a label in the range [0, 1]. You can see one sample image below.
Here is the challenge: I want the spatially (locally) neighboring superpixels to have consistent labels (close in value).
I'm kind of interested in smoothing local labels but do not want to apply Gaussian smoothing functions or whatever, as some colleagues suggested. I have also heard about Conditional Random Field (CRF). Is it helpful?
Any suggestion would be welcome.
I'm kind of interested in smoothing local labels but do not want to apply Gaussian smoothing functions or whatever, as some colleagues suggested.
And why is that? Why not consider the helpful advice of your colleagues, who are actually right? Applying a smoothing function is the most reasonable way to go.
I have also heard about Conditional Random Field (CRF). Is it helpful?
This also suggests that you should rather go with your colleagues' advice, as CRF has nothing to do with your problem. CRF is a classifier, a sequence classifier to be exact, which requires labeled examples to learn from and has nothing to do with the setting presented.
What are typical approaches?
Exactly what your colleagues proposed: define a smoothing function and apply it to each value and its neighbourhood. (I will not use the term "labels" as it is misleading: you have continuous values in [0,1], whereas "label" denotes a categorical variable in machine learning.)
Another approach would be to define some optimization problem, where your current assignment of values is one goal, and the second one is "closeness", for example:
Let us assume that you have points with values {(x_i, y_i)}_{i=1}^N and that n(x) returns indices of neighbouring points of x.
Consequently you are trying to find {a_i}_{i=1}^N such that they minimize
\sum_{i=1}^N (y_i - a_i)^2 + C \sum_{i=1}^N \sum_{j \in n(x_i)} (a_i - a_j)^2
Here the first sum measures closeness to the current values, C is a constant that weights the two parts, and the double sum measures closeness to the neighbouring values.
You can solve the above optimization problem using many techniques, for example with the scipy.optimize.minimize function.
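Because this objective is quadratic in the a_i, it can also be minimized without an iterative optimizer. The sketch below swaps in a closed-form solve with NumPy instead of scipy.optimize.minimize, under the assumption that the neighbour relation is symmetric (if j is a neighbour of i, then i is a neighbour of j). The function names, the toy chain of five superpixels and C = 1 are made up for illustration.

```python
import numpy as np

def smooth_values(y, neighbours, C=1.0):
    """Minimise sum_i (y_i - a_i)^2 + C * sum_i sum_{j in n(i)} (a_i - a_j)^2."""
    n = len(y)
    L = np.zeros((n, n))                 # Laplacian of the neighbourhood graph
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            L[i, i] += 1.0
            L[i, j] -= 1.0
    # With a symmetric neighbour relation the penalty equals 2 * a^T L a, so
    # setting the gradient to zero gives the linear system (I + 2 C L) a = y.
    return np.linalg.solve(np.eye(n) + 2.0 * C * L, np.asarray(y, dtype=float))

def objective(a, y, neighbours, C):
    """The same objective evaluated term by term, for checking the solution."""
    data = sum((yi - ai) ** 2 for yi, ai in zip(y, a))
    smooth = sum((a[i] - a[j]) ** 2 for i, nbrs in enumerate(neighbours) for j in nbrs)
    return data + C * smooth

y = np.array([0.1, 0.9, 0.2, 0.8, 0.15])            # current values
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3]]     # toy chain of five superpixels
a = smooth_values(y, neighbours, C=1.0)
```

Larger C pulls neighbouring values closer together, while C = 0 returns the original values unchanged.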
I am not sure that your request makes any sense.
Having close label values for nearby superpixels is trivial: take some smooth function of (X, Y), such as constant or affine, taking values in the range [0,1], and assign the function value to the superpixel centered at (X, Y).
You could also take the distance function from any point in the plane.
But this is of no use as it is unrelated to the image content.
I'm trying to implement Gaussian blur for a school project.
I need to make both a CPU and a GPU implementation to compare performance.
I am not quite sure that I understand how Gaussian blur works, so one of my questions is whether I have understood it correctly.
Here's what I do now:
I use the equation from Wikipedia http://en.wikipedia.org/wiki/Gaussian_blur to calculate the filter.
For 2D, I take the RGB of each pixel in the image and apply the filter to it by multiplying the RGB of the pixel and the surrounding pixels with the associated filter position. These are then summed to give the new pixel RGB values.
For 1D, I apply the filter first horizontally and then vertically, which should give the same result if I understand things correctly.
Is this result exactly the same as when the 2D filter is applied?
Another question I have is about how the algorithm can be optimized.
I have read that the Fast Fourier Transform is applicable to Gaussian blur, but I can't figure out how to relate it.
Can someone give me a hint in the right direction?
Thanks.
Yes, the 2D Gaussian kernel is separable so you can just apply it as two 1D kernels. Note that you can't apply these operations "in place" however - you need at least one temporary buffer to store the result of the first 1D pass.
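The equivalence is easy to check numerically. Below is a minimal single-channel NumPy sketch (an RGB image would be processed one channel at a time) with zero padding at the borders; the function names, sigma = 2 and radius = 6 are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius):
    """Sampled 1-D Gaussian, normalised so the weights sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur_2d(image, k1d):
    """Direct 2-D filtering with the outer-product kernel (zero padding)."""
    r = len(k1d) // 2
    k2d = np.outer(k1d, k1d)
    padded = np.pad(image, r)
    out = np.zeros_like(image, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += k2d[dy, dx] * padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out

def blur_separable(image, k1d):
    """Horizontal pass into a temporary buffer, then a vertical pass over that buffer."""
    tmp = np.apply_along_axis(np.convolve, 1, image, k1d, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, k1d, mode="same")

rng = np.random.default_rng(3)
img = rng.random((32, 48))             # synthetic single-channel image
k = gaussian_kernel_1d(sigma=2.0, radius=6)
same = np.allclose(blur_2d(img, k), blur_separable(img, k))
```

For a kernel of width w, the 2D version costs about w*w multiplications per pixel and the separable version about 2*w, which is where the speedup comes from.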
FFT-based convolution is a useful optimisation when you have large kernels - this applies to any kind of filter, not just Gaussian. Just how big "large" is depends on your architecture, but you probably don't want to worry about using an FFT-based approach for anything smaller than, say, a 49x49 kernel. The general approach is:
FFT the image
FFT the kernel, padded to the size of the image
multiply the two in the frequency domain (equivalent to convolution in the spatial domain)
IFFT (inverse FFT) the result
Note that if you're applying the same filter to more than one image then you only need to FFT the padded kernel once. You still have at least two FFTs to perform per image though (one forward and one inverse), which is why this technique only becomes a computational win for large-ish kernels.
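The four steps above can be sketched with NumPy's FFT routines as follows. Note one assumption that matters: multiplying the spectra gives a circular convolution, so the result wraps around at the image borders unless the image is padded first. The function names are made up, and a slow direct version is included only as a reference to compare against.

```python
import numpy as np

def fft_convolve(image, kernel):
    """Circular convolution via FFT: pad the kernel, multiply the spectra, invert."""
    kh, kw = kernel.shape
    padded = np.zeros_like(image, dtype=float)
    padded[:kh, :kw] = kernel
    # Move the kernel centre to index (0, 0) so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    spectrum = np.fft.fft2(image) * np.fft.fft2(padded)
    return np.real(np.fft.ifft2(spectrum))

def direct_circular_convolve(image, kernel):
    """Reference implementation: weighted sum of shifted copies, wrapping at the borders."""
    kh, kw = kernel.shape
    out = np.zeros_like(image, dtype=float)
    for i in range(kh):
        for j in range(kw):
            shift = (i - kh // 2, j - kw // 2)
            out += kernel[i, j] * np.roll(image, shift, axis=(0, 1))
    return out

rng = np.random.default_rng(4)
img = rng.random((32, 32))                         # synthetic image
g = np.exp(-np.arange(-2, 3) ** 2 / 2.0)
kernel = np.outer(g, g) / np.outer(g, g).sum()     # small 5x5 Gaussian kernel
match = np.allclose(fft_convolve(img, kernel), direct_circular_convolve(img, kernel))
```

When the same kernel is reused, the np.fft.fft2(padded) term is the part that can be computed once and cached.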
What are the ways in which to quantify the texture of a portion of an image? I'm trying to detect areas that are similar in texture in an image, sort of a measure of "how closely similar are they?"
So the question is what information about the image (edge, pixel value, gradient etc.) can be taken as containing its texture information.
Please note that this is not based on template matching.
Wikipedia didn't give much details on actually implementing any of the texture analyses.
Do you want to find two distinct areas in the image that look the same (same texture), or match a texture in one image to another?
The second is harder due to different radiometry.
Here is a basic scheme of how to measure similarity of areas.
1. Write a function that takes an area of the image as input and calculates a scalar value, such as average brightness. This scalar is called a feature.
2. Write more such functions to obtain about 8-30 features, which together form a vector that encodes information about the area of the image.
3. Calculate such a vector for both areas that you want to compare.
4. Define a similarity function which takes two vectors and outputs how much they are alike.
You need to focus on steps 2 and 4.
Step 2: Use the following features: std() of brightness, some kind of corner detector, entropy filter, histogram of edge orientations, histogram of FFT frequencies (x and y directions). Use color information if available.
Step 4: You can use cosine similarity, min-max or weighted cosine.
After you implement about 4-6 such features and a similarity function, start running tests. Look at the results and try to understand why or where it doesn't work. Then add a specific feature to cover that case.
For example, if you see that a texture with big blobs is regarded as similar to a texture with tiny blobs, then add a morphological filter that calculates the density of objects with size > 20 sq. pixels.
Iterate this process of identifying a problem and designing a specific feature about 5 times and you will start to get very good results.
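As a starting point, the scheme could look roughly like this in NumPy. The particular features (brightness std, mean gradient magnitude, normalised brightness entropy, and a magnitude-weighted histogram of gradient orientations) are just one possible selection, and the function names, bin counts and synthetic test textures are assumptions for illustration.

```python
import numpy as np

def texture_features(patch, bins=8):
    """Small illustrative feature vector for a grayscale patch with values in [0, 1]."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    # Histogram of gradient orientations, weighted by gradient magnitude.
    orientation = np.arctan2(gy, gx)
    hist, _ = np.histogram(orientation, bins=bins, range=(-np.pi, np.pi), weights=magnitude)
    hist = hist / (hist.sum() + 1e-12)
    # Entropy of the brightness histogram, scaled to [0, 1].
    p, _ = np.histogram(patch, bins=32, range=(0.0, 1.0))
    p = p / p.sum()
    entropy = -(p[p > 0] * np.log2(p[p > 0])).sum() / np.log2(32)
    return np.concatenate(([patch.std(), magnitude.mean(), entropy], hist))

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1 means same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

x = np.arange(64)
stripes = np.tile(0.5 + 0.5 * np.sin(2 * np.pi * x / 8), (64, 1))  # vertical stripes
noise = np.random.default_rng(0).random((64, 64))                  # unstructured texture
f_a = texture_features(stripes[0:32, 0:32])
f_b = texture_features(stripes[16:48, 19:51])   # same texture, different area
f_n = texture_features(noise[0:32, 0:32])
sim_same = cosine_similarity(f_a, f_b)
sim_diff = cosine_similarity(f_a, f_n)
```

Keeping every feature on a comparable scale matters with plain cosine similarity, otherwise the largest feature dominates the score; a weighted cosine is one way to control that.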
I'd suggest using wavelet analysis. Wavelets are localized in both time and frequency and give a better signal representation using multiresolution analysis than the FT does.
There is a paper explaining a wavelet approach for texture description. There is also a comparison method.
You might need to slightly modify an algorithm to process images of arbitrary shape.
An interesting approach for this is to use Local Binary Patterns.
Here is a basic example with some explanations: http://hanzratech.in/2015/05/30/local-binary-patterns.html
See that method as one of the many different ways to get features from your pictures. It corresponds to the 2nd step of DanielHsH's method.
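For a feel of what the method computes, here is a minimal NumPy sketch of the basic 3x3 variant: each interior pixel gets an 8-bit code with one bit per neighbour that is at least as bright as the center, and the texture descriptor is the histogram of those codes. The function name and the bit ordering are assumptions; library implementations offer more variants (other radii, rotation-invariant codes).

```python
import numpy as np

def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 3x3 Local Binary Pattern codes."""
    img = image.astype(float)
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    # Eight neighbours, clockwise from the top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

flat = np.full((16, 16), 0.5)                      # constant patch: every bit is set
noisy = np.random.default_rng(5).random((16, 16))  # unstructured patch
h_flat = lbp_histogram(flat)
h_noisy = lbp_histogram(noisy)
```

The resulting histogram can be used as one feature vector (or appended to others) and compared between areas with any histogram or vector similarity measure.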