In every tutorial I have seen, you start with an image, apply an FFT to get its frequency-domain representation, and then follow the inverse procedure to reconstruct the original image.
But what if I start with an unknown JPG of a Fourier transform (the k-space of an MRI, for example)? I cannot invert this step, written in pseudocode as
fourier_image = log(abs(fftshift(fft(original_image))))
and go back.
Can someone please guide me through the steps/operations needed to reconstruct this image and detect the position of the break in the fence? My plan is:
Thresholding: convert the input image to a binary image
Inverting: invert it to get a black background and white lines
Dilation with a structuring element (SE) the size of one unit of the fence structure
Apply erosion
Bitwise-AND the masks together to retrieve the original background and foreground; the image is inverted back by subtracting the bitwise_or from 255
Constructed image minus original image will give us the position of the broken fence
Will this solution work?
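To make the dilate/erode/subtract idea concrete, here is a minimal sketch on a single binary row instead of a full image. It assumes the fence is already thresholded to 1 and the background to 0; the names dilate1d, erode1d and findGaps are made up for this illustration and are not OpenCV calls.

```cpp
#include <cstddef>
#include <vector>

// Binary dilation of one image row with a flat structuring element of odd
// length `se`: a pixel becomes 1 if any neighbour within se/2 is 1.
std::vector<int> dilate1d(const std::vector<int>& row, int se) {
    const int n = static_cast<int>(row.size());
    const int r = se / 2;
    std::vector<int> out(row.size(), 0);
    for (int i = 0; i < n; ++i) {
        for (int j = i - r; j <= i + r; ++j) {
            if (j >= 0 && j < n && row[j] != 0) { out[i] = 1; break; }
        }
    }
    return out;
}

// Binary erosion: a pixel stays 1 only if every neighbour within se/2 is 1.
// Pixels beyond the border count as 0, so the closing never invents
// foreground at the edges of the row.
std::vector<int> erode1d(const std::vector<int>& row, int se) {
    const int n = static_cast<int>(row.size());
    const int r = se / 2;
    std::vector<int> out(row.size(), 1);
    for (int i = 0; i < n; ++i) {
        for (int j = i - r; j <= i + r; ++j) {
            if (j < 0 || j >= n || row[j] == 0) { out[i] = 0; break; }
        }
    }
    return out;
}

// Closing (dilate, then erode) fills gaps narrower than the structuring
// element. Pixels that are 1 after closing but 0 in the input mark the break.
std::vector<std::size_t> findGaps(const std::vector<int>& row, int se) {
    const std::vector<int> closed = erode1d(dilate1d(row, se), se);
    std::vector<std::size_t> gaps;
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (closed[i] != 0 && row[i] == 0) gaps.push_back(i);
    }
    return gaps;
}
```

For the row 1 1 1 1 0 0 1 1 1 1 and se = 3, findGaps reports indices 4 and 5. A gap as wide as the structuring element or wider is not filled, so the SE size has to be chosen relative to the expected break.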
It depends on what you mean by "locate".
After large horizontal erosion and binarization:
Assume that I have a grayscale (8-bit) image and an integral image created from that same image.
The image resolution is 720x576. According to the SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The first octave uses filters with 9x9, 15x15, 21x21 and 27x27 pixels. The second octave uses filters with 15x15, 27x27, 39x39 and 51x51 pixels. The third octave uses filters with 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large (and I guess 720x576 is big enough, right?), a fourth octave is added: 51x51, 99x99, 147x147 and 195x195. These octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are: What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) do I apply the filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = DxxDyy − (0.9Dxy)^2
and
det(HessianApprox) = DxxDyy − (0.81Dxy)^2
Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second order derivatives.)
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common image-analysis approach to detecting regions of interest, called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being that it mimics what happens in the visual cortex of the human brain.
The drawback of the difference of Gaussians (DoG) is its computation time, which is too expensive for large image areas.
To bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (or, equivalently, the application of two Gaussian blurs) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average value of all the image values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and to its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-rightmost pixel of the integral image holds the sum of all the original pixels (this needs one extra row and column of padding, which is what cv::integral produces).
Then, you just need to note that the box blur is, up to a division by the rectangle's area, the sum of all the pixels inside a given rectangle (not necessarily starting at the top-leftmost pixel of the image), and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C, D are shorthand for IntegralImagePixelAt(A) (and B, C, D respectively).
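As a minimal sketch of that formula in plain C++ (no OpenCV, to keep it self-contained): the integral image below is assumed to be padded with one extra row and column of zeros, so that ii[y][x] is the sum of everything strictly above and to the left of (y, x). The names Image, integralImage and boxSum are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<long long>>;

// Padded integral image: ii[y][x] = sum of img over rows < y and columns < x,
// so ii has one more row and one more column than img.
Image integralImage(const Image& img) {
    const std::size_t h = img.size();
    const std::size_t w = h ? img[0].size() : 0;
    Image ii(h + 1, std::vector<long long>(w + 1, 0));
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
        }
    }
    return ii;
}

// Sum of the original pixels in rows [y0, y1) and columns [x0, x1),
// obtained from four lookups regardless of the rectangle size.
long long boxSum(const Image& ii, std::size_t y0, std::size_t x0,
                 std::size_t y1, std::size_t x1) {
    assert(y0 <= y1 && x0 <= x1 && y1 < ii.size() && x1 < ii[0].size());
    const long long A = ii[y0][x0];  // top left
    const long long B = ii[y0][x1];  // top right
    const long long C = ii[y1][x0];  // bottom left
    const long long D = ii[y1][x1];  // bottom right
    return A + D - B - C;
}
```

Dividing boxSum by the rectangle's area gives the box blur value. The cost is the same four lookups for a 9x9 window as for a 195x195 one, which is why the larger SURF filters are not more expensive to evaluate.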
Integral images in SURF
SURF is not using box blurs of sizes 9x9, etc. directly.
What it uses instead are box-filter approximations of second-order Gaussian derivatives, which resemble Haar-like features.
Let's take an example. Suppose you are to compute the output of the 9x9 filters. This corresponds to a given sigma, hence a fixed scale/octave.
With sigma fixed, you center your 9x9 window on the pixel of interest. Then, you compute the output of the second-order Gaussian derivative in each direction (horizontal, vertical, diagonal). Fig. 1 in the paper illustrates the vertical and diagonal filters.
The Hessian determinant
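There is a factor to take into account the scale differences. Let's trust the paper that the determinant is equal to:
Det = DxxDyy - (0.9 * Dxy)^2.
Expanding the square, the determinant is finally given by: Det = DxxDyy - 0.81*Dxy^2. So the 0.9 goes inside the square and the 0.81 outside it; these two forms are the same thing, whereas (0.81*Dxy)^2 is not.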
Look at page 17 of this document
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you already have code for an ordinary 2D Gaussian convolution, just use the box filter as the kernel in place of the Gaussian, and use the original image as input rather than the integral image. The result of this method will be the same as the one you asked about.
When given an image such as this:
And not knowing the color of the object in the image, I would like to be able to automatically find the best H, S and V ranges to threshold the object itself, in order to get a result such as this:
In this example, I manually found the values and thresholded the image using cv::inRange. The output I'm looking for is the best H, S and V ranges (a min and a max value each, for a total of 6 integer values) to threshold the given object in the image, without knowing in advance what color the object is. I need to use these values later on in my code.
Keypoints to remember:
- All given images will be of the same size.
- All given images will have the same dark background.
- All the objects I'll put in the images will be of full color.
I can brute force over all possible combinations of the 6 HSV range values, threshold with each one and find a clever way to figure out when the best blob was found (blob size maybe?). That seems like a very cumbersome, slow and highly inefficient solution though.
What would be a good way to approach this? I did some research and found that OpenCV has some machine learning capabilities, but I need to have the actual 6 values at the end of the process, and not just a thresholded image.
You could create a small 2-layer neural network for the task of dynamic HSV masking.
Steps:
Create/generate ground-truth annotations: for each image, the HSV range of the required object
Design a small neural network with at least 1 conv layer and 1 fcn layer
Input: mask of the image after applying the HSV range from the ground truth (m x n)
Output: m x n binary mask of the image
Post-processing: multiply the mask by the original image to get the required object highlighted
I am new to OpenCV. I want to align two images, src and dst. I am using cv::estimateRigidTransform() to calculate the transformation matrix and then cv::warpAffine() to transform from dst to src. When I compare the new transformed image with the src image, it looks almost the same, but when I take the absolute difference of the new transformed image and the src image, there is a lot of difference. What should I do, given that my dst image has some rotation and translation as well? Here is my code:
cv::Mat transformMat = cv::estimateRigidTransform(src, dst, true);
cv::Mat output;
cv::Size dsize = leftImageMat.size(); //This specifies the output image size--change needed
cv::warpAffine(src, output, transformMat, dsize);
Src Image
destination Image
output image
absolute Difference Image
Thanks
You have some misconceptions about the process.
The method cv::estimateRigidTransform takes as input two sets of corresponding points and solves a set of equations to find the transformation matrix. The resulting transformation maps the src points to the dst points (exactly, or as closely as possible when an exact match is not possible, for example with floating-point coordinates).
If you apply estimateRigidTransform to two images, OpenCV first finds matching pairs of points using some internal method (see the OpenCV docs).
cv::warpAffine then transforms the src image to dst according to the given transformation matrix. But any (almost any) transformation is a lossy operation. The algorithm has to estimate some data because they aren't available. This process is called interpolation: using known information, you calculate the unknown value. Some information regarding image scaling can be found on Wikipedia. The same rules apply to other transformations: rotation, skew, perspective... Obviously this doesn't apply to a translation by a whole number of pixels.
Given your test images, I would guess that OpenCV takes the lampshade as reference. From the difference image it is clear that the lampshade is transformed best. By default, OpenCV uses linear interpolation for warping, as it is fast. But you can select a more advanced method for better results; again, consult the OpenCV docs.
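To show what "calculating the unknown value" means for linear interpolation, here is a minimal bilinear sampling sketch in plain C++. It is not OpenCV's implementation, only the textbook formula; Gray and sampleBilinear are names made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Gray = std::vector<std::vector<double>>;

// Value of img at a real-valued position (x, y), blended from the four
// surrounding pixels. Requires 0 <= x <= width - 1 and 0 <= y <= height - 1.
double sampleBilinear(const Gray& img, double x, double y) {
    assert(!img.empty() && !img[0].empty());
    const int h = static_cast<int>(img.size());
    const int w = static_cast<int>(img[0].size());
    assert(x >= 0.0 && y >= 0.0 && x <= w - 1 && y <= h - 1);
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const int x1 = std::min(x0 + 1, w - 1);  // clamp at the right border
    const int y1 = std::min(y0 + 1, h - 1);  // clamp at the bottom border
    const double fx = x - x0;  // horizontal weight of the right neighbours
    const double fy = y - y0;  // vertical weight of the bottom neighbours
    const double top = (1.0 - fx) * img[y0][x0] + fx * img[y0][x1];
    const double bottom = (1.0 - fx) * img[y1][x0] + fx * img[y1][x1];
    return (1.0 - fy) * top + fy * bottom;
}
```

A rotated pixel almost never lands exactly on a source pixel, so nearly every output value is such a blend of neighbours. That blending is one reason the absolute difference against an untransformed image does not come out as zero.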
Conclusion:
The result you got is pretty good, bearing in mind that it's the result of an automated process. If you want better results, you'll have to find another method for selecting corresponding points, or use a better interpolation method. Either way, after the transform, the diff will not be 0. It is virtually impossible to achieve that, because a bitmap is a discrete grid of pixels, so there will always be some gaps that need to be estimated.
I am still not clear about the Fourier transform. I know it represents the frequency information of an image and that I can reconstruct the image from its Fourier transform.
Say, I have an image I(x,y). Its fourier transformation is F(I). I want to reconstruct a small rectangular area in that image starting from (x1,y1) and ending at (x2,y2) without reconstructing the whole image.
Is it possible to reconstruct only a small patch from F(I)?
To answer your question: yes, it is possible. Do an inverse FFT and then crop the image normally. If that seems like a cop-out, it is because you're attempting to do a spatial-domain task (time-domain, in 1D signal terms) in the frequency domain, which isn't going to be very natural.
If you insist that the calculation be done in the frequency domain, I think you should be able to phase-shift the image so that (x1, y1) moves to the origin, then inverse FFT and discard the samples outside (x2 - x1, y2 - y1).
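Here is a 1D sketch of that phase-shift idea, using a naive O(n^2) DFT in plain C++ rather than a real FFT library, so it is for illustration only. The names cd, dft and patchFromSpectrum are invented for this example. It relies on the shift theorem: multiplying bin k by exp(+2*pi*i*k*x1/n) moves sample x1 to index 0.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive discrete Fourier transform; inverse = true applies the 1/n scaling.
std::vector<cd> dft(const std::vector<cd>& x, bool inverse) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * static_cast<double>(k * t)
                                 / static_cast<double>(n);
            sum += x[t] * std::polar(1.0, angle);
        }
        out[k] = inverse ? sum / static_cast<double>(n) : sum;
    }
    return out;
}

// Recover samples x1 .. x1 + len - 1 of a real signal from its spectrum F:
// phase-shift so that x1 lands at the origin, invert, keep the first len.
std::vector<double> patchFromSpectrum(const std::vector<cd>& F,
                                      std::size_t x1, std::size_t len) {
    const std::size_t n = F.size();
    assert(x1 + len <= n);
    const double pi = std::acos(-1.0);
    std::vector<cd> shifted(n);
    for (std::size_t k = 0; k < n; ++k) {
        const double angle = 2.0 * pi * static_cast<double>(k * x1)
                             / static_cast<double>(n);
        shifted[k] = F[k] * std::polar(1.0, angle);
    }
    const std::vector<cd> full = dft(shifted, true);  // still a full inverse
    std::vector<double> patch(len);
    for (std::size_t i = 0; i < len; ++i) patch[i] = full[i].real();
    return patch;
}
```

Note that the inverse transform still runs over every frequency bin; the shift only changes where the wanted samples end up. For a 2D image the same multiplication would be applied along both axes.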
The fundamental problem is that in the frequency domain each bin (or pixel, for a 2D FFT) represents a frequency and phase across the entire image in the spatial domain. Discarding a single pixel in the frequency domain loses that frequency's information for the whole image, and the loss cannot be localized.