I have an image array which, because of the physical nature of the acquisition process, has a log distribution. What is the standard workflow for Histogram Equalization (HE) in this scenario: 1\apply HE directly; or 2\ apply log10(x) transformation and then HE?
Thanks!
Anything you do you should transfer your histogram to the uniform one as much as possible (because in that case, the contrast is more visible for our visual system). As your distribution is like logarithmic curves, it is suggested to perform the inverse of logarithm first(you should know your logarithm base first) and then apply histogram equalization on it.
Related
When reading about classic computer vision I am confused on how multiscale feature matching works.
Suppose we use an image pyramid,
How do you deal with the same feature being detected at multiple scales? How do you decide which to make a deacriptor for?
How do you connected features between scales? For example let's say you have a feature detected and matched to a descriptor at scale .5. Is this location then translated to its location in the initial scale?
I can share something about SIFT that might answer question (1) for you.
I'm not really sure what you mean in your question (2) though, so please clarify?
SIFT (Scale-Invariant Feature Transform) was designed specifically to find features that remains identifiable across different image scales, rotations, and transformations.
When you run SIFT on an image of some object (e.g. a car), SIFT will try to create the same descriptor for the same feature (e.g. the license plate), no matter what image transformation you apply.
Ideally, SIFT will only produce a single descriptor for each feature in an image.
However, this obviously doesn't always happen in practice, as you can see in an OpenCV example here:
OpenCV illustrates each SIFT descriptor as a circle of different size. You can see many cases where the circles overlap. I assume this is what you meant in question (1) by "the same feature being detected at multiple scales".
And to my knowledge, SIFT doesn't really care about this issue. If by scaling the image enough you end up creating multiple descriptors from "the same feature", then those are distinct descriptors to SIFT.
During descriptor matching, you simply brute-force compare your list of descriptors, regardless of what scale it was generated from, and try to find the closest match.
The whole point of SIFT as a function, is to take in some image feature under different transformations, and produce a similar numerical output at the end.
So if you do end up with multiple descriptors of the same feature, you'll just end up having to do more computational work, but you will still essentially match the same pair of feature across two images regardless.
Edit:
If you are asking about how to convert coordinates from the scaled images in the image pyramid back into original image coordinates, then David Lowe's SIFT paper dedicates section 4 on that topic.
The naive approach would be to simply calculate the ratios of the scaled coordinates vs the scaled image dimensions, then extrapolate back to the original image coordinates and dimensions. However, this is inaccurate, and becomes increasingly so as you scale down an image.
Example: You start with a 1000x1000 pixel image, where a feature is located at coordinates (123,456). If you had scaled down the image to 100x100 pixel, then the scaled keypoint coordinate would be something like (12,46). Extrapolating back to the original coordinates naively would give the coordinates (120,460).
So SIFT fits a Taylor expansion of the Difference of Gaussian function, to try and locate the original interesting keypoint down to sub-pixel levels of accuracy; which you can then use to extrapolate back to the original image coordinates.
Unfortunately, the math for this part is quite beyond me. But if you are fluent in math, C programming, and want to know specifically how SIFT is implemented; I suggest you dive into Rob Hess' SIFT implementation, lines 467 through 648 is probably the most detailed you can get.
All my ground truth images are color but my program is a genetic algorithm working with binary images. I do not know how to compute the fitness in order to see the similarity and differences between the ground truth and my binary image.
The colors you see are most likely just classes, the colors solely exist to showcase the differences better. To compute the similarity, you can use the Dice similarity coefficient or the F1 Score. If you need a more granular overview, i suggest you compute so called Confusion Matrices.
Long story short: Each color represents a different class in your output, you can calculate the metrics mentioned above to compare your output to your groundtruth.
How can the spatial context (or neighbourhood) of a pixel be taken into account (besides the pixel intensity) when clustering an image?
For the time being, I'm using K-means, GMM and Fuzzy C-means which cluster the image based only on the distribution of the pixel intensities. But, I need to include the information on the spatial context of the pixel into the clustering, to avoid the misclassification caused by the noise speckle.
The standard approach for segmentation is to add the X and Y coordinates with appropriate scaling to the color values (in RGB or Lab space).
Examples of these are SLIC (K-means clustering in x-y-Lab space) and Quickshift (an accelerated mean shift in x-y-Lab space).
When also considering spacial distances, it is often possible to gain a lot of speed. Check out the implementations in scikit-image or this blog or my blog
I have a set of 100K 64x64 gray patches (that are already aligned, meaning they all have the same orientation) and I would like to extract a SIFT descriptor from each one using OpenCV.
It is clear to me all I need to do is to define a vector with one keypoint kp such that: kp.x=32, kp.y=32.
However, I don't know how to set the kp.size parameter. From going over SIFT's code, it looks as it's doing some non-trivial calculations with that parameter instead of just assuming that it's the size of the patch.
Question 1: what should be the kp.size parameter when extracting SIFT descriptors from patches of size 64x64?
Question 2: what should be the kp.size parameter when extracting SURF descriptors from patches of size 64x64?
If you look at sift original publication, the scale of the keypoint is used to weight the histogram of gradients magnitude and orientations(paragraph 6. The local image descriptor). So in your case, since the grey patches are aligned, it is up to you to decide if you want to weight the contributions of the pixels further from the patch center or not, and select the scale (i.e. the with of the gaussian weighting window) accordingly.
For SURF, it's basically the same principle except that instead of gradient magnitude the response to haar wavelet is use, but you could still weight those responses with a gaussian window.
Also, since you are working with those aligned patches I would advise you not to use the high-level functions of OpenCV, but to simply use/recode the descriptor extraction part, and to apply any weighting you want to compute your patch representation. One reason to do so is that, in the SIFT example, the computation of SIFT descriptors might "add new keypoints" to the one you provided, if the algorithm is "not happy" with the keypoint orientation, it duplicates the keypoint at the same location but different orientation.
Okay. So the SIFT descriptor uses a neighbourhood of 4x4 grids usually, each grid usually being 4x4 pixels. Therefore the neighbourhood in pixels is usually 16x16. The scale/size is the parameter to determine the amount of downsampling/blurring/radius of keypoint. So I would think in your case, this would be 4.
You probably would also know that SIFT keypoints also work on sub-pixel layers. (32,32) would not be the exact center of your image patch, which would actually be (32.5, 32.5) if your image dimensions (x,y) start from 1. If they start from 0, it would be (31.5, 31.5)- as in the case of opencv.
I am doing some image enhancement experiments so I take photos from my cheap camera. The camera has mosaic artifacts and all images look like grid. I think pillbox (out-of-focus) kernel and Gaussian kernel would not be the best candidates. Any suggestions?
EDIT:
Sample
I suspect this cannot be done via a constant kernel, because the effects on pixels are not the same (so there are "grids").
The effects are non linear. (And probably non-stationary), so you cannot simply invert the convolution and enhance the image -- if you could, the camera chip would do it on-board.
The best way to work out what the convolution is (or at least an approximation to it) might be to take photos of known patterns, compute, and working in 2D frequency/laplace domain divide the resulting spectra to get a linear approximation to the filter.
I suspect that the convolution you discover by doing this will be very context dependant -- so the best way to enhance an image might be to divide it into tiles, classify each region of the image as belonging to a different set (for each of which you could work out a different linear approximation to the convolution, based on test data), and then deconvolve each separately.