I've been reading about power-law (gamma) transformations, so let's look at the equation: s = c*r^γ
Suppose that I have one pixel which has intensity of 37. If the gamma is 0.4 and c is 1, then the output intensity is 37^(0.4) which is 4.2. Thus it's darker, not brighter. But then why does it look brighter in the example in my textbook?
The gamma transformation applies to data in the range [0,1]. So, for your typical unsigned 8-bit integer image, you would have to scale it first to that range. The equation, including the scaling, then would be:
s = 255 * (r/255)^γ
Now you'd have, for r = 37 and γ = 0.4: s = 255 * (37/255)^0.4 = 117.8. This is brighter.
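As a quick check of the arithmetic above, here is a minimal sketch in plain Python. The function name `gamma_transform` and the `max_value` parameter are only illustrative, and it assumes an 8-bit image whose intensities run from 0 to 255.

```python
def gamma_transform(r, gamma, c=1.0, max_value=255):
    """Power-law transform of an intensity r given in [0, max_value]."""
    normalized = r / max_value                    # scale to [0, 1] first
    return max_value * c * normalized ** gamma    # s = c * r^gamma, scaled back


print(round(gamma_transform(37, 0.4), 1))  # 117.8, brighter than 37
print(gamma_transform(37, 2.5) < 37)       # True: gamma > 1 darkens
```

Without the normalization step the same call would compute 37^0.4 ≈ 4.2, which is the value that caused the confusion in the question.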
A similar question has been asked here. However I could not understand it clearly.
I understand that SIFT computation has the following steps:
Finding scale space extrema
Keypoint localization (and filtering)
Orientation assignment (using computation of gradient magnitude and orientation)
Create SIFT descriptor
My question is for the fourth step: How to set the region over which the SIFT descriptor is computed? Also how is the shape of the region for SIFT computation determined?
Suppose the scale-space extremum was found at scale "s" in the second octave. I use the gradient orientation to align to a canonical orientation. How do I set the region of computation of the SIFT descriptor using this information? Do I use the scale or the magnitude of the gradient to find the region on which SIFT is to be computed? Also, how is the shape of the region determined?
So this was surprisingly tricky to find an answer for.
David Lowe's original paper seemed to provide only a vague theoretical explanation of how his algorithm works.
And as far as I know, the feature descriptor code of his official implementation was never open-sourced.
So I'm basing my answer on what I consider the next-most canonical implementation of the SIFT algorithm, Rob Hess' OpenSIFT implementation,
which became the base for OpenCV's official implementation.
Anyway, here is my understanding of how SIFT roughly works:
Once you have located an extremum, you should know which octave & interval of the Gaussian pyramid it belongs to.
Based on Rob's code (these two functions on lines 1026-1112), the feature descriptor is calculated from the blurred image of that octave & interval.
And the region for calculating SIFT is a square surrounding the keypoint. This Medium article also seems to agree (see illustration).
The SIFT formula for the Gaussian Kernel scale, relative to the original image size is (reference):
base_scale * 2^(octave + interval / intervals_per_octave)
Or this formula if working relative to the halved image in each octave:
base_scale * 2^(interval / intervals_per_octave)
Where the original paper defined the parameters through experiments as:
base_scale = 1.6 and intervals_per_octave = 3
So if your SIFT was set to have 3 intervals per octave, with a base Gaussian scale of 1.6, and the extrema was found on octave 2, interval 3;
the image will have been blurred by a Gaussian Kernel of scale : 1.6 * 2^(2 + 3/3) = 12.80 pixels
Now the actual array size of the Gaussian kernel will depend on the code you use, as the scale and the kernel size can be set independently.
In cases like MATLAB, I've found a helpful guideline in this SO thread.
The selected answer recommends a kernel width of 6 times the scale (i.e. the 3-sigma rule), so our kernel width (and height) is 12.80 * 6 ≈ 77 pixels;
thus, a SIFT descriptor region of size 77x77 pixels.
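The two calculations above can be reproduced with a small sketch. The function names are made up for illustration, the defaults are the paper's values quoted earlier, and the 6-sigma width is only the rule of thumb from the thread, not something SIFT itself prescribes.

```python
def sift_sigma(octave, interval, base_scale=1.6, intervals_per_octave=3):
    """Gaussian scale relative to the original image size."""
    return base_scale * 2.0 ** (octave + interval / intervals_per_octave)


def six_sigma_width(sigma):
    """Kernel width covering +/- 3 sigma, rounded to whole pixels."""
    return round(6 * sigma)


sigma = sift_sigma(octave=2, interval=3)
print(sigma)                   # 12.8
print(six_sigma_width(sigma))  # 77
```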
Meanwhile, the OpenCV implementation appears to leave the size of the kernel to be determined by OpenCV's own built-in Gaussian Blur function.
Line 246 of OpenCV's code leaves the Gaussian Blur function parameter ksize as zeroes,
for which the official docs only state that the kernel size will be "computed from sigma", without defining how it is actually calculated...
Finally, for Rob's implementation, I have to admit that I couldn't quite understand what was happening in this final step. ¯\_(ツ)_/¯
On lines 1026-1112 Rob defined the code below, which shows how he calculates the orientation histogram for the SIFT descriptor.
The code shows he defined a radius and used the nested for-loops with i and j to iterate through the square region around the keypoint, located at point (r,c).
Yet what I don't really understand is:
How he defined radius, with the Gaussian scale scl multiplied with some unknown constant SIFT_DESCR_SCL_FCTR = 3.0
As well as hist_width * sqrt(2) * ( d + 1.0 ) * 0.5 + 0.5, where d = SIFT_DESCR_WIDTH = 4
hist_width = SIFT_DESCR_SCL_FCTR * scl;
radius = hist_width * sqrt(2) * ( d + 1.0 ) * 0.5 + 0.5;
for( i = -radius; i <= radius; i++ )
  for( j = -radius; j <= radius; j++ )
    {
      /*
        Calculate sample's histogram array coords rotated relative to ori.
        Subtract 0.5 so samples that fall e.g. in the center of row 1 (i.e.
        r_rot = 1.5) have full weight placed in row 1 after interpolation.
      */
      c_rot = ( j * cos_t - i * sin_t ) / hist_width;
      r_rot = ( j * sin_t + i * cos_t ) / hist_width;
      rbin = r_rot + d / 2 - 0.5;
      cbin = c_rot + d / 2 - 0.5;
      if( rbin > -1.0 && rbin < d && cbin > -1.0 && cbin < d )
        if( calc_grad_mag_ori( img, r + i, c + j, &grad_mag, &grad_ori ))
          {
            grad_ori -= ori;
            while( grad_ori < 0.0 )
              grad_ori += PI2;
            while( grad_ori >= PI2 )
              grad_ori -= PI2;
            obin = grad_ori * bins_per_rad;
            w = exp( -(c_rot * c_rot + r_rot * r_rot) / exp_denom );
            interp_hist_entry( hist, rbin, cbin, obin, grad_mag * w, d, n );
          }
    }
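One possible reading of the radius formula, offered as an interpretation rather than anything documented: hist_width is the side of one histogram cell in pixels, the bounds check on rbin and cbin admits samples in a window about (d + 1) cells wide, and the sqrt(2) factor covers the half-diagonal of that window once it is rotated by ori. The sketch below only mirrors the arithmetic of the two C lines; `descriptor_radius` is a made-up name and the truncation to int assumes radius is an integer in the C code.

```python
import math

SIFT_DESCR_SCL_FCTR = 3.0  # cell width in units of the keypoint scale
SIFT_DESCR_WIDTH = 4       # d: the descriptor is a d x d grid of histograms


def descriptor_radius(scl, d=SIFT_DESCR_WIDTH, factor=SIFT_DESCR_SCL_FCTR):
    """Half-width, in pixels, of the square window scanned by the loops."""
    hist_width = factor * scl  # one histogram cell, in pixels
    # Half the diagonal of a (d + 1)-cell square; + 0.5 rounds to nearest.
    return int(hist_width * math.sqrt(2) * (d + 1.0) * 0.5 + 0.5)


print(descriptor_radius(1.6))  # 17, i.e. a 35 x 35 pixel window
```

Under this reading the scanned square grows linearly with the keypoint scale, which fits the general idea described next.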
But regardless of how the exact size of the region is calculated, I think the general concept is the same:
the region size is derived from the Gaussian scale at which the keypoint was found.
Besides, given that the features are supposed to be "weighted by a Gaussian window" (original paper, section 6.1, page 15);
as long as the region you define is large enough to contain most of the meaningful orientation histograms, you are fine.
In summary:
The SIFT descriptor is calculated from the halved & blurred image of the same octave/interval as the keypoint (OpenSIFT)
The region for the SIFT descriptor is a square shape surrounding the keypoint (medium)(image)
The region size is calculated from the Gaussian kernel scale. The exact method of calculation can vary, but an easy rule of thumb is "width of 6 times the kernel scale" (thread)
I would like to use OpenCV to detect which rectangles in an image have a majority of pixels close to a given color.
Here's an example of an image I would like to process using this to identify rectangular regions that contain mostly gray pixels (possibly roads):
More precisely, given:
dimensions h x w (height and width of candidate rectangles)
a distance function dist for colors (for example, the norm of the vector difference between the color vector, which could be RGB or any other representation)
a color vector C
a maximum distance d for colors to be from C
a minimum percentage rate r of pixels in a given rectangle to be within distance d from C for the rectangle to be of interest,
return a mask M in which each pixel P is 1 if the rectangle of size h x w left-cornered by P contains at least r % of its pixels within distance d from C when measured with dist.
In pseudo-code, pixel P in the mask is 1 if and only if:
def rectangle_left_cornered_at_P_is_of_interest(P):
    n_pixels_near_C = len([Q for Q in rectangle(P, P + (h, w)) if dist(Q, C) < d])
    return n_pixels_near_C / (h * w) >= r / 100  # r is a percentage
I imagine there may already exist a filter/kernel that does just that (or can be used to do that) in OpenCV, but I am still learning about it and could not identify one by looking at the documentation. Is there such a thing?
You can use HSV for this. You may have to play with the values a bit for the mask, but it will get the job done.
import cv2
import numpy as np

img = cv2.imread(img_path)  # img_path: path to the input image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# OpenCV stores 8-bit hue as 0..179, so 179 (not 350) covers every hue
lower_gray = np.array([0, 5, 50], np.uint8)
upper_gray = np.array([179, 50, 255], np.uint8)
mask = cv2.inRange(hsv, lower_gray, upper_gray)  # 255 where the pixel is grayish
img_res = cv2.bitwise_and(img, img, mask=mask)
cv2.imwrite('gray.png', img_res)
You should also refer to this post. It's a good post on the use of HSV.
Basically, all you will need for this job is:
HSV masks,
Otsu thresholding, blurs and maybe erosion and dilation.
Use them in whatever combination fits your requirement best.
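A color mask on its own does not give the per-rectangle rate the question asks for. One way to get it, sketched here in plain Python so it runs without OpenCV, is to count the "near C" pixels of every h x w window with a summed-area table. The function names are made up for this example, the distance is Euclidean, `rate` is a fraction rather than a percentage, and rectangles that stick out of the image are simply marked 0. With OpenCV the same idea should map onto cv2.inRange followed by cv2.integral or a box filter.

```python
def near_color_mask(image, color, max_dist):
    """1 where a pixel is within max_dist (Euclidean) of color, else 0."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return [[1 if dist(px, color) < max_dist else 0 for px in row]
            for row in image]


def rectangles_of_interest(near, h, w, rate):
    """out[y][x] = 1 if the h x w rectangle with top-left corner (y, x) fits
    in the image and more than `rate` of its pixels are set in `near`."""
    rows, cols = len(near), len(near[0])
    # Summed-area table with a zero border: sat[y][x] = sum of near[:y][:x]
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        for x in range(cols):
            sat[y + 1][x + 1] = (near[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows - h + 1):
        for x in range(cols - w + 1):
            count = (sat[y + h][x + w] - sat[y][x + w]
                     - sat[y + h][x] + sat[y][x])
            out[y][x] = 1 if count / (h * w) > rate else 0
    return out


gray, green = (128, 128, 128), (0, 200, 0)
image = [[gray, gray, green, green],
         [gray, gray, green, green],
         [green, green, green, green]]
near = near_color_mask(image, gray, max_dist=30)
print(rectangles_of_interest(near, h=2, w=2, rate=0.7))
# [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The table makes each window count a constant-time lookup, so the cost does not grow with the rectangle size.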
In the SURF technique, and more precisely within the feature description stage, the authors have stated (if I understand correctly) that the description is performed in an area of 20 times sigma, where sigma represents the scale at which the keypoint was detected.
Sigma = 0.4 x L where L = 2^Octave x level+1. If we use the OpenCV implementation, the DetectAndCompute function computes, with the value of Keypoint.size, the radius of the circle surrounding the keypoint.
My question is : How could we get the value of sigma from the radius value ?
According to these lines:
KeyPoint& kp = (*keypoints)[k];
float size = kp.size;
Point2f center = kp.pt;
/* The sampling intervals and wavelet sized for selecting an orientation
and building the keypoint descriptor are defined relative to 's' */
float s = size*1.2f/9.0f;
This value s = size*1.2f/9.0f is not mentioned in Bay's article, which gives scale = L*0.4, i.e. scale = L*1.2/3. Can anyone explain this part to me?
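One hedged way to reconcile the two expressions: in Bay's paper the smallest 9x9 box filter approximates Gaussian second derivatives with sigma = 1.2. If kp.size is the side length of the box filter at which the keypoint was found, and L is the lobe length of that filter (one third of the side), then size*1.2/9 and 0.4*L are the same number. Both of those identifications are assumptions made for this sketch, and the function names are illustrative.

```python
def surf_sigma_from_size(size):
    """Scale from the box-filter side length, as in s = size*1.2f/9.0f."""
    return size * 1.2 / 9.0


def surf_sigma_from_lobe(lobe):
    """Scale from the lobe length L (assumed to be filter side / 3)."""
    return 0.4 * lobe


for side in (9, 15, 21, 27):
    lobe = side / 3
    print(side, round(surf_sigma_from_size(side), 3),
          round(surf_sigma_from_lobe(lobe), 3))
```

Under these assumptions, sigma can be recovered from a keypoint as kp.size * 1.2 / 9.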
In OpenCV how do you calculate the average gradient strength in a Mat and the average gradient direction?
I have sourced the below methods by googling but I want to confirm I am actually doing this correctly before moving onto the next step.
Is this correct?
Mat src = imread("foo.png", IMREAD_GRAYSCALE); // read image as grayscale single channel
// Calculate the mean intensity and the std deviation
// Any errors here or am I doing this correctly?
Scalar sMean, sStdDev;
meanStdDev(src, sMean, sStdDev);
double meanVal = sMean[0]; // a local named "mean" would hide cv::mean() used below
double stddev = sStdDev[0];
// Calculate the average gradient magnitude/strength across the image
// Any errors here or am I doing this correctly?
Mat dX, dY, mag;
Sobel(src, dX, CV_32F, 1, 0, 1);
Sobel(src, dY, CV_32F, 0, 1, 1);
magnitude(dX, dY, mag); // "mag" avoids a clash with the cv::magnitude function
Scalar sMMean, sMStdDev;
meanStdDev(mag, sMMean, sMStdDev);
double magnitudeMean = sMMean[0];
double magnitudeStdDev = sMStdDev[0];
// Calculate the average gradient direction across the image
// Any errors here or am I doing this correctly?
Scalar avgHorizDir = mean(dX);
Scalar avgVertDir = mean(dY);
double avgDir = atan2(-avgVertDir[0], avgHorizDir[0]);
float blurriness = cv::videostab::calcBlurriness(src); // low values = sharper. High values = blurry
Technically those are the correct ways of obtaining the two averages.
The way you compute mean direction uses weighted directional statistics, meaning that pixels without a strong gradient have less influence on the average.
However, for most images this average direction is not very meaningful, as edges exist in all directions and cancel each other out.
If your image is of a single edge, then this will work great.
If your image has lines in it, containing edges in opposite directions, this will not work. In this case, you want to average the double angle (average orientations). The obvious way of doing this is to compute the direction per pixel as an angle, double it, then use directional statistics to average (i.e. convert back to vectors and average those), and finally halve the resulting angle. Doubling the angle causes opposite directions to be mapped to the same value, thus averaging doesn’t cancel these out.
Another simple way to average orientations is to take the average of the tensor field obtained by the outer product of the gradient field with itself, and determine the direction of the eigenvector corresponding to the largest eigenvalue. The tensor field is obtained as follows:
Mat Sxx = dX.mul(dX); // element-wise product; operator* on Mat is a matrix product
Mat Syy = dY.mul(dY);
Mat Sxy = dX.mul(dY);
This should then be averaged:
Scalar mSxx = mean(Sxx);
Scalar mSyy = mean(Syy);
Scalar mSxy = mean(Sxy);
These values form a 2x2 real-valued symmetric matrix:
| mSxx mSxy |
| mSxy mSyy |
It is relatively straight-forward to determine its eigendecomposition, and can be done analytically. I don’t have the equations on hand right now, so I’ll leave it as an exercise to the reader. :)
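For reference, the closed-form result for a symmetric 2x2 matrix is short enough to sketch. The eigenvalues are (mSxx + mSyy)/2 ± sqrt(((mSxx - mSyy)/2)^2 + mSxy^2), and the eigenvector of the larger one makes an angle of 0.5 * atan2(2*mSxy, mSxx - mSyy) with the x axis. The plain-Python version below takes flat lists of gradient components instead of Mat objects, and `dominant_orientation` is just an illustrative name.

```python
import math


def dominant_orientation(dx, dy):
    """Return (theta, lam1, lam2) of the averaged structure tensor.

    theta is the angle in radians of the eigenvector belonging to the
    larger eigenvalue lam1; dx and dy are per-pixel gradient components.
    """
    n = len(dx)
    sxx = sum(gx * gx for gx in dx) / n
    syy = sum(gy * gy for gy in dy) / n
    sxy = sum(gx * gy for gx, gy in zip(dx, dy)) / n
    half_trace = 0.5 * (sxx + syy)
    root = math.hypot(0.5 * (sxx - syy), sxy)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return theta, half_trace + root, half_trace - root


# Opposite gradients along the diagonal: the plain mean is (0, 0),
# but the tensor average still recovers the 45 degree orientation.
theta, lam1, lam2 = dominant_orientation([1, -1, 1, -1], [1, -1, 1, -1])
print(math.degrees(theta), lam1, lam2)
```

The ratio between the two eigenvalues also indicates how strongly oriented the image is: nearly equal values mean there is no dominant orientation.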
I am looking to analyze the most dominant color in a UIImage on iOS (color present in the most pixels) and I stumbled upon Core Image's filter based API, particularly CIAreaHistogram.
It seems like this filter could probably help me, but I am struggling to understand the API. Firstly, it says the output of the filter is a one-dimensional image that is as wide as the number of input bins and one pixel in height. How do I read this data? I basically want to figure out the color value with the highest frequency, so I am expecting the data to contain some kind of frequency count for each color. It's not clear to me how this one-dimensional image would represent that, because the documentation does not really explain what data I can expect inside it. And if it's truly a histogram, why would it not return a data structure representing that, like a dictionary?
Second, the API asks for a number of bins. What should that input be? If I want an exact analysis, would the bin count be the size of my image's color space? What does making the bin count smaller do? I would imagine it just approximates nearby colors via Euclidean distance to the nearest bin. If that is the case, it will not yield exact histogram results, so why would anyone want to do that?
Any input on the above two questions from an API perspective would help me greatly.
Ian Ollmann's idea of calculating the histogram just for the hue is really neat and can be done with a simple color kernel. This kernel returns a monochrome image of just the hue of an image (based on this original work):
let shaderString = "kernel vec4 kernelFunc(__sample c)" +
"{" +
" vec4 K = vec4(0.0, -1.0 / 3.0, 2.0 / 3.0, -1.0);" +
" vec4 p = mix(vec4(c.bg, K.wz), vec4(c.gb, K.xy), step(c.b, c.g));" +
" vec4 q = mix(vec4(p.xyw, c.r), vec4(c.r, p.yzx), step(p.x, c.r));" +
" float d = q.x - min(q.w, q.y);" +
" float e = 1.0e-10;" +
" vec3 hsv = vec3(abs(q.z + (q.w - q.y) / (6.0 * d + e)), d / (q.x + e), q.x);" +
" return vec4(vec3(hsv.r), 1.0);" +
"}"
let colorKernel = CIColorKernel(string: shaderString)
If I get the hue of an image of a blue sky, the resulting histogram looks like this:
...while a warm sunset gives a histogram like this:
So, that looks like a good technique to get the dominant hue of an image.
Simon
CIAreaHistogram returns an image where the red, green, blue and alpha values of each pixel indicate the frequency of that tone in the image. You can render that image to an array of UInt8 to look at the histogram data. There's also an undocumented outputData value:
let filter = CIFilter(
name: "CIAreaHistogram",
withInputParameters: [kCIInputImageKey: image])!
let histogramData = filter.valueForKey("outputData")
However, I've found vImage to be a better framework for working with histograms. First off, you need to create a vImage image format:
var format = vImage_CGImageFormat(
bitsPerComponent: 8,
bitsPerPixel: 32,
colorSpace: nil,
bitmapInfo: CGBitmapInfo(
rawValue: CGImageAlphaInfo.PremultipliedLast.rawValue),
version: 0,
decode: nil,
renderingIntent: .RenderingIntentDefault)
vImage works with image buffers that can be created from CGImage rather than CIImage instances (you can create one with the createCGImage method of CIContext). vImageBuffer_InitWithCGImage will create an image buffer:
var inBuffer: vImage_Buffer = vImage_Buffer()
vImageBuffer_InitWithCGImage(
&inBuffer,
&format,
nil,
imageRef,
UInt32(kvImageNoFlags))
Now create the arrays of UInt which will hold the histogram values for the four channels:
let red = [UInt](count: 256, repeatedValue: 0)
let green = [UInt](count: 256, repeatedValue: 0)
let blue = [UInt](count: 256, repeatedValue: 0)
let alpha = [UInt](count: 256, repeatedValue: 0)
let redPtr = UnsafeMutablePointer<vImagePixelCount>(red)
let greenPtr = UnsafeMutablePointer<vImagePixelCount>(green)
let bluePtr = UnsafeMutablePointer<vImagePixelCount>(blue)
let alphaPtr = UnsafeMutablePointer<vImagePixelCount>(alpha)
let rgba = [redPtr, greenPtr, bluePtr, alphaPtr]
let histogram = UnsafeMutablePointer<UnsafeMutablePointer<vImagePixelCount>>(rgba)
The final step is to perform the calculation, which will populate the four arrays, and free the buffer's data:
vImageHistogramCalculation_ARGB8888(&inBuffer, histogram, UInt32(kvImageNoFlags))
free(inBuffer.data)
A quick check of the alpha array of an opaque image should yield 255 zeros with the final value corresponding to the number of pixels in the image:
print(alpha) // [0, 0, 0, 0, 0 ... 409600]
A histogram won't give you the dominant color from a visual perspective: an image which is half yellow {1,1,0} and half black {0,0,0} will give the same results as an image which is half red {1,0,0} and half green {0,1,0}.
Hope this helps,
Simon
One problem with the histogram approach is that you lose correlation between the color channels. That is, half your image could be magenta and half yellow. You will find a red histogram that is all in the 1.0 bin, but the blue and green bins would be evenly split between 0.0 and 1.0 with nothing in between. Even though you can be quite sure that red is bright, you won't be able to say much about what the blue and green components should be for the "predominant color".
You could use a 3D histogram with 2**(8+8+8) bins, but this is quite large and you will find the signal is quite sparse. By happenstance three pixels might land in one bin while no two pixels anywhere else share a bin, even though many users could tell you that there is a predominant color and that it has nothing to do with those three pixels.
You could make the 3D histogram a lot lower resolution and have (for example) just 16 bins per color channel. It is much more likely that bins will have a statistically meaningful population count this way. This should give you a starting point to find a mean for a local population of pixels in that bin. If each bin had a count and a {R,G,B} sum, then you could quickly find the mean color for pixels in that bin once you had identified the most popular bins. This method is still subject to some influence from the histogram grid. You will be more likely to identify colors in the middle of a grid cell than at the edges. Populations may span multiple grid cells. Something like kmeans might be another method.
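A minimal sketch of that coarse 3D histogram, in plain Python rather than vImage or Metal: each bin keeps a count and an {R,G,B} sum, so the mean color of the most popular bin falls out directly. The function name, the list-of-tuples input and the 8-bit channel range are assumptions for this example.

```python
from collections import defaultdict


def dominant_color(pixels, bins_per_channel=16):
    """Return (mean RGB of the fullest bin, pixel count in that bin)."""
    step = 256 // bins_per_channel          # 16 levels per bin by default
    counts = defaultdict(int)
    sums = defaultdict(lambda: [0, 0, 0])
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)  # coarse 3D bin index
        counts[key] += 1
        total = sums[key]
        total[0] += r
        total[1] += g
        total[2] += b
    best = max(counts, key=counts.get)      # most populated bin
    n = counts[best]
    return tuple(c / n for c in sums[best]), n


pixels = [(250, 10, 10), (252, 12, 8), (248, 8, 12),
          (10, 10, 250), (12, 12, 248)]
print(dominant_color(pixels))  # ((250.0, 10.0, 10.0), 3)
```

This does nothing about the grid effects mentioned above: a cluster that straddles a bin boundary is still split between neighbouring bins.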
If you just want predominant hue, then conversion to a color space like HSV followed by a histogram of hue would work.
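A rough sketch of the hue-histogram route using Python's standard colorsys module. The bin count and the `min_saturation` cutoff, which skips near-gray pixels whose hue is not meaningful, are choices made for this example and are not part of the suggestion above.

```python
import colorsys


def dominant_hue(pixels, bins=36, min_saturation=0.1):
    """Return (center of the fullest hue bin in degrees, histogram)."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_saturation:
            continue                      # hue is unreliable for grays
        hist[min(int(h * bins), bins - 1)] += 1
    best = max(range(bins), key=hist.__getitem__)
    return (best + 0.5) * 360.0 / bins, hist


pixels = [(10, 20, 200)] * 5 + [(200, 30, 20)] * 2 + [(128, 128, 128)] * 10
hue, hist = dominant_hue(pixels)
print(hue)  # 235.0, a blue hue, even though gray pixels are the majority
```

Note that hue is circular, so a cluster of reds can be split between the first and last bins.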
I'm not aware of any filters in vImage, CI or MetalPerformanceShaders to do these things for you. You can certainly write code in either the CPU or Metal to do it without a lot of trouble.