From what I understand, the function color.rgb2gray()from the scikit-image package is a method of image normalization.
imgGray = color.rgb2gray(img)
According to the documentation:
"The value of each grayscale pixel is calculated as the weighted sum of the corresponding red, green and blue pixels as:
Y = 0.2125 R + 0.7154 G + 0.0721 B"
Can anyone confirm if my interpretation of this is correct?
Thanks!
Related
I want to train some models to work with grayscale images, which e.g. is useful for microscope applications (Source). Therefore I want to train my model on graysale imagenet, using the pytorch grayscale conversion (torchvision.transforms.Grayscale), to convert the RGB imagenet to a grayscale imagenet. Internally pytorch rotates the color space from RGB to YPbPr as follows:
Y' is the grayscale channel then, so that Pb and Pr can be neglected after transformation. Actually pytorch even only calculates
grayscale = (0.2989 * r + 0.587 * g + 0.114 * b)
To normalize the image data, I need to know grayscale-imagenet's mean pixel value, as well as the standard deviation. Is it possible to calculate those?
I had success in calculating the mean pixel intensity using
meanGrayscale = 0.2989 * r.mean() + 0.587 * g.mean() + 0.114 * b.mean()
Transforming an image and then calculating the grayscale mean, gives the same result as first calculating the RGB means and then transforming those to a grayscale mean.
However, I am clueless when it comes to calculating the variance or standard deviation now. Does somebody have any idea, or knows some good literature on the topic? Is this even possible?
I found a publication "Jianxin Gong - Clarifying the Standard Deviational Ellipse" ... There he does it in 2 dimensions (as far as I understand). I just could not figure out yet how to do it in 3D.
Okay, I wasn't able to calculate the standard deviation as planned, but did it using the code below. The grayscale imagenet's train dataset mean and standard deviation are (round it as much as you like):
Mean: 0.44531356896770125
Standard Deviation: 0.2692461874154524
import multiprocessing
import os
def calcSTD(d):
meanValue = 0.44531356896770125
squaredError = 0
numberOfPixels = 0
for f in os.listdir("/home/imagenet/ILSVRC/Data/CLS-LOC/train/"+str(d)+"/"):
if f.endswith(".JPEG"):
image = imread("/home/imagenet/ILSVRC/Data/CLS-LOC/train/"+str(d)+"/"+str(f))
###Transform to gray if not already gray anyways
if np.array(image).ndim == 3:
matrix = np.array(image)
blue = matrix[:,:,0]/255
green = matrix[:,:,1]/255
red = matrix[:,:,2]/255
gray = (0.2989 * red + 0.587 * green + 0.114 * blue)
else:
gray = np.array(image)/255
###----------------------------------------------------
for line in gray:
for pixel in line:
squaredError += (pixel-meanValue)**2
numberOfPixels += 1
return (squaredError, numberOfPixels)
a_pool = multiprocessing.Pool()
folders = []
[folders.append(f.name) for f in os.scandir("/home/imagenet/ILSVRC/Data/CLS-LOC/train") if f.is_dir()]
resultStD = a_pool.map(calcSTD, folders)
StD = (sum([intensity[0] for intensity in resultStD])/sum([pixels[1] for pixels in resultStD]))**0.5
print(StD)
During the process some errors like this occured:
/opt/conda/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:771:
UserWarning: Possibly corrupt EXIF data. Expecting to read 8 bytes
but only got 4. Skipping tag 41486 "Possibly corrupt EXIF data. "
The repective images from the 2019 version of ImageNet were skipped.
Code of function performing accumulation in [a, b] space
minr: minimum radius.
maxr: maximum radius.
Magnitude: Binary output after edge detection using Sobel.
Gradient: Direction map. Calculated with 'atan(Ix./Iy)' where Ix is horizontal and Iy is vertical.
function [A] = accumulation(minr, maxr, magnitude, gradient)
[rows, cols, ~] = size(magnitude);
A = zeros(rows, cols, maxr);
for row = 1:rows
for col = 1:cols
for r = minr:maxr
a = row - r * cos(gradient(row, col));
b = col - r * sin(gradient(row, col));
a = round(a);
b = round(b);
if (a > 0 && a <= rows && b > 0 && b <= cols)
A(a, b, r) = A(a, b, r) + (magnitude(row, col)/r);
end
end
end
end
end
Output
Although, I am using 3 dimensional array, following image is in 2D just to show the issue.
Steps performed before accumulation in [a, b] space
Smoothing using 3x3 Gaussian filter.
Edge detection using Sobel operators which returns magnitude and direction map.
Thresholding and thinning.
Source
I am using Circle Detection Using Hough Transforms
Documentation by Jaroslav Borovicka for guidance.
One issue I see in you code is that you set only one point per r. You need two. Note that the gradient gives you the orientation of the edge, but you don't know the direction towards the center -- unless you're computing the gradient of an image with solid disks, and you know the contrast with the background (i.e it's always black in white or white on black). Typically one sets a point at distance r in direction theta and another in direction theta + pi.
Another problem you might be having is inaccuracy in the computation of the gradient. If you compute this on a binarized image, the direction of the gradient will be off by a lot. Smithing your grey-value image before computing the gradient might help (or better, use Gaussian gradients).
"Smoothing using 3x3 Gaussian filter" is wrong by definition. See the link above.
"Thresholding and thinning" -- try not thresholding. Your code is set up to accumulate using gradient magnitude as weights. Use those, they'll help.
Finally, don't use atan, use atan2 instead.
for a machine vision project I am trying to search image data for quadratic surfaces (f(x,y) = Ax^2+Bx+Cy^2+Dy+Exy+F). My plan is to iterate through regions of data and perform a surface-fit, look at the error, see if it's a continuous surface (which would probably indicate a feature in the image).
I was previously able to find quadratic curves (f(x) = Ax^2+Bx+C) in the image data by sampling lines, by using the equations on this site
Link
this worked well, was promising, but it would be much more useful for my task to find 2-D regions that form continuous surfaces.
I see lots of articles indicating that least squares regressions scales up to multiple dimensions, but I'm not able to find code for this Hopefully there is a "closed form" (non-iterative, just compute from your data points) solution, like described above for 1D data. Does anybody know of some source or pseudocode that accomplishes this? Thanks.
(Sorry if my terminology is a bit off.)
I'm not sure what your background is, but if you know some linear algebra you will find linear least squares on wikipedia useful.
Lets take the following example. Say we have the following image
and we want to know how well this fits to a 2D quadratic function in a least squares sense.
Probably the most straightforward way to solve the problem is to compute the optimal coefficients in a least squares sense, then check the error.
First we need to describe the matrices.
Let X be a matrix containing every x,y coordinate in the image, taking the form
X = [x1 x1^2 y1 y1^2 x1*y1 1;
x2 x2^2 y2 y2^2 x2*y2 1;
...
xN xN^2 yN yN^2 xN*yN 1];
For the example image above, X would be a 100x6 matrix.
Let y be the image intensity values in a vector of the form
y = [img(x1,y1);
img(x2,y2);
...
img(xN,yN)]
In this case y is a 100 element column vector.
We want to minimize the least squares objective function S with respect to the vector of coefficients b
S(b) = |y - X*b|^2
where |.| is the L2 norm and b is the desired coefficients
b = [A;
B;
C;
D;
E;
F]
Taking the vector derivative of S(b) with respect to b, setting to zero, and solving for b leads to the standard least squares solution.
b = inv(X'X)*X'*y
where inv is the matrix inverse, ' is transpose, and * is matrix multiplication.
MATLAB example.
% Generate an image
% define x,y coordinates for each location in the image
[x,y] = meshgrid(1:10,1:10);
% true coefficients
b_true = [0.1 0.5 0.3 -0.4 0.4 124];
% magnitude of noise
P = 2;
% create image
img = b_true(1).*x + b_true(2).*x.^2 + b_true(3).*y + b_true(4).*y.^2 + b_true(5).*x.*y + b_true(6);
noise = P*randn(10,10);
img = img + noise;
% Begin least squares optimization
% create matrices
X = [x(:) x(:).^2 y(:) y(:).^2 x(:).*y(:) ones(size(x(:)))];
y = img(:);
% estimated coefficients
b = (X.'*X)\(X.')*y
% mean square error (expected to be near P^2)
E = 1/numel(y) * sum((y - X*b).^2)
Output
b =
0.0906
0.5093
0.1245
-0.3733
0.3776
124.5412
E =
3.4699
In your application you would probably want to define some threshold such that when E < threshold you accept the image (or image region) as a quadratic polynomial.
I'd like to compute a sort of direction field on a 2D image, as (poorly) illustrated from this photoshop mockup. NOTE: This is NOT a vector field as you learn about in differential equations. Instead, this is something that draws along the lines that one would see if they computed level sets of the image.
Are there known methods of obtaining this type of direction field (red lines) of an image? It seems like it almost behaves like the normal to the gradient, but this isn't exactly it, either, since there are places where the gradient is zero and I'd like direction fields at these locations as well.
I was able to find a paper on how to do this for fingerprint processing that went into enough detail that their results were repeatable. It's unfortunately behind a paywall, but here it is for anyone interested and able to access the full text:
Systematic methods for the computation of the directional fields and singular points of fingerprints
EDIT: As requested, here is a quick and dirty summary (in Python) of how this is achieved in the above paper.
A naive approach would be to average the gradient in a small square neighborhood around the target pixel, much like the superimposed grid on the image in the question, and then compute the normal. However, if you simply average the gradient, it's possible that opposite gradients in the region will cancel each other (e.g. when computing the orientation along a ridge). Thus, it is common to compute with squared gradients, since gradients pointing in opposite directions would then be aligned. There is a clever formula for the squared gradient based on the original gradient. I won't give the derivation, but here is the formula:
Now, take the sum of squared gradients over the region (modulo some piece-wise defined compensations for the way angles work). Finally, through some arctangent magic, you'll get the orientation field.
If you run the following code on a smooth grayscale bitmap image with the grid-size chosen appropriately and then plot the orientation field O alongside your original image, you'll see how the orientation field more or less gives the angles I asked about in my original question.
from scipy import misc
import numpy as np
import math
# Import the grayscale image
bmp = misc.imread('path/filename.bmp')
# Compute the gradient - VERY important to convert to floats!
grad = np.gradient(bmp.astype(float))
# Set the block size (superimposed grid on the sample image in the question)
blockRadius=5
# Compute the orientation field. Result will be a matrix of angles in [0, \pi), one for each pixel in the original (grayscale) image.
O = np.zeros(bmp.shape)
for x in range(0,bmp.shape[0]):
for y in range(0,bmp.shape[1]):
numerator = 0.
denominator = 0.
for i in range(max(0,x-blockRadius),min(bmp.shape[0],x+blockRadius)):
for j in range(max(0,y-blockRadius),min(bmp.shape[0],y+blockRadius)):
numerator = numerator + 2.*grad[0][i,j]*grad[1][i,j]
denominator = denominator + (math.pow(grad[0][i,j],2.) - math.pow(grad[1][i,j],2.))
if denominator==0:
O[x,y] = 0
elif denominator > 0:
O[x,y] = (1./2.)*math.atan(numerator/denominator)
elif numerator >= 0:
O[x,y] = (1./2.)*(math.atan(numerator/denominator)+math.pi)
elif numerator < 0:
O[x,y] = (1./2.)*(math.atan(numerator/denominator)-math.pi)
for x in range(0,bmp.shape[0]):
for y in range(0,bmp.shape[1]):
if O[x,y] <= 0:
O[x,y] = O[x,y] + math.pi
else:
O[x,y] = O[x,y]
Cheers!
I am trying to blur a scanned text document to the point that the text lines are blurred to black.. I mean the text blends into each other and all I see are black lines.
I'm new to MATLAB and even though I know the basics I cannot get the image to blur properly. I have read this: Gaussian Blurr and according to that the blur is managed/decided by the sigma function. But that is not how it works in the code I wrote.
While trying to learn Gaussian blurring in Matlab I came to find out that its achieved by using this function: fspecial('gaussian',hsize,sigma);
So apparently there are two variables hsize specifies number of rows or columns in the function while sigma is the standard deviation.
Can some one please explain the significance of hsize here and why it has a much deeper effect on the result even more than sigma?
Why is it that even if I increase sigma to a very high value the blurr is not effected but the image is distorted a lot by increasing the hsize
here is my code:
img = imread('c:\new.jpg');
h = fspecial('gaussian',hsize,sigma);
out = imfilter(img,h);
imshow(out);
and the results are attached:
Why is it not only controlled by sigma? What role does hsize play? Why cant I get it to blur the text only rather than distort the entire image?
Thank you
hsize refers to the size of the filter. Specifically, a filter that is Nx
x Ny pixels uses a pixel region Nx x Ny in size centered around each
pixel when computing the response of the filter. The response is just how
the pixels in that region are combined together. In the case of a
gaussian filter, the intensity at each pixel around the central one is
weighted according to a gaussian function prior to performing a box average over the region.
sigma refers to the standard deviation of the gaussian (see documentation
for fspecial) with units in pixels. As you increase sigma (keeping the
size of the filter the same) eventually you approach a simple box average with uniform weighting
over the filter area around the central pixel, so you stop seeing an effect from increasing sigma.
The similarity between results obtained with gaussian blur (with large value of sigma) and a box
average are shown in the left and middle images below. The right image shows
the results of eroding the image, which is probably what you want.
The code:
% gaussian filter:
hsize = 5;
sigma = 10;
h = fspecial('gaussian',hsize,sigma);
out = imfilter(img,h);
% box filter:
h = fspecial('average',hsize);
out = imfilter(img,h);
% erode:
se=strel('ball',4,4);
out = imerode(img,se);
Fspecial's Manual
h = fspecial('gaussian', hsize, sigma) returns a rotationally
symmetric Gaussian lowpass filter of size hsize with standard
deviation sigma (positive). hsize can be a vector specifying the
number of rows and columns in h, or it can be a scalar, in which case
h is a square matrix. The default value for hsize is [3 3]; the
default value for sigma is 0.5. Not recommended. Use imgaussfilt or
imgaussfilt3 instead.
where they say that fspecial - gaussian is not recommended.
In deciding the standard deviation (sigma), you need still decide hsize which affects the blurring.
In imgaussfilt, you decide the standard deviation and the system considers you the rest.
I can get much more better tolerance levels with imgaussfilt and imgaussfilt3 in my systems in Matlab 2016a, example output here in the body
im = im2double( imgGray );
sigma = 5;
simulatedPsfImage = imgaussfilt(im, sigma);
simulatedPsfImage = im2double( simulatedPsfImage );
[ measuredResolution, standardError, bestFitData ] = ...
EstimateResolutionFromPsfImage( simulatedPsfImage, [1.00 1.00] );
Note that the tolerance levels of fspecial are high [0.70 1.30] by default.