I am performing SVD image compression: SVD lets you approximate the original image matrix by a lower-rank matrix of rank v, thus achieving compression (link).
Here's the pseudo-code:
image_mat = double(imread('image.png'));   % load image as a matrix
[U, S, V] = svd(image_mat);                % U and V are square, S is diagonal (m x n)
S(v+1:end, :) = 0; S(:, v+1:end) = 0;      % zero out all singular values after rank v
new_image = U * S * V';
The problem I am facing is this: once I perform the lower-rank approximation, the old and the new matrices are of the same size (m x n). Both images contain the same number of pixels (since U and V do not change). Thus, the file size does not (read: CANNOT!) change. However, I see the image quality changing drastically for different values of v.
What am I missing?
EDIT: Further explanation:
Below is the result of the SVD decompression, by rank reduction:
My question is: if the number of pixels in both pictures remains the same, how would I get a file size reduction (compression)? Except for the fact that the matrix of singular values (S) is changing in size, everything else pretty much remains the same (despite the obvious drop in image quality), i.e. the new matrix constructed after decompression has the same size, 512 x 512, as the original image.
You are not missing anything. The original image has m*n values, while the rank-k compressed representation has k + k*m + k*n values (the k singular values plus k columns of U and k columns of V), where k is the rank.
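To make the counting concrete, here is a minimal NumPy sketch (my own illustration, using a random matrix as a stand-in for the image); only the truncated factors need to be stored, while the reconstructed image is still m x n:

import numpy as np

def svd_compress(image, k):
    # Keep only the first k singular triplets; this is what you would actually store.
    U, s, Vt = np.linalg.svd(image.astype(float), full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def svd_reconstruct(Uk, sk, Vtk):
    # Rebuild the full m x n image from the stored factors.
    return Uk @ np.diag(sk) @ Vtk

m, n, k = 512, 512, 50
image = np.random.rand(m, n)                     # stand-in for the real image
Uk, sk, Vtk = svd_compress(image, k)
print("original values:  ", m * n)                          # 262144
print("compressed values:", Uk.size + sk.size + Vtk.size)   # k*m + k + k*n = 51250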
I am evaluating a template matching algorithm to differentiate similar and dissimilar objects. What I found is confusing: I had the impression that template matching is a method which compares raw pixel intensity values. Hence, when the pixel values vary, I expected template matching to give a lower match percentage.
I have a template and a search image of the same shape and size, differing only in color (images attached). When I did template matching, surprisingly I am getting a match percentage greater than 90%.
import cv2

img = cv2.imread('./images/searchtest.png', cv2.IMREAD_COLOR)
template = cv2.imread('./images/template.png', cv2.IMREAD_COLOR)
res = cv2.matchTemplate(img, template, cv2.TM_CCORR_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
print(max_val)
Template Image:
Search Image:
Can someone give me insight into why this is happening? I have even tried this in HSV color space, the full BGR image, the full HSV image, the individual B, G, R channels and the individual H, S, V channels. In all cases I am getting a high match percentage.
Any help would be really appreciated.
res = cv2.matchTemplate(img, template, cv2.TM_CCORR_NORMED)
There are various arguments which you can use to find templates, e.g. cv2.TM_CCOEFF, cv2.TM_CCOEFF_NORMED, cv2.TM_CCORR, cv2.TM_CCORR_NORMED, cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED.
You can look into their equations here:
https://docs.opencv.org/2.4/modules/imgproc/doc/object_detection.html
From what I think, if you do not want your template matching to match shapes of a different colour, then you should use cv2.TM_SQDIFF or maybe cv2.TM_CCOEFF_NORMED.
Correlation terms give a match at maximum values, and squared-difference terms give a match at minimum values. So in case you have the exact shape and size, just not the same color, you will get a high correlation value (see the equations in the above link).
Concept:
Suppose X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_n) satisfy Y_i = a * X_i for all i and some positive constant a. Then
sum(X_i * Y_i) = a * sum(X_i^2) = sqrt(sum(X_i^2)) * sqrt(sum((a * X_i)^2)),
and therefore sum(X_i * Y_i) / (sqrt(sum(X_i^2)) * sqrt(sum(Y_i^2))) = 1.
In your case, X represents your template image, which has almost only two colors: the background is black (0) and the foreground color is a constant c. Y represents the ROI of your image, which also has almost only two colors: the background is 0 and the foreground color is another constant d. So we have a = d/c, satisfying the concept mentioned above. Hence, if we use cv2.TM_CCORR_NORMED, a result near 1 is exactly what we should expect.
As for cv2.TM_CCOEFF_NORMED: if Y_i = a * X_i + b for all i, some constant b and some positive constant a, then the correlation coefficient between X and Y is 1 (basic statistics). So if we use cv2.TM_CCOEFF_NORMED, a result near 1 is again what we should expect.
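A small NumPy check of the argument above, with made-up pixel values, showing why the normalized cross-correlation stays at 1 while a squared-difference score does not:

import numpy as np

# Two "images" that differ only by a positive scale factor a give a normalized
# cross-correlation of exactly 1, which is why cv2.TM_CCORR_NORMED reports a
# high match for the same shape in a different color.
X = np.array([0, 0, 200, 200, 0, 0], dtype=float)   # template: black background, foreground c = 200
Y = 0.5 * X                                          # ROI: same shape, foreground d = 100 (a = d/c = 0.5)

ncc = np.sum(X * Y) / (np.sqrt(np.sum(X**2)) * np.sqrt(np.sum(Y**2)))
print(ncc)                   # 1.0

# The squared difference, by contrast, does react to the color change:
print(np.sum((X - Y)**2))    # 20000.0, not zero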
What is the correct meaning of normalization in image processing? I googled it but found different definitions. I'll try to explain each definition in detail.
Normalization of a kernel matrix
If normalization refers to a matrix (such as a kernel matrix for a convolution filter), usually each value of the matrix is divided by the sum of the values of the matrix, so that the sum of the values equals one (if all values are greater than zero). This is useful because a convolution between an image matrix and such a kernel gives an output image with values between 0 and the max value of the original image. But if we use a Sobel kernel (which has some negative values) this is no longer true, and we have to stretch the output image in order to have all values between 0 and the max value.
Normalization of an image
I basically found two definitions of normalization. The first one is to "clip" values that are too high or too low: i.e. if the image matrix has negative values, set them to zero, and if the image matrix has values higher than the max value, set them to the max value. The second one is to linearly stretch all the values in order to fit them into the interval [0, max value].
I will extend the answer from @metsburg a bit. There are several ways of normalizing an image (in general, a data vector), which are used at convenience in different cases:
Data normalization or data (re-)scaling: the data is projected into a predefined range (usually [0, 1] or [-1, 1]). This is useful when you have data from different formats (or datasets) and you want to normalize all of them so you can apply the same algorithms over them. It is usually performed as follows:
Inew = (I - I.min) * (newmax - newmin)/(I.max - I.min) + newmin
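A rough NumPy illustration of that formula (the function and variable names below are mine):

import numpy as np

def rescale(I, new_min=0.0, new_max=1.0):
    # Min-max rescaling into [new_min, new_max].
    I = I.astype(float)
    return (I - I.min()) * (new_max - new_min) / (I.max() - I.min()) + new_min

img = np.array([[10, 20], [30, 40]], dtype=float)
print(rescale(img))          # values now span [0, 1]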
Data standardization is another way of normalizing the data (used a lot in machine learning), where the mean is subtracted from the image and the result is divided by its standard deviation. It is especially useful if you are going to use the image as an input for some machine learning algorithm, as many of them perform better when they can assume the features have a Gaussian form with mean = 0 and std = 1. It can be performed easily as:
Inew = (I - I.mean) / I.std
Data stretching (or histogram stretching when you work with images) is what your option 2 refers to. Usually the image is clamped to minimum and maximum values, setting:
Inew = I
Inew[I < a] = a
Inew[I > b] = b
Here, image values that are lower than a are set to a, and the same happens inversely with b. Usually, the values of a and b are calculated as percentile thresholds: a = the threshold that separates the bottom 1% of the data and b = the threshold that separates the top 1% of the data. By doing this, you are removing outliers (noise) from the image.
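A minimal NumPy sketch of that percentile-based clipping (the 1%/99% thresholds and the function name are my own choices):

import numpy as np

def stretch_clip(I, low_pct=1, high_pct=99):
    # Compute the percentile thresholds a and b, then clamp the image to [a, b].
    a, b = np.percentile(I, [low_pct, high_pct])
    return np.clip(I, a, b)      # values below a become a, values above b become b

img = np.random.randn(256, 256) * 50 + 128
print(stretch_clip(img).min(), stretch_clip(img).max())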
This is similar to (but simpler than) histogram equalization, which is another commonly used preprocessing step.
Data normalization can also refer to normalizing a vector with respect to a norm (the l1 norm or the l2/Euclidean norm). In practice, this translates to:
Inew = I / ||I||
where ||I|| refers to a norm of I.
If the norm is chosen to be the l1 norm, the image is divided by the sum of its absolute values, making the sum of the whole image equal to 1. If the norm is chosen to be the l2 (Euclidean) norm, the image is divided by the square root of the sum of its squared values, making the sum of the squared values of I equal to 1.
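A minimal NumPy sketch of both norm choices (treating the image as a flat vector; variable names are mine):

import numpy as np

img = np.random.rand(4, 4)

l1 = img / np.sum(np.abs(img))            # l1: absolute values now sum to 1
l2 = img / np.sqrt(np.sum(img**2))        # l2: squared values now sum to 1

print(np.sum(np.abs(l1)), np.sum(l2**2))  # both ~1.0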
The first 3 are widely used with images (not all 3 at once, as scaling and standardization are incompatible, but one of them, or scaling + stretching, or standardization + stretching); the last one is not that useful. It is usually applied as a preprocessing step for some statistical tools, but not if you plan to work with a single image.
The answer by @Imanol is great, I just want to add some examples:
Normalize the input either pixel-wise or dataset-wise. Three normalization schemes are often seen:
Normalizing the pixel values between 0 and 1:
img /= 255.0
Normalizing the pixel values between -1 and 1 (as TensorFlow does):
img /= 127.5
img -= 1.0
Normalizing according to the dataset mean & standard deviation (as Torch does):
img /= 255.0
mean = [0.485, 0.456, 0.406]   # here these are the ImageNet statistics
std = [0.229, 0.224, 0.225]
for i in range(3):             # assuming a CHW ordering (channel, height, width)
    img[i, :, :] -= mean[i]
    img[i, :, :] /= std[i]
In data science, there are two broadly used types of normalization:
1) Scaling the data so that its sum is a particular value, usually 1 (https://stats.stackexchange.com/questions/62353/what-does-it-mean-to-use-a-normalizing-factor-to-sum-to-unity)
2) Normalizing the data to fit it within a certain range (usually 0 to 1): https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range
Speaking of this 1D discrete denoising via variational calculus, I would like to know how to handle the length of the smoothing term, since it should be N-1 while the length of the data term is N. Here is the equation:
E = 0;
for i = 1:n
    E = E + (u(i) - f(i))^2 + lambda*(u(i+1) - u(i))^2;   % note: u(i+1) runs past the end when i = n
end
E is the cost of the current u in the optimization process
f is the given (noisy) image
u is the output (denoised) image
n is the length of the 1D vector
lambda >= 0 is the weight of the smoothness term in the optimization (described around minute 13 in the video)
Here the lengths of the second term and the first term mismatch. How do I resolve this?
More importantly, I would like to solve this problem with a linear equation system.
This is nowhere near my cup of tea, but I think you are referring to the fact that:
u(i+1) - u(i) is accessing the next pixel, making the term work only on a resolution 1 pixel smaller than the original f image
In graphics and filtering this is usually resolved in 2 ways:
1) Use a default value for pixels outside the image resolution
You can set a default or neutral (for the process) color for those pixels (like black),
use the color of the closest neighbor inside the image resolution,
or interpolate the missing pixels (bilinear, bicubic, ...).
I think the first choice is not suitable for your denoising technique.
2) Change the resolution of the output image
Usually after some filtering techniques (via FIR, etc.) the result is 1 pixel smaller than the input, to resolve the missing-data problem. In your case it looks like your resulting u image would need to be 1 pixel bigger than the input image f while computing the cost function.
So either enlarge it via option 1) and, when the optimization is done, crop back to the original size.
Or virtually crop f by one pixel (just say n' = n - 1) before computing the cost function, so you avoid access violations (and you can also restore it back after the optimization...).
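As a small illustration of the second option, here is a NumPy sketch of the cost with the smoothness sum virtually cropped to n-1 terms (the function name and toy data are mine; the squared smoothness term from the question is assumed):

import numpy as np

def energy(u, f, lam):
    # Data term runs over all n samples; smoothness term runs only over the
    # n-1 forward differences, so u(i+1) is never accessed past the end.
    data_term = np.sum((u - f) ** 2)
    smooth_term = lam * np.sum((u[1:] - u[:-1]) ** 2)
    return data_term + smooth_term

f = np.array([1.0, 1.2, 5.0, 1.1, 0.9])   # toy noisy signal
u = f.copy()                               # initial guess for the denoised signal
print(energy(u, f, lam=0.1))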
I have a question about the workings of the SIFT algorithm. Say I have a scale-space representation of an image across many octaves, obtained by convolving the image with Gaussian filters of various sizes. Furthermore, I have computed the various difference-of-Gaussian (DoG) images for each of these octaves.
Let us assume I have 7 DoG images for a given octave. My question is regarding the maxima finding in these DoG images. According to the literature, one compares against the 8 local neighbours in the same DoG image and 9 neighbours in each of the two neighbouring DoG images.
So, now say I am processing these 7 DoG images and I will start from index 1 and go all the way to index 5. So, something like:
for (int i = 1; i <= 5; ++i)
{
    for (int y = 1; y < image_height - 1; ++y)
    {
        for (int x = 1; x < image_width - 1; ++x)
        {
            current_pixel = image[x, y, i];
            // Compare with the neighbours
            // check if it is a maximum at loc (x, y, i)
        }
    }
}
So, here I am iterating through the image and will check whether there is a maximum at each location. My question is: I will end up with maxima locations at each of these scales (from 1 to 5 in my case), so for a given (x, y) location there can be multiple maxima (for example at scales 1, 3 and 5). Is that a problem, or can there be multiple keypoints associated with the same spatial location (x, y)? Can someone explain to me how the algorithm proceeds to refine these keypoints?
You will want to find the extrema across scale as well.
Scale-space extrema detection means finding the extremum for every pixel across "scale" and across "space." Space is the xy-plane in the image. Scale is the index into the pyramid.
Why do you want to do this?
The idea of scale-space extrema detection is to find the scale at which a feature has the highest response. For example, if you have a small blob in the image, its extremum will be at a fine scale. At a coarse scale, this small blob will be washed out.
For a large blob, computing the response at a fine scale does not produce an extremum. But if the scale is coarse enough, the blob will stand out. That is, at coarser levels of the pyramid the smaller structures around that large blob will be washed out, and the large blob itself will stand out.
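To make the 26-neighbour comparison concrete, here is a rough NumPy sketch (the array layout and function name are my own, not from the SIFT paper):

import numpy as np

def is_scale_space_extremum(dog, i, y, x):
    # dog: DoG images stacked along axis 0, shape (num_scales, height, width).
    value = dog[i, y, x]
    cube = dog[i-1:i+2, y-1:y+2, x-1:x+2]   # 3x3x3 neighbourhood across scale and space
    # The centre is included in the cube, so ties with a neighbour still count here;
    # a stricter implementation would require a strict inequality against all 26 neighbours.
    return value == cube.max() or value == cube.min()

dog = np.random.rand(7, 64, 64)             # 7 DoG images of a toy octave
print(is_scale_space_extremum(dog, i=3, y=10, x=20))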
When applying a Gaussian blur to an image, typically the sigma is a parameter (examples include Matlab and ImageJ).
How does one know what sigma should be? Is there a mathematical way to figure out an optimal sigma? In my case, I have some objects in images that are bright compared to the background, and I need to find them computationally. I am going to apply a Gaussian filter to make the center of these objects even brighter, which hopefully facilitates finding them. How can I determine the optimal sigma for this?
There's no formula to determine it for you; the optimal sigma will depend on image factors, primarily the resolution of the image and the size of your objects in it (in pixels).
Also, note that Gaussian filters aren't actually meant to brighten anything; you might want to look into contrast maximization techniques - sounds like something as simple as histogram stretching could work well for you.
edit: More explanation. Sigma basically controls how "fat" your kernel function is going to be; higher sigma values blur over a wider radius. Since you're working with images, a bigger sigma also forces you to use a larger kernel matrix to capture enough of the function's energy. For your specific case, you want the kernel to be big enough to cover most of the object (so that it's blurred enough), but not so large that it starts overlapping multiple neighboring objects at a time; so object separation is also a factor along with size.
Since you mentioned MATLAB - you can take a look at various gaussian kernels with different parameters using the fspecial('gaussian', hsize, sigma) function, where hsize is the size of the kernel and sigma is, well, sigma. Try varying the parameters to see how it changes.
I use this convention as a rule of thumb: if k is the size of the kernel, then sigma = (k - 1) / 6. This is because an interval of length 6*sigma covers roughly 99.7% of the mass of a Gaussian pdf.
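A tiny Python sketch of that rule of thumb and its inverse (function names are mine; the kernel size is kept odd):

def sigma_from_ksize(k):
    # Rule of thumb: sigma = (k - 1) / 6
    return (k - 1) / 6.0

def ksize_from_sigma(sigma):
    # Inverse of the rule of thumb, rounded up to an odd kernel size.
    k = int(round(6 * sigma + 1))
    return k if k % 2 == 1 else k + 1

print(sigma_from_ksize(7))      # 1.0
print(ksize_from_sigma(1.5))    # 11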
You have to find a min/max of a function G(X, sigma), where X is the set of your observations (in your case, your image grayscale values). This function can be anything that maintains the "order" of the intensities of the image; for example, this can be done with the 1st derivative of the image (as G):
fil = fspecial('sobel');
im = imfilter(I,fil);
imagesc(im);
colormap(gray);
This gives you the result of the first derivative of the image. Now you want to find the best sigma by
maximizing G(X, sigma); that means you try a few sigmas (say, in increasing order) until you reach a sigma that makes G maximal. This can also be done with the second derivative.
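Taking that recipe literally, a rough OpenCV/NumPy sketch could look like the following (the file name, sigma range and gradient-magnitude score are my own choices, not part of the answer above):

import cv2
import numpy as np

img = cv2.imread('objects.png', cv2.IMREAD_GRAYSCALE)    # hypothetical input image

best_sigma, best_score = None, -np.inf
for sigma in np.arange(0.5, 5.0, 0.5):
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)        # kernel size derived from sigma
    gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1)
    score = np.sum(np.abs(gx) + np.abs(gy))               # G(X, sigma): total gradient magnitude
    if score > best_score:
        best_sigma, best_score = sigma, score

print(best_sigma)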
Given that the central value of the kernel equals 1, the size that guarantees the outermost value is less than a limit (e.g. 1/100) is as follows:
double limit = 1.0 / 100.0;
int size = static_cast<int>(2 * std::ceil(std::sqrt(-2.0 * sigma * sigma * std::log(limit))));
if (size % 2 == 0)
{
    size++;
}
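As a quick sanity check of that formula, here is a small Python sketch (sigma and the example numbers are arbitrary choices of mine):

import math

# Verify that the outermost sample of an unnormalized Gaussian (central value 1)
# falls below the chosen limit for the kernel size computed with the formula above.
sigma, limit = 2.0, 1.0 / 100.0
size = int(2 * math.ceil(math.sqrt(-2.0 * sigma * sigma * math.log(limit))))
if size % 2 == 0:
    size += 1

radius = size // 2
outermost = math.exp(-(radius ** 2) / (2.0 * sigma ** 2))
print(size, outermost, outermost < limit)   # e.g. 15, ~0.0022, True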