How to multiply two pixels together - image-processing

I'm reading a paper that involves finding the mean squared error of blocks of pixels. It uses the formula below. I is one image, I' is another image, and x and y are the pixel coordinates in each image.
What is confusing me is exactly how to do this math. Right now I have my images in RGB values. But how do I do this image math properly?
What is the correct way to square my resulting difference image? Is it by squaring the individual RGB channels alone, or should I be converting this to an int representation first?
Ideally I want to be able to compare the MSEs of several different images, so keeping all of this data in individual channels doesn't seem to make sense. Is my intuition correct that I should just convert everything to an int representation, then square, divide by N^2 and find the smallest resulting value?
Formula (MSE over an N×N block): MSE = (1/N^2) * Σ_{x,y} (I(x,y) − I'(x,y))^2

From this answer to a related question.
It really depends on what you want to detect. For example, do you just want a single metric for how different two substantially similar images are? Or do you want to compare discolorations between two images that are not substantially the same spatially?
So you could use any of a variety of approaches to determine what a value of I actually is. For example, it could be the R value, or G, or B, or something like the sum R+G+B.
I would try a bunch of these and see how your results turn out, in addition to doing more research on color image differentiation.
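To make that concrete, here is a small NumPy sketch (array names and sizes are placeholders) showing both interpretations: per-channel MSE that you then combine, versus collapsing each pixel to a single intensity value first and computing one MSE:

    import numpy as np

    def mse(block_a, block_b):
        # mean squared error between two equally sized blocks; cast first so the
        # difference doesn't wrap around in uint8
        diff = block_a.astype(np.float64) - block_b.astype(np.float64)
        return np.mean(diff ** 2)

    # stand-ins for two 8x8 RGB blocks taken from I and I'
    a = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)
    b = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)

    # Interpretation 1: MSE per channel, then combine (here a simple average)
    per_channel = [mse(a[..., c], b[..., c]) for c in range(3)]
    combined = sum(per_channel) / 3

    # Interpretation 2: collapse each pixel to one intensity first, then one MSE
    weights = np.array([0.299, 0.587, 0.114])     # standard luminance weights
    single = mse(a @ weights, b @ weights)

Either number can then be compared across image pairs; which one is more meaningful depends on what you want to detect, as the answer above says.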

Related

Interpolation vs Average

I'm new to computer vision concepts and I'd like to know why, when we double the size of an image, we should use bilinear interpolation to fill the pixels that don't have values, instead of just averaging the nearest known pixel values.
I'm not sure I agree with the premise that you "should use bilinear interpolation". You shouldn't blindly use anything without thinking about it. For example, if your pixels represent the result of a classification and 1 represents wheat, and 2 represents water, and 3 represents barley, you certainly shouldn't take the average and assume that when you enlarge an image of wheat and barley that some ocean suddenly appears in the middle between the fields.
Bilinear interpolation is actually just averaging, except a) it is in 2 dimensions because images are inherently 2-dimensional and b) if you know you are nearer to one point than another, surely it isn't unreasonable to weight your "guesstimated" value (which, after all, you don't actually know) more towards the geometrically closer value?
I guess my answer is really that there are several types of interpolation, and you should apply some thinking to deciding which one is best for your particular circumstances. Sometimes you don't want to introduce new colours because of classification or palette issues, and in these circumstances you need "nearest neighbour". Sometimes "bilinear" is what you need, sometimes "bicubic".
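As a concrete illustration (a sketch using OpenCV; the file name is a placeholder), the choice between these interpolation modes is just a flag when resizing:

    import cv2

    img = cv2.imread("input.png")      # placeholder path; use cv2.IMREAD_GRAYSCALE for class maps
    h, w = img.shape[:2]

    # Nearest neighbour: never invents new values, safe for classification/palette images
    doubled_nn = cv2.resize(img, (2 * w, 2 * h), interpolation=cv2.INTER_NEAREST)

    # Bilinear: distance-weighted average of the 4 surrounding known pixels
    doubled_bl = cv2.resize(img, (2 * w, 2 * h), interpolation=cv2.INTER_LINEAR)

    # Bicubic: smoother still, interpolates through 16 neighbouring pixels
    doubled_bc = cv2.resize(img, (2 * w, 2 * h), interpolation=cv2.INTER_CUBIC)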

Impact of converting image to grayscale

I am seeing many machine learning (CNN) tutorials that convert the image to grayscale when reading it. I want to know how the model will understand the original color, or use color as one identification criterion, if the images are converted to grayscale throughout model creation.
With respect to colours, there can be two cases in an image-processing problem:
Colours are not relevant in object-identification
In this case, converting a coloured image to a grayscale image will not matter, because the model will eventually be learning from the geometry present in the image. Image binarization will further help sharpen the image by identifying the light and dark areas.
Colours are relevant in object-identification
As you might know, all colours can be represented as some combination of the three primary RGB colours. Each of these R, G and B values usually varies from 0 to 255 for each pixel. In grayscale, however, each pixel value is one-dimensional instead of three-dimensional, and it simply varies from 0 to 255. So, yes, there will be some information loss in terms of actual colours, but that is the tradeoff against image sharpness.
So, there can be a combined score of the R, G, B values at each point (for example their mean, (R+G+B)/3), which gives a number between 0 and 255 that can be used as their representative. That way, instead of specific colour information, the pixel just carries intensity information.
Reference:
https://en.wikipedia.org/wiki/Grayscale
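For illustration, a small NumPy sketch of the two usual choices described above, the plain mean and the weighted (luminosity) conversion; the arrays are just stand-ins:

    import numpy as np

    rgb = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)   # stand-in for a colour image

    # Simple mean of the three channels, as described above
    gray_mean = rgb.mean(axis=2)                                 # floats in 0..255

    # Weighted (luminosity) conversion, closer to perceived brightness
    gray_lum = rgb @ np.array([0.299, 0.587, 0.114])             # floats in 0..255

    # Either way, a single H x W plane replaces the H x W x 3 colour data
    print(rgb.shape, gray_mean.shape, gray_lum.shape)            # (4, 4, 3) (4, 4) (4, 4)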
I would like to add to Shashank's answer.
A model, when fed an image, does not perceive it as we do. Humans perceive images through variations in color, the saturation of those colors and their brightness, and we are able to recognize objects and other shapes as well.
However, a model sees an image as a matrix full of numbers (if it is a greyscale image). In the case of a color image, it sees three matrices stacked on top of one another, each filled with numbers from 0 to 255.
So how does it learn color? Well, it doesn't. What it does learn is the variation in the numbers within this matrix (in the case of a greyscale image). These variations are crucial for determining changes in the image. If the CNN is trained on them, it will be able to detect structure in the image and can also be used for object detection.
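To make the "matrix of numbers" view concrete, here is a tiny illustrative sketch (the values are made up): a greyscale image is a single 2-D matrix, a color image is three such matrices stacked, and the variation between neighbouring numbers is the kind of structure a convolution filter responds to:

    import numpy as np

    gray = np.array([[10, 10, 200, 200],
                     [10, 10, 200, 200],
                     [10, 10, 200, 200]], dtype=np.uint8)    # one 2-D matrix of numbers
    color = np.stack([gray, gray, gray], axis=-1)            # three matrices stacked

    print(gray.shape, color.shape)                           # (3, 4) and (3, 4, 3)

    # The variation between neighbouring numbers (here a vertical edge) is what
    # an edge-detecting convolution filter would pick up on
    variation = np.diff(gray.astype(np.int16), axis=1)
    print(variation)                                         # large values where the bright region starts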

Implementation of image dilation and erosion

I am trying to figure out an efficient way of implementing image dilation and erosion for binary images. As far as I understand it, the naive way would be:
loop through the image
    if pixel is 1
        loop through the neighborhood based on the structuring element's height and width
            (dilate) set each pixel of that neighborhood to the value in the corresponding location of the SE
            (erode) check whether the whole neighborhood equals the SE; if so, keep all the pixels, else delete the centre
So this means that for each pixel I have to loop through the SE as well, making this O(N*M*W*H).
Is there a more elegant way of doing this?
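For concreteness, a minimal NumPy sketch of this naive approach (names are placeholders; it uses the equivalent "test the window under the SE" formulation, which for a symmetric SE gives the same result as stamping the SE around every set pixel):

    import numpy as np

    def naive_morph(img, se, op):
        # img: 2-D binary array, se: 2-D binary structuring element, op: "dilate" or "erode"
        H, W = img.shape
        h, w = se.shape
        padded = np.pad(img, ((h // 2, h // 2), (w // 2, w // 2)))   # zeros outside the image
        out = np.zeros_like(img)
        for y in range(H):                                # loop through the image
            for x in range(W):
                window = padded[y:y + h, x:x + w]         # neighborhood under the SE
                covered = window[se == 1]                 # pixels the SE actually touches
                # dilate: any covered pixel set -> set the centre
                # erode: all covered pixels set -> keep the centre
                out[y, x] = covered.any() if op == "dilate" else covered.all()
        return out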
Yes, there are!
First you want to decompose (if possible) your structuring element into segments (a square, for example, is the composition of a horizontal and a vertical segment). Then you perform erosion/dilation only with the segments, which already decreases the complexity (a short sketch of this is given after the list below).
Now for the erosion/dilation parts, you have different solutions:
If you work only on 8-bit images and do not use C/C++, you can use an implementation based on histograms in order to keep track of the minimum/maximum value. See this remarkable work here. He even adds "landmarks" in order to reduce the number of operations.
If you use C/C++ and work on different types of image encodings, then you can use fast comparisons (SSE2, SSE4 and auto-vectorization), as is the case in the SMIL library. Here you compare row with row instead of working pixel by pixel, using hardware acceleration. It seems to be the fastest library around.
A last approach, slower but one that works for all types of encoding, is to use the Lemmonier algorithm. It is implemented in the fulguro library.
For disk-shaped structuring elements, there is nothing "fast"; you have to use the basic algorithm. For hexagonal structuring elements, you can work row by row, but it cannot be parallelized.
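Here is the promised sketch of the decomposition idea, using scipy.ndimage on a binary NumPy array and assuming a k×k square SE: eroding with a 1×k segment and then a k×1 segment gives the same result as eroding with the full square, with far less work per pixel, and the same trick applies to dilation.

    import numpy as np
    from scipy import ndimage

    img = np.random.rand(256, 256) > 0.2          # stand-in binary image (~80% ones)
    k = 7

    square = np.ones((k, k), dtype=bool)          # full k x k structuring element
    row    = np.ones((1, k), dtype=bool)          # horizontal segment
    col    = np.ones((k, 1), dtype=bool)          # vertical segment

    # One pass with the full square: O(k*k) work per pixel in the naive scheme
    eroded_full = ndimage.binary_erosion(img, structure=square)

    # Two 1-D passes (horizontal then vertical): O(k) + O(k) work per pixel
    eroded_decomposed = ndimage.binary_erosion(
        ndimage.binary_erosion(img, structure=row), structure=col)

    assert np.array_equal(eroded_full, eroded_decomposed)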

GPUImage Taking sum of columns of image

I'm using GPUImage in my project and I need an efficient way of taking column sums. The naive way would obviously be to retrieve the raw data and add up the values of every column. Can anybody suggest a faster way to do that?
One way to do this would be to use the approach I take with the GPUImageAverageColor class (as described in this answer), only instead of reducing the total size of each frame at each step, only do this for one dimension of the image.
The average color filter determines the average color of the overall image by stepping down in a factor of four in both X and Y, averaging 16 pixels into one at each step. If operating in a single direction, you should be able to use hardware interpolation to get an 18X reduction in a single direction per step with good performance. Your final step might either require a quick CPU-based iteration on the much smaller image or a tweaked version of this shader that pulls the last few pixels in a column together into the final result pixel for that column.
You'll notice that I've been talking about averaging here, because the output values for any OpenGL ES operation need to be in terms of colors, which only have a 0-255 range per channel. A sum would easily overflow this, but you could use an average as an approximation of your sum, with a more limited dynamic range.
If you only care about one color channel, you could possibly encode a larger value into the RGBA channels and maintain a 32-bit sum that way.
Beyond what I describe above, you could look at performing this sum with the help of the Accelerate framework. While probably not quite as fast as doing a shader-based reduction, it might be good enough for your needs.
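To illustrate the averaging-versus-summing trade-off on the CPU side (this is plain NumPy, not GPUImage code), here is a sketch that mimics the shader-style reduction: repeatedly average groups of four rows until a single row, the column mean, remains; multiplying by the original height recovers the column sum. In a real shader each step would also be squeezed into 8-bit colour channels, which is where the precision loss comes from.

    import numpy as np

    img = np.random.randint(0, 256, (256, 320)).astype(np.float64)  # H x W, one channel

    col_sum_exact = img.sum(axis=0)        # the quantity we actually want

    # Shader-style reduction: average groups of 4 rows per step, like ping-ponging
    # between ever smaller render targets, until a single row remains
    reduced = img
    while reduced.shape[0] > 1:
        pad = (-reduced.shape[0]) % 4                  # pad so the height divides by 4
        if pad:
            reduced = np.vstack([reduced, np.repeat(reduced[-1:], pad, axis=0)])
        reduced = reduced.reshape(-1, 4, reduced.shape[1]).mean(axis=1)

    col_mean = reduced[0]
    col_sum_approx = col_mean * img.shape[0]           # exact here (256 = 4^4, no padding needed)
    print(np.allclose(col_sum_exact, col_sum_approx))  # True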

Image Comparison

What is an efficient way to compare two images in Visual C?
Also, in which format do the images have to be stored (bmp, gif, jpeg, ...)?
Please provide some suggestions.
If the images you are trying to compare have distinctive characteristics that you are trying to differentiate, then PCA is an excellent way to go. The question of which file format you need is really irrelevant; you need to load the image into the program as an array of numbers and do the analysis on that.
Your question opens a can of worms in terms of complexity.
If you want to compare two images to check whether they are the same file, you can perform an MD5 hash on the file (after removing any metadata that could distort your result).
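A minimal sketch of that byte-identical check in Python (file names are placeholders; it only matches files that are exactly the same bytes, so differing metadata or re-encoding will break it):

    import hashlib

    def file_md5(path):
        # MD5 of the file's raw bytes, read in chunks
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    # placeholder file names
    identical_files = file_md5("a.png") == file_md5("b.png")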
If you want to compare whether they look the same, then it's a completely different story altogether. "Look the same" is meant very loosely here (e.g. they are exactly the same image but stored in two different file formats). For this you need more advanced algorithms, which will give you a probability that the two images are the same. Not being an expert in the field, I would try the following algorithm, invented off the top of my head:
take an arbitrary set of pixel points from the image.
for each pixel "grow" a polygon out of the surrounding pixels which are near in color (according to HSV colorspace)
do the same for the other image
for each polygon of one image, check its geometric similarity against all the polygons in the other image, and pick the highest value. Divide this value by the area of the polygon (to normalize).
create a vector out of the highest values obtained
the higher the norm of this vector, the higher the chance that the two images are the same.
This algorithm should be insensitive to color drift and image rotation. Maybe also scaling (you normalize against the area). But I restate: not an expert, there's probably much better, and it could make kittens cry.
I did something similar to detect movement from a MJPEG stream and record images only when movement occurs.
For each decoded image, I compared it to the previous one using the following method:
Resize the image to an effective thumbnail size (I resized fairly hi-res images down by a factor of ten).
Compare the brightness of each pixel to the previous image and flag if it is much lighter or darker (threshold value 1)
Once you've done that for each pixel, you can use the count of different pixels to determine whether the image is the same or different (threshold value 2)
Then it was just a matter of tuning the two threshold values.
I did the comparisons using System.Drawing.Bitmap, but as my source images were jpg, there was some artifacting.
It's a nice simple way to compare images for differences if you're going to roll it yourself.
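A rough Python/Pillow translation of that recipe, in case it helps; the resize factor and the two thresholds are the tuning knobs mentioned above, and all names and values are just placeholders:

    import numpy as np
    from PIL import Image

    BRIGHTNESS_THRESHOLD = 25      # "threshold value 1": per-pixel brightness change
    CHANGED_PIXEL_THRESHOLD = 50   # "threshold value 2": how many pixels must change

    def thumb_brightness(path, factor=10):
        # grayscale thumbnail, so each value is a brightness in 0..255
        img = Image.open(path).convert("L")
        small = img.resize((img.width // factor, img.height // factor))
        return np.asarray(small, dtype=np.int16)    # int16 so differences can go negative

    def frames_differ(path_a, path_b):
        # assumes both frames have the same dimensions
        a, b = thumb_brightness(path_a), thumb_brightness(path_b)
        changed = np.abs(a - b) > BRIGHTNESS_THRESHOLD
        return changed.sum() > CHANGED_PIXEL_THRESHOLD

    # e.g. frames_differ("frame_001.jpg", "frame_002.jpg")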
If you want to determine if 2 images are the same perceptually, I believe the best way to do it is using an Image Hashing algorithm. You'd compute the hash of both images and you'd be able to use the hashes to get a confidence rating of how much they match.
One that I've had some success with is pHash, though I don't know how easy it would be to use with Visual C. Searching for "Geometric Hashing" or "Image Hashing" might be helpful.
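If pulling pHash into Visual C is awkward, the underlying idea is easy to prototype. Here is a minimal average-hash sketch in Python (a much simpler relative of pHash, shown only to illustrate the principle): shrink, grayscale, threshold against the mean, then compare hashes by Hamming distance.

    import numpy as np
    from PIL import Image

    def average_hash(path, hash_size=8):
        # 64-bit hash: a 1 wherever a pixel is brighter than the thumbnail's mean
        img = Image.open(path).convert("L").resize((hash_size, hash_size))
        pixels = np.asarray(img, dtype=np.float64)
        return (pixels > pixels.mean()).flatten()

    def hamming_distance(h1, h2):
        return int(np.count_nonzero(h1 != h2))   # 0 = near-certain match, 64 = completely different

    # A distance of only a few bits usually means "perceptually the same image":
    # hamming_distance(average_hash("a.jpg"), average_hash("b.jpg"))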
Testing for strict identity is simple: Just compare every pixel in source image A to the corresponding pixel value in image B. If all pixels are identical, the images are identical.
But I guess you don't want this kind of strict identity. You probably want images to be considered "identical" even if certain transformations have been applied to image B. Examples of such transformations might be:
changing image brightness globally (for every pixel)
changing image brightness locally (for every pixel in a certain area)
changing image saturation globally or locally
gamma correction
applying some kind of filter to the image (e.g. blurring, sharpening)
changing the size of the image
rotation
Printing an image and scanning it again, for example, would probably involve all of the above.
In a nutshell, you have to decide which transformations you want to treat as "identical" and then find image measures that are invariant to those transformations. (Alternatively, you could try to revert the transformations, but that's not possible if a transformation removes information from the image, as blurring or clipping does.)
