Algorithm used in compare command of Imagemagick - imagemagick

I used Imagemagick in my project. I implemented a sub-image detection system using the compare command of ImageMagick. It is working well giving fine results. By reading articles i got to know that ImageMagick compares pixels of small image at every possible position within the pixels of larger image.And also i got to know ImageMagick detects rotated images and scaled images using Fuzzy factor.Though i have an rough idea about how the algorithm behave i couldn't find any article related to the algorithms of ImageMagick. Any idea about how this algorithm of compare command actually works?

AT my job I'm currently working on a tool which does some sophisticated optimizations on PDF pages when they're too heavy (too much vectors etc), but sometimes whites out rectangular parts of the image due to the drawing order of PDF objects.
I decided to use the compare tool of ImageMagick v6 (retrocompatibility issues forbid v7 for the moment) to check the page rendering after treatment against original rendering and detect when there have been accidents.
I have tested the available -metrics parameters on pages where the rendering was nearly identical to the original, while there was one page where white out parts were happening, making renderings very different.
I used -fuzz 10% to accept a minor color variation, and used 99 for JPG quality in my tool so JPEG compression doesn't generate too much differences. Beware on this one, as a low quality on jpg compression forces you to augment the fuzz factor, with the risk of missing major visual differences. Unfortunately, you won't find the information in the JPG headers.
I have made my images mid-res (150 dpi), because lo-res proved similar to low JPG quality and generated too much differences. The renderings are same resolution (549*819 = 449 631 pixels), the compare tool is not really strong at finding part of an image in another image. (You'd better off with OpenCV for this.)
Here is a table with some significant results on three different pages in my tool, and then my interpretation of each metric.
The AE stands for absolute error count. This metric roughly gives the number of pixels considered as different, within the 10% fuzz acceptation that I have used. On nearly identical renderings, this value is typically very low, only 18 or 1400 pixels on a total of 450K, while the very different images show nearly 40K different pixels. I think this metric can be checked against a low percentage of total pixels, but in my case this is not distinctive enough. Say I have 1% = 4500 pixels, it could be not neglectable if they happen in only one rectangle. It could be usefull with a geographical dispersion factor but this is more an OpenCV job.
The MEPP is Mean Error Per Pixel. As with all mean statistics it's not obvious to interpret, but the results are quite distinctive. Jumping from about 6K for same images to a huge 1.7M on different images. The problem is to decide a limit value. You can see in my table I had a page with 470K on this metric but it was a visually acceptable rendering artifact. What is an acceptable value, then? As often with mean values, significancy can prove to be very arbitrary and not always appropriate. The only way to find a reasonnable limit is to make a lot of measures on significant cases, maybe using machine learing.
MSE metric is mean error squared, average of the channel error squared. Squared delta values are often more significant in statistics, because they lower the minor differences and accentuate the major ones. (Linear regression correlation factor benefits from this mathematical behaviour.) This metric proves very interesting in my case because values are very consistent for similar cases : even on the page with a visual artifact and 450K different pixels, the value stays well below 1, while the page with whited out parts jumps to 9. This MSE metric is clearly very useable in my case.
NCC means normalized cross correlation. Normalized values are not appropriate in my case, as they make colors closer and diminish the differences between different cases. You can see this on values, although identical rendering are remarquably close to 1, different renderings have a value of 0.86 which is not so far from 1.
PAE gives you the absolute peak difference on color channels on all pixels, so it doesn't tell how many pixels are different. And it doesn't work well for RGB or CMYK as it doesn't tell you which channels are concerned. It's no use in my case.
PHASH is perceptual hash for the sRGB and HCLp colorspaces. I'm not really sure about this one but the result are very interesting as my images are RGB, and mostly because the values on similar rendering are well below 1 even for the page with acceptable visual artifact - while the different renderings give a value of 26. This is very appropriate for my case, just as MSE. I consider this metric as it seems to even more accentuate the differences between minor and major gaps in color values.
PSNR is peak signal to noise ratio : as for PAE, this is a peak value which won't tell how many pixels are concerned so it is absolutely unseable in my case.
RMSE is for root mean squared. Whatever that means, you find the same squared means behaviour as with MSE, so this value can be interesting as it reveals major differences more than minor ones. Here my values are 0.7 and 3.2 for identical images and a hefty 48.7 for different ones. The difficulty is to decide for an acceptable limit: as with AE or MEPP, the only way is to conduct a lot of tests and measure values for a lot of cases and decide for an appropriate value. Machine learning could help for this.
As a conclusion, I decided to use the PHASH metric. But the very conclusion of this study is that you must conduct tests and measure them before deciding the metric to use, as comparison contextes can be very different and metrics can show very different behaviours.
OpenCV is way more appropriate if you have to compare images from different sources, or parts of a global image, or images with big lighting variations. Also, it's quite easy to use from Python and C++. ImageMagick is quite good in my case because I am the one who generates both images.

The fuzz factor in ImageMagick allows two pixels to be compared and considered as the same although their colours may differ slightly.
The trick to understanding it, is to consider an RGB colour cube with Red, Green, Blue, Cyan, Magenta, Yellow and Black and White as the vertices. A fuzz factor of 100% represents the greatest possible distance in that cube, i.e. the length of the diagonal from Black to White and everything is scaled relative to that. It is shown dotted in this diagram.
In general, I would recommend using a percentage value rather than an absolute value, because an absolute fuzz factor of 255 means all colours are the same (black=white) on an 8-bit image, whereas on a 16-bit image, it would be hard to even perceive two colours that differ by 255.
As an example, let's see if a single black pixel is the same as a single mid-grey pixel with 49% fuzz:
compare -metric ae -fuzz 49% xc:black xc:gray null:
1
No, it is different, there is one pixel difference. Now let's try again allowing the pixels to be 51% different yet still match:
compare -metric ae -fuzz 51% xc:black xc:gray null:
0
Now they are considered the same.

Related

Understanding NetPBM's PNM nonlinear RGB color space for converting to grayscale

I am trying to understand how to properly work with the RGB values found in PNM formats in order to inevitably convert them to Grayscale.
Researching the subject, it appears that if the RGB values are nonlinear, then I would need to first convert them to a linear RGB color space, apply my weights, and then convert them back to the same nonlinear color space.
There appears to be an expected format http://netpbm.sourceforge.net/doc/ppm.html:
In the raster, the sample values are "nonlinear." They are proportional to the intensity of the ITU-R Recommendation BT.709 red, green, and blue in the pixel, adjusted by the BT.709 gamma transfer function.
So I take it these values are nonlinear, but not sRGB. I found some thread topics around ImageMagick that say they might save them as linear RGB values.
Am I correct that PNM specifies a standard, but various editors like Photoshop or GIMP may or may not follow it?
From http://netpbm.sourceforge.net/doc/pamrecolor.html
When you use this option, the input and output images are not true Netpbm images, because the Netpbm image format specifies a particular color space. Instead, you are using a variation on the format in which the sample values in the raster have different meaning. Many programs that ostensibly use Netpbm images actually use a variation with a different color space. For example, GIMP uses sRGB internally and if you have GIMP generate a Netpbm image file, it really generates a variation of the format that uses sRGB.
Else where I see this http://netpbm.sourceforge.net/doc/pgm.html:
Each gray value is a number proportional to the intensity of the
pixel, adjusted by the ITU-R Recommendation BT.709 gamma transfer
function. (That transfer function specifies a gamma number of 2.2 and
has a linear section for small intensities). A value of zero is
therefore black. A value of Maxval represents CIE D65 white and the
most intense value in the image and any other image to which the image
might be compared.
BT.709's range of channel values (16-240) is irrelevant to PGM.
Note that a common variation from the PGM format is to have the gray
value be "linear," i.e. as specified above except without the gamma
adjustment. pnmgamma takes such a PGM variant as input and produces a
true PGM as output.
Most sources out there assume they are dealing with linear RGB and just apply their weights and save, possibly not preserving the luminance. I assume that any complaint renderer will assume that these RGB values are gamma compressed... thus technically displaying different grayscale "colors" than what I had specified. Is this correct? Maybe to ask it differently, does it matter? I know it is a loaded question, but if I can't really tell if it is linear or nonlinear, or how it has been compressed or expected to be compressed, will the image processing algorithms (binarization) be greatly effected if I just assume linear RGB values?
There may have been some confusion with my question, so I would like to answer it now that I have researched the situation much further.
To make a long story short... it appears like no one really bothers to re-encode an image's gamma when saving to PNM format. Because of that, since almost everything is sRGB, it will stay sRGB as opposed to the technically correct BT.709, as per the spec.
I reached out to Bryan Henderson of NetPBM. He held the same belief and stated that the method of gamma compression is not as import as knowing if it was applied or not and that we should always assume it is applied when working with PNM color formats.
To reaffirm the effect of that opinion in regard to image processing, please read "Color-to-Grayscale: Does the Method Matter in Image Recognition?", 2012 by Kanan and Cottrell. Basically if you calculate the Mean of the RGB values you will end up in one of three situations: Gleam, Intensity', or Intensity. After comparing the effects of different grayscale conversion formulas, taking into account when and how gamma correction was applied, he discovered that Gleam and Intensity' where the best performers. They differ only by when the gamma correction was added (Gleam has the gamma correction on the input RGB values, while Intensity' takes in linear RGB and applies gamma afterwords). Sadly you drop from 1st and 2nd place down to 8th when no gamma correction is added, aka Intensity. It's interesting to note that it was the simple Mean formula that worked the best, not one of the more popular grayscale formulas most people tout. All of that to say that if you use the Mean formula for converting PNM color to grayscale for image processing applications, you will ensure great performance since we can assume some gamma compression will have been applied. My comment about ImageMagick and linear values appears only to apply to their PGM format.
I hope that helps!
There is only one way good way to convert colour signal to greyscale: going to linear space and add light (and so colour intensities). In this manner you have effective light, and so you can calculate the brightness. Then you can "gamma" correct the value. This is the way light behave (linear space), and how the brightness was measured by CIE (by wavelength).
On television it is standard to build luma and then black and white images) from non-linear R,G,B. This is done because simplicity and the way analog colour television (NTSC and PAL) worked: black and white signal (for BW television) as main signal, and then adding colours (as subcarrier) to BW image. For this reason, the calculations are done in non linear space.
Video could use often such factors (on non-linear space), because it is much quick to calculate, and you can do it easily with integers (there are special matrix to use with integers).
For edge detection algorithms, it should not be important which method you are using: we have difficulty to detect edge with similar L or Y', so we do no care if computers have similar problem.
Note: our eyes are non linear on detecting light intensities, and with similar gamma as phosphors on our old televisions. For this reason using gamma corrected value is useful: it compress the information in a optimal way (or in "analog-TV" past: it reduce perceived noise).
So you if you want Y', do with non linear R',G',B'. But if you need real grey scale, you need to calculate real greyscale going to linear space.
You may see differences especially on mid-greys, and on purple or yellow, where two of R,G,B are nearly the same (and as maximum value between the three).
But on photography programs, there are many different algorithms to convert RGB to greyscale: we do not see the world in greyscale, so different weight (possibly non linear) could help to make out some part of image, which it is the purpose of greyscale photos (by remove distracting colours).
Note Rec.709 never specified the gamma correction to apply (the OETF on the standard is not useful, we need EOTF, and often one is not the inverse of the other, for practical reasons). Only on a successive recommendation this missing information were finally provided. But because many people speak about Rec.709, the inverse of OETF is used as gamma, which it is incorrect.
How to detect: classical yellow sun on blue sky, choosing yellow and blue with same L. If you see sun in grey image, you are transforming with non-linear space (Y' is not equal). If you do no see the sun, you transform linearly.

Effect of Downsample but not fully divide

Hey I need a 320x240 8Bit gray scale image for some Computer Vision Algorithm (Orb Feature tracking). The Raspicam driver I'm using can provide different Image Sizes. Different Image Sizes are achieved by cropping and not down sampling from the driver. As my environment is not ideal lighted the Image is quite dark and noisy. Now I had the idea to take a 640x480 Image and down sample it to 320x240 by combining always 2x2 pixels to one. Normally I would of course divide by 4 to get the correct result. But what would be the effect of dividing it by two or even one (assuming 99% of the intensity values are not bigger then 64 (256/4)). Wouldn't that simulate the effect of larger CCD cells which could gather more light in less time.
The first tests I did showed some pretty good results. Meaning I detected more Features and could follow them better between two frames.
Here, you are not taking proper average of 2x2 blocks(divide by 4). Say, you have two blocks and they have Delta-I difference in intensity. If you divide the intensity of the two blocks by a larger number, the intensity difference will reduce and vice-versa for smaller number.
When you divide the difference(Delta-I) by 2(instead of 4), you are in a way increasing the contrast(intensity difference between background and foreground. As you mentioned that your image is in poor illumination, thereby division by smaller number increases the contrast which is improving tracking. This approach will come under contrast enhancement technique and is a variation of Linear contrast enhancement.

Find "fraction of bright pixels" in image (thresholding?)

I have a large number of grayscale images that show bright "fibers" on a darker background. I am trying to quantify the "amount" of fibers. Since they overlap almost everywhere it will be impossible to count the number of fibers, so instead I want to resort to simply calculating how large the area fraction of the white fibers is compared to the full image (e.g. this one is 55% white, another one with less fibers is only 43% white, etc). In other words, I want to quantify the density of the fibers in the image.
Example pictures:
High density: https://dl.dropboxusercontent.com/u/14309718/f1.jpg
Lower density: https://dl.dropboxusercontent.com/u/14309718/f2.jpg
I figured a simple (adaptive) threshold filter would do the job nicely by just converting the image to purely black/white and then counting the fraction of white pixels. However, my answer seems to depend almost completely and only on the threshold value that I choose. I did some quick experiments by taking a large number of different thresholds and found that in all pictures the fraction of white pixels is almost exactly a linear function of the threshold value. In other words - I can get any answer I want between roughly 10% and 90% depending on the threshold I choose.
This is obviously not a good approach because my results are extremely biased with how I choose the threshold and therefore completely useless. Furthermore I have about 100 of these images and I'm not looking forward to trying to choose the "correct" threshold for all of them manually.
How can I improve this method?
As the images are complex and the outlines of the fibers are fuzzy, there is little hope of getting an "exact" measurement.
What matters then is to achieve repeatability, i.e. ensure that the same fiber density is always assigned the same measurement, even in varying lighting conditions if possible, and different densities are assigned different measurements.
This rules out human intervention in adjusting a threshold.
My best advice is to rely on Otsu thresholding, which is very good at finding meaningful background and foreground intensities and is fairly illumination-independent.
Enhancing the constrast before Otsu should be avoided because binarization commutes with contrast enhancement (so that there is no real benefit), but contrast enhancement can degrade the image by saturating at places.
Just echoing #YvesDaoust' thoughts really - and providing some concrete examples...
You can generate histograms of your images using ImageMagick which is installed on most Linux distros and is available for OSX and Windows. I am just doing this at the command-line but it is powerful and easy to run some tests and see how Yves' suggestion works for you.
# Make histograms for both images
convert 1.jpg histogram:h1.png
convert 2.jpg histogram:h2.png
Yes, they are fairly bimodal - so Otsu thresholding should find a threshold that maximises the between-class variance. Use the script otsuthresh from Fred Weinhaus' website here
./otsuthresh 1.jpg 1.gif
Thresholding Image At 44.7059%
./otsuthresh 2.jpg 2.gif
Thresholding Image At 42.7451%
Count percentage of white pixels in each image:
convert 1.gif -format "%[fx:int(mean*100)]" info:
50
convert 2.gif -format "%[fx:int(mean*100)]" info:
48
Not that brilliant a distinction! Mmmm... I tried adding in a median filter to reduce the noise, but that didn't help. Do you have your images available as PNG to avoid the nasty artefacts?

Determine if an image needs contrasting automatically in OpenCV

OpenCV has a handy cvEqualizeHist() function that works great on faded/low-contrast images.
However when an already high-contrast image is given, the result is a low-contrast one. I got the reason - the histogram being distributed evenly and stuff.
Question is - how do I get to know the difference between a low-contrast and a high-contrast image?
I'm operating on Grayscale images and setting their contrast properly so that thresholding them won't delete the text i'm supposed to extract (thats a different story).
Suggestions welcome - esp on how to find out if the majority of the pixels in the image are light gray (which means that the equalise hist is to be performed)
Please help!
EDIT: thanks everyone for many informative answers. But the standard deviation calculation was sufficient for my requirements and hence I'm taking that to be the answer to my query.
You can probably just use a simple statistical measure of the image to determine whether an image has sufficient contrast. The variance of the image would probably be a good starting point. If the variance is below a certain threshold (to be empirically determined) then you can consider it to be "low contrast".
If you're adjusting contrast just so you can threshold later on, you may be able to avoid the contrast adjustment step if you set your threshold adaptively using Ohtsu's method.
If you're still interested in finding out the image contrast, then read on.
While there are a number of different ways to calculate "contrast". Often, those metrics are applied locally as opposed to the entire image, to make the result more sensitive to image content:
Divide the image into adjacent non-overlaying neighborhoods.
Pick neighborhood sizes that are approximate to size of the features of your image (e.g. if your main feature is horizontal text, make neighborhoods tall enough to capture 2 lines of text, and just as wide).
Apply the metric to each neighborhood individually
Threshold the metric result to separate low and high variance blocks. This will prevent such things as large, blank areas of page skewing your contrast estimates.
From there, you can use a number of features to determine contrast:
The proportion of high metric blocks to low metric blocks
High metric block mean
Intensity distance between the high and low metric blocks (using means, modes, etc)
This may serve as a better indication of image contrast than global image variance alone. Here's why:
(stddev: 50.6)
(stddev: 7.9)
The two images are perfectly in contrast (the grey background is just there to make it obvious it's an image), but their standard deviations (and thus variance) are completely different.
Calculate cumulative histogram of image.
Make linear regression of cumulative histogram in the form y(x) = A*x + B.
Calculate RMSE of real_cumulative_frequency(x)-y(x).
If that RMSE is close to zero - image is already equalized. (That means that for equalized images cumulative histograms must be linear)
Idea is taken from here.
EDIT:
I've illustrated this approach in my blog (C example code included).
There is a support provided in skimage for this. skimage.exposure.is_low_contrast. reference
example :
>>> image = np.linspace(0, 0.04, 100)
>>> is_low_contrast(image)
True
>>> image[-1] = 1
>>> is_low_contrast(image)
True
>>> is_low_contrast(image, upper_percentile=100)
False

good ways to preserve image information when reducing bit depth

I have some (millions) of 16-bit losslessly compressed TIFFs (about 2MB each) and after exhausting TB of disk space I think it's time I archive the older TIFFs as 8-bit JPEGs. Each individual image is a grayscale image, though there may be as many as 5 such images representing the same imaging area at different wavelengths. Now I want to preserve as much information as possible in this process, including the ability to restore the images to their approximate original values. I know there are ways to get further savings through spatial correlations across multiple channels, but the number of channels can vary, and it would be nice to be able to load channels independently.
The images themselves suggest some possible strategies to use since close to ~60% of the area in each image is dark 'background'. So one way to preserve more of the useful image range is just to threshold away anything below this 'background' before scaling and reducing the bit depth. This strategy is, of course, pretty subjective, and I'm looking for any other suggestions for strategies that are demonstrably superior and/or more general. Maybe something like trying to preserve the most image entropy?
Thanks.
Your 2MB TIFFs are already losslessly compressed, so you would be hard-pressed to find a method that allows you to "restore the images" to their original value ranges without some loss of intensity detail.
So here are some questions to narrow down your problem a bit:
What are the image dimensions and number of channels? It's a bit difficult to guess from the filesize and bit depth alone, because as you've mentioned you're using lossless compression. A sample image would be good.
What sort of images are they? E.g. are they B/W blueprints, X-ray/MRI images, color photographs. You mention that around 60% of the images is "background" -- could you tell us more about the image content?
What are they used for? Is it just for a human viewer, or are they training images for some computer algorithm?
What kind of coding efficiency are you expecting? E.g. for the current 2MB filesize, how small do you want your compressed files to be?
Based on that information, people may be able to suggest something. For example, if your images are just color photographs that people will look at, 4:2:0 chroma subsampling will give you a 50% reduction in space without any visually detectable quality loss. You may even be able to keep your 16-bit image depth, if the reduction is sufficient.
Finally, note that you've compared two fundamentally different things in your question:
"top ~40% of the pixels" -- here it sounds like you're talking about contiguous parts of the intensity spectrum (e.g. intensities from 0.6 to 1.0) -- essentially the probability density function of the image.
"close to ~60% of the area in each image" -- here you're talking about the distribution of pixels in the spatial domain.
In general, these two things are unrelated and comparing them is meaningless. There may be an exception for specific image content -- please put up a representative image to make it obvious what you're dealing with.
If you edit your question, I'll have a look and reply if I think of something.

Resources