How to choose the number of bins when creating HSV histogram? - image-processing

I was reading some documentation about HSV histogram, and in several refs the Saturation channel was quantized into 256 values. Why is that? Is there any reason behind choosing this number?
I have the same question for the Hue channel, which is often quantized into 180 values.

Disclaimer: Off-hand answers (i.e., not backed up by any documentation):
"256" is a popular number for a bin size because Programmers Like Round Numbers -- it fits in a single byte. And "180" because the HSB circle is "360 [degrees]", but "360" does not fit into a single byte.
For many image formats, the range of RGB values is limited to 0..255 per channel -- 3 bytes in total. To store the same amount of data (ignoring any artifacts of converting to another color model), Saturation and Brightness are often expressed in single bytes as well. The same could be done for Hue by scaling its original range of 0..359 (Hue is usually expressed as a value in degrees on the HSB color wheel) into the byte range 0..255. However, probably because it is easier to do calculations with a number close to the original 360-degree full circle, the range is instead halved to 0..179. That way the value still fits into a single byte (and thus "HSB" uses as much memory as "RGB") and can be converted trivially back to (close to) its original value: multiply by 2. Obviously, sticking to the storage space wins over fidelity.
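As a small illustration of that halved-hue convention (this is, for example, what OpenCV does for 8-bit images; the file name below is just a placeholder):

import cv2
import numpy as np

# Load an 8-bit BGR image (placeholder file name) and convert it to HSV.
bgr = cv2.imread("image.png", cv2.IMREAD_COLOR)
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

# For 8-bit images OpenCV stores H as degrees/2 so it fits in one byte.
h, s, v = cv2.split(hsv)
print(h.max())                           # never exceeds 179
hue_degrees = h.astype(np.uint16) * 2    # approximate recovery of the 0..359 range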
Given 256 values for both S and B, and 180 for H, you end up with a color space of 256*256*180 = 11,796,480 colors. To inspect how colors are distributed, you build a histogram: an array where you can read out the total number of pixels that fall in a certain color or color range. Using color ranges here, instead of exact values, significantly cuts down the memory requirements.
For an RGB color image, with the colors fairly evenly distributed, you could shift down each channel a certain number of bits. This is how a straightforward conversion from 24-bit "true-color" RGB down to 15-bit "high-color" RGB works: each channel gets divided by 8, reducing 256 values down to 32 (5 bits per channel). Conversion to a 16-bit high-color RGB space works the same; the bit that got left over in the 15-bit conversion is assigned to green. Thus, the range of values for green is doubled, which is useful since the human eye is more sensitive to shades of green than to the other two primaries.
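A small sketch of that bit-shift reduction for a single 24-bit pixel (pure Python, 5-6-5 packing):

# Reduce one 24-bit RGB pixel to 16-bit "high-color" (5 bits R, 6 bits G, 5 bits B).
def rgb888_to_rgb565(r, g, b):
    r5 = r >> 3          # 256 values -> 32
    g6 = g >> 2          # green keeps the extra bit: 256 values -> 64
    b5 = b >> 3
    return (r5 << 11) | (g6 << 5) | b5

print(hex(rgb888_to_rgb565(236, 226, 43)))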
It gets more complicated when the colors in the input image are not evenly distributed. A naive solution is to create an array of [256][256][256], initialize all to zero, then fill the array with the colors of the image, and finally sort them. There are better alternatives -- let me consult my old Computer Graphics [1] here. Hold on.
13.4 Reproducing Color mentions the names of two different approaches from Heckbert (Color Image Quantization for Frame Buffer Display, SIGGRAPH 82): the popularity and the median-cut algorithms. (Unfortunately, that's all they say about this topic. I assume efficient code for both can be googled for.)
A rough guess:
The number of bins for each of H, S and B should reflect what you are trying to use the histogram for. This older SO question, for example, uses a large number of bins for hue -- color is considered the most important -- and only 3 different values each for saturation and brightness. Thus, bright images with some subdued areas (say, a comic book) will give a good spread in this histogram, but a real-color photograph will not spread as well.
The main constraint is that the bin counts, multiplied together, should use a reasonably small amount of memory, yet cover enough of each component to get evenly filled. Perhaps some trial-and-error comes into play here. You could initially distribute each of the H, S, and B components evenly over the available memory in your histogram and process a small part of the image; say, 1 out of 4 pixels, horizontally and vertically. If you notice some of the bins fill up too fast while others stay untouched, adjust the ranges and restart (a small sketch of building such a histogram follows below).
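For illustration, here is a rough OpenCV sketch of building a coarse HSV histogram; the bin counts (30, 32, 32) and the file name are placeholders to experiment with, not recommendations:

import cv2

# Placeholder file name; adjust the three bin counts to your own needs.
bgr = cv2.imread("image.png", cv2.IMREAD_COLOR)
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

# 30 hue bins over 0..179 and 32 bins each for saturation and value over 0..255.
hist = cv2.calcHist([hsv], [0, 1, 2], None, [30, 32, 32], [0, 180, 0, 256, 0, 256])

# hist[i, j, k] is the number of pixels falling into that (H, S, V) range;
# dividing by the total pixel count makes histograms of different images comparable.
hist = hist / hist.sum()
print(hist.shape)   # (30, 32, 32)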
If you need to analyze multiple pictures, make sure they are all alike in their color gamut. You cannot expect one reasonable bin layout to work on all sorts of images; you would end up with an even distribution, where all matches are only so-so.
[1] J.D. Foley, A. van Dam, S.K. Feiner, and J.F. Hughes, Computer Graphics: Principles and Practice, 2nd ed., Reading, MA: Addison-Wesley, 1997.

Related

How to group RGB or HEX color codes to bigger sets of color groups?

I am analyzing a very large number of images and extracting the dominant color codes.
I want to group them into ranges of generic color names, like Green, Dark Green, Light Green, Blue, Dark Blue, Light Blue and so on.
I am looking for a language-agnostic way to implement something myself; if there are examples I can look into in order to achieve this, I would be more than grateful.
In the machine learning field, what you want to do is called classification: the goal is to assign the label of one of the classes (colors) to each of the observations (images).
To do this, classes must be pre-defined. Suppose these are the colors we want to assign to images:
To determine the dominant color of an image, the distance between each of its pixels and all the colors in the table must be calculated. Note that this distance is calculated in RGB color space. To calculate the distance between the ij-th pixel of the image and the k-th color of the table, the following equation can be used:
d_ijk = sqrt((r_ij-r_k)^2+(g_ij-g_k)^2+(b_ij-b_k)^2)
In the next step, for each pixel, the closest color in the table is selected. This is the concept used to compress an image using indexed colors (except that here the palette is the same for all images and is not calculated per image to minimize the difference between the original and the indexed image). Now, as @jairoar pointed out, we can get the histogram of the indexed image (not to be confused with an RGB or intensity histogram) and determine the color that occurs most often.
To show the result of these steps, I used random crops of this work of art of mine:
This is how images look, before and after indexing (left: original, right: indexed):
And these are most repeated colors (left: indexed, right: dominant color):
But since you said the number of images is large, you should know that these calculations are relatively time-consuming. The good news is that there are ways to increase the performance. For example, instead of using the Euclidean distance (formula above), you can use the City Block or Chebyshev distance. You can also calculate the distance for only a fraction of the pixels instead of all of them. For this purpose, you can first scale the image down to a much smaller size (for example, 32 by 32) and perform the calculations on the pixels of this reduced image. If you decide to resize images, don't bother with bilinear or bicubic interpolation; it isn't worth the extra computation. Instead, go for nearest neighbor, which effectively performs a rectangular lattice sampling of the original image.
Although the mentioned changes will greatly increase the speed of the calculations, nothing good comes for free. This is a trade-off between performance and accuracy. For example, in the previous two pictures, we see that the image that was initially recognized as orange (code 20) is recognized as pink (code 26) after resizing.
To determine the parameters of the algorithm (distance measure, reduced image size and scaling algorithm), you should first perform the classification on a number of images with the highest possible accuracy and keep the results as ground truth. Then, with multiple experiments, find a combination of parameters that does not raise the classification error above a maximum tolerable value.
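For reference, here is a rough Python sketch of the nearest-palette-color classification described in this answer; the three-color palette, the 32x32 reduced size, and the file name are placeholders, not part of the original answer:

import cv2
import numpy as np

# Hypothetical palette of pre-defined colors (BGR order, to match cv2.imread).
palette = np.array([[0, 128, 0],     # green
                    [0, 0, 200],     # red
                    [200, 0, 0]],    # blue
                   dtype=np.float32)

# Placeholder file name; shrink with nearest-neighbor sampling to save time.
image = cv2.imread("image.png", cv2.IMREAD_COLOR)
small = cv2.resize(image, (32, 32), interpolation=cv2.INTER_NEAREST)
pixels = small.reshape(-1, 3).astype(np.float32)

# Euclidean distance from every pixel to every palette color, then pick the nearest.
distances = np.linalg.norm(pixels[:, None, :] - palette[None, :, :], axis=2)
nearest = np.argmin(distances, axis=1)

# The dominant color is the palette entry chosen most often.
dominant = palette[np.bincount(nearest).argmax()]
print(dominant)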
@saastn's fantastic answer assumes you have a set of pre-defined colors that you want to sort your images into. The implementation is easier if you just want to classify the images to one color out of some set of X equidistant colors, a la histogram.
To summarize, round the color of each pixel in the image to the nearest color out of some set of equidistant color bins. This reduces the precision of your colors down to whatever amount of colors that you desire. Then count all of the colors in the image and select the most frequent color as your classification for that image.
Here is my implementation of this in Python:
import cv2
import numpy as np
#Set this to the number of colors that you want to classify the images to
number_of_colors = 8
#Verify that the number of colors chosen is between the minimum possible and maximum possible for an RGB image.
assert 8 <= number_of_colors <= 16777216
#Get the cube root of the number of colors to determine how many bins to split each channel into (this is only a whole number when number_of_colors is a perfect cube).
number_of_values_per_channel = number_of_colors ** ( 1 / 3 )
#Dividing each channel by this divisor maps its 0..255 range onto 0..(bins per channel - 1), so rounding gives the bin index.
divisor = 255 / (number_of_values_per_channel - 1)
#load the image and convert it to float32 for greater precision. cv2 loads the image in BGR (as opposed to RGB) format.
image = cv2.imread("image.png", cv2.IMREAD_COLOR).astype(np.float32)
#Divide each pixel by the divisor defined above, round to the nearest bin, then convert float32 back to uint8.
image = np.round(image / divisor).astype(np.uint8)
#Flatten the columns and rows into just one column per channel so that it will be easier to compare the columns across the channels.
image = image.reshape(-1, image.shape[2])
#Find and count matching rows (pixels), where each row consists of three values spread across three channels (Blue column, Green column, Red column).
uniques = np.unique(image, axis=0, return_counts=True)
#The first of the two arrays returned by np.unique is an array comprising all of the unique colors.
colors = uniques[0]
#The second of the two arrays returned by np.unique is an array comprising the counts of all of the unique colors.
color_counts = uniques[1]
#Get the index of the color with the greatest frequency
most_common_color_index = np.argmax(color_counts)
#Get the color that was the most common
most_common_color = colors[most_common_color_index]
#Multiply the channel values by the divisor to return the values to a range between 0 and 255
most_common_color = most_common_color * divisor
#If you want to name each color, you could also provide a list, sorted from lowest to highest BGR values, consisting of
#the name of each possible color, and then use most_common_color_index to retrieve the name.
print(most_common_color)

subtract one color from another in RGB color space

I would like to subtract one color from another. For example, I have two 100x100 pixel images, one with color R:236 G:226 B:43 and another with R:63 G:85 B:235. I would like to subtract the color R:236 G:226 B:43 from R:63 G:85 B:235. But I know it can't be subtracted mathematically, channel by channel (R: 236-63, G: 226-85, B: 43-235), because values less than 0 or more than 255 are not defined.
I found another color space, the RYB color space, but I don't know how it really works.
Thank you for your help.
You cannot actually subtract colors. But you surely can detect their difference. I suppose this is what you need, anyway.
Here are some thoughts and remarks:
Convert your images to HSV colorspace, which transforms RGB values to Hue, Saturation and Brightness (Value).
All your images should be around a yellowish color (near 60 deg. on the Hue circle), so they should all have about the same Hue, with minor differences.
Typically, if all images are taken at constant lighting conditions, they should have the same Value (brightness).
Saturation, which corresponds to the mixture of white in a color, typically represents how intense you perceive a color to be. This would typically be of about the same value for all your images in constant lighting conditions.
According to your first description, the main difference should be detected in the Hue channel.
A good thing about HSV is that H (hue) is represented by a counterclockwise circle and colors are just positions on this circle, so positive and negative values all make sense (search google for a description of HSV colorspace to get a view of how it looks and works).
You may either detect differences by a subtraction, which will give you a value that is either positive or negative, or by taking the absolute value of the subtraction, which gives just a measure of the difference between the two Hue values (but without any information on the direction of the difference). If you need the direction of the difference, stick to a plain subtraction.
For example:
Hue_1 - Hue_2 = Hue_3 (typically a small value for your problem)
if Hue_3 > 0, this means that Hue_1 is a bit towards Green
if Hue_3 < 0, this means that Hue_1 is a bit towards Red
Of course you may also need to take a look at the differences in the other channels, S and V to see if colors are more saturated or more bright, but I cannot be sure you need to do this since we haven't seen any images here.
Of course you can do a lot more sophisticated things... like applying clustering or classification techniques on the detected hues and assigning them to classes according to your problem's needs...
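As a small sketch of the signed hue difference (assuming the halved 0..179 hue range discussed earlier on this page; the wrap-around handling is my addition, not part of the answer above):

import numpy as np

# Signed difference between two 0..179 hues, wrapped to -90..+89 so that
# values just on either side of the 0/179 seam stay close together.
def hue_difference(hue_1, hue_2):
    return (int(hue_1) - int(hue_2) + 90) % 180 - 90

print(hue_difference(30, 25))    #  5  -> hue_1 a bit towards green
print(hue_difference(25, 30))    # -5  -> hue_1 a bit towards red
print(hue_difference(2, 177))    #  5  -> wraps around the seam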

Threshold to amplify black lines

Given an image (like the one below) I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried two thresholding functions. The problem is I can't get perfect edges using either of them. Any help would be greatly appreciated.
The filters I have tried are the Euclidean distance in the RGB and HSV spaces.
Sample image:
Here it is after running an RGB threshold filter (at 40%; it produces more artefacts beyond this).
Here it is after running an HSV threshold filter. (at 30% the paths become barely visible but clearly unusable because of the noise)
The code I am using is pretty straightforward: convert the input image to the appropriate color spaces and check the Euclidean distance to the black color.
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
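If Python is an option, scikit-image also ships a Sauvola implementation; here is a minimal, untested sketch (the window size and k are just starting values, and the file name is a placeholder):

import cv2
from skimage.filters import threshold_sauvola

# Placeholder file name; Sauvola thresholding works on a grayscale image.
gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

# window_size must be odd; k controls how strongly the local std. dev. lowers the threshold.
threshold = threshold_sauvola(gray, window_size=25, k=0.2)

# Pixels brighter than the local threshold become white background, dark lines stay black.
binary = (gray > threshold).astype("uint8") * 255
cv2.imwrite("binary.png", binary)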
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. Best results would come with a window size larger than 11x11; don't forget to choose a negative value for the threshold.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel )
Pixel_Binary = 1;
else
Pixel_Binary = 0;
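A minimal OpenCV sketch of that idea (the window size and constant are placeholders; cv2.adaptiveThreshold subtracts the constant from the local mean, so a negative constant matches the advice above):

import cv2

# Placeholder file name; work on the V channel of HSV as suggested above.
bgr = cv2.imread("image.png", cv2.IMREAD_COLOR)
v = cv2.split(cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV))[2]

# Window larger than 11x11 and a negative constant. With THRESH_BINARY a pixel
# becomes white only if it is brighter than (local mean - C), so a negative C
# keeps the dark lines black and cleans up the slightly darker background.
binary = cv2.adaptiveThreshold(v, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 21, -5)

cv2.imwrite("binary.png", binary)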
Due to the noise and the illumination variation you may need adaptive local thresholding; thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do mean or median local thresholding; I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more information on this technique.
To make sure the thresholding was working fine, I skeletonized it to see if there is a line break. This skeleton may be the one needed for further processing.
To get rid of the remaining noise you can just find the longest connected component in the skeletonized image.
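A rough sketch of those last two steps (skeletonize, then keep the largest connected component), assuming scikit-image and OpenCV are available; the file name stands in for the thresholded image from above:

import cv2
import numpy as np
from skimage.morphology import skeletonize

# Placeholder file name for the thresholded image (0 = background, 255 = lines).
binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)

# Skeletonize to check for line breaks; skeletonize expects a boolean image.
skeleton = skeletonize(binary > 0).astype(np.uint8)

# Keep only the largest connected component to drop the remaining noise.
count, labels, stats, _ = cv2.connectedComponentsWithStats(skeleton, connectivity=8)
largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])   # label 0 is the background
cleaned = (labels == largest).astype(np.uint8) * 255

cv2.imwrite("skeleton.png", cleaned)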
Thank you.
You probably want to do this as a three-step operation.
use leveling, not just thresholding: take the input and scale the intensities (gamma correct) with parameters that dull the mid tones without removing the darks or the lights (your RGB threshold is too strong, for instance; you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
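A literal Python sketch of those three steps (the gamma value and the final cutoff are placeholder choices; the kernel is the normalised one given in step 2):

import cv2
import numpy as np

# Placeholder file name; work in grayscale, normalized to 0..1.
gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# 1) Leveling: one possible gamma curve; squaring pushes the mid tones down
#    without clipping the darks or the lights.
leveled = np.power(gray, 2.0)

# 2) Convolve with the 5x5 kernel from step 2, normalised to sum to 1.
kernel = np.array([[1, 2, 3, 2, 1],
                   [2, 3, 4, 3, 2],
                   [3, 4, 5, 4, 3],
                   [2, 3, 4, 3, 2],
                   [1, 2, 3, 2, 1]], dtype=np.float32)
kernel /= kernel.sum()
filtered = cv2.filter2D(leveled, -1, kernel)

# 3) Threshold the result (0.5 is an arbitrary starting cutoff).
binary = (filtered > 0.5).astype(np.uint8) * 255
cv2.imwrite("binary.png", binary)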
You could try a black top-hat transform. This involves subtracting the image from the closing of the image. I used a structuring element window size of 11 and a constant threshold of 0.1 (25.5 on a 0..255 scale).
You should get something like:
Which you can then easily threshold:
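For example, a minimal OpenCV sketch of the black top-hat plus threshold (the 11x11 structuring element and the 0.1 cutoff follow the values above; the file name is a placeholder):

import cv2
import numpy as np

# Placeholder file name; the black top-hat is closing(image) - image.
gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (11, 11))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)

# Threshold at 0.1 of the 0..255 scale, as suggested above.
binary = (blackhat > 25.5).astype(np.uint8) * 255
cv2.imwrite("binary.png", binary)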
Best of luck.

Differences between gamma correction and exposure in image processing

Anyone know what is the difference between gamma and exposure? And what is the difference between gamma correction and exposure adjustment in image processing?
Since you don't have an image processing background, I would start with the basics.
1) Every digital image has a dynamic range of gray levels. Gray levels are simply values that ultimately correspond to a color. A monochrome (black and white) image has only 2 gray levels, 0 and 1, where 0 means black and 1 means white. Here the dynamic range is [0-1]. In these images each pixel is stored as a single bit.
Similarly, grayscale images have shades of gray in them. Here each pixel is stored as 8 bits, so the dynamic range is [0-255]. How? Just apply the formula (2^n - 1) where n is the number of bits, i.e. (2^8 - 1) = 256 - 1 = 255.
Similarly, there are color images, which are 24-bit images. In general, the dynamic range of gray levels in an image is given by [0, L-1] where L is the number of gray levels.
2) Now that you have understood what dynamic range is, let's look at gamma correction. Gamma correction is simply a function that compresses the dynamic range of an image so that we can view the image more properly. But why do we need to compress the dynamic range? A good day-to-day example: during the day we cannot see the stars, because the intensity of the sun is so large compared to the intensity of the stars. Similarly, when the dynamic range of an image is higher than that of the display device, we cannot see the image properly. Therefore we can use gamma correction to compress the dynamic range of the image.
3) Gamma correction can be written as g(x,y) = c * f(x,y)^gamma, where gamma is the exponent, f(x,y) is the original image with high dynamic range, g(x,y) is the modified image, and c is a positive constant.
4) Exposure, as said in another answer, is a phenomenon in the camera. I don't know much about it, as it is not covered in the image processing syllabus I am currently studying.
Gamma correction is a non-linear global function that compresses certain ranges in your image. It is mainly used to be more efficient, from a human-vision point of view, in fixed-point formats. It is absent in raw files, but present in JPEG. Each pixel undergoes the following transformation:
y = x^p
Exposure is a physical phenomenon in your camera. Exposure adjustment, on the other hand, is a linear global function. It is used mainly to compensate for a lack or excess of exposure in the camera:
y = a*x
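A small NumPy sketch contrasting the two transforms (the image here is random placeholder data, and p and a are placeholder values):

import numpy as np

# x is a normalized image in [0, 1]; random data stands in as a placeholder.
x = np.random.rand(4, 4).astype(np.float32)

# Gamma correction: non-linear, compresses or stretches ranges depending on p.
p = 1.0 / 2.2
gamma_corrected = np.power(x, p)

# Exposure adjustment: linear, simply scales every value by a.
a = 1.5
exposure_adjusted = np.clip(a * x, 0.0, 1.0)

print(gamma_corrected)
print(exposure_adjusted)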
Exposure is an indication of the total quantity of light that reaches the CCD of your camera (or the silver ions on film). It can be expressed as the number of photons that hit your image-recording elements.
Films and CCD are calibrated to expect a certain quantity of light (certain number of photons) in order to be able to create an "average" image.
The higher the "expected" quantity of light, the lower the ISO number of your film (or camera setting) => in order to obtain a normal image, a film (or camera setting) of 100 ISO needs more light than a film of 3200 ISO, hence the use of 3200 ISO films for night photography.
next step: the camera thing. When you want to make a picture (= have photons hit your CCD or film), you need to open the diaphragm of your camera. Depending on how much you open your diaphragm, the nature of your image will change (speaking from an artistic point of view here). If your diaphragm is wide open, most of the image which is not perfectly in focus will be blurred (e.g. as used in portrait photography). Conversely, if your diaphragm is only a little bit open during exposure, most of your image will be very sharp. This is used very often for landscape photography.
As your film (or CCD) expects a certain quantity of light at a given ISO value, it is obvious that a smaller diaphragm opening requires longer exposure times, whereas a wide open diaphragm requires a very short time.
Good books about this subject are the series "The Camera", "The Negative" and "The Print" by Ansel Adams.
Conclusion: exposure and gamma correction are different things.
- Exposure is a part of the parameters you need to control while creating your initial image through the use of a camera.
- Gamma correction is related to subsequent manipulation of your image file. I'm not sure if the notion of "gamma correction" is being used in the context of film.
Basically:
Gamma is a monitor thing.
Exposure is a camera thing.

Algorithm for determining the prominant colour of a photograph

When we look at a photo of a group of trees, we are able to identify that the photo is predominantly green and brown, or for a picture of the sea we are able to identify that it is mostly blue.
Does anyone know of an algorithm that can be used to detect the prominent color or colours in a photo?
I can envisage a 3D clustering algorithm in RGB space or something similar. I was wondering if someone knows of an existing technique.
Convert the image from RGB to a color space with brightness and saturation separated (HSL/HSV)
http://en.wikipedia.org/wiki/HSL_and_HSV
Then find the dominating values for the hue component. Make a histogram of the hue values of all pixels and analyze which angle regions the peaks fall in. A large peak in the quadrant between 180 and 270 degrees means there is a large portion of blue in the image, for example.
There can be several difficulties in determining one dominant color. Pathological example: an image whose left half is blue and right half is red. Also, the hue will not deal very well with grayscales obviously. So a chessboard image with 50% white and 50% black will suffer from two problems: the hue is arbitrary for a black/white image, and there are two colors that are exactly 50% of the image.
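A minimal OpenCV sketch of that hue-histogram idea (36 bins over OpenCV's 0..179 hue range, i.e. 10 degrees of the color wheel per bin; the file name and bin count are placeholders):

import cv2
import numpy as np

# Placeholder file name; convert to HSV and histogram the hue channel only.
bgr = cv2.imread("image.png", cv2.IMREAD_COLOR)
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

# 36 hue bins over 0..179, so each bin covers 10 degrees of the color wheel.
hist = cv2.calcHist([hsv], [0], None, [36], [0, 180]).flatten()

# Take the center of the tallest bin and convert it back to degrees.
peak_bin = int(np.argmax(hist))
dominant_hue_degrees = (peak_bin + 0.5) * 10
print(dominant_hue_degrees)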
It sounds like you want to start by computing an image histogram or color histogram of the image. The predominant color(s) will be related to the peak(s) in the histogram.
You might want to change the image from RGB to indexed; then you could use a regular histogram and detect the peaks (Matlab does this with rgb2ind(), as you probably already know), and then the problem would be reduced to your regular "finding peaks in an array".
Then
n = hist(Y,nbins) bins the elements in vector Y into nbins equally spaced containers and returns the number of elements in each container as a row vector.
Those values in n tell you how many elements fall in each bin. Then it's just a matter of fiddling with the number of bins to make them wide enough, and deciding how many elements a bin needs before you count it as a predominant color; then take the bins that contain that many elements, calculate the index that corresponds to their middle, and convert it back to RGB.
Whatever you're using for your processing probably has similar functions to those.
Average all pixels in the image.
Remove all pixels that are farther away from the average color than the standard deviation.
GOTO 1 with remaining pixels until arbitrarily few are left (1 or maybe 1%).
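A rough NumPy sketch of that loop, interpreting "standard deviation" as the standard deviation of the distances to the average color; the 1% stopping fraction and the file name are placeholders:

import cv2
import numpy as np

# Placeholder file name; work on the flat list of (BGR) pixels.
pixels = cv2.imread("image.png", cv2.IMREAD_COLOR).reshape(-1, 3).astype(np.float32)
original_count = len(pixels)

# Repeatedly drop pixels farther from the current average color than one
# standard deviation of those distances, until ~1% of the pixels remain.
while len(pixels) > max(1, 0.01 * original_count):
    average = pixels.mean(axis=0)
    distances = np.linalg.norm(pixels - average, axis=1)
    keep = distances <= distances.std()
    if not keep.any() or keep.all():   # nothing changes; stop to avoid an endless loop
        break
    pixels = pixels[keep]

print(pixels.mean(axis=0))   # the surviving pixels' average is the prominent color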
You might also want to pre-process the image, for example apply a high-pass filter (removing only the very low frequencies) to even out the lighting in the photo; see http://en.wikipedia.org/wiki/Checker_shadow_illusion
