Facial recognition with color images? - opencv

In many examples of facial recognition with OpenCV across the web, I see images being converted to grayscale as part of the "pre-processing" for the facial recognition functionality. What would happen if a color image were used for facial recognition? Why do all the examples convert images to grayscale first?

Many image processing and CV algorithms use grayscale images for input rather than color images. One important reason is that converting to grayscale separates the luminance plane from the chrominance planes. Luminance is also more important for distinguishing visual features in an image. For instance, if you want to find edges based on both luminance and chrominance, it requires additional work. Color also doesn't really help us identify important features or characteristics of the image, although there may be exceptions.
Grayscale images only have one color channel as opposed to three in a color image (RGB, HSV). The inherent complexity of grayscale images is lower than that of color images as you can obtain features relating to brightness, contrast, edges, shape, contours, textures, and perspective without color.
Processing in grayscale is also much faster. If we make the assumption that processing a three-channel color image takes three times as long as processing a grayscale image then we can save processing time by eliminating color channels we don't need. Essentially, color increases the complexity of the model and in general slows down processing.
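As a minimal sketch of that pre-processing step, using OpenCV's bundled Haar cascade (the input file name is a placeholder):

    import cv2

    img = cv2.imread("photo.jpg")                          # placeholder file name
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # the grayscale pre-processing step

    # Frontal-face Haar cascade shipped with the opencv-python package
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    # Detection runs on the single-channel image
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print(f"{len(faces)} face(s) found")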

Most facial recognition algorithms rely on the general intensity distribution in the image rather than the color intensity information of each channel.
Grayscale images provide exactly this information about the general distribution of intensities in an image (high-intensity areas appearing as white, low-intensity areas as black). Calculating the grayscale image is simple and needs little computing time; you can approximate this intensity by averaging the values of all 3 channels.
In an RGB image, this information is divided across all 3 channels. Take for example a bright yellow with:
RGB (255,217,0)
While this is obviously a color of high intensity, we obtain this information by combining all channels, which is exactly what a grayscale image does. You could of course instead use each channel for your feature calculation and concatenate the results to use all intensity information for this image, but it would yield essentially the same result as using the grayscale version while taking 3 times the computation time.
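As a small sketch of that calculation: a plain average of the three channels gives one intensity estimate, while OpenCV's cvtColor uses the weighting 0.299 R + 0.587 G + 0.114 B, so the yellow above averages to about 157 but maps to about 204 in the weighted grayscale.

    import cv2
    import numpy as np

    # A 1x1 "bright yellow" patch; OpenCV uses BGR order, so RGB (255, 217, 0) -> BGR (0, 217, 255)
    img = np.full((1, 1, 3), (0, 217, 255), dtype=np.uint8)

    gray_avg = img.mean(axis=2).astype(np.uint8)            # plain average -> ~157
    gray_weighted = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # 0.299 R + 0.587 G + 0.114 B -> ~204

    print(gray_avg.item(), gray_weighted.item())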

Related

How to handle YUV422 (YUYV) image as input of a CNN?

I want to feed an image that is stored in the YUV422 (YUYV) format into a CNN. YUV422 means that two pixels are represented by four bytes, basically two pixels share the chroma but have separate luminances.
I understand that for convolutional neural networks the spatiality plays an important role, i.e. that the filters "see" the luminance pixels together with their corresponding chroma pixels. So how would one approach this problem? Or is this no problem at all?
I want to avoid an additional preprocessing step for performance reasons.
Convolutional Neural Networks as implemented in common frameworks like TensorFlow, PyTorch, etc. store channels in a planar fashion. That is, each channel (R,G,B or Y,U,V) is stored in a contiguous region containing all the pixels in the image (width x height). This is in contrast to the format where channel data are interleaved inside each pixel. So you will need to upsample the subsampled U and V channels to match the size of the Y channel, and then feed them to the network in the same way as RGB data.
Others have found it to work OK, but not to reach the performance of RGB. See https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Colorspace.md
and Effect of image colourspace on performance of convolution neural networks by K Sumanth Reddy; Upasna Singh; Prakash K Uttam.
It is unlikely that the YUV to RGB conversion would be a bottleneck. RGB has the distinct advantage that one can reuse many excellent pretrained models (transfer learning).
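A minimal NumPy sketch of that rearrangement, assuming the frame arrives as a raw YUYV byte buffer (the function and variable names are mine):

    import numpy as np

    def yuyv_to_planar(buf, height, width):
        """Rearrange a raw YUYV frame into an H x W x 3 array with full-size U and V."""
        frame = np.frombuffer(buf, dtype=np.uint8).reshape(height, width, 2)
        y = frame[:, :, 0]                 # every pixel carries its own luma
        u = frame[:, 0::2, 1]              # the chroma byte of even pixels holds U
        v = frame[:, 1::2, 1]              # the chroma byte of odd pixels holds V
        # nearest-neighbour upsampling of the horizontally subsampled chroma
        u = np.repeat(u, 2, axis=1)
        v = np.repeat(v, 2, axis=1)
        return np.stack([y, u, v], axis=-1)   # feed this like an RGB tensor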
1. In the paper YUVMultiNet: Real-time YUV multi-task CNN for autonomous driving, the Y channel and the UV channels are fed into separate convolutions.
2. RGB vs. YUV vs. HSV vs. Lab (1)
As mentioned by Jon Nordby, the benchmark linked above shows a comparison.
Interestingly, the learned RGB2GRAY method is better than the one in OpenCV.
YCbCr seems to underperform RGB. The experiment was conducted on ImageNet-2012.
I will try it on COCO or other datasets later.
3. RGB vs. YUV vs. HSV vs. Lab (2)
In the paper Deep learning approach with colorimetric spaces and vegetation indices for vine diseases detection in UAV images, YUV is generally better than RGB.
RGB and YUV color spaces have obtained the best performances in terms of discrimination between the four classes.
RGB:
remains sensitive to the size of the CNN's convolution filter, especially for the (64x64) patch size.
The HSV and LAB color spaces performed worse than the other color spaces.
For HSV:
this is mainly due to the Hue (H) channel, which groups all the color information into a single channel, making it less suitable for the network to learn good color features from this color space. It is the color that has the most relevance in the classification; the saturation and value channels did not contribute much to the classification.
For the LAB color space:
classification results were not conclusive. This may be because the a and b channels do not effectively represent the colors related to the diseased vineyard. The L channel contributes little to the classification because it represents the quantity of light in the color space.
From the results, YUV is more stable and consistent, which makes it more suitable for symptom detection. This good performance is related to the color information: the green colour corresponding to healthy vegetation and the brown and yellow colours characterising diseased vegetation are represented in the UV channels.
The combination of different spaces produced lower scores than each space separately, except for a slight improvement for the (16x16) patch size.
This is due to the fact that the CNN has not been able to extract good color features from multiple channels for discriminating between the healthy and diseased classes.

Training dataset with coloured and grayscale images

I am trying to train a cnn model for face gender and age detection. My training set contains facial images both coloured and grayscale. How do I normalize this dataset? Or how do I handle a dataset with a mixture of grayscale and coloured images?
Keep in mind the network will just attempt to learn the relationship between your labels (gender/age) and your training data, in the way they are presented to the network.
The optimal choice depends on whether you expect the model to work on grayscale or color images in the future.
If you want to predict on grayscale images only
You should train on grayscale images only!
You can use many approaches to convert the colored images to black and white:
simple average of the 3 RGB channels
more sophisticated transforms using cylindrical color spaces such as HSV or HSL, where you could use one of the channels as your gray. Normally, the V channel corresponds better to human perception than the average of RGB (see the sketch below)
https://en.wikipedia.org/wiki/HSL_and_HSV
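A minimal sketch of both conversions with OpenCV (the file name is a placeholder):

    import cv2
    import numpy as np

    img = cv2.imread("face.jpg")                           # placeholder file name, BGR order

    # Approach 1: simple average of the three channels
    gray_avg = img.mean(axis=2).astype(np.uint8)

    # Approach 2: the V channel of HSV (per-pixel max of R, G, B)
    gray_v = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:, :, 2]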
If you need to predict on color images
Obviously, there is no easy way to reconstruct the colors from a grayscale image, so you must also use color images during training.
If your model accepts an MxNx3 image as input, then it will also accept the grayscale ones, provided you replicate the information across the 3 RGB channels (see the sketch after this list).
You should carefully evaluate the number of examples you have, and compare it to the usual training set sizes required by the model you want to use.
If you have enough color images, just do not use the grayscale cases at all.
If you don't have enough examples, make sure you have balanced training and test set for gray/colored cases, otherwise your net will learn to classify gray-scale vs colored separately.
Alternatively, you could consider using masking, and replace the missing color channels with masking values.
A further alternative you could consider:
- use a pre-trained CNN for feature extraction, e.g. the VGGs widely available online, and then fine-tune the last layers
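A minimal sketch of the grayscale-to-3-channel replication mentioned above, assuming images are NumPy arrays (the function name is mine):

    import numpy as np

    def to_three_channels(img):
        """Return an M x N x 3 array, replicating a single-channel image if needed."""
        if img.ndim == 2:                            # grayscale: M x N
            img = np.stack([img, img, img], axis=-1) # copy the gray plane into all 3 channels
        return img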
To me it feels that age and gender estimation would not be affected much by the presence or absence of color, and it might be that reducing the problem to grayscale images only will help convergence, since there will be far fewer parameters to estimate.
You should probably rather consider normalizing your images in terms of pose, orientation, ...
To train a network you have to ensure the same shape across all the training images, so convert them all to grayscale. To normalize, you can subtract the mean of the training set from each image. Do the same with the validation and test images.
For the detailed procedure, go through the article below:
https://becominghuman.ai/image-data-pre-processing-for-neural-networks-498289068258
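A minimal sketch of that mean subtraction, with placeholder arrays standing in for the stacked grayscale images:

    import numpy as np

    # placeholder arrays standing in for the stacked training and test images
    X_train = np.random.rand(100, 64, 64).astype(np.float32)
    X_test  = np.random.rand(20, 64, 64).astype(np.float32)

    mean_image = X_train.mean(axis=0)   # per-pixel mean over the training set

    X_train -= mean_image
    X_test  -= mean_image               # always subtract the *training* mean from val/test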

Image standardization techniques

I want to know what types of image standardization techniques are available. For example, I'm working on hemoglobin concentration detection using photographs in which the patient pulls the lower conjunctiva downwards with one hand while holding a color calibration card in the other. All images are standardized to enable comparison, using a previously established method: first, each image is split into its component 8-bit red, green and blue channels. Each channel's brightness is adjusted by multiplying it by 200/MB, where MB is the mean brightness of the color calibration card's white square. At this point the channels are duplicated, with one set merged to produce a 24-bit white-balanced image.
This image standardization technique is not perfect and often gives wrong results. Is there a better image standardization technique available? Any idea or pointer in the right direction would be helpful.
Edit
As I said above, I'm working on hemoglobin detection from a digital photograph. To reduce the effect of ambient lighting, an image standardization technique has to be implemented. The current image standardization technique does not reduce the effect of ambient light to a great extent. The color calibration card in the photograph could be used for the standardization process. I have heard that a "white balance algorithm", like the one used on the Mars probe, was used to overcome the same problem, but I was unable to find a proper source with enough information to implement it in OpenCV. Another approach or an appropriate reference would be helpful.
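For clarity, here is roughly what the current standardization step does (a sketch; the file name and the white-square coordinates are placeholders):

    import cv2
    import numpy as np

    img = cv2.imread("photo.jpg").astype(np.float32)   # placeholder file name

    # Location of the calibration card's white square (placeholder coordinates)
    x, y, w, h = 100, 50, 40, 40
    white = img[y:y + h, x:x + w]

    # Scale each 8-bit channel so the white square's mean brightness becomes 200
    mb = white.reshape(-1, 3).mean(axis=0)             # mean brightness per channel
    img *= 200.0 / mb

    balanced = np.clip(img, 0, 255).astype(np.uint8)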

Color SURF detector

SURF by default works on grayscale images. I am thinking of running SURF on an HSV image. My method is to separate the channels into H, S and V, and to use S and V for keypoint detection. I compared the number of keypoints in SV vs RGB, and channel-wise, HSV gives more features.
I am not sure whether what I am doing is correct. I need some explanation of the possibility of applying SURF to an HSV image. I have read a paper on applying SIFT to different color spaces, but not SURF.
Is there better way to achieve this?
Can we apply SURF to color, HSV space?
Thank you for your time.
Can we apply SURF to color, HSV space?
I didn't test it, but as far as I know, SIFT and SURF use quite (in principle) similar detection techniques:
SIFT detector uses the Difference-of-Gaussian (DoG) technique to efficiently approximate the Laplacian-of-Gaussian (LoG), which both are Blob Detection techniques.
SURF detector uses box-filters/box-blurs of arbitrary size to compute (or approximate?) the determinant of the Hessian, which is a Blob Detection technique.
Both methods use some strategy to compute those blobs in multiple scales (SIFT: DoG-Pyramid; SURF: integral images to scale the filter sizes). At the end, both methods detect blobs in the given 2D array.
So if SIFT can detect good features in your (H)SV channels, SURF should be able to do the same because in principle they both detect blobs. What you will do is detecting blobs in the hue/saturation/value channel:
hue-blobs: regions of similar color-tone which are surrounded by different (all higher or all lower) color-tones;
saturation-blobs: regions of... yea of what? no idea how to interpret that;
value-blobs: should give very similar results to the blobs of the grayscale-converted RGB image.
One thing to add: I'm just handling the detector! No idea how SIFT/SURF description is influenced by color data.
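A minimal sketch of per-channel detection (shown with SIFT, which is in stock OpenCV >= 4.4, since SURF needs the non-free contrib build; the file name is a placeholder):

    import cv2

    img = cv2.imread("scene.jpg")                      # placeholder file name
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)

    # SIFT illustrates the same blob-detection idea; swap in SURF if your build has it
    detector = cv2.SIFT_create()

    for name, channel in (("H", h), ("S", s), ("V", v)):
        keypoints = detector.detect(channel, None)
        print(name, len(keypoints))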
I didn't test it, but what you could do is using the interest point HSV values as additional matching criteria. What I used in the original implementation and what speeded up matching image pairs was the sign of the determinant of the Hessian matrix. The sign tells us whether it is a light blob on a dark background or a dark blob on a light background. Obviously, one would not attempt to match a dark blob with a bright blob.
In a similar way, you could use the HSV values and a distance between them. Why match blue blobs with yellow blobs? It makes no sense, unless white balance or lighting is completely messed up. Maybe my paper about matching line segments can help here; I used HSV there.
As for extracting SURF interest points on the different channels H, S, and V, I agree with the answer of Micka.
What you could try is to make a descriptor using the Hue channel.

why we should use gray scale for image processing

I think this may be a stupid question, but after reading and searching a lot about image processing, every example I see works in grayscale.
I understand that grayscale images use just one channel of color, which normally needs just 8 bits to be represented, etc. But why use grayscale when we have a color image? What are the advantages of grayscale? I could imagine it is because we have fewer bits to process, but even today with faster computers, is this necessary?
I am not sure if I was clear about my doubt; I hope someone can answer me.
thank you very much
As explained by John Zhang:
luminance is by far more important in distinguishing visual features
John also gives an excellent suggestion to illustrate this property: take a given image and separate the luminance plane from the chrominance planes.
To do so you can use the ImageMagick separate operator, which extracts the current contents of each channel as a grayscale image:
convert myimage.gif -colorspace YCbCr -separate sep_YCbCr_%d.gif
Here's what it gives on a sample image: top-left, the original color image; top-right, the luminance plane; bottom row, the chrominance planes.
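If you prefer OpenCV, a roughly equivalent separation looks like this (a sketch; file names are placeholders; note that OpenCV orders the channels Y, Cr, Cb):

    import cv2

    img = cv2.imread("myimage.png")                    # placeholder file name
    y, cr, cb = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb))

    cv2.imwrite("sep_Y.png",  y)    # luminance plane
    cv2.imwrite("sep_Cr.png", cr)   # chrominance planes
    cv2.imwrite("sep_Cb.png", cb)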
To elaborate a bit on deltheil's answer:
Signal to noise. For many applications of image processing, color information doesn't help us identify important edges or other features. There are exceptions. If there is an edge (a step change in pixel value) in hue that is hard to detect in a grayscale image, or if we need to identify objects of known hue (orange fruit in front of green leaves), then color information could be useful. If we don't need color, then we can consider it noise. At first it's a bit counterintuitive to "think" in grayscale, but you get used to it.
Complexity of the code. If you want to find edges based on luminance AND chrominance, you've got more work ahead of you. That additional work (and additional debugging, additional pain in supporting the software, etc.) is hard to justify if the additional color information isn't helpful for applications of interest.
For learning image processing, it's better to understand grayscale processing first and understand how it applies to multichannel processing rather than starting with full color imaging and missing all the important insights that can (and should) be learned from single channel processing.
Difficulty of visualization. In grayscale images, the watershed algorithm is fairly easy to conceptualize because we can think of the two spatial dimensions and one brightness dimension as a 3D image with hills, valleys, catchment basins, ridges, etc. "Peak brightness" is just a mountain peak in our 3D visualization of the grayscale image. There are a number of algorithms for which an intuitive "physical" interpretation helps us think through a problem. In RGB, HSI, Lab, and other color spaces this sort of visualization is much harder since there are additional dimensions that the standard human brain can't visualize easily. Sure, we can think of "peak redness," but what does that mountain peak look like in an (x,y,h,s,i) space? Ouch. One workaround is to think of each color variable as an intensity image, but that leads us right back to grayscale image processing.
Color is complex. Humans perceive color and identify color with deceptive ease. If you get into the business of attempting to distinguish colors from one another, then you'll either want to (a) follow tradition and control the lighting, camera color calibration, and other factors to ensure the best results, or (b) settle down for a career-long journey into a topic that gets deeper the more you look at it, or (c) wish you could be back working with grayscale because at least then the problems seem solvable.
Speed. With modern computers, and with parallel programming, it's possible to perform simple pixel-by-pixel processing of a megapixel image in milliseconds. Facial recognition, OCR, content-aware resizing, mean shift segmentation, and other tasks can take much longer than that. Whatever processing time is required to manipulate the image or squeeze some useful data from it, most customers/users want it to go faster. If we make the hand-wavy assumption that processing a three-channel color image takes three times as long as processing a grayscale image--or maybe four times as long, since we may create a separate luminance channel--then that's not a big deal if we're processing video images on the fly and each frame can be processed in less than 1/30th or 1/25th of a second. But if we're analyzing thousands of images from a database, it's great if we can save ourselves processing time by resizing images, analyzing only portions of images, and/or eliminating color channels we don't need. Cutting processing time by a factor of three to four can mean the difference between running an 8-hour overnight test that ends before you get back to work, and having your computer's processors pegged for 24 hours straight.
Of all these, I'll emphasize the first two: make the image simpler, and reduce the amount of code you have to write.
I disagree with the implication that grayscale images are always better than color images; it depends on the technique and the overall goal of the processing. For example, if you wanted to count the bananas in an image of a fruit bowl, then it's much easier to segment when you have a color image!
Many images have to be in grayscale because of the measuring device used to obtain them. Think of an electron microscope: it measures the strength of an electron beam at various points in space. An AFM measures the amount of resonance vibration at various points across a sample's topology. In both cases, these tools return a single value, an intensity, so they implicitly create a grayscale image.
For image processing techniques based on brightness, they often can be applied sufficiently to the overall brightness (grayscale); however, there are many many instances where having a colored image is an advantage.
Binary might be too simple, and it cannot represent the character of the picture.
Color might be too much and affect the processing speed.
Thus, grayscale is chosen, which sits between the two extremes.
Before starting image processing, whether on grayscale or color images, it is better to focus on the application we are targeting. If we choose one of them arbitrarily, it will create accuracy problems in our results. For example, if I want to process an image of a waste bin, I prefer grayscale rather than color, because in the bin image I only want to detect the shape of the bin using optimized edge detection. I do not care about the color of the image; I only want to see the rectangular shape of the bin correctly.
