I am trying to train a cnn model for face gender and age detection. My training set contains facial images both coloured and grayscale. How do I normalize this dataset? Or how do I handle a dataset with a mixture of grayscale and coloured images?
Keep in mind the network will just attempt to learn the relationship between your labels (gender/age) and your training data, exactly as they are presented to the network.
The optimal choice depends on whether you expect the model to work on grayscale or colored images in the future.
If you want to predict on grayscale images only
You should train on grayscale images only!
You can use several approaches to convert the colored images to grayscale:
simple average of the 3 RGB channels
more sophisticated transforms using cylindrical color spaces such as HSV or HSL, where you use one of the channels as your gray value. Normally, the V channel corresponds better to human perception than the average of RGB
https://en.wikipedia.org/wiki/HSL_and_HSV
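Both conversions can be sketched in a few lines of numpy (the pixel values below are toy examples, not from the original post). Note that the HSV V channel is simply the per-pixel maximum of R, G and B:

```python
import numpy as np

# Toy 2x2 RGB image with values in [0, 255]; a stand-in for a real photo.
rgb = np.array([[[200, 100, 50], [10, 20, 30]],
                [[0, 255, 0],    [255, 255, 255]]], dtype=np.float64)

# Approach 1: simple average of the three RGB channels.
gray_avg = rgb.mean(axis=-1)

# Approach 2: the V channel of HSV, which is max(R, G, B) per pixel
# and often matches perceived brightness better than the plain average.
gray_v = rgb.max(axis=-1)
```

For the first pixel above, the average gives (200 + 100 + 50) / 3 ≈ 116.7, while the V channel gives 200, illustrating how differently the two conversions can weight a saturated color.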
If you need to predict colored images
Obviously, there is no easy way to reconstruct the colors from a grayscale image, so you must also use color images during training.
If your model accepts an MxNx3 image as input, it will also accept the grayscale ones, provided that you replicate the information across the 3 RGB channels.
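Replicating the single channel is a one-liner; a minimal numpy sketch (random data stands in for a real grayscale image):

```python
import numpy as np

h, w = 28, 28
gray = np.random.rand(h, w).astype(np.float32)   # a single-channel image

# Replicate the single channel three times so the array matches the
# (H, W, 3) input shape an RGB-trained model expects.
rgb_like = np.stack([gray, gray, gray], axis=-1)
# equivalently: np.repeat(gray[..., None], 3, axis=-1)
```

All three channels are identical, so convolution filters see the same information they would get from a true grayscale input.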
You should carefully evaluate the number of examples you have, and compare it to the usual training set sizes required by the model you want to use.
If you have enough color images, just do not use the grayscale cases at all.
If you don't have enough examples, make sure you have balanced training and test sets for the gray/colored cases, otherwise your net will learn to classify grayscale vs. colored separately.
Alternatively, you could consider masking, replacing the missing color channels with a masking value.
A further alternative you could consider:
- use a pre-trained CNN for feature extraction, e.g. the VGGs widely available online, and then fine-tune the last layers
To me it feels that age and gender estimation would not be affected much by the presence or absence of color, and reducing the problem to grayscale images only may help convergence, since there will be far fewer parameters to estimate.
You should probably also consider normalizing your images in terms of pose, orientation, ...
To train a network you have to ensure the same shape (including the number of channels) across all the training images, so convert all of them to grayscale. To normalize, you can subtract the mean of the training set from each image; apply the same training-set mean to the validation and test images.
For the detailed procedure, go through the article below:
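A minimal numpy sketch of this mean subtraction (random arrays stand in for real image splits):

```python
import numpy as np

# Fake training/validation splits of 28x28 grayscale images.
train = np.random.rand(100, 28, 28).astype(np.float32)
val = np.random.rand(20, 28, 28).astype(np.float32)

# Compute the per-pixel mean image over the TRAINING set only...
mean_image = train.mean(axis=0)

# ...and subtract it from every split. Reusing the training mean on
# validation/test data avoids leaking statistics from those splits.
train_norm = train - mean_image
val_norm = val - mean_image
```

After this, the training set has (approximately) zero mean at every pixel, which tends to make gradient-based optimization better behaved.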
https://becominghuman.ai/image-data-pre-processing-for-neural-networks-498289068258
Related
I'm building a CNN model with TensorFlow Keras, and the dataset available is in black and white.
I'm using ImageDataGenerator from the keras.preprocessing.image API to convert images to arrays. By default it converts every image to a 3-channel input. So will my model be able to predict real-world (colored) images if the training images are black and white?
Also, ImageDataGenerator has a parameter named "color_mode" which can take the value "grayscale" and gives us a 2D array to use in the model. If I go with this approach, do I need to convert real-world images to grayscale as well?
The color space of the images you train on should be the same as the color space of the images your application will see.
If luminance is the most important cue, e.g. in OCR, then training on grayscale images should produce a more efficient model. But if you need to recognize things that can appear in different colors, it may be worthwhile to use a color input.
If the color is not important and you train using 3-channel images, e.g. RGB, you will have to provide examples in enough colors to avoid overfitting to color. E.g. if you want to distinguish a car from a tree, you may end up with a model that maps any green object to a tree and everything else to cars.
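As far as I know, Keras's color_mode="grayscale" loads images through Pillow's "L" mode, which reduces RGB to one channel with the ITU-R BT.601 luma weights. A numpy sketch of that conversion (the function name is mine):

```python
import numpy as np

def to_grayscale(rgb):
    """Reduce an (H, W, 3) RGB array to (H, W, 1) using the ITU-R BT.601
    luma weights -- the convention Pillow's 'L' mode follows, which is
    what color_mode='grayscale' relies on under the hood."""
    weights = np.array([0.299, 0.587, 0.114])   # sums to 1.0
    return (rgb @ weights)[..., None]

rgb = np.full((4, 4, 3), 100.0)   # a flat, colorless test image
gray = to_grayscale(rgb)
```

If you train with color_mode="grayscale", you must apply the same conversion to every real-world image at prediction time, so that training and inference inputs live in the same space.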
I want to feed an image that is stored in the YUV422 (YUYV) format into a CNN. YUV422 means that two pixels are represented by four bytes, basically two pixels share the chroma but have separate luminances.
I understand that for convolutional neural networks the spatiality plays an important role, i.e. that the filters "see" the luminance pixels together with their corresponding chroma pixels. So how would one approach this problem? Or is this no problem at all?
I want to avoid an additional preprocessing step for performance reasons.
Convolutional neural networks as implemented in common frameworks like TensorFlow, PyTorch, etc. store channels in a planar fashion. That is, each channel (R,G,B or Y,U,V) is stored as a contiguous region containing all the pixels of the image (width x height). This is in contrast to formats where channel data are interleaved within each pixel. So you will need to upsample the subsampled U and V channels to match the size of the Y channel, and then feed the result to the network in the same way as RGB data.
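A minimal numpy sketch of this deinterleaving and upsampling (the function name is mine, and nearest-neighbour upsampling is just one simple choice):

```python
import numpy as np

def yuyv_to_planar(frame, h, w):
    """Convert a packed YUYV (YUV422) byte buffer into an (h, w, 3)
    array, upsampling U and V horizontally by 2 so they align with the
    Y samples, the way a CNN input expects. Width must be even."""
    data = frame.reshape(h, w // 2, 4)        # each group: Y0 U Y1 V
    y = data[:, :, [0, 2]].reshape(h, w)      # re-interleave the lumas
    u = np.repeat(data[:, :, 1], 2, axis=1)   # nearest-neighbour upsample
    v = np.repeat(data[:, :, 3], 2, axis=1)
    return np.stack([y, u, v], axis=-1)       # planar-per-channel, like RGB

# Tiny 2x4 frame: 8 pixels -> 16 bytes of packed YUYV.
frame = np.arange(16, dtype=np.uint8)
yuv = yuyv_to_planar(frame, 2, 4)
```

This keeps the chroma samples spatially aligned with the lumas they belong to, so convolution filters see each pixel's luminance and chroma together.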
Others have found this to work OK, but not to reach the performance of RGB. See https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Colorspace.md
and Effect of image colourspace on performance of convolution neural networks by K Sumanth Reddy, Upasna Singh and Prakash K Uttam.
It is unlikely that the YUV to RGB conversion would be a bottleneck. RGB has the distinct advantage that one can reuse many excellent pretrained models (transfer learning).
1. YUVMultiNet
In the paper YUVMultiNet: Real-time YUV multi-task CNN for autonomous driving, the Y channel and the UV channels are fed into separate convolution branches.
2. RGB vs. YUV vs. HSV vs. Lab (1)
As mentioned by Jon Nordby, the benchmark linked above shows a comparison.
Interestingly, the learned RGB2GRAY conversion performs better than the one in OpenCV.
YCbCr seems to underperform RGB. The experiment was conducted on ImageNet-2012.
I will try it on COCO or other datasets later.
3. RGB vs. YUV vs. HSV vs. Lab (2)
In the paper Deep learning approach with colorimetric spaces and vegetation indices for vine diseases detection in UAV images, YUV is generally better than RGB.
RGB and YUV color spaces have obtained the best performances in terms of discrimination between the four classes.
RGB:
remains sensitive to the size of the CNN's convolution filter, especially for the (64x64) patch size.
The HSV and LAB color spaces performed worse than the other color spaces.
For HSV:
this is mainly due to the Hue (H) channel, which groups all the color information into a single channel, making it harder for the network to learn good color features from this color space. It is the hue that carries the most relevance for classification; the saturation and value channels did not contribute much.
For the LAB color space:
the classification results were not conclusive. This may be because the a and b channels
do not effectively represent the colors related to the diseased vineyard. The L channel contributes little to the classification because it represents the quantity of light in the color space.
From the results, YUV is more stable and consistent, which makes it more suitable for symptom detection. These good performances are related to the color information: the green color corresponding to healthy vegetation, and the brown and yellow colors characterizing diseased vegetation, are represented in the UV channels.
The combination of different spaces produced lower scores than each space separately, except for a slight improvement for the (16x16) patch.
This is due to the fact that the CNN was not able to extract good color features from multiple channels for discriminating the healthy and diseased classes.
I trained a CNN (on tensorflow) for digit recognition using MNIST dataset.
Accuracy on test set was close to 98%.
I wanted to predict the digits using data which I created myself and the results were bad.
What I did to the images written by me?
I segmented out each digit, converted it to grayscale, resized it to 28x28, and fed it to the model.
How come I get such low accuracy on my own dataset, whereas accuracy on the test set is so high?
Are there other modifications that I'm supposed to make to the images?
EDIT:
Here is the link to the images and some examples:
Excluding bugs and obvious errors, my guess would be that you are capturing your handwritten digits in a way that is too different from your training set.
When capturing your data you should try to mimic as much as possible the process used to create the MNIST dataset:
From the official MNIST dataset website:
The original black and white (bilevel) images from NIST were size
normalized to fit in a 20x20 pixel box while preserving their aspect
ratio. The resulting images contain grey levels as a result of the
anti-aliasing technique used by the normalization algorithm. the
images were centered in a 28x28 image by computing the center of mass
of the pixels, and translating the image so as to position this point
at the center of the 28x28 field.
If your data has a different processing in the training and test phases then your model is not able to generalize from the train data to the test data.
So I have two pieces of advice for you:
Try to capture and process your digit images so that they look as similar as possible to the MNIST dataset;
Add some of your examples to your training data to allow your model to train on images similar to the ones you are classifying;
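The center-of-mass centering described in the MNIST quote can be sketched in numpy (the wrap-around behaviour of np.roll is a simplification; real code should pad instead for blobs near the border):

```python
import numpy as np

def center_by_mass(img):
    """Shift a grayscale digit so its pixel center of mass sits at the
    image center, mimicking the MNIST preprocessing quoted above."""
    total = img.sum()
    if total == 0:
        return img
    ys, xs = np.indices(img.shape)
    cy = (ys * img).sum() / total          # center of mass, row
    cx = (xs * img).sum() / total          # center of mass, column
    dy = int(round(img.shape[0] / 2 - cy))
    dx = int(round(img.shape[1] / 2 - cx))
    # np.roll wraps around the edges -- fine for digits well inside
    # the frame, but pad first if the blob touches the border.
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

# A digit blob stuck in the top-left corner...
img = np.zeros((28, 28))
img[2:6, 2:6] = 1.0
centered = center_by_mass(img)   # ...moved to the middle of the frame
```

Applying the same centering (and the 20x20-box size normalization) to your own digits before prediction goes a long way toward matching the MNIST distribution.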
For those who still have a hard time with the poor quality of CNN-based models for MNIST:
https://github.com/christiansoe/mnist_draw_test
Normalization was the key.
I have lots of images of paper cards in different shades of colors, like all blues or all reds. In the images, they are held up to different objects that are of that color.
I want to write a program to compare the color to the shades on the card and choose the closest shade to the object.
However, I realize that for future images my camera is going to be subject to lots of different lighting conditions. I think I should convert into HSV space.
I'm also unsure of what type of distance measure I should use. Given some sort of blobs from the cards, I could average over the HSV values and simply see which blob's average is the closest.
But I welcome any and all suggestions, I want to learn more about what I can do with OpenCV.
EDIT: A sample
Here I want to compare the filled in red of the 6th dot to see it is actually the shade of the 3rd paper rectangle.
I think one possibility is to do the following:
Color histograms from Hue and Saturation channels
compute the color histogram of the filled circle.
compute color histogram of the bar of paper.
compute a distance using histogram distance measures.
Possibilities here include:
Chi-square,
Earth mover's distance,
Bhattacharyya distance,
Histogram intersection, etc.
Check this opencv link for details on computing histograms
Check this opencv link for details on the histogram comparisons
Note that when computing the color histograms, convert your images to HSV color space, as you yourself suggested. Then, there are two things to note here.
[EDITED to make this a suggestion rather than a must-do, because I believe the V channel might be necessary to differentiate the shades. Anyhow, try both and go with the one giving the better result. Apologies if this sent you off track.] One possibility is to use only the Hue and Saturation channels, i.e. you build a 2D histogram rather than a 3D one, consisting of values from the hue and saturation channels. The reason for doing so is that the variation in lighting is felt most in the V channel. This, together with the use of histograms, should hopefully make your comparisons more robust to lighting changes. There is some discussion on ignoring the V channel when building color histograms in this post here. You might find the references therein useful.
Normalize the histograms using the OpenCV functions. This is to account for the different sizes of the patches of material (your small circle vs. the huge color bar have different numbers of pixels).
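A numpy sketch of this pipeline on toy hue/saturation values; in OpenCV you would use cv2.calcHist and cv2.compareHist instead of the hand-rolled versions below (all function names and data here are mine, for illustration):

```python
import numpy as np

def hs_histogram(h_chan, s_chan, bins=8):
    """2-D hue/saturation histogram (V dropped for lighting robustness),
    normalized so patches of different sizes are comparable.
    Ranges follow OpenCV's convention: H in [0, 180), S in [0, 256)."""
    hist, _, _ = np.histogram2d(h_chan.ravel(), s_chan.ravel(),
                                bins=bins, range=[[0, 180], [0, 256]])
    return hist / hist.sum()

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two normalized histograms
    (one of the measures cv2.compareHist offers)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# Toy patches: a and b share a hue/saturation range (same shade family,
# different patch sizes); c is a different hue entirely.
rng = np.random.default_rng(0)
a_h, a_s = rng.uniform(0, 60, 500), rng.uniform(100, 200, 500)
b_h, b_s = rng.uniform(0, 60, 800), rng.uniform(100, 200, 800)
c_h, c_s = rng.uniform(120, 180, 500), rng.uniform(100, 200, 500)

d_same = chi_square(hs_histogram(a_h, a_s), hs_histogram(b_h, b_s))
d_diff = chi_square(hs_histogram(a_h, a_s), hs_histogram(c_h, c_s))
```

Because the histograms are normalized, the 500-pixel and 800-pixel patches compare fairly, and the matching shade produces the smaller distance.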
You might also wish to consider performing some form of preprocessing to "stretch" the color in the image e.g. using histogram equalization or an "S curve" mapping so that the different shades of color get better separated. Then compute the color histograms on this processed image. Keep the information for the mapping and perform it on new test samples before computing their color histograms.
Using ML for classification
Besides simply computing the distance and taking the closest one (i.e. a 1 nearest neighbor classifier), you might want to consider training a classifier to do the classification for you. One reason for doing so is that the training of the classifier will hopefully learn some way to differentiate between the different shades of hues since it has access to them during the training phase and is required to differentiate them. Notice that simply computing a distance, i.e. your suggested method, may not have this property. Hopefully this will give better classification.
The features use in the training can still be the color histograms that I mention above. That is, you compute color histograms as described above for your training samples and pass this to the classifier along with their class (i.e. which shade they are). Then, when you wish to classify a test sample, you likewise compute a color histogram and pass it to the classifier and it will return you the class (shade of color in your case) the color of the test sample belongs to.
Potential problems when training a classifier rather than using a simple distance comparison based approach as you have suggested is partly the added complexity of the program as well as potentially getting bad results when the training data is not good. There is also going to be a lot of parameter tuning involved to get it to work well.
See the opencv machine learning tutorials here for more details. Note that in the examples in the link, the classifier only differentiates between 2 classes, whereas you have more than 2 shades of color. This is not a problem, as the classifiers can in general work with more than 2 classes.
Hope this helps.
Here is the problem we are trying to solve:
Goal is to classify pixels of a colored image into 3 different classes.
We have a set of manually classified data for training purposes
Pixels are almost uncorrelated with each other (each behaves individually), so classification is most likely per pixel, based on its individual features.
The 3 classes can be mapped approximately to the RED, YELLOW and BLACK color families.
We need the system to be semi-automatic, i.e. to have 3 parameters controlling the probability of the 3 outcomes (for final fine-tuning)
Having this in mind:
Which classification technique would you choose?
What pixel features would you use for classification (RGB, Ycc, HSV, etc.)?
What modification functions would you choose for fine-tuning between the three outcomes?
My first try was based on
Naive bayes classifier
HSV (also tried RGB and Ycc)
(failed to find proper functions for fine-tuning)
Any suggestion?
Thanks
For each pixel in the image, try using the histogram of colors in the n x n window around that pixel as its features. For general-purpose color matching under varied lighting conditions, I have had good luck using two-dimensional histograms of hue and saturation with a relatively small number of bins along each dimension. Depending on your lighting consistency, it might make sense to use the RGB values directly.
As for the classifier, the manual-tuning requirement is most easily expressed using class weights: parameters that specify the relative costs of false negatives versus false positives for each class. I have only used this functionality with SVMs, but I'm sure you can find implementations of other classifiers that support a similar concept.
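To illustrate the idea without a full SVM, here is a toy weighted nearest-centroid classifier in numpy, where three per-class weights play the role of the tuning parameters the question asks for (all names, centroids and values are illustrative, not from the original post):

```python
import numpy as np

def classify(feature, centroids, weights):
    """Weighted nearest-centroid: dividing each class's distance by its
    weight makes that class 'cheaper', so raising a weight biases the
    decision toward that class -- a crude stand-in for the per-class
    cost parameters (class weights) that SVM implementations expose."""
    dists = {c: np.linalg.norm(feature - mu) / weights[c]
             for c, mu in centroids.items()}
    return min(dists, key=dists.get)

# Hypothetical 2-D color features for the three classes.
centroids = {"red": np.array([1.0, 0.0]),
             "yellow": np.array([0.0, 1.0]),
             "black": np.array([0.0, 0.0])}

x = np.array([0.6, 0.5])   # an ambiguous pixel, slightly closer to red
even = classify(x, centroids, {"red": 1, "yellow": 1, "black": 1})
biased = classify(x, centroids, {"red": 1, "yellow": 5, "black": 1})
```

With equal weights the ambiguous pixel lands in "red"; raising the yellow weight to 5 flips the decision, showing how the three parameters give the manual control over the outcome probabilities that the question requires.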