Currently I am working on my own CNN. I want to understand something.
If the input is an RGB image, how should I merge the channels before the fully connected network? Or should I merge the channels before the first step and work with that merged channel throughout the network?
I am taking 3 channels from the input.
First I filter the R-channel, then the G-channel, then the B-channel with the same filters.
After convolution and pooling I have to convert the result to a 1D vector for the fully connected layer. How should I convert these 3 channels together into one 1D vector?
Something you can always do is to convert your images to grayscale, but you can also handle your input in RGB. Just make sure that your layers correctly process your 3D data.
It matters less HOW you merge them; it matters more that you always encode them the same way.
You have several options:
1) the input array is all the red values, followed by all the green, followed by all the blue
2) the input is the RGB of the first pixel, followed by the RGB of the next pixel, etc.
3) etc.
It is just important to encode the values the same way when you use the trained network.
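For instance, here is a minimal numpy sketch of both orderings, assuming the pooled feature maps arrive as a height x width x channels array; the shapes and names below are just placeholders:

```python
import numpy as np

# Stand-in for the output of the last pooling layer: 8x8 spatial map, 3 channels.
feature_maps = np.random.rand(8, 8, 3)

# Option 1: all values of channel 0, then channel 1, then channel 2.
flat_planar = feature_maps.transpose(2, 0, 1).reshape(-1)

# Option 2: the channel values of pixel 1, then pixel 2, and so on (interleaved).
flat_interleaved = feature_maps.reshape(-1)

# Either 1D vector can feed the fully connected layer,
# as long as the same ordering is used at training and inference time.
print(flat_planar.shape, flat_interleaved.shape)  # (192,) (192,)
```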
I am building and training a CNN for a binary classification task. I have extracted images (frames) from a labelled video database I had. The database claims its videos were recorded via active IR illumination. The frames I have extracted as images have 3-channel information.
The resulting trained algorithm (CNN model) would be deployed over an embedded board, which would take video feeds from a standard RGB usb-camera and would work on a frame level basis over the video-feed.
Question PART-1:
Now correct me if I am wrong, but I am concerned: since my knowledge suggests that the data distribution of the active-IR-illuminated videos would be different from that of a standard RGB feed, would this model perform with equal precision on the RGB images when classifying frames?
Note 1: Although the videos in the database look 'greyscale' in nature (due to the visible grey tone of the video, maybe caused by the active IR illumination), upon processing they were found to contain 3-channel information.
Note 2: The difference between the values of the per-pixel 3 channel information is considerably higher in normal RGB images, when compared to the images (frames) extracted from the database.
For example, in a normal RGB image, if you consider any particular pixel, at random, the values corresponding to the three channels might differ from each other. It may be something like (128, 32, 98) or (34, 209, 173), etc. (Look at the difference between values in the three channels.)
In case of the frames extracted from the videos of the database that I have, the values along the three channels of a pixel DO NOT vary as much as they do in case of regular RGB images - It is something along the lines of (112, 117, 109) or (231, 240, 235) or (32, 34, 30), etc. I am supposing this is due to the fact that the videos are in general grey-ish, for reference - similar to a black and white filter, but not exactly black and white.
Question PART-2:
Would it be fair to convert the RGB images into grey-scale and duplicate the single channel twice to essentially make it a three-channel image?
Part 1: the neural net will perform best with the more contrasted channels, and a model trained on one type of image will perform poorly on the other type.
Part 2: an RGB image is three-channelled. It would be nonsense to make the channels equal and throw away the useful information.
Most probably, your IR images are not grayscale, they are packed as an RGB image for viewing. As they are very similar to each other, the colors are very desaturated, i.e. nearly gray.
And sorry to say, capturing three IR channels is of little use.
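If you want to quantify how close your extracted frames are to grayscale, one possible check (a hedged sketch using OpenCV, with a placeholder file name) is to measure the per-pixel spread between the three channels:

```python
import cv2

# "frame.png" is a placeholder path for one frame extracted from the database.
frame = cv2.imread("frame.png")  # loaded as a (H, W, 3) BGR array

# Per-pixel difference between the largest and smallest channel value:
# values near zero mean the frame is essentially grayscale packed into 3 channels.
spread = frame.max(axis=2).astype(int) - frame.min(axis=2).astype(int)
print("mean channel spread:", spread.mean())
```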
I am trying to train a CNN model for face gender and age detection. My training set contains facial images, both coloured and grayscale. How do I normalize this dataset? Or how do I handle a dataset with a mixture of grayscale and coloured images?
Keep in mind the network will just attempt to learn the relationship between your labels (gender/age) and your training data, in the way they are presented to the network.
The optimal choice depends on whether you expect the model to work on gray-scale or colored images in the future.
If you want to predict on gray-scale images only
You should train on grayscale images only!
You can use many approaches to convert the colored images to black and white:
simple average of the 3 RGB channels
more sophisticated transforms using cylindrical color spaces such as HSV or HSL, where you could use one of the channels as your gray. Normally, the V channel corresponds better to human perception than the average of RGB (a short sketch of both conversions follows below the link).
https://en.wikipedia.org/wiki/HSL_and_HSV
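As mentioned above, here is a minimal sketch of both conversion options. It assumes OpenCV and numpy are available and uses a placeholder file name:

```python
import cv2
import numpy as np

img = cv2.imread("face.jpg")  # placeholder path; OpenCV loads it as a (H, W, 3) BGR array

# Option 1: simple average of the three channels.
gray_avg = img.mean(axis=2).astype(np.uint8)

# Option 2: the V channel of HSV (the per-pixel maximum of the channels),
# which often matches perceived brightness better than the plain average.
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
gray_v = hsv[:, :, 2]
```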
If you need to predict on color images
Obviously, there is no easy way to reconstruct the colors from a grayscale image. Then you must also use color images during training.
If your model accepts an MxNx3 image as input, it will also accept grayscale ones, provided you replicate the single channel across the 3 RGB channels (see the sketch below).
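A minimal numpy sketch of that replication (the image size below is just an example):

```python
import numpy as np

# Stand-in for a grayscale image of shape (M, N).
gray = np.random.randint(0, 256, size=(224, 224), dtype=np.uint8)

# Repeat the single channel three times so the image becomes MxNx3.
rgb_like = np.repeat(gray[:, :, np.newaxis], 3, axis=2)
print(rgb_like.shape)  # (224, 224, 3)
```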
You should carefully evaluate the number of examples you have, and compare it to the usual training set sizes required by the model you want to use.
If you have enough color images, just do not use the grayscale cases at all.
If you don't have enough examples, make sure you have balanced training and test set for gray/colored cases, otherwise your net will learn to classify gray-scale vs colored separately.
Alternatively, you could consider using masking, and replace the missing color channels with a masking value.
A further alternative you could consider:
- use a pre-trained CNN for feature extraction, e.g. the VGG models widely available online, and then fine-tune the last layers (a rough sketch follows below)
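As a rough sketch of that last option, assuming TensorFlow/Keras is used and treating the two-class gender head as an example rather than the definitive setup:

```python
import tensorflow as tf

# Pre-trained VGG16 convolutional base, frozen so only the new head is trained.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. gender; adapt for age
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```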
To me it feels that age and gender estimation would not be largely affected by the presence or absence of color, and reducing the problem to gray-scale images only might help convergence, since there will be far fewer parameters to estimate.
You should probably rather consider normalizing your images in terms of pose, orientation, ...
To train a network you have to ensure the same shape (including the number of channels) among all the training images, so convert them all to grayscale. To normalize, you can subtract the mean of the training set from each image; do the same with the validation and testing images (see the sketch below).
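A small numpy sketch of that mean subtraction, with placeholder array shapes:

```python
import numpy as np

# Stand-ins for grayscale training and validation sets: (num_images, H, W).
train_images = np.random.rand(100, 64, 64).astype(np.float32)
val_images = np.random.rand(20, 64, 64).astype(np.float32)

# Mean image computed from the training set only.
mean_image = train_images.mean(axis=0)

train_norm = train_images - mean_image
val_norm = val_images - mean_image  # reuse the training mean for validation/testing
```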
For a detailed procedure, go through the article below:
https://becominghuman.ai/image-data-pre-processing-for-neural-networks-498289068258
I am seeing many machine learning (CNN) tutorials which convert the input image to grayscale. I want to know how the model will understand the original color, or use color as an identification criterion, if the colors are converted away throughout model creation?
With respect to colours, there are 2 cases in an image-processing problem:
Colours are not relevant in object-identification
In this case, converting a coloured image to a grayscale image will not matter, because eventually the model will be learning from the geometry present in the image. The image-binarization will help in sharpening the image by identifying the light and dark areas.
Colours are relevant in object-identification
As you might know, all colours can be represented as some combination of the three primary RGB colours. Each of these R, G and B values usually varies from 0 to 255 for each pixel. However, after gray-scaling, a pixel value is one-dimensional instead of three-dimensional, and it still just varies from 0 to 255. So, yes, there will be some information loss in terms of actual colours, but that is a tradeoff against image-sharpness.
So, there can be a combined score of the R, G, B values at each point (for example their mean, (R+G+B)/3), which gives a number between 0 and 255 that can be used as their representative. That way, instead of specific colour information, the pixel just carries intensity information.
Reference:
https://en.wikipedia.org/wiki/Grayscale
I would like to add to Shashank's answer.
A model, when fed an image, does not perceive it as we do. Humans perceive images through variations in colour, saturation and brightness, and we are able to recognize objects and other shapes as well.
However, a model sees an image as a matrix full of numbers (if it is a greyscale image). In the case of a color image, it sees three matrices stacked on top of one another, each filled with numbers (0-255).
So how does it learn color? Well, it doesn't. What it does learn is the variation in the numbers within this matrix (in the case of a greyscale image). These variations are crucial for detecting changes in the image. If the CNN is trained in this respect, it will be able to detect structure in the image and can also be used for object detection.
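To make that concrete, here is a small sketch (assuming PIL and numpy, with a placeholder file name) showing the two representations the model actually sees:

```python
import numpy as np
from PIL import Image

img = Image.open("photo.jpg")            # placeholder path
color = np.asarray(img.convert("RGB"))   # three stacked matrices of 0-255 values
gray = np.asarray(img.convert("L"))      # a single matrix of 0-255 values

print(color.shape)  # e.g. (480, 640, 3)
print(gray.shape)   # e.g. (480, 640)
```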
So far my neural network has been trained on the MNIST data set (from this tutorial). Now I want to test it by feeding my own images into it.
I've processed the image using OpenCV by making the dimensions 28x28 pixels, turning it into grayscale, and using adaptive thresholding. Where do I proceed from here?
An 'image' is a 28x28 array of values from 0-1... so not really an image. Just greyscaling your original image will not make it fit for input. You have to go through the following steps.
Load your image into your programming language, as 784 RGB values representing the pixels
For each pixel, take the average of r, g and b, then divide this value by 255. You will now have the greyscale value of that pixel, a number between 0 and 1.
Replace the rgb values with the greyscale values
You will now have the image represented as a flat array of 784 values between 0 and 1.
So you must do everything through your programming language. If you just greyscale an image with a photo editor, the pixels will still be stored as r, g, b.
You can use libraries like PIL or skimage that let you load the data into numpy arrays in Python, and which also support many image operations like grayscaling, scaling, etc.
After you have processed the image and read the data into a numpy array, you can feed it to your network.
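For example, a minimal PIL/numpy sketch of that pipeline (the file name is a placeholder, and the final reshape assumes the network expects a flat 784-value input like the MNIST tutorial):

```python
import numpy as np
from PIL import Image

# "digit.png" is a placeholder for your own photo of a digit.
img = Image.open("digit.png").convert("L").resize((28, 28))

pixels = np.asarray(img, dtype=np.float32) / 255.0  # grayscale values in 0..1
# MNIST digits are white strokes on a black background; if your photo is the
# opposite (dark digit on a light background), invert it: pixels = 1.0 - pixels
x = pixels.reshape(1, 784)  # one flat 784-value row, ready for the network
```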
I need to be able to determine whether a shape was drawn correctly or incorrectly.
I have sample data for the shape that holds the shape and the order of the pixels (denoted by the colour of each pixel);
for example, you can see the downsampled image and the colour variation.
I'm having trouble figuring out the network I need to define that will accept this kind of input for training.
Should I convert the downsampled image to a matrix and input it? Let's say my image is 64x64; I would need 64x64 input neurons (and that's if I ignore the color of the pixels, I think). Is that a feasible solution?
If you have any guidance, I could use it :)
Here is an example, shown below.
It is a binarized 4x4 image of the letter c. You can concatenate either the rows or the columns; I am concatenating by columns, as shown in the figure. Then each pixel is mapped to an input neuron (16 input neurons in total). In the output layer, I have 26 outputs, the letters a to z.
Note that in the figure I did not connect all nodes from layer i to layer i+1, for simplicity; you should connect them all.
At the output layer, I highlight the node for c to indicate that, for this training instance, c is the target label. The expected input and output vectors are listed at the bottom of the figure.
If you want to keep the intensity of color, e.g., R/G/B, then you have to triple the number of inputs. Each single pixel is replaced with three neurons.
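A small numpy sketch of that encoding (the 4x4 pattern below is hypothetical, since the original figure is not reproduced here):

```python
import numpy as np

# Hypothetical binarized 4x4 image of the letter "c".
image = np.array([[0, 1, 1, 1],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [0, 1, 1, 1]])

# Concatenate by columns -> 16 values, one per input neuron.
x = image.flatten(order="F")

# 26 output neurons, one per letter; the target for "c" is a one-hot vector.
y = np.zeros(26)
y[ord("c") - ord("a")] = 1.0
```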
Hope this helps. For further reading, I strongly suggest the deep learning tutorial by Andrew Ng - UFLDL. It's the state of the art for this kind of image recognition problem. In the exercises that accompany the tutorial, you will get intensive practice preprocessing images and working with many engineering tricks for image processing, together with the deep learning algorithms, end-to-end.