I'm trying to fine-tune the VGG16 model in Keras for a medical imaging application.
Because medical images are gray images, I copy each gray image into RGB channels (3 channels have the same gray image), such that they can be used as the inputs to the VGG model. Let's call them "RGB" images (with quotation marks).
But if I use the preprocess_input function from keras.applications.vgg16 to preprocess the "RGB" images, then, because the default mode in preprocess_input is 'caffe', it will subtract the mean RGB values [103.939, 116.779, 123.68], which were calculated from the ImageNet training set, from each "RGB" image I created.
However, in my "RGB" images, all 3 RGB channels should have the same mean value, and more importantly, because mine are medical images, the mean value should be different from the ImageNet ones.
So in this case, how should I preprocess my "RGB" images to fine-tune the VGG16 model with pretrained 'ImageNet' weights?
Also, just to make sure, the pretrained 'Imagenet' weights in Keras were trained with data that were preprocessed in 'caffe' mode, right?
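For reference, here is a minimal NumPy sketch of the two options being weighed in this question. It assumes 'caffe' mode amounts to flipping RGB to BGR and subtracting the listed values per channel (so the three numbers are in B, G, R order) with no further scaling. The helper names and the scalar dataset mean are hypothetical, and which option works better for fine-tuning is an empirical question.

```python
import numpy as np

# ImageNet channel means used by 'caffe' mode, in B, G, R order.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def gray_to_rgb(gray):
    """Replicate an (H, W) grayscale image into an (H, W, 3) float32 array."""
    return np.repeat(gray[..., np.newaxis], 3, axis=-1).astype(np.float32)

def caffe_style(rgb):
    """Mimic 'caffe' mode: flip RGB to BGR, then subtract the ImageNet means."""
    bgr = rgb[..., ::-1]
    return bgr - IMAGENET_BGR_MEAN

def dataset_mean_style(rgb, dataset_mean):
    """Hypothetical alternative: subtract one scalar mean computed on your own training set."""
    return rgb - np.float32(dataset_mean)

# Toy 4x4 grayscale image with a constant intensity of 100.
gray = np.full((4, 4), 100.0, dtype=np.float32)
x = gray_to_rgb(gray)
a = caffe_style(x)                             # channels end up with different offsets
b = dataset_mean_style(x, dataset_mean=100.0)  # all three channels stay identical
```

With the ImageNet means, the three identical channels come out with three different offsets; with a single dataset mean they stay identical.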
Related
I'm running a regression to predict biophysical parameters from raster image bands.
I extracted the pixel values at each sample point, and I managed to predict at those same points with multiple nonlinear regression models.
My question is: how can I predict the parameters for all the pixels in my study area using my trained model, and how can I plot the result as a raster while keeping the georeferencing?
NOTE: I can import the bands with Rasterio or GDAL, but how do I get a full prediction out?
Thank you in advance.
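One possible approach, sketched under assumptions: reshape the band stack so each pixel becomes one row of band values (the same layout used for the sampled points), predict, reshape back to the image grid, and write the result with the source raster's profile so the CRS and transform carry over. The model is assumed to expose a scikit-learn-style predict() method. SumModel and the function names are hypothetical, and nodata handling is not covered.

```python
import numpy as np

def predict_raster(bands, model):
    """Apply a fitted per-pixel model to a (n_bands, height, width) array.

    Returns a (height, width) float32 array of predictions.
    """
    n_bands, height, width = bands.shape
    # One row per pixel, one column per band.
    pixels = bands.reshape(n_bands, -1).T
    predicted = np.asarray(model.predict(pixels), dtype=np.float32)
    return predicted.reshape(height, width)

def predict_to_geotiff(src_path, dst_path, model):
    """Read all bands with rasterio, predict, and write a georeferenced raster."""
    import rasterio  # imported here so predict_raster works without it

    with rasterio.open(src_path) as src:
        bands = src.read()       # shape: (n_bands, height, width)
        profile = src.profile    # carries the CRS and affine transform
    prediction = predict_raster(bands, model)
    profile.update(count=1, dtype="float32")
    with rasterio.open(dst_path, "w", **profile) as dst:
        dst.write(prediction, 1)  # write into band 1

class SumModel:
    """Hypothetical stand-in for a trained regressor with a predict() method."""
    def predict(self, X):
        return X.sum(axis=1)

# Toy stack: 2 bands, 3 rows, 4 columns.
demo = predict_raster(np.arange(24, dtype=np.float32).reshape(2, 3, 4), SumModel())
```

Because the output reuses the input profile, the prediction raster should line up with the original bands when opened in a GIS.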
I'm building a CNN model with TensorFlow Keras, and the available dataset is in black and white.
I'm using ImageDataGenerator from the keras.preprocessing.image API to convert images to arrays. By default it converts every image to a 3-channel input. So will my model be able to predict on real-world (colored) images if the training images are black and white rather than color?
ImageDataGenerator also has a parameter named "color_mode", which can be set to "grayscale" and gives a single-channel array to be used in the model. If I go with this approach, do I need to convert real-world images to grayscale as well?
The color space of the images you train on should be the same as the color space of the images your application will process.
If luminance is the most important feature, e.g. for OCR, then training on grayscale images should produce a more efficient model. But if you need to recognize things that can appear in different colors, it may be worthwhile to use color input.
If color is not important and you train on 3-channel images, e.g. RGB, you will have to provide examples in enough colors to keep the model from overfitting to color. For example, if you want to distinguish a car from a tree, you may end up with a model that maps any green object to a tree and everything else to a car.
I am trying to train a CNN model for face gender and age detection. My training set contains both color and grayscale facial images. How do I normalize this dataset? Or how do I handle a dataset with a mixture of grayscale and color images?
Keep in mind that the network will just attempt to learn the relationship between your labels (gender/age) and your training data, in the way they are presented to the network.
The optimal choice depends on whether you expect the model to work on grayscale or color images in the future.
If you want to predict on grayscale images only
You should train on grayscale images only!
You can use many approaches to convert the colored images to black and white:
simple average of the 3 RGB channels
more sophisticated transforms using cylindrical color spaces such as HSV or HSL. There you could use one of the channels as your gray value. Normally, the V channel corresponds better to human perception than the average of RGB
https://en.wikipedia.org/wiki/HSL_and_HSV
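As a rough illustration of the conversions listed above, here is a minimal NumPy sketch. It relies on the standard definitions: HSV value is the per-pixel maximum of R, G and B, and HSL lightness is the midpoint of the maximum and minimum. The function names are made up for this example.

```python
import numpy as np

def to_gray_average(rgb):
    """Simple average of the 3 RGB channels."""
    return rgb.astype(np.float32).mean(axis=-1)

def to_gray_hsv_value(rgb):
    """V channel of HSV: the per-pixel maximum of R, G and B."""
    return rgb.astype(np.float32).max(axis=-1)

def to_gray_hsl_lightness(rgb):
    """L channel of HSL: midpoint between the per-pixel max and min."""
    x = rgb.astype(np.float32)
    return (x.max(axis=-1) + x.min(axis=-1)) / 2.0

# A single orange pixel, shape (1, 1, 3).
pixel = np.array([[[200, 100, 0]]], dtype=np.uint8)
avg = to_gray_average(pixel)       # 100.0
val = to_gray_hsv_value(pixel)     # 200.0
lig = to_gray_hsl_lightness(pixel) # 100.0
```

Whichever conversion you pick, apply the same one at training and at prediction time.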
If you need to predict on color images
Obviously, there is no easy way to reconstruct the colors from a grayscale image, so you must also use color images during training.
If your model accepts an MxNx3 image as input, then it will also accept grayscale images, provided that you replicate the information across the 3 RGB channels.
You should carefully evaluate the number of examples you have, and compare it to the usual training set sizes required by the model you want to use.
If you have enough color images, just do not use the grayscale cases at all.
If you don't have enough examples, make sure your training and test sets are balanced between grayscale and color cases; otherwise your net will learn to classify grayscale vs. color separately.
Alternatively, you could consider using masking, and replace the missing color channels with a masking value.
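For the channel-replication option mentioned above, a small helper along these lines could bring a mixed dataset to one MxNx3 shape. This is only a sketch. The function name is hypothetical, and it assumes images arrive as (H, W), (H, W, 1) or (H, W, 3) arrays.

```python
import numpy as np

def to_three_channels(image):
    """Return an (H, W, 3) array, replicating the gray channel if needed."""
    image = np.asarray(image)
    if image.ndim == 2:
        # (H, W) -> (H, W, 1)
        image = image[..., np.newaxis]
    if image.ndim == 3 and image.shape[-1] == 1:
        # (H, W, 1) -> (H, W, 3), same values in every channel
        image = np.repeat(image, 3, axis=-1)
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError(f"unsupported image shape: {image.shape}")
    return image

gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
color = np.zeros((2, 3, 3), dtype=np.uint8)
batch = np.stack([to_three_channels(gray), to_three_channels(color)])
```

After this step every image has the same shape, so grayscale and color examples can be stacked into one batch.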
A further alternative you could consider:
- use a pre-trained CNN for feature extraction, e.g. one of the VGG models widely available online, and then fine-tune the last layers
To me it feels that age and gender estimation would not be affected much by the presence or absence of color, and reducing the problem to grayscale images only might help you converge, since there will be far fewer parameters to estimate.
You should probably rather consider normalizing your images in terms of pose, orientation, ...
To train a network you have to ensure the same size among all the training images, so convert them all to grayscale. To normalize, you can subtract the mean of the training set from each image. Subtract that same training-set mean from the validation and test images.
For a detailed procedure, go through the article below:
https://becominghuman.ai/image-data-pre-processing-for-neural-networks-498289068258
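A minimal sketch of that normalization, assuming all images are already grayscale and the same size. The "mean of the training set" is interpreted here as a per-pixel mean image; a single scalar mean would be an equally plausible reading. The function names and random data are for illustration only.

```python
import numpy as np

def fit_mean(train_images):
    """Per-pixel mean image computed on the training set only."""
    return train_images.astype(np.float32).mean(axis=0)

def apply_mean(images, mean_image):
    """Subtract the training-set mean from any split (train, validation, test)."""
    return images.astype(np.float32) - mean_image

# Random stand-in data: 10 training and 4 validation images of 8x8 pixels.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(10, 8, 8), dtype=np.uint8)
val = rng.integers(0, 256, size=(4, 8, 8), dtype=np.uint8)

mean_image = fit_mean(train)
train_centered = apply_mean(train, mean_image)
val_centered = apply_mean(val, mean_image)  # same mean, not recomputed on val
```

The mean is never recomputed on the validation or test data, so every split goes through exactly the same transform.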
So far I have trained my neural network on the MNIST data set (from this tutorial). Now, I want to test it by feeding my own images into it.
I've processed the image using OpenCV by making the dimensions 28x28 pixels, turning it into grayscale, and using adaptive thresholding. Where do I proceed from here?
An 'image' is a 28x28 array of values from 0 to 1, so not really an image. Just greyscaling your original image will not make it fit for input. You have to go through the following steps.
Load your image into your programming language, with 784 RGB values representing the pixels
For each RGB value, take the average of r, g and b, then divide this value by 255. You will now have the greyscale value of the pixel, a number between 0 and 1.
Replace the RGB values with the greyscale values
You will now have an image which looks like this (see the right array):
So you must do everything through your programming language. If you just greyscale an image with a photo editor, the pixels will still be r, g, b.
You can use libraries like PIL or skimage, which let you load the data into NumPy arrays in Python and also support many image operations such as grayscaling and scaling.
After you have processed the image and read the data into numpy array you can then feed this to your network.
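The steps described above can be sketched in NumPy as follows. This assumes the image has already been loaded and resized to a 28x28 RGB array. With Pillow, something like np.asarray(Image.open(path).convert("RGB").resize((28, 28))) is one way to get there. The function name and the invert flag are additions for illustration. MNIST digits are bright on a dark background, so a dark-ink-on-white photo may need inverting.

```python
import numpy as np

def rgb_to_mnist_input(rgb, invert=False):
    """Turn a (28, 28, 3) array of 0-255 values into 784 floats in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float32)
    if rgb.shape != (28, 28, 3):
        raise ValueError(f"expected shape (28, 28, 3), got {rgb.shape}")
    # Average r, g and b for each pixel, then scale to the 0-1 range.
    gray = rgb.mean(axis=-1) / 255.0
    if invert:
        # Flip dark-on-light images to the bright-on-dark MNIST convention.
        gray = 1.0 - gray
    return gray.reshape(784)

white = np.full((28, 28, 3), 255, dtype=np.uint8)
flat = rgb_to_mnist_input(white)
flat_inverted = rgb_to_mnist_input(white, invert=True)
```

If your network expects a 28x28 (or 28x28x1) input rather than a flat vector, reshape the result accordingly before feeding it in.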
I trained a CNN (on tensorflow) for digit recognition using MNIST dataset.
Accuracy on test set was close to 98%.
I wanted to predict the digits using data which I created myself and the results were bad.
What did I do to the images I wrote myself?
I segmented out each digit, converted it to grayscale, resized it to 28x28, and fed it to the model.
How come I get such low accuracy on my own data set, whereas I get such high accuracy on the test set?
Are there other modifications that I'm supposed to make to the images?
EDIT:
Here is the link to the images and some examples:
Excluding bugs and obvious errors, my guess would be that you are capturing your handwritten digits in a way that is too different from your training set.
When capturing your data you should try to mimic as closely as possible the process used to create the MNIST dataset:
From the official MNIST dataset website:
The original black and white (bilevel) images from NIST were size
normalized to fit in a 20x20 pixel box while preserving their aspect
ratio. The resulting images contain grey levels as a result of the
anti-aliasing technique used by the normalization algorithm. the
images were centered in a 28x28 image by computing the center of mass
of the pixels, and translating the image so as to position this point
at the center of the 28x28 field.
If your data is processed differently in the training and test phases, your model will not be able to generalize from the training data to the test data.
So I have two pieces of advice for you:
Try to capture and process your digit images so that they look as similar as possible to the MNIST dataset;
Add some of your own examples to your training data so that your model can train on images similar to the ones you are classifying.
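To make the first piece of advice concrete, here is a rough, dependency-free sketch of the MNIST-style normalization quoted above: crop to the ink, scale into a 20x20 box while preserving the aspect ratio, then place the result in a 28x28 field so that its center of mass sits at the center. It assumes a 2-D array with bright ink on a zero background. It uses nearest-neighbour resizing instead of the anti-aliased resize described in the quote, and all function names are hypothetical. For real use, an anti-aliased resize from an image library is probably the better choice.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize (a stand-in for an anti-aliased resize)."""
    rows = np.minimum((np.arange(new_h) * img.shape[0] / new_h).astype(int), img.shape[0] - 1)
    cols = np.minimum((np.arange(new_w) * img.shape[1] / new_w).astype(int), img.shape[1] - 1)
    return img[rows][:, cols]

def mnist_style(digit):
    """Normalize a 2-D digit image (bright ink, zero background) to 28x28."""
    digit = np.asarray(digit, dtype=np.float32)
    ys, xs = np.nonzero(digit > 0)
    if ys.size == 0:
        return np.zeros((28, 28), dtype=np.float32)
    # Crop to the bounding box of the ink.
    crop = digit[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    # Scale so the longer side is 20 pixels, keeping the aspect ratio.
    scale = 20.0 / max(h, w)
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    small = resize_nearest(crop, new_h, new_w)
    # Center of mass of the resized digit, in pixel coordinates.
    total = small.sum()
    cy = (small.sum(axis=1) * np.arange(new_h)).sum() / total
    cx = (small.sum(axis=0) * np.arange(new_w)).sum() / total
    # Shift so the center of mass lands at the center of the 28x28 field
    # (13.5 in pixel-index coordinates), clamped to stay inside the field.
    top = min(max(int(round(13.5 - cy)), 0), 28 - new_h)
    left = min(max(int(round(13.5 - cx)), 0), 28 - new_w)
    canvas = np.zeros((28, 28), dtype=np.float32)
    canvas[top:top + new_h, left:left + new_w] = small
    return canvas

# Toy "digit": a 20x10 block of ink placed off-center in a 40x40 image.
raw = np.zeros((40, 40), dtype=np.float32)
raw[5:25, 10:20] = 1.0
out = mnist_style(raw)
```

Comparing a few of your own processed digits side by side with real MNIST samples is a quick way to see whether the two now look alike.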
For those who still have a hard time with the poor quality of CNN-based models for MNIST:
https://github.com/christiansoe/mnist_draw_test
Normalization was the key.