Here is the problem we are trying to solve:
Goal is to classify pixels of a colored image into 3 different classes.
We have a set of manually classified data for training purposes
Pixels almost do not correlate to each other (each have individual behaviour) - so most likely classification is on each individual pixel and based on it's individual features.
3 classes approximately can be mapped to colors of RED, YELLOW and BLACK color families.
We need to have the system semi-automatic, i.e. 3 parameters to control the probability of the presence of 3 outcomes (for final well-tuning)
Having this in mind:
Which classification technique will you choose?
What pixel features will you use for classification (RGB, Ycc, HSV, etc) ?
What modification functions will you choose for well-tuning between three outcomes.
My first try was based on
Naive bayes classifier
HSV (also tried RGB and Ycc)
(failed to find a proper functions for well-tuning)
Any suggestion?
Thanks
For each pixel in the image try using the histogram of colors the n x n window around that pixel as its features. For general-purpose color matching under varied lighting conditions, I have had good luck with using two-dimensional histograms of hue and saturation with a relatively small number of bins along each dimension. Depending upon your lighting consistency it might make sense for you to directly use the RGB values.
As for the classifier, the manual-tuning requirement is most easily expressed using class weights: parameters that specify the relative costs of false negatives versus false positives. I have only used this functionality with SVMs, but I'm sure you can find implementations of other classifiers that support a similar concept.
Related
I am trying to train a cnn model for face gender and age detection. My training set contains facial images both coloured and grayscale. How do I normalize this dataset? Or how do I handle a dataset with a mixture of grayscale and coloured images?
Keep in mind the network will just attempt to learn the relationship between your labels (gender/age) and you training data, in the way they are presented to the network.
The optimal choice is depending if you expect the model to work on gray-scale or colored images in the future.
If want to to predict on gray-scale image only
You should train on grayscale image only!
You can use many approaches to convert the colored images to black and white:
simple average of the 3 RGB channels
more sophisticated transforms using cylindrical color spaces as HSV,HSL. There you could use one of the channels as you gray. Normally, tthe V channel corresponds better to human perception than the average of RGB
https://en.wikipedia.org/wiki/HSL_and_HSV
If you need to predict colored image
Obviously, there is not easy way to reconstruct the colors from a grayscale image. Then you must use color images also during training.
if your model accepts MxNx3 image in input, then it will also accept the grayscale ones, given that you replicate the info on the 3 RGB channels.
You should carefully evaluate the number of examples you have, and compare it to the usual training set sizes required by the model you want to use.
If you have enough color images, just do not use the grayscale cases at all.
If you don't have enough examples, make sure you have balanced training and test set for gray/colored cases, otherwise your net will learn to classify gray-scale vs colored separately.
Alternatively, you could consider using masking, and replace with a masking values the missing color channels.
Further alternative you could consider:
- use a pre-trained CNN for feature extraction e.g. VGGs largely available online, and then fine tune the last layers
To me it feels that age and gender estimation would not be affected largely by presence/absence of color, and it might be that reducing the problem to a gray scale images only will help you to convergence since there will be way less parameters to estimate.
You should probably rather consider normalizing you images in terms of pose, orientation, ...
To train a network you have to ensure same size among all the training images, so convert all to grayscale. To normalize you can subtract the mean of training set from each image. Do the same with validation and testing images.
For detailed procedure go through below article:
https://becominghuman.ai/image-data-pre-processing-for-neural-networks-498289068258
My goal is to be able to have a way to process a product photo through the model, and have it return the same photo with the product against a white background. The product photos will be of varying sizes and product types.
I'd like to feed the model photos of products with backgrounds, and those without. In the future I will also expand on the dataset with partially removed backgrounds.
If you are looking for an easy way of doing this, I'd suggest the K-means clustering algorithm. Assuming that you have a simple plain background and an image (of interest) you can obtain the RGB pixel values and use a K-means clustering algorithm with the number of clusters set to 2.
Let me explain this to you with the help of an example. Suppose you have an image of dimension 28*28 (just another arbitrary dimension). The total number of pixels in the image would be 784. Each pixel is represented as a combination of 3 RGB values ranging from 0-255.
A K-Means clustering algorithm will cluster the pixel values into K clusters thus each cluster represents pixel values which are more similar than the pixel values in another cluster. This technique is especially helpful in drawing contours (borders) around images of interest.
In the K-means clustering algorithm, there would be 784 sample points each represented in a 3 dimensional plane for this example. It will cluster these data points into K (2 in this example) clusters.
Here is a very simple implementation of the K-means clustering algorithm.
If you are looking for advanced machine learning implementation, then I'd suggest you look for Deep Convolution Neural Networks for Background Removal in Images. This machine learning technique has been successfully used for the task for background image removal
Read more about it from here, here and here.
I am working on a project to segment air images and classify each segment. The images are very large and have huge homogeneous areas, so I decided to use a Split and Merge Algorithm for the segmentation.
(On the left the original image and on the right the segmented one, where each segment is represented in its RGB mean value Thanks to this answer)
For the classification I want to use a SVM Classifier (I used it a lot in two projects before) with a feature vector.
For the beginning I just want to use five classes: Water, Vegetation, Built up area, Dune and Anomaly
Now I am thinking about what I can put in this feature vector:
The mean RGB Value of the Segment
A texture feature (but can I represent the texture of the segment with just one value?)
The place in the source image (maybe with a value which represents left, right or middle?)
The size of the segment (Water segments should be much larger than built areas)
The mean RGB values of the fourth neighborhood of the segment
So has anyone done something like this and can give me some advises what useful stuff I can put in the feature vector? And can someone give me an advise how I can represent the texture in the segment correctly?
Thank you for your help.
Instead of the Split and Merge algorithm, you could also use superpixels. There are several fast and easy-to-use superpixel algorithms available (some are even implemented in recent OpenCV versions). To name just a view:
Compact Watershed (see here: https://www.tu-chemnitz.de/etit/proaut/forschung/rsrc/cws_pSLIC_ICPR.pdf)
preSLIC and SLIC (see here: https://www.tu-chemnitz.de/etit/proaut/forschung/rsrc/cws_pSLIC_ICPR.pdf and here: http://www.kev-smith.com/papers/SLIC_Superpixels.pdf)
SEEDS (see here: https://arxiv.org/abs/1309.3848)
ERGC (see here: https://sites.google.com/site/pierrebuyssens/code/ergc)
Given the superpixel segmentation, there is a vast set of features you can compute in order to classify them:
In Automatic Photo Pop-Up Table 1, Hoiem et al. consider, among others, the following features: mean RGB color, mean HSV color, color histograms, saturation histograms, Textons, differenty oriented Gaussian derivative filters, mean x and y location, area, ...
In Recovering Occlusion Boundaries from a Single Image, Hoiem et al. consider some additional features to the above list in Table 1.
In SuperParsing: Scalable Nonparametric Image
Parsing with Superpixels
, Tighe et al. additionally consider SIFT histograms, the mask reduced on a 8 x 8 image, boundign box shape, and color thumbnails.
In Class Segmentation and Object Localization with Superpixel Neighborhoods
, Fulkerson et al. also consider features from neighboring superpixels.
Based on the superpixels, you can still apply a simple merging-scheme in order to reduce the number of superpixels. Simple merging by color histograms might already be useful for your tasks. Otherwise you can additionally use edge information in between superpixels for merging.
You don't need to restrict yourself to just 1 feature vector. You could try multiple feature vectors (from the list you already have), and feed them to classifiers based on multiple kernel learning (MKL). MKL has shown to improve the performance over a single feature approach, and one of my favourite MKL techniques is VBpMKL.
If you have time, I would suggest you try one or more the following features, which can capture the features of interest:
Haralick texture features
Histogram oriented gradient features
Gabor filters
SIFT
patch-wise RGB means
My project is to create a software that recognizes certain objects like an apple or a coin etc. I want to use Kinect. My question is: Do I need to have a machine learning algorithm like haar classifier to recognize a object or kinect itself can do that?
Kinect itself cannot recognize objects. It will give you a dense depth map. Then you can use the depth features along with some simple features (in your case, maybe color features or gradient features would do the job). Those features you input to a classifier (SVM or Random Forest for example) to train the system. You use the trained model for testing on new samples.
Regarding Haar features, I think they could do the job but you would need a sufficiently large database of features. It all depends on what you want to detect. In the case of an apple and a coin, just color would suffice.
Refer this paper to get an idea how to perform human pose recognition using Kinect camera. You just have to pay attention to their depth features and their classifiers. Do not apply their approach directly. Your problem is simpler.
Edit: simple gradient orientations histogram
Gradient orientations can give you a coarse idea about the shape of the object (It is not a shape-feature to be specific, better shape features exist, but this one is extremely fast to calculate).
Code snippet:
%calculate gradient
[dx,dy] = gradient(double(img));
A = (atan(dy./(dx+eps))*180)/pi; %eps added to avoid division by zero.
A will contain orientation for each pixel. Segment your original image according to the depth values. For a segment having similar depth values, calculate color histogram. Extract the pixel orientations corresponding to that region, call it A_r. calculate a 9-bin (you can have more bins. Nine bins mean each bin will contain 180/9=20 degrees) histogram. Concatenate the color features and the gradient histogram. Do this for sufficient number of leaves. Then you can give this to a classifier for training.
Edit: This is a reply to a comment below.
Regarding MaxDepth parameter in opencv_traincascade
The documentation says, "Maximal depth of a weak tree. A decent choice is 1, that is case of stumps". When you perform binary classification, it takes a form of:
if yourFeatureValue>=learntThresh
class=1;
else
class=0;
end
The above type of classifier which performs thresholding on a single feature value (a scalar) is called decision stumps. There is only one split between positive and negative class (therefore maxDepth is one). For example, it would work in following scenario. Imagine you have a 1-D feature:
f=[1 2 3 4 -1 -2 -3 -4]
First 4 are class 1, rest are class 0. Decision stumps would get 100% accuracy on this data by setting the threshold to zero. Now, imagine a complicated feature space such as:
f=[1 2 3 4 5 6 7 8 9 10 11 12];
First 4 and last 4 are class 1, rest are class 0. Here, you cannot get 100% classification by decision stumps. You need two thresholds/splits. Therefore, you can construct a tree with depth value 2. You will have 2^(2-1)=2 thresholds. For depth=3, you get 4 thresholds, for depth=4, you get 8 thresholds and so on. Here, I assume a tree with a single node has height 1.
You may feel that the more the number of levels, you can achieve more accuracy, but then there is a problem of overfitting (and computation, memory storage etc.). Therefore, you have to set a good value for depth. I usually set it to 3.
I have a lots of images of paper cards of different shades of colors. Like all blues, or all reds, etc. In the images, they are held up to different objects that are of that color.
I want to write a program to compare the color to the shades on the card and choose the closest shade to the object.
however I realize that for future images my camera is going to be subject to lots of different lighting. I think I should convert into HSV space.
I'm also unsure of what type of distance measure I should use. Given some sort of blobs from the cards, I could average over the HSV and simply see which blob's average is the closest.
But I welcome any and all suggestions, I want to learn more about what I can do with OpenCV.
EDIT: A sample
Here I want to compare the filled in red of the 6th dot to see it is actually the shade of the 3rd paper rectangle.
I think one possibility is to do the following:
Color histograms from Hue and Saturation channels
compute the color histogram of the filled circle.
compute color histogram of the bar of paper.
compute a distance using histogram distance measures.
Possibilities here includes:
Chi square,
Earthmover distance,
Bhattacharya distance,
Histogram intersection etc.
Check this opencv link for details on computing histograms
Check this opencv link for details on the histogram comparisons
Note that when computing the color histograms, convert your images to HSV colorspace as you yourself suggested. Then, there is 2 things to note here.
[EDITED to make this a suggestion rather than a must do because I believe V channel might be necessary to differentiate the shades. Anyhow, try both and go with the one giving better result. Apologies if this sent you off track.] One possibility is to only use the Hue and Saturation channels i.e. you build a 2D
histogram rather than a 3D one consisting of values from the hue and
saturation channels. The reason for doing so is that the variation
in lighting is most felt in the V channel. This, together with the
use of histograms, should hopefully make your comparisons more
robust to lighting changes. There is some discussion on ignoring the
V channel when building color histograms in this post here. You
might find the references therein useful.
Normalize the histograms using the opencv functions. This is to
account for the different sizes of the patches of material (your
small circle vs the huge color bar has different number of pixels).
You might also wish to consider performing some form of preprocessing to "stretch" the color in the image e.g. using histogram equalization or an "S curve" mapping so that the different shades of color get better separated. Then compute the color histograms on this processed image. Keep the information for the mapping and perform it on new test samples before computing their color histograms.
Using ML for classification
Besides simply computing the distance and taking the closest one (i.e. a 1 nearest neighbor classifier), you might want to consider training a classifier to do the classification for you. One reason for doing so is that the training of the classifier will hopefully learn some way to differentiate between the different shades of hues since it has access to them during the training phase and is required to differentiate them. Notice that simply computing a distance, i.e. your suggested method, may not have this property. Hopefully this will give better classification.
The features use in the training can still be the color histograms that I mention above. That is, you compute color histograms as described above for your training samples and pass this to the classifier along with their class (i.e. which shade they are). Then, when you wish to classify a test sample, you likewise compute a color histogram and pass it to the classifier and it will return you the class (shade of color in your case) the color of the test sample belongs to.
Potential problems when training a classifier rather than using a simple distance comparison based approach as you have suggested is partly the added complexity of the program as well as potentially getting bad results when the training data is not good. There is also going to be a lot of parameter tuning involved to get it to work well.
See the opencv machine learning tutorials here for more details. Note that in the examples in the link, the classifier only differentiate between 2 classes whereas you have more than 2 shades of color. This is not a problem as the classifiers in general can work with more than 2 classes.
Hope this helps.