Training sets for AdaBoost algorithm - image-processing

How do you find the negative and positive training data sets of Haar features for the AdaBoost algorithm? So say you have a certain type of blob that you want to locate in an image and there are several of them in your entire array - how do you go about training it? I'd appreciate a nontechnical explanation as much as possible. I'm new to this. Thanks.

First, AdaBoost does not necessarily have anything to do with Haar features. AdaBoost is a learning algorithm that combines weak learners to form a strong learner. Haar features are just a type of data on which an AdaBoost algorithm can learn.
Second, the best way to get them is to prearrange your data. So, if you want to do facial recognition a la Viola and Jones, you'll want to mark the faces in your images in a mask/overlay image. When you're training, you select samples from the image, as well as whether the sample you select is positive or negative. That positivity/negativity comes from your previous marking of the face (or whatever) in the image.
You'll have to make the actual implementation yourself, but you can use existing projects to either guide you, or you can modify their projects.

Related

Image Classification using Single Class Dataset using Transfer Learning

I only have around 1000 images of vehicle. I need to train a model that can identify if the image is vehicle or not-vehicle. I do not have a dataset for not-vehicle, as it could be anything besides vehicle.
I guess the best method for this would be to apply transfer learning. I am trying to train data on a pre-trained VGG19 Model. But still, I am unaware on how to train a model with just vehicle images without any non-vehicle images. I am not being able to classify it.
I am new to ML Overall, Any solution based on practical implementation will be highly appreciated.
You are right about transfer learning approach. Have a look a this article, it is exactly about going from multi-class to binary classification with transfer learning - https://medium.com/#mandygu/seefood-creating-a-binary-classifier-using-transfer-learning-da751db7cf9c
You can try using pretrained model and take the output. You might need to apply dimensionality reduction e.g. PCA, to get a more managable size input. After that you can train novelty detection model to identify whether the output is different than your training set.
Refer to this example: https://github.com/J-Yash/Hotdog-Not-Hotdog
Hope this helps.
This is a binary classification problem: whether the input is a vehicle or not.
If you are new to ML, I would suggest you should start implementing basic binary classifiers like Logistic Regression, Support Vector Machines before jumping to Convolutional Neural Networks (CNNs).
I am providing some links for the binary classification problem implementations using different algorithms. I hope this would help.
Logistic Regression: https://github.com/JB1984/Logistic-Regression-Cat-Classifier
SVM: https://github.com/Witsung/SVM-Fruit-Image-Classifier
CNN: https://github.com/A-Jatin/CNN-implementation-for-binary-image-classification

Machine Learning: Weighting Training Points by Importance

I have a set of labeled training data, and I am training a ML algorithm to predict the label. However, some of my data points are more important than others. Or, analogously, these points have less uncertainty than the others.
Is there a general method to include an importance-representing weight to each training point in the model? Are there instead some specific models which are capable of this while others are not?
I can imagine duplicating these points (and perhaps smearing their features slightly to avoid exact duplicates), or downsampling the less important points. Is there a more elegant way to approach this problem?
Scikit-learn allows you to pass an array of sample weights while fitting the model. Vowpal Wabbit (an online ML library) also has this option.

Obtain negative results from a machine learning algorithm

I have a set of images of a particular object. I want to find if some of these has anomalies with a machine learning algorithm. For example if I have many photos of glasses I want to find if one of these is broken or has something anomalous. Something like this:
GOOD!!
BAD!!
(Obviously I will use the same kind of glasses...)
The problem is that I don't know every negative situation, so, for training, I have only positive images.
In other words I want an algorithm that recognize if an image has something different from the dataset. Do you have any suggestion?
In particular is there a way to use convolutional neural network?
What you are looking for is usually called anomaly, outlier, or novelty detection. You have lots of examples of what your data should look like, and you want to know when something doesn't look like your data.
A good approach for this problem, since you are using images, you can get a feature vectorized version using a pre-trained CNN on image net. Then you can use an anomaly detector on that feature set. The isolation forest should be an easier one to get working.
This is a typical Classification problem. I do not understand why you need CNN for this ......
My suggestion would be to build/train a classification model
comprising of only GOOD images of glass. Here you would possibly
have all kinds of glasses that are intact with a regular shape.
If the model encounters anything other than GOOD images, it will
classify those as BAD images. This so called BAD images may
include cracked/broken glasses having an irregular shape.
Another option that might work is to use an autoencoder.
Autoencoders are unsupervised neural networks with bottleneck architecture that try to reconstruct its own input.
We could train a deep convolutional autoencoder with examples of good glasses so that it gets specialized in reconstructing those type of images. You don't need to train autoencoder with bad glasses.
Therefore I would expect the trained autoencoder to produce low error rate for good glasses and high error rate for bad glasses. Error rate could be measured with MSE based on the difference between the reconstructed and original values (pixels).
From the trained autoencoder you can plot the MSEs for good vs bad glasses to help you define the right threshold. Or you can also try statistic thresholds such as: avg + 2*std, median + 2*MAD, etc.
Autoencoder details:
http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/
Deep autoencoder for images:
https://cds.cern.ch/record/2209085/files/Outlier%20detection%20using%20autoencoders.%20Olga%20Lyudchick%20(NMS).pdf

Image similarity detection with TensorFlow

Recently I started to play with tensorflow, while trying to learn the popular algorithms i am in a situation where i need to find similarity between images.
Image A is supplied to the system by me, and userx supplies an image B and the system should retrieve image A to the userx if image B is similar(color and class).
Now i have got few questions:
Do we consider this scenario to be supervised learning? I am asking
because i don't see it as a classification problem(confused!!)
What algorithms i should use to train etc..
Re-training should be done quite often, how should i tackle this
problem so i don't train everytime from scratch( fine-tuning??)
Do we consider this scenario to be supervised learning?
It is supervised learning when you have labels to optimize your model. So for most neural networks, it is supervised.
However, you might also look at the complete task. I guess you don't have any ground truth for image pairs and the "desired" similarity value your model should output?
One way to solve this problem which sounds inherently unsupervised is to take a CNN (convolutional neural network) trained (in a supervised way) on the 1000 classes of image net. To get the similarity of two images, you could then simply take the euclidean distance of the output probability distribution. This will not lead to excellent results, but is probably a good starter.
What algorithms i should use to train etc..
First, you should define what "similar" means for you. Are two images similar when they contain the same object (classes)? Are they similar if the general color of the image is the same?
For example, how similar are the following 3 pairs of images?
Have a look at FaceNet and search for "Content based image retrieval" (CBIR):
Wikipedia
Google Scholar
This can be a supervised learning. You can classify the images into categories, if two images are in the same categories (or close in a category), you can think of them as similar.
You can use the deep conventional neural networks for imagenet such as inception model. The inception model outputs a probability map for 1000 classes (which is a vector whose values sum to 1). You can calculate the distance of vectors of two images to get their similarity.
On the same page of the inception model, you will also find the instructions to retrain a model: https://github.com/tensorflow/models/tree/master/inception#how-to-fine-tune-a-pre-trained-model-on-a-new-task

Which classification algorithm to choose?

I would like to classify text documents into four categories. Also I have lot of samples which are already classified that can be used for training. I would like the algorithm to learn on the fly.. please suggest an optimal algorithm that works for this requirement.
If by "on the fly" you mean online learning (where training and classification can be interleaved), I suggest the k-nearest neighbor algorithm. It's available in Weka and in the package TiMBL.
A perceptron will also be able to do this.
"Optimal" isn't a well-defined term in this context.
there are several algorithms which can be learned on fly. Examples: k-nearest neighbors, naive Bayes, neural networks. You can try how appropriate each of these methods are on a sample corpus.
Since you have unlabeled data you might want to use a model where this helps. The first thing that comes to my mind is nonlinear NCA: Learning a Nonlinear Embedding by Preserving
Class Neighbourhood Structure, (Salakhutdinov, Hinton).
Well....I have to say that document classification is kind of different what you guys are thinking.
Typically, in document classification, after preprocessing, the test data is always extremely huge, for example, O(N^2)...Therefore it might be too computationally expensive.
The another typical classifier that came into my mind is discriminant classifier...which doesn't need the generative model for your dataset. After training, you have to do is to put your single entry to the algorithm, and it is gonna be classified.
Good luck with this. For example, you can check E. Alpadin's book, Introduction to Machine Learning.

Resources