I want to count unique perons in video.
I used Mean square error , Sift algorithm and pre-trained model (Deep_ranking)
https://github.com/USCDataScience/Image-Similarity-Deep-Ranking
to check similarity but I doesn't get good results.
What is the best way to check similarity between persons is for you to decide, because it depends on your task.
But you may get interested in "person re-identification" problem.
https://github.com/bismex/Awesome-person-re-identification
Also you may want to try metric learning - approach, when you get feature vector for each sample (image, for example) using neural network (or other ML algorithm).
Usually you can just take neural network for classification (for example, VGG or ResNet), train it on N classes, where N is a number of persons in your dataset, and then take output from the layer before class predictions. So you get feature vectors for your images and then you can compute euclidean or other distance. Images with small distance probably can be from the same person, images with large distance - from different persons.
I personally tried this solution:
https://elib.dlr.de/116408/1/WACV2018.pdf
https://github.com/nwojke/cosine_metric_learning
Related
I am working on face recognition project using deep learning architecture to classify the images into respective classes. The output of network at softmax layer is the predicted class label and the output of last but one layer at the dense layer is a feature representation of the input image. Here the feature vector is a 1-D matrix of size 1000 for each image. Predicting classes is recognition type problem, but I'm interested in verification problem.
So given two sample images, I need to compare the similarity/dissimilarity score between two given images using their feature representations. If the match score is greater than the threshold then it's a hit else no hit. Please let me know if there are any standard approaches?
Example of similar faces (which should ideally generate matchscore>threshold): https://3c1703fe8d.site.internapcdn.net/newman/gfx/news/hires/2014/yvyughbujh.jpg
Your project has two solutions:
Train your own network (using pretrained one) with output in 1000 classes. This approach is not the simplest one because of the necessity of having enough (say huge) amount of data for each class, approximately 1000 samples per class.
Another approach is to use Distance Metrics Learning. By this "distance" we usually mean Euclidean norm. This approach is much wider and deeper than just extract features and match them to the nearest one. Try to search for it.
Good luck!
I have a set of labeled training data, and I am training a ML algorithm to predict the label. However, some of my data points are more important than others. Or, analogously, these points have less uncertainty than the others.
Is there a general method to include an importance-representing weight to each training point in the model? Are there instead some specific models which are capable of this while others are not?
I can imagine duplicating these points (and perhaps smearing their features slightly to avoid exact duplicates), or downsampling the less important points. Is there a more elegant way to approach this problem?
Scikit-learn allows you to pass an array of sample weights while fitting the model. Vowpal Wabbit (an online ML library) also has this option.
Recently I started to play with tensorflow, while trying to learn the popular algorithms i am in a situation where i need to find similarity between images.
Image A is supplied to the system by me, and userx supplies an image B and the system should retrieve image A to the userx if image B is similar(color and class).
Now i have got few questions:
Do we consider this scenario to be supervised learning? I am asking
because i don't see it as a classification problem(confused!!)
What algorithms i should use to train etc..
Re-training should be done quite often, how should i tackle this
problem so i don't train everytime from scratch( fine-tuning??)
Do we consider this scenario to be supervised learning?
It is supervised learning when you have labels to optimize your model. So for most neural networks, it is supervised.
However, you might also look at the complete task. I guess you don't have any ground truth for image pairs and the "desired" similarity value your model should output?
One way to solve this problem which sounds inherently unsupervised is to take a CNN (convolutional neural network) trained (in a supervised way) on the 1000 classes of image net. To get the similarity of two images, you could then simply take the euclidean distance of the output probability distribution. This will not lead to excellent results, but is probably a good starter.
What algorithms i should use to train etc..
First, you should define what "similar" means for you. Are two images similar when they contain the same object (classes)? Are they similar if the general color of the image is the same?
For example, how similar are the following 3 pairs of images?
Have a look at FaceNet and search for "Content based image retrieval" (CBIR):
Wikipedia
Google Scholar
This can be a supervised learning. You can classify the images into categories, if two images are in the same categories (or close in a category), you can think of them as similar.
You can use the deep conventional neural networks for imagenet such as inception model. The inception model outputs a probability map for 1000 classes (which is a vector whose values sum to 1). You can calculate the distance of vectors of two images to get their similarity.
On the same page of the inception model, you will also find the instructions to retrain a model: https://github.com/tensorflow/models/tree/master/inception#how-to-fine-tune-a-pre-trained-model-on-a-new-task
I have a collection of documents related to a particular domain and have trained the centroid classifier based on that collection. What I want to do is, I will be feeding the classifier with documents from different domains and want to determine how much they are relevant to the trained domain. I can use the cosine similarity for this to get a numerical value but my question is what is the best way to determine the threshold value?
For this, I can download several documents from different domains and inspect their similarity scores to determine the threshold value. But is this the way to go, does it sound statistically good? What are the other approaches for this?
Actually there is another issue with centroids in sparse vectors. The problem is that they usually are significantly less sparse than the original data. For examples, this increases computation costs. And it can yield vectors that are themselves actually atypical because they have a different sparsity pattern. This effect is similar to using arithmetic means of discrete data: say the mean number of doors in a car is 3.4; yet obviously no car exists that actually has 3.4 doors. So in particular, there will be no car with an euclidean distance of less than 0.4 to the centroid! - so how "central" is the centroid then really?
Sometimes it helps to use medoids instead of centroids, because they actually are proper objects of your data set.
Make sure you control such effects on your data!
A simple method to try would be to employ various machine-learning algorithms - and in particular, tree-based ones - on the distances from your centroids.
As mentioned in another answer(#Anony-Mousse), this won't necessarily provide you with good or usable answers, but it just might. Using a ML framework for this procedure, E.g. WEKA, will also help you with estimating your accuracy in a more rigorous manner.
Here are the steps to take, using WEKA:
Generate a train set by finding a decent amount of documents representing each of your classes (to get valid estimations, I'd recommend at least a few dozens per class)
Calculate the distance from each document to each of your centroids.
Generate a feature vector for each such document, composed of the distances from this document to the centroids. You can either use a single feature - the distance to the nearest centroid; or use all distances, if you'd like to try a more elaborate thresholding scheme. For example, if you chose the simpler method of using a single feature, the vector representing a document with a distance of 0.2 to the nearest centroid, belonging to class A would be: "0.2,A"
Save this set in ARFF or CSV format, load into WEKA, and try classifying, e.g. using a J48 tree.
The results would provide you with an overall accuracy estimation, with a detailed confusion matrix, and - of course - with a specific model, e.g. a tree, you can use for classifying additional documents.
These results can be used to iteratively improve the models and thresholds by collecting additional train documents for problematic classes, either by recreating the centroids or by retraining the thresholds classifier.
I have a large set of plant images labeled with the botanical name. What would be the best algorithm to use to train on this dataset in order to classify an unlabel photo? The photos are processed so that 100% of the pixels contain the plant (e.g. either closeups of the leaves or bark), so there are no other objects/empty-space/background that the algorithm would have to filter out.
I've already tried generating SIFT features for all the photos and feeding these (feature,label) pairs to a LibLinear SVM, but the accuracy was a miserable 6%.
I also tried feeding this same data to a few Weka classifiers. The accuracy was a little better (25% with Logistic, 18% with IBk), but Weka's not designed for scalability (it loads everything into memory). Since the SIFT feature dataset is a several million rows, I could only test Weka with a random 3% slice, so it's probably not representative.
EDIT: Some sample images:
Normally, you would not train on the SIFT features directly. Cluster them (using k-means) and then train on the histogram of cluster membership identifiers (i.e., a k-dimensional vector, which counts, at position i, how many features were assigned to the i-th cluster).
This way, you obtain a single output per image (and a single, k-dimensional, feature vector).
Here's the quasi-code (using mahotas and milk in Pythonn):
from mahotas.surf import surf
from milk.unsupervised.kmeans import kmeans,assign_centroids
import milk
# First load your data:
images = ...
labels = ...
local_features = [surfs(im, 6, 4, 2) for im in imgs]
allfeatures = np.concatenate(local_features)
_, centroids = kmeans(allfeatures, k=100)
histograms = []
for ls in local_features:
hist = assign_centroids(ls, centroids, histogram=True)
histograms.append(hist)
cmatrix, _ = milk.nfoldcrossvalidation(histograms, labels)
print "Accuracy:", (100*cmatrix.trace())/cmatrix.sum()
This is a fairly hard problem.
You can give BoW model a try.
Basically, you extract SIFT features on all the images, then use K-means to cluster the features into visual words. After that, use the BoW vector to train you classifiers.
See the Wikipedia article above and the references papers in that for more details.
You probably need better alignment, and probably not more features. There is no way you can get acceptable performance unless you have correspondences. You need to know what points in one leaf correspond to points on another leaf. This is one of the "holy grail" problems in computer vision.
People have used shape context for this problem. You should probably look at this link. This paper describes the basic system behind leafsnap.
You can implement the BoW model according to this Bag-of-Features Descriptor on SIFT Features with OpenCV. It is a very good tutorial to implement the BoW model in OpenCV.