Is it possible to use histogram intersection / chi-square kernels in LIBLINEAR?
My problem: I have a feature vector of size 5000, all histogram features. I don't know how to train and test this with an SVM.
How can I train this using an SVM?
LibSVM supports 4 types of kernels:
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- radial basis function: exp(-gamma*|u-v|^2)
3 -- sigmoid: tanh(gamma*u'*v + coef0)
LibSVM supports a linear kernel; in that case, what is the difference between LibSVM and LIBLINEAR?
No, you can't use custom kernels in liblinear.
To do what you want to do, you'll need to use LibSVM and the "precomputed kernel" option, where you supply the Gram matrix (this is described in the LibSVM README).
In the case of linear kernels, LibSVM and LibLinear produce similar results. The author says this:
Their predictions are similar but hyperplanes are different. Libsvm
solves L1-loss SVM, but liblinear solves L2-regularized logistic
regression and L2-loss SVM.
A bit late, but it might help others: the machine-learning package scikit-learn (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.chi2_kernel.html#sklearn.metrics.pairwise.chi2_kernel) offers at least the chi2 kernel.
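For instance, here is a minimal sketch (my addition, with random illustrative histograms) of plugging scikit-learn's chi2_kernel into an SVC with a precomputed kernel:
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

# illustrative data: non-negative histogram features
X_train = np.random.rand(100, 5000)
y_train = np.random.randint(0, 2, 100)
X_test = np.random.rand(20, 5000)

# Gram matrices: (train, train) for fitting, (test, train) for prediction
K_train = chi2_kernel(X_train, X_train)
K_test = chi2_kernel(X_test, X_train)

clf = SVC(kernel='precomputed')
clf.fit(K_train, y_train)
pred = clf.predict(K_test)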
You can use a linear SVM solver only if you explicitly map your features into a non-linear feature space. I recommend reading the following (a sketch using such an explicit map follows the list):
"Max-Margin additive classifiers for detection" - http://www.cs.berkeley.edu/~smaji/papers/mcd-free-lunch-iccv-09.pdf
"Random features for large-scale kernel machines" - http://berkeley.intel-research.net/arahimi/papers/rahimi-recht-random-features.pdf
"Efficient Additive Kernels via Explicit Feature Maps" - http://www.vlfeat.org/~vedaldi/assets/pubs/vedaldi11efficient.pdf
I have just been using the chi2 kernel in LibSVM these days. I paste the code here; I hope it can be useful.
function [chi2_ans] = chi2_kernel(x, y)
    % chi-squared kernel between every row of x and every row of y:
    % k(u, v) = 1 - 2 * sum((u - v).^2 ./ (u + v + eps))
    f = @(u, v) 1 - sum(((u' - v') .* (u' - v')) ./ (u' + v' + eps) * 2);
    chi2_ans = zeros(size(x, 1), size(y, 1));
    for i = 1:size(x, 1)
        veci = x(i, :);
        for j = 1:size(y, 1)
            vecj = y(j, :);
            chi2_ans(i, j) = f(veci, vecj);
        end
    end
end
And use it:
function [acc] = singleChi2Kernel(trainData, testData, trainLabel, testLabel)
    numTrain = size(trainData, 1);
    numTest = size(testData, 1);

    % compute kernel matrices between every pair of (train,train) and
    % (test,train) instances and include the sample serial number as the
    % first column, as LibSVM's precomputed-kernel format requires
    K  = [(1:numTrain)', chi2_kernel(trainData, trainData)];
    KK = [(1:numTest)',  chi2_kernel(testData,  trainData)];

    % train and test ('-t 4' selects the precomputed kernel)
    model = svmtrain(trainLabel, K, '-t 4');
    [predClass, acc, decVals] = svmpredict(testLabel, KK, model);

    % confusion matrix
    %C = confusionmat(testLabel, predClass)
end
The code is adapted from this link.
I have recently started experimenting with OneClassSVM (using Sklearn) for unsupervised learning, and I followed this example.
I apologize for the silly questions, but I'm a bit confused about two things:
Should I train my SVM on both regular examples as well as the outliers, or is the training on regular examples only?
Which of the labels predicted by the OneClassSVM represents outliers: is it 1 or -1?
Once again I apologize for those questions, but for some reason I cannot find this documented anywhere.
As the example you reference is about novelty detection, the docs say:
novelty detection:
The training data is not polluted by outliers, and we are interested in detecting anomalies in new observations.
Meaning: you should train on regular examples only.
The approach is based on:
Schölkopf, Bernhard, et al. "Estimating the support of a high-dimensional distribution." Neural computation 13.7 (2001): 1443-1471.
Extract:
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1.
We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement.
The above docs also say:
Inliers are labeled 1, while outliers are labeled -1.
This can also be seen in your example code, extracted:
# Generate some regular novel observations
X = 0.3 * np.random.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
...
# all regular = inliers (defined above)
y_pred_test = clf.predict(X_test)
...
# -1 = outlier <-> error as assumed to be inlier
n_error_test = y_pred_test[y_pred_test == -1].size
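Putting both points together, a minimal sketch (my addition, with toy Gaussian data and arbitrary nu/gamma values) looks like this:
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = 0.3 * rng.randn(100, 2)                       # regular examples only
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))  # abnormal observations

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)  # train on inliers only

print(clf.predict(X_train)[:5])     # mostly  1 -> inliers
print(clf.predict(X_outliers)[:5])  # mostly -1 -> outliers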
I'm very new to scikit learn and machine learning in general.
I am currently designing an SVM to predict if a specific amino acid sequence will be cut by a protease. So far the SVM method seems to be working quite well.
I'd like to visualize the distance between the two categories (cut and uncut), so I'm trying to use linear discriminant analysis, which is similar to principal component analysis, using the following code:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis(n_components=2)
targs = np.array([1 if _ else 0 for _ in XOR_list])
DATA = np.array(data_list)
X_r2 = lda.fit(DATA, targs).transform(DATA)
plt.figure()
for c, i, target_name in zip("rg", [1, 0], ["Cleaved", "Not Cleaved"]):
    plt.scatter(X_r2[targs == i], X_r2[targs == i], c=c, label=target_name)
plt.legend()
plt.title('LDA of cleavage_site dataset')
However, LDA is only giving a 1D result:
In: print X_r2[:5]
Out: [[ 6.74369996]
[ 4.14254941]
[ 5.19537896]
[ 7.00884032]
[ 3.54707676]]
However, the PCA analysis gives 2 dimensions with the data I am inputting:
pca = PCA(n_components=2)
X_r = pca.fit(DATA).transform(DATA)
print X_r[:5]
Out: [[ 0.05474151 0.38401203]
[ 0.39244191 0.74113729]
[-0.56785236 -0.30109694]
[-0.55633116 -0.30267444]
[ 0.41311866 -0.25501662]]
Edit: here is a link to two Google Docs with the input data. I am not using the sequence information, just the numerical information that follows. The files are split between positive and negative control data.
Input data:
file1
file2
LDA is not a general dimensionality-reduction technique the way PCA is. LDA is first and foremost a classifier; the fact that people visualize its decision function is just a side effect, and, unfortunately for your use case, the decision function for a binary problem (2 classes) is 1-dimensional. There is nothing wrong with your code; this is how the decision function of every linear binary classifier looks.
In general, for 2 classes you get at most a 1-dimensional projection, and for K > 2 classes you can get up to a (K-1)-dimensional projection. With other decomposition techniques (like one-vs-one) you can go up to K(K-1)/2 dimensions, but again, only for more than 2 classes.
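You can verify the cap directly; a small sketch (my addition, with random data):
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.rand(50, 10)
y = np.random.randint(0, 2, 50)

# n_components is capped at n_classes - 1, i.e. 1 for a binary problem
lda = LinearDiscriminantAnalysis()
X_r = lda.fit(X, y).transform(X)
print(X_r.shape)  # (50, 1)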
I have been reading more modern posts about sentiment classification (analysis) such as this.
Taking the IMDB dataset as an example, I find that I get a similar accuracy percentage using Doc2Vec (88%); however, I get a far better result using a simple tfidf vectoriser with tri-grams for feature extraction (91%). I think this is similar to Table 2 in Mikolov's 2015 paper.
I thought that by using a bigger dataset this would change. So I re-ran my experiment using a breakdown of 1 million training and 1 million test documents from here. Unfortunately, in that case my tfidf vectoriser feature extraction method increased to 93%, but doc2vec fell to 85%.
I was wondering if this is to be expected, and whether others find tfidf to be superior to doc2vec even for a large corpus?
My data-cleaning is simple:
from bs4 import BeautifulSoup

def clean_review(review):
    # strip HTML, pad punctuation with spaces, lowercase, normalize whitespace
    temp = BeautifulSoup(review, "lxml").get_text()
    punctuation = """.,?!:;(){}[]"""
    for char in punctuation:
        temp = temp.replace(char, ' ' + char + ' ')
    words = " ".join(temp.lower().split()) + "\n"
    return words
And I have tried using 400 and 1200 features for the Doc2Vec model:
model = Doc2Vec(min_count=2, window=10, size=model_feat_size, sample=1e-4, negative=5, workers=cores)
Whereas my tfidf vectoriser has 40,000 max features:
vectorizer = TfidfVectorizer(max_features = 40000, ngram_range = (1, 3), sublinear_tf = True)
For classification I experimented with a few linear methods, but found simple logistic regression to do OK...
The example code Mikolov once posted (https://groups.google.com/d/msg/word2vec-toolkit/Q49FIrNOQRo/J6KG8mUj45sJ) used options -cbow 0 -size 100 -window 10 -negative 5 -hs 0 -sample 1e-4 -threads 40 -binary 0 -iter 20 -min-count 1 -sentence-vectors 1 – which in gensim would be similar to dm=0, dbow_words=1, size=100, window=10, hs=0, negative=5, sample=1e-4, iter=20, min_count=1, workers=cores.
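For reference, a hedged sketch of the gensim equivalent (my addition; parameter names follow the pre-4.0 gensim API used elsewhere in this thread, and train_docs is a placeholder corpus):
import multiprocessing
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

cores = multiprocessing.cpu_count()
train_docs = [TaggedDocument(words=["sample", "text"], tags=[0])]  # placeholder

model = Doc2Vec(dm=0, dbow_words=1, size=100, window=10, hs=0,
                negative=5, sample=1e-4, iter=20, min_count=1, workers=cores)
model.build_vocab(train_docs)
model.train(train_docs, total_examples=model.corpus_count, epochs=model.iter)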
My hunch is that optimal values might involve a smaller window and higher min_count, and maybe a size somewhere between 100 and 400, but it's been a while since I've run those experiments.
It can also sometimes help a little to re-infer vectors on the final model, using a larger-than-the-default passes parameter, rather than re-using the bulk-trained vectors. Still, these may just converge on similar performance to Tfidf – they're all dependent on the same word-features, and not very much data.
Going to a semi-supervised approach, where some of the document-tags represent sentiments where known, sometimes also helps.
I have a text classification task. So far I have only tagged a corpus and extracted some features in bigram format (i.e., bigram = [('word', 'word'), ..., ('word', 'word')]). I would like to classify some text. As I understand it, the SVM algorithm can only receive vectors in order to classify, so I use a vectorizer from scikit-learn as follows:
from sklearn.feature_extraction import FeatureHasher

bigram = [[('load', 'superior'),
           ('point', 'medium'), ('color', 'white'),
           ('the load', 'tower')]]

fh = FeatureHasher(input_type='string')
X = fh.transform(((' '.join(x) for x in sample)
                  for sample in bigram))
print X
the output is a sparse matrix:
(0, 226456) -1.0
(0, 607603) -1.0
(0, 668514) 1.0
(0, 715910) -1.0
How can I use the previous sparse matrix X to classify with SVC, assuming that I have 2 classes and a train and a test set?
As others have pointed out, your matrix is just a list of feature vectors for the documents in your corpus. Use these vectors as features for classification; you just need classification labels y, and then you can use SVC().fit(X, y).
But... the way that you have asked this makes me think that maybe you don't have any classification labels. In this case, I think you want to be doing clustering rather than classification. You could use one of the clustering algorithms to do this; I suggest sklearn.cluster.MiniBatchKMeans to start. You can then output the top 5-10 words for each cluster and form labels from those.
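A minimal sketch of both routes (my addition; the labels y and the two tiny documents are hypothetical):
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.svm import SVC
from sklearn.cluster import MiniBatchKMeans

docs = [["load superior", "point medium"],
        ["color white", "the load tower"]]
fh = FeatureHasher(input_type='string')
X = fh.transform(docs)

# supervised route: needs labels
y = np.array([0, 1])
clf = SVC(kernel='linear').fit(X, y)
print(clf.predict(X))

# unsupervised route: no labels needed
km = MiniBatchKMeans(n_clusters=2).fit(X)
print(km.labels_)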
I am currently working on a project where I have to extract the facial expression of a user (only one user at a time, from a webcam), such as sad or happy.
My method for classifying facial expressions is:
Use OpenCV to detect the face in the image
Use ASM and STASM to get the facial feature points
Now I'm trying to do facial expression classification.
Is SVM a good option? And if it is, how can I start with SVM?
How am I going to train the SVM for every emotion using these landmarks?
Yes, SVMs have repeatedly been shown to perform well on this task. There have been dozens (if not hundreds) of papers describing such procedures.
For example:
Simple paper
Longer paper
Poster about it
More complex example
Some basic resources on SVMs themselves can be found at http://www.support-vector-machines.org/ (book titles, software links, etc.).
And if you are just interested in using them rather than understanding them, you can get one of the basic libraries:
libsvm http://www.csie.ntu.edu.tw/~cjlin/libsvm/
svmlight http://svmlight.joachims.org/
If you are already using OpenCV, I suggest you use the built-in SVM implementation; training/saving/loading in Python is as follows. C++ has a corresponding API to do the same in about the same amount of code. It also has train_auto to find the best parameters.
import numpy as np
import cv2

# four random 5-dimensional samples with binary labels
samples = np.array(np.random.random((4,5)), dtype = np.float32)
labels = np.array(np.random.randint(0,2,4), dtype = np.float32)

# linear C-SVC using the OpenCV 2.x API
svm = cv2.SVM()
svmparams = dict( kernel_type = cv2.SVM_LINEAR,
                  svm_type = cv2.SVM_C_SVC,
                  C = 1 )
svm.train(samples, labels, params = svmparams)

testresult = np.float32( [svm.predict(s) for s in samples])

print samples
print labels
print testresult

# persist and reload the trained model
svm.save('model.xml')
loaded = svm.load('model.xml')
And the output:
#print samples
[[ 0.24686454 0.07454421 0.90043277 0.37529686 0.34437731]
[ 0.41088378 0.79261768 0.46119651 0.50203663 0.64999193]
[ 0.11879266 0.6869216 0.4808321 0.6477254 0.16334397]
[ 0.02145131 0.51843268 0.74307418 0.90667248 0.07163303]]
#print labels
[ 0. 1. 1. 0.]
#print testresult
[ 0. 1. 1. 0.]
So you provide the n flattened shape models as samples, plus n labels, and you are good to go. You probably don't even need the ASM part; just apply some filters which are sensitive to orientation, like Sobel or Gabor, concatenate the matrices, flatten them, and feed them directly to the SVM. You can probably get maybe 70-90% accuracy.
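For example, a rough sketch of the filter idea (my addition; 'face.png' is a hypothetical cropped face image):
import cv2
import numpy as np

face = cv2.imread('face.png', 0)        # hypothetical grayscale face crop
face = cv2.resize(face, (64, 64))

gx = cv2.Sobel(face, cv2.CV_32F, 1, 0)  # horizontal gradients
gy = cv2.Sobel(face, cv2.CV_32F, 0, 1)  # vertical gradients

# concatenate and flatten into a single feature vector for the SVM above
sample = np.hstack([gx.flatten(), gy.flatten()]).astype(np.float32)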
As someone said, CNNs are an alternative to SVMs. Here are some links that implement LeNet-5. So far, I find SVMs much simpler to get started with.
https://github.com/lisa-lab/DeepLearningTutorials/
http://www.codeproject.com/Articles/16650/Neural-Network-for-Recognition-of-Handwritten-Digi
Edit:
Landmarks are just n (x,y) vectors, right? So why don't you try putting them into an array of size 2n and simply feeding them directly to the code above?
For example, 3 training samples of 4 landmarks (0,0), (10,10), (50,50), (70,70):
samples = [[0,0,10,10,50,50,70,70],
           [0,0,10,10,50,50,70,70],
           [0,0,10,10,50,50,70,70]]
labels = [0., 1., 2.]
# 0 = happy, 1 = angry, 2 = disgust
You could check this code to get an idea of how this could be done using SVM.
You can find the algorithm explained here.