I'm very new to scikit-learn and machine learning in general.
I am currently designing an SVM to predict whether a specific amino acid sequence will be cut by a protease. So far the SVM method seems to be working quite well.
I'd like to visualize the distance between the two categories (cut and uncut), so I'm trying to use linear discriminant analysis, which is similar to principal component analysis, using the following code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(n_components=2)
targs = np.array([1 if _ else 0 for _ in XOR_list])
DATA = np.array(data_list)
X_r2 = lda.fit(DATA, targs).transform(DATA)

plt.figure()
for c, i, target_name in zip("rg", [1, 0], ["Cleaved", "Not Cleaved"]):
    plt.scatter(X_r2[targs == i], X_r2[targs == i], c=c, label=target_name)
plt.legend()
plt.title('LDA of cleavage_site dataset')
However, LDA is only giving a 1D result:
In: print(X_r2[:5])
Out: [[ 6.74369996]
[ 4.14254941]
[ 5.19537896]
[ 7.00884032]
[ 3.54707676]]
However, PCA gives 2 dimensions with the same input data:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_r = pca.fit(DATA).transform(DATA)
print(X_r[:5])
Out: [[ 0.05474151 0.38401203]
[ 0.39244191 0.74113729]
[-0.56785236 -0.30109694]
[-0.55633116 -0.30267444]
[ 0.41311866 -0.25501662]]
Edit: here are links to two Google Docs with the input data. I am not using the sequence information, just the numerical information that follows. The files are split between positive and negative control data.
Input data:
file1
file2
LDA is not acting as a dimensionality reduction technique here: LDA is a classifier, and the fact that people visualize its decision function is just a side effect. Unfortunately for your use case, the decision function for a binary problem (2 classes) is 1-dimensional. There is nothing wrong with your code; this is how the decision function of every linear binary classifier looks.
In general, for 2 classes you get at most a 1-dimensional projection, and for K > 2 classes you can get up to a (K-1)-dimensional projection. With other decomposition schemes (like one-vs-one) you can go up to K(K-1)/2 decision functions, but again only for more than 2 classes.
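To see this cap concretely, here is a minimal sketch on made-up random data (not your dataset), assuming a reasonably recent scikit-learn:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
X = rng.randn(100, 5)           # 100 samples, 5 features
y2 = rng.randint(0, 2, 100)     # 2 classes
y3 = rng.randint(0, 3, 100)     # 3 classes

# n_components defaults to min(n_classes - 1, n_features)
print(LinearDiscriminantAnalysis().fit(X, y2).transform(X).shape)  # (100, 1)
print(LinearDiscriminantAnalysis().fit(X, y3).transform(X).shape)  # (100, 2)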
Related
I have recently started experimenting with OneClassSVM (using scikit-learn) for unsupervised learning, and I followed
this example.
I apologize for the silly questions, but I'm a bit confused about two things:
Should I train my SVM on both the regular examples and the outliers, or on regular examples only?
Which of the labels predicted by the OneClassSVM represents outliers: is it 1 or -1?
Once again I apologize for these questions, but for some reason I cannot find this documented anywhere.
As the example you reference is about novelty detection, the docs say:
novelty detection:
The training data is not polluted by outliers, and we are interested in detecting anomalies in new observations.
Meaning: you should train on regular examples only.
The approach is based on:
Schölkopf, Bernhard, et al. "Estimating the support of a high-dimensional distribution." Neural computation 13.7 (2001): 1443-1471.
Extract:
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1.
We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement.
The above docs also say:
Inliers are labeled 1, while outliers are labeled -1.
This can also be seen in your example code, extracted:
# Generate some regular novel observations
X = 0.3 * np.random.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
...
# all regular = inliers (defined above)
y_pred_test = clf.predict(X_test)
...
# -1 = outlier <-> error as assumed to be inlier
n_error_test = y_pred_test[y_pred_test == -1].size
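Putting both parts together, a minimal sketch with made-up data: train on regular examples only, then read +1 as inlier and -1 as outlier.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = 0.3 * rng.randn(100, 2)       # regular examples only

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)

X_new = np.array([[0.0, 0.1],           # close to the training cloud
                  [4.0, 4.0]])          # far away from it
print(clf.predict(X_new))               # expect [ 1 -1]: 1 = inlier, -1 = outlier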
I'm working on implementing an interface between a TensorFlow basic LSTM that's already been trained and a JavaScript version that can run in the browser. The problem is that in all of the literature I've read, LSTMs are modeled as mini-networks (using only connections, nodes and gates), while TensorFlow seems to have a lot more going on.
The two questions that I have are:
Can the TensorFlow model be easily translated into a more conventional neural network structure?
Is there a practical way to map the trainable variables that TensorFlow gives you to this structure?
I can get the 'trainable variables' out of TensorFlow; the issue is that they appear to have only one bias value per LSTM node, whereas most models I've seen would include several biases: for the memory cell, the inputs, and the output.
Internally, the LSTMCell class stores the LSTM weights as one big matrix instead of 8 smaller ones for efficiency purposes. It is quite easy to divide it horizontally and vertically to get to the more conventional representation. However, it might be easier and more efficient if your library does a similar optimization.
Here is the relevant piece of code from BasicLSTMCell:
concat = linear([inputs, h], 4 * self._num_units, True)
# i = input_gate, j = new_input, f = forget_gate, o = output_gate
i, j, f, o = array_ops.split(1, 4, concat)
The linear function does the matrix multiplication that transforms the concatenated input and the previous h state into 4 matrices of shape [batch_size, self._num_units]. That linear transformation uses the single weight matrix and bias variables you're referring to in the question. The result is then split into the different gates used by the LSTM update.
If you'd like to explicitly get the transformations for each gate, you can split that matrix and bias into 4 blocks. It is also quite easy to implement it from scratch using 4 or 8 separate linear transformations.
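For illustration, a sketch of that splitting in plain NumPy (not TensorFlow's own API), assuming the [i, j, f, o] ordering shown above; W and b stand for the concatenated weight matrix and bias you extracted from the trainable variables:

import numpy as np

def split_lstm_params(W, b, input_size, num_units):
    # W has shape [input_size + num_units, 4 * num_units], b has shape [4 * num_units]
    W_x, W_h = W[:input_size], W[input_size:]        # horizontal split: input rows vs. previous-h rows
    gates = {}
    for k, name in enumerate(["i", "j", "f", "o"]):  # vertical split: one column block per gate
        cols = slice(k * num_units, (k + 1) * num_units)
        gates[name] = (W_x[:, cols], W_h[:, cols], b[cols])
    return gates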
I have a text classification task. So far I have only tagged a corpus and extracted some features in bigram format (i.e. bigram = [('word', 'word'), ..., ('word', 'word')]). I would like to classify some text. As I understand it, the SVM algorithm can only receive vectors in order to classify, so I use a vectorizer from scikit-learn as follows:
from sklearn.feature_extraction import FeatureHasher

bigram = [[('load', 'superior'),
           ('point', 'medium'), ('color', 'white'),
           ('the load', 'tower')]]

fh = FeatureHasher(input_type='string')
X = fh.transform([' '.join(x) for x in sample] for sample in bigram)
print(X)
the output is a sparse matrix:
(0, 226456) -1.0
(0, 607603) -1.0
(0, 668514) 1.0
(0, 715910) -1.0
How can I use the previous sparse matrix X to classify with SVC, assuming that I have 2 classes and a train and test set?
As others have pointed out, your matrix is just a list of feature vectors for the documents in your corpus. Use these vectors as features for classification. You just need classification labels y and then you can use SVC().fit(X, y).
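A minimal sketch with made-up labels (one 0/1 label per document); SVC accepts the sparse matrix from FeatureHasher directly:

from sklearn.feature_extraction import FeatureHasher
from sklearn.svm import SVC

bigrams = [["load superior", "point medium"],    # document 0
           ["color white", "the load tower"]]    # document 1
y = [0, 1]                                       # hypothetical class labels

X = FeatureHasher(input_type='string').transform(bigrams)
clf = SVC(kernel='linear').fit(X, y)
print(clf.predict(X))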
But... the way that you have asked this makes me think that maybe you don't have any classification labels. In this case, I think you want to be doing clustering rather than classification. You could use one of the clustering algorithms to do this. I suggest sklearn.cluster.MiniBatchKMeans to start. You can then output the top 5-10 words for each cluster and form labels from those.
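A minimal clustering sketch along those lines, again with made-up documents:

from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction import FeatureHasher

docs = [["load superior", "point medium"],
        ["color white", "the load tower"],
        ["point medium", "load superior"]]
X = FeatureHasher(input_type='string').transform(docs)

km = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3).fit(X)
print(km.labels_)    # one cluster id per document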
I am currently working on a project where I have to extract the facial expression of a user (only one user at a time from a webcam), like sad or happy.
My method for classifying facial expressions is:
Use OpenCV to detect the face in the image
Use ASM and STASM to get the facial feature points
Now I'm trying to do facial expression classification.
Is SVM a good option? And if it is, how can I start with SVM?
How am I going to train the SVM for each emotion using these landmarks?
Yes, SVMs have repeatedly been shown to perform well on this task. There have been dozens (if not hundreds) of papers describing such procedures.
For example:
Simple paper
Longer paper
Poster about it
More complex example
Some basic resources on SVMs themselves can be found at http://www.support-vector-machines.org/ (book titles, software links, etc.).
And if you are just interested in using them rather than understanding them, you can get one of the basic libraries:
libsvm http://www.csie.ntu.edu.tw/~cjlin/libsvm/
svmlight http://svmlight.joachims.org/
If you are already using OpenCV, I suggest you use its built-in SVM implementation. Training/saving/loading in Python is as follows; C++ has a corresponding API to do the same in about the same amount of code. It also has train_auto to find the best parameters.
import numpy as np
import cv2

samples = np.array(np.random.random((4, 5)), dtype=np.float32)
labels = np.array(np.random.randint(0, 2, 4), dtype=np.float32)

svm = cv2.SVM()
svmparams = dict(kernel_type=cv2.SVM_LINEAR,
                 svm_type=cv2.SVM_C_SVC,
                 C=1)
svm.train(samples, labels, params=svmparams)

testresult = np.float32([svm.predict(s) for s in samples])

print(samples)
print(labels)
print(testresult)

svm.save('model.xml')
loaded = svm.load('model.xml')
and the output:
#print samples
[[ 0.24686454 0.07454421 0.90043277 0.37529686 0.34437731]
[ 0.41088378 0.79261768 0.46119651 0.50203663 0.64999193]
[ 0.11879266 0.6869216 0.4808321 0.6477254 0.16334397]
[ 0.02145131 0.51843268 0.74307418 0.90667248 0.07163303]]
#print labels
[ 0. 1. 1. 0.]
#print testresult
[ 0. 1. 1. 0.]
So you provide the n flattened shape models as samples and n labels, and you are good to go. You probably don't even need the ASM part: just apply some filters that are sensitive to orientation, like Sobel or Gabor, concatenate the matrices, flatten them, and feed them directly to the SVM (see the sketch below). You can probably get maybe 70-90% accuracy.
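For example, a rough sketch of such filter features (assuming face is a grayscale face crop as a NumPy array; a Gabor bank would work similarly):

import cv2
import numpy as np

def sobel_features(face):
    # orientation-sensitive gradient responses in x and y, flattened into one vector
    gx = cv2.Sobel(face, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(face, cv2.CV_32F, 0, 1)
    return np.concatenate([gx.ravel(), gy.ravel()]).astype(np.float32)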
As someone said, CNNs are an alternative to SVMs. Here are some links that implement LeNet-5. So far, I find SVMs much simpler to get started with.
https://github.com/lisa-lab/DeepLearningTutorials/
http://www.codeproject.com/Articles/16650/Neural-Network-for-Recognition-of-Handwritten-Digi
-edit-
Landmarks are just n (x, y) vectors, right? So why don't you try putting them into an array of size 2n and simply feed them directly to the code above?
For example, 3 training samples of 4 landmarks (0,0), (10,10), (50,50), (70,70):
samples = [[0, 0, 10, 10, 50, 50, 70, 70],
           [0, 0, 10, 10, 50, 50, 70, 70],
           [0, 0, 10, 10, 50, 50, 70, 70]]
labels = [0., 1., 2.]

# 0 = happy
# 1 = angry
# 2 = disgust
You could check this code to get an idea of how this could be done using SVM.
You can find the algorithm explained here.
Is it possible to use histogram intersection / chi-square kernels in LIBLINEAR?
My problem is that I have a feature vector of size 5000, all of which are histogram features. I don't know how to train/test with SVM.
How can I train this using SVM?
LibSVM supports 4 types of kernels:
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- radial basis function: exp(-gamma*|u-v|^2)
3 -- sigmoid: tanh(gamma*u'*v + coef0)
LibSVM supports a linear kernel too; in that case, what is the difference between LibSVM and LIBLINEAR?
No, you can't use custom kernels in LIBLINEAR.
To do what you want, you'll need to use LibSVM with the "precomputed kernel" option, where you supply the Gram matrix (this is described in the LibSVM README).
In the case of linear kernels, LibSVM and LIBLINEAR produce similar results. The author says this:
Their predictions are similar but hyperplanes are different. Libsvm solves L1-loss SVM, but liblinear solves L2-regularized logistic regression and L2-loss SVM.
A bit late, but it might help others: the machine-learning package scikit-learn offers at least the chi2 kernel (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.chi2_kernel.html#sklearn.metrics.pairwise.chi2_kernel).
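A minimal sketch of plugging it into SVC as a precomputed kernel, with random non-negative histograms as stand-in data:

import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X_train = rng.rand(20, 5000)             # 20 training histograms of length 5000
y_train = rng.randint(0, 2, 20)
X_test = rng.rand(5, 5000)

K_train = chi2_kernel(X_train, X_train)  # (20, 20) Gram matrix
K_test = chi2_kernel(X_test, X_train)    # rows: test samples, columns: training samples

clf = SVC(kernel='precomputed').fit(K_train, y_train)
print(clf.predict(K_test))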
You can use a linear SVM solver only if you explicitly map your features into a non-linear feature space. I recommend reading:
"Max-Margin Additive Classifiers for Detection" - http://www.cs.berkeley.edu/~smaji/papers/mcd-free-lunch-iccv-09.pdf
"Random Features for Large-Scale Kernel Machines" - http://berkeley.intel-research.net/arahimi/papers/rahimi-recht-random-features.pdf
"Efficient Additive Kernels via Explicit Feature Maps" - http://www.vlfeat.org/~vedaldi/assets/pubs/vedaldi11efficient.pdf
I have been using the chi2 kernel in LibSVM recently. I am pasting the code here; I hope it can be useful.
function [chi2_ans] = chi2_kernel(x, y)
    % chi-square kernel value for every pair of rows of x and y
    f = @(x, y) 1 - sum(((x' - y') .* (x' - y')) ./ (x' + y' + eps) * 2);
    chi2_ans = zeros(size(x, 1), size(y, 1));
    for i = 1:size(x, 1)
        veci = x(i, :);
        for j = 1:size(y, 1)
            vecj = y(j, :);
            chi2_ans(i, j) = f(veci, vecj);
        end
    end
end
and use it:
function [acc] = singleChi2Kernel(trainData, testData, trainLabel, testLabel)
    numTrain = size(trainData, 1);
    numTest = size(testData, 1);

    %# compute kernel matrices between every pair of (train,train) and
    %# (test,train) instances, and include the sample serial number as the first column
    K  = [(1:numTrain)', chi2_kernel(trainData, trainData)];
    KK = [(1:numTest)',  chi2_kernel(testData, trainData)];

    %# train and test
    model = svmtrain(trainLabel, K, '-t 4');
    [predClass, acc, decVals] = svmpredict(testLabel, KK, model);

    %# confusion matrix
    %C = confusionmat(testLabel, predClass)
end
The code is from this link.