I am working on seizure prediction research, and I am using Weka to train on my data vectors.
If I have 10 seizures and each seizure is represented by 5 vectors, that makes a total of 50 vectors corresponding to 10 seizures. However, Weka treats these vectors as completely independent, even though each group of 5 vectors corresponds to a single seizure.
So how can I make Weka take this into account during learning?
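This is not Weka code. As an illustration of the same idea, here is a minimal scikit-learn sketch of group-aware cross-validation, assuming the goal is to keep all 5 vectors of a seizure on the same side of every train/test split. `GroupKFold` receives a `groups` array holding the seizure ID of each vector. The random data, placeholder labels, and `LogisticRegression` classifier are hypothetical stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

# Hypothetical stand-in data: 10 seizures x 5 vectors each, 8 features per vector.
n_seizures, vecs_per_seizure, n_features = 10, 5, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(n_seizures * vecs_per_seizure, n_features))
y = rng.integers(0, 2, size=len(X))  # placeholder labels, one per vector

# groups[i] is the seizure that vector i came from: [0,0,0,0,0,1,1,...,9]
groups = np.repeat(np.arange(n_seizures), vecs_per_seizure)

cv = GroupKFold(n_splits=5)

# Every fold keeps each seizure entirely in train or entirely in test.
for train_idx, test_idx in cv.split(X, y, groups=groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

# Cross-validated accuracy where whole seizures are held out together.
scores = cross_val_score(LogisticRegression(), X, y, groups=groups, cv=cv)
print(scores)
```

Splitting by seizure rather than by vector prevents near-duplicate vectors from the same seizure appearing in both training and test data, which would otherwise inflate the estimated accuracy.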
Can we use KNN and a linear SVM classifier to train a model on data that has 4 features and 6 classes (clusters)? I ask because I think linear SVM and KNN are only used to linearly separate data that has two features and a binary classification.
This is possible, and neither method is limited to two features. KNN handles multiple classes natively; for the linear SVM you can use a one-vs-all (one-vs-rest) wrapper, like this one: https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html
Essentially, you train 6 classifiers, one per class, each of which learns to separate its class from all the rest.
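A minimal scikit-learn sketch of both options on synthetic data with 4 features and 6 classes (the `make_classification` dataset is a stand-in, not the asker's data). `OneVsRestClassifier` wraps `LinearSVC` and fits one binary SVM per class; `KNeighborsClassifier` is used unwrapped because k-nearest neighbours already handles several classes by voting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in: 4 features, 6 classes.
X, y = make_classification(n_samples=600, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=6, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# One-vs-rest: one binary linear SVM per class, each separating "this class" from the rest.
ovr_svm = OneVsRestClassifier(LinearSVC()).fit(X_train, y_train)

# KNN needs no wrapper: it predicts the majority label among the 5 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

print(len(ovr_svm.estimators_))  # 6
print(ovr_svm.score(X_test, y_test), knn.score(X_test, y_test))
```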
I'm creating character recognition software in Python using scikit-learn. I have a large dataset of images labelled [A-Za-z]. I'm using a linear SVM. Training the model on all the samples with 52 different labels is very, very slow.
Suppose I divide my training dataset into 13 sections such that each section contains images of only 4 characters and no image belongs to more than one section, and then train 13 different models.
How can I combine those models to create a more accurate model? Or, if I classify the test set with all 13 models and compare each sample's results on the basis of confidence score (selecting the one with the highest score), will that affect the accuracy of the overall model?
It seems that what you need is some kind of order reduction (dimensionality reduction) of the data.
After the order reduction, classify the data into 13 large groups and then run a final classification within each group.
I would look into Linear Discriminant Analysis for the first step I mentioned.
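A rough sketch of this two-stage idea, assuming a scikit-learn setup: LDA for the order-reduction step, a coarse `LinearSVC` that picks one of 13 groups, then a per-group `LinearSVC` that picks the final label. The synthetic 52-class data, the choice of 20 LDA components, and the hypothetical `group_of` mapping (4 consecutive labels per group) are illustrative assumptions only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 52 character labels (0..51).
X, y = make_classification(n_samples=5200, n_features=40, n_informative=20,
                           n_redundant=0, n_classes=52, n_clusters_per_class=1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

def group_of(labels):
    # Hypothetical grouping: labels 0-3 -> group 0, 4-7 -> group 1, ..., 48-51 -> group 12.
    return labels // 4

# Step 1: order (dimensionality) reduction with LDA, fitted on training data only.
lda = LinearDiscriminantAnalysis(n_components=20).fit(X_tr, y_tr)
Z_tr, Z_te = lda.transform(X_tr), lda.transform(X_te)

# Step 2: coarse classifier that only decides which of the 13 groups a sample is in.
coarse = LinearSVC(dual=False).fit(Z_tr, group_of(y_tr))

# Step 3: one fine classifier per group, trained only on that group's 4 labels.
fine = {}
for g in range(13):
    in_g = group_of(y_tr) == g
    fine[g] = LinearSVC(dual=False).fit(Z_tr[in_g], y_tr[in_g])

# Predict: route each test sample to the fine model of its predicted group.
g_pred = coarse.predict(Z_te)
y_pred = np.empty_like(y_te)
for g in np.unique(g_pred):
    mask = g_pred == g
    y_pred[mask] = fine[g].predict(Z_te[mask])

print("accuracy:", np.mean(y_pred == y_te))
```

One trade-off of this routing: a sample sent to the wrong group by the coarse model can never receive its correct label from the fine model, so the coarse stage caps the overall accuracy.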
I have a classification dataset of 350,000 rows and 500 features. The features are TF-IDF vectors.
My target Y takes values from 1 to 16, classifying the sentences into 16 types.
The training and test sets are split randomly.
When I run my data through different classification algorithms, I get a huge difference in accuracy:
SVM and Naive Bayes give 20%+ accuracy (which is too low)
RandomForest gives around 55% accuracy, which is better but still low
Is there a reason why I'm getting such a huge difference across algorithms, and is there a way to further increase the accuracy?
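Without seeing the data, a sensible first step is making sure the algorithms are compared under identical conditions. This hedged sketch uses a hypothetical toy corpus with labels 1 to 16, a single stratified split, and a `TfidfVectorizer` fitted on the training split only, then scores `LinearSVC`, `MultinomialNB`, and `RandomForestClassifier` on the same test set. For reference, chance level with 16 balanced classes is 1/16, about 6%.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: 16 sentence types (labels 1..16); each sentence mixes
# shared filler words with a few type-specific words.
rng = np.random.default_rng(0)
shared = [f"common{i}" for i in range(50)]
docs, labels = [], []
for label in range(1, 17):
    own = [f"type{label}word{j}" for j in range(5)]
    for _ in range(60):
        words = list(rng.choice(shared, size=8)) + list(rng.choice(own, size=2))
        docs.append(" ".join(words))
        labels.append(label)

# Stratified split keeps the 16 classes in the same proportions in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(docs, labels, test_size=0.25,
                                          stratify=labels, random_state=0)

# Fit the TF-IDF vocabulary and IDF weights on the training split only.
vec = TfidfVectorizer()
T_tr, T_te = vec.fit_transform(X_tr), vec.transform(X_te)

models = {
    "LinearSVC": LinearSVC(),
    "MultinomialNB": MultinomialNB(),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
}
results = {name: accuracy_score(y_te, m.fit(T_tr, y_tr).predict(T_te))
           for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
print(f"chance level for 16 balanced classes: {1 / 16:.3f}")
```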
I'm trying to predict a person's personality from their tweets.
I built a classifier with 13 features (none of them binary) and normalized each sample individually using scikit-learn's Normalizer().transform.
When I make predictions, it predicts every training sample as positive and every test sample as negative (regardless of whether it is actually positive or negative).
Which anomalies should I look for in my classifier, features, or data?
Notes: 1) I normalize the test and training sets separately (each sample individually).
2) I tried cross-validation, but the performance is the same.
3) I tried both linear and RBF SVM kernels.
4) I also tried without normalizing, with the same poor results.
5) My training set has the same number of positive and negative samples (400 each); my test set has 34 positive samples and 1000+ negative samples.
If you're training on balanced data, the fact that "it predicts all training sets as positive" is probably enough to conclude that something has gone wrong.
Try building something very simple (e.g. a linear SVM with one or two features) and look at the model as well as a visualization of your training data; follow the scikit-learn example: http://scikit-learn.org/stable/auto_examples/svm/plot_iris.html
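A sketch in the spirit of the linked example, assuming scikit-learn's bundled iris data: two features, two balanced classes, a linear-kernel `SVC`, then a look at the learned coefficients and at how many training samples fall in each predicted class. On balanced, reasonably separable data, a model that predicts only one class is a red flag.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Two features, two classes, as in the scikit-learn iris plotting example.
iris = load_iris()
mask = iris.target < 2          # keep setosa (0) and versicolor (1): 50 samples each
X = iris.data[mask][:, :2]      # sepal length and sepal width only
y = iris.target[mask]

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Inspect the model: the decision boundary is coef_ . x + intercept_ = 0.
print("coef:", clf.coef_, "intercept:", clf.intercept_)

# Sanity check on balanced training data: both classes should appear in the predictions.
train_pred = clf.predict(X)
print("predicted class counts on training data:", np.bincount(train_pred, minlength=2))
```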
It is also possible that your input data contains many large outliers that distort the transform (normalization) process.
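A small illustration, with hypothetical values, of how an outlier interacts with `Normalizer`. With its default `norm="l2"`, `Normalizer` rescales each sample (row) to unit length, so one huge value in a sample dominates that row's norm and pushes the sample's other features toward zero. Because it works row by row and learns nothing from the data, normalizing the training and test sets separately gives the same result as normalizing them together.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Two hypothetical samples sharing their first three feature values;
# the second has one huge outlier in its last feature.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 1000.0],
])

# Each row is divided by its own L2 norm.
X_norm = Normalizer().fit_transform(X)
print(np.round(X_norm, 4))
```

In the second row the first three features end up close to zero, so after normalization they carry almost no information for the classifier.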
Try doing feature selection on the training data only (separately from your test/validation data).
Feature selection on your whole dataset can easily lead to overfitting.
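One way to keep feature selection away from the held-out data, sketched with scikit-learn on synthetic 13-feature data: put the selector inside a `Pipeline` and cross-validate the whole pipeline. `SelectKBest` with `f_classif` and `k=5` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in: 800 samples, 13 features, balanced binary labels.
X, y = make_classification(n_samples=800, n_features=13, n_informative=5, random_state=0)

# The selector lives inside the pipeline, so in every CV fold SelectKBest is
# fitted on that fold's training part only and never sees its held-out part.
pipe = make_pipeline(SelectKBest(f_classif, k=5), SVC(kernel="linear"))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```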
I am trying to implement sentiment analysis in Python using a perceptron to get better accuracy. I am lost in the maths that surrounds it and need a simple explanation of how to apply it to sentiment analysis. There is already a paper published on this: http://aclweb.org/anthology/P/P11/P11-1015.pdf
Could anyone here explain it clearly and in detail? I have a training dataset and a test dataset of 5000 reviews each, and I am getting an accuracy of 78 percent with bag of words. I have been told a perceptron will give me an accuracy of 88%, and I am curious to implement it.
A perceptron is just a simple binary classifier that works on fixed-size vectors from R^n as input data. So in order to use it, you have to encode each of your documents as such a real-valued vector. It could be, for example, a bag-of-words representation (where each dimension corresponds to one word, and the value to its number of occurrences), or any "more complex" representation (one of which is described in the attached paper).
So in order to "port" the perceptron to sentiment analysis, you have to choose some function f that, given a document, returns a real-valued vector, and then train your perceptron on the pairs
(f(x),0) for negative reviews
(f(x),1) for positive reviews
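To make the recipe concrete, here is a from-scratch Python/NumPy sketch where f is a plain bag-of-words count vector and the perceptron uses the classic mistake-driven update: on an error, add x to the weights for a missed positive and subtract it for a missed negative. The four toy reviews and the helper names (`build_vocab`, `f`, `train_perceptron`, `predict`) are hypothetical; this is the basic perceptron, not the representation from the linked paper. scikit-learn also provides `sklearn.linear_model.Perceptron` if you prefer a library implementation.

```python
import numpy as np

def build_vocab(docs):
    """Map each distinct lower-cased token in the training docs to a column index."""
    words = sorted({tok for d in docs for tok in d.lower().split()})
    return {w: i for i, w in enumerate(words)}

def f(doc, vocab):
    """Bag-of-words encoding: dimension i holds the count of vocabulary word i."""
    v = np.zeros(len(vocab))
    for tok in doc.lower().split():
        if tok in vocab:            # words unseen in training are ignored
            v[vocab[tok]] += 1
    return v

def train_perceptron(X, y, epochs=10):
    """Classic perceptron rule: update w and b only when a sample is misclassified."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if w @ x_i + b > 0 else 0
            if pred != y_i:
                step = y_i - pred   # +1 for a missed positive, -1 for a missed negative
                w += step * x_i
                b += step
    return w, b

def predict(doc, vocab, w, b):
    return 1 if w @ f(doc, vocab) + b > 0 else 0

# Hypothetical toy reviews: label 1 = positive, 0 = negative.
train_docs = ["great movie loved it", "wonderful acting great plot",
              "terrible movie hated it", "boring plot awful acting"]
train_labels = [1, 1, 0, 0]

vocab = build_vocab(train_docs)
X = np.array([f(d, vocab) for d in train_docs])  # the f(x) half of each (f(x), label) pair
w, b = train_perceptron(X, np.array(train_labels))

print([predict(d, vocab, w, b) for d in train_docs])  # [1, 1, 0, 0]
print(predict("great acting", vocab, w, b))           # 1
```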