Combining different SVM models - machine-learning

I'm creating a character recognition software in Python using scikit-learn. I have a large dataset of images labelled [A-Za-z]. I'm using linear SVM. Training the model using all the samples with 52 different labels is very, very slow.
If I divide my training dataset in 13 sections such that each section has images of only 4 characters, and no image can be a part of more than 1 section, and then train 13 different models.
How can I combine those models together to create a more accurate model? OR if I perform classification of test set on all 13 models and compare individual sample's result on basis of confidence score (selecting the one with with highest score), will it affect the accuracy of the overall model?

It seems what you need to to is some kind of Order Reduction of data.
After the order reduction classify data into in 13 large group and then do a final classification tool.
I would look into Linear Discriminant Analysis for the first step I mentioned.

Related

Creating word embedings from bert and feeding them to random forest for classification

I have used bert base pretrained model with 512 dimensions to generate contextual features. Feeding those vectors to random forest classifier is providing 83 percent accuracy but in various researches i have seen that bert minimal gives 90 percent.
I have some other features too like word2vec, lexicon, TFIDF and punctuation features.
Even when i merged all the features i got 83 percent accuracy. The research paper which i am using as base paper mentioned an accuracy score of 92 percent but they have used an ensemble based approach in which they classified through bert and trained random forest on weights.
But i was willing to do some innovation thus didn't followed that approach.
My dataset is biased to positive reviews so according to me the accuracy is less as model is also biased for positive labels but still I am looking for an expert advise
Code implementation of bert
https://github.com/Awais-mohammad/Sentiment-Analysis/blob/main/Bert_Features.ipynb
Random forest on all features independently
https://github.com/Awais-mohammad/Sentiment-Analysis/blob/main/RandomForestClassifier.ipynb
Random forest on all features jointly
https://github.com/Awais-mohammad/Sentiment-Analysis/blob/main/Merging_Feature.ipynb
Regarding the "no improvements despite adding more features" - some researchers believe that the BERT word embeddings already contain all the available information presented in text, so then it doesn't matter how fancy a classification head you add to it, doesn't matter if it is a linear model that uses the embeddings, or a complicated ML algorithm with a number of other features, they will not provide significant improvements in many tasks. They argue, that since BERT is a context-aware, bidirectional language model - that is trained extensively on MLM and NSP tasks, it already grasps most of the things that additional features for punctuation, word2vec and tfidf could convey. The lexicon could probably help a little in the sentiment task, if it is relevant, but the one or two extra variables, that you likely use to represent it, probably get drowned in all the other features.
Other than that, the accuracy of BERT-based models depends on the dataset used, sometimes the data is simply too diverse to obtain a perfect score, e.g. if there are some instances of observations that are very similar, but with different class labels etc. You can see in the BERT papers, that the accuracy widely depends on the task, e.g. in some tasks it is indeed 90+%, but for some tasks, e.g. Masked Language Modeling, where the model needs to choose a particular word from a vocab of over 30K words, the accuracy of 20% could be impressive in some cases. So in order to obtain a reliable comparison with bert papers, you'd need to pick a dataset that they've used and then compare.
Regarding the dataset balance, for deep learning models in general, the rule of thumb is that the training set should be more or less balanced w.r.t. the fraction of data covered by each class label. So if you have 2 labels, should be ~50-50, if 5 labels, then each should be at around 20% of training dataset, etc.
That is because most NN's work in batches, where they update the model weights based on the feedback from each batch. So if you have too many values of one class, the batch updates will be dominated by that one class, effectively worsening the quality of your training.
So, if you want to improve the accuracy of your model, balancing the dataset could be an easy fix. And if you have e.g. 5 ordered classes with differing sizes, you may consider merging some of them (e.g. reviews from 1-2 as bad, 3 as neutral, 4-5 as good) and then rebalancing, if still necessary.
(Unless it's a situation where e.g. 1 class has 80% of data, and 4 classes share the remaining 20%. In such a case you should probably consider some more advanced options, such as partitioning the algo to two parts, one predicting whether or not an instance is in class 1 (so a binary classifier), the other to distinguish between the 4 underrepresented classes. )

Linear SVM is used for linearly separating the data which have two features

Can we use KNN and linear SVM classifier for training the model with data which contains 4 features and have 6 classification clusters? Because what i think that linear SVM and KNN are used for linearly separating the data which have two features and have binary classification cluster.
This is possible, you just need to use OneVsAll wrapper, like this one https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html
Essentially you will train 6 classifiers, one per cluster, which seeks to locate one class from all the rest.

Using LSTM for binary classification

I have time series data of size 100000*5. 100000 samples and five variables.I have labeled each 100000 samples as either 0 or 1. i.e. binary classification.
I want to train it using LSTM , because of the time series nature of data.I have seen examples of LSTM for time series prediction, Is it suitable to use it in my case.
Not sure about your needs.
LSTM is best suited for sequence models, like time series you said, and your description don't look a time series.
Any way, you may use LSTM for time series, not for prediction, but for classification like this article.
In my experience, for binary classification having only 5 features you could find better methods, will consume more memory thant other methods, and could get worst results.
First of all, you can see it from a different perspective, i.e. instead of having 10,000 labeled samples of 5 variables, you should treat it as 10,000 unlabeled samples of 6 variables, where the 6th variable is the label.
Therefore, you can train your LSTM as a multivariate predictor for your 6th variable, that is the sample label and compare with the ground truth during testing to evaluate its performance.

weka multiple vectors for same instance

I am working on a seizure prediction research, and I am using Weka to train my data vectors on.
If I have 10 seizures, each seizure is represented by 5 vectors, which makes a total of 50 vectors corresponding to 10 seizures. However, Weka is treating these vectors as totally independent even though each 5 vectors corresponds to only one seizure.
So how can I let Weka take this into account when performing the Learning?

What's the difference between ANN, SVM and KNN classifiers?

I am doing remote sensing image classification. I am using the object-oriented method: first I segmented the image to different regions, then I extract the features from regions such as color, shape and texture. The number of all features in a region may be 30 and commonly there are 2000 regions in all, and I will choose 5 classes with 15 samples for every class.
In summary:
Sample data 1530
Test data 197530
How do I choose the proper classifier? If there are 3 classifiers (ANN, SVM, and KNN), which should I choose for better classification?
KNN is the most basic machine learning algorithm to paramtise and implement, but as alluded to by #etov, would likely be outperformed by SVM due to the small training data sizes. ANNs have been observed to be limited by insufficient training data also. However, KNN makes the least number of assumptions regarding your data, other than that accurate training data should form relatively discrete clusters. ANN and SVM are notoriously difficult to paramtise, especially if you wish to repeat the process using multiple datasets and rely upon certain assumptions, such as that your data is linearly separable (SVM).
I would also recommend the Random Forests algorithm as this is easy to implement and is relatively insensitive to training data size, but I would advise against using very small training data sizes.
The scikit-learn module contains these algorithms and is able to cope with large training data sizes, so you could increase the number of training data samples. the best way to know for sure would be to investigate them yourself, as suggested by #etov
If your "sample data" is the train set, it seems very small. I'd first suggest using more than 15 examples per class.
As said in the comments, it's best to match the algorithm to the problem, so you can simply test to see which algorithm works better. But to start with, I'd suggest SVM: it works better than KNN with small train sets, and generally easier to train then ANN, as there are less choices to make.
Have a look at below mind map
KNN: KNN performs well when sample size < 100K records, for non textual data. If accuracy is not high, immediately move to SVC ( Support Vector Classifier of SVM)
SVM: When sample size > 100K records, go for SVM with SGDClassifier.
ANN: ANN has evolved overtime and they are powerful. You can use both ANN and SVM in combination to classify images
More details are available #semanticscholar.org

Resources