I have three conceptual questions related to Naïve Bayes.
Naïve Bayes is robust to irrelevant features. What does this mean? Can anyone give an example of how irrelevant features cancel out?
Decision trees suffer from fragmentation. What does this mean?
Naïve Bayes is optimal if the independence assumption holds. Can anyone give an example of the independence assumption not holding?
Regards,
Akshit Bhatia
What are the advantages and disadvantages of LDA vs Naive Bayes in terms of machine learning classification?
I know some of the differences: Naive Bayes assumes the variables are independent, while LDA assumes Gaussian class-conditional densities. But I don't understand when to use LDA and when to use NB in a given situation.
Both methods are pretty simple, so it's hard to say which one will work better. It's often faster just to try both and compare their test accuracy. But here is a list of characteristics that usually indicate that a given method is less likely to give good results. It all boils down to the data.
Naive Bayes
The first disadvantage of the Naive Bayes classifier is the feature independence assumption. In practice, data is multi-dimensional and different features do correlate. Because of this, the results can be pretty bad, though not always significantly so. If you know for sure that the features are dependent (e.g. pixels of an image), don't expect Naive Bayes to shine.
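To make the effect of dependent features concrete, here is a minimal sketch with made-up numbers (the likelihoods, the prior, and the helper name naive_bayes_posterior are all hypothetical). It compares three cases: one informative feature, the same feature counted twice (two perfectly correlated copies), and the informative feature plus an irrelevant one whose likelihood is the same in both classes.

```python
def naive_bayes_posterior(prior_pos, likelihoods_pos, likelihoods_neg):
    """Posterior P(pos | features) under the naive independence assumption."""
    score_pos = prior_pos
    score_neg = 1.0 - prior_pos
    # Independence assumption: per-feature likelihoods are simply multiplied.
    for lp, ln in zip(likelihoods_pos, likelihoods_neg):
        score_pos *= lp
        score_neg *= ln
    return score_pos / (score_pos + score_neg)

# Hypothetical numbers: P(x=1 | pos) = 0.8, P(x=1 | neg) = 0.4, equal priors.
single = naive_bayes_posterior(0.5, [0.8], [0.4])

# The same feature counted twice, as if it were two independent observations.
duplicated = naive_bayes_posterior(0.5, [0.8, 0.8], [0.4, 0.4])

# An irrelevant feature has the same likelihood (0.3) under both classes.
with_irrelevant = naive_bayes_posterior(0.5, [0.8, 0.3], [0.4, 0.3])

print(f"one informative feature:  {single:.3f}")
print(f"feature counted twice:    {duplicated:.3f}")
print(f"plus irrelevant feature:  {with_irrelevant:.3f}")
```

The duplicated feature carries no new information, yet the posterior moves from about 0.667 to 0.8, so the classifier becomes overconfident. The irrelevant feature multiplies both class scores by the same factor, which cancels in the ratio and leaves the posterior unchanged.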
Another problem is data scarcity. For every possible value of a feature, the likelihood is estimated from the observed frequencies. With few samples, this can produce probabilities close to 0 or 1, which in turn leads to numerical instability and worse results.
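A small illustration of the scarcity problem, using made-up data: with raw frequency estimates, a value that never occurs in a class gets likelihood 0, which zeroes out the whole product for that class. A common remedy, which the answer above does not mention, is additive (Laplace) smoothing. The function name likelihoods and the toy data are assumptions for this sketch.

```python
from collections import Counter

def likelihoods(values, vocabulary, alpha=0.0):
    """Estimate P(value | class) from the values observed in one class.

    alpha = 0 gives the raw frequency estimate;
    alpha = 1 is add-one (Laplace) smoothing.
    """
    counts = Counter(values)
    total = len(values) + alpha * len(vocabulary)
    return {v: (counts[v] + alpha) / total for v in vocabulary}

# Hypothetical toy data: colours seen in one class; "blue" never occurs.
vocabulary = ["red", "green", "blue"]
observed = ["red", "red", "green", "red"]

raw = likelihoods(observed, vocabulary)
smoothed = likelihoods(observed, vocabulary, alpha=1.0)

print("raw:     ", raw)
print("smoothed:", smoothed)
```

With the raw estimate, "blue" gets probability 0; with add-one smoothing it gets 1/7, and the estimates still sum to 1.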
A third problem arises for continuous features. The basic Naive Bayes classifier works with categorical variables, so continuous features either have to be discretized, which throws away a lot of information, or modeled with an assumed class-conditional density (as in Gaussian Naive Bayes), which adds another assumption that may not hold. If there are continuous variables in the data, treat that as a warning sign for Naive Bayes.
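For reference, one common alternative to discretizing is Gaussian Naive Bayes, which models each continuous feature with a per-class normal distribution. The sketch below is a minimal standard-library version on made-up data; the function names and the variance floor of 1e-9 are assumptions for illustration, not a library API.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(samples, labels):
    """Return {label: (prior, [(mean, variance) for each feature])}."""
    model = {}
    for label in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == label]
        stats = []
        for column in zip(*rows):
            # A small floor on the variance avoids division by zero.
            stats.append((mean(column), max(pvariance(column), 1e-9)))
        model[label] = (len(rows) / len(samples), stats)
    return model

def predict(model, x):
    """Pick the label with the highest log posterior (up to a constant)."""
    def log_score(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mu, var) in zip(x, stats):
            # Log of the normal density for this feature.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return score
    return max(model, key=log_score)

# Hypothetical toy data with two continuous features.
X = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (3.0, 4.2), (3.2, 3.9), (2.9, 4.0)]
y = ["no", "no", "no", "yes", "yes", "yes"]

model = fit_gaussian_nb(X, y)
print(predict(model, (1.1, 2.0)), predict(model, (3.1, 4.1)))
```

This keeps the continuous values as they are, at the price of assuming that each feature is roughly normal within each class.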
Linear Discriminant Analysis
LDA does not work well if the classes are not balanced, i.e. the numbers of objects in the classes differ greatly. The solution is to get more data, which can be easy or almost impossible, depending on the task.
Another disadvantage of LDA is that it's not applicable to non-linear problems, e.g. separating donut-shaped point clouds, but in high-dimensional spaces this is hard to spot right away. Usually you find out only after seeing LDA fail, but if the data is known to be very non-linear, this is a strong sign against LDA.
In addition, LDA can be prone to overfitting and needs careful validation and testing.
I intend to build a yes/no classifier. The problem is that the data does not come from me, so I have to work with what I have been given. I have around 150 samples; each sample contains 3 features, and these features are continuous numeric variables. I know the dataset is quite small. I would like to ask two questions:
A) What would be the best machine learning algorithm for this? An SVM? A neural network? Everything I have read seems to require a big dataset.
B) I could make the dataset a little bigger by adding some samples that do not contain all the features, only one or two. I have read that you can use sparse vectors in this case. Is this possible with every machine learning algorithm? (I have seen them used with SVMs.)
Thanks a lot for your help!!!
My recommendation is to use a simple and straightforward algorithm, such as a decision tree or logistic regression, although the ones you mention should work equally well.
The dataset size shouldn't be a problem, given that you have far more samples than variables. But having more data always helps.
Naive Bayes is a good choice when there are few training examples. Ng and Jordan showed that, compared to logistic regression, Naive Bayes converges towards its optimum performance faster, i.e. with fewer training examples. (See section 4 of this book chapter.) Informally speaking, Naive Bayes models a joint probability distribution, which performs better in this situation.
Do not use a decision tree in this situation. Decision trees have a tendency to overfit, a problem that is exacerbated when you have little training data.
In many reinforcement learning (RL) papers, the Markov Decision Process (MDP) is the typical problem setting. What is the real benefit of this setting? Some papers use an LSTM as their policy network, which obviously violates the MDP assumption and makes more sense.
Basically, Markov Decision Processes provide a theoretical framework that makes it possible to analyze the convergence guarantees of algorithms, as well as other theoretical properties. Although LSTMs and other deep learning approaches combined with RL have achieved impressive results, they lack a solid theoretical foundation that would let us understand or guarantee when the algorithm will learn something useful, or how far the learned policy will be from the optimal one.
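As one concrete example of such a guarantee: in a finite MDP with discount factor below 1, the Bellman optimality operator is a contraction, so value iteration converges to the optimal value function from any starting point. The sketch below runs value iteration on a made-up 2-state MDP (the states, actions, rewards, and GAMMA = 0.9 are all assumptions for illustration) and records how the update size shrinks.

```python
GAMMA = 0.9  # discount factor (assumed for this example)

# Hypothetical 2-state MDP:
# transitions[state][action] = [(probability, next_state, reward), ...]
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}

def bellman_backup(values):
    """Apply the Bellman optimality operator once to a value table."""
    return {
        s: max(
            sum(p * (r + GAMMA * values[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )
        for s, actions in transitions.items()
    }

values = {s: 0.0 for s in transitions}
gaps = []  # largest change per sweep; shrinks by at least a factor GAMMA
for _ in range(200):
    new_values = bellman_backup(values)
    gaps.append(max(abs(new_values[s] - values[s]) for s in values))
    values = new_values

print(values)
print(gaps[:5])
```

Each sweep shrinks the largest change by at least a factor of GAMMA, which is exactly the kind of statement that is available inside the MDP framework and hard to obtain for a recurrent policy on a partially observed problem.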
I have used an extreme learning machine for classification and found that my classification accuracy is only 70+%. This led me to use an ensemble method: I create several classification models, and the test data is classified by the majority vote of the models. However, this method only increases classification accuracy by a small margin. May I ask what other methods can be used to improve classification accuracy on a 2-dimensional, linearly inseparable dataset?
Your question is very broad... There's no way to help you properly without knowing the actual problem you are working on. But, generally speaking, some methods to improve classification accuracy are:
1 - Cross Validation : Split your training dataset into groups, hold one group out for prediction, and rotate the held-out group in each run. This tells you how well the model generalizes, so you can compare models and settings reliably.
2 - Cross Dataset : The same as cross validation, but using different datasets.
3 - Tuning your model : Basically, change the parameters you're using to train your classification model (I don't know which classification algorithm you're using, so it's hard to help more).
4 - Improve, or start using (if you aren't already), a normalization step : Find out which transformations (changing the geometry, colors, etc.) give you cleaner data to train on.
5 - Understand the problem better... Try other methods for solving the same problem. There is almost always more than one way to solve a problem. You may not be using the best approach.
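The cross-validation idea from point 1 can be sketched roughly as follows. This is a minimal standard-library version; the nearest-centroid classifier is only a stand-in for whatever model you actually use, and the toy data and all function names are assumptions for illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def nearest_centroid_fit(X, y):
    """Stand-in classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_val_accuracy(X, y, k=5):
    """Average accuracy on the held-out fold over k train/test rotations."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated clusters in 2-D.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
X += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(50)]
y = [0] * 50 + [1] * 50

print(cross_val_accuracy(X, y))
```

Every sample is used for testing exactly once, so the averaged score is a fairer estimate than the accuracy on the training data, and it can be used to compare parameter settings (point 3) or preprocessing choices (point 4).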
Enhancing a model's performance can be challenging at times. I'm sure a lot of you would agree with me if you've found yourself stuck in a similar situation. You try all the strategies and algorithms that you've learnt, yet you fail to improve the accuracy of your model. You feel helpless and stuck. And this is where many data scientists give up. Let's dig deeper now and look at proven ways to improve the accuracy of a model:
Add more data
Treat missing and outlier values
Feature Engineering
Feature Selection
Multiple algorithms
Algorithm Tuning
Ensemble methods
Cross Validation
If you feel this information is lacking, the following link may help: https://www.analyticsvidhya.com/blog/2015/12/improve-machine-learning-results/
Sorry if the information I have given is not satisfactory.
I am new to machine learning and I have this basic question. As I am weak in the math behind the algorithms, I find this difficult to understand.
When you are given the task of designing a classifier (keep it simple: a 2-class classifier) using unsupervised learning (no training samples), how do you decide what type of classifier (linear or non-linear) to use? If we do not know this, then feature selection (which indirectly means knowing what the data set is) becomes critical.
Am I thinking in the right direction, or is there something big that I don't know? Any insight into this topic is greatly appreciated.
Classification is by definition a "supervised learning" problem. Such models require examples of points within given classes to understand how to separate the classes from one another. If you are simply looking for relationships between unlabeled data points, you're solving an unsupervised problem. Look into clustering algorithms. K-means is where a lot of people start.
Hope this helps!
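To give a feel for what k-means does, here is a minimal sketch of the usual assign-and-update loop (Lloyd's algorithm) on made-up 2-D points. The function name, the toy data, and the choice of the first two points as starting centroids are assumptions for illustration; k-means can get stuck in poor local optima depending on the starting centroids, so library implementations typically use smarter initialization and several restarts.

```python
def k_means(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    centroids = list(initial_centroids)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled data: two groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]

# Naive initialization for the example: just take the first two points.
centroids, assignments = k_means(points, points[:2])
print(centroids)
print(assignments)
```

No labels are used anywhere: the algorithm only groups points that are close together, and it is up to you to decide afterwards what each group means.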
This is a huge problem. Yes, the term "clustering" is the best entry point for googling this, but I understand that you want to train a classifier, where "training" means optimizing an objective function over parameters. The first choice is definitely not a discriminative classifier (such as a linear one), because for those the standard maximum likelihood (ML) objective does not work without labels. If you absolutely want to use linear classifiers, then you have to tweak the ML objective or, better, use another objective (one that approximates the classifier risk). An easier choice is to look at generative models, such as HMMs, Naive Bayes, Latent Dirichlet Allocation, ..., for which the ML objective works without labels.
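To illustrate what "the ML objective works without labels" can look like for a generative model, here is a minimal sketch that fits a two-component 1-D Gaussian mixture with EM. The mixture model is a simpler stand-in for the models named above, and the data, starting values, and function names are all assumptions for illustration. The objective is the likelihood of the unlabeled data, and each EM step does not decrease it.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of unlabeled data under the mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances):
    # E-step: posterior probability of each component for each point.
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate the parameters from the soft assignments.
    new_w, new_m, new_v = [], [], []
    for k in range(len(weights)):
        nk = sum(r[k] for r in resp)
        mu = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, data)) / nk
        new_w.append(nk / len(data))
        new_m.append(mu)
        new_v.append(max(var, 1e-6))  # floor keeps the variance positive
    return new_w, new_m, new_v

# Hypothetical unlabeled 1-D data drawn around 1 and around 5.
data = [0.9, 1.1, 1.0, 1.3, 0.7, 4.8, 5.2, 5.0, 5.1, 4.9]
weights, means, variances = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]

history = [log_likelihood(data, weights, means, variances)]
for _ in range(25):
    weights, means, variances = em_step(data, weights, means, variances)
    history.append(log_likelihood(data, weights, means, variances))

print(means, weights)
```

After fitting, a new point can be "classified" by the component with the highest posterior probability, which gives a two-class decision rule that was learned without any labels.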