I Googled a lot to find an answer, then thought this would be the place where someone could clear up my doubt.
In a classification algorithm we have a model part and a prediction part.
Normally, while testing, we have an accuracy rate.
Likewise, is there any accuracy rate/confidence for the model in the Naive Bayes algorithm?
Evaluation is (usually) not part of the classifier.
It's something you do separately, to evaluate whether you did a good job or not.
If you classify your test data using Naive Bayes, you can perform exactly the same kind of evaluation as with any other classifier!
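To make that concrete, here is a minimal sketch (assuming scikit-learn and its built-in iris dataset, purely as an illustration) of evaluating a Naive Bayes model exactly like any other classifier, with predict_proba serving as a per-prediction confidence:

```python
# Train a Gaussian Naive Bayes model, evaluate accuracy on held-out data,
# and inspect per-sample class probabilities as a rough "confidence".
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))

# predict_proba gives P(class | features) for each test sample,
# which is the closest thing to a per-prediction "confidence".
print(model.predict_proba(X_test[:5]))
```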
Related
I read in a paper that Naive Bayes using IG is the best model for text classification where the dataset is small and has few positives. However, I'm not too sure how to code this specific model in Python. Would this be done using TF or scikit-learn, and then adjusting a parameter?
According to my understanding, RF selects features randomly and hence is hard to overfit. But Gradient Boosting in sklearn also offers the max_features option, which can help to prevent overfitting. So why would anyone use Random Forest?
Can anyone explain when to use Gradient boosting vs Random forest based on the given data?
Any help is highly appreciated.
In my personal experience, Random Forest can be a better choice when:
You train a model on a small data set.
Your data set has few features to learn from.
Your data set has a low positive-class (Y flag) count, i.e. you are trying to predict an event that has a low chance of occurring or occurs rarely.
In these situations, Gradient Boosting algorithms like XGBoost and LightGBM can overfit (even when their parameters are tuned), while simpler algorithms like Random Forest or even Logistic Regression may perform better. To illustrate, for XGBoost and LightGBM the ROC AUC on the test set may be higher than for Random Forest, yet show too large a gap from the ROC AUC on the train set.
Despite the sharp predictions from Gradient Boosting algorithms, in some cases Random Forest takes advantage of the model stability that comes from the bagging methodology (random sampling) and outperforms XGBoost and LightGBM. In general situations, however, Gradient Boosting algorithms perform better.
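One way to see that train/test gap for yourself is a rough sketch like the following (scikit-learn with a synthetic imbalanced dataset, and GradientBoostingClassifier standing in for XGBoost/LightGBM, all as illustrative assumptions):

```python
# Compare train vs. test ROC AUC for Random Forest and Gradient Boosting;
# a much larger train/test gap is one symptom of overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

for model in (RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0)):
    model.fit(X_train, y_train)
    train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(type(model).__name__, "train AUC:", round(train_auc, 3),
          "test AUC:", round(test_auc, 3))
```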
Similar question asked on Quora:
https://www.quora.com/How-do-random-forests-and-boosted-decision-trees-compare
I agree with the author at the link that random forests are more robust -- they don't require much problem-specific tuning to get good results. Besides that, a couple other items based on my own experience:
Random forests can perform better on small data sets; gradient boosted trees are data hungry
Random forests are easier to explain and understand. This perhaps seems silly, but it can lead to better adoption of a model if it needs to be used by less technical people.
I think that's also true. I have also read the page How Random Forest Works, which explains the advantages of random forest, like this:
For applications in classification problems, the Random Forest algorithm will avoid the overfitting problem.
For both classification and regression tasks, the same Random Forest algorithm can be used.
The Random Forest algorithm can be used for identifying the most important features in the training dataset, in other words, feature engineering (see the sketch below).
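For that last point, a small sketch (assuming scikit-learn and its breast cancer dataset, chosen only for illustration) of ranking features via feature_importances_:

```python
# Fit a Random Forest and list the most important features.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# Sort features by importance (highest first) and print the top five.
order = np.argsort(forest.feature_importances_)[::-1]
for i in order[:5]:
    print(data.feature_names[i], round(forest.feature_importances_[i], 3))
```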
I need a learning model that, when we test it with a data sample, tells us which training data caused the answer.
Is there anything that does this?
(I already know KNN will do this)
thanks
Look for generative models.
"It asks the question: based on generation assumptions, which category is most likely to generate this signal?"
This is not a very well worded question:
Which train data cause the answer? I already know KNN will do this
KNN will tell you what the K nearest neighbors are, but it's not just those K training samples that cause the answer, it's also all the other training samples by being farther away.
The objective of machine learning is to generalize from the whole of the training dataset, so all samples in the training dataset (after outlier filtering and any dataset reduction steps) cause the answer.
If your question is 'Which class of machine learning algorithms makes a decision by comparing a new instance to instances seen in the training data, and can list the training examples which most strongly informed the decision?', the answer is: Instance based learning https://en.wikipedia.org/wiki/Instance-based_learning
(e.g. KNN, kernel machines, RBF)
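As a minimal sketch of the instance-based idea (assuming scikit-learn and its iris dataset, for illustration only), KNeighborsClassifier can report which training rows were the nearest neighbours behind a given prediction:

```python
# Fit a KNN classifier, then ask which training samples most strongly
# informed a particular prediction via kneighbors().
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

sample = X[0:1]  # one query instance (illustrative)
print("Prediction:", knn.predict(sample))

# kneighbors returns distances and the indices of the nearest training rows.
distances, indices = knn.kneighbors(sample)
print("Nearest training indices:", indices[0])
print("Their labels:", y[indices[0]])
```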
What does "Naive" Bayes mean in Machine learning?
Naive Bayes in machine learning typically refers to a family of supervised learning algorithms that apply Bayes' theorem. It's essentially a classifier that helps you classify things based on a series of independent, "naive" assumptions. For example, suppose you wanted to use machine learning to help you identify fruit. Take a banana: it's curved, yellow, and maybe 10 inches long. Each of those properties ('curved', 'yellow', '10 inches long') is treated as an independent piece of evidence, and together they form a probability that the fruit is a banana. With this naive Bayes classifier, if in the future there are other fruits, or different images or descriptions of fruit with similar properties, your classifier can classify those future fruits or unknown things as bananas correctly (or incorrectly), in which case you'll probably want to identify more "naive" features, such as 'blackened tip' or 'slight greenish color', to make your classifier more accurate.
It is called naïve because the model assumes independence between the features. This is a strong assumption which is usually not correct, and that is the reason for the name.
Nevertheless, naïve Bayes is quite efficient and in practice is known for giving good results.
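In symbols (a standard way of writing the assumption, not tied to any particular implementation), the independence assumption lets the class posterior factor as:

$$
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$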
A naive Bayes classifier is an algorithm that uses Bayes' theorem to classify objects. Naive Bayes classifiers assume strong, or naive, independence between attributes of data points. Popular uses of naive Bayes classifiers include spam filters, text analysis and medical diagnosis.
What makes a naive Bayes classifier naive is its assumption that all attributes of a data point under consideration are independent of each other. A classifier sorting fruits into apples and oranges would know that apples are red, round and are a certain size, but would not assume all these things at once. Oranges are round too, after all.
A naive Bayes classifier is not a single algorithm, but a family of machine learning algorithms that make use of statistical independence. These algorithms are relatively easy to write and run more efficiently than more complex Bayes algorithms.
The most popular application is spam filters. A spam filter looks at email messages for certain key words and puts them in a spam folder if they match.
Despite the name, the more data it gets, the more accurate a naive Bayes classifier becomes, such as from a user flagging email messages in an inbox for spam.
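A toy sketch of that spam-filter idea (assuming scikit-learn; the tiny message list and labels below are made up purely for illustration): word counts as features, MultinomialNB as the classifier.

```python
# A miniature "spam filter": count words, fit Multinomial Naive Bayes,
# then classify a new message.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win money now", "cheap pills offer", "meeting at noon",
            "lunch tomorrow?", "claim your free prize", "project status update"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = not spam (hypothetical labels)

vec = CountVectorizer()
X = vec.fit_transform(messages)

clf = MultinomialNB()
clf.fit(X, labels)

# More flagged examples (e.g. from users marking spam) would sharpen these estimates.
new = vec.transform(["free money offer"])
print(clf.predict(new), clf.predict_proba(new))
```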
I have been trying to figure out the correlation between the error rate and the number of features in both of these models. I watched some videos, and the creator said that a simpler model can be better than a complicated model. So I figured that the more features I had, the greater the error rate would be. This did not prove to be true in my work: when I had fewer features, the error rate went up. I'm not sure if I'm doing this incorrectly, or if the person in the video made a mistake. Could someone explain? I am also curious how features relate to Logistic Regression's error rate.
Naive Bayes and Logistic Regression are a "generative-discriminative pair," meaning they have the same model form (a linear classifier), but they estimate parameters in different ways.
For features x and label y, naive Bayes estimates a joint probability p(x,y) = p(y)*p(x|y) from the training data (that is, it builds a model that could "generate" the data), and uses Bayes' Rule to predict p(y|x) for new test instances. On the other hand, logistic regression estimates p(y|x) directly from the training data by minimizing an error function (which is more "discriminative").
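For reference, the Bayes' Rule step mentioned above is just:

$$
p(y \mid x) = \frac{p(y)\, p(x \mid y)}{p(x)}, \qquad p(x) = \sum_{y'} p(y')\, p(x \mid y')
$$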
These differences have implications for error rate:
When there are very few training instances, logistic regression might "overfit," because there isn't enough data to estimate p(y|x) reliably. Naive Bayes might do better because it models the entire joint distribution.
When the feature set is large (and sparse, like word features in text classification) naive Bayes might "double count" features that are correlated with each other, because it assumes that each p(x|y) event is independent, when they are not. Logistic regression can do a better job by naturally "splitting the difference" among these correlated features.
If the features really are (mostly) conditionally independent, both models might actually improve with more and more features, provided there are enough data instances. The problem comes when the training set size is small relative to the number of features. Priors on naive Bayes feature parameters, or regularization methods (like L1/Lasso or L2/Ridge) on logistic regression can help in these cases.
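As a hedged sketch of this comparison (assuming scikit-learn and a two-category subset of the 20 newsgroups corpus, chosen only for illustration), one can put Multinomial Naive Bayes and L2-regularized Logistic Regression side by side on sparse word-count features:

```python
# Compare Multinomial Naive Bayes (alpha = smoothing/prior) against
# Logistic Regression (C = inverse L2 regularization strength)
# on sparse word-count features.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

cats = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = CountVectorizer()
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)

for model in (MultinomialNB(alpha=1.0),
              LogisticRegression(C=1.0, max_iter=1000)):
    model.fit(X_train, train.target)
    acc = accuracy_score(test.target, model.predict(X_test))
    print(type(model).__name__, "test accuracy:", round(acc, 3))
```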