I have often read in the literature that there are several data mining methods (for example: decision trees, k-nearest neighbour, SVM, Bayes classification), and likewise several data mining algorithms (the k-nearest neighbour algorithm, the Naive Bayes algorithm).
Does a DM method use different DM algorithms, or are the two terms the same thing?
An example to clarify: is there any difference between the two statements below?
I'm using the Naive Bayes classification method.
I'm using the Naive Bayes classification algorithm.
Or is "Bayes" the method and "Naive Bayes" the algorithm?
Related
In a classification problem, I cannot use a simple logit model if my data label (a.k.a. the dependent variable) has more than two categories. That leaves me with multinomial regression, Linear Discriminant Analysis (LDA), and the like. Why is multinomial logit not as popular as LDA in machine learning? What particular advantage does LDA offer?
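For context, multinomial logistic regression does handle more than two categories out of the box. A minimal sketch with scikit-learn on a three-class dataset (the dataset choice is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris has three classes, so a plain binary logit is not applicable as-is.
X, y = load_iris(return_X_y=True)

# scikit-learn's LogisticRegression fits a multinomial logit when y has
# more than two classes.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))  # training accuracy
```

So the choice between multinomial logit and LDA is not forced by the number of classes; both handle multiclass labels.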
I am trying to build a number classification model in CoreML and want to use the Naive Bayes classifier, but I am not able to find how to use it. My algorithm uses Naive Bayes.
At the moment, coremltools supports only the following types of classifiers:
SVMs (scikit-learn)
Neural networks (Keras, Caffe)
Decision trees and their ensembles (scikit-learn, xgboost)
Linear and logistic regression (scikit-learn)
However, implementing Naïve Bayes in Swift yourself is not that hard; check this implementation, for example.
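The linked implementation is in Swift; as a language-agnostic illustration of how little machinery the algorithm needs, here is a minimal multinomial Naive Bayes sketch in Python (class and variable names are made up for this example). The core is just word counts, Laplace smoothing, and log-probabilities, which ports straightforwardly to Swift:

```python
import math
from collections import Counter

class TinyNaiveBayes:
    """Minimal multinomial Naive Bayes over token lists (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        # Log-prior: log P(c) from class frequencies.
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Per-class word counts.
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        V = len(self.vocab)

        def score(c):
            # log P(c) + sum of log P(word | c), with Laplace (add-one) smoothing.
            return self.priors[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + V)) for w in doc
            )

        return max(self.classes, key=score)

# Toy usage:
docs = [["buy", "cheap", "pills"], ["meeting", "at", "noon"],
        ["cheap", "pills", "now"], ["lunch", "meeting"]]
labels = ["spam", "ham", "spam", "ham"]
clf = TinyNaiveBayes().fit(docs, labels)
print(clf.predict(["cheap", "pills"]))  # → spam
```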
The Naive Bayes algorithm assumes independence among features. What are some text classification algorithms which are not naive, i.e. which do not assume independence among their features?
The answer is very straightforward, since nearly every classifier (besides Naive Bayes) is not naive. Feature independence is a very rare assumption, and it is not made by (among a huge list of others):
logistic regression (in NLP community known as maximum entropy model)
linear discriminant analysis (Fisher's linear discriminant)
kNN
support vector machines
decision trees / random forests
neural nets
...
You are asking about text classification, but there is nothing really special about text, and you can use any existing classifier for such data.
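For instance, logistic regression (the first item on the list) learns weights over all features jointly rather than treating them as independent. A minimal scikit-learn sketch on invented toy data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus invented for illustration.
texts = ["cheap pills buy now", "meeting at noon",
         "buy cheap now", "lunch meeting today"]
labels = ["spam", "ham", "spam", "ham"]

# Logistic regression (maximum entropy) makes no feature-independence
# assumption: all TF-IDF feature weights are estimated jointly.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["buy cheap pills"]))
```

Any of the other listed classifiers (kNN, SVM, trees, neural nets) can be dropped into the same pipeline in place of `LogisticRegression`.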
I was running multi-label classification on text data and noticed that TF-IDF outperformed LDA (Latent Dirichlet Allocation) by a large margin: TF-IDF accuracy was around 50%, while LDA was around 29%.
Is this expected or should LDA do better than this?
LDA is normally used for unsupervised learning, not for classification. It provides a generative model, not a discriminative model (What is the difference between a Generative and Discriminative Algorithm?), which makes it less optimal for classification. LDA can also be sensitive to data preprocessing and model parameters.
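To make the two setups concrete, here is a hedged scikit-learn sketch of the pipelines the question compares, on invented toy data (on real data the relative scores depend heavily on preprocessing and parameters):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus invented for illustration.
texts = ["cheap pills buy now", "meeting at noon",
         "buy cheap now", "lunch meeting today"]
labels = ["spam", "ham", "spam", "ham"]

# Setup A: TF-IDF features feed the classifier directly (discriminative).
tfidf_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())

# Setup B: LDA is a generative topic model; its inferred topic proportions
# are used here only as low-dimensional features for the same classifier.
lda_clf = make_pipeline(CountVectorizer(),
                        LatentDirichletAllocation(n_components=2, random_state=0),
                        LogisticRegression())

tfidf_clf.fit(texts, labels)
lda_clf.fit(texts, labels)
```

The LDA pipeline compresses each document into a handful of topic proportions before the classifier ever sees it, which is one reason it can lose to raw TF-IDF features on a discriminative task.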
I Googled a lot to find an answer, and then thought this would be the place where someone could resolve my doubt.
In a classification algorithm we have a model part and a prediction part.
Normally, while testing, we get an accuracy rate.
Likewise, is there an accuracy rate/confidence measure for the model in the Naive Bayes algorithm?
Evaluation is (usually) not part of the classifier.
It's something you do separately, to evaluate whether you did a good job or not.
If you classify your test data using Naive Bayes, you can perform exactly the same kind of evaluation as with any other classifier!
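Concretely, with scikit-learn it looks like this (an illustrative sketch; the dataset and split are arbitrary). Note that accuracy comes from a separate evaluation step on held-out data, while the fitted Naive Bayes model itself can report per-example class probabilities:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = GaussianNB().fit(X_train, y_train)

# Accuracy is computed separately, on held-out data, not inside the model:
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {acc:.2f}")

# Naive Bayes also gives per-example class probabilities ("confidence"):
proba = clf.predict_proba(X_test[:1])
```

The same `accuracy_score` call (or cross-validation) works unchanged for any other classifier.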