Discriminative models in Machine Learning [closed]

I'm a little bit confused about discriminative models.
I understood that a probabilistic classifier uses maximum likelihood to decide which class an input belongs to, while a linear classifier uses a linear combination of the input features to classify.
At this point I do not understand whether discriminative models are probabilistic classifiers or linear classifiers.

A discriminative model models the decision boundary between the classes: it learns the conditional probability distribution p(y|x) directly. A generative model explicitly models the actual distribution of each class: it learns the joint probability distribution p(x, y) and then obtains the conditional probability with the help of Bayes' theorem. In the end, both of them predict the conditional probability P(y|x). Both kinds of model are generally used in supervised learning problems.
A more in-depth discussion can be found in the Cross Validated thread Generative vs. discriminative.
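For illustration, here is a minimal sketch (assuming scikit-learn is available): LogisticRegression is a discriminative model that learns p(y|x) directly, while GaussianNB is a generative model that learns p(x|y) and p(y) and applies Bayes' theorem; both end up answering the same conditional-probability question.

# Discriminative vs. generative: both expose p(y|x) via predict_proba.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

discriminative = LogisticRegression().fit(X, y)   # models p(y|x) directly
generative = GaussianNB().fit(X, y)               # models p(x|y) and p(y)

print(discriminative.predict_proba(X[:1]))        # p(y|x) for one sample
print(generative.predict_proba(X[:1]))            # p(y|x) obtained via Bayes' theorem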

Related

difference between classification and regression in k-nearest neighbor? [closed]

What is the difference between using k-nearest neighbor for classification and using it for regression?
And when KNN is used in a recommendation system, is that considered classification or regression?
In classification tasks, the user seeks to predict a category, which is usually represented as an integer label, but represents a category of "things". For instance, you could try to classify pictures between "cat" and "dog" and use label 0 for "cat" and 1 for "dog".
The KNN algorithm for classification will look at the k nearest neighbours of the input you are trying to make a prediction on. It will then output the most frequent label among those k examples.
In regression tasks, the user wants to output a numerical value (usually continuous), for instance estimating the price of a house or rating how good a movie is.
In this case, the KNN algorithm would collect the values associated with the k closest examples to the one you want to make a prediction on and aggregate them to output a single value. Usually, you would choose the average of the k values of the neighbours, but you could choose the median or a weighted average (or actually anything that makes sense to you for the task at hand).
For your specific problem, you could use both, but regression makes more sense to me in order to predict some kind of "matching percentage" between the user and the thing you want to recommend to them.
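As a rough sketch of the distinction above (scikit-learn assumed), the same neighbourhood idea drives both estimators: the classifier takes a majority vote over the k neighbours' labels, while the regressor averages the k neighbours' target values.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_label = np.array([0, 0, 0, 1, 1, 1])                   # categories, e.g. 0 = "cat", 1 = "dog"
y_value = np.array([1.2, 1.9, 3.1, 10.5, 11.0, 12.2])    # continuous targets, e.g. prices

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_label)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_value)

print(clf.predict([[2.5]]))   # most frequent label among the 3 nearest neighbours
print(reg.predict([[2.5]]))   # average of the 3 nearest neighbours' target values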

Base estimator meaning in the context of Isolation forest [closed]

I am struggling to understand the meaning of "Base estimator" in the context of Isolation Forest.
One of the parameters for Isolation Forest method in scikit-learn is n_estimators; its description in sklearn docs states the following:
The number of base estimators in the ensemble.
I tried interpreting the documentation on Sklearn and material on Google and YouTube to understand this terminology, but no luck. Could someone please explain what it means in the context of IF?
tl;dr: it is the special kind of decision tree called Isolation Tree (iTree) in the original paper:
We show in this paper that a tree structure can be constructed effectively to isolate every single instance. [...] This isolation characteristic of tree forms the basis of our method to detect anomalies, and we call this tree Isolation Tree or iTree.
The proposed method, called Isolation Forest or iForest, builds an ensemble of iTrees for a given data set [...]
All ensemble methods (to which Isolation Forest belongs) consist of base estimators (i.e. they are exactly ensembles of base estimators); from the sklearn guide:
The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.
For example, in Random Forest (which arguably was the inspiration for the name Isolation Forest), this base estimator is a simple decision tree:
n_estimators : int, default=100
The number of trees in the forest.
Similarly for algorithms like Gradient Boosting Trees (despite the scikit-learn docs referring to them as "boosting stages", they are decision trees nevertheless), Extra Trees etc.
In all these algorithms, the base estimator is fixed (although its specific parameters can vary as set in the ensemble arguments). There is another category of ensemble methods, where the exact model to be used as the base estimator can also be set by a respective argument base_estimator; for example, here is the Bagging Classifier:
base_estimator : object, default=None
The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.
and AdaBoost:
base_estimator : object, default=None
The base estimator from which the boosted ensemble is built. [...] If None, then the base estimator is DecisionTreeClassifier(max_depth=1).
Historically speaking, the first ensembles were built using various versions of decision trees, and arguably still today it is decision trees (or variants, like iTrees) that are almost exclusively used for such ensembles; quoting from another answer of mine in Execution time of AdaBoost with SVM base classifier:
Adaboost (and similar ensemble methods) were conceived using decision trees as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is good reason why still today, if you don't specify explicitly the base_classifier argument, it assumes a value of DecisionTreeClassifier(max_depth=1). DTs are suitable for such ensembling because they are essentially unstable classifiers, which is not the case with SVMs, hence the latter are not expected to offer much when used as base classifiers.
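To make the terminology concrete, here is a minimal sketch (scikit-learn assumed): the base estimators counted by n_estimators in IsolationForest are the individual iTrees, and after fitting they are collected in the estimators_ attribute.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = rng.randn(200, 2)                      # some unlabeled data

iso = IsolationForest(n_estimators=50, random_state=0).fit(X)

print(len(iso.estimators_))                # 50 base estimators, one per iTree
print(type(iso.estimators_[0]))            # a single tree (an ExtraTreeRegressor in recent scikit-learn versions)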

Ridge regression vs Lasso Regression [closed]

Is Lasso regression or Elastic-net regression always better than ridge regression?
I've conducted these regressions on a few data sets and I've always got the same result that the mean squared error is the least in lasso regression. Is this a mere coincidence or is this true in any case?
On the topic, James, Witten, Hastie and Tibshirani write in their book "An Introduction to Statistical Learning":
These two examples illustrate that neither ridge regression nor the lasso will universally dominate the other. In general, one might expect the lasso to perform better in a setting where a relatively small number of predictors have substantial coefficients, and the remaining predictors have coefficients that are very small or that equal zero. Ridge regression will perform better when the response is a function of many predictors, all with coefficients of roughly equal size. However, the number of predictors that is related to the response is never known a priori for real data sets. A technique such as cross-validation can be used in order to determine which approach is better on a particular data set. (chapter 6.2)
It's different for each problem. In lasso regression, the algorithm tries to remove the extra features that aren't useful, which sounds better because you can then train nicely with fewer features, although the optimization is a bit harder; in ridge regression, the algorithm tries to make those extra features less influential without removing them completely, which is easier to process.
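A minimal cross-validation sketch along the lines the quoted passage suggests (scikit-learn assumed; the dataset here is synthetic and sparse, so lasso is likely, but not guaranteed, to come out ahead):

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Only 5 of the 50 predictors actually carry signal.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=1.0))]:
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(name, round(mse, 2))   # lower mean squared error is better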

Difference between Regression and classification in Machine Learning? [closed]

I am new to Machine Learning. Can anyone tell me the major difference between classification and regression in machine learning?
Regression aims to predict a continuous output value. For example, say that you are trying to predict the revenue of a certain brand as a function of many input parameters. A regression model would literally be a function which can output potentially any revenue number based on certain inputs. It could even output revenue numbers which never appeared anywhere in your training set.
Classification aims to predict which class (a discrete integer or categorical label) the input corresponds to. E.g. let us say that you had divided the sales into Low and High sales, and you were trying to build a model which could predict Low or High sales (binary/two-class classification). The inputs might even be the same as before, but the output would be different. In the case of classification, your model would output either "Low" or "High," and in theory every input would generate only one of these two responses.
(This answer is true for any machine learning method; my personal experience has been with random forests and decision trees).
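As a minimal sketch of that difference with decision trees (the models mentioned above; scikit-learn assumed), the regressor can output a number that never appears verbatim in the training data, while the classifier can only ever return one of the known labels:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])        # a single input feature
revenue = np.array([10.0, 20.0, 35.0, 80.0, 95.0, 120.0])        # continuous target
sales_level = np.array([0, 0, 0, 1, 1, 1])                       # 0 = "Low", 1 = "High"

reg = DecisionTreeRegressor(max_depth=2).fit(X, revenue)
clf = DecisionTreeClassifier(max_depth=2).fit(X, sales_level)

print(reg.predict([[3.5]]))   # a revenue estimate (any continuous number)
print(clf.predict([[3.5]]))   # either 0 ("Low") or 1 ("High"), nothing else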
Regression - the output variable takes continuous values.
Example: Given a picture of a person, we have to predict their age on the basis of the given picture.
Classification - the output variable takes class labels.
Example: Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
I am a beginner in the Machine Learning field, but as far as I know, regression is for "continuous values" and classification is for "discrete values". With regression, a line (or curve) is fit to your continuous values and you can see whether your model is a good or bad fit. On the other hand, with classification you can see how the discrete values take on meaning "discretely". If I am wrong, please feel free to correct me.

Understanding the probabilistic interpretation of logistic regression [closed]

I am having problem developing intuition about the probabilistic interpretation of logistic regression. Specifically, why is it valid to consider the output of logistic regression function as a probability?
Any type of classification can be seen as a probabilistic generative model by modeling the class-conditional densities p(x|C_k) (i.e. given the class C_k, what's the probability of x belonging to that class), and the class priors p(C_k) (i.e. what's the probability of class C_k), so that we can apply Bayes' theorem to obtain the posterior probabilities p(C_k|x) (i.e. given x, what's the probability that it belongs to class C_k). It is called generative because, as Bishop says in his book, you could use the model to generate synthetic data by drawing values of x from the marginal distribution p(x).
This all just means that every time you want to classify something into a specific class (e.g. a tumor being malignant or benign), there will be a probability of that being right or wrong.
Logistic regression uses a sigmoid function (or logistic function) in order to classify the data. Since this type of function ranges from 0 to 1, you can naturally interpret its output as a probability. Ultimately, you're looking for p(C_k|x) (in the example, x could be the size of the tumor, and C_0 the class that represents benign and C_1 malignant), and in the case of logistic regression, this is modeled by:
p(C_1|x) = sigma(w^T x)
where sigma is the sigmoid function, w^T is the transposed weight vector w, and x is your feature vector.
I highly recommend you read Chapter 4 of Bishop's book.
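A tiny numpy sketch of that formula (the weight and feature values below are made up for illustration): the score w^T x is squashed by the sigmoid into (0, 1), which is what lets us read it as p(C_1|x), with p(C_0|x) = 1 - p(C_1|x).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -0.3])     # learned weights (illustrative values)
x = np.array([2.0, 1.0])      # feature vector, e.g. tumor measurements

p_c1 = sigmoid(w @ x)         # p(C_1|x), e.g. probability of "malignant"
print(p_c1, 1.0 - p_c1)       # the two posteriors sum to 1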
• The probabilistic interpretation of logistic regression rests on the following three assumptions:
Features are real-valued and Gaussian distributed.
The response variable is a Bernoulli random variable; for example, in a binary class problem, y_i = 0 or 1.
For all i and j != i, x_i and x_j are conditionally independent given y (the Naive Bayes assumption).
So essentially,
Logistic-Reg = Gaussian Naive Bayes + Bernoulli class labels
• The optimization problem: the weights are found by maximizing the Bernoulli log-likelihood,
w* = argmax_w sum_i [ y_i log sigma(w^T x_i) + (1 - y_i) log(1 - sigma(w^T x_i)) ]
• The resulting class probabilities are:
P(y=1|x) = sigma(w^T x) = 1 / (1 + exp(-w^T x)) and P(y=0|x) = 1 - P(y=1|x)
• If we do a little math, we can see that both the geometric and the probabilistic interpretations of logistic regression boil down to the same thing.
• This link can be useful to learn more regarding Logistic regression and Naive Bayes.
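As a minimal sketch of that last point (scikit-learn assumed): for a fitted binary logistic regression, the geometric score w^T x + b from decision_function, pushed through the sigmoid, reproduces the probabilities from predict_proba.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
lr = LogisticRegression().fit(X, y)

scores = lr.decision_function(X[:5])                        # geometric view: w^T x + b
probs = 1.0 / (1.0 + np.exp(-scores))                       # sigmoid of the score
print(np.allclose(probs, lr.predict_proba(X[:5])[:, 1]))    # expected: True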
