I am doing a machine learning project using Weka. What is meant by the "probability distribution" that appears in the classifier output in Weka?
Assuming x is your model output, and x is the probability that an instance belongs to a certain class, the question is asking what the probability density function of x is.
I have trained a neural network classifier that directly learned the distribution p(y|x, w), where y is binary (0 or 1), x is an input vector, and w are the parameters of the model.
I need to compute the distribution p(x|y,w). According to Bayes' theorem, this can be done by p(x|y,w) = [p(y|x,w) p(x|w)] / p(y|w).
p(x|w) is the probability of the input given the model (with the class summed out), and p(y|w) is the probability of each class given the model (with every possible input vector integrated out). I already have p(y|x,w), which is my classifier.
Is there a way to compute the missing quantities p(x|w) and p(y|w) from the data?
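One way, sketched below under the assumption that the inputs in your dataset can be treated as samples from p(x|w), is to estimate p(y|w) by averaging the classifier's outputs over the data, and to estimate p(x|w) with a separate density estimator such as a kernel density estimate. The function name model_prob and the bandwidth value are placeholders, not part of your setup.

import numpy as np
from sklearn.neighbors import KernelDensity

def estimate_class_prior(model_prob, X):
    # p(y=1|w) ~ (1/N) * sum_i p(y=1|x_i, w), treating the rows of X as samples from p(x|w)
    return np.mean(model_prob(X))

def fit_input_density(X, bandwidth=0.5):
    # Kernel density estimate of p(x|w) fitted on the training inputs
    return KernelDensity(bandwidth=bandwidth).fit(X)

# Example usage, where model_prob returns p(y=1|x, w) for a batch of inputs:
# p_y1 = estimate_class_prior(model_prob, X_train)
# kde = fit_input_density(X_train)
# log_p_x = kde.score_samples(X_query)   # log p(x|w) at the query points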
I'm following a TensorFlow example that takes a bunch of features (real estate related) and "expensive" (i.e. whether the house price is high) as the binary target.
I was wondering if the target could take more than just 0 or 1. Let's say, 0 (not expensive), 1 (expensive), 2 (very expensive).
I don't think this is possible as the logistic regression model has asymptotes nearing 0 and 1.
This might be a stupid question, but I'm totally new to ML.
I think I found the answer myself. From Wikipedia:
First, the conditional distribution y|x is a Bernoulli distribution rather than a Gaussian distribution, because the dependent variable is binary. Second, the predicted values are probabilities and are therefore restricted to (0,1) through the logistic distribution function because logistic regression predicts the probability of particular outcomes.
Logistic regression is defined for binary classification tasks (for more details, see logistic_regression). For multi-class classification problems, you can use the Softmax classification algorithm. The following tutorial shows how to write a Softmax classifier with the TensorFlow library:
Softmax_Regression in Tensorflow
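As a minimal sketch of the idea, assuming a tf.keras setup with a 3-class target (the names X, y and n_features are placeholders for your own data, not from the tutorial):

import tensorflow as tf

# X: feature matrix of shape (n_samples, n_features); y: integer labels in {0, 1, 2}
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="softmax", input_shape=(n_features,))
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=32)
probs = model.predict(X)  # each row is a probability distribution over the 3 classes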
However, if your data set is not linearly separable (which is most often the case with real-world datasets), you have to use an algorithm that can handle nonlinear decision boundaries. Algorithms such as a neural network or an SVM with kernels would be a good choice. The following IPython notebook shows how to create a simple neural network in TensorFlow:
Neural Network in Tensorflow
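And a comparably small sketch of a network with one hidden layer in tf.keras (again, X, y and n_features stand in for your own data):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(n_features,)),  # hidden layer allows a nonlinear boundary
    tf.keras.layers.Dense(3, activation="softmax")
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=50, batch_size=32)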
Good Luck!
I am new to machine learning and I am currently working on a classification problem. I am able to train the model and predict on test data sets. I want to know whether there is some way I can get scores along with the predictions. By scores, I mean a measure of how confident the prediction is. For example, in the standard age-salary-buy problem (predicting whether a customer will buy the product based on age and salary), in addition to the prediction of whether he will buy it or not, I want a score out of 100 for how likely he is to buy the product.
Currently, I am using the LibSVM algorithm. Is there some algorithm which provides the above data?
Thanks.
What you are looking for is a measure of support for your decision. In other words, many classifiers base their decision for x over the set of labels Y on:
cl(x) = arg max_{y \in Y} p(y|x)
where p(y|x) is their internal estimate of "x having label y". Such classifiers include:
neural networks (with sigmoid output)
logistic regression
naive bayes
voting ensembles (such as RF)
...
These estimates can easily be converted to your 0-100 scale, since a probability lies on a 0-1 scale.
Others (such as SVM), on the other hand, use a measure that is proportional to the probability but unbounded. Here you can still get this value (often called the decision function), but you cannot convert it to a 0-100 score, as you do not have a "maximum" value. This is a big drawback, so modifications have been proposed. In particular, for SVM there is Platt's scaling, which fits a logistic regression on top of the SVM so that you get a probability estimate. In libSVM you can set -b to get probability estimates.
From the libSVM website:
-b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
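If you are calling libSVM through scikit-learn rather than from the command line, the equivalent is the probability=True flag on SVC, which enables Platt scaling. A minimal sketch, with made-up toy data standing in for the age/salary features and the buy/not-buy label:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data: two features (think age and salary) and a binary buy/not-buy label
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# probability=True enables Platt scaling (the scikit-learn counterpart of libSVM's -b 1)
clf = SVC(kernel="rbf", probability=True, random_state=0)
clf.fit(X, y)

proba = clf.predict_proba(X[:5])        # calibrated probabilities in [0, 1]
scores = (proba[:, 1] * 100).round(1)   # rescaled to a 0-100 "will buy" score
print(scores)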
I have a dataset of state->action pairs, (s,a), where each s defines a probability distribution over the possible choices of a, and each a is sampled from that probability distribution. I'd like to train a classifier for this dataset that, rather than learning to predict the most likely label, predicts the distribution a was sampled from.
For example, if you're playing an iterative rock-paper-scissors, your state may be just the previous move you made and a ∈ { Rock, Paper, Scissors }, where the previous state reduces the probability of choosing that action again. My dataset would then look like:
PreviousAction,Chosen
Rock,Paper
Paper,Rock
Rock,Scissors
Scissors,Paper
Paper,Paper
...
Is it possible to learn probability distributions over the labels with random forests in scikit-learn?
Yes, it is. Train a RandomForestClassifier using fit (which expects labels, not probability distributions, as its y argument), then predict using predict_proba.
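A minimal sketch with the rock-paper-scissors data from the question (encoding the string-valued state with OrdinalEncoder is just one possible choice):

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

# The (PreviousAction, Chosen) pairs from the question
X_raw = [["Rock"], ["Paper"], ["Rock"], ["Scissors"], ["Paper"]]
y = ["Paper", "Rock", "Scissors", "Paper", "Paper"]

enc = OrdinalEncoder()
X = enc.fit_transform(X_raw)

# fit() takes the sampled labels; predict_proba() returns the estimated
# distribution over {Paper, Rock, Scissors} for a given state
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.classes_)
print(clf.predict_proba(enc.transform([["Rock"]])))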
I am having trouble developing intuition about the probabilistic interpretation of logistic regression. Specifically, why is it valid to treat the output of the logistic regression function as a probability?
Any type of classification can be seen as a probabilistic generative model by modeling the class-conditional densities p(x|C_k) (i.e. given the class C_k, what is the probability of observing x) and the class priors p(C_k) (i.e. what is the probability of class C_k), so that we can apply Bayes' theorem to obtain the posterior probabilities p(C_k|x) (i.e. given x, what is the probability that it belongs to class C_k). It is called generative because, as Bishop says in his book, you could use the model to generate synthetic data by drawing values of x from the marginal distribution p(x).
This all just means that every time you want to classify something into a specific class (e.g. the size of a tumor being malignant or benign), there will be a probability of that being right or wrong.
Logistic regression uses a sigmoid function (or logistic function) to classify the data. Since this type of function ranges from 0 to 1, it is natural to read its output as a probability. Ultimately, you're looking for p(C_k|x) (in the example, x could be the size of the tumor, and C_0 the class that represents benign and C_1 malignant), and in the case of logistic regression, this is modeled by:
p(C_1|x) = sigma(w^T x)
where sigma is the sigmoid function, w^T is the transposed weight vector w, and x is your feature vector.
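As a small numeric illustration (the weight and feature values below are made up, not from any real model):

import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -4.0])      # [weight for tumor size, bias term]
x = np.array([6.2, 1.0])       # [tumor size, constant 1 for the bias]

p_malignant = sigmoid(w @ x)   # p(C_1|x) = sigma(w^T x)
print(p_malignant)             # a number in (0, 1), read as a probability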
I highly recommend you read Chapter 4 of Bishop's book.
• The probabilistic interpretation of logistic regression is based on the three assumptions below:
Features are real-valued and Gaussian distributed.
The response variable is a Bernoulli random variable. For example, in a binary class problem, yi = 0 or 1.
For all i and j != i, xi and xj are conditionally independent given y (the Naive Bayes assumption).
So essentially,
Logistic-Reg = Gaussian Naive Bayes + Bernoulli class labels
• The optimization problem is to find the weights that maximize the log-likelihood of the data:
w* = arg max_w sum_i log P(yi | xi, w)
• The class probabilities are modeled as:
P(y=1|x) = 1 / (1 + exp(-w^T x)) and P(y=0|x) = 1 - P(y=1|x) = 1 / (1 + exp(w^T x))
• If we do a little math, we can see that both the geometric and the probabilistic interpretations of logistic regression boil down to the same thing (a small numerical check of this is sketched after this list).
• This link can be useful to learn more regarding Logistic regression and Naive Bayes.
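As a sketch of that "little math" in code (the Gaussian parameters below are made up; the point is only that the Gaussian Naive Bayes posterior and the sigmoid of a linear function of x give the same number):

import numpy as np
from scipy.stats import norm

# Made-up 1-D Gaussian Naive Bayes parameters with a shared variance
mu0, mu1, sigma, prior1 = 0.0, 2.0, 1.0, 0.5
x = 1.3

# Posterior via Bayes' theorem on the class-conditional Gaussians
num = norm.pdf(x, mu1, sigma) * prior1
den = num + norm.pdf(x, mu0, sigma) * (1 - prior1)
p_gnb = num / den

# The same posterior as a sigmoid of a linear function of x
w = (mu1 - mu0) / sigma**2
b = (mu0**2 - mu1**2) / (2 * sigma**2) + np.log(prior1 / (1 - prior1))
p_sigmoid = 1.0 / (1.0 + np.exp(-(w * x + b)))

print(p_gnb, p_sigmoid)   # identical up to floating-point error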