How can I output soft label predictions from RandomForestClassifier instead of 0 or 1's - random-forest

I'm doing a Kaggle competition with binary classification. I want to output probabilistic predictions (soft labels, some value between 0 and 1) rather than just 0 and 1, but I'm not sure how to get my predictions in this form. I'm using RandomForestClassifier.

Use the predict_proba(X) method of the model.
This method returns the class probabilities for the input X.
The documentation says:
The predicted class probabilities of an input sample are computed as
the mean predicted class probabilities of the trees in the forest. The
class probability of a single tree is the fraction of samples of the
same class in a leaf.
So, imagine that your random forest is formed by 100 different trees. If 91 of these trees predict that the sample is class 0, and 9 trees predict that it's class 1, the output of predict_proba(X) will be:
[0.91, 0.09]
Whereas if you use predict(X), you will obtain the class prediction directly (in this case class 0, since 0.91 is greater than 0.09):
[0] # Which refers to class 0
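A minimal sketch of how this looks in code; the names X_train, y_train, and X_test are placeholders for your own Kaggle data:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Probability of each class for every test sample, shape (n_samples, 2)
proba = model.predict_proba(X_test)

# Probability of class 1 only, which is usually what the submission expects
soft_labels = proba[:, 1]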

Related

Converting Neural Network output to classes

I am working on document classification problem from Kaggle.
It has 5 classes - 'business', 'tech', 'politics', 'sport', 'entertainment'
I have trained my deep learning model and got results for the test set as well, but what I am getting is a list of probabilities for the different classes.
[image: output for one row]
How do I get the actual classes (labels) from the output I got?
My neural network architecture looks like this:
[image: network architecture]
You should choose the entry with the highest value as the predicted class. In your example, [0.045, 0.030, 0.015, 0.889, 0.019], the predicted class is the fourth class (i.e., idx=3), which has the highest probability value.
NumPy's argmax function is probably what you should be using. Assuming pred contains the output probabilities from your network with shape (batch_size, num_labels), np.argmax(pred, axis=1) will give you the indices (i.e., labels) associated with the predicted classes.
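A minimal sketch, assuming the five classes are ordered as listed in the question and pred holds the predicted probabilities:
import numpy as np

classes = ['business', 'tech', 'politics', 'sport', 'entertainment']
pred = np.array([[0.045, 0.030, 0.015, 0.889, 0.019]])  # shape (batch_size, num_labels)

idx = np.argmax(pred, axis=1)        # -> array([3])
labels = [classes[i] for i in idx]   # -> ['sport']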

SVM binary classifier predicts one class for all of test data

I have a classification problem with 10 features and I have to predict 1 or 0. When I train the SVC model with a train-test split, all the predicted values for the test portion of the data come out to be 0. The data has the following 0-1 count:
0: 1875
1: 1463
The code to train the model is given below:
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
pred= model.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)
Why does it predict 0 for all the cases?
The model predicts the more frequent class, even though the dataset is not very imbalanced. It is very likely that the class cannot be predicted from the features as they are right now.
You may try normalizing the features (see the sketch after this answer).
Another thing you might want to try is to look at how correlated the features are with each other. Having highly correlated features might also prevent the model from converging.
Also, you might have chosen the wrong features.
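A minimal sketch of the scaling suggestion, assuming the same X_train/X_test split as in the question (SVC's RBF kernel is sensitive to feature scale):
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scale features to zero mean and unit variance before fitting the SVM
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)
pred = model.predict(X_test)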
For a classification problem, it is always good to run a dummy classifier as a starting point. This gives you a baseline that any real model should beat.
You can use this code:
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Baseline that always predicts the most frequent class in the training data
dummy_classifier = DummyClassifier(strategy="most_frequent")
dummy_classifier.fit(X_train, y_train)
pred_dum = dummy_classifier.predict(X_test)
accuracy_score(y_test, pred_dum)
This gives you the accuracy you would get by always predicting the most frequent class. If this is, for example, 100%, it would mean that you only have one class in your dataset; 80% means that 80% of your data belongs to one class.
As a first step you can adjust your SVC:
model = SVC(C=1.0, kernel='rbf', random_state=42)
C : float, optional (default=1.0). Penalty parameter C of the error term.
kernel : Specifies the kernel type to be used in the algorithm. It must be one of 'linear', 'poly', 'rbf'.
This can give you a starting point.
On top of that, you should also run a prediction on your training data, to compare the two scores and see whether you are over- or underfitting:
trainpred = model.predict(X_train)
accuracy_score(y_train, trainpred)

sklearn - Predict each class's probability

So far I have consulted another post and the sklearn documentation.
So in general I want to produce the following example:
X = np.matrix([[1,2],[2,3],[3,4],[4,5]])
y = np.array(['A', 'B', 'B', 'C'])
Xt = np.matrix([[11,22],[22,33],[33,44],[44,55]])
model = model.fit(X, y)
pred = model.predict(Xt)
However for output, I would like to see 3 columns per observation as output from pred:
A | B | C
.5 | .2 | .3
.25 | .25 | .5
...
and a different probability for each class showing up in my prediction.
I believe that the best approach would be multilabel classification, from the second link I provided above. Additionally, I think it might be a good idea to try one of the multi-label or multi-output models listed below:
Support multilabel:
sklearn.tree.DecisionTreeClassifier
sklearn.tree.ExtraTreeClassifier
sklearn.ensemble.ExtraTreesClassifier
sklearn.neighbors.KNeighborsClassifier
sklearn.neural_network.MLPClassifier
sklearn.neighbors.RadiusNeighborsClassifier
sklearn.ensemble.RandomForestClassifier
sklearn.linear_model.RidgeClassifierCV
Support multiclass-multioutput:
sklearn.tree.DecisionTreeClassifier
sklearn.tree.ExtraTreeClassifier
sklearn.ensemble.ExtraTreesClassifier
sklearn.neighbors.KNeighborsClassifier
sklearn.neighbors.RadiusNeighborsClassifier
sklearn.ensemble.RandomForestClassifier
However, I am looking for someone who has more confidence and experience at doing this the right way. All feedback is appreciated.
-bmc
From what I understand, you want to obtain probabilities for each of the potential classes of a multi-class classifier.
In scikit-learn this can be done with the generic method predict_proba, which is implemented for most of the classifiers in scikit-learn. You basically call:
clf.predict_proba(X)
Where clf is the trained classifier.
As output you will get an array of probabilities, one value per class for each input sample.
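A minimal sketch using the (corrected) toy data from the question; RandomForestClassifier is just one choice of classifier that supports predict_proba:
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array(['A', 'B', 'B', 'C'])
Xt = np.array([[11, 22], [22, 33], [33, 44], [44, 55]])

clf = RandomForestClassifier(random_state=0).fit(X, y)

print(clf.classes_)           # column order, e.g. ['A' 'B' 'C']
print(clf.predict_proba(Xt))  # one row per sample, one column per class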
One word of caution: not all classifiers naturally estimate class probabilities. For instance, SVM doesn't. You can still obtain class probabilities, but to do so you need to instruct the classifier to perform probability estimation when you construct it. For SVM it would look like:
SVC(probability=True)
After you fit it you will be able to use predict_proba as before.
I need to warn you that if a classifier doesn't naturally estimate probabilities, they will be computed using rather expensive methods that can significantly increase training time. So I advise you to use classifiers that naturally estimate class probabilities (neural networks with a softmax output, logistic regression, gradient boosting, etc.).
Try using a calibrated model:
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV

# define model
model = SVC()
# define and fit calibration model
calibrated = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated.fit(trainX, trainy)
# predict probabilities
print(calibrated.predict_proba(testX)[:, 1])

How to predict a continuous dependent variable that expresses target class probabilities?

My samples can belong either to class 0 or to class 1, but for some of my samples I only have a probability of belonging to class 1 available. So far I've discretized my target variable by applying a threshold, i.e., all y >= t I assigned to class 1, and I've discarded all samples that have a non-zero probability of belonging to class 1. Then I fitted a linear SVM to the data using scikit-learn.
Of course, this way I throw away quite a bit of the training data. One idea I had was to omit the discretization and use regression instead, but it's usually not a good idea to approach classification with regression, as, for example, it doesn't guarantee that predicted values lie in the interval [0, 1].
By the way, the nature of my features x is similar: for some of them I also only have a probability that the respective feature is present. For the error it didn't make a big difference whether I discretized my features in the same way I discretized the dependent variable.
You might be able to approximate this using sample weighting: assign a sample to the class with the highest probability, but weight that sample by the probability of it actually belonging to that class. Many scikit-learn estimators allow for this.
Example:
A sample X = [1, 2, 3, 4] that belongs to class 0 with probability 0.7 would become X = [1, 2, 3, 4], y = [0], with a sample weight of 0.7. You might also normalize so the sample weights span 0 to 1 (since in this scheme your probabilities, and hence your sample weights, will only range from 0.5 to 1). You could also incorporate non-linear penalties to "strengthen" the influence of high-probability samples.
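A minimal sketch of that idea, assuming proba is an array with each sample's probability of belonging to class 1 (the data and names here are illustrative):
import numpy as np
from sklearn.svm import LinearSVC

# proba[i] = probability that sample i belongs to class 1
X = np.array([[1, 2, 3, 4], [4, 3, 2, 1], [0, 1, 0, 1]])
proba = np.array([0.3, 0.9, 0.55])

# Hard label = most probable class; weight = confidence in that label
y = (proba >= 0.5).astype(int)
sample_weight = np.where(y == 1, proba, 1 - proba)

clf = LinearSVC()
clf.fit(X, y, sample_weight=sample_weight)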

How to compute probabilities instead of actual classifications on ML problems

Let's assume that we have a few data points that can be used as the training set. Each row consists of, say, 4 columns (features) that take boolean values. The 5th column expresses the class, and it also takes boolean values. Here is an example (the values are almost random):
1,1,1,0,1
0,1,1,0,1
1,1,0,0,1
0,0,0,0,0
1,0,0,1,0
0,0,0,0,0
Now, what I want to do is build a model such that for any given input (a new line) the system does not return the class itself (as in a regular classification problem) but instead the probability that this particular input belongs to class 0 or class 1. How can I do that? What's more, how can I generate a confidence interval or error rate associated with that computation?
Not all classification algorithms return probabilities, because not all of them have an underlying probabilistic model. For example, a classification tree is just a set of rules that you follow to assign each new input to a particular class.
An example of a classification algorithm that does have an underlying probabilistic model is logistic regression. In this algorithm, the probability that a particular input x is in the class is
prob = 1 / (1 + exp( -theta * x ))
where theta is a vector of coefficients with the same number of dimensions as x. Generally to move from probabilities to classifications, you simply threshold, e.g.
if prob < 0.5
return 0;
else
return 1;
end
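In scikit-learn terms, a sketch of the same idea, assuming binary 0/1 data like the example above (the toy dataset is far too small for a meaningful fit, so treat this as illustrative only):
import numpy as np
from sklearn.linear_model import LogisticRegression

data = np.array([
    [1, 1, 1, 0, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
])
X, y = data[:, :4], data[:, 4]

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba([[1, 0, 1, 0]])  # [[P(class 0), P(class 1)]]
hard = (proba[:, 1] >= 0.5).astype(int)    # threshold at 0.5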
Other classification algorithms may have probabilistic interpretations, for example random forests are essentially a voting algorithm with multiple classification trees. If 80% of the trees vote for class 1 and 20% vote for class 2, then you could output an 80% probability of being in class 1. But this is a side effect of how the model works, rather than an explicit underlying probability model.
