I have a classification task. The training data has 50 different labels. The customer wants to differentiate the low probability predictions, meaning that, I have to classify some test data as Unclassified / Other depending on the probability (certainty?) of the model.
When I test my code, the prediction result is a numpy array (I'm using different models, this is one is pre-trained BertTransformer). The prediction array doesn't contain probabilities such as in Keras predict_proba() method. These are numbers generated by prediction method of pretrained BertTransformer model.
[[-1.7862008 -0.7037363 0.09885322 1.5318055 2.1137428 -0.2216074
0.18905772 -0.32575375 1.0748093 -0.06001111 0.01083148 0.47495762
0.27160102 0.13852511 -0.68440574 0.6773654 -2.2712054 -0.2864312
-0.8428862 -2.1132915 -1.0157436 -1.0340284 -0.35126117 -1.0333195
9.149789 -0.21288703 0.11455813 -0.32903734 0.10503325 -0.3004114
-1.3854568 -0.01692022 -0.4388664 -0.42163098 -0.09182278 -0.28269592
-0.33082992 -1.147654 -0.6703184 0.33038092 -0.50087476 1.1643585
0.96983343 1.3400391 1.0692116 -0.7623776 -0.6083422 -0.91371405
0.10002492]]
I'm using numpy.argmax() to identify the correct label. The prediction works just fine. However, since these are not probabilities, I cannot compare the best result with a threshold value.
My question is, how can I define a threshold (say, 0.6), and then compare the probability of the argmax() element of the BertTransformer prediction array so that I can classify the prediction as "Other" if the probability is less than the threshold value?
Edit 1:
We are using 2 different models. One is Keras, and the other is BertTransformer. We have no problem in Keras since it gives the probabilities so I'm skipping Keras model.
The Bert model is pretrained. Here is how it is generated:
def model(self, data):
number_of_categories = len(data['encoded_categories'].unique())
model = BertForSequenceClassification.from_pretrained(
"dbmdz/bert-base-turkish-128k-uncased",
num_labels=number_of_categories,
output_attentions=False,
output_hidden_states=False,
)
# model.cuda()
return model
The output given above is the result of model.predict() method. We compare both models, Bert is slightly ahead, therefore we know that the prediction works just fine. However, we are not sure what those numbers signify or represent.
Here is the Bert documentation.
BertForSequenceClassification returns logits, i.e., the classification scores before normalization. You can normalize the scores by calling F.softmax(output, dim=-1) where torch.nn.functional was imported as F.
With thousands of labels, the normalization can be costly and you do not need it when you are only interested in argmax. This is probably why the models return the raw scores only.
Related
When training an XGBoost classification model, I am using the eli5 function "explain_prediction()" to look at the feature contributions to invidividual predictions.
However, the eli5 package seems to be treating my model as a regressor rather than a classifier.
Below is a snippet of code, showing my model, my prediction, and then the output from the "explain_prediction" method.
As you can see, the output gives a score that is 3.016 rather than a probability between 0 and 1. In this case I would have expected 0.953.
Any help appreciated.
the eli5 package seems to be treating my model as a regressor rather than a classifier.
The boosting score is converted to the probability score by applying the inverse logit function to it.
The probability scale is non-linear, which would make the numeric interpretation of feature contributions more difficult.
.. the output gives a score is 3.016 .. I would have expected 0.953
1 / (1 + exp(-3.016)) = 0.9532917416863492
I am working on document classification problem from Kaggle.
It has 5 classes - 'business', 'tech', 'politics', 'sport', 'entertainment'
I have trained my Deep Learning model and got the results for the test set as well. But the result I am getting is the list of probabilities of different classes.
Output for one row
How to get the actual classes(labels) from the output I got?
My Neural Network architecture looks like this-
Network Architecture
You should choose the entry with the highest value as the predicted class. For example, in your provided example: [0.045, 0.030, 0.015, 0.889, 0.019], the predicted class is the forth class (i.e., idx=3) which has the highest probability value.
The argmax function of NumPy is probably what you should be using. Considering that pred are the output probablities from your network in the shape of: (batch_size, num_labels), then np.argmax(pred, axis=1) will give you the indices (i.e., labels) associated with the predicted classes.
I have a classification problem with 10 features and I have to predict 1 or 0. When I train the SVC model, with the train test split, all the predicted values for the test portion of the data comes out to be 0. The data has the following 0-1 count:
0: 1875
1: 1463
The code to train the model is given below:
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
pred= model.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)`
Why does it predict 0 for all the cases?
The model predicts the more frequent class, even though the dataset is nor much imbalanced. It is very likely that the class cannot be predicted from the features as they are right now.
You may try normalizing the features.
Another thing you might want to try is to have a look at how correlated the features are with each other. Having highly correlated features might also prevent the model from converging.
Also, you might have chosen the wrong features.
For a classification problem, it is always good to run a dummy classifiar as a starting point. This will give you an idea how good your model can be.
You can use this as a code:
from sklearn.dummy import DummyClassifier
dummy_classifier = DummyClassifier(strategy="most_frequent")
dummy_classifier.fit(X_train,y_train)
pred_dum= dummy_classifier.predict(X_test)
accuracy_score(y_test, pred_dum)
this will give you an accuracy, if you predict always the most frequent class. If this is for example: 100% , this would mean that you only have one class in your dataset. 80% means, that 80% of your data belongs to one class.
In a first step you can adjust your SVC:
model = SVC(C=1.0, kernel=’rbf’, random_state=42)
C : float, optional (default=1.0)Penalty parameter C of the error
term.
kernel : Specifies the kernel type to be used in the algorithm. It
must be one of ‘linear’, ‘poly’, ‘rbf’
This can give you a starting point.
On top you should run also a prediction for your training data, to see the comparison if you are over- or underfitting.
trainpred= model.predict(X_train)
accuracy_score(y_test, trainpred)
I am new to SVM. I am using jlibsvm for a multi-class classification problem. Basically, I am doing a sentence classification problem. There are 3 Classes. What I understood is I am doing One-against-all classification. I have a comparatively small train set. A total of 75 sentences, In which 25 sentences belongs to each class.
I am making 3 SVMs (so 3 different models), where, while training, in SVM_A, sentences belong to CLASS A will have a true label, i.e., 1 and other sentences will have a -1 label. Correspondingly done for SVM_B, and SVM_C.
While testing, to get the true label of a sentence, I am giving the sentence to 3 models and I am taking the prediction probability returned by these 3 models. Which one returns the highest will be the class the sentence belong to.
This is how I am doing. But I am getting the same prediction probability for every sentence in the test set for all models.
A predicted:0.012820514
B predicted:0.012820514
C predicted:0.012820514
These values repeat for all sentences in the training set.
The following is how I set parameters for training:
C_SVC svm = new C_SVC();
MutableBinaryClassificationProblemImpl problem;
ImmutableSvmParameterGrid.Builder builder = ImmutableSvmParameterGrid.builder();
// create training parameters ------------
HashSet<Float> cSet;
HashSet<LinearKernel> kernelSet;
cSet = new HashSet<Float>();
cSet.add(1.0f);
kernelSet = new HashSet<LinearKernel>();
kernelSet.add(new LinearKernel());
// configure finetuning parameters
builder.eps = 0.001f; // epsilon
builder.Cset = cSet; // C values used
builder.kernelSet = kernelSet; //Kernel used
builder.probability=true; // To get the prediction probability
ImmutableSvmParameter params = builder.build();
What am I doing wrong?
Is there any other better way to do multi-class classification other than this?
You are getting the same output, because you generate the same model three times.
The reason for this is, that jlibsvm is able to perform multiclass classification out of the box based on the provided data (LIBSVM itself supports this too). If it detects, that more than two class labes are provided in the given data, it automatically performs multiclass classification. So there is no need for a manually 1vsN approach. Just supply the data with class-labels for each category.
However, jlibsvm is still in beta and relies on a rather old version of LIBSVM (2.88). A lot has changed. For a more intiuitive Java binding (in comparison to the default LIBSVM version), you can take a look at zlibsvm, which is available via Maven Central and based on the latest LIBSVM version.
Let's assume that we have a few data points that can be used as the training set. Each row is consisted of 4 say columns (features) that take boolean values. The 5th column expresses the class and it also takes boolean values. Here is an example (they are almost random):
1,1,1,0,1
0,1,1,0,1
1,1,0,0,1
0,0,0,0,0
1,0,0,1,0
0,0,0,0,0
Now, what I want to do is to build a model such that for any given input (new line) the system does not return the class itself (like in the case of a regular classification problem) but instead the probability this particular input belongs to class 0 or class 1. How can I do that? What's more, how can I generate a confidence interval or error rate associated with that computation?
Not all classification algorithms return probabilities, because not all of them have an underlying probabilistic model. For example, a classification tree is just a set of rules that you follow to assign each new input to a particular class.
An example of a classification algorithm that does have an underlying probabilistic model is logistic regression. In this algorithm, the probability that a particular input x is in the class is
prob = 1 / (1 + exp( -theta * x ))
where theta is a vector of coefficients with the same number of dimensions as x. Generally to move from probabilities to classifications, you simply threshold, e.g.
if prob < 0.5
return 0;
else
return 1;
end
Other classification algorithms may have probabilistic interpretations, for example random forests are essentially a voting algorithm with multiple classification trees. If 80% of the trees vote for class 1 and 20% vote for class 2, then you could output an 80% probability of being in class 1. But this is a side effect of how the model works, rather than an explicit underlying probability model.