Hello everyone, I'm new to this area, and I was wondering if anyone could help me understand the results of a logistic regression.
I need to understand whether the independent variables can be used to make a good classification.
=== Run information ===
Scheme: weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -num-decimal-places 4
Relation: Train
Instances: 14185
Attributes: 5
ATTR_1
ATTR_2
ATTR_3
ATTR_4
DEPENDENT_VAR
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
Class
Variable 0
====================
ATTR_1 0.0022
ATTR_2 0.0022
ATTR_3 0.0034
ATTR_4 -0.0021
Intercept 0.9156
Odds Ratios...
Class
Variable 0
====================
ATTR_1 1.0022
ATTR_2 1.0022
ATTR_3 1.0034
ATTR_4 0.9979
Time taken to build model: 0.13 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0.07 seconds
=== Summary ===
Correctly Classified Instances 51240 72.2453 %
Incorrectly Classified Instances 19685 27.7547 %
Kappa statistic -0.0001
Mean absolute error 0.3992
Root mean squared error 0.4467
Relative absolute error 99.5581 %
Root relative squared error 99.7727 %
Total Number of Instances 70925
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000    1.000    0.723    1.000    0.839    -0.005    0.545    0.759    0
0.000    0.000    0.000    0.000    0.000    -0.005    0.545    0.305    1
Weighted Avg.    0.722    0.723    0.522    0.722    0.606    -0.005    0.545    0.633
=== Confusion Matrix ===
a b <-- classified as
51240 5 | a = 0
19680 0 | b = 1
In particular, I am interested in understanding the values of the coefficients and the odds-ratios.
Thanks.
Off the top of my head:
Odds ratios and coefficient values are directly related to one another and can be calculated from each other: the odds ratio is simply exp(coefficient).
For ATTR_1, exp(0.0022) ≈ 1.0022.
For doing further calculations and for fitting/predicting, the coefficients are "better". However, they are values that must be plugged into exp(x), so they are somewhat difficult to "visualize in your head".
For human understanding, odds ratios are sometimes more convenient - easier to interpret/visualize, but you can't do certain calculations directly with them.
Weka does not know what you are more interested in, so it gives you both for convenience.
By the way, Weka performs regularized logistic regression
("Logistic Regression with ridge parameter of 1.0E-8"), so the coefficients might differ slightly from those that a different software package would give you.
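As a quick illustration of the coefficient/odds-ratio relationship, here is a minimal Python sketch using the coefficient values from your output:
import numpy as np
# coefficients reported in the "Variable 0" column above
coefficients = {"ATTR_1": 0.0022, "ATTR_2": 0.0022, "ATTR_3": 0.0034, "ATTR_4": -0.0021}
# odds ratio = exp(coefficient); going back the other way: coefficient = log(odds ratio)
odds_ratios = {name: float(np.exp(b)) for name, b in coefficients.items()}
print(odds_ratios)   # matches Weka's odds ratios after rounding: 1.0022, 1.0022, 1.0034, 0.9979
An odds ratio of 1.0022 means that a one-unit increase in ATTR_1 multiplies the odds of the modelled class by about 1.0022, so the per-unit effect here is tiny.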
Related
I have an imbalanced dataset with 43323 rows; 9 of them belong to the 'failure' class and the rest to the 'normal' class. I trained a classifier with 100% recall and 94.89% AUC on the test data (0.75/0.25 split with stratify=y). However, the classifier has 0.18% precision and a 0.37% F1 score. I assumed I could find a better F1 score by changing the threshold, but I failed (I checked thresholds between 0 and 1 with step = 0.01). Also, it seems weird to me, because usually when dealing with an imbalanced dataset it is hard to get high recall. The goal is to get a better F1 score. What can I do as a next step? Thanks!
(To be clear, I used SMOTE to upsample the failure samples in the training dataset.)
In fact, getting 100% recall is trivial: just classify everything as 1.
Is the precision/recall curve any good? Perhaps a more thorough scan could yield a better result:
import numpy as np
from sklearn.metrics import precision_recall_curve
# probability of the positive class (second column of predict_proba's output)
probabilities = model.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, probabilities)
# drop the last point so the arrays align with thresholds; the epsilon avoids 0/0
f1_scores = 2 * recall[:-1] * precision[:-1] / (recall[:-1] + precision[:-1] + 1e-12)
best_f1 = np.max(f1_scores)
best_thresh = thresholds[np.argmax(f1_scores)]
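Once you have best_thresh, you can turn the scores into hard predictions with something like (probabilities >= best_thresh).astype(int) and compare the precision, recall, and F1 at that cutoff against the default 0.5 threshold.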
I implemented an ANN (1 hidden layer of 64 units, learning rate = 0.001, epsilon = 0.001, iters = 500) with Python's OpenCV module. Train error is ~3% and test error is ~12%.
In order to improve the accuracy/generalisation of my NN, I decided to proceed by implementing model selection (of the number of hidden units and the learning rate) to get accurate values for the hyperparameters, and by plotting learning curves to determine whether more data is needed (I currently have 2.5k examples).
Having read some sources on NN training and model selection, I'm very confused about the following:
1) In order to perform model selection, I know the following needs to be done:
create the set possibleHiddenUnits = {4, 8, 16, 32, 64}
randomly select Tr & Va sets from the total Tr + Va data with some split, e.g. 80/20
foreach ele in possibleHiddenUnits
    (*) compute weights for the NN using backpropagation and an iterative optimisation algorithm like gradient descent (where we provide the termination criteria as a number of iterations / epsilon)
    compute the Validation set error using these trained weights
select the number of hidden units which minimises the Va set error
Alternatively, I believe we can also use k-fold cross validation.
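To make this concrete, here is a rough sketch of what I mean in Python, using scikit-learn's MLPClassifier as a stand-in for my OpenCV ANN (X and y are my full training data):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

possible_hidden_units = [4, 8, 16, 32, 64]
# 80/20 split of the available data into Tr and Va sets
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)
best_units, best_va_error = None, np.inf
for units in possible_hidden_units:
    clf = MLPClassifier(hidden_layer_sizes=(units,), learning_rate_init=0.001,
                        max_iter=500, random_state=0)
    clf.fit(X_tr, y_tr)                      # step (*): backprop + gradient-based optimisation
    va_error = 1.0 - clf.score(X_va, y_va)   # validation set error
    if va_error < best_va_error:
        best_units, best_va_error = units, va_error
The k-fold alternative would be the same loop but with cross-validation error instead of a single validation split.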
a. How do you decide what the number of iterations / epsilon for GD should be?
b. Does 1 iteration out of x iterations of GD (where the entire training set is used to compute the gradients of the cost w.r.t. the weights through backprop) constitute an 'epoch'?
2) Sources (What is the difference between train, validation and test set, in neural networks? and How to use k-fold cross validation in a neural network) mention that the training of a NN is done in the following way, as it prevents over-fitting:
for each epoch
    for each training data instance
        propagate error through the network
        adjust the weights
    calculate the accuracy over the training data
    for each validation data instance
        calculate the accuracy over the validation data
    if the threshold validation accuracy is met
        exit training
    else
        continue training
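Roughly, I understand this loop to be something like the following (again using scikit-learn's MLPClassifier as a stand-in; partial_fit does approximately one pass over the training data, and X_tr/X_va are the split from my earlier sketch):
import numpy as np
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(hidden_layer_sizes=(64,), learning_rate_init=0.001)
best_va_acc, stale_epochs, patience = 0.0, 0, 10
for epoch in range(500):
    clf.partial_fit(X_tr, y_tr, classes=np.unique(y_tr))  # one pass over the training data
    va_acc = clf.score(X_va, y_va)                        # accuracy on the validation data
    if va_acc > best_va_acc:
        best_va_acc, stale_epochs = va_acc, 0
    else:
        stale_epochs += 1
        if stale_epochs >= patience:
            break  # validation accuracy stopped improving: exit training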
a. I believe this method should be executed once the model selection has been done. But then how do we avoid overfitting the model in step (*) of the model selection process above?
b. Am I right in assuming that one epoch constitutes one iteration of training, where the weights are computed over the entire Tr set through GD + backprop, and that GD involves x (> 1) iterations over the entire Tr set to calculate the weights?
Also, of 1b and 2b, which is correct?
This is more of a comment, but since I can't make comments yet I'll write it here. Have you tried other methods like L2 regularization or dropout? I don't know a lot about model selection, but dropout has a very similar effect to taking lots of models and averaging them. Normally dropout should do the trick, and you shouldn't have much trouble with overfitting anymore.
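To give a rough picture, here is a minimal sketch of dropout plus L2 regularization (weight decay) in PyTorch; this assumes you can switch away from OpenCV's ANN_MLP (which as far as I know does not expose dropout), and n_features/n_classes are just placeholders:
import torch
import torch.nn as nn

n_features, n_classes = 20, 2   # placeholders for your data's dimensions
# 1 hidden layer of 64 units, with dropout applied to the hidden activations
model = nn.Sequential(
    nn.Linear(n_features, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),          # randomly zeroes hidden units during training
    nn.Linear(64, n_classes),
)
# weight_decay adds an L2 penalty on the weights
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

model.train()   # dropout active during the training loop
# ... usual forward / loss / backward / optimizer.step() loop ...
model.eval()    # dropout disabled when evaluating on validation or test data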
I'm new to machine learning and trying to learn the process, and I have started by playing around with Weka. When I load the data in Weka and start the classification, the software shows values such as the ones below:
Correctly Classified Instances 416 39.6568 %
Incorrectly Classified Instances 633 60.3432 %
Kappa statistic 0.091
Mean absolute error 0.4371
Root mean squared error 0.4663
Relative absolute error 98.4524 %
Root relative squared error 98.9763 %
Coverage of cases (0.95 level) 100 %
Mean rel. region size (0.95 level) 100 %
Total Number of Instances 1049
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.310 0.231 0.377 0.310 0.340 0.084 0.554 0.448 16-18
0.271 0.167 0.460 0.271 0.341 0.123 0.501 0.359 19+
0.599 0.511 0.382 0.599 0.467 0.084 0.570 0.395 All Age
Weighted Avg. 0.397 0.306 0.407 0.397 0.384 0.098 0.541 0.399
Looking at these values, I assume I have bad data, since the percentage of Correctly Classified Instances is only about 39.7% and the error rates are high. But the TP Rate and Precision are at a roughly acceptable level.
This confuses me. How can I judge the model based on these numbers? Does it mean my data is badly preprocessed?
You have to look at the confusion matrix to get the per-class accuracy and precision. Below is a link that explains it. Hope it helps.
http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html
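As a small illustration of how accuracy and precision come out of a confusion matrix, here is a sketch with a made-up 2-class matrix (the same per-class idea extends to your three age classes):
import numpy as np

# rows = actual class, columns = predicted class (made-up counts)
cm = np.array([[50, 10],    # actual class 0: 50 correct, 10 predicted as class 1
               [ 5, 35]])   # actual class 1: 5 predicted as class 0, 35 correct

accuracy = np.trace(cm) / cm.sum()        # (50 + 35) / 100 = 0.85
precision_1 = cm[1, 1] / cm[:, 1].sum()   # 35 / 45 ≈ 0.778 (precision for class 1)
recall_1 = cm[1, 1] / cm[1, :].sum()      # 35 / 40 = 0.875 (recall, i.e. TP rate, for class 1)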
I was trying to build a classification machine-learning model on a data set which has 32 attributes, the last column being the target class. I reduced the number of attributes from 32 to 6, which I felt would be more useful for my classification model.
I tried J48 and some incremental classification algorithms.
I expected an output that consists of the confusion matrix, the correctly and incorrectly classified instances, and the kappa value.
But my result did not give any information on correctly and incorrectly classified instances. It also did not report the confusion matrix or the kappa value. All I received is this:
=== Summary ===
Correlation coefficient 0.9482
Mean absolute error 0.2106
Root mean squared error 0.5673
Relative absolute error 13.4077 %
Root relative squared error 31.9157 %
Total Number of Instances 1461
Can anyone tell me why I did not get the confusion matrix, kappa, and correct/incorrect instance information?
Unfortunately you didn't include your code or say which version of Weka you are using. One common cause of output like this is that the class attribute is numeric rather than nominal, so Weka treats the task as regression (hence the correlation coefficient) and cannot produce a confusion matrix or kappa.
BTW, to calculate the confusion matrix, kappa, etc. you can use the methods of the Evaluation class: http://weka.sourceforge.net/doc.dev/weka/classifiers/Evaluation.html
for example, after you train your model:
// needs: import weka.classifiers.Evaluation; import java.util.Random;
classifier.buildClassifier(train);   // train is an Instances object
Evaluation eval = new Evaluation(train);
// evaluate your model using 10-fold cross-validation
eval.crossValidateModel(classifier, train, 10, new Random(1));
System.out.println(classifier);
// print the different statistics
System.out.println(eval.toSummaryString());
System.out.println(eval.toMatrixString());
System.out.println(eval.toClassDetailsString());
First I read this: How to interpret weka classification?
but it didn't help me.
Then, to give some background: I am trying to learn through Kaggle competitions, where models are evaluated by ROC area.
I built two models, and the data about them is presented in this way:
Correctly Classified Instances 10309 98.1249 %
Incorrectly Classified Instances 197 1.8751 %
Kappa statistic 0.7807
K&B Relative Info Score 278520.5065 %
K&B Information Score 827.3574 bits 0.0788 bits/instance
Class complexity | order 0 3117.1189 bits 0.2967 bits/instance
Class complexity | scheme 948.6802 bits 0.0903 bits/instance
Complexity improvement (Sf) 2168.4387 bits 0.2064 bits/instance
Mean absolute error 0.0465
Root mean squared error 0.1283
Relative absolute error 46.7589 %
Root relative squared error 57.5625 %
Total Number of Instances 10506
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.998 0.327 0.982 0.998 0.99 0.992 0
0.673 0.002 0.956 0.673 0.79 0.992 1
Weighted Avg. 0.981 0.31 0.981 0.981 0.98 0.992
Apart from the K&B Relative Info Score, the Relative absolute error, and the Root relative squared error, which are respectively lower, higher, and higher in the model that ROC curves rank as best,
all the figures are the same.
I built a third model with similar behaviour (TP rate and so on), but again the K&B Relative Info Score, Relative absolute error, and Root relative squared error varied. That did not allow me to predict whether this third model was superior to the first two (the variations went in the same direction as for the best model, so in theory it should have been superior, but it wasn't).
What should I do to predict whether a model will perform well, given such details about it?
Thanks in advance.