How to calculate evaluation matrix(accuracy, precision, recall, f1 score) for multiOutput Classifier - machine-learning

I want to evaluate SVM which predicts more than one column. So I use multiOutput. Here is my code:
C = 1.0 # SVM regularization parameter
linear_svc=MultiOutputClassifier(SVC(kernel='linear', C=C))
I use cross validation:
scoring = {'accuracy' : make_scorer(accuracy_score),
'precision' : make_scorer(precision_score)
}
results = model_selection.cross_val_score(estimator=linear_svc,
X=X_train_,
y=y_train_,
cv=kfold,
scoring=scoring)
But i can't. please help

Related

ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets

I am trying to calculate the f1 score for a multi-class classification problem using the Cifar10 dataset. I am importing f1 metirics from the sklearn library. However I keep getting the following error message:
ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets
Below is my function for testing the model on my validation set. Would someone be able to explain how to calculate f1 when performing multi-class classification. I am getting quite confused.
#torch.no_grad()
def valid_function(model, optimizer, val_loader):
model.eval()
val_loss = 0.0
val_accu = 0.0
f_one = []
for i, (x_val, y_val) in enumerate(val_loader):
x_val, y_val = x_val.to(device), y_val.to(device)
val_pred = model(x_val)
loss = criterion(val_pred, y_val)
val_loss += loss.item()
val_accu += accuracy(val_pred, y_val)
f_one.append(f1_score(y_val.cpu(), val_pred.cpu()))
val_loss /= len(val_loader)
val_accu /= len(val_loader)
print('Val Loss: %.3f | Val Accuracy: %.3f'%(val_loss,val_accu))
return val_loss, val_accu
The problem is here:
val_pred = model(x_val)
You need to convert how you load the model. For example in your case:
val_pred = np.argmax(model.predict(x_val), axis=-1)

Implementing Group Lasso on PyTorch weight matrices

I am trying to implement Group Lasso on weight matrices of a neural network in PyTorch.
I have written the code to implement Group Lasso but am unsure if this is correct, confirmation or correction of my code will be very helpful.
def gl_norm(model, gl_lambda, num_blk):
gl_reg = torch.tensor(0., dtype=torch.float32).cuda()
for key in model:
for param in model[key].parameters():
dim = param.size()
if dim.__len__() > 1 and not model[key].skip_regularization:
div1 = list(torch.chunk(param,int(num_blk),1))
all_blks = []
for div2 in div1:
temp = list(torch.chunk(div2,int(num_blk),0))
for blk in temp:
all_blks.append(blk)
for l2_param in all_blks:
gl_reg += torch.norm(l2_param, 2)
return gl_reg * float(gl_lambda)
I expect the torch.chunk function to break up the weight matrix into small blocks which then go through L2 norm for the block and L1 norm between all the blocks.

h20 predict function probability scoring on test data

I have created h20 random forest model for fraud prediction.now while scoring using predict function for test data. I got below dataframe from predict function output.
Now for 2nd records it predicted 1 but probability of p1 is far less than p0. What's the correct probability scores (p0/1) and classification we can use for my fraud prediction model?
If these are not correct probabilities then calibrated probabilities calculated using parameters(calibrate_model = True) as mentioned below will give correct probability?
nfolds=5
rf1 = h2o.estimators.H2ORandomForestEstimator(
model_id = "rf_df1",
ntrees = 200,
max_depth = 4,
sample_rate = .30,
# stopping_metric="misclassification",
# stopping_rounds = 2,
mtries = 6,
min_rows = 12,
nfolds=3,
distribution = "multinomial",
fold_assignment="Modulo",
keep_cross_validation_predictions=True,
calibrate_model = True,
calibration_frame = calib,
weights_column = "weight",
balance_classes = True
# stopping_tolerance = .005)
)
predict p0 p1
1 0 0.9986012 0.000896514
2 1 0.9985695 0.000448676
3 0 0.9981387 0.000477767
The prediction labels are based on a threshold and the threshold used is generally based on the threshold that maximizes the F1 score. See the following post to learn more on how to interpret the probability results.
Details on how the calibration frame and model work can be found here and here.

I want to calculate cross entropy at all outputs of LSTM

I am writing a program of classification problem using LSTM.
However, I do not know how to calculate cross entropy with all the output of LSTM.
Here is a part of my program.
cell_fw = tf.nn.rnn_cell.LSTMCell(num_hidden)
cell_bw = tf.nn.rnn_cell.LSTMCell(num_hidden)
outputs, _ = tf.nn.bidirectional_dynamic_rnn(cell_fw,cell_bw,inputs = inputs3, dtype=tf.float32,sequence_length= seq_len)
outputs = tf.concat(outputs,axis=2)
#outputs [batch_size,max_timestep,num_features]
outputs = tf.reshape(outputs, [-1, num_hidden*2])
W = tf.Variable(tf.truncated_normal([num_hidden*2,
num_classes],
stddev=0.1))
b = tf.Variable(tf.constant(0., shape=[num_classes]))
logits = tf.matmul(outputs, W) + b
How can I apply crossentropy error to this?
Should I create a vector that represents the same class as the number of max_timestep for each batch and calculate the error with that?
Have you looked at cross_entropy documentation: https://www.tensorflow.org/api_docs/python/tf/losses/softmax_cross_entropy ?
The dimension of onehot_labels should answer your question.

Tensorflow KNN : How can we assign the K parameter for defining number of neighbors in KNN?

I have started working on a machine learning project using K-Nearest-Neighbors method on python tensorflow library. I have no experience working with tensorflow tools, so I found some code in github and modified it for my data.
My dataset is like this:
2,2,2,2,0,0,3
2,2,2,2,0,1,0
2,2,2,4,2,2,1
...
2,2,2,4,2,0,0
And this is the code which actually works fine:
import tensorflow as tf
import numpy as np
# Whole dataset => 1428 samples
dataset = 'car-eval-data-1.csv'
# samples for train, remaining for test
samples = 1300
reader = np.loadtxt(open(dataset, "rb"), delimiter=",", skiprows=1, dtype=np.int32)
train_x, train_y = reader[:samples,:5], reader[:samples,6]
test_x, test_y = reader[samples:, :5], reader[samples:, 6]
# Placeholder you can assign values in future. its kind of a variable
# v = ("variable type",[None,4]) -- you can have multidimensional values here
training_values = tf.placeholder("float",[None,len(train_x[0])])
test_values = tf.placeholder("float",[len(train_x[0])])
# MANHATTAN distance
distance = tf.abs(tf.reduce_sum(tf.square(tf.subtract(training_values,test_values)),reduction_indices=1))
prediction = tf.arg_min(distance, 0)
init = tf.global_variables_initializer()
accuracy = 0.0
with tf.Session() as sess:
sess.run(init)
# Looping through the test set to compare against the training set
for i in range (len(test_x)):
# Tensor flow method to get the prediction near to the test parameters in the training set.
index_in_trainingset = sess.run(prediction, feed_dict={training_values:train_x,test_values:test_x[i]})
print("Test %d, and the prediction is %s, the real value is %s"%(i,train_y[index_in_trainingset],test_y[i]))
if train_y[index_in_trainingset] == test_y[i]:
# if prediction is right so accuracy increases.
accuracy += 1. / len(test_x)
print('Accuracy -> ', accuracy * 100, ' %')
The only thing I do not understand is that if it's the KNN method so there has to be some K parameter which defines the number of neighbors for predicting the label for each test sample.
How can we assign the K parameter to tune the number of nearest neighbors for the code?
Is there any way to modify this code to make use of K parameter?
You're right that the example above does not have the provision to select K-Nearest neighbours. In the code below, I have added the ability to add such a parameter(knn_size) along with other corrections
import tensorflow as tf
import numpy as np
# Whole dataset => 1428 samples
dataset = 'PATH_TO_DATASET_CSV'
knn_size = 1
# samples for train, remaining for test
samples = 1300
reader = np.loadtxt(open(dataset, "rb"), delimiter=",", skiprows=1, dtype=np.int32)
train_x, train_y = reader[:samples,:6], reader[:samples,6]
test_x, test_y = reader[samples:, :6], reader[samples:, 6]
# Placeholder you can assign values in future. its kind of a variable
# v = ("variable type",[None,4]) -- you can have multidimensional values here
training_values = tf.placeholder("float",[None, len(train_x[0])])
test_values = tf.placeholder("float",[len(train_x[0])])
# MANHATTAN distance
distance = tf.abs(tf.reduce_sum(tf.square(tf.subtract(training_values,test_values)),reduction_indices=1))
# Here, we multiply the distance by -1 to reverse the magnitude of distances, i.e. the largest distance becomes the smallest distance
# tf.nn.top_k returns the top k values and their indices, here k is controlled by the parameter knn_size
k_nearest_neighbour_values, k_nearest_neighbour_indices = tf.nn.top_k(tf.scalar_mul(-1,distance),k=knn_size)
#Based on the indices we obtain from the previous step, we locate the exact class label set of the k closest matches in the training data
best_training_labels = tf.gather(train_y,k_nearest_neighbour_indices)
if knn_size==1:
prediction = tf.squeeze(best_training_labels)
else:
# Now we make our prediction based on the class label that appears most frequently
# tf.unique_with_counts() gives us all unique values that appear in a 1-D tensor along with their indices and counts
values, indices, counts = tf.unique_with_counts(best_training_labels)
# This gives us the index of the class label that has repeated the most
max_count_index = tf.argmax(counts,0)
#Retrieve the required class label
prediction = tf.gather(values,max_count_index)
init = tf.global_variables_initializer()
accuracy = 0.0
with tf.Session() as sess:
sess.run(init)
# Looping through the test set to compare against the training set
for i in range (len(test_x)):
# Tensor flow method to get the prediction near to the test parameters in the training set.
prediction_value = sess.run([prediction], feed_dict={training_values:train_x,test_values:test_x[i]})
print("Test %d, and the prediction is %s, the real value is %s"%(i,prediction_value[0],test_y[i]))
if prediction_value[0] == test_y[i]:
# if prediction is right so accuracy increases.
accuracy += 1. / len(test_x)
print('Accuracy -> ', accuracy * 100, ' %')

Resources