In the stacking model, I want to see the recall and precision results

In the stacking model, I want to see the recall and precision results. I have tried many methods and have not found results. I found recall and precision in another model, but I am stuck with the stacking model. A little help would go a long way.
estimator = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('dec_tree', dec_tree),
    ('knn', knn),
    ('xgb', xgb),
    ('ext', ext),
    ('grad', grad),
    ('hist', hist)]
# build the stacking model
stack_model = StackingClassifier(
    estimators=estimator, final_estimator=LogisticRegression())
# train the stacking model
stack_model.fit(x_train, y_train)
# make predictions
y_train_pred = stack_model.predict(x_train)
y_test_pred = stack_model.predict(x_test)
# training set performance
stack_model_train_accuracy = accuracy_score(y_train, y_train_pred)
stack_model_train_f1 = f1_score(y_train, y_train_pred, average='weighted')
# testing set performance
stack_model_test_accuracy = accuracy_score(y_test, y_test_pred)
stack_model_test_f1 = f1_score(y_test, y_test_pred, average='weighted')
# print results
print('Model Performance For Training Set')
print('- Accuracy: %s' % stack_model_train_accuracy)
print('- f1: %s' % stack_model_train_f1)
print('______________________________________')
print('Model Performance For Testing Set')
print('- Accuracy: %s' % stack_model_test_accuracy)
print('- f1: %s' % stack_model_test_f1)
Up to here it works, but I need the recall and precision. If I compute them the same way I computed the accuracy and F1 score, the result is wrong, and if I use classification_report I get an error too.
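If the target is multiclass (an assumption here, but it would explain the errors described), precision_score and recall_score need an explicit average argument; their default average='binary' raises an error for more than two classes. A minimal sketch mirroring the f1_score calls above:
from sklearn.metrics import precision_score, recall_score
# 'weighted' matches the average already used for f1_score above;
# 'macro' and 'micro' are the other common choices
stack_model_test_precision = precision_score(y_test, y_test_pred, average='weighted')
stack_model_test_recall = recall_score(y_test, y_test_pred, average='weighted')
print('- Precision: %s' % stack_model_test_precision)
print('- Recall: %s' % stack_model_test_recall)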

Related

Why is my explained variance a negative value for regression models?

This is the code I'm using to compare performance metrics of different regression models on my time series data (basically, I'm trying to predict certain values based on the month and day of the year).
import numpy as np
import sklearn.metrics as metrics

def regression_results(y_true, y_pred):
    predictions = y_pred
    test_labels = y_true
    errors = abs(predictions - test_labels)
    # Print out the mean absolute error (MAE)
    print('Mean Absolute Error:', round(np.mean(errors), 2))
    # Calculate mean absolute percentage error (MAPE)
    mape = 100 * (errors / test_labels)
    # Calculate and display accuracy
    accuracy = 100 - np.mean(mape)
    print('Accuracy:', round(accuracy, 2), '%.')
    # Regression metrics
    explained_variance = metrics.explained_variance_score(y_true, y_pred)
    mean_absolute_error = metrics.mean_absolute_error(y_true, y_pred)
    mse = metrics.mean_squared_error(y_true, y_pred)
    mean_squared_log_error = metrics.mean_squared_log_error(y_true, y_pred)
    median_absolute_error = metrics.median_absolute_error(y_true, y_pred)
    r2 = metrics.r2_score(y_true, y_pred)
    print('explained_variance: ', round(explained_variance, 4))
    print('mean_squared_log_error: ', round(mean_squared_log_error, 4))
    print('r2: ', round(r2, 4))
    print('MAE: ', round(mean_absolute_error, 4))
    print('MSE: ', round(mse, 4))
    print('RMSE: ', round(np.sqrt(mse), 4))
These are the results I'm getting for the RandomForestRegressor model (and all other regression models display similar results, including the negative explained variance value).
Mean Absolute Error: 0.02
Accuracy: 98.41 %.
explained_variance: -0.4901
mean_squared_log_error: 0.0001
r2: -0.5035
MAE: 0.0163
MSE: 0.0004
RMSE: 0.0205
Does this mean my data is bad?
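For context, explained_variance_score and r2_score go negative whenever the predictions fit the targets worse than simply predicting their mean, so a negative value points at the model or a train/test mismatch rather than bad data as such. A minimal demonstration with made-up numbers (not from the question's data):
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
mean_baseline = np.full_like(y_true, y_true.mean())  # always predict the mean
bad_model = np.array([4.0, 3.0, 2.0, 1.0])           # anti-correlated predictions

print(r2_score(y_true, mean_baseline))               # 0.0: the baseline reference point
print(r2_score(y_true, bad_model))                   # -3.0: worse than the baseline
print(explained_variance_score(y_true, bad_model))   # -3.0: also negative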

Is binary cross entropy an additive function?

I am trying to train a machine learning model where the loss function is binary cross entropy. Because of GPU limitations I can only use a batch size of 4, and I'm getting a lot of spikes in the loss graph. So I'm thinking of back-propagating after some predefined number of batches (>4): I'll do 10 iterations of batch size 4, store the losses, and after the 10th iteration add the losses and back-propagate. Will that be similar to a batch size of 40?
TL;DR
Is f(a+b) = f(a) + f(b) true for binary cross entropy?
f(a+b) = f(a) + f(b) doesn't seem to be what you're after. This would imply that BCELoss is additive, which it clearly isn't. I think what you really care about is if, for some index i,
# false
f(x, y) == f(x[:i], y[:i]) + f(x[i:], y[i:])
is true?
The short answer is no, because you're missing some scale factors. What you probably want is the following identity
# true
f(x, y) == (i / b) * f(x[:i], y[:i]) + (1.0 - i / b) * f(x[i:], y[i:])
where b is the total batch size.
This identity is used as motivation for the gradient accumulation method (see below). Also, this identity applies to any objective function which returns an average loss across each batch element, not just BCE.
Caveat/Pitfall: Keep in mind that batch norm will not behave exactly the same when using this approach since it updates its internal statistics based on batch size during the forward pass.
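To make the caveat concrete, here is a small check (a sketch, not from the original answer) showing that batch norm's running statistics come out different when the same data is seen as one full batch versus several sub-batches:
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)  # training mode by default
x = torch.randn(32, 3)

bn(x)  # one full batch: running stats updated once from all 32 rows
full_mean = bn.running_mean.clone()

bn.reset_running_stats()
for i in range(0, 32, 4):  # same data fed as eight sub-batches of four
    bn(x[i:i + 4])

print(torch.allclose(full_mean, bn.running_mean))  # False in general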
We can actually do a little better memory-wise than computing the loss as a sum followed by backpropagation. Instead, we can compute the gradient of each component in the equivalent sum individually and allow the gradients to accumulate. To better explain, I'll give some examples of equivalent operations.
Consider the following model
import torch
import torch.nn as nn
import torch.nn.functional as F
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        num_outputs = 5
        # assume input shape is 10x10
        self.conv_layer = nn.Conv2d(3, 10, 3, 1, 1)
        self.fc_layer = nn.Linear(10 * 5 * 5, num_outputs)

    def forward(self, x):
        x = self.conv_layer(x)
        x = F.max_pool2d(x, 2, 2, 0, 1, False, False)
        x = F.relu(x)
        x = self.fc_layer(x.flatten(start_dim=1))
        x = torch.sigmoid(x)  # or omit this and use BCEWithLogitsLoss instead of BCELoss
        return x
# to ensure same results for this example
torch.manual_seed(0)
model = MyModel()
# the examples will work as long as the objective averages across batch elements
objective = nn.BCELoss()
# doesn't matter what type of optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
and let's say our data and targets for a single batch are
torch.manual_seed(1) # to ensure same results for this example
batch_size = 32
input_data = torch.randn((batch_size, 3, 10, 10))
targets = torch.randint(0, 2, (batch_size, 5)).float()  # 0/1 targets; shape matches num_outputs=5
Full batch
The body of our training loop for an entire batch may look something like this
# entire batch
output = model(input_data)
loss = objective(output, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
loss_value = loss.item()
print("Loss value: ", loss_value)
print("Model checksum: ", sum([p.sum().item() for p in model.parameters()]))
Weighted sum of loss on sub-batches
We could have computed this using the sum of multiple loss functions using
# This is simpler if the sub-batch size is a factor of batch_size
sub_batch_size = 4
assert (batch_size % sub_batch_size == 0)
# for this to work properly the batch_size must be divisible by sub_batch_size
num_sub_batches = batch_size // sub_batch_size
loss = 0
for sub_batch_idx in range(num_sub_batches):
    start_idx = sub_batch_size * sub_batch_idx
    end_idx = start_idx + sub_batch_size
    sub_input = input_data[start_idx:end_idx]
    sub_targets = targets[start_idx:end_idx]
    sub_output = model(sub_input)
    # add loss component for sub_batch
    loss = loss + objective(sub_output, sub_targets) / num_sub_batches
optimizer.zero_grad()
loss.backward()
optimizer.step()
loss_value = loss.item()
print("Loss value: ", loss_value)
print("Model checksum: ", sum([p.sum().item() for p in model.parameters()]))
Gradient accumulation
The problem with the previous approach is that, in order to apply back-propagation, PyTorch needs to store the intermediate results of layers in memory for every sub-batch. This ends up requiring a relatively large amount of memory, and you may still run into memory consumption issues.
To alleviate this problem, instead of computing a single loss and performing back-propagation once, we can perform gradient accumulation. This gives results equivalent to the previous version. The difference here is that we instead perform a backward pass on each component of the loss, only stepping the optimizer once all of them have been backpropagated. This way the computation graph is cleared after each sub-batch, which helps with memory usage. Note that this works because .backward() actually accumulates (adds) the newly computed gradients to the existing .grad member of each model parameter. This is why optimizer.zero_grad() must be called only once, before the loop, and not during or after.
# This is simpler if the sub-batch size is a factor of batch_size
sub_batch_size = 4
assert (batch_size % sub_batch_size == 0)
# for this to work properly the batch_size must be divisible by sub_batch_size
num_sub_batches = batch_size // sub_batch_size
# Important! zero the gradients before the loop
optimizer.zero_grad()
loss_value = 0.0
for sub_batch_idx in range(num_sub_batches):
    start_idx = sub_batch_size * sub_batch_idx
    end_idx = start_idx + sub_batch_size
    sub_input = input_data[start_idx:end_idx]
    sub_targets = targets[start_idx:end_idx]
    sub_output = model(sub_input)
    # compute loss component for sub_batch
    sub_loss = objective(sub_output, sub_targets) / num_sub_batches
    # accumulate gradients
    sub_loss.backward()
    loss_value += sub_loss.item()
optimizer.step()
print("Loss value: ", loss_value)
print("Model checksum: ", sum([p.sum().item() for p in model.parameters()]))
I think 10 iterations of batch size 4 is the same as one iteration of batch size 40; it just takes more time. Losses across the different training examples are added before backprop, but that doesn't make the function linear: BCELoss has a log component, and hence it is not a linear function. However, what you said is correct. It will be similar to batch size 40.
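A quick numeric check of that non-additivity (a sketch with made-up probabilities):
import torch
import torch.nn.functional as F

a = torch.tensor([0.3])
b = torch.tensor([0.4])
y = torch.tensor([1.0])

# f(a + b) != f(a) + f(b) because of the log inside BCE
print(F.binary_cross_entropy(a + b, y))                             # -log(0.7) ~ 0.357
print(F.binary_cross_entropy(a, y) + F.binary_cross_entropy(b, y))  # -log(0.3) - log(0.4) ~ 2.120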

Boosting (ensemble learning) graphs of error vs number of trees show an incorrect trend

I am running boosting on a standard dataset (Abalone), using both the SAMME and SAMME.R boosting algorithms, and the graphs I obtained are not what I was expecting. It is a multi-class classification problem using supervised learning.
Here is the graph
I was expecting a graph where the error reduces as the number of trees increases. But here the error seems to increase or stay constant.
Here is the code for this plot
bdt_real = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),
    n_estimators=600,
    learning_rate=1)
bdt_discrete = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),
    n_estimators=600,
    learning_rate=1.5,
    algorithm="SAMME")
bdt_real.fit(X_train, y_train)
bdt_discrete.fit(X_train, y_train)
real_test_errors = []
discrete_test_errors = []
for real_test_predict, discrete_train_predict in zip(
        bdt_real.staged_predict(X_test), bdt_discrete.staged_predict(X_test)):
    real_test_errors.append(
        1. - accuracy_score(real_test_predict, y_test))
    discrete_test_errors.append(
        1. - accuracy_score(discrete_train_predict, y_test))
n_trees_discrete = len(bdt_discrete)
n_trees_real = len(bdt_real)
# Boosting might terminate early, but the following arrays are always
# n_estimators long. We crop them to the actual number of trees here:
discrete_estimator_errors = bdt_discrete.estimator_errors_[:n_trees_discrete]
real_estimator_errors = bdt_real.estimator_errors_[:n_trees_real]
discrete_estimator_weights = bdt_discrete.estimator_weights_[:n_trees_discrete]
# print discrete_test_errors
# print real_test_errors
plt.figure(figsize=(15, 5))
plt.subplot(131)
plt.plot(range(1, n_trees_discrete + 1),
         discrete_test_errors, c='black', label='SAMME')
plt.plot(range(1, n_trees_real + 1),
         real_test_errors, c='black',
         linestyle='dashed', label='SAMME.R')
plt.legend()
# plt.ylim(0.0, 1.0)
plt.ylabel('Test Error')
plt.xlabel('Number of Trees')
Does anyone have pointers as to what could cause this trend of increasing error with the number of trees? I did think this could mean I have overfit the model, but it looks like there is no error reduction at all; with overfitting I would expect the error to drop at first and then increase.
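One way to test the overfitting hypothesis (a sketch reusing bdt_discrete and assuming the X_train/y_train used to fit it) is to plot the staged error on the training set alongside the test error already computed; a widening gap between the two curves is the usual overfitting signature, while two flat curves point elsewhere, e.g. at the learning rate or the base learner's depth:
discrete_train_errors = [
    1. - accuracy_score(y_train, y_pred)
    for y_pred in bdt_discrete.staged_predict(X_train)]
plt.plot(range(1, n_trees_discrete + 1),
         discrete_train_errors, c='grey', label='SAMME (train)')
plt.legend()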

Conv nets accuracy not changing while loss decreases

I'm training several CNNs to do image classification in TensorFlow. The training losses decrease normally. However, the test accuracy never changed throughout the whole training procedure, and it is very low (0.014), where the accuracy for randomly guessing would be about 0.003 (there are around 300 classes). One thing I've noticed is that only the models I applied batch norm to show this weird behavior. What could possibly be wrong? The training set has 80000 samples, in case you suspect this was caused by overfitting. Below is part of the code for evaluation:
Accuracy function:
correct_prediction = tf.equal(tf.argmax(Model(test_image), 1), tf.argmax(test_image_label, 0))
accuracy = tf.cast(correct_prediction, tf.float32)
the test_image is a batch with only one sample in it while the test_image_label is a scalar.
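As an aside, if test_image_label really is a scalar class index (an assumption based on the description above), taking tf.argmax of it looks suspicious; the more usual pattern compares the predicted class against the label directly. A hypothetical sketch:
logits = Model(test_image)              # shape (1, num_classes)
predicted_class = tf.argmax(logits, 1)  # shape (1,)
label = tf.cast(tf.reshape(test_image_label, [1]), tf.int64)
correct_prediction = tf.equal(predicted_class, label)
accuracy = tf.cast(correct_prediction, tf.float32)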
Session:
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord, start=True)
    print('variables initialized')
    step = 0
    for epoch in range(epochs):
        sess.run(enqueue_train)
        print('epoch: %d' % epoch)
        if epoch % 5 == 0:
            save_path = saver.save(sess, savedir + "/Model")
        for batch in range(num_batch):
            if step % 400 == 0:
                summary_str = cost_summary.eval(feed_dict={phase: True})
                file_writer.add_summary(summary_str, step)
            else:
                sess.run(train_step, feed_dict={phase: True})
            step += 1
        sess.run(train_close)
        sess.run(enqueue_test)
        accuracy_vector = []
        for num in range(len(testnames)):
            accuracy_vector.append(sess.run(accuracy, feed_dict={phase: False}))
        mean_accuracy = sess.run(tf.divide(tf.add_n(accuracy_vector), len(testnames)))
        print("test accuracy %g" % mean_accuracy)
        sess.run(test_close)
    save_path = saver.save(sess, savedir + "/Model_final")
    coord.request_stop()
    coord.join(threads)
    file_writer.close()
file_writer.close()
The phase placeholder above indicates whether the batch norm layers are in training or testing mode.
Note that I tried to calculate the accuracy with the training set, which led to the minimal loss. However it gives the same poor accuracy. Please help me, I really appreciate it!

Dropout Tensorflow: Weight scaling at test time

Do I need to scale the weights at test time in TensorFlow, i.e. weights * keep_prob at testing, or does TensorFlow do it itself? If so, how?
At training time my keep_prob is 0.5, and at test time it is 1.
Although the network is regularized, the accuracy is not as good as before regularization.
P.S. I'm classifying CIFAR-10.
n_nodes_h1=1000
n_nodes_h2=1000
n_nodes_h3=400
n_nodes_h4=100
classes=10
x=tf.placeholder('float',[None,3073])
y=tf.placeholder('float')
keep_prob = tf.placeholder(tf.float32)
batch_size=100
def neural_net(data):
    hidden_layer1 = {'weight': tf.Variable(tf.random_normal([3073, n_nodes_h1])),
                     'biases': tf.Variable(tf.random_normal([n_nodes_h1]))}
    hidden_layer2 = {'weight': tf.Variable(tf.random_normal([n_nodes_h1, n_nodes_h2])),
                     'biases': tf.Variable(tf.random_normal([n_nodes_h2]))}
    out_layer = {'weight': tf.Variable(tf.random_normal([n_nodes_h2, classes])),
                 'biases': tf.Variable(tf.random_normal([classes]))}
    l1 = tf.add(tf.matmul(data, hidden_layer1['weight']), hidden_layer1['biases'])
    l1 = tf.nn.relu(l1)
    # ************ DROPOUT *******************
    l1 = tf.nn.dropout(l1, keep_prob)
    l2 = tf.add(tf.matmul(l1, hidden_layer2['weight']), hidden_layer2['biases'])
    l2 = tf.nn.relu(l2)
    out = tf.matmul(l2, out_layer['weight']) + out_layer['biases']
    return out
This was the network.
iterations=20
Train_loss=[]
Test_loss=[]
def train_nn(x):
    prediction = neural_net(x)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer().minimize(cost)
    epochs = iterations
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(epochs):
            e_loss = 0
            i = 0
            for _ in range(int(X_train.shape[0] / batch_size)):
                e_x = X_train[i:i + batch_size]
                e_y = y_hot_train[i:i + batch_size]
                i += batch_size
                _, c = sess.run([optimizer, cost], feed_dict={x: e_x, y: e_y, keep_prob: 0.5})
                e_loss += c
            print "Epoch: ", epoch, " Train loss= ", e_loss
            Train_loss.append(e_loss)
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print "Accuracy on test: ", accuracy.eval({x: X_test, y: y_hot_test, keep_prob: 1.})
        print "Accuracy on train:", accuracy.eval({x: X_train[0:2600], y: y_hot_train[0:2600], keep_prob: 1.})
train_nn(x)
Do I need something like
hidden_layer1['weight']*=keep_prob
#testing time
TensorFlow does it itself:
With probability keep_prob, outputs the input element scaled up by 1 /
keep_prob, otherwise outputs 0. The scaling is so that the expected
sum is unchanged.
(from this page)
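A quick way to see this inverted-dropout scaling in action (a minimal sketch, assuming the same TF1-style tf.nn.dropout API used above):
import tensorflow as tf

x = tf.ones([1, 10])
dropped = tf.nn.dropout(x, keep_prob=0.5)  # survivors are scaled up by 1 / keep_prob
with tf.Session() as sess:
    print(sess.run(dropped))  # entries are either 0.0 or 2.0, so the expected sum is unchanged
So no manual weights * keep_prob rescaling is needed at test time; feeding keep_prob=1. is enough.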
