What's the difference between optimizer.compute_gradients() and tf.gradients() in TensorFlow?

The following code I've written fails at self.optimizer.compute_gradients(self.output, all_variables):
import tensorflow as tf
import tensorlayer as tl
from tensorflow.python.framework import ops
import numpy as np

class Network1():
    def __init__(self):
        ops.reset_default_graph()
        tl.layers.clear_layers_name()
        self.sess = tf.Session()
        self.optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
        self.input_x = tf.placeholder(tf.float32, shape=[None, 784], name="input")
        input_layer = tl.layers.InputLayer(self.input_x)
        relu1 = tl.layers.DenseLayer(input_layer, n_units=800, act=tf.nn.relu, name="relu1")
        relu2 = tl.layers.DenseLayer(relu1, n_units=500, act=tf.nn.relu, name="relu2")
        self.output = relu2.all_layers[-1]
        all_variables = relu2.all_layers
        self.gradient = self.optimizer.compute_gradients(self.output, all_variables)
        init_op = tf.initialize_all_variables()
        self.sess.run(init_op)
It fails with the error:
TypeError: Argument is not a tf.Variable: Tensor("relu1/Relu:0",
shape=(?, 800), dtype=float32)
However, when I change that line to tf.gradients(self.output, all_variables), the code works fine, or at least no error is reported. What did I miss? I thought these two methods do the same thing, that is, return a list of (gradient, variable) pairs.

optimizer.compute_gradients() wraps tf.gradients(), as you can see here. It also performs additional assertions on the variable list, which is what explains your error.
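In practice that means compute_gradients() must be given actual tf.Variable objects. Below is a minimal sketch of the distinction, assuming TensorLayer's all_params attribute holds the layer's trainable variables (all_layers holds the layer output tensors, which is what triggers the TypeError above):

all_variables = relu2.all_params  # trainable tf.Variables, not output tensors
# compute_gradients type-checks its var_list and returns (gradient, variable) pairs
grads_and_vars = self.optimizer.compute_gradients(self.output, var_list=all_variables)
# tf.gradients skips that check and returns only the gradient tensors
grads = tf.gradients(self.output, all_variables)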

I would like to add a simple point to the answer above. optimizer.compute_gradients() returns a list of (gradient, variable) tuples. The variables are always present, but the gradients may be None: the gradient of the loss with respect to a variable in var_list is None when the loss has no dependency on that variable.
On the other hand, tf.gradients() only returns a list of sum(dy/dx) tensors, one per variable. You must pair it with the variable list yourself before applying the gradient update.
Hence, the following two approaches can be used interchangeably:
### Approach 1 ###
variable_list = desired_list_of_variables
gradients = optimizer.compute_gradients(loss, var_list=variable_list)
optimizer.apply_gradients(gradients)

### Approach 2 ###
variable_list = desired_list_of_variables
gradients = tf.gradients(loss, variable_list)
optimizer.apply_gradients(zip(gradients, variable_list))

Related

Error with bagImpute and predict from caret package

I get the following error when trying to use the preProcess function from the caret package. The predict function works for knn and median imputation, but gives an error for bagging. How should I edit my call to the predict function?
Reproducible example:
data = iris
set.seed(1)
data = as.data.frame(lapply(data, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.8, 0.2), size = length(cc), replace = TRUE) ]))
preprocess_values = preProcess(data, method = c("bagImpute"), verbose = TRUE)
data_new = predict(preprocess_values, data)
This gives the following error:
> data_new = predict(preprocess_values, data)
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "NULL"
The preprocessing/imputation functions in caret work only for numerical variables.
As stated in the help for preProcess:
x: a matrix or data frame. Non-numeric predictors are allowed but will be ignored.
You most likely found a bug in the part that should ignore the non-numeric variables; it throws an uninformative error instead of ignoring them.
If you remove the factor variable, the imputation works:
library(caret)
df <- iris
set.seed(1)
df <- as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.8, 0.2), size = length(cc), replace = TRUE) ]))
df <- df[,-5] # remove the factor variable
preprocess_values <- preProcess(df, method = c("bagImpute"), verbose = TRUE)
data_new <- predict(preprocess_values, df)
The last line of code works but results in a bunch of warnings:
In cprob[tindx] + pred :
longer object length is not a multiple of shorter object length
These warnings are not from caret but from ipred::bagging, which caret::preProcess calls internally. The cause of these warnings is rows in the data that contain 3 NA values; when those rows are removed
df <- df[rowSums(sapply(df, is.na)) != 3,]
preprocess_values <- preProcess(df, method = c("bagImpute"), verbose = TRUE)
data_new <- predict(preprocess_values, df)
the warnings disappear.
You should check out the recipes package, and specifically step_bagimpute, to overcome the above-mentioned limitations.

XGBoost Hyperparameter Tuning using Hyperopt

I am trying to tune my XGBClassifier model, but I am failing to do so. Please find the code below and help me clean and edit it.
import csv
import numpy as np
import lightgbm as lgb
from hyperopt import STATUS_OK
from timeit import default_timer as timer

MAX_EVALS = 200
N_FOLDS = 10

def objective(params, n_folds=N_FOLDS):
    """Objective function for Gradient Boosting Machine Hyperparameter Optimization"""
    # Keep track of evals
    global ITERATION
    ITERATION += 1
    # Retrieve the subsample if present, otherwise set to 1.0
    subsample = params['boosting_type'].get('subsample', 1.0)
    # Extract the boosting type
    params['boosting_type'] = params['boosting_type']['boosting_type']
    params['subsample'] = subsample
    # Make sure parameters that need to be integers are integers
    for parameter_name in ['num_leaves', 'subsample_for_bin', 'min_child_samples']:
        params[parameter_name] = int(params[parameter_name])
    start = timer()
    # Perform n_folds cross validation
    cv_results = lgb.cv(params, train_set, num_boost_round=10000,
                        nfold=n_folds, early_stopping_rounds=100,
                        metrics='auc', seed=50)
    run_time = timer() - start
    # Extract the best score
    best_score = np.max(cv_results['auc-mean'])
    # Loss must be minimized
    loss = 1 - best_score
    # Boosting rounds that returned the highest cv score
    n_estimators = int(np.argmax(cv_results['auc-mean']) + 1)
    # Write to the csv file ('a' means append)
    of_connection = open(out_file, 'a')
    writer = csv.writer(of_connection)
    writer.writerow([loss, params, ITERATION, n_estimators, run_time])
    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'iteration': ITERATION,
            'estimators': n_estimators, 'train_time': run_time,
            'status': STATUS_OK}
I believe I am doing something wrong in the objective function, as I am trying to adapt an objective function that was written for LightGBM.
Please help me.
I created the hgboost library, which provides XGBoost hyperparameter tuning using Hyperopt.
pip install hgboost
Examples can be found here
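If you would rather tune XGBClassifier with Hyperopt directly, here is a minimal sketch, not taken from the original answer; the search space, dataset, and parameter ranges are illustrative assumptions:

import numpy as np
import xgboost as xgb
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Illustrative search space; adjust ranges for your own data
space = {
    'max_depth': hp.quniform('max_depth', 3, 10, 1),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.01), np.log(0.3)),
    'subsample': hp.uniform('subsample', 0.5, 1.0),
}

def objective(params):
    # Hyperopt samples floats, so cast integer-valued parameters explicitly
    params['max_depth'] = int(params['max_depth'])
    clf = xgb.XGBClassifier(n_estimators=200, **params)
    auc = cross_val_score(clf, X, y, cv=5, scoring='roc_auc').mean()
    # Hyperopt minimizes the loss, so return 1 - AUC
    return {'loss': 1 - auc, 'status': STATUS_OK}

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=Trials())
print(best)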

Tensorflow: Variable rnn/basic_lstm_cell/kernel already exists, disallowed

Here is part of my TensorFlow RNN network code, written in Jupyter. The whole code runs perfectly the first time; running it again, however, produces an error. The code:
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

def recurrent_nn_model(x):
    x = tf.transpose(x, [1, 0, 2])
    x = tf.reshape(x, [-1, chunk_size])
    x = tf.split(x, n_chunks, 0)
    lstm_layer = {'hidden_state': tf.zeros([n_batches, lstm_size]),
                  'current_state': tf.zeros([n_batches, lstm_size])}
    layer = {'weights': tf.Variable(tf.random_normal([lstm_size, n_classes])),
             'biases': tf.Variable(tf.random_normal([n_classes]))}
    lstm = rnn_cell.BasicLSTMCell(lstm_size)
    rnn_outputs, rnn_states = rnn.static_rnn(lstm, x, dtype=tf.float32)
    output = tf.add(tf.matmul(rnn_outputs[-1], layer['weights']),
                    layer['biases'])
    return output
The error is:
Variable rnn/basic_lstm_cell/kernel already exists, disallowed. Did
you mean to set reuse=True in VarScope? Originally defined at:
If recurrent_nn_model is the whole network, just add this line to reset the previously defined graph:
tf.reset_default_graph()
If you're intentionally calling recurrent_nn_model several times to combine these RNNs into one graph, you should use a different variable scope for each one:
with tf.variable_scope('lstm1'):
    recurrent_nn_model(x1)
with tf.variable_scope('lstm2'):
    recurrent_nn_model(x2)
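Alternatively, in TF 1.4+ you can let the scope reuse existing variables automatically. This is a hedged sketch, not part of the original answer; it avoids the error for variables created with tf.get_variable (such as the LSTM kernel):

with tf.variable_scope('rnn_model', reuse=tf.AUTO_REUSE):
    # Re-running this cell in the same graph reuses the LSTM kernel instead of raising
    output = recurrent_nn_model(x)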

How to get class labels from TensorFlow prediction

I have a classification model in TF and can get a list of probabilities for the next class (preds). Now I want to select the highest element (argmax) and display its class label.
This may seem silly, but how can I get the class label that matches a position in the predictions tensor?
feed_dict={g['x']: current_char}
preds, state = sess.run([g['preds'],g['final_state']], feed_dict)
prediction = tf.argmax(preds, 1)
preds gives me a vector of predictions for each class. Surely there must be an easy way to just output the most likely class (label)?
Some info about my model:
x = tf.placeholder(tf.int32, [None, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [None, 1], name='labels_placeholder')
batch_size = tf.shape(x)[0]
x_one_hot = tf.one_hot(x, num_classes)
rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(x_one_hot, num_steps, 1)]
tmp = tf.stack(rnn_inputs)
print(tmp.get_shape())
tmp2 = tf.transpose(tmp, perm=[1, 0, 2])
print(tmp2.get_shape())
rnn_inputs = tmp2
with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
rnn_outputs = rnn_outputs[:, num_steps - 1, :]
rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])
y_reshaped = tf.reshape(y, [-1])
logits = tf.matmul(rnn_outputs, W) + b
predictions = tf.nn.softmax(logits)
A prediction is an array of n values, one per class (label). It represents the model's "confidence" that the input corresponds to each of its classes (labels). You can check which label has the highest confidence value by using:
prediction = np.argmax(preds, 1)
After getting the index of the highest element with argmax, you need to look that index up in your list of class labels to find the exact class name associated with it:
class_names[prediction]
Please refer to this link for more understanding.
You can use tf.argmax() for this; tf.reduce_max() gives you the maximum value itself rather than its index. I would refer you to this answer.
Let me know if it works; I will edit if it doesn't.
Note that there are sometimes several ways to load a dataset. For instance, with Fashion-MNIST the tutorial may lead you to use load_data() and then to create your own structure to interpret a prediction. However, you can also load the data with tensorflow_datasets.load(...), like here, after installing tensorflow-datasets, which gives you access to a DatasetInfo object. So, for instance, if your prediction is 9 you can tell it's a boot with:
import tensorflow_datasets as tfds
_, ds_info = tfds.load('fashion_mnist', with_info=True)
print(ds_info.features['label'].names[9])
When you use softmax, the labels you train the model on are either numbers 0..n or one-hot encoded values. So if the original labels of your data are, say, string names, you must map them to integers first and keep the mapping as a variable (such as 0 -> "apple", 1 -> "orange", 2 -> "pear", ...).
When using integers (with loss='sparse_categorical_crossentropy'), you get predictions as an array of probabilities; you just find the array index with the max value. You can then use this predicted index to reverse-map to your label:
predictedIndex = np.argmax(predictions)           # e.g. 2
predictedLabel = indexToLabelMap[predictedIndex]  # e.g. "pear"
If you use one-hot encoded labels (with loss='categorical_crossentropy'), the predicted index corresponds with the "hot" index of your label.
Just for reference, I needed this info when I was working with the MNIST dataset used in Google's Machine Learning Crash Course. There is also a good classification tutorial in the TensorFlow docs.
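To tie these answers together, here is a minimal self-contained sketch; the class_names list and probabilities are made-up illustrations, not from the question:

import numpy as np

class_names = ['apple', 'orange', 'pear']            # index -> label mapping
preds = np.array([[0.1, 0.2, 0.7]])                  # softmax output for one sample

predicted_index = int(np.argmax(preds, axis=1)[0])   # position of the highest probability
print(class_names[predicted_index])                  # -> pear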

kNN Consistently Overusing One Label

I am using a kNN to do some classification of labeled images. After my classification is done, I output a confusion matrix. I noticed that one label, bottle, was being applied incorrectly more often than the others.
I removed that label and tested again, but then noticed that another label, shoe, was being applied incorrectly, even though it was fine last time.
There should be no normalization, so I'm unsure what is causing this behavior. Testing showed it continued no matter how many labels I removed.
Not totally sure how much code to post, so I'll put some things that should be relevant and pastebin the rest.
def confusionMatrix(classifier, train_DS_X, train_DS_y, test_DS_X, test_DS_y):
    # Will output a confusion matrix graph for the prediction
    y_pred = classifier.fit(train_DS_X, train_DS_y).predict(test_DS_X)
    labels = set(set(train_DS_y) | set(test_DS_y))

    def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
        plt.imshow(cm, interpolation='nearest', cmap=cmap)
        plt.title(title)
        plt.colorbar()
        tick_marks = np.arange(len(labels))
        plt.xticks(tick_marks, labels, rotation=45)
        plt.yticks(tick_marks, labels)
        plt.tight_layout()
        plt.ylabel('True label')
        plt.xlabel('Predicted label')

    # Compute confusion matrix
    cm = confusion_matrix(test_DS_y, y_pred)
    np.set_printoptions(precision=2)
    print('Confusion matrix, without normalization')
    #print(cm)
    plt.figure()
    plot_confusion_matrix(cm)

    # Normalize the confusion matrix by row (i.e. by the number of samples
    # in each class)
    cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    print('Normalized confusion matrix')
    #print(cm_normalized)
    plt.figure()
    plot_confusion_matrix(cm_normalized, title='Normalized confusion matrix')
    plt.show()
Relevant Code from Main Function:
# Select training and test data
PCA = decomposition.PCA(n_components=.95)
zscorer = ZScoreMapper(param_est=('targets', ['rest']), auto_train=False)
DS = getVoxels (1, .5)
train_DS = DS[0]
test_DS = DS[1]
# Apply PCA and ZScoring
train_DS = processVoxels(train_DS, True, zscorer, PCA)
test_DS = processVoxels(test_DS, False, zscorer, PCA)
print 3*"\n"
# Select the desired features
# If selecting samples or PCA, that must be the only feature
featuresOfInterest = ['pca']
trainDSFeat = selectFeatures(train_DS, featuresOfInterest)
testDSFeat = selectFeatures(test_DS, featuresOfInterest)
train_DS_X = trainDSFeat[0]
train_DS_y = trainDSFeat[1]
test_DS_X = testDSFeat[0]
test_DS_y = testDSFeat[1]
# Optimization of neighbors
# Naively searches for local max starting at numNeighbors
lastScore = 0
lastNeightbors = 1
score = .0000001
numNeighbors = 5
while score > lastScore:
lastScore = score
lastNeighbors = numNeighbors
numNeighbors += 1
#Classification
neigh = neighbors.KNeighborsClassifier(n_neighbors=numNeighbors, weights='distance')
neigh.fit(train_DS_X, train_DS_y)
#Testing
score = neigh.score(test_DS_X,test_DS_y )
# Confusion Matrix Output
neigh = neighbors.KNeighborsClassifier(n_neighbors=lastNeighbors, weights='distance')
confusionMatrix(neigh, train_DS_X, train_DS_y, test_DS_X, test_DS_y)
Pastebin: http://pastebin.com/U7yTs3vs
The issue was partly the result of my axes being mislabeled: when I thought I was removing the faulty label, I was actually removing a random label, so the faulty data was still being analyzed. Fixing the axes and removing the faulty label, which was actually rest, resolved the problem.
The code I changed is:
cm = confusion_matrix(test_DS_y , y_pred, labels)
Basically I manually set the ordering based on my list of ordered labels.
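For reference, here is a hedged sketch of that fix: pass an explicitly ordered label list to confusion_matrix so the matrix rows and columns line up with the plotted tick labels. The variable names follow the question's code; using sorted() for the ordering is an assumption:

from sklearn.metrics import confusion_matrix

labels = sorted(set(train_DS_y) | set(test_DS_y))    # deterministic ordering, shared with the plot
cm = confusion_matrix(test_DS_y, y_pred, labels=labels)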
