I want to save the weight (Variable) and bias tensors as checkpoints during training. I have used fully_connected() from tf.contrib.layers to implement several fully connected layers. To do so, I need to extract the weight and bias tensors of those fully_connected layers. How can I do that?
Just to mention that:
There is no need to extract weights and biases just to save them. For tf.layers or tf.contrib.layers, if trainable is set to True, the weights and biases are added to GraphKeys.TRAINABLE_VARIABLES, which is a subset of GraphKeys.GLOBAL_VARIABLES. Thus, if you use saver = tf.train.Saver(var_list=tf.global_variables()) and saver.save(sess, save_path, global_step) at some point, the weights and biases will be saved.
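For instance, a minimal checkpointing sketch, assuming a TF 1.x graph/session workflow (the checkpoint path and step value are placeholders):
import tensorflow as tf
saver = tf.train.Saver(var_list=tf.global_variables())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run your training steps here ...
    saver.save(sess, './checkpoints/model', global_step=1000)  # placeholder path and step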
In cases you really have to extract variables, one way would be to use tf.get_variable or tf.get_default_graph().get_tensor_by_name with the correct variable name, as mentioned by the other answer.
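For example, a minimal sketch assuming the default scope name 'fully_connected' that tf.contrib.layers.fully_connected uses when no scope is passed (with multiple layers the scopes become fully_connected_1, fully_connected_2, and so on):
graph = tf.get_default_graph()
weights = graph.get_tensor_by_name('fully_connected/weights:0')
biases = graph.get_tensor_by_name('fully_connected/biases:0')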
You might also have noticed the object-oriented TensorFlow classes such as tf.layers.Dense and tf.layers.Conv2D. Once built, they expose weights / variables properties that return the weight and bias tensors.
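For instance, a short sketch with the object-oriented tf.layers API (TF 1.x assumed; the placeholder is just an arbitrary input):
x = tf.placeholder(tf.float32, shape=(None, 2))   # any input tensor works here
dense = tf.layers.Dense(units=4)
h = dense(x)                 # calling the layer builds it and creates its variables
print(dense.weights)         # [kernel (weights) variable, bias variable]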
tf.trainable_variables() will give you a list of all the variables in the network that are trainable. This can be made cleaner by using variable_scope and name_scope, as described here: How to get weights from tensorflow fully_connected
In [1]: import tensorflow as tf
In [2]: a1 = tf.get_variable(name='a1', shape=(1,2), dtype=tf.float32)
In [3]: fc = tf.contrib.layers.fully_connected(a1, 4)
In [4]: sess = tf.Session()
In [5]: tf.trainable_variables()
Out[5]:
[<tf.Variable 'a1:0' shape=(1, 2) dtype=float32_ref>,
<tf.Variable 'fully_connected/weights:0' shape=(2, 4) dtype=float32_ref>,
<tf.Variable 'fully_connected/biases:0' shape=(4,) dtype=float32_ref>]
In [6]: for var in tf.trainable_variables():
   ...:     if 'weights' in var.name or 'biases' in var.name:
   ...:         print(var)
   ...:
<tf.Variable 'fully_connected/weights:0' shape=(2, 4) dtype=float32_ref>
<tf.Variable 'fully_connected/biases:0' shape=(4,) dtype=float32_ref>
I am running code to train a ResNet model on a Kaggle notebook. I have chosen GPU as the accelerator, so I haven't made any mistake there. I am training the model using the following code:
model.cuda()
for epoch in range(10):
    model.train(True)
    trainloss = 0
    for x, y in trainloader:
        x, y = x.cuda(), y.cuda()
        yhat = model(x)
        optimizer.zero_grad()
        loss = criterion(yhat, y)
        loss.backward()
        optimizer.step()
        trainloss += loss.item()
    print('Epoch {} Loss: {}'.format(epoch, (trainloss / len(trainloader.dataset))))

model.eval()
testcorrect = 0
with torch.no_grad():
    for test_x, test_y in testloader:
        test_x, test_y = test_x.cuda(), test_y.cuda()
        yhat = model(test_x)
        _, z = yhat.max(1)
        testcorrect += (test_y == z).sum().item()
print('Model Accuracy: ', (testcorrect / len(testloader.dataset)))
Network Code:
model = torchvision.models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features
model.fc = nn.Sequential(nn.Linear(num_ftrs, 1000),
                         nn.ReLU(),
                         nn.Linear(1000, 2))
As you can see, I have used the .cuda() function on both my model and the tensors (inside the training part as well as the validation part). However, the GPU usage shown for the Kaggle notebook is 0% while my CPU usage is up to 99%. Am I missing any code required to train the model using the GPU?
It might be that your model doesn't give the GPU enough work. Try to make your network more GPU-hungry, e.g. introduce a linear layer with a large number of neurons, to double-check that in that case you see increased GPU usage. Also, I noticed that the measurement is delayed a bit, so maybe you give the GPU some work it can finish in a fraction of a second and the GPU usage bar doesn't get a chance to move from 0%.
Maybe you could share the actual network you're using?
I can see the GPU usage going to 100% in a Kaggle notebook with a toy example like this (note the wide 2500-unit linear layers here):
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
trainloader = [(torch.Tensor(np.random.randn(1000, 5)), torch.Tensor([1.0] * 1000))] * 1000
model = nn.Sequential(nn.Linear(5, 2500), nn.Linear(2500, 1500), nn.Linear(1500, 1))
model.cuda()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.)
criterion = lambda x,y : ((x-y)**2).mean()
for epoch in range(10):
    for x, y in trainloader:
        x, y = x.cuda(), y.cuda()
        yhat = model(x)
        optimizer.zero_grad()
        loss = criterion(yhat, y)
        loss.backward()
        optimizer.step()
    print(epoch)
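As a quick sanity check (standard PyTorch calls), you can also confirm that CUDA is visible and that the model's parameters actually live on the GPU before looking at the usage bar; model here refers to the network from the question:
import torch
print(torch.cuda.is_available())          # should print True on a GPU notebook
print(next(model.parameters()).device)    # should print cuda:0 after model.cuda()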
I am experimenting with a binary classifier implementation in TensorFlow. If I have two plain outputs (i.e. no activation) in the final layer and use tf.losses.sparse_softmax_cross_entropy, my network trains as expected. However, if I change the output layer to produce a single output with a tf.sigmoid activation and use tf.losses.log_loss as the loss function, my network does not train (i.e. loss/accuracy does not improve).
Here is what my output layer/loss function looks like in the first (i.e. working) case:
out = tf.layers.dense(prev, 2)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=out)
In the second case, I have the following:
out = tf.layers.dense(prev, 1, activation=tf.sigmoid)
loss = tf.losses.log_loss(labels=y, predictions=out)
Tensor y is a vector of 0/1 values; it is not one-hot encoded. The network learns as expected in the first case, but not in the second case. Apart from these two lines, everything else is kept the same.
I do not understand why the second set-up does not work. Interestingly, if I express the same network in Keras and use the second set-up, it works. Am I using the wrong TensorFlow functions to express my intent in the second case? I'd like to produce a single sigmoid output and use binary cross-entropy loss to train a simple binary classifier.
I'm using Python 3.6 and TensorFlow 1.4.
Here is a small, runnable Python script to demonstrate the issue. Note that you need to have downloaded the StatOil/C-CORE dataset from Kaggle to be able to run the script as is.
Thanks!
Using a sigmoid activation on two outputs doesn't give you a probability distribution:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
start = tf.constant([[4., 5.]])
out_dense = tf.layers.dense(start, units=2)
print("Logits (un-transformed)", out_dense)
out_sigmoid = tf.layers.dense(start, units=2, activation=tf.sigmoid)
print("Elementwise sigmoid", out_sigmoid)
out_softmax = tf.nn.softmax(tf.layers.dense(start, units=2))
print("Softmax (probability distribution)", out_softmax)
Prints:
Logits (un-transformed) tf.Tensor([[-3.64021587 6.90115976]], shape=(1, 2), dtype=float32)
Elementwise sigmoid tf.Tensor([[ 0.94315267 0.99705648]], shape=(1, 2), dtype=float32)
Softmax (probability distribution) tf.Tensor([[ 0.05623185 0.9437682 ]], shape=(1, 2), dtype=float32)
Instead of tf.nn.softmax, you could also use tf.sigmoid on a single logit, then set the other output to one minus that.
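If you do go the single-output route, a minimal sketch (assuming TF 1.4 and the prev / y tensors from the question) would feed the raw logit to a sigmoid cross-entropy loss rather than applying tf.sigmoid first, and make sure the label shape matches the (batch, 1) output:
logits = tf.layers.dense(prev, 1)                            # keep the final layer linear; no sigmoid here
labels = tf.expand_dims(tf.cast(y, tf.float32), axis=-1)     # shape (batch, 1) to match logits
loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=labels, logits=logits)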
I am training a neural network for multilabel classification with a large number of classes (1000), which means more than one output can be active for each input. On average, I have two classes active per output frame. When training with a cross-entropy loss, the neural network resorts to outputting only zeros, because that yields the least loss since 99.8% of my labels are zeros. Any suggestions on how I can push the network to give more weight to the positive classes?
TensorFlow has a loss function weighted_cross_entropy_with_logits, which can be used to give more weight to the 1s, so it should be applicable to a sparse multi-label classification setting like yours.
From the documentation:
This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.
The argument pos_weight is used as a multiplier for the positive targets
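Concretely, the documentation gives the per-element loss as the following expression, where pos_weight scales only the positive term:
targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))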
If you use the tensorflow backend in Keras, you can use the loss function like this (Keras 2.1.1):
import tensorflow as tf
import keras.backend.tensorflow_backend as tfb
POS_WEIGHT = 10 # multiplier for positive targets, needs to be tuned
def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor
    and a target tensor. POS_WEIGHT is used as a multiplier
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.log(output / (1 - output))
    # compute weighted loss
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)
Then in your model:
model.compile(loss=weighted_binary_crossentropy, ...)
I have not found many resources yet which report well working values for the pos_weight in relation to the number of classes, average active classes, etc.
Many thanks to tobigue for the great solution.
The TensorFlow and Keras APIs have changed since that answer, so an updated version of weighted_binary_crossentropy for TensorFlow 2.7.0 is below.
import tensorflow as tf
POS_WEIGHT = 10
def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor
    and a target tensor. POS_WEIGHT is used as a multiplier
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tf.convert_to_tensor(tf.keras.backend.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.math.log(output / (1 - output))
    loss = tf.nn.weighted_cross_entropy_with_logits(labels=target, logits=output, pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)
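Usage is unchanged from the Keras example above; assuming a standard tf.keras model, something like:
model.compile(optimizer='adam', loss=weighted_binary_crossentropy)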
Is it possible to train a model in XGBoost that has multiple continuous outputs (multi-output regression)?
What would be the objective for training such a model?
Thanks in advance for any suggestions.
My suggestion is to use sklearn.multioutput.MultiOutputRegressor as a wrapper of xgb.XGBRegressor. MultiOutputRegressor trains one regressor per target and only requires that the regressor implements fit and predict, which xgboost happens to support.
import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))

# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:linear')).fit(X, y)

# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))  # 0.004, 0.003, 0.005
This is probably the easiest way to regress multi-dimensional targets using xgboost, as you would not need to change any other part of your code (if you were using the sklearn API originally).
However, this method does not leverage any possible relations between targets, although you can try to design a customized objective function to achieve that.
Multiple output regression is now available in the nightly build of XGBoost, and will be included in XGBoost 1.6.0.
See https://github.com/dmlc/xgboost/blob/master/demo/guide-python/multioutput_regression.py for an example.
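For reference, a minimal sketch of the native interface, assuming XGBoost >= 1.6 and mirroring the linked demo (which uses the hist tree method): the sklearn wrapper accepts a two-dimensional target directly.
import numpy as np
import xgboost as xgb

X = np.random.random((1000, 10))
y = np.random.random((1000, 3))        # three targets per sample

reg = xgb.XGBRegressor(tree_method="hist", n_estimators=64)
reg.fit(X, y)
print(reg.predict(X).shape)            # (1000, 3)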
@ComeOnGetMe's snippet now generates the warning reg:linear is now deprecated in favor of reg:squarederror, so here is an updated version of that answer:
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor
# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))
# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:squarederror')).fit(X, y)
# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))
Out:
[2.00592697e-05 1.50084441e-05 2.01412247e-05]
I would post this as a comment, but I lack the reputation. In addition to @Jesse Anderson's answer: to install the most recent version, select the top link from here:
https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/list.html?prefix=master/
Make sure to select the one for your operating system.
Use pip install to install the wheel, e.g. for macOS:
pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/master/xgboost-1.6.0.dev0%2B4d81c741e91c7660648f02d77b61ede33cef8c8d-py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl
You can use linear regression, random forest regressors, and some other related algorithms in scikit-learn to produce multi-output regression. I'm not sure about XGBoost; the boosting regressor in scikit-learn does not allow multiple outputs. For those asking when this may be necessary: one example would be forecasting multiple steps of a time series ahead.
Based on the above discussion, I have extended the univariate XGBoostLSS to a multivariate framework called Multi-Target XGBoostLSS Regression that models multiple targets and their dependencies in a probabilistic regression setting. Code follows soon.
The training data (including both the training and validation sets) has about 80 million samples, and each sample has 200 dense floating-point features. There are 6 labeled classes and they are imbalanced.
Among the commonly used ML libraries (e.g., libsvm, scikit-learn, Spark MLlib, random forest, XGBoost, or others), which should I use? Regarding the hardware configuration, the machine has 24 CPU cores and 250 GB of memory.
I would recommend using scikit-learn's SGDClassifier, as it learns online: you can load your training data in chunks (mini-batches) into memory and train the classifier gradually, so you won't need to load all the data into memory at once.
It is highly parallel and easy to use.
You can set the warm_start argument to True and call fit multiple times with each chunk of X, y loaded into memory, or, the better option, you can use the partial_fit method.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss='hinge', alpha=1e-4, penalty='l2', l1_ratio=0.9,
                    learning_rate='optimal', n_iter=10, shuffle=False,
                    n_jobs=10, fit_intercept=True)

# len(all_classes) = n_classes; partial_fit needs the full class list up front
all_classes = np.array(set_of_all_classes)

while True:  # stop once the data is exhausted
    # load a mini-batch from disk into memory
    X, y = load_next_chunk()
    clf.partial_fit(X, y, all_classes)

X_test, y_test = load_test_data()
y_pred = clf.predict(X_test)