I have a PyTorch model with a test accuracy of about 95%-97%. I save it using torch.save(my_model.state_dict(), PATH), but whenever I reload it using my_model.load_state_dict(torch.load(PATH)) and test it on the same data using test_fn(my_model), my test accuracy drops to about 0.06%. I'm trying to follow the suggested serialization semantics (https://pytorch.org/docs/stable/notes/serialization.html).
This happens whether or not I use my_model.eval() (although by default I'm not using it for either training or testing). Is there an extra step I need to take?
In code this looks like:
my_model = GraphConv(w2i, p2i, l2i, r2i, s2i, words, pos, lems, 512, 512, 3) ## Initialise model & params
my_model.cuda()
loss_function = nn.NLLLoss()
optimizer = optim.Adam(my_model.parameters(), lr=0.001)
for epoch in range(15):
    ... ### Apply training steps
print(test_fn(my_model)) ### Will be over 95%
torch.save(my_model.state_dict(), PATH)
...
my_model2 = GraphConv(w2i, p2i, l2i, r2i, s2i, words, pos, lems, 512, 512, 3) ## Initialise new model
my_model2.load_state_dict(torch.load(PATH))
print(test_fn(my_model2)) ### Is about 0.06%
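One way to narrow this down is to check that the save/load round trip itself is lossless. The sketch below uses a hypothetical TinyNet as a stand-in for GraphConv (the real model is not shown in the post) and compares both the restored weights and the outputs in eval mode. If this check passes for the real model, the difference probably lies outside the state_dict, for example in constructor arguments such as w2i or p2i being rebuilt differently between runs; that is an assumption, not something the post confirms.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for GraphConv; dropout makes train/eval modes differ."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.drop = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(16, 3)

    def forward(self, x):
        return self.fc2(self.drop(torch.relu(self.fc1(x))))


torch.manual_seed(0)
model = TinyNet()
x = torch.randn(4, 8)

# Save only the parameters, as in the post.
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

# Rebuild the architecture, then restore the saved parameters into it.
model2 = TinyNet()
model2.load_state_dict(torch.load(path))

# Every tensor in the two state dicts should match exactly.
weights_match = all(
    torch.equal(v, model2.state_dict()[k]) for k, v in model.state_dict().items()
)

# eval() disables dropout, so both models should give identical outputs.
model.eval()
model2.eval()
with torch.no_grad():
    outputs_match = torch.equal(model(x), model2(x))

print(weights_match, outputs_match)
```

Running the same comparison on the real model, with the same test batch before saving and after loading, should show whether the weights or something else changed.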
Using TensorFlow's DCGAN tutorial as an example:
https://www.tensorflow.org/tutorials/generative/dcgan?hl=en
To log the loss, the following example was used:
https://www.tensorflow.org/tensorboard/get_started?hl=en
Using the above as a reference, I added a few lines to view the loss in TensorBoard, but I couldn't do the same for the generator/discriminator weights and biases.
Code used to view generator/discriminator loss :
g_loss = tf.keras.metrics.Mean('g_loss', dtype=tf.float32)
d_loss = tf.keras.metrics.Mean('d_loss', dtype=tf.float32)
Preparing writer / log directory :
current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
train_log_dir = 'logs/' + current_time + '/train'
train_summary_writer = tf.summary.create_file_writer(train_log_dir)
Then, for each epoch, I pass gen_loss and disc_loss into g_loss and d_loss respectively, and do the following:
with train_summary_writer.as_default():
    tf.summary.scalar('g_loss', g_loss.result(), step=epoch)
    tf.summary.scalar('d_loss', d_loss.result(), step=epoch)
The above allows you to view g_loss and d_loss under the Scalars tab in TensorBoard.
So how can I do the same for the weights and biases?
I can see that the tutorial uses tf.GradientTape() to carry out the backpropagation.
When this is used, I presume you do not need model.fit() with callbacks and instead use generator.trainable_variables with tf.summary.histogram(), but I'm unsure how to put it all together.
Do you also need to "merge" scalars and histograms at some point if you want to view both?
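A minimal sketch of how this could fit together in TensorFlow 2.x, assuming eager execution at the point where summaries are written. The small Sequential model is a hypothetical stand-in for the tutorial's generator, the scalar value is a placeholder, and the log directory is a temporary folder. As far as I know, TF 2.x has no separate merge step: scalars and histograms written through the same writer both show up in TensorBoard.

```python
import glob
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in for the tutorial's generator; any Keras model works here.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4),
])

train_log_dir = os.path.join(tempfile.mkdtemp(), "train")
train_summary_writer = tf.summary.create_file_writer(train_log_dir)

for epoch in range(2):
    # ... the training steps for this epoch would run here ...
    with train_summary_writer.as_default():
        # Placeholder scalar; in the post this would be g_loss.result().
        tf.summary.scalar("g_loss", 0.5, step=epoch)
        # One histogram per trainable variable (kernels and biases).
        for i, var in enumerate(generator.trainable_variables):
            tag = "generator/var_{}_{}".format(i, var.name.replace(":", "_"))
            tf.summary.histogram(tag, var.numpy(), step=epoch)

train_summary_writer.flush()

# The writer should have produced at least one event file for TensorBoard.
event_files = glob.glob(os.path.join(train_log_dir, "events.out.tfevents.*"))
print(len(event_files))
```

The same loop over discriminator.trainable_variables, with a different tag prefix, would cover the discriminator. The histograms then appear under the Histograms and Distributions tabs.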
I'm trying to use the ResNet-50 model from the ONNX model zoo, and to load and train it in CNTK for an image classification task. The first thing that confuses me is that the batch axis (I'm not sure what the official name is; dynamic axis?) is set to 1 in this model:
Why is that? Couldn't it simply be [3x224x224]? In this model for example, the input looks like this:
To load the model and use my own Dense layer, I use the following code:
def create_model(num_classes, input_features, freeze=False):
    base_model = load_model("restnet-50.onnx", format=ModelFormat.ONNX)
    feature_node = find_by_name(base_model, "gpu_0/data_0")
    last_node = find_by_uid(base_model, "Reshape2959")
    substitutions = {
        feature_node : placeholder(name='new_input')
    }
    cloned_layers = last_node.clone(CloneMethod.clone, substitutions)
    cloned_out = cloned_layers(input_features)
    z = Dense(num_classes, activation=softmax, name="prediction") (cloned_out)
    return z
For training I use (shortened):
# datasets = list of classes
feature = input_variable(shape=(1, 3, 224, 224))
label = input_variable(shape=(1,3))
model = create_model(len(datasets), feature)
loss = cross_entropy_with_softmax(model, label)
# some definitions for learner, epochs, ProgressPrinters missing
for epoch in range(epochs):
    loss.train((X_current,y_current), parameter_learners=[learner], callbacks=[progress_printer])
X_current is a single image and y_current is the corresponding class label, both encoded as numpy arrays with the following shapes:
X_current.shape
(1, 3, 224, 224)
y_current.shape
(1, 3)
When I try to train the model, I get
"ValueError: ToBatchAxis7504 ToBatchAxisNode operation can only operate on tensor without minibatch data (no layout)"
What's wrong here?
I have a stateful RNN model with several GRU layers that was created in Keras.
I have to run this model now from Java, so I dumped the model as protobuf, and I'm loading it from Java TensorFlow.
This model must be stateful because features will be fed one timestep at a time.
As far as I understand, in order to achieve statefulness in a TensorFlow model, I must somehow feed in the last state every time I execute the session runner, and also that the run would return the state after the execution.
Is there a way to output the state in the Keras model?
Is there a simpler way altogether to get a stateful Keras model to work as such using TensorFlow?
Many thanks
An alternative solution is to use the model.state_updates property of the Keras model, and add it to the session.run call.
Here is a full example that illustrates this solution with two LSTMs:
import tensorflow as tf

class SimpleLstmModel(tf.keras.Model):
    """ Simple lstm model with two lstm """
    def __init__(self, units=10, stateful=True):
        super(SimpleLstmModel, self).__init__()
        self.lstm_0 = tf.keras.layers.LSTM(units=units, stateful=stateful, return_sequences=True)
        self.lstm_1 = tf.keras.layers.LSTM(units=units, stateful=stateful, return_sequences=True)

    def call(self, inputs):
        """
        :param inputs: [batch_size, seq_len, 1]
        :return: output tensor
        """
        x = self.lstm_0(inputs)
        x = self.lstm_1(x)
        return x

def main():
    model = SimpleLstmModel(units=1, stateful=True)
    x = tf.placeholder(shape=[1, 1, 1], dtype=tf.float32)
    output = model(x)
    sess = tf.Session()
    # tf.initialize_all_variables() is deprecated in favour of this call
    sess.run(tf.global_variables_initializer())
    # Fetching model.state_updates makes each run write the new LSTM state back
    res_at_step_1, _ = sess.run([output, model.state_updates], feed_dict={x: [[[0.1]]]})
    print(res_at_step_1)
    res_at_step_2, _ = sess.run([output, model.state_updates], feed_dict={x: [[[0.1]]]})
    print(res_at_step_2)

if __name__ == "__main__":
    main()
Which produces the following output:
[[[0.00168626]]]
[[[0.00434444]]]
and shows that the lstm state is preserved between batches.
If we set stateful to False, the output becomes:
[[[0.00033928]]]
[[[0.00033928]]]
Showing that the state is not reused.
OK, so I managed to solve this problem!
What worked for me was creating tf.identity tensors for not only the outputs, as is standard, but also for the state tensors.
In the Keras models, the state tensors can be found by doing:
model.updates
Which gives something like this:
[(<tf.Variable 'gru_1_1/Variable:0' shape=(1, 70) dtype=float32_ref>,
<tf.Tensor 'gru_1_1/while/Exit_2:0' shape=(1, 70) dtype=float32>),
(<tf.Variable 'gru_2_1/Variable:0' shape=(1, 70) dtype=float32_ref>,
<tf.Tensor 'gru_2_1/while/Exit_2:0' shape=(1, 70) dtype=float32>),
(<tf.Variable 'gru_3_1/Variable:0' shape=(1, 4) dtype=float32_ref>,
<tf.Tensor 'gru_3_1/while/Exit_2:0' shape=(1, 4) dtype=float32>)]
The 'Variable' is used for inputting the states, and the 'Exit' for outputs of the new states.
So I created tf.identity out of the 'Exit' tensors. I gave them meaningful names, e.g.:
tf.identity(state_variables[j], name='state'+str(j))
Here, state_variables contained only the 'Exit' tensors.
I then used the input variables (e.g. gru_1_1/Variable:0) to feed the model state from TensorFlow, and the identity tensors I created out of the 'Exit' tensors to extract the new states after feeding the model at each timestep.
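The feed-the-state-in, fetch-the-state-out loop described above can be sketched independently of TensorFlow. The example below is only an illustration of the pattern: it uses a plain tanh RNN cell in NumPy rather than a GRU, and the weights, sizes, and the step function are all hypothetical. The caller owns the state and passes it to every call, which is what the Java side would do with the state tensors.

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(1, 4))   # input-to-hidden weights (hypothetical sizes)
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden weights


def step(x_t, state):
    """One timestep of a plain tanh RNN cell: returns (output, new_state)."""
    new_state = np.tanh(x_t @ W_x + state @ W_h)
    return new_state, new_state


sequence = rng.normal(size=(5, 1, 1))  # 5 timesteps, batch of 1, 1 feature

# Stateful use: feed the previous state in, keep the returned state for the next call.
state = np.zeros((1, 4))
for x_t in sequence:
    out_stateful, state = step(x_t, state)

# Stateless use: the state is reset to zeros before every call.
for x_t in sequence:
    out_stateless, _ = step(x_t, np.zeros((1, 4)))

# The last outputs differ because only the stateful loop carries history forward.
print(np.allclose(out_stateful, out_stateless))
```

In the exported graph, the role of the state argument is played by the 'Variable' tensors and the role of the returned state by the named identity tensors.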
I use TensorFlow to build a super-resolution convolutional neural network for enhancing image resolution. The network accepts a low-resolution image as input and produces a high-resolution image as output.
For training, I use tf.estimator.Estimator:
def get_estimator(run_config=None, params=None):
    """Return the model as a Tensorflow Estimator object.
    Args:
        run_config (RunConfig): Configuration for Estimator run.
        params (HParams): hyperparameters.
    """
    return tf.estimator.Estimator(
        model_fn=model_fn,  # First-class function
        params=params,  # HParams
        config=run_config  # RunConfig
    )
which is wrapped by tf.contrib.learn.Experiment:
def experiment_fn(run_config, params):
    """Create an experiment to train and evaluate the model.
    Args:
        run_config (RunConfig): Configuration for Estimator run.
        params (HParam): Hyperparameters
    Returns:
        (Experiment) Experiment for training the mnist model.
    """
    # You can change a subset of the run_config properties as
    run_config = run_config.replace(save_checkpoints_steps=params.min_eval_frequency)
    estimator = get_estimator(run_config, params)
    # # Setup data loaders
    train_input_fn = get_input_fn(params.filenames, params.epoch, True, params.batch_size)
    eval_input_fn = get_input_fn(params.filenames, 1, False, params.batch_size)
    # Define the experiment
    experiment = tf.contrib.learn.Experiment(
        estimator=estimator,  # Estimator
        train_input_fn=train_input_fn,  # First-class function
        eval_input_fn=eval_input_fn,  # First-class function
        train_steps=params.train_steps,  # Minibatch steps
        min_eval_frequency=params.min_eval_frequency,  # Eval frequency
        eval_steps=params.eval_steps  # Minibatch steps
    )
    return experiment
And I run it via tf.contrib.learn.learn_runner as follows:
def run_experiment(config, session):
    assert os.path.exists(config.tfrecord_dir)
    assert os.path.exists(os.path.join(config.tfrecord_dir, config.dataset, config.subset))
    save_config(config.summaries_dir, config)
    filenames = get_tfrecord_files(config)
    batch_number = min(len(filenames), config.train_size) // config.batch_size
    logging.info('Total number of batches %d' % batch_number)
    params = tf.contrib.training.HParams(
        learning_rate=config.learning_rate,
        device=config.device,
        epoch=config.epoch,
        batch_size=config.batch_size,
        min_eval_frequency=100,
        train_steps=None,  # Use train feeder until its empty
        eval_steps=1,  # Use 1 step of evaluation feeder
        filenames=filenames
    )
    run_config = tf.contrib.learn.RunConfig(model_dir=config.checkpoint_dir)
    learn_runner.run(
        experiment_fn=experiment_fn,  # First-class function
        run_config=run_config,  # RunConfig
        schedule="train_and_evaluate",  # What to run
        hparams=params  # HParams
    )
The Experiment class provides a train_and_evaluate method that runs evaluation during training.
My question is: how can I get an evaluation result (an output image) while training the CNN? I want to see intermediate training results.
My project on github
I think you're looking for adding an image summary to your model using tf.summary.image.
It makes it easy to visualize images during training in Tensorboard:
def model_fn(...):
    ...
    # max_outputs controls the number of images in the batch you want to display
    tf.summary.image("train_images", images, max_outputs=3)
    # ...
    return tf.estimator.EstimatorSpec(...)
During evaluation, I don't think there is an easy way to display an image inside tf.estimator. The issue is that during evaluation, only integer or float values can be displayed.
In more detail, at eval time you return eval_metric_ops containing, for instance, your accuracy. TensorFlow will display every integer or float value from this dict in TensorBoard, but will give you a warning if you try to display anything else (e.g. images). (Source code: function _write_dict_to_summary)
WARNING:tensorflow:Skipping summary for eval_images, must be a float, np.float32, np.int64, np.int32 or int.
A workaround could be to get back the value of the images outside of tf.estimator and display them manually in TensorBoard.
Edit: there is another related question on Stack Overflow, and two GitHub issues (here and here) to track progress on this.
From what I understand, they will try to make it easy to return an image summary in eval_metric_ops that will automatically appear in TensorBoard.
I wanted to update the parameters of a model manually with PyTorch. I made a super simple standard sequential model (full code here), but whenever I try to train my model it does not train unless I create the actual variables explicitly (code for model variables explicitly). With the sequential model, the code looks as follows:
mdl_sgd = torch.nn.Sequential( torch.nn.Linear(D_sgd,1,bias=False) )
...
for i in range(nb_iter):
    # Forward pass: compute predicted Y using operations on Variables
    batch_xs, batch_ys = get_batch2(X,Y,M,dtype) # [M, D], [M, 1]
    ## FORWARD PASS
    y_pred = mdl_sgd.forward(X)
    ## LOSS
    loss = (1/N)*(y_pred - batch_ys).pow(2).sum()
    ## Manually zero the gradients after updating weights
    mdl_sgd.zero_grad()
    ## BACKWARD PASS
    loss.backward() # Use autograd to compute the backward pass. Now w will have gradients
    ## SGD update
    for W in mdl_sgd.parameters():
        #print(W.grad.data)
        W.data = W.data - eta*W.grad.data
When I train it, it seems that nothing happens. I've tried many things to make this work, like wrapping it in a class and explicitly setting requires_grad=True, or changing where I zero out the gradients, but nothing seems to work. What I really want is to be able to apply the update rule explicitly myself (not with optim). I'm not sure whether that's the reason it doesn't work, but the following does work for some reason:
X = poly_kernel_matrix(x_true,Degree_mdl) # maps to the feature space of the model
X = Variable(torch.FloatTensor(X).type(dtype), requires_grad=False)
Y = Variable(torch.FloatTensor(Y).type(dtype), requires_grad=False)
w_init=torch.randn(D_sgd,1).type(dtype)
W = Variable( w_init, requires_grad=True)
...
for i in range(nb_iter):
    # Forward pass: compute predicted Y using operations on Variables
    batch_xs, batch_ys = get_batch2(X,Y,M,dtype) # [M, D], [M, 1]
    ## FORWARD PASS
    #y_pred = mdl_sgd.forward(X)
    y_pred = batch_xs.mm(W)
    ## LOSS
    loss = (1/N)*(y_pred - batch_ys).pow(2).sum()
    ## BACKWARD PASS
    loss.backward() # Use autograd to compute the backward pass. Now w will have gradients
    ## SGD update
    W.data = W.data - eta*W.grad.data
    ## Manually zero the gradients after updating weights
    #mdl_sgd.zero_grad()
    W.grad.data.zero_()
The reason I know this is that the plot of the regression lines looks sensible:
while when I use torch.nn.Sequential I get:
I am sure it's a really newbie question, but I am not sure why I can't update the parameters. Does someone know why? I want to be able to update the parameters manually (however I want), and in this case I decided to use SGD to see if I could even update the parameters.
Note I also tried subclassing modules and registering params but it didn't work either. This is the class I built:
class regression_NN(torch.nn.Module):
    def __init__(self,w_init):
        """
        """
        super(type(self), self).__init__()
        # mdl
        #self.W = Variable(w_init, requires_grad=True)
        #self.W = torch.nn.Parameter( Variable(w_init, requires_grad=True) )
        #self.W = torch.nn.Parameter( w_init )
        self.W = torch.nn.Parameter( w_init,requires_grad=True )
        #self.mod_list = torch.nn.ModuleList([self.W])

    def forward(self, x):
        """
        """
        y_pred = x.mm(self.W)
        return y_pred
All code is:
https://github.com/brando90/simple_regression
I'm relatively new to PyTorch, so I might have many bad practices. You can correct them if you want, but I'm mostly concerned that my parameters are not updating even when I try to explicitly register them in a class that inherits from torch.nn.Module.
I also linked to the question from the pytorch official forum: https://discuss.pytorch.org/t/how-does-one-make-sure-that-the-parameters-are-update-manually-in-pytorch-using-modules/6076
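For comparison, here is a minimal sketch of a manual SGD update on a torch.nn.Sequential model, assuming a recent PyTorch version where torch.no_grad() is available. The data, sizes, and learning rate are hypothetical and are not taken from the linked repository. One difference visible between the two loops in the post is that the Sequential version calls mdl_sgd.forward(X) on the full data while the loss uses batch_ys; the sketch feeds the same mini-batch to both. Whether that is the cause of the problem is not established here.

```python
import torch

torch.manual_seed(0)
N, D, M = 64, 3, 16          # hypothetical sizes: samples, features, batch size
eta = 0.1                    # learning rate
X = torch.randn(N, D)
w_true = torch.tensor([[1.0], [-2.0], [0.5]])
Y = X @ w_true               # noiseless linear targets

mdl = torch.nn.Sequential(torch.nn.Linear(D, 1, bias=False))

with torch.no_grad():
    initial_loss = (mdl(X) - Y).pow(2).mean().item()

for i in range(200):
    idx = torch.randint(0, N, (M,))
    batch_xs, batch_ys = X[idx], Y[idx]      # [M, D], [M, 1]

    y_pred = mdl(batch_xs)                   # forward pass on the same mini-batch
    loss = (y_pred - batch_ys).pow(2).mean()

    mdl.zero_grad()                          # clear gradients from the previous step
    loss.backward()

    # Manual SGD update, done in place and outside autograd tracking.
    with torch.no_grad():
        for W in mdl.parameters():
            W -= eta * W.grad

with torch.no_grad():
    final_loss = (mdl(X) - Y).pow(2).mean().item()

print(initial_loss, final_loss)
```

Any other update rule can replace the W -= eta * W.grad line, as long as it modifies the parameters in place inside the no_grad block.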