I ran the demo TensorFlow MNIST model (in models/image/mnist) with
python -m tensorflow.models.image.mnist.convolutional
Does it mean that after the model completes training, the parameters/weights are automatically stored on secondary storage? Or do we have to edit the code to include "saver" functions for parameters to be stored?
No, they are not automatically saved; everything lives in memory. You have to explicitly add a saver to store your model to secondary storage.
First, you create a saver object:
saver = tf.train.Saver(tf.all_variables())
Then you save your model as training progresses, usually every N steps. These intermediate saves are commonly called "checkpoints":
# Save the model checkpoint periodically.
if step % 1000 == 0:
    checkpoint_path = os.path.join('.train_dir', 'model.ckpt')
    saver.save(sess, checkpoint_path)
Then you can restore the model from the checkpoint:
saver.restore(sess, model_checkpoint_path)
Take a look at tensorflow.models.image.cifar10 for a concrete example.
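To tie these pieces together, here is a minimal end-to-end sketch, assuming the TF 1.x graph-mode API; the toy variable, directory name, and step counts are hypothetical:
import os
import tensorflow as tf

# Hypothetical toy graph: a single variable we "train" by incrementing it
w = tf.Variable(0.0, name="w")
increment = tf.assign_add(w, 1.0)

saver = tf.train.Saver()  # saves all variables by default
os.makedirs("train_dir", exist_ok=True)
checkpoint_path = os.path.join("train_dir", "model.ckpt")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(3000):
        sess.run(increment)
        if step % 1000 == 0:
            saver.save(sess, checkpoint_path, global_step=step)

# Later, in a fresh session: restore the most recent checkpoint
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint("train_dir"))
    print(sess.run(w))  # the trained value survives the session boundary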
I have a model architecture. I have saved the entire model using torch.save() after some n iterations. I want to run another round of training using the pretrained weights of the model I saved previously.
Edit: I want the weight initialization for the new run to be done from the weights of the pretrained model.
Edit 2: Just to add, I don't plan to resume training. I intend to save the model and use it for a separate training run with the same parameters. Think of it as using a saved model, weights and all, for a larger run with more samples (i.e., a completely new training job).
Right now, I do something like:
# default_lr = 5
# default_weight_decay = 0.001
# model_io = the pretrained model
model = torch.load(model_io)
optim = torch.optim.Adam(model.parameters(), lr=default_lr, weight_decay=default_weight_decay)
loss_new = BCELoss()
epochs = default_epoch
...
def training_loop():
    ....
    outputs = model(input)
    ....
...
# similarly for the test loop
Am I missing something? I have to train for a very large number of epochs on a huge number of samples, so I cannot afford to wait for the results and then figure things out.
Thank you!
From the code that you have posted, I see that you are only loading the previous model parameters in order to restart your training from where you left off. This is not sufficient to restart your training correctly. Along with your model parameters (weights), you also need to save and load your optimizer state, especially when your optimizer is Adam, which keeps running moment estimates for all your weights that adapt the effective per-parameter learning rate.
In order to smoothly restart training, I would do the following:
# For saving your model
state = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict()
}
model_save_path = "Enter/your/model/path/here/model_name.pth"
torch.save(state, model_save_path)
# ------------------------------------------
# For loading your model
state = torch.load(model_save_path)
model = MyNetwork()
model.load_state_dict(state['model'])
optim = torch.optim.Adam(model.parameters(), lr=default_lr, weight_decay=default_weight_decay)
optim.load_state_dict(state['optimizer'])
Besides these, you may also want to save your learning rate if you are using a learning-rate decay strategy, your best validation accuracy so far (which you may want for checkpointing purposes), and any other changeable parameter that might affect your training. But in most cases, saving and loading just the model weights and optimizer state should be sufficient.
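As a sketch of such a fuller checkpoint (the tiny model, scheduler choice, and bookkeeping values below are hypothetical stand-ins, just to make it self-contained):
import torch
import torch.nn as nn

# Hypothetical stand-ins for your real model, optimizer, and scheduler
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
epoch, best_val_acc = 5, 0.87

# Checkpoint everything that affects training, not just the weights
state = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'scheduler': scheduler.state_dict(),  # carries the LR decay schedule
    'epoch': epoch,                       # where to resume the epoch counter
    'best_val_acc': best_val_acc,         # for best-model checkpointing
}
torch.save(state, 'checkpoint.pth')

# To restore: rebuild the objects first, then load their states into them
state = torch.load('checkpoint.pth')
model.load_state_dict(state['model'])
optimizer.load_state_dict(state['optimizer'])
scheduler.load_state_dict(state['scheduler'])
start_epoch = state['epoch'] + 1
best_val_acc = state['best_val_acc']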
EDIT: You may also want to look at the following answer, which explains in detail how to save your model in different scenarios.
The motivation behind this question is that I had saved a Keras model using Matterport's MaskRCNN, and in tf.keras.callbacks.ModelCheckpoint() I had very explicitly set the save_weights_only argument to False, so that the entire model would be saved (not just the weights).
Turns out there's a bug in the ModelCheckpoint() callback where it sometimes does not save the full model.
This is obviously a problem when you go to load the model after closing your TF session, as the Graph, architecture, and optimizer state are gone, making it hard (if not impossible) to reload that saved model.
Therefore, I am asking whether it is possible to somehow extract the TF session retroactively, from just the .h5 weights file, after the session has closed (resulting from, for example, your Notebook kernel crashing).
Not much code to go on, but here it is:
Given a .h5 file that was saved after each epoch of training a model in Keras, is it possible to extract the Graph session from that .h5 file, and if so, how?
I have several models saved in .h5 format but never called tf.get_session() during the saving of the model weights in h5 format.
with tf.Session() as sess:
How do I load this model using TensorFlow?
TF 2.0 makes this a cinch, but how can I solve it on TensorFlow 1.14?
The end goal of this is to take a model saved with Keras as a .h5 file and do inference with it on Tensorflow Serving, which needs, to my knowledge, a protobuf file in .pb format.
https://medium.com/@pipidog/how-to-convert-your-keras-models-to-tensorflow-e471400b886a
I've tried keras_to_tensorflow:
https://github.com/amir-abdi/keras_to_tensorflow
The code to convert a ModelCheckpoint saved in .h5 format to .pb format is shown below:
import tensorflow as tf

# The export path contains the name and the version of the model
tf.keras.backend.set_learning_phase(0)  # Ignore dropout at inference
model = tf.keras.models.load_model('./model.h5')
export_path = './PlanetModel/1'

# Fetch the Keras session and save the model
# The signature definition is defined by the input and output tensors
# and stored with the default serving key
with tf.keras.backend.get_session() as sess:
    tf.saved_model.simple_save(
        sess,
        export_path,
        inputs={'input_image': model.input},
        outputs={t.name: t for t in model.outputs})
For more information, please refer to this article.
For other ways to do it, please refer to this Stack Overflow answer.
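As a quick sanity check after the export (a sketch, assuming TF 1.x and the export_path from the snippet above), you can load the SavedModel back in a fresh session and inspect its serving signature:
import tensorflow as tf

# Load the exported SavedModel under the default serving tag and
# print the tensors bound to its serving signature
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], './PlanetModel/1')
    sig = meta_graph.signature_def['serving_default']
    print(sig.inputs)   # should list the 'input_image' tensor
    print(sig.outputs)  # the model's output tensors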
I've created a model using Google Cloud's Vision API. I spent countless hours labeling data and trained a model. After almost 20 hours of "training", the model is still hit and miss.
How can I iterate on this model? I don't want to lose the "learning" it's done so far. It works about 3 out of 5 times.
My best guess is that I should loop over the objects again, find where it's wrong, and label accordingly. But I'm not sure of the best method for that. Should I be labeling all images where it "misses" as TEST data images? Are there best practices or resources I can read on this topic?
I'm by no means an expert, but here's what I'd suggest in order of most to least important:
1) Add more data if possible. More data is always a good thing, and helps develop robustness with your network's predictions.
2) Add dropout layers to prevent over-fitting (see the sketch after this list)
3) Have a tinker with kernel and bias initialisers
4) [The most relevant answer to your question] Save the training weights of your model and reload them into a new model prior to training.
5) Change up the type of model architecture you're using. Then, have a tinker with epoch numbers, validation splits, loss evaluation formulas, etc.
Hope this helps!
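Regarding point 2, here is a minimal Keras sketch of what adding dropout looks like (the layer sizes and dropout rate are hypothetical):
from tensorflow import keras
from tensorflow.keras import layers

# Dropout randomly zeroes a fraction of activations during training,
# which discourages co-adaptation and reduces over-fitting
model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(64,)),
    layers.Dropout(0.5),  # drops 50% of activations at train time only
    layers.Dense(10, activation='softmax'),
])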
EDIT: More information about number 4
So you can save and load your model weights during or after the model has trained. See here for some more in-depth information about saving.
Broadly, let's cover the basics. I'm assuming you're going through Keras, but the same applies for TF:
Saving the model after training
Simply call:
model_json = model.to_json()
with open("{Your_Model}.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("{Your_Model}.h5")
print("Saved model to disk")
Loading the model
You can load the model structure from json like so:
from keras.models import model_from_json
json_file = open('{Your_Model}.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
model = model_from_json(loaded_model_json)
And load the weights if you want to:
model.load_weights('{Your_Weights}.h5', by_name=True)
Then compile the model and you're ready to retrain/predict. For me, by_name was essential to reload the weights into the same model architecture; leaving it out may cause an error.
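As a sketch of that final step, continuing from the reloaded model above (the optimizer, loss, and random dummy input are placeholder choices):
import numpy as np

# Compile the reloaded model, then run a throwaway prediction
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
x_new = np.random.rand(1, *model.input_shape[1:])  # dummy batch matching the input shape
print(model.predict(x_new))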
Checkpointing the model during training
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath={checkpoint_path},
                                                 save_weights_only=True,
                                                 verbose=1)

# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=10,
          validation_data=(test_images, test_labels),
          callbacks=[cp_callback])  # Pass callback to training
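After training, the checkpointed weights can be pulled back into a fresh instance of the same architecture (a sketch: create_model is a hypothetical function that rebuilds and compiles the identical architecture, and checkpoint_path is the same placeholder as above):
# Rebuild the same architecture, then load the checkpointed weights into it
new_model = create_model()  # hypothetical: returns an identical, compiled model
new_model.load_weights(checkpoint_path)
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)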
I have saved my Inception model in PyCharm using the TensorFlow library. Every time I run the project, it starts training the data set. I want to skip the training on every run, because once the model has been saved there is no need to train on the data again and again. How do I know that my model has been saved successfully? And how can I use the saved model in the same file?
You can save/restore/load your model using TensorFlow:
Save:
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
with tf.Session(graph=tf.Graph()) as sess:
    ...
    builder.add_meta_graph_and_variables(sess,
                                         [tag_constants.TRAINING],
                                         signature_def_map=foo_signatures,
                                         assets_collection=foo_assets,
                                         strip_default_attrs=True)
...
builder.save()
Load:
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, [tag_constants.TRAINING], export_dir)
    ...
For further reference: TensorFlow Guide on Saving a Model
Actually, once you have saved your model, some files will be written to your directory with extensions such as .yaml, .h5, or .meta (for the graph). You can check the model's accuracy by restoring it from the saved file, just as a sanity check.
There are nice tutorials on this:
https://www.tensorflow.org/guide/saved_model
http://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/
If you use the Keras API to build your model, then this link will be useful for saving and restoring: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
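For the .meta files mentioned above, a minimal restore sketch (TF 1.x assumed; the checkpoint file names are hypothetical):
import tensorflow as tf

# import_meta_graph rebuilds the graph from the .meta file,
# then restore() fills in the variable values from the checkpoint
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')
    # sanity check here, e.g. sess.run(accuracy_op, feed_dict=...) on held-out data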
In TensorFlow, how can I save the weights and all other variables of the program after it has finished training? I would like to be able to use the model I trained later on. Thanks in advance.
You can define a saver object like this:
saver = tf.train.Saver(max_to_keep=5, keep_checkpoint_every_n_hours=1)
In this case, the saver is configured to keep the five most recent checkpoints and also to keep a checkpoint every hour during training.
The saver can then be invoked periodically in your main training loop, for example:
sess = tf.Session()
...
# Save the model every 100 iterations
if step % 100 == 0:
    saver.save(sess, "./model", global_step=step)
In this example, the saver writes a checkpoint with the filename prefix ./model every 100 training steps. The optional global_step parameter appends the step value to the checkpoint filenames.
The model weights and other values may be restored at a later time for additional training or inference by the following:
saver.restore(sess, path.model_checkpoint_path)
There are a variety of other useful variants and options. A good place to start learning about them is the TF how-to on variable creation, storage, and retrieval here.
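As a concrete restore sketch (TF 1.x assumed): tf.train.get_checkpoint_state reads the checkpoint bookkeeping file in the save directory and exposes model_checkpoint_path, the attribute being dereferenced in the restore line above:
import tensorflow as tf

# The graph must contain the same variables as when the checkpoint was written;
# a hypothetical variable stands in for your model's weights here
w = tf.Variable(0.0, name="w")

saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state('.')  # the directory passed to saver.save
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)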