I need to save and restore the graph to continue training from the last checkpoint, but somehow it is not working.
I use saver = tf.train.Saver() to save the model, and:
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    # Initializing variables
    sess.run(tf.global_variables_initializer())
    save_path = saver.save(sess, model_path + "/%s.ckpt" % model_name)
    if flag == "initial_train":
        training_loop(num_epochs)
        flag = None
    else:
        new_saver = tf.train.import_meta_graph(model_path + "/%s.ckpt.meta" % model_name)
        new_saver.restore(sess, save_path)
        print("Model loaded")
        training_loop(num_epochs)
I really don't know why it's not importing the weights.
You are, on subsequent runs:
Initializing all variables (so they will have their initial random/constant values) with sess.run(tf.global_variables_initializer())
Saving those freshly initialized values to a file (saver.save(sess, model_path + "/%s.ckpt" % model_name))
Loading those randomly initialized values back from that file
So you are just loading what you initialized and saved on lines 3 and 4 of your snippet.
Also, I don't know how you pass information around, but training_loop does not get a reference to the saver, and you are not saving the model after the training loop, so it seems you are not actually saving your model anywhere.
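Here is a minimal sketch of the intended flow, assuming training_loop, flag, model_path and model_name come from your surrounding code: restore on subsequent runs, initialize only on the first run, and save after training.
saver = tf.train.Saver()
ckpt = model_path + "/%s.ckpt" % model_name
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    if flag == "initial_train":
        # first run: start from fresh random/constant values
        sess.run(tf.global_variables_initializer())
        flag = None
    else:
        # later runs: restore the previously trained values instead
        saver.restore(sess, ckpt)
        print("Model loaded")
    training_loop(num_epochs)
    # save *after* training, so the checkpoint holds trained weights
    saver.save(sess, ckpt)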
I was going through this post in the PyTorch forum, and I also wanted to do this. The original post removes and adds layers, but I think my situation is not that different. I also want to add layers, more filters, or word embeddings. My main motivation is that the AI agent does not know the whole vocabulary/dictionary in advance because it's large. I strongly prefer (for the moment) not to do character-by-character RNNs.
So what will happen is that when the agent starts a forward pass it might find new words it has never seen, and it will need to add them to the embedding table (or perhaps add new filters before it starts the forward pass).
So what I want to make sure is:
embeddings are added correctly (at the right time, when a new computation graph is made) so that they are updatable by the optimizer
there are no issues with stored info of past parameters, e.g. if the optimizer uses some sort of momentum
How does one do this? Any sample code that works?
Just to add an answer to the title of your question: "How does one dynamically add new parameters to optimizers in PyTorch?"
You can add params to the optimizer at any time via its add_param_group method:
import torch
import torch.optim as optim

model = torch.nn.Linear(2, 2)

# Initialize the optimizer (note: Adam takes betas, not a momentum argument)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# New parameters should be nn.Parameter instances (leaf tensors with requires_grad=True)
extra_params = torch.nn.Parameter(torch.randn(2, 2))
optimizer.add_param_group({'params': extra_params})

# then you can print your `extra_params`
print("extra params", extra_params)
print("optimizer params", optimizer.param_groups)
That is a tricky question, as I would argue that the answer is "it depends", in particular on how you want to deal with the optimizer.
Let's start with your specific problem: an embedding. In particular, you are asking how to grow an embedding dynamically to allow for a larger vocabulary. My first advice is that if you have a good sense of an upper bound on your vocabulary size, make the embedding large enough to cope with it from the beginning, as this is more efficient, and as you will need the memory eventually anyway. But this is not what you asked. So, to dynamically change your embedding, you'll need to overwrite your old one with a new one, and inform your optimizer of the change. You can simply do that whenever you run into an exception with your old embedding, in a try ... except block. This should roughly follow this idea:
# from within whichever module owns the embedding
# remember the already trained weights
old_embedding_weights = self.embedding.weight.data
# create a new embedding of the new size
self.embedding = nn.Embedding(new_vocab_size, embedding_dim)
# initialize the values for the new embedding. this is random, but you
# might want to use something like GloVe instead
new_weights = torch.randn(new_vocab_size, embedding_dim)
# as your old values may have been updated during training, copy them over
new_weights[:old_vocab_size] = old_embedding_weights
self.embedding.weight.data.copy_(new_weights)
However, you should not do this for every single new word you receive, as the copying takes time (and a whole lot of memory, as the embedding briefly exists twice; if you're nearly out of memory, just make your embedding large enough from the start). So instead, increase the size dynamically by a couple of hundred slots at a time, as sketched below.
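A rough sketch of that chunked growth; GROW_STEP, ensure_capacity and resize_embedding are hypothetical names wrapping the resizing logic shown above:
GROW_STEP = 256  # grow in blocks of a few hundred slots, not per word

def ensure_capacity(self, new_word_index):
    # resize only when the next index would not fit into the current table
    if new_word_index >= self.embedding.num_embeddings:
        self.resize_embedding(self.embedding.num_embeddings + GROW_STEP)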
Additionally, this first step already raises some questions:
First: How does my respective nn.Module know about the new embedding parameter?
The __setattr__ method of nn.Module takes care of that (see here)
Second, why don't I simply change my parameter in place? That's already pointing towards some of the problems of changing the optimizer: PyTorch internally keeps references by object ID. This means that if you change your object, all these references will point towards a potentially incompatible object, as its properties have changed. So we should simply create a new parameter instead.
What about other nn.Parameters or nn.Modules that are not embeddings? You treat these the same way. You basically just instantiate them and attach them to their parent module. The __setattr__ method will take care of the rest, so you can do this completely dynamically ...
Except, of course, for the optimizer. The optimizer is the only other thing that "knows" about your parameters besides your main model module. So you need to let the optimizer know of any change.
And this is tricky if you want to be sophisticated about it, and very easy if you don't care about keeping the optimizer state. However, even if you want to be sophisticated about it, there is a very good reason why you probably should not do it anyway. More about that below.
Anyway, if you don't care, a simple
# simply overwrite your old optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
will do. If, however, you care and want to transfer your old state, you can do so the same way that you store and later load parameters and optimizer states from disk: using the .state_dict() and .load_state_dict() methods. This, however, only works with a twist:
# extract the state dict from your old optimizer
old_state_dict = optimizer.state_dict()
# create a new optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
new_state_dict = optimizer.state_dict()
# the old state dict will have references to the old parameters, in
# state_dict['param_groups'][xyz]['params'] and in state_dict['state']
# you now need to find the parameter mismatches between the old and new state dicts
# if your optimizer has multiple param groups, you need to loop over them, too
# (I use xyz as a placeholder here; mostly you'll only have one anyway, so just replace xyz with 0)
new_pars = [p for p in new_state_dict['param_groups'][xyz]['params'] if p not in old_state_dict['param_groups'][xyz]['params']]
old_pars = [p for p in old_state_dict['param_groups'][xyz]['params'] if p not in new_state_dict['param_groups'][xyz]['params']]
# then you remove all the outdated ones from the state dict
for pid in old_pars:
    old_state_dict['state'].pop(pid)
# and add a new state for each new parameter to the state:
for pid in new_pars:
    old_state_dict['param_groups'][xyz]['params'].append(pid)
    old_state_dict['state'][pid] = { ... }  # your new state def here, depending on your optimizer
However, here's the reason why you should probably never update your optimizer like this, and should instead re-initialize it from scratch and just accept the loss of state information: when you change your computation graph, you change the forward and backward computation for all parameters along your computation path (if you do not have a branching architecture, this path will be your entire graph). More specifically, this means that the input to your functions (= layer/nn.Module) will be different if you change some function (= layer/nn.Module) applied earlier, and the gradients will change if you change some function (= layer/nn.Module) applied later. That in turn invalidates the entire state of your optimizer. So if you keep your optimizer's state around, it will be a state computed for a different computation graph, and will probably end in catastrophic behavior on the part of your optimizer if you try to apply it to a new computation graph. (I've been there ...)
So, to sum it up: I'd really recommend trying to keep it simple, changing a parameter as conservatively as possible, and not touching the optimizer.
If you want to customize initial params:
from itertools import chain
import torch.nn as nn
import torch.optim as optim

l1 = nn.Linear(3, 3)
l2 = nn.Linear(2, 3)
optimizer = optim.SGD(chain(l1.parameters(), l2.parameters()), lr=0.01, momentum=0.9)
The key is that the optimizer's first constructor parameter accepts any iterable of parameters (or of dicts defining parameter groups).
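As a small variation on the same idea (a sketch, not from the original answer), you can pass dicts to define parameter groups with their own hyperparameters:
optimizer = optim.SGD(
    [{'params': l1.parameters()},
     {'params': l2.parameters(), 'lr': 0.001}],  # l2 overrides the default lr
    lr=0.01, momentum=0.9)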
We are trying to use the text classification example from the TensorFlow examples (tensorflow/examples/learn/text_classification.py). It works well with the dbpedia data.
Now we are trying to save/restore the model using Saver, but we can't figure out where to use the Saver APIs: the code in text_classification.py doesn't use a Session at all, and the Saver API needs a session to save/restore.
This example uses tf.estimator.Estimator, which has a special method, export_savedmodel, for saving.
In addition, you can specify model_dir in the constructor:
Directory to save model parameters, graph, etc. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model. If None, the model_dir in config will be used if set. If both are set, they must be the same. If both are None, a temporary directory will be used.
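A rough sketch of how these pieces fit together (model_fn, train_input_fn, and serving_input_receiver_fn are placeholders for your own code):
import tensorflow as tf

# Checkpoints are written to model_dir automatically during training,
# and training resumes from them on the next run.
classifier = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/my_model")
classifier.train(input_fn=train_input_fn, steps=1000)

# Export a SavedModel for serving/inference.
classifier.export_savedmodel("/tmp/export", serving_input_receiver_fn)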
I am having trouble understanding the graph argument of tf.Session(). I tried looking it up on the TensorFlow website (link) but couldn't understand much.
I am trying to find out the difference between tf.Session() and tf.Session(graph=some_graph_inserted_here).
Question Context
Code A (Not Working):
def predict():
    with tf.name_scope("predict"):
        with tf.Session() as sess:
            saver = tf.train.import_meta_graph("saved_models/testing.meta")
            saver.restore(sess, "saved_models/testing")
            loaded_graph = tf.get_default_graph()
            output_ = loaded_graph.get_tensor_by_name('loss/network/output_layer/BiasAdd:0')
            _x = loaded_graph.get_tensor_by_name('x:0')
            print(sess.run(output_, feed_dict={_x: np.array([12003]).reshape([-1, 1])}))
This code gives the following error at saver = tf.train.import_meta_graph("saved_models/testing.meta") when trying to load the graph: ValueError: cannot add op with name hidden_layer1/kernel/Adam as that name is already used.
Code B (Working):
def predict():
    with tf.name_scope("predict"):
        loaded_graph = tf.Graph()
        with tf.Session(graph=loaded_graph) as sess:
            saver = tf.train.import_meta_graph("saved_models/testing.meta")
            saver.restore(sess, "saved_models/testing")
            output_ = loaded_graph.get_tensor_by_name('loss/network/output_layer/BiasAdd:0')
            _x = loaded_graph.get_tensor_by_name('x:0')
            print(sess.run(output_, feed_dict={_x: np.array([12003]).reshape([-1, 1])}))
The code does not work if I replace loaded_graph = tf.Graph() with loaded_graph = tf.get_default_graph(). Why?
Full Code if it helps:
(https://gist.github.com/duemaster/f8cf05c0923ebabae476b83e895619ab)
The TensorFlow Graph is an object which contains your various tf.Tensor and tf.Operation objects.
When you create these tensors (e.g. using tf.Variable or tf.constant) or operations (e.g. tf.matmul), they are added to the default graph (look at the graph member of these objects to get the graph they belong to). If you haven't specified anything, this is the graph you get when calling tf.get_default_graph, as shown below.
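A tiny illustration of that (a sketch):
import tensorflow as tf

c = tf.constant(42)
# new ops land in the default graph unless you say otherwise
print(c.graph is tf.get_default_graph())  # True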
But you can also work with multiple graphs using a context manager:
g = tf.Graph()
with g.as_default():
    [your code]
If you have created several graphs in your code, you then need to pass the graph you want to run as an argument to the tf.Session constructor to tell TensorFlow which one to run, as in the sketch below.
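For example, a small self-contained sketch:
g1 = tf.Graph()
with g1.as_default():
    c1 = tf.constant(1, name="c")

g2 = tf.Graph()
with g2.as_default():
    c2 = tf.constant(2, name="c")

# each session runs only the graph it was given
with tf.Session(graph=g1) as sess:
    print(sess.run(c1))  # 1
with tf.Session(graph=g2) as sess:
    print(sess.run(c2))  # 2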
In Code A, you
work with the default graph,
try to import the meta graph into it (which fails because it already contains some of the nodes), and
would then restore the model into it,
while in Code B, you
create a fresh new graph,
import the meta graph into it (which succeeds because it's an empty graph) and
restore it.
Useful link:
tf.Graph API
Edit:
This piece of code makes Code A work (I reset the default graph to a fresh one, and I removed the predict name_scope):
def predict():
    tf.reset_default_graph()
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph("saved_models/testing.meta")
        saver.restore(sess, "saved_models/testing")
        loaded_graph = tf.get_default_graph()
        output_ = loaded_graph.get_tensor_by_name('loss/network/output_layer/BiasAdd:0')
        _x = loaded_graph.get_tensor_by_name('x:0')
        print(sess.run(output_, feed_dict={_x: np.array([12003]).reshape([-1, 1])}))
In TensorFlow, you are constructing graphs. By default, TensorFlow creates a default (sorry for the tautology) graph, which you can access using tf.get_default_graph(). By default, any new Session object uses this default graph.
In your case, you already have a graph (the default one), and you also saved exactly this graph into the meta file. Then you are trying to recover this graph using tf.train.import_meta_graph(). However, since your session uses the default graph, and you are trying to recover an identical one, you encounter an error: the operation is trying to duplicate the nodes, which is forbidden.
When you explicitly create a new graph object by calling tf.Graph() and create a Session object using this graph (rather than the default one), everything is fine, since the nodes are created in another graph.
The function tf.train.import_meta_graph("saved_models/testing.meta") adds all the nodes from the meta file to the current graph. In the first code, the current graph is the default graph, which already has those ops defined, hence the error. In the second case, you are loading the nodes into a new graph, so it works fine!
When you create a Session, you're placing a graph onto a specified device.
If no graph is specified, the Session constructor uses the default one (which you can get using tf.get_default_graph).
Your code A doesn't work because the current session already has a graph, and that graph already contains the exact nodes you're trying to import.
Your code B works because you're placing a new empty graph (created with tf.Graph()) into the Session: when you import the graph definition, there's no collision between the existing nodes in the current session (there are none, because the graph is empty) and the ones you're importing.
I'm trying to build a service that has 2 components. In component 1, I train a machine learning model using sklearn by creating a Pipeline. This model gets serialized using joblib.dump (really numpy_pickle.dump). Component 2 runs in the cloud, loads the model trained by (1), and uses it to label text that it gets as input.
I'm running into an issue where, during training (component 1) I need to first binarize my data since it is text data, which means that the model is trained on binarized input and then makes predictions using the mapping created by the binarizer. I need to get this mapping back when (2) makes predictions based on the model so that I can output the actual text labels.
I tried adding the binarizer to the pipeline like this, thinking that the model would then have the mapping itself:
p = Pipeline([
    ('binarizer', MultiLabelBinarizer()),
    ('vect', CountVectorizer(min_df=min_df, ngram_range=ngram_range)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(clf))
])
But I get the following error:
model = p.fit(training_features, training_tags)
*** TypeError: fit_transform() takes 2 positional arguments but 3 were given
My goal is to make sure the binarizer and model are tied together so that the consumer knows how to decode the model's output.
What are some existing paradigms for doing this? Should I be serializing the binarizer together with the model in some other object that I create? Is there some other way of passing the binarizer to Pipeline so that I don't have to do that, and would I be able to get the mappings back from the model if I did that?
Your intuition that you should add the MultiLabelBinarizer to the pipeline was the right way to approach this problem. It would have worked, except that MultiLabelBinarizer.fit_transform does not have the fit_transform(self, X, y=None) signature that is now standard for sklearn estimators. Instead, it has a unique fit_transform(self, y) signature, which I had never noticed before. As a result of this difference, when you call fit on the pipeline, it tries to pass training_tags as a third positional argument to a function with two positional arguments, which doesn't work.
The solution to this problem is tricky. The cleanest way I can think of to work around it is to create your own MultiLabelBinarizer that overrides fit_transform and ignores its third argument. Try something like the following.
class MyMLB(MultiLabelBinarizer):
    def fit_transform(self, X, y=None):
        # call MultiLabelBinarizer's own fit_transform(y); note super(MyMLB, self),
        # not super(MultiLabelBinarizer, self), which would skip it in the MRO
        return super(MyMLB, self).fit_transform(X)
Try adding this to your pipeline in place of the MultiLabelBinarizer and see what happens. If you're able to fit() the pipeline, the last problem that you'll have is that your new MyMLB class has to be importable on any system that will de-pickle your now trained, pickled pipeline object. The easiest way to do this is to put MyMLB into its own module and place a copy on the remote machine that will be de-pickling and executing the model. That should fix it.
I misunderstood how the MultiLabelBinarizer works. It is a transformer of outputs, not of inputs. Not only does this explain its alternative fit_transform() signature, but it also makes it fundamentally incompatible with inclusion in a single classification pipeline, which is limited to transforming inputs and making predictions of outputs. However, all is not lost!
Based on your question, you're already comfortable with serializing your model to disk as [some form of] a .pkl file. You should be able to also serialize a trained MultiLabelBinarizer, then unpack it and use it to decode the outputs of your pipeline. I know you're using joblib, but I'll write up this sample code as if you're using pickle; I believe the idea will still apply.
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

X = <training_data>
y = <training_labels>

# Binarize the multi-label class labels.
mlb = MultiLabelBinarizer()
multilabel_y = mlb.fit_transform(y)

p = Pipeline([
    ('vect', CountVectorizer(min_df=min_df, ngram_range=ngram_range)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(clf))
])

# Use the binarized labels to fit the pipeline.
p.fit(X, multilabel_y)

# Serialize both the pipeline and binarizer to disk.
with open('my_sklearn_objects.pkl', 'wb') as f:
    pickle.dump((mlb, p), f)
Then, after shipping the .pkl files to the remote box...
# Hydrate the serialized objects.
with open('my_sklearn_objects.pkl', 'rb') as f:
    mlb, p = pickle.load(f)

X = <input data>  # Get your input data from somewhere.

# Predict the classes using the pipeline.
mlb_predictions = p.predict(X)

# Turn those classes into labels using the binarizer.
classes = mlb.inverse_transform(mlb_predictions)

# Do something with the predicted classes.
<...>
Is this the paradigm for doing this? As far as I know, yes. Not only that, but if you want to keep them together (which is a good idea, I think), you can serialize them as a tuple, as I did in the example above, so they stay in a single file. No need to serialize a custom object or anything like that.
Model serialization via pickle et al. is the sklearn-approved way to save estimators between runs and move them between computers. I've used this process successfully many times before, including in production systems.
I am using a custom image set to train a neural network with the TensorFlow API. After a successful training run I get checkpoint files containing the values of the different training variables. I now want to build an inference model from these checkpoint files; I found this script which does that, and I can then use the result to generate DeepDream images as explained in this tutorial. The problem is that when I load my model using:
import numpy as np
import tensorflow as tf

model_fn = 'export'

graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 117.0
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input': t_preprocessed})
I get this error:
graph_def.ParseFromString(f.read())
self.MergeFromString(serialized)
raise message_mod.DecodeError('Unexpected end-group tag.')
google.protobuf.message.DecodeError: Unexpected end-group tag.
The script expects a protocol buffer file, and I am not sure whether the script I am using to generate inference models is giving me protocol buffer files or not.
Can someone please suggest what I am doing wrong, or whether there is a better way to achieve this? I simply want to convert the checkpoint files generated by TensorFlow into protocol buffer format.
Thanks
The link to the script you ran is broken, but in any case the recommended approach is not to try to generate an inference model from a checkpoint, but rather to embed code at the end of your training program that emits a "SavedModel" export (which is not the same thing as a checkpoint).
Please see [1], and in particular the heading "Building a Saved Model". Note that a SavedModel consists of multiple files, one of which is indeed a protocol buffer (which I hope directly answers your question); the others are variable files and (optional) asset files.
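A rough sketch of that export step, using the TF 1.x SavedModelBuilder API described in [1] (assuming sess holds your trained graph; the export path is a placeholder):
builder = tf.saved_model.builder.SavedModelBuilder("/tmp/my_saved_model")
# bundle the current graph and variable values under the SERVING tag
builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
builder.save()  # writes saved_model.pb (a protocol buffer) plus a variables/ directory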
[1] https://www.tensorflow.org/programmers_guide/saved_model