PyTorch optimizer not reading parameters from my Model class dict

I'm using the pyro package (Pyro combines Python and PyTorch, https://pyro.ai/examples/normalizing_flows_i.html) for ML on Google Colab to try to do normalizing flows. When I try to set up the optimizer (using Adam), it tells me: type object 'NFModel' has no attribute 'params'.
Effectively, I'm trying to use pyro's features to build a neural net with a few layers. The only failure point I have now for my model is the optimizer.
Class definition (posted as an image, class_def, in the original question):
Fails at: optimizer = torch.optim.Adam([{'params': NFModel.params.hiddenlayers.parameters()}], lr=LR)
As an aside, the reason for using pyro is that normalizing flows require bijective transformations: if we consider the NN to be F with z = F(x), then PDF_X(x) = PDF_Z(F(x)) * |det F'(x)|. Pyro has this set up, and recreating it would otherwise be way too cumbersome.
I looked up what caused previous failures for others, and it was usually not inheriting from nn.Module when they built out their class, but I've done that.
I've tried different combinations of things in the Adam line.
I've tried toying with the params piece in the class definition.
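For reference, the usual PyTorch pattern is to instantiate the model and pass an instance's .parameters() (or a submodule's) to the optimizer, rather than accessing the class itself. A minimal sketch, with the internals of NFModel assumed since the class definition is only available as an image:
import torch
import torch.nn as nn

class NFModel(nn.Module):  # must inherit from nn.Module
    def __init__(self):
        super().__init__()
        # hypothetical layers standing in for the ones in the screenshot
        self.hiddenlayers = nn.Sequential(
            nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))

model = NFModel()  # instantiate first: .parameters() lives on instances, not on the class
optimizer = torch.optim.Adam(model.hiddenlayers.parameters(), lr=1e-3)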

Related

How to export/save/load the actual AutoKeras "super" model, not the underlying tensorflow model

Is there a way to export/save/load a previously trained autokeras model? I understand I can use the following code to save/load the underlying tensorflow best model:
model = reg.export_model()
model.save(MODEL_FILEPATH, save_format="tf")
best_model = load_model(MODEL_FILEPATH, custom_objects=ak.CUSTOM_OBJECTS)
However, in practice that wouldn't work, since my data has been fitted by autokeras, which takes care of data preparation and scaling. I don't think I have access to what autokeras is doing to the input data (X) before actually fitting, so I can't actually use the exported tensorflow best model to predict labels for new samples with un-prepared and unscaled features.
Am I missing something major here?
Also I noticed that there are some binaries in the autokeras temporary dir. That dir seems to be generated automatically. Is there a way to use that dir to load the previously-fit autokeras "super" model?
Using the pickle module will do the job (https://github.com/keras-team/autokeras/issues/1081#issuecomment-645508111):
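A minimal sketch of that idea, assuming reg is the fitted AutoKeras object from the question and that it pickles cleanly in your environment (X_new is a hypothetical array of raw, unscaled samples):
import pickle

# save the whole AutoKeras object, including the preprocessing it applies before fitting
with open("autokeras_model.pkl", "wb") as f:
    pickle.dump(reg, f)

# later: load it back and predict directly on raw, unscaled features
with open("autokeras_model.pkl", "rb") as f:
    reg_restored = pickle.load(f)
predictions = reg_restored.predict(X_new)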

Converting a pytorch model to nn.Module for exporting to onnx for lens studio

I am trying to convert pix2pix to a pb or onnx model that can run in Lens Studio. Lens Studio has strict requirements for the models. I am trying to export this pytorch model to onnx using this guide provided by Lens Studio. The issue is that the pytorch model found here uses its own base class, while the example uses nn.Module, and therefore it doesn't have the methods/variables that the torch.onnx.export function needs to run. So far I've run into it missing a variable called training and a method called train.
Would it be worth it to try to modify the base model, or should I try to build it from scratch using nn.Module? Is there a way to make the pix2pix model inherit from both the abstract base class and nn.Module? Am I not understanding the situation? The reason I want to do it using the Lens Studio tutorial is that I have gotten it to export to onnx in different ways, but Lens Studio won't accept those for various reasons.
Also, this is my first time asking an SO question (after 6 years of coding); let me know if I make any mistakes and I can correct them. Thank you.
This is the important code from the tutorial creating a pytorch model for Lens Studio:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Conv2d(in_channels=3, out_channels=1,
                               kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        out = self.layer(x)
        out = nn.functional.interpolate(out, scale_factor=2,
                                        mode='bilinear', align_corners=True)
        out = torch.nn.functional.softmax(out, dim=1)
        return out
I'm not going to include all the code from the pytorch model because it's large, but the beginning of base_model.py is:
import os
import torch
from collections import OrderedDict
from abc import ABC, abstractmethod
from . import networks


class BaseModel(ABC):
    """This class is an abstract base class (ABC) for models.
    To create a subclass, you need to implement the following five functions:
        -- <__init__>: initialize the class; first call BaseModel.__init__(self, opt).
        -- <set_input>: unpack data from dataset and apply preprocessing.
        -- <forward>: produce intermediate results.
        -- <optimize_parameters>: calculate losses, gradients, and update network weights.
        -- <modify_commandline_options>: (optionally) add model-specific options and set default options.
    """

    def __init__(self, opt):
        """Initialize the BaseModel class.

        Parameters:
            opt (Option class) -- stores all the experiment flags; needs to be a subclass of BaseOptions

        When creating your custom class, you need to implement your own initialization.
        In this function, you should first call <BaseModel.__init__(self, opt)>
        Then, you need to define four lists:
            -- self.loss_names (str list): specify the training losses that you want to plot and save.
            -- self.model_names (str list): define networks used in our training.
            -- self.visual_names (str list): specify the images that you want to display and save.
            -- self.optimizers (optimizer list): define and initialize optimizers. You can define one optimizer for each network. If two networks are updated at the same time, you can use itertools.chain to group them. See cycle_gan_model.py for an example.
        """
        self.opt = opt
        self.gpu_ids = opt.gpu_ids
        self.isTrain = opt.isTrain
        self.device = torch.device('cuda:{}'.format(self.gpu_ids[0])) if self.gpu_ids else torch.device('cpu')  # get device name: CPU or GPU
        self.save_dir = os.path.join(opt.checkpoints_dir, opt.name)  # save all the checkpoints to save_dir
        if opt.preprocess != 'scale_width':  # with [scale_width], input images might have different sizes, which hurts the performance of cudnn.benchmark.
            torch.backends.cudnn.benchmark = True
        self.loss_names = []
        self.model_names = []
        self.visual_names = []
        self.optimizers = []
        self.image_paths = []
        self.metric = 0  # used for learning rate policy 'plateau'
and for pix2pix_model.py
import torch
from .base_model import BaseModel
from . import networks


class Pix2PixModel(BaseModel):
    """This class implements the pix2pix model, for learning a mapping from input images to output images given paired data.

    The model training requires '--dataset_mode aligned' dataset.
    By default, it uses a '--netG unet256' U-Net generator,
    a '--netD basic' discriminator (PatchGAN),
    and a '--gan_mode' vanilla GAN loss (the cross-entropy objective used in the original GAN paper).

    pix2pix paper: https://arxiv.org/pdf/1611.07004.pdf
    """

    @staticmethod
    def modify_commandline_options(parser, is_train=True):
        """Add new dataset-specific options, and rewrite default values for existing options.

        Parameters:
            parser          -- original option parser
            is_train (bool) -- whether training phase or test phase. You can use this flag to add training-specific or test-specific options.

        Returns:
            the modified parser.

        For pix2pix, we do not use image buffer.
        The training objective is: GAN Loss + lambda_L1 * ||G(A)-B||_1
        By default, we use vanilla GAN loss, UNet with batchnorm, and aligned datasets.
        """
        # changing the default values to match the pix2pix paper (https://phillipi.github.io/pix2pix/)
        parser.set_defaults(norm='batch', netG='unet_256', dataset_mode='aligned')
        if is_train:
            parser.set_defaults(pool_size=0, gan_mode='vanilla')
            parser.add_argument('--lambda_L1', type=float, default=100.0, help='weight for L1 loss')
        return parser

    def __init__(self, opt):
        """Initialize the pix2pix class.

        Parameters:
            opt (Option class) -- stores all the experiment flags; needs to be a subclass of BaseOptions
        """
(Also, as a side note: if you see this and it looks like there's no easy way out, let me know; I know what it's like to watch someone getting started go in too deep too early on.)
You can definitely have your model inherit from both the base class and torch.nn.Module (Python allows multiple inheritance). However, you should take care about conflicts if both inherited classes have methods with identical names (I can see at least one: their base class provides an eval function, and so does nn.Module).
However, since you do not need the CycleGAN, and a lot of the code exists only for compatibility with their training environment, you'd probably be better off just re-implementing pix2pix. Steal the code, have it inherit from nn.Module, copy-paste the useful/mandatory functions from the base class, and translate everything into clean PyTorch code. You already have the forward function (which is the only requirement for a PyTorch module).
All the subnetworks they use (like the resnet blocks) seem to inherit from nn.Module already, so there is nothing to change there (double-check that, though).
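If you do go the multiple-inheritance route, a minimal sketch of the shape it could take (the wrapper class, its arguments, and the generator attribute are assumptions; the eval name clash noted above, and any remaining abstract methods on BaseModel, would still need explicit handling):
import torch.nn as nn

class Pix2PixExportModel(BaseModel, nn.Module):  # hypothetical wrapper for ONNX export
    def __init__(self, opt, generator):
        nn.Module.__init__(self)       # provides .training, .train(), ._modules, etc. that torch.onnx.export expects
        BaseModel.__init__(self, opt)  # the repository's own initialization
        self.generator = generator     # an nn.Module generator, e.g. one built by networks.define_G

    def forward(self, x):
        return self.generator(x)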

Access Clients Loss while having keras tff NN models

I'm trying to obtain the losses of all clients in a TensorFlow Federated model, without luck. The answer to the post how to print local outputs in tensorflow federated?
suggests creating the NN model from scratch. However, I already have my Keras NN model. So is there a way to still access the local client losses without having to build the NN from scratch?
I tried to use tff.federated_collect(), but I'm not sure how that is possible.
This is part of my attempt:
trainer_Itr_Process = tff.learning.build_federated_averaging_process(
    model_fn_Federated,
    server_optimizer_fn=(lambda: tf.keras.optimizers.SGD(learning_rate=learn_rate)),
    client_weight_fn=None)
FLstate = trainer_Itr_Process.initialize()

@tff.learning.Model
def federated_output_computation():
    return {
        'num_examples': tff.federated_sum(metrics.num_examples),
        'loss': tff.federated_mean(metrics.loss, metrics.num_examples),
        'accuracy': tff.federated_mean(metrics.accuracy, metrics.num_examples),
        'per_client/num_examples': tff.federated_collect(metrics.num_examples),
        'per_client/loss': tff.federated_collect(metrics.loss),
        'per_client/accuracy': tff.federated_collect(metrics.accuracy),
    }
This is the error I received:
@tff.learning.Model
TypeError: object() takes no parameters
tff.learning.Model is not a decorator for functions; it is the class interface used by the tff.learning module.
Probably the best way to change the implementation of tff.learning.Model.federated_output_computation (what is recommended in how to print local outputs in tensorflow federated?) is to create your own subclass of tff.learning.Model that implements a different federated_output_computation property. This would be close to re-implementing tff.learning.from_keras_model(), except providing a custom metric aggregation; so looking at the implementation (here) can be useful, but ingesting Keras models is non-trivial at the moment.
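A minimal sketch of that idea: a plain federated computation (not a decorator on the model) that a custom tff.learning.Model subclass returns from its federated_output_computation property. ExistingKerasBackedModel is a hypothetical subclass of tff.learning.Model, and the metrics structure is assumed to match what its report_local_outputs produces:
import collections
import tensorflow_federated as tff

@tff.federated_computation
def aggregate_metrics_across_clients(metrics):
    # metrics arrives placed at CLIENTS; aggregate globally and also keep per-client values
    return collections.OrderedDict(
        loss=tff.federated_mean(metrics.loss, metrics.num_examples),
        per_client_loss=tff.federated_collect(metrics.loss),
    )

class MyModel(ExistingKerasBackedModel):  # hypothetical tff.learning.Model subclass
    @property
    def federated_output_computation(self):
        return aggregate_metrics_across_clients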

Why do they clone the entire model before training in torch?

I have been going through lots of Torch code recently. I have noticed that, usually after the model is constructed, it is cloned, as in the following code:
siamese_1=siamese_1:cuda()
parameters,gradParameters = siamese_1:getParameters()
siamese_2=siamese_1:clone('weight','bias','gradWeight','gradBias')
siamese_net:add(siamese_1)
siamese_net:add(siamese_2)
siamese_1 being a constructed model.
It is difficult to understand why this is being done.
The code is for performing fine-tuning over networks; it is from this repository (lines 122 to 126).
When you clone a model and specify some additional arguments (like 'weight', etc.), the new model will share these parameters with the original one. Thus, in your case, the models siamese_1 and siamese_2 share their weights, biases and the corresponding gradients.
In the code you are looking at the authors want to create a network with two parallel networks sharing their weights, this is the reason why they use the clone function.
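For comparison, the analogous weight sharing in (Python) PyTorch is usually achieved by simply reusing the same module instance in both branches; a minimal sketch with arbitrary layer sizes:
import torch
import torch.nn as nn

branch = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

def siamese_forward(x1, x2):
    # both inputs pass through the very same module, so weights, biases and
    # their gradients are shared between the two "branches"
    return branch(x1), branch(x2)

out1, out2 = siamese_forward(torch.randn(8, 128), torch.randn(8, 128))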

How to implement a sequence classification LSTM network in CNTK?

I'm working on implementation of LSTM Neural Network for sequence classification. I want to design a network with the following parameters:
Input : a sequence of n one-hot-vectors.
Network topology : two-layer LSTM network.
Output: a probability that a given sequence belongs to a class (binary classification). I want to take into account only the last output from the second LSTM layer.
I need to implement this in CNTK, but I'm struggling because its documentation is not written very well. Can someone help me with that?
There is a sequence classification example that follows exactly what you're looking for.
The only difference is that it uses just a single LSTM layer. You can easily change this network to use multiple layers by changing:
LSTM_function = LSTMP_component_with_self_stabilization(
    embedding_function.output, LSTM_dim, cell_dim)[0]
to:
num_layers = 2  # for example
encoder_output = embedding_function.output
for i in range(0, num_layers):
    encoder_output = LSTMP_component_with_self_stabilization(
        encoder_output.output, LSTM_dim, cell_dim)
However, you'd be better served by using the new layers library. Then you can simply do this:
encoder_output = Stabilizer()(input_sequence)
for i in range(0, num_layers):
    encoder_output = Recurrence(LSTM(hidden_dim))(encoder_output.output)
Then, to get your final output that you'd put into a dense output layer, you can first do:
final_output = sequence.last(encoder_output)
and then
z = Dense(vocab_dim)(final_output)
Here you can find a straightforward approach; just add the additional layer, like:
Sequential([
    Recurrence(LSTM(hidden_dim), go_backwards=False),
    Recurrence(LSTM(hidden_dim), go_backwards=False),
    Dense(label_dim, activation=sigmoid)
])
train it, test it and apply it...
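Putting the two answers together, a minimal sketch of the two-layer binary classifier described in the question might look like this (the dimensions and the Embedding layer are assumptions):
import cntk as C
from cntk.layers import Sequential, Recurrence, LSTM, Dense, Embedding

input_dim, embedding_dim, hidden_dim = 1000, 50, 128
x = C.sequence.input_variable(input_dim, is_sparse=True)  # sequence of one-hot vectors

encoder = Sequential([
    Embedding(embedding_dim),
    Recurrence(LSTM(hidden_dim)),
    Recurrence(LSTM(hidden_dim)),
])

# keep only the last output of the second LSTM layer, then map to P(class | sequence)
z = Dense(1, activation=C.sigmoid)(C.sequence.last(encoder(x)))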
CNTK published a hands-on tutorial for language understanding that has an end-to-end recipe:
This hands-on lab shows how to implement a recurrent network to process text, for the Air Travel Information Services (ATIS) task of slot tagging (tag individual words to their respective classes, where the classes are provided as labels in the training data set). We will start with a straight-forward embedding of the words followed by a recurrent LSTM. This will then be extended to include neighboring words and run bidirectionally. Lastly, we will turn this system into an intent classifier.
I'm not familiar with CNTK. But since the question has been left unanswered for so long, perhaps I can offer some advice to help you with the implementation.
I'm not sure how experienced you are with these architectures, but before moving to CNTK (which seemingly has a less active community), I'd suggest looking at other popular frameworks (like Theano, TensorFlow, etc.).
For instance, a similar task in Theano is given here: kyunghyuncho tutorials. Just look for "def lstm_layer" for the definitions.
A Torch example can be found in Karpathy's very popular tutorials.
Hope this helps a bit.
