I know that this example is supposed to illustrate how to add trainable parameters in a Python layer using the add_blob() method.
However, I am still unable to understand how this can be used to set the dimensions of the blob based on user defined parameters.
There is a better example on how to write a Python layer here.
But here, the layer does not contain any trainable parameters.
Please explain how to write a custom Python layer with trainable parameters.
When you add a parameter blob using add_blob(), you can reshape the added blob either in the setup() method (right when you add it) or in the layer's reshape() method.
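To make this concrete, here is a minimal sketch of such a layer. It is only an illustration: the layer class name, the param_str keys and the toy inner-product math are all assumptions, but it shows a blob being added and sized from a user-defined parameter inside setup():

import json
import numpy as np
import caffe

class InnerProductPyLayer(caffe.Layer):
    """Toy Python layer with one trainable parameter blob whose shape
    comes from a user-defined parameter (hypothetical example)."""

    def setup(self, bottom, top):
        # user-defined parameters, e.g. python_param { param_str: '{"num_output": 8}' }
        params = json.loads(self.param_str)
        num_output = params['num_output']
        dim = bottom[0].count // bottom[0].num  # per-sample feature count

        # add a trainable blob and size it from the user parameter
        self.blobs.add_blob(1)
        self.blobs[0].reshape(num_output, dim)
        self.blobs[0].data[...] = 0.01 * np.random.randn(num_output, dim)

    def reshape(self, bottom, top):
        top[0].reshape(bottom[0].num, self.blobs[0].shape[0])

    def forward(self, bottom, top):
        x = bottom[0].data.reshape(bottom[0].num, -1)
        top[0].data[...] = x.dot(self.blobs[0].data.T)

    def backward(self, top, propagate_down, bottom):
        x = bottom[0].data.reshape(bottom[0].num, -1)
        # gradient w.r.t. the trainable blob accumulates in its diff
        self.blobs[0].diff[...] += top[0].diff.T.dot(x)
        if propagate_down[0]:
            bottom[0].diff[...] = top[0].diff.dot(
                self.blobs[0].data).reshape(bottom[0].data.shape)

The solver then updates self.blobs[0] like any other learned parameter (subject to the usual lr_mult settings).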
Can I define my own activation function and use it in the TensorFlow Train API, i.e. the high level API with pre-defined estimators like DNNClassifier?
For example, I want to use this code but replace the activation function tf.nn.tanh with something of my own:
tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[5, 10, 5],
    n_classes=3,
    optimizer=tf.train.ProximalAdagradOptimizer(
        learning_rate=0.01,
        l1_regularization_strength=0.0001),
    activation_fn=tf.nn.tanh)
If your custom function can be expressed in terms of built-in tensorflow ops, then it's fairly straightforward. For example:
DNNClassifier(feature_columns=feature_columns,
              ...,
              activation_fn=lambda x: 2 * tf.nn.tanh(x) + 3 * tf.nn.relu(x) + 1)
In general, activation_fn can be any callable that accepts a tensor of arbitrary shape (because it will be applied after each layer). TensorFlow will be able to backpropagate through such an expression without any problem.
However, if you want a completely new custom op, not expressible via existing ones, you'll have to register it and compute its gradient manually. See this question for the details.
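For completeness, newer TensorFlow 1.x releases also offer tf.custom_gradient, which lets you attach a manually specified gradient to a Python-level function without writing a C++ op. A hedged sketch (the activation itself is a made-up example, and feature_columns is assumed to come from the question's snippet):

import tensorflow as tf

@tf.custom_gradient
def scaled_relu(x):
    # forward pass: 2 * relu(x)
    y = 2.0 * tf.nn.relu(x)

    def grad(dy):
        # manually specified gradient: 2 where x > 0, else 0
        return dy * 2.0 * tf.cast(x > 0, x.dtype)

    return y, grad

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[5, 10, 5],
    n_classes=3,
    activation_fn=scaled_relu)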
For example, an input with dimensions [10,1,224,224] needs to be reduced to [1,1,224,224], where [samples, channels, rows, columns] is the convention for the dimensions.
Then your problem is badly formulated; consider using [10,1,224,224] as the input_shape and making batches of such tensors. Then use AveragePooling3D, see the doc here.
You won't be able to perform operations across the batch dimension with the usual layers, except perhaps by building your own custom layer: see here.
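A minimal sketch of the pooling approach suggested above, assuming Keras with the default channels_last data format (each sample is a stack of 10 single-channel 224x224 images, and the pooling averages over that stack):

from keras.models import Sequential
from keras.layers import AveragePooling3D

model = Sequential()
# input per sample: (depth=10, rows=224, cols=224, channels=1)
model.add(AveragePooling3D(pool_size=(10, 1, 1),
                           input_shape=(10, 224, 224, 1)))
model.summary()  # output shape: (None, 1, 224, 224, 1)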
Keras enables adding a layer which calculates a user-defined lambda function.
What I don't get is how Keras knows how to calculate the gradient of this user-defined function for backpropagation.
That's one of the benefits of using Theano/TensorFlow and libraries built on top of them: they give you automatic gradient calculation for mathematical functions and operations.
Keras gets them by calling:
# keras/theano_backend.py
def gradients(loss, variables):
    return T.grad(loss, variables)

# keras/tensorflow_backend.py
def gradients(loss, variables):
    '''Returns the gradients of `variables` (list of tensor variables)
    with regard to `loss`.
    '''
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
These are in turn called by the optimizers (keras/optimizers.py) via grads = self.get_gradients(loss, params) to get the gradients that are used to write the update rule for all the params; params here are the trainable weights of the layers. Layers created with the Lambda functional layer don't have any trainable weights of their own, but they affect the loss function through the forward pass and hence indirectly affect the calculation of the gradients of the trainable weights of the other layers.
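As a concrete illustration (a made-up toy model, not from the answer), a Lambda layer built from backend ops trains without any hand-written gradient code:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Lambda
from keras import backend as K

model = Sequential()
model.add(Dense(16, input_shape=(8,)))
# custom element-wise function built from backend ops;
# Theano/TensorFlow differentiate it automatically
model.add(Lambda(lambda x: K.tanh(x) + 0.1 * K.square(x)))
model.add(Dense(1))
model.compile(optimizer='sgd', loss='mse')

# gradients of the Dense weights flow through the Lambda automatically
model.fit(np.random.randn(32, 8), np.random.randn(32, 1), epochs=1, verbose=0)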
The only time you need to write a new gradient calculation is when you are defining a new basic mathematical operation/function. Also, when you write a custom loss function, autograd almost always takes care of the gradient calculation. Optionally, you can sometimes speed up training by implementing the analytical gradient of your custom functions. For example, the softmax function can be expressed in terms of exp, sum and div, and autograd can take care of it, but its analytical/symbolic gradient is usually implemented in Theano/TensorFlow anyway.
For implementing new Ops, see the links below:
http://deeplearning.net/software/theano/extending/extending_theano.html
https://www.tensorflow.org/versions/r0.12/how_tos/adding_an_op/index.html
I'm trying to reshape a convolution layer of a caffemodel (this is a follow-up question to this question). Although there is a tutorial on how to do net surgery, it only shows how to copy weight parameters from one caffemodel to another of the same size.
Instead, I need to add a new channel (all zeros) to my convolution filter so that its size changes from (64x3x3x3) to (64x4x3x3).
Say the convolution layer is called 'conv1'. This is what I tried so far:
# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('../models/train.prototxt',
                '../models/train.caffemodel',
                caffe.TRAIN)
Now I can perform this:
net.blobs['conv1'].reshape(64, 4, 3, 3)
net.save('myNewTrainModel.caffemodel')
But the saved model seems not to have changed. I've read that the actual weights of the convolution are stored in net.params['conv1'][0].data rather than in net.blobs, but I can't figure out how to reshape the net.params object. Does anyone have an idea?
As you well noted, net.blobs does not store the learned parameters/weights, but rather the result of applying the filters/activations to the net's input. The learned weights are stored in net.params (see this for more details).
AFAIK, you cannot directly reshape net.params and add a channel.
What you can do, is have two nets deploy_trained_net_with_3ch.prototxt and deploy_empty_net_with_4ch.prototxt. The two files can be almost identical apart from the input shape definition and the first layer's name.
Then you can load both nets to python and copy the relevant part:
net3ch = caffe.Net('deploy_trained_net_with_3ch.prototxt', 'train.caffemodel', caffe.TEST)
net4ch = caffe.Net('deploy_empty_net_with_4ch.prototxt', 'train.caffemodel', caffe.TEST)
Since all layer names are identical (apart from conv1), net4ch.params will hold the weights of train.caffemodel. As for the first layer, you can now manually copy the relevant part:
net4ch.params['conv1_4ch'][0].data[:,:3,:,:] = net3ch.params['conv1'][0].data[...]
and finally:
net4ch.save('myNewTrainModel.caffemodel')
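If the extra channel should start at zero, as the question requires, set it explicitly before calling save() (conv1_4ch is the assumed name of the first layer in the 4-channel prototxt, as above):

# zero out the newly added 4th input channel
net4ch.params['conv1_4ch'][0].data[:, 3, :, :] = 0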
I am reading a tutorial and there is an equation as shown in the image. I know that the sign in the image is called cross addition, but my question is: is there any method in OpenCV that performs cross addition?
The 'plus in a circle' in this context most likely refers to the direct addition (direct sum) of matrices.
In particular, the notation ⊕I_{1..n} refers to constructing a block-diagonal matrix from the matrices I_1, ..., I_n.
For example, suppose we have two matrices A and B; their direct sum places them along the diagonal of a larger matrix that is zero everywhere else.
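With illustrative numbers (chosen arbitrarily for this sketch):

$$
A \oplus B = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix},
\qquad
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \oplus \begin{bmatrix} 5 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 0 \\ 3 & 4 & 0 \\ 0 & 0 & 5 \end{bmatrix}.
$$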
There is no single method in OpenCV that performs this, but you can easily do it with existing matrix operations:
Create an output matrix of the correct size and initialise it with zeros.
Iterate over the matrices to be direct-added and copy each one into the appropriate subrange of the output matrix.
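A minimal sketch of these two steps in Python (OpenCV matrices in Python are NumPy arrays, so plain array indexing works; the helper name direct_sum is made up):

import numpy as np

def direct_sum(mats):
    """Direct (block-diagonal) sum of a list of 2-D arrays."""
    rows = sum(m.shape[0] for m in mats)
    cols = sum(m.shape[1] for m in mats)
    out = np.zeros((rows, cols), dtype=mats[0].dtype)  # step 1: zero-initialised output
    r = c = 0
    for m in mats:                                     # step 2: copy each block into its subrange
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r += m.shape[0]
        c += m.shape[1]
    return out

A = np.array([[1, 2], [3, 4]])
B = np.array([[5]])
print(direct_sum([A, B]))
# [[1 2 0]
#  [3 4 0]
#  [0 0 5]]

In C++ you can do the same with cv::Mat::zeros and copying each matrix into the corresponding ROI of the output; NumPy/SciPy users can also reach for scipy.linalg.block_diag.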