Why does Trax automatically create a Serial layer over a sublayer? - machine-learning

I implemented a Serial layer in Trax (Google's deep learning library). Why is an additional Serial layer created even though I already declared one?
Below is the code.
model = tl.Serial(
    tl.Dense(n_units=512),
    tl.Relu()
)
print(model)
The output is:
Serial[
  Dense_512
  Serial[
    Relu
  ]
]
While it should have been:
Serial[
  Dense_512
  Relu
]

Related

getting values between layers of a neural network using functional API

I am using below simple neural network for classification
model2 = keras.Sequential([
    keras.layers.Flatten(input_shape=(X.shape[1], X.shape[2])),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(5, activation='softmax')
])
I would like to check how the values for different classes change after going through each layer using the Euclidean distance to see which layer is the most useful and which is the least.
Does it make sense? If so, how can I do it using functional API? I encountered the first problems at the very beginning - they are related to the shape of the data - originally it is (1991, 13, 1292).

Can the number of units in NN input layer be different than the number of features in the data?

Based on the TensorFlow Keras API tutorial:
model = keras.Sequential([
    keras.layers.Dense(10, activation='softmax', input_shape=(32,)),
    keras.layers.Dense(10, activation='softmax')
])
I couldn't understand why the number of units in the input layer is 10 while the input shape is 32. There are many examples like this one in the TensorFlow tutorials.
This is a rather common confusion among new practitioners, and not without reason: the answer, as already hinted at in the comments, is that in the Keras Sequential API there is an implicit input layer, determined by the input_shape argument of the first explicit layer.
This is directly visible in the Keras Functional API (check the example in the docs), where Input is an explicit layer itself, and in which your model would be written as:
inputs = Input(shape=(32,)) # input layer
x = Dense(10, activation='softmax')(inputs) # hidden layer
outputs = Dense(10, activation='softmax')(x) # output layer
model = Model(inputs, outputs)
i.e. your model is actually an example of a "good old" neural net with three layers (input, hidden, and output), even though it looks like a two-layer net in the Keras Sequential API.
(BTW, and irrelevant to the question, it does not make much sense to have softmax as activation for your hidden layer.)
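To make the implicit input layer concrete, here is a small plain-Python sketch (not Keras code; the helper name dense_param_count is made up for illustration) that counts the trainable parameters of the two Dense layers above. It assumes the usual fully connected layout of one weight per input/unit pair plus one bias per unit. The 32 in input_shape only fixes how many weights each of the first layer's 10 units receives; it is not itself a number of units.

```python
def dense_param_count(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units


# First Dense layer: 32 input features feed 10 units.
hidden_params = dense_param_count(n_inputs=32, n_units=10)

# Second Dense layer: its inputs are the 10 outputs of the previous layer.
output_params = dense_param_count(n_inputs=10, n_units=10)

print(hidden_params, output_params)  # 330 110
```

These numbers should match the per-layer parameter counts that model.summary() reports for this model.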

Multi task learning in Keras

I am trying to implement shared layers in Keras. I do see that Keras has keras.layers.concatenate, but I am unsure from documentation about its use. Can I use it to create multiple shared layers? What would be the best way to implement a simple shared neural network as shown below using Keras?
Edit 1:
Note that the shapes of the input, output and shared layers are the same for all 3 NNs. There are multiple shared layers (and non-shared layers) in the three NNs. The coloured layers are unique to each NN and have the same shape.
Basically, the figure represents 3 identical NNs with multiple shared hidden layers, followed by multiple non-shared hidden layers.
I am unsure how to share multiple layers, since the Twitter example (the example in the API doc) has just one shared layer.
Edit 2:
Based on geompalik's helpful comments, this is what I initially came up with:
sharedLSTM1 = LSTM((data.shape[1]), return_sequences=True)
sharedLSTM2 = LSTM(data.shape[1])

def createModel(dropoutRate=0.0, numNeurons=40, optimizer='adam'):
    inputLayer = Input(shape=(timesteps, data.shape[1]))
    sharedLSTM1Instance = sharedLSTM1(inputLayer)
    sharedLSTM2Instance = sharedLSTM2(sharedLSTM1Instance)
    dropoutLayer = Dropout(dropoutRate)(sharedLSTM2Instance)
    denseLayer1 = Dense(numNeurons)(dropoutLayer)
    denseLayer2 = Dense(numNeurons)(denseLayer1)
    outputLayer = Dense(1, activation='sigmoid')(denseLayer2)
    return (inputLayer, outputLayer)

inputLayer1, outputLayer1 = createModel()
inputLayer2, outputLayer2 = createModel()
model = Model(inputs=[inputLayer1, inputLayer2], outputs=[outputLayer1, outputLayer2])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
In the above code, I expect that the LSTM Layers in the two models are shared, whereas the dropout and 2 dense layers are not shared. Is that correct?
If so, I do not need keras.layers.concatenate in this example, right?
I get the following image if I try to visualise the network using plot_model (which is what I was expecting):
Implementing the shown architecture is quite straightforward with the functional API of Keras. Check this page for more information on that.
In your case you have the input layer and the first hidden layer shared, and then one layer for each of the three subjects. Designing your model is now a matter of what your data look like: for instance, if for a given input you have a different output for each subject, you can define a model like:
model = Model(inputs=[your_main_input], outputs=[subject1_output, subject2_output, subject3_output])
If that is not the case, and you have training data corresponding to each of the subjects, you can define three NNs, and have the first two layers shared across them. Check under "Shared layers" in the above-cited documentation.
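The idea behind shared layers in the functional API is that a layer is shared when the same layer object is called on several inputs, whereas calling the constructor again creates a new layer with its own weights. The following plain-Python toy mimics that bookkeeping for three towers with two shared and two private layers each. ToyLayer and build_tower are hypothetical names used only for this sketch; they are not Keras classes.

```python
class ToyLayer:
    """Stand-in for a layer: it owns one list of weights."""

    def __init__(self, name, n_weights):
        self.name = name
        self.weights = [0.0] * n_weights


def build_tower(shared_layers, tower_id):
    """A tower reuses the shared layer objects and adds its own private layers."""
    private = [ToyLayer(f"private{i}_tower{tower_id}", n_weights=4) for i in (1, 2)]
    return shared_layers + private


shared = [ToyLayer("shared1", n_weights=4), ToyLayer("shared2", n_weights=4)]
towers = [build_tower(shared, tower_id) for tower_id in (1, 2, 3)]

# Layers are shared when the very same object appears in several towers.
unique_layers = {id(layer): layer for tower in towers for layer in tower}
print(len(unique_layers))  # 2 shared + 3 * 2 private = 8

# Updating a shared layer through one tower is visible from every other tower.
towers[0][0].weights[0] = 1.5
print(towers[2][0].weights[0])  # 1.5
```

In Keras terms, the shared list corresponds to layer instances created once outside the model-building function, and the private layers to instances created inside it on every call.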

How do I obtain the layer names for use in the iOS sample app? (Tensorflow)

I'm very new to Tensorflow, and I'm trying to train something using the inception v3 network for use in an iPhone app. I managed to export my graph as a protocolbuffer file, manually remove the dropout nodes (correctly, I hope), and have placed that .pb file in my iOS project, but now I am receiving the following error:
Running model failed:Not found: FeedInputs: unable to find feed output input
which seems to indicate that my input_layer_name and output_layer_name variables in the iOS app are misconfigured.
I see in various places that it should be Mul and softmax respectively, for inception v3, but these values don't work for me.
My question is: what is a layer (with regards to this context), and how do I find out what mine are?
This is the exact definition of the model that I trained, but I don't see "Mul" or "softmax" present.
This is what I've been able to learn about layers, but it seems to be a different concept, since "Mul" isn't present in that list.
I'm worried that this might be a duplicate of this question but "layers" aren't explained (are they tensors?) and graph.get_operations() seems to be deprecated, or maybe I'm using it wrong.
As MohamedEzz wrote, there are no layers in TensorFlow graphs. There are only operations, which can be placed under the same name scope.
Usually the operations of a single layer are placed under the same scope, and applications that are aware of the name scope concept can display them grouped.
One such application is TensorBoard. I believe that using TensorBoard is the easiest way to find node names.
Consider the following example:
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
input_placeholder = tf.placeholder(tf.float32, shape=(None, 224, 224, 3))
network = nets.inception.inception_v3(input_placeholder)
writer = tf.summary.FileWriter('.', tf.get_default_graph())
writer.close()
It creates a placeholder for the input data, then creates the Inception v3 network and saves the event data (with the graph) in the current directory.
Launching TensorBoard in the same directory makes it possible to view the graph structure.
tensorboard --logdir .
TensorBoard prints the UI URL to the console:
Starting TensorBoard 41 on port 6006
(You can navigate to http://192.168.128.73:6006)
Below is an image of this graph.
Locate the node you are interested in and select it to find its name (in the upper left information pane).
Input:
Output:
Please note that usually you need tensor names rather than node names. In most cases it is enough to append :0 to the node name to get the tensor name.
For example, to run the Inception v3 network created above using names from the graph, use the following code (a continuation of the code above):
import numpy as np
data = np.random.randn(1, 224, 224, 3) # just random data
session = tf.InteractiveSession()
session.run(tf.global_variables_initializer())
result = session.run('InceptionV3/Predictions/Softmax:0', feed_dict={'Placeholder:0': data})
# result.shape = (1, 1000)
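The naming convention used in the feed and fetch above is "node name, colon, output index". As a minimal plain-Python sketch of that convention (the helper names tensor_name and split_tensor_name are hypothetical and not part of TensorFlow):

```python
def tensor_name(node_name, output_index=0):
    """Build a tensor name from a node (op) name and the index of one of its outputs."""
    return f"{node_name}:{output_index}"


def split_tensor_name(name):
    """Split 'scope/op:1' into ('scope/op', 1); a bare node name maps to output 0."""
    node_name, sep, index = name.rpartition(":")
    if not sep:
        return name, 0
    return node_name, int(index)


print(tensor_name("InceptionV3/Predictions/Softmax"))  # InceptionV3/Predictions/Softmax:0
print(split_tensor_name("Placeholder:0"))              # ('Placeholder', 0)
```

Most ops have a single output, which is why appending :0 is usually enough.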
At the core of TensorFlow there are ops (operations) and tensors (n-dimensional arrays). Each op takes tensors and gives back tensors. Layers are just convenience wrappers around a number of ops that together represent a neural network layer.
For example, a convolution layer is composed mainly of 3 ops:
conv2d op: this is what slides a kernel over the input tensor, multiplies the kernel element-wise with the underlying input window and sums the products.
bias_add op: adds the biases to the tensor coming out of the conv2d op.
activation op: applies an activation function element-wise to the output tensor of the bias_add op.
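The three-op decomposition above can be sketched in plain Python. This is a simplified 1-D illustration with a single filter and no padding, not TensorFlow code; the function names merely echo the op names.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal; at each position multiply element-wise and sum."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def bias_add(values, bias):
    """Add the same bias to every output position of the filter."""
    return [v + bias for v in values]


def relu(values):
    """Element-wise activation: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def conv_layer(signal, kernel, bias):
    """A 'layer' is just the three ops chained together."""
    return relu(bias_add(conv1d_valid(signal, kernel), bias))


print(conv_layer([1.0, 2.0, 3.0, 4.0], kernel=[1.0, 1.0], bias=-4.0))  # [0.0, 1.0, 3.0]
```

Each intermediate result here corresponds to a separate tensor in a TensorFlow graph, which is why a single layer shows up as several nodes.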
To run a tensorflow model, you provide feeds (inputs) and fetches (desired outputs). These are tensors, or tensor names.
From this line of code Inception_model, it seems that what you need is a tensor named 'predictions' which has the n_class output probabilities.
What you observed (softmax) is the type of the op that produced the predictions tensor.
As for the input tensor name, the inception_model.py code does not show the input tensor name, since it's an argument to the function. So it depends on what name you have given to that input tensor.
When you create your layers or variables, add the parameter called name:
with tf.name_scope("output"):
    W2 = tf.Variable(tf.truncated_normal([num_filters, num_classes], stddev=0.1), name="W2")
    b2 = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b2")
    scores = tf.nn.xw_plus_b(h_pool_flat, W2, b2, name="scores")
    pred_y = tf.nn.softmax(scores, name="pred_y")
In this case I can access the final predicted values by using "output/pred_y". If you don't have a name_scope, you can just use "pred_y" to get to the values.
conv = tf.nn.conv1d(word_embeddedings,
                    W1,
                    stride=stride_size,
                    padding="VALID",
                    name="conv")  # will have dimensions [batch_size, out_width, num_filters]; out_width is a function of max_words, filter_size and stride_size
# Apply nonlinearity
h = tf.nn.relu(tf.nn.bias_add(conv, b1), name="relu")
I called the layer "conv" and used it in the next layer. Paste your snippet like I have done here

Net surgery: How to reshape a convolution layer of a caffemodel file in caffe?

I'm trying to reshape the size of a convolution layer of a caffemodel (This is a follow-up question to this question). Although there is a tutorial on how to do net surgery, it only shows how to copy weight parameters from one caffemodel to another of the same size.
Instead I need to add a new channel (all 0) to my convolution filter such that it changes its size from currently (64x3x3x3) to (64x4x3x3).
Say the convolution layer is called 'conv1'. This is what I tried so far:
# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('../models/train.prototxt',
                '../models/train.caffemodel',
                caffe.TRAIN)
Now I can perform this:
net.blobs['conv1'].reshape(64, 4, 3, 3)
net.save('myNewTrainModel.caffemodel')
But the saved model does not seem to have changed. I've read that the actual weights of the convolution are stored in net.params['conv1'][0].data rather than in net.blobs, but I can't figure out how to reshape the net.params object. Does anyone have an idea?
As you correctly noted, net.blobs does not store the learned parameters/weights, but rather stores the result of applying the filters/activations to the net's input. The learned weights are stored in net.params. (see this for more details).
AFAIK, you cannot directly reshape net.params and add a channel.
What you can do is have two nets, deploy_trained_net_with_3ch.prototxt and deploy_empty_net_with_4ch.prototxt. The two files can be almost identical apart from the input shape definition and the first layer's name.
Then you can load both nets to python and copy the relevant part:
net3ch = caffe.Net('deploy_trained_net_with_3ch.prototxt', 'train.caffemodel', caffe.TEST)
net4ch = caffe.Net('deploy_empty_net_with_4ch.prototxt', 'train.caffemodel', caffe.TEST)
Since all layer names are identical (apart from conv1), net4ch.params will have the weights of train.caffemodel. As for the first layer, you can now manually copy the relevant part:
net4ch.params['conv1_4ch'][0].data[:,:3,:,:] = net3ch.params['conv1'][0].data[...]
and finally:
net4ch.save('myNewTrainModel.caffemodel')
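The slice assignment used for the first layer can be tried out on plain NumPy arrays, without Caffe. In this sketch the random array merely stands in for net3ch.params['conv1'][0].data, and the target array is zeroed explicitly so that the added channel is guaranteed to be all 0 regardless of how the empty net was initialised; the variable names are made up for illustration.

```python
import numpy as np

# Stand-in for the trained conv1 weights: 64 filters, 3 input channels, 3x3 kernels.
weights_3ch = np.random.randn(64, 3, 3, 3).astype(np.float32)

# New weight array with a fourth input channel, initialised to zero everywhere.
weights_4ch = np.zeros((64, 4, 3, 3), dtype=np.float32)

# Copy the trained filters into the first three input channels; channel 3 stays zero.
weights_4ch[:, :3, :, :] = weights_3ch

print(weights_4ch.shape)               # (64, 4, 3, 3)
print(np.all(weights_4ch[:, 3] == 0))  # True
```

With an all-zero fourth channel, the enlarged filters produce the same responses as the original ones, whatever values the new input channel carries.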
