I am trying to implement shared layers in Keras. I do see that Keras has keras.layers.concatenate, but I am unsure from documentation about its use. Can I use it to create multiple shared layers? What would be the best way to implement a simple shared neural network as shown below using Keras?
Edit 1:
Note that all the shape of input, output and shared layers for all 3 NNs are the same. There are multiple shared layers (and non-shared layers) in the three NNs. The coloured layers are unique to each NN, and have same shape.
Basically, the figure represents 3 identical NNs with multiple shared hidden layers, followed by multiple non-shared hidden layers.
I am unsure how to share multiple layers as in the Twitter example, there was just one shared layer (example in API doc).
Edit 2:
Based on geompalik's helpful comments, this is what I initially came up with:
sharedLSTM1 = LSTM((data.shape[1]), return_sequences=True)
sharedLSTM2 = LSTM(data.shape[1])
def createModel(dropoutRate=0.0, numNeurons=40, optimizer='adam'):
inputLayer = Input(shape=(timesteps, data.shape[1]))
sharedLSTM1Instance = sharedLSTM1(inputLayer)
sharedLSTM2Instance = sharedLSTM2(sharedLSTM1Instance)
dropoutLayer = Dropout(dropoutRate)(sharedLSTM2Instance)
denseLayer1 = Dense(numNeurons)(dropoutLayer)
denseLayer2 = Dense(numNeurons)(denseLayer1)
outputLayer = Dense(1, activation='sigmoid')(denseLayer2)
return (inputLayer, outputLayer)
inputLayer1, outputLayer1 = createModel()
inputLayer2, outputLayer2 = createModel()
model = Model(inputs=[inputLayer1, inputLayer2], outputs=[outputLayer1, outputLayer2])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
In the above code, I expect that the LSTM Layers in the two models are shared, whereas the dropout and 2 dense layers are not shared. Is that correct?
If so, I do not need keras.layers.concatenate in this example, right?
I get the following image if I try to visualise the network using plot_model (which is what I was expecting):
Implementing the shown architecture is quite straight-forward with the functional API of Keras. Check this page for more information on that.
In your case you have the input layer and the first hidden layer shared, and then one layer for each of the three subjects. Designing your model is now a matter of how your data look: for instance, if for a given input you have different outputs for each subject, you can should define a model like:
model = Model(inputs=[you_main_input], outputs=[subject1_output, subject2_output, subject3_output])
If that is not the case, and you have training data corresponding to each of the subjects, you can define three NNs, and have the first two layers shared across them. Check under "Shared layers" in the above-cited documentation.
Related
I am trying to train an image classifier on an unbalanced training set. In order to cope with the class imbalance, I want either to weight the classes or the individual samples. Weighting the classes does not seem to work. And somehow for my setup I was not able to find a way to specify the samples weights. Below you can read how I load and encode the training data and the two approaches that I tried.
Training data loading and encoding
My training data is stored in a directory structure where each image is place in the subfolder corresponding to its class (I have 32 classes in total). Since the training data is too big too all load at once into memory I make use of image_dataset_from_directory and by that describe the data in a TF Dataset:
train_ds = keras.preprocessing.image_dataset_from_directory (training_data_dir,
batch_size=batch_size,
image_size=img_size,
label_mode='categorical')
I use label_mode 'categorical', so that the labels are described as a one-hot encoded vector.
I then prefetch the data:
train_ds = train_ds.prefetch(buffer_size=buffer_size)
Approach 1: specifying class weights
In this approach I try to specify the class weights of the classes via the class_weight argument of fit:
model.fit(
train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds,
class_weight=class_weights
)
For each class we compute weight which are inversely proportional to the number of training samples for that class. This is done as follows (this is done before the train_ds.prefetch() call described above):
class_num_training_samples = {}
for f in train_ds.file_paths:
class_name = f.split('/')[-2]
if class_name in class_num_training_samples:
class_num_training_samples[class_name] += 1
else:
class_num_training_samples[class_name] = 1
max_class_samples = max(class_num_training_samples.values())
class_weights = {}
for i in range(0, len(train_ds.class_names)):
class_weights[i] = max_class_samples/class_num_training_samples[train_ds.class_names[i]]
What I am not sure about is whether this solution works, because the keras documentation does not specify the keys for the class_weights dictionary in case the labels are one-hot encoded.
I tried training the network this way but found out that the weights did not have a real influence on the resulting network: when I looked at the distribution of predicted classes for each individual class then I could recognize the distribution of the overall training set, where for each class the prediction of the dominant classes is most likely.
Running the same training without any class weight specified led to similar results.
So I suspect that the weights don't seem to have an influence in my case.
Is this because specifying class weights does not work for one-hot encoded labels, or is this because I am probably doing something else wrong (in the code I did not show here)?
Approach 2: specifying sample weight
As an attempt to come up with a different (in my opinion less elegant) solution I wanted to specify the individual sample weights via the sample_weight argument of the fit method. However from the documentation I find:
[...] This argument is not supported when x is a dataset, generator, or keras.utils.Sequence instance, instead provide the sample_weights as the third element of x.
Which is indeed the case in my setup where train_ds is a dataset. Now I really having trouble finding documentation from which I can derive how I can modify train_ds, such that it has a third element with the weight. I thought using the map method of a dataset can be useful, but the solution I came up with is apparently not valid:
train_ds = train_ds.map(lambda img, label: (img, label, class_weights[np.argmax(label)]))
Does anyone have a solution that may work in combination with a dataset loaded by image_dataset_from_directory?
I'm looking to do some image classification on PDF documents that I convert to images. I'm using tensorflow inception v3 pre trained model and trying to retrain the last layer with my own categories following the tensorflow tuto. I have ~1000 training images per category and only 4 categories. With 200k iterations I can reach up to 90% of successful classifications, which is not bad but still need some work:
The issue here is this pre-trained model takes only 300*300p images for input. Obviously it messes up a lot with the characters involved in the features I try to recognize in the documents.
Would it be possible to alter the model input layer so I can give him images with better resolution ?
Would I get better results with a home made and way simpler model ?
If so, where should I start to build a model for such image classification ?
If you want to use a different image resolution than the pre-trained model uses , you should use only the convolution blocks and have a set of fully connected blocks with respect to the new size. Using a higher level library like Keras will make it a lot easier. Below is an example on how to do that in Keras.
import keras
from keras.layers import Flatten,Dense,GlobalAveragePooling2D
from keras.models import Model
from keras.applications.inception_v3 import InceptionV3
base_model = InceptionV3(include_top=False,input_shape=(600,600,3),weights='imagenet')
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024,activation='relu')(x)
#Add as many dense layers / Fully Connected layers required
pred = Dense(10,activation='softmax')(x)
model = Model(base_model.input,pred)
for l in model.layers[:-3]:
l.trainable=False
The input_top = False will give you only the convolution blocks. You can use the input_shape=(600,600,3) to set the required shape you want. And you can add a couple of dense blocks/Fully connected blocks/layers to the model.The last layer should contain the required number of categories .10 represent the number of classes.By this approach you use all the weights associated with the convolution layers of the pre trained model and train only the last dense layers.
I'm trying to reshape the size of a convolution layer of a caffemodel (This is a follow-up question to this question). Although there is a tutorial on how to do net surgery, it only shows how to copy weight parameters from one caffemodel to another of the same size.
Instead I need to add a new channel (all 0) to my convolution filter such that it changes its size from currently (64x3x3x3) to (64x4x3x3).
Say the convolution layer is called 'conv1'. This is what I tried so far:
# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('../models/train.prototxt',
'../models/train.caffemodel',
caffe.TRAIN)
Now I can perform this:
net.blobs['conv1'].reshape(64,4,3,3);
net.save('myNewTrainModel.caffemodel');
But the saved model seems not to have changed. I've read that the actual weights of the convolution are stored rather in net.params['conv1'][0].data than in net.blobs but I can't figure out how to reshape the net.params object. Does anyone have an idea?
As you well noted, net.blobs does not store the learned parameters/weights, but rather stores the result of applying the filters/activations on the net's input. The learned weights are stored in net.params. (see this for more details).
AFAIK, you cannot directly reshape net.params and add a channel.
What you can do, is have two nets deploy_trained_net_with_3ch.prototxt and deploy_empty_net_with_4ch.prototxt. The two files can be almost identical apart from the input shape definition and the first layer's name.
Then you can load both nets to python and copy the relevant part:
net3ch = caffe.Net('deploy_trained_net_with_3ch.prototxt', 'train.caffemodel', caffe.TEST)
net4ch = caffe.Net('deploy_empty_net_with_4ch.prototxt', 'train.caffemodel', caffe.TEST)
since all layer names are identical (apart from conv1) net4ch.params will have the weights of train.caffemodel. As for the first layer, you can now manually copy the relevant part:
net4ch.params['conv1_4ch'][0].data[:,:3,:,:] = net3ch.params['conv1'][0].data[...]
and finally:
net4ch.save('myNewTrainModel.caffemodel')
I have the following problem:
I want to use a LSTM network for text classification. In order to speed up training and make code more clear I want to use an Embedding layer along keras.Tokenizer in order to train my model.
Once I trained my model - I want to compute a saliency map of output w.r.t. the input. To do that I decided to replace an Embedding layer with a TimeDistributedDense.
Do you have any idea what is the best way to do that. For a simple model I'm able to simply rebuild model with a known weights - but I want to make it as generic as possible - e.g. to replace the model structure the future and make my framework as model agnostic as possible.
You can create a new model with the existing layers after training without losing the trained information. Creating models does not overwrite values, it just rearranges the flow of tensors. For example, if your model looks like this:
# Create model for training, includes Embedding layer
in = Input(...)
em = Embedding(...)
l1 = LSTM(...)
l2 = TimeDistributed(Dense(...))
t_em = em(in)
t_l1 = l1(t_em)
t_l2 = l2(t_l1)
model_train = Model(inputs=[in], outputs=[t_l2])
model_train.compile(...)
model_train.fit(...)
# Create model for salience map, ignores Embedding layer
t_l1 = l1(in)
t_l2 = l2(in)
model_saliency = Model(inputs=[in], outputs=[t_l2])
As a side note, TimeDistributedDense is deprecated, use wrappers.TimeDistributed.
I'm working on implementation of LSTM Neural Network for sequence classification. I want to design a network with the following parameters:
Input : a sequence of n one-hot-vectors.
Network topology : two-layer LSTM network.
Output: a probability that a sequence given belong to a class (binary-classification). I want to take into account only last output from second LSTM layer.
I need to implement that in CNTK but I struggle because its documentation is not written really well. Can someone help me with that?
There is a sequence classification example that follows exactly what you're looking for.
The only difference is that it uses just a single LSTM layer. You can easily change this network to use multiple layers by changing:
LSTM_function = LSTMP_component_with_self_stabilization(
embedding_function.output, LSTM_dim, cell_dim)[0]
to:
num_layers = 2 # for example
encoder_output = embedding_function.output
for i in range(0, num_layers):
encoder_output = LSTMP_component_with_self_stabilization(encoder_output.output, LSTM_dim, cell_dim)
However, you'd be better served by using the new layers library. Then you can simply do this:
encoder_output = Stabilizer()(input_sequence)
for i in range(0, num_layers):
encoder_output = Recurrence(LSTM(hidden_dim)) (encoder_output.output)
Then, to get your final output that you'd put into a dense output layer, you can first do:
final_output = sequence.last(encoder_output)
and then
z = Dense(vocab_dim) (final_output)
here you can find a straightforward approach, just add the additional layer like:
Sequential([
Recurrence(LSTM(hidden_dim), go_backwards=False),
Recurrence(LSTM(hidden_dim), go_backwards=False),
Dense(label_dim, activation=sigmoid)
])
train it, test it and apply it...
CNTK published a hands-on tutorial for language understanding that has an end to end recipe:
This hands-on lab shows how to implement a recurrent network to process text, for the Air Travel Information Services (ATIS) task of slot tagging (tag individual words to their respective classes, where the classes are provided as labels in the training data set). We will start with a straight-forward embedding of the words followed by a recurrent LSTM. This will then be extended to include neighboring words and run bidirectionally. Lastly, we will turn this system into an intent classifier.
I'm not familiar with CNTK. But since the question has been left unanswered for so long, I can perhaps suggest some advice to help you with the implementation?
I'm not sure how experienced you are with these architectures; but before moving to CNTK (which seemingly has a less active community), I'd suggest looking at other popular repositories (like Theano, tensor-flow, etc.)
For instance, a similar task in theano is given here: kyunghyuncho tutorials. Just look for "def lstm_layer" for the definitions.
A torch example can be found in Karpathy's very popular tutorials
Hope this helps a bit..