How to do custom Keras layer matrix multiplication - machine-learning

Layers:
Input shape (None,75)
Hidden layer 1 - shape is (75,3)
Hidden layer 2 - shape is (3,1)
For the last layer, the output must be calculated as (H21*w1)*(H22*w2)*(H23*w3), where H21, H22, H23 are the outputs of hidden layer 2 and w1, w2, w3 are constant, non-trainable weights. How do I write a Lambda function that produces this output?
def product(X):
    return X[0]*X[1]
keras_model = Sequential()
keras_model.add(Dense(75,
input_dim=75,activation='tanh',name="layer1" ))
keras_model.add(Dense(3 ,activation='tanh',name="layer2" ))
keras_model.add(Dense(1,name="layer3"))
cross1=keras_model.add(Lambda(lambda x:product,output_shape=(1,1)))([layer2,layer3])
print(cross1)
NameError: name 'layer2' is not defined

Use the functional API model
inputs = Input((75,)) #shape (batch, 75)
output1 = Dense(75, activation='tanh',name="layer1" )(inputs) #shape (batch, 75)
output2 = Dense(3 ,activation='tanh',name="layer2" )(output1) #shape (batch, 3)
output3 = Dense(1,name="layer3")(output2) #shape (batch, 1)
cross1 = Lambda(lambda x: x[0] * x[1])([output2, output3]) #shape (batch, 3)
model = Model(inputs, cross1)
Please note that these shapes are quite different from what you expect: the Lambda output has shape (batch, 3), not one value per sample.
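The (batch, 3) shape noted on the Lambda line comes from broadcasting: an elementwise product of a (batch, 3) tensor and a (batch, 1) tensor yields (batch, 3). Below is a minimal NumPy sketch of that rule. The values are made up, and NumPy arrays stand in for the Keras tensors output2 and output3 purely for illustration.

```python
import numpy as np

batch = 4
# Stand-ins for the Keras tensors; the values are arbitrary.
output2 = np.full((batch, 3), 2.0)   # plays the role of layer2's output, shape (batch, 3)
output3 = np.full((batch, 1), 3.0)   # plays the role of layer3's output, shape (batch, 1)

# Elementwise product: the (batch, 1) operand is broadcast across the 3 columns.
cross1 = output2 * output3
print(cross1.shape)
```

So the Lambda layer scales each of the three hidden values by the single layer3 value. It does not reduce them to one number.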

I suggest doing this with a custom layer instead of a Lambda layer. Why? A custom layer gives you more freedom, and it makes the weights you care about easier to inspect. More precisely, if you use a Lambda layer, the constant weights will not be saved as part of the model, but they will be if you use a custom layer.
Here is an example
from keras import backend as K
from keras.layers import *
from keras.models import *
import numpy as np
class MyLayer(Layer):
    # see https://keras.io/layers/writing-your-own-keras-layers/
    def __init__(self,
                 w_vec=None,
                 allow_training=False,
                 **kwargs):
        self._w_vec = w_vec
        assert allow_training or (w_vec is not None), \
            "ERROR: non-trainable w_vec must be initialized"
        self.allow_training = allow_training
        super().__init__(**kwargs)
        return

    def build(self, input_shape):
        batch_size, num_feats = input_shape
        self.w_vec = self.add_weight(shape=(1, num_feats),
                                     name='w_vec',
                                     initializer='uniform',  # <- use your own preferred initializer
                                     trainable=self.allow_training,)
        if self._w_vec is not None:
            # predefined w_vec
            assert self._w_vec.shape[1] == num_feats, \
                "ERROR: initial w_vec shape mismatches the input shape"
            # set it to the weight
            self.set_weights([self._w_vec])  # <- set weights to the supplied one
        super().build(input_shape)
        return

    def call(self, x):
        # Given:
        #   x = [H21, H22, H23]
        #   w_vec = [w1, w2, w3]
        # Step 1: output elem_prod
        #   elem_prod = [H21*w1, H22*w2, H23*w3]
        elem_prod = x * self.w_vec
        # Step 2: output ret
        #   ret = (H21*w1) * (H22*w2) * (H23*w3)
        ret = K.prod(elem_prod, axis=-1, keepdims=True)
        return ret

    def compute_output_shape(self, input_shape):
        return (input_shape[0], 1)
def make_test_cases(w_vec=None, allow_training=False):
    x = Input(shape=(75,))
    y = Dense(75, activation='tanh', name='fc1')(x)
    y = Dense(3, activation='tanh', name='fc2')(y)
    y = MyLayer(w_vec, allow_training, name='core')(y)
    y = Dense(1, name='fc3')(y)
    net = Model(inputs=x, outputs=y,
                name='{}-{}'.format('randomInit' if w_vec is None else 'assignInit',
                                    'trainable' if allow_training else 'nontrainable'))
    print(net.name)
    print(net.layers[-2].get_weights()[0])
    print(net.summary())
    return net
You can run the following test cases to see the differences. Pay attention to the first and last lines of each printout, which give the initial weight values and the number of non-trainable parameters, respectively.
a. Constant weights, non-trainable
m1 = make_test_cases(w_vec=np.arange(3).reshape([1,3]), allow_training=False)
will give you
assignInit-nontrainable [[0. 1. 2.]]
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) (None, 75) 0
_________________________________________________________________
fc1 (Dense) (None, 75) 5700
_________________________________________________________________
fc2 (Dense) (None, 3) 228
_________________________________________________________________
core (MyLayer) (None, 1) 3
_________________________________________________________________
fc3 (Dense) (None, 1) 2
=================================================================
Total params: 5,933
Trainable params: 5,930
Non-trainable params: 3
_________________________________________________________________
b. Constant weights, trainable
m2 = make_test_cases(w_vec=np.arange(3).reshape([1,3]), allow_training=True)
will give you
assignInit-trainable [[0. 1. 2.]]
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) (None, 75) 0
_________________________________________________________________
fc1 (Dense) (None, 75) 5700
_________________________________________________________________
fc2 (Dense) (None, 3) 228
_________________________________________________________________
core (MyLayer) (None, 1) 3
_________________________________________________________________
fc3 (Dense) (None, 1) 2
=================================================================
Total params: 5,933
Trainable params: 5,933
Non-trainable params: 0
_________________________________________________________________
c. Random weights, trainable
m3 = make_test_cases(w_vec=None, allow_training=True)
will give you
randomInit-trainable [[ 0.02650297 -0.02010062 -0.03771694]]
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) (None, 75) 0
_________________________________________________________________
fc1 (Dense) (None, 75) 5700
_________________________________________________________________
fc2 (Dense) (None, 3) 228
_________________________________________________________________
core (MyLayer) (None, 1) 3
_________________________________________________________________
fc3 (Dense) (None, 1) 2
=================================================================
Total params: 5,933
Trainable params: 5,933
Non-trainable params: 0
_________________________________________________________________
Final remark
It is unclear in advance which case will work better for your problem, but trying all three sounds like a good plan.
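The arithmetic performed in the call method of the custom layer can be checked outside Keras. The following is a minimal NumPy sketch of the same two steps (elementwise scaling, then a product over the feature axis). The function name weighted_product and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def weighted_product(h, w_vec):
    """NumPy analogue of the two steps in MyLayer.call (illustrative only)."""
    elem_prod = h * w_vec                                # [H21*w1, H22*w2, H23*w3], shape (batch, 3)
    return np.prod(elem_prod, axis=-1, keepdims=True)    # (H21*w1)*(H22*w2)*(H23*w3), shape (batch, 1)

# Two made-up samples of the 3-unit hidden layer output.
h = np.array([[0.5, -0.2, 0.1],
              [1.0,  1.0, 1.0]])
w_vec = np.array([[1.0, 2.0, 3.0]])   # constant weights, shape (1, 3)

out = weighted_product(h, w_vec)
print(out.shape)
```

One thing to keep in mind: any zero entry in w_vec forces the product to zero for every input. This is the case for the np.arange(3) weights used in test cases a and b above.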

Related

Dense layer does not give expected Output shape

I am trying to copy a model architecture. In the original architecture, the last Dense layer has output shape (None, 3) with 300 params, as shown:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_Dense1 (Dense) (None, 100) 128100
dense_Dense2 (Dense) (None, 3) 300
But when I apply the Dense layer, I get output shape (None, 3) with 303 params, as shown below:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_35 (Dense) (None, 100) 128100
dense_36 (Dense) (None, 3) 303
This is the code I wrote for this part:
x = GlobalAveragePooling2D()(x)
x = Dense(100, activation="relu")(x)
prediction = Dense(3, activation='softmax')(x)
Is it possible that the architecture you're trying to copy doesn't use bias? Try not using bias:
Dense(3, activation='softmax', use_bias=False)
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
global_average_pooling2d_3 ( (None, 8) 0
_________________________________________________________________
dense_9 (Dense) (None, 100) 900
_________________________________________________________________
dense_10 (Dense) (None, 3) 300
=================================================================
Total params: 1,200
Trainable params: 1,200
Non-trainable params: 0
_________________________________________________________________
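The difference between 303 and 300 is exactly the bias vector: a Dense layer has input_dim * units kernel weights plus units bias terms. A small sketch of that count follows. The helper name dense_param_count is hypothetical and not part of Keras.

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Parameter count of a fully connected layer: kernel plus optional bias."""
    kernel = input_dim * units
    bias = units if use_bias else 0
    return kernel + bias

with_bias = dense_param_count(100, 3)                     # Dense(3) on a 100-dim input
without_bias = dense_param_count(100, 3, use_bias=False)  # same layer with use_bias=False
print(with_bias, without_bias)
```

The same formula reproduces the 900 parameters of dense_9 in the summary above (8 * 100 + 100).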

Keras autoencoder model for detecting anomalies in text

I am trying to create an autoencoder that is capable of finding anomalies in text sequences:
X_train_pada_seq.shape
(28840, 999)
I want to use an Embedding layer. Here is my model:
encoder_inputs = Input(shape=(max_len_str, ))
encoder_emb = Embedding(input_dim=len(word_index)+1, output_dim=20, input_length=laenge_pads)(encoder_inputs)
encoder_LSTM_1 = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(encoder_emb)
encoder_drop = Dropout(0.2)(encoder_LSTM_1)
encoder_LSTM_2 = Bidirectional(GRU(200, activation='relu', return_sequences=False, name = 'bottleneck'))(encoder_drop)
decoder_repeated = RepeatVector(200)(encoder_LSTM_2)
decoder_LSTM = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(decoder_repeated)
decoder_drop = Dropout(0.2)(decoder_LSTM)
decoder_output = TimeDistributed(Dense(999, activation='softmax'))(decoder_drop)
autoencoder = Model(encoder_inputs, decoder_output)
autoencoder.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
autoencoder.summary()
Model: "model_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 999)] 0
_________________________________________________________________
embedding_19 (Embedding) (None, 999, 20) 159660
_________________________________________________________________
bidirectional (Bidirectional (None, 999, 800) 1347200
_________________________________________________________________
dropout (Dropout) (None, 999, 800) 0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 400) 1202400
_________________________________________________________________
repeat_vector (RepeatVector) (None, 200, 400) 0
_________________________________________________________________
bidirectional_2 (Bidirection (None, 200, 800) 2563200
_________________________________________________________________
dropout_1 (Dropout) (None, 200, 800) 0
_________________________________________________________________
time_distributed_6 (TimeDist (None, 200, 999) 800199
=================================================================
Total params: 6,072,659
Trainable params: 6,072,659
Non-trainable params: 0
But when training the model:
history = autoencoder.fit(X_train_pada_seq, X_train_pada_seq, epochs=10, batch_size=64,
validation_data=(X_test_pada_seq, X_test_pada_seq))
I get an error:
ValueError: Shapes (None, 999) and (None, 200, 999) are incompatible
How can I change the model to fix the error?
Judging from your code snippet, the model output needs to match the target shape, which is (None, 999), but the output shape is (None, 200, 999).
You need to make the model's output shape match the target shape.
Try using tf.reduce_mean with axis=1 (this averages over the whole sequence):
decoder_drop = Dropout(0.2)(decoder_LSTM)
decoder_time = TimeDistributed(Dense(999, activation='softmax'))(decoder_drop)
decoder_output = tf.math.reduce_mean(decoder_time, axis=1)
This should let you fit the model.
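To see what the averaging does to the shapes, here is a minimal NumPy sketch. It uses random data and a small batch in place of the real decoder output, and it only illustrates the shape change, not whether averaging over timesteps is the right modeling choice.

```python
import numpy as np

batch, timesteps, width = 2, 200, 999
rng = np.random.default_rng(0)

# Stand-in for the TimeDistributed softmax output, shape (batch, 200, 999).
decoder_time = rng.random((batch, timesteps, width))
decoder_time /= decoder_time.sum(axis=-1, keepdims=True)  # make each row sum to 1, like softmax

# Averaging over axis=1 removes the timestep axis.
decoder_output = decoder_time.mean(axis=1)
print(decoder_output.shape)
```

Because each timestep's row sums to 1, the averaged rows also sum to 1, so the result still looks like a distribution over the last axis.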
Your last (output) layer should have this shape:
batchsize x 999 x 200 # 999 words, 200 is the dimension of each word
Currently the output of your model is
batchsize x 200 x 999
which is incorrect.
Use sparse categorical cross-entropy as the loss function; then it will work.

How to add more layers to existing model (eg. teachable machine application model)?

I'm trying to use the Google model from the Teachable Machine application (https://teachablemachine.withgoogle.com/) by adding a few more layers before the output layers.
When I retrain the model, it always returns this error:
ValueError: Input 0 of layer dense_25 is incompatible with the layer: expected axis -1 of input shape to have value 5 but received input with shape [20, 512]
Here's my approach:
When I retrain the model, it returns an error:
If I retrain the model without adding new layers, it works fine.
Can anybody advise what the issue is?
UPDATED ANSWER
If you want to add layers between two layers of a pre-trained model, it is not as straightforward as adding layers with the add method. Doing so results in unexpected behavior.
Analysis of the error:
If you build the model as below (as you specified):
model.layers[-1].add(Dense(512, activation ="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(32))
model.add(Dense(5))
output of model summary :
Model: "sequential_12"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential_9 (Sequential) (None, 1280) 410208
_________________________________________________________________
sequential_11 (Sequential) (None, 512) 131672
_________________________________________________________________
dense_12 (Dense) (None, 128) 768
_________________________________________________________________
dense_13 (Dense) (None, 32) 4128
_________________________________________________________________
dense_14 (Dense) (None, 5) 165
=================================================================
Total params: 546,941
Trainable params: 532,861
Non-trainable params: 14,080
_________________________________________________________________
Everything looks good here, but on a closer look:
for l in model.layers:
    print("layer : ", l.name, ", expects input of shape : ", l.input_shape)
output :
layer : sequential_9 , expects input of shape : (None, 224, 224, 3)
layer : sequential_11 , expects input of shape : (None, 1280)
layer : dense_12 , expects input of shape : (None, 5) <-- **PROBLEM**
layer : dense_13 , expects input of shape : (None, 128)
layer : dense_14 , expects input of shape : (None, 32)
The problem is that dense_12 expects an input of shape (None, 5), but it should expect (None, 512), since we added Dense(512) to sequential_11. A possible reason is that adding layers this way does not update some attributes, such as the output shape of sequential_11, so during the forward pass there is a mismatch between the output of sequential_11 and the input of dense_12 (dense_25 in your case).
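The parameter counts in the first summary already hint at this mismatch. A Dense layer with bias has input_dim * units + units parameters, so the input dimension a layer was built with can be recovered from its count. Here is a small sketch of that check. The helper dense_input_dim is hypothetical and not a Keras function.

```python
def dense_input_dim(param_count, units, use_bias=True):
    """Recover the input dimension a Dense layer was built with from its parameter count."""
    bias = units if use_bias else 0
    kernel = param_count - bias
    if kernel <= 0 or kernel % units != 0:
        raise ValueError("param_count is not consistent with a Dense layer of this size")
    return kernel // units

# dense_12 in the summary above: 128 units, 768 parameters.
built_for = dense_input_dim(768, 128)
print(built_for)
```

The 768 parameters of dense_12 correspond to an input dimension of 5, not 512, which agrees with the input shape printed by the loop.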
A possible workaround:
As for your question about adding layers between sequential_9 and sequential_11: you can add as many layers as you want between them, but always make sure that the output shape of the last added layer matches the input shape expected by sequential_11, which is 1280 in this case.
code :
sequential_1 = model.layers[0] # re-using pre-trained model
sequential_2 = model.layers[1]
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
inp_sequential_1 = Input(sequential_1.layers[0].input_shape[1:])
out_sequential_1 = sequential_1(inp_sequential_1)
#adding layers in between sequential_9 and sequential_11
out_intermediate = Dense(512, activation="relu")(out_sequential_1)
out_intermediate = Dense(128, activation ="relu")(out_intermediate)
out_intermediate = Dense(32, activation ="relu")(out_intermediate)
# always make sure to include a layer with output shape matching input shape of sequential 11, in this case 1280
out_intermediate = Dense(1280, activation ="relu")(out_intermediate)
output = sequential_2(out_intermediate) # output of intermediate layers are given to sequential_11
final_model = Model(inputs=inp_sequential_1, outputs=output)
output of model summary:
Model: "functional_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
sequential_9 (Sequential) (None, 1280) 410208
_________________________________________________________________
dense_15 (Dense) (None, 512) 655872
_________________________________________________________________
dense_16 (Dense) (None, 128) 65664
_________________________________________________________________
dense_17 (Dense) (None, 32) 4128
_________________________________________________________________
dense_18 (Dense) (None, 1280) 42240
_________________________________________________________________
sequential_11 (Sequential) (None, 5) 128600
=================================================================
Total params: 1,306,712
Trainable params: 1,292,632
Non-trainable params: 14,080
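As a sanity check on the summary above, the parameter counts of the four inserted Dense layers can be reproduced by hand, and the final width can be confirmed to be the 1280 that sequential_11 expects. This is a minimal sketch. The helper dense_stack_params is hypothetical, and it assumes every layer uses a bias.

```python
def dense_stack_params(input_dim, units_list):
    """Per-layer parameter counts (with bias) for a chain of Dense layers,
    plus the output width of the last layer."""
    counts = []
    for units in units_list:
        counts.append(input_dim * units + units)  # kernel + bias
        input_dim = units                         # this layer's output feeds the next one
    return counts, input_dim

# sequential_9 outputs 1280 features; the inserted layers are 512 -> 128 -> 32 -> 1280.
counts, final_dim = dense_stack_params(1280, [512, 128, 32, 1280])
print(counts, final_dim)
```

Adding these counts to the 410,208 parameters of sequential_9 and the 128,600 of sequential_11 gives the reported total of 1,306,712.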

What is the effect of using TimeDistributed layer wrapper?

Consider the following two models:
from tensorflow.python.keras.layers import Input, GRU, Dense, TimeDistributed
from tensorflow.python.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
dense = Dense(200, activation='softmax')
decoder_pred = TimeDistributed(dense)(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, None, 100) 0
_________________________________________________________________
gru (GRU) (None, None, 32) 12768
_________________________________________________________________
time_distributed (TimeDistri (None, None, 200) 6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
And the second model:
from tensorflow.python.keras.layers import Input, GRU, Dense
from tensorflow.python.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
decoder_pred = Dense(200, activation='softmax')(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, None, 100) 0
_________________________________________________________________
gru_1 (GRU) (None, None, 32) 12768
_________________________________________________________________
dense_1 (Dense) (None, None, 200) 6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
My question is: is the TimeDistributed layer wrapper doing anything in the first model? Do these two models differ in any way (given that their total number of params is identical)?
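For reference, in Keras 2.x a Dense layer applied to a 3D input operates on the last axis and shares one kernel across all timesteps. That is the same computation TimeDistributed(Dense) performs, and it would explain the identical parameter counts. The following NumPy sketch illustrates the equivalence with random weights. The softmax is omitted since it acts on the last axis in both cases, and the array names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, in_dim, units = 2, 5, 32, 200

x = rng.normal(size=(batch, timesteps, in_dim))   # stand-in for the GRU output
kernel = rng.normal(size=(in_dim, units))         # one shared weight matrix
bias = rng.normal(size=(units,))

# Dense on the whole 3D tensor: the matmul acts on the last axis.
dense_all = x @ kernel + bias

# "Time-distributed" version: apply the same weights to each timestep separately.
dense_per_step = np.stack(
    [x[:, t, :] @ kernel + bias for t in range(timesteps)], axis=1
)

same = np.allclose(dense_all, dense_per_step)
n_params = kernel.size + bias.size
print(same, n_params)
```

The parameter count, 32 * 200 + 200 = 6600, matches both summaries.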

Keras target dimensions mismatch

Attempting a single-label classification problem with num_classes = 73
Here's my simplified Keras model:
num_classes = 73
batch_size = 4
train_data_list = [training_file_names list here..]
validation_data_list = [ validation_file_names list here..]
training_generator = DataGenerator(train_data_list, batch_size, num_classes)
validation_generator = DataGenerator(validation_data_list, batch_size, num_classes)
model = Sequential()
model.add(Conv1D(32, 3, strides=1, input_shape=(15,120), activation="relu"))
model.add(Conv1D(16, 3, strides=1, activation="relu"))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy",optimizer=sgd,metrics=['accuracy'])
model.fit_generator(generator=training_generator, epochs=100,
validation_data=validation_generator)
Here's my DataGenerator's __get_item__ method:
def __get_item__(self):
    X = np.zeros((self.batch_size,15,120))
    y = np.zeros((self.batch_size, 1 ,self.n_classes))
    for i in range(self.batch_size):
        X_row = some_method_that_gives_X_of_15x20_dim()
        target = some_method_that_gives_target()
        one_hot = keras.utils.to_categorical(target, num_classes=self.n_classes)
        X[i] = X_row
        y[i] = one_hot
    return X, y
Since my X values are correctly returned with dimension (batch_size, 15, 120), I am not showing it here. My issue is with the y value returned.
y returned from this generator method has a shape of (batch_size, 1, 73) as one hot encoded label for the 73 classes, which I think is the correct shape to return.
However Keras gives the following error for the last layer:
ValueError: Error when checking target: expected dense_1 to have 2
dimensions, but got array with shape (4, 1, 73)
Since the batch size is 4, I think the target batch should also be 3-dimensional, (4, 1, 73). Why, then, is Keras expecting the last layer to have 2 dimensions?
Your model's summary shows that the output layer has only 2 dimensions, (None, 73):
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_7 (Conv1D) (None, 13, 32) 11552
_________________________________________________________________
conv1d_8 (Conv1D) (None, 11, 16) 1552
_________________________________________________________________
flatten_5 (Flatten) (None, 176) 0
_________________________________________________________________
dense_4 (Dense) (None, 73) 12921
=================================================================
Total params: 26,025
Trainable params: 26,025
Non-trainable params: 0
_________________________________________________________________
Since the shape of your target is (batch_size, 1, 73), you can just change it to (batch_size, 73) for your model to run.
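One way to make that change, sketched here with NumPy and made-up labels, is to drop the singleton axis from the target array. The label values are hypothetical and only serve to build a one-hot batch of the same shape as in the question.

```python
import numpy as np

batch_size, n_classes = 4, 73
labels = [3, 10, 42, 72]  # made-up class indices, one per sample

# Target as built in the question: shape (batch_size, 1, n_classes).
y = np.zeros((batch_size, 1, n_classes))
y[np.arange(batch_size), 0, labels] = 1.0

# Remove the middle axis so the target matches the (None, 73) model output.
y_2d = np.squeeze(y, axis=1)
print(y_2d.shape)
```

Alternatively, the generator could allocate y as np.zeros((self.batch_size, self.n_classes)) from the start, so that no reshaping is needed afterwards.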
