Dense layer does not give expected output shape

I am trying to copy a model architecture. In the original architecture, the output shape after the last Dense layer is (None, 3) with 300 params, as shown:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_Dense1 (Dense) (None, 100) 128100
dense_Dense2 (Dense) (None, 3) 300
But when I apply the Dense layer, the output shape I get is (None, 3) with 303 params, as shown below:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_35 (Dense) (None, 100) 128100
dense_36 (Dense) (None, 3) 303
This is the code I wrote for this part:
x = GlobalAveragePooling2D()(x)
x = Dense(100, activation="relu")(x)
prediction = Dense(3, activation='softmax')(x)

Is it possible that the architecture you're trying to copy doesn't use a bias? A Dense(3) layer on a 100-dimensional input has 100 x 3 = 300 weights plus 3 bias terms, which accounts for your 303 versus the 300 in the original. Try not using bias:
Dense(3, activation='softmax', use_bias=False)
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
global_average_pooling2d_3 ( (None, 8) 0
_________________________________________________________________
dense_9 (Dense) (None, 100) 900
_________________________________________________________________
dense_10 (Dense) (None, 3) 300
=================================================================
Total params: 1,200
Trainable params: 1,200
Non-trainable params: 0
_________________________________________________________________
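For reference, the arithmetic behind the two counts, as a minimal sketch (the 8x8 spatial size of the input feature map here is just an assumption; only the 1280 channels follow from the first Dense's 128,100 params, since 1280 x 100 + 100 = 128,100):
from tensorflow.keras.layers import Input, GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

inp = Input((8, 8, 1280))                                          # hypothetical feature map
x = GlobalAveragePooling2D()(inp)                                  # (None, 1280)
x = Dense(100, activation='relu')(x)                               # 1280*100 + 100 = 128,100 params
with_bias = Dense(3, activation='softmax')(x)                      # 100*3 + 3 = 303 params
without_bias = Dense(3, activation='softmax', use_bias=False)(x)   # 100*3 = 300 params

Model(inp, with_bias).summary()
Model(inp, without_bias).summary()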

Related

Keras autoencoder model to detect anomalies in text

I am trying to create an autoencoder that is capable of finding anomalies in text sequences:
X_train_pada_seq.shape
(28840, 999)
I want to use an Embedding layer. Here is my model:
encoder_inputs = Input(shape=(max_len_str, ))
encoder_emb = Embedding(input_dim=len(word_index)+1, output_dim=20, input_length=laenge_pads)(encoder_inputs)
encoder_LSTM_1 = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(encoder_emb)
encoder_drop = Dropout(0.2)(encoder_LSTM_1)
encoder_LSTM_2 = Bidirectional(GRU(200, activation='relu', return_sequences=False, name = 'bottleneck'))(encoder_drop)
decoder_repeated = RepeatVector(200)(encoder_LSTM_2)
decoder_LSTM = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(decoder_repeated)
decoder_drop = Dropout(0.2)(decoder_LSTM)
decoder_output = TimeDistributed(Dense(999, activation='softmax'))(decoder_drop)
autoencoder = Model(encoder_inputs, decoder_output)
autoencoder.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
autoencoder.summary()
Model: "model_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 999)] 0
_________________________________________________________________
embedding_19 (Embedding) (None, 999, 20) 159660
_________________________________________________________________
bidirectional (Bidirectional (None, 999, 800) 1347200
_________________________________________________________________
dropout (Dropout) (None, 999, 800) 0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 400) 1202400
_________________________________________________________________
repeat_vector (RepeatVector) (None, 200, 400) 0
_________________________________________________________________
bidirectional_2 (Bidirection (None, 200, 800) 2563200
_________________________________________________________________
dropout_1 (Dropout) (None, 200, 800) 0
_________________________________________________________________
time_distributed_6 (TimeDist (None, 200, 999) 800199
=================================================================
Total params: 6,072,659
Trainable params: 6,072,659
Non-trainable params: 0
But when training the model:
history = autoencoder.fit(X_train_pada_seq, X_train_pada_seq, epochs=10, batch_size=64,
validation_data=(X_test_pada_seq, X_test_pada_seq))
I get an error:
ValueError: Shapes (None, 999) and (None, 200, 999) are incompatible
How to remake the model to fix the error?
I've seen your code snippet, and it seems your model's output needs to match your target shape, which is (None, 999), but your output shape is (None, 200, 999).
You need to make your model's output shape match the target shape.
Try using tf.reduce_mean with axis=1 (it averages over the sequence dimension):
decoder_drop = Dropout(0.2)(decoder_LSTM)
decoder_time = TimeDistributed(Dense(999, activation='softmax'))(decoder_drop)
decoder_output = tf.math.reduce_mean(decoder_time, axis=1)
This should let you fit the model.
Your last layer (the output) should have shape batch_size x 999 x 200 (999 words, 200 being the dimension of each word). Currently the output of your model is batch_size x 200 x 999, which is incorrect. Use sparse categorical crossentropy as the loss function; then it will work.
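Putting that advice together, here is a minimal sketch of one way to follow it (it reuses max_len_str and word_index from the question and assumes the per-time-step softmax should run over the vocabulary, i.e. len(word_index) + 1 classes, which is what sparse categorical crossentropy expects when the integer-encoded sequences are used directly as targets):
from tensorflow import keras
from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM, GRU,
                                     Dropout, RepeatVector, TimeDistributed, Dense)
from tensorflow.keras.models import Model

vocab_size = len(word_index) + 1                 # same vocabulary as the Embedding above

encoder_inputs = Input(shape=(max_len_str,))
x = Embedding(input_dim=vocab_size, output_dim=20)(encoder_inputs)
x = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(x)
x = Dropout(0.2)(x)
bottleneck = Bidirectional(GRU(200, activation='relu', return_sequences=False, name='bottleneck'))(x)

x = RepeatVector(max_len_str)(bottleneck)        # repeat over the 999 time steps, not 200
x = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(x)
x = Dropout(0.2)(x)
decoder_output = TimeDistributed(Dense(vocab_size, activation='softmax'))(x)   # (None, 999, vocab_size)

autoencoder = Model(encoder_inputs, decoder_output)
autoencoder.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001),
                    loss='sparse_categorical_crossentropy',   # targets remain the (None, 999) integer sequences
                    metrics=['accuracy'])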

How to add more layers to an existing model (e.g. Teachable Machine application model)?

I'm trying to use the Google model from the Teachable Machine application (https://teachablemachine.withgoogle.com/) by adding a few more layers before the output layer.
When I retrain the model, it always returns this error:
ValueError: Input 0 of layer dense_25 is incompatible with the layer: expected axis -1 of input shape to have value 5 but received input with shape [20, 512]
Here's my approach:
When I retrain the model, it returns the error above.
If I retrain the model without adding new layers, it works fine.
Can anybody advise what the issue is?
UPDATED ANSWER
If you want to add layers between two layers of a pre-trained model, it is not as straightforward as adding layers with the add method; doing so results in unexpected behavior.
Analysis of the error:
If you build the model as below (as you specified):
model.layers[-1].add(Dense(512, activation ="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(32))
model.add(Dense(5))
Output of model summary:
Model: "sequential_12"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential_9 (Sequential) (None, 1280) 410208
_________________________________________________________________
sequential_11 (Sequential) (None, 512) 131672
_________________________________________________________________
dense_12 (Dense) (None, 128) 768
_________________________________________________________________
dense_13 (Dense) (None, 32) 4128
_________________________________________________________________
dense_14 (Dense) (None, 5) 165
=================================================================
Total params: 546,941
Trainable params: 532,861
Non-trainable params: 14,080
_________________________________________________________________
Everything looks good here, but take a closer look:
for l in model.layers:
    print("layer : ", l.name, ", expects input of shape : ", l.input_shape)
Output:
layer : sequential_9 , expects input of shape : (None, 224, 224, 3)
layer : sequential_11 , expects input of shape : (None, 1280)
layer : dense_12 , expects input of shape : (None, 5) <-- **PROBLEM**
layer : dense_13 , expects input of shape : (None, 128)
layer : dense_14 , expects input of shape : (None, 32)
The PROBLEM here is that dense_12 expects an input of shape (None, 5), but it should expect (None, 512), since we added Dense(512) to sequential_11. A possible reason is that adding layers this way does not update a few attributes, such as the output shape of sequential_11, so during the forward pass there is a mismatch between the output of sequential_11 and the input of dense_12 (dense_25 in your case).
A possible workaround:
For your question about "adding layers in between sequential_9 and sequential_11": you can add as many layers as you want between them, but always make sure the output shape of the last added layer matches the input shape expected by sequential_11; in this case that is 1280.
Code:
sequential_1 = model.layers[0] # re-using pre-trained model
sequential_2 = model.layers[1]
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
inp_sequential_1 = Input(sequential_1.layers[0].input_shape[1:])
out_sequential_1 = sequential_1(inp_sequential_1)
#adding layers in between sequential_9 and sequential_11
out_intermediate = Dense(512, activation="relu")(out_sequential_1)
out_intermediate = Dense(128, activation ="relu")(out_intermediate)
out_intermediate = Dense(32, activation ="relu")(out_intermediate)
# always make sure to include a layer with output shape matching input shape of sequential 11, in this case 1280
out_intermediate = Dense(1280, activation ="relu")(out_intermediate)
output = sequential_2(out_intermediate) # output of intermediate layers are given to sequential_11
final_model = Model(inputs=inp_sequential_1, outputs=output)
Output of model summary:
Model: "functional_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
sequential_9 (Sequential) (None, 1280) 410208
_________________________________________________________________
dense_15 (Dense) (None, 512) 655872
_________________________________________________________________
dense_16 (Dense) (None, 128) 65664
_________________________________________________________________
dense_17 (Dense) (None, 32) 4128
_________________________________________________________________
dense_18 (Dense) (None, 1280) 42240
_________________________________________________________________
sequential_11 (Sequential) (None, 5) 128600
=================================================================
Total params: 1,306,712
Trainable params: 1,292,632
Non-trainable params: 14,080
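As a sanity check (a small sketch reusing the inspection loop from the analysis above), the same loop can be run on final_model; every layer should now report an input shape that matches the previous layer's output:
for l in final_model.layers:
    print("layer : ", l.name, ", expects input of shape : ", l.input_shape)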

How to do custom Keras layer matrix multiplication

Layers:
Input shape (None,75)
Hidden layer 1 - shape is (75,3)
Hidden layer 2 - shape is (3,1)
For the last layer, the output must be calculated as ((H21*w1)*(H22*w2)*(H23*w3)), where H21, H22, H23 are the outputs of hidden layer 2 and w1, w2, w3 are constant, non-trainable weights. How do I write a Lambda function for this?
def product(X):
    return X[0] * X[1]

keras_model = Sequential()
keras_model.add(Dense(75, input_dim=75, activation='tanh', name="layer1"))
keras_model.add(Dense(3, activation='tanh', name="layer2"))
keras_model.add(Dense(1, name="layer3"))
cross1 = keras_model.add(Lambda(lambda x: product, output_shape=(1, 1)))([layer2, layer3])
print(cross1)
NameError: name 'layer2' is not defined
Use the functional API model
inputs = Input((75,)) #shape (batch, 75)
output1 = Dense(75, activation='tanh',name="layer1" )(inputs) #shape (batch, 75)
output2 = Dense(3 ,activation='tanh',name="layer2" )(output1) #shape (batch, 3)
output3 = Dense(1,name="layer3")(output2) #shape (batch, 1)
cross1 = Lambda(lambda x: x[0] * x[1])([output2, output3]) #shape (batch, 3)
model = Model(inputs, cross1)
Please notice that the shapes are totally different from what you expect.
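For completeness, the constant-weight product described in the question can also be written directly inside a Lambda; a minimal sketch, assuming illustrative constants w1=1, w2=2, w3=3 (note, as explained in the next answer, that constants baked into a Lambda are not saved as part of the model):
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

w_vec = K.constant([[1.0, 2.0, 3.0]])   # hypothetical fixed weights w1, w2, w3

inputs = Input((75,))
h1 = Dense(75, activation='tanh', name='layer1')(inputs)
h2 = Dense(3, activation='tanh', name='layer2')(h1)                       # H21, H22, H23
# multiply elementwise by the constants, then take the product over the feature axis
cross = Lambda(lambda t: K.prod(t * w_vec, axis=-1, keepdims=True))(h2)   # shape (batch, 1)
model = Model(inputs, cross)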
I suggest you do it via a customized layer instead of a Lambda layer. Why? A customized layer gives you more freedom, and it is also more transparent in terms of viewing your desired weights. More precisely, if you do it through a Lambda layer, the constant weights will not be saved as part of the model, but they will be if you use a customized layer.
Here is an example
from keras import backend as K
from keras.layers import *
from keras.models import *
import numpy as np
class MyLayer(Layer):
    # see https://keras.io/layers/writing-your-own-keras-layers/
    def __init__(self,
                 w_vec=None,
                 allow_training=False,
                 **kwargs):
        self._w_vec = w_vec
        assert allow_training or (w_vec is not None), \
            "ERROR: non-trainable w_vec must be initialized"
        self.allow_training = allow_training
        super().__init__(**kwargs)
        return

    def build(self, input_shape):
        batch_size, num_feats = input_shape
        self.w_vec = self.add_weight(shape=(1, num_feats),
                                     name='w_vec',
                                     initializer='uniform',  # <- use your own preferred initializer
                                     trainable=self.allow_training,)
        if self._w_vec is not None:
            # predefined w_vec
            assert self._w_vec.shape[1] == num_feats, \
                "ERROR: initial w_vec shape mismatches the input shape"
            # set it to the weight
            self.set_weights([self._w_vec])  # <- set weights to the supplied one
        super().build(input_shape)
        return

    def call(self, x):
        # Given:
        #   x = [H21, H22, H23]
        #   w_vec = [w1, w2, w3]
        # Step 1: output elem_prod
        #   elem_prod = [H21*w1, H22*w2, H23*w3]
        elem_prod = x * self.w_vec
        # Step 2: output ret
        #   ret = (H21*w1) * (H22*w2) * (H23*w3)
        ret = K.prod(elem_prod, axis=-1, keepdims=True)
        return ret

    def compute_output_shape(self, input_shape):
        return (input_shape[0], 1)

def make_test_cases(w_vec=None, allow_training=False):
    x = Input(shape=(75,))
    y = Dense(75, activation='tanh', name='fc1')(x)
    y = Dense(3, activation='tanh', name='fc2')(y)
    y = MyLayer(w_vec, allow_training, name='core')(y)
    y = Dense(1, name='fc3')(y)
    net = Model(inputs=x, outputs=y,
                name='{}-{}'.format('randomInit' if w_vec is None else 'assignInit',
                                    'trainable' if allow_training else 'nontrainable'))
    print(net.name)
    print(net.layers[-2].get_weights()[0])
    print(net.summary())
    return net
You may run the following test cases to see the differences (pay attention to the first and last lines in the printout, which give you the initial values and the number of constant parameters, respectively).
a. Constant weights, non-trainable
m1 = make_test_cases(w_vec=np.arange(3).reshape([1,3]), allow_training=False)
will give you
assignInit-nontrainable [[0. 1. 2.]]
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) (None, 75) 0
_________________________________________________________________
fc1 (Dense) (None, 75) 5700
_________________________________________________________________
fc2 (Dense) (None, 3) 228
_________________________________________________________________
core (MyLayer) (None, 1) 3
_________________________________________________________________
fc3 (Dense) (None, 1) 2
=================================================================
Total params: 5,933
Trainable params: 5,930
Non-trainable params: 3
_________________________________________________________________
b. Constant weights, trainable
m2 = make_test_cases(w_vec=np.arange(3).reshape([1,3]), allow_training=True)
will give you
assignInit-trainable [[0. 1. 2.]]
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) (None, 75) 0
_________________________________________________________________
fc1 (Dense) (None, 75) 5700
_________________________________________________________________
fc2 (Dense) (None, 3) 228
_________________________________________________________________
core (MyLayer) (None, 1) 3
_________________________________________________________________
fc3 (Dense) (None, 1) 2
=================================================================
Total params: 5,933
Trainable params: 5,933
Non-trainable params: 0
_________________________________________________________________
c. Random weights, trainable
m3 = make_test_cases(w_vec=None, allow_training=True)
will give you
randomInit-trainable [[ 0.02650297 -0.02010062 -0.03771694]]
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) (None, 75) 0
_________________________________________________________________
fc1 (Dense) (None, 75) 5700
_________________________________________________________________
fc2 (Dense) (None, 3) 228
_________________________________________________________________
core (MyLayer) (None, 1) 3
_________________________________________________________________
fc3 (Dense) (None, 1) 2
=================================================================
Total params: 5,933
Trainable params: 5,933
Non-trainable params: 0
_________________________________________________________________
Final remark
It is hard to say in advance which case will work best for your problem, but trying all three sounds like a good plan.
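A quick usage sketch: after building any of the variants, a forward pass yields one value per sample, as defined by compute_output_shape:
import numpy as np

m = make_test_cases(w_vec=np.arange(3).reshape([1, 3]), allow_training=False)
print(m.predict(np.random.rand(4, 75)).shape)   # (4, 1): one product value per sample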

What is the effect of using TimeDistributed layer wrapper?

Consider the following two models:
from tensorflow.python.keras.layers import Input, GRU, Dense, TimeDistributed
from tensorflow.python.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
dense = Dense(200, activation='softmax')
decoder_pred = TimeDistributed(dense)(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, None, 100) 0
_________________________________________________________________
gru (GRU) (None, None, 32) 12768
_________________________________________________________________
time_distributed (TimeDistri (None, None, 200) 6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
And the second model:
from tensorflow.python.keras.layers import Input, GRU, Dense
from tensorflow.python.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
decoder_pred = Dense(200, activation='softmax')(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, None, 100) 0
_________________________________________________________________
gru_1 (GRU) (None, None, 32) 12768
_________________________________________________________________
dense_1 (Dense) (None, None, 200) 6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
My question is, is the TimeDistributed layer wrapper doing anything to the first model? Are these two different in any aspect (considering that their total number of params is identical)?
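One way to convince yourself that the wrapper changes nothing here: Dense applied to a 3D tensor acts only on the last axis, which is exactly what TimeDistributed(Dense) does. A small sketch (batch size and sequence length chosen arbitrarily) that gives both heads identical weights and compares their outputs:
import numpy as np
from tensorflow.keras.layers import Input, GRU, Dense, TimeDistributed
from tensorflow.keras.models import Model

inputs = Input(shape=(None, 100))                # variable-length sequences of 100-dim vectors
gru_out = GRU(32, return_sequences=True)(inputs)

dense = Dense(200, activation='softmax')
td_out = TimeDistributed(dense)(gru_out)

plain_dense = Dense(200, activation='softmax')
plain_out = plain_dense(gru_out)

model = Model(inputs=inputs, outputs=[td_out, plain_out])
plain_dense.set_weights(dense.get_weights())     # copy kernel and bias

x = np.random.rand(2, 7, 100).astype('float32')  # arbitrary batch of 2 sequences of length 7
y_td, y_plain = model.predict(x)
print(np.allclose(y_td, y_plain))                # True: the wrapper has no numerical effect here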

Fine-tuning a model deletes previously added layers

I use Keras 2.2.4. I train an image-classification model that I want to fine-tune every 30 epochs with new data.
Every day I add more images to the classes to feed the model, and every 30 epochs the model is re-trained.
I use two branches: the first if no model has been trained yet, and the second when a model already exists and I want to fine-tune it with the new content/classes.
model_base = keras.applications.vgg19.VGG19(include_top=False, input_shape=(*IMG_SIZE, 3), weights='imagenet')
output = GlobalAveragePooling2D()(model_base.output)

# If we resume a pretrained model, load it
if os.path.isfile(os.path.join(MODEL_PATH, 'weights.h5')):
    print('Using existing weights...')
    base_lr = 0.0001
    model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
    output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
    model = Model(model_base.input, output)
    for layer in model_base.layers[:-2]:
        layer.trainable = False
else:
    base_lr = 0.001
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(2048, activation='relu', name='d1')(output)
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
    model = Model(model_base.input, output)
    for layer in model_base.layers[:-5]:
        layer.trainable = False

opt = optimizers.Adam(lr=base_lr, decay=base_lr / epochs)
model.compile(optimizer=opt,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Model summary first time:
...
_________________________________________________________________
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 512) 2048
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
d1 (Dense) (None, 2048) 1050624
_________________________________________________________________
batch_normalization_2 (Batch (None, 2048) 8192
_________________________________________________________________
dropout_2 (Dropout) (None, 2048) 0
_________________________________________________________________
d2 (Dense) (None, 19) 38931
=================================================================
Total params: 21,124,179
Trainable params: 10,533,907
Non-trainable params: 10,590,272
Model summary second time:
...
_________________________________________________________________
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
d2 (Dense) (None, 19) 9747
=================================================================
Total params: 20,034,131
Trainable params: 2,369,555
Non-trainable params: 17,664,576
Problem: when a model exists and is loaded for fine-tuning, it seems to have lost all the additional layers added the first time (Dense 2048, Dropout, etc.).
Do I need to add these layers again? That seems to make no sense, as it would lose the training information from the first pass.
Note: I may not need to set base_lr, since saving a model should also save the learning rate at the state where it stopped, but I will check this later.
Please note that once you load the model:
model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
you don't use it; you just overwrite it again:
model = Model(model_base.input, output)
where output is also defined as an operation on model_base.
It seems to me that you just want to delete the lines after load_model.
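Under that reading, a minimal sketch of the resume branch (reusing MODEL_PATH, base_lr and epochs from the question): the loaded model, which already contains the GlobalAveragePooling, BatchNormalization, Dropout, d1 and d2 layers, is kept and only recompiled for fine-tuning.
import os
from keras.models import load_model
from keras import optimizers

if os.path.isfile(os.path.join(MODEL_PATH, 'weights.h5')):
    print('Using existing weights...')
    base_lr = 0.0001
    # Reuse the saved model as-is; it already contains the head added the first time.
    model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
else:
    base_lr = 0.001
    # ... build the model from model_base as in the original else-branch

opt = optimizers.Adam(lr=base_lr, decay=base_lr / epochs)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])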
