How to get correct shape of predict model? - machine-learning

The model uses an LSTM to predict future height and weight.
Example of my dataset:
Train X :
<table border="1">
<tr>
<th>Height(cm)</th>
<th>Weight(kg)</th>
</tr>
<tr>
<td>180</td>
<td>88</td>
</tr>
<tr>
<td>181</td>
<td>77</td>
</tr>
<tr>
<td>182</td>
<td>80</td>
</tr>
<tr>
<td>183</td>
<td>79</td>
</tr>
</table>
Train Y :
<table border="1">
<tr>
<th>Height(cm)</th>
<th>Weight(kg)</th>
</tr>
<tr>
<td>182</td>
<td>86</td>
</tr>
</table>
This is just an example.
I reshaped the dataset to 3D with this code:
xtrain = np.reshape(xtrain, (xtrain.shape[0], xtrain.shape[1], 2))
The result: xtrain.shape = (82, 4, 2)
ytrain.shape = (82, 1, 2)
This is the output of model.summary():
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 4, 50) 10600
_________________________________________________________________
lstm_1 (LSTM) (None, 4, 50) 20200
_________________________________________________________________
dropout (Dropout) (None, 4, 50) 0
_________________________________________________________________
dense (Dense) (None, 4, 2) 102
=================================================================
Total params: 30,902
Trainable params: 30,902
Non-trainable params: 0
_________________________________________________________________
None
I want the model's output shape to be (None, 1, 2).
How should the model be changed?
model=Sequential()
model.add(LSTM(units=50,return_sequences=True,kernel_initializer='glorot_uniform',input_shape=(xtrain.shape[1],2)))
model.add(LSTM(units=50,kernel_initializer='glorot_uniform',return_sequences=True))
model.add(Dropout(0.2))
model.add(Dense(units=2))
model.compile(optimizer='adam',loss='mean_squared_error')
model.fit(xtrain,ytrain,batch_size=4,epochs=1)

Did you experiment with a Flatten layer? From the documentation:
Note: If inputs are shaped (batch,) without a feature axis, then flattening adds an extra channel dimension and output shape is (batch, 1).
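As a complement to the Flatten suggestion, the shape bookkeeping can be checked in plain NumPy. The sketch below is not Keras code: random arrays stand in for layer outputs, and dense_w / dense_b are made-up weights. It assumes the second LSTM is switched to return_sequences=False (so it emits only the last time step), followed by Dense(2) and a reshape to (1, 2). In Keras, a Reshape((1, 2)) layer would play the role of the final step.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, units, n_out = 82, 4, 50, 2

# Stand-in for the second LSTM's per-timestep output with return_sequences=True.
seq_out = rng.normal(size=(batch, timesteps, units))

# With return_sequences=False an LSTM emits only the last time step.
last_step = seq_out[:, -1, :]                      # (82, 50)

# Hypothetical Dense(2) weights, only here to follow the shapes.
dense_w = rng.normal(size=(units, n_out))
dense_b = np.zeros(n_out)
dense_out = last_step @ dense_w + dense_b          # (82, 2)

# Add the length-1 time axis so the prediction matches ytrain's (82, 1, 2).
pred = dense_out.reshape(batch, 1, n_out)

print(last_step.shape, dense_out.shape, pred.shape)
```

Whether predicting from the last time step alone suits the data is a modelling choice; the sketch only shows that the shapes then agree with ytrain.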

Related

Dense layer does not give expected Output shape

I am trying to copy a model architecture. In the original architecture, the output shape after the last Dense layer is (None, 3) with 300 params, as shown:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_Dense1 (Dense) (None, 100) 128100
dense_Dense2 (Dense) (None, 3) 300
But when I apply the Dense layer, the output shape I get is (None, 3) with 303 params, as shown below:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_35 (Dense) (None, 100) 128100
dense_36 (Dense) (None, 3) 303
This is the code I wrote for this part:
x = GlobalAveragePooling2D()(x)
x = Dense(100, activation="relu")(x)
prediction = Dense(3, activation='softmax')(x)
Is it possible that the architecture you're trying to copy doesn't use bias? Try not using bias:
Dense(3, activation='softmax', use_bias=False)
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
global_average_pooling2d_3 ( (None, 8) 0
_________________________________________________________________
dense_9 (Dense) (None, 100) 900
_________________________________________________________________
dense_10 (Dense) (None, 3) 300
=================================================================
Total params: 1,200
Trainable params: 1,200
Non-trainable params: 0
_________________________________________________________________
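The difference between 300 and 303 follows from how a Dense layer's parameters are counted: inputs times units for the kernel, plus one bias per unit when use_bias=True. A small sketch in plain Python (the helper name dense_params is made up for illustration):

```python
def dense_params(n_inputs: int, units: int, use_bias: bool = True) -> int:
    """Number of trainable parameters in a fully connected layer."""
    kernel = n_inputs * units
    bias = units if use_bias else 0
    return kernel + bias

print(dense_params(100, 3))                   # 303
print(dense_params(100, 3, use_bias=False))   # 300
print(dense_params(8, 100))                   # 900, as in dense_9 above
```

The same formula gives 128100 for a Dense(100) layer fed with 1280 features, which matches the first Dense layer in both summaries of the question.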

Keras autoencoder model for detect anomaly in text

I am trying to create an autoencoder that is capable of finding anomalies in text sequences:
X_train_pada_seq.shape
(28840, 999)
I want to use an Embedding layer. Here is my model:
encoder_inputs = Input(shape=(max_len_str, ))
encoder_emb = Embedding(input_dim=len(word_index)+1, output_dim=20, input_length=laenge_pads)(encoder_inputs)
encoder_LSTM_1 = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(encoder_emb)
encoder_drop = Dropout(0.2)(encoder_LSTM_1)
encoder_LSTM_2 = Bidirectional(GRU(200, activation='relu', return_sequences=False, name = 'bottleneck'))(encoder_drop)
decoder_repeated = RepeatVector(200)(encoder_LSTM_2)
decoder_LSTM = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(decoder_repeated)
decoder_drop = Dropout(0.2)(decoder_LSTM)
decoder_output = TimeDistributed(Dense(999, activation='softmax'))(decoder_drop)
autoencoder = Model(encoder_inputs, decoder_output)
autoencoder.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
autoencoder.summary()
Model: "model_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 999)] 0
_________________________________________________________________
embedding_19 (Embedding) (None, 999, 20) 159660
_________________________________________________________________
bidirectional (Bidirectional (None, 999, 800) 1347200
_________________________________________________________________
dropout (Dropout) (None, 999, 800) 0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 400) 1202400
_________________________________________________________________
repeat_vector (RepeatVector) (None, 200, 400) 0
_________________________________________________________________
bidirectional_2 (Bidirection (None, 200, 800) 2563200
_________________________________________________________________
dropout_1 (Dropout) (None, 200, 800) 0
_________________________________________________________________
time_distributed_6 (TimeDist (None, 200, 999) 800199
=================================================================
Total params: 6,072,659
Trainable params: 6,072,659
Non-trainable params: 0
But when training the model:
history = autoencoder.fit(X_train_pada_seq, X_train_pada_seq, epochs=10, batch_size=64,
validation_data=(X_test_pada_seq, X_test_pada_seq))
I get an error:
ValueError: Shapes (None, 999) and (None, 200, 999) are incompatible
How can I change the model to fix the error?
Your model's output shape needs to match your target shape, which is (None, 999), but the output shape is (None, 200, 999).
Try using tf.reduce_mean with axis=1 (this averages over the whole sequence):
decoder_drop = Dropout(0.2)(decoder_LSTM)
decoder_time = TimeDistributed(Dense(999, activation='softmax'))(decoder_drop)
decoder_output = tf.math.reduce_mean(decoder_time, axis=1)
This should let you fit the model.
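The effect of averaging over axis 1 on the shapes can be seen in NumPy with scaled-down dimensions (the sizes below are placeholders, not the question's 200 and 999). This only shows that the shapes line up; each averaged row is still a probability distribution because it is a mean of softmax outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, classes = 2, 5, 7   # stand-ins for (None, 200, 999)

logits = rng.normal(size=(batch, steps, classes))
# Softmax over the last axis, like the TimeDistributed(Dense(..., 'softmax')) output.
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

# Same shape effect as tf.math.reduce_mean(decoder_time, axis=1).
averaged = probs.mean(axis=1)

print(probs.shape, "->", averaged.shape)
```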
Your last (output) layer should have this shape:
batchsize x 999 x 200  # 999 words, 200 is the dimension of each word
Currently the output of your model is
batchsize x 200 x 999
which is incorrect.
Use sparse categorical cross-entropy as the loss function; then it will work.
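To illustrate why sparse categorical cross-entropy fits here: it pairs integer targets of shape (batch, timesteps) with predictions of shape (batch, timesteps, classes), so the targets need no one-hot encoding. A NumPy sketch with small placeholder sizes; the function sparse_cce is written for illustration and is not the Keras implementation:

```python
import numpy as np

def sparse_cce(y_true: np.ndarray, probs: np.ndarray, eps: float = 1e-7) -> float:
    """Mean negative log-probability of the true class at every position."""
    b_idx, t_idx = np.indices(y_true.shape)
    picked = probs[b_idx, t_idx, y_true]        # (batch, timesteps)
    return float(-np.log(np.clip(picked, eps, 1.0)).mean())

batch, timesteps, classes = 2, 4, 6
y_true = np.array([[0, 1, 2, 3],
                   [5, 4, 3, 2]])               # integer token ids
uniform = np.full((batch, timesteps, classes), 1.0 / classes)

loss = sparse_cce(y_true, uniform)
print(loss)   # equals log(6) for a uniform prediction
```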

How to add more layers to existing model (eg. teachable machine application model)?

I'm trying to use the Google model from the Teachable Machine application (https://teachablemachine.withgoogle.com/) by adding a few more layers before the output layers.
When I retrain the model, it always returns this error:
ValueError: Input 0 of layer dense_25 is incompatible with the layer: expected axis -1 of input shape to have value 5 but received input with shape [20, 512]
Here's my approach:
When I retrain the model, it returns the error shown above.
If I retrain the model without adding new layers, it works fine.
Can anybody advise what the issue is?
UPDATED ANSWER
If you want to add layers between two layers of a pre-trained model, it is not as straightforward as adding layers with the add method. Doing so results in unexpected behavior.
Analysis of the error:
If you build the model as below (as you specified):
model.layers[-1].add(Dense(512, activation ="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(32))
model.add(Dense(5))
Output of model summary:
Model: "sequential_12"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential_9 (Sequential) (None, 1280) 410208
_________________________________________________________________
sequential_11 (Sequential) (None, 512) 131672
_________________________________________________________________
dense_12 (Dense) (None, 128) 768
_________________________________________________________________
dense_13 (Dense) (None, 32) 4128
_________________________________________________________________
dense_14 (Dense) (None, 5) 165
=================================================================
Total params: 546,941
Trainable params: 532,861
Non-trainable params: 14,080
_________________________________________________________________
Everything looks good here, but on a closer look:
for l in model.layers:
    print("layer : ", l.name, ", expects input of shape : ", l.input_shape)
Output:
layer : sequential_9 , expects input of shape : (None, 224, 224, 3)
layer : sequential_11 , expects input of shape : (None, 1280)
layer : dense_12 , expects input of shape : (None, 5) <-- **PROBLEM**
layer : dense_13 , expects input of shape : (None, 128)
layer : dense_14 , expects input of shape : (None, 32)
The problem is that dense_12 expects an input of shape (None, 5), but it should expect an input of shape (None, 512), since we added Dense(512) to sequential_11. A possible reason is that adding layers as specified above does not update some attributes, such as the output shape of sequential_11. So during the forward pass there is a mismatch between the output of sequential_11 and the input of layer dense_12 (in your case dense_25).
A possible workaround:
For your question of "adding layers in between sequential_9 and sequential_11", you can add as many layers as you want between them, but always make sure that the output shape of the last added layer matches the input shape expected by sequential_11. In this case it is 1280.
Code:
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
sequential_1 = model.layers[0]  # re-using pre-trained model
sequential_2 = model.layers[1]
inp_sequential_1 = Input(sequential_1.layers[0].input_shape[1:])
out_sequential_1 = sequential_1(inp_sequential_1)
# adding layers in between sequential_9 and sequential_11
out_intermediate = Dense(512, activation="relu")(out_sequential_1)
out_intermediate = Dense(128, activation="relu")(out_intermediate)
out_intermediate = Dense(32, activation="relu")(out_intermediate)
# always make sure to include a layer with output shape matching input shape of sequential_11, in this case 1280
out_intermediate = Dense(1280, activation="relu")(out_intermediate)
output = sequential_2(out_intermediate)  # output of intermediate layers is given to sequential_11
final_model = Model(inputs=inp_sequential_1, outputs=output)
output of model summary:
Model: "functional_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
sequential_9 (Sequential) (None, 1280) 410208
_________________________________________________________________
dense_15 (Dense) (None, 512) 655872
_________________________________________________________________
dense_16 (Dense) (None, 128) 65664
_________________________________________________________________
dense_17 (Dense) (None, 32) 4128
_________________________________________________________________
dense_18 (Dense) (None, 1280) 42240
_________________________________________________________________
sequential_11 (Sequential) (None, 5) 128600
=================================================================
Total params: 1,306,712
Trainable params: 1,292,632
Non-trainable params: 14,080
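The mismatch can be reproduced without Keras by tracking each block's expected input width and produced output width. In the sketch below the helper check_chain and the (name, expected_in, out) tuples are illustrative; the numbers are taken from the summaries above, with sequential_11 expecting 1280 inputs.

```python
def check_chain(layers, input_dim):
    """Walk (name, expected_in, out) tuples; raise on the first mismatch."""
    current = input_dim
    for name, expected_in, out_dim in layers:
        if expected_in != current:
            raise ValueError(
                f"{name}: expected last axis {expected_in}, received {current}"
            )
        current = out_dim
    return current

# Working layout: the inserted Dense stack ends in 1280 units.
fixed = [
    ("dense_15", 1280, 512),
    ("dense_16", 512, 128),
    ("dense_17", 128, 32),
    ("dense_18", 32, 1280),
    ("sequential_11", 1280, 5),
]
print(check_chain(fixed, input_dim=1280))

# Failing layout: dense_12 was built for 5 inputs but receives 512.
broken = [("sequential_11", 1280, 512), ("dense_12", 5, 128)]
try:
    check_chain(broken, input_dim=1280)
except ValueError as err:
    print(err)
```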

Why is my CNN overfitting and how can I fix it?

I am finetuning a 3D-CNN called C3D which was originally trained to classify sports from video clips.
I am freezing the convolution (feature extraction) layers and training the fully connected layers using gifs from GIPHY to classify the gifs for sentiment analysis (positive or negative).
Weights are pre loaded for all layers except the final fully connected layer.
I am using 5000 images (2500 positive, 2500 negative) for training with a 70/30 training/testing split using Keras. I am using the Adam optimizer with a learning rate of 0.0001.
The training accuracy increases and the training loss decreases during training, but very early on the validation accuracy and loss stop improving as the model starts to overfit.
I believe I have enough training data and am using a dropout of 0.5 on both of the fully connected layers, so how can I combat this overfitting?
The model architecture, training code and visualisations of training performance from Keras can be found below.
train_c3d.py
from training.c3d_model import create_c3d_sentiment_model
from ImageSentiment import load_gif_data
import numpy as np
import pathlib
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
def image_generator(files, batch_size):
    """
    Generate batches of images for training instead of loading all images into memory
    :param files:
    :param batch_size:
    :return:
    """
    while True:
        # Select files (paths/indices) for the batch
        batch_paths = np.random.choice(a=files,
                                       size=batch_size)
        batch_input = []
        batch_output = []
        # Read in each input, perform preprocessing and get labels
        for input_path in batch_paths:
            input = load_gif_data(input_path)
            if "pos" in input_path:  # if file name contains pos
                output = np.array([1, 0])  # label
            elif "neg" in input_path:  # if file name contains neg
                output = np.array([0, 1])  # label
            batch_input += [input]
            batch_output += [output]
        # Return a tuple of (input,output) to feed the network
        batch_x = np.array(batch_input)
        batch_y = np.array(batch_output)
        yield (batch_x, batch_y)
model = create_c3d_sentiment_model()
print(model.summary())
model.load_weights('models/C3D_Sport1M_weights_keras_2.2.4.h5', by_name=True)
for layer in model.layers[:14]:  # freeze top layers as feature extractor
    layer.trainable = False
for layer in model.layers[14:]:  # fine tune final layers
    layer.trainable = True
train_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_train').glob('**/*')]
val_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_validation').glob('**/*')]
batch_size = 8
train_generator = image_generator(train_files, batch_size)
validation_generator = image_generator(val_files, batch_size)
model.compile(optimizer=Adam(lr=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', verbose=1)
history = model.fit_generator(train_generator, validation_data=validation_generator,
                              steps_per_epoch=int(np.ceil(len(train_files) / batch_size)),
                              validation_steps=int(np.ceil(len(val_files) / batch_size)), epochs=5, shuffle=True,
                              callbacks=[mc])
load_gif_data()
def load_gif_data(file_path):
    """
    Load and process gif for input into Keras model
    :param file_path:
    :return: Mean normalised image in BGR format as numpy array
    for more info see -> http://cs231n.github.io/neural-networks-2/
    """
    im = Img(fp=file_path)
    try:
        im.load(limit=16,  # Keras image model only requires 16 frames
                first=True)
    except:
        print("Error loading image: " + file_path)
        return
    im.resize(size=(112, 112))
    im.convert('RGB')
    im.close()
    np_frames = []
    frame_index = 0
    for i in range(16):  # if image is less than 16 frames, repeat the frames until there are 16
        frame = im.frames[frame_index]
        rgb = np.array(frame)
        bgr = rgb[..., ::-1]
        mean = np.mean(bgr, axis=0)
        np_frames.append(bgr - mean)  # C3D model was originally trained on BGR, mean normalised images
        # it is important that unseen images are in the same format
        if frame_index == (len(im.frames) - 1):
            frame_index = 0
        else:
            frame_index = frame_index + 1
    return np.array(np_frames)
model architecture
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1 (Conv3D) (None, 16, 112, 112, 64) 5248
_________________________________________________________________
pool1 (MaxPooling3D) (None, 16, 56, 56, 64) 0
_________________________________________________________________
conv2 (Conv3D) (None, 16, 56, 56, 128) 221312
_________________________________________________________________
pool2 (MaxPooling3D) (None, 8, 28, 28, 128) 0
_________________________________________________________________
conv3a (Conv3D) (None, 8, 28, 28, 256) 884992
_________________________________________________________________
conv3b (Conv3D) (None, 8, 28, 28, 256) 1769728
_________________________________________________________________
pool3 (MaxPooling3D) (None, 4, 14, 14, 256) 0
_________________________________________________________________
conv4a (Conv3D) (None, 4, 14, 14, 512) 3539456
_________________________________________________________________
conv4b (Conv3D) (None, 4, 14, 14, 512) 7078400
_________________________________________________________________
pool4 (MaxPooling3D) (None, 2, 7, 7, 512) 0
_________________________________________________________________
conv5a (Conv3D) (None, 2, 7, 7, 512) 7078400
_________________________________________________________________
conv5b (Conv3D) (None, 2, 7, 7, 512) 7078400
_________________________________________________________________
zeropad5 (ZeroPadding3D) (None, 2, 8, 8, 512) 0
_________________________________________________________________
pool5 (MaxPooling3D) (None, 1, 4, 4, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 8192) 0
_________________________________________________________________
fc6 (Dense) (None, 4096) 33558528
_________________________________________________________________
dropout_1 (Dropout) (None, 4096) 0
_________________________________________________________________
fc7 (Dense) (None, 4096) 16781312
_________________________________________________________________
dropout_2 (Dropout) (None, 4096) 0
_________________________________________________________________
nfc8 (Dense) (None, 2) 8194
=================================================================
Total params: 78,003,970
Trainable params: 78,003,970
Non-trainable params: 0
_________________________________________________________________
None
training visualisations
I think that the error is in the loss function and in the last Dense layer. As provided in the model summary, the last Dense layer is,
nfc8 (Dense) (None, 2)
The output shape is (None, 2), meaning that the layer has 2 units. As you said earlier, you need to classify GIFs as positive or negative.
Classifying GIFs could be a binary classification problem or a multiclass classification problem (with two classes).
Binary classification has only 1 unit in the last Dense layer, with a sigmoid activation function. But here the model has 2 units in the last Dense layer.
Hence, the model is a multiclass classifier, but you have given it a loss function of binary_crossentropy, which is meant for binary classifiers (with a single unit in the last layer).
So replacing the loss with categorical_crossentropy should work. Alternatively, edit the last Dense layer and change the number of units and the activation function.
Hope this helps.
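The two options in this answer are closely related: a 2-unit softmax over logits (z0, z1) assigns class 0 the same probability as a single sigmoid unit applied to z0 - z1. A NumPy sketch with made-up logits, showing that categorical cross-entropy on one-hot labels then equals binary cross-entropy on the single-unit form:

```python
import numpy as np

logits = np.array([[2.0, -1.0],
                   [0.3, 0.8],
                   [-1.5, 1.5]])             # made-up scores for (pos, neg)
labels = np.array([[1, 0], [0, 1], [0, 1]])  # one-hot, as in image_generator

# Two-unit softmax + categorical cross-entropy.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
softmax = exp / exp.sum(axis=1, keepdims=True)
cce = -np.sum(labels * np.log(softmax), axis=1).mean()

# One-unit sigmoid on the logit difference + binary cross-entropy.
p_pos = 1.0 / (1.0 + np.exp(-(logits[:, 0] - logits[:, 1])))
y_pos = labels[:, 0]
bce = -(y_pos * np.log(p_pos) + (1 - y_pos) * np.log(1 - p_pos)).mean()

print(cce, bce)
```

Either pairing is internally consistent; the point of the sketch is that the output layer, its activation and the loss should be chosen as a matching set.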

Fine tuning model delete previous added layers

I use Keras 2.2.4. I train a model that I want to fine-tune every 30 epochs with new data (image classification).
Every day I add more images to the classes to feed the model. Every 30 epochs the model is re-trained.
I use two conditions: the first applies when no model has been trained yet, and the second when a model is already trained and I want to fine-tune it with new content/classes.
model_base = keras.applications.vgg19.VGG19(include_top=False, input_shape=(*IMG_SIZE, 3), weights='imagenet')
output = GlobalAveragePooling2D()(model_base.output)
# If we resume a pretrained model load it
if os.path.isfile(os.path.join(MODEL_PATH, 'weights.h5')):
    print('Using existing weights...')
    base_lr = 0.0001
    model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
    output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
    model = Model(model_base.input, output)
    for layer in model_base.layers[:-2]:
        layer.trainable = False
else:
    base_lr = 0.001
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(2048, activation='relu', name='d1')(output)
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
    model = Model(model_base.input, output)
    for layer in model_base.layers[:-5]:
        layer.trainable = False
opt = optimizers.Adam(lr=base_lr, decay=base_lr / epochs)
model.compile(optimizer=opt,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Model summary first time:
...
_________________________________________________________________
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 512) 2048
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
d1 (Dense) (None, 2048) 1050624
_________________________________________________________________
batch_normalization_2 (Batch (None, 2048) 8192
_________________________________________________________________
dropout_2 (Dropout) (None, 2048) 0
_________________________________________________________________
d2 (Dense) (None, 19) 38931
=================================================================
Total params: 21,124,179
Trainable params: 10,533,907
Non-trainable params: 10,590,272
Model summary second time:
...
_________________________________________________________________
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
d2 (Dense) (None, 19) 9747
=================================================================
Total params: 20,034,131
Trainable params: 2,369,555
Non-trainable params: 17,664,576
Problem: When a model exists and is loaded for fine-tuning, it seems to have lost all the additional layers added the first time (Dense 2048, Dropout, etc.).
Do I need to add these layers again? That seems to make no sense, as it would lose the training information from the first pass.
Note: I may not need to set base_lr, as saving a model should also save the learning rate at the state where it stopped, but I will check this later.
Please note that once you load the model:
model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
You don't use it. You just overwrite it:
model = Model(model_base.input, output)
where output is also defined as an operation on model_base.
It seems to me that you just want to delete the lines after load_model.
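The suggested control flow can be sketched as below. To keep the example runnable without Keras, the loader and builder are passed in as callables; get_model, load_fn and build_fn are hypothetical names, and in the question's code they would correspond to load_model and the block that stacks the new head on model_base.

```python
import os
import tempfile

def get_model(weights_path, load_fn, build_fn):
    """Return (model, base_lr): resume from disk if possible, else build anew."""
    if os.path.isfile(weights_path):
        # Keep the loaded model as-is; do not rebuild and overwrite it.
        return load_fn(weights_path), 0.0001
    return build_fn(), 0.001

# Stand-ins so the sketch runs without Keras.
load_fn = lambda path: f"loaded:{os.path.basename(path)}"
build_fn = lambda: "fresh"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.h5")
    first = get_model(path, load_fn, build_fn)    # no file yet
    open(path, "w").close()                       # pretend a model was saved
    second = get_model(path, load_fn, build_fn)

print(first, second)
```

Freezing layers and compiling would then operate on whichever model this returns, so the d1 head trained in the first pass is kept.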
