I have a model architecture based on a ResNet50 that needs to be retrained regularly. It worked for years. It is running on tensorflow version 1.9 and keras 2.3.1. Now I bought a new computer with an RTX 3070 - which means I have to use tensorflow 2.4 or higher in order to make use of the GPU. I installed tensorflow 2.5 together with the matching CUDA 11.2 and cuDNN 8.1, manually copied some files - and the model is indeed running on the GPU. However, when I freeze layers of the base model, I get completely different results compared to when I run it on my old computer. For example: for two warm-up epochs with all layers of the resnet50 frozen, I get more than 50 percent accuracy on my old computer - but only 7.5 percent on the new one.
I am aware of the problems with BatchNormalization layers and followed the tutorial here:
https://www.tensorflow.org/tutorials/images/transfer_learning
on how to solve the issue (as you can see in the code below). I also tried downgrading tensorflow to 2.4, reinstalling Anaconda and setting everything up from scratch, etc. - but nothing works.
To compare the two architectures and to make sure that nothing else could be responsible for the discrepancy, I copied the entire data to an external hard drive - and only adjusted the imports from keras to tensorflow.keras (together with some other small alterations necessary for tensorflow.keras, i.e. using fit instead of fit_generator, etc.). Could someone look at the code for the tensorflow.keras model (second code block from the top) and tell me where I go wrong?
Here is the code for the model in keras (which works perfectly):
# =============================================================================
# Build model
# =============================================================================
from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.layers import Dense, Flatten, Dropout, Input, AveragePooling2D
from keras import initializers
from keras import optimizers

in_shape = (224, 224, 3)  # Shape of input images
n_classes = 26            # Number of classes
dor = 0.3                 # Dropout rate
learning_rate = 5e-5
optim = optimizers.Adam(lr=learning_rate)

base_model = ResNet50(include_top=False, weights='imagenet',
                      input_shape=in_shape)

inp = Input(shape=in_shape)
x = base_model(inp)
x = AveragePooling2D((7, 7), name='avg_pool')(x)
x = Flatten()(x)
x = Dropout(dor)(x)
x = Dense(2048,
          kernel_initializer=initializers.he_normal(),
          bias_initializer=initializers.ones(),
          activation='relu')(x)
x = Dense(n_classes, kernel_initializer=initializers.he_normal(),
          bias_initializer=initializers.ones(), activation='softmax')(x)
model = Model(inp, x)
model.compile(loss='categorical_crossentropy', optimizer=optim,
              metrics=['accuracy'])
model.summary()

# =============================================================================
# Train model
# =============================================================================
# Warm up phase
for layer in model.layers[1].layers:
    layer.trainable = False

model.compile(loss='categorical_crossentropy', optimizer=optim,
              metrics=['accuracy'])
model.summary()

history = model.fit_generator(train_generator,
                              validation_data=val_generator,
                              epochs=warm_up_epochs,
                              steps_per_epoch=train_spe,
                              validation_steps=val_spe,
                              verbose=1)
The output is:
And here is the code for the tensorflow.keras model (which does NOT work):
# =============================================================================
# Build model
# =============================================================================
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten, Dropout, Input, AveragePooling2D
from tensorflow.keras import initializers
from tensorflow.keras import optimizers

in_shape = (224, 224, 3)  # Shape of input images
n_classes = 26            # Number of classes
dor = 0.3                 # Dropout rate
learning_rate = 5e-5
optim = optimizers.Adam(learning_rate=learning_rate)

# resnet50, pretrained on Imagenet
base_model = ResNet50(include_top=False, weights='imagenet',
                      input_shape=in_shape)

inp = Input(shape=in_shape)
x = base_model(inp, training=False)
x = AveragePooling2D((7, 7), name='avg_pool')(x)
x = Flatten()(x)
x = Dropout(dor)(x)
x = Dense(2048,
          kernel_initializer=initializers.he_normal(),
          bias_initializer=initializers.ones(),
          activation='relu')(x)
x = Dense(n_classes, kernel_initializer=initializers.he_normal(),
          bias_initializer=initializers.ones(), activation='softmax')(x)
model = Model(inp, x)
model.compile(loss='categorical_crossentropy', optimizer=optim,
              metrics=['accuracy'])
model.summary()

# =============================================================================
# Train model
# =============================================================================
# Warm up phase
for layer in model.layers[1].layers:
    layer.trainable = False

model.compile(loss='categorical_crossentropy', optimizer=optim,
              metrics=['accuracy'])
model.summary()

history = model.fit(train_generator,
                    validation_data=val_generator,
                    epochs=warm_up_epochs,
                    steps_per_epoch=train_spe,
                    validation_steps=val_spe,
                    verbose=1)
The output is:
The striking thing is: when I do NOT freeze layers, the performance of the tensorflow.keras model is comparable to that of the keras model. As I said, I do not know where I go wrong here. Any help is appreciated.
Thank you very much for your answers!
Okay, I have actually (after a few days of despair and getting started with PyTorch out of frustration) just now found a hack that solves this behavior. The problem is that the official tensorflow tutorial about fine-tuning models:
https://www.tensorflow.org/tutorials/images/transfer_learning
is actually wrong - at least for tensorflow version 2.5 (though I would assume it applies from 2.4 upwards).
In the link above it is explained in quite some detail why, when you build the model, you have to set the parameter "training" to False, i.e. in the above code the line:
x = base_model(inp, training=False)
If you do NOT do that - and then, when freezing layers, only set the non-BatchNormalization layers to trainable = False - it all works. I.e. what I did was:
# Warm up phase
for layer in model.layers[1].layers:
    if '_bn' not in layer.name:
        layer.trainable = False
after checking the names of the layers in resnet50 - where each BatchNormalization layer name ends with '_bn'.
If you exclude the BatchNormalization layers from freezing (like above) AND leave the training parameter set to False, it still does not work.
This is of course only a quick and dirty hack - and maybe there is a better solution. If you know of one, please let me know. But for the moment at least this works, also with freezing only some layers, etc.
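A quick way to double-check what actually got frozen (this is just standard Keras introspection, nothing specific to this bug):

# count frozen vs. trainable layers inside the base model (model.layers[1] as above)
frozen = [l.name for l in model.layers[1].layers if not l.trainable]
trainable = [l.name for l in model.layers[1].layers if l.trainable]
print(len(frozen), 'frozen layers,', len(trainable), 'trainable layers')
print('still trainable:', trainable[:10])  # with the hack, these should all be *_bn layers
model.summary()  # compare the number of trainable parameters before and after freezing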
So I made a CNN that classifies two types of birds, and it worked fine. After that, I tried adding one more type, but I got weird results. I already posted this on AI Stack Exchange, but they said it's better to ask it here, so I am providing a link to that post.
https://ai.stackexchange.com/q/11444/23452
Here is the model code:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard
import pickle
import time as time
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = 0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
pickle_in = open("C:/Users/Recep/Desktop/programlar/python/X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open("C:/Users/Recep/Desktop/programlar/python/Y.pickle","rb")
Y = pickle.load(pickle_in)
X = X/255.0
node_size = 64
model_name = "agi_vs_golden-{}".format(time.time())
tensorboard = TensorBoard(log_dir='C:/Users/Recep/Desktop/programlar/python/logs/{}'.format(model_name))
file_writer = tf.summary.FileWriter('C:/Users/Recep/Desktop/programlar/python/logs/{}'.format(model_name), sess.graph)
model = Sequential()
model.add(Conv2D(node_size,(3,3),input_shape = X.shape[1:]))
# not sure what that shape does; apart from that and validation I have no problem
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(node_size,(3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(node_size))
model.add(Activation("relu"))
model.add(Dense(1))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy",optimizer="adam",metrics=["accuracy"])
model.fit(X,Y,batch_size=25,epochs=8,validation_split=0.1,callbacks=[tensorboard])
# not sure what validation is or how it's used, but I don't think it caused the problem
model.save("agi_vs_gouldian.model")
By the way, as I said in the comments of my original post, I think maybe the network was not trained enough, or I don't have enough data. So I tried increasing the number of epochs. That kind of got around the problem, but the part I'm curious about is what happened when I had fewer epochs?
Can anyone help me?
I am giving the TensorBoard graphs below.
BTW, is my data array RGB?
And how can I get rid of this local maximum of 70%?
And since I'm a beginner at this, I don't know how validation really works, but I saw that the validation graphs stayed the same in the first training run that I had issues with.
You are trying to classify three types of birds with a sigmoid. Sigmoid is good for binary classification; try a softmax activation layer and see how it goes. I suggest replacing
model.add(Dense(1))
model.add(Activation("sigmoid"))
with
model.add(Dense(3, activation='softmax'))
where 3 is the number of bird types you want to classify.
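Note that with a softmax output you also need a categorical loss and matching labels. A minimal sketch, assuming your Y contains integer class indices 0, 1, 2:

model.add(Dense(3, activation='softmax'))  # one output unit per bird type

# with integer labels, use sparse_categorical_crossentropy instead of binary_crossentropy
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X, Y, batch_size=25, epochs=8, validation_split=0.1, callbacks=[tensorboard])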
Have a look here, a very good tutorial on using softmax as the output activation for multi-class classification:
https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
import numpy as np
from keras import backend as K
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Dense, Input
import matplotlib.pyplot as plt
# download the mnist to the path
# X shape (60,000 28x28), y shape (10,000, )
(x_train, _), (x_test, y_test) = mnist.load_data()
# data pre-processing
x_train = x_train.astype('float32') / 255. - 0.5 # minmax_normalized
x_test = x_test.astype('float32') / 255. - 0.5 # minmax_normalized
x_train = x_train.reshape((x_train.shape[0], -1))
x_test = x_test.reshape((x_test.shape[0], -1))
# in order to plot in a 2D figure
encoding_dim = 2
# this is our input placeholder
input_img = Input(shape=(784,))
# encoder layers
encoder = Dense(2, activation='relu')(input_img)
# decoder layers
decoder = Dense(784, activation='relu')(encoder)
I want to know how I can get the weights (such as the kernel of dense_2) of a Dense layer before building the Model in keras.
If I run autoencoder = Model(input=input_img, output=decoder) and then do autoencoder.get_layer('dense_2').kernel, I can get the kernel. However, I want to use the kernel as one of the outputs, so I must get the kernel before Model.
I want to get the kernel because it will be used as one part of the loss function, something like loss2 = tf.square(kernel' * kernel, axis=-1) (pseudocode, where kernel' is the transpose). So I must get the kernel before running Model.
How can I do that?
Thanks!
I think you mean you need to have one of your middle layers as one of the outputs.
In your case, you can change your model creation in this way:
autoencoder = Model(input=input_img, output=[encoder, decoder])
You can even define different losses for each of these two outputs!
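As a rough sketch of how that could look (the two losses here are only placeholders for whatever terms you actually need; in newer Keras versions the arguments are called inputs/outputs):

autoencoder = Model(input=input_img, output=[encoder, decoder])

# one loss per output: the first applies to the 2-d encoder output,
# the second to the 784-d reconstruction
autoencoder.compile(optimizer='adam',
                    loss=['mse', 'binary_crossentropy'],
                    loss_weights=[0.5, 1.0])

# fitting then needs one target array per output, e.g.
# autoencoder.fit(x_train, [encoder_targets, x_train], epochs=10, batch_size=256)
# where encoder_targets is whatever you want the 2-d code to match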
I am trying to build a model to predict house prices.
I have some features X (no. of bathrooms , etc.) and target Y (ranging around $300,000 to $800,000)
I have used sklearn's StandardScaler to standardize Y before fitting it to the model.
Here is my Keras model:
# imports assumed from the rest of the question (plain Keras)
from keras.models import Sequential
from keras.layers import Dense

def build_model():
    model = Sequential()
    model.add(Dense(36, input_dim=36, activation='relu'))
    model.add(Dense(18, input_dim=36, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer='sgd', metrics=['mae', 'mse'])
    return model
I am having trouble trying to interpret the results -- what does an MSE of 0.617454319755 mean?
Do I have to inverse-transform this number and take the square root of the result, getting an error of 741.55 in dollars?
math.sqrt(sc.inverse_transform([mse]))
I apologise for sounding silly as I am starting out!
"I apologise for sounding silly as I am starting out!"
Do not; this is a subtle issue of great importance, which is usually (and regrettably) omitted in tutorials and introductory expositions.
Unfortunately, it is not as simple as taking the square root of the inverse-transformed MSE, but it is not that complicated either; essentially what you have to do is:
1. Transform your predictions back to the initial scale of the original data
2. Get the MSE between these inverse-transformed predictions and the original data
3. Take the square root of the result
in order to get a performance indicator of your model that will be meaningful in the business context of your problem (e.g. US dollars here).
Let's see a quick example with toy data, omitting the model itself (which is irrelevant here, and in fact can be any regression model - not only a Keras one):
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
# toy data
X = np.array([[1,2], [3,4], [5,6], [7,8], [9,10]])
Y = np.array([3, 4, 5, 6, 7])
# feature scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X)
# outcome scaling:
sc_Y = StandardScaler()
Y_train = sc_Y.fit_transform(Y.reshape(-1, 1))
Y_train
# array([[-1.41421356],
# [-0.70710678],
# [ 0. ],
# [ 0.70710678],
# [ 1.41421356]])
Now, let's say that we fit our Keras model (not shown here) using the scaled sets X_train and Y_train, and get predictions on the training set:
prediction = model.predict(X_train) # scaled inputs here
print(prediction)
# [-1.4687586 -0.6596055 0.14954728 0.95870024 1.001172 ]
The MSE reported by Keras is actually the scaled MSE, i.e.:
MSE_scaled = mean_squared_error(Y_train, prediction)
MSE_scaled
# 0.052299712818541934
while the 3 steps I have described above are simply:
MSE = mean_squared_error(Y, sc_Y.inverse_transform(prediction)) # first 2 steps, combined
MSE
# 0.10459946572909758
np.sqrt(MSE) # 3rd step
# 0.323418406602187
So, in our case, if our initial Y were US dollars, the actual error in the same units (dollars) would be 0.32 (dollars).
Notice how the naive approach of inverse-transforming the scaled MSE would give a very different (and incorrect) result:
np.sqrt(sc_Y.inverse_transform([MSE_scaled]))
# array([2.25254588])
MSE is the mean squared error; the formula is MSE = (1/n) * sum((y_i - y_hat_i)^2).
Basically it is the mean of the squared differences between the expected outputs and the predictions. Taking the square root of this will not give you the difference between the prediction and the expected output. It is useful for training.
Currently you have built a model.
If you want to train the model, use this function:
model.fit(x=input_x_array, y=input_y_array, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)
If you want to predict the output, you should use the following code:
prediction = model.predict(np.array(input_x_array))
print(prediction)
You can find more details here.
https://keras.io/models/about-keras-models/
https://keras.io/models/sequential/
I can unpack my RNN model onto my website, but I am having trouble getting it to produce a numpy array of predictions from a list as input (the list contains only one string, called text, but it needs to be a list for preprocessing from what I've gathered), and I am running into this problem:
ValueError: Error when checking : expected embedding_1_input to have shape (None, 72)
but got array with shape (1, 690)
Here is how I am currently preprocessing and predicting with the model:
# imports assumed for this snippet (plain Keras; adjust to tensorflow.keras if that is your setup)
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=5000, split=' ')
tokenizer.fit_on_texts([text])
X = tokenizer.texts_to_sequences([text])
X = pad_sequences(X)
prediction = loadedModel.predict(X)
print(prediction)
And this is how I trained my model:
HIDDEN_LAYER_SIZE = 195 # Details the amount of nodes in a hidden layer.
TOP_WORDS = 5000 # Most-used words in the dataset.
MAX_REVIEW_LENGTH = 500 # Char length of each text being sent in (necessary).
EMBEDDING_VECTOR_LENGTH = 128 # The specific Embedded later will have 128-length vectors to
# represent each word.
BATCH_SIZE = 32 # Takes 32 sentences at a time and continually retrains RNN.
NUMBER_OF_EPOCHS = 10 # Fits RNN to more accurately guess the data's political bias.
DROPOUT = 0.2 # Helps slow down overfitting of data (slower convergence rate)
# Define the model
model = Sequential()
model.add(Embedding(TOP_WORDS, EMBEDDING_VECTOR_LENGTH, \
input_length=X.shape[1]))
model.add(SpatialDropout1D(DROPOUT))
model.add(LSTM(HIDDEN_LAYER_SIZE))
model.add(Dropout(DROPOUT))
model.add(Dense(2, activation='softmax'))
# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', \
metrics=['accuracy'])
#printModelSummary(model)
# Fit the model
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), \
epochs=NUMBER_OF_EPOCHS, batch_size=BATCH_SIZE)
How can I fix my preprocessing code in the codebox starting with "tokenizer" to stop getting the ValueError?
Thank you, and I can definitely provide more code or expand upon the purpose of the project.
So there are two problems here:
Set maxlen in pad_sequences: it seems that all of your training sequences were padded to have length 72, so you need to change the following line:
X = pad_sequences(X, maxlen=72)
Use the training Tokenizer: this is a subtle problem - you are creating and fitting a totally new Tokenizer, so it could be different from the one you used for training. This could cause problems - because different words could have different indexes - and this will make your model work terribly. Try to pickle your training Tokenizer and load it during deployment in order to transform sentences into data points fed to your model properly, as in the sketch below.
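A minimal sketch of that idea (the file name is just an example; adjust the import to tensorflow.keras if that is what you use):

import pickle
from keras.preprocessing.sequence import pad_sequences

# at training time, right after fitting the tokenizer on the training texts
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle)

# at deployment time, load the same tokenizer instead of fitting a new one
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)

X = tokenizer.texts_to_sequences([text])
X = pad_sequences(X, maxlen=72)
prediction = loadedModel.predict(X)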
I have a problem when I use a keras model (deep learning library) in a Jupyter notebook.
This problem occurred after I installed the scipy and matplotlib libraries in conda.
Before executing the code cell, my computer's GPU memory usage is 241 MB / 8113 MB.
Then I executed a simple keras model, like the code below:
from keras.models import Sequential
from keras.layers import Convolution1D, MaxPooling1D, Activation, Dense, Flatten
# Parameter
n_filters1 = 64 # number of convolutional filters
n_filters2 = 32
n_conv = 4 # convolution filter size
n_pool = 3 # pooling window size
model = Sequential()
model.add(Convolution1D(n_filters1, n_conv, border_mode='valid', activation='relu', input_shape=(40, 1)))
model.add(MaxPooling1D(n_pool, stride=2))
model.add(Convolution1D(n_filters2, n_conv, border_mode='valid', activation='relu'))
model.add(MaxPooling1D(n_pool, stride=2))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))
After executing the above code, GPU memory usage jumps from 241 MB to 7754 MB.
So, when I run the training cell, Jupyter dies.
This problem happened again and again... even though I restarted my Jupyter notebook...
Does anyone have a good idea to solve this problem?
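One thing that may be worth checking (this is general TensorFlow 1.x behaviour, not something confirmed for this particular setup): by default TensorFlow reserves most of the GPU memory as soon as a session is created, so a sketch like the following makes it allocate memory on demand instead:

import tensorflow as tf
from keras import backend as K

# let TensorFlow grow GPU memory usage on demand instead of grabbing it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))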