Is Validation Data involved in Training using Keras Fit? - machine-learning

I ran the following model.fit() call:
history = model.fit(x=train1,
                    y=labels,
                    validation_data=(df.iloc[:,:-1], df.iloc[:,-1]),
                    batch_size=32,
                    verbose=1, epochs=100)
Where train1.shape=(2889, 84) and df.iloc[:,:-1].shape= (759371, 119)
When I run this, I get the following ValueError after just Epoch 1/100:
ValueError: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 84), found shape=(None, 119)
Why does this error occur so early and halt the program, given that we are still in the training phase, where the model is presumably trained only on the training set and not on the test/validation set?

Validation data is never used to update the weights, but your model is evaluated on it at the end of every epoch. So, at the end of the first epoch, validation runs, and your validation data has 119 features where the model expects 84. This is why you are getting this error.
Apply the same preprocessing (including the same feature selection) to your validation data as you did to your training data so that both have the same number of columns, and the problem will be fixed.
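A minimal sketch of a guard that catches the mismatch before training starts, rather than at the end of the first epoch. The helper name check_same_feature_count is hypothetical and not part of Keras; the shapes are the ones quoted in the question.

```python
def check_same_feature_count(train_shape, val_shape):
    """Raise early if the per-sample shapes of the two inputs differ."""
    if tuple(train_shape[1:]) != tuple(val_shape[1:]):
        raise ValueError(
            f"training samples have shape {tuple(train_shape[1:])}, "
            f"validation samples have shape {tuple(val_shape[1:])}"
        )


# Shapes quoted in the question: 84 training features, 119 validation features.
train_shape = (2889, 84)
val_shape = (759371, 119)

try:
    check_same_feature_count(train_shape, val_shape)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

print(mismatch)
```

Calling such a check on train1.shape and the validation features' shape just before model.fit() would surface the problem immediately. The number of rows may differ freely; only the per-sample shape has to match.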

Related

ValueError: Input 0 of layer "lstm_6" is incompatible with the layer

I am trying to create a hybrid model which consists of EfficientNetB7 and LSTM.
# pretrained model act as a feature extractor
Effnet=tensorflow.keras.applications.EfficientNetB7( input_shape=(IMG_SIZE,IMG_SIZE,3), include_top=False,weights="imagenet",pooling="avg")
Effnet.trainable = False
x = Flatten()(Effnet.output)
x=(BatchNormalization())(x)
#add two LSTM Layers
x=LSTM(8,input_shape=(IMG_SIZE,IMG_SIZE,3),return_sequences=False)(x)
x=LSTM(8)(x)
x=(BatchNormalization())(x)
#add two fully connected dense layers 1024 as my model
x=Dense(1024)(x)
x=(BatchNormalization())(x)
x=Activation('relu')(x)
x=Dense(1024)(x)
x=(BatchNormalization())(x)
x=Activation('relu')(x)
x = Dense(NUM_CLASSE)(x)
x=(BatchNormalization())(x)
prediction =Activation('softmax')(x)
model = Model(inputs=Effnet.input, outputs=prediction)
model.summary()
But it gives me the error below. I think the average pooling in EfficientNetB7 is causing the problem; how do I remove it?
ValueError: Input 0 of layer "lstm_6" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 2560)
How can I fix it, please? Thank you. Regards!
The problem is on this line:
x=LSTM(8,input_shape=(IMG_SIZE,IMG_SIZE,3),return_sequences=False)(x)
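You told the LSTM layer to expect input_shape=(IMG_SIZE,IMG_SIZE,3). However, that shape only holds at the very beginning of your network, where the image flows into EfficientNetB7. With pooling="avg", the output of EfficientNet is already one flat feature vector per image, and Flatten leaves it unchanged, so the LSTM receives a 2D tensor.
The error message is actually pretty straightforward.
expected ndim=3, found ndim=2. Full shape received: (None, 2560)
2560 is the number of pooled features, and the first dimension is the one for batch size.
You must correct the input to your LSTM layer: it needs a 3D tensor of shape (batch_size, timesteps, features). Dropping the input_shape argument alone will not change the number of dimensions, so the features have to be reshaped into a sequence before the first LSTM. Note also that the first LSTM uses return_sequences=False, so the second LSTM would receive 2D input as well.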
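As a shape-only illustration (no Keras involved), the sketch below shows what adding a time axis does to the shape from the error message. The helper names are hypothetical; in a real model, a layer such as keras.layers.Reshape((1, 2560)) placed before the first LSTM would be one way to perform this step, assuming a single timestep per image is acceptable.

```python
def lstm_accepts(shape):
    """An LSTM layer needs (batch, timesteps, features): exactly 3 dimensions."""
    return len(shape) == 3


def add_timestep_axis(shape):
    """Turn (batch, features) into (batch, 1, features): one timestep per sample."""
    batch, features = shape
    return (batch, 1, features)


pooled = (None, 2560)                 # shape reported in the error message
sequence = add_timestep_axis(pooled)  # shape after adding a time axis

print(lstm_accepts(pooled), lstm_accepts(sequence), sequence)
```

With only one timestep the LSTM has no sequence to model, so this mainly makes the shapes line up. Whether an LSTM adds value on top of pooled image features is a separate design question.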

Input 0 is incompatible with layer lstm_12: expected ndim=3, found ndim=2

I am new to ML and trying to make an RNN LSTM model.
I want to optimize the hyper-parameters using GridSearchCV. What I want to optimize is the number of layers and the number of nodes in each layer.
Here is the code to generate the model:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
def create_model(layers,activation):
    model = Sequential()
    for i,node in enumerate(layers):
        if i == 0:
            model.add(LSTM(units=node, input_shape=(x_train.shape[1],1)))
            model.add(Activation(activation))
            model.add(Dropout(0.2))
        else:
            model.add(LSTM(units=node, input_shape=(x_train.shape[1],1)))
            model.add(Activation(activation))
            model.add(Dropout(0.2))
    model.add(Dense(units=1))
    model.compile(optimizer='adam',loss='mean_squared_error',metrics=['accuracy'])
    return model
and here are the variables
layers=[[40,40],[30,30],[30,30,30],[30,30,30,30],[30,30,30,30,30]]
activations =['sigmoid','relu']
batch_size = [32,50]
epochs = [50]
then I wrap it up using GridSearchCV
param_grid = dict(layers=layers,activation=activations,batch_size=batch_size,epochs=epochs)
grid = GridSearchCV(estimator=model,param_grid=param_grid)
When I run
grid_result = grid.fit(x_train,y_train,verbose=3)
I get this error
ValueError: Input 0 is incompatible with layer lstm_14: expected ndim=3, found ndim=2
I don't know what is happening. My x_train shape is (13871, 60, 1) and y_train shape is (13871,). Thank you beforehand and your help will be very much appreciated!
Thanks!
Phil
The error message actually explains this well. LSTM requires a time series input of shape (batch_size, timesteps, features). You have this correct for your first LSTM layer. However, by default the output of an LSTM is not a sequence: it returns only the last output, of shape (batch_size, units). Subsequent LSTM layers therefore do not receive 3D input.
You can make an LSTM output the full sequence by setting the parameter
return_sequences=True
Note that you may have to set return_sequences to False in the final LSTM layer before the Dense layer, or perform a flatten operation.
Does that help?
PS: the if and else branches are exactly the same. Is that something you plan to change later?
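The rule above can be sketched without Keras: every LSTM layer except the last returns the full sequence. The helper names below are hypothetical, and the 60 timesteps come from the x_train shape in the question. Each flag would be passed as the return_sequences argument of the corresponding LSTM layer inside create_model.

```python
def return_sequences_flags(layers):
    """True for every stacked LSTM layer except the last one."""
    last = len(layers) - 1
    return [i < last for i in range(len(layers))]


def output_shape(timesteps, units, return_sequences):
    """Shape an LSTM layer emits, with None standing for the batch size."""
    if return_sequences:
        return (None, timesteps, units)
    return (None, units)


layers = [30, 30, 30]  # one of the layer configurations from the grid
flags = return_sequences_flags(layers)
shapes = [output_shape(60, units, flag) for units, flag in zip(layers, flags)]

print(flags)
print(shapes)
```

The first two layers emit 3D output that the next LSTM can consume, and the last one emits 2D output suitable for the Dense layer.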

Difference between doing cross-validation and validation_data/validation_split in Keras

First, I split the dataset into train and test, for example:
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.4, random_state=999)
I then use GridSearchCV with cross-validation to find the best performing model:
validator = GridSearchCV(estimator=clf, param_grid=param_grid, scoring="accuracy", cv=cv)
And by doing this, I have:
A model is trained using k-1 of the folds as training data; the resulting
model is validated on the remaining part of the data (scikit-learn.org)
But then, when reading about the Keras fit function, the documentation introduces 2 more terms:
validation_split: Float between 0 and 1. Fraction of the training data
to be used as validation data. The model will set apart this fraction
of the training data, will not train on it, and will evaluate the loss
and any model metrics on this data at the end of each epoch. The
validation data is selected from the last samples in the x and y data
provided, before shuffling.
validation_data: tuple (x_val, y_val) or tuple (x_val, y_val,
val_sample_weights) on which to evaluate the loss and any model
metrics at the end of each epoch. The model will not be trained on
this data. validation_data will override validation_split.
From what I understand, validation_split (to be overridden by validation_data) will be used as an unchanged validation dataset, whereas the hold-out set in cross-validation changes at each cross-validation step.
First question: is it necessary to use validation_split or validation_data since I already do cross validation?
Second question: if it is not necessary, then should I set validation_split and validation_data to 0 and None, respectively?
grid_result = validator.fit(train_images, train_labels, validation_data=None, validation_split=0)
Question 3: If I do so, what will happen during the training, would Keras just simply ignore the validation step?
Question 4: Does the validation_split belong to k-1 folds or the hold-out fold, or will it be considered as "test set" (like in the case of cross validation) which will never be used to train the model.
Validation is performed to check that the model is not overfitting the dataset and that it generalizes to new data. Since the parameter grid search already performs validation, there is no need for the Keras model itself to perform a validation step during training. Therefore, to answer your questions:
is it necessary to use validation_split or validation_data since I already do cross validation?
No, as I mentioned above.
if it is not necessary, then should I set validation_split and validation_data to 0 and None, respectively?
No, since by default no validation is done in Keras (i.e. by default we have validation_split=0.0, validation_data=None in fit() method).
If I do so, what will happen during the training, would Keras just simply ignore the validation step?
Yes, Keras won't perform the validation when training the model. However note that, as I mentioned above, the grid search procedure would perform validation to better estimate the performance of the model with a specific set of parameters.
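On Question 4, a rough sketch may help. It assumes that the grid search calls fit() with only the k-1 training folds, and that validation_split slices off the trailing fraction of whatever fit() receives, as the quoted documentation describes. Both helpers are hypothetical stand-ins written for illustration, and the exact rounding used by Keras is an assumption.

```python
import math


def trailing_validation_split(samples, validation_split):
    """Split off the last fraction of the samples, without shuffling."""
    split_at = int(math.floor(len(samples) * (1.0 - validation_split)))
    return samples[:split_at], samples[split_at:]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, held_out_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, held_out
        start += size


results = []
for train_idx, held_out_idx in k_fold_indices(10, 5):
    # Only the k-1 training folds reach fit(); validation_split carves from them.
    fit_idx, val_idx = trailing_validation_split(train_idx, 0.25)
    results.append((fit_idx, val_idx, held_out_idx))

for fit_idx, val_idx, held_out_idx in results:
    print(fit_idx, val_idx, held_out_idx)
```

Under these assumptions the validation_split samples come out of the k-1 training folds, are not trained on, and never overlap with the hold-out fold that the grid search scores on.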

What does initial_epoch in Keras mean?

I'm a little bit confused about initial_epoch value in fit and fit_generator methods. Here is the doc:
initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training run).
I understand that it is not useful if you start training from scratch. It is useful if you have already trained your model and want to improve accuracy or other values (correct me if I'm wrong). But I'm not sure what it really does.
So after all this, I have 2 questions:
What does initial_epoch do and what is it for?
When can I use initial_epoch?
When I change my dataset?
When I change the learning rate, optimizer or loss function?
Both of them?
Some optimizers set some of their internal values (e.g. learning rate) using the current epoch value, and you may also have (custom) callbacks that depend on the current epoch. The initial_epoch argument lets you specify the epoch value to start from when training.
As stated in the documentation, this is mostly useful when you have trained your model for some epochs, say 10, saved it, and now want to load it and resume training for another 10 epochs without disrupting the state of epoch-dependent objects (e.g. optimizer). So you would set initial_epoch=10 (i.e. we have trained the model for 10 epochs) and epochs=20 (not 10, since the total number of epochs to reach is 20), and then everything resumes as if you had trained the model for 20 epochs in one single training session.
However, note that when using the built-in optimizers of Keras you don't need to use initial_epoch, since they store and update their state internally (without considering the value of the current epoch), and the state of the optimizer is stored as well when a model is saved.
The answer above is correct; however, it is important to note that if you have trained for 10 epochs and set initial_epoch=10 and epochs=20, you train for 10 more epochs until you reach a total of 20 epochs. For example, I trained for 2 epochs, then set initial_epoch=2 and epochs=4. The result is that it trains for 4-2=2 more epochs. The new data in the history object starts at epoch 3, so the returned history object does not start from epoch 1 as you might expect. In other words, the state of the history object is not preserved from the initial training epochs. If you do not set initial_epoch and you train for 2 epochs, then rerun fit_generator with epochs=4, it will train for 4 more epochs starting from the state preserved at the end of the second epoch (provided you use the built-in optimizers). Again, the history object state is NOT preserved from the initial training and only contains the data for the last 4 epochs. I noticed this because I plot the validation loss versus epochs.
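The epoch arithmetic described above can be sketched in plain Python. The helper epochs_run is hypothetical and only mimics the counting, using the 1-based epoch numbers shown in the training log; it does not call Keras.

```python
def epochs_run(initial_epoch, epochs):
    """1-based epoch numbers a fit call with these arguments would go through."""
    return list(range(initial_epoch + 1, epochs + 1))


first = epochs_run(0, 2)      # initial run: epochs=2
resumed = epochs_run(2, 4)    # resume with initial_epoch=2, epochs=4
restarted = epochs_run(0, 4)  # rerun with epochs=4 and no initial_epoch

print(first, resumed, restarted)
```

So epochs is the final epoch number to reach, not the number of additional epochs, and the count of epochs actually run is epochs - initial_epoch.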
Here is an example of how to integrate the initial_epoch in your code
# Train for the first 4 epochs and save
model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=32, epochs=4)
model.save("partial.h5")
# Load the model, train for another 4 epochs, then save the updated model
from keras.models import load_model
new_model = load_model('partial.h5')
new_model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=32, initial_epoch=4, epochs=8)
new_model.save("updated.h5")
Also, don't forget to specify a fixed random_state value when splitting the data into train and test, so that the same training set is used each time you resume training and no test data leaks into the training data.

error Evaluating classifier Train and test dataset are not compatible

I am getting an error while running an SMO model on a test dataset in Weka:
Problem Evaluating classifier Train and test dataset are not
compatible. Class index differ: 3 != 0
Training dataset format
mean,variance,label
54.3333333333,1205.55555556,five
3.0,0.0,five
31739.0,0.0,five
3205.5,4475340.25,one
Test dataset format
mean,variance
3.0,0.0
257.0,0.0
216.0,14884.0
736.0,0.0
I trained on the training dataset and want to get labels for the test dataset. Why am I getting this error?
The test dataset should have identical structure to the training data. In your case you should add a column to the end called "label". Then, you need to assign some value to the label. This could be simply a question mark "?" to indicate the true label is unknown.
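A minimal sketch of that fix, assuming the test data is a plain CSV file like the sample in the question. The function add_unknown_label is a hypothetical helper written for illustration; it appends a "label" column filled with "?" to every row.

```python
import csv
import io


def add_unknown_label(test_csv_text, label_name="label", placeholder="?"):
    """Append a label column holding a placeholder value to every data row."""
    reader = csv.reader(io.StringIO(test_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow(header + [label_name])
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row + [placeholder])
    return out.getvalue()


# Test rows copied from the question.
test_csv = "mean,variance\n3.0,0.0\n257.0,0.0\n216.0,14884.0\n736.0,0.0\n"
patched = add_unknown_label(test_csv)
print(patched)
```

The patched file then has the same three columns as the training data, with the class attribute in the same position.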
