Problem description
I am going through "Deep Learning with Python" by François Chollet (publisher webpage, notebooks on GitHub). While replicating examples from Chapter 6, I ran into problems with what I believe is the GRU layer with recurrent dropout.
The code in which I first observed those errors is quite long, so I decided to stick to the simplest problem that reproduces the error: classifying IMDB reviews into "positive" and "negative" categories.
When I use a GRU layer with recurrent dropout, the training loss becomes nan after a few batches of the first epoch, and the training accuracy drops to 0 from the start of the second epoch.
64/12000 [..............................] - ETA: 3:05 - loss: 0.6930 - accuracy: 0.4844
128/12000 [..............................] - ETA: 2:09 - loss: 0.6926 - accuracy: 0.4766
192/12000 [..............................] - ETA: 1:50 - loss: 0.6910 - accuracy: 0.5573
(...)
3136/12000 [======>.......................] - ETA: 59s - loss: 0.6870 - accuracy: 0.5635
3200/12000 [=======>......................] - ETA: 58s - loss: 0.6862 - accuracy: 0.5650
3264/12000 [=======>......................] - ETA: 58s - loss: 0.6860 - accuracy: 0.5650
3328/12000 [=======>......................] - ETA: 57s - loss: nan - accuracy: 0.5667
3392/12000 [=======>......................] - ETA: 57s - loss: nan - accuracy: 0.5560
3456/12000 [=======>......................] - ETA: 56s - loss: nan - accuracy: 0.5457
(...)
11840/12000 [============================>.] - ETA: 1s - loss: nan - accuracy: 0.1593
11904/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.1584
11968/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.1576
12000/12000 [==============================] - 83s 7ms/step - loss: nan - accuracy: 0.1572 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/20
64/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
128/12000 [..............................] - ETA: 1:15 - loss: nan - accuracy: 0.0000e+00
192/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
(...)
11840/12000 [============================>.] - ETA: 1s - loss: nan - accuracy: 0.0000e+00
11904/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
11968/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
12000/12000 [==============================] - 82s 7ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/20
64/12000 [..............................] - ETA: 1:18 - loss: nan - accuracy: 0.0000e+00
128/12000 [..............................] - ETA: 1:18 - loss: nan - accuracy: 0.0000e+00
192/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
(...)
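As a side note, here is a minimal sketch in plain Python (no Keras) of how a log like the one above could be scanned for the first nan, and of one plausible reason why accuracy collapses to 0 once predictions are nan: NaN never compares equal to any label. The helper names first_nan_step and accuracy are hypothetical, and whether Keras computes its metric in exactly this way is an assumption.

```python
import math

def first_nan_step(losses):
    """Return the index of the first NaN loss, or None if there is none."""
    for step, value in enumerate(losses):
        if math.isnan(value):
            return step
    return None

def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the labels (NaN never compares equal)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

nan = float("nan")
# Loss values copied from the start of the log above, followed by NaNs.
losses = [0.6930, 0.6926, 0.6910, nan, nan]
print(first_nan_step(losses))

# Hypothetical labels; every prediction is NaN, so nothing matches.
print(accuracy([0, 1, 1, 0], [nan, nan, nan, nan]))
```

Once a single NaN enters the weights, every later loss and prediction is NaN as well, which would match the log staying at nan for all following epochs.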
Localizing the problem
To localize the problem I wrote the code below, which trains several models (GRU or LSTM; no dropout, only "normal" dropout, only recurrent dropout, or both; rmsprop or adam) and plots the loss and accuracy of each of them. (It also creates smaller, separate graphs for each model.)
# Based on examples from "Deep Learning with Python" by François Chollet:
## Constants, modules:
VERSION = 2
import os
from keras import models
from keras import layers
import matplotlib.pyplot as plt
import pylab
## Loading data:
from keras.datasets import imdb
(x_train, y_train), (x_test, y_test) = \
imdb.load_data(num_words=10000)
from keras.preprocessing import sequence
x_train = sequence.pad_sequences(x_train, maxlen=500)
x_test = sequence.pad_sequences(x_test, maxlen=500)
## Dictionary with models' hyperparameters:
MODELS = [
# GRU:
{"no": 1,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": None},
{"no": 2,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 3,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 4,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": 0.3},
{"no": 5,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": None},
{"no": 6,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 7,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 8,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": 0.3},
# LSTM:
{"no": 9,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": None},
{"no": 10,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 11,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 12,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": 0.3},
{"no": 13,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": None},
{"no": 14,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 15,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 16,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": 0.3},
]
## Adding name:
for model_dict in MODELS:
    model_dict["name"] = f"{model_dict['layer_type']}"
    model_dict["name"] += f"_d{model_dict['dropout']}" if model_dict['dropout'] is not None else f"_dN"
    model_dict["name"] += f"_rd{model_dict['recurrent_dropout']}" if model_dict['recurrent_dropout'] is not None else f"_rdN"
    model_dict["name"] += f"_{model_dict['optimizer']}"
## Function - defining and training model:
def train_model(model_dict):
    """Defines and trains a model, outputs history."""
    ## Defining:
    model = models.Sequential()
    model.add(layers.Embedding(10000, 32))
    # Pass dropout arguments only when they are set, so layer defaults apply otherwise:
    recurrent_layer_kwargs = dict()
    if model_dict["dropout"] is not None:
        recurrent_layer_kwargs["dropout"] = model_dict["dropout"]
    if model_dict["recurrent_dropout"] is not None:
        recurrent_layer_kwargs["recurrent_dropout"] = model_dict["recurrent_dropout"]
    if model_dict["layer_type"] == 'GRU':
        model.add(layers.GRU(32, **recurrent_layer_kwargs))
    elif model_dict["layer_type"] == 'LSTM':
        model.add(layers.LSTM(32, **recurrent_layer_kwargs))
    else:
        raise ValueError("Wrong model_dict['layer_type'] value...")
    model.add(layers.Dense(1, activation='sigmoid'))
    ## Compiling:
    model.compile(
        optimizer=model_dict["optimizer"],
        loss='binary_crossentropy',
        metrics=['accuracy'])
    ## Training:
    history = model.fit(x_train, y_train,
                        epochs=20,
                        batch_size=64,
                        validation_split=0.2)
    return history
## Multi-model graphs' parameters:
graph_all_nrow = 4
graph_all_ncol = 4
graph_all_figsize = (20, 20)
assert graph_all_nrow * graph_all_ncol >= len(MODELS)
## Figs and axes of multi-model graphs:
graph_all_loss_fig, graph_all_loss_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)
graph_all_acc_fig, graph_all_acc_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)
## Loop through all models:
for i, model_dict in enumerate(MODELS):
    history = train_model(model_dict)
    ## Metrics extraction:
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    epochs = range(1, len(loss) + 1)
    ## Single-model graph - loss:
    graph_loss_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
    graph_loss_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_loss_graph.png"
    graph_loss_fig, graph_loss_ax = plt.subplots()
    graph_loss_ax.plot(epochs, loss, 'bo', label='Training loss')
    graph_loss_ax.plot(epochs, val_loss, 'b', label='Validation loss')
    graph_loss_ax.legend()
    graph_loss_fig.suptitle("Training and validation loss")
    graph_loss_fig.savefig(graph_loss_fname)
    pylab.close(graph_loss_fig)
    ## Single-model graph - accuracy:
    graph_acc_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
    graph_acc_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_acc_graph.png"
    graph_acc_fig, graph_acc_ax = plt.subplots()
    graph_acc_ax.plot(epochs, acc, 'bo', label='Training accuracy')
    graph_acc_ax.plot(epochs, val_acc, 'b', label='Validation accuracy')
    graph_acc_ax.legend()
    graph_acc_fig.suptitle("Training and validation acc")
    graph_acc_fig.savefig(graph_acc_fname)
    pylab.close(graph_acc_fig)
    ## Position of axes on multi-model graph:
    i_row = i // graph_all_ncol
    i_col = i % graph_all_ncol
    ## Adding model metrics to multi-model graph - loss:
    graph_all_loss_axs[i_row, i_col].plot(epochs, loss, 'bo', label='Training loss')
    graph_all_loss_axs[i_row, i_col].plot(epochs, val_loss, 'b', label='Validation loss')
    graph_all_loss_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")
    ## Adding model metrics to multi-model graph - accuracy:
    graph_all_acc_axs[i_row, i_col].plot(epochs, acc, 'bo', label='Training acc')
    graph_all_acc_axs[i_row, i_col].plot(epochs, val_acc, 'b', label='Validation acc')
    graph_all_acc_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")
## Saving multi-model graphs:
# Output files are quite big (8000x8000 PNG), you may want to decrease DPI.
graph_all_loss_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_loss_graph.png", dpi=400)
graph_all_acc_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_acc_graph.png", dpi=400)
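For what it's worth, the hand-written MODELS list in the script above could probably be generated instead of typed out. The following is only a sketch: the function name build_models is hypothetical, and it assumes the same grid (2 layer types, 2 optimizers, dropout and recurrent dropout each either None or 0.3) and the same naming scheme as in the script.

```python
from itertools import product

def build_models(rate=0.3):
    """Generate the 16 hyperparameter dicts in the same order as the MODELS list."""
    models = []
    # The last iterable varies fastest, so dropout alternates first,
    # then recurrent dropout, then optimizer, then layer type.
    grid = product(["GRU", "LSTM"], ["rmsprop", "adam"], [None, rate], [None, rate])
    for no, (layer_type, optimizer, recurrent_dropout, dropout) in enumerate(grid, start=1):
        d = "N" if dropout is None else dropout
        rd = "N" if recurrent_dropout is None else recurrent_dropout
        models.append({
            "no": no,
            "layer_type": layer_type,
            "optimizer": optimizer,
            "dropout": dropout,
            "recurrent_dropout": recurrent_dropout,
            "name": f"{layer_type}_d{d}_rd{rd}_{optimizer}",
        })
    return models

MODELS = build_models()
for model_dict in MODELS[:4]:
    print(model_dict["no"], model_dict["name"])
```

This also fills in the "name" key directly, so the separate naming loop would no longer be needed.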
Please find the two main graphs below: Loss - binary crossentropy, Accuracy (I am not allowed to embed images in posts due to low reputation).
I have also observed similarly strange behavior in a regression model: the MAE was in the range of several thousand in a problem where the range of $y$ was perhaps a few tens. (I decided not to include that model here, because it would make this question even longer.)
Versions of modules and libraries, hardware
Modules:
Keras 2.3.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
matplotlib 3.1.3
tensorflow-estimator 1.14.0
tensorflow-gpu 2.1.0
tensorflow-gpu-estimator 2.1.0
keras.json file:
{
"floatx": "float32",
"epsilon": 1e-07,
"backend": "tensorflow",
"image_data_format": "channels_last"
}
CUDA: I have CUDA 10.0 and CUDA 10.1 installed on my system.
cuDNN: I have three versions: cudnn-10.0 v7.4.2.24, cudnn-10.0 v7.6.4.38, cudnn-9.0 v7.4.2.24
GPU: Nvidia GTX 1050 Ti 4 GB
Windows 10 Home
Questions
Do you know what may be the reason for this behavior?
Is it possible that this is caused by multiple CUDA and cuDNN installations? Before observing the problem I had trained several models (both from the book and my own) and they seemed to behave more or less as expected, while having 2 CUDA and 2 cuDNN versions (those above without cudnn-10.0 v7.6.4.38) installed.
Is there any official/good source listing compatible combinations of keras, tensorflow, CUDA, cuDNN (and other relevant things, e.g. maybe Visual Studio)? I cannot really find any authoritative and up-to-date source.
I hope I've described everything clearly enough. If you have any questions, please ask.
I finally found the solution (sort of). It's enough to change keras to tensorflow.keras.
Revised Code
# Based on examples from "Deep Learning with Python" by François Chollet:
## Constants, modules:
VERSION = 2
import os
#U: from keras import models
#U: from keras import layers
from tensorflow.keras import models
from tensorflow.keras import layers
import matplotlib.pyplot as plt
import pylab
## Loading data:
from keras.datasets import imdb
(x_train, y_train), (x_test, y_test) = \
imdb.load_data(num_words=10000)
from keras.preprocessing import sequence
x_train = sequence.pad_sequences(x_train, maxlen=500)
x_test = sequence.pad_sequences(x_test, maxlen=500)
## Dictionary with models' hyperparameters:
MODELS_ALL = [
# GRU:
{"no": 1,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": None},
{"no": 2,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 3,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 4,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": 0.3},
{"no": 5,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": None},
{"no": 6,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 7,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 8,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": 0.3},
# LSTM:
{"no": 9,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": None},
{"no": 10,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 11,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 12,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": 0.3},
{"no": 13,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": None},
{"no": 14,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 15,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 16,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": 0.3},
]
MODELS_GRU_RECCURENT = [
# GRU:
{"no": 3,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 4,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": 0.3},
{"no": 7,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 8,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": 0.3},
]
MODELS = MODELS_ALL # "MODELS = MODELS_ALL" or "MODELS = MODELS_GRU_RECCURENT"
## Adding name:
for model_dict in MODELS:
    model_dict["name"] = f"{model_dict['layer_type']}"
    model_dict["name"] += f"_d{model_dict['dropout']}" if model_dict['dropout'] is not None else f"_dN"
    model_dict["name"] += f"_rd{model_dict['recurrent_dropout']}" if model_dict['recurrent_dropout'] is not None else f"_rdN"
    model_dict["name"] += f"_{model_dict['optimizer']}"
## Function - defining and training model:
def train_model(model_dict):
    """Defines and trains a model, outputs history."""
    ## Defining:
    model = models.Sequential()
    model.add(layers.Embedding(10000, 32))
    # Pass dropout arguments only when they are set, so layer defaults apply otherwise:
    recurrent_layer_kwargs = dict()
    if model_dict["dropout"] is not None:
        recurrent_layer_kwargs["dropout"] = model_dict["dropout"]
    if model_dict["recurrent_dropout"] is not None:
        recurrent_layer_kwargs["recurrent_dropout"] = model_dict["recurrent_dropout"]
    if model_dict["layer_type"] == 'GRU':
        model.add(layers.GRU(32, **recurrent_layer_kwargs))
    elif model_dict["layer_type"] == 'LSTM':
        model.add(layers.LSTM(32, **recurrent_layer_kwargs))
    else:
        raise ValueError("Wrong model_dict['layer_type'] value...")
    model.add(layers.Dense(1, activation='sigmoid'))
    ## Compiling:
    model.compile(
        optimizer=model_dict["optimizer"],
        loss='binary_crossentropy',
        metrics=['accuracy'])
    ## Training:
    history = model.fit(x_train, y_train,
                        epochs=20,
                        batch_size=64,
                        validation_split=0.2)
    return history
## Multi-model graphs' parameters:
graph_all_nrow = 4
graph_all_ncol = 4
graph_all_figsize = (20, 20)
assert graph_all_nrow * graph_all_ncol >= len(MODELS)
# fig and axes of multi-model graphs:
graph_all_loss_fig, graph_all_loss_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)
graph_all_acc_fig, graph_all_acc_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)
## Loop through all models:
for i, model_dict in enumerate(MODELS):
    history = train_model(model_dict)
    ## Metrics extraction:
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    epochs = range(1, len(loss) + 1)
    ## Single-model graph - loss:
    graph_loss_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
    graph_loss_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_loss_graph.png"
    graph_loss_fig, graph_loss_ax = plt.subplots()
    graph_loss_ax.plot(epochs, loss, 'bo', label='Training loss')
    graph_loss_ax.plot(epochs, val_loss, 'b', label='Validation loss')
    graph_loss_ax.legend()
    graph_loss_fig.suptitle("Training and validation loss")
    graph_loss_fig.savefig(graph_loss_fname)
    pylab.close(graph_loss_fig)
    ## Single-model graph - accuracy:
    graph_acc_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
    graph_acc_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_acc_graph.png"
    graph_acc_fig, graph_acc_ax = plt.subplots()
    graph_acc_ax.plot(epochs, acc, 'bo', label='Training accuracy')
    graph_acc_ax.plot(epochs, val_acc, 'b', label='Validation accuracy')
    graph_acc_ax.legend()
    graph_acc_fig.suptitle("Training and validation acc")
    graph_acc_fig.savefig(graph_acc_fname)
    pylab.close(graph_acc_fig)
    ## Position of axes on multi-model graph:
    i_row = i // graph_all_ncol
    i_col = i % graph_all_ncol
    ## Adding model metrics to multi-model graph - loss:
    graph_all_loss_axs[i_row, i_col].plot(epochs, loss, 'bo', label='Training loss')
    graph_all_loss_axs[i_row, i_col].plot(epochs, val_loss, 'b', label='Validation loss')
    graph_all_loss_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")
    ## Adding model metrics to multi-model graph - accuracy:
    graph_all_acc_axs[i_row, i_col].plot(epochs, acc, 'bo', label='Training acc')
    graph_all_acc_axs[i_row, i_col].plot(epochs, val_acc, 'b', label='Validation acc')
    graph_all_acc_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")
graph_all_loss_fig.suptitle(f"Loss - binary crossentropy [v{VERSION}]")
graph_all_acc_fig.suptitle(f"Accuracy [v{VERSION}]")
## Saving multi-model graphs:
graph_all_loss_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_v{VERSION}_loss_graph.png", dpi=400)
graph_all_acc_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_v{VERSION}_acc_graph.png", dpi=400)
## Saving multi-model graphs (SMALL):
graph_all_loss_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_v{VERSION}_loss_graph_SMALL.png", dpi=150)
graph_all_acc_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_v{VERSION}_acc_graph_SMALL.png", dpi=150)
Results
Graphs analogous to those in question: Loss - binary crossentropy, Accuracy
More on keras vs tensorflow.keras
As written in François Chollet's tweets (found here: https://stackoverflow.com/a/54117754), from now on tensorflow.keras (that is, Keras as the official API of TensorFlow) will replace stand-alone keras. (I'm not completely sure I'm 100% correct; feel free to correct me.)
I think it's better just to use tensorflow.keras instead of keras in future projects.
Same for me while training with the R interface to Keras. The issue seems related to recurrent dropout and the length of the "time" dimension. It happens with GRU only (LSTM has no problem).
# remotes::install_github("rstudio/keras#1032")
library(keras)
reticulate::py_config()
#> python: /home/clanera/anaconda3/envs/r-tensorflow/bin/python
#> libpython: /home/clanera/anaconda3/envs/r-tensorflow/lib/libpython3.6m.so
#> pythonhome: /home/clanera/anaconda3/envs/r-tensorflow:/home/clanera/anaconda3/envs/r-tensorflow
#> version: 3.6.10 |Anaconda, Inc.| (default, Jan 7 2020, 21:14:29) [GCC 7.3.0]
#> numpy: /home/clanera/anaconda3/envs/r-tensorflow/lib/python3.6/site-packages/numpy
#> numpy_version: 1.18.1
#> tensorflow: /home/clanera/anaconda3/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow
#>
#> NOTE: Python version was forced by RETICULATE_PYTHON
tensorflow::tf_config()
#> TensorFlow v2.0.0 (~/anaconda3/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow)
#> Python v3.6 (~/anaconda3/envs/r-tensorflow/bin/python)
tensorflow::tf_gpu_configured()
#> TensorFlow built with CUDA: FALSE
#> GPU device name:
#> [1] FALSE
n <- 100
t <- 80 # with 72 or fewer there seems to be no problem
q <- 10
x <- array(sample(n*t*q), c(n, t, q))
y <- sample(0:1, n, replace = TRUE)
input <- layer_input(c(t, q))
output <- input %>%
# ## no problem using LSTM
# layer_lstm(units = 2, recurrent_dropout = 0.5) %>%
layer_gru(units = 2, recurrent_dropout = 0.5) %>%
layer_dense(units = 1, activation = "sigmoid")
model <- keras_model(input, output)
summary(model)
#> Model: "model"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> input_1 (InputLayer) [(None, 80, 10)] 0
#> ________________________________________________________________________________
#> gru (GRU) (None, 2) 78
#> ________________________________________________________________________________
#> dense (Dense) (None, 1) 3
#> ================================================================================
#> Total params: 81
#> Trainable params: 81
#> Non-trainable params: 0
#> ________________________________________________________________________________
history <- model %>%
compile(optimizer = "adam", loss = "binary_crossentropy") %>%
fit(x, y, 2, 3)
history
#> Trained on 100 samples (batch_size=2, epochs=3)
#> Final epoch (plot to see history):
#> loss: NaN
Created on 2020-05-10 by the reprex package (v0.3.0)
sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.4 LTS
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices datasets utils methods base
#>
#> other attached packages:
#> [1] keras_2.2.5.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.4.6 whisker_0.4 knitr_1.28
#> [4] magrittr_1.5 lattice_0.20-41 R6_2.4.1
#> [7] rlang_0.4.6 stringr_1.4.0 highr_0.8
#> [10] tools_4.0.0 grid_4.0.0 xfun_0.13
#> [13] htmltools_0.4.0 tfruns_1.4 yaml_2.2.1
#> [16] digest_0.6.25 tensorflow_2.0.0 Matrix_1.2-18
#> [19] base64enc_0.1-3 zeallot_0.1.0 evaluate_0.14
#> [22] rmarkdown_2.1 stringi_1.4.6 compiler_4.0.0
#> [25] generics_0.0.2 reticulate_1.15-9000 jsonlite_1.6.1
#> [28] renv_0.10.0
Related
I wrote some models in PyTorch that were not able to learn anything even after many epochs. To debug the problem I made a simple model that learns the identity function of its input. The difficulty is that this model also doesn't learn anything despite training for 50k epochs:
import torch
import torch.nn as nn

torch.manual_seed(1)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.input = nn.Linear(2,4)
        self.hidden = nn.Linear(4,4)
        self.output = nn.Linear(4,2)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)
        self.dropout = nn.Dropout(0.5)

    def forward(self,x):
        x = self.input(x)
        x = self.dropout(x)
        x = self.relu(x)
        x = self.hidden(x)
        x = self.dropout(x)
        x = self.relu(x)
        x = self.output(x)
        x = self.softmax(x)
        return x

X = torch.tensor([[1,0],[1,0],[0,1],[0,1]],dtype=torch.float)
net = Net()
criterion = nn.CrossEntropyLoss()
opt = torch.optim.Adam(net.parameters(), lr=0.001)
for i in range(100000):
    opt.zero_grad()
    y = net(X)
    loss = criterion(y,torch.argmax(X,dim=1))
    loss.backward()
    if i%500 ==0:
        print("Epoch: ",i)
        print(torch.argmax(y,dim=1).detach().numpy().tolist())
        print("Loss: ",loss.item())
        print()
Output
Epoch: 52500
[0, 0, 1, 0]
Loss: 0.6554909944534302
Epoch: 53000
[0, 0, 0, 0]
Loss: 0.7004914283752441
Epoch: 53500
[0, 0, 0, 0]
Loss: 0.7156486511230469
Epoch: 54000
[0, 0, 0, 0]
Loss: 0.7171240448951721
Epoch: 54500
[0, 0, 0, 0]
Loss: 0.691678524017334
Epoch: 55000
[0, 0, 0, 0]
Loss: 0.7301554679870605
Epoch: 55500
[0, 0, 0, 0]
Loss: 0.728650689125061
What is wrong with my implementation?
There are a few mistakes:
Missing optimizer.step():
optimizer.step() updates the parameters using the backpropagated gradients (and any optimizer state such as accumulated momentum). Without it the weights never change.
Usage of softmax with CrossEntropyLoss:
PyTorch's CrossEntropyLoss criterion combines nn.LogSoftmax() and nn.NLLLoss() in a single class, i.e. it applies log-softmax and then takes the negative log-likelihood. So in your case the loss is effectively computed on softmax(softmax(output)). The correct way is to use a linear output layer while training, and to apply a softmax layer (or just take the argmax) for prediction.
High dropout value for a small network:
This results in underfitting.
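To make the double-softmax point concrete, here is a small sketch in plain Python (no PyTorch). It assumes that the per-sample cross-entropy is -log(softmax(logits)[target]); the logits are a hypothetical, very confident output.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Per-sample cross-entropy on raw logits: -log(softmax(logits)[target])."""
    return -math.log(softmax(logits)[target])

logits = [10.0, -10.0]  # hypothetical confident prediction for class 0

# Loss on raw logits: close to zero for a confident, correct prediction.
loss_single = cross_entropy(logits, 0)

# Loss when the model already applied softmax: the inputs are squeezed into
# [0, 1], so the loss cannot drop below log(1 + e^-1) for two classes.
loss_double = cross_entropy(softmax(logits), 0)

print(loss_single, loss_double)
```

With two classes the second loss is stuck near 0.31 even for a perfect prediction, so the gradients become small and the reported loss stays high.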
Here's the corrected code:
import torch
import torch.nn as nn

torch.manual_seed(1)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.input = nn.Linear(2,4)
        self.hidden = nn.Linear(4,4)
        self.output = nn.Linear(4,2)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)
        # self.dropout = nn.Dropout(0.0)

    def forward(self,x):
        x = self.input(x)
        # x = self.dropout(x)
        x = self.relu(x)
        x = self.hidden(x)
        # x = self.dropout(x)
        x = self.relu(x)
        x = self.output(x)
        # x = self.softmax(x)
        return x

    def predict(self, x):
        with torch.no_grad():
            out = self.forward(x)
        return self.softmax(out)

X = torch.tensor([[1,0],[1,0],[0,1],[0,1]],dtype=torch.float)
net = Net()
criterion = nn.CrossEntropyLoss()
opt = torch.optim.Adam(net.parameters(), lr=0.001)
for i in range(100000):
    opt.zero_grad()
    y = net(X)
    loss = criterion(y,torch.argmax(X,dim=1))
    loss.backward()
    # This was missing before
    opt.step()
    if i%500 ==0:
        print("Epoch: ",i)
        pred = net.predict(X)
        print(f'prediction: {torch.argmax(pred, dim=1).detach().numpy().tolist()}, actual: {torch.argmax(X,dim=1)}')
        print("Loss: ", loss.item())
Output:
Epoch: 0
prediction: [0, 0, 0, 0], actual: tensor([0, 0, 1, 1])
Loss: 0.7042869329452515
Epoch: 500
prediction: [0, 0, 1, 1], actual: tensor([0, 0, 1, 1])
Loss: 0.1166711300611496
Epoch: 1000
prediction: [0, 0, 1, 1], actual: tensor([0, 0, 1, 1])
Loss: 0.05215628445148468
Epoch: 1500
prediction: [0, 0, 1, 1], actual: tensor([0, 0, 1, 1])
Loss: 0.02993333339691162
Epoch: 2000
prediction: [0, 0, 1, 1], actual: tensor([0, 0, 1, 1])
Loss: 0.01916157826781273
Epoch: 2500
prediction: [0, 0, 1, 1], actual: tensor([0, 0, 1, 1])
Loss: 0.01306679006665945
Epoch: 3000
prediction: [0, 0, 1, 1], actual: tensor([0, 0, 1, 1])
Loss: 0.009280549362301826
.
.
.
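The effect of the missing opt.step() can also be reproduced without PyTorch. This is only a toy sketch: the train function, its apply_step flag and the one-parameter problem (fit w so that w equals 2) are made up for illustration.

```python
def train(steps, lr, apply_step):
    """Minimise (w - 2)^2 by gradient descent, optionally skipping the update."""
    w = 0.0
    history = []
    for _ in range(steps):
        loss = (w - 2.0) ** 2      # forward pass
        grad = 2.0 * (w - 2.0)     # what backward() would compute
        if apply_step:
            w -= lr * grad         # what opt.step() would do
        history.append(loss)
    return w, history

w_frozen, losses_frozen = train(steps=100, lr=0.1, apply_step=False)
w_trained, losses_trained = train(steps=100, lr=0.1, apply_step=True)
print(w_frozen, losses_frozen[-1])
print(w_trained, losses_trained[-1])
```

Without the update the gradient is computed on every iteration but never applied, so the parameter keeps its initial value. In the original question the loss still fluctuated, which would be explained by dropout being random on every forward pass.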
I'm trying to do multi-class classification on sequential data to learn the source of certain events based on the cumulative reading of the sources.
I'm using a simple LSTM layer with 64 units and a Dense layer with the same number of units as targets. The model does not seem to be learning anything, as the accuracy remains at about 1% throughout.
def create_model():
    model = Sequential()
    model.add(LSTM(64, return_sequences=False))
    model.add(Dense(8))
    model.add(Activation("softmax"))
    model.compile(
        loss="categorical_crossentropy",
        optimizer=Adam(lr=0.00001),
        metrics=["accuracy"],
    )
    return model
I have tried changing the learning rate to very small values (0.001, 0.0001, 1e-5) and training for more epochs, but observed no change in accuracy. Am I missing something here? Is my data preprocessing incorrect, or is the model creation faulty?
Thanks in advance for your help.
Dataset
Accumulated-Reading Source-1 Source-2 Source-3
217 0 0 0
205 0 0 0
206 0 0 0
231 0 0 0
308 0 0 1
1548 0 0 1
1547 0 0 1
1530 0 0 1
1545 0 0 1
1544 0 0 1
1527 0 0 1
1533 0 0 1
1527 0 0 1
1527 0 0 1
1534 0 0 1
1520 0 0 1
1524 0 0 1
1523 0 0 1
205 0 0 0
209 0 0 0
.
.
.
I created a rolling window dataset having SEQ_LEN=5 to be fed to an LSTM network:
rolling_window labels
[205, 206, 217, 205, 206] [0, 0, 0]
[206, 217, 205, 206, 231] [0, 0, 0]
[217, 205, 206, 231, 308] [0, 0, 1]
[205, 206, 231, 308, 1548] [0, 0, 1]
[206, 231, 308, 1548, 1547] [0, 0, 1]
[231, 308, 1548, 1547, 1530] [0, 0, 1]
[308, 1548, 1547, 1530, 1545] [0, 0, 1]
[1548, 1547, 1530, 1545, 1544] [0, 0, 1]
[1547, 1530, 1545, 1544, 1527] [0, 0, 1]
[1530, 1545, 1544, 1527, 1533] [0, 0, 1]
[1545, 1544, 1527, 1533, 1527] [0, 0, 1]
[1544, 1527, 1533, 1527, 1527] [0, 0, 1]
[1527, 1533, 1527, 1527, 1534] [0, 0, 1]
[1533, 1527, 1527, 1534, 1520] [0, 0, 1]
[1527, 1527, 1534, 1520, 1524] [0, 0, 1]
[1527, 1534, 1520, 1524, 1523] [0, 0, 1]
[1534, 1520, 1524, 1523, 1520] [0, 0, 1]
[1520, 1524, 1523, 1520, 205] [0, 0, 0]
.
.
.
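For reference, a rolling-window dataset like the one above could be built roughly as follows. This is a sketch, not the code actually used: the function name rolling_windows is hypothetical, and it assumes each window takes the label of its last reading, which is what the table above appears to show.

```python
SEQ_LEN = 5

def rolling_windows(values, labels, seq_len):
    """Pair every window of seq_len consecutive readings with the label of its last reading."""
    windows = []
    for end in range(seq_len, len(values) + 1):
        windows.append((values[end - seq_len:end], labels[end - 1]))
    return windows

# First rows of the dataset shown above.
readings = [217, 205, 206, 231, 308, 1548, 1547]
labels = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
          [0, 0, 1], [0, 0, 1], [0, 0, 1]]

for window, label in rolling_windows(readings, labels, SEQ_LEN):
    print(window, label)
```

Seven readings give three windows of length 5; the first one ends at reading 308 and therefore gets the label [0, 0, 1].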
Reshaped dataset
X_train = train_df.rolling_window.values
X_train = X_train.reshape(X_train.shape[0], 1, SEQ_LEN)
Y_train = train_df.labels.values
Y_train = Y_train.reshape(Y_train.shape[0], 3)
Model
def create_model():
    model = Sequential()
    model.add(LSTM(64, input_shape=(1, SEQ_LEN), return_sequences=True))
    model.add(Activation("relu"))
    model.add(Flatten())
    model.add(Dense(3))
    model.add(Activation("softmax"))
    model.compile(
        loss="categorical_crossentropy", optimizer=Adam(lr=0.01), metrics=["accuracy"]
    )
    return model
Training
model = create_model()
model.fit(X_train, Y_train, batch_size=512, epochs=5)
Training Output
Epoch 1/5
878396/878396 [==============================] - 37s 42us/step - loss: 0.2586 - accuracy: 0.0173
Epoch 2/5
878396/878396 [==============================] - 36s 41us/step - loss: 0.2538 - accuracy: 0.0175
Epoch 3/5
878396/878396 [==============================] - 36s 41us/step - loss: 0.2538 - accuracy: 0.0176
Epoch 4/5
878396/878396 [==============================] - 37s 42us/step - loss: 0.2537 - accuracy: 0.0177
Epoch 5/5
878396/878396 [==============================] - 38s 43us/step - loss: 0.2995 - accuracy: 0.0174
[EDIT-1]
After trying Max's suggestions, here are the results (loss and accuracy are still not changing)
Suggested model
def create_model():
    model = Sequential()
    model.add(LSTM(64, return_sequences=False))
    model.add(Dense(8))
    model.add(Activation("softmax"))
    model.compile(
        loss="categorical_crossentropy",
        optimizer=Adam(lr=0.001),
        metrics=["accuracy"],
    )
    return model
X_train
array([[[205],
[217],
[209],
[215],
[206]],
[[217],
[209],
[215],
[206],
[206]],
[[209],
[215],
[206],
[206],
[211]],
...,
[[175],
[175],
[173],
[176],
[174]],
[[175],
[173],
[176],
[174],
[176]],
[[173],
[176],
[174],
[176],
[173]]])
Y_train (P.S: There are 8 target classes actually. The above example was a simplification of the real problem)
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]])
Training-output
Epoch 1/5
878396/878396 [==============================] - 15s 17us/step - loss: 0.1329 - accuracy: 0.0190
Epoch 2/5
878396/878396 [==============================] - 15s 17us/step - loss: 0.1313 - accuracy: 0.0190
Epoch 3/5
878396/878396 [==============================] - 16s 18us/step - loss: 0.1293 - accuracy: 0.0190
Epoch 4/5
878396/878396 [==============================] - 16s 18us/step - loss: 0.1355 - accuracy: 0.0195
Epoch 5/5
878396/878396 [==============================] - 15s 18us/step - loss: 0.1315 - accuracy: 0.0236
[EDIT-2]
Based on Max's and Marcin's suggestions below, the accuracy mostly remains below 3%, although 1 out of 10 times it hits 95%. It all depends on what the accuracy is at the beginning of the first epoch. If gradient descent doesn't start in the right place, it doesn't reach good accuracy. Do I need to use a different initializer? Changing the learning rate doesn't bring repeatable results.
Suggestions:
1. Scale/normalize X_train (done)
2. Do not reshape Y_train (done)
3. Use fewer units in the LSTM layer (reduced from 64 to 16)
4. Use a smaller batch_size (reduced from 512 to 64)
Scaled X_train
array([[[ 0.01060734],
[ 0.03920736],
[ 0.02014085],
[ 0.03444091],
[ 0.01299107]],
[[ 0.03920728],
[ 0.02014073],
[ 0.03444082],
[ 0.01299095],
[ 0.01299107]],
[[ 0.02014065],
[ 0.0344407 ],
[ 0.01299086],
[ 0.01299095],
[ 0.02490771]],
...,
[[-0.06089251],
[-0.06089243],
[-0.06565897],
[-0.05850889],
[-0.06327543]],
[[-0.06089251],
[-0.06565908],
[-0.05850898],
[-0.06327555],
[-0.05850878]],
[[-0.06565916],
[-0.0585091 ],
[-0.06327564],
[-0.05850889],
[-0.06565876]]])
Non-reshaped Y_train
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]])
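Since only zeros are visible in the printed label matrix, it may be worth confirming that every row really is one-hot before tuning the model further. The sketch below is only a sanity check, not a diagnosis; `summarize_labels` and `y_demo` are hypothetical names used for illustration.

```python
import numpy as np

def summarize_labels(y):
    """Return (mask of rows that are valid one-hot, samples per class) for a 2-D label matrix."""
    y = np.asarray(y)
    is_binary = np.isin(y, (0, 1)).all(axis=1)
    valid = is_binary & (y.sum(axis=1) == 1)
    return valid, y.sum(axis=0)

# Hypothetical 3-class labels; the second row is all zeros, so it belongs to no class.
y_demo = np.array([[0, 0, 1],
                   [0, 0, 0],
                   [1, 0, 0]])
valid_rows, per_class = summarize_labels(y_demo)
print(valid_rows)   # which rows are usable with categorical_crossentropy
print(per_class)    # how many samples each class has
```

Running the same function on the real Y_train would show whether some rows carry no class at all and how unbalanced the classes are.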
Model with fewer LSTM units
def create_model():
    model = Sequential()
    model.add(LSTM(16, return_sequences=False))
    model.add(Dense(8))
    model.add(Activation("softmax"))
    model.compile(
        loss="categorical_crossentropy", optimizer=Adam(lr=0.001), metrics=["accuracy"]
    )
    return model
Training output
Epoch 1/5
878396/878396 [==============================] - 26s 30us/step - loss: 0.1325 - accuracy: 0.0190
Epoch 2/5
878396/878396 [==============================] - 26s 29us/step - loss: 0.1352 - accuracy: 0.0189
Epoch 3/5
878396/878396 [==============================] - 26s 30us/step - loss: 0.1353 - accuracy: 0.0192
Epoch 4/5
878396/878396 [==============================] - 26s 29us/step - loss: 0.1365 - accuracy: 0.0197
Epoch 5/5
878396/878396 [==============================] - 27s 31us/step - loss: 0.1378 - accuracy: 0.0201
The sequence (time-step) axis should be the first dimension of the LSTM input, i.e. the second axis of the input array, right after the batch axis:
Reshaped dataset
X_train = train_df.rolling_window.values
X_train = X_train.reshape(X_train.shape[0], SEQ_LEN, 1)
Y_train = train_df.labels.values
Y_train = Y_train.reshape(Y_train.shape[0], 3)
The input shape is not required for LSTM.
LSTM has 'tanh' activation by default, which is usually a good option.
Model
def create_model():
    model = Sequential()
    model.add(LSTM(64, return_sequences=True))
    model.add(Flatten())
    model.add(Dense(3))
    model.add(Activation("softmax"))
    model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=0.01), metrics=["accuracy"])
    return model
It may be a better choice to drop the Flatten() layer and use return_sequences=False for the LSTM instead. It is worth trying.
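To make the difference between the two options concrete, here is a NumPy-only sketch of the shapes involved. The zero array is merely a stand-in for the LSTM output, and the batch size is an arbitrary value chosen for illustration.

```python
import numpy as np

batch, seq_len, units = 4, 5, 64   # illustrative sizes, not taken from the real data set
# Stand-in for the output of LSTM(64, return_sequences=True): one vector per time step.
seq_out = np.zeros((batch, seq_len, units))

flattened = seq_out.reshape(batch, -1)   # what Flatten() would pass to Dense
last_step = seq_out[:, -1, :]            # what return_sequences=False would pass to Dense

print(flattened.shape, last_step.shape)
```

With Flatten() the Dense layer sees every time step's output at once, so it needs seq_len times more input weights; with return_sequences=False it only sees the final step's output.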
Edit
Also try preprocessing the data with feature scaling; the values seem to be quite large.
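A minimal sketch of such scaling, assuming plain standardization (zero mean, unit variance) over the whole array is acceptable for this data. `standardize` and `X_demo` are hypothetical names, and the sample values only mimic the range shown in the question.

```python
import numpy as np

def standardize(x, mean=None, std=None):
    """Scale x to zero mean and unit variance; pass the training mean/std when scaling test data."""
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean() if mean is None else mean
    std = x.std() if std is None else std
    return (x - mean) / std, mean, std

# Hypothetical windows shaped (samples, SEQ_LEN, 1).
X_demo = np.array([[[206.], [209.], [215.]],
                   [[175.], [173.], [176.]]])
X_scaled, mu, sigma = standardize(X_demo)
print(X_scaled.mean(), X_scaled.std())
```

The statistics should be computed on the training set only and then reused for validation and test data, which is why the function returns them.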
The structure of my input data is:
print(df.col)
0 [262, 330, 392, 522, 784, 0, 0]
1 [262, 290, 330, 392, 522, 784, 0]
2 [262, 330, 392, 522, 784, 0, 0]
3 [250, 262, 330, 392, 522, 784, 0]
4 [262, 290, 306, 330, 392, 784, 0]
.
.
.
My data has variable length, so I padded each row with 0s at the end to get a fixed input shape.
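For reference, the padding described above can be sketched as follows. `pad_rows` is a hypothetical helper, and it assumes no row is longer than the target length.

```python
import numpy as np

def pad_rows(rows, length, value=0):
    """Right-pad each row with `value` up to `length` (rows must not exceed `length`)."""
    out = np.full((len(rows), length), value, dtype=np.int64)
    for i, row in enumerate(rows):
        out[i, :len(row)] = row
    return out

# The first two rows from the printout above, before padding.
rows = [[262, 330, 392, 522, 784],
        [262, 290, 330, 392, 522, 784]]
X = pad_rows(rows, length=7)
print(X)
```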
The output column is:
print(df.predict)
array([[0., 0., 0., 1.],
[1., 0., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 1., 0.],
[0., 0., 1., 0.],
[0., 1., 0., 0.],...])
The output is one-hot encoded.
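As an illustration of this encoding, integer class ids can be turned into such rows with NumPy. The ids below are chosen to reproduce the rows printed above; `one_hot` is a hypothetical helper.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

# Class ids matching the six rows shown in the printout.
y = one_hot([3, 0, 2, 2, 2, 1], num_classes=4)
print(y)
```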
Following is my model:
model = Sequential()
model.add(Dense(7, activation='relu', input_dim = 7))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(4))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, epochs=500, batch_size=10, verbose=2)
The accuracy and loss become constant after 2-3 epochs.
Epoch 1/500
0s - loss: 5.8413 - acc: 0.1754
Epoch 2/500
0s - loss: 5.7398 - acc: 0.1754
Epoch 3/500
0s - loss: 5.7190 - acc: 0.1754
Epoch 4/500
0s - loss: 5.6885 - acc: 0.1754
Epoch 5/500
0s - loss: 5.6650 - acc: 0.1754
Epoch 6/500
0s - loss: 5.6403 - acc: 0.1754
Epoch 7/500
0s - loss: 5.6164 - acc: 0.2456
Epoch 8/500
0s - loss: 5.5900 - acc: 0.2456
Epoch 9/500
0s - loss: 5.5730 - acc: 0.2456
...
0s - loss: 5.3727 - acc: 0.1754
Epoch 499/500
0s - loss: 5.3727 - acc: 0.1754
Epoch 500/500
0s - loss: 5.3727 - acc: 0.1754
I have 72 data points and 4 classes (about 18 samples for each class)
The data is fairly simple. Why is the accuracy so low?
Is the model designed right?
I'm new to ML and Keras. Any help is appreciated.
Try model.add(layers.Dense(4, activation='softmax')) as your last layer.
If you have more than 2 classes for classification, you need a softmax layer at the end. Softmax is a function that outputs the probabilities for the 4 classes (they add up to 1), and the class with the highest probability is the prediction. This way your network will be able to learn all 4 classes instead of only two.
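To show what softmax does to the raw outputs of the last layer, here is a small NumPy sketch. The logits are made-up numbers, not values from the model above.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1, -1.0]])   # made-up raw outputs of Dense(4)
probs = softmax(logits)                      # four positive values that sum to 1
predicted_class = probs.argmax(axis=-1)      # index of the most probable class
print(probs, predicted_class)
```

Without this normalization the outputs are not probabilities, and categorical_crossentropy is not computed on the kind of values it expects.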
I have created a simulation of the CNN I am trying to use on a video data set.
I set the test data so that every frame is one single image for positive examples and all zeros for negative examples. I thought the network would learn this very quickly, but the loss and accuracy do not move at all.
Using current versions of Keras & Tensorflow on Windows 10 64bit.
First question, is my logic wrong? Should I expect the learning of this test data to quickly reach high accuracy?
Is there something wrong with my model or parameters? I have been trying a number of changes but still get the same problem.
Is the sample size (56) too small?
# testing feature extraction model.
import time
import numpy as np, cv2
import sys
import os
import keras
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.layers import Conv3D, MaxPooling3D
from keras.optimizers import SGD,rmsprop, adam
from keras import regularizers
from keras.initializers import Constant
from keras.models import Model
#set gpu options
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=.99, allocator_type = 'BFC')
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))
config = tf.ConfigProto()
batch_size = 5
num_classes = 1
epochs = 50
nvideos = 56
nframes = 55
nchan = 3
nrows = 480
ncols = 640
#load any single image, resize if needed
img = cv2.imread('C:\\Users\\david\\Documents\\AutonomousSS\\single frame.jpg',cv2.IMREAD_COLOR)
img = cv2.resize(img,(640,480))
x_learn = np.random.randint(0,255,(nvideos,nframes,nrows,ncols,nchan),dtype=np.uint8)
y_learn = np.array([[1],[1],[1],[0],[1],[0],[1],[0],[1],[0],
[1],[0],[0],[1],[0],[0],[1],[0],[1],[0],
[1],[0],[1],[1],[0],[1],[0],[0],[1],[1],
[1],[0],[1],[0],[1],[0],[1],[0],[1],[0],
[0],[1],[0],[0],[1],[0],[1],[0],[1],[0],
[1],[1],[0],[1],[0],[0]],np.uint8)
#each sample, each frame is either the single image for positive examples or 0 for negative examples.
for i in range(nvideos):
    if y_learn[i] == 0:
        x_learn[i] = 0
    else:
        x_learn[i, :nframes] = img
#build model
m_loss = 'mean_squared_error'
m_opt = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
m_met = 'acc'
model = Sequential()
# 1st layer group
model.add(Conv3D(32, (3, 3, 3), activation='relu', padding="same", name="conv1a", strides=(3, 3, 3),
                 kernel_initializer='glorot_normal',
                 trainable=False,
                 input_shape=(nframes, nrows, ncols, nchan)))
#model.add(BatchNormalization(axis=1))
model.add(Conv3D(32, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv1b", activation="relu"))
#model.add(BatchNormalization(axis=1))
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool1", strides=(2, 2, 2)))
# 2nd layer group
model.add(Conv3D(128, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv2a", activation="relu"))
model.add(Conv3D(128, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv2b", activation="relu"))
#model.add(BatchNormalization(axis=1))
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool2", strides=(2, 2, 2)))
# 3rd layer group
model.add(Conv3D(256, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv3a", activation="relu"))
model.add(Conv3D(256, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv3b", activation="relu"))
#model.add(BatchNormalization(axis=1))
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool3", strides=(2, 2, 2)))
# 4th layer group
model.add(Conv3D(512, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv4a", activation="relu"))
model.add(Conv3D(512, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv4b", activation="relu"))
#model.add(BatchNormalization(axis=1))
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool4", strides=(2, 2, 2)))
model.add(Flatten(name='flatten',trainable=False))
model.add(Dense(512,activation='relu', trainable=True,name='den0'))
model.add(Dense(num_classes,activation='softmax',name='den1'))
print (model.summary())
#compile model
model.compile(loss=m_loss,
optimizer=m_opt,
metrics=[m_met])
print ('compiled')
#set callbacks
from keras import backend as K
K.set_learning_phase(0) #set learning phase
tb = keras.callbacks.TensorBoard(log_dir=sample_root_path+'logs', histogram_freq=0,
write_graph=True, write_images=False)
tb.set_model(model)
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.2,verbose=1,
patience=2, min_lr=0.000001)
reduce_lr.set_model(model)
ear_stop = keras.callbacks.EarlyStopping(monitor='loss', min_delta=0, patience=4, verbose=1, mode='auto')
ear_stop.set_model(model)
#fit
history = model.fit(x_learn, y_learn,
batch_size=batch_size,
callbacks=[reduce_lr,tb, ear_stop],
verbose=1,
validation_split=0.1,
shuffle = True,
epochs=epochs)
score = model.evaluate(x_learn, y_learn, batch_size=batch_size)
print(str(model.metrics_names) + ": " + str(score))
As usual, thanks for any and all help.
added output...
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1a (Conv3D) (None, 19, 160, 214, 32) 2624
_________________________________________________________________
conv1b (Conv3D) (None, 19, 160, 214, 32) 27680
_________________________________________________________________
pool1 (MaxPooling3D) (None, 10, 78, 105, 32) 0
_________________________________________________________________
conv2a (Conv3D) (None, 10, 78, 105, 128) 110720
_________________________________________________________________
conv2b (Conv3D) (None, 10, 78, 105, 128) 442496
_________________________________________________________________
pool2 (MaxPooling3D) (None, 5, 37, 51, 128) 0
_________________________________________________________________
conv3a (Conv3D) (None, 5, 37, 51, 256) 884992
_________________________________________________________________
conv3b (Conv3D) (None, 5, 37, 51, 256) 1769728
_________________________________________________________________
pool3 (MaxPooling3D) (None, 3, 17, 24, 256) 0
_________________________________________________________________
conv4a (Conv3D) (None, 3, 17, 24, 512) 3539456
_________________________________________________________________
conv4b (Conv3D) (None, 3, 17, 24, 512) 7078400
_________________________________________________________________
pool4 (MaxPooling3D) (None, 2, 7, 10, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 71680) 0
_________________________________________________________________
den0 (Dense) (None, 512) 36700672
_________________________________________________________________
den1 (Dense) (None, 1) 513
=================================================================
Total params: 50,557,281
Trainable params: 36,701,185
Non-trainable params: 13,856,096
_________________________________________________________________
None
compiled
Train on 50 samples, validate on 6 samples
Epoch 1/50
50/50 [==============================] - 20s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 2/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 3/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 4/50
45/50 [==========================>...] - ETA: 1s - loss: 0.5111 - acc: 0.4889
Epoch 00003: reducing learning rate to 0.00020000000949949026.
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 5/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 6/50
45/50 [==========================>...] - ETA: 1s - loss: 0.5111 - acc: 0.4889
Epoch 00005: reducing learning rate to 4.0000001899898055e-05.
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 7/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 8/50
45/50 [==========================>...] - ETA: 1s - loss: 0.4889 - acc: 0.5111
Epoch 00007: reducing learning rate to 8.000000525498762e-06.
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 9/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 00008: early stopping
56/56 [==============================] - 12s
['loss', 'acc']: [0.50000001516725334, 0.5000000127724239]
Your layers are set to trainable=False (apart from the two dense layers at the end). Therefore the convolutional part of your CNN cannot learn. In addition, you won't be able to train on just a single sample.
If you run into performance issues on your GPU, switch to CPU or AWS, or reduce your image size.
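A separate point that may also be relevant: den1 applies softmax to a single unit, and a softmax over one value is always 1.0, whatever the weights are. With balanced 0/1 labels and mean squared error, a constant prediction of 1.0 gives a loss of 0.5, which is consistent with the log above. The sketch below uses made-up logits and hypothetical balanced labels to show this; a sigmoid activation is the usual choice for a single-unit binary output.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[-3.0], [0.0], [7.5]])   # arbitrary outputs of a 1-unit layer
preds = softmax(logits)                     # each row normalizes over one value only

y = np.array([[1], [0], [1], [0]])          # hypothetical balanced labels
mse = np.mean((y - 1.0) ** 2)               # loss if the prediction is always 1.0
print(preds.ravel(), mse)
```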
I am new to machine learning and TensorFlow. I am trying to train a simple model to recognize gender. I use a small data set of height, weight, and shoe size. However, I have encountered a problem with evaluating the model's accuracy.
Here's the entire code:
import tflearn
import tensorflow as tf
import numpy as np
# [height, weight, shoe_size]
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37], [166, 65, 40],
[190, 90, 47], [175, 64, 39], [177, 70, 40], [159, 55, 37], [171, 75, 42],
[181, 85, 43], [170, 52, 39]]
# 0 - for female, 1 - for male
Y = [1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
data = np.column_stack((X, Y))
np.random.shuffle(data)
# Split into train and test set
X_train, Y_train = data[:8, :3], data[:8, 3:]
X_test, Y_test = data[8:, :3], data[8:, 3:]
# Build neural network
net = tflearn.input_data(shape=[None, 3])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 1, activation='linear')
net = tflearn.regression(net, loss='mean_square')
# fix for tflearn with TensorFlow 12:
col = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
for x in col:
    tf.add_to_collection(tf.GraphKeys.VARIABLES, x)
# Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm)
model.fit(X_train, Y_train, n_epoch=100, show_metric=True)
score = model.evaluate(X_test, Y_test)
print('Training test score', score)
test_male = [176, 78, 42]
test_female = [170, 52, 38]
print('Test male: ', model.predict([test_male])[0])
print('Test female:', model.predict([test_female])[0])
Even though the model's predictions are not very accurate:
Test male: [0.7158362865447998]
Test female: [0.4076206684112549]
model.evaluate(X_test, Y_test) always returns 1.0. How do I calculate the real accuracy on the test data set using TFLearn?
You want to do binary classification in this case. Your network is set to perform linear regression.
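One plausible reason for the constant 1.0, assuming the accuracy metric compares the argmax of predictions and targets along the class axis (an assumption about the metric, not something checked against the TFLearn source here): with a single output column, argmax is 0 for every prediction and every target, so they always "match". The values below are made up.

```python
import numpy as np

preds = np.array([[0.72], [0.41], [0.10], [0.95]])   # made-up single-column outputs
targets = np.array([[1], [0], [1], [0]])             # labels shaped like Y_test in the question

# argmax over one column can only ever be 0, for predictions and targets alike.
accuracy = np.mean(preds.argmax(axis=1) == targets.argmax(axis=1))
print(accuracy)
```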
First, transform the labels (gender) to categorical features:
from tflearn.data_utils import to_categorical
Y_train = to_categorical(Y_train, nb_classes=2)
Y_test = to_categorical(Y_test, nb_classes=2)
The output layer of your network needs two output units for the two classes you want to predict. The activation also needs to be softmax for classification. TFLearn's regression layer defaults to a cross-entropy loss with accuracy as the metric, so these are already correct.
# Build neural network
net = tflearn.input_data(shape=[None, 3])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)
The output will now be a vector with the probability for each gender. For example:
[0.991, 0.009] #female
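With two-column outputs like this, accuracy can also be computed by hand from the predictions. A minimal sketch, with made-up probabilities and targets; the `accuracy` helper is hypothetical, and column 0 is taken to mean female and column 1 male, following the 0/1 encoding in the question.

```python
import numpy as np

def accuracy(probs, one_hot_targets):
    """Fraction of samples whose most probable class matches the target class."""
    return float(np.mean(probs.argmax(axis=1) == one_hot_targets.argmax(axis=1)))

# Made-up softmax outputs for four test samples.
probs = np.array([[0.991, 0.009],
                  [0.300, 0.700],
                  [0.600, 0.400],
                  [0.200, 0.800]])
targets = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])
print(accuracy(probs, targets))
```

Here the third sample is misclassified, so three of four predictions are correct.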
Bear in mind that you will hopelessly overfit the network with your tiny data set. This means that during training the accuracy will approach 1, while the accuracy on your test set will be quite poor.