I am trying to visualize loss and custom metrics in TensorBoard during training time using:
visualizer = TensorBoard(log_dir=os.path.join(experiment_path, 'logs'),
                         histogram_freq=1, write_graph=True,
                         write_images=True, write_grads=False)
model.compile(loss='mean_absolute_error',
              optimizer='adam',
              metrics=[pearson_correlation, rmse, mean_true, mean_pred, kl_loss])
model.fit_generator(training_data_generator,
                    ....
                    callbacks=[lr_reducer, csv_logger, checkpoint, visualizer])
When I then start TensorBoard with: tensorboard --logdir=...
TensorBoard shows no scalars or images; only the graph is shown. I can see the metrics as console output. Is there an update frequency I should adjust? Or does Keras only show the metrics on the validation set?
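For reference, newer Keras versions seem to expose an update_freq argument on the TensorBoard callback; the sketch below (reusing experiment_path from above and assuming tf.keras, where write_grads no longer exists) is roughly what I have in mind, but I am not sure it is the right knob:

import os
from tensorflow.keras.callbacks import TensorBoard

# Minimal sketch, assuming tf.keras. update_freq controls how often scalar
# summaries are written: 'epoch' (the default), 'batch', or an integer
# number of batches.
visualizer = TensorBoard(
    log_dir=os.path.join(experiment_path, 'logs'),  # experiment_path as above
    histogram_freq=1,    # histogram summaries require validation data in fit()
    write_graph=True,
    write_images=True,
    update_freq='epoch',
)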
I was using GridSearchCV to find the best value for the number of neurons in my neural network model.
I am trying to see why GridSearchCV produces different results (mean_test_score) each time even though I tried different methods to address this issue.
I believe I am running on CPU (not GPU), judging from the timestamps.
I have set shuffle=False in KFold
cv=KFold(n_splits=3, shuffle=False)
The neural network model (a KerasRegressor) is defined in a separate cell that was already run before I ran GridSearchCV. When I rerun only the GridSearchCV code block (without rerunning the KerasRegressor cell), it still gives a different result each run, so any source of randomness in the KerasRegressor definition should not affect the runs after that (and I am also using tf.random.set_seed(seed) before defining the model).
I tried including the following code block to set the global seeds:

import numpy as np
import tensorflow as tf
from numpy.random import seed

seed(1)                # seed NumPy's global RNG
tf.random.set_seed(1)  # seed TensorFlow's global RNG (takes an int, not the seed function)
I included random_state as a key in the param_grid dictionary, but I still get different results.
param_grid = {
    'hidden_size1': [64],
    'hidden_size2': [64],
    'hidden_size3': [64],
    'random_state': [2]
}
The code I am using is:

def make_model(optimizer="adam", hidden_size1=32, hidden_size2=32,
               hidden_size3=32, random_state=2,
               activation="relu", learning_rate=1e-3):
    # activation and learning_rate were defined elsewhere in my notebook;
    # default values are given here so the function is self-contained.
    model = Sequential()
    model.add(Dense(hidden_size1, activation=activation,
                    input_shape=(X_train.shape[1],)))
    model.add(Dense(hidden_size2, activation=activation))
    model.add(Dense(hidden_size3, activation=activation))
    model.add(Dense(y_train.shape[1], activation='linear'))
    model.compile(loss='mse',
                  optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
tf.random.set_seed(seed)
clf = KerasRegressor(make_model,verbose=1)
param_grid = {
    'hidden_size1': [64],
    'hidden_size2': [64],
    'hidden_size3': [64],
    'random_state': [2]
}
and in a separate code block, I have
cv=KFold(n_splits=3, shuffle=False)
grid = GridSearchCV(clf, param_grid=param_grid, cv=cv, return_train_score=True)
grid.fit(X_train, y_train)
I keep rerunning this cell, but every time I get a different result.
My question is whether there is another source of randomness in GridSearchCV even when I disable shuffling in KFold (setting random_state to an integer is contradictory with shuffle=False, so I just disabled shuffling).
As the commenter said, you are running GridSearchCV on a neural network whose weights are different every time, because they are randomly initialized. The solution would be to fix the initialization weights yourself.
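For example, a sketch of what I mean: give each Dense layer in make_model a seeded initializer, so every GridSearchCV run starts from identical weights (the seed value 42 is arbitrary):

from tensorflow.keras import initializers
from tensorflow.keras.layers import Dense

# Sketch: a seeded initializer makes the starting weights deterministic.
init = initializers.GlorotUniform(seed=42)
layer = Dense(64, activation='relu',
              kernel_initializer=init,
              bias_initializer='zeros')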
Assume that I have 3 datasets in an ML problem.
train dataset: used to estimate ML model parameters (training)
test dataset: used to evaluate the trained model and calculate its accuracy
prediction dataset: used only for prediction after model deployment
I don't have an evaluation dataset, and I use Grid Search with k-fold cross-validation to find the best model.
Also, I have two python scripts as follows:
train.py: used to train and test the ML model; it loads the train and test datasets, finds the best model via Grid Search, and saves the trained model.
predict.py: used to load the pre-trained model and the prediction dataset, predict the model output, and calculate accuracy.
Before starting the training process in train.py, I use MinMaxScaler as follows:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_train) # fit only on train dataset
x_train_norm = scaler.transform(x_train)
x_test_norm = scaler.transform(x_test)
In predict.py, after loading the prediction dataset, I need to apply the same data pre-processing, as below:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_predict)
x_predict_norm = scaler.transform(x_predict)
As you can see above, both fit and transform are done on the prediction dataset. However, in train.py, fit is done on the train dataset, and that same MinMaxScaler is used to transform the test dataset.
My understanding is that the test dataset is a simulation of the real data the model is supposed to predict after deployment. Therefore, the pre-processing of the test and prediction datasets should be the same.
I think separate MinMaxScalers should be used in train.py for the train and test datasets, as follows:
from sklearn.preprocessing import MinMaxScaler
scaler_train = MinMaxScaler()
scaler_test = MinMaxScaler()
scaler_train.fit(x_train) # fit only on train dataset
x_train_norm = scaler_train.transform(x_train)
scaler_test.fit(x_test) # fit only on test dataset
x_test_norm = scaler_test.transform(x_test)
What is the difference?
The value of x_test_norm will be different if I use a separate MinMaxScaler as explained above. In that case, x_test_norm is in the range [0, 1] (the scaler's default feature_range). However, if I transform the test dataset with a MinMaxScaler that was fit on the train dataset, x_test_norm can fall outside that range.
Please let me know your thoughts on this.
When you run .transform(), MinMax scaling does something like (value - min) / (max - min), where min and max are determined when you run .fit(). So the answer is yes: you should fit the MinMaxScaler on the training dataset and then use it on the test dataset.
Just imagine a situation where, in the training dataset, some feature has max=100 and min=10, while in the test dataset it has max=10 and min=1. If you fit a separate MinMaxScaler on the test subset, yes, it will scale the feature into the [0, 1] range, but relative to the training dataset the scaled values should actually be lower.
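Concretely, for your train.py / predict.py split, one option is to persist the scaler fitted on the training data and reload it before prediction. A sketch using joblib (the file name scaler.joblib is arbitrary):

# train.py (sketch)
import joblib
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(x_train)                           # fit only on the train dataset
x_train_norm = scaler.transform(x_train)
x_test_norm = scaler.transform(x_test)
joblib.dump(scaler, 'scaler.joblib')          # save the fitted scaler

# predict.py (sketch)
import joblib

scaler = joblib.load('scaler.joblib')         # reload the train-fitted scaler
x_predict_norm = scaler.transform(x_predict)  # transform only, never fit here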
Also, regarding Grid Search with k-fold cross-validation, you should use a Pipeline. In that case, Grid Search will automatically fit the MinMaxScaler on the k-1 training folds. Here is a good example of how to organize a pipeline with mixed types.
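A minimal sketch of such a pipeline (the final estimator and its parameter grid are placeholders here, not your Keras model):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge

# Inside each CV split, the scaler is fit on the training folds only.
pipe = Pipeline([
    ('scale', MinMaxScaler()),
    ('model', Ridge()),
])
param_grid = {'model__alpha': [0.1, 1.0, 10.0]}
grid = GridSearchCV(pipe, param_grid=param_grid, cv=3)
grid.fit(x_train, y_train)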
I understand what the AutoKeras ImageClassifier does (https://autokeras.com/image_classifier/):
clf = ImageClassifier(verbose=True, augment=False)
clf.fit(x_train, y_train, time_limit=12 * 60 * 60)
clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
y = clf.evaluate(x_test, y_test)
But I am unable to understand what the AutoModel class (https://autokeras.com/auto_model/) does, or how it is different from ImageClassifier.
autokeras.auto_model.AutoModel(
    inputs,
    outputs,
    name="auto_model",
    max_trials=100,
    directory=None,
    objective="val_loss",
    tuner="greedy",
    seed=None)
The documentation for the inputs and outputs arguments says:
inputs: A list of or a HyperNode instance. The input node(s) of the AutoModel.
outputs: A list of or a HyperHead instance. The output head(s) of the AutoModel.
What is a HyperNode instance?
Similarly, what is the GraphAutoModel class? (https://autokeras.com/graph_auto_model/)
autokeras.auto_model.GraphAutoModel(
    inputs,
    outputs,
    name="graph_auto_model",
    max_trials=100,
    directory=None,
    objective="val_loss",
    tuner="greedy",
    seed=None)
The documentation reads:
A HyperModel defined by a graph of HyperBlocks. GraphAutoModel is a subclass of HyperModel. Besides the HyperModel properties, it also has a tuner to tune the HyperModel. The user can use it in a similar way to a Keras model since it also has fit() and predict() methods.
What are HyperBlocks?
If ImageClassifier automatically does hyperparameter tuning, what is the use of GraphAutoModel?
Links to any documents/resources for a better understanding of AutoModel and GraphAutoModel are appreciated.
Having worked with AutoKeras recently, I can share what little knowledge I have.
Task API
When doing a classical task such as image classification/regression or text classification/regression, you can use the simplest APIs provided by AutoKeras, called the Task API: ImageClassifier, ImageRegressor, TextClassifier, TextRegressor, and so on. In this case you have one input (image, text, tabular data, ...) and one output (classification or regression).
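For example, a classifier can be created and trained in a few lines with the Task API (a sketch; the max_trials and epochs values are arbitrary):

import autokeras as ak

# Task API sketch: one image input, one classification output.
clf = ak.ImageClassifier(max_trials=3)   # number of candidate models to try
clf.fit(x_train, y_train, epochs=10)
accuracy = clf.evaluate(x_test, y_test)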
AutoModel
However, when you have a task that requires, for example, a multi-input/multi-output architecture, you cannot use the Task API directly, and this is where AutoModel comes into play with the I/O API. You can check the example provided in the documentation, where you have two inputs (image and structured data) and two outputs (classification and regression); a sketch in that style is shown below.
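A sketch of that setup with the I/O API (the block choices are illustrative, following the style of the documentation example):

import autokeras as ak

# Two inputs: an image and structured (tabular) data.
image_input = ak.ImageInput()
image_branch = ak.ImageBlock()(image_input)
structured_input = ak.StructuredDataInput()
structured_branch = ak.StructuredDataBlock()(structured_input)

# Merge the two branches, then attach two output heads.
merged = ak.Merge()([image_branch, structured_branch])
class_output = ak.ClassificationHead()(merged)
reg_output = ak.RegressionHead()(merged)

automodel = ak.AutoModel(
    inputs=[image_input, structured_input],
    outputs=[class_output, reg_output],
    max_trials=2)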
GraphAutoModel
GraphAutoModel works like the Keras functional API. It assembles different blocks (convolutions, LSTM, GRU, ...) into a model, then searches for the best hyperparameters given the architecture you provided. Suppose, for instance, that I want to do a binary classification task using time series as input data.
First, let's generate a toy dataset:
import numpy as np
import autokeras as ak
x = np.random.randn(100, 7, 3)
y = np.random.choice([0, 1], size=100, p=[0.5, 0.5])
Here x is a time series of 100 samples; each sample is a sequence of length 7 with a feature dimension of 3. The corresponding target variable y is binary (0, 1).
Using GraphAutoModel, I can specify the architecture I want using what are called HyperBlocks. There are many blocks: Conv, RNN, Dense, ...; check the full list here.
In my case I want to use RNN blocks to create a model, because I have time series data:
input_layer = ak.Input()
rnn_layer = ak.RNNBlock(layer_type="lstm")(input_layer)
dense_layer = ak.DenseBlock()(rnn_layer)
output_layer = ak.ClassificationHead(num_classes=2)(dense_layer)
automodel = ak.GraphAutoModel(input_layer, output_layer, max_trials=2, seed=123)
automodel.fit(x, y, validation_split=0.2, epochs=2, batch_size=32)
(If you are not familiar with the above style of defining a model, you should check the Keras functional API documentation.)
So in this example I have more flexibility in creating the skeleton of the architecture I would like to use: an LSTM block followed by a Dense layer, followed by a classification layer. However, I didn't specify any hyperparameters (number of LSTM layers, number of Dense layers, size of the LSTM layers, size of the Dense layers, activation functions, dropout, batch norm, ...); AutoKeras will do the hyperparameter tuning automatically based on the architecture (skeleton) I provided.
I want to save the results of my experiments in Keras, not the model. For example, I want to save everything that results from:
''' Plots '''
if plot:
    # Plots for training and testing process: loss and accuracy
    plt.figure(0)
    plt.plot(cnn.history['acc'], 'r')
    plt.plot(cnn.history['val_acc'], 'g')
    plt.xticks(np.arange(0, nb_epochs+1, 2.0))
    plt.rcParams['figure.figsize'] = (8, 6)
    plt.xlabel("Num of Epochs")
    plt.ylabel("Accuracy")
    plt.title("Training Accuracy vs Validation Accuracy")
    plt.legend(['train', 'validation'])

    plt.figure(1)
    plt.plot(cnn.history['loss'], 'r')
    plt.plot(cnn.history['val_loss'], 'g')
    plt.xticks(np.arange(0, nb_epochs+1, 2.0))
    plt.rcParams['figure.figsize'] = (8, 6)
    plt.xlabel("Num of Epochs")
    plt.ylabel("Loss")
    plt.title("Training Loss vs Validation Loss")
    plt.legend(['train', 'validation'])
How do I save all that so I can draw the plots again and inspect what happened during training?
The Keras FAQ (https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model) doesn't seem to explain it. Help?
The pickle module lets you serialize Python objects.
You can save the history with:
pkl.dump(cnn.history, file_obj)
If you want to save your plots as an image:
plt.savefig(path)
You can also try to pickle Matplotlib Figure/Axes objects to recreate the interactive plots, but this feature is experimental. I would suggest just pickling your history dict and then regenerating the plots with your code above.
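Putting it together, a sketch of saving the history dict and reloading it later to redraw the plots (the file name history.pkl is arbitrary; cnn.history is assumed to be the same history dict used in your plotting code above):

import pickle as pkl
import matplotlib.pyplot as plt

# Right after training: dump the history dict to disk.
with open('history.pkl', 'wb') as file_obj:
    pkl.dump(cnn.history, file_obj)

# Later, in a fresh session: reload and replot without retraining.
with open('history.pkl', 'rb') as file_obj:
    history = pkl.load(file_obj)

plt.plot(history['loss'], 'r')
plt.plot(history['val_loss'], 'g')
plt.xlabel("Num of Epochs")
plt.ylabel("Loss")
plt.legend(['train', 'validation'])
plt.savefig('loss.png')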
I am trying to train an LSTM, but during training the accuracy remains zero in every epoch.
I have transformed the data into multivariate time-series data and reshaped it into a three-dimensional shape.
I have also normalised the data using MinMaxScaler.
I have tried epoch counts from 5 to 50 and batch sizes from 25 to 200.
I have tried sample sizes from 1,000,000 down to 1,000, but nothing works.
Every time, the training accuracy is zero.
Can anyone help me understand this or suggest some more experiments?
The following is my network.
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
from keras.layers import Flatten

model = Sequential()
model.add(LSTM(50, return_sequences=True,
               input_shape=(X_train_values.shape[1], X_train_values.shape[2])))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='rmsprop', metrics=['accuracy'])
history = model.fit(X_train_values, y_train.values, epochs=25, batch_size=30,
                    verbose=2, shuffle=False)
Me too. I'm a student from China. When I train an LSTM model, the model's accuracy is very close to zero, but the predicted values and the test values are very close.