Issue:
I'm trying to predict the future stock price of Google using the LSTM model in Keras. I'm able to train the model successfully and the test prediction also goes well, but the after test/future prediction is bad. It forms a steadily decreasing curve which is not an actual future data.
Some Explanation
I'm training the model with two inputs and expecting a single output from it.
# Feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)
# Creating a data structure with 60 timesteps and 1 output
X_train = []
y_train = []
for i in range(2, 999):
X_train.append(training_set_scaled[i-2:i, 0])
y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
# Reshaping
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
# Part 2 - Building the RNN
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
# Initialising the RNN
regressor = Sequential()
# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))
# Adding the output layer
regressor.add(Dense(units = 1))
# Compiling the RNN
regressor.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')
# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs = 500, batch_size = 50)
Testing the predicted model
dataset_test = pd.read_csv('/media/vinothubuntu/Ubuntu Storage/Downloads/Test - Test.csv')
real_stock_price = dataset_test.iloc[:, 2:3].values
# Getting the predicted stock price of 2017
dataset_total = pd.concat((dataset_train['data'], dataset_test['data']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) -0:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)
X_test = []
test_var = []
for i in range(0, 28):
X_test.append(inputs[i:i+2, 0])
test_var.append(inputs[i, 0])
X_test_pred = np.array(X_test)
X_test_pred = np.reshape(X_test_pred, (X_test_pred.shape[0], X_test_pred.shape[1], 1))
predicted_stock_price = regressor.predict(X_test_pred)
This part goes very well, the test prediction give a perfect result.
After test/future prediction:
for x in range(0,30):
X_test_length = X_test[len(X_test)-1] # get the last array of X_test list
future=[]
Prev_4 = X_test_length[1:2] # get the last four value of the X_test_length
Last_pred = predicted_stock_price.flat[-1] # get the last value from prediction
merger = np.append(Prev_4,Last_pred)
X_test.append(merger) #append the new array to X_test
future.append(merger) #append the new array to future array
one_time_pred=np.array(future)
one_time_pred = np.reshape(one_time_pred, (one_time_pred.shape[0], one_time_pred.shape[1], 1))
future_prediction = regressor.predict(one_time_pred) #predict future - gives one new prediction
predicted_stock_price = np.append(predicted_stock_price, future_prediction, axis=0) #put the new predicction on predicted_stock_price array
Here comes the actual problem, I'm getting the last value from the test prediction and predicting a single output and creating a loop on the new precited value. [Please suggest me a better way, if you feel this is a bad idea]
My output:
Expected Result: Actual future data, which is definitely not a decreasing curve.
Related
I've trained an LSTM model with 8 features and 1 output. I have one dataset and split it into two separate files to train and predict with the first half of the set, and then attempt to predict the second half of the set using the trained model from the first part of my dataset. My model predicts the trained and testing sets from the dataset I used to train the model pretty well (RMSE of around 5-7), however when I attempt to predict using the second half of the set I get very poor predictions (RMSE of around 50-60). How can I get my trained model to predict outside datasets well?
dataset at this link
file = r'/content/drive/MyDrive/only_force_pt1.csv'
df = pd.read_csv(file)
df.head()
X = df.iloc[:, 1:9]
y = df.iloc[:,9]
print(X.shape)
print(y.shape)
plt.figure(figsize = (20, 6), dpi = 100)
plt.plot(y)
WINDOW_LEN = 50
def window_size(size, inputdata, targetdata):
X = []
y = []
i=0
while(i + size) <= len(inputdata)-1:
X.append(inputdata[i: i+size])
y.append(targetdata[i+size])
i+=1
assert len(X)==len(y)
return (X,y)
X_series, y_series = window_size(WINDOW_LEN, X, y)
print(len(X))
print(len(X_series))
print(len(y_series))
X_train, X_val, y_train, y_val = train_test_split(np.array(X_series),np.array(y_series),test_size=0.3, shuffle = True)
X_val, X_test,y_val, y_test = train_test_split(np.array(X_val),np.array(y_val),test_size=0.3, shuffle = False)
n_timesteps, n_features, n_outputs = X_train.shape[1], X_train.shape[2],1
[verbose, epochs, batch_size] = [1, 300, 32]
input_shape = (n_timesteps, n_features)
model = Sequential()
# LSTM
model.add(LSTM(64, input_shape=input_shape, return_sequences = False))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
#model.add(Dropout(0.2))
model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dense(1, activation='relu'))
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience = 30, verbose =1, mode = 'auto')
model.summary()
model.compile(loss = 'mse', optimizer = Adam(learning_rate = 0.001), metrics=[tf.keras.metrics.RootMeanSquaredError()])
history = model.fit(X_train, y_train, batch_size = batch_size, epochs = epochs, verbose = verbose, validation_data=(X_val,y_val), callbacks = [earlystopper])
Second dataset:
tests = r'/content/drive/MyDrive/only_force_pt2.csv'
df_testing = pd.read_csv(tests)
X_testing = df_testing.iloc[:4038,1:9]
torque = df_testing.iloc[:4038,9]
print(X_testing.shape)
print(torque.shape)
plt.figure(figsize = (20, 6), dpi = 100)
plt.plot(torque)
X_testing = X_testing.to_numpy()
X_testing_series, y_testing_series = window_size(WINDOW_LEN, X_testing, torque)
X_testing_series = np.array(X_testing_series)
y_testing_series = np.array(y_testing_series)
scores = model.evaluate(X_testing_series, y_testing_series, verbose =1)
X_prediction = model.predict(X_testing_series, batch_size = 32)
If your model is working fine on training data but performs bad on validation data, then your model did not learn the "true" connection between input and output variables but simply memorized the corresponding output to your input. To tackle this you can do multiple things:
Typically you would use 80% of your data to train and 20% to test, this will present more data to the model, which should make it learn more of the true underlying function
If your model is too complex, it will have neurons which will just be used to memorize input-output data pairs. Try to reduce the complexity of your model (layers, neurons) to make it more simple, so that the remaining layers can really learn instead of memorize
Look into more detail on training performance here
I have created a simple pytorch classification model with sample datasets generated using sklearns make_classification. Even after training for thousands of epochs the accuracy of the model hovers between 30 and 40 percentage. During training itself the loss value is fluctuating very far and wide. I am wondering why this model is not learning, whether it's due to some logical error in the code.
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X,y = make_classification(n_features=15,n_classes=5,n_informative=4)
DEVICE = torch.device('cuda')
epochs = 5000
class CustomDataset(Dataset):
def __init__(self,X,y):
self.X = torch.from_numpy(X)
self.y = torch.from_numpy(y)
def __len__(self):
return len(self.X)
def __getitem__(self, index):
X = self.X[index]
y = self.y[index]
return (X,y)
class Model(nn.Module):
def __init__(self):
super().__init__()
self.l1 = nn.Linear(15,10)
self.l2 = nn.Linear(10,5)
self.relu = nn.ReLU()
def forward(self,x):
x = self.l1(x)
x = self.relu(x)
x = self.l2(x)
x = self.relu(x)
return x
model = Model().double().to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
train_data = CustomDataset(X_train,y_train)
test_data = CustomDataset(X_test,y_test)
trainloader = DataLoader(train_data, batch_size=32, shuffle=True)
testloader = DataLoader(test_data, batch_size=32, shuffle=True)
for i in range(epochs):
for (x,y) in trainloader:
x = x.to(DEVICE)
y = y.to(DEVICE)
optimizer.zero_grad()
output = model(x)
loss = loss_function(output,y)
loss.backward()
optimizer.step()
if i%200==0:
print("epoch: ",i," Loss: ",loss.item())
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
for x, y in testloader:
# calculate outputs by running x through the network
outputs = model(x.to(DEVICE)).to(DEVICE)
# the class with the highest energy is what we choose as prediction
_, predicted = torch.max(outputs.data, 1)
total += y.size(0)
correct += (predicted == y.to(DEVICE)).sum().item()
print(f'Accuracy of the network on the test data: {100 * correct // total} %')
EDIT
I tried to over-fit my model with only 10 samples (batch_size=5) X,y = make_classification(n_samples=10,n_features=15,n_classes=5,n_informative=4) but now the accuracy decreased to 15-20%. I then normalize the input data between the values 0 and 1 which pushed the accuracy a bit higher but not over 50 percentage. Any idea why this might be happening?
You should not be using ReLU activation on your output layer. Usually softmax activation is used for multi class classification on the final layer, or the logits are fed to the loss function directly without explicitly adding a softmax activation layer.
Try removing the ReLU activation from the final layer.
Below is the code of what I'm trying to do, but my accuracy is always under 50% so I'm wondering how should I fix this? What I'm trying to do is use the first 1885 daily unit sale data as input and the rest of the daily unit sale data from 1885 as output. After train these data, I need to use it to predict 20 more daily unit sale in the future
The data I used here is provided in this link
https://drive.google.com/file/d/13qzIZMD6Wz7e1GpOsNw1_9Yq-4PI2HrC/view?usp=sharing
import pandas as pd
import numpy as np
import keras
import keras.backend as k
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.callbacks import EarlyStopping
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
data = pd.read_csv('sales_train.csv')
#Since there are 3 departments and 10 store from 3 different areas, thus I categorized the data into 30 groups and numerize them
Unique_dept = data["dept_id"].unique()
Unique_state = data['state_id'].unique()
Unique_store = data["store_id"].unique()
data0 = data.copy()
for i in range(3):
data0["dept_id"] = data0["dept_id"].replace(to_replace=Unique_dept[i], value = i)
data0["state_id"] = data0["state_id"].replace(to_replace=Unique_state[i], value = i)
for j in range(10):
data0["store_id"] = data0["store_id"].replace(to_replace=Unique_store[j], value = int(Unique_store[j][3]) -1)
# Select the three numerized categorical variables and daily unit sale data
pt = 6 + 1885
X = pd.concat([data0.iloc[:,2],data0.iloc[:, 4:pt]], axis = 1)
Y = data0.iloc[:, pt:]
# Remove the daily unit sale data that are highly correlated to each other (corr > 0.9)
correlation = X.corr(method = 'pearson')
corr_lst = []
for i in correlation:
for j in correlation:
if (i != j) & (correlation[i][j] >= 0.9) & (j not in corr_lst) & (i not in corr_lst):
corr_lst.append(i)
x = X.drop(corr_lst, axis = 1)
x_value = x.values
y_value = Y.values
sc = StandardScaler()
X_scale = sc.fit_transform(x_value)
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(x_value, y_value, test_size=0.2)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)
print(X_train.shape, X_val.shape, X_test.shape, Y_train.shape, Y_val.shape, Y_test.shape)
#create model
model = Sequential()
#get number of columns in training data
n_cols = X_train.shape[1]
#add model layers
model.add(Dense(32, activation='softmax', input_shape=(n_cols,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(32, activation='softmax'))
model.add(Dense(1))
#compile model using rmsse as a measure of model performance
model.compile(optimizer='Adagrad', loss= "mean_absolute_error", metrics = ['accuracy'])
#set early stopping monitor so the model stops training when it won't improve anymore early_stopping_monitor = EarlyStopping(patience=3)
early_stopping_monitor = EarlyStopping(patience=20)
#train model
model.fit(X_train, Y_train,batch_size=32, epochs=10, validation_data=(X_val, Y_val))
Here is what I got
The plots are also pretty strange:
Accuracy
Loss
Two mistakes:
Accuracy is meaningless in regression settings, such as yours here (it is meaningful only for classification ones); see What function defines accuracy in Keras when the loss is mean squared error (MSE)? (the argument is identical when MAE loss is used, like here). Your performance measure here is the same with your loss (i.e. MAE).
We never use softmax activations in anything but the final layer of a classification model; replace both softmax activation functions used in your model with relu (keep the last layer as is, as no activation means linear, which is indeed the correct one for regression).
I am using the keras model with the following layers to predict a label of input (out of 4 labels)
embedding_layer = keras.layers.Embedding(MAX_NB_WORDS,
EMBEDDING_DIM,
weights=[embedding_matrix],
input_length=MAX_SEQUENCE_LENGTH,
trainable=False)
sequence_input = keras.layers.Input(shape = (MAX_SEQUENCE_LENGTH,),
dtype = 'int32')
embedded_sequences = embedding_layer(sequence_input)
hidden_layer = keras.layers.Dense(50, activation='relu')(embedded_sequences)
flat = keras.layers.Flatten()(hidden_layer)
preds = keras.layers.Dense(4, activation='softmax')(flat)
model = keras.models.Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(X_train, Y_train, batch_size=32, epochs=100)
However, the softmax function returns a number of outputs of 4 (because I have 4 labels)
When I'm using the predict function to get the predicted Y using the same model, I am getting an array of 4 for each X rather than one single label deciding the label for the input.
model.predict(X_test, batch_size = None, verbose = 0, steps = None)
How do I make the output layer of the keras model, or the model.predict function, decide on one single label, rather than output weights for each label?
The following is a common function to sample from a probability vector
def sample(preds, temperature=1.0):
# helper function to sample an index from a probability array
preds = np.asarray(preds).astype('float64')
preds = np.log(preds) / temperature
exp_preds = np.exp(preds)
preds = exp_preds / np.sum(exp_preds)
probas = np.random.multinomial(1, preds, 1)
return np.argmax(probas)
Taken from here.
The temperature parameter decides how much the differences between the probability weights are weightd. A temperature of 1 is considering each weight "as it is", a temperature larger than 1 reduces the differences between the weights, a temperature smaller than 1 increases them.
Here an example using a probability vector on 3 labels:
p = np.array([0.1, 0.7, 0.2]) # The first label has a probability of 10% of being chosen, the second 70%, the third 20%
print(sample(p, 1)) # sample using the input probabilities, unchanged
print(sample(p, 0.1)) # the new vector of probabilities from which to sample is [ 3.54012033e-09, 9.99996371e-01, 3.62508322e-06]
print(sample(p, 10)) # the new vector of probabilities from which to sample is [ 0.30426696, 0.36962778, 0.32610526]
To see the new vector make sample return preds.
What is an example of how to use a TensorFlow TFRecord with a Keras Model and tf.session.run() while keeping the dataset in tensors w/ queue runners?
Below is a snippet that works but it needs the following improvements:
Use the Model API
specify an Input()
Load a dataset from a TFRecord
Run through a dataset in parallel (such as with a queuerunner)
Here is the snippet, there are several TODO lines indicating what is needed:
from keras.models import Model
import tensorflow as tf
from keras import backend as K
from keras.layers import Dense, Input
from keras.objectives import categorical_crossentropy
from tensorflow.examples.tutorials.mnist import input_data
sess = tf.Session()
K.set_session(sess)
# Can this be done more efficiently than placeholders w/ TFRecords?
img = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))
# TODO: Use Input()
x = Dense(128, activation='relu')(img)
x = Dense(128, activation='relu')(x)
preds = Dense(10, activation='softmax')(x)
# TODO: Construct model = Model(input=inputs, output=preds)
loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
# TODO: handle TFRecord data, is it the same?
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
sess.run(tf.global_variables_initializer())
# TODO remove default, add queuerunner
with sess.as_default():
for i in range(1000):
batch = mnist_data.train.next_batch(50)
train_step.run(feed_dict={img: batch[0],
labels: batch[1]})
print(loss.eval(feed_dict={img: mnist_data.test.images,
labels: mnist_data.test.labels}))
Why is this question relevant?
For high performance training without going back to python
no TFRecord to numpy to tensor conversions
Keras will soon be part of tensorflow
Demonstrate how Keras Model() classes can accept tensors for input data correctly.
Here is some starter information for a semantic segmentation problem example:
example unet Keras model unet.py, happens to be for semantic segmentation.
Keras + Tensorflow Blog Post
An attempt at running the unet model a tf session with TFRecords and a Keras model (not working)
Code to create the TFRecords: tf_records.py
An attempt at running the unet model a tf session with TFRecords and a Keras model is in densenet_fcn.py (not working)
I don't use tfrecord dataset format so won't argue on the pros and cons, but I got interested in extending Keras to support the same.
github.com/indraforyou/keras_tfrecord is the repository. Will briefly explain the main changes.
Dataset creation and loading
data_to_tfrecord and read_and_decode here takes care of creating tfrecord dataset and loading the same. Special care must be to implement the read_and_decode otherwise you will face cryptic errors during training.
Initialization and Keras model
Now both tf.train.shuffle_batch and Keras Input layer returns tensor. But the one returned by tf.train.shuffle_batch don't have metadata needed by Keras internally. As it turns out, any tensor can be easily turned into a tensor with keras metadata by calling Input layer with tensor param.
So this takes care of initialization:
x_train_, y_train_ = ktfr.read_and_decode('train.mnist.tfrecord', one_hot=True, n_class=nb_classes, is_train=True)
x_train_batch, y_train_batch = K.tf.train.shuffle_batch([x_train_, y_train_],
batch_size=batch_size,
capacity=2000,
min_after_dequeue=1000,
num_threads=32) # set the number of threads here
x_train_inp = Input(tensor=x_train_batch)
Now with x_train_inp any keras model can be developed.
Training (simple)
Lets say train_out is the output tensor of your keras model. You can easily write a custom training loop on the lines of:
loss = tf.reduce_mean(categorical_crossentropy(y_train_batch, train_out))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# sess.run(tf.global_variables_initializer())
sess.run(tf.initialize_all_variables())
with sess.as_default():
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
step = 0
while not coord.should_stop():
start_time = time.time()
_, loss_value = sess.run([train_op, loss], feed_dict={K.learning_phase(): 0})
duration = time.time() - start_time
if step % 100 == 0:
print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value,
duration))
step += 1
except tf.errors.OutOfRangeError:
print('Done training for %d epochs, %d steps.' % (FLAGS.num_epochs, step))
finally:
coord.request_stop()
coord.join(threads)
sess.close()
Training (keras style)
One of the features of keras that makes it so lucrative is its generalized training mechanism with the callback functions.
But to support tfrecords type training there are several changes that are need in the fit function
running the queue threads
no feeding in batch data through feed_dict
supporting validation becomes tricky as the validation data will also be coming in through another tensor an different model needs to be internally created with shared upper layers and validation tensor fed in by other tfrecord reader.
But all this can be easily supported by another flag parameter. What makes things messing are the keras features sample_weight and class_weight they are used to weigh each sample and weigh each class. For this in compile() keras creates placeholders (here) and placeholders are also implicitly created for the targets (here) which is not needed in our case the labels are already fed in by tfrecord readers. These placeholders needs to be fed in during session run which is unnecessary in our cae.
So taking into account these changes, compile_tfrecord(here) and fit_tfrecord(here) are the extension of compile and fit and shares say 95% of the code.
They can be used in the following way:
import keras_tfrecord as ktfr
train_model = Model(input=x_train_inp, output=train_out)
ktfr.compile_tfrecord(train_model, optimizer='rmsprop', loss='categorical_crossentropy', out_tensor_lst=[y_train_batch], metrics=['accuracy'])
train_model.summary()
ktfr.fit_tfrecord(train_model, X_train.shape[0], batch_size, nb_epoch=3)
train_model.save_weights('saved_wt.h5')
You are welcome to improve on the code and pull requests.
Update 2018-08-29 this is now directly supported in keras, see the following example:
https://github.com/keras-team/keras/blob/master/examples/mnist_tfrecord.py
Original Answer:
TFRecords are supported by using an external loss. Here are the key lines constructing an external loss:
# tf yield ops that supply dataset images and labels
x_train_batch, y_train_batch = read_and_decode_recordinput(...)
# create a basic cnn
x_train_input = Input(tensor=x_train_batch)
x_train_out = cnn_layers(x_train_input)
model = Model(inputs=x_train_input, outputs=x_train_out)
loss = keras.losses.categorical_crossentropy(y_train_batch, x_train_out)
model.add_loss(loss)
model.compile(optimizer='rmsprop', loss=None)
Here is an example for Keras 2. It works after applying the small patch #7060:
'''MNIST dataset with TensorFlow TFRecords.
Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
'''
import os
import copy
import time
import numpy as np
import tensorflow as tf
from tensorflow.python.ops import data_flow_ops
from keras import backend as K
from keras.models import Model
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.callbacks import EarlyStopping
from keras.callbacks import TensorBoard
from keras.objectives import categorical_crossentropy
from keras.utils import np_utils
from keras.utils.generic_utils import Progbar
from keras import callbacks as cbks
from keras import optimizers, objectives
from keras import metrics as metrics_module
from keras.datasets import mnist
if K.backend() != 'tensorflow':
raise RuntimeError('This example can only run with the '
'TensorFlow backend for the time being, '
'because it requires TFRecords, which '
'are not supported on other platforms.')
def images_to_tfrecord(images, labels, filename):
def _int64_feature(value):
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
""" Save data into TFRecord """
if not os.path.isfile(filename):
num_examples = images.shape[0]
rows = images.shape[1]
cols = images.shape[2]
depth = images.shape[3]
print('Writing', filename)
writer = tf.python_io.TFRecordWriter(filename)
for index in range(num_examples):
image_raw = images[index].tostring()
example = tf.train.Example(features=tf.train.Features(feature={
'height': _int64_feature(rows),
'width': _int64_feature(cols),
'depth': _int64_feature(depth),
'label': _int64_feature(int(labels[index])),
'image_raw': _bytes_feature(image_raw)}))
writer.write(example.SerializeToString())
writer.close()
else:
print('tfrecord %s already exists' % filename)
def read_and_decode_recordinput(tf_glob, one_hot=True, classes=None, is_train=None,
batch_shape=[1000, 28, 28, 1], parallelism=1):
""" Return tensor to read from TFRecord """
print 'Creating graph for loading %s TFRecords...' % tf_glob
with tf.variable_scope("TFRecords"):
record_input = data_flow_ops.RecordInput(
tf_glob, batch_size=batch_shape[0], parallelism=parallelism)
records_op = record_input.get_yield_op()
records_op = tf.split(records_op, batch_shape[0], 0)
records_op = [tf.reshape(record, []) for record in records_op]
progbar = Progbar(len(records_op))
images = []
labels = []
for i, serialized_example in enumerate(records_op):
progbar.update(i)
with tf.variable_scope("parse_images", reuse=True):
features = tf.parse_single_example(
serialized_example,
features={
'label': tf.FixedLenFeature([], tf.int64),
'image_raw': tf.FixedLenFeature([], tf.string),
})
img = tf.decode_raw(features['image_raw'], tf.uint8)
img.set_shape(batch_shape[1] * batch_shape[2])
img = tf.reshape(img, [1] + batch_shape[1:])
img = tf.cast(img, tf.float32) * (1. / 255) - 0.5
label = tf.cast(features['label'], tf.int32)
if one_hot and classes:
label = tf.one_hot(label, classes)
images.append(img)
labels.append(label)
images = tf.parallel_stack(images, 0)
labels = tf.parallel_stack(labels, 0)
images = tf.cast(images, tf.float32)
images = tf.reshape(images, shape=batch_shape)
# StagingArea will store tensors
# across multiple steps to
# speed up execution
images_shape = images.get_shape()
labels_shape = labels.get_shape()
copy_stage = data_flow_ops.StagingArea(
[tf.float32, tf.float32],
shapes=[images_shape, labels_shape])
copy_stage_op = copy_stage.put(
[images, labels])
staged_images, staged_labels = copy_stage.get()
return images, labels
def save_mnist_as_tfrecord():
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train[..., np.newaxis]
X_test = X_test[..., np.newaxis]
images_to_tfrecord(images=X_train, labels=y_train, filename='train.mnist.tfrecord')
images_to_tfrecord(images=X_test, labels=y_test, filename='test.mnist.tfrecord')
def cnn_layers(x_train_input):
x = Conv2D(32, (3, 3), activation='relu', padding='valid')(x_train_input)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
x_train_out = Dense(classes,
activation='softmax',
name='x_train_out')(x)
return x_train_out
sess = tf.Session()
K.set_session(sess)
save_mnist_as_tfrecord()
batch_size = 100
batch_shape = [batch_size, 28, 28, 1]
epochs = 3000
classes = 10
parallelism = 10
x_train_batch, y_train_batch = read_and_decode_recordinput(
'train.mnist.tfrecord',
one_hot=True,
classes=classes,
is_train=True,
batch_shape=batch_shape,
parallelism=parallelism)
x_test_batch, y_test_batch = read_and_decode_recordinput(
'test.mnist.tfrecord',
one_hot=True,
classes=classes,
is_train=True,
batch_shape=batch_shape,
parallelism=parallelism)
x_batch_shape = x_train_batch.get_shape().as_list()
y_batch_shape = y_train_batch.get_shape().as_list()
x_train_input = Input(tensor=x_train_batch, batch_shape=x_batch_shape)
x_train_out = cnn_layers(x_train_input)
y_train_in_out = Input(tensor=y_train_batch, batch_shape=y_batch_shape, name='y_labels')
cce = categorical_crossentropy(y_train_batch, x_train_out)
train_model = Model(inputs=[x_train_input], outputs=[x_train_out])
train_model.add_loss(cce)
train_model.compile(optimizer='rmsprop',
loss=None,
metrics=['accuracy'])
train_model.summary()
tensorboard = TensorBoard()
# tensorboard disabled due to Keras bug
train_model.fit(batch_size=batch_size,
epochs=epochs) # callbacks=[tensorboard])
train_model.save_weights('saved_wt.h5')
K.clear_session()
# Second Session, pure Keras
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train[..., np.newaxis]
X_test = X_test[..., np.newaxis]
x_test_inp = Input(batch_shape=(None,) + (X_test.shape[1:]))
test_out = cnn_layers(x_test_inp)
test_model = Model(inputs=x_test_inp, outputs=test_out)
test_model.load_weights('saved_wt.h5')
test_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
test_model.summary()
loss, acc = test_model.evaluate(X_test, np_utils.to_categorical(y_test), classes)
print('\nTest accuracy: {0}'.format(acc))
I've also been working to improve the support for TFRecords in the following issue and pull request:
#6928 Yield Op support: High Performance Large Datasets via TFRecords, and RecordInput
#7102 Keras Input Tensor API Design Proposal
Finally, it is possible to use tf.contrib.learn.Experiment to train Keras models in TensorFlow.