I'm trying to take a vanilla autoencoder using Keras (with a TensorFlow backend) and stop it when the loss value converges to a specific value. After the last epoch, I want to use a sigmoid function to perform classification. Would you know how to go about doing this (or at least point me in the right direction)?
The code below is quite similar to the vanilla autoencoder at http://wiseodd.github.io/techblog/2016/12/03/autoencoders/. (I'm using my own data, but feel free to use the MNIST example in the link to demonstrate what you are talking about.)
NUM_ROWS = len(x_train)
NUM_COLS = len(x_train[0])
inputs = Input(shape=(NUM_COLS, ))
h = Dense(64, activation='sigmoid')(inputs)
outputs = Dense(NUM_COLS)(h)
# trying to add last sigmoid layer
outputs = Dense(1)
outputs = Activation('sigmoid')
model = Model(input=inputs, output=outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch,
epochs=epochs,
validation_data=(x_test, y_test))
I have an interpretation of what you are aiming for, although the question does not make the goal entirely clear.
Preparing the necessary dataset yourself should help clarify it.
One possible solution is to train the autoencoder first, then freeze the encoder and train a sigmoid output on top of it:
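from keras.layers import Input, Dense
from keras.models import Model

NUM_ROWS = len(x_train)
NUM_COLS = len(x_train[0])
# Stage 1: train the autoencoder to reconstruct its own input
inputs = Input(shape=(NUM_COLS, ))
encoder = Dense(64, activation='sigmoid')
h = encoder(inputs)
outputs = Dense(NUM_COLS)(h)
model = Model(inputs=inputs, outputs=outputs)
# the decoder output is linear, so a regression loss is used here
model.compile(optimizer='adam', loss='mse')
model.fit(x_train, x_train,
          batch_size=batch,
          epochs=epochs,
          validation_data=(x_test, x_test))
# Stage 2: freeze the encoder layer and add the final sigmoid layer
encoder.trainable = False
outputs2 = Dense(1, activation='sigmoid')(h)
model2 = Model(inputs=inputs, outputs=outputs2)
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model2.fit(x_train, y_train,
           batch_size=batch,
           epochs=epochs,
           validation_data=(x_test, y_test))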
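For the other half of the question, stopping once the loss reaches a specific value, one option is a small custom callback. The sketch below is only an illustration: the class name `StopAtLoss`, the toy data and the tiny model are assumptions made for the demo, and it uses `tensorflow.keras` rather than standalone Keras. It relies on the standard Keras behaviour that setting `model.stop_training = True` ends training after the current epoch.

```python
import numpy as np
from tensorflow import keras


class StopAtLoss(keras.callbacks.Callback):
    """Hypothetical helper: stop once the monitored loss is at or below `target`."""

    def __init__(self, target, monitor="loss"):
        super().__init__()
        self.target = target
        self.monitor = monitor

    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get(self.monitor)
        if current is not None and current <= self.target:
            # Keras checks this flag at the end of every epoch
            self.model.stop_training = True


# Tiny stand-in data and autoencoder, only to show the callback in use.
rng = np.random.default_rng(0)
x_demo = rng.random((32, 8)).astype("float32")

inputs = keras.Input(shape=(8,))
h = keras.layers.Dense(4, activation="sigmoid")(inputs)
outputs = keras.layers.Dense(8)(h)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# The target is deliberately huge so that the demo stops after one epoch.
history = autoencoder.fit(
    x_demo, x_demo, epochs=5, verbose=0,
    callbacks=[StopAtLoss(target=1e6)],
)
epochs_run = len(history.history["loss"])
print(epochs_run)
```

In a real run you would pass your own threshold as `target` and, if you prefer to stop on validation loss, `monitor="val_loss"` together with `validation_data`.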
I am looking for a way to do classification and regression sequentially.
For example, assume samples have 3 input values and 1 output value. The model should first classify using the 3 input values and then perform the regression task using the classification output (i.e. classification has 3 input values from the original samples, and regression has 4 input values: 3 from the original samples plus the classification output).
Below is the architecture that I drew. However, I am not really sure about the part where the second input layer occurs. Could someone give advice or a working example for this application?
input1_classification = Input(shape=(3,))
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(input1_classification)
# classification output
out_classification = Dense(2, activation='softmax')(hidden1)
# regression input
input1_regression = Input(shape=(5,))
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(out_classification)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras model
model.compile(loss=['sparse_categorical_crossentropy','mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class,y_train_reg], epochs=150, batch_size=32, verbose=2)
All you need to do is concatenate your original input with the output of the classification and apply your regression model to the result. You do not need to specify any "extra" inputs.
So it will become something along the lines of:
from keras.layers import Input, Dense, Concatenate
from keras.models import Model

input1_classification = Input(shape=(3,))
# classification
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(input1_classification)
out_classification = Dense(2, activation='softmax')(hidden1)
# regression input: the original 3 features plus the 2 class probabilities
new_input = Concatenate(axis=1)([input1_classification, out_classification])
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(new_input)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras model
model.compile(loss=['sparse_categorical_crossentropy','mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class,y_train_reg], epochs=150, batch_size=32, verbose=2)
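To see why no second Input layer is needed, it may help to look at what concatenation along axis 1 does to the shapes. The NumPy sketch below mimics that step outside of Keras; the arrays are made-up stand-ins for the 3 original features and the 2 softmax outputs, not real model tensors.

```python
import numpy as np

n_samples = 4
x = np.zeros((n_samples, 3))                  # stands in for input1_classification
class_probs = np.full((n_samples, 2), 0.5)    # stands in for the softmax output

# Same shape arithmetic that a concatenation along axis 1 performs in the graph
new_input = np.concatenate([x, class_probs], axis=1)
print(new_input.shape)  # (4, 5)
```

The regression branch therefore sees 5 values per sample, all derived from the single 3-feature input.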
I am working on fake news detection using a CNN, and I am new to coding CNNs in Keras and TensorFlow. I need help creating a CNN that takes statements as input, each in the form of a vector of length 100, and outputs 0 or 1 depending on whether the statement is predicted to be false or true.
X_train.shape
# 10229, 100
X_train = np.expand_dims(X_train, axis=2)
X_train.shape
# 10229,100,1
# actual cnn model here
import tensorflow as tf
from tensorflow.keras import layers
# Conv1D + global max pooling
from keras.layers import Conv1D, MaxPooling1D, Embedding, Dropout, Flatten, Dense
from keras.layers import Input
text_len=100
from keras.models import Model
inp = Input(batch_shape=(None, text_len, 1))
conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(inp)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(drop21)
drop22 = Dropout(0.5)(conv22)
pool2 = MaxPooling1D(pool_size=2)(drop22)
flat2 = Flatten()(pool2)
out = Dense(1, activation='softmax')(flat2)
model = Model(inp, out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train, Y_train)
I would really appreciate it if someone could give me working code for this with a little bit of explanation.
In this dummy example, I use a Conv1D with 2D features. Conv1D accepts sequences in 3D format (n_samples, time_steps, features), so 2D features have to be adapted to 3D. The usual choice is to keep the features as they are and simply add a temporal dimension (expand_dims on axis 1); there is no reason to assume a positional/temporal pattern in tf-idf/one-hot features.
Inside the network you start with 3D tensors and must end up with 2D. There are many ways to go from 3D to 2D; the simplest is flattening, and with a single time step a pooling layer is useless. If you use softmax as the last activation, remember to give the final Dense layer as many units as you have classes.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
## define variable
n_sample = 10229
text_len = 100
## create dummy data
X_train = np.random.uniform(0,1, (n_sample,text_len))
y_train = np.random.randint(0,2, n_sample)
## expand train dimension: pass from 2d to 3d
X_train = np.expand_dims(X_train, axis=1)
print(X_train.shape, y_train.shape)
## create model
inp = Input(shape=(1,text_len))
conv2 = Conv1D(filters=128, kernel_size=5, activation='relu', padding='same')(inp)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu', padding='same')(drop21)
drop22 = Dropout(0.5)(conv22)
pool2 = Flatten()(drop22) # this is an option to pass from 3d to 2d
out = Dense(2, activation='softmax')(pool2) # the output dim must equal the number of classes if you use softmax
model = Model(inp, out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, epochs=5)
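The point about softmax and the number of output units can be checked without Keras. The following NumPy sketch (the `softmax` helper is written here just for illustration) shows that a softmax over a single unit always returns 1.0, whatever the logit is, which is why `Dense(1, activation='softmax')` cannot learn anything. The alternatives are 2 units with softmax, as above, or 1 unit with sigmoid and `binary_crossentropy`.

```python
import numpy as np


def softmax(logits):
    # subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# One output unit: each row has a single logit, so it normalises to 1.0
one_unit = softmax(np.array([[-3.0], [0.0], [7.5]]))

# Two output units: the probabilities now depend on the logits
two_unit = softmax(np.array([[2.0, 0.0], [0.0, 2.0]]))

print(one_unit.ravel())
print(two_unit)
```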
I ran an experiment comparing binary_crossentropy and categorical_crossentropy. I am trying to understand the behavior of these two loss functions on the same problem.
I worked on a binary classification problem with this data.
In the first experiment, I used 1 neuron in the last layer with a sigmoid activation function and binary_crossentropy. I trained this model 10 times and took the average accuracy. The average accuracy is 74.12760416666666.
The code that I used for the first experiment is below.
total_acc = 0
for each_iter in range(0, 10):
    print(each_iter)
    X = dataset[:,0:8]
    y = dataset[:,8]
    # define the keras model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=32)
    # evaluate the keras model
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy*100))
    temp_acc = accuracy*100
    total_acc += temp_acc
    del model
In the second experiment, I used 2 neurons in the last layer with a softmax activation function and categorical_crossentropy. I converted my target y to categorical and again trained this model 10 times and took the average accuracy. The average accuracy is 66.92708333333334.
The code that I used for the second setting is below:
total_acc_v2 = 0
for each_iter in range(0, 10):
    print(each_iter)
    X = dataset[:,0:8]
    y = dataset[:,8]
    y = np_utils.to_categorical(y)
    # define the keras model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    # compile the keras model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=32)
    # evaluate the keras model
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy*100))
    temp_acc = accuracy*100
    total_acc_v2 += temp_acc
    del model
I think that these two experiments are identical and should give very similar results. What is the reason for this huge difference in accuracy?
The reason for this behaviour seems to be randomness. I ran your code and got an average accuracy of around 74 for the sigmoid model and around 74 for the softmax model.
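As a sanity check on the claim that the two set-ups should behave alike, here is a small NumPy sketch (the helper functions are written only for this illustration) showing that a 2-unit softmax with logits [0, z] gives exactly the same positive-class probability as a 1-unit sigmoid with logit z. The two output layers can therefore represent the same predictions, and differences between runs would be expected to come from things like weight initialisation and batch order rather than from the loss functions themselves.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


z = np.array([-2.0, 0.0, 0.5, 3.0])

# Two-unit softmax where the first logit is fixed at 0 and the second is z
two_logits = np.stack([np.zeros_like(z), z], axis=1)
p_softmax = softmax(two_logits)[:, 1]

# One-unit sigmoid on the same z
p_sigmoid = sigmoid(z)

print(np.allclose(p_softmax, p_sigmoid))
```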
I have trained a model applying some image augmentations by using ImageDataGenerator in Keras as follows:
train_datagen = ImageDataGenerator(
    rotation_range=60,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)
train_datagen.fit(x_train)
history = model.fit_generator(
    train_datagen.flow(x_train, y_train, batch_size=7),
    steps_per_epoch=600,
    epochs=epochs,
    callbacks=callbacks_list
)
How should I make predictions with this model? By using model.predict() as shown below?
predictions = model.predict(x_test)
Or should I use model.predict_generator() where an ImageDataGenerator is applied on x_test where x_test is unlabelled?
If I use predict_generator(): How to do that?
What is the difference between two methods?
predict_generator() is a convenience function that makes it easier to load the images and apply the same preprocessing as you did for your training samples. I recommend using that rather than model.predict.
In your case simply do:
test_gen = ImageDataGenerator()
predictions = model.predict_generator(
    test_gen.flow(
        # ... your params here ...
    )
)
I ran a training job with TensorFlow and got the following curve for the loss on the validation set. The net starts to overfit after the 6000th iteration, so I'd like to get the model from before it overfits.
My training code is something like the following:
train_step = ......
summary = tf.scalar_summary(l1_loss.op.name, l1_loss)
summary_writer = tf.train.SummaryWriter("checkpoint", sess.graph)
saver = tf.train.Saver()
for i in xrange(20000):
    batch = get_next_batch(batch_size)
    sess.run(train_step, feed_dict = {x: batch.x, y:batch.y})
    if (i+1) % 100 == 0:
        saver.save(sess, "checkpoint/net", global_step = i+1)
        summary_str = sess.run(summary, feed_dict=validation_feed_dict)
        summary_writer.add_summary(summary_str, i+1)
        summary_writer.flush()
After training finishes, there are only five checkpoints saved (19600, 19700, 19800, 19900, 20000). Is there any way to make TensorFlow save checkpoints according to the validation error?
P.S. I know that tf.train.Saver has a max_to_keep argument, which in principle could save all the checkpoints. But that's not what I want (unless it's the only option). I want the saver to keep the checkpoint with the smallest validation loss so far. Is that possible?
You need to calculate the classification accuracy on the validation-set and keep track of the best one seen so far, and only write the checkpoint once an improvement has been found to the validation accuracy.
If the data-set and/or model is large, then you may have to split the validation-set into batches to fit the computation in memory.
This tutorial shows exactly how to do what you want:
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/04_Save_Restore.ipynb
It is also available as a short video:
https://www.youtube.com/watch?v=Lx8JUJROkh0
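The two ideas in this answer, evaluating the validation set in batches and saving only on improvement, can be sketched in plain NumPy. Everything named here is an assumption for illustration: `accuracy_in_batches` is a made-up helper, `predict_fn` stands in for whatever runs your model on one batch, and the accuracy values in the second part are invented. In the real loop, the place marked in the comment is where `saver.save(...)` would go.

```python
import numpy as np


def accuracy_in_batches(predict_fn, x, y, batch_size=256):
    """Accuracy over (x, y), computed one batch at a time to limit memory use."""
    correct = 0
    for start in range(0, len(x), batch_size):
        batch_x = x[start:start + batch_size]
        batch_y = y[start:start + batch_size]
        correct += int(np.sum(predict_fn(batch_x) == batch_y))
    return correct / len(x)


# Toy validation set and a stand-in "model" that is wrong on exactly one sample
x_val = np.arange(10).reshape(10, 1)
y_val = (x_val[:, 0] >= 5).astype(int)
predict_fn = lambda batch: (batch[:, 0] >= 4).astype(int)
val_acc = accuracy_in_batches(predict_fn, x_val, y_val, batch_size=3)

# Keep track of the best validation accuracy and "save" only on improvement
best_acc = 0.0
saved_at = []
for step, acc in [(100, 0.6), (200, 0.8), (300, 0.7), (400, 0.9)]:
    if acc > best_acc:
        best_acc = acc
        saved_at.append(step)  # here one would call saver.save(sess, ...)

print(val_acc, saved_at)
```

Summing the correct predictions and dividing once at the end keeps the result exact even when the last batch is smaller than the others.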
This can be done with checkpoints. In TensorFlow 1:
# you should import other functions/libs as needed to build the model
from keras.callbacks import ModelCheckpoint
# add checkpoint to save model with lowest val loss
filepath = 'tf1_mnist_cnn.hdf5'
save_checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                                  save_best_only=True, save_weights_only=False,
                                  mode='auto', period=1)
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[save_checkpoint])
TensorFlow 2:
# import other libs as needed for building model
from tensorflow.keras.callbacks import ModelCheckpoint
# add a checkpoint to save the lowest validation loss
filepath = 'tf2_mnist_model.hdf5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                             save_best_only=True, save_weights_only=False,
                             mode='auto', save_freq='epoch')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[checkpoint])
Complete demo files are here: https://github.com/nateGeorge/slurm_gpu_ubuntu/tree/master/demo_files.
In your session.run you'll need to explicitly ask for the loss. Then keep a list of your most recent eval losses, and create the checkpoint only if the current eval loss is smaller than, e.g., the last two saved losses.
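A minimal sketch of that rule in plain Python follows. The function name `should_checkpoint`, the `window` parameter and the loss values are made up for illustration. In the question's loop, the loss would come from something like `sess.run(l1_loss, feed_dict=validation_feed_dict)`, and the `append` calls mark where `saver.save(...)` would be called.

```python
def should_checkpoint(current_loss, saved_losses, window=2):
    """True if current_loss beats each of the last `window` saved losses.

    With no saved losses yet, the first evaluation always triggers a save.
    """
    recent = saved_losses[-window:]
    return all(current_loss < loss for loss in recent)


saved_losses = []
saved_steps = []
# (step, validation loss) pairs standing in for periodic evaluations
eval_losses = [(100, 0.90), (200, 0.70), (300, 0.75), (400, 0.60), (500, 0.65)]
for step, loss in eval_losses:
    if should_checkpoint(loss, saved_losses):
        saved_losses.append(loss)
        saved_steps.append(step)  # here one would call saver.save(sess, ...)

print(saved_steps)
```

Because a checkpoint is written only when the loss improves on the previously saved ones, the most recent checkpoint on disk is also the best one seen so far.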