GAN Converges in Just a Few Epochs - machine-learning

I implemented a generative adversarial network in Keras. My training set contains about 16,000 images, each of size 32x32. All of the training images are resized versions of images from the ImageNet object-detection dataset; I fed the image matrices directly into the network without center cropping. I used the Adam optimizer with a learning rate of 1e-4 and beta1 of 0.5, and set the dropout rate to 0.1. I first trained the discriminator on 3,000 real images and 3,000 fake images, and it achieved 93% accuracy. Then I trained the GAN for 500 epochs with a batch size of 32. However, my model seemed to converge in only a few epochs (<10), and the images it generated were ugly.
[Plot of the loss function]
[Random samples generated by the generator]
I was wondering whether my training dataset is too small (compared to the ones in the DCGAN paper, which contain more than 300,000 images) or whether my model configuration is incorrect. Also, should I train D with SGD for k iterations (where k is small, perhaps 1) and then train G with SGD for one iteration, as Ian Goodfellow suggests in the original paper? (So far I have simply trained them one at a time.)
Below is the configuration of the generator.
g_input = Input(shape=[100])
H = Dense(1024*4*4, init='glorot_normal')(g_input)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = Reshape( [4, 4,1024] )(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(512, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(256, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(3, 3, 3, border_mode='same', init='glorot_uniform')(H)
g_V = Activation('tanh')(H)
generator = Model(g_input,g_V)
generator.compile(loss='binary_crossentropy', optimizer=opt)
generator.summary()
Below is the configuration of the discriminator:
d_input = Input(shape=shp)
H = Convolution2D(64, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(d_input)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Convolution2D(128, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(H)
H = BatchNormalization(mode=2)(H)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Flatten()(H)
H = Dense(256, init='glorot_normal')(H)
H = LeakyReLU(0.2)(H)
d_V = Dense(2,activation='softmax')(H)
discriminator = Model(d_input,d_V)
discriminator.compile(loss='categorical_crossentropy', optimizer=dopt)
discriminator.summary()
Below is the configuration of GAN as a whole:
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()

I think the problem is with the loss function. Try:
loss='categorical_crossentropy',

I suspect that your discriminator is still trainable while you are training the GAN. You can verify this by using discriminator.layers[-1].get_weights() and checking whether the parameters change during the GAN training process.
You should freeze the discriminator before you assemble it into the GAN:
discriminator.trainable = False
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
see this discussion:
https://github.com/fchollet/keras/issues/4674
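For completeness, here is a minimal sketch of the usual alternating training loop, under these assumptions: the discriminator was compiled on its own before being frozen, the GAN was compiled with the frozen discriminator as in the snippet above, X_train is a hypothetical array of real images scaled to [-1, 1] (to match the tanh output), and num_steps is a hypothetical step count. To follow Goodfellow's k-step suggestion, wrap the discriminator update in an inner loop of k iterations.
import numpy as np

batch_size = 32
for step in range(num_steps):
    # --- Update D on a mixed batch (its own compiled model still updates its weights) ---
    idx = np.random.randint(0, X_train.shape[0], batch_size)
    real_images = X_train[idx]
    noise = np.random.normal(0, 1, (batch_size, 100))
    fake_images = generator.predict(noise)
    X = np.concatenate([real_images, fake_images])
    y = np.zeros((2 * batch_size, 2))
    y[:batch_size, 1] = 1   # assumed convention: class 1 = real
    y[batch_size:, 0] = 1   # class 0 = fake
    d_loss = discriminator.train_on_batch(X, y)

    # --- Update G through the combined model; D is frozen inside GAN ---
    noise = np.random.normal(0, 1, (batch_size, 100))
    y_gen = np.zeros((batch_size, 2))
    y_gen[:, 1] = 1          # G is trained to make D label its output as real
    g_loss = GAN.train_on_batch(noise, y_gen)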

Related

Poor predictions on second dataset from trained LSTM model

I've trained an LSTM model with 8 features and 1 output. I have one dataset, which I split into two separate files: I train and predict on the first half, and then try to predict the second half using the model trained on the first. The model predicts the training and test sets drawn from the first half quite well (RMSE of around 5-7), but when I try to predict the second half I get very poor predictions (RMSE of around 50-60). How can I get my trained model to generalize to outside datasets?
dataset at this link
# Imports assumed by this snippet (not shown in the original):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split

file = r'/content/drive/MyDrive/only_force_pt1.csv'
df = pd.read_csv(file)
df.head()
X = df.iloc[:, 1:9]
y = df.iloc[:,9]
print(X.shape)
print(y.shape)
plt.figure(figsize = (20, 6), dpi = 100)
plt.plot(y)
WINDOW_LEN = 50
def window_size(size, inputdata, targetdata):
    X = []
    y = []
    i = 0
    while (i + size) <= len(inputdata) - 1:
        X.append(inputdata[i: i+size])
        y.append(targetdata[i+size])
        i += 1
    assert len(X) == len(y)
    return (X, y)
X_series, y_series = window_size(WINDOW_LEN, X, y)
print(len(X))
print(len(X_series))
print(len(y_series))
X_train, X_val, y_train, y_val = train_test_split(np.array(X_series),np.array(y_series),test_size=0.3, shuffle = True)
X_val, X_test,y_val, y_test = train_test_split(np.array(X_val),np.array(y_val),test_size=0.3, shuffle = False)
n_timesteps, n_features, n_outputs = X_train.shape[1], X_train.shape[2],1
[verbose, epochs, batch_size] = [1, 300, 32]
input_shape = (n_timesteps, n_features)
model = Sequential()
# LSTM
model.add(LSTM(64, input_shape=input_shape, return_sequences = False))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
#model.add(Dropout(0.2))
model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dense(1, activation='relu'))
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience = 30, verbose =1, mode = 'auto')
model.summary()
model.compile(loss = 'mse', optimizer = Adam(learning_rate = 0.001), metrics=[tf.keras.metrics.RootMeanSquaredError()])
history = model.fit(X_train, y_train, batch_size = batch_size, epochs = epochs, verbose = verbose, validation_data=(X_val,y_val), callbacks = [earlystopper])
Second dataset:
tests = r'/content/drive/MyDrive/only_force_pt2.csv'
df_testing = pd.read_csv(tests)
X_testing = df_testing.iloc[:4038,1:9]
torque = df_testing.iloc[:4038,9]
print(X_testing.shape)
print(torque.shape)
plt.figure(figsize = (20, 6), dpi = 100)
plt.plot(torque)
X_testing = X_testing.to_numpy()
X_testing_series, y_testing_series = window_size(WINDOW_LEN, X_testing, torque)
X_testing_series = np.array(X_testing_series)
y_testing_series = np.array(y_testing_series)
scores = model.evaluate(X_testing_series, y_testing_series, verbose =1)
X_prediction = model.predict(X_testing_series, batch_size = 32)
If your model works fine on training data but performs badly on validation data, then it did not learn the "true" relationship between input and output variables but simply memorized the output corresponding to each input. To tackle this you can do several things (see the sketch below):
Typically you would use 80% of your data to train and 20% to test. This presents more data to the model, which should help it learn more of the true underlying function.
If your model is too complex, some of its neurons will just be used to memorize input-output pairs. Try to reduce the complexity of your model (fewer layers, fewer neurons) so that the remaining capacity really learns instead of memorizing.
Look into training performance in more detail here.
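As a rough illustration of both points, here is a minimal sketch (the layer sizes, the 80/20 split, and the use of tensorflow.keras are assumptions, not tuned recommendations) of a smaller model trained on an 80/20 split of the windowed series:
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# 80/20 split of the windowed series (shuffle=False keeps the time order intact).
X_train, X_test, y_train, y_test = train_test_split(
    np.array(X_series), np.array(y_series), test_size=0.2, shuffle=False)

# A deliberately smaller model: less capacity means less room to memorize.
model = Sequential([
    LSTM(32, input_shape=(X_train.shape[1], X_train.shape[2])),
    Dropout(0.2),
    Dense(16, activation='relu'),
    Dense(1),
])
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, y_train, validation_split=0.2, epochs=100, batch_size=32)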

Trying to understand Pytorch's implementation of LSTM

I have a dataset containing 1000 examples where each example has 5 features (a,b,c,d,e). I want to feed 7 examples to an LSTM so it predicts the feature (a) of the 8th day.
Reading Pytorchs documentation of nn.LSTM() I came up with the following:
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
output = fc(out[-1])
output # output's shape is ([7,1])
According to the docs:
The input of the nn.LSTM is "input of shape (seq_len, batch, input_size)" with "input_size – The number of expected features in the input x",
And the output is: "output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t."
In this case, I thought seq_len would be the sequence of 7 examples, batch would be 1, and input_size would be 5. So the LSTM would consume each example containing 5 features, re-feeding the hidden state at every step.
What am I missing?
When I extend your code to a full example -- I also added some comments that may help -- I get the following:
import torch
import torch.nn as nn
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
X = [
    [[1,2,3,4,5]],
    [[1,2,3,4,5]],
    [[1,2,3,4,5]],
    [[1,2,3,4,5]],
    [[1,2,3,4,5]],
    [[1,2,3,4,5]],
    [[1,2,3,4,5]],
]
X = torch.tensor(X, dtype=torch.float32)
print(X.shape) # (seq_len, batch_size, input_size) = (7, 1, 5)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
print(out.shape) # (seq_len, batch_size, hidden_size) = (7, 1, 10)
out = out[-1] # Get output of last step
print(out.shape) # (batch, hidden_size) = (1, 10)
out = fc(out) # Push through linear layer
print(out.shape) # (batch_size, output_size) = (1, 1)
This makes sense to me, given your batch_size = 1 and output_size = 1 (I assume you're doing regression). I don't know where your output.shape = (7, 1) comes from.
Are you sure that your X has the correct dimensions? Did you maybe create nn.LSTM with batch_first=True? There are a lot of little things that can sneak in.
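For comparison, here is a minimal sketch (same hypothetical sizes as above) of the batch_first=True variant, where the batch dimension comes first and the last time step is selected along dimension 1:
import torch
import torch.nn as nn

input_size, hidden_size, num_layers, output_size = 5, 10, 1, 1

# batch_first=True expects input of shape (batch, seq_len, input_size)
lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
fc = nn.Linear(hidden_size, output_size)

X = torch.randn(1, 7, input_size)  # (batch=1, seq_len=7, features=5)
out, hidden = lstm(X)
print(out.shape)                   # (1, 7, 10) = (batch, seq_len, hidden_size)
last = out[:, -1]                  # (1, 10): output of the last time step
print(fc(last).shape)              # (1, 1) = (batch, output_size)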

Why does Keras not generalize my data?

I've been trying to implement a basic multi-layered LSTM regression network to find correlations between cryptocurrency prices.
After running into unusable training results, I decided to play around with some sandbox code to make sure I had the idea right before trying again on my full dataset.
The problem is that I can't get Keras to generalize my data.
# Imports assumed by this snippet (not shown in the original):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, BatchNormalization, TimeDistributed

ts = 3
in_dim = 1
data = [i*100 for i in range(10)]
# tried this, didn't accomplish anything
# data = [(d - np.mean(data))/np.std(data) for d in data]
x = data[:len(data) - 4]
y = data[3:len(data) - 1]
assert(len(x) == len(y))
x = [[_x] for _x in x]
y = [[_y] for _y in y]
x = [x[idx:idx + ts] for idx in range(0, len(x), ts)]
y = [y[idx:idx + ts] for idx in range(0, len(y), ts)]
x = np.asarray(x)
y = np.asarray(y)
x looks like this:
[[[  0]
  [100]
  [200]]

 [[300]
  [400]
  [500]]]
and y:
[[[300]
  [400]
  [500]]

 [[600]
  [700]
  [800]]]
This works well when I predict using a very similar dataset, but it doesn't generalize when I try a similar sequence with scaled values.
model = Sequential()
model.add(BatchNormalization(
    axis = 1,
    input_shape = (ts, in_dim)))
model.add(LSTM(
    100,
    input_shape = (ts, in_dim),
    return_sequences = True))
model.add(TimeDistributed(Dense(in_dim)))
model.add(Activation('linear'))
model.compile(loss = 'mse', optimizer = 'rmsprop')
model.fit(x, y, epochs = 2000, verbose = 0)
p = np.asarray([[[10],[20],[30]]])
prediction = model.predict(p)
print(prediction)
prints
[[[ 165.78544617]
  [ 209.34489441]
  [ 216.02174377]]]
I want
[[[ 40.0000]
  [ 50.0000]
  [ 60.0000]]]
How can I format this so that when I plug in a sequence whose values are on a completely different scale, the network will still output a sensible prediction? I've tried normalizing my training data, but the results are still entirely unusable.
What have I done wrong here?
How about transforming your input data before sending it into your LSTM, using something like sklearn.preprocessing.StandardScaler? After prediction you can call scaler.inverse_transform(prediction) to map the output back to the original scale. A sketch is below.
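A minimal sketch of that idea, assuming the x, y, p, and model defined above and retraining the model on the scaled data; the scaler is fit on the training values only, and the same statistics are reused for the new sequence and for inverting the prediction:
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training values only (x has shape (samples, ts, 1)).
scaler = StandardScaler()
scaler.fit(x.reshape(-1, 1))

x_scaled = scaler.transform(x.reshape(-1, 1)).reshape(x.shape)
y_scaled = scaler.transform(y.reshape(-1, 1)).reshape(y.shape)

model.fit(x_scaled, y_scaled, epochs = 2000, verbose = 0)

# A new sequence on a different scale is transformed with the same statistics.
p = np.asarray([[[10], [20], [30]]], dtype=float)
p_scaled = scaler.transform(p.reshape(-1, 1)).reshape(p.shape)

prediction = model.predict(p_scaled)
# Map the prediction back to the original scale.
print(scaler.inverse_transform(prediction.reshape(-1, 1)).reshape(prediction.shape))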

Neural Network does not perform well on the CIFAR-10 dataset

I have been trying to implement a CNN on the CIFAR-10 dataset for a few days, but my test set accuracy does not seem to go beyond 10% and the error just hangs around 69.07733. I have been tweaking the model for a few days, but in vain; I haven't been able to spot where I am going wrong. Please help me find the fault in the model. Here is the code for it:
import os
import sys
import pickle
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt
data_root = './cifar-10-batches-py'
train_data = np.ndarray(shape=(50000,3072), dtype=np.float32)
train_labels = np.ndarray(shape=(50000), dtype=np.float32)
num_images = 0
test_data = np.ndarray(shape=(10000,3072),dtype = np.float32)
test_labels = np.ndarray(shape=(10000),dtype=np.float32)
meta_data = {}
for file in os.listdir(data_root):
    file_path = os.path.join(data_root,file)
    with open(file_path,'rb') as f:
        temp = pickle.load(f,encoding ='bytes')
        if file == 'batches.meta':
            for i,j in enumerate(temp[b'label_names']):
                meta_data[i] = j
        if 'data_batch_' in file:
            for i in range(10000):
                train_data[num_images,:] = temp[b'data'][i]
                train_labels[num_images] = temp[b'labels'][i]
                num_images += 1
        if 'test_batch' in file:
            for i in range(10000):
                test_data[i,:] = temp[b'data'][i]
                test_labels[i] = temp[b'labels'][i]
'''
print('meta: \n',meta_data)
train_data = train_data.reshape(50000,3,32,32).transpose(0,2,3,1)
print('\ntrain data: \n',train_data.shape,'\nLabels: \n',train_labels[0])
print('\ntest data: \n',test_data[0].shape,'\nLabels: \n',train_labels[0])'''
#accuracy function acc = (no. of correct prediction/total attempts) * 100
def accuracy(predictions, labels):
    return (100 * (np.sum(np.argmax(predictions,1)== np.argmax(labels, 1))/predictions.shape[0]))

#reformat the data
def reformat(data,labels):
    data = data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1).astype(np.float32)
    labels = (np.arange(10) == labels[:,None]).astype(np.float32)
    return data,labels
train_data, train_labels = reformat(train_data,train_labels)
test_data, test_labels = reformat(test_data, test_labels)
print ('Train ',train_data[0][1])
plt.axis("off")
plt.imshow(train_data[1], interpolation = 'nearest')
plt.savefig("1.png")
plt.show()
'''
print("Train: \n",train_data.shape,test_data[0],"\nLabels: \n",train_labels.shape,train_labels[:11])
print("Test: \n",test_data.shape,test_data[0],"\nLabels: \n",test_labels.shape,test_labels[:11])'''
image_size = 32
num_channels = 3
batch_size = 30
patch_size = 5
depth = 64
num_hidden = 256
num_labels = 10
graph = tf.Graph()
with graph.as_default():
    #input data and labels
    train_input = tf.placeholder(tf.float32,shape=(batch_size,image_size,image_size,num_channels))
    train_output = tf.placeholder(tf.float32,shape=(batch_size,num_labels))
    test_input = tf.constant(test_data)

    #layer weights and biases
    layer_1_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,num_channels,depth]))
    layer_1_biases = tf.Variable(tf.zeros([depth]))
    layer_2_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,depth,depth]))
    layer_2_biases = tf.Variable(tf.constant(0.1, shape=[depth]))
    layer_3_weights = tf.Variable(tf.truncated_normal([64*64, num_hidden]))
    layer_3_biases = tf.Variable(tf.constant(0.1, shape=[num_hidden]))
    layer_4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels]))
    layer_4_biases = tf.Variable(tf.constant(0.1, shape=[num_labels]))

    def convnet(data):
        conv_1 = tf.nn.conv2d(data, layer_1_weights,[1,1,1,1], padding = 'SAME')
        hidden_1 = tf.nn.relu(conv_1+layer_1_biases)
        norm_1 = tf.nn.lrn(hidden_1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
        pool_1 = tf.nn.max_pool(norm_1,[1,2,2,1],[1,2,2,1], padding ='SAME')
        conv_2 = tf.nn.conv2d(pool_1,layer_2_weights,[1,1,1,1], padding = 'SAME')
        hidden_2 = tf.nn.relu(conv_2+layer_2_biases)
        norm_2 = tf.nn.lrn(hidden_2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
        pool_2 = tf.nn.max_pool(norm_2,[1,2,2,1],[1,2,2,1], padding ='SAME')
        shape = pool_2.get_shape().as_list()
        hidd2_trans = tf.reshape(pool_2,[shape[0],shape[1]*shape[2]*shape[3]])
        hidden_3 = tf.nn.relu(tf.matmul(hidd2_trans,layer_3_weights) + layer_3_biases)
        return tf.nn.relu(tf.matmul(hidden_3,layer_4_weights) + layer_4_biases)

    logits = convnet(train_input)
    loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=train_output, logits = logits))
    optimizer = tf.train.AdamOptimizer(1e-4).minimize(loss)
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(convnet(test_input))
num_steps = 100000
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized \n')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch = train_data[offset:(offset+batch_size),:,:,:]
        batch_labels = train_labels[offset:(offset+batch_size),:]
        feed_dict ={train_input: batch, train_output: batch_labels}
        _,l,prediction = session.run([optimizer, loss, train_prediction], feed_dict = feed_dict)
        if (step % 500 == 0):
            print("Loss at step %d: %f" %(step, l))
            print("Accuracy: %f" %(accuracy(prediction, batch_labels)))
            print("Test accuracy: %f" %(accuracy(session.run(test_prediction), test_labels)))
At first glance I would say the initialization of the CNN is the culprit. Training a convnet is an optimization problem in a highly non-convex space, and it therefore depends a lot on careful initialization to avoid getting stuck in local minima or at saddle points. Look at Xavier initialization for an example of how to fix that.
Example Code:
W = tf.get_variable("W", shape=[784, 256],
                    initializer=tf.contrib.layers.xavier_initializer())
The problem is that your network is quite deep for this setup (64 filters in both convolutional layers), you are training it from scratch, and the CIFAR-10 dataset (50,000 images) is small. Moreover, each CIFAR-10 image is only 32x32x3.
One alternative I can suggest is to fine-tune a pre-trained model, i.e. do transfer learning.
Another, better alternative is to reduce the number of filters in each layer. That way you will be able to train the model from scratch, and training will also be faster (assuming you don't have a GPU).
Next, you are using local response normalization. I would suggest removing this layer and doing mean normalization as a pre-processing step (a sketch follows at the end of this answer).
Next, if you feel the learning is not picking up at all, try increasing the learning rate a little and see.
Lastly, just to reduce some operations in your code: you are reshaping your tensor and then transposing it in many places, like this:
data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1)
Why not directly reshape it to something like this?
data.reshape(data.shape[0], 32, 32, 3)
Hope the answer helps you.
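As a minimal sketch of the mean-normalization suggestion above (the per-channel statistics and the small epsilon are my assumptions), applied right after the reformat() step:
# Compute per-channel statistics on the training set only, then apply them to both sets.
mean = train_data.mean(axis=(0, 1, 2), keepdims=True)
std = train_data.std(axis=(0, 1, 2), keepdims=True)

train_data = (train_data - mean) / (std + 1e-7)   # 1e-7 avoids division by zero
test_data = (test_data - mean) / (std + 1e-7)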

What are the problems that causes neural networks stagnate in learning?

I was trying to see how accurately a neural network can approximate simple functions, like a scalar-valued polynomial in several variables. So I had these ideas:
Fix a polynomial of several variables, say, f(x_1,..,x_n).
Generate 50000 vectors of length n using numpy.random, which will serve as training data.
Evaluate f(x) at these points; the values will be used as labels.
Make test data and labels in the same way.
Write a neural network and see how accurately it can approximate f(x) on the test set.
Here is my sample neural network implemented in tensorflow
import tensorflow as tf
import numpy as np
input_vector_length = int(10)
output_vector_length = int(1)
train_data_size = int(50000)
test_data_size = int(10000)
train_input_domain = [-10, 10] #Each component in an input vector is between -10 and 10
test_input_domain = [-10, 10]
iterations = 20000
batch_size = 200
regularizer = 0.01
sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[None, input_vector_length], name="x")
y = tf.placeholder(tf.float32, shape =[None, output_vector_length], name="y")
function = tf.reduce_sum(x, 1) + 0.25*tf.pow(tf.reduce_sum(x,1), 2) + 0.025*tf.pow(tf.reduce_sum(x,1), 3)
#make train data input
train_input = (train_input_domain[1]-train_input_domain[0])*np.random.rand(train_data_size, input_vector_length) + train_input_domain[0]
#make train data label
train_label = sess.run(function, feed_dict = {x : train_input})
train_label = train_label.reshape(train_data_size, output_vector_length)
#make test data input
test_input = (test_input_domain[1]-test_input_domain[0])*np.random.rand(test_data_size, input_vector_length) + test_input_domain[0]
#make test data label
test_label = sess.run(function, feed_dict = {x : test_input})
test_label = test_label.reshape(test_data_size, output_vector_length)
def weight_variables(shape, name):
    initial = 10*tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variables(shape, name):
    initial = 10*tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def take_this_batch(data, batch_index=[]):
    A = []
    for i in range(len(batch_index)):
        A.append(data[i])
    return A
W_0 = weight_variables(shape=[input_vector_length, 10], name="W_0")
B_0 = bias_variables(shape=[10], name="W_0")
y_1 = tf.sigmoid(tf.matmul(x, W_0) + B_0)
W_1 = weight_variables(shape=[10, 20], name="W_1")
B_1 = bias_variables(shape=[20], name="B_1")
y_2 = tf.sigmoid(tf.matmul(y_1, W_1) + B_1)
W_2 = weight_variables(shape=[20,40], name="W_2")
B_2 = bias_variables(shape=[40], name="B_2")
y_3 = tf.sigmoid(tf.matmul(y_2, W_2) + B_2)
keep_prob = tf.placeholder(tf.float32, name="keep_prob")
y_drop = tf.nn.dropout(y_3, keep_prob)
W_output = weight_variables(shape=[40, output_vector_length], name="W_output")
B_output = bias_variables(shape=[output_vector_length], name="B_output")
y_output = tf.matmul(y_drop, W_output) + B_output
weight_sum = tf.reduce_sum(tf.square(W_0)) + tf.reduce_sum(tf.square(W_1)) + tf.reduce_sum(tf.square(W_2)) + tf.reduce_sum(tf.square(W_3))
cost = tf.reduce_mean(tf.square(y - y_output)) + regularizer*(weight_sum)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
error = cost
sess.run(tf.initialize_all_variables())
with sess.as_default():
    for step in range(iterations):
        batch_index = np.random.randint(low=0, high=train_data_size, size=batch_size)
        batch_input = take_this_batch(train_input, batch_index)
        batch_label = take_this_batch(train_label, batch_index)
        train_step.run(feed_dict = {x : batch_input, y:batch_label, keep_prob:0.5})
        if step % 1000 == 0:
            current_error = error.eval(feed_dict = {x:batch_input, y:batch_label, keep_prob:1.0})
            print("step %d, Current error is %f" % (step,current_error))
    print(error.eval(feed_dict={x:test_input, y:test_label, keep_prob:1.0}))
Simply speaking, the performance of this neural network is horrifying! It has three hidden layers of size 10, 20, and 40; the input layer has size 10 and the output layer has size 1. I used a simple L^2 cost function, regularized with the squared weights and a regularization coefficient of 0.01.
During the training stage, I noticed that the error gets stuck and refuses to go down. I am wondering what could be going wrong. Thanks a lot for reading this long question; any suggestion is appreciated.
Since you are using sigmoid as the activation function in the hidden layers, the values at these neurons are squashed into the range (0, 1). Hence, it is a good idea to normalize the input data for this network.
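A minimal sketch of that normalization, assuming the train_input and test_input arrays defined above; the statistics are computed on the training inputs only (the small epsilon is my addition to avoid division by zero):
# Standardize each input component to zero mean and unit variance.
mean = train_input.mean(axis=0)
std = train_input.std(axis=0) + 1e-8

train_input = (train_input - mean) / std
test_input = (test_input - mean) / std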
