Related
I've trained an LSTM model with 8 features and 1 output. I have one dataset and split it into two separate files to train and predict with the first half of the set, and then attempt to predict the second half of the set using the trained model from the first part of my dataset. My model predicts the trained and testing sets from the dataset I used to train the model pretty well (RMSE of around 5-7), however when I attempt to predict using the second half of the set I get very poor predictions (RMSE of around 50-60). How can I get my trained model to predict outside datasets well?
dataset at this link
file = r'/content/drive/MyDrive/only_force_pt1.csv'
df = pd.read_csv(file)
df.head()
X = df.iloc[:, 1:9]
y = df.iloc[:,9]
print(X.shape)
print(y.shape)
plt.figure(figsize = (20, 6), dpi = 100)
plt.plot(y)
WINDOW_LEN = 50
def window_size(size, inputdata, targetdata):
X = []
y = []
i=0
while(i + size) <= len(inputdata)-1:
X.append(inputdata[i: i+size])
y.append(targetdata[i+size])
i+=1
assert len(X)==len(y)
return (X,y)
X_series, y_series = window_size(WINDOW_LEN, X, y)
print(len(X))
print(len(X_series))
print(len(y_series))
X_train, X_val, y_train, y_val = train_test_split(np.array(X_series),np.array(y_series),test_size=0.3, shuffle = True)
X_val, X_test,y_val, y_test = train_test_split(np.array(X_val),np.array(y_val),test_size=0.3, shuffle = False)
n_timesteps, n_features, n_outputs = X_train.shape[1], X_train.shape[2],1
[verbose, epochs, batch_size] = [1, 300, 32]
input_shape = (n_timesteps, n_features)
model = Sequential()
# LSTM
model.add(LSTM(64, input_shape=input_shape, return_sequences = False))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
#model.add(Dropout(0.2))
model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dense(1, activation='relu'))
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience = 30, verbose =1, mode = 'auto')
model.summary()
model.compile(loss = 'mse', optimizer = Adam(learning_rate = 0.001), metrics=[tf.keras.metrics.RootMeanSquaredError()])
history = model.fit(X_train, y_train, batch_size = batch_size, epochs = epochs, verbose = verbose, validation_data=(X_val,y_val), callbacks = [earlystopper])
Second dataset:
tests = r'/content/drive/MyDrive/only_force_pt2.csv'
df_testing = pd.read_csv(tests)
X_testing = df_testing.iloc[:4038,1:9]
torque = df_testing.iloc[:4038,9]
print(X_testing.shape)
print(torque.shape)
plt.figure(figsize = (20, 6), dpi = 100)
plt.plot(torque)
X_testing = X_testing.to_numpy()
X_testing_series, y_testing_series = window_size(WINDOW_LEN, X_testing, torque)
X_testing_series = np.array(X_testing_series)
y_testing_series = np.array(y_testing_series)
scores = model.evaluate(X_testing_series, y_testing_series, verbose =1)
X_prediction = model.predict(X_testing_series, batch_size = 32)
If your model is working fine on training data but performs bad on validation data, then your model did not learn the "true" connection between input and output variables but simply memorized the corresponding output to your input. To tackle this you can do multiple things:
Typically you would use 80% of your data to train and 20% to test, this will present more data to the model, which should make it learn more of the true underlying function
If your model is too complex, it will have neurons which will just be used to memorize input-output data pairs. Try to reduce the complexity of your model (layers, neurons) to make it more simple, so that the remaining layers can really learn instead of memorize
Look into more detail on training performance here
So I trained by python/pytorch DC-GAN (deep convolutional GAN) for 30 epochs on grayscale faces, and my GAN pretty much failed. I added batch normalization and leaky relu's to my generator and discriminator (I heard those are ways to make the GAN converge), and the Adam optimizer. My GAN still only putting out random grayscale pixels (nothing even remotely related to faces.) I have no problem with the discriminator, my discriminator works very well. I then implemented weight decay of 0.01 on my discriminator to make my GAN train better (since my discriminator was doing better than my generator) but to no avail. Finally, I tried training the GAN for more epochs, 60 epochs. My GAN still generates just random pixels, sometimes outputting completely black.
The GAN training method I used worked for the MNIST dataset (but I used a way simpler GAN architecture for that.)
import torch.nn as nn
import torch.nn.functional as F
class Discriminator(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 4, 3)
self.conv2 = nn.Conv2d(4, 8, 3)
self.bnorm1 = nn.BatchNorm2d(8)
self.conv3 = nn.Conv2d(8, 16, 3)
self.conv4 = nn.Conv2d(16, 32, 3)
self.bnorm2 = nn.BatchNorm2d(32)
self.conv5 = nn.Conv2d(32, 4, 3)
self.fc1 = nn.Linear(5776, 1024)
self.fc2 = nn.Linear(1024, 512)
self.fc3 = nn.Linear(512, 1)
def forward(self, x):
pred = F.leaky_relu(self.conv1(x.reshape(-1,1,48,48)))
pred = F.leaky_relu(self.bnorm1(self.conv2(pred)))
pred = F.leaky_relu(self.conv3(pred))
pred = F.leaky_relu(self.bnorm2(self.conv4(pred)))
pred = F.leaky_relu(self.conv5(pred))
pred = pred.reshape(-1, 5776)
pred = F.leaky_relu(self.fc1(pred))
pred = F.leaky_relu(self.fc2(pred))
pred = torch.sigmoid(self.fc3(pred))
return pred
class Generator(nn.Module):
def __init__(self):
super().__init__()
self.fc1 = nn.Linear(512, 1024)
self.fc2 = nn.Linear(1024, 2048)
self.fc3 = nn.Linear(2048, 5776)
self.convT1 = nn.ConvTranspose2d(4, 32, 3)
self.convT2 = nn.ConvTranspose2d(32, 16, 3)
self.bnorm1 = nn.BatchNorm2d(16)
self.convT3 = nn.ConvTranspose2d(16, 8, 3)
self.convT4 = nn.ConvTranspose2d(8, 4, 3)
self.bnorm2 = nn.BatchNorm2d(4)
self.convT5 = nn.ConvTranspose2d(4, 1, 3)
def forward(self, x):
pred = F.leaky_relu(self.fc1(x))
pred = F.leaky_relu(self.fc2(pred))
pred = F.leaky_relu(self.fc3(pred))
pred = pred.reshape(-1, 4, 38, 38)
pred = F.leaky_relu(self.convT1(pred))
pred = F.leaky_relu(self.bnorm1(self.convT2(pred)))
pred = F.leaky_relu(self.convT3(pred))
pred = F.leaky_relu(self.bnorm2(self.convT4(pred)))
pred = torch.sigmoid(self.convT5(pred))
return pred
import torch.optim as optim
discriminator = discriminator.to("cuda")
generator = generator.to("cuda")
discriminator_losses = []
generator_losses = []
for epoch in range(30):
for data,label in tensor_dataset:
data = data.to("cuda")
label = label.to("cuda")
batch_size = data.size(0)
real_labels = torch.ones(batch_size, 1).to("cuda")
fake_labels = torch.zeros(batch_size, 1).to("cuda")
noise = torch.randn(batch_size, 512).to("cuda")
D_real = discriminator(data)
D_fake = discriminator(generator(noise))
D_real_loss = F.binary_cross_entropy(D_real, real_labels)
D_fake_loss = F.binary_cross_entropy(D_fake, fake_labels)
D_loss = D_real_loss+D_fake_loss
d_optim.zero_grad()
D_loss.backward()
d_optim.step()
noise = torch.randn(batch_size, 512).to("cuda")
D_fake = discriminator(generator(noise))
G_loss = F.binary_cross_entropy(D_fake, real_labels)
g_optim.zero_grad()
G_loss.backward()
g_optim.step()
discriminator_losses.append(D_loss)
generator_losses.append(G_loss)
print(epoch)
I'm also new to Deep learning and GAN models, but this method solved similar problem for my DCGAN project. Use kernel size at least 4*4: it's my guess, but it seems that small kernels can't catch details in image, no matter how deep the network is. Other tips I found are mostly from here:(same link from above)
https://machinelearningmastery.com/how-to-train-stable-generative-adversarial-networks/
df = pd.read_csv('F:/series.csv')
train, validate, test = df[0:60], df[60:80], df[80:100]
sc = MinMaxScaler(feature_range = (-1, 1))
train = sc.fit_transform(train)
validate = sc.fit_transform(validate)
test = sc.fit_transform(test)
train = train.reshape((len(train),1))
test = test.reshape((len(test),1))
validate = validate.reshape((len(validate),1))
n_input = 5
n_features = 1
generator_train = TimeseriesGenerator(train, train, length=n_input, batch_size=2)
generator_validate = TimeseriesGenerator(validate, validate, length=n_input, batch_size=2)
generator_test = TimeseriesGenerator(test, test, length=n_input, batch_size=2)
model = Sequential()
model.add(LSTM(200, return_sequences = True, input_shape=(n_input, n_features)))
model.add(Dropout(0.2))
model.add(LSTM(200))
model.add(Dense(units = 1))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit_generator(generator_train, epochs= 100, validation_data = generator_validate)
model.evaluate_generator(generator_test)
prediction = model.predict_generator(generator_test, steps = 5)
prediction.shape
(10,1)
test.shape
(20,1)
This confuses me, How to solve the problem? How to evaluate the predicted and test data? Whats the mistake I am making?
I just found the answer, the predict generator length depends on length of (test_generator*batch_size). Now to measure RMSE eliminate the first n_input length of test data. Now the size becomes equal.
I have two inputs: qi_pos & qi_neg with the same shape. They should be processed by the two mlp layers, and finally get two results which acts as score. Here is my codes:
self.mlp1_pos = nn_layers.full_connect_(qi_pos, 256, activation='relu', use_bn = None, keep_prob=self.keep_prob, name = 'deep_mlp_1')
self.mlp2_pos = nn_layers.full_connect_(self.mlp1_pos, 128, activation='relu', use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_2')
self.pos_pair_sim = nn_layers.full_connect_(self.mlp2_pos, 1, activation=None, use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_3')
tf.get_variable_scope().reuse_variables()
self.mlp1_neg = nn_layers.full_connect_(qi_neg, 256, activation='relu', use_bn = None, keep_prob=self.keep_prob, name = 'deep_mlp_1')
self.mlp2_neg = nn_layers.full_connect_(self.mlp1_neg, 128, activation='relu', use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_2')
self.neg_pair_sim = nn_layers.full_connect_(self.mlp2_neg, 1, activation=None, use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_3')
I use BN layer to normalize the nodes in hidden layers:
def full_connect_(inputs, num_units, activation=None, use_bn = None, keep_prob = 1.0, name='full_connect_'):
with tf.variable_scope(name):
shape = [inputs.get_shape()[-1], num_units]
weight = weight_variable(shape)
bias = bias_variable(shape[-1])
outputs_ = tf.matmul(inputs, weight) + bias
if use_bn:
outputs_ = tf.contrib.layers.batch_norm(outputs_, center=True, scale=True, is_training=True,decay=0.9,epsilon=1e-5, scope='bn')
if activation=="relu":
outputs = tf.nn.relu(outputs_)
elif activation == "tanh":
outputs = tf.tanh(outputs_)
elif activation == "sigmoid":
outputs = tf.nn.sigmoid(outputs_)
else:
outputs = outputs_
return outputs
with tf.name_scope('predictions'):
self.sim_diff = self.pos_pair_sim - self.neg_pair_sim # shape = (batch_size, 1)
self.preds = tf.sigmoid(self.sim_diff) # shape = (batch_size, 1)
self.infers = self.pos_pair_sim
Below is the loss definition.It seems all right.
with tf.name_scope('predictions'):
sim_diff = pos_pair_sim - neg_pair_sim
predictions = tf.sigmoid(sim_diff)
self.infers = pos_pair_sim
## loss and optim
with tf.name_scope('loss'):
self.loss = nn_layers.cross_entropy_loss_with_reg(self.labels, self.preds)
tf.summary.scalar('loss', self.loss)
I am not sure whether I have used the BN layers in right way. I mean that the BN parameters are derived from the hidden units from the two separate parts, which are based on qi_pos and qi_neg tensors as inputs. Anyway, anyone could help check it?
Your code seems fine to me, there's no problem in applying BN in different branches of the network. But I'd like to mention few notes here:
BN hyperparameters are pretty standard, so I usually don't manually set decay, epsilon and renorm_decay. It doesn't mean you mustn't change them, it's simply unnecessary in most cases.
You're applying the BN before the activation function, however, there is evidence that it works better if applied after the activation. See, for example, this discussion. Once again, it doesn't mean it's a bug, simply one more architecture to consider.
I implemented a genrative adversarial network in Keras. My training data size is about 16,000, where each image is of 32*32 size. All of my training images are the resized versions of the imageds from the imagenet dataset with regard to the object detection task. I fed the image matrix directly into the network without doing the center crop. I used the AdamOptimizer with the learning rate being 1e-4, and beta1 being 0.5 and I also set the dropout rate to be 0.1. I first trained the discrimator on 3000 real images and 3000 fake images and it achieved a 93% accuracy. Then, I trained for 500 epochs with the batch size being 32. However, my model seemed to converge in only a few epochs(<10), and the images it generated were ugly.
Plot of the Loss Function
Random Samples Generated by the Generator
I was wondering whether my training dataset is too small(compared to those in the paper of DCGAN, which are more than 300,000) or my model configuration is not correct. What's more, should I train the SGD on D for k iterations (where k is small, perhaps 1) and then training with SGD on G for one iteration as suggested by Ian Goodfellow in the original paper?(I have just tried to train them one at a time)
Below is the configuration of the generator.
g_input = Input(shape=[100])
H = Dense(1024*4*4, init='glorot_normal')(g_input)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = Reshape( [4, 4,1024] )(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(512, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(256, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(3, 3, 3, border_mode='same', init='glorot_uniform')(H)
g_V = Activation('tanh')(H)
generator = Model(g_input,g_V)
generator.compile(loss='binary_crossentropy', optimizer=opt)
generator.summary()
Below is the configuration of the discriminator:
d_input = Input(shape=shp)
H = Convolution2D(64, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(d_input)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Convolution2D(128, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(H)
H = BatchNormalization(mode=2)(H)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Flatten()(H)
H = Dense(256, init='glorot_normal')(H)
H = LeakyReLU(0.2)(H)
d_V = Dense(2,activation='softmax')(H)
discriminator = Model(d_input,d_V)
discriminator.compile(loss='categorical_crossentropy', optimizer=dopt)
discriminator.summary()
Below is the configuration of GAN as a whole:
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
I think problem is with loss function
Try
loss='categorical_crossentropy',
I suspect that your generator is trainable while you training the gan. You can verify by using generator.layers[-1].get_weights() to see if the parameters changed during training process of gan.
You should freeze discriminator before you assemble it to gan:
generator.trainnable = False
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
see this discussion:
https://github.com/fchollet/keras/issues/4674