Generative adversarial network generating images with some random pixels - machine-learning

I am trying to generate images using Generative Adversarial Networks (GANs) on the CelebA aligned dataset, with each image resized to 64*64 in .jpeg format. My network definition is like this:
def my_discriminator(input_var=None):
    net = lasagne.layers.InputLayer(shape=(None, 3, 64, 64), input_var=input_var)
    net = lasagne.layers.Conv2DLayer(net, 64, filter_size=(6, 6), stride=2, pad=2, W=lasagne.init.HeUniform(), nonlinearity=lasagne.nonlinearities.LeakyRectify(0.2))  # 64*32*32
    net = lasagne.layers.Conv2DLayer(net, 128, filter_size=(6, 6), stride=2, pad=2, W=lasagne.init.HeUniform(), nonlinearity=lasagne.nonlinearities.LeakyRectify(0.2))  # 128*16*16
    net = lasagne.layers.Conv2DLayer(net, 256, filter_size=(6, 6), stride=2, pad=2, W=lasagne.init.HeUniform(), nonlinearity=lasagne.nonlinearities.LeakyRectify(0.2))  # 256*8*8
    net = lasagne.layers.Conv2DLayer(net, 512, filter_size=(6, 6), stride=2, pad=2, W=lasagne.init.HeUniform(), nonlinearity=lasagne.nonlinearities.LeakyRectify(0.2))  # 512*4*4
    net = lasagne.layers.DenseLayer(net, 2048, W=lasagne.init.HeUniform(), nonlinearity=lasagne.nonlinearities.LeakyRectify(0.2))
    net = lasagne.layers.DenseLayer(net, 1, nonlinearity=lasagne.nonlinearities.sigmoid)
    return net
def my_generator(input_var=None):
    gen_net = lasagne.layers.InputLayer(shape=(None, 100), input_var=input_var)
    gen_net = lasagne.layers.DenseLayer(gen_net, 2048, W=lasagne.init.HeUniform())
    gen_net = lasagne.layers.DenseLayer(gen_net, 512*4*4, W=lasagne.init.HeUniform())
    gen_net = lasagne.layers.ReshapeLayer(gen_net, shape=([0], 512, 4, 4))
    gen_net = lasagne.layers.Deconv2DLayer(gen_net, 256, filter_size=(6, 6), stride=2, crop=2, W=lasagne.init.HeUniform(), nonlinearity=lasagne.nonlinearities.rectify)
    gen_net = lasagne.layers.Deconv2DLayer(gen_net, 128, filter_size=(6, 6), stride=2, crop=2, W=lasagne.init.HeUniform(), nonlinearity=lasagne.nonlinearities.rectify)
    gen_net = lasagne.layers.Deconv2DLayer(gen_net, 64, filter_size=(6, 6), stride=2, crop=2, W=lasagne.init.HeUniform(), nonlinearity=lasagne.nonlinearities.rectify)
    gen_net = lasagne.layers.Deconv2DLayer(gen_net, 3, filter_size=(6, 6), stride=2, crop=2, nonlinearity=lasagne.nonlinearities.tanh)
    return gen_net
In the images generated by the generator, I am getting some randomly colored pixels and also a "grid"-like structure, as can be seen in the example image:
My question is: what are the reasons for these two problems? I also used almost the same architecture, with one less convolution layer in the generator and discriminator, on the CIFAR-10 dataset with 32*32 resolution images in .png format, but there the generated images did not look like this. I am not sure if the image format could be the reason. I would be very thankful if someone could provide some ideas, ways, or links - anything to get rid of such issues.

The reasons for these problems were:
Random pixels - the normalization of the image data must match the activation function of the last layer of the generator: scale pixel values to [-1, 1] for tanh.
"Grid" in the generated images - the way each image's dimensions were changed. Use the 'transpose' function instead of 'reshape' to convert (64,64,3) -> (3,64,64); reshape keeps the raw memory order and scrambles the spatial layout of the pixels.

Related

ML classification model for any pic input size

I've been learning for weeks now how to train my own models to classify images.
So far I have trained some models and got a rough overview of classification.
The training pictures for my model are 3x32x32 and training works fine. Predictions with test images also work fine, as long as they come in the same dimensions as the training pictures.
I already added adaptive max pooling before the fully connected layers, so any image size can be fed to the model, but the predictions are then always wrong.
Can anyone tell me why it behaves like this and how I can improve it?
That's my model:
class ResNet9(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.conv3 = conv_block(128, 256, pool=True)
        self.conv4 = conv_block(256, 512, pool=True)
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
        self.classifier = nn.Sequential(nn.AdaptiveMaxPool2d(1),
                                        nn.Flatten(),
                                        nn.Dropout(0.2),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out
Normalization part:
# create dataset
stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_trans = tt.Compose([tt.RandomCrop(32, padding=4, padding_mode='reflect'),
                          tt.ToTensor(),
                          tt.Normalize(*stats, inplace=True)])
test_trans = tt.Compose([tt.ToTensor(),
                         tt.Normalize(*stats, inplace=True)])
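No answer is given in this thread, but one likely cause is that larger test images come from a different input distribution than the 32x32 crops seen during training, even though adaptive pooling makes them shape-compatible. A minimal sketch, assuming the torchvision alias tt used above, that resizes arbitrary test inputs to the training resolution before normalizing:
test_trans = tt.Compose([tt.Resize((32, 32)),   # match the 32x32 training resolution
                         tt.ToTensor(),
                         tt.Normalize(*stats)])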

Optimizing filter sizes of CNN with Optuna

I have created a CNN for classification of three classes based on input images of size 39 x 39. I'm optimizing the parameters of the network using Optuna. For Optuna I'm defining the following parameters to optimize:
num_blocks = trial.suggest_int('num_blocks', 1, 4)
num_filters = [int(trial.suggest_categorical("num_filters", [32, 64, 128, 256]))]
kernel_size = trial.suggest_int('kernel_size', 2, 7)
num_dense_nodes = trial.suggest_categorical('num_dense_nodes', [64, 128, 256, 512, 1024])
dense_nodes_divisor = trial.suggest_categorical('dense_nodes_divisor', [1, 2, 4, 8])
batch_size = trial.suggest_categorical('batch_size', [16, 32, 64, 128])
drop_out = trial.suggest_discrete_uniform('drop_out', 0.05, 0.5, 0.05)
lr = trial.suggest_loguniform('lr', 1e-6, 1e-1)
dict_params = {'num_blocks': num_blocks,
               'num_filters': num_filters,
               'kernel_size': kernel_size,
               'num_dense_nodes': num_dense_nodes,
               'dense_nodes_divisor': dense_nodes_divisor,
               'batch_size': batch_size,
               'drop_out': drop_out,
               'lr': lr}
My network looks as follows:
input_tensor = Input(shape=(39,39,3))
# 1st cnn block
x = Conv2D(filters=dict_params['num_filters'],
           kernel_size=dict_params['kernel_size'],
           strides=1, padding='same')(input_tensor)
x = BatchNormalization()(x, training=training)
x = Activation('relu')(x)
x = MaxPooling2D(padding='same')(x)
x = Dropout(dict_params['drop_out'])(x)
# additional cnn blocks
for i in range(1, dict_params['num_blocks']):
    x = Conv2D(filters=dict_params['num_filters']*(2**i), kernel_size=dict_params['kernel_size'], strides=1, padding='same')(x)
    x = BatchNormalization()(x, training=training)
    x = Activation('relu')(x)
    x = MaxPooling2D(padding='same')(x)
    x = Dropout(dict_params['drop_out'])(x)
# mlp
x = Flatten()(x)
x = Dense(dict_params['num_dense_nodes'], activation='relu')(x)
x = Dropout(dict_params['drop_out'])(x)
x = Dense(dict_params['num_dense_nodes'] // dict_params['dense_nodes_divisor'], activation='relu')(x)
output_tensor = Dense(self.number_of_classes, activation='softmax')(x)
# instantiate and compile model
cnn_model = Model(inputs=input_tensor, outputs=output_tensor)
opt = Adam(lr=dict_params['lr'])
loss = 'categorical_crossentropy'
cnn_model.compile(loss=loss, optimizer=opt, metrics=['accuracy', tf.keras.metrics.AUC()])
I'm optimizing (minimizing) the validation loss with Optuna. There is a maximum of 4 blocks in the network and the number of filters is doubled for each block, i.e. 64 in the first block, 128 in the second, 256 in the third, and so on. There are two problems. First, when we start with e.g. 256 filters and a total of 4 blocks, the last block ends up with 2048 filters, which is too much.
Is it possible to make the num_filters parameter dependent on the num_blocks parameter? That means if there are more blocks, the starting filter count should be smaller. So, for example, if num_blocks is chosen to be 4, num_filters should only be sampled from 32, 64 and 128.
Second, I think it is common to double the number of filters, but there are also networks with a constant number of filters, or two convolutions (with the same number of filters) before a max pooling layer (similar to VGG), and so on. Is it possible to adapt the Optuna optimization to cover all these variations?
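The thread gives no answer, but Optuna's define-by-run search space does allow a suggestion to depend on an earlier one. A minimal sketch, assuming the trial object above; because Optuna expects the choice list for a given parameter name to stay fixed across trials, the parameter name here includes num_blocks (the 256-filter cap and the names are illustrative):
num_blocks = trial.suggest_int('num_blocks', 1, 4)
# keep the deepest block at <= 256 filters by capping the starting filter count
max_start = 256 // (2 ** (num_blocks - 1))
filter_choices = [c for c in (32, 64, 128, 256) if c <= max_start]
num_filters = trial.suggest_categorical('num_filters_%d_blocks' % num_blocks,
                                        filter_choices)
Architectural variants (constant filter count, double convolutions per block, etc.) could be covered the same way, e.g. with an additional trial.suggest_categorical('block_style', ...) that the model-building code branches on.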

Is my code right to use batch normalization layers in tensorflow?

I have two inputs, qi_pos and qi_neg, with the same shape. They should be processed by the two MLP layers and finally produce two results which act as scores. Here is my code:
self.mlp1_pos = nn_layers.full_connect_(qi_pos, 256, activation='relu', use_bn = None, keep_prob=self.keep_prob, name = 'deep_mlp_1')
self.mlp2_pos = nn_layers.full_connect_(self.mlp1_pos, 128, activation='relu', use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_2')
self.pos_pair_sim = nn_layers.full_connect_(self.mlp2_pos, 1, activation=None, use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_3')
tf.get_variable_scope().reuse_variables()
self.mlp1_neg = nn_layers.full_connect_(qi_neg, 256, activation='relu', use_bn = None, keep_prob=self.keep_prob, name = 'deep_mlp_1')
self.mlp2_neg = nn_layers.full_connect_(self.mlp1_neg, 128, activation='relu', use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_2')
self.neg_pair_sim = nn_layers.full_connect_(self.mlp2_neg, 1, activation=None, use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_3')
I use a BN layer to normalize the nodes in the hidden layers:
def full_connect_(inputs, num_units, activation=None, use_bn=None, keep_prob=1.0, name='full_connect_'):
    with tf.variable_scope(name):
        shape = [inputs.get_shape()[-1], num_units]
        weight = weight_variable(shape)
        bias = bias_variable(shape[-1])
        outputs_ = tf.matmul(inputs, weight) + bias
        if use_bn:
            outputs_ = tf.contrib.layers.batch_norm(outputs_, center=True, scale=True, is_training=True, decay=0.9, epsilon=1e-5, scope='bn')
        if activation == "relu":
            outputs = tf.nn.relu(outputs_)
        elif activation == "tanh":
            outputs = tf.tanh(outputs_)
        elif activation == "sigmoid":
            outputs = tf.nn.sigmoid(outputs_)
        else:
            outputs = outputs_
        return outputs
with tf.name_scope('predictions'):
    self.sim_diff = self.pos_pair_sim - self.neg_pair_sim  # shape = (batch_size, 1)
    self.preds = tf.sigmoid(self.sim_diff)  # shape = (batch_size, 1)
    self.infers = self.pos_pair_sim
Below is the loss definition. It seems all right.
with tf.name_scope('predictions'):
    sim_diff = pos_pair_sim - neg_pair_sim
    predictions = tf.sigmoid(sim_diff)
    self.infers = pos_pair_sim
## loss and optim
with tf.name_scope('loss'):
    self.loss = nn_layers.cross_entropy_loss_with_reg(self.labels, self.preds)
    tf.summary.scalar('loss', self.loss)
I am not sure whether I have used the BN layers in the right way. I mean that the BN parameters are derived from the hidden units of the two separate branches, which take the qi_pos and qi_neg tensors as inputs. Anyway, could anyone help check it?
Your code seems fine to me; there is no problem in applying BN in different branches of the network. But I'd like to mention a few notes here:
The BN hyperparameters are pretty standard, so I usually don't manually set decay, epsilon and renorm_decay. That doesn't mean you mustn't change them; it's simply unnecessary in most cases.
You're applying BN before the activation function; however, there is evidence that it may work better when applied after the activation. See, for example, this discussion. Once again, this doesn't mean it's a bug, simply one more architecture to consider.
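For illustration, a minimal sketch of that alternative ordering (BN after the activation), written against the full_connect_ helper above; this is just a variant to compare, not a required change:
def full_connect_post_bn(inputs, num_units, activation=None, use_bn=None,
                         keep_prob=1.0, name='full_connect_post_bn'):
    with tf.variable_scope(name):
        shape = [inputs.get_shape()[-1], num_units]
        weight = weight_variable(shape)
        bias = bias_variable(shape[-1])
        outputs_ = tf.matmul(inputs, weight) + bias
        # activation first ...
        if activation == "relu":
            outputs_ = tf.nn.relu(outputs_)
        elif activation == "tanh":
            outputs_ = tf.tanh(outputs_)
        elif activation == "sigmoid":
            outputs_ = tf.nn.sigmoid(outputs_)
        # ... then batch norm, with the default hyperparameters
        if use_bn:
            outputs_ = tf.contrib.layers.batch_norm(outputs_, center=True, scale=True,
                                                    is_training=True, scope='bn')
        return outputs_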

ValueError: Error when checking input: expected lstm_1_input to have shape (None, 296, 2048) but got array with shape (296, 2048, 1)

I am facing the error in the title. I have thousands of videos and each video has 37 frames. I have extracted features for each frame with a CNN model and saved them.
I have a stacked LSTM model:
batch_size = 8
features_length = 2048
seq_length = 37*batch_size
in_shape = (seq_length, features_length)
lstm_model = Sequential()
lstm_model.add(LSTM(2048, return_sequences=True, input_shape = in_shape, dropout=0.5))
lstm_model.add(Flatten())
lstm_model.add(Dense(512, activation='relu'))
lstm_model.add(Dropout(0.5))
lstm_model.add(Dense(number_of_classes, activation='softmax'))
optimizer = Adam(lr=1e-6)
lstm_model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics = metrics)
lstm_model.fit_generator(generator = generator, steps_per_epoch = train_steps_per_epoch, epochs = nb_epoch, verbose = 1, callbacks=[checkpointer, tb, early_stopper, csv_logger], validation_data=val_generator, validation_steps = val_steps_per_epoch)
I have a generator; data includes all training videos.
def generator(data):
    while 1:
        X, y = [], []
        for _ in range(batch_size):
            sequence = None
            sample = random.choice(data)
            folder_content, folder_name, class_name, video_features_loc = get_video_features(sample)
            for f in folder_content:
                image_feature_location = video_features_loc + f
                feat = get_extracted_feature(image_feature_location)
                X.append(feat)
                y.append(get_one_class_rep(class_name))
        yield np.array(X), np.array(y)
The shape of X in the generator data is (296, 2048, 1).
The shape of y in the generator data is (296, 27).
This code throws the error. I know there are a couple of similar questions, and I tried the suggestions there, but no luck. For instance, one of the suggestions was reshaping the array:
X = np.reshape(X, (X.shape[2], X.shape[0], X.shape[1]))
How could I feed my input to the LSTM?
Thanks in advance
The error message tells you everything you need.
X should be shaped as (number of samples, 296, 2048) - judging by the shape of X, it seems you have only one sample.
But since you have 37 frames per video, you should definitely change your model to something accepting (batch_size, 37, 2048) - here, the batch size seems to be 8 - and set:
seq_length=37
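A minimal sketch of what that could look like in the generator above, assuming get_extracted_feature returns one (2048,) or (2048, 1) feature vector per frame (the helper names are the ones from the question):
seq_length = 37                       # one sequence = the 37 frame features of one video
in_shape = (seq_length, features_length)

def generator(data):
    while 1:
        X, y = [], []
        for _ in range(batch_size):
            sample = random.choice(data)
            folder_content, folder_name, class_name, video_features_loc = get_video_features(sample)
            # stack the 37 per-frame feature vectors into one (37, 2048) sequence
            sequence = [np.squeeze(get_extracted_feature(video_features_loc + f))
                        for f in folder_content]
            X.append(np.array(sequence))
            y.append(get_one_class_rep(class_name))
        # X: (batch_size, 37, 2048), y: (batch_size, number_of_classes)
        yield np.array(X), np.array(y)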

GAN Converges in Just a Few Epochs

I implemented a generative adversarial network in Keras. My training data size is about 16,000, where each image is of 32*32 size. All of my training images are resized versions of images from the ImageNet dataset for the object detection task. I fed the image matrices directly into the network without doing a center crop. I used the Adam optimizer with a learning rate of 1e-4 and beta1 of 0.5, and I also set the dropout rate to 0.1. I first trained the discriminator on 3000 real images and 3000 fake images, and it achieved a 93% accuracy. Then I trained for 500 epochs with a batch size of 32. However, my model seemed to converge in only a few epochs (<10), and the images it generated were ugly.
Plot of the Loss Function
Random Samples Generated by the Generator
I was wondering whether my training dataset is too small (compared to those in the DCGAN paper, which are more than 300,000 images) or my model configuration is not correct. Also, should I train D with SGD for k steps (where k is small, perhaps 1) and then train G with SGD for one step, as suggested by Ian Goodfellow in the original paper? (I have just been training them one at a time.)
Below is the configuration of the generator.
g_input = Input(shape=[100])
H = Dense(1024*4*4, init='glorot_normal')(g_input)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = Reshape( [4, 4,1024] )(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(512, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(256, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(3, 3, 3, border_mode='same', init='glorot_uniform')(H)
g_V = Activation('tanh')(H)
generator = Model(g_input,g_V)
generator.compile(loss='binary_crossentropy', optimizer=opt)
generator.summary()
Below is the configuration of the discriminator:
d_input = Input(shape=shp)
H = Convolution2D(64, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(d_input)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Convolution2D(128, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(H)
H = BatchNormalization(mode=2)(H)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Flatten()(H)
H = Dense(256, init='glorot_normal')(H)
H = LeakyReLU(0.2)(H)
d_V = Dense(2,activation='softmax')(H)
discriminator = Model(d_input,d_V)
discriminator.compile(loss='categorical_crossentropy', optimizer=dopt)
discriminator.summary()
Below is the configuration of GAN as a whole:
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
I think the problem is with the loss function. Try:
loss='categorical_crossentropy'
I suspect that your discriminator is trainable while you are training the GAN. You can verify this by using discriminator.layers[-1].get_weights() to see whether its parameters change during the GAN training process.
You should freeze the discriminator before you assemble it into the GAN:
discriminator.trainable = False
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
see this discussion:
https://github.com/fchollet/keras/issues/4674
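For the alternating schedule asked about above (k discriminator steps, then one generator step), a minimal sketch in the same Keras style, assuming the generator, discriminator and GAN models defined above, with the discriminator compiled as trainable on its own and frozen inside GAN at compile time; the two-column labels match the question's softmax discriminator:
import numpy as np

def train_gan_batch(X_real, batch_size=32, z_dim=100, k=1):
    for _ in range(k):
        # 1) update the discriminator on a real batch plus a fake batch
        noise = np.random.uniform(-1, 1, size=(batch_size, z_dim))
        X_fake = generator.predict(noise)
        X = np.concatenate([X_real, X_fake])
        y = np.zeros((2 * batch_size, 2))
        y[:batch_size, 1] = 1    # real images -> class 1
        y[batch_size:, 0] = 1    # generated images -> class 0
        d_loss = discriminator.train_on_batch(X, y)
    # 2) update the generator through the GAN; the discriminator inside GAN is
    #    frozen, so only the generator's weights change here
    noise = np.random.uniform(-1, 1, size=(batch_size, z_dim))
    y_gen = np.zeros((batch_size, 2))
    y_gen[:, 1] = 1              # the generator wants fakes to be labelled "real"
    g_loss = GAN.train_on_batch(noise, y_gen)
    return d_loss, g_loss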
