I wrote a small linear autoencoder for the ORL 32x32 face dataset (layer sizes 1024 - 961 - 900 - 961 - 1024).
self.encoder1 = torch.nn.Sequential(
    torch.nn.Linear(inout_size, 961),
    #torch.nn.BatchNorm1d(512),
    torch.nn.ReLU(),
)
self.encoder2 = torch.nn.Sequential(
    torch.nn.Linear(961, 900),
    #torch.nn.BatchNorm1d(256),
    torch.nn.ReLU(),
)
self.decoder = torch.nn.Sequential(
    torch.nn.Linear(900, 961),
    torch.nn.ReLU(),
    torch.nn.Linear(961, inout_size),
)
The autoencoder learns the contours and outlines of the images quite well after only about 50 epochs.
But I was curious what the outputs of the first and second layers of the autoencoder look like, to see what it actually uses to reconstruct the image. I got rather strange results, and I would like clarification on whether they are expected or not, and if not, what the expected outputs would be.
original is the input image, la1 is the output after the first layer, and la2 the output after the second layer.
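For reference, here is how I pull out the intermediate outputs (a minimal sketch; model and image are placeholders for my network and one input image, and I view the 961- and 900-dimensional outputs as 31x31 and 30x30 images for display):

import torch

model.eval()
with torch.no_grad():
    x = image.view(1, -1)      # flatten the 32x32 image to (1, 1024)
    la1 = model.encoder1(x)    # (1, 961)
    la2 = model.encoder2(la1)  # (1, 900)

la1_img = la1.view(31, 31)     # 961 = 31*31
la2_img = la2.view(30, 30)     # 900 = 30*30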
My model:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout

classifier = Sequential()
# Convolutional + MaxPooling -> 1
classifier.add(Conv2D(32, (3, 3), input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3)))
convout1 = Activation('relu')
classifier.add(convout1)
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Dropout(0.25))
I am running the following code to get the weights:
classifier.layers[0].get_weights()[0]
It returns an array of 3x3x3x32. Shouldn't it return 32 matrices of 3x3?
The weights shape is correct, because the convolutional filter is applied to the whole 3D input volume and the parameters for different channels are not shared (though they are shared spatially). See the illustration in the CS231n course notes.
Yes, the output volume is obtained by summing up the convolutions across the input channels, but the parameters in each channel are different.
In your case, the channels are RGB (since input_shape = (IMAGE_SIZE, IMAGE_SIZE, 3)), the spatial filter size is 3x3, and there are 32 filters. Hence the resulting shape is 3x3x3x32, and the shape of each filter is 3x3x3.
No, the return value has the right shape. What you are not considering is that each of the 32 filters is 3x3 in its spatial dimensions and has three channels, the same as the input. This means each filter also operates over the channel dimension. What you expect would only be valid for a 2D convolution on a single-channel image.
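You can verify this directly (a quick sketch against the model above):

weights = classifier.layers[0].get_weights()[0]
print(weights.shape)              # (3, 3, 3, 32): 32 filters, each 3x3 spatially with 3 channels
print(weights[:, :, :, 0].shape)  # (3, 3, 3): a single filter spans all input channels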
I am adapting this implementation of a VAE, https://github.com/keras-team/keras/blob/master/examples/variational_autoencoder.py, which I found at https://blog.keras.io/building-autoencoders-in-keras.html.
This implementation does not use convolutional layers, so everything happens in 1D, so to speak. My goal is to implement 3D convolutional layers within this model.
However, I run into a shape mismatch at the loss function when running the batches (which have 128 samples each):
def vae_loss(self, x, x_decoded_mean):
    xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    # xent_loss.shape >> [128, 40, 20, 40, 1]
    kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    # kl_loss.shape >> [128]
    return K.mean(xent_loss + kl_loss)  # >> error: shape mismatch
Almost the same question is already answered here: Keras - Variational Autoencoder Incompatible shape for a model with 1D convolutional layers. But I can't really see how to extrapolate that answer to my case, which has a more complex input shape.
I have tried this solution:
xent_loss = original_dim * metrics.binary_crossentropy(K.flatten(x), K.flatten(x_decoded_mean))
But I don't know whether this is a valid solution from a mathematical point of view, although the model now runs.
Your approach is right, but it is highly dependent on the K.binary_crossentropy implementation. The TensorFlow and Theano ones should work for you (as far as I know). To make it cleaner and not implementation-dependent, I suggest the following:
xent_loss_vec = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
xent_loss = K.mean(xent_loss_vec, axis=[1, 2, 3, 4])
# xent_loss.shape = (128,)
Now you are taking the mean of the losses over every voxel, so any valid implementation of binary_crossentropy should work fine for you.
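Putting it together, the full loss would then look like this (a sketch built from the snippet above; z_mean and z_log_var are assumed to be in scope, as in the original Keras example):

def vae_loss(self, x, x_decoded_mean):
    # per-voxel cross-entropy, averaged per sample -> shape (128,)
    xent_loss_vec = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    xent_loss = K.mean(xent_loss_vec, axis=[1, 2, 3, 4])
    # KL divergence per sample -> shape (128,)
    kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return K.mean(xent_loss + kl_loss)  # both terms now match in shape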
I ran into problems when trying to compile a network with one recurrent layer. There seems to be some issue with the dimensionality of the first layer, and thus with my understanding of how RNN layers work in Keras.
My code sample is:
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN

model = Sequential()
model.add(Dense(8,
                input_dim=2,
                activation="tanh",
                use_bias=False))
model.add(SimpleRNN(2,
                    activation="tanh",
                    use_bias=False))
model.add(Dense(1,
                activation="tanh",
                use_bias=False))
The error is
ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2
This error is returned regardless of the input_dim value. What am I missing?
That message means: the input going into the RNN has 2 dimensions, but an RNN layer expects 3 dimensions.
For an RNN layer, you need inputs shaped like (BatchSize, TimeSteps, FeaturesPerStep). These are the 3 dimensions expected.
A Dense layer (in Keras 2) can work with either 2 or 3 dimensions. We can see that you're working with 2 dimensions because you passed input_dim instead of input_shape=(Steps, Features).
There are many possible ways to solve this, but the most meaningful and logical would be a case where your input data is a sequence with time steps.
Solution 1 - Your training data is a sequence:
If your training data is a sequence, you shape it like (NumberOfSamples, TimeSteps, Features) and pass it to your model. Make sure you use input_shape=(TimeSteps,Features) in the first layer instead of using input_dim.
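For instance (a minimal sketch with made-up sizes: num_samples samples, 5 time steps, 2 features per step):

# reshape the training data into sequences: (NumberOfSamples, TimeSteps, Features)
x_train = x_train.reshape((num_samples, 5, 2))

# the first layer declares the sequence shape instead of input_dim
model.add(Dense(8, input_shape=(5, 2), activation="tanh", use_bias=False))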
Solution 2 - You reshape the output of the first dense layer so it has the additional dimension:
model.add(Reshape((TimeSteps,Features)))
Make sure that the product TimeSteps*Features is equal to 8, the output of your first dense layer.
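A minimal sketch of Solution 2, assuming TimeSteps=4 and Features=2 (so that 4*2 matches the Dense output of 8):

from keras.models import Sequential
from keras.layers import Dense, Reshape, SimpleRNN

model = Sequential()
model.add(Dense(8, input_dim=2, activation="tanh", use_bias=False))
model.add(Reshape((4, 2)))  # (batch, 8) -> (batch, 4 steps, 2 features)
model.add(SimpleRNN(2, activation="tanh", use_bias=False))
model.add(Dense(1, activation="tanh", use_bias=False))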
I recently built a model for POS tagging. I tried an LSTM model and it works well, but I still want to add a CNN layer that rebuilds each original word's vector. The main problem is the variable length of the sequences: in an RNN this can be handled by a Masking layer, but masking is not supported by the CNN. I still zero-pad each sequence to MAXLEN and use it as the input of the CNN, because the outputs for the extra (padded) positions are still mostly zero and can be handled by the masking layer.
But the results seem very bad: (loss, accuracy) of (0.342, 0.298) compared with the plain LSTM's (0.478, 0.871). What is the main reason for this? How can I solve the variable-length problem?
input_seq = Input(shape=(None, input_dim))
# conv + ReLU
conv_out = Conv1D(filters=200,
                  kernel_size=3,
                  padding='same',
                  activation='relu',
                  use_bias=True)(input_seq)
# zero-pad 2 at the head
pad_out = ZeroPadding1D(padding=(2, 0))(conv_out)
# max pooling
pool_out = MaxPool1D(pool_size=3, strides=1, padding='valid')(pad_out)
# masking
mask_out = Masking(mask_value=0.0)(pool_out)
# LSTM
lstm_out = LSTM(units=hidden_unit, return_sequences=True)(mask_out)
# dropout
drop_out = Dropout(drop_out_rate)(lstm_out)
# per-timestep softmax
output_seq = TimeDistributed(Dense(output_dim, activation="softmax"))(drop_out)
# compile
model = Model(inputs=input_seq, outputs=output_seq)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
The padded sequences have shapes x: (Samples, MAXLEN, 200) and y: (Samples, MAXLEN, 42); I zero-pad each sequence of x and y.
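For reference, a sketch of the zero-padding step (assuming sequences is a list of per-sentence arrays of shape (length, 200); pad_sequences is the standard Keras helper):

from keras.preprocessing.sequence import pad_sequences

# pad every sequence of 200-dim word vectors with zero rows up to MAXLEN
x = pad_sequences(sequences, maxlen=MAXLEN, dtype='float32',
                  padding='post', value=0.0)  # -> (Samples, MAXLEN, 200)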
I would like to code, with Keras, a neural network that acts both as an autoencoder AND as a classifier for semi-supervised learning. Take for example this dataset, where there are a few labeled images and a lot of unlabeled images: https://cs.stanford.edu/~acoates/stl10/
Some papers listed here achieved that, or very similar things, successfully.
To sum up: the model would have the same input data shape and the same "encoding" convolutional layers, but would split into two heads (fork-style): a classification head and a decoding head, so that the unsupervised autoencoder contributes to good feature learning for the classification head.
With TensorFlow there would be no problem doing that as we have full control over the computational graph.
But Keras is more high-level, and I feel that every call to ".fit" must always provide all the data at once (which would force me to tie the classification head and the autoencoding head together into one training step).
One way to almost do that in Keras would be something like this:
input = Input(shape=(32, 32, 3))
cnn_feature_map = sequential_cnn_trunk(input)
classification_predictions = Dense(10, activation='sigmoid')(cnn_feature_map)
autoencoded_predictions = decode_cnn_head_sequential(cnn_feature_map)

model = Model(inputs=[input], outputs=[classification_predictions, autoencoded_predictions])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit([images], [labels, images], epochs=10)
However, I think and fear that if I just try to fit the heads alternately like this, it will fail and ask for the missing head's targets:
for epoch in range(10):
    # classification step
    model.fit([images], [labels, None], epochs=1)
    # "semi-unsupervised" autoencoding step
    model.fit([images], [None, images], epochs=1)
    # note: ".train_on_batch" could probably be used rather than ".fit" to avoid doing a whole epoch each time.
How should one implement that behavior with Keras? And could the training be done jointly without having to split the two calls to the ".fit" function?
Sometimes, when you don't have a label, you can pass a zero vector instead of a one-hot encoded vector. It should not change your result, because a zero vector produces no error signal with the categorical cross-entropy loss.
My custom to_categorical function looks like this:
import numpy as np

def tricky_to_categorical(y, translator_dict):
    encoded = np.zeros((y.shape[0], len(translator_dict)))
    for i in range(y.shape[0]):
        if y[i] in translator_dict:
            encoded[i][translator_dict[y[i]]] = 1
    return encoded
Here y contains the labels, and translator_dict is a Python dictionary which maps each label to a unique index, like this:
{'unisex':2, 'female': 1, 'male': 0}
If an unknown (UNK) label can't be found in this dictionary, its encoded label will be a zero vector.
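For example (hypothetical labels, using the dictionary above):

y = np.array(['male', 'unisex', 'UNK'])
tricky_to_categorical(y, {'unisex': 2, 'female': 1, 'male': 0})
# array([[1., 0., 0.],
#        [0., 0., 1.],
#        [0., 0., 0.]])  # 'UNK' becomes a zero vector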
If you use this trick, you also have to modify your accuracy function to see the real accuracy numbers: you have to filter all zero vectors out of the metrics.
import tensorflow as tf
from keras import backend as K

def tricky_accuracy(y_true, y_pred):
    mask = K.not_equal(K.sum(y_true, axis=-1), K.constant(0))  # mask for non-zero label vectors
    y_true = tf.boolean_mask(y_true, mask)
    y_pred = tf.boolean_mask(y_pred, mask)
    return K.cast(K.equal(K.argmax(y_true, axis=-1), K.argmax(y_pred, axis=-1)), K.floatx())
Note: you have to use larger batches (e.g. 32) in order to prevent all-zero batch updates, because they can make your accuracy metrics go crazy (I don't know why).
Alternative solution
Use Pseudo Labeling :)
You can train jointly; you just have to pass an array of targets instead of a single label.
I used fit_generator, e.g.
model.fit_generator(
    batch_generator(),
    steps_per_epoch=len(dataset) // batch_size,
    epochs=epochs)
def batch_generator():
    batch_x = np.empty((batch_size, img_height, img_width, 3))
    gender_label_batch = np.empty((batch_size, len(gender_dict)))
    category_label_batch = np.empty((batch_size, len(category_dict)))
    while True:
        i = 0
        for idx in np.random.choice(len(dataset), batch_size):
            image_id = dataset[idx][0]
            batch_x[i] = load_and_convert_image(image_id)
            gender_label_batch[i] = gender_labels[idx]
            category_label_batch[i] = category_labels[idx]
            i += 1
        yield batch_x, [gender_label_batch, category_label_batch]
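For completeness, the matching compile call pairs one loss per output head, in the same order as the generator's yield (a sketch; the two-head model definition is assumed):

model.compile(optimizer='rmsprop',
              loss=['categorical_crossentropy', 'categorical_crossentropy'])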