Is there any toy example of building convolutional autoencoders using MxNet? - machine-learning

I'm looking for an implementation of a convolutional autoencoder in MXNet. The only autoencoder example I can find is based on fully connected networks, which is here. There is also a GitHub issue asking a similar question, but it has received very few responses. Is there a toy example of a convolutional autoencoder implemented in MXNet?

Here is an example of a convolutional autoencoder model in MXNet Gluon (code quoted from here). The model can be trained in the standard Gluon way.
from mxnet import gluon as g
class CNNAutoencoder(g.nn.HybridBlock):
    def __init__(self):
        super(CNNAutoencoder, self).__init__()
        with self.name_scope():
            self.encoder = g.nn.HybridSequential('encoder_')
            with self.encoder.name_scope():
                self.encoder.add(g.nn.Conv2D(16, 3, strides=3, padding=1, activation='relu'))
                self.encoder.add(g.nn.MaxPool2D(2, 2))
                self.encoder.add(g.nn.Conv2D(8, 3, strides=2, padding=1, activation='relu'))
                self.encoder.add(g.nn.MaxPool2D(2, 1))
            self.decoder = g.nn.HybridSequential('decoder_')
            with self.decoder.name_scope():
                self.decoder.add(g.nn.Conv2DTranspose(16, 3, strides=2, activation='relu'))
                self.decoder.add(g.nn.Conv2DTranspose(8, 5, strides=3, padding=1, activation='relu'))
                self.decoder.add(g.nn.Conv2DTranspose(1, 2, strides=2, padding=1, activation='tanh'))
    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
model = CNNAutoencoder()
model.hybridize()
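To sanity-check that the decoder mirrors the encoder, the spatial sizes can be traced by hand. The sketch below assumes a 28x28 input (an MNIST-like size that the answer does not state) and uses the standard output-size formulas for convolution, pooling, and transposed convolution. `conv_out` and `deconv_out` are hypothetical helper names, not MXNet functions.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


def deconv_out(n, kernel, stride=1, padding=0):
    """Output size of a transposed convolution without output padding."""
    return (n - 1) * stride - 2 * padding + kernel


size = 28  # assumed input height/width, not given in the answer
trace = [size]

# Encoder: Conv2D(3, s=3, p=1), MaxPool2D(2, 2), Conv2D(3, s=2, p=1), MaxPool2D(2, 1)
for kernel, stride, padding in [(3, 3, 1), (2, 2, 0), (3, 2, 1), (2, 1, 0)]:
    size = conv_out(size, kernel, stride, padding)
    trace.append(size)

# Decoder: Conv2DTranspose(3, s=2), Conv2DTranspose(5, s=3, p=1), Conv2DTranspose(2, s=2, p=1)
for kernel, stride, padding in [(3, 2, 0), (5, 3, 1), (2, 2, 1)]:
    size = deconv_out(size, kernel, stride, padding)
    trace.append(size)

print(trace)  # [28, 10, 5, 3, 2, 5, 15, 28]
```

Under that assumption the reconstruction comes back at 28x28, matching the input. Other input sizes may not round-trip exactly, so it is worth repeating the trace for your own data.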

There is still no convolutional autoencoder example in MXNet, though there has been some research progress in that area. There is a ticket for it on the MXNet GitHub, but it is still open. You are more than welcome to contribute, for example by migrating the code from Keras.

Related

Clueless as to how to proceed and improve my chess neural network

I am starting to make a neural network that can learn chess. Currently, my training data is roughly 50 million lines long and stored in a CSV file, where each line contains a FEN string and an outcome. I've made a model and a small training function so far. It can play, but not very well.
def create_model() -> tf.keras.Model:
    """Create and return a TensorFlow model for evaluating chess positions.
    Returns:
        A TensorFlow model.
    """
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2', input_shape=(8, 8, 12)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPool2D(pool_size=2))
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPool2D(pool_size=2))
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPool2D(pool_size=2))
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(units=1024, activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Dense(units=512, activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Dense(units=2, activation='softmax', kernel_regularizer='l2'))
    optimiser = tf.keras.optimizers.Adam()
    model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
def train() -> None:
    """
    Train the TensorFlow model using the data in the `sample_fen.csv` file. The model is saved to the file `weights.h5` after training.
    """
    # This is only needed for training
    import pandas as pd
    training_data = pd.read_csv(r'neural_net\Players\mtcs_engine\sample_fen.csv', chunksize=100000)
    model = create_model()
    try:
        model.load_weights(r"neural_net\Players\mtcs_engine\weights.h5")
        print("Weights file found. Loading weights.")
    except FileNotFoundError:
        print("No weights file found. Training from scratch.")
    try:
        for cycle, chunk in enumerate(training_data):
            games = chunk.values.tolist()
            if cycle <= 11:
                continue
            # Preprocess the data
            positions = []
            outcomes = []
            for game in games:
                position = fen_to_tensor(game[0])
                outcome = game[1]
                if outcome == "w":
                    one_hot_outcome = [1, 0]
                elif outcome == "b":
                    one_hot_outcome = [0, 1]
                else:
                    one_hot_outcome = [0, 0]
                outcomes.append(one_hot_outcome)
                positions.append(position)
            positions = np.array(positions)
            outcomes = np.array(outcomes)
            model.fit(positions, outcomes, epochs=150, batch_size=64)
            print(f"Finished training cycle {cycle}")
    except KeyboardInterrupt:
        pass
    model.save_weights(r"neural_net\Players\mtcs_engine\weights.h5")
    print()
    print("Saved weights to disk")
After training for around a day, however, its accuracy has only increased from 0.5000 to 0.5100, with a loss in the hundreds of thousands. To be honest, I'm not really sure what I'm doing at all. Does anyone have any pointers, be it with the model or anything else? Full code can be found at https://github.com/Iridum-png/warden-chess/blob/master/neural_net/Players/mtcs_engine/mtcs_engine.py
I think you should think more about what you're trying to accomplish. Working in AI is difficult, though far from impossible for beginners, and it's important to start small. That is to say, answer these questions in order:
What is the model's goal?
How can it best achieve it?
What do I need to do to handle the model's inputs and outputs to realistically make these predictions?
It looks like all your model is trying to do is determine if a game is a win for white or for black. Is that what you want it to do?
You also need to do quite a bit of research into what each layer in your network is for. In particular, this stretch is highly problematic and will lead to a very confused output because of your use of dropout (which is good for preventing overfitting, but your model seems to have the opposite problem):
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=0.2))
model.add(tf.keras.layers.Dense(units=512, activation='relu', kernel_regularizer='l2'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=0.2))
If you want this model to PLAY chess I would suggest having your network do the following:
Take as input the 8x8 chess board, but also provide it with a move. This way the model can output a probability for the move instead of trying to guess what to play (which is much more difficult). You can also provide the color to play as additional aid.
Research Monte Carlo tree search and how AlphaZero implemented it.
All in all, making a move prediction from only a board, while possible, is extremely difficult and would require 2000+ epochs to have a hope of a chance. Even then, the model would be woefully inept at looking ahead and wouldn't get much above 600 Elo, if I had to guess.
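The question's `fen_to_tensor` helper is not shown, so as a minimal sketch of how a board can become the (8, 8, 12) input the model expects, here is one possible encoding in plain Python. The function name `fen_to_planes` and the plane order in `PIECES` are assumptions for illustration, not the asker's code; only the piece-placement field of the FEN string is used.

```python
PIECES = "PNBRQKpnbrqk"  # assumed plane order: white pieces first, then black


def fen_to_planes(fen):
    """Return an 8x8x12 nested list with a 1 where a piece of that type sits."""
    placement = fen.split()[0]  # first FEN field holds the piece placement
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("expected 8 ranks in FEN placement field")
    planes = [[[0] * 12 for _ in range(8)] for _ in range(8)]
    for row, rank in enumerate(ranks):
        col = 0
        for ch in rank:
            if ch.isdigit():
                col += int(ch)  # a digit means that many empty squares
            else:
                planes[row][col][PIECES.index(ch)] = 1
                col += 1
        if col != 8:
            raise ValueError("rank does not describe 8 squares")
    return planes


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
planes = fen_to_planes(start)
piece_count = sum(sum(sum(square) for square in row) for row in planes)
print(piece_count)  # 32
```

A move or the side to play, as suggested above, would need extra inputs on top of this; the sketch covers only the board.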

How to use categorical_hinge loss in keras in order to train with an SVM in the last layer?

I want to train a CNN that uses an SVM to classify at the last layer. I understand that categorical_hinge is the best loss function for that. I have 6 classes to classify.
My model is as shown below:
model = Sequential()
model.add(Conv2D(50, 3, 3, activation = 'relu', input_shape = train_data.shape[1:]))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(50, 3, 3, activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(50, 3, 3, activation = 'relu'))
model.add(Flatten())
model.add(Dense(400, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'sigmoid'))
Is there a problem with the network, the data processing, or the loss function?
The model does not learn anything after a certain point, as shown in the image.
What should I do?
Your model has a single output neuron, so there is no way it will work with 6 classes. The output of your model should have 6 neurons. The output layer should also have no activation function, so that it produces logits that the categorical hinge can use.
Note that the categorical hinge was added recently (2-3 weeks ago), so it's quite new and probably not many people have tested it.
Use hinge loss and a linear activation in the last layer.
model.add(Dense(nb_classes, W_regularizer=l2(0.01)))
model.add(Activation('linear'))
model.compile(loss='hinge',
              optimizer='adadelta',
              metrics=['accuracy'])
For more information, visit https://github.com/keras-team/keras/issues/6090
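To make the advice about logits concrete, the categorical hinge for one sample can be worked through by hand. The plain-Python sketch below follows the usual definition, max(0, 1 + best wrong-class score - true-class score), with one-hot targets; the function name and the example scores are made up for illustration.

```python
def categorical_hinge(y_true, y_pred):
    """Categorical hinge loss for one sample with a one-hot target."""
    # Score assigned to the true class
    pos = sum(t * p for t, p in zip(y_true, y_pred))
    # Highest score among the other classes (true-class entry is zeroed out)
    neg = max((1.0 - t) * p for t, p in zip(y_true, y_pred))
    return max(0.0, neg - pos + 1.0)


target = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # class 2 of 6, one-hot

# True class wins by a margin of at least 1: no loss
confident = categorical_hinge(target, [0.5, -1.0, 2.0, 0.1, 0.0, -0.3])
# True class wins by only 0.5: loss is the missing margin
narrow = categorical_hinge(target, [1.5, -1.0, 2.0, 0.1, 0.0, -0.3])

print(confident, narrow)  # 0.0 0.5
```

With a sigmoid or softmax on the output, the scores are squeezed into [0, 1] and a margin of 1 can only just be reached, which is one reason a linear output layer is recommended here.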

Keras autoencoder negative loss and val_loss with data in range [-1 1]

I am trying to adapt the Keras autoencoder example to my data. I have the following network:
Xtrain = np.reshape(Xtrain, (len(Xtrain), 28, 28, 2))
Xtest = np.reshape(Xtest, (len(Xtest), 28, 28, 2))
input_signal = Input(shape=(28, 28, 2))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_signal)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)
# added Dense layers, is that correct?
encoded2 = Flatten()(encoded)
encoded2 = Dense(128, activation='sigmoid')(encoded2)
encoded2 = Dense(128, activation='softmax')(encoded2)
encoded3 = Reshape((4, 4, 8))(encoded2)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded3)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(2, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(inputs=input_signal, outputs=decoded)
encoder = Model(input_signal, encoded2)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(Xtrain, Xtrain, epochs=100, batch_size=128, shuffle=True, validation_data=(Xtest, Xtest))
When I run this on MNIST data, which is normalized to [0, 1], everything works fine, but with my data, which is in the range [-1, 1], I only see negative losses and 0.0000 accuracy during training. If I do data = np.abs(data), training starts and seems to go well, but taking abs() of the data makes no sense, because the network would then be trained on falsified data.
The data I'm trying to feed to the network are the IQ channels of a signal: the 1st channel holds the real part and the 2nd channel the imaginary part. Both are normalized to [-1, 1], and both often contain very small values, e.g. 5e-12. I have shaped them into a (28, 28, 2) input.
I have also added Dense layers in the middle of the autoencoder, as I wish to make predictions about classes (that are fitted automatically) once the autoencoder completes training. Did I do this correctly, or does it break the network?
There are several issues with your question, including your understanding of autoencoders and their usage. I strongly suggest at least going through the Keras blog post Building Autoencoders in Keras (if you have already gone through it, arguably you should do it again, this time more thoroughly).
A few general points, most of which are included in the above linked post:
Autoencoders are not used for classification, hence it makes no sense to ask for a metric such as accuracy. Similarly, since the fitting objective is the reconstruction of their input, categorical cross entropy is not the correct loss function to use (try binary cross entropy instead).
The very existence of the intermediate dense layers you use is puzzling, and even more puzzling is the choice of a sigmoid layer followed by a softmax one; the same holds for the sigmoid choice in your final, decoded layer. Both these activation functions are normally used for classification purposes at final layers, so again refer to point (1) above.
I strongly suggest you start with a model demonstrated in the blog post linked above, and, if necessary, incrementally modify it to fit your purpose, as I am not sure what you have built here can even qualify as an autoencoder in the first place.
You are mixing between binary ('sigmoid') and categorical ('softmax' and 'categorical_crossentropy'). Change the following:
Remove the dense layers in between and feed 'encoded' instead of 'encoded3' to the decoder
Change the autoencoder loss to 'binary_crossentropy'
Alternatively if you really want to try the dense layers in between, just use them without an activation function (None)
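On the negative loss itself: binary cross-entropy assumes targets in [0, 1], so a target of -1 can drive the per-element loss below zero. The sketch below shows this with plain Python; the function names and the example values are illustrative, and the rescaling to [0, 1] is one possible remedy offered as an assumption, not something the answers above prescribe.

```python
import math


def binary_crossentropy(target, prediction):
    """Per-element binary cross-entropy; prediction must lie in (0, 1)."""
    return -(target * math.log(prediction)
             + (1.0 - target) * math.log(1.0 - prediction))


def rescale_to_unit(x):
    """Map a value from [-1, 1] to [0, 1]."""
    return (x + 1.0) / 2.0


# A target of -1 with a small sigmoid output gives a negative loss
raw_loss = binary_crossentropy(-1.0, 0.01)
# The same sample after rescaling the target to [0, 1] gives a positive loss
scaled_loss = binary_crossentropy(rescale_to_unit(-1.0), 0.01)

print(raw_loss < 0, scaled_loss > 0)  # True True
```

Another option that keeps the data in [-1, 1] is a tanh output layer with a mean squared error loss, since a sigmoid output can never produce negative values.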

Intuition behind Stacking Multiple Conv2D Layers before Dropout in CNN

Background:
Tagging TensorFlow since Keras runs on top of it and this is more a general deep learning question.
I have been working on the Kaggle Digit Recognizer problem and used Keras to train CNN models for the task. This model below has the original CNN structure I used for this competition and it performed okay.
def build_model1():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), padding="Same", activation="relu", input_shape=[28, 28, 1]))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="Same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="Same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Then I read some other notebooks on Kaggle and borrowed another CNN structure (copied below), which works much better than the one above in that it achieved better accuracy, lower error rate, and took many more epochs before overfitting the training data.
def build_model2():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (5, 5), padding='Same', activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.Conv2D(32, (5, 5), padding='Same', activation='relu'))
    model.add(layers.MaxPool2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding='Same', activation='relu'))
    model.add(layers.Conv2D(64, (3, 3), padding='Same', activation='relu'))
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Question:
Is there any intuition or explanation behind the better performance of the second CNN structure? What is it that makes stacking 2 Conv2D layers better than just using 1 Conv2D layer before max pooling and dropout? Or is there something else that contributes to the result of the second model?
Thank y'all for your time and help.
The main difference between these two approaches is that the latter (2 conv) has more flexibility in expressing non-linear transformations without losing information. Max pooling removes information from the signal and dropout forces a distributed representation, so both effectively make it harder to propagate information. If, for a given problem, a highly non-linear transformation has to be applied to the raw data, stacking multiple convs (with ReLU) will make it easier to learn; that's it. Also note that you are comparing a model with 3 max poolings to a model with only 2, so the second one will potentially lose less information. Another thing is that it has a much bigger fully connected part at the end, while the first one's is tiny (64 neurons + 0.5 dropout means that on average only about 32 neurons are active, which is a tiny layer!). To sum up:
These architectures differ in many aspects, not just in stacking conv layers.
Stacking convnets usually leads to less information being lost in processing; see for example "all convolutional" architectures.
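The size difference described above can be checked with a little arithmetic. This sketch assumes 'same'-padded convolutions (which keep the spatial size) and 2x2 max pooling with the default 'valid' padding (which halves it, rounding down); the helper names are made up for illustration.

```python
def flattened_features(input_size, num_pools, last_filters):
    """Number of values reaching Flatten() after the conv/pool stack."""
    size = input_size
    for _ in range(num_pools):
        size //= 2  # 2x2 max pooling with 'valid' padding floors the size
    return size * size * last_filters


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


flat1 = flattened_features(28, num_pools=3, last_filters=64)  # build_model1
flat2 = flattened_features(28, num_pools=2, last_filters=64)  # build_model2

print(flat1, dense_params(flat1, 64))   # 576 36928
print(flat2, dense_params(flat2, 256))  # 3136 803072
```

So the second model passes roughly five times as many features into a first dense layer with about twenty times as many parameters, independent of the stacked convolutions.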

Keras model.fit() - which training algorithm is used?

I am using Keras on top of Theano to create a MLP which I train and use to predict time series. Independently of the structure and depth of my network I cannot figure out (Keras documentation, StackOverflow, searching the net...) which training algorithm (Backpropagation,...) Keras' model.fit() function is using.
Within Theano (used without Keras before) I could define the way the parameters are adjusted myself with
self.train_step = theano.function(inputs=[u_in, t_in, lrate], outputs=[cost, y],
on_unused_input='warn',
updates=[(p, p - lrate * g) for p, g in zip(self.parameters, self.gradients)],
allow_input_downcast=True)
Not finding any information causes a certain fear that I am missing something essential and that this may be a totally stupid question.
Can anybody help me out here? Thanks a lot in advance.
Look at the example here:
...
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
...
model.fit does not itself define the training algorithm; it trains the model you describe. Gradients are computed by backpropagation in the backend, and the optimisation algorithm that applies them is specified in model.compile,
e.g.
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
You can find out more about the available optimisers here : https://keras.io/optimizers/
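For comparison with the Theano snippet in the question, the update rule p - lrate * g is plain gradient descent, which is what an SGD-style optimizer applies at each step. The sketch below runs that rule by hand on a one-parameter least-squares problem in plain Python; the data, learning rate, and step count are made up for illustration and no Keras code is involved.

```python
# Toy data generated from y = 2 * x, so the best weight is 2.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def gradient(w):
    """Derivative of the mean squared error of w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0       # initial parameter
lrate = 0.01  # learning rate, as in the Theano update
for _ in range(200):
    g = gradient(w)
    w = w - lrate * g  # same rule as (p, p - lrate * g)

print(round(w, 6))  # 2.0
```

Optimizers such as Adadelta or Adam replace the last line of the loop with a more elaborate update, while the gradient computation stays the same.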
