How to get the input and output channels in a CNN? - image-processing

I am specifically looking at the AlexNet architecture found here:
https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py
I am confused as to how they are getting the input and output channels. Based on my reading of AlexNet, I can't figure out where the output channel count of 64 comes from (the second argument to the Conv2d call). Even if the 256 is split across 2 GPUs, that should give 128 rather than 64. My assumption is that the initial input channel count of 3 represents the color channels. However, the other input and output channels don't make sense to me either.
Could anyone clarify what the input and output channels are?
class AlexNet(nn.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # why 64?
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )

The 3 is the number of input channels (R, G, B). The 64 is the number of channels (i.e. feature maps) in the output of the first convolution operation. So the first conv layer takes a color (RGB) image as input, applies an 11x11 kernel with stride 4, and outputs 64 feature maps.
I agree that this is different from the number of channels (96 in total, 48 on each GPU) in the architecture diagram of the original AlexNet implementation.
However, PyTorch does not implement the original AlexNet architecture. Rather, it implements a variant of AlexNet described in the paper: One weird trick for parallelizing convolutional neural networks.
Also, see cs231n - convolutional networks for more details about how the input, filters, stride, and padding determine the output of a conv operation.
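As a quick sanity check, here is a minimal sketch (assuming PyTorch is installed and a standard 224x224 input, which is an assumption, not part of the snippet above) that applies the first conv layer to a dummy RGB image and prints the resulting shape, together with the usual output-size formula from the cs231n notes:
import torch
import torch.nn as nn

# First conv layer from torchvision's AlexNet: 3 input channels -> 64 output channels
conv1 = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)

x = torch.randn(1, 3, 224, 224)  # a dummy batch with one RGB image
y = conv1(x)
print(y.shape)  # torch.Size([1, 64, 55, 55])

# Spatial size: floor((W + 2*P - K) / S) + 1 = floor((224 + 4 - 11) / 4) + 1 = 55
# The channel count (64) is simply the number of filters the layer learns.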
P.S: See pytorch/vision/issues/185

Related

Clueless as to how to proceed and improve my chess neural network

I am starting to make a neural network that can learn chess. Currently, my training data is roughly 50 million lines long and stored in a CSV file, where each line contains a FEN and an outcome. I've made a model and a small function so far. It can play, but not very well.
import numpy as np
import tensorflow as tf

def create_model() -> tf.keras.Model:
    """Create and return a TensorFlow model for evaluating chess positions.

    Returns:
        A TensorFlow model.
    """
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2', input_shape=(8, 8, 12)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPool2D(pool_size=2))
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPool2D(pool_size=2))
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPool2D(pool_size=2))
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(units=1024, activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Dense(units=512, activation='relu', kernel_regularizer='l2'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Dropout(rate=0.2))
    model.add(tf.keras.layers.Dense(units=2, activation='softmax', kernel_regularizer='l2'))
    optimiser = tf.keras.optimizers.Adam()
    model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

def train() -> None:
    """
    Train the TensorFlow model using the data in the `sample_fen.csv` file.
    The model is saved to the file `weights.h5` after training.
    """
    # This is only needed for training
    import pandas as pd
    training_data = pd.read_csv(r'neural_net\Players\mtcs_engine\sample_fen.csv', chunksize=100000)
    model = create_model()
    try:
        model.load_weights(r"neural_net\Players\mtcs_engine\weights.h5")
        print("Weights file found. Loading weights.")
    except FileNotFoundError:
        print("No weights file found. Training from scratch.")
    try:
        for cycle, chunk in enumerate(training_data):
            games = chunk.values.tolist()
            if cycle <= 11:
                continue
            # Preprocess the data
            positions = []
            outcomes = []
            for game in games:
                position = fen_to_tensor(game[0])  # fen_to_tensor is defined in the full file linked below
                outcome = game[1]
                if outcome == "w":
                    one_hot_outcome = [1, 0]
                elif outcome == "b":
                    one_hot_outcome = [0, 1]
                else:
                    one_hot_outcome = [0, 0]
                outcomes.append(one_hot_outcome)
                positions.append(position)
            positions = np.array(positions)
            outcomes = np.array(outcomes)
            model.fit(positions, outcomes, epochs=150, batch_size=64)
            print(f"Finished training cycle {cycle}")
    except KeyboardInterrupt:
        pass
    model.save_weights(r"neural_net\Players\mtcs_engine\weights.h5")
    print()
    print("Saved weights to disk")
However, after training for around a day, its accuracy has only increased from 0.5000 to 0.5100, with a loss in the hundreds of thousands. To be honest, I'm not really sure what I'm doing at all. Does anyone have any pointers, be it with the model or anything else? The full code can be found at https://github.com/Iridum-png/warden-chess/blob/master/neural_net/Players/mtcs_engine/mtcs_engine.py
I think you should think more about what it is you're trying to accomplish. Working in AI is difficult, but far from impossible for beginners; the important thing is to start small. That is to say, answer these questions in order:
What is the model's goal?
How can it best achieve it?
What do I need to do to handle the model's inputs and outputs to realistically make these predictions?
It looks like all your model is trying to do is determine if a game is a win for white or for black. Is that what you want it to do?
You also need to do quite a bit of research into what each layer in your network is for. In particular, this stretch is highly problematic and will lead to a very confused output because of your use of dropout (dropout is good for preventing overfitting, but your model seems to have the opposite problem):
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=0.2))
model.add(tf.keras.layers.Dense(units=512, activation='relu', kernel_regularizer='l2'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=0.2))
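For illustration only, a minimal sketch of that dense head with the repeated BatchNormalization/Dropout pairs stripped back (layer sizes are kept from the original; this is one possible simplification, not a tuned recommendation):
import tensorflow as tf

# Hypothetical, simplified head: keep at most one modest dropout layer so the
# signal is not washed out by stacked regularisation.
head = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.2),  # a single, modest dropout
    tf.keras.layers.Dense(2, activation='softmax'),
])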
If you want this model to PLAY chess, I would suggest having your network do the following (a minimal sketch of this idea follows after the list):
Take as input the 8x8 chess board, but also provide it with a candidate move. This way the model can output a probability for that move instead of trying to guess what to play (which is much more difficult). You can also provide the color to move as additional input.
Research Monte Carlo tree search and how AlphaZero implemented it.
All in all, making a move prediction from only a board, while possible, is extremely difficult and would require 2000+ epochs to have any hope of working, and even then the model would be woefully inept at looking ahead and wouldn't get much above 600 Elo if I had to guess.
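A minimal sketch of the board-plus-move idea using the Keras functional API. The 12-plane board encoding is carried over from the question; the move encoding (one-hot from-square plus to-square) and all layer sizes are assumptions for illustration:
import tensorflow as tf

# Hypothetical two-input model: a board tensor plus an encoded candidate move,
# producing the probability that the move is good for the side to play.
board_in = tf.keras.Input(shape=(8, 8, 12), name="board")  # 12 piece planes, as in the question
move_in = tf.keras.Input(shape=(128,), name="move")        # e.g. one-hot from-square + to-square

x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(board_in)
x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Concatenate()([x, move_in])
x = tf.keras.layers.Dense(256, activation="relu")(x)
move_score = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # probability of the move

model = tf.keras.Model(inputs=[board_in, move_in], outputs=move_score)
model.compile(optimizer="adam", loss="binary_crossentropy")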

Why use Conv2D(64) twice instead of a single Conv2D(128)? Are they the same or different?

model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=(X_train.shape[1:])))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))
In the above code, can I use Conv2D(128) once instead of Conv2D(64) twice?
No, you cannot, because the two configurations do not compute the same function. This pattern was introduced in the VGG network paper and is used to increase the representational power of the network. Two layers with 3x3 filters are roughly equivalent to one layer with a 5x5 filter (through composition); it is not equivalent to adding up the number of filters.
In particular, a single convolutional layer with 128 filters is not the same as two convolutional layers with 64 filters each, especially considering that there is a ReLU activation in between them, which makes the mapping more non-linear.
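To make the difference concrete, here is a small sketch (assuming TensorFlow/Keras; the 32x32 spatial size and 64-channel input are arbitrary choices for illustration) comparing the two options:
import tensorflow as tf

inp = tf.keras.Input(shape=(32, 32, 64))  # arbitrary example: 32x32 feature map with 64 channels

# Option A: two stacked 3x3 convolutions with 64 filters each (ReLU in between)
x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same')(inp)
x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
two_convs = tf.keras.Model(inp, x)

# Option B: a single 3x3 convolution with 128 filters
y = tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same')(inp)
one_conv = tf.keras.Model(inp, y)

print(two_convs.count_params(), two_convs.output_shape)  # 73856, (None, 32, 32, 64)
print(one_conv.count_params(), one_conv.output_shape)    # 73856, (None, 32, 32, 128)
With this particular channel count the two options even happen to have the same number of parameters, yet they compute different functions: the stacked pair has a 5x5 effective receptive field with an extra non-linearity in between, while the single layer only sees 3x3 but produces twice as many output channels.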

Intuition behind Stacking Multiple Conv2D Layers before Dropout in CNN

Background:
Tagging TensorFlow since Keras runs on top of it and this is more of a general deep learning question.
I have been working on the Kaggle Digit Recognizer problem and used Keras to train CNN models for the task. The model below has the original CNN structure I used for this competition, and it performed okay.
def build_model1():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), padding="Same", activation="relu", input_shape=[28, 28, 1]))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="Same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="Same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Then I read some other notebooks on Kaggle and borrowed another CNN structure (copied below), which works much better than the one above: it achieved higher accuracy and a lower error rate, and it took many more epochs before overfitting the training data.
def build_model2():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (5, 5), padding='Same', activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.Conv2D(32, (5, 5), padding='Same', activation='relu'))
    model.add(layers.MaxPool2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding='Same', activation='relu'))
    model.add(layers.Conv2D(64, (3, 3), padding='Same', activation='relu'))
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Question:
Is there any intuition or explanation behind the better performance of the second CNN structure? What is it that makes stacking 2 Conv2D layers better than just using 1 Conv2D layer before max pooling and dropout? Or is there something else that contributes to the result of the second model?
Thank y'all for your time and help.
The main difference between these two approaches is that the latter (2 convs) has more flexibility in expressing non-linear transformations without losing information. Max pooling removes information from the signal, and dropout forces a distributed representation, so both effectively make it harder to propagate information. If, for a given problem, a highly non-linear transformation has to be applied to the raw data, stacking multiple convs (with ReLU) makes it easier to learn. Also note that you are comparing a model with 3 max poolings to a model with only 2, so the second one will potentially lose less information. Another difference is that the second model has a much bigger fully connected part at the end, while the first one is tiny (64 neurons with 0.5 dropout means that you effectively have at most around 32 neurons active, which is a tiny layer!). To sum up:
These architectures differ in many aspects, not just in stacking conv layers.
Stacking conv layers usually leads to less information being lost in processing; see for example "all convolutional" architectures.
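For reference, the "all convolutional" idea mentioned above replaces pooling layers with strided convolutions, roughly like the sketch below (hypothetical layer sizes; Keras assumed, mirroring the style of the question's code):
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of an "all convolutional" block: downsampling is done by a strided
# convolution (learned) instead of a MaxPooling layer (fixed, discards information).
model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same', activation='relu'),  # replaces MaxPool2D((2, 2))
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])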

input_shape 2D Convolutional layer in keras

In the Keras documentation for Convolution2D, the input_shape for 128x128 RGB pictures is given as input_shape=(3, 128, 128), so I figured the first component should be the number of planes (or feature maps).
If I run the following code:
model = Sequential()
model.add(Convolution2D(4, 5,5, border_mode='same', input_shape=(3, 19, 19), activation='relu'))
print(model.output_shape)
I get an output_shape of (None, 3, 19, 4), whereas in my understanding this should be (None, 4, 19, 19), with 4 being the number of filters.
Is this an error in the example from the keras documentation or am I missing something?
(I am trying to recreate a part of AlphaGo, so the 19x19 is the board size, which would correspond to the image size.)
You are using the Theano dimension ordering (channels, rows, cols) as input, but your Keras seems to use the TensorFlow one, which is (rows, cols, channels).
So either you can switch to the Theano dimension ordering, directly in your code with:
import keras.backend as K
K.set_image_dim_ordering('th')
Or edit the keras.json file (usually in ~/.keras) and switch
"image_dim_ordering": "tf" to "image_dim_ordering": "th"
Or you can keep the TensorFlow dimension ordering and switch your input_shape to (19, 19, 3).
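Note that dim_ordering and set_image_dim_ordering are the old Keras 1 names; in current tf.keras the setting is called image_data_format / data_format. A minimal sketch of the third option (keep channels-last ordering) in modern tf.keras, just to check the shapes:
import tensorflow as tf

print(tf.keras.backend.image_data_format())  # usually 'channels_last' -> (rows, cols, channels)

# Channels-last input: a 19x19 "board image" with 3 planes, 4 filters of size 5x5
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, (5, 5), padding='same', activation='relu', input_shape=(19, 19, 3)),
])
print(model.output_shape)  # (None, 19, 19, 4)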
Yes, it should be (None, 4, 19, 19). There is something called dim_ordering in Keras that decides at which index the number of input channels is placed. Check the documentation of the "dim_ordering" parameter. Mine is set to 'tf'.
So, just change the input shape to (19, 19, 3) like so:
model.add(Convolution2D(4, 5, 5, border_mode='same', input_shape=(19, 19, 3), activation='relu'))
Then check the output shape.
You can also modify dim_ordering in the file (usually at ~/.keras/keras.json) to your liking.

Tensorflow MNIST using convolution parameters

I don't understand why the official documentation uses a bias_variable of size 32. As far as I know, the number of biases equals the number of neurons in the layer, and in this case the number of neurons in the first layer should be 28, because the image is 28 pixels wide and padding = "SAME" is used. Why is it 32 and not 28?
Remember that the MNIST example uses convolutional networks, not conventional fully connected networks, so you are dealing with convolutions, not individual neurons. In convolutions you commonly use one bias per output channel, and this example uses 32 output channels in the first convolution layer, which gives you 32 biases.
They use a bias of size 32 to be compatible with the weights:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
They use the weights in the conv2d function: tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME').
The documentation of tf.nn.conv2d() says that the second parameter represents the filter and has shape [filter_height, filter_width, in_channels, out_channels]. So [5, 5, 1, 32] means that in_channels is 1: you have a greyscale image, so no surprises here.
The 32 means that during the learning phase, the network will try to learn 32 different kernels which will be used during prediction. You can change this number to any other number, as it is a hyperparameter that you can tune.
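A minimal sketch (using zero-filled tensors in TF 2 eager mode, just to check shapes) showing how the [5, 5, 1, 32] filter and the 32 biases line up:
import tensorflow as tf

x = tf.zeros([1, 28, 28, 1])   # one greyscale 28x28 image: in_channels = 1
W = tf.zeros([5, 5, 1, 32])    # [filter_height, filter_width, in_channels, out_channels]
b = tf.zeros([32])             # one bias per output channel

y = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b
print(y.shape)                 # (1, 28, 28, 32): 32 feature maps, hence 32 biases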
