Batch-wise batch normalization in TensorFlow - machine-learning

What is the correct way of performing batch-wise batch normalization in TensorFlow? (I.e. I don't want to compute a running mean and variance.) My current implementation is based on tf.nn.batch_normalization, where x is the output of a convolutional layer with shape [batch_size, width, height, num_channels]. I want to perform batch norm channel-wise.
batch_mean, batch_var = tf.nn.moments(x, axes=[0, 1, 2])
x = tf.nn.batch_normalization(x, batch_mean, batch_var, offset=0, scale=0, variance_epsilon=1e-6)
But the results of this implementation are very bad. A comparison with tensorflow.contrib.slim.batch_norm shows that it is far inferior (the training performance is similarly bad).
What am I doing wrong, and what can explain this bad performance?

You may consider tf.contrib.layers.layer_norm. You may want to transpose x to [batch, channel, width, height] and set begin_norm_axis=2 for channel-wise normalization (each batch and each channel will be normalized independently).
Here is an example of how to transpose from your original order to [batch, channel, width, height]:
import tensorflow as tf
sess = tf.InteractiveSession()
batch = 2
height = 2
width = 2
channel = 3
tot_size = batch * height * channel * width
ts_4D_bhwc = tf.reshape(tf.range(tot_size), [batch, height, width, channel])
ts_4D_bchw = tf.transpose(ts_4D_bhwc, perm=[0,3,1,2])
print("Original tensor w/ order bhwc\n")
print(ts_4D_bhwc.eval())
print("\nTransormed tensor w/ order bchw\n")
print(ts_4D_bchw.eval())
Outputs:
Original tensor w/ order bhwc
[[[[ 0 1 2]
[ 3 4 5]]
[[ 6 7 8]
[ 9 10 11]]]
[[[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]]]]
Transformed tensor w/ order bchw
[[[[ 0 3]
[ 6 9]]
[[ 1 4]
[ 7 10]]
[[ 2 5]
[ 8 11]]]
[[[12 15]
[18 21]]
[[13 16]
[19 22]]
[[14 17]
[20 23]]]]
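For completeness, here is a minimal sketch (TF 1.x) of the suggested layer_norm call applied to the transposed tensor; center and scale are disabled here only so that no trainable variables need initializing in this toy session:
x_bchw = tf.cast(ts_4D_bchw, tf.float32)
y_bchw = tf.contrib.layers.layer_norm(x_bchw, begin_norm_axis=2, center=False, scale=False)
print(y_bchw.eval())  # each (batch, channel) slice is normalized independently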

The solution by @Maosi works, but I found that it is slow. The following is simple and fast.
batch_mean, batch_var = tf.nn.moments(x, axes=[0, 1, 2])
x = tf.subtract(x, batch_mean)
x = tf.div(x, tf.sqrt(batch_var) + 1e-6)
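Note that in the original snippet, scale=0 multiplies the normalized values by zero, which by itself would explain the bad results; tf.nn.batch_normalization accepts None to skip the affine transform. If you also want the trainable scale and offset that standard batch norm applies, a minimal sketch (the gamma/beta variable names are illustrative):
num_channels = x.get_shape().as_list()[-1]
gamma = tf.Variable(tf.ones([num_channels]))  # trainable scale
beta = tf.Variable(tf.zeros([num_channels]))  # trainable offset
batch_mean, batch_var = tf.nn.moments(x, axes=[0, 1, 2])
x = tf.nn.batch_normalization(x, batch_mean, batch_var, offset=beta, scale=gamma, variance_epsilon=1e-6)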

Related

How to keep input and output shape consistent after applying conv2d and convtranspose2d to image data?

I'm using PyTorch to experiment with an image segmentation task. I found that the input and output shapes are often inconsistent after applying Conv2d() and ConvTranspose2d() to my image data of shape [1,1,height,width]. How can I fix this issue for arbitrary height and width?
import torch
data = torch.rand(1,1,16,26)
a = torch.nn.Conv2d(1,1,kernel_size=3, stride=2)
b = a(data)
print(b.shape)
c = torch.nn.ConvTranspose2d(1,1,kernel_size=3, stride=2)
d = c(b)
print(d.shape) # torch.Size([1, 1, 15, 25])
TL;DR: Given the same parameters, nn.ConvTranspose2d is not the inverse operation of nn.Conv2d in terms of spatial shape.
From an input with spatial dimension x_in, nn.Conv2d will output a tensor with respective spatial dimension x_out:
x_out = [(x_in + 2p - d*(k-1) - 1)/s + 1]
Where [.] is the floor function, p the padding, d the dilation, k the kernel size, and s the stride.
In your case: k=3, s=2, while other parameters default to p=0 and d=1. In other words x_out = [(x_in - 3)/2 + 1]. So given x_in=16, you get x_out = [7.5] = 7.
On the other hand, we have for nn.ConvTranspose2d:
x_out = (x_in-1)*s - 2p + d*(k-1) + op + 1
Where p is the padding, d the dilation, k the kernel size, s the stride, and op the output padding (no flooring is needed here).
In your case: k=3, s=2, while other parameters default to p=0, d=1, and op=0. You get x_out = (x_in-1)*2 + 3. So given x_in=7, you get x_out = 15.
However, if you apply an output padding on your transpose convolution, you will get the desired shape:
>>> conv = nn.Conv2d(1,1, kernel_size=3, stride=2)
>>> convT = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
>>> convT(conv(data)).shape
torch.Size([1, 1, 16, 26])
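As a quick sanity check, here is a small sketch of the two formulas in code (the helper names conv_out and convT_out are illustrative):
def conv_out(x_in, k, s, p=0, d=1):
    # floor((x_in + 2p - d*(k-1) - 1)/s) + 1
    return (x_in + 2*p - d*(k-1) - 1) // s + 1

def convT_out(x_in, k, s, p=0, d=1, op=0):
    # (x_in - 1)*s - 2p + d*(k-1) + op + 1
    return (x_in - 1)*s - 2*p + d*(k-1) + op + 1

print(conv_out(16, k=3, s=2))        # 7
print(convT_out(7, k=3, s=2))        # 15
print(convT_out(7, k=3, s=2, op=1))  # 16, matching the original height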

Optimizing filter sizes of CNN with Optuna

I have created a CNN for classification of three classes based on input images of size 39 x 39. I'm optimizing the parameters of the network using Optuna. For Optuna I'm defining the following parameters to optimize:
num_blocks = trial.suggest_int('num_blocks', 1, 4)
num_filters = [int(trial.suggest_categorical("num_filters", [32, 64, 128, 256]))]
kernel_size = trial.suggest_int('kernel_size', 2, 7)
num_dense_nodes = trial.suggest_categorical('num_dense_nodes', [64, 128, 256, 512, 1024])
dense_nodes_divisor = trial.suggest_categorical('dense_nodes_divisor', [1, 2, 4, 8])
batch_size = trial.suggest_categorical('batch_size', [16, 32, 64, 128])
drop_out = trial.suggest_discrete_uniform('drop_out', 0.05, 0.5, 0.05)
lr = trial.suggest_loguniform('lr', 1e-6, 1e-1)
dict_params = {'num_blocks': num_blocks,
'num_filters': num_filters,
'kernel_size': kernel_size,
'num_dense_nodes': num_dense_nodes,
'dense_nodes_divisor': dense_nodes_divisor,
'batch_size': batch_size,
'drop_out': drop_out,
'lr': lr}
My network looks as follows:
input_tensor = Input(shape=(39,39,3))
# 1st cnn block
x = Conv2D(filters=dict_params['num_filters'],
kernel_size=dict_params['kernel_size'],
strides=1, padding='same')(input_tensor)
x = BatchNormalization()(x, training=training)
x = Activation('relu')(x)
x = MaxPooling2D(padding='same')(x)
x = Dropout(dict_params['drop_out'])(x)
# additional cnn blocks
for i in range(1, dict_params['num_blocks']):
    x = Conv2D(filters=dict_params['num_filters']*(2**i),
               kernel_size=dict_params['kernel_size'],
               strides=1, padding='same')(x)
    x = BatchNormalization()(x, training=training)
    x = Activation('relu')(x)
    x = MaxPooling2D(padding='same')(x)
    x = Dropout(dict_params['drop_out'])(x)
# mlp
x = Flatten()(x)
x = Dense(dict_params['num_dense_nodes'], activation='relu')(x)
x = Dropout(dict_params['drop_out'])(x)
x = Dense(dict_params['num_dense_nodes'] // dict_params['dense_nodes_divisor'], activation='relu')(x)
output_tensor = Dense(self.number_of_classes, activation='softmax')(x)
# instantiate and compile model
cnn_model = Model(inputs=input_tensor, outputs=output_tensor)
opt = Adam(lr=dict_params['lr'])
loss = 'categorical_crossentropy'
cnn_model.compile(loss=loss, optimizer=opt, metrics=['accuracy', tf.keras.metrics.AUC()])
I'm optimizing (minimizing) the validation loss with Optuna. There is a maximum of 4 blocks in the network, and the number of filters is doubled for each block, e.g. 64 in the first block, 128 in the second, 256 in the third, and so on. There are two problems. First, when we start with e.g. 256 filters and a total of 4 blocks, the last block will have 2048 filters, which is too many.
Is it possible to make the num_filters parameter dependent on the num_blocks parameter? That means that if there are more blocks, the starting filter size should be smaller. So, for example, if num_blocks is chosen to be 4, num_filters should only be sampled from 32, 64 and 128.
Second, I think it is common to double the filter size, but there are also networks with constant filter sizes, or two convolutions (with the same number of filters) before a max pooling layer (similar to VGG), and so on. Is it possible to adapt the Optuna optimization to cover all these variations?
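One possible approach for the first problem (a sketch, not from the original post) is to suggest num_blocks first and then restrict the filter choices so that the last block stays within a budget; the max_filters value and the parameter naming are illustrative, and the name encodes num_blocks because Optuna expects a fixed choice set per parameter name:
num_blocks = trial.suggest_int('num_blocks', 1, 4)
max_filters = 1024  # illustrative cap on the filter count of the last block
choices = [f for f in [32, 64, 128, 256] if f * 2**(num_blocks - 1) <= max_filters]
num_filters = trial.suggest_categorical(f'num_filters_{num_blocks}_blocks', choices)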

Deep Conv Model number of parameters

I was reading this claim:
A CNN with two 5x5 convolution layers (the first with 32 channels, the
second with 64, each followed with 2x2 max pooling), a fully connected
layer with 512 units and ReLu activation, and a final softmax output
layer (1,663,370 total parameters)
I don't see how they calculate 1.6M parameters. My implementation of the same network gives me ~580k parameters, which seems more realistic given that this is a small network.
Assuming you are talking about MNIST images (1 input channel, stride=1, padding=2):
INPUT: [28x28x1] weights: 0
CONV5-32: [28x28x32] weights: (1*5*5)*32 + 32 = 832
POOL2: [14x14x32] weights: 0
CONV5-64: [14x14x64] weights: (5*5*32)*64 + 64 = 51,264
POOL2: [7x7x64] weights: 0
FC: [1x1x512] weights: 7*7*64*512 + 512 = 1,606,144
Softmax: [1x1x10] weights: 512*10 + 10 = 5,130
-----------------------------------------------------------
1,663,370
Consider this cheating, but here is how 1663370 is obtained:
import torch.nn as nn
# First fully-connected (linear) layer input size, as in the accepted answer:
linear_in = 7*7*64
model = nn.Sequential(
    nn.Conv2d(1, 32, 5),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, 5),
    nn.MaxPool2d(2, 2),
    nn.Linear(linear_in, 512),
    nn.ReLU(),
    nn.Linear(512, 10)
)
Now, the parameters:
sum([p.numel() for p in model.parameters()])
1663370
Layer by Layer:
for p in model.parameters():
    print(p.size())
    print(p.numel())
torch.Size([32, 1, 5, 5])
800
torch.Size([32])
32
torch.Size([64, 32, 5, 5])
51200
torch.Size([64])
64
torch.Size([512, 3136])
1605632
torch.Size([512])
512
torch.Size([10, 512])
5120
torch.Size([10])
10
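Summing the per-tensor counts listed above reproduces the same total:
sum([800, 32, 51200, 64, 1605632, 512, 5120, 10])  # 1663370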

Tensorflow Dimension size must be evenly divisible by N but is M for 'linear/linear_model/x/Reshape'

I've been trying to use tensorflow's tf.estimator, but I'm getting the following errors regarding the shape of input/output data.
ValueError: Dimension size must be evenly divisible by 9 but is 12 for
'linear/linear_model/x/Reshape' (op: 'Reshape') with input shapes:
[4,3], [2] and with input tensors computed as partial shapes: input[1]
= [?,9].
Here is the code:
data_size = 3
iterations = 10
learn_rate = 0.005
# generate test data
input = np.random.rand(data_size,3)
output = np.dot(input, [2, 3 ,7]) + 4
output = np.transpose([output])
feature_columns = [tf.feature_column.numeric_column("x", shape=(data_size, 3))]
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)
input_fn = tf.estimator.inputs.numpy_input_fn({"x":input}, output, batch_size=4, num_epochs=None, shuffle=True)
estimator.train(input_fn=input_fn, steps=iterations)
The input data has shape (3, 3):
[[ 0.06525168 0.3171153 0.61675511]
[ 0.35166298 0.71816544 0.62770994]
[ 0.77846666 0.20930611 0.1710842 ]]
The output data has shape (3, 1):
[[ 9.399135 ]
[ 11.25179188]
[ 7.38244104]]
I sense it is related to the input data, output data and batch_size, because when the input data is changed to 1 row it works. When the number of input rows equals batch_size (data_size = 10 and batch_size = 10), it throws another error:
ValueError: Shapes (1, 1) and (10, 1) are incompatible
Any help with the errors would be much appreciated.
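The feature column declares shape=(data_size, 3), i.e. 9 values per example, while each row of the input provides only 3, so a batch of 4 rows (12 values) cannot be reshaped to [?, 9]. A hedged sketch, not from the original thread, of one way to make the shapes consistent (3 features per example):
feature_columns = [tf.feature_column.numeric_column("x", shape=(3,))]
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)
input_fn = tf.estimator.inputs.numpy_input_fn({"x": input}, output, batch_size=4, num_epochs=None, shuffle=True)
estimator.train(input_fn=input_fn, steps=iterations)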

Torch - repeat tensor like numpy repeat

I am trying to repeat a tensor in torch in two ways. For example, repeating the tensor {1,2,3,4} 3 times both ways should yield:
{1,2,3,4,1,2,3,4,1,2,3,4}
{1,1,1,2,2,2,3,3,3,4,4,4}
There is a built-in torch:repeatTensor function which will generate the first of the two (like numpy.tile()), but I can't find one for the latter (like numpy.repeat()). I'm sure that I could call sort on the first to give the second, but I think this might be computationally expensive for larger arrays?
Try the torch.repeat_interleave() method: https://pytorch.org/docs/stable/torch.html#torch.repeat_interleave
>>> x = torch.tensor([1, 2, 3])
>>> x.repeat_interleave(2)
tensor([1, 1, 2, 2, 3, 3])
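Applied to the example from the question (a small usage sketch):
x = torch.tensor([1, 2, 3, 4])
x.repeat(3)             # tensor([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]), like numpy.tile
x.repeat_interleave(3)  # tensor([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]), like numpy.repeat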
Quoting https://discuss.pytorch.org/t/how-to-tile-a-tensor/13853 -
z = torch.FloatTensor([[1,2,3],[4,5,6],[7,8,9]])
1 2 3
4 5 6
7 8 9
z.transpose(0,1).repeat(1,3).view(-1, 3).transpose(0,1)
1 1 1 2 2 2 3 3 3
4 4 4 5 5 5 6 6 6
7 7 7 8 8 8 9 9 9
This will give you an intuitive feel for how it works.
a = torch.Tensor([1,2,3,4])
To get [1., 2., 3., 4., 1., 2., 3., 4., 1., 2., 3., 4.] we repeat the tensor thrice in the 1st dimension:
a.repeat(3)
To get [1,1,1,2,2,2,3,3,3,4,4,4] we add a dimension to the tensor and repeat it thrice in the 2nd dimension to get a 4 x 3 tensor, which we can flatten.
b = a.reshape(4,1).repeat(1,3).flatten()
or
b = a.reshape(4,1).repeat(1,3).view(-1)
Here's a generic function that repeats elements in tensors.
def repeat(tensor, dims):
    if len(dims) != len(tensor.shape):
        raise ValueError("The length of the second argument must equal the number of dimensions of the first.")
    for index, dim in enumerate(dims):
        repetition_vector = [1]*(len(dims)+1)
        repetition_vector[index+1] = dim
        new_tensor_shape = list(tensor.shape)
        new_tensor_shape[index] *= dim
        tensor = tensor.unsqueeze(index+1).repeat(repetition_vector).reshape(new_tensor_shape)
    return tensor
If you have
foo = tensor([[1, 2],
[3, 4]])
By calling repeat(foo, [2,1]) you get
tensor([[1, 2],
[1, 2],
[3, 4],
[3, 4]])
So you duplicated every element along dimension 0 and left elements as they are on dimension 1.
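The same function also covers the 1-D case from the question:
a = torch.tensor([1, 2, 3, 4])
repeat(a, [3])  # tensor([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])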
Use einops:
from einops import repeat
repeat(x, 'i -> (repeat i)', repeat=3)
# like {1,2,3,4,1,2,3,4,1,2,3,4}
repeat(x, 'i -> (i repeat)', repeat=3)
# like {1,1,1,2,2,2,3,3,3,4,4,4}
This code works identically for any framework (numpy, torch, tf, etc.)
Can you try something like:
import torch as pt
#1 works like numpy.tile
b = pt.arange(10)
print(b.repeat(3))
#2 also works like numpy.tile
b = pt.tensor(1).repeat(10).reshape(2,-1)
print(b)
#3 works like numpy.repeat
t = pt.tensor([1,2,3])
t.repeat(2).reshape(2,-1).transpose(1,0).reshape(-1)
