Tensorflow convolution layer crashes with "failed to enqueue convolution on stream"

I'm using a convolution layer in TensorFlow on a GPU with 4 GB of memory (GTX 980).
Before I added the convolution layer everything worked fine, but once I started using it the following error occurred:
failed to enqueue convolution on stream: CUDNN_STATUS_NOT_SUPPORTED
I heard this issue is related to GPU memory.
I know a single TensorFlow op can hold at most 2 GB because of the protobuf limitation, but my network doesn't have any op over 2 GB, so that can't be the problem.
The puzzling part is that the whole network (the weight matrices) is even smaller when I use the convolution layer, yet the error keeps occurring.
When I reduce the batch size to a much smaller number no error occurs, but SGD performs poorly at such a small batch size.
Can this be solved by using another framework such as PyTorch? Or can I still use TensorFlow with a batch size of 500000?
Or is it related to the GPU's small memory (4 GB)?
Please help, I'm stuck.
network summary
one 1d-convolution layer
FC layers
regression layer
data summary
batch size = 500000
feature size = 15 (float)
placeholder size for input : 15(feature num) x 8(float64) x 500000(batch size) = 60MB
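As a quick sanity check on the arithmetic above, the sketch below recomputes the input size from the numbers in this post. Note that the model code declares the placeholder as tf.float32 (4 bytes per value), so the tensor that is actually fed would be about half of the 60 MB estimated for float64. The helper name tensor_megabytes is made up for illustration.

```python
def tensor_megabytes(num_rows, num_cols, bytes_per_value):
    """Size of a dense 2-D tensor in megabytes (1 MB = 10**6 bytes)."""
    return num_rows * num_cols * bytes_per_value / 1e6

batch_size = 500000   # from the data summary
feature_size = 15     # from the data summary

size_float64 = tensor_megabytes(batch_size, feature_size, 8)
size_float32 = tensor_megabytes(batch_size, feature_size, 4)
print(size_float64, size_float32)
```

Either way, the input itself is far below the 4 GB available on the card.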
model code
As you can see, it's a really small network.
I tried a bigger network without the convolution layer and it worked fine.
class MyModel:
    def __init__(self, learning_rate, batch_size, neighbor, weight_decay = 0.9, huber_delta=0.3, keep_prob_lst=[]):
        """ hyperparameters """
        self.isConv = True
        self.batch_size = batch_size
        self.lr = learning_rate
        self.input_size = neighbor * 3
        self.output_size = 1
        self.neighbor = neighbor
        self.weight_decay = weight_decay
        self.conv1_size = 10
        self.layer1_size = 100
        self.layer2_size = 100
        self.huber_delta = huber_delta
        self.keep_prob_lst_val = keep_prob_lst
        self.global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')

    def _create_placeholders(self):
        """ define the placeholders for input and output """
        with tf.name_scope("data"):
            self.input = tf.placeholder(tf.float32, shape = [self.batch_size, self.input_size], name='input')
            self.output = tf.placeholder(tf.float32, shape= [self.batch_size, self.output_size], name='output')

    def _create_weights(self):
        """ define weights. """
        # Assemble this part of the graph on the CPU. You can change it to GPU if you have GPU
        with tf.name_scope("weights"):
            self.conv_W_1 = tf.Variable(tf.random_normal([3,1, self.conv1_size], stddev=0.01, mean=0.0, seed=0), name='conv_layer1_weight')
            self.conv_b_1 = tf.Variable(tf.zeros([1, self.conv1_size * self.neighbor]), name='conv_layer1_bias')
            self.W_1 = tf.Variable(tf.random_normal([self.conv1_size * self.neighbor, self.layer1_size], stddev=0.01, mean=0.0, seed=0), name='layer1_weight')
            self.b_1 = tf.Variable(tf.zeros([1,self.layer1_size]), name='layer1_bias')
            self.W_2 = tf.Variable(tf.random_normal([self.layer1_size, self.layer2_size], stddev=0.01, mean=0.0, seed=0), name='layer2_weight')
            self.b_2 = tf.Variable(tf.zeros([1,self.layer2_size]), name='layer2_bias')
            self.W_out = tf.Variable(tf.random_normal([self.layer2_size, self.output_size], stddev=0.01, mean=0.0, seed=0), name='layer_out_weight')
            self.b_out = tf.Variable(tf.zeros([1,self.output_size]), name='layer_out_bias')

    def _create_loss(self):
        """ define the inference + the loss function """
        with tf.name_scope("loss"):
            self.conv1_input = tf.reshape(self.input, [self.batch_size, self.neighbor*3, 1])
            self.conv1_output = tf.nn.conv1d(self.conv1_input, self.conv_W_1, 3, 'VALID')
            self.conv1_output_reshape = tf.reshape(self.conv1_output, [self.batch_size, -1]) + self.conv_b_1
            self.layer1_output = tf.nn.relu(tf.matmul(self.conv1_output_reshape, self.W_1) + self.b_1)
            self.layer2_output = tf.nn.relu(tf.matmul(self.layer1_output, self.W_2) + self.b_2)
            self.layer_out_output = tf.matmul(self.layer2_output, self.W_out) + self.b_out
            self.se = 0.5 * tf.square(self.layer_out_output - self.output, name='square')
            self.loss = tf.reduce_mean(self.se)

    def _create_optimizer(self):
        """ define optimizer """
        self.optimizer = tf.train.AdamOptimizer(learning_rate=self.lr).minimize(self.loss,
                                                                               global_step=self.global_step)

    def build_graph(self):
        """ Build the graph for our model """
        self._create_placeholders()
        self._create_weights()
        self._create_loss()
        self._create_optimizer()
        # self._create_summaries()

Related

Pytorch model stuck at 0.5 though loss decreases consistently

This is using PyTorch.
I have been trying to train a UNet model on my images, but the model accuracy is always exactly 0.5, even though the loss does decrease.
I have checked for class imbalance and tried different learning rates. The learning rate affects the loss but not the accuracy.
My architecture is below (from here):
""" `UNet` class is based on https://arxiv.org/abs/1505.04597
The U-Net is a convolutional encoder-decoder neural network.
Contextual spatial information (from the decoding,
expansive pathway) about an input tensor is merged with
information representing the localization of details
(from the encoding, compressive pathway).
Modifications to the original paper:
(1) padding is used in 3x3 convolutions to prevent loss
of border pixels
(2) merging outputs does not require cropping due to (1)
(3) residual connections can be used by specifying
UNet(merge_mode='add')
(4) if non-parametric upsampling is used in the decoder
pathway (specified by up_mode='upsample'), then an
additional 1x1 2d convolution occurs after upsampling
to reduce channel dimensionality by a factor of 2.
This channel halving happens with the convolution in
the transpose convolution (specified by up_mode='transpose')
Arguments:
in_channels: int, number of channels in the input tensor.
Default is 3 for RGB images. Our SPARCS dataset is 13 channel.
depth: int, number of MaxPools in the U-Net. During training, input size needs to be
(depth-1) times divisible by 2
start_filts: int, number of convolutional filters for the first conv.
up_mode: string, type of upconvolution. Choices: 'transpose' for transpose convolution
"""
class UNet(nn.Module):
    def __init__(self, num_classes, depth, in_channels, start_filts=16, up_mode='transpose', merge_mode='concat'):
        super(UNet, self).__init__()
        if up_mode in ('transpose', 'upsample'):
            self.up_mode = up_mode
        else:
            raise ValueError("\"{}\" is not a valid mode for upsampling. Only \"transpose\" and \"upsample\" are allowed.".format(up_mode))
        if merge_mode in ('concat', 'add'):
            self.merge_mode = merge_mode
        else:
            # report the offending merge_mode (the original formatted up_mode here)
            raise ValueError("\"{}\" is not a valid mode for merging up and down paths. Only \"concat\" and \"add\" are allowed.".format(merge_mode))
        # NOTE: up_mode 'upsample' is incompatible with merge_mode 'add'
        if self.up_mode == 'upsample' and self.merge_mode == 'add':
            raise ValueError("up_mode \"upsample\" is incompatible with merge_mode \"add\" at the moment "
                             "because it doesn't make sense to use nearest neighbour to reduce depth channels (by half).")
        self.num_classes = num_classes
        self.in_channels = in_channels
        self.start_filts = start_filts
        self.depth = depth
        self.down_convs = []
        self.up_convs = []
        # create the encoder pathway and add to a list
        for i in range(depth):
            ins = self.in_channels if i == 0 else outs
            outs = self.start_filts*(2**i)
            pooling = True if i < depth-1 else False
            down_conv = DownConv(ins, outs, pooling=pooling)
            self.down_convs.append(down_conv)
        # create the decoder pathway and add to a list
        # - careful! decoding only requires depth-1 blocks
        for i in range(depth-1):
            ins = outs
            outs = ins // 2
            up_conv = UpConv(ins, outs, up_mode=up_mode, merge_mode=merge_mode)
            self.up_convs.append(up_conv)
        self.conv_final = conv1x1(outs, self.num_classes)
        # add the list of modules to current module
        self.down_convs = nn.ModuleList(self.down_convs)
        self.up_convs = nn.ModuleList(self.up_convs)
        self.reset_params()

    @staticmethod
    def weight_init(m):
        if isinstance(m, nn.Conv2d):
            #https://prateekvjoshi.com/2016/03/29/understanding-xavier-initialization-in-deep-neural-networks/
            ##Doc: https://pytorch.org/docs/stable/nn.init.html?highlight=xavier#torch.nn.init.xavier_normal_
            init.xavier_normal_(m.weight)
            init.constant_(m.bias, 0)

    def reset_params(self):
        for i, m in enumerate(self.modules()):
            self.weight_init(m)

    def forward(self, x):
        encoder_outs = []
        # encoder pathway, save outputs for merging
        for i, module in enumerate(self.down_convs):
            x, before_pool = module(x)
            encoder_outs.append(before_pool)
        for i, module in enumerate(self.up_convs):
            before_pool = encoder_outs[-(i+2)]
            x = module(before_pool, x)
        # No softmax is used. This means we need to use
        # nn.CrossEntropyLoss in the training script,
        # as this module includes a softmax already.
        x = self.conv_final(x)
        return x
Parameters are :
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x,y = train_sequence[0] ; batch_size = x.shape[0]
model = UNet(num_classes = 2, depth=5, in_channels=5, merge_mode='concat').to(device)
optim = torch.optim.Adam(model.parameters(),lr=0.01, weight_decay=1e-3)
criterion = nn.BCEWithLogitsLoss() #has sigmoid internally
epochs = 1000
The function for training is :
import torch.nn.functional as f
def train_model(epoch,train_sequence):
    """Train the model and report validation error with training error
    Args:
        model: the model to be trained
        criterion: loss function
        data_train (DataLoader): training dataset
    """
    model.train()
    for idx in range(len(train_sequence)):
        X, y = train_sequence[idx]
        images = Variable(torch.from_numpy(X)).to(device) # [batch, channel, H, W]
        masks = Variable(torch.from_numpy(y)).to(device)
        outputs = model(images)
        print(masks.shape, outputs.shape)
        loss = criterion(outputs, masks)
        optim.zero_grad()
        loss.backward()
        # Update weights
        optim.step()
    # total_loss = get_loss_train(model, data_train, criterion)
My function for calculating loss and accuracy is below:
def get_loss_train(model, train_sequence):
    """
    Calculate loss over train set
    """
    model.eval()
    total_acc = 0
    total_loss = 0
    for idx in range(len(train_sequence)):
        with torch.no_grad():
            X, y = train_sequence[idx]
            images = Variable(torch.from_numpy(X)).to(device) # [batch, channel, H, W]
            masks = Variable(torch.from_numpy(y)).to(device)
            outputs = model(images)
            loss = criterion(outputs, masks)
            preds = torch.argmax(outputs, dim=1).float()
            acc = accuracy_check_for_batch(masks.cpu(), preds.cpu(), images.size()[0])
            total_acc = total_acc + acc
            total_loss = total_loss + loss.cpu().item()
    return total_acc/(len(train_sequence)), total_loss/(len(train_sequence))
Edit : Code which runs (calls) the functions:
for epoch in range(epochs):
    train_model(epoch, train_sequence)
    train_acc, train_loss = get_loss_train(model,train_sequence)
    print("Train Acc:", train_acc)
    print("Train loss:", train_loss)
Can someone help me identify why the accuracy is always exactly 0.5?
Edit-2:
As asked accuracy_check_for_batch function is here:
def accuracy_check_for_batch(masks, predictions, batch_size):
    total_acc = 0
    for index in range(batch_size):
        total_acc += accuracy_check(masks[index], predictions[index])
    return total_acc/batch_size
and
def accuracy_check(mask, prediction):
    ims = [mask, prediction]
    np_ims = []
    for item in ims:
        if 'str' in str(type(item)):
            item = np.array(Image.open(item))
        elif 'PIL' in str(type(item)):
            item = np.array(item)
        elif 'torch' in str(type(item)):
            item = item.numpy()
        np_ims.append(item)
    compare = np.equal(np_ims[0], np_ims[1])
    accuracy = np.sum(compare)
    return accuracy/len(np_ims[0].flatten())

I found the mistake.
model = UNet(num_classes = 2, depth=5, in_channels=5, merge_mode='concat').to(device)
should be
model = UNet(num_classes = 1, depth=5, in_channels=5, merge_mode='concat').to(device)
because I am using BCEWithLogitsLoss.
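With a single output channel, torch.argmax(outputs, dim=1) as used in get_loss_train would always return 0, because there is only one index along that dimension. A minimal sketch of the usual alternative, thresholding the logit, is below. It uses plain Python lists rather than tensors so that it runs anywhere; the helper names predict_from_logits and pixel_accuracy are hypothetical, and the 0.5 threshold is an assumption (the conventional choice), not something stated in the post.

```python
import math

def sigmoid(z):
    """Logistic function, the one BCEWithLogitsLoss applies internally."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_from_logits(logits, threshold=0.5):
    """Turn raw single-channel logits into hard 0/1 predictions."""
    return [1 if sigmoid(z) > threshold else 0 for z in logits]

def pixel_accuracy(predictions, targets):
    """Fraction of positions where prediction and target agree."""
    matches = sum(1 for p, t in zip(predictions, targets) if p == t)
    return matches / len(targets)

logits = [-2.0, 0.3, 1.5, -0.1]   # made-up logits for four pixels
targets = [0, 1, 1, 1]            # made-up ground-truth mask values

preds = predict_from_logits(logits)
acc = pixel_accuracy(preds, targets)
print(preds, acc)
```

On tensors the same comparison is often written directly on the logits, for example outputs > 0, because sigmoid(z) > 0.5 exactly when z > 0.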

Neural Network does not perform well on the CIFAR-10 dataset

I have been trying to implement a CNN on the CIFAR-10 dataset for a few days, but my test set accuracy does not go beyond 10% and the error just hangs around 69.07733. I have been tweaking the model for a few days, but in vain. I haven't been able to spot where I am going wrong. Please help me find the fault in the model. Here is the code for it:
import os
import sys
import pickle
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt
data_root = './cifar-10-batches-py'
train_data = np.ndarray(shape=(50000,3072), dtype=np.float32)
train_labels = np.ndarray(shape=(50000), dtype=np.float32)
num_images = 0
test_data = np.ndarray(shape=(10000,3072),dtype = np.float32)
test_labels = np.ndarray(shape=(10000),dtype=np.float32)
meta_data = {}
for file in os.listdir(data_root):
file_path = os.path.join(data_root,file)
with open(file_path,'rb') as f:
temp = pickle.load(f,encoding ='bytes')
if file == 'batches.meta':
for i,j in enumerate(temp[b'label_names']):
meta_data[i] = j
if 'data_batch_' in file:
for i in range(10000):
train_data[num_images,:] = temp[b'data'][i]
train_labels[num_images] = temp[b'labels'][i]
num_images += 1
if 'test_batch' in file:
for i in range(10000):
test_data[i,:] = temp[b'data'][i]
test_labels[i] = temp[b'labels'][i]
'''
print('meta: \n',meta_data)
train_data = train_data.reshape(50000,3,32,32).transpose(0,2,3,1)
print('\ntrain data: \n',train_data.shape,'\nLabels: \n',train_labels[0])
print('\ntest data: \n',test_data[0].shape,'\nLabels: \n',train_labels[0])'''
#accuracy function acc = (no. of correct prediction/total attempts) * 100
def accuracy(predictions, labels):
return (100 * (np.sum(np.argmax(predictions,1)== np.argmax(labels, 1))/predictions.shape[0]))
#reformat the data
def reformat(data,labels):
data = data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1).astype(np.float32)
labels = (np.arange(10) == labels[:,None]).astype(np.float32)
return data,labels
train_data, train_labels = reformat(train_data,train_labels)
test_data, test_labels = reformat(test_data, test_labels)
print ('Train ',train_data[0][1])
plt.axis("off")
plt.imshow(train_data[1], interpolation = 'nearest')
plt.savefig("1.png")
plt.show()
'''
print("Train: \n",train_data.shape,test_data[0],"\nLabels: \n",train_labels.shape,train_labels[:11])
print("Test: \n",test_data.shape,test_data[0],"\nLabels: \n",test_labels.shape,test_labels[:11])'''
image_size = 32
num_channels = 3
batch_size = 30
patch_size = 5
depth = 64
num_hidden = 256
num_labels = 10
graph = tf.Graph()
with graph.as_default():
#input data and labels
train_input = tf.placeholder(tf.float32,shape=(batch_size,image_size,image_size,num_channels))
train_output = tf.placeholder(tf.float32,shape=(batch_size,num_labels))
test_input = tf.constant(test_data)
#layer weights and biases
layer_1_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,num_channels,depth]))
layer_1_biases = tf.Variable(tf.zeros([depth]))
layer_2_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,depth,depth]))
layer_2_biases = tf.Variable(tf.constant(0.1, shape=[depth]))
layer_3_weights = tf.Variable(tf.truncated_normal([64*64, num_hidden]))
layer_3_biases = tf.Variable(tf.constant(0.1, shape=[num_hidden]))
layer_4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels]))
layer_4_biases = tf.Variable(tf.constant(0.1, shape=[num_labels]))
def convnet(data):
conv_1 = tf.nn.conv2d(data, layer_1_weights,[1,1,1,1], padding = 'SAME')
hidden_1 = tf.nn.relu(conv_1+layer_1_biases)
norm_1 = tf.nn.lrn(hidden_1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
pool_1 = tf.nn.max_pool(norm_1,[1,2,2,1],[1,2,2,1], padding ='SAME')
conv_2 = tf.nn.conv2d(pool_1,layer_2_weights,[1,1,1,1], padding = 'SAME')
hidden_2 = tf.nn.relu(conv_2+layer_2_biases)
norm_2 = tf.nn.lrn(hidden_2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
pool_2 = tf.nn.max_pool(norm_2,[1,2,2,1],[1,2,2,1], padding ='SAME')
shape = pool_2.get_shape().as_list()
hidd2_trans = tf.reshape(pool_2,[shape[0],shape[1]*shape[2]*shape[3]])
hidden_3 = tf.nn.relu(tf.matmul(hidd2_trans,layer_3_weights) + layer_3_biases)
return tf.nn.relu(tf.matmul(hidden_3,layer_4_weights) + layer_4_biases)
logits = convnet(train_input)
loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=train_output, logits = logits))
optimizer = tf.train.AdamOptimizer(1e-4).minimize(loss)
train_prediction = tf.nn.softmax(logits)
test_prediction = tf.nn.softmax(convnet(test_input))
num_steps = 100000
with tf.Session(graph=graph) as session:
tf.global_variables_initializer().run()
print('Initialized \n')
for step in range(num_steps):
offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
batch = train_data[offset:(offset+batch_size),:,:,:]
batch_labels = train_labels[offset:(offset+batch_size),:]
feed_dict ={train_input: batch, train_output: batch_labels}
_,l,prediction = session.run([optimizer, loss, train_prediction], feed_dict = feed_dict)
if (step % 500 == 0):
print("Loss at step %d: %f" %(step, l))
print("Accuracy: %f" %(accuracy(prediction, batch_labels)))
print("Test accuracy: %f" %(accuracy(session.run(test_prediction), test_labels)))
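The value the loss is stuck at is itself a useful clue. The code sums the softmax cross-entropy over a batch of 30 examples with 10 classes; if the network produces identical logits for every class (for instance all zeros), each example contributes ln(10) and the sum comes out very close to the reported 69.077. The sketch below only checks that arithmetic. That the logits have collapsed, possibly because of the ReLU applied to the output layer in convnet, is a hypothesis and not something confirmed in the post.

```python
import math

batch_size = 30   # from the question
num_labels = 10   # from the question

# Cross-entropy of a uniform prediction over num_labels classes.
per_example_loss = -math.log(1.0 / num_labels)

# The question uses tf.reduce_sum, so the batch loss is a sum, not a mean.
summed_loss = batch_size * per_example_loss
print(per_example_loss, summed_loss)
```

An accuracy of 10% on 10 balanced classes fits the same picture: the model is predicting at chance level.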

At first glance I would say the initialization of the CNN is the culprit. Training a convnet is an optimization problem in a highly non-convex space and therefore depends a lot on careful initialization to avoid getting stuck in local minima or at saddle points. Look at Xavier initialization for an example of how to fix that.
Example Code:
W = tf.get_variable("W", shape=[784, 256],
initializer=tf.contrib.layers.xavier_initializer())
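To make the scale of Xavier (Glorot) initialization concrete, here is a small framework-free sketch for the [784, 256] shape used in the example above. It draws weights from a uniform range whose variance is 2 / (fan_in + fan_out); the function names are made up for illustration. For comparison, the weights in the question come from tf.truncated_normal with its default standard deviation of 1.0, which is far larger.

```python
import math
import random

def glorot_stddev(fan_in, fan_out):
    """Target standard deviation of Glorot/Xavier initialization."""
    return math.sqrt(2.0 / (fan_in + fan_out))

def glorot_uniform_limit(fan_in, fan_out):
    """Half-width of the uniform range that has that same variance."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def glorot_uniform_matrix(fan_in, fan_out, seed=0):
    """Sample a fan_in x fan_out weight matrix as nested lists."""
    rng = random.Random(seed)
    limit = glorot_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

fan_in, fan_out = 784, 256   # shape used in the example above
stddev = glorot_stddev(fan_in, fan_out)
limit = glorot_uniform_limit(fan_in, fan_out)
weights = glorot_uniform_matrix(fan_in, fan_out)
print(stddev, limit)
```

For this shape the standard deviation is roughly 0.044, so the initial weights are more than an order of magnitude smaller than with a unit-variance normal.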

The problem is that your network has a very high depth (number of filters = 64 for both layers). Also, you are training the network from scratch, and the CIFAR-10 dataset (50000 training images) is quite small. Moreover, each CIFAR-10 image is only 32x32x3.
One alternative I can suggest is to retrain a pre-trained model, i.e. do transfer learning.
A better alternative is to reduce the number of filters in each layer. That way you will be able to train the model from scratch, and it will also be faster (assuming you don't have a GPU).
Next, you are using local response normalization. I would suggest removing this layer and doing mean normalization in the pre-processing step instead.
Next, if you feel the learning is not picking up at all, try increasing the learning rate a little and see.
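Lastly, a note on the places where you reshape your tensor and then transpose it, like this:
data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1)
It is tempting to shorten this to a single reshape such as
data.reshape(data.shape[0], 32, 32, 3)
but the two are not equivalent: each CIFAR-10 row stores all red values first, then green, then blue, so the direct reshape would scramble the pixels. Keep the reshape followed by the transpose.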
Hope the answer helps you.
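Whether the single reshape and the reshape followed by a transpose give the same array can be checked directly on a tiny channel-first example. The sketch below uses a made-up 2x2 "image" with 3 channels instead of real CIFAR-10 data, laid out like a CIFAR-10 row (all values of channel 0, then channel 1, then channel 2).

```python
import numpy as np

height, width, channels = 2, 2, 3

# One flattened image stored channel-first, like a CIFAR-10 row:
# channel 0 holds 0..3, channel 1 holds 10..13, channel 2 holds 20..23.
row = np.array([[0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23]])

# Reshape to (N, C, H, W), then move the channel axis last.
transposed = row.reshape(1, channels, height, width).transpose(0, 2, 3, 1)

# Direct reshape to (N, H, W, C) without the transpose.
direct = row.reshape(1, height, width, channels)

# The top-left pixel should hold the first value of each channel.
print(transposed[0, 0, 0])   # values 0, 10, 20
print(direct[0, 0, 0])       # values 0, 1, 2
same = bool(np.array_equal(transposed, direct))
print(same)
```

Only the transposed version puts one value from each channel into every pixel; the direct reshape mixes neighbouring values of a single channel into one pixel.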

TensorFlow average gradients over several batches

This is a possible duplicate of Tensorflow: How to get gradients per instance in a batch?. I ask it anyway, because there has not been a satisfying answer and the goal here is a bit different.
I have a very big network that I can fit on my GPU but the max batch size I can feed is 32. Anything bigger than that causes the GPU to run out of memory. I want to use a bigger batch in order to get a more accurate approximation of the gradient.
For concreteness, let's say I want to compute the gradient on a big batch of size 96 by feeding 3 batches of 32 in turn. The best way that I know of is to use Optimizer.compute_gradients() and Optimizer.apply_gradients(). Here is a small example of how it can work:
import tensorflow as tf
import numpy as np
learn_rate = 0.1
W_init = np.array([ [1,2,3], [4,5,6], [7,8,9] ], dtype=np.float32)
x_init = np.array([ [11,12,13], [14,15,16], [17,18,19] ], dtype=np.float32)
X = tf.placeholder(dtype=np.float32, name="x")
W = tf.Variable(W_init, dtype=np.float32, name="w")
y = tf.matmul(X, W, name="y")
loss = tf.reduce_mean(y, name="loss")
opt = tf.train.GradientDescentOptimizer(learn_rate)
grad_vars_op = opt.compute_gradients(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Compute the gradients for each batch
grads_vars1 = sess.run(grad_vars_op, feed_dict = {X: x_init[None,0]})
grads_vars2 = sess.run(grad_vars_op, feed_dict = {X: x_init[None,1]})
grads_vars3 = sess.run(grad_vars_op, feed_dict = {X: x_init[None,2]})
# Separate the gradients from the variables
grads1 = [ grad for grad, var in grads_vars1 ]
grads2 = [ grad for grad, var in grads_vars2 ]
grads3 = [ grad for grad, var in grads_vars3 ]
varl = [ var for grad, var in grads_vars1 ]
# Average the gradients
grads = [ (g1 + g2 + g3)/3 for g1, g2, g3 in zip(grads1, grads2, grads3)]
sess.run(opt.apply_gradients(zip(grads,varl)))
print("Weights after 1 gradient")
print(sess.run(W))
Now this is all very ugly and inefficient since the forward pass is being run on the GPU while averaging the gradients happens on the CPU and then applying them happens on the GPU again.
Moreover, this code throws an exception because grads is a list of np.arrays and to make it work, one would have to create a tf.placeholder for every gradient.
I am sure there must be a better and more efficient way to do this. Any suggestions?

You can create a copy of the trainable variables and accumulate the batch gradients in it. Here are a few simple steps to follow:
...
opt = tf.train.GradientDescentOptimizer(learn_rate)
# number of batches for gradient accumulation (defined before it is used below)
n_batches = 3
# constant to scale the sum of gradients (1.0 keeps the division in floating point)
const = tf.constant(1.0 / n_batches)
# get all trainable variables
t_vars = tf.trainable_variables()
# create a copy of all trainable variables with `0` as initial values
accum_tvars = [tf.Variable(tf.zeros_like(t_var.initialized_value()), trainable=False) for t_var in t_vars]
# create an op to reset all accumulator variables to zero
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_tvars]
# compute gradients for a batch
batch_grads_vars = opt.compute_gradients(loss, t_vars)
# collect the (scaled by const) batch gradient into the accumulator variables
accum_ops = [accum_tvars[i].assign_add(tf.scalar_mul(const, batch_grad_var[0])) for i, batch_grad_var in enumerate(batch_grads_vars)]
# apply the accumulated gradients
train_step = opt.apply_gradients([(accum_tvars[i], batch_grad_var[1]) for i, batch_grad_var in enumerate(batch_grads_vars)])
# train_step = opt.apply_gradients(zip(accum_tvars, list(zip(*batch_grads_vars))[1]))
while True:
    # reset the accumulated grads
    sess.run(zero_ops)
    for i in range(n_batches):
        # feed row i as a batch of one, as in the question
        sess.run(accum_ops, feed_dict={X: x_init[None, i]})
    sess.run(train_step)
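The reason accumulation works is that, for a loss defined as a mean over examples, the average of the gradients of equally sized sub-batches equals the gradient of the combined batch. The sketch below checks this for a toy one-parameter least-squares model in plain Python; the model, the data, and the function name are made up for illustration and are unrelated to the network in the question.

```python
def mean_squared_error_grad(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to the scalar w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]

# Gradient computed on the full batch of 6 examples at once.
full_grad = mean_squared_error_grad(w, xs, ys)

# Same gradient obtained by accumulating over 3 sub-batches of 2 examples.
n_batches = 3
chunk = len(xs) // n_batches
accum = 0.0
for i in range(n_batches):
    lo, hi = i * chunk, (i + 1) * chunk
    # Scale each sub-batch gradient by 1/n_batches before adding it.
    accum += mean_squared_error_grad(w, xs[lo:hi], ys[lo:hi]) / n_batches

print(full_grad, accum)
```

The equality holds only when the sub-batches have the same size; with uneven sub-batches each gradient would need to be weighted by its share of the examples.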

very large value of loss in AlexNet

I am using AlexNet to classify my images into 2 groups. I feed images to the model in batches of 60, and the loss I get after every batch is 6 to 7 digits long (for example 1428529.0). I am confused about why the loss is so large, because on the MNIST dataset the loss I got was very small compared to this. Can anyone explain why I am getting such a large loss value?
Thanks in advance ;-)
Here is the code:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import os
img_size = 227
num_channels = 1
img_flat_size = img_size * img_size
num_classes = 2
drop = 0.5
x = tf.placeholder(tf.float32,[None,img_flat_size])
y = tf.placeholder(tf.float32,[None,num_classes])
drop_p = tf.placeholder(tf.float32)
def new_weight(shape):
return tf.Variable(tf.random_normal(shape))
def new_bias(size):
return tf.Variable(tf.random_normal(size))
def new_conv(x,num_input_channels,filter_size,num_filters,stride,padd="SAME"):
shape = [filter_size,filter_size,num_input_channels,num_filters]
weight = new_weight(shape)
bias = new_bias([num_filters])
conv = tf.nn.conv2d(x,weight,strides=[1,stride,stride,1],padding=padd)
conv = tf.nn.bias_add(conv,bias)
return tf.nn.relu(conv)
def new_max_pool(x,k,stride):
max_pool = tf.nn.max_pool(x,ksize=[1,k,k,1],strides=[1,stride,stride,1],padding="VALID")
return max_pool
def flatten_layer(layer):
layer_shape = layer.get_shape()
num_features = layer_shape[1:4].num_elements()
flat_layer = tf.reshape(layer,[-1,num_features])
return flat_layer,num_features
def new_fc_layer(x,num_input,num_output):
weight = new_weight([num_input,num_output])
bias = new_bias([num_output])
fc_layer = tf.matmul(x,weight) + bias
return fc_layer
def lrn(x, radius, alpha, beta, bias=1.0):
"""Create a local response normalization layer."""
return tf.nn.local_response_normalization(x, depth_radius=radius,
alpha=alpha, beta=beta,
bias=bias)
def AlexNet(x,drop,img_size):
x = tf.reshape(x,shape=[-1,img_size,img_size,1])
conv1 = new_conv(x,num_channels,11,96,4,"VALID")
max_pool1 = new_max_pool(conv1,3,2)
norm1 = lrn(max_pool1, 2, 2e-05, 0.75)
conv2 = new_conv(norm1,96,5,256,1)
max_pool2 = new_max_pool(conv2,3,2)
norm2 = lrn(max_pool2, 2, 2e-05, 0.75)
conv3 = new_conv(norm2,256,3,384,1)
conv4 = new_conv(conv3,384,3,384,1)
conv5 = new_conv(conv4,384,3,256,1)
max_pool3 = new_max_pool(conv5,3,2)
layer , num_features = flatten_layer(max_pool3)
fc1 = new_fc_layer(layer,num_features,4096)
fc1 = tf.nn.relu(fc1)
fc1 = tf.nn.dropout(fc1,drop)
fc2 = new_fc_layer(fc1,4096,4096)
fc2 = tf.nn.relu(fc2)
fc2 = tf.nn.dropout(fc2,drop)
out = new_fc_layer(fc2,4096,2)
return out #, tf.nn.softmax(out)
def read_and_decode(tfrecords_file, batch_size):
'''read and decode tfrecord file, generate (image, label) batches
Args:
tfrecords_file: the directory of tfrecord file
batch_size: number of images in each batch
Returns:
image: 4D tensor - [batch_size, width, height, channel]
label: 1D tensor - [batch_size]
'''
# make an input queue from the tfrecord file
filename_queue = tf.train.string_input_producer([tfrecords_file])
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
img_features = tf.parse_single_example(
serialized_example,
features={
'label': tf.FixedLenFeature([], tf.int64),
'image_raw': tf.FixedLenFeature([], tf.string),
})
image = tf.decode_raw(img_features['image_raw'], tf.uint8)
##########################################################
# you can put data augmentation here, I didn't use it
##########################################################
# all the images of notMNIST are 28*28, you need to change the image size if you use other dataset.
image = tf.reshape(image, [227, 227])
label = tf.cast(img_features['label'], tf.int32)
image_batch, label_batch = tf.train.batch([image, label],
batch_size= batch_size,
num_threads= 1,
capacity = 6000)
return tf.reshape(image_batch,[batch_size,227*227*1]), tf.reshape(label_batch, [batch_size])
pred = AlexNet(x,drop_p,img_size) #pred
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred,labels=y))
optimiser = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(loss)
correct_pred = tf.equal(tf.argmax(pred,1),tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32))
cost = tf.summary.scalar('loss',loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
merge_summary = tf.summary.merge_all()
summary_writer = tf.summary.FileWriter('./AlexNet',graph = tf.get_default_graph())
tf_record_file = 'train.tfrecords'
x_val ,y_val = read_and_decode(tf_record_file,20)
y_val = tf.one_hot(y_val,depth=2,on_value=1,off_value=0)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
x_val = x_val.eval()
y_val = y_val.eval()
epoch = 2
for i in range(epoch):
_, summary= sess.run([optimiser,merge_summary],feed_dict={x:x_val,y:y_val,drop_p:drop})
summary_writer.add_summary(summary,i)
loss_a,accu = sess.run([loss,accuracy],feed_dict={x:x_val,y:y_val,drop_p:1.0})
print "Epoch "+str(i+1) +', Minibatch Loss = '+ \
"{:.6f}".format(loss_a) + ', Training Accuracy = '+ \
'{:.5f}'.format(accu)
print "Optimization Finished!"
tf_record_file1 = 'test.tfrecords'
x_v ,y_v = read_and_decode(tf_record_file1,10)
y_v = tf.one_hot(y_v,depth=2,on_value=1,off_value=0)
coord1 = tf.train.Coordinator()
threads1 = tf.train.start_queue_runners(coord=coord1)
x_v = sess.run(x_v)
y_v = sess.run(y_v)
print "Testing Accuracy : "
print sess.run(accuracy,feed_dict={x:x_v,y:y_v,drop_p:1.0})
coord.request_stop()
coord.join(threads)
coord1.request_stop()
coord1.join(threads1)

Take a look at what a confusion matrix is. It is a performance evaluation tool. In addition, you should compare your precision with your recall. Precision is the accuracy of your positive predictions, and recall is the ratio of positive instances that are correctly detected by the classifier. By combining precision and recall you get the F_1 score, which is key in evaluating the problems of your model.
I would suggest you pick up the text Hands-On Machine Learning with Scikit-Learn and TensorFlow. It is a truly comprehensive book and covers what I describe above in more detail.
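The quantities described above can be computed from the four cells of a binary confusion matrix. The sketch below does this in plain Python on made-up labels; the function names are illustrative, and class 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and their harmonic mean (the F1 score)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # made-up ground truth
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # made-up predictions

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(counts, precision, recall, f1)
```

Unlike plain accuracy, these numbers show separately how often a positive prediction is right and how many of the true positives are found.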

What are the problems that cause neural networks to stagnate in learning?

I was trying to see how accurately a neural network can approximate simple functions, like a scalar-valued polynomial in several variables. So I had these ideas:
1. Fix a polynomial of several variables, say f(x_1,..,x_n).
2. Generate 50000 vectors of length n using numpy.random, which will serve as training data.
3. Evaluate f(x) at these points; the values will be used as labels.
4. Make test data and labels in the same way.
5. Write a neural network and see how accurately it can approximate f(x) on the test set.
Here is my sample neural network implemented in TensorFlow:
import tensorflow as tf
import numpy as np
input_vector_length = int(10)
output_vector_length = int(1)
train_data_size = int(50000)
test_data_size = int(10000)
train_input_domain = [-10, 10] #Each component in an input vector is between -10 and 10
test_input_domain = [-10, 10]
iterations = 20000
batch_size = 200
regularizer = 0.01
sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[None, input_vector_length], name="x")
y = tf.placeholder(tf.float32, shape =[None, output_vector_length], name="y")
function = tf.reduce_sum(x, 1) + 0.25*tf.pow(tf.reduce_sum(x,1), 2) + 0.025*tf.pow(tf.reduce_sum(x,1), 3)
#make train data input
train_input = (train_input_domain[1]-train_input_domain[0])*np.random.rand(train_data_size, input_vector_length) + train_input_domain[0]
#make train data label
train_label = sess.run(function, feed_dict = {x : train_input})
train_label = train_label.reshape(train_data_size, output_vector_length)
#make test data input
test_input = (test_input_domain[1]-test_input_domain[0])*np.random.rand(test_data_size, input_vector_length) + test_input_domain[0]
#make test data label
test_label = sess.run(function, feed_dict = {x : test_input})
test_label = test_label.reshape(test_data_size, output_vector_length)
def weight_variables(shape, name):
    initial = 10*tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)
def bias_variables(shape, name):
    initial = 10*tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)
def take_this_batch(data, batch_index=[]):
    # pick the rows named by batch_index
    # (indexing with the loop counter alone would always return the first rows)
    A = []
    for i in range(len(batch_index)):
        A.append(data[batch_index[i]])
    return A
W_0 = weight_variables(shape=[input_vector_length, 10], name="W_0")
B_0 = bias_variables(shape=[10], name="B_0")
y_1 = tf.sigmoid(tf.matmul(x, W_0) + B_0)
W_1 = weight_variables(shape=[10, 20], name="W_1")
B_1 = bias_variables(shape=[20], name="B_1")
y_2 = tf.sigmoid(tf.matmul(y_1, W_1) + B_1)
W_2 = weight_variables(shape=[20,40], name="W_2")
B_2 = bias_variables(shape=[40], name="B_2")
y_3 = tf.sigmoid(tf.matmul(y_2, W_2) + B_2)
keep_prob = tf.placeholder(tf.float32, name="keep_prob")
y_drop = tf.nn.dropout(y_3, keep_prob)
W_output = weight_variables(shape=[40, output_vector_length], name="W_output")
B_output = bias_variables(shape=[output_vector_length], name="B_output")
y_output = tf.matmul(y_drop, W_output) + B_output
weight_sum = tf.reduce_sum(tf.square(W_0)) + tf.reduce_sum(tf.square(W_1)) + tf.reduce_sum(tf.square(W_2)) + tf.reduce_sum(tf.square(W_output))
cost = tf.reduce_mean(tf.square(y - y_output)) + regularizer*(weight_sum)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
error = cost
sess.run(tf.initialize_all_variables())
with sess.as_default():
    for step in range(iterations):
        batch_index = np.random.randint(low=0, high=train_data_size, size=batch_size)
        batch_input = take_this_batch(train_input, batch_index)
        batch_label = take_this_batch(train_label, batch_index)
        train_step.run(feed_dict = {x : batch_input, y:batch_label, keep_prob:0.5})
        if step % 1000 == 0:
            current_error = error.eval(feed_dict = {x:batch_input, y:batch_label, keep_prob:1.0})
            print("step %d, Current error is %f" % (step,current_error))
    print(error.eval(feed_dict={x:test_input, y:test_label, keep_prob:1.0}))
Simply put, the performance of this neural network is horrifying! My neural network has three hidden layers of size 10, 20 and 40. The input layer is of size 10, and the output layer has size 1. I used a simple L^2 cost function, and I regularized it with the square of the weights and a regularizer of 0.01.
During the training stage, I noticed that the error seems to get stuck and refuses to go down. I am wondering what could be going wrong. Thanks a lot for reading this long question. Any suggestion is appreciated.
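One thing that can be read straight off the setup above is the scale of the labels. Each of the 10 inputs lies in [-10, 10], so their sum s lies in [-100, 100], and the target is s + 0.25*s^2 + 0.025*s^3. The sketch below evaluates that polynomial at the two extremes; it only illustrates the range the squared-error loss has to cope with and is not a diagnosis of the training problem.

```python
def target(s):
    """The polynomial from the question, written in terms of s = sum(x)."""
    return s + 0.25 * s ** 2 + 0.025 * s ** 3

input_vector_length = 10   # from the question
low, high = -10, 10        # input domain from the question

s_min = input_vector_length * low    # smallest possible sum
s_max = input_vector_length * high   # largest possible sum

label_at_min = target(s_min)
label_at_max = target(s_max)
print(label_at_min, label_at_max)
```

Labels this extreme are rare, since sums of independent uniform values concentrate near zero, but they show how wide the target range is compared with the inputs.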

Since you are using sigmoid as the activation function in the hidden layers, the value at these neurons is reduced to the range of (0,1). Hence, it is a good idea to normalize the input data for this network.
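A minimal sketch of that normalization step, assuming the inputs are standardized column-wise with statistics computed on the training set only (one common choice; the answer does not say which scheme it has in mind). The function name standardize and the small random arrays are illustrative.

```python
import numpy as np

def standardize(train, test, eps=1e-8):
    """Scale each feature to zero mean and unit variance using training statistics."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # eps guards against division by zero for constant features
    return (train - mean) / (std + eps), (test - mean) / (std + eps)

rng = np.random.RandomState(0)
# Same domain as the question: each component uniform in [-10, 10].
train_input = 20 * rng.rand(1000, 10) - 10
test_input = 20 * rng.rand(200, 10) - 10

train_norm, test_norm = standardize(train_input, test_input)
print(train_norm.mean(axis=0).round(3), train_norm.std(axis=0).round(3))
```

Reusing the training mean and standard deviation for the test set keeps both sets on the same scale.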
