Fine Tuning dimensionality error

I am trying to use ResNet50 for an image classification problem. However, it shows an error that I could not fix.
RuntimeError: inconsistent tensor size, expected tensor [120 x 2048] and src [1000 x 2048] to have the same number of elements, but got 245760 and 2048000 elements respectively at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorCopy.c:86
The error happens on the line below:
self.resnet = models.resnet50(num_classes=num_breeds, pretrained='imagenet')
The model is below:
class Resnet(nn.Module):
    def __init__(self):
        super(Resnet, self).__init__()
        self.resnet = models.resnet50(num_classes=num_breeds, pretrained='imagenet')
        #self.resnet = nn.Sequential(*list(resnet.children())[:-2])
        #self.fc = nn.Linear(2048,num_breeds)

    def forward(self, x):
        x = self.resnet(x)
        return x

When you create your models.resnet50 with num_classes=num_breeds, the last layer is a fully connected layer from 2048 to num_classes (which is 120 in your case).
Having pretrained='imagenet' asks PyTorch to load all the corresponding weights into your network, but the pretrained network's last layer has 1000 classes, not 120. This is the source of the error: your 120x2048 weight tensor doesn't match the loaded 1000x2048 weights.
You should either create your network with 1000 classes, load the weights, and then "trim" it down to the classes you want to keep; or create the network you wanted with 120 classes, but load the weights manually. In the latter case, you only need to pay specific attention to the last layer.
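A minimal sketch of both options, assuming torchvision's models module (where pretrained=True loads the ImageNet weights) and the 120 breeds from the question:

import torch.nn as nn
from torchvision import models

num_breeds = 120

# Option 1: load the 1000-class pretrained network, then replace ("trim") the head
net = models.resnet50(pretrained=True)              # fc layer: 2048 -> 1000, weights loaded
net.fc = nn.Linear(net.fc.in_features, num_breeds)  # fc layer: 2048 -> 120, fresh weights

# Option 2: build the 120-class network and copy every pretrained weight except the head
net2 = models.resnet50(num_classes=num_breeds)
pretrained = models.resnet50(pretrained=True).state_dict()
state = net2.state_dict()
state.update({k: v for k, v in pretrained.items() if not k.startswith('fc.')})
net2.load_state_dict(state)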

Pytorch reshape to modify batch size in forward()

I have a tensor of shape, say, [4,10], where 4 is the batch size and 10 is the length of my input samples buffer. Now, I know that it is really [4,5+5], i.e. the input samples buffer consists of two windows of length 5 which can be processed independently and, best, in parallel.

What I am doing is, inside forward() of my model, I first reshape the tensor to [8,5], run my layers on it, and then reshape it back to [4,-1] and return. What I am hoping to get from this is that PyTorch would run my model on each of the windows (kind of sub-batches) in parallel, effectively yielding a parallel-for loop.

It runs OK, PyTorch does not complain or anything, but I am getting weird results. I'd like to know if PyTorch can work this way before I dive into debugging my model.
Well, it doesn't. The reason is the ordering PyTorch uses for tensor reshaping. This can be seen by running the small repro code below.
It would be nice to have something like a 'rebatch' function in PyTorch that would take care of proper memory layout as a foundation for parallel-for constructs (provided it can even be done memory-efficiently in the generic case).
import torch

conv = torch.nn.Conv1d(1, 3, 1)

def conv_batch(t, conv, window):
    batch = t.shape[0]
    t = t.view(-1, t.shape[1], window)
    t = conv(t)
    t = t.view(batch, t.shape[1], -1)
    return t

batch = 1
channels = 1
width = 4
window = 2

x = torch.arange(batch * channels * width)
x = x.view(batch, channels, width).float()

r1 = conv(x)
r2 = conv_batch(x, conv, window)

print(r1)
print(r2)
print(r1 == r2)
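For reference, here is a sketch of a reshape that does preserve the layout (an illustration of what such a 'rebatch' helper could do, not a built-in). Note that even with correct reshaping, a kernel wider than 1 cannot see across window boundaries, so the results only match exactly for pointwise convolutions like the repro's:

def conv_batch_fixed(t, conv, window):
    # split the width into windows: (B, C, W) -> (B*n_win, C, window),
    # keeping each window contiguous per channel
    b, c, w = t.shape
    n_win = w // window
    t = t.view(b, c, n_win, window).permute(0, 2, 1, 3).contiguous().view(b * n_win, c, window)
    t = conv(t)
    # merge the windows back: (B*n_win, C_out, out_w) -> (B, C_out, n_win*out_w)
    c_out, out_w = t.shape[1], t.shape[2]
    t = t.view(b, n_win, c_out, out_w).permute(0, 2, 1, 3).contiguous().view(b, c_out, n_win * out_w)
    return t

r3 = conv_batch_fixed(x, conv, window)
print(r1 == r3)  # all True now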

cnn max pooling - non-consecutive sliding window (skip-gram like)?

When using Keras to build a simple CNN like the code below, and when it is used on text-based problems such as document classification, I understand that this is as if we are extracting 4-grams from the text (kernel_size of 4) and using them as features.
model = Sequential()
model.add(embedding_layer)
model.add(Conv1D(filters=100, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=4))
model.add(Dense(4, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
And in this case, the kernel size in the Conv1D layer is like a sliding window of size 4 that walks over sequences of tokens in the text to emit 4-grams.
I wonder if there is a way to create a 'non-consecutive' sliding window in the convolution, i.e., one that would generate the 'skip-gram' equivalent. So for example, given the following 1d vector:
[a, b, c, d, e, f]
a conv1d with a kernel_size=3 skip=1 will scan the following sequences:
[(a,c,d),(b,d,e),(c,e,f),(d,f,padding),(e,padding,padding)] union [(a,b,d),(b,c,e),(c,d,f),(d,e,padding),(e,f,padding),(f,padding,padding)]
The reason I say 'union' is simply because I suppose that, from the implementation point of view, it may be easier to generate either part 1 or part 2, given another parameter for the revised Conv1D layer. And if that's the case and doable, I can work around this by concatenating multiple layers. But the minimum is really to have an extended Conv1D layer that takes additional parameters such that it does either the first or the second part of the scanning.
The idea is not new, as this paper already experimented with it: http://www.aclweb.org/anthology/D/D16/D16-1085.pdf
But excuse my lack of in-depth knowledge of Keras; I do not know how to implement it. Any suggestions, please?
Many thanks in advance.
You can do this by creating a custom convolutional layer where certain elements of the weight matrix are zero.
You can take the regular Conv1D layer as the base class.
But before doing this, notice that you can create a "dilated" convolution by passing the dilation_rate=n parameter when creating a regular convolutional layer. This will skip n-1 grams between each taken gram in the window. Your window will have fixed regular spaces.
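For instance, using only standard Conv1D arguments:

from keras.layers import Conv1D

# a window of 3 grams taken at positions t, t+2, t+4: one gram skipped between each tap
dilated = Conv1D(filters=100, kernel_size=3, dilation_rate=2, padding='same', activation='relu')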
Creating a custom layer for that:
import keras.backend as K
from keras.layers import Conv1D

#a 1D convolution that skips some entries
class SkipConv1D(Conv1D):

    #in the init, let's just add a parameter to tell which grams to skip
    def __init__(self, validGrams, **kwargs):

        #for this example, I'm assuming validGrams is a list
        #it should contain zeros and ones, where 0's go on the skip positions
        #example: [1,1,0,1] will skip the third gram in the window of 4 grams
        assert len(validGrams) == kwargs.get('kernel_size')
        self.validGrams = K.reshape(K.constant(validGrams), (len(validGrams), 1, 1))
        #the chosen shape matches the dimensions of the kernel
        #the first dimension is the kernel size, the others are input and output channels

        #initialize the regular conv layer:
        super(SkipConv1D, self).__init__(**kwargs)
        #here, the filters, size, etc, go inside kwargs, so you should use them named
        #but you may make them explicit in this __init__ definition
        #if you think it's more comfortable to use it like this

    #in the build method, let's replace the original kernel:
    def build(self, input_shape):
        #build as the original layer:
        super(SkipConv1D, self).build(input_shape)

        #replace the kernel
        self.originalKernel = self.kernel
        self.kernel = self.validGrams * self.originalKernel
Be aware of some things that weren't taken care of in this answer:
The method get_weights() will still return the original kernel, not the kernel with the skipped mask. (It's possible to fix this, but it requires extra work; if necessary, please tell me.)
There are unused weights in this layer. This is a simple implementation; the focus here was to keep it as similar as possible to an existing Conv layer, with all its features. It's also possible to use only the strictly necessary weights, but this would increase the complexity a lot and require rewriting much of the original Keras code to recreate all the original possibilities.
If your kernel_size is very long, it will be tedious to define the validGrams var. You may want to create a version that takes some skipped indices and then converts them into the type of list used above.
Different channels skipping different grams:
It's possible to do this inside a layer as well, if instead of using a validGrams with shape (length,), you use one with shape (length,outputFilters).
In this case, at the point where we create the validGrams matrix, we should reshape it like:
validGrams = np.asarray(validGrams)
shp = (validGrams.shape[0],1,validGrams.shape[1])
validGrams = validGrams.reshape(shp)
self.validGrams = K.constant(validGrams)
You can also simply use many parallel SkipConv1D with different parameters and then concatenate their results.
inputs = Input(yourInputShape)
out = embedding_layer(inputs)
out1 = SkipConv1D(filters=50,kernel_size=4,validGrams=[1,0,1,1])(out)
out2 = SkipConv1D(filters=50,kernel_size=4,validGrams=[1,1,0,1])(out)
out = Concatenate()([out1,out2]) #if using 'channels_first' use Concatenate(axis=1)
out = MaxPooling1D(pool_size=4)(out)
out = Dense(4, activation='softmax')(out)
model = Model(inputs,out)

How to tie word embedding and softmax weights in keras?

It's commonplace for various neural network architectures in NLP and vision-language problems to tie the weights of an initial word embedding layer to those of an output softmax. Usually this produces a boost to sentence generation quality. (see example here)
In Keras it's typical to create word embedding layers using the Embedding class; however, there seems to be no easy way to tie the weights of this layer to the output softmax. Would anyone happen to know how this could be implemented?
Be aware that Press and Wolf don't propose to freeze the weights to some pretrained ones, but to tie them. That means ensuring that the input and output weights are always the same during training (in the sense of synchronized).
In a typical NLP model (e.g. language modelling/translation), you have an input dimension (vocabulary) of size V and a hidden representation size H. Then, you start with an Embedding layer, which is a matrix VxH. And the output layer is (probably) something like Dense(V, activation='softmax'), which is a matrix H2xV. When tying the weights, we want those matrices to be the same (therefore, H==H2).
For doing this in Keras, I think the way to go is via shared layers:
In your model, you need to instantiate a shared embedding layer (of dimension VxH) and apply it to both your input and output. But you need to transpose it to get the desired output dimensions (HxV). So, we declare a TiedEmbeddingsTransposed layer, which transposes the embedding matrix from a given layer (and applies an activation function):
import keras.backend as K
from keras import activations
from keras.layers import Layer

class TiedEmbeddingsTransposed(Layer):
    """Layer for tying embeddings in an output layer.
    A regular embedding layer has the shape: V x H (V: size of the vocabulary. H: size of the projected space).
    In this layer, we'll go: H x V.
    With the same weights as the regular embedding.
    In addition, it may have an activation.
    # References
        - [Using the Output Embedding to Improve Language Models](https://arxiv.org/abs/1608.05859)
    """

    def __init__(self, tied_to=None,
                 activation=None,
                 **kwargs):
        super(TiedEmbeddingsTransposed, self).__init__(**kwargs)
        self.tied_to = tied_to
        self.activation = activations.get(activation)

    def build(self, input_shape):
        self.transposed_weights = K.transpose(self.tied_to.weights[0])
        self.built = True

    def compute_mask(self, inputs, mask=None):
        return mask

    def compute_output_shape(self, input_shape):
        return input_shape[0], K.int_shape(self.tied_to.weights[0])[0]

    def call(self, inputs, mask=None):
        output = K.dot(inputs, self.transposed_weights)
        if self.activation is not None:
            output = self.activation(output)
        return output

    def get_config(self):
        config = {'activation': activations.serialize(self.activation)}
        base_config = super(TiedEmbeddingsTransposed, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
The usage of this layer is:
# Declare the shared embedding layer
shared_embedding_layer = Embedding(V, H)
# Obtain word embeddings
word_embedding = shared_embedding_layer(input)
# Do stuff with your model
# Compute output (e.g. a vocabulary-size probability vector) with the shared layer:
output = TimeDistributed(TiedEmbeddingsTransposed(tied_to=shared_embedding_layer, activation='softmax'))(intermediate_rep)
I have tested this in NMT-Keras and it trains properly. But when I try to load a trained model, I get an error related to the way Keras loads models: it doesn't load the weights from the tied_to layer. I've found several questions regarding this (1, 2, 3), but I haven't managed to solve the issue. If someone has any ideas on the next steps to take, I'd be very glad to hear them :)
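(A common Keras workaround, stated here as an assumption rather than a verified fix for NMT-Keras: rebuild the architecture in code so that tied_to points at the freshly created embedding layer, then restore only the weights.)

# build_model is a hypothetical function that re-creates the Embedding and
# TiedEmbeddingsTransposed layers exactly as they were at training time
model = build_model()
model.load_weights('trained_model.h5')  # restores weights into the rebuilt graph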
As you may read here, you should simply set the trainable flag to False. E.g.
aux_output = Embedding(..., trainable=False)(input)
....
output = Dense(nb_of_classes, .. ,activation='softmax', trainable=False)

How to use a trained CNN model for object identification in Tensorflow

I have a CNN model that is trained using a set of 120 pictures.
The images are converted into TFRecords and labeled with this method:
def write_records_file(dataset, record_location):
    """
    dataset : dict(list)
        Dictionary with each key being a label for the list of image filenames of its value.
    record_location : str
        Location to store the TFRecord output.
    """
    writer = None

    # Enumerating the dataset because the current index is used to break up the files if they get over 100
    current_index = 0
    for breed, images_filenames in dataset.items():
        for image_filename in images_filenames:
            if current_index % 100 == 0:
                if writer:
                    writer.close()
                record_filename = "{record_location}-{current_index}.tfrecords".format(
                    record_location=record_location,
                    current_index=current_index)
                writer = tf.python_io.TFRecordWriter(record_filename)
            current_index += 1

            image_file = tf.read_file(image_filename)
            image = tf.image.decode_jpeg(image_file)
            grayscale_image = tf.image.rgb_to_grayscale(image)
            resized_image = tf.image.resize_images(grayscale_image, 250, 151)
            image_bytes = sess.run(tf.cast(resized_image, tf.uint8)).tobytes()
            image_label = breed.encode("utf-8")

            example = tf.train.Example(features=tf.train.Features(feature={
                'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_label])),
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes]))
            }))
            writer.write(example.SerializeToString())

write_records_file(testing_dataset, "./output/testing-images/testing-image")
write_records_file(training_dataset, "./output/training-images/training-image")
The whole model+training script ends with train_prediction = tf.nn.softmax(final_fully_connected), and I get two sets of .tfrecords files as output (training and test).
Now suppose you have a picture and want to know which of the 120 sample pictures is the most similar, in order to identify it. How do I have to proceed?
The train_prediction tensor has this format: shape=(3, 120), dtype=float32. 120 is the total number of categories.
Unfortunately, the book I'm reading gives no indication; the chapter ends with this trained model that I don't know how to use in a real application, and searching the internet turns up many similar samples that end at the same point.
I didn't understand your question. By 120 samples, do you mean there are 120 classes with a few example images per class? In that case, this is a classification problem where you train a model with an image as input and, as output, the probability that the input belongs to one of the 120 classes (+ an extra class for when it doesn't belong to any of those). In this case, the fully connected layer is fed into a softmax function that outputs this probability for the 120 classes, which is essentially a vector of length 120. But for this, you need multiple training examples per class.
In case you just have 120 images and you need to know a test image's similarity to one of these images, you can take the output vector of the layer before the fully connected layer for those 120 images, using a neural network that is already trained on a larger sample of images (like the Inception model), and then compute the similarity of the test image's vector with those 120 vectors.
I have hosted a simple-to-use demo using the above technique here. Check if it works out for you; otherwise, refine the problem statement further.
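A minimal sketch of that nearest-neighbour lookup, assuming a hypothetical extract_features function that runs an image through a pretrained network and returns the penultimate-layer activations as a 1-D numpy array:

import numpy as np

def cosine_similarity(a, b):
    # cosine similarity between two 1-D feature vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# extract_features is an assumed helper: pretrained CNN in, feature vector out
gallery = [extract_features(img) for img in sample_images]  # the 120 reference images
query = extract_features(test_image)

# index of the most similar reference image
best = max(range(len(gallery)), key=lambda i: cosine_similarity(query, gallery[i]))
print("most similar sample image:", best)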

MLP giving inaccurate results

I tried to build a simple MLP with 2 hidden layers and 3 output classes.
What I have done in the model is:
Input images are 120x120 RGB images, flattened to size 3 * 120 * 120.
2 hidden layers of size 100.
ReLU activation is used.
The output layer has 3 neurons.
Code
def model(input, weights, biases):
    l_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    l_1 = tf.nn.relu(l_1)
    l_2 = tf.add(tf.matmul(l_1, weights['h2']), biases['b2'])
    l_2 = tf.nn.relu(l_2)
    out = tf.matmul(l_2, weights['out']) + biases['out']
    return out
Optimizer
pred = model(input_batch, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.GradientDescentOptimizer(rate).minimize(cost)
The model, however, does not work: the accuracy is only equal to that of a random model.
The example followed is this one:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/multilayer_perceptron.py
You have a copy-paste typo in def model: the first argument is named input, but the next line uses x.
Another trick to use when you suspect a model is not being trained is to run it on the same batch again and again. If the implementation is correct and the model is being trained, it will soon learn that batch by heart, yielding 100% accuracy. If it does not, that is an indicator that something is wrong in your implementation.
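A minimal sketch of that sanity check against the graph above, assuming x and y are the input/label placeholders and fixed_x, fixed_y (hypothetical names) hold one batch of data:

# feed the very same batch at every step; a correct model should memorize it
correct = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        _, acc = sess.run([optimizer, accuracy],
                          feed_dict={x: fixed_x, y: fixed_y})
    print("accuracy on the memorized batch:", acc)
# anything far from 1.0 here points to a bug in the implementation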
