Fine-tune a model with larger input size - machine-learning

I was wondering: does it make sense to fine-tune a model with a larger input size? Ideally, these are the properties I would like to have:
Fine-tune: meaning reuse the weights from pre-training.
Larger input size: not down-sampling the images before feeding them into the model. Maybe use a larger stride instead?
Specifically, I'm trying to fine-tune InceptionV3 in Keras with my specific label set. I want a larger input size since I hope the model can implicitly learn some important characteristics. With InceptionV3's default size (299x299) this doesn't sound possible to me.
But that sounds like I would have to change the specific model that I'm re-using (say by modifying specific layers in the model architecture), and then re-using the pre-trained weights doesn't make sense?

If you want to fine-tune a classification model, usually you would remove a few of the top layers, which act as the classifier, and add your own layers. This is the same when fine-tuning the Inception_V3 model: you can remove the top layers and add your own classifier with the desired number of units (i.e. the number of classes in your dataset). For example:
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense
from keras.models import Model

# let's say our images are of size (1000, 1000, 3)
inc_v3 = InceptionV3(include_top=False, input_shape=(1000, 1000, 3), pooling='avg')
# add your desired layers to the top;
# we only add one layer just for illustration,
# but you can add as many layers as you want
out = Dense(num_classes, activation='softmax')(inc_v3.output)
# construct the new model
model = Model(inc_v3.input, out)
However, note that for fine-tuning you first need to freeze all the base layers (i.e. the layers of the Inception_V3 model), so that only your new classifier is trained at first. Further, instead of adding a pooling layer at the top (i.e. pooling='avg'), you can also use other alternatives such as a Flatten layer.
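For example, a minimal sketch of the freezing step, continuing the code above (the optimizer and loss here are placeholder choices, not recommendations from the original answer):

# freeze every layer of the pre-trained base so that only the new
# classifier head is updated during the first training phase
for layer in inc_v3.layers:
    layer.trainable = False
# compile after changing the trainable flags so that they take effect
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])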
Further, I recommend reading the relevant official Keras tutorial: Building powerful image classification models using very little data (the second and third sections are most relevant to this).

Related

How do masks and images work with each other in UNET?

Let's say we have 1000 images with their corresponding masks. Correct me if I am wrong: if we use UNET, the images will pass through a number of different convolutional layers, ReLU activations, pooling, etc. The network will learn the features of the images according to their corresponding masks. It will assign labels to objects and learn the features of the images we pass during training. It will match the objects in an image with the corresponding mask, so that it learns only the features of the relevant objects and not the features of unnecessary objects. For example, if we pass an image of a cat whose background is filled with irrelevant obstacles (bins, tables, chairs, etc.),
then according to the cat's mask it will learn the features of the cat only. Kindly elaborate in your answer if I am wrong.
Yes, you are right.
However, not only UNET: every segmentation algorithm works in the same way, in that it learns to detect the features that are masked and to ignore unnecessary objects (as you mentioned).
By the way, people typically choose Fast R-CNN or YOLO over UNET for multiclass segmentation of real-world objects (like chairs, tables, cats, cars, etc.).
So here is a short explanation (though not an exhaustive one).
1- Every segmentation network, or let's say task (in more general terms), uses the actual image and the ground truth (your masks) to learn a classification task.
Is it really a classification task like logistic regression or a decision tree? (Then why the hell such a complex name?)
Ans: Cool, intrinsically YES, your network is learning to classify. But it's a bit different from your decision tree or logistic regression.
So a network like UNET tries to learn how to classify each pixel in the image. And this learning is completely supervised, as you have ground truth (masks) which tells the network which class each pixel in the image belongs to. Hence, during training the network weights (the weights of all your conv layers and so on) are adjusted such that it learns to classify each pixel in the image into its corresponding class.
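To make the "classify each pixel" idea concrete, here is a minimal, hypothetical Keras sketch of a segmentation head (not a full UNET; the layer sizes are placeholders): the network outputs a per-pixel softmax over the classes and is trained with pixel-wise cross-entropy against the one-hot encoded masks.

from keras.layers import Input, Conv2D
from keras.models import Model

num_classes = 2  # e.g. cat vs. background

inp = Input(shape=(128, 128, 3))
x = Conv2D(16, 3, padding='same', activation='relu')(inp)
x = Conv2D(16, 3, padding='same', activation='relu')(x)
# a 1x1 convolution with softmax gives a class probability for every pixel
out = Conv2D(num_classes, 1, padding='same', activation='softmax')(x)

model = Model(inp, out)
# pixel-wise cross-entropy against the one-hot encoded masks
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(images, one_hot_masks, ...)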

How to add new vector to Keras embedding matrix

Background
I am using an embedding layer for a categorical data column in Keras.
My understanding is that an embedding layer is simply a matrix which consists of trainable vectors, each mapped to an index.
My problem
After training has finished, I want to add a new index-vector pair to the embedding matrix.
(The vector is generated by me, no training involved at this stage.)
How do I do this?
I wish to use the newly added embedding in predictions as well.
Code
keras.layers.Embedding(number_of_categories, embedding_size, input_length=1)
I am especially stuck, since the number of categories is coded into the model architecture. Is there any way around this?
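One possible approach (a hedged sketch, not an answer from the original thread): copy the trained embedding weights into a new, larger Embedding layer whose extra row holds your hand-crafted vector. All the names below (trained_model, the layer name 'embedding', my_new_vector) are hypothetical, and the rest of the model would still need to be rewired on top of the new layer.

import numpy as np
from keras.layers import Input, Embedding
from keras.models import Model

# trained weights: shape (number_of_categories, embedding_size)
old_weights = trained_model.get_layer('embedding').get_weights()[0]
# append the hand-crafted vector as one extra row
new_weights = np.vstack([old_weights, my_new_vector.reshape(1, -1)])

# rebuild the layer with one more index and the combined weight matrix
inp = Input(shape=(1,))
emb = Embedding(number_of_categories + 1, embedding_size,
                input_length=1, weights=[new_weights])(inp)
new_embedding_model = Model(inp, emb)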

How to use mid-level fine tuning in Keras?

My task is to adapt a pre-trained network from Keras for classification of aerial images (we have a database of 30 categories of aerial images, each containing 200-400 images).
Now, what I don't really understand is this next part.
We must use mid-level fine tuning using a smaller image database, which contains 21 aerial categories.
How can I achieve this?
Should I first fine-tune a VGG16 network on the smaller database, save the model, and then train it on the larger database?
I'm guessing that they want you to fine-tune a trained model by freezing its first X layers and only updating the weights of the last few layers (maybe just the last one, not sure what "mid-level fine-tuning" means).
You need to take your trained model and replace its last layer (the one with 30 outputs) with a new layer of 21 outputs. Then you need to freeze all the other layers (except the new one) and train the model on the new dataset.
In Keras you just need to set trainable=False for each layer you want to freeze.
How can I "freeze" Keras layers?
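A minimal sketch of that recipe (assuming a trained Keras model named trained_model whose last layer is the 30-way classifier; all names are placeholders):

from keras.layers import Dense
from keras.models import Model

# take everything up to, but not including, the old 30-way output layer
penultimate = trained_model.layers[-2].output

# new classifier head for the 21-category dataset
new_out = Dense(21, activation='softmax')(penultimate)
new_model = Model(trained_model.input, new_out)

# freeze every layer except the newly added one
for layer in new_model.layers[:-1]:
    layer.trainable = False

new_model.compile(optimizer='sgd', loss='categorical_crossentropy')
# new_model.fit(small_dataset_images, small_dataset_labels, ...)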

Reduce big dataset (like open images) for object detection of few classes

I'm quite new to machine learning and am trying to write my own neural network for object detection. I have downloaded the Open Images dataset, but I only need a very small set of classes to be detected. Therefore I want to reduce the size of the dataset, because I don't want the resulting model to detect a pizza or other objects that are not relevant to my specific task.
I am not sure whether I should delete only the labels of the classes that I don't need, or also all images that are labeled with these classes. In fact I only need 3 classes, but the dataset contains around 600 different classes with bounding boxes. And I don't really know how I should modify the dataset so that it becomes reasonable for training my model.
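One common approach (a hedged sketch, not an answer from the original thread) is to keep only the box annotations for the wanted classes and then keep only the images that still have at least one box left. The sketch below assumes Open-Images-style CSV annotations with ImageID and LabelName columns; the file names and label IDs are placeholders.

import pandas as pd

# placeholder label IDs for the 3 classes you actually need
wanted_labels = {'/m/xxxxx1', '/m/xxxxx2', '/m/xxxxx3'}

boxes = pd.read_csv('train-annotations-bbox.csv')
# keep only the box annotations belonging to the wanted classes
boxes = boxes[boxes['LabelName'].isin(wanted_labels)]
boxes.to_csv('train-annotations-bbox-reduced.csv', index=False)

# keep (or download) only the images that still have at least one box
wanted_images = set(boxes['ImageID'])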

Reduce dimensions of model's fully connected layer for image retrieval task

I'm working on an image retrieval task (not involving faces), and one of the things I am trying is to swap out the softmax layer in the CNN model and use the LMNN classifier. For this purpose I fine-tuned the model and then extracted the features at the fully connected layer. I have about 3000 images right now. The fully connected layer gives a 4096-dim vector, so my final feature matrix is 3000x4096, with about 700 classes (each class has 2+ images). I believe this dimensionality is extremely large, and the LMNN algorithm is going to take forever on it (it really did take forever).
How can I reduce the number of dimensions? I tried PCA, but that didn't squeeze down the dimensions much (it got down to 3000x3000). I am thinking a 256/512/1024-dim vector should help. If I were to add another layer to reduce dimensions, say a new fully connected layer, would I have to fine-tune my network again? Input on how to do that would be great!
I am also currently trying to augment my data to get more images per class and increase the size of my dataset.
Thank you.
PCA should let you reduce the data further: you can specify the desired dimensionality explicitly (with only 3000 samples, PCA gives you at most 3000 components, which is why you ended up at 3000x3000) - see the Wikipedia article.
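For example, a minimal scikit-learn sketch (the variable name features is a placeholder for the extracted activations):

from sklearn.decomposition import PCA

# features: array of shape (3000, 4096) from the fully connected layer
pca = PCA(n_components=512)  # request the target dimensionality explicitly
reduced_features = pca.fit_transform(features)  # shape (3000, 512)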
As well as PCA you can try t-distributed stochastic neighbor embedding (t-SNE). I really enjoyed Wattenberg, et al.'s article - worth a read if you want to get an insight into how it works and some of the pitfalls.
In a neural net the standard way to reduce dimensionality is by adding more, smaller layers, as you suggested. As they can only learn during training, you'll need to re-run your fine-tuning. Ideally you would re-run the entire training process if you make a change to the model structure but if you have enough data it may be OK still.
To add new layers in TensorFlow, you would add a fully connected layer whose input is the output of your 4096-element layer, and whose output size is the desired number of elements. You may repeat this if you want to go down gradually (e.g. 4096 -> 1024 -> 512). You would then perform your training (or fine-tuning) again.
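For illustration, here is a hedged Keras sketch of that idea (Keras chosen to match the rest of this page); fine_tuned_model, the layer name 'fc2', and num_classes are hypothetical placeholders for the existing fine-tuned model, its 4096-dim fully connected layer, and the number of classes.

from keras.layers import Dense
from keras.models import Model

# output of the existing 4096-dim fully connected layer
fc_features = fine_tuned_model.get_layer('fc2').output

# bottleneck layers that step the dimensionality down gradually
x = Dense(1024, activation='relu')(fc_features)
x = Dense(512, activation='relu')(x)

# re-attach a classifier so the bottleneck can be trained; afterwards,
# use the 512-dim activations as the retrieval features
out = Dense(num_classes, activation='softmax')(x)
retrieval_model = Model(fine_tuned_model.input, out)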
Lastly, I did a quick search and found this paper that claims to support LMNN over large datasets through random sampling. You might be able to use that to save a few headaches: Fast LMNN Algorithm through Random Sampling
