How to make multi-scale images to train a CNN - machine-learning

I am working on a Convolutional Neural Network using satellite images and I want to tackle a multi-scale problem. Can you please suggest how I can build the multi-scale dataset? Since the input size of the CNN is fixed (e.g. 100x100),
how can images of different scales be used to train the system for the multi-scale problem?

There is a similar question about YOLO9000: Multi-Scale Training?
Since the network contains only convolutional and pooling layers, the number of weight parameters stays the same when you feed in images of different scales, so a single CNN model can be trained on multi-scale images.
The method differs from task to task. For example, in a classification task we can add a global pooling layer after the last convolutional layer; in a detection task the output size is not fixed when we input multi-scale images.
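To make the global-pooling idea concrete, here is a minimal PyTorch sketch (the layer sizes and the number of classes are placeholders, not anything from the question): because global average pooling collapses any feature-map size to 1x1, the same weights accept the same patches rescaled to several resolutions.

```python
import torch
import torch.nn as nn

# Minimal fully convolutional classifier: global average pooling at the end
# means the same weights accept any input resolution (100x100, 160x160, ...).
class MultiScaleCNN(nn.Module):
    def __init__(self, num_classes=10):          # num_classes is a placeholder
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)       # collapses any HxW to 1x1
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x))
        return self.classifier(x.flatten(1))

model = MultiScaleCNN()
for size in (100, 160, 224):                      # same images, rescaled to several sizes
    out = model(torch.randn(2, 3, size, size))
    print(size, out.shape)                        # always (2, num_classes)
```

A simple multi-scale training loop can then just rescale each batch to a randomly chosen size before the forward pass, which is essentially what the YOLO9000 multi-scale training trick does.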

Related

Autoencoder vs Pre-trained network for feature extraction

I wanted to know if anyone has any guidance on what is better for image classification with a small number of samples per class (around 20) yet a lot of classes (about 400) for relatively big RGB images (around 600x600).
I know that autoencoders can be used for feature extraction: I can let an autoencoder run on the images unsupervised, reduce the dimensionality of the images, and then train on those dimensionally-reduced representations.
Similarly, I also know that you can take a pre-trained network, strip the final layer, replace it with a linear layer sized to your own dataset's number of classes, and then train only that final layer (or a few layers before it) to fit your dataset.
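For concreteness, this is roughly what I mean by the second option; just a sketch using torchvision's resnet18 (the choice of backbone is arbitrary, and 400 is my class count):

```python
import torch.nn as nn
from torchvision import models

# Pretrained backbone + new classification head (sketch only).
model = models.resnet18(pretrained=True)

# Freeze the pretrained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a linear layer for my ~400 classes
# and train only that layer (or unfreeze a few blocks before it).
model.fc = nn.Linear(model.fc.in_features, 400)
```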
I haven't been able to find any resources that determine which of these two feature-extraction techniques is better and under which conditions; does anyone have any advice?

Is there any difference if I use cropped objects or full frames for training a cascade classifier?

Can I use cropped objects from full frames as a training dataset for a cascade classifier (LBP or HAAR)?
I know that I have to use full frames with annotations when retraining a neural net (TensorFlow, YOLO, and so on).
But do I need that for a cascade classifier? Or are cropped images okay?
It seems I can do it, because we have positive and negative images,
so it should be okay to crop objects from the positive images.
The answer to the first question, "Can I use cropped objects from full frames as a training dataset for a cascade classifier (LBP or HAAR)?", is yes. It depends on your model architecture, your aims, and your system constraints, but for training we normally crop the target objects out of the whole image and feed them into the model.
As for "I know that I have to use full frames with annotations when retraining a neural net (TensorFlow, YOLO and so on)": it depends. What is your ROI size? You can resize your ROIs to fit your architecture, or you can crop the target objects out of the ROIs; it is completely up to you.
"But do I need it for a cascade classifier? Or are cropped images okay?": both are okay. Choose based on your model architecture, training time, system configuration and, of course, training performance.
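A sketch of how cropping positives out of annotated full frames might look (the annotation format, file paths, and 24x24 sample size are placeholders); the resulting crops can then be listed as positive samples for the cascade training tools, alongside a folder of negative images:

```python
import os
import cv2

# Hypothetical annotations: (frame path, x, y, w, h) for each object box.
annotations = [("frames/frame_001.jpg", 120, 80, 60, 60),
               ("frames/frame_002.jpg", 300, 150, 55, 58)]

os.makedirs("positives", exist_ok=True)

for i, (path, x, y, w, h) in enumerate(annotations):
    frame = cv2.imread(path)
    crop = frame[y:y + h, x:x + w]            # cut the object out of the full frame
    crop = cv2.resize(crop, (24, 24))         # cascade positives share one sample size
    cv2.imwrite(f"positives/pos_{i:04d}.png", crop)
```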

Reducing pixels in large data set (sklearn)

I'm currently working on a classification project, but I'm in doubt about how to start off.
Goal
Accurately classifying pictures of size 80x80 (so 6400 pixels) into the correct class (binary).
Setting
5260 training samples, 600 test samples
Question
As there are more pixels than samples, it seems logical to me to "drop" most of the pixels and only look at the important ones before I even start working out a classification method (like SVM, KNN, etc.).
Say the training data consists of X_train (predictors) and Y_train (outcomes). So far I've looked at the SelectKBest() method from sklearn for feature selection, but what would be the best way to use it, and how do I know how many features k to select?
It could also be the case that I'm completely on the wrong track here, so correct me if I'm wrong or suggest another approach if possible.
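To make the question concrete, this is roughly how I imagined using SelectKBest, with k chosen by cross-validation rather than picked by hand (just a sketch; the classifier and the candidate k values are arbitrary):

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# X_train is the (5260, 6400) pixel matrix and Y_train the (5260,) labels described above.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", SVC(kernel="rbf")),
])
param_grid = {"select__k": [50, 100, 500, 1000, 3000]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, Y_train)
print(search.best_params_)      # the k that cross-validates best
```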
You are suggesting reducing the dimension of your feature space. That is a form of regularization to reduce overfitting. You haven't mentioned that overfitting is an issue, so I would test for that first. Here are some things I would try:
Use transfer learning. Take a network pretrained on an image-recognition task and fine-tune it on your dataset. Search for "transfer learning" and you'll find many resources.
Train a convolutional neural network on your dataset. CNNs are the go-to method for machine learning on images. Check for overfitting.
If you want to reduce the dimensionality of your dataset, resize the images. Going from 80x80 to 40x40 reduces the number of pixels by 4x; assuming your task doesn't depend on fine details of the image, you should maintain classification performance (see the sketch below).
There are other things you may want to consider, but I would need to know more about your problem and its requirements.
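A small sketch of the resizing option, assuming X_train holds flattened 80x80 grayscale images as in the question (the helper name is just for illustration):

```python
import cv2
import numpy as np

def downscale(X, old_size=80, new_size=40):
    """Reshape flattened images, resize them, and flatten again."""
    images = X.reshape(-1, old_size, old_size)
    small = np.stack([cv2.resize(img, (new_size, new_size)) for img in images])
    return small.reshape(len(X), new_size * new_size)

X_train_small = downscale(X_train)   # 6400 features -> 1600 features (4x fewer)
```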

Understanding Faster R-CNN

I'm trying to understand Fast(er) R-CNN, and these are the questions I'm looking into:
1. To train a Fast R-CNN model, do we have to give bounding box information in the training phase?
2. If you have to give bounding box information, then what's the role of the ROI layer?
3. Can we use a pre-trained model that was only trained for classification, not object detection, for Fast(er) R-CNN?
Your answers:
1. Yes.
2. The ROI layer is used to produce a fixed-size vector from variable-sized regions. This is done with max-pooling, but instead of using the typical n-by-n cells, the region is divided into n-by-n non-overlapping cells (which vary in size) and the maximum value in each cell is output. The ROI layer also projects the bounding boxes from input-image space to feature space (see the sketch after these answers).
3. Faster R-CNN is meant to be used with a backbone pretrained on a large classification dataset (typically ImageNet); it is not trained from scratch. This might be a bit hidden in the paper, but the authors do mention that they use features from a pretrained network (VGG, ResNet, Inception, etc.).
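To illustrate point 2, here is a small sketch using torchvision's `roi_pool` op (the feature-map size, boxes, and `spatial_scale` are made-up values): each box, whatever its size, comes out as a fixed 7x7 grid of features, and `spatial_scale` is what projects image-space box coordinates onto the feature map.

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 50, 50)            # hypothetical backbone output
# One box per row: (batch_index, x1, y1, x2, y2) in input-image coordinates.
boxes = torch.tensor([[0, 40., 60., 240., 300.],
                      [0, 10., 10., 120.,  90.]])
# spatial_scale maps image coordinates to feature-map coordinates (e.g. 1/16 of the input).
pooled = roi_pool(feature_map, boxes, output_size=(7, 7), spatial_scale=1 / 16)
print(pooled.shape)   # torch.Size([2, 256, 7, 7]) -> fixed size regardless of box size
```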

Sub-patch generation mechanism for training a fully convolutional neural network

I have an image set consisting of 300 image pairs, i.e., a raw image and a mask image. A typical mask image is shown below. Each image has a size of 800x800. I am trying to train a fully convolutional neural network on this image set to perform semantic segmentation, and I want to generate small patches (256x256) from the original images to construct the training set. Are there any recommended strategies for this patch-sampling process? Naturally, random sampling is a trivial approach. The area marked in yellow, the foreground class, usually takes up about 25% of the image area across the image set, so the data set tends to be imbalanced.
If you train a fully convolutional architecture with 800x800 inputs, you get 25x25 outputs after five 2x2 pooling layers (25 = 800 / 2^5). Try to build the 25x25 target maps directly and train on them. You can give the "positive" labels a higher weight in the loss function to balance them against the "negatives".
I do not recommend patch sampling, because it is an expensive process and is not really fully convolutional.
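A sketch of the weighted-loss idea in PyTorch (the tensors are random placeholders): the 800x800 mask is downsampled to the 25x25 output resolution and the positive class is upweighted by roughly the negative-to-positive ratio (0.75 / 0.25 = 3 for ~25% foreground).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder tensors: logits from a fully convolutional net, full-resolution binary masks.
logits = torch.randn(4, 1, 25, 25)                       # N x 1 x 25 x 25 model output
full_mask = (torch.rand(4, 1, 800, 800) > 0.75).float()  # 800x800 ground-truth masks

# Downsample the mask to the output resolution (max-pooling keeps thin foreground regions).
target = F.max_pool2d(full_mask, kernel_size=32)         # 800 / 32 = 25

# Weight positives ~3x since foreground covers ~25% of the pixels.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(3.0))
loss = criterion(logits, target)
```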
