How can I improve my ConvNet performance with a small dataset? - machine-learning

I have a very small image dataset (about 8 images). I am aware that training on such a small dataset can lead to overfitting, and I wanted some ideas on ways to deal with situations where the dataset is as small as stated above.

The best way to deal with this kind of issue is image augmentation. Several libraries provide it, such as OpenCV, Keras, and scikit-image. The basic idea behind image augmentation is to artificially create more images from each original by introducing certain changes to the data, such as rotating the image, blurring parts of it, zooming in or out, shifting the colors, flipping the image, and many more. You can create 10x, 20x, 40x, etc. images from one image.
This method will help you generate more images, but remember that 8 images is very little data, and these new augmented images will, to some extent, share similar features with the originals.
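For illustration, here is a minimal sketch using Keras' ImageDataGenerator; the parameter values are just assumptions to show the idea, not tuned settings:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assume `images` is a numpy array of shape (8, H, W, 3) holding the original photos
images = np.random.rand(8, 128, 128, 3)  # placeholder data just for the sketch

datagen = ImageDataGenerator(
    rotation_range=20,       # random rotations up to 20 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    zoom_range=0.2,          # random zoom in/out
    horizontal_flip=True,    # random left-right flips
)

# Each iteration yields a freshly augmented batch derived from the 8 originals
for batch in datagen.flow(images, batch_size=8):
    break  # take one augmented batch; loop as many times as you need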

Related

How to resize (reshape) the images in a CNN? Mathematical intuition behind resizing

I have been working with images for a few months during my internship, and recently I have been wondering whether there is a mathematical way of resizing images.
Resizing the images turns out to be a fairly difficult task, because freshers like me often have little experience with image pre-processing.
My problem statement was gender classification from images of the human eye. I found it difficult because:
The images had 3 channels
The images were rectangular (17:11 aspect ratio)
I did try to resize the images by following a few blogs which said to start small and then go up, but while that could have worked, I still did not understand how small. I resized them to 800x800 arbitrarily and got a Resource Exhausted error (I was using a GPU).
So I ask the community: is there any mathematical formula or generalized way of doing the resizing task?
Thank you in advance.
This partially answers your question. Normally, many people use transfer learning and a pre-designed architecture for computer vision tasks. Since almost all such architectures are designed for a square input shape, you can get better results by making your input images square. Another option is to zero-pad your 17:11 images to make them square. (You need to test which works best in your case, but the common practice is reshaping to square.)
It is fine to have 3-channel images; almost all architectures are designed for 3-channel input (even for black-and-white images it is suggested to repeat the channel so the model receives a 3-channel input).
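As a rough illustration (the image size here is just an assumption), zero-padding to a square and repeating a grayscale channel can be done directly with NumPy:

import numpy as np

# Assume a grayscale image with roughly a 17:11 aspect ratio, e.g. 110x170 pixels
img = np.random.rand(110, 170)

# Zero-pad the shorter side so the image becomes square (170x170)
pad = img.shape[1] - img.shape[0]
img_square = np.pad(img, ((pad // 2, pad - pad // 2), (0, 0)), mode="constant")

# Repeat the single channel three times so the model sees a 3-channel input
img_3ch = np.repeat(img_square[..., np.newaxis], 3, axis=-1)
print(img_3ch.shape)  # (170, 170, 3)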
About resizing
In theory, you need to resize the images to match the model you are going to use. For example, LeNet-5 accepts MNIST images of size 28x28. Larger images generally lead to better model performance, but in your case the images are quite low resolution, so you can start with 28x28 or 224x224 architectures and later try bigger ones to see if it helps.
About the error: it is pretty normal. Your model was going to be bigger than your GPU memory, so you see an out-of-memory error. You can use a smaller model (and a smaller input image size) on your device, or you need a device with more GPU memory.
Finally, you should consider the size of the architecture you are going to reuse to determine the correct resize for your dataset. If you are designing your own model, the best starting point can be something around 28x28 (basically using LeNet) and then developing it based on needs/performance.
The resizing can be as easy as applying a transform with PyTorch's torchvision transforms (i.e., you don't need to manually create a resized copy of the dataset):
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((224, 224)),  # resize each image on the fly as it is loaded
])

Is it feasible to do random cropping as a data augmentation technique in the context of multi-label image classification?

I have read 2 top-ranking solutions on Kaggle concerning multi-label image classification. In both of the competitions I read about, random cropping was performed. To me, this seems like a bad move, because we could end up with a mismatch between the labels and the cropped images. Here are the two links:
1.human-protein-atlas-image-classification
2.iMet Collection 2019 - FGVC6
If the reason for cropping is an input image size constraint of the model architecture used, then isn't it better to resize the image instead of cropping it?
I haven't visited the links yet, but random cropping helps as long as the crop keeps the classes present, even if only a small part of the actual object remains.
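As a small sketch (the crop scale range below is just an assumption), torchvision's RandomResizedCrop lets you control how much of the image a crop must keep, which reduces the risk of cropping a labeled object out entirely:

import torchvision.transforms as T

# Keep at least 70% of the original area in every crop so labeled objects
# are unlikely to be removed completely; input is expected to be a PIL image
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])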

Is Image Stitching and Image overlapping same concept?

I am working on a project which will take multiple images as input (let's say 2 for the moment) and combine them to generate a better image. The resulting image will be a combination of those input images. As a requirement, I want to achieve this using OpenCV. I read about image stitching and saw some example images of the process, and now I am confused whether image overlapping is the same as image stitching, or whether the Stitcher class in OpenCV can do image overlapping. I would appreciate some clarity on how I can achieve this with OpenCV.
"Image overlapping" is not really a term used in the CV literature. The general concept of matching images via transformations is most often called image registration. Image registration is taking many images and inserting them all into one shared coordinate system. Image stitching relies on that same function, but additionally concerns itself with how to blend multiple images. Furthermore, image stitching tries to take into account multiple images at once and makes small adjustments to the paired image registrations.
But it seems you're interested in producing higher quality images from multiple images of the same space (or from video feed of the space for example). The term for that is not image overlapping but super-resolution; specifically, super-resolution from multiple images. You'll want to look into specialized filters (after warping to the same coordinates) to combine those multiple views into a high resolution image. There are many papers on this topic (e.g.). Even mean or median filters (that is, taking the mean or median at every pixel location across the images) can work well, assuming your transformations are very good.
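As a toy sketch of that last idea (the array shapes are assumptions), once the images are warped into the same coordinate frame you can fuse them per pixel with NumPy:

import numpy as np

# Assume `aligned` holds N registered views of the same scene,
# shape (N, height, width, 3), already warped into one coordinate frame
aligned = np.random.rand(5, 480, 640, 3)

# Median across the views suppresses noise and small registration errors
fused = np.median(aligned, axis=0)

# Mean fusion is an alternative when the registrations are very accurate
fused_mean = aligned.mean(axis=0)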

semantic segmentation for large images

I am working with a limited number of large images, each of which can be 3072x3072 pixels. To train a semantic segmentation model using FCN or U-Net, I construct a large training set in which each training image is 128x128.
In the prediction stage, I cut a large image into small pieces of the same size as the training set (128x128), feed these small pieces into the trained model, and get the predicted masks. Afterwards, I stitch these small patches together to get the mask for the whole image. Is this the right mechanism for performing semantic segmentation on large images?
Your solution is often used for this kind of problem. However, I would argue that it depends on the data whether it truly makes sense. Let me give you two examples you can still find on Kaggle.
If you wanted to mask certain parts of satellite images, you would probably get away with this approach without a drop in accuracy. These images are highly repetitive and there's likely no correlation between the segmented area and where in the original image it was taken from.
If you wanted to segment a car from its background, it wouldn't be desirable to break it into patches. Over several layers the network will learn the global distribution of a car in the frame. It's very likely that the mask is positive in the middle and negative in the corners of the image.
Since you didn't give any specifics about what you're trying to solve, I can only give a general recommendation: try to keep the input images as large as your hardware allows. In many situations I would rather downsample the original images than break them down into patches.
Concerning the recommendation of curio1729, I can only advise against training on small patches and testing on the original images. While it's technically possible thanks to fully convolutional networks, you're changing the data to an extent that might very well hurt performance. CNNs are known for their extraction of local features, but there's a large amount of global information that is learned over the abstraction of multiple layers.
Input image data:
I would not advise feeding the big image (3072x3072) directly into Caffe.
A batch of small images will fit better into memory, and parallel processing will also come into play.
Data augmentation will also be feasible.
Output for the big image:
As for the output for the big image, you had better recast the input size of the FCN to 3072x3072 during the test phase, because the layers of an FCN can accept inputs of any size.
Then you will get a 3072x3072 segmented image as output.
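To illustrate the point that an FCN accepts any input size, here is a toy, purely convolutional network (an assumption for the sketch, not the poster's model); it runs on 128x128 patches and on larger images alike, memory permitting:

import torch
import torch.nn as nn

# Toy fully convolutional segmentation net: no fully connected layers,
# so the spatial size of the input is not fixed
fcn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 2, kernel_size=1),  # 2 classes, per-pixel logits
)

patch = torch.randn(1, 3, 128, 128)    # training-sized patch
full = torch.randn(1, 3, 1024, 1024)   # larger test image (kept below 3072x3072 here for memory)

print(fcn(patch).shape)  # torch.Size([1, 2, 128, 128])
print(fcn(full).shape)   # torch.Size([1, 2, 1024, 1024])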

Logo detection/recognition in natural images

Given a logo image as a reference image, how to detect/recognize it in a cluttered natural image?
The logo may be quite small in the image; it can appear on clothes, hats, shoes, a background wall, etc. I have tried SIFT features for matching without any other preprocessing, and the result is good in cases where the logo in the image is big and clear. However, it fails in some cases where the scene is quite cluttered and the logo is small compared with the whole image. It seems that SIFT features are sensitive to perspective distortions.
Does anyone know better features or ideas for logo detection/recognition in natural images? For example, training a classifier to locate candidate regions first and then applying SIFT matching for further recognition. However, training a model needs a lot of data, in particular manually annotated logo regions in images, and it needs retraining (collecting and annotating new images) if I want to apply it to new logos.
So, any suggestions for this? Detailed workflow/code/reference will be highly appreciated, thanks!
There are many algorithms, from shape matching to Haar classifiers. The best algorithm depends strongly on the kind of logo.
If you want to continue with feature-based registration, I recommend the following (a small matching sketch follows the list):
For detection of small logos, use tiles: split the whole image into smaller (overlapping) tiles and perform the usual detection on each. This exploits the "locality" of the searched features.
Try ASIFT for affine-invariant detection.
Use many template images for reference feature extraction, with different lighting and different backgrounds (black, white, gray).
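A minimal sketch of the SIFT matching step with OpenCV (the file names are placeholders; tiling and ASIFT are left out for brevity):

import cv2

# Placeholder file names: the reference logo and one tile of the scene image
logo = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)
tile = cv2.imread("scene_tile.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(logo, None)
kp2, des2 = sift.detectAndCompute(tile, None)

# Brute-force matching with Lowe's ratio test to keep only distinctive matches
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

print(len(good), "good matches in this tile")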
