Data augmentation techniques for small image datasets? - image-processing

Currently I am training deep CNNs on small logo datasets similar to FlickrLogos-32. To train larger networks I need more data, so I am using augmentation. The best I'm doing right now is affine transformations plus feature-wise preprocessing (feature-wise normalization, feature-wise centering, rotation, width/height shift, horizontal/vertical flip). But for bigger networks I need more augmentation. I tried searching on Kaggle's National Data Science Bowl forum but couldn't get much help. There's code for some methods given here, but I'm not sure what could be useful. What are some other (or better) image data augmentation techniques that could be applied to this type of dataset (or to image datasets in general), beyond affine transformations?

A good recap can be found here, in section 1 on Data Augmentation: namely flips, random crops and color jittering, and also lighting noise:
Krizhevsky et al. proposed "fancy PCA" when training the famous AlexNet in 2012. Fancy PCA alters the intensities of the RGB channels in training images along their principal components.
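A minimal NumPy sketch of the fancy PCA idea (not the exact AlexNet recipe; it assumes float RGB images in [0, 1], and alpha_std is a free parameter):

import numpy as np

def fancy_pca(img, alpha_std=0.1):
    """PCA colour augmentation in the spirit of Krizhevsky et al. (2012).
    img: HxWx3 float array in [0, 1]."""
    flat = img.reshape(-1, 3)
    cov = np.cov(flat - flat.mean(axis=0), rowvar=False)   # 3x3 RGB covariance
    eig_vals, eig_vecs = np.linalg.eigh(cov)                # principal components
    alphas = np.random.normal(0.0, alpha_std, 3)            # one random weight per component
    delta = eig_vecs @ (alphas * eig_vals)                  # shift along the components
    return np.clip(img + delta, 0.0, 1.0)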
Alternatively you can also have a look at the Kaggle Galaxy Zoo challenge: the winners wrote a very detailed blog post. It covers the same kind of techniques:
rotation,
translation,
zoom,
flips,
color perturbation.
As stated they also do it "in realtime, i.e. during training".
For example here is a practical Torch implementation by Facebook (for ResNet training).
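If you work in PyTorch rather than Lua Torch, a roughly equivalent on-the-fly pipeline could look like the sketch below; the dataset path and transform parameters are placeholders to tune:

import torchvision.transforms as T
from torchvision import datasets
from torch.utils.data import DataLoader

# Augmentations are applied on the fly, every time a sample is drawn during training.
train_transform = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),   # random zoom + crop
    T.RandomRotation(15),                          # small random rotation
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.05),
    T.ToTensor(),
])

train_set = datasets.ImageFolder('path/to/train', transform=train_transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)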

I've collected a couple of augmentation techniques in my master's thesis, page 80. It includes:
Zoom
Crop
Flip (horizontal / vertical)
Rotation
Scaling
Shearing
Channel shifts (RGB, HSV)
Contrast
Noise
Vignetting
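A few of the less common items on that list (channel shift, additive noise, vignetting) can be sketched in plain NumPy; the magnitudes below are arbitrary and should be tuned per dataset:

import numpy as np

def augment(img):
    """img: HxWx3 uint8 image. Returns an augmented copy."""
    out = img.astype(np.float32)

    # Per-channel intensity shift (a simple RGB channel shift).
    out += np.random.uniform(-20, 20, size=3)

    # Additive Gaussian noise.
    out += np.random.normal(0, 5, size=out.shape)

    # Vignetting: darken pixels with distance from the image centre.
    h, w = out.shape[:2]
    y, x = np.ogrid[:h, :w]
    dist = np.sqrt((x - w / 2) ** 2 + (y - h / 2) ** 2)
    out *= (1.0 - 0.3 * (dist / dist.max()) ** 2)[..., None]

    return np.clip(out, 0, 255).astype(np.uint8)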

Related

How to detect this kind of artifacts/noises in my image?

I've been processing some image frames in videos and I discovered that sometimes one or two frames of the video will have artifacts or noise like the images below:
The artifacts look like abrasions of paint with noisy colors that cover only a small region (less than 100x100 in a 1000x2000 frame) of the image. I wonder if there are ways to detect the noisy frames? I've tried using frame differences with SSIM, NMSE or PSNR but found limited effectiveness. A saliency map (left) or Sobel/Scharr filtering (right) provides a more obvious view, but regular borders are also picked up and I'm not sure how to build a classifier from this.
Scharr saliency map:
Since only a few frames of a video are affected, denoising isn't really necessary; I can simply remove the frames once they are detected. The main problem is that it's difficult to distinguish those frames while the video is playing.
Can anybody offer some help here?
Expanding my comment into an answer with a few more details:
The Scharr and saliency maps look good.
Thresholding will result in a binary image which can be cleaned up with morphological filters (erode to enhance artefacts, dilate to 'erase' gradient contours).
Finding contours will result in lists of points which can be further processed/filtered using contour features.
If the gradients are always bigger than the artefacts, contour features such as the bounding-box dimensions and aspect ratio should help separate artefact contours from gradient contours (if any remain: hopefully dilation will have cleaned up the thresholded/binary image); see the sketch below.
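A minimal OpenCV sketch of that threshold → morphology → contour-filtering idea (the threshold value and size/aspect limits are guesses to tune on your frames):

import cv2

# Gradient or saliency map as a grayscale image (e.g. the Scharr result).
edges = cv2.imread('scharr_map.png', cv2.IMREAD_GRAYSCALE)

# Threshold into a binary image.
_, binary = cv2.threshold(edges, 60, 255, cv2.THRESH_BINARY)

# Morphological clean-up: open to remove thin regular edges, dilate to join artefact blobs.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
binary = cv2.dilate(binary, kernel, iterations=2)

# Keep only contours whose bounding box looks like a compact artefact blob (OpenCV 4.x API).
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
candidates = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if 10 < w < 150 and 10 < h < 150 and 0.3 < w / float(h) < 3.0:
        candidates.append((x, y, w, h))
print(candidates)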
Another idea could be looking into oriented gradients:
either compute the oriented gradients (see visualisations): with the right cell size you might strike a balance where the artefacts have a high magnitude while gradient edges don't,
or you could try a full histogram of oriented gradients (HoG) classifier setup (an SVM trained on the histograms as features).
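For the oriented-gradient route, a quick way to inspect the descriptors before committing to the full SVM setup is scikit-image's hog; the cell/block sizes are the knobs mentioned above and purely illustrative:

import matplotlib.pyplot as plt
from skimage import io, color
from skimage.feature import hog

frame = color.rgb2gray(io.imread('frame.png'))

# Compute HoG features plus a visualisation; pick pixels_per_cell so that compact
# artefact blobs dominate a cell while long regular edges get diluted.
features, hog_image = hog(frame, orientations=9, pixels_per_cell=(16, 16),
                          cells_per_block=(2, 2), visualize=True)

plt.imshow(hog_image, cmap='gray')
plt.show()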
The above options do rely on hand crafted features/making assumptions about the size of artefacts.
ML could be an interesting route too, hopefully it can generalise well enough.
Depending on how many example images you have available, you could fairly quickly test a basic prototype using Teachable Machine (which behind the scenes applies KNN on top of a transfer-learning layer from MobileNet or a similar net).
(Note: I've posted OpenCV Python links, but there are other libraries that can help, e.g. scikit-image, scikit-learn, kornia, etc. in Python, cvv in C++, BoofCV in Java, etc., and there might be toolboxes for MATLAB/Octave with similar features.)

Image Similarity - Deep Learning vs hand-crafted features

I am doing research in the field of computer vision, and am working on a problem related to finding visually similar images to a query image. For example, finding t-shirts of similar colour with similar patterns (Striped/ Checkered), or shoes of similar colour and shape, and so on.
I have explored hand-crafted image features such as color histograms, texture features, shape features (Histogram of Oriented Gradients), SIFT, and so on. I have also read up on the literature about deep neural networks (convolutional neural networks), which have been trained on massive amounts of data and are currently the state of the art in image classification.
I was wondering if the same features (extracted from the CNN's) can also be used for my project - finding fine-grained similarities between images. From what I understand, the CNNs have learnt good representative features that can help classify images - for example, be it a red shirt or a blue shirt or an orange shirt, it is able to identify that the image is a shirt. However it doesn't understand that an orange shirt looks more similar to a red shirt than a blue shirt does, and hence it is not able to capture these similarities.
Please correct me if I am wrong. I would like to know if there are any Deep Neural Networks that capture these similarities, and have proven to be superior to the hand-crafted features. Thanks in advance.
For your task, a CNN is definitely worth a try!
Many researchers have used networks pretrained for image classification and obtained state-of-the-art results on fine-grained classification, for example classifying bird species or car models.
Now, your task is not classification, but it is related. You can think about similarity as some geometric distance between features, which are basically vectors. Thus, you may carry out some experiments computing the distance between the feature vectors for all your training images (the reference) and the feature vector extracted from the query image.
CNN features extracted from the first layers of the net should relate more to color and other low-level graphical traits than to more "semantic" ones.
Alternatively, there is some work on learning directly a similarity metric through CNN, see here for example.
A little bit outdated, but it can still be useful for other people. Yes, CNNs can be used for image similarity, and I have used them for this before. As Flavio pointed out, for a simple start you can use a pretrained CNN of your choice, such as AlexNet or GoogLeNet, and use it as a feature extractor. You can then compare images based on the distance between their feature vectors: similar pictures will have a smaller distance.
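As a concrete illustration of the pretrained-CNN-as-feature-extractor idea, here is a rough PyTorch sketch; ResNet-18 is just an example backbone and the code assumes a recent torchvision:

import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Pretrained backbone with the classification head removed -> 512-d feature vectors.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    img = preprocess(Image.open(path).convert('RGB')).unsqueeze(0)
    with torch.no_grad():
        return backbone(img).squeeze(0)

# Cosine similarity between query and reference: closer to 1 means more similar.
print(float(F.cosine_similarity(embed('query.jpg'), embed('reference.jpg'), dim=0)))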

A suitable workflow to detect and classify blurs in images? [duplicate]

I had asked this on Photography Stack Exchange but thought it might be relevant here as well, since I want to implement this programmatically.
I am trying to implement a blur detection algorithm for my imaging pipeline. The kinds of blur I want to detect are:
1) Camera shake: pictures captured by hand, where the hand moves/shakes when the shutter speed is low.
2) Lens focusing errors: depth-of-field issues, like focusing on the wrong object, causing some blur.
3) Motion blur: fast-moving objects in the scene, captured with a shutter speed that is not high enough. E.g. a moving car at night might show a trail of its headlights/tail lights in the image as a blur.
How can one detect this blur and quantify it in some way to make some decision based on that computed 'blur metric'?
What is the theory behind blur detection?
I am looking for good reading material with which I can implement an algorithm for this in C/MATLAB.
thank you.
-AD.
Motion blur and camera shake are kind of the same thing when you think about the cause: relative motion of the camera and the object. You mention slow shutter speed -- it is a culprit in both cases.
Focus misses are subjective, as they depend on the intent of the photographer. Without knowing what the photographer wanted to focus on, it's impossible to judge; and even if you do know what they wanted to focus on, it still wouldn't be trivial.
With that dose of realism aside, let me reassure you that blur detection is actually a very active research field, and there are already a few metrics that you can try out on your images. Here are some that I've used recently:
Edge width. Basically, perform edge detection on your image (using Canny or otherwise) and then measure the width of the edges. Blurry images will have wider edges that are more spread out. Sharper images will have thinner edges. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous paper that describes this approach well enough for a full implementation. If you're dealing with motion blur, then the edges will be blurred (wide) in the direction of the motion.
Presence of fine detail. Have a look at my answer to this question (the edited part).
Frequency domain approaches. Taking the histogram of the DCT coefficients of the image (assuming you're working with JPEG) would give you an idea of how much fine detail the image has. This is how you grab the DCT coefficients from a JPEG file directly. If the count for the non-DC terms is low, it is likely that the image is blurry. This is the simplest way -- there are more sophisticated approaches in the frequency domain.
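If you don't want to parse the JPEG file itself, a crude approximation of the same idea is to take block-wise DCTs of the decoded pixels and measure how much energy sits outside the DC term; a rough sketch (8x8 blocks, thresholds left to you):

import cv2
import numpy as np
from scipy.fftpack import dct

def high_freq_ratio(gray, block=8):
    """Fraction of DCT energy outside the DC term, averaged over blocks.
    Lower values suggest a blurrier image."""
    h, w = gray.shape
    gray = gray[:h - h % block, :w - w % block].astype(np.float32)
    ac_energy, total_energy = 0.0, 0.0
    for y in range(0, gray.shape[0], block):
        for x in range(0, gray.shape[1], block):
            b = gray[y:y + block, x:x + block]
            coeffs = dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')
            energy = coeffs ** 2
            total_energy += energy.sum()
            ac_energy += energy.sum() - energy[0, 0]
    return ac_energy / max(total_energy, 1e-9)

print(high_freq_ratio(cv2.imread('photo.jpg', cv2.IMREAD_GRAYSCALE)))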
There are more, but I feel that that should be enough to get you started. If you require further info on either of those points, fire up Google Scholar and look around. In particular, check out the references of Marziliano's paper to get an idea about what has been tried in the past.
There is a great paper called "Analysis of focus measure operators for shape-from-focus" (https://www.researchgate.net/publication/234073157_Analysis_of_focus_measure_operators_in_shape-from-focus), which compares about 30 different techniques.
Out of all the different techniques, the Laplacian-based methods seem to have the best performance. Most image processing environments like MATLAB or OpenCV already implement this method. Below is an example using OpenCV: http://www.pyimagesearch.com/2015/09/07/blur-detection-with-opencv/
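For reference, the OpenCV example linked above essentially boils down to the variance of the Laplacian; a minimal sketch (the threshold is dataset-dependent and needs tuning on known sharp/blurry examples):

import cv2

img = cv2.imread('photo.jpg', cv2.IMREAD_GRAYSCALE)

# Variance of the Laplacian: few strong edges -> low variance -> likely blurry.
score = cv2.Laplacian(img, cv2.CV_64F).var()

threshold = 100.0
print('blurry' if score < threshold else 'sharp', score)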
One important point to note here is that an image can have some blurry areas and some sharp areas. For example, if an image contains portrait photography, the image in the foreground is sharp whereas the background is blurry. In sports photography, the object in focus is sharp and the background usually has motion blur. One way to detect such a spatially varying blur in an image is to run a frequency domain analysis at every location in the image. One of the papers which addresses this topic is "Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes" (cvpr2017).
The authors look at multi-resolution DCT coefficients at every pixel. These DCT coefficients are divided into low-, medium- and high-frequency bands, out of which only the high-frequency coefficients are selected.
The DCT coefficients are then fused together and sorted to form the multiscale-fused and sorted high-frequency transform coefficients.
A subset of these coefficients is selected; the number of selected coefficients is a tunable, application-specific parameter.
The selected subset of coefficients is then sent through a max-pooling block to retain the highest activation across all scales. This gives the blur map as the output, which is then sent through a post-processing step to refine it.
This blur map can be used to quantify the sharpness in various regions of the image. To get a single global metric for the blurriness of the entire image, the mean of the blur map or its histogram can be used.
Here are some example results showing how the algorithm performs:
The sharp regions in the image have a high intensity in the blur_map, whereas blurry regions have a low intensity.
The github link to the project is: https://github.com/Utkarsh-Deshmukh/Spatially-Varying-Blur-Detection-python
The Python implementation of this algorithm can be found on PyPI and can easily be installed as shown below:
pip install blur_detector
A sample code snippet to generate the blur map is as follows:
import blur_detector
import cv2

if __name__ == '__main__':
    # Read the image as grayscale (flag 0).
    img = cv2.imread('image_name', 0)
    # Compute the per-pixel blur map.
    blur_map = blur_detector.detectBlur(img, downsampling_factor=4, num_scales=4,
                                        scale_start=2, num_iterations_RF_filter=3)
    cv2.imshow('ori_img', img)
    cv2.imshow('blur_map', blur_map)
    cv2.waitKey(0)
For detecting blurry images, you can tweak the approach and add "Region of Interest estimation".
In this GitHub repository: https://github.com/Utkarsh-Deshmukh/Blurry-Image-Detector , I have used local entropy filters to estimate a region of interest. Within this ROI I then use DCT coefficients as features and train a simple multi-layer perceptron. Testing this approach on 20000 images in the "BSD-B" dataset (http://cg.postech.ac.kr/research/realblur/) gave an average accuracy of 94%.
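A rough scikit-image sketch of just the local-entropy ROI step (the disk radius and percentile are illustrative; the DCT features and MLP from the repository come afterwards):

import numpy as np
from skimage import io, img_as_ubyte
from skimage.color import rgb2gray
from skimage.filters.rank import entropy
from skimage.morphology import disk

gray = img_as_ubyte(rgb2gray(io.imread('photo.jpg')))

# Local entropy: textured, information-rich areas score high, flat areas score low.
ent = entropy(gray, disk(9))

# Keep the most informative pixels as the region of interest.
roi_mask = ent > np.percentile(ent, 75)
print('ROI covers', roi_mask.mean(), 'of the image')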
Just to add on the focusing errors: these may be detected by comparing the PSF of the captured blurry images (wider) with that of reference images (sharper). Deconvolution techniques may help correct them, but leave artificial errors (shadows, rippling, ...). A light-field camera can help refocus to any depth plane, since it captures angular information in addition to the traditional spatial information about the scene.

How to localize shoulders in an image/photo?

I have to localize, from video, where the shoulders of a person are.
Do you have any advice on how to approach this?
I thought about corner detection or some kind of shape detection, but I'm still not sure what comes next. We can treat the video as an image sequence (I'm writing this down even though I think it's obvious).
Luckily, the shoulders are usually attached to the head...
I have used the Dalal-Triggs algorithm (Wikipedia) to detect head+shoulders of all persons facing the camera.
Basically, you train a linear SVM on positive examples in which the head+shoulders are marked, and on negative examples that do not contain these body parts. The descriptor is a Histogram of Oriented Gradients (HOG), which tells you which edge directions are dominant in each cell of the descriptor. I found that their normalization scheme is very important for dealing with non-uniform lighting.
With enough examples, the linear SVM will provide you with a plane normal that can be interpreted as a descriptor: you can visualize the meaning of the positive weights, and see that they outline the profile of head+shoulders. Likewise, the negative weights will belong to the areas outside the body, and/or directions orthogonal to the profile edges.
You can apply the linear SVM classifier to each image efficiently at multiple scales and aspect ratios, and find the image patch with the best response. This should give you the location of the head and shoulders (it will not be exact, though).
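A condensed scikit-image/scikit-learn sketch of that training step; the patch size, the SVM's C, and the dummy data are placeholders (replace the random arrays with real head+shoulder crops and background crops):

import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def describe(patch):
    # HOG descriptor of a fixed-size grayscale patch (here 64x64 pixels).
    return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Placeholder data: replace with real positive (head+shoulders) and negative crops.
pos_patches = [np.random.rand(64, 64) for _ in range(10)]
neg_patches = [np.random.rand(64, 64) for _ in range(10)]

X = np.array([describe(p) for p in pos_patches + neg_patches])
y = np.array([1] * len(pos_patches) + [0] * len(neg_patches))

clf = LinearSVC(C=0.01)
clf.fit(X, y)

# At detection time, slide a 64x64 window over each frame at several scales and
# keep the locations where clf.decision_function(...) responds most strongly.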


Resources