In the CIFAR-10 example among the TensorFlow examples, the images are distorted with a random combination of cropping, flipping, brightening, contrasting, and whitening. This makes sense to me, except that the cropping seems a little odd. The images need to have the same dimensions for the network, and the cropping code looks like this:
height = IMAGE_SIZE
width = IMAGE_SIZE
# Image processing for training the network. Note the many random
# distortions applied to the image.
# Randomly crop a [height, width] section of the image.
distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
Since the height and width are based on the image size is this actually doing anything?
In the example, IMAGE_SIZE is set to 24, so this code picks a random offset and extracts a 24 x 24 patch from the image. tf.random_crop draws the offset so that the patch always fits inside the input; it does not pad or wrap around, and it requires the crop size to be no larger than the input in every dimension.
I guess IMAGE_SIZE could be better named PATCH_SIZE or something similar. Note that the original CIFAR-10 input image is 32 x 32.
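To make the effect of the crop concrete, here is a minimal NumPy sketch of the same idea. It is not the TensorFlow implementation; the helper name `random_crop` and the random stand-in image are assumptions made purely for illustration.

```python
import numpy as np

def random_crop(image, crop_h, crop_w, rng):
    """Return a randomly positioned crop_h x crop_w patch of an H x W x C image."""
    h, w = image.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed the image size")
    # Offsets are drawn so that the patch always lies fully inside the image.
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    return image[top:top + crop_h, left:left + crop_w]

rng = np.random.default_rng(0)
# Random stand-in for a 32 x 32 RGB CIFAR-10 image.
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
patch = random_crop(image, 24, 24, rng)
print(patch.shape)  # (24, 24, 3)
```

Each call returns a different 24 x 24 window of the 32 x 32 input, which is why the crop acts as a distortion even though every output has the same size.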
I'm working on a project which involves determining the volume of a transparent liquid (or air if it proves easier) in a confined space.
The images I'm working with are a background image of the container without any liquid and a foreground image, which may also be empty in rare cases but is usually partly filled with some amount of liquid.
While it may seem like a pretty straightforward smooth and threshold approach, it proves somewhat more difficult.
I'm working with a set with tons of these image pairs of background and foreground images, and I can't seem to find an approach that is robust enough to be applied to all images in the set.
My work so far involves smoothing and thresholding the image and applying closing to wrap it up.
import cv2 as cv
import numpy as np

bg_image = cv.imread("bg_image", 0)  # 0 loads the image as grayscale
fg_image = cv.imread("fg_image", 0)
blur_fg = cv.GaussianBlur(fg_image, (5, 5), sigmaX=0, sigmaY=0)
thresholded_image = cv.threshold(blur_fg, 186, 255, cv.THRESH_BINARY_INV)[1]
kernel = np.ones((4,2),np.uint8)
closing = cv.morphologyEx(thresholded_image, cv.MORPH_CLOSE, kernel)
The results vary, here is an example when it goes well:
In other examples, it doesn't go as well:
Aside from that, I have also tried:
Subtraction of the background and foreground images
Contrast stretching
Histogram equalization
Other thresholding techniques such as Otsu
The main issue is that the pixel intensities of air and liquid sometimes overlap (and the contrast is pretty low in general), causing inaccurate estimates. I am leaning towards using the edge that occurs between the liquid and the air, but I'm not really sure how.
I don't want to overflow with information here so I'm leaving it at that. I am grateful for any suggestions and can provide more information if necessary.
EDIT:
Here are some sample images to play around with.
Here is an approach whereby you calculate the mean of each column of pixels in your image, then calculate the gradient of the means:
#!/usr/bin/env python3
import cv2
import numpy as np
import matplotlib.pyplot as plt
filename = 'fg1.png'
# Load image as greyscale and calculate means of each column of pixels
im = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
means = np.mean(im, axis=0)
# Calculate the gradient of the means
y = np.gradient(means)
# Plot the gradient of the means
xdata = np.arange(0, y.shape[0])
plt.plot(xdata, y, 'bo') # blue circles
plt.title(f'Gradient of Column Means for "{filename}"')
plt.xlabel('x')
plt.ylabel('Gradient of Column Means')
plt.grid(True)
plt.show()
If you just plot the means of all columns, without taking the gradient, you get this:
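As a possible next step, the gradient of the column means can be reduced to a single boundary estimate by taking the column where its magnitude peaks. The sketch below assumes the liquid/air interface is roughly vertical in the image (as the column-wise means imply) and uses a synthetic image, since the sample images are not available here; `find_interface_column` is a hypothetical helper name.

```python
import numpy as np

def find_interface_column(gray):
    """Return the column index where the column means change most sharply."""
    means = gray.astype(np.float64).mean(axis=0)
    grad = np.gradient(means)
    return int(np.argmax(np.abs(grad)))

# Synthetic stand-in: brighter region left of column 60, darker region right of it.
rng = np.random.default_rng(0)
img = np.full((50, 100), 180.0)
img[:, 60:] = 150.0
img += rng.normal(0.0, 2.0, size=img.shape)  # mild sensor-like noise

col = find_interface_column(img)
print(col)  # lands at the step, i.e. column 59 or 60
```

Averaging along the columns suppresses noise before differentiating, which is what makes this more tolerant of low contrast than a global threshold.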
I have a fully convolutional CNN, and I have no problem designing it to produce an output of exactly the same size as the input by using "same" padding and a stride of one.
However, I have an image-translation problem where the output needs to be resized to (W/26, H/15), where (W, H) is the size of the input. (Resizing the image beforehand is problematic and is not an option in our case.)
I understand the formula O = (I - F + 2P)/s + 1, where:
O: output size
I: input size
F: filter size
P: padding
s: stride
I may be able to use some really strange filter size to achieve this. But is there a systematic or organized way to construct such a network to reduce input size?
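For reference, the formula can be evaluated directly. The sketch below assumes floor division when the stride does not divide evenly, and the sizes used (32, 390) are arbitrary example values.

```python
def conv_output_size(i, f, p, s):
    """O = (I - F + 2P) / s + 1, assuming floor division for uneven strides."""
    return (i - f + 2 * p) // s + 1

# "Same"-style setup: 3-wide filter, padding 1, stride 1 keeps the size.
print(conv_output_size(32, 3, 1, 1))     # 32
# Filter width equal to the stride divides the size by that factor.
print(conv_output_size(390, 26, 0, 26))  # 15
```

The second call illustrates one systematic option: a layer whose filter size and stride are both 26 along the width (and 15 along the height) divides an input whose size is a multiple of those factors exactly.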
Getting that precise output size playing only with filter size and stride is going to give you some headaches.
My two cents: whenever the size of your output is particularly weird (like (w//26, h//15)), interpolation layers might be helpful to get you to that particular size.
For example:
In PyTorch, you can use torch.nn.functional.interpolate.
In TensorFlow, you can use tf.image.resize.
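To show what such a resize does to the shapes, here is a dependency-free NumPy stand-in using nearest-neighbour sampling. It is not the PyTorch or TensorFlow API; in a real network you would call one of the functions named above. The helper name `resize_nearest` and the (H, W, C) layout are assumptions.

```python
import numpy as np

def resize_nearest(feature_map, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array to (out_h, out_w, C)."""
    h, w = feature_map.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return feature_map[rows[:, None], cols]

x = np.random.default_rng(0).random((150, 260, 8))  # H=150, W=260, 8 channels
y = resize_nearest(x, x.shape[0] // 15, x.shape[1] // 26)
print(y.shape)  # (10, 10, 8)
```

Because the target size is computed from the input size at run time, the same layer works for any input resolution.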
I'm training a HOG + SVM model, and my training data comes in various sizes and aspect ratios. The SVM model can't be trained on variable sized lists, so I'm looking to calculate a histogram of gradients that is the same length regardless of image size.
Is there a clever way to do that? Or is it better to resize the images or pad them?
What people usually do in such a case is one of the following two things:
Resize all images (or image patches) to a fixed size and extract the HOG features from those.
Use the "Bag of Words/Features" method and don't resize the images.
Method 1 is quite simple, but it has some problems that method 2 tries to solve.
First, think of what a HOG descriptor does. It divides an image into cells of a fixed size and calculates the gradients cell-wise to generate cell-wise histograms (based on voting). At the end, you have a concatenated histogram of all the cells, and that's your descriptor.
So there is a problem with it: the object that you want to detect has to cover the images in a similar manner. Otherwise, your descriptor would look different depending on the location of the object inside the image.
Method 2. works as follows:
Extract the HOG features from both positive and negative images in your training set.
Use a clustering algorithm like k-means to define a fixed number k of centroids.
For each image in your dataset, extract the HOG features and assign each one to its nearest centroid to create a frequency histogram.
Use the frequency histograms to train your SVM, and use them again in the classification phase. This way, the location doesn't matter and you'll always have a fixed-size input. You'll also benefit from the reduction of dimensions.
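The steps of method 2 can be sketched as follows. This is a minimal illustration under stated assumptions: random vectors stand in for real per-cell HOG descriptors, k-means is a plain Lloyd's iteration written in NumPy rather than a library call, and the names `kmeans` and `bow_histogram` are hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every descriptor to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave an empty cluster where it is
                centroids[j] = members.mean(axis=0)
    return centroids

def bow_histogram(descriptors, centroids):
    """Normalised count of descriptors falling nearest to each centroid."""
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(centroids))
    return counts / counts.sum()

rng = np.random.default_rng(1)
# Stand-ins for 9-bin cell descriptors from two images of different sizes.
desc_a = rng.random((40, 9))
desc_b = rng.random((95, 9))
centroids = kmeans(np.vstack([desc_a, desc_b]), k=8)
feat_a = bow_histogram(desc_a, centroids)
feat_b = bow_histogram(desc_b, centroids)
print(feat_a.shape, feat_b.shape)  # (8,) (8,)
```

Both images end up with a feature vector of length k, regardless of how many descriptors each one produced, which is what the SVM needs.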
You can normalize the images to a given target shape using cv2.resize(), divide image into number of blocks you want and calculate the histogram of orientations along with the magnitudes. Below is a simple implementation of the same.
import cv2
import numpy as np

img = cv2.imread(filename, 0)  # load as grayscale; filename is the path to your image
img = cv2.resize(img, (16, 16))  # resize the image to a fixed size
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)  # horizontal gradients
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)  # vertical gradients
mag, ang = cv2.cartToPolar(gx, gy)
bin_n = 16 # Number of bins
# quantize the angles into bin indices 0..15
bins = np.int32(bin_n*ang/(2*np.pi))
# divide to 4 sub-squares
s = 8 #block size
bin_cells = bins[:s,:s],bins[s:,:s],bins[:s,s:],bins[s:,s:]
mag_cells = mag[:s,:s], mag[s:,:s], mag[:s,s:], mag[s:,s:]
hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells,mag_cells)]
hist = np.hstack(hists) #histogram feature data to be fed to SVM model
Hope that helps!
Given an image (like the one below), I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried two thresholding functions. The problem is that I can't get clean edges with either of them. Any help would be greatly appreciated.
The filters I have tried threshold on the Euclidean distance in the RGB and HSV spaces.
Sample image:
Here it is after running an RGB threshold filter. (This is at 40%; beyond that, there are more artefacts.)
Here it is after running an HSV threshold filter. (at 30% the paths become barely visible but clearly unusable because of the noise)
The code I am using is pretty straightforward: convert the input image to the appropriate color space and check the Euclidean distance to black.
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
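A minimal NumPy sketch of this kind of local threshold follows, using the commonly quoted Sauvola form T = m * (1 + k * (s / R - 1)), where m and s are the local mean and standard deviation. The window size and the values of k and R are illustrative assumptions to be tuned, the test image is synthetic, and `sauvola_threshold` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sauvola_threshold(gray, window=15, k=0.2, r=128.0):
    """Boolean mask, True where a pixel is darker than its local threshold.

    window must be odd so the output has the same shape as the input.
    """
    gray = gray.astype(np.float64)
    pad = window // 2
    padded = np.pad(gray, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    mean = windows.mean(axis=(-2, -1))  # local mean m
    std = windows.std(axis=(-2, -1))    # local standard deviation s
    threshold = mean * (1.0 + k * (std / r - 1.0))
    return gray < threshold

# Synthetic scan: illumination ramps from 100 to 200, with one dark line on row 20.
img = np.tile(np.linspace(100.0, 200.0, 40), (40, 1))
img[20, :] -= 80.0
mask = sauvola_threshold(img)
print(mask[20].all(), mask[5].any())  # True False
```

The line is detected along its whole length even though its absolute intensity on the bright side (120) is higher than the background on the dark side (100), which a single global threshold could not handle.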
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
Running an adaptive threshold over the V channel of the HSV color space should produce brilliant results. A window larger than 11x11 should work best, and don't forget to choose a negative value for the threshold.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel )
Pixel_Binary = 1;
else
Pixel_Binary = 0;
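The pseudocode above translates fairly directly into NumPy. This is a sketch rather than a library implementation; the window size, the constant, and the synthetic test image are assumptions, and `adaptive_mean_threshold` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def adaptive_mean_threshold(gray, window=11, constant=10.0):
    """Return 1 where pixel + constant exceeds the local window mean, else 0.

    window must be odd so the output has the same shape as the input.
    """
    gray = gray.astype(np.float64)
    pad = window // 2
    padded = np.pad(gray, pad, mode="reflect")
    local_mean = sliding_window_view(padded, (window, window)).mean(axis=(-2, -1))
    return (gray + constant > local_mean).astype(np.uint8)

# Synthetic image: brightness ramp with a dark line on row 20.
img = np.tile(np.linspace(100.0, 200.0, 40), (40, 1))
img[20, :] -= 80.0
binary = adaptive_mean_threshold(img)
print(binary[20, 20], binary[5, 20])  # 0 1
```

The sign and size of the constant decide how far a pixel must deviate from its neighbourhood before it flips, so it is worth sweeping on real data.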
Due to the noise and the illumination variation you may need an adaptive local thresholding, thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do the mean or the median local thresholding, I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more information on these techniques.
To make sure the thresholding was working fine, I skeletonized it to see if there is a line break. This skeleton may be the one needed for further processing.
To get rid of the remaining noise, you can just find the longest connected component in the skeletonized image.
Thank you.
You probably want to do this as a three-step operation.
use leveling, not just thresholding: Take the input and scale the intensities (gamma correct) with parameters that simply dull the mid tones, without removing the darks or the lights (your rgb threshold is too strong, for instance. you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
You could try a black top-hat transform. This involves subtracting the image from the closing of the image. I used a structuring element of size 11 and a constant threshold of 0.1 (25.5 on a 0-255 scale).
You should get something like:
Which you can then easily threshold:
Best of luck.
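For readers without a morphology library at hand, the black top-hat can be sketched in plain NumPy with a square window: closing is a dilation (local maximum) followed by an erosion (local minimum). The square structuring element, the edge padding, and the synthetic image are assumptions; `black_top_hat` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(gray, size):
    """All size x size neighbourhoods, with edge padding to keep the shape."""
    pad = size // 2
    return sliding_window_view(np.pad(gray, pad, mode="edge"), (size, size))

def black_top_hat(gray, size=11):
    """Closing (dilation, then erosion) minus the image; size must be odd."""
    gray = gray.astype(np.float64)
    dilated = _windows(gray, size).max(axis=(-2, -1))
    closed = _windows(dilated, size).min(axis=(-2, -1))
    return closed - gray

# Synthetic image: brightness ramp with a thin dark line on row 20.
img = np.tile(np.linspace(100.0, 200.0, 40), (40, 1))
img[20, :] -= 80.0
top_hat = black_top_hat(img)
mask = top_hat > 25.5  # the constant threshold quoted above, on a 0-255 scale
print(mask[20].all(), mask[5].any())  # True False
```

The closing fills in dark structures narrower than the window, so the difference keeps only those structures and discards the slowly varying illumination.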
When applying a Gaussian blur to an image, typically the sigma is a parameter (examples include Matlab and ImageJ).
How does one know what sigma should be? Is there a mathematical way to figure out an optimal sigma? In my case, I have some objects in images that are bright compared to the background, and I need to find them computationally. I am going to apply a Gaussian filter to make the center of these objects even brighter, which hopefully facilitates finding them. How can I determine the optimal sigma for this?
There's no formula to determine it for you; the optimal sigma will depend on image factors - primarily the resolution of the image and the size of your objects in it (in pixels).
Also, note that Gaussian filters aren't actually meant to brighten anything; you might want to look into contrast maximization techniques - sounds like something as simple as histogram stretching could work well for you.
edit: More explanation - sigma basically controls how "fat" your kernel function is going to be; higher sigma values blur over a wider radius. Since you're working with images, bigger sigma also forces you to use a larger kernel matrix to capture enough of the function's energy. For your specific case, you want your kernel to be big enough to cover most of the object (so that it's blurred enough), but not so large that it starts overlapping multiple neighboring objects at a time - so actually, object separation is also a factor along with size.
Since you mentioned MATLAB - you can take a look at various gaussian kernels with different parameters using the fspecial('gaussian', hsize, sigma) function, where hsize is the size of the kernel and sigma is, well, sigma. Try varying the parameters to see how it changes.
I use this convention as a rule of thumb: if k is the size of the kernel, then sigma = (k - 1)/6. This is because about 99.7% of the mass of a Gaussian pdf lies within 3 sigma on either side of the mean, a total width of 6 sigma.
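The rule of thumb is easy to check numerically. The sketch below builds a 1-D kernel with sigma derived from its size and computes the fraction of a continuous Gaussian that lies within 3 sigma of the mean; `gaussian_kernel_1d` is a hypothetical helper name.

```python
import math
import numpy as np

def gaussian_kernel_1d(k):
    """Odd-length 1-D Gaussian kernel with sigma = (k - 1) / 6, summing to 1."""
    if k < 3 or k % 2 == 0:
        raise ValueError("k must be an odd integer >= 3")
    sigma = (k - 1) / 6.0
    x = np.arange(k) - (k - 1) / 2.0  # tap positions centred on zero
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum(), sigma

kernel, sigma = gaussian_kernel_1d(7)
mass_within_3_sigma = math.erf(3.0 / math.sqrt(2.0))
print(sigma)                          # 1.0
print(round(mass_within_3_sigma, 4))  # 0.9973
```

Read the other way round, the same relation gives a kernel size for a chosen sigma: k = 6 * sigma + 1, rounded up to the next odd integer.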
You have to find a minimum or maximum of a function G(X, sigma), where X is your set of observations (in your case, the grayscale values of your image). G can be anything that preserves the "order" of the intensities of the image; for example, the first derivative of the image can serve as G:
fil = fspecial('sobel');
im = imfilter(I,fil);
imagesc(im);
colormap(gray);
This gives you the first derivative of the image. Now you want to find the best sigma by maximizing G(X, sigma): try a few sigmas (say, in increasing order) until you reach the one that makes G maximal. This can also be done with the second derivative.
Given the central value of the kernel equals 1 the dimension that guarantees to have the outermost value less than a limit (e.g 1/100) is as follows:
// Requires <cmath>; sigma is the standard deviation of the Gaussian.
double limit = 1.0 / 100.0;
int size = static_cast<int>(2 * std::ceil(std::sqrt(-2.0 * sigma * sigma * std::log(limit))));
if (size % 2 == 0)
{
size++;
}
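The same calculation can be ported to Python and checked against its own claim. This is a sketch assuming an unnormalised kernel exp(-x^2 / (2 sigma^2)) with a central value of 1; `kernel_size_for_sigma` is a hypothetical helper name.

```python
import math

def kernel_size_for_sigma(sigma, limit=0.01):
    """Odd kernel size whose outermost tap of exp(-x^2 / (2 sigma^2)) is <= limit."""
    radius = math.ceil(math.sqrt(-2.0 * sigma * sigma * math.log(limit)))
    return 2 * radius + 1  # 2 * radius is always even, so add 1 to make it odd

size = kernel_size_for_sigma(1.0)
edge_value = math.exp(-((size - 1) / 2) ** 2 / 2.0)  # outermost tap for sigma = 1
print(size)               # 9
print(edge_value < 0.01)  # True
```

The size follows from solving exp(-x^2 / (2 sigma^2)) = limit for x, which gives x = sqrt(-2 sigma^2 ln(limit)), and then rounding that radius up.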