Special loss function for perspective images - machine-learning

I am currently working on a project to detect the trackbed of railroads. The images to segemt always have the same camera perspective.
Is there a special loss function that can be used for perspective images or is binary cross entropy the best choice here?
I would like to penalize errors more heavily the nearer they are to the image horizon, because they make up a larger area of the actual track.
Example image with label

You can build & train a separate convolutional neural network to determine the loss from the images.
But, you will have to label the loss manually by hand for each image in order to train the network.
You can reduce your efforts of labeling by labeling around 200 images then, by using data augmentation, up-sample it to 1000 images.
Data Augmentation techniques:
Change image brightness as a random percentage between 65 & 135
Change image contrast as a random percentage between 65 & 135
Change image saturation as a random percentage between 65 & 135
Horizontal flip.
Add Gaussian noise to image.

Related

Deep Learning - How to perform RANDOM CROP and not lose any information in data (change ground truth label)

I have image patches from DDSM Breast Mammography that are 150x150 in size. I would like to augment my dataset by randomly cropping these images 2x times to 120x120 size. So, If my dataset contains 6500 images, augmenting it with random crop should get me to 13000 images. Thing is, I do NOT want to lose potential information in the image and possibly change ground truth label.
What would be best way to do this? Should I crop them randomly from 150x150 to 120x120 and hope for the best or maybe pad them first and then perform the cropping? What is the standard way to approach this problem?
If your ground truth contains the exact location of what you are trying to classify, use the ground truth to crop your images in an informed way. I.e. adjust the ground truth, if you are removing what you are trying to classify.
If you don't know the location of what you are classifying, you could
attempt to train a classifier on your un-augmented dataset,
find out, what the regions of the images are that your classifier reacts to,
make note of these location
crop your images in an informed way
train a new classifier
But how do you "find out, what regions your classifier reacts to"?
Multiple ways are described in Visualizing and Understanding Convolutional Networks by Zeiler and Fergus:
Imagine your classifier classifies breast cancer or no breast cancer. Now simply take an image that contains positive information for breast cancer and occlude part of the image with some blank color (see gray square in image above, image by Zeiler et al.) and predict cancer or not. Now move the occluded square around. In the end you'll get rough predictions scores for all parts of your original image (see (d) in the image above), because when you covered up the important part that is responsible for a positive prediction, you (should) get a negative cancer prediction.
If you have someone who can actually recognize cancer in an image, this is also a good way to check for and guard against confounding factors.
BTW: You might want to crop on-the-fly and randomize how you crop even more to generate way more samples.
If the 150x150 is already the region of interest (ROI) you could try the following data augmentations:
use a larger patch, e.g. 170x170 that always contains your 150x150 patch
use a larger patch, e.g. 200x200, and scale it down to 150x150
add some gaussian noise to the image
rotate the image slightly (by random amounts)
change image contrast slightly
artificially emulate whatever other (image-)effects you see in the original dataset

Traffic Signal Detection with Cascade Classifier with OpenCV

I am working on " Controlling Raspberry Pi's GPIO pins according to change in traffic lights (Red, Green, Yellow)". Right now, I am focusing only on Traffic light detection part. For that, I am using Cascade classifier for Haar features.
I have 2000 negative sample images, which I have converted to grayscale and reshaped to 120 X 120. Also, I have ONE positive image of traffic signal (40 X 120), from which I am generating 2000 positive samples. And finally, I am training my classifier using 2000 positive samples and 1000 negative samples with 10 stages.
My output for some test images looks like following:
Output 1
output 2
Output 3
Image from which I created positive samples:
positive image
I have some questions/doubts and need some suggestions to improve or modify my classifier.
1) Do I need to use more than one image as positive image to create samples?
2) Why I am not able to detect all the traffic signals in above images?
3) Am I doing wrong in image shape or anything?
4) Please correct me in this point if I am wrong - To draw a rectangle over traffic signal, I am using cv2.rectangle function and provided constant height/width parameter, and thats the ONLY reason why it is drawing a big rectangle regardless of how near/far my traffic signal is in an image! Any suggestions to change this size dynamically?
Thank you.
To me, it looks like your network has not learned enough.
1) I strongly suggest taking 20-50 samples of traffic lights, instead of one sample. You still can generate thousands of samples using them, for training.
2) most likely because of inadequate training, but you should also check the parameters in the detection stage. What are the minimum and maximum sizes that you have set for detection?
3) You don't have to re-shape or re-size the image, so that should not be a problem.
The detector returns the position (x,Y) and the size (width, height) of all objects that were detected. So you should be able to change the size dynamically instead of using constant width and height. Please refer to the opencv example of Haar Face Detection, of the language of your choice.

Haar Classifier positive image set clarification

Could you please help understand several points related to Haar Classifier training:
1) Should positive image contain only the training object or they can contain some other objects in it? Like I want to recognize some traffic sign, should the positive image contain only traffic sign or it can contain highway also?
2) There are 2 ways of creating samples vector file, one is using info file, which contains the detected object coordinates in positive image, another just giving the list of positives and negatives. Which one is better?
3) How usually you create info file, which contains the detected object coordinates in positive image? Can image clipper generate object cordinates?
And does dlib histogram of adaptive gradient provides better results than Haar classifier?
My target is traffic sign detection in raspberry pi.
Thanks
the positive sample (not necessarily the image) should contain only the object. Sometimes it is not possible to get the right aspect ratio for each positive sample, then you would either add some background or crop some of the object boundary. The final detector will detect regions of your positive sample aspect ratio, so if you use a lot of background around all of your positive samples, your final detector will probably not detect a region of your traffix sign, but a region with a lot of background around your traffic sign.
Afaik, the positive samples must be provided by a .vec file which is created with opencv_createsamples.exe and you'll need a file with the description (where in the images are your positive samples?). I typically go the way that I preprocess my labeled training samples, crop away all the background, so that there are only intermediate images where the positive sample fills the whole image and the image is already the right aspect ratio. I fill a text file with basically "folder/filename.png 0 0 width height" for each of those intermediate images and then create a .vec file from that intermediate images. But the other way, using a real roi information out of full-size images should be of same quality.
Be aware that if you don't fix the same aspect ratio for each positive sample, you'll stretch your objects, which might or might not be a problem in your task.
And keep in mind, that you can create additional positive samples from warping/transforming your images. opencv_createsamples can do that for you, but I never really used it, so I'm not sure whether training will benefit from using such samples.

Image preprocessing mean image subtraction

I have a question regarding the preprocessing step "Image mean subtraction".
I use the UCSD Dataset for my training.
One popular preprocessing step is the mean subtraction. Now I wonder if I am doing it right.
What I am doing is the following:
I have 200 gray scaled Train images
I put all images in a list and compute the mean with numpy:
np.mean(ImageList, axis=0)
This returns me a mean image
Now I subtract the mean image from all Train images
When I now visualize my preprocessed train images they are mostly black and have also negative values in them.
Is this correct? Or is my understanding of subtracting the mean image incorrect?
Here is one of my training images:
And this is the "mean image":
It seems like you are doing it right.
As for the negative values: they are to be expected. Your original images had intensity values in range [0..1], once you subtract the mean (that should be around ~0.5), you should have values in range (roughly) [-.5..0.5].
Please note that you should save the "mean image" you got for test time as well: once you wish to predict using the trained net you need to subtract the same mean image from the test image.
Update:
In your case (static camera) the mean subtracted removes the "common" background. These settings seems to be in your favor as they focus the net to the temporal changes in the frame. this method will work well for you, as long as you test on the same image set (i.e., frames from the same static camera).

Determine if an image needs contrasting automatically in OpenCV

OpenCV has a handy cvEqualizeHist() function that works great on faded/low-contrast images.
However when an already high-contrast image is given, the result is a low-contrast one. I got the reason - the histogram being distributed evenly and stuff.
Question is - how do I get to know the difference between a low-contrast and a high-contrast image?
I'm operating on Grayscale images and setting their contrast properly so that thresholding them won't delete the text i'm supposed to extract (thats a different story).
Suggestions welcome - esp on how to find out if the majority of the pixels in the image are light gray (which means that the equalise hist is to be performed)
Please help!
EDIT: thanks everyone for many informative answers. But the standard deviation calculation was sufficient for my requirements and hence I'm taking that to be the answer to my query.
You can probably just use a simple statistical measure of the image to determine whether an image has sufficient contrast. The variance of the image would probably be a good starting point. If the variance is below a certain threshold (to be empirically determined) then you can consider it to be "low contrast".
If you're adjusting contrast just so you can threshold later on, you may be able to avoid the contrast adjustment step if you set your threshold adaptively using Ohtsu's method.
If you're still interested in finding out the image contrast, then read on.
While there are a number of different ways to calculate "contrast". Often, those metrics are applied locally as opposed to the entire image, to make the result more sensitive to image content:
Divide the image into adjacent non-overlaying neighborhoods.
Pick neighborhood sizes that are approximate to size of the features of your image (e.g. if your main feature is horizontal text, make neighborhoods tall enough to capture 2 lines of text, and just as wide).
Apply the metric to each neighborhood individually
Threshold the metric result to separate low and high variance blocks. This will prevent such things as large, blank areas of page skewing your contrast estimates.
From there, you can use a number of features to determine contrast:
The proportion of high metric blocks to low metric blocks
High metric block mean
Intensity distance between the high and low metric blocks (using means, modes, etc)
This may serve as a better indication of image contrast than global image variance alone. Here's why:
(stddev: 50.6)
(stddev: 7.9)
The two images are perfectly in contrast (the grey background is just there to make it obvious it's an image), but their standard deviations (and thus variance) are completely different.
Calculate cumulative histogram of image.
Make linear regression of cumulative histogram in the form y(x) = A*x + B.
Calculate RMSE of real_cumulative_frequency(x)-y(x).
If that RMSE is close to zero - image is already equalized. (That means that for equalized images cumulative histograms must be linear)
Idea is taken from here.
EDIT:
I've illustrated this approach in my blog (C example code included).
There is a support provided in skimage for this. skimage.exposure.is_low_contrast. reference
example :
>>> image = np.linspace(0, 0.04, 100)
>>> is_low_contrast(image)
True
>>> image[-1] = 1
>>> is_low_contrast(image)
True
>>> is_low_contrast(image, upper_percentile=100)
False

Resources