Good approach for training a neural network - machine-learning

I am training a neural network model to differentiate between oranges and pomegranates.
In the training dataset, the background of the object (for both the orange and the pomegranate) is the same and constant. But at test time, the background of the object is different from what I trained with.
So my first doubt is,
Is it a good approach to train a model with one background (say, a white background) and test with another background (say, a grey background)?
Second, I trained with the object at different positions but the same background. Theory says that position shouldn't matter for convolution: the network should be able to recognise the object wherever it is placed, since after convolution the spatial dimensions of the activation map decrease while the depth increases.
So my second doubt is,
Is it necessary, or a good approach, to keep the object at different positions while training the model?

Is it a good approach to train a model with one background (say, a white background) and test with another background (say, a grey background)?
When training a neural network, it is important to shuffle the dataset and split it into training and testing sets. You shuffle the data so that the model sees all types of samples during training; then, when it is exposed to new unseen data, it can relate it to what it has already seen. In the example you mention, shuffling matters because the different background colors can affect the model's predictions, so both the training and the testing set need to contain both background colors for the model to give good predictions.
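A minimal sketch of that shuffle-and-split step, assuming the data is already loaded into NumPy arrays (the names images and labels are placeholders, not part of the question):

    # Shuffle and split so that both backgrounds appear in training and testing.
    # `images` and `labels` are assumed to be NumPy arrays loaded beforehand.
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        images, labels,
        test_size=0.2,      # hold out 20% for testing
        shuffle=True,       # shuffle before splitting
        stratify=labels,    # keep class proportions balanced in both sets
        random_state=42,
    )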
Is it necessary, or a good approach, to keep the object at different positions while training the model?
It is indeed better to train your model with the object in different positions, because it helps the model recognise more varieties of oranges and pomegranates. Note that if you use different positions for the object you are trying to predict, you need a sufficient amount of data for the model to give good predictions on the test set.
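If collecting photos at many positions by hand is impractical, random shifts can be generated on the fly. A rough sketch with Keras (the directory layout and image size are assumptions, not part of the question):

    # Randomly shift the object's position during training via data augmentation.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        width_shift_range=0.2,   # shift horizontally by up to 20% of the width
        height_shift_range=0.2,  # shift vertically by up to 20% of the height
        fill_mode="nearest",
        rescale=1.0 / 255,
    )

    train_gen = datagen.flow_from_directory(
        "data/train",            # hypothetical folder with orange/ and pomegranate/ subfolders
        target_size=(128, 128),  # assumed input size
        batch_size=32,
        class_mode="binary",
    )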
I hope this short explanation helped; if something isn't clear please let me know and I'll edit the post.

Is it a good approach to train a model with one background (say, a white background) and test with another background (say, a grey background)?
Background is a property of an image that is not required for distinguishing the object, and you want your network to learn this. Consider two cases:
You give your network images with only one background. Let's see what can possibly go wrong here.
Assume that your background is completely black. This means a feature map (kernel) produces 0 output wherever it is applied to the background. Your network can learn arbitrarily high weights for these features and still do a good job during training, as long as those weights successfully extract features of the classes.
Now during testing, the background color is white. The same feature maps with high weights now produce very high outputs. These high outputs can saturate the non-linear unit, and all categories may end up classified as a single category.
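A tiny numeric illustration of that saturation effect (the weights and pixel values here are made up for the example):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    kernel = np.full((3, 3), 5.0)       # high weights learned on the black background
    black_patch = np.zeros((3, 3))      # background pixels = 0 during training
    white_patch = np.ones((3, 3))       # background pixels = 1 during testing

    out_black = np.sum(kernel * black_patch)   # 0.0  -> sigmoid(0) = 0.5, harmless
    out_white = np.sum(kernel * white_patch)   # 45.0 -> sigmoid(45) ~ 1.0, saturated
    print(sigmoid(out_black), sigmoid(out_white))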
In the second case, during training you show images with different backgrounds.
Here the neural network has to learn which feature maps respond to the background and to subtract a bias that depends on the background.
In short, there is an extra piece of information the network needs to learn: the background is not important for deciding the category. When you provide only one background color, your neural network cannot learn this behavior and can give garbage results on the test dataset.
Is it necessary or a good approach to keep the object at different positions while training the model?
You are right, Convolutional Neural Networks are translation-equivariant. But to build a classifier, you pass the output of the CNN layers through a fully-connected layer. If you put the object at different positions, different inputs reach the fully-connected layer, yet the output for all of these images is the same category. So you are forcing your neural network to learn that the position of the object is not required for classifying its category.
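A small demonstration of what "translation-equivariant" means here, using a toy 6x6 image and a 2x2 kernel (all values made up for illustration):

    import numpy as np
    from scipy.signal import correlate2d

    kernel = np.array([[1.0, 0.0], [0.0, 1.0]])

    img = np.zeros((6, 6))
    img[1, 1] = 1.0                                      # "object" near the top-left
    shifted = np.roll(img, shift=(3, 3), axis=(0, 1))    # same object, moved

    feat = correlate2d(img, kernel, mode="valid")
    feat_shifted = correlate2d(shifted, kernel, mode="valid")

    # The feature map shifts along with the input (equivariance) ...
    print(np.allclose(np.roll(feat, (3, 3), axis=(0, 1)), feat_shifted))  # True
    # ... but the flattened vectors fed to a dense layer differ, so the dense
    # layer must see shifted examples to map them all to the same class.
    print(np.array_equal(feat.ravel(), feat_shifted.ravel()))             # False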

Regarding your first doubt: it is not much of an issue as long as the target object is present in the images. Shuffle the data before feeding it to the network.
For the second doubt: yes, it is always a good idea to have the target object at different positions. One more thing to take care of is that the source of your data should be the same and of mostly the same quality; otherwise performance issues will arise.

Related

Neural Network for Learning Cut VS Uncut Grass

I've got a script to take pictures like the one provided, with colored loops encircling either uncut grass, cut grass, or other background details (for the purpose of rejecting non-grass regions), and to generate training data in the form of a bunch of small images taken from inside the colored loops. I'm struggling to find which type of neural network would work best for learning from this training data and telling me, in real time from a video feed mounted on a lawn mower, which sections of the image are uncut grass or cut grass as it mows through a field. Is there anyone on here experienced with neural networks who can either tell me some models I could use, or just point me in the right direction?
Try a segmentation network. There are many types of segmentation.
Mind that for neural networks, training data is necessary. Your case (detecting cut vs. uncut grass) is fairly specialised, which means existing pre-trained models may not fit your purpose. If so, you'll need a dataset consisting of images and per-pixel annotations. There are also tools for labelling segmentation images.
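As a rough illustration of what such a network might look like, here is a minimal fully-convolutional sketch in Keras (the input size and the three classes cut / uncut / background are assumptions):

    from tensorflow.keras import layers, models

    def build_segmenter(input_shape=(256, 256, 3), num_classes=3):
        # Predicts a class for every pixel instead of one label per image.
        inputs = layers.Input(shape=input_shape)
        x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
        x = layers.MaxPooling2D(2)(x)
        x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
        x = layers.UpSampling2D(2)(x)
        outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
        return models.Model(inputs, outputs)

    model = build_segmenter()
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # Training requires images plus per-pixel label masks of shape (256, 256, 1).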
Hope it helps.

Does the presence of a particular object in all the images of a data set affect a CNN's performance?

Context: I have partial views of different types of vehicles in my data set (partial because of the limited field of view of my camera lens). These partial images cover more than half the vehicle and can be considered good representative images of the vehicle. The vehicle categories are car, bus and truck. I always get a wheel of the vehicle in these images, and because I capture them during different parts of the day, the color intensity of the wheels varies throughout the day. However, a wheel is definitely present in all the images.
Question: I wanted to know whether the presence of an object that is not logically useful for classification, in every image of a data set, will affect the CNN in any way. Basically, before training the CNN, should I mask the object, i.e. black it out, in all the images, or just leave it there?
A CNN creates a hierarchical decomposition of the image into combinations of various discriminatory patterns. These patterns are learnt during training to find those that separate the classes well.
If an object is present in every image, it is likely that it is not needed to separate the classes and won't be learnt. If there is some variation in the object that is class dependent, then maybe it will be used. It is really difficult to know beforehand which features are important. Maybe buses have shinier wheels than other vehicles, and this is something you have not noticed, so having the wheel in the image is beneficial.
If you have inadvertently introduced some class-specific variation, this can cause a problem for later classification. For example, if you only took photos of buses at night, the network might learn night = bus, and when you show it a photo of a bus taken during the day it won't classify it correctly.
However, using dropout in the network forces it to learn multiple features for classification rather than relying on just one. So if there is such variation, it might not have as big an impact.
I would use the images without blanking anything out. Unless it is something simple, such as removing the background around particles, finding and blacking out the object adds another layer of complexity. You can test whether the wheels make a big difference by training the network on the normal images, then classifying a few training examples with the object blacked out and seeing if the class probabilities change.
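That check could be scripted roughly like this (the trained model, the example images and the wheel coordinates are all hypothetical placeholders):

    import numpy as np

    def mask_region(image, y0, y1, x0, x1):
        # Return a copy of the image with the given rectangle blacked out.
        masked = image.copy()
        masked[y0:y1, x0:x1, :] = 0.0
        return masked

    for img in sample_images:                         # a few training examples
        original_probs = model.predict(img[np.newaxis])[0]
        masked = mask_region(img, 300, 400, 50, 200)  # hypothetical wheel location
        masked_probs = model.predict(masked[np.newaxis])[0]
        print(original_probs, masked_probs)           # big differences => the wheel matters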
Focus your energy on doing good data augmentation; that is where you will get the most gains.
You can see an example of which features are learnt on MNIST in this paper.

Designing a classifier with minimal image data

I want to train a 3-class classifier with tissue images, but only have around 50 labelled images in total. I can't take patches from the images and train on them, so I am looking for another way to deal with this problem.
Can anyone suggest an approach to this? Thank you in advance.
The question is very broad but here are some recommendations:
It could make sense to generate variations of your input images: modifying contrast, brightness or color, rotating the image, adding noise. Which of these operations, if any, make sense really depends on the type of classification problem.
Generally, the less data you have, the fewer parameters (weights etc.) your model should have. Otherwise it will overfit, meaning that your classifier will classify the training data but nothing else.
You should check for overfitting. A simple method is to split your training data into a training set and a control set. Once you have found that the classification is correct for the control set as well, you could do additional training that includes the control set.
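A sketch of that check in Keras (the arrays images and labels and the model itself are placeholders; with only ~50 images the held-out control set will be small):

    from sklearn.model_selection import train_test_split

    X_train, X_control, y_train, y_control = train_test_split(
        images, labels, test_size=0.2, stratify=labels, random_state=0
    )

    history = model.fit(
        X_train, y_train,
        validation_data=(X_control, y_control),  # the held-out control set
        epochs=50, batch_size=8,
    )
    # If training accuracy keeps rising while control accuracy stalls or drops,
    # the model is overfitting: shrink it or add more augmentation.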

Is this image too complex for a shallow NN classifier?

I am trying to classify a series of images like this one, with each class comprising images taken from a similar cellular structure:
I've built a simple network in Keras to do this, structured as:
1000 - 10
Unaltered, the network achieves very high (>90%) accuracy on MNIST classification, but almost never higher than 5% on these types of images. Is this because they are too complex? My next approach would be to try stacked deep autoencoders.
Seriously, I don't expect any non-convolutional model to work well on this type of data.
A non-convolutional net works well for MNIST because the data is well preprocessed (the digits are centred and resized to a fixed size). Your images are not.
You may notice in your pictures that certain motifs recur, like the darker dots, at different positions and sizes. If you don't use a convolutional model you will not capture that efficiently (e.g. you will have to recognise a dark dot moved slightly within the image as a completely different object).
Because of this, I think you should try a convolutional MNIST model instead of the classic one, or simply design your own.
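For reference, a small convolutional model of the kind suggested might look like this in Keras (the input size and the 10 output classes are assumptions carried over from the 1000 - 10 description):

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(64, 64, 1)),                       # assumed image size
        layers.Conv2D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),                # assumed number of classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])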
First question: if you run the training longer, do you get better accuracy? You may not have trained long enough.
Also, what is the accuracy on the training data and what is the accuracy on the testing data? If they are both high, you can train longer or use a more complex model. If the training accuracy is better than the testing accuracy, you are essentially at the limits of your data (i.e. brute-force scaling of the model size won't help, but clever improvements might, e.g. try convolutional nets).
Finally, for complex and noisy data you may need a lot of data to make a reasonable classification. So you may need many, many images.
Stacked deep autoencoders, as I understand them, are an unsupervised method, which isn't directly suitable for classification.

How to use a Neural Network for face detection?

I'm trying to build a face detection system using a neural network written in Theano. I am a bit confused as to what the expected output should be, against which I would calculate the cross-entropy. I don't just want to know whether a face is present or not; I need to highlight the face in an image (find the location of the face). The size of the images is constant, but the size of the faces in the images is not. How do I go about that? Also, my webcam currently captures 480x640 images. Creating that number of neurons in the input layer would be very heavy on the system, so how do I compress the images without losing any features?
There are many possible solutions; one of the easiest is to perform a sliding-window search and ask a network "is there a face in this part of the image?", which is a fairly standard approach. In particular, you can do it hierarchically: split the image into 9 overlapping squares (I assume the image is square) and ask of each of them "is there a face in it?" by rescaling it to your network's input size. Then split the ones answering "yes" into 9 squares again and repeat. This way you can find the face fairly fast. Another option is to perform supervised segmentation, where you try to predict which parts of the image (pixels/superpixels) belong to a face and which do not. This is not an exhaustive list, but it should give you a general idea of how to proceed.
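A rough sketch of the plain sliding-window variant (the window schedule, the stride and the has_face classifier are assumptions used only for illustration):

    def find_faces(image, has_face, min_size=64, stride=32):
        # Slide square windows of decreasing size over the image and collect
        # the boxes the classifier flags as containing a face.
        h, w = image.shape[:2]
        detections = []
        size = min(h, w)
        while size >= min_size:
            for y in range(0, h - size + 1, stride):
                for x in range(0, w - size + 1, stride):
                    window = image[y:y + size, x:x + size]
                    if has_face(window):          # "is there a face in this part?"
                        detections.append((x, y, size))
            size //= 2                            # coarse-to-fine schedule
        return detections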
How do I compress the images without losing any features?
You do not. It is not possible. You will always lose some data when downscaling (lossless compression exists, but it destroys the spatial structure, which makes classification extremely hard).
You should first create a training set from the images received through the webcam. The training set must contain face and non-face images (such as apples, cars and so on). For better generalisation you may use some off-the-shelf data sets. After you have trained the network on these images, you can use it to classify unseen images.
This approach is suitable if your goal is only to detect whether an image contains a face. However, if you want to identify faces (e.g. this face belongs to John and not to other people), you need to train the network with images of the people you want to identify. The number of classes in such a network equals the number of distinct people.
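A small sketch of how the face / non-face training set described above could be assembled (the folder names and image size are placeholders):

    import os
    import numpy as np
    from PIL import Image

    def load_folder(path, label, size=(64, 64)):
        # Load every image in a folder, grayscale it, resize it and attach a label.
        data, labels = [], []
        for name in os.listdir(path):
            img = Image.open(os.path.join(path, name)).convert("L").resize(size)
            data.append(np.asarray(img, dtype=np.float32) / 255.0)
            labels.append(label)
        return data, labels

    faces, y_faces = load_folder("data/faces", 1)        # webcam face crops
    others, y_others = load_folder("data/non_faces", 0)  # apples, cars, ...
    X = np.stack(faces + others)
    y = np.array(y_faces + y_others)
    # For identification instead of detection, use one label per person.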
