Simulation before or after cross validation - machine-learning

I need to perform some kinds of simulation to benchmark the model performance.
The input data contain all binary features and we also want to perform binary classification. What if we want to use simulation to randomly mask (make those 1 become 0) some of the samples/features, when should we do this?
Currently, I perform this masking procedure before the cross validation. Thus, both the training set and the testing set are masked randomly. Does it seem a little bit weird?
Or should I split the full dataset into two parts. Masking one of them and keep the other one unchanged. Taking the unchanged one as validation set.

Related

What is wrong with my approach of using MLP to make a chess engine?

I’m making a chess engine using machine learning, and I’m experiencing problems debugging it. I need help figuring out what is wrong with my program, and I would appreciate any help.
I made my research and borrowed ideas from multiple successful projects. The idea is to use reinforcement learning to teach NN to differentiate between strong and weak positions.
I collected 3 million games with Elo over 2000 and used my own method to label them. After researching hundreds of games, I found out, that it’s safe to assume that in the last 10 turns of any game, the balance doesn’t change, and the winning side has a strong advantage. So I picked positions from the last 10 turns and made two labels: one for a win for white and zero for black. I didn’t include any draw positions. To avoid bias, I have picked even numbers of positions labeled with wins for both sides and even number of positions for both sides with the next turn.
Each position I represented by a vector with the length of 773 elements. Every piece on every square of a chess board, together with castling rights and a next turn, I coded with ones and zeros. My sequential model has an input layer with 773 neurons and an output layer with one single neuron. I have used a three hidden layer deep MLP with 1546, 500 and 50 hidden units for layers 1, 2, and 3 respectively with dropout regularization value of 20% on each. Hidden layers are connected with the non- linear activation function ReLU, while the final output layer has a sigmoid output. I used binary crossentropy loss function and the Adam algorithm with all default parameters, except for the learning rate, which I set to 0.0001.
I used 3 percent of the positions for validation. During the first 10 epochs, validation accuracy gradually went up from 90 to 92%, just one percent behind training accuracy. Further training led to overfitting, with training accuracy going up, and validation accuracy going down.
I tested the trained model on multiple positions by hand, and got pretty bad results. Overall the model can predict which side is winning, if that side has more pieces or pawns close to a conversion square. Also it gives the side with a next turn a small advantage (0.1). But overall it doesn’t make much sense. In most cases it heavily favors black (by ~0.3) and doesn’t properly take into account the setup. For instance, it labels the starting position as ~0.0001, as if the black side has almost 100% chance to win. Sometimes irrelevant transformation of a position results in unpredictable change of the evaluation. One king and one queen from each side usually is viewed as lost position for white (0.32), unless black king is on certain square, even though it doesn’t really change the balance on the chessboard.
What I did to debug the program:
To make sure I have not made any mistakes, I analyzed, how each position is being recorded, step by step. Then I picked a dozen of positions from the final numpy array, right before training, and converted it back to analyze them on a regular chess board.
I used various numbers of positions from the same game (1 and 6) to make sure, that using too many similar positions is not the cause for the fast overfitting. By the way, even one position for each game in my database resulted in 3 million data set, which should be sufficient according to some research papers.
To make sure that the positions I use are not too simple, I analyzed them. 1.3 million of them had 36 points in pieces (knights, bishops, rooks, and queens; pawns were not included in the count), 1.4 million - 19 points, and only 0.3 million - had less.
Some things you could try:
Add unit tests and asserts wherever possible. E.g. if you know that some value is never supposed to get negative, add an assert to check that this condition really holds.
Print shapes of all tensors to check that you have really created the architecture you intended.
Check if your model outperforms some simple baseline model.
You say your model overfits, so maybe simplify it / add regularization?
Check how your model performs on the simplest positions. E.g. can it recognize a checkmate?

Good approach for training a neural network

I am training a neural network model to differentiate the orange and pomegranate.
In the training dataset, the background of the object (for both orange and pomegranate) is same and constant. But while testing, the background of the object is different than what I trained with.
So my first doubt is,
Is it good approach to train a model with one background (suppose white background) and test with
another background (suppose grey background)?.
Second, I trained the object with different position and the same background. Since the theory says that position doesn't matter for convolution, it has ability to recognise the object placed at anywhere, because anyhow, after convolution, the dimension of the activation map decreases and the depth increases.
So my second doubt is,
Is it necessary or good approach to keep the object at different position while training
the model?
Is it good approach to train a model with one background (suppose white background) and test with
another background (suppose grey background)?.
When training a neural network, it is important to shuffle the dataset you are using and split the dataset to training and testing sets. The reason why you need to shuffle the data, is in order for your model to see all types of samples in the training set so the moment it is exposed to new unseen data, it can reflect it over the previously seen data. In the example you mentioned above, it is important to shuffle the data due to the fact that there are different background colors which can effect the prediction of the model. Therefore both the training and the testing set need to have both background colors in order for your model to give good predictions.
Is it necessary or good approach to keep the object at different position while training
the model?
It is indeed better to train your model with objects in different positions due to the fact it can bring your model to predict more types of oranges or pomegranates. Note that if you are using different positions for the object you are trying to predict, it is important to have a sufficient amount of data in order for the model to give you good predictions over the testing set.
I hope this short explanation helped, if something isn't clear please let me know and I'll edit the post.
Is it good approach to train a model with one background (suppose white background) and test with another background (suppose grey background)?.
Background is a property of an image that is not required for distinguishing the object. You want your network to learn this behavior. Consider two cases now:
You give your network images with one background. Lets see what can go possibly wrong here.
Assume that your background is completely black. This means that there will be 0 output for a feature map (kernel) when it was put into the background. Your network will learn that it can put any high weights for these features and it will do a good job during training as long as those weights can successfully extract feature of the classes.
Now during testing, the background color is white. The same feature maps with high weight now will have very high output. These high output can saturate the non-linear unit and all categories may be classified as one category.
The second case where during training you shows images with different background.
In this case, neural network has to learn that the feature maps corresponding to background and need to subtract the bias based on the background.
In short, there is an extra information that you need to learn that is background is not important for deciding the category. When you provide only one color background, your neural network cannot learn this behavior and can give garbage result on test dataset.
Is it necessary or good approach to keep the object at different position while training the model?
You are right, Convolutional Neural Network are translational-equivariant. But for building a classifier, you pass the output of CNN-layer through a fully-connected layer. If you put image at different positions, different input will go to the fully-connected layer but output for all these images is the same category. So you are forcing your neural network to learn that the position of the object is not required for classifying its category.
Regarding your first doubt, It is not much of an issue as long as the target object is present in the images. Shuffle the data before feeding it to the network.
For second doubt, Yes it is always a good idea have target object at different positions. Also one more thing to take care is that the source of your data is same and mostly of same quality. Otherwise performance issue will arise.

Whether Data augmentation really needed in Machine Learning [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
I am interested in knowing the importance of data augmentation(rotation at various angles, flipping the images) while providing a dataset to a Machine Learning problem.
Whether it is really needed? Or the CNN networks using will handle that as well no matter how different the data are transformed?
So I took a classification task with 2 classes to conclude some results
Arrow shapes
Circle shapes
The idea is to train the shapes with only one orientation(I have taken arrows pointing right) and check the model with a different orientation(I have taken arrows pointing downwards) which is not at all given during the training stage.
Some of the samples used in Training
Some of the samples used in Testing
This is the entire dataset I am using in for creating a tensorflow model.
https://bitbucket.org/akhileshmalviya/samples/src/bab50b85d826?at=master
I am wondering with the results I got,
(i) Except a few downward arrows all others are getting predicted correctly as arrow. Does it mean data augmentation is not at all needed?
(ii) Or is this the right use case I have taken to understand the importance of data augmentation?
Kindly share your thoughts, Any help could be really appreciated!
Data augmentation is a data-depended process.
In general, you need it when your training data is complex and you have a few samples.
A neural network can easily learn to extract simple patterns like arcs or straight lines and these patterns are enough to classify your data.
In your case data augmentation can barely help, the features the network will learn to extract are easy and highly different from each other.
When you, instead, have to deal with complex structures (cats, dogs, airplanes, ...) you can't rely on simple features like edges, arcs, etc..
Instead, you have to show to your network that the instances you're trying to classify got an high variance and that the features extracted can be combined in a lot of different ways for the same subject.
Think about a cat: it can be of any color, the picture can be taken in different light conditions, its whole body can be in any position, the picture could be taken with a certain orientation...
To correctly classify instances so different, the network must learn to extract robust features that could be learned only after seeing a lot of different inputs.
In your case, instead, simple features can completely discriminate your input, thus any sort of data augmentation could help by just a little bit.
The task you are solving can be easily solved without any NN and even without machine learning.
Just because the problem is so simple it does not really matter whether you do a data augmentation or not. The need for data augmentation is task specific and depends on many things:
how easy is to augment the data with preserving the ability to correctly mark the class. For image, sounds which we used to see/hear it is not a problem (we know that adding small noise to the sound does not change the meaning, rotating the lizard is still a lizard). For other things augmenting without preserving the class/value is hard (for example in Go, randomly adding a stone can change the value of the position dramatically)
does the augmented data is drawn from the same distribution you care about. Adding random stones to Go does not work, but rotating flipping the board works and preserves distribution. But for example in a racing king game (variant of chess) it will not help. You can't flip the position (left <-> right), the evaluation stays the same, but it will never happen in real game and therefore drawn from different distribution and useless
how much data do you have and how expressive is your model. The more parameters you model have, the bigger the chance of overfitting and the more is your need for data. If you train a linear regression in n dims, you will have n + 1 params. You do not really need to augment this. Also if you already have 10bln data points, the augmentation is probably will not be helpful.
how expensive the augmentation procedure. For rotating/scaling the image it is very cheap, but for other augmentation it can be computationally expensive
something else that I forgot.

Faster-RCNN, why don't we just use only RPN for detection?

As we know, faster-RCNN has two main parts: one is region proposal network(RPN), and another one is fast-RCNN.
My question is, now that region proposal network(RPN) can output class scores and bounding boxes and is trainable, why do we need Fast-RCNN?
Am I thinking it right that the RPN is enough for detection (red circle), and Fast-RCNN is now becoming redundant (blue circle)?
Short answer: no they are not redundant.
The R-CNN article and its variants popularized the use of what we used to call a cascade.
Back then for detection it was fairly common to use different detectors often very similar in structures to do detection because of their complementary power.
If the detections are partly orthogonal it allows to remove false positive along the way.
Furthermore by definition both parts of R-CNN have different roles the first one is used to discriminate objects from background and the second one to discriminate fine grained categories of objects from themselves (and from the background also).
But you are right if there is only 1 class vs the background one could use only the RPN part to to detection but even in that case it would probably better the result to chain two different classifiers (or not see e.g. this article)
PS: I answered because I wanted to but this question is definitely unsuited for stackoverflow
If you just add a class head to the RPN Network, you would indeed get detections, with scores and class estimates.
However, the second stage is used mainly to obtain more accurate detection boxes.
Faster-RCNN is a two-stage detector, like Fast R-CNN.
There, Selective Search was used to generate rough estimates of the location of objects and the second stage then refines them, or rejects them.
Now why is this necessary for the RPN? So why are they only rough estimates?
One reason is the limited receptive field:
The input image is transformed via a CNN into a feature map with limited spatial resolution. For each position on the feature map, the RPN heads estimate if the features at that position correspond to an object and the heads regress the detection box.
The box regression is done based on the final feature map of the CNN. In particular, it may happen that the correct bounding box on the image is larger than the corresponding receptive field due to the CNN.
Example: Lets say we have an image depicting a person and the features at one position of the feature map indicate a high possibiliy for the person. Now, if the corresponding receptive field contains only the body parts, the regressor has to estimate a box enclosing the entire person, although it "sees" only the body part.
Therefore, RPN creates a rough estimate of the bounding box. The second stage of Faster RCNN uses all features contained in the predicted bounding box and can correct the estimate.
In the example, RPN creates a too large bounding box, which is enclosing the person (since it cannot the see the pose of the person), and the second stage uses all information of this box to reshape it such that it is tight. This however can be done much more accurate, since more content of the object is accessable for the network.
faster-rcnn is a two-stage method comparing to one stage method like yolo, ssd, the reason faster-rcnn is accurate is because of its two stage architecture where the RPN is the first stage for proposal generation and the second classification and localisation stage learn more precise results based on the coarse grained result from RPN.
So yes, you can, but your performance is not good enough
I think the blue circle is completely redundant and just adding a class classification layer (gives class for each bounding box containing object) should work just fine and that's what the single shot detectors do with compromised accuracy.
According to my understanding, RPN is just for binary checking if you have Objects in the bbox or not and final Detector part is for classifying the classes ex) car, human, phones, etc

Designing a classifier with minimal image data

I want to train a 3-class classifier with tissue images, but only have around 50 labelled images in total. I can't take patches from the images and train on them, so I am looking for another way to deal with this problem.
Can anyone suggest an approach to this? Thank you in advance.
The question is very broad but here are some recommendations:
It could make sense to generate variations of your input images. Things like modifying contrast, brightness or color, rotating the image, adding noise. But which of these operations, if any, make sense really depends on the type of classification problem.
Generally, the less data you have, the fewer parameters (weights etc.) your model should have. Otherwise it will result in overlearning, meaning that your classifier will classify the training data but nothing else.
You should check for overlearning. A simple method would be to split your training data into a training set and a control set. Once you have found that the classification is correct for the control set as well, you could do additional training including the control set.

Resources