Handwritten digit recognition pre-processing (Rotation) - machine-learning

source: http://wbieniec.kis.p.lodz.pl/research/files/070618cores.pdf
I am currently performing some pre-processing on the MNIST data-set. Some of the digits are quite rotated, and I would like to get them upright before training my NN. The logic in the paper linked above is what I would like to follow; however, I do not think I have a good understanding of what it is trying to convey. I would appreciate it if someone, perhaps someone who has done something similar or has a good understanding of image processing, could validate that logic. Thanks
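For context, the kind of correction commonly applied to MNIST is a moment-based deskew, roughly as sketched below (this corrects slant rather than arbitrary rotation, and may well differ from the paper's exact method):

```python
import cv2
import numpy as np

def deskew(img, size=28):
    # Estimate the slant from second-order image moments and undo it
    # with an affine warp; a classic trick for MNIST-style digits.
    m = cv2.moments(img)
    if abs(m['mu02']) < 1e-2:
        return img.copy()  # nothing measurable to correct
    skew = m['mu11'] / m['mu02']
    M = np.float32([[1, skew, -0.5 * size * skew],
                    [0, 1, 0]])
    return cv2.warpAffine(img, M, (size, size),
                          flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
```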

Related

Handwritten text comparison using Reinforcement Learning

I want to build an RL agent which can judge whether a handwritten word was written by the legitimate user or not. The plan is as follows:
Let's say I have written a word 10 times and extracted some geometrical properties from all of them to use as features. Then I train an RL agent to learn to make the decision based on the differences between the geometrical properties of a new sample and the old 10 handwritten samples. A reward is assigned for correct identification, and nothing or a negative reward for an incorrect one.
Am I going in the right direction, or am I missing anything vital? Is it possible to train the agent with only 10 samples? Actually, as a new student of RL, I am confused about the use cases of RL: is it best suited to game solving and robotics problems, or is it also suitable for making predictions from training data?
Reinforcement learning would be used over time. If you were following the stroke of the pen over time, to find out which way it was going, that would be more in reinforcement learning's wheelhouse. The time dimension (or a series of states) is why it's used in games like Starcraft II.
You are talking about taking a picture of the text that was written and eventually classifying it into a boolean (good or not). You should look instead at convolutional neural networks (those types of algorithms are good with pictures). A minimal sketch follows.
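For illustration, a minimal Keras binary CNN might look like this (the input shape and layer sizes are arbitrary placeholders, not tuned values):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal binary CNN sketch: genuine vs. forged handwriting.
# The 64x64 grayscale input and layer sizes are illustrative guesses.
model = models.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(64, 64, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # P(genuine)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```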
Eventually you won't be able to tell the difference: there are techniques with GANs (Generative Adversarial Networks) that can train against your discriminator and finally figure out the pattern it is looking for and fool it. But this sounds good as a homework problem.

Best approach to build a model to recognise licence plates (ALPR)

I am trying to build a deep learning model to detect and read number plates using techniques like CNNs. I will be building the model in TensorFlow, but I still don't know the best approach for building such a model.
I have checked a few models like this one:
https://matthewearl.github.io/2016/05/06/cnn-anpr/
I have also checked some research papers, but none show the exact way.
So the steps I am planning to follow are:
Image preprocessing using OpenCV (grayscale, transformations, etc.; I don't know much about this part)
Licence plate detection (probably by the sliding window method)
Training a CNN on a synthetic dataset built as in the above link.
My questions:
Is there a better way to do this?
Can an RNN also be combined with the CNN for variable-length numbers?
Should I prefer detecting and recognising individual characters rather than the whole plate?
There are also many older methods that rely on image preprocessing and then passing the result directly to OCR. Which will be the best?
PS: I want to build a commercial real-time system, so I need good accuracy.
Firstly, I don't think combining an RNN and a CNN can achieve a real-time system. I also personally prefer detecting individual characters for a real-time system, because there will not be more than about 10 characters on a licence plate, and when detecting plates of variable length, recognising individual characters is more feasible.
Before I learned deep learning, I had also tried using OCR to read plates. In my case, OCR was fast but the accuracy was limited, especially when the plate was not clear enough. Even image processing cannot rescue some unclear cases.
So if I were you I would try the following:
Simple image preprocessing on the whole image.
Licence plate detection (probably by the sliding window method).
Image processing (filters and geometric transformations) on the extracted plate region to make it clearer, then separating the characters.
Applying a CNN to each character; perhaps a small CNN for real-time performance, such as the LeNet used on the MNIST handwritten digit data (multithreading might be needed). A rough sketch of such a per-character network follows.
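For reference, a LeNet-style per-character classifier might look like the sketch below (it assumes 32x32 grayscale character crops and 36 classes for 0-9 plus A-Z; all sizes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# LeNet-style sketch for single-character recognition on licence plates.
# Input size, activations and class count are assumptions, not tuned values.
model = models.Sequential([
    layers.Conv2D(6, 5, activation='tanh', input_shape=(32, 32, 1)),
    layers.AveragePooling2D(),
    layers.Conv2D(16, 5, activation='tanh'),
    layers.AveragePooling2D(),
    layers.Flatten(),
    layers.Dense(120, activation='tanh'),
    layers.Dense(84, activation='tanh'),
    layers.Dense(36, activation='softmax'),  # 0-9 and A-Z
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```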
Hope my response can help.

Should I adapt loss weights for misclassified samples during epochs?

I am using FCNs (Fully Convolutional Networks) and trying to do image segmentation. When training, there are some areas which are mislabelled, but further training doesn't help much to make them go away. I believe this is because the network learns some features which might not be completely correct ones, but because there are enough correctly classified examples, it is stuck in a local minimum and can't get out.
One solution I can think of is to train for an epoch, then validate the network on the training images, and then adjust the loss weights for the mismatched parts to penalize mismatches there more in the next epoch.
Intuitively, this makes sense to me, but I haven't found any writing on it. Is this a known technique? If yes, what is it called? If not, what am I missing (what are the downsides)?
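Concretely, the scheme I have in mind looks roughly like the sketch below (assuming a TensorFlow/Keras setup; `weight_map`, the boost factor and the cap are all placeholders I made up):

```python
import numpy as np
import tensorflow as tf

def weighted_pixel_loss(y_true, y_pred, weight_map):
    # Per-pixel cross-entropy, scaled by a per-pixel weight map.
    ce = tf.keras.losses.categorical_crossentropy(y_true, y_pred)  # (batch, H, W)
    return tf.reduce_mean(ce * weight_map)

def update_weight_map(weight_map, true_labels, pred_labels, boost=1.5, cap=10.0):
    # After each epoch: boost the weight of pixels that are still
    # misclassified on the training images, up to some cap.
    mismatched = true_labels != pred_labels
    weight_map[mismatched] = np.minimum(weight_map[mismatched] * boost, cap)
    return weight_map
```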
It highly depends on your network structure. If you are using the original FCN, then due to the pooling operations, the segmentation performance on the boundaries of your objects is degraded. There have been quite a few variants of the original FCN for image segmentation, although they didn't go the route you're proposing.
To name a couple of examples: one approach is to use a Conditional Random Field (CRF) on top of the FCN output to refine the segmentation. You may search for the relevant papers to get a better idea of it. In some sense it is close to your idea, but the difference is that the CRF is separated from the network as a post-processing step; a rough sketch follows.
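One common implementation is the pydensecrf package; the sketch below uses random stand-ins for the FCN's softmax output and the image, so the parameter values are illustrative rather than recommended:

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

# Dummy stand-ins: `probs` would be the FCN softmax output, `img` the input image.
n_labels, H, W = 2, 120, 160
probs = np.random.dirichlet(np.ones(n_labels), size=(H, W)).transpose(2, 0, 1).astype(np.float32)
img = np.random.randint(0, 255, (H, W, 3), dtype=np.uint8)

d = dcrf.DenseCRF2D(W, H, n_labels)
d.setUnaryEnergy(unary_from_softmax(probs))                     # unary term from the net
d.addPairwiseGaussian(sxy=3, compat=3)                          # smoothness term
d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=img, compat=10)   # appearance term
labels = np.argmax(d.inference(5), axis=0).reshape(H, W)        # refined segmentation
```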
Another very interesting work is U-Net. It employs an idea from residual networks (ResNets) that enables high-resolution features from lower levels to be integrated into higher levels, achieving more accurate segmentation.
This is still a very active research area, so you may bring the next breakthrough with your own idea. Who knows! Have fun!
First, if I understand correctly, you want your network to overfit your training set? That's generally something you don't want to happen: it would mean that during training your network has found some "rules" that enable it to get great results on the training set, but also that it hasn't been able to generalize, so when you give it new samples it will probably perform poorly. Moreover, you never mention a test set; have you divided your dataset into training and test sets?
Secondly, to give you something to look into, the idea of penalizing more where you don't perform well made me think of something called "AdaBoost" (it might be unrelated). This short video might help you understand what it is:
https://www.youtube.com/watch?v=sjtSo-YWCjc
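For reference, AdaBoost re-weights misclassified training samples between boosting rounds, which is the analogous idea at the sample level; a minimal scikit-learn example on a synthetic dataset, purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy data; AdaBoost internally up-weights samples its weak learners get wrong.
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```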
Hope it helps

Face recognition with a small number of samples

Can anyone advise me on a way to build an effective face classifier that can classify many different faces (~1000)?
I have only 1-5 examples of each face.
I know about the OpenCV face classifier, but it works badly for my task (many classes, few samples).
It works all right for single-face classification with a small number of samples, but I think 1,000 separate classifiers is not a good idea.
I have read a few articles about face recognition, but the methods in those articles require a lot of samples of each class to work.
PS: Sorry for my writing mistakes. English is not my native language.
Actually, to give you a proper answer, I'd be happy to know some details of your task and your data. Face recognition is a non-trivial problem, and there is no general solution for all sorts of image acquisition.
First of all, you should define how many sources of variation (posing, emotions, illumination, occlusions or time lapse) you have in your sample and testing sets. Then you should choose an appropriate algorithm and, very importantly, preprocessing steps according to those types.
If you don't have any significant variations, then for a small training set it is a good idea to consider one of the Discrete Orthogonal Moments as a feature extraction method. They have a very strong ability to extract features without redundancy. Some of them (Hahn, Racah moments) can also work in two modes, local and global feature extraction. The topic is relatively new and there are still few articles about it, although they are thought to have the potential to become a very powerful tool in image recognition. They can be computed in near real time by using recurrence relationships. For more information, have a look here and here.
If the pose of the individuals varies significantly, you may first try to perform pose correction with an Active Appearance Model.
If there are lots of occlusions (glasses, hats), then using one of the local feature extractors may help.
If there is a significant time lapse between the training and probe images, the local features of the faces could change with age, so it's a good option to try one of the algorithms which use graphs for face representation, so as to preserve the face topology.
I believe that none of the above are implemented in OpenCV, but for some of them you can find MATLAB implementations.
I'm not a native speaker either, so sorry for the grammar.
Coming to your problem, it is quite unique. As you said, there are only a few images per class, so the model we train should either have an excellent architecture that can extract better features from each image, or we need a different approach to achieve this task.
I have four things to share as of now:
Do data pre-processing to create a bigger dataset, and then ideally train a neural network. Here we can do pre-processing like:
- image rotation
- image shearing
- image scaling
- image blurring
- image stretching
- image translation
and create at least 200 images per class; a minimal augmentation sketch follows this paragraph. Please also check out the OpenCV documentation, which provides many more methods for increasing the size of your dataset. Once you have done this, we can apply transfer learning, which is a better approach than training a neural network from scratch.
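As one way to do the augmentation (using Keras's ImageDataGenerator rather than raw OpenCV calls; the ranges are illustrative, and large rotations or flips can hurt face recognition):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation sketch; all ranges are placeholder values to tune.
datagen = ImageDataGenerator(
    rotation_range=10,       # image rotation
    shear_range=0.1,         # image shearing
    zoom_range=0.1,          # image scaling
    width_shift_range=0.1,   # image translation
    height_shift_range=0.1,
)
# datagen.flow(images, labels) then yields augmented batches, which can be
# saved out until each class has ~200 images.
```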
Transfer learning is a method where we take a network already pre-trained on thousands of classes and retrain it on our own custom classes. Since our data here is very limited, I would prefer transfer learning. I have written a blog on how you can approach this using transfer learning once you have the required amount of data; it is linked here. Face recognition is itself a classification task, where each person is a separate class, so follow the instructions given in the blog; maybe it will help you create your own powerful classifier. A rough sketch of the idea follows.
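A minimal transfer-learning sketch, assuming Keras with a VGG16 base pre-trained on ImageNet (the head sizes and the choice of base network are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen ImageNet base plus a fresh head for ~1000 face classes.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained features fixed at first

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.Dense(1000, activation='softmax'),  # one class per person
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```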
Another suggestion: after creating the dataset, encode the images properly. This encoding helps you preserve the features in an image and can help you train better networks. VLAD, Fisher vectors and Bag of Words are a few encoding techniques. You can find repositories online that have already implemented these on the ORL database. Once you encode the images and train the network on the encodings, you will obviously see better performance.
Also do check out the Siamese network here, which I feel is meant for exactly this purpose: it compares two images through twin networks with shared weights and thereby achieves better classification accuracy with few samples per class. The Git repository is here. A rough sketch follows.
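A minimal Siamese sketch in Keras (the encoder architecture, input size and the L1-distance head are all illustrative choices):

```python
import tensorflow as tf
from tensorflow.keras import Input, layers, models

# Twin encoders with shared weights; the head predicts P(same person).
def make_encoder():
    return models.Sequential([
        layers.Conv2D(32, 3, activation='relu', input_shape=(96, 96, 1)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
    ])

encoder = make_encoder()                      # one encoder, applied twice
a, b = Input((96, 96, 1)), Input((96, 96, 1))
dist = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([encoder(a), encoder(b)])
out = layers.Dense(1, activation='sigmoid')(dist)
siamese = models.Model([a, b], out)
siamese.compile(optimizer='adam', loss='binary_crossentropy')
```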
Another standard approach would be to use an SVM or random forests, since the data is limited; a small sketch follows. If you still prefer neural networks, the above methods will serve the purpose. If you intend to go with encodings, then I would suggest random forests, as they are highly preferable for such learning and flexible too.
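As a sketch of the classical route, using scikit-learn's built-in Olivetti (ORL) faces, which mimic the few-samples-per-class setting (raw pixels stand in here for the encodings discussed above):

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 40 people, 10 images each; a small-sample face recognition benchmark.
faces = fetch_olivetti_faces()
X_tr, X_te, y_tr, y_te = train_test_split(
    faces.data, faces.target, test_size=0.25,
    stratify=faces.target, random_state=0)

svm = SVC(kernel='linear').fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("SVM:", svm.score(X_te, y_te), "RF:", rf.score(X_te, y_te))
```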
Hopefully this answer helps you proceed in the right direction.
You might want to take a look at OpenFace, a Python and Torch implementation of face recognition with deep neural networks: https://cmusatyalab.github.io/openface/

Neural Networks and Image Processing to Shoot Caterpillars w/ Lasers

I am somewhat of an amateur farmer and I have a precious cherry tomato plant growing in a pot. Lately, to my chagrin, I have discovered that my precious plant has been the victim of a scheme perpetrated by the evil Manduca Quinquemaculata - also known as the Tomato Hornworm (http://insects.tamu.edu/images/insects/common/images/cd-43-c-txt/cimg308.html).
While smashing the last worm I saw, I thought to myself, if I were to use a webcam connected to my computer with a program running, would it be possible to use some kind of an application to monitor my precious plant? These pests are supremely camouflaged and very difficult for my naive eyes to detect.
I've seen research using artificial neural networks (ANNs) for all sorts of things such as recognizing people's faces, etc., and so maybe it would be possible to locate the pest with an ANN.
I have several questions, though, on which I would like some suggestions.
1) Is there a ranking of the different ANNs in terms of how good they are at classifying? Are multilayer perceptrons known to be better than Hopfield networks? Or is this a question to which the answer is unknown?
2) Why do there exist several different activation functions that can be used in ANNs? Sigmoids, hyperbolic tangents, step functions, etc. How would one know which function to choose?
3) If I had an image of a plant w/ a worm on one of the branches, I think that I could train a neural network to look for branches that are thin, get fat for a short period, and then get thin again. I have a problem though with branches crossing all over the place. Is there a preprocessing step that could be applied on an image to distinguish between foreground and background elements? I would want to isolate individual branches to run through the network one at a time. Is there some kind of nice transformation algorithm?
Any good pointers on pattern recognition and image processing such as books or articles would be much appreciated too.
Sincerely,
mj
Tomato Hornworms were harmed during the writing of this email.
A good rule of thumb for machine learning is: better features beat better algorithms. That is, if you feed the raw image pixels directly into your classifier, the results will be poor, no matter what learning algorithm you use. If you preprocess the image and extract features that are highly correlated with "caterpillar presence", then most algorithms will do a decent job.
So don't focus on the network topology, start with the computer vision task.
Do these little suckers move around regularly? If so, and if the plant is fairly static (meaning no wind or other forces that make it move), then a simple filter to find movement could be sufficient; a minimal sketch follows. That would bypass the need for any learning algorithm, which can be quite difficult to train and implement.
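A frame-differencing sketch with OpenCV, assuming a webcam at index 0 (the threshold and the pixel-count cutoff are arbitrary guesses to tune):

```python
import cv2

cap = cv2.VideoCapture(0)          # webcam watching the plant
_, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)                    # change vs. last frame
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > 500:                       # "something moved"
        print("movement detected")
    prev_gray = gray
    if cv2.waitKey(30) & 0xFF == 27:                       # Esc to quit
        break
cap.release()
```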
