I have a set of 16,000 images and one sample image, and I need to find which of the 16,000 images appears in the sample. I've already tried OpenCV's ORB + FLANN approach, but it is too slow; I hope that a network, once trained, will be faster. I don't know neural network theory well, I've read some articles and websites, and I have a bunch of questions:
Should I use 16k output neurons to classify the input image?
How can I train my network if I have only one training image per class?
What architecture should I use?
Maybe I should enlarge the training dataset by randomly distorting the input images?
Sorry in advance for my bad English:)
I'm not an expert, but I think this kind of problem is not a perfect fit for neural networks. Feature extraction, interest points, and descriptors, all available in OpenCV, are probably the better option. Anyway, let's try this. Given the information provided, I think you could try the following:
SOM network - Create a Self-Organizing Map (SOM) with 16,000 output classes. I've never seen an example with that many classes and just one sample per class, but it should work. You could also try PCA to reduce the images' dimensionality. Keep training the network on your images (or their PCA features). Start with, say, 1,000 epochs and keep raising that value until you get good results.
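To illustrate, here is a minimal sketch of the PCA + SOM idea using the third-party minisom and scikit-learn packages (the file name, map size, and training parameters are placeholders, not tuned values):

    import numpy as np
    from sklearn.decomposition import PCA
    from minisom import MiniSom

    # images: (16000, H*W) array of flattened training images (placeholder file)
    images = np.load('images.npy')

    # Reduce dimensionality before feeding the SOM
    pca = PCA(n_components=64)
    features = pca.fit_transform(images)

    # A 130x130 map gives ~16,900 nodes, roughly one node per image/class
    som = MiniSom(130, 130, 64, sigma=1.0, learning_rate=0.5)
    som.train_random(features, num_iteration=100000)

    # At query time: project the sample and find its best-matching unit,
    # then return the training image(s) that share that unit
    def best_matching_unit(sample):
        return som.winner(pca.transform(sample.reshape(1, -1))[0])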
Here you can read a bit more about SOM
I'm working on a machine learning project where I'm using a neural network to solve a binary classification problem; however, my dataset (in .csv format) is relatively small. It only has around 60 yes/no cases, and although the network was able to train, the accuracy wasn't very good. My solution was to duplicate the dataset, making tiny changes to the numbers on each duplication, i.e., adding ±1 to or multiplying by 0.999 each number. By doing this I grew the dataset to around 1,100 cases, and the model achieved much higher accuracy. I was wondering whether this is an actual technique used by ML researchers and, if it is, whether it has an official/academic name?
Thank You!
Yes, the process you are referring to is called data augmentation.
However, I would highly recommend not using neural networks on datasets with merely hundreds to a thousand rows. Ideally, neural networks are used to train models over large datasets.
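For reference, the jitter-style augmentation described in the question fits in a few lines of NumPy (the noise scale is arbitrary and should be matched to the ranges of your features):

    import numpy as np

    def jitter_augment(X, y, copies=17, noise_scale=0.01):
        # X: (n_samples, n_features), y: (n_samples,) labels
        augmented, labels = [X], [y]
        for _ in range(copies):
            # Small multiplicative noise around 1.0, akin to "multiply by 0.999"
            noise = 1.0 + noise_scale * np.random.randn(*X.shape)
            augmented.append(X * noise)
            labels.append(y)
        return np.vstack(augmented), np.concatenate(labels)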
I am currently looking to train a breed classifier for animals passing through a metal race. I am working in a permanent, static environment, so the cameras and the static features of the scene do not change. I had the idea of using an image mask to remove the parts of the image that I don't need, thereby removing features that might lead to poor classification results. I plan to apply the same pre-processing to the inference data. Is this a good idea, or should I simply train the network on the entire image?
Any advice is much appreciated.
If you have the time and resources, I'd try both: one network trained with the mask and one without.
As a general rule, pre-processing performed prior to training should typically also be performed prior to inference. Yes, in this case, I would apply the same pre-processing. It helps that you can count on your mask since your frame position is static. Your theory on it improving performance seems very reasonable.
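As a concrete sketch of that shared step (the mask file is a placeholder; you would build it once, since the camera never moves):

    import cv2

    # White (255) where the race/animals appear, black (0) for the static background
    mask = cv2.imread('static_mask.png', cv2.IMREAD_GRAYSCALE)

    def preprocess(frame):
        # Zero out the masked-off regions; call this identically
        # before training and before inference
        return cv2.bitwise_and(frame, frame, mask=mask)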
I think that if those parts of the image are constant all the time, a good model will learn that, and masking them may hurt generalization.
I would suggest training a deeper network on a large dataset and using it as a baseline to compare against, so you can benchmark your results.
I am using FCNs (Fully Convolutional Networks) to do image segmentation. During training, there are some areas that are mislabeled, and further training doesn't help much to make them go away. I believe this is because the network learns some features that might not be entirely correct, but since there are enough correctly classified examples, it gets stuck in a local minimum and can't get out.
One solution I can think of is to train for an epoch, then validate the network on the training images, and then adjust the weights for the mismatched parts so that mismatches there are penalized more heavily in the next epoch.
Intuitively, this makes sense to me, but I haven't found any writing on it. Is this a known technique? If yes, what is it called? If no, what am I missing (what are the downsides)?
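To make the idea concrete, here is roughly what I have in mind, sketched in PyTorch (the weight-map rule is just one possible choice):

    import torch
    import torch.nn.functional as F

    def build_weight_map(logits, target, boost=2.0):
        # After validating on the training set, up-weight the pixels
        # that the previous epoch got wrong
        pred = logits.argmax(dim=1)                      # (N, H, W)
        return 1.0 + (boost - 1.0) * (pred != target).float()

    def weighted_seg_loss(logits, target, weight_map):
        # logits: (N, C, H, W); target, weight_map: (N, H, W)
        per_pixel = F.cross_entropy(logits, target, reduction='none')
        return (per_pixel * weight_map).mean()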
It highly depends on your network structure. If you are using the original FCN, the pooling operations degrade the segmentation performance on the boundaries of your objects. There have been quite a few variants of the original FCN for image segmentation, although they didn't go the route you're proposing.
To name a couple of examples: one approach is to use a Conditional Random Field (CRF) on top of the FCN output to refine the segmentation. You may search the relevant papers to get a better idea of it. In some sense it is close to your idea, but the difference is that the CRF is separate from the network, as a post-processing step.
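A rough sketch of that post-processing step with the third-party pydensecrf package (the pairwise parameters below are the commonly cited defaults, not tuned values):

    import numpy as np
    import pydensecrf.densecrf as dcrf
    from pydensecrf.utils import unary_from_softmax

    def crf_refine(img, probs, n_iters=5):
        # img: (H, W, 3) uint8 input; probs: (n_classes, H, W) FCN softmax output
        d = dcrf.DenseCRF2D(img.shape[1], img.shape[0], probs.shape[0])
        d.setUnaryEnergy(unary_from_softmax(probs))
        d.addPairwiseGaussian(sxy=3, compat=3)                          # smoothness
        d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=img, compat=10)   # appearance
        q = np.array(d.inference(n_iters))
        return q.argmax(axis=0).reshape(img.shape[:2])                  # refined labels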
Another very interesting work is U-net. It employs an idea from residual networks (ResNet) that enables high-resolution features from lower levels to be integrated into higher levels to achieve more accurate segmentation.
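The core of that idea, a skip connection that concatenates high-resolution encoder features into the decoder, fits in a few lines of Keras (all sizes are illustrative):

    from tensorflow.keras import layers, Input, Model

    inp = Input((128, 128, 1))
    e1 = layers.Conv2D(16, 3, padding='same', activation='relu')(inp)  # high-res features
    p1 = layers.MaxPooling2D()(e1)
    b = layers.Conv2D(32, 3, padding='same', activation='relu')(p1)    # bottleneck
    u1 = layers.UpSampling2D()(b)
    m = layers.Concatenate()([u1, e1])          # skip connection: reuse high-res detail
    out = layers.Conv2D(2, 1, activation='softmax')(m)   # per-pixel class scores
    model = Model(inp, out)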
This is still a very active research area, so you may bring the next breakthrough with your own idea. Who knows! Have fun!
First, if I understand correctly, you want your network to overfit your training set? That's generally something you don't want to happen, because it means that during training your network has found some "rules" that give it great results on the training set but that don't generalize, so it will probably perform poorly on new samples. Moreover, you never mention a test set: have you divided your dataset into training and test sets?
Secondly, to give you something to look into: the idea of penalizing more where you don't perform well made me think of something called "AdaBoost" (it might be unrelated). This short video might help you understand what it is:
https://www.youtube.com/watch?v=sjtSo-YWCjc
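If you want to experiment with it, scikit-learn ships an implementation; the key trait is that each boosting round re-weights the samples the previous rounds misclassified, which echoes your "penalize where you do badly" idea (X_train etc. stand for your own data):

    from sklearn.ensemble import AdaBoostClassifier

    # Each round fits a weak learner and up-weights the misclassified samples
    clf = AdaBoostClassifier(n_estimators=50)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))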
Hope it helps
I am very new to machine learning. I just wanted to ask: what are possible ways to improve a method (Naive Bayes, for example) to get better results when classifying images into text and non-text images, beyond just feeding in a number of images and telling the system which have text and which do not?
Thanks in advance
The state of the art for such problems is deep neural networks with several convolutional layers. See this article for an example of image classification using deep convolutional nets. Your problem (just determining whether an image contains text or not) is much easier than the general image classification problem the authors consider, so you could probably get away with a much simpler network architecture.
Nowadays you don't need to implement these things yourself; efficient, GPU-accelerated implementations are freely available, for instance Caffe, Torch7, and Keras...
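For example, a much simpler architecture for the binary text/no-text decision might look like this in Keras (layer sizes are guesses, not tuned):

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv2D(16, 3, activation='relu', input_shape=(64, 64, 1)),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(32, activation='relu'),
        layers.Dense(1, activation='sigmoid'),   # P(image contains text)
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])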
Can anyone advise me on a way to build an effective face classifier that can distinguish many different faces (~1,000)?
And I have only 1-5 examples of each face.
I know about the OpenCV face classifier, but it works poorly for my task (many classes, few samples).
It works all right for classifying a single face with a small number of samples, but I think that 1,000 separate classifiers is not a good idea.
I have read a few articles about face recognition, but the methods in those articles require a lot of samples per class to work.
PS Sorry for my writing mistakes. English is not my native language.
Actually, to give you a proper answer, I'd be happy to know some details of your task and your data. Face recognition is a non-trivial problem, and there is no general solution for all sorts of image acquisition.
First of all, you should determine how many sources of variation (pose, emotion, illumination, occlusion, time lapse) you have in your sample and test sets. Then you should choose an appropriate algorithm and, very importantly, pre-processing steps suited to those types.
If you don't have any significant variations, then for a small training set it is a good idea to consider one of the discrete orthogonal moments as a feature-extraction method. They have a very strong ability to extract features without redundancy. Some of them (Hahn, Racah moments) can also work in two modes: local and global feature extraction. The topic is relatively new, and there are still few articles about it, although these moments are thought to become a very powerful tool in image recognition. They can be computed in near real time by using recurrence relations. For more information, have a look here and here.
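I'm not aware of an off-the-shelf implementation of Hahn or Racah moments, but as a loose illustration of moment-based feature extraction, here are Zernike moments (a continuous orthogonal family) via the mahotas package:

    import mahotas

    def moment_features(gray_face, radius=32, degree=8):
        # gray_face: 2-D grayscale face crop; returns a compact,
        # low-redundancy, rotation-invariant feature vector
        return mahotas.features.zernike_moments(gray_face, radius, degree=degree)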
If the pose of the individuals varies significantly, you may first try to perform pose correction with an Active Appearance Model.
If there are lots of occlusions (glasses, hats), then using one of the local feature extractors may help.
If there is a significant time lapse between the training and probe images, the local features of the faces can change with age, so it's a good option to try one of the algorithms that use graphs for face representation, so as to preserve the face topology.
I believe that none of the above are implemented in OpenCV, but for some of them you can find MATLAB implementations.
I'm not a native speaker either, so sorry for the grammar.
Coming to your problem: it is quite unique in its own way. As you said, there are only a few images per class, so the model we train should either have an excellent architecture that can extract better features from each image, or we need a different approach to achieve the task.
I have four things I can share as of now:
Do data pre-processing to create a bigger dataset, and then ideally train a neural network. Here, we can do pre-processing like:
- image rotation
- image shearing
- image scaling
- image blurring
- image stretching
- image translation
and create at least 200 images per class (see the sketch below). Please check the OpenCV documentation, which provides many more methods for increasing the size of your dataset. Once you do this, we can apply transfer learning, which is a better approach than training a neural network from scratch.
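Here is a sketch of such augmentation with Keras' ImageDataGenerator, covering the rotation/shear/zoom/shift transforms from the list above (blurring would need a custom preprocessing_function; all ranges are placeholders):

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        rotation_range=20,       # image rotation
        shear_range=0.2,         # image shearing
        zoom_range=0.2,          # image scaling
        width_shift_range=0.1,   # image translation
        height_shift_range=0.1,
    )

    # x: a single face image shaped (1, H, W, C); create ~200 variants per class
    variants = [next(datagen.flow(x, batch_size=1))[0] for _ in range(200)]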
Transfer learning is a method where we take a network that has already been pre-trained on thousands of classes and train it on our own custom classes. Since our data here is very limited, I would prefer transfer learning. I have written a blog on how you can approach this using transfer learning once you have the required amount of data. It is linked here. Face recognition is itself a classification task, where each person is a separate class. So follow the instructions given in the blog; maybe it will help you create your own powerful classifier.
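A minimal sketch of that setup in Keras, using VGG16 as one example of a network pre-trained on ImageNet's 1,000 classes (head sizes are placeholders):

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    base.trainable = False   # freeze the pre-trained features; train only the new head

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation='relu'),
        layers.Dense(n_faces, activation='softmax'),  # n_faces: number of identities
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])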
Another suggestion: after creating the dataset, encode the images properly. This encoding helps preserve the features in an image and can help you train better networks. VLAD, Fisher vectors, and Bag of Words are a few encoding techniques. You can find repositories online that have already implemented these on the ORL database. Once you encode, train the network on the encodings and you will likely see better performance.
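Of those, Bag of Words is the simplest to sketch; a rough version with OpenCV's ORB descriptors and k-means clustering (the vocabulary size of 64 is arbitrary):

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    orb = cv2.ORB_create()

    def bow_encode(images, n_words=64):
        # images: list of grayscale face images
        descs = [orb.detectAndCompute(img, None)[1] for img in images]
        descs = [d for d in descs if d is not None]    # skip images with no keypoints
        vocab = KMeans(n_clusters=n_words).fit(np.vstack(descs).astype(np.float32))
        hists = []
        for d in descs:
            words = vocab.predict(d.astype(np.float32))
            hist, _ = np.histogram(words, bins=n_words, range=(0, n_words))
            hists.append(hist / max(hist.sum(), 1))    # normalized word histogram
        return np.array(hists)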
Also check out the Siamese network (here), which I feel is meant for exactly this purpose. It compares two images by passing them through twin networks with shared weights, and thereby achieves better classification accuracy with few samples. The Git repository is here.
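The gist of the Siamese setup is that a single shared encoder embeds both images and the model learns whether the pair shows the same person; a minimal Keras sketch (all sizes illustrative):

    import tensorflow as tf
    from tensorflow.keras import layers, Input, Model

    def build_encoder(shape=(96, 96, 1)):
        inp = Input(shape)
        x = layers.Conv2D(32, 3, activation='relu')(inp)
        x = layers.MaxPooling2D()(x)
        x = layers.Conv2D(64, 3, activation='relu')(x)
        x = layers.GlobalAveragePooling2D()(x)
        return Model(inp, layers.Dense(64)(x))

    encoder = build_encoder()          # shared weights: one encoder for both inputs
    a, b = Input((96, 96, 1)), Input((96, 96, 1))
    dist = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([encoder(a), encoder(b)])
    same = layers.Dense(1, activation='sigmoid')(dist)   # P(same identity)
    siamese = Model([a, b], same)
    siamese.compile(optimizer='adam', loss='binary_crossentropy')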
Another standard approach would be to use SVMs or random forests, since the data is limited. If you still prefer neural networks, the above methods should serve the purpose. If you intend to go with encodings, then I would suggest random forests, as they are highly effective and flexible learners.
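And with the encodings in hand, the classical models are a couple of lines in scikit-learn (encodings and labels stand for your own data, e.g. from the Bag of Words sketch above):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC

    rf = RandomForestClassifier(n_estimators=200).fit(encodings, labels)
    svm = SVC(kernel='linear').fit(encodings, labels)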
Hopefully this answer helps you proceed in the right direction.
You might want to take a look at OpenFace, a Python and Torch implementation of face recognition with deep neural networks: https://cmusatyalab.github.io/openface/