HOG people detection and changes in size due to perspective - opencv

I am trying to make HOG work for people detection using OpenCV. My problem is that the persons in my application appear in different sizes due to perspective, so, I am afraid I have to train different people sizes. Here are my questions:
1.- I already have INRIA and MIT databases. all images are 128x64 pix. Could I resize that database to have bigger and smaller samples and later train the system several times? 2.- Will I obtain different lenghts in HOG descriptors if the images in tha database are of different size?
Finally, my negative database is very different in samples to the situation I am facing. I want to detect people in an industrial environment ( machines, repair shop, etc) and the samples are of cities, streets, beaches, etc. Is this negative databa useful? or could I detect only people and reject directly low matches?
Thank you all very much in advance.

My problem is that the persons in my application appear in different sizes due to perspective
You can use gpu::HOGDescriptor::detectMultiScale
Is this negative databa useful? or could I detect only people and reject directly low matches?
All negative data is useful, the more data you train with the more robust your system. However if you are only using one use case. I would suggest getting some background images of the environment you are going to be using it in.

Related

How to solve the "cold start" problem in computer vision based deep learning models?

By “Cold Start” I mean that often computer vision models for object detection or semantic segmentation require about 5000 images per class. So if an idea if floated within the company for e.g. we want to use object detection to count the number of wood logs when the truck is dispatched and then use the same app to count the number that is received.
So now the challenge is that you have only a few images of woods logs on a truck but to train any model you need thousands, so what do practitioners typically do for these prototypes?
Because at this stage it is not clear what model to try? It is also not very feasible to ask business to invest in collecting thousands of images of logs and label them?
That is why I am calling this “Cold Start”. How do you start?
What I have looked into is Conditional GANs, Pix-2-Pix but I am trying to understand the recommended method on how to start when you have very few images per object class.
I expect that when I drop a few images in a folder and call this library I end up getting a lot more images per class so I can then start my prototyping.
Note that asking for software libraries is specifically off-topic here.
No, there is no magic solution: if your data set doesn't have enough information in its images to train a hand-crafted model, no amount of software will change that fact. However, the first approach is to challenge that "fact": how do you know that you don't have enough images? What happened when you used what you have to train a model? You will train for more epochs before the model converges, but you should be able to achieve far better than random accuracy by training a comparable quantity of iterations.
I seriously doubt that you'll need to collect and label thousands of images: you have a very restricted paradigm, photos of log trucks taken from an vantage point you control. Training a model to count non-overlapping near-circles will take much less differentiation than, say, distinguishing motor vehicles from postal boxes.
Experiment with the basic models you have at hand -- you already have much more of the solution than you realize. If your data set is too small, go out the yard with a digital camera and get twice as many, three times, whatever you need. Flip the images left-right to get more input.
Does that get you moving?
Transfer learning solves the problem you are describing as "Cold Start". Basically you can import the weights obtained after training using a big and open dataset and just fine-tune them using the smaller dataset you already have. Data augmentation, freezing some of the layers, etc may help improving the results of a fine-tuned model.

How to recognize or match two images?

I have one image stored in my bundle or in the application.
Now I want to scan images in camera and want to compare that images with my locally stored image. When image is matched I want to play one video and if user move camera from that particular image to somewhere else then I want to stop that video.
For that I have tried Wikitude sdk for iOS but it is not working properly as it is crashing anytime because of memory issues or some other reasons.
Other things came in mind that Core ML and ARKit but Core ML detect the image's properties like name, type, colors etc and I want to match the image. ARKit will not support all devices and ios and also image matching as per requirement is possible or not that I don't have idea.
If anybody have any idea to achieve this requirement they can share. every help will be appreciated. Thanks:)
Easiest way is ARKit's imageDetection. You know the limitation of devices it support. But the result it gives is wide and really easy to implement. Here is an example
Next is CoreML, which is the hardest way. You need to understand machine learning even if in brief. Then the tough part - training with your dataset. Biggest drawback is you have single image. I would discard this method.
Finally mid way solution is to use OpenCV. It might be hard but suit your need. You can find different methods of feature matching to find your image in camera feed. example here. You can use objective-c++ to code in c++ for ios.
Your task is image similarity you can do it simply and with more reliable output results using machine learning. Since your task is using camera scanning. Better option is CoreML.You can refer this link by apple for Image Similarity.You can optimize your results by training with your own datasets. Any more clarifications needed comment.
Another approach is to use a so-called "siamese network". Which really means that you use a model such as Inception-v3 or MobileNet and both images and you compare their outputs.
However, these models usually give a classification output, i.e. "this is a cat". But if you remove that classification layer from the model, it gives an output that is just a bunch of numbers that describe what sort of things are in the image but in a very abstract sense.
If these numbers for two images are very similar -- if the "distance" between them is very small -- then the two images are very similar too.
So you can take an existing Core ML model, remove the classification layer, run it twice (once on each image), which gives you two sets of numbers, and then compute the distance between these numbers. If this distance is lower than some kind of threshold, then the images are similar enough.

Use Azure Machine learning to detect symbol within an image

4 years ago I posted this question and got a few answers that were unfortunately outside my skill level. I just attended a build tour conference where they spoke about machine learning and this got me thinking of the possibility of using ML as a solution to my problem. i found this on the azure site but i dont think it will help me because its scope is pretty narrow.
Here is what i am trying to achieve:
i have a source image:
and i want to which one of the following symbols (if any) are contained in the image above:
the compare needs to support minor distortion, scaling, color differences, rotation, and brightness differences.
the number of symbols to match will ultimately at least be greater than 100.
is ML a good tool to solve this problem? if so, any starting tips?
As far as I know, Project Oxford (MS Azure CV API) wouldn't be suitable for your task. Their APIs are very focused to Face related tasks (detection, verification, etc), OCR and Image description. And apparently you can't extend their models or train new ones from the existing ones.
However, even though I don't know an out of the box solution for your object detection problem; there are easy enough approaches that you could try and that would give you some start point results.
For instance, here is a naive method you could use:
1) Create your dataset:
This is probably the more tedious step and paradoxically a crucial one. I will assume you have a good amount of images to work with. What would you need to do is to pick a fixed window size and extract positive and negative examples.
If some of the images in your dataset are in different sizes you would need to rescale them to a common size. You don't need to get too crazy about the size, probably 30x30 images would be more than enough. To make things easier I would turn the images to gray scale too.
2) Pick a classification algorithm and train it:
There is an awful amount of classification algorithms out there. But if you are new to machine learning I will pick the one I would understand the most. Keeping that in mind, I would check out logistic regression which give decent results, it's easy enough for starters and have a lot of libraries and tutorials. For instance, this one or this one. At first I would say to focus in a binary classification problem (like if there is an UD logo in the picture or not) and when you master that one you can jump to the multi-class case. There are resources for that too or you can always have several models one per logo and run this recipe for each one separately.
To train your model, you just need to read the images generated in the step 1 and turn them into a vector and label them accordingly. That would be the dataset that will feed your model. If you are using images in gray scale, then each position in the vector would correspond to a pixel value in the range 0-255. Depending on the algorithm you might need to rescale those values to the range [0-1] (this is because some algorithms perform better with values in that range). Notice that rescaling the range in this case is fairly easy (new_value = value/255).
You also need to split your dataset, reserving some examples for training, a subset for validation and another one for testing. Again, there are different ways to do this, but I'm keeping this answer as naive as possible.
3) Perform the detection:
So now let's start the fun part. Given any image you want to run your model and produce coordinates in the picture where there is a logo. There are different ways to do this and I will describe one that probably is not the best nor the more efficient, but it's easier to develop in my opinion.
You are going to scan the picture, extracting the pixels in a "window", rescaling those pixels to the size you selected in step 1 and then feed them to your model.
If the model give you a positive answer then you mark that window in the original image. Since the logo might appear in different scales you need to repeat this process with different window sizes. You also would need to tweak the amount of space between windows.
4) Rinse and repeat:
At the first iteration it's very likely that you will get a lot of false positives. Then you need to take those as negative examples and retrain your model. This would be an iterative process and hopefully on each iteration you will have less and less false positives and fewer false negatives.
Once you are reasonable happy with your solution, you might want to improve it. You might want to try other classification algorithms like SVM or Deep Learning Artificial Neural Networks, or to try better object detection frameworks like Viola-Jones. Also, you will probably need to use crossvalidation to compare all your solutions (you can actually use crossvalidation from the beginning). By this moment I bet you would be confident enough that you would like to use OpenCV or another ready to use framework in which case you will have a fair understanding of what is going on under the hood.
Also you could just disregard all this answer and go for an OpenCV object detection tutorial like this one. Or take another answer from another question like this one. Good luck!

Compare Images By Color Using a Genetic Algorithm

I am right now in a serious problem, I need to compare images of flowers (carnations) using a genetic algorithm, the program must determine which variety does the flower belongs to (until now I am using 15 different varieties), the thing is I am having difficulties constructing the chromosome, right know I am only analysing the HSV of each image, then a take every channel and calculate the mean for each (n=255), after that I calculate the correlation between HS, HV and SV, I expected that the mean would be enough to locate any new flower next to the clusters of flowers of the variety it belongs (by the way, I have a database of all the flowers used for training purpose) by calculating the distance between the mean of the flower and the centroids of each cluster, and probably using the correlations for adjustment, but that distance is usually way smaller to a different variety than the one it must be. Is there a way to classify this flowers using ONLY colours (I've read of applications that uses the texture, but that's way out of my league), especially using a genetic algorithm (I know Neural Networks are more appropriate to this kind of analysis but that's what the teacher asked)?. Thank you very much. By the way I am working on OpenCV, don’t know if it's relevant. PS: Excuse my English if any mistakes were done, not my native language.

Eigenfaces algorithm

I am programming a face recognition program using OpenCV.
When generating the eigenfaces:
do I need to use a big database of unknown faces ?
do I need to use only photos of the people I want my system to recognize ?
do I need to use both ?
I am talking about the eigenfaces generation, this is the "learning" step.
And how many photos do I need to use to have decent accuracy ? More like 20, or 2000 ?
Thanks
Eigenfaces works by projecting the faces into a particular "face basis" using principal component analysis or PCA. The basis does not have to include photos of people you want to recognize.
Instead, I would encourage you to train based upon a big database (at least 10k faces) that is well registered (eigenfaces doesn't work well with images that are shifted). The original paper by Turk and Pentland was remarkable partly due to the large pin registered face database they released. I would also say that try to have the lighting normalized to the same between the database and your test inputs.
In terms of testing, first 20 components should be sufficient to reconstruct a human recognizable face and first 100 components should be enough to discriminate between any two face for essentially arbitrarily large dataset.
You don't need too many random faces to compose a human face; somewhere close to 20 should give good results, maybe go with more if you can. They should all be lined up as much as possible to one another, front facing, and photos in grayscale under the same lighting conditions.

Resources