How does a PC/phone recognize a person with one picture? - opencv

Recently I've been studying facial recognition with OpenCV, and I'm trying some simple examples based on what I've learned.
I'm considering using it for a front-door scenario.
Nowadays some buildings or apartment blocks use facial recognition to keep out intruders. When someone joins them (a company or a household), they require the person's picture. As far as I know, they require just one picture.
I didn't think much about it before, but now I'm very curious about it.
The famous algorithms such as PCA and LDA use machine learning to increase their success rate, and to use machine learning they need as many sample images as you can provide. That's why I'm curious: buildings or companies require just one picture, yet they can recognize each person, and their accuracy is very good. How can this happen? Is there any other algorithm besides PCA or LDA?
Thanks for reading!

As far as I know, this hasn't been achieved yet, so I don't think they can build software that recognizes a person using only one picture.
It is most likely that they train the algorithm with the authorized person's pictures, so if a new picture does not match the trained ones, the algorithm can say it is an intrusion.
Edit:
As linuxqwerty pointed out, those commercial products are already trained with huge datasets.
As a result of this training, the algorithm learns how to extract features from all those sample faces.
The algorithm then knows almost every kind of feature that a human face can have.
For example: thickness of the eyebrows, distance between the eyes, roundness of the chin... These are only the ones a human can name; the algorithm can extract thousands of such features.
It can then store a face as a representation of those features.
So now we have commercial software that can represent a face as a binary code with a lot of digits.
Coming back to your question:
The apartment or company bought this software.
They enrolled the picture of the authorized person.
What the software does is simply convert that picture into something like a thousand-digit password.
So that person has a unique password which the system can reproduce only from his face.
To sum up:
The learning part was achieved using big face databases.
Thanks to the learning part, the recognition part can be done using only one picture.
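To make the "face as a long password" idea concrete, here is a minimal sketch using the open-source face_recognition library (a dlib wrapper); the library choice and file names are my own assumptions, not necessarily what the commercial products use:

    # One-picture recognition via face embeddings (pip install face_recognition).
    # File names are placeholders.
    import face_recognition

    # Enrollment: one reference picture is enough to compute a 128-number embedding.
    enrolled_image = face_recognition.load_image_file("authorized_person.jpg")
    enrolled_encoding = face_recognition.face_encodings(enrolled_image)[0]

    # At the door: embed the new frame and compare the embeddings.
    frame = face_recognition.load_image_file("camera_frame.jpg")
    for encoding in face_recognition.face_encodings(frame):
        match = face_recognition.compare_faces([enrolled_encoding], encoding, tolerance=0.6)[0]
        distance = face_recognition.face_distance([enrolled_encoding], encoding)[0]
        print("authorized" if match else "unknown", "distance =", distance)

One reference picture suffices because the heavy learning already happened when the embedding model was trained on a huge face database; enrollment only stores a short numeric vector.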
PS: Corrections are welcome.

I happened to read about facial recognition before, back when I wanted to do it as my semester project, and of course I had heard of and thought of using OpenCV as well.
Your question is simple: the companies or homes that use facial recognition usually use a very well-developed product, which normally includes well-programmed facial recognition. Since we are talking about security here, companies will normally buy these security products; only if they want a tool merely to deter intruders, with less focus on practical use and recognition accuracy, might they opt for free facial recognition software.
So when I talk about well-programmed facial recognition, it means that it was trained with a huge database (the photos to be recognized that you mentioned). The training is done even before the software is officially launched, during the development stage. Good facial recognition software requires both careful, complete and detailed programming, and a huge photo database (taken at different ambient light intensities, with different facial features like hairstyle or spectacles) to train it.
Therefore, the accuracy of the software does not depend solely on the number of pictures given while the software is in use, provided it was well-programmed in the first place. Thanks, and I hope I answered your question.
ps: recognize is spelled this way (US); recognise (UK) =)

Related

Emotion detection through voice/speech solution for Mobile and web [closed]

I have been searching for an emotion detection through voice/speech solution for mobile (iOS) and the web.
I found the Moodies-iOS and Vokaturi solutions, but they are not free.
I couldn't find any open source or paid software available to integrate into my app and test the solution.
Could someone share any info they have on this?
Is there any open source solution for iOS for emotion analysis and detection through voice/speech? Please let me know.
As a former researcher in affective computing, I highly doubt you can find a ready-for-use iOS open source solution for emotion recognition from speech. The main reason is that it is a damn difficult task that requires a lot of research and a lot of proper data to train models. That is why companies like BeyondVerbal and Vokaturi do not share their models with others. Thus, you will be very lucky if you find anything open source, and I am not even talking about iOS solutions.
I am aware of some toolkits you can use for this task (namely, the openEAR toolkit), but to build something working from them, you need expert knowledge in the field and data to train models. A comprehensive list of databases can be found here: http://emotion-research.net/wiki/Databases. A lot of them are freely available.
As Dmytro Prylipko said it is very doubtful that there is any open-source lib for emotion recognition from speech.
You may write your own solution. It is not hard. The trouble is, as mentioned before, that proper training and/or thresholding takes a lot of time and nerves.
I will give you a short outline of how you should begin writing the algorithm, but the training and so on is on you.
The first big problem is that different people convey their emotions vocally in different ways.
For example: one shocked person will respond to their shock with an over-exclaimed sentence, while another will "freeze" and their response will sound very flat (almost robot-like).
Therefore you will need a lot of templates from which to learn how to classify your input speech by emotion.
You can remove some difficulties by using context recognition along with voice prosody.
That is what I'd advise you to do.
First make an algorithm that will use speech-recognized text to put it into emotion context. E.g. you can use specific words and phrases that people use when expressing different emotions.
That is easily done. You may use a neural network or simple branching or whatever.
So you will be able to recognize whether a person is thankful and surprised at the same time by combining context recognition and the emotion from prosody.
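A toy sketch of that text-context step in Python (the keyword lists are made up for illustration; a real system would use a proper emotion lexicon):

    import re

    # Toy keyword-based emotion context as a first branching step.
    # The word lists below are illustrative only.
    EMOTION_KEYWORDS = {
        "joy": {"thank", "thanks", "great", "wonderful", "love"},
        "anger": {"hate", "stupid", "unacceptable", "furious"},
        "surprise": {"wow", "unbelievable", "really"},
    }

    def emotion_context(transcript):
        """Return the emotions whose keywords appear in the recognized text."""
        words = set(re.findall(r"[a-z']+", transcript.lower()))
        return [emo for emo, keys in EMOTION_KEYWORDS.items() if words & keys]

    print(emotion_context("Wow, thank you, I love it!"))   # -> ['joy', 'surprise']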
Now, to recognize the emotion from prosody, you have to extract the prosody parameters and a few other features.
For example, some emotions may be recognized by looking at duration of particular words in a sentence.
So you have the sentence and the text of that sentence. You know that the speed of normal speech is approximately 200 words per minute. Knowing this and the number of words in the sentence, you can see how fast someone is talking. Then you measure the duration of each word and get its speed. Knowing how fast the speech is and how long each word is, you can compute normalized ratios that can be used for classification, in order to determine the closest guess of the emotion.
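A small sketch of those speech-rate features, assuming your speech recognizer can give you word-level timestamps (the input format here is hypothetical):

    # Speech-rate features from word timings, assuming the recognizer
    # returns (word, start_seconds, end_seconds) tuples.
    NORMAL_WPM = 200.0  # rough speed of normal speech, words per minute

    def rate_features(word_timings):
        durations = [end - start for _, start, end in word_timings]
        total_time = word_timings[-1][2] - word_timings[0][1]
        wpm = len(word_timings) / total_time * 60.0
        speed_ratio = wpm / NORMAL_WPM          # > 1 faster than normal, < 1 slower
        # Per-word durations normalized by the average, to spot stretched words
        # like a long, grateful "thank you".
        mean_dur = sum(durations) / len(durations)
        normalized = [d / mean_dur for d in durations]
        return speed_ratio, normalized

    timings = [("thank", 0.00, 0.55), ("you", 0.55, 1.20), ("so", 1.25, 1.40), ("much", 1.40, 1.85)]
    print(rate_features(timings))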
For instance, when someone is presented with a present that he/she likes very much, the "thank you" will sound pretty long. It will also be of higher pitch than that person's usual speech.
So the next step would be to get the average pitch for each word to see the relation between them. So you will be able to see how the sentence prosody modulates. From lower to higher, or vice versa.
Also, how prosody changes inside the phrases within the sentence.
You may go about this by comparing curves of known emotions directly, or you may use an approximation to get coefficients from the prosody curve vector. A square function does well for normal speech prosody (with no particular emotion in it), so some higher-order polynomial should do. You can then take the coefficients of the polynomial and use them to determine what emotion the whole sentence or phrase conveys.
The same goes for individual words within the sentence. You get the pitch for each phoneme or syllable, or just the pitch curve for e.g. every 20 ms of the word. Then you either calculate a few coefficients to approximate the polynomial you decided is good enough, or you take the whole curve and normalize it to e.g. 30 points to use it for recognition.
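A sketch of the per-utterance pitch curve and its polynomial approximation; I use librosa's pYIN tracker here for convenience, which is my choice rather than the FFT approach mentioned above, and the file name is a placeholder:

    # Pitch curve of an utterance plus a low-order polynomial approximation
    # of its prosody contour. Uses librosa (pip install librosa) and numpy.
    import numpy as np
    import librosa

    y, sr = librosa.load("utterance.wav", sr=16000)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                      fmax=librosa.note_to_hz("C6"), sr=sr)

    t = librosa.times_like(f0, sr=sr)
    voiced = voiced_flag & ~np.isnan(f0)

    # Fit a 2nd-order polynomial to the voiced part of the pitch curve;
    # the coefficients compactly describe how the prosody rises and falls.
    coeffs = np.polyfit(t[voiced], f0[voiced], deg=2)
    print("prosody coefficients:", coeffs)

    # Alternatively, resample the curve to a fixed length (e.g. 30 points)
    # so curves of different durations can be compared directly.
    fixed = np.interp(np.linspace(t[voiced][0], t[voiced][-1], 30), t[voiced], f0[voiced])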
To compare curves directly you may use the gesture recognition algorithm by Oleg Dopertchouk:
http://www.gamedev.net/reference/articles/article2039.asp
I tried it on pitch curves of melodies, it works just fine.
The trouble is, you need a database of speech with context and emotion, with clear manual classification, to give your algorithm something to compare against.
If you use polynomials instead of whole curves, you can do some recognition by using thresholds on the coefficients, but the results will be a bit shaky. The only real excuse for using coefficients at all is that you do not need to know how long the word in question is, i.e. the same polynomial should work on a word with 2 phonemes and on one with 5 (should work).
You see, the theory is nice and easy: use speech recognition, measure the speech rate and the duration of each word, construct the pitch curve for the whole phrase and for each word using FFT, do some comparison between a ready database and the input, and voilà, emotion recognized.
But where will you find a database with word curves marked with emotions?
For example, you would need, for each emotion, at least one pitch curve for words with different numbers of phonemes. At least one, because it matters whether the word starts or ends with a vowel, and someone may simply convey the same emotion differently even if the curve represents the same word.
OK, so you can say that you will make one. Where would you find recorded samples to make your curves or calculate the coefficients? Hm, perhaps a recording of some drama. Not a bad idea, but acted emotions aren't the same as natural ones.
It is a big job to teach a machine such a thing.
Oh, yeah, I almost forgot: emotions aren't only, or sometimes at all, conveyed through pitch changes; sometimes it's only the way in which the word is pronounced.
So, for some cases, you would probably need LPC or some other coefficients giving more information on how the phonemes in the word sound. Or you would need to look at other harmonics from the FFT, not just the one representing the pitch of the excitation train.
The best you can do without following my hints and developing your own algorithm is to use NLTK (the Natural Language Toolkit) to develop a statistical, emotionally rich speech model and use algorithms from there (perhaps a bit modified) to try to get at the emotion in question.
But I fear it would be a bigger job than starting from zero. As far as I know, NLTK doesn't support emotions, just normal speech prosody.
You may try to integrate some of the things I wrote about into Sphinx, to develop emotion-based speech models and introduce emotion recognition directly into Sphinx's voice recognition algorithm.
If you really need this, I advise you to learn enough DSP to write your own algo, then pay someone to make you initial database from audiobooks, radio dramas and similar stuff (using a tool you provide).
After your algorithm starts working reasonably well, implement auto-learning by giving users an option to correct its wrong guesses. After some time you will have a 90% reliable algorithm for recognizing emotions from speech.

Image Recognition Techniques for Identifying Logos (Brands)

I started learning image recognition a few days ago and I would like to do a project in which I need to identify different brand logos on Android.
For example: if I take a picture of a Nike logo on an Android device, then it needs to display "Nike".
Low computational time would be the main criteria for me.
For this, I have done some work and started learning OpenCV sample examples.
What would be the best image recognition approach for me?
1) I learned from Template Matching that its applicability is limited mostly by the available computational power, as identification of big and complex templates can be time consuming (so I don't want to use it).
2) Feature-based detectors like SIFT/SURF/STAR (as far as I know, this would be a better option for me).
3) How about deep learning and pattern recognition concepts? (I was digging into this and don't know whether it would be an option for me.) Can any of you let me know whether I can use this and why it would be a better choice for me compared with 1 and 2?
4) Haar cascade classifiers (from one of the posts on SO I learned that Haar is not rotation and scale invariant, so I haven't concentrated much on this). Would this be a better option for me if I focused on it?
I'm now running one of my pet projects that requires face recognition (detecting the area with a face in the photo, if one exists, on a Raspberry Pi), so I've done some analysis of that task.
And I found this approach. The key idea is to avoid scanning the entire picture with windows of different sizes, as was done in OpenCV, and instead divide the entire photo into 49 (7x7) squares and train the model not only to detect the presence of one of the classes inside each square, but also to determine the location and size of the detected object.
It's only 49 runs of the trained model, so I think it's possible to execute this in less than a second even on smartphones that are not state-of-the-art. Anyway, it will be a trade-off between accuracy and performance.
About the model:
I will use a VGG-like model, probably a bit simpler than even VGG11 (configuration A).
In my case a ready dataset already exists, so I can train a convolutional network with it.
Why is the deep learning approach better than options 1-3 you mentioned? Because of its higher accuracy for this kind of task. It's practically proven; you can check it on Kaggle, where the majority of the top models in classification competitions are based on convolutional networks.
The only disadvantage for you is that you would probably need to create your own dataset to train the model.
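A rough sketch of the 7x7 grid idea in Python/OpenCV; classify_tile() is a hypothetical stand-in for the trained VGG-like network, and the file name is a placeholder:

    # Split a photo into a 7x7 grid and run a classifier on each cell.
    import cv2

    def classify_tile(tile):
        # placeholder: a real implementation would run the CNN here and
        # return (class_label, confidence, relative_box)
        return None

    img = cv2.imread("photo.jpg")
    h, w = img.shape[:2]
    grid = 7
    predictions = []
    for row in range(grid):
        for col in range(grid):
            tile = img[row * h // grid:(row + 1) * h // grid,
                       col * w // grid:(col + 1) * w // grid]
            predictions.append(classify_tile(tile))   # 49 forward passes in total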
Here is a post that I think can be useful for you: Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition. Another one: Logo recognition in images.
2) Feature-based detectors like SIFT/SURF/STAR (as far as I know, this would be a better option for me).
Just remember that SIFT and SURF are both patented so you will need a license for any commercial use (free for non-commercial use).
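If the patents are a concern, ORB ships with OpenCV and is free to use; a minimal matching sketch (file names and thresholds are placeholders, not tuned values):

    # Logo matching with ORB features (patent-free), using OpenCV's Python API.
    import cv2

    logo = cv2.imread("nike_logo.png", cv2.IMREAD_GRAYSCALE)     # reference template
    scene = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)        # query image

    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(logo, None)
    kp2, des2 = orb.detectAndCompute(scene, None)

    # Hamming distance for ORB's binary descriptors; keep the best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    # Simple heuristic: enough good matches means the logo is probably present.
    good = [m for m in matches if m.distance < 50]
    print("logo found" if len(good) > 15 else "logo not found", len(good), "good matches")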
4) Haar cascade classifiers (from one of the posts on SO I learned that Haar is not rotation and scale invariant, so I haven't concentrated much on this). Would this be a better option for me if I focused on it?
It works (if I understand your question right); much of this depends on how you trained your classifier. You could train it to detect all kinds of rotations and scales. Anyway, I would discourage you from going for this option, as I think the other possible solutions are better suited for this case.

Machine Learning: sign visibility

I work at an airport where we need to determine the visibility conditions for pilots.
To do this, we have signs placed every 200 meters along the runway that allow us to determine how far the visibility is. We have multiple runways, and the visibility needs to be checked every hour.
Right now the visibility check is done manually by a human who looks at the photos from the cameras placed at the end of each runway, so it can be tedious.
I'm a programmer who has very little experience with machine learning, but this sounds like an easy problem to automate. How should I approach this problem? Which algorithms should I study? Would OpenCV help me?
Thanks!
I think this can be automated using computer vision techniques, and OpenCV could make the implementation easier. If all the signs are similar, we can train our program to recognize the sign under specific (lighting) conditions. Then we can use the trained classifier to check the visibility of the signs every hour using a simple script.
There is Haar-like feature extraction already in OpenCV. You can use it to train a classifier, which will output an .xml file, and then use that .xml file for detecting the sign regularly.
I have done a similar project RTVTR(Real Time Vehicle Tracking and Recognition) using openCV and it worked great. http://www.youtube.com/watch?v=xJwBT76VEZ4
Answering your questions:
How should I approach this problem?
It depends on the result you want/need to obtain. Is this a "hobby" project (even if job-related), or do you need to build a machine vision system to solve the problem, and should it be compliant with some regulations or standards?
Which algorithms should I study?
I am very interested in your question, but I am not an expert in the field of meteorology, so searching the relevant literature is a time-consuming task for me... so I reserve the right to update this part of the answer in the future. I think several algorithms will be involved in the solution of the problem: some are very general, for example algorithms for image segmentation, and some are very specific, for example how to measure the visibility.
Update: one of the keywords for searching the literature is meteorological visibility, for example:
HAUTIERE, Nicolas, et al. Automatic fog detection and estimation of visibility distance through use of an onboard camera. Machine Vision and Applications, 2006, 17.1: 8-20.
LENOR, Stephan, et al. An Improved Model for Estimating the Meteorological Visibility from a Road Surface Luminance Curve. In: Pattern Recognition. Springer Berlin Heidelberg, 2013. p. 184-193.
Would OpenCV help me?
Yes, I think OpenCV can help giving you a starting point.
An idea for a naïve algorithm:
Segment the image in order to get the pixel regions belonging to the signs and to the background.
Compute the measure of visibility according to some procedure, the measure is computed by a function that has as input the regions of all the signs and the background region.
The segmentation can be simplified a lot if the signs are always in the same fixed and known position inside the image.
The measure of visibility is obviously the core of the algorithm and it can be performed in a lot of ways...
You can follow a simple approach where you compute the visibility with a mathematical formula based on the average gray level of the signs and background regions.
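A sketch of that simple gray-level idea, assuming the sign positions are fixed and known in the camera image; the ROIs, distances and threshold below are placeholders:

    # Naive visibility estimate: compare each sign ROI against the background.
    import cv2
    import numpy as np

    img = cv2.imread("runway.jpg", cv2.IMREAD_GRAYSCALE)

    # (x, y, w, h) of each sign in the image, ordered by distance along the runway.
    sign_rois = [(120, 300, 40, 60), (260, 310, 30, 45), (380, 315, 22, 34)]
    sign_distances_m = [200, 400, 600]
    background_roi = (0, 0, img.shape[1], 100)   # e.g. a patch of sky

    def contrast(roi):
        """CNR-like measure of a sign region against the background."""
        x, y, w, h = roi
        patch = img[y:y + h, x:x + w].astype(np.float64)
        bx, by, bw, bh = background_roi
        bg = img[by:by + bh, bx:bx + bw].astype(np.float64)
        return abs(patch.mean() - bg.mean()) / (bg.std() + 1e-6)

    THRESHOLD = 2.0   # would have to be tuned on labeled examples
    visible = [d for roi, d in zip(sign_rois, sign_distances_m) if contrast(roi) > THRESHOLD]
    print("visibility is at least", max(visible) if visible else 0, "m")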
You can follow a more sophisticated and machine-learning-oriented approach where you implement an algorithm that mimics your current human-based procedure. In this case your problem can be framed as a supervised learning task: you have a set of training examples, and each training example is a pair composed of a) the photo of the runway (the input) and b) the visibility related to that photo as assessed by a human (the desired output). The system is then trained on the training set, and when you give it a new photo as input it will give you back the visibility measure. I think you have a log of past visibility measures (METAR?), and if you saved the related images too, you already have a relevant amount of data for building a training set and a test set.
Update in the age of Convolutional Neural Networks:
YOU, Yang, et al. Relative CNN-RNN: Learning Relative Atmospheric Visibility from Images. IEEE Transactions on Image Processing, 2018.
Both Tensor's and uvts_cvs's replies are very helpful. While OpenCV mainly aims at recognizing the sign pattern or even segmenting it from the background, when you extract the core feature of your problem, visibility, you may still need to include the background signal in your training set. I assume the manual check of visibility is based on image contrast; if so, the signal-to-noise ratio (SNR) or contrast-to-noise ratio (CNR) is a good feature for learning. A threshold is defined to classify 'visible' (1) and 'invisible' (0). The SNR/CNR can be obtained automatically, especially if the sign positions and sizes are fixed in your camera images.
Gather a whole bunch of photos and videos and propose it as a challenge on Kaggle. I am sure many people would like to try to solve it, even if the reward is not very high.
You can use the template matching functionality of OpenCV:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
where the template is the sign. If you manage to find a correct match, then the sign is visible. I think you can also get a sense of the scale of the sign in the image from that code.
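A minimal sketch of that tutorial applied to the sign (file names and the score threshold are placeholders):

    # Decide whether the sign is visible via normalized cross-correlation.
    import cv2

    scene = cv2.imread("runway.jpg", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("sign_template.png", cv2.IMREAD_GRAYSCALE)

    result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)

    # max_val is in [-1, 1]; the threshold would have to be tuned on real data.
    print("sign visible at", max_loc if max_val > 0.7 else None, "score:", max_val)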
As this is a very controlled and static environment, you have perfect conditions to estimate the visibility with vision-based approaches. Nonetheless, it is not so easy to decide which approach to take. In my thesis, I review this topic in depth for the less well-controlled environment of road traffic. See: LENOR, Stephan. Model-Based Estimation of Meteorological Visibility in the Context of Automotive Camera Systems. Doctoral thesis, 2016. (https://archiv.ub.uni-heidelberg.de/volltextserver/20855/1/20160509_lenor_thesis_final_print.pdf).
I see two major directions you could follow up:
Model-based approaches: Advantages: not so dependent on your very specific setup, and you do not need heavy data collection.
Data-based approaches/ML: Advantages: can hide the whole complexity of different light and weather conditions. You seem to have a good source of data if there are people doing the job right now. Very promising without much engineering effort (just use a lightweight CNN with a few layers or so).
You could also combine both, etc. If you are still interested in a solution, you can contact me again and I am happy to consult in more depth.

Training the algorithm for better image recognition

This is a research question, not a direct programming question.
I am working on a symbol recognition algorithm. What the software currently does is take an image, divide it into contours (blobs) and match each contour against a list of predefined templates. Then for each contour it takes the template with the highest match rate.
The algorithm is doing fairly well; however, I need to train it better. What I mean is this:
I want to use a machine learning algorithm that will train the system to match better. Let's take an example:
I run the recognition on a symbol and the algorithm finds that this symbol is a car; then I have to confirm that result (maybe by clicking on "Yes" or "No") and the algorithm should learn from that. So if I click NO, the algorithm should learn that this is not a car and give a better result next time (maybe try to match something else), while if I click YES, it will know that it was correct and next time it will perform better when searching for a car.
This is the concept I am trying to research. I need documents or algorithms that can achieve this sort of thing. I am not looking for implementations or programming, just concepts or research.
I have done a lot of searching and read a lot about machine learning, neural networks, decision trees... but I was not able to figure out how I could use any of them in my scenario.
I hope I was clear and that this type of question is allowed on Stack Overflow. If not, I'm sorry.
Thanks a lot for any help or tips.
Image recognition is still a challenge in the community. What you described with manually clicking yes/no is just creating labeled data. Since this is a very broad area, I will just point you to a few links that might be useful.
To get started, you might want to use some existing image databases instead of creating your own, which saves you a lot of effort, e.g. the car dataset in the UIUC image database.
Since you already have a background in machine learning, you can take a look at some survey papers that match your project interests, e.g. search for "object recognition survey paper" or "feature extraction car" on Google.
Then you can dive into some good papers and see whether they are suitable for your project. For example, you can check the two papers below, which are linked with the UIUC image database.
Shivani Agarwal, Aatif Awan, and Dan Roth,
Learning to detect objects in images via a sparse, part-based representation.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11):1475-1490, 2004.
Shivani Agarwal and Dan Roth,
Learning a sparse representation for object detection.
In Proceedings of the Seventh European Conference on Computer Vision, Part IV, pages 113-130, Copenhagen, Denmark, 2002.
Also check for already implemented software instead of starting from scratch; in your case, OpenCV should be a good one to start with.
For image recognition, feature extraction is one of the most important steps. You might want to check some state-of-the-art algorithms in the community (SIFT, mean-shift, Haar features, etc.).
Boosting algorithms might also be useful when you reach the classification step; I see a lot of scholars mention them in the image recognition community.
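To make the "yes/no clicks are labeled data" point concrete, here is a hedged sketch using scikit-learn's incremental SGD classifier; extract_features() is a hypothetical stand-in for whatever descriptor (Haar, SIFT, etc.) the matcher uses:

    # Turning yes/no confirmations into labels for an incrementally trained classifier.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    classes = np.array([0, 1])            # 0 = "not a car", 1 = "car"
    clf = SGDClassifier(loss="log_loss")  # supports online updates via partial_fit
                                          # (use loss="log" on older scikit-learn)

    def extract_features(contour_image):
        # placeholder: return a fixed-length feature vector for the blob
        return np.random.rand(64)

    def on_user_feedback(contour_image, user_said_yes):
        x = extract_features(contour_image).reshape(1, -1)
        y = np.array([1 if user_said_yes else 0])
        clf.partial_fit(x, y, classes=classes)   # classes are needed on the first call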
As #nickbar suggests, discuss more at https://stats.stackexchange.com/

How to train software so it recognizes the kind of object shown in a picture?

I want to detect one kind of object, such as a person, in a picture.
Who can tell me how to train a classifier for people, so that we can use it to detect people in any picture?
It's a pretty vague question, but I think you're looking for a good computer vision library. The gold standard is OpenCV (Open Source Computer Vision). That'll get you started, and lots of people have done facial recognition with it.
If you want to tell two people apart, that's a hugely more complicated problem. You'll likely use some of the same tools, but you'll need much more complicated algorithms.
You can take a look at the Viola-Jones framework: Viola-Jones Object Detection Framework on Wikipedia.
it's cvHaarDetectObjects() in OpenCV.
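cvHaarDetectObjects() is the old C API; with the modern Python bindings the same idea looks roughly like this (the fullbody cascade ships with OpenCV, the image name is a placeholder):

    # Person detection with a pretrained Haar cascade (OpenCV's modern API).
    import cv2

    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_fullbody.xml")
    img = cv2.imread("street.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    people = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
    for (x, y, w, h) in people:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("detected.jpg", img)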
