I'm trying to use OpenCV to implement a feature in my app. Basically, my app allows users to authenticate with their face. Live video is captured and frames are extracted, and a model is trained on those extracted images. The next time the user logs in, frames are sent to the model to decide whether this is the authenticated user.
I found an example on the OpenCV site which uses FaceRecognizer. However, it uses an existing dataset with 10 classes (10 people). In my case there is only one class (or two, if you count the authenticated user and unknown users as separate classes). Could you please suggest a solution?
Thank you.
First of all, I would suggest you look at other methods for face recognition (DNN-based), since the OpenCV FaceRecognizer algorithms (e.g. Eigenfaces) are not particularly good.
However, if you want to use it, note that FaceRecognizer::predict has an overload that outputs a "confidence" value. This is the value you would need to look at to decide if the match was right. You'll need to experiment to find your sweet spot between false positives and false negatives.
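For example, here is a minimal Python sketch of that thresholding idea, assuming the opencv-contrib package (which provides cv2.face); the file names and the threshold value are placeholders you'd replace and tune for your own data:

```python
import cv2
import numpy as np

# Train on grayscale face crops of the authenticated user only (one class).
user_faces = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in ["user_0.png", "user_1.png"]]
labels = np.zeros(len(user_faces), dtype=np.int32)  # single label: 0

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(user_faces, labels)

# At login time, predict on a new face crop and look at the confidence.
# For LBPH the confidence is a distance, so lower means a closer match.
probe = cv2.imread("login_frame_face.png", cv2.IMREAD_GRAYSCALE)
label, confidence = recognizer.predict(probe)

THRESHOLD = 60.0  # placeholder; find your own sweet spot experimentally
if confidence < THRESHOLD:
    print("Authenticated user (distance %.1f)" % confidence)
else:
    print("Unknown user (distance %.1f)" % confidence)
```

Lowering the threshold gives fewer false accepts but more false rejects, which is exactly the trade-off you'll need to experiment with.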
I want to simulate lidars. I saw that a DepthSensor class is mentioned in the documentation, but I have not found its actual implementation. For now, I am planning to use the RgbdSensor class and keep only the slice at the height I need from the depth point cloud it returns to simulate my lidars.
Just to get your input on that, maybe I missed something, but is there a specific class for lidars, and how would you go about adding lidars to a simulation?
Thanks in advance,
Arnaud
You've discovered an anachronism in the code. There had previously been a lidar-like sensor (called DepthSensor). The extant documentation refers to that class. The class's removal should've been accompanied by a cleanup of the documentation.
The approach you are taking is the expected approach given Drake's current state.
There has always been an intention to re-introduce a lidar-like sensor in Drake's current architecture. It simply hasn't been a high priority.
I'd recommend you proceed with what you're currently doing (lidar from depth images) but, at the same time, post an issue requesting a lidar-like query, with a specific focus on the minimum lidar properties that you require. A discussion regarding how that would differ from what you can actually get from the depth images would better inform us of your unique needs and how to prioritize it. (You can also indicate more advanced features that you need less but would be good to have, of course.)
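As a side note, here is a rough numpy sketch of the "lidar from depth images" conversion itself, i.e. turning one row of a depth image into (angle, range) pairs using pinhole intrinsics. The intrinsics values below are placeholders; in practice you would take them from the camera info you configured on the RgbdSensor.

```python
import numpy as np

def depth_row_to_scan(depth_row, fx, cx):
    """Convert one row of a depth image (depth along the optical axis, in
    meters) into lidar-like (angle, range) pairs using pinhole intrinsics."""
    u = np.arange(depth_row.shape[0])
    # Horizontal angle of each pixel's ray relative to the optical axis.
    angles = np.arctan2(u - cx, fx)
    # Depth is measured along the optical axis; dividing by cos(angle) gives
    # the Euclidean range along the ray, which is what a lidar would report.
    ranges = depth_row / np.cos(angles)
    return angles, ranges

# Placeholder example: a 640-pixel-wide depth row, roughly 90-degree FOV.
depth_row = np.full(640, 2.0, dtype=np.float32)   # everything 2 m away
angles, ranges = depth_row_to_scan(depth_row, fx=320.0, cx=319.5)
```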
As for the question: how would you go about adding lidars?
That's problematic. Ideally, what you would need is ray-casting ability. The intent is for QueryObject to support such a query, but it hasn't happened yet. (It's certainly the underlying technology we'd have used to implement a LidarSensor.) In the absence of that kind of functionality, you'd essentially have to do it yourself in the most horrible, tedious way imaginable. I'd go so far as to suggest that it's not feasible with the current API.
First, you need to know that I'm a beginner in this subject. I'm originally an embedded systems developer and I have never worked with image recognition.
Let me explain my main goal:
I would like to create my own database of logos and be able to recognize them in a larger image. A typical application would be, for example, to build a database of Pepsi and Coca-Cola logos, and when I take a photo of a bottle of soda, it tells me whether it is one of them or something else.
So, here is my problem:
I first wanted to use Google's AutoML / ML Kit. I gave it my databases so it could train on them. My first attempt was to take photos of entire bottles and then compare. It was OK but not very accurate. I then tried to give it only the logos, but after training it couldn't recognize anything in the whole image of a bottle.
I think I didn't provide enough images in the first case, but I'd prefer the second approach (giving only the logos) so that the model searches for something similar within the image.
Finally, my questions:
If you've worked with Google's ML Kit, were you able to train a model on images that should then be recognized inside a larger image? If so, do you have any hints for me?
Do you know of reliable software that could help me run tests of this kind? I thought about Azure Machine Learning Studio from Microsoft (since I develop in Visual Studio).
To start with, I'd like to write as little code as possible, just for testing. Maybe later I could try to build my own machine learning system, but I think that's a big challenge.
I also thought I might need to split my image into smaller images and send each of them to the model, but that would be time-consuming and I need a fast response (under about 2 seconds).
Thanks in advance for your answers. I don't need a complete answer with a full tutorial (Stack Overflow is not intended for that anyway ^^); just some advice would already be good.
Have a good day!
Azure’s Custom Vision is great for this: https://www.customvision.ai
Let’s say you want to detect a Pepsi logo. Upload 70 images of products with the logo on them. Use Custom Vision to draw a box around the logo in each photo. Click “train”, and you get a TensorFlow model with code.
Look up any tutorial for it, it’s pretty incredible and really easy to use.
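If you then want to call the trained detector from code, here is a rough sketch using the Python SDK (azure-cognitiveservices-vision-customvision); the exact constructor varies a bit between SDK versions, and the endpoint, key, project id and published iteration name are placeholders you get from the customvision.ai portal after training:

```python
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

credentials = ApiKeyCredentials(in_headers={"Prediction-key": "<prediction-key>"})
predictor = CustomVisionPredictionClient("<endpoint>", credentials)

with open("soda_bottle.jpg", "rb") as image:
    results = predictor.detect_image("<project-id>", "<published-iteration-name>", image.read())

# Each prediction has a tag (e.g. "pepsi" or "coca-cola"), a probability and
# a bounding box, so you can keep the best hits above some threshold.
for p in results.predictions:
    if p.probability > 0.5:
        print(p.tag_name, round(p.probability, 2), p.bounding_box.left, p.bounding_box.top)
```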
I have about 2,000 images of cars, most pointing right, but some pointing left.
I'd like to find a way of automatically tagging a car with its direction (new images will be coming in continually).
I'm struggling to get started and wondered if this kind of image detection problem has a name that may help my searches. Is object orientation detection a thing?
I'm a software developer (not doing much ML or image work) and have a ton of Azure and GCP resources available, but I can't find anything to solve this. Azure Cognitive Services can tell us there's a car in the picture, but it doesn't tell us the direction.
Could just do with a good starting point to get going.
I should add that the images are quite clean, on white backgrounds.
Thanks to Venkata for commenting; a bad dataset was causing our issues (too many right-facing images vs. left-facing).
Here's what we did to get it all working:
We set up a training and prediction instance in Azure (using the Custom Vision cognitive service in our portal).
We then used https://www.customvision.ai/ to set everything up and train the model (it's super simple).
We didn't actually need any left-facing images in the end. We took all the right-facing images we had (about 500 in the final instance) and uploaded them with the tag "Right". We then mirrored all of them with a Photoshop script and uploaded them again with a "Left" tag. Training took about 15 minutes and we ended up with a 100% prediction score. We tested it with a load of images that weren't in the training set to confirm it was all working.
We then did the same for a ton of van/truck images. These were taken from a different angle (the cars were all side-profile shots, the vans were all front three-quarter views), so we weren't sure whether we'd have the same success.
Again, we flipped the images ourselves to create the left-facing images, so we only needed to source right-facing vans to build the whole model.
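For reference, the mirroring step doesn't need Photoshop; a few lines of Pillow do the same thing (the folder names below are placeholders):

```python
import os
from PIL import Image, ImageOps

src, dst = "right_facing", "left_facing"
os.makedirs(dst, exist_ok=True)

for name in os.listdir(src):
    if name.lower().endswith((".jpg", ".jpeg", ".png")):
        # Horizontal mirror: a right-facing vehicle becomes a left-facing one.
        ImageOps.mirror(Image.open(os.path.join(src, name))).save(os.path.join(dst, name))
```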
We ended up with a 99.8% score, which is totally acceptable for our use case. We can now detect the direction of all cars and vans, and it even handles cars shot at front three-quarter and vans in profile (even though we only trained cars in profile and vans at three-quarter).
The Custom Vision portal gives you an API endpoint and a key. Now, when we detect a new image in our system, it goes through the API (using the Custom Vision SDK/NuGet package in our .NET site) and we check the tags to see if it needs flipping. If it does, we flip it and save it back to disk, and it's then cached so it doesn't keep hitting the API.
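A rough Python sketch of that flow, for illustration (we used the .NET SDK; the keys, ids, tag names and file path below are placeholders):

```python
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
from PIL import Image, ImageOps

credentials = ApiKeyCredentials(in_headers={"Prediction-key": "<prediction-key>"})
predictor = CustomVisionPredictionClient("<endpoint>", credentials)

path = "incoming_car.jpg"
with open(path, "rb") as f:
    results = predictor.classify_image("<project-id>", "<published-iteration-name>", f.read())

# Take the highest-probability tag ("Left" or "Right") and flip if needed so
# everything stored in the system ends up pointing the same way.
best = max(results.predictions, key=lambda p: p.probability)
if best.tag_name == "Left":
    ImageOps.mirror(Image.open(path)).save(path)
```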
It's pretty amazing: it took us just two days to research the options, pick a provider and implement the solution into a production platform. It's probably a simple use case for ML, but 10 years ago (or even 5) we couldn't have dreamed that things would come along so far.
tl;dr: If you need to detect whether an object in an image is pointing left or right, just grab a lot of right-facing examples and flip them yourself to create a well-balanced dataset. Obviously, this relies on the object looking the same from one side as from the other.
I'm very new to image processing libraries. I've been looking into OpenCV, but I have a question.
What sort of algorithms could I use if I want to identify a few similar objects in a room?
Let's say 3 similar tables.
With a camera I assign an identity to each of those tables. After I move the camera to a position where the objects are out of sight, and then point it back at them, the system should properly identify those objects with their initial IDs and trigger an action based on each ID.
I read about ArUco markers, but I would like to try the idea without having to attach markers.
There are plenty of methods to choose from. You could use image features, color matching, shape matching, pattern matching... and so on. It really depends on the specific use case and the environment. In any case you need something unique to distinguish the tables from each other. Using markers would be one way to artificially create that uniqueness.
Maybe you want to start reading here to get a feeling for how one method works:
https://docs.opencv.org/3.4.1/dc/dc3/tutorial_py_matcher.html
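To give a concrete flavour of the feature-matching approach in that tutorial, here is a minimal OpenCV sketch; the image file names are placeholders, and in practice you would store one set of descriptors per table (its "ID") and match new frames against each set:

```python
import cv2

ref = cv2.imread("table_1_reference.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp_ref, des_ref = orb.detectAndCompute(ref, None)
kp_scene, des_scene = orb.detectAndCompute(scene, None)

# Brute-force Hamming matcher with a ratio test to keep only distinctive matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des_ref, des_scene, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Simple heuristic: enough good matches -> this is probably table 1.
print("good matches:", len(good), "-> match" if len(good) > 25 else "-> no match")
```

Whether this works for identical-looking tables depends on there being some visible difference (texture, scratches, surroundings) between them, which is exactly the uniqueness problem mentioned above.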
Could you provide an example set of images of the scenario?
I am trying to build an app that allows the user to record individual people speaking, then save the recordings on the device and tag each recording with the name of the person who spoke. Then there is a detection mode, in which I record someone and the app can tell me their name if they are in the local database.
First of all - is this possible at all? I am very new to iOS development and not so familiar with the available APIs.
More importantly, which API should I use (ideally free) to correlate the incoming voice with the recordings I have in the local database? This should behave something like Shazam, but much simpler, since the database I am matching against is much smaller.
If you're new to iOS development, I'd start with the core app to record the audio and let people manually choose a profile/name to attach it to and worry about the speaker recognition part later.
You obviously have two options for the recognition side of things: You can either tie in someone else's speech authentication/speaker recognition library (which will probably be in C or C++), or you can try to write your own.
How many people are going to use your app? You might be able to create something basic yourself: if it's the difference between a man and a woman, you could probably figure that out by doing an FFT spectral analysis of the audio and finding where the frequency peaks are. Obviously the frequencies used to enunciate different phonemes vary somewhat, so solving the general case for two people who sound fairly similar is probably hard. You'll need to train the system with a bunch of speech and build some kind of model of frequency distributions. You could try clustering or something, but you're going to run into a fair bit of maths fairly quickly (Gaussian mixture models, et al.). There are libraries/projects that'll do this. You might be able to port this from Matlab, for example: https://github.com/codyaray/speaker-recognition
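To illustrate just the "where is the frequency peak" part, here is a rough Python sketch; an iOS app would do the equivalent in C/Objective-C with Accelerate/vDSP, and the file name and the 165 Hz cutoff are placeholder assumptions, not calibrated values:

```python
import numpy as np
from scipy.io import wavfile

# Assumes a short mono WAV recording of one speaker.
rate, samples = wavfile.read("speaker_sample.wav")
samples = samples.astype(np.float64)

# Magnitude spectrum of the whole clip.
spectrum = np.abs(np.fft.rfft(samples))
freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)

# Look for the dominant peak in the typical voice pitch range (~80-300 Hz).
band = (freqs > 80) & (freqs < 300)
peak_hz = freqs[band][np.argmax(spectrum[band])]
print("dominant pitch-range peak: %.1f Hz" % peak_hz)

# Crude rule of thumb only: a lower fundamental tends to indicate a male voice.
# Telling apart two similar voices needs per-frame features and a statistical
# model (e.g. a Gaussian mixture per speaker), as described above.
print("guess:", "male-ish" if peak_hz < 165 else "female-ish")
```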
If you want to take something off-the-shelf, I'd go with a straight C library like mistral, as it should be relatively easy to call into from Objective-C.
The SpeakHere sample code should get you started for audio recording and playback.
Also, it may well take longer for the user to train your app to recognise them than it's worth in time-saving from just picking their name from a list. Unless you're intending their voice to be some kind of security passport type thing, it might just not be worth bothering with.