Face Authentication - image-processing

My project is Face Authentication.
System description: My input is a single image (taken when the user logs in for the first time), and using that image the system should authenticate the user on every subsequent login. The authentication images may differ from the first input image in several ways: different illumination conditions, different distance from the camera, and -10 to 10 degrees of pose variation. The same camera (e.g. an iPad) is used in all cases.
1) Authentication images are stored each time the user logs in. How can I make use of these images to improve the accuracy of the system?
2) When a new image arrives, I need to select the closest image(s) from the image repository (rather than all stored images) and use those for authentication, to reduce the processing time. How can I automatically label an image by illumination and distance from the camera?
3) How can I make my system perform decently under changes in illumination and distance from the camera?
Can anyone suggest good algorithms/papers/open-source code for the questions above?
Though it sounds like a research project, I would be extremely grateful for any response.

For this task I think you should take a look at OpenCV's face recognition API. The API is basically able to identify the structure of a face (within certain limitations, of course) and give you the coordinates of the region of the image in which the face is located.
Having to deal with just the face, in my opinion, removes the need to handle different background colours, which is something you do not really need anyway.
Once you have the image of the face, you could scale it up or down to a uniform size and convert it to grayscale. Lastly, I would consider feeding all this information to an artificial neural network, since these are able to cope with inconsistencies in the input. This will also allow you to increase your knowledge base each time a user logs in.
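As a rough sketch of that preprocessing step (not a complete recognition pipeline), the face could be detected with OpenCV's stock Haar cascade, cropped, converted to grayscale and resized before being handed to a classifier; the cascade file, target size and histogram equalization below are just illustrative choices:

    import cv2

    # the frontal-face cascade ships with the opencv-python package;
    # cv2.data.haarcascades points to its install folder
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def preprocess_face(image_path, size=(100, 100)):
        gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None                                  # no face found
        x, y, w, h = faces[0]                            # use the first detection
        face = cv2.resize(gray[y:y + h, x:x + w], size)  # crop + uniform size
        return cv2.equalizeHist(face)                    # crude illumination normalization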
I'm pretty sure there are other ways to go about this. I would recommend searching Google Scholar for papers that deal with this problem, both for more information and for quite possibly other ways to achieve what you are after. Also, keep in mind that with some luck you might find an open-source project that already does most of what you need.

If you really have a database of photographs of faces, you could probably use it to enhance the features of OpenCV's face detection. The way faces are recognized is by comparing the principal components of the picture with those of the face examples in OpenCV's database.
Check out:
How to create Haar Cascade (xml) for using with OpenCV?
Seeing that, you could also try doing your own principal component analysis on every picture of a recognized face (use OpenCV face detection for that: black out everything except the face, since OpenCV gives you the position and size of the face). Compare the PCA result with the ones in your database and match it to the closest. Of course, this works best with a fairly big database, so at the beginning there could be wrong matches.
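To illustrate the "do your own PCA" idea, here is a minimal sketch, assuming the face crops have already been detected, blacked out around the face and resized to a common shape; the component count and variable names are placeholders:

    import numpy as np

    def build_pca(face_stack, n_components=20):
        # face_stack: (n_images, h, w) array of aligned grayscale face crops
        X = face_stack.reshape(len(face_stack), -1).astype(np.float64)
        mean = X.mean(axis=0)
        # SVD of the centered data yields the principal components
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        components = Vt[:n_components]
        coords = (X - mean) @ components.T        # stored faces projected into PCA space
        return mean, components, coords

    def closest_match(new_face, mean, components, coords):
        q = (new_face.reshape(-1).astype(np.float64) - mean) @ components.T
        dists = np.linalg.norm(coords - q, axis=1)    # distance to every stored face
        return int(dists.argmin()), float(dists.min())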
I think creating your own OpenCV Haar cascade would be the best way to go.
Good Luck!

Related

Adjust drone images to google earth

I want to use OpenCV to automatically adjust Google Earth photo overlays (photos directly embedded in the landscape) to Google Earth's 3D environment. The photos already have location, FOV, and orientation as metadata, so only a small adjustment between the real image and the image rendered by Google Earth is required.
I have tried using feature detectors like SIFT, AKAZE, or FAST, but most of the images are of forested areas and there are not many clear features to link (they all find trees to be the most prominent features, which is not good). Besides, these algorithms look everywhere in the image, and I only want to focus on small translations and scale changes.
I have tried brute force, translating one of the images a few pixels at a time and computing the overall difference between the images in order to find a best-fit translation. Again, no great results.
Finally, I have tried obtaining the contours of both images (Canny edge detection) to use as key points. But so far I find it too hard to choose parameters that work for all the images.
I am fairly new to OpenCV and I think I am just missing something obvious. Can anybody give me some hints or ideas to try?
Thanks
Direct methods can be applied if the images differ only by a slight translation or rotation.
This is similar in spirit to the brute-force method you described.
Here's a link to a short paper describing some of these methods:
http://pages.cs.wisc.edu/~dyer/ai-qual/irani-visalg00.pdf
Hope this helps.
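For a concrete starting point, here is a minimal sketch of one direct method, OpenCV's phase correlation, which estimates a small pure translation between two same-size images (file names are placeholders, and the sign convention is worth checking against a known shift):

    import cv2
    import numpy as np

    a = cv2.imread("rendered_view.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    b = cv2.imread("photo_overlay.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # phaseCorrelate returns the sub-pixel (dx, dy) shift between the two arrays
    # plus a confidence value
    (dx, dy), response = cv2.phaseCorrelate(a, b)
    print("estimated shift:", dx, dy, "confidence:", response)

    # undo the shift with a simple affine warp
    M = np.float32([[1, 0, -dx], [0, 1, -dy]])
    aligned = cv2.warpAffine(b, M, (b.shape[1], b.shape[0]))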

Image registration algorithms for images with varying distances

I have two cameras side by side. I'd like to register the two images taken from them. I will assume there are no rotational differences between the cameras, i.e. there is only a translational offset between the images.
I think a global transformation will not work here, since the displacement between the two images is significantly larger for closer objects. What should I do in this case? I have tried reading some papers, but I am not sure which one is the right match for my problem. Do you have any suggestions, such as "read this paper", "apply this algorithm", "know this and that", etc.?
The project I'm working on is real-time, and the image registration will be implemented on the GPU.
First of all, do not assume constraints like purely translational differences. You will have inaccuracies, and the assumption will not save you any effort.
Second, a global transformation (I assume you mean a linear transformation, i.e. a homography applied to your images) only works for pictures of completely planar surfaces, so you're right, it won't work. You will need non-rigid image registration. Furthermore, I hope the distance between these two cameras is not too large; due to parallax, you may get artifacts.
I would recommend googling for terms like "non-rigid image registration for HDR", "stack-based HDR" or "image registration for HDR". You'll find a lot.
However, I found this nice overview paper; I think it is a good start.
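As a rough sketch of what a non-rigid registration can look like in practice, dense optical flow gives a per-pixel displacement field that can be used to warp one image onto the other; this CPU-only OpenCV example is only illustrative (for the GPU requirement, OpenCV's CUDA optical-flow implementations would be the place to look), and the file names and flow parameters are placeholders:

    import cv2
    import numpy as np

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # flow[y, x] = (dx, dy): where each pixel of `left` appears in `right`
    flow = cv2.calcOpticalFlowFarneback(left, right, None,
                                        pyr_scale=0.5, levels=4, winsize=21,
                                        iterations=3, poly_n=7, poly_sigma=1.5,
                                        flags=0)

    h, w = left.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)

    # warp `right` back onto `left`'s pixel grid
    registered = cv2.remap(right, map_x, map_y, interpolation=cv2.INTER_LINEAR)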

How to match images with unknown rotation differences

I have a collection of about 3000 images taken from a camera suspended from a weather balloon in flight. The camera points in a different direction in each image but is generally aimed down, so each image shares a significant area (40-50%) with the previous one, but at a slightly different scale and rotated by an arbitrary (and inconsistent) amount. The image metadata includes a timestamp, so I do know with certainty the correct order of the images and the elapsed time between them.
I want to process these images into a single video. If I simply string them together it will be great for making people seasick, but it won't really capture the amazingness of the set :)
The specific part I need help with is finding the rotation of each image relative to the previous one. Is there a library somewhere that can identify regions of overlap between two images when the images themselves are rotated relative to each other? If I can find 2-3 common points (or more), I can do the remaining calculations to determine the rotation and offset needed to put them together correctly. Alternatively, if there is a library that calculates both of those for me, that would be even better.
I can do this in any language, with a slight preference for either Java or Python. The data is in Hadoop, so Java is the most natural language, but I can use scripting languages as well if necessary.
Since I'm new to image processing, I don't even know where to start. Any help is greatly appreciated!
For a problem like this you could look into SIFT. This algorithm detects local features in images; OpenCV has an implementation of it, which you can read about here.
You could also try SURF, a similar type of algorithm. OpenCV has this implemented as well; you can read about it here.
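As a minimal sketch of how the rotation could be recovered from SIFT matches between two consecutive frames (cv2.SIFT_create() needs opencv-python 4.4 or newer; file names and the ratio-test threshold are placeholders):

    import cv2
    import numpy as np

    img1 = cv2.imread("frame_0001.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame_0002.jpg", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test to keep only distinctive matches
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])

    # similarity transform (rotation + uniform scale + translation), RANSAC-filtered
    M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    angle = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
    scale = np.hypot(M[0, 0], M[1, 0])
    print("rotation (deg):", angle, "scale:", scale,
          "translation:", M[0, 2], M[1, 2])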

Structure from Motion (SfM) in a tunnel-like structure?

I have a very specific application in which I would like to try structure from motion to get a 3D representation. So far, all the software/code samples I have found for structure from motion follow the same pattern: a fixed object photographed from all angles to create the 3D model. This is not my case.
In my case, the camera moves down the middle of a corridor, looking forward. Sometimes the camera can look in another direction (left, right, up, down). The camera never goes back or looks back; it always moves forward. Since the corridor is narrow, almost everything is visible (no hidden spots). The corridor can sometimes be very long.
I have tried this software and it doesn't work in my particular case (though it's fantastic in normal use). Can anybody suggest a library/software/tool/paper that targets my specific needs? Or have you ever needed to implement something like this? Any help is welcome!
Thanks!
What kind of corridors are you talking about and what kind of precision are you aiming for?
A priori, I don't see why your corridor would not be a fixed object photographed from different angles. The quality of your reconstruction might suffer if you only look forward and you can't get many different views of the scene, but standard methods should still work. Are you sure that the programs you used aren't failing because of your picture quality, arrangement or other reasons?
If you have to do the reconstruction yourself, I would start by:
1) Calibrating your camera
2) Undistorting your images
3) Matching feature points in subsequent image pairs
4) Extracting a 3D point cloud for each image pair
You can then orient the point clouds with respect to one another, for example via ICP between two subsequent clouds. More sophisticated methods might not yield much difference if you don't have any closed loops in your dataset (as your camera is only moving forward).
OpenCV and the Point Cloud Library should be everything you need for these steps. Visualization might be more of a hassle, but the pretty pictures are what you pay for in commercial software after all.
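To make steps 1-4 a bit more concrete, here is a very rough two-view sketch with OpenCV, assuming the intrinsic matrix K and distortion coefficients are already known from a chessboard calibration (the values below are placeholders, and a real pipeline would add outlier handling and bundle adjustment):

    import cv2
    import numpy as np

    K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])  # assumed intrinsics
    dist = np.zeros(5)                                           # assumed no distortion

    # steps 1-2: calibrate beforehand, then undistort each frame
    img1 = cv2.undistort(cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE), K, dist)
    img2 = cv2.undistort(cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE), K, dist)

    # step 3: match feature points in the image pair
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    p1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    p2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # step 4: relative pose from the essential matrix, then triangulate a point cloud
    E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, p1.T, p2.T)
    cloud = (pts4d[:3] / pts4d[3]).T    # (N, 3) points, up to scale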
Edit (2017/8): I haven't worked on this in the meantime, but I feel like this answer is missing some pieces. If I had to answer it today, I would definitely suggest looking into the keyword monocular SLAM, which has recently seen a lot of activity, not least because of drones with cameras. Notably, LSD-SLAM is open source and may not be as vulnerable to feature-deprived views, as it operates directly on the intensity. There even seem to be approaches combining inertial/odometry sensors with the image matching algorithms.
Good luck!
FvD is right in the sense that your corridor is a static object. Your scenario is the same as moving around an object and taking images from multiple views; your views are just not arranged to provide a 360-degree view of the object.
I see you mentioned in a previous comment that the data comes from a video? In that case, the problem could very well be the camera calibration. The camera calibration tells the SfM algorithm the internal parameters of the camera (focal length, principal point, lens distortion, etc.). In the absence of that knowledge, the bundler in VSfM uses information from the EXIF data of the images. However, I don't think video stores any EXIF information (not 100% sure). As a result, I think the entire algorithm is running with a bad focal length and cannot solve for the orientations.
Can you extract a few frames from the video and see if there is any EXIF information?

Sparse Image matching in iOS

I am building an iOS app that incorporates image matching as a key feature. The problem is that the images I need to recognize are small 10x10 orienteering plaques with simple, large text on them. They can be quite reflective and will be outside (so the lighting conditions will be variable). Sample image
There will be up to 15 of these images in the pool, and really all I need to detect is the text, in order to log where the user has been.
The problem I am facing is that the image-matching software I have tried, Aurasma and (slightly more successfully) ARLabs, cannot distinguish between them, as it is primarily built to work with detailed images.
I need to accurately detect which plaque is being scanned, and I have considered using GPS to refine the selection, but the only reliable way I have found is to have the user enter the text manually. One of the key attractions we have based the product around is being able to detect these plaques as they already exist in place, without having to set up any additional material.
Can anyone suggest a piece of software that would work (and is iOS-friendly), or a detection method that would be effective and interactive/pleasant for the user?
Sample environment:
http://www.orienteeringcoach.com/wp-content/uploads/2012/08/startfinishscp.jpeg
The environment can change substantially: the plaques can be positioned basically anywhere (on fences, walls, and posts, in either wooded or open areas), but overwhelmingly outdoors.
I'm not an iOS programmer, but I will try to answer from an algorithmic point of view. Essentially, you have a detection problem ("Where is the plaque?") and a classification problem ("Which one is it?"). Asking the user to keep the plaque in a pre-defined region is certainly a good idea; it solves the detection problem, which with limited resources is often harder than the classification problem.
For classification, I see two alternatives:
The classic "computer vision" route would be feature extraction and classification. Local Binary Patterns and HOG are feature extractors known to be fast enough for mobile (the former more so than the latter), and they are not too complicated to implement. Classifiers, however, are non-trivial, and you would probably have to search for an appropriate iOS library.
Alternatively, you could try to binarize the image, i.e. classify pixels as "plate"/white or "text"/black. Then you can use an error-tolerant similarity measure to compare your binarized image with a binarized reference image of the plaque. The chamfer distance is a good candidate; it essentially boils down to comparing the distance transforms of your two binarized images. This is more tolerant of misalignment than comparing the binary images directly. The distance transforms of the reference images can be pre-computed and stored on the device.
Personally, I would try the second approach. A (non-mobile) prototype of the second approach is relatively easy to code and evaluate with a good image processing library (OpenCV, Matlab + Image Processing Toolbox, Python, etc).
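A minimal (non-mobile) sketch of that second approach in Python/OpenCV, assuming the query image has already been cropped to roughly the plaque region; the Otsu binarization and one-sided chamfer score below are illustrative choices:

    import cv2
    import numpy as np

    def binarize(gray):
        # text becomes white (255), plate becomes black (0)
        _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        return bw

    def chamfer_score(query_bw, reference_bw):
        # distance of every pixel to the nearest text pixel of the reference;
        # this transform can be pre-computed and stored per reference plaque
        dist = cv2.distanceTransform(cv2.bitwise_not(reference_bw), cv2.DIST_L2, 3)
        text = query_bw > 0
        return float(dist[text].mean()) if text.any() else np.inf

    # usage: pick the reference plaque with the smallest score
    # scores = [chamfer_score(query_bw, ref_bw) for ref_bw in reference_bws]
    # best = int(np.argmin(scores))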
I managed to find a solution that is working quite well. It's not fully optimized yet, but I think it's just a matter of tweaking the filters, as I'll explain below.
Initially I tried to set up OpenCV, but it was very time consuming with a steep learning curve; it did, however, give me an idea. The key to my problem is really detecting the characters in the image and ignoring the background, which is basically just noise. OCR was designed for exactly this purpose.
I found the free library Tesseract (https://github.com/ldiqual/tesseract-ios-lib) easy to use and highly customizable. At first the results were very random, but applying a sharpening filter, a monochromatic filter, and a color invert worked well to clean up the text. Next, I marked out a target area in the UI and used it to cut out the rectangle of the image to process; processing is slow on large images, and this reduced the time dramatically. The OCR filter allowed me to restrict the allowable characters, and since the plaques follow a standard layout, this improved the accuracy.
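For reference, a rough desktop prototype of that filter pipeline (this is not the iOS code: pytesseract and the hard-coded crop rectangle stand in for the tesseract-ios wrapper and the marked UI area):

    import cv2
    import pytesseract

    img = cv2.imread("plaque.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # unsharp-mask style sharpening
    blur = cv2.GaussianBlur(gray, (0, 0), 3)
    sharp = cv2.addWeighted(gray, 1.5, blur, -0.5, 0)

    # crop to the marked target area (illustrative coordinates)
    roi = sharp[100:300, 80:420]

    # binarize so the text is clean, then restrict the OCR to plaque characters
    _, bw = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(
        bw, config="--psm 7 -c tessedit_char_whitelist=0123456789"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    print(text.strip())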
So far it has been successful with the grey-background plaques, but I haven't found the right filters for the red and white editions. My goal is to add color detection and remove the need to feed in the plaque type.
