Creating Semantic Maps using OpenCV + OpenSLAM?

I'm currently working on a project that aims to recognize objects in an indoor household type of environment and to roughly represent the location of these objects on a map. It is hoped that this can all be done using a single Kinect camera.
So far I have managed to implement a basic object recognition system using OpenCV's SURF features, following techniques similar to those described in "OpenCV 2 Computer Vision Application Programming Cookbook".
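(Roughly, the SURF matching step looks something like the sketch below. This is a simplified illustration, not my exact code; it assumes OpenCV's xfeatures2d contrib module and placeholder image paths.)

```cpp
// Rough sketch only: SURF keypoint matching between an object image and a scene,
// assuming OpenCV with the xfeatures2d contrib module. File names are placeholders.
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/xfeatures2d.hpp>

int main()
{
    cv::Mat object = cv::imread("object.png", cv::IMREAD_GRAYSCALE);  // placeholder images
    cv::Mat scene  = cv::imread("scene.png",  cv::IMREAD_GRAYSCALE);

    cv::Ptr<cv::xfeatures2d::SURF> surf = cv::xfeatures2d::SURF::create(400.0);

    std::vector<cv::KeyPoint> kpObject, kpScene;
    cv::Mat descObject, descScene;
    surf->detectAndCompute(object, cv::noArray(), kpObject, descObject);
    surf->detectAndCompute(scene,  cv::noArray(), kpScene,  descScene);

    // Brute-force match descriptors; many good matches suggest the object is in the scene.
    cv::BFMatcher matcher(cv::NORM_L2);
    std::vector<cv::DMatch> matches;
    matcher.match(descObject, descScene, matches);
    return 0;
}
```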
I'm now slowly shifting focus to the mapping portion of this project. I have looked into RGBDSLAM as a method to create 3D maps and represent any objects found. However, I can't seem to find a way to do this. I have already asked a question about this at http://answers.ros.org/question/37595/create-semantic-maps-using-rgbdslam-octomaps/ with no luck so far.
I have also briefly researched GMapping and MonoSLAM but I'm finding it difficult to assess whether these are suitable since I've only just started learning about SLAM.
So, any advice on these SLAM techniques would be much appreciated!
I'm also open to alternatives to what I've talked about. If you know of any other methods of creating semantic maps of an environment, then please feel free to share!
Cheers.

I have used A. J. Davison's MonoSLAM method and it is only suitable for small environments like a desktop or a small room (using a fish-eye lens). Try PTAMM (by Dr. Robert Castle) instead; it's much more robust and the source code is free for academic use.

Related

3D object detection and tracking using Kinect

I am working on identifying an object using a Kinect sensor, so as to get the x, y, z coordinates of the object.
I am trying to find related information on this but have not been able to find much. I have watched some videos as well, but nobody shares the details or any sample code.
This is what I want to achieve https://www.youtube.com/watch?v=nw3yix3XomY
Probably a few people have asked the same question before, but as I am new to the Kinect and these libraries I need a little more guidance.
I read somewhere that object detection is not possible using the Kinect v1 alone; we need to use third-party libraries like OpenCV or the Point Cloud Library (PCL).
Can somebody explain how exactly, even with third-party libraries, I can identify an object via a Kinect sensor?
It would be really helpful.
Thank you.
As the author of the video you linked stated in the comments, following this PCL tutorial will help you. As you have found out already, this may not be possible using the standalone SDK alone. Relying on PCL will help you avoid reinventing the wheel.
The idea there is to (a rough PCL sketch follows this list):
Downsample the cloud to have less data to deal with in the next steps (this also reduces noise a bit).
Identify keypoints/features (i.e. points, areas, textures that remain somehow invariant to some transformations).
Compute the keypoint descriptors, mathematical representations of these features.
For each scene keypoint descriptor, find its nearest neighbour in the model descriptor cloud and add it to the correspondences vector.
Perform clustering on the keypoints and detect the model in the scene.
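A rough sketch of that pipeline with PCL might look like the following. It is a simplified version of the tutorial (a voxel-grid downsample stands in for the tutorial's uniform sampling, and the file names, radii and thresholds are placeholders that will need tuning for your data):

```cpp
// Simplified correspondence-grouping pipeline with PCL (sketch, not production code).
#include <cmath>
#include <vector>
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/correspondence.h>
#include <pcl/filters/voxel_grid.h>
#include <pcl/features/normal_3d_omp.h>
#include <pcl/features/shot_omp.h>
#include <pcl/kdtree/kdtree_flann.h>
#include <pcl/recognition/cg/geometric_consistency.h>

typedef pcl::PointXYZ PointT;
typedef pcl::Normal NormalT;
typedef pcl::SHOT352 DescriptorT;

int main()
{
  pcl::PointCloud<PointT>::Ptr model(new pcl::PointCloud<PointT>);
  pcl::PointCloud<PointT>::Ptr scene(new pcl::PointCloud<PointT>);
  pcl::io::loadPCDFile("model.pcd", *model);   // placeholder file names
  pcl::io::loadPCDFile("scene.pcd", *scene);

  // 1. Downsample both clouds to obtain sparse keypoints (also reduces noise).
  pcl::PointCloud<PointT>::Ptr model_keypoints(new pcl::PointCloud<PointT>);
  pcl::PointCloud<PointT>::Ptr scene_keypoints(new pcl::PointCloud<PointT>);
  pcl::VoxelGrid<PointT> grid;
  grid.setLeafSize(0.01f, 0.01f, 0.01f);
  grid.setInputCloud(model);  grid.filter(*model_keypoints);
  grid.setInputCloud(scene);  grid.filter(*scene_keypoints);

  // 2. Surface normals (required by the SHOT descriptor).
  pcl::PointCloud<NormalT>::Ptr model_normals(new pcl::PointCloud<NormalT>);
  pcl::PointCloud<NormalT>::Ptr scene_normals(new pcl::PointCloud<NormalT>);
  pcl::NormalEstimationOMP<PointT, NormalT> ne;
  ne.setKSearch(10);
  ne.setInputCloud(model);  ne.compute(*model_normals);
  ne.setInputCloud(scene);  ne.compute(*scene_normals);

  // 3. SHOT descriptors at the keypoints.
  pcl::PointCloud<DescriptorT>::Ptr model_descr(new pcl::PointCloud<DescriptorT>);
  pcl::PointCloud<DescriptorT>::Ptr scene_descr(new pcl::PointCloud<DescriptorT>);
  pcl::SHOTEstimationOMP<PointT, NormalT, DescriptorT> shot;
  shot.setRadiusSearch(0.02f);
  shot.setInputCloud(model_keypoints);  shot.setSearchSurface(model);
  shot.setInputNormals(model_normals);  shot.compute(*model_descr);
  shot.setInputCloud(scene_keypoints);  shot.setSearchSurface(scene);
  shot.setInputNormals(scene_normals);  shot.compute(*scene_descr);

  // 4. For each scene descriptor, find the nearest model descriptor.
  pcl::CorrespondencesPtr correspondences(new pcl::Correspondences);
  pcl::KdTreeFLANN<DescriptorT> match_tree;
  match_tree.setInputCloud(model_descr);
  for (std::size_t i = 0; i < scene_descr->size(); ++i)
  {
    if (!std::isfinite(scene_descr->at(i).descriptor[0]))
      continue;  // skip NaN descriptors
    std::vector<int> idx(1);
    std::vector<float> dist(1);
    if (match_tree.nearestKSearch(scene_descr->at(i), 1, idx, dist) == 1 && dist[0] < 0.25f)
      correspondences->push_back(
          pcl::Correspondence(idx[0], static_cast<int>(i), dist[0]));
  }

  // 5. Cluster the correspondences to detect model instances in the scene.
  std::vector<Eigen::Matrix4f, Eigen::aligned_allocator<Eigen::Matrix4f>> poses;
  std::vector<pcl::Correspondences> clustered;
  pcl::GeometricConsistencyGrouping<PointT, PointT> gc;
  gc.setGCSize(0.01f);
  gc.setGCThreshold(5);
  gc.setInputCloud(model_keypoints);
  gc.setSceneCloud(scene_keypoints);
  gc.setModelSceneCorrespondences(correspondences);
  gc.recognize(poses, clustered);   // one pose per detected instance
  return 0;
}
```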
The software in the tutorial requires the user to manually feed in the model and scene files; it does not work on a live feed like the video you linked.
The process should be pretty similar, though. I'm not sure how CPU-intensive the detection is, so it might require additional performance tweaking.
Once you have frame-by-frame detection in place, you could start thinking about actually tracking an object across the frames. But that's another topic.

Active Shape Models vs Active Appearance Models

I am implementing ASM/AAM in OpenCV for segmentation of face images (to be further used in face recognition). I am pretty much done with the canonical implementation of ASM (as per T. Cootes' papers), and the result I get is not ideal: it does not always converge, and when it does, some boundaries are not captured, which I believe is a problem in the modelling of the local structure, i.e. gradient profile matching.
Now I am a bit unsure what to do next. ASM is a simpler and computationally less intensive algorithm than AAM. Should I continue improving ASM (say, by using 2D profiles rather than 1D profiles, or different profile structures for different types of landmarks), or get my hands straight on AAM?
Edit: Also, what papers could you recommend that improve on the original work by T. Cootes? I know there are many of them, but maybe there are techniques that are considered canonical today?
You can find clarifications and an implemented AAM with 2D profiles in the book "Mastering OpenCV with Practical Computer Vision Projects" (Packt Publishing, 2012). Many of the projects described in this book are open source and can be downloaded from GitHub. They are more advanced than T. Cootes' implementation.
I can say that AAM (for an existing implementation you can also look at VOSM) has good convergence (better than ASM) only if you train it on the same person (very good results, for example, on the FRANCK (Talking Face Video) sequence); in other cases ASM works better.

Facial expression detection

I am currently working on a project where I have to extract the facial expression of a user (only one user at a time) like sad or happy.
There are a lot of programs/APIs for face detection, but I did not find any that do automatic expression recognition.
The best possibility I found so far is Luxand FaceSDK, which gives me access to 66 different points within the face, so I would still have to map them to expressions manually.
I used OpenCV for face detection earlier, which was working great, so if anyone has tips on how to do this with OpenCV, that would be great!
Any programming language is welcome (Java preferred).
Some user on an OpenCV board suggested looking into AAM (active appearance models) and ASM (active shape models), but all I found were papers.
You are looking for a machine learning solution. FaceSDK looks like a good feature extractor, but I don't think there is an off-the-shelf library that solves your specific problem. Your best bet is to:
choose a machine learning framework (SVM, PCA) with a Java implementation
take a series of photos and label them yourself with the target expression (happy or sad)
compute your model and test it
This involves some knowledge about machine learning.
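As a rough illustration (shown here in C++ with OpenCV's cv::ml module, though the same idea applies to any Java SVM library), training a happy/sad classifier on hand-labelled landmark vectors could look something like this; the feature layout, sample counts and random training data are placeholders:

```cpp
// Sketch: train a binary "happy vs. sad" SVM on flattened landmark features.
// Assumes each sample is a vector of 66 (x, y) landmarks (132 floats) obtained
// from a landmark extractor such as FaceSDK; all data below is placeholder.
#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>

int main()
{
    cv::Mat trainData(200, 132, CV_32F);           // one row per labelled photo
    cv::Mat labels(200, 1, CV_32S);                // +1 = happy, -1 = sad
    cv::randu(trainData, 0.0, 1.0);                // placeholder: use real landmark vectors
    for (int i = 0; i < labels.rows; ++i)
        labels.at<int>(i) = (i < 100) ? 1 : -1;    // placeholder: your hand-made labels

    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);
    svm->train(trainData, cv::ml::ROW_SAMPLE, labels);

    // To classify a new face, extract its 132-value landmark vector the same way.
    cv::Mat sample = cv::Mat::zeros(1, 132, CV_32F);
    float prediction = svm->predict(sample);       // returns +1 or -1
    (void)prediction;
    return 0;
}
```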

Professional Object Tracking Solution

I want to build video-based tracking software. I can manage the control and display parts quite easily, but the actual object tracking in a video stream is very difficult (color tracking is not an option).
Solutions like OpenCV would probably require a very long learning curve, which I can't afford ATM.
Are there professional packages which expose a simple API for object tracking? C# and C++ are the preferred languages, but others would be fine as well. Price is less of an issue.
The Computer Vision System Toolbox for MATLAB provides tracking functionality. Please check out the following examples:
Tracking a face
Tracking multiple objects
Generally, a lot depends on the specific problem you are trying to solve. Is the camera moving or stationary? Do you need to track a single object or multiple objects? Does your object have a distinctive color or texture? Does your object move in some predictable way?
Use OpenTLD. It tracks almost anything, but only one object at a time. The code is in MATLAB.
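If you do end up with OpenCV anyway, its contrib tracking module exposes a fairly simple tracker API. A minimal sketch using its TLD implementation (assuming the OpenCV 3.3+ contrib API; the video path and initial bounding box are placeholders) might look like this:

```cpp
// Sketch: single-object tracking with OpenCV's contrib tracking module.
#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/tracking.hpp>

int main()
{
    cv::VideoCapture cap("input.avi");      // placeholder video file
    cv::Mat frame;
    cap >> frame;

    cv::Rect2d box(100, 100, 80, 80);       // placeholder initial object location
    cv::Ptr<cv::Tracker> tracker = cv::TrackerTLD::create();
    tracker->init(frame, box);

    while (cap.read(frame))
    {
        if (tracker->update(frame, box))
        {
            // box now holds the tracked object's bounding rectangle in this frame
        }
    }
    return 0;
}
```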

What are the steps in object detection?

I'm new to image processing and I want to do a project in object detection. So help me by suggesting a step-by-step procedure for this project. Thanks.
Object detection is a very complex problem that involves some real hardcore math and long tuning of the parameters of the methods involved. Your best bet is to use a freely available library for that - Google will help.
There are a lot of algorithms on this theme and no single one is the best of all. It is usually a mixture of them that makes the best solution for a given problem.
For example, for object movement detection you could look at frame differencing and mixture of Gaussians.
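A minimal sketch of mixture-of-Gaussians movement detection with OpenCV (the video path is a placeholder) could look like this:

```cpp
// Sketch: movement detection with OpenCV's mixture-of-Gaussians background subtractor.
#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/video/background_segm.hpp>

int main()
{
    cv::VideoCapture cap("input.avi");   // placeholder video file
    cv::Ptr<cv::BackgroundSubtractorMOG2> mog2 = cv::createBackgroundSubtractorMOG2();

    cv::Mat frame, foregroundMask;
    while (cap.read(frame))
    {
        // Pixels that differ from the learned background model become foreground.
        mog2->apply(frame, foregroundMask);
        // foregroundMask is now a mask of moving regions for further processing.
    }
    return 0;
}
```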
Also, it depends heavily on your application, the environment (i.e. noise, signal quality), the processing capacity you have available, the allowable error margin...
Besides, for it to work, most of the time it is first necessary to apply some image processing to the input data, such as a median filter, a Sobel filter, contrast enhancement, and so on.
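That kind of preprocessing with OpenCV might look roughly like this (the image path and filter parameters are placeholders to be tuned):

```cpp
// Sketch: typical preprocessing steps before detection, using OpenCV.
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

int main()
{
    cv::Mat input = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);  // placeholder image

    cv::Mat denoised, edges, enhanced;
    cv::medianBlur(input, denoised, 5);          // median filter (noise removal)
    cv::Sobel(denoised, edges, CV_16S, 1, 0);    // Sobel filter (horizontal gradients)
    cv::equalizeHist(denoised, enhanced);        // simple contrast enhancement
    return 0;
}
```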
I think you should start reading all you can: books, Google and, very importantly, a lot of papers about the subjects you are interested in (many are freely available on the internet).
First of all, though, I think it's fundamental (at least it has been for me) to have a good library for testing. The one I have used, and still use, is OpenCV. It's very complete, implements many of the more advanced algorithms in current use, is very active, has a big community and it's free.
Open Computer Vision Library (OpenCV)
Good luck ;)
Take a look at AForge.NET. It's nowhere near Project Natal's levels of accuracy or usefulness, but it does give you the tools to learn the algorithms easily. It's an image processing and AI library and there are several tutorials on colored object tracking and motion detection.
Another one to look at is OpenCV from Intel. I believe it's a bit more advanced, but it's written in C.
Take a look at this. It might get you started in this complex field. The algorithm pages that it links to are interesting reading.
http://sun-valley.stanford.edu/projects/helicopters/final.html
This lecture by Jeff Hawkins will give you an idea about the state of the art in this super-difficult field.
It seems that video has disappeared... but this video should cover similar ground.
