3D object detection and tracking using Kinect - OpenCV

I am working on identifying an object using a Kinect sensor, so as to get the x,y,z coordinates of the object.
I am trying to find related information but have not been able to find much. I have seen the videos as well, but nobody shares the details or any sample code.
This is what I want to achieve: https://www.youtube.com/watch?v=nw3yix3XomY
A few people have probably asked the same question, but as I am new to the Kinect and these libraries I need a little more guidance.
I read somewhere that object detection is not possible using Kinect v1, and that we need to use third-party libraries like OpenCV or the Point Cloud Library (PCL).
Can somebody explain how exactly, even by using third-party libraries, I can identify an object via a Kinect sensor?
It will be really helpful.
Thank you.

As the author of the video you linked stated in the comments, following this PCL tutorial will help you. As you have already found out, this may not be possible using the standalone SDK. Relying on PCL will help you avoid reinventing the wheel.
The idea there is to (a rough PCL sketch follows this list):
Downsample the cloud to have less data to deal with in the next steps (this also reduces noise a bit).
Identify keypoints/features (i.e. points, areas, textures that remain somehow invariant to some transformations).
Compute the keypoint descriptors, mathematical representations of these features.
For each scene keypoint descriptor, find the nearest neighbor in the model keypoint descriptor cloud and add it to the correspondences vector.
Perform clustering on the keypoints and detect the model in the scene.
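For a sense of what those steps look like in code, here is a stripped-down, offline sketch loosely following that PCL correspondence-grouping tutorial. It is not the tutorial's exact code: the file names, leaf size, search radii and thresholds are placeholders you would have to tune, and it uses a plain VoxelGrid downsample where the tutorial uses UniformSampling keypoints.

```cpp
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>
#include <pcl/features/normal_3d_omp.h>
#include <pcl/features/shot_omp.h>
#include <pcl/kdtree/kdtree_flann.h>
#include <pcl/recognition/cg/geometric_consistency.h>
#include <cmath>

typedef pcl::PointXYZRGBA PointT;

int main()
{
  pcl::PointCloud<PointT>::Ptr model(new pcl::PointCloud<PointT>);
  pcl::PointCloud<PointT>::Ptr scene(new pcl::PointCloud<PointT>);
  // Placeholder files; the tutorial feeds these in manually as well.
  if (pcl::io::loadPCDFile("model.pcd", *model) < 0 ||
      pcl::io::loadPCDFile("scene.pcd", *scene) < 0)
    return 1;

  // 1. Downsample both clouds (less data, less noise).
  pcl::PointCloud<PointT>::Ptr model_keys(new pcl::PointCloud<PointT>);
  pcl::PointCloud<PointT>::Ptr scene_keys(new pcl::PointCloud<PointT>);
  pcl::VoxelGrid<PointT> grid;
  grid.setLeafSize(0.01f, 0.01f, 0.01f);          // leaf size to tune
  grid.setInputCloud(model);  grid.filter(*model_keys);
  grid.setInputCloud(scene);  grid.filter(*scene_keys);

  // 2-3. Normals on the full clouds, SHOT descriptors at the keypoints.
  pcl::PointCloud<pcl::Normal>::Ptr model_normals(new pcl::PointCloud<pcl::Normal>);
  pcl::PointCloud<pcl::Normal>::Ptr scene_normals(new pcl::PointCloud<pcl::Normal>);
  pcl::NormalEstimationOMP<PointT, pcl::Normal> ne;
  ne.setKSearch(10);
  ne.setInputCloud(model);  ne.compute(*model_normals);
  ne.setInputCloud(scene);  ne.compute(*scene_normals);

  pcl::PointCloud<pcl::SHOT352>::Ptr model_descr(new pcl::PointCloud<pcl::SHOT352>);
  pcl::PointCloud<pcl::SHOT352>::Ptr scene_descr(new pcl::PointCloud<pcl::SHOT352>);
  pcl::SHOTEstimationOMP<PointT, pcl::Normal, pcl::SHOT352> shot;
  shot.setRadiusSearch(0.02f);                     // descriptor radius to tune
  shot.setInputCloud(model_keys); shot.setInputNormals(model_normals);
  shot.setSearchSurface(model);   shot.compute(*model_descr);
  shot.setInputCloud(scene_keys); shot.setInputNormals(scene_normals);
  shot.setSearchSurface(scene);   shot.compute(*scene_descr);

  // 4. For each scene descriptor, find the nearest model descriptor.
  pcl::CorrespondencesPtr corrs(new pcl::Correspondences);
  pcl::KdTreeFLANN<pcl::SHOT352> search;
  search.setInputCloud(model_descr);
  for (std::size_t i = 0; i < scene_descr->size(); ++i)
  {
    if (!std::isfinite(scene_descr->at(i).descriptor[0])) continue;  // skip NaNs
    std::vector<int> idx(1);
    std::vector<float> dist(1);
    if (search.nearestKSearch(scene_descr->at(i), 1, idx, dist) == 1 && dist[0] < 0.25f)
      corrs->push_back(pcl::Correspondence(idx[0], static_cast<int>(i), dist[0]));
  }

  // 5. Cluster the correspondences to detect model instances in the scene.
  std::vector<Eigen::Matrix4f, Eigen::aligned_allocator<Eigen::Matrix4f> > poses;
  std::vector<pcl::Correspondences> clustered;
  pcl::GeometricConsistencyGrouping<PointT, PointT> gc;
  gc.setGCSize(0.01f);
  gc.setGCThreshold(5);
  gc.setInputCloud(model_keys);
  gc.setSceneCloud(scene_keys);
  gc.setModelSceneCorrespondences(corrs);
  gc.recognize(poses, clustered);                  // one pose per detected instance
  return 0;
}
```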
The software in the tutorial needs the user to manually feed in the model and scene files. It doesn't work on a live feed, as in the video you linked.
The process should be pretty similar, though. I'm not sure how CPU-intensive the detection is, so it might require additional performance tweaking.
Once you have frame-by-frame detection in place, you could start thinking about actually tracking an object across the frames. But that's another topic.

Related

Add 2D or 3D Face Filters like MSQRD/SnapChat Using Google Vision API for iOS

Here's some research I have done so far:
- I have used Google Vision API to detect various face landmarks.
Here's the reference: https://developers.google.com/vision/introduction
Here's the link to sample code to get the facial landmarks; it uses the same Google Vision API: https://github.com/googlesamples/ios-vision
I have gone through various blogs on the internet which say MSQRD is based on Google's Cloud Vision. Here's the link: https://medium.com/#AlexioCassani/how-to-create-a-msqrd-like-app-with-google-cloud-vision-802b578b30a0
For Android here's the reference:
https://www.raywenderlich.com/158580/augmented-reality-android-googles-face-api
There are multiple paid SDKs which fulfill the purpose, but they are highly priced, so I can't afford them.
For instance:
1) https://deepar.ai/contact/
2) https://www.luxand.com/
Some might see this question as a duplicate of this one:
Face filter implementation like MSQRD/SnapChat
But the thread is almost 1.6 years old with no right answers to it.
I have gone through this article:
https://dzone.com/articles/mimic-snapchat-filters-programmatically-1
It describes all the essential steps to achieve the desired results, but they advise using their own SDK.
As per my research, there is no good material around that helps achieve results like the MSQRD face filters.
There is one more GitHub repository with a similar implementation, but it doesn't give much information about it:
https://github.com/rootkit/LiveFaceMask
Now my question is:
If we have the facial landmarks using the Google Vision API (or even using dlib), how can I add 2D or 3D models over them? In which format does this need to be done - does it require X,Y coordinates with vertex calculations?
NOTE: I have gone through Google's "GooglyEyesDemo", which adds a preview layer over the eyes. It basically adds a view over the face, so I don't want to add one-dimensional UIView preview layers over it. Image attached for reference:
https://developers.google.com/vision/ios/face-tracker-tutorial
Creating models: I also want to know how to create models for live filters like MSQRD. I welcome any software or format recommendations.
I hope the research I have done will help others, and that someone else's experience helps me achieve the desired results. Let me know if any more details are required.
Image attached for more reference:
Thanks
Harry
The Canvas class is used in Android for drawing such 2D/3D models; on iOS, Core Graphics can be used.
What you can do is detect the face components, take their location points and draw images on top of them (a rough OpenCV sketch of the overlay idea follows). Consider going through this
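To make the "draw images on top of the landmark points" idea concrete, here is a hedged OpenCV/C++ sketch rather than the Android Canvas or iOS Core Graphics API. The function name, the 4-channel filter image and the eye coordinates are hypothetical placeholders; the landmarks would come from whatever detector you use (Google Vision, dlib, ...).

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <cmath>

// Illustrative only: given two landmark points (e.g. eye centres from any
// face detector), paste a 4-channel RGBA "filter" image over a BGR frame.
// The filter image would be loaded with cv::IMREAD_UNCHANGED to keep alpha.
void overlayFilter(cv::Mat& frame, const cv::Mat& filterRgba,
                   cv::Point leftEye, cv::Point rightEye)
{
  // Scale the overlay relative to the eye distance and centre it between the eyes.
  cv::Point d = rightEye - leftEye;
  double eyeDist = std::sqrt(double(d.x * d.x + d.y * d.y));
  double scale = (eyeDist * 2.0) / filterRgba.cols;   // 2.0 is an arbitrary factor
  cv::Mat scaled;
  cv::resize(filterRgba, scaled, cv::Size(), scale, scale);

  cv::Point centre((leftEye.x + rightEye.x) / 2, (leftEye.y + rightEye.y) / 2);
  int x0 = centre.x - scaled.cols / 2;
  int y0 = centre.y - scaled.rows / 2;

  // Per-pixel alpha blend. No rotation or perspective here; a true 3D filter
  // would also need the head pose (see the solvePnP sketch below).
  for (int y = 0; y < scaled.rows; ++y)
    for (int x = 0; x < scaled.cols; ++x)
    {
      int fx = x0 + x, fy = y0 + y;
      if (fx < 0 || fy < 0 || fx >= frame.cols || fy >= frame.rows) continue;
      cv::Vec4b src = scaled.at<cv::Vec4b>(y, x);
      float a = src[3] / 255.0f;
      cv::Vec3b& dst = frame.at<cv::Vec3b>(fy, fx);
      for (int c = 0; c < 3; ++c)
        dst[c] = static_cast<uchar>(a * src[c] + (1.0f - a) * dst[c]);
    }
}
```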
You need to either predict x,y,z coordinates (check out this demo), or use x,y predictions and then find the parameters of a universal 3D model and camera that give the closest projection of the current x,y.
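The second option (fitting a universal 3D model so its projection matches the 2D landmarks) is essentially a pose-estimation problem, and OpenCV's cv::solvePnP can do it. A rough sketch, where the generic 3D face points, the 2D landmark values and the camera intrinsics are all illustrative assumptions:

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

// Estimate the head pose from 2D landmarks against a generic 3D face model,
// then render the 3D filter with that pose. The 3D points are rough
// generic-face values (in mm); the 2D landmarks are placeholder numbers.
int main()
{
  std::vector<cv::Point3f> model3d = {
    {   0.f,    0.f,    0.f},   // nose tip
    {   0.f,  -63.6f, -12.5f},  // chin
    { -43.3f,  32.7f, -26.f},   // left eye outer corner
    {  43.3f,  32.7f, -26.f},   // right eye outer corner
    { -28.9f, -28.9f, -24.1f},  // left mouth corner
    {  28.9f, -28.9f, -24.1f}   // right mouth corner
  };
  // Corresponding 2D landmarks from your face detector (placeholder values).
  std::vector<cv::Point2f> image2d = {
    {320.f, 240.f}, {325.f, 340.f}, {260.f, 200.f},
    {380.f, 200.f}, {285.f, 300.f}, {355.f, 300.f}
  };

  // Approximate pinhole camera: focal length = image width, principal point = centre.
  double f = 640.0;
  cv::Mat K = (cv::Mat_<double>(3, 3) << f, 0, 320, 0, f, 240, 0, 0, 1);

  cv::Mat rvec, tvec;
  cv::solvePnP(model3d, image2d, K, cv::Mat(), rvec, tvec);
  // rvec/tvec describe the head pose; feed them to your renderer (SceneKit,
  // OpenGL, ...) to place and orient the 3D filter model.
  std::cout << "rotation: " << rvec.t() << "\ntranslation: " << tvec.t() << std::endl;
  return 0;
}
```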

Using OpenCV with no image processing background to detect objects on a pavement and avoid them

I'm a Software Engineering student in the last year of a 4-year bachelor's degree program, and I'm required to work on a graduation project of my own choice.
We are trying to find a way to notify the user of anything that gets in his/her way while walking. This will be implemented as an Android application, so we have the ability to use the camera. We thought of image processing and computer vision, but neither I nor any of my group members have an image processing background. We searched a little and found out about OpenCV.
So my question is: do I need any special background to work with OpenCV? And is computer vision a good choice for the objective of my project? If not, what alternatives do you advise me to use?
I appreciate your help.. thanks in advance!
At first glance I would use two standard cameras to compute a depth image - stereo vision (similar to the MS Kinect depth sensor).
From that it would be easy to set a threshold at some distance.
Those algorithms are very CPU-hungry, so I do not think it will work on Android (although I have zero experience there).
If you must use Android, I would look for a depth sensor (to avoid extracting depth data from two images).
For prototyping I would use MATLAB (or Octave), then I would switch to OpenCV (pointers, memory allocations, blah...).
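As a rough illustration of the "stereo depth + distance threshold" idea, here is a minimal OpenCV/C++ sketch using block matching on an already rectified stereo pair. The file names, the disparity threshold and the 5% trigger are placeholder values, and a real system would convert disparity to metric distance using the calibration.

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
  // Left/right frames from two calibrated and rectified cameras (placeholder files).
  cv::Mat left  = cv::imread("left.png",  cv::IMREAD_GRAYSCALE);
  cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);
  if (left.empty() || right.empty()) return 1;

  // Block-matching stereo: 64 disparity levels, 21x21 matching block.
  cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(64, 21);
  cv::Mat disparity16, disparity;
  bm->compute(left, right, disparity16);           // 16-bit fixed-point disparities
  disparity16.convertTo(disparity, CV_32F, 1.0 / 16.0);

  // Larger disparity = closer object. Flag anything nearer than a threshold;
  // the value 40 and the 5% trigger are arbitrary and depend on the baseline.
  cv::Mat nearMask = disparity > 40.0f;
  double nearFraction = static_cast<double>(cv::countNonZero(nearMask)) / nearMask.total();
  if (nearFraction > 0.05)
    std::cout << "Obstacle ahead!" << std::endl;
  return 0;
}
```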

Using OpenCV to find people who wear a certain hat

I would like to use computer vision to do the following:
A camera is mounted outside a building, capturing a videostream of the street below. The camera is installed approximately 5-6 meters above the street.
Whenever a person wearing a certain kind of hat (white, round) is captured by the camera, an event should be triggered.
Which algorithm should I look into to implement this kind of behavior ?
Is this best achieved through training the algorithm with sample data or is there another way to tell it to look for this type of hat ?
Also, how do I use multiple frames of video to increase the quality of detection ?
Edit: Added a picture of the hat
Before we do everything in comments I will start an answer here.
The first link you posted describes a simple color-based detection. You can try that, but it will fail if there are other pixel clusters of similar color in the image. Your idea of combining it with tracking is good: Identify clusters, build trajectories over several images, and only accept plausible trajectories as a hit. For robust tracking you may want to look into Kalman filtering. A problem you will most likely encounter is that a "white" hat will hardly be "white" in the images your camera delivers.
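A minimal OpenCV/C++ sketch of that color-based detection stage (the HSV bounds, blob-size window and camera index are assumptions you would have to tune for your installation; the tracking/Kalman part is only hinted at in a comment):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
  cv::VideoCapture cap(0);                 // the street camera; index is a placeholder
  cv::Mat frame, hsv, mask;
  while (cap.read(frame))
  {
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);

    // "White" = low saturation, high value. The exact bounds have to be tuned
    // to your camera and lighting, which is the hard part mentioned above.
    cv::inRange(hsv, cv::Scalar(0, 0, 200), cv::Scalar(180, 40, 255), mask);

    // Clean the mask up a bit and find candidate blobs.
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN, cv::Mat());
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (const auto& c : contours)
    {
      double area = cv::contourArea(c);
      if (area < 100 || area > 2000) continue;     // size window for a hat at 5-6 m
      cv::Rect box = cv::boundingRect(c);
      // Feed box centres into a tracker (e.g. cv::KalmanFilter) across frames
      // and only trigger the event for plausible trajectories.
      cv::rectangle(frame, box, cv::Scalar(0, 255, 0), 2);
    }
    cv::imshow("hat candidates", frame);
    if (cv::waitKey(1) == 27) break;               // Esc quits
  }
  return 0;
}
```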
The second link you refer to - boosted Classifiers Based on Haar-like Features - is for detection of more complex objects. It probably won't help you find white blobs. Invest your time and energy in learning about tracking.
I'm happy to repeat myself here: "solving a computer vision problem" is not something like "sorting an array". OpenCV is not the C++ Standard Library. You can use an std::map without knowing anything about red-black trees, but (IMHO) you can't use vision APIs without knowing a good deal of the math and theory. Working computer vision solutions are typically heavily tuned towards the specific problem scenario. Sorry if that sounds pedantic, but it explains why your question got beaten up.

Face Recognition using Kinect

I went through the Kinect SDK and Toolkit provided by Microsoft and tested the Face Detection sample; it worked successfully. But how do I recognize faces? I know the basics of OpenCV (VS2010). Are there any Kinect libraries for face recognition? If not, what are the possible solutions? Are there any tutorials available for face recognition using the Kinect?
I've been working on this myself. At first I just used the Kinect as a webcam and passed the data into a recognizer modeled after this code (which uses Emgu CV to do PCA):
http://www.codeproject.com/Articles/239849/Multiple-face-detection-and-recognition-in-real-ti
While that worked OK, I thought I could do better since the Kinect has such awesome face tracking. I ended up using the Kinect to find the face boundaries, crop it, and pass it into that library for recognition. I've cleaned up the code and put it out on github, hopefully it'll help someone else:
https://github.com/mrosack/Sacknet.KinectFacialRecognition
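The repository above is C#/Emgu CV. Purely as an illustration of the same "crop the tracked face, feed it to a PCA recognizer" idea in plain OpenCV, here is a hedged C++ sketch; it assumes the opencv_contrib face module, and the face rectangle would come from the Kinect face tracker rather than a detector.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/face.hpp>   // opencv_contrib "face" module

// The recognizer would be created with cv::face::EigenFaceRecognizer::create()
// and trained once on 128x128 grayscale enrolment crops; faceRect is the
// rectangle reported by the Kinect face tracker for the current colour frame.
int recognizeFace(const cv::Ptr<cv::face::FaceRecognizer>& model,
                  const cv::Mat& colorFrame, const cv::Rect& faceRect)
{
  // Crop the tracked face and normalise it to the training size/format.
  cv::Mat face;
  cv::cvtColor(colorFrame(faceRect), face, cv::COLOR_BGR2GRAY);
  cv::resize(face, face, cv::Size(128, 128));

  int label = -1;        // numeric id of the best-matching person
  double distance = 0.0; // smaller = closer match; threshold this for "unknown"
  model->predict(face, label, distance);
  return label;
}
```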
I've found a project which could be a good source for you - http://code.google.com/p/i-recognize-you/ - but unfortunately (for you) its homepage is not in English. The most important parts:
- The project (with source code) is at http://code.google.com/p/i-recognize-you/downloads/list
- In the bibliography the author mentioned this site: http://www.shervinemami.info/faceRecognition.html. This seems to be a good starting point for you.
There is no built-in functionality for the Kinect that provides face recognition. I'm not aware of any tutorials out there that do it, but I'm sure someone has tried. It is on my short list; hopefully time will allow soon.
I would try saving the face tracking information and doing a comparison with that for recognition. You would have a "setup" function that asks the user to stare at the Kinect and saves the points the face tracker returns to you. When you wish to recognize a face, the user would look at the screen and you would compare the face tracker points to a database of faces. This is roughly how the Xbox does it.
The big trick is confidence levels. Numbers will not come back exactly as they did previously, so you will need to include buffers of values for each feature -- the code would then come back with "I'm 93% sure this is Bob".
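A small, hypothetical C++ sketch of that "compare tracked points against a database, with a confidence buffer" idea; the function name, the distance measure and the scaling that turns the distance into a percentage are all assumptions, not part of any Kinect API.

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <limits>
#include <map>
#include <string>
#include <vector>

// 'face' holds the points the face tracker returned for the current user;
// 'db' maps a person's name to the points saved during their "setup" run.
std::string recognize(const std::vector<cv::Point2f>& face,
                      const std::map<std::string, std::vector<cv::Point2f> >& db,
                      double& confidence)
{
  std::string best = "unknown";
  double bestDist = std::numeric_limits<double>::max();

  for (const auto& entry : db)
  {
    double dist = 0.0;
    std::size_t n = std::min(face.size(), entry.second.size());
    for (std::size_t i = 0; i < n; ++i)
    {
      cv::Point2f d = face[i] - entry.second[i];
      dist += std::sqrt(d.x * d.x + d.y * d.y);    // accumulate point-to-point error
    }
    if (dist < bestDist) { bestDist = dist; best = entry.first; }
  }

  // Turn the error into an "I'm 93% sure this is Bob" style score; the
  // scale factor acts as the buffer mentioned above and must be tuned.
  confidence = std::max(0.0, 1.0 - bestDist / 100.0);
  return confidence > 0.5 ? best : "unknown";
}
```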

Using Augmented Reality libraries for Academic Project

I'm planning on doing my Final Year Project of my degree on Augmented Reality. It will be using markers and there will also be interaction between virtual objects. (sort of a simulation).
Do you recommend using libraries like ARToolkit, NyARToolkit, or osgART for such a project, since they come with all the functions for tracking, detection, calibration, etc.? Will there be much work left from the programmer's point of view?
What do you think if I use OpenCV and do the marker detection, recognition, calibration and other steps from scratch? Will that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if you want it to return reliable results. The effort also depends on which kind of markers you plan to use. ARToolkit, for example, compares the marker content detected in the video stream to images you defined earlier as markers. Hence it tries to match images and returns a probability that a certain part of the video stream is a predefined marker. Depending on the threshold you use and the lighting situation, markers are not always recognized correctly. Then there are other markers like Data Matrix, QR codes and frame markers (used by QCAR) that encode an id optically, so no image matching is required; all necessary data can be retrieved from the video stream. Then there are more complex approaches like natural feature tracking, where you can use predefined images, given that they offer enough contrast and points of interest so they can be recognized later by the tracker.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
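To give a feeling for what even the first stage of a from-scratch marker tracker involves, here is a small OpenCV/C++ sketch that only finds dark, convex quadrilaterals as marker candidates. The thresholding parameters and the area cut-off are arbitrary, and a real tracker would still have to rectify each candidate, match or decode its content, and estimate the camera pose from its corners.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
  cv::VideoCapture cap(0);
  cv::Mat frame, gray, bin;
  while (cap.read(frame))
  {
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::adaptiveThreshold(gray, bin, 255, cv::ADAPTIVE_THRESH_GAUSSIAN_C,
                          cv::THRESH_BINARY_INV, 31, 7);

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(bin, contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);

    for (const auto& c : contours)
    {
      // Keep only contours that simplify to a large convex quadrilateral.
      std::vector<cv::Point> quad;
      cv::approxPolyDP(c, quad, 0.05 * cv::arcLength(c, true), true);
      if (quad.size() == 4 && cv::isContourConvex(quad) && cv::contourArea(quad) > 500)
        cv::drawContours(frame, std::vector<std::vector<cv::Point> >{quad}, -1,
                         cv::Scalar(0, 0, 255), 2);
    }
    cv::imshow("marker candidates", frame);
    if (cv::waitKey(1) == 27) break;
  }
  return 0;
}
```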
I suggest you use OpenCV: you will find high-quality algorithms and it is fast. They are continuously developing new methods, so soon it will be possible to run them in real time on mobile devices.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.
