Detecting whether someone is speaking in a video - opencv

I'm trying to figure out how to detect whether a human that I've identified in a video is speaking. I'm using some of the multi-person multi-camera tracking code posted here to detect individuals, and I want to determine whether someone identified is speaking at any given time. Is anyone aware of good CV projects that might be able to do this? I've trawled the action recognition literature a bit but haven't found anything that seems to directly address this. Detection of speaking needs to be done from video alone.

There is an implementation of face pose estimation in an open source library.
As you can see from the figure in that example, there are landmark lines drawn around the lips. By digging into the example's source code you can track the movement of the lips: if you run the example in your own environment, you will see that the lines covering the lips move along with them.
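The example is essentially fitting facial landmarks (e.g. dlib's 68-point model, where points 48-67 outline the mouth). As a rough sketch of the idea in Python, assuming dlib and its separately downloaded shape_predictor_68_face_landmarks.dat model, you can measure how much the mouth opening fluctuates over time; the window size and threshold below are guesses you would need to tune:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mouth_opening(gray, face):
    """Vertical gap between inner lips, normalized by face height."""
    shape = predictor(gray, face)
    top, bottom = shape.part(62), shape.part(66)  # inner upper/lower lip
    return (bottom.y - top.y) / float(face.bottom() - face.top())

cap = cv2.VideoCapture("input.mp4")  # hypothetical input video
openings = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if faces:
        openings.append(mouth_opening(gray, faces[0]))

# A mouth that opens and closes a lot over a short window suggests speech.
window = np.array(openings[-30:])
if window.size and window.std() > 0.02:  # threshold needs tuning per setup
    print("probably speaking")
```

This only tracks the first detected face; for the multi-person case you would run the landmark fit inside each tracked person's bounding box.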

Related

I want to detect motion/movement in the live camera feed. How can I do it?

I'm creating a motion detection app for iOS. When the camera is live and any object passes in front of it, like a person or animal, I want to detect that motion. How is this possible?
I suggest you get familiar with the AVFoundation framework to understand how to get live video frames using the camera of an iOS device. A good starting point is Apple's famous sample AVCam, which should get you familiar with all the camera concepts.
As the next step, figure out how to do the movement detection. The simplest algorithm for that is background subtraction. The idea is to subtract two consecutive frames from one another: the areas without movement cancel each other out and become black, while the areas with movement show nonzero values.
Here's an example of background subtraction in the OpenCV framework.
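For illustration, here is a minimal frame-differencing sketch with OpenCV in Python (the equivalent C++ calls exist for an iOS integration); the threshold and pixel-count values are arbitrary starting points:

```python
import cv2

cap = cv2.VideoCapture(0)  # default camera; use a file path for a video
ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Subtract consecutive frames: static areas cancel out to ~0
    diff = cv2.absdiff(gray, prev_gray)
    # Keep only significant changes (30 is an arbitrary threshold)
    _, motion_mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(motion_mask) > 500:  # tune for your resolution/scene
        print("motion detected")
    prev_gray = gray
```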
If, in the end, you decide to use OpenCV (a classic computer vision framework that I definitely recommend), then you'll need to integrate OpenCV into your iOS app. You can see a short tutorial here.
I've tried to give you some pointers to get you going. The problem (as you've presented it) is definitely not an easy one, so good luck!

Structure from Motion (SfM) in a tunnel-like structure?

I have a very specific application in which I would like to try structure from motion to get a 3D representation. So far, all the software/code samples I have found for structure from motion work like this: "a fixed object is photographed from all angles to create the 3D model". This is not my case.
In my case, the camera is moving in the middle of a corridor and looking forward. Sometimes, the camera can look in other directions (left, right, up, down). The camera never goes back or looks back; it always moves forward. Since the corridor is small, almost everything is visible (no hidden spots). The corridor can sometimes be very long.
I have tried this software and it doesn't work in my particular case (but it's fantastic with normal use). Can anybody suggest a library/software/tool/paper that targets my specific needs? Or have you ever needed to implement something like this? Any help is welcome!
Thanks!
What kind of corridors are you talking about and what kind of precision are you aiming for?
A priori, I don't see why your corridor would not be a fixed object photographed from different angles. The quality of your reconstruction might suffer if you only look forward and you can't get many different views of the scene, but standard methods should still work. Are you sure that the programs you used aren't failing because of your picture quality, arrangement or other reasons?
If you have to do the reconstruction yourself, I would start by
1) Calibrating your camera
2) Undistorting your images
3) Matching feature points in subsequent image pairs
4) Extracting a 3D point cloud for each image pair
You can then orient the point clouds with respect to one another, for example via ICP between two subsequent clouds. More sophisticated methods might not yield much difference if you don't have any closed loops in your dataset (as your camera is only moving forward).
OpenCV and the Point Cloud Library should be everything you need for these steps. Visualization might be more of a hassle, but the pretty pictures are what you pay for in commercial software after all.
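To make steps 3) and 4) concrete, here is a minimal two-view sketch in Python with OpenCV, assuming the camera matrix K is already known from step 1) (the values below are placeholders) and the input frames are already undistorted grayscale images:

```python
import cv2
import numpy as np

# Placeholder intrinsics; in practice take these from cv2.calibrateCamera
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

def two_view_cloud(img1, img2):
    # 3) Match feature points between the image pair
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Estimate the relative pose from the essential matrix
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # 4) Triangulate a 3D point cloud for the pair (up to scale)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T  # N x 3 points
```

Chaining these pairwise clouds is where the ICP (or a full bundle adjustment) comes in.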
Edit (2017/8): I haven't worked on this in the meantime, but I feel like this answer is missing some pieces. If I had to answer it today, I would definitely suggest looking into the keyword monocular SLAM, which has recently seen a lot of activity, not least because of drones with cameras. Notably, LSD-SLAM is open source and may not be as vulnerable to feature-deprived views, as it operates directly on the intensity. There even seem to be approaches combining inertial/odometry sensors with the image matching algorithms.
Good luck!
FvD is right in the sense that your corridor is a static object. Your scenario is the same as moving around an object and taking images from multiple views; your views just aren't arranged to provide a 360-degree view of the object.
I see you mentioned in a previous comment that the data comes from a video. In that case, the problem could very well be the camera calibration. The camera calibration tells the SfM algorithm about the internal parameters of the camera (focal length, principal point, lens distortion, etc.). In the absence of this knowledge, the bundler in VSfM uses information from the EXIF data of the images. However, I don't think video stores any EXIF information (not 100% sure). As a result, I think the entire algorithm is running with bad focal length information and cannot solve for the orientation.
Can you extract a few frames from the video and see if there is any EXIF information?
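For reference, a quick way to check is with Pillow in Python (the filename is a hypothetical extracted frame). Frames saved out of a video stream typically carry no EXIF at all, in which case you would need to feed the focal length to the SfM tool manually:

```python
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("frame_00001.jpg")  # hypothetical extracted frame
exif = img.getexif()
if not exif:
    print("no EXIF data - the SfM tool will have to guess the focal length")
else:
    for tag_id, value in exif.items():
        print(TAGS.get(tag_id, tag_id), value)
```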

How to Handle Occlusion and Fragmentation

I am trying to implement a people counting system using computer vision for a uni project. Currently, my method is (a simplified code sketch follows the list):
Background subtraction using MOG2
Morphological filtering to remove noise
Blob tracking
Counting blobs that pass a specified region (a line)
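In OpenCV (Python) terms, the pipeline is roughly this (the file name, line position and thresholds are just values I'm experimenting with):

```python
import cv2

cap = cv2.VideoCapture("entrance.mp4")  # hypothetical test video
mog2 = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
count_line_y = 240  # virtual counting line

while True:
    ret, frame = cap.read()
    if not ret:
        break
    fg = mog2.apply(frame)
    _, fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)  # drop shadow pixels (127)
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)   # remove speckle noise
    fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE, kernel)  # fill small holes
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 800:  # ignore tiny blobs
            continue
        x, y, w, h = cv2.boundingRect(c)
        # ...associate the blob with a track; count when it crosses count_line_y
```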
The problem is that if people arrive as a group, my method counts them as a single person. From my reading, I believe this is what is called occlusion. Another problem is that when a person looks similar to the background (e.g. wearing dark clothing and passing a black pillar/wall), the blob gets split even though it is actually one person.
From what I read, I should implement a detector + tracker (e.g. detect humans using HOG). But my detection results are poor (e.g. 50% false positives at a 50% hit rate, using both the OpenCV human detector and my own trained detector), so I am not convinced the detector can serve as the basis for tracking. Thanks for your answers and for taking the time to read this post!
Tracking people in video surveillance sequences is still an open problem in the research community. However, particle filters (PF) (a.k.a. sequential Monte Carlo) give good results under occlusion and in complex scenes. You should read this. There are also extra links to example source code after the bibliography.
An advantage of using a PF is the gain in computational time compared to tracking by detection alone.
If you go this way, feel free to ask for a better understanding of the maths behind the PF.
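To give a flavour of the method, here is a minimal particle-filter tracking skeleton in Python/NumPy. The state is just an (x, y) position with a random-walk motion model, and appearance_likelihood is a hypothetical function you would supply (e.g. a colour-histogram similarity between the image patch at a particle and the target model):

```python
import numpy as np

N = 200  # number of particles

def init_particles(x0, y0):
    particles = np.tile([float(x0), float(y0)], (N, 1))
    weights = np.full(N, 1.0 / N)
    return particles, weights

def step(particles, weights, frame, appearance_likelihood, motion_std=10.0):
    # Predict: diffuse particles with a random-walk motion model
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Update: re-weight each particle by the appearance likelihood at its location
    weights = np.array([appearance_likelihood(frame, p) for p in particles])
    weights = weights + 1e-12   # guard against all-zero weights
    weights /= weights.sum()
    # Resample: concentrate particles on likely states
    idx = np.random.choice(N, size=N, p=weights)
    particles = particles[idx]
    weights = np.full(N, 1.0 / N)
    # Estimate: posterior mean of the particle set
    return particles, weights, particles.mean(axis=0)
```

The occlusion robustness comes from the particle cloud staying spread over several hypotheses for a while, instead of committing to a single (possibly wrong) detection.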
There is no single "good" answer to this, as handling occlusion (and background subtraction) are still open problems! But here are several pointers that might help you along with your project.
You want to detect whether a "blob" is one person or a group of people. There are several things you could do to handle this:
Use multiple cameras (it's unlikely that a group of people is detected as a single blob from all angles)
Try to detect parts of the human body. If you detect two heads on a single blob, there are multiple people. Same can be said for 3 legs, 5 shoulders, etc.
As for tracking a "lost" person (one walking behind another object), one option is to extrapolate their position. You know that a person can only move so much between frames. Taking this into account, you know it's impossible for a person to be detected in the middle of your image and then suddenly disappear. After several frames of not seeing that person, you can discard the observation, as they may have had enough time to move away.
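One simple way to realize that extrapolation is a constant-velocity Kalman filter, sketched below with OpenCV's KalmanFilter (the noise covariances are placeholder values to tune). During an occlusion you keep calling predict() without a correction, so the track keeps moving at its last estimated velocity:

```python
import cv2
import numpy as np

# State = (x, y, vx, vy), measurement = (x, y)
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2   # placeholder
kf.measurementNoiseCov = np.eye(2, dtype=np.float32)      # placeholder

def track_step(measurement=None):
    prediction = kf.predict()          # extrapolated position for this frame
    if measurement is not None:        # blob visible: correct with the detection
        kf.correct(np.array(measurement, np.float32).reshape(2, 1))
    return prediction[:2].ravel()      # predicted (x, y), even during occlusion
```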

Robust motion detection in C#

Can anyone point me to a robust motion detection sample/implementation? I know EMGU has a motion detection sample, but it's not very good; even small changes in light are falsely detected as motion. I don't need to track objects. I am looking for a way to detect motion in a video that will not be falsely triggered by changing light conditions.
Have a look at AForge. Everything you should need is there (though you'll need to spend some time putting it all together), and it has a robust community if you need specific help.
I concur with nizmahone. Use AForge:
Here is a link with some motion detection in C#:
http://www.codeproject.com/KB/audio-video/Motion_Detection.aspx

slideshow with gestures interaction using opencv

I'm working on a photo gallery *projected on a wall*, in which the users should interact with gestures. The users will be standing in front of the wall projection. The user should be able to select a photo, go back to the main gallery, and perform other (unspecified) gestures.
I have programming skills in C/C++ and some knowledge of OpenGL. I have no experience with OpenCV, but I think I can use it to recognize the user's gestures.
The rough idea is to place a webcam in front of the user (above or below the projection rectangle) and process the video stream with OpenCV.
This may not be the best solution at all... so a lot of questions arise:
Any reference to helpful documentation?
Should I use controlled ambient lighting?
In your experience where is the best camera position?
Might it be better to back-project onto the wall (I mean that the wall would not be a real wall ;-) )?
Any different (better) solutions? Are there any devices that can visually capture the user's gestures (like the Xbox 360, for example)?
Thanks a lot!
Massimo
I don't have much experience with human detection in OpenCV, but with any tool this is a difficult task. You didn't even specify which parts of the human body you plan to use... Do the gestures use the full body, only arms and hands, etc.?
OpenCV has some predefined files (Haar cascade classifiers) to detect the full human body, face, mouth, etc. (look for the dedicated .xml files in the OpenCV source code); you may want to try them.
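For example, loading one of those predefined cascade files looks like this (shown in Python for brevity; the C++ API mirrors these calls, and cv2.data.haarcascades is where pip-installed OpenCV keeps the bundled .xml files):

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # webcam facing the user
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect faces; swap in haarcascade_fullbody.xml etc. for other body parts
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```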
For documentation, the official OpenCV documentation is a must see: http://opencv.willowgarage.com/documentation/cpp/index.html but of course, it is very general.
Controlling the ambient light may be useful, but it depends on the methods you'll use. First find suitable methods, then make your choice depending on your ability to control the light. Again, the best position for the camera will depend on the methods, and surely on which parts of the human body you plan to use. Finally, keep in mind that OpenCV is not particularly fast, so you may need to use some OpenGL routines to make things faster.
If you're prepared to use more than just webcams, you may want to have a look at the Kinect SDKs. The official one is only supposed to be released next spring, but you can already find stuff for Linux boxes.
have fun!
