I am trying to differentiate between camera motion and tool motion in a surgical video.
I have tried optical flow using opencv farneback and pass the results to an ML model to learn but no success.a major issue is getting good keypoints in case of camera motion. Is there an alternate technique to distinguish between camera motion and tool/tissue movement? Note: camera motion happens only in 10% of the video
I wish I could add a comment (too new to be able to comment), as I don't have a good answer for you.
I think it really depends on the nature of the input image. Can you show some typical input images here?
What is your optical flow result look like? I thought you might get some reasonable results.
Have you tried some motion estimation method, to analyze if there is global movement across different frames, or there is only some local movements?
Note: I am not asking for code just the name of some algorithms I can research
Problem
I'm trying to do a 3D reconstruction from multiple images where the subject is rotating.
I don't have a lot of experience in the CV field and my mathematics is not particularly good so I am reliant on either simply worded papers or existing libraries/tools.
Where I'm At
I have been researching and testing using Structure from Motion (with VisualSFM) to generate a 3d point cloud and Multi View Stereo (with CMPMVS) to reconstruct the scene.
Obviously these algorithms are designed to process images where the camera (or cameras) moves and the scene is stationary so the reconstructions fail for a number of reasons.
Example: I've been working with short videos of a person rotating around a marker on the floor, see below. I've tried removing the background but this seems to make it harder for SfM to build pairs, probably because of the lack of intformation.
Question
Does anyone know the name of an algorithm/pipeline that will be able to reconstruct a series of images with a static background and a rotating object?
Or is it not possible to do a 3d reconstruction in this way?
Let me explain my need before I explain the problem.
I am looking forward for a hand controlled application.
Navigation using palm and clicks using grab/fist.
Currently, I am working with Openni, which sounds promising and has few examples which turned out to be useful in my case, as it had inbuild hand tracker in samples. which serves my purpose for time being.
What I want to ask is,
1) what would be the best approach to have a fist/grab detector ?
I trained and used Adaboost fist classifiers on extracted RGB data, which was pretty good, but, it has too many false detections to move forward.
So, here I frame two more questions
2) Is there any other good library which is capable of achieving my needs using depth data ?
3)Can we train our own hand gestures, especially using fingers, as some paper was referring to HMM, if yes, how do we proceed with a library like OpenNI ?
Yeah, I tried with the middle ware libraries in OpenNI like, the grab detector, but, they wont serve my purpose, as its neither opensource nor matches my need.
Apart from what I asked, if there is something which you think, that could help me will be accepted as a good suggestion.
You don't need to train your first algorithm since it will complicate things.
Don't use color either since it's unreliable (mixes with background and changes unpredictably depending on lighting and viewpoint)
Assuming that your hand is a closest object you can simply
segment it out by depth threshold. You can set threshold manually, use a closest region of depth histogram, or perform connected component on a depth map to break it on meaningful parts first (and then select your object based not only on its depth but also using its dimensions, motion, user input, etc). Here is the output of a connected components method:
Apply convex defects from opencv library to find fingers;
Track fingers rather than rediscover them in 3D.This will increase stability. I successfully implemented such finger detection about 3 years ago.
Read my paper :) http://robau.files.wordpress.com/2010/06/final_report_00012.pdf
I have done research on gesture recognition for hands, and evaluated several approaches that are robust to scale, rotation etc. You have depth information which is very valuable, as the hardest problem for me was to actually segment the hand out of the image.
My most successful approach is to trail the contour of the hand and for each point on the contour, take the distance to the centroid of the hand. This gives a set of points that can be used as input for many training algorithms.
I use the image moments of the segmented hand to determine its rotation, so there is a good starting point on the hands contour. It is very easy to determine a fist, stretched out hand and the number of extended fingers.
Note that while it works fine, your arm tends to get tired from pointing into the air.
It seems that you are unaware of the Point Cloud Library (PCL). It is an open-source library dedicated to the processing of point clouds and RGB-D data, which is based on OpenNI for the low-level operations and which provides a lot of high-level algorithm, for instance to perform registration, segmentation and also recognition.
A very interesting algorithm for shape/object recognition in general is called implicit shape model. In order to detect a global object (such as a car, or an open hand), the idea is first to detect possible parts of it (e.g. wheels, trunk, etc, or fingers, palm, wrist etc) using a local feature detector, and then to infer the position of the global object by considering the density and the relative position of its parts. For instance, if I can detect five fingers, a palm and a wrist in a given neighborhood, there's a good chance that I am in fact looking at a hand, however, if I only detect one finger and a wrist somewhere, it could be a pair of false detections. The academic research article on this implicit shape model algorithm can be found here.
In PCL, there is a couple of tutorials dedicated to the topic of shape recognition, and luckily, one of them covers the implicit shape model, which has been implemented in PCL. I never tested this implementation, but from what I could read in the tutorial, you can specify your own point clouds for the training of the classifier.
That being said, you did not mentioned it explicitly in your question, but since your goal is to program a hand-controlled application, you might in fact be interested in a real-time shape detection algorithm. You would have to test the speed of the implicit shape model provided in PCL, but I think this approach is better suited to offline shape recognition.
If you do need real-time shape recognition, I think you should first use a hand/arm tracking algorithm (which are usually faster than full detection) in order to know where to look in the images, instead of trying to perform a full shape detection at each frame of your RGB-D stream. You could for instance track the hand location by segmenting the depthmap (e.g. using an appropriate threshold on the depth) and then detecting the extermities.
Then, once you approximately know where the hand is, it should be easier to decide whether the hand is making one gesture relevant to your application. I am not sure what you exactly mean by fist/grab gestures, but I suggest that you define and use some app-controlling gestures which are easy and quick to distinguish from one another.
Hope this helps.
The fast answer is: Yes, you can train your own gesture detector using depth data. It is really easy, but it depends on the type of the gesture.
Suppose you want to detect a hand movement:
Detect the hand position (x,y,x). Using OpenNi is straighforward as you have one node for the hand
Execute the gesture and collect ALL the positions of the hand during the gesture.
With the list of positions train a HMM. For example you can use Matlab, C, or Python.
For your own gestures, you can test the model and detect the gestures.
Here you can find a nice tutorial and code (in Matlab). The code (test.m is pretty easy to follow). Here is an snipet:
%Load collected data
training = get_xyz_data('data/train',train_gesture);
testing = get_xyz_data('data/test',test_gesture);
%Get clusters
[centroids N] = get_point_centroids(training,N,D);
ATrainBinned = get_point_clusters(training,centroids,D);
ATestBinned = get_point_clusters(testing,centroids,D);
% Set priors:
pP = prior_transition_matrix(M,LR);
% Train the model:
cyc = 50;
[E,P,Pi,LL] = dhmm_numeric(ATrainBinned,pP,[1:N]',M,cyc,.00001);
Dealing with fingers is pretty much the same, but instead of detecting the hand you need to detect de fingers. As Kinect doesn't have finger points, you need to use a specific code to detect them (using segmentation or contour tracking). Some examples using OpenCV can be found here and here, but the most promising one is the ROS library that have a finger node (see example here).
If you only need the detection of a fist/grab state, you should give microsoft a chance. Microsoft.Kinect.Toolkit.Interaction contains methods and events that detects the grip / grip release state of a hand. Take a look at the HandEventType of InteractionHandPointer . That works quite good for the fist/grab detection, but does not detect or report the position of individual fingers.
The next kinect (kinect one) detects 3 joint per hand (Wrist, Hand, Thumb) and has 3 hand based gestures: open, closed (grip/fist) and lasso (pointer). If that is enough for you, you should consider the microsoft libraries.
1) If there are a lot of false detections, you could try to extend the negative sample set of the classifier, and train it again. The extended negative image set should contain such images, where the fist was false detected. Maybe this will help to create a better classifier.
I've had quite a bit of succes with the middleware library as provided by http://www.threegear.com/. They provide several gestures (including grabbing, pinching and pointing) and 6 DOF handtracking.
You might be interested in this paper & open-source code:
Robust Articulated-ICP for Real-Time Hand Tracking
Code: https://github.com/OpenGP/htrack
Screenshot: http://lgg.epfl.ch/img/codedata/htrack_icp.png
YouTube Video: https://youtu.be/rm3YnClSmIQ
Paper PDF: http://infoscience.epfl.ch/record/206951/files/htrack.pdf
I am trying to create a Panorama app for iPhone/iPad.
The image stitching bit is OK, I'm using openCV libraries and the results are pretty acceptable.
But I'm a bit stuck on developing the UI for assisting the user while capturing the panorama.
Most apps (even on Android) would provide user with some sort of a marker that translates/rotates exactly matching the movement of the user's camera.
[I'm using the iOS 7 - default camera's panorama feature as a preliminary benchmark].
However, I'm way off the mark till date.
What I've tried:
I've tried using the accelerometer and gyro data for tracking the marker. With this approach -
I've applied an LPF on the accelerometer data and used simple
Newtonian mechanics (with a carefully tuned damping factor) to
translate the marker on the screen. Problem with this approach: very erratic data. Marker tends to jump and wobble between points. Hard to tell between smooth movement and jerk.
I've tried using a complimentary filter between LPF-ed gyro and
accelerometer data to translate the blob. Problem with this approach: Slightly better than the first approach, but still quite random.
I've also tried using image processing to compute optical flow. I'm
using openCV's
goodFeaturesToTrack(firstMat, cornersA, 30, 0.01, 30);
to get the trackable points from a first image (sampled from camera
picker) and then using calcOpticalFlowPyrLK to get the positions
of these points in the next image.
Problem with this approach: However, the motions vectors obtained from tracking these points are too noisy to compute the resultant
direction of motion accurately.
What I think I should do next:
Perhaps compute the DCT matrix from accelerometer and gyro data and
use some algorithm to filter one output with the other.
Work on the image processing algorithms, use some different techniques
(???).
Use Kalman filter to fuse the state prediction from
accelerometer+gyro with that of the image processing block.
The help that I need:
Can you suggest some easier way to get this job done?
If not, can you highlight any possible mistake in my approach? Does it really have to be this complicated?
Please help.
How can we detect rapid motion and object simultaneously, let me give an example,....
suppose there is one soccer match video, and i want to detect position of each and every players with maximum accuracy.i was thinking about human detection but if we see soccer match video then there is nothing with human detection because we can consider human as objects.may be we can do this with blob detection but there are many problems with blobs like:-
1) I want to separate each and every player. so if players will collide then blob detection will not help. so there will problem to identify player separately
2) second will be problem of lights on stadium.
so is there any particular algorithm or method or library to do this..?
i've seen some research paper but not satisfied...so suggest anything related to this like any article,algorithm,library,any method, any research paper etc. and please all express your views in this.
For fast and reliable human detection, Dalal and Triggs' Histogram of Gradients is generally accepted as very good. Have you tried playing with that?
Since you mentioned rapid motion changes, are you worried about fast camera motion or fast player/ball motion?
You can do 2D or 3D video stabilization to fix camera motion (try the excellent Deshaker plugin for VirtualDub).
For fast player motion, background subtraction or other blob detection will definitely help. You can use that to get a rough kinematic estimate and use that as an estimate of your blur kernel. This can then be used to deblur the image chip containing the player.
You can do additional processing to establish identify based upon OCRing jersey numbers, etc.
You mentioned concern about lights on the stadium. Is the main issue that it will cast shadows? That can be dealt with by the HOG detector. Blob detection to get blur kernel should still work fine with the shadow.
If you have control over the camera, you may want to reduce exposure times to reduce blur. Denoising techniques can be used to reduce CCD noise that occurs with extreme low light and dense optical flow approaches align the frames and boost the signal back up to something reasonable via adding the denoised frames.