I'm trying to do object detection tasks using OpenCV, but something confuses me. Tracking and prediction algorithms like CAMShift and Kalman filters can fulfill the task of tracking, while SURF matching methods can also do it.
I don't quite understand the difference between the two approaches. I have written some code based on the feature2d (SURF) and motion_analysis_and_object_tracking (CAMShift) OpenCV tutorials. They seem like two means to one end. Am I right, or am I missing some concept?
And is it a good idea to combine CAMShift tracking with SURF feature matching? Maybe more stuff can be applied, like contour matching?
The short answer is:
Detect the interesting object using keypoints (SURF) or any other approach.
Get the bounding rectangle of the object and pass it as input to an object tracker (e.g. CAMShift).
Use the object tracker until the object is lost, as in the sketch below.
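A minimal sketch of these three steps, assuming opencv-contrib-python with the non-free SURF module enabled ('object.png' and 'video.mp4' are placeholders):

```python
import cv2
import numpy as np

template = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)  # the object to find
cap = cv2.VideoCapture('video.mp4')

# 1. Detect the object once with SURF + FLANN matching.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp_t, des_t = surf.detectAndCompute(template, None)

ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
kp_f, des_f = surf.detectAndCompute(gray, None)

matches = cv2.FlannBasedMatcher().knnMatch(des_t, des_f, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # Lowe's ratio test

# 2. Bounding rectangle around the matched keypoints.
pts = np.float32([kp_f[m.trainIdx].pt for m in good])
x, y, w, h = cv2.boundingRect(pts)
track_window = (x, y, w, h)

# 3. Hand the rectangle to CAMShift and track frame to frame.
hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    rot_box, track_window = cv2.CamShift(back_proj, track_window, term_crit)
    # If the window degenerates here, fall back to SURF detection again.
```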
Object tracking is the process of finding the position of an object using information from previous frames. The difference between tracking and detection is that while both processes localize the position of the object, detection does not use any information from previous frames to localize it.
Look at "Object Tracking: A Survey" by Alper Yilmaz, Omar Javed and Mubarak Shah. This paper contains comprehensive overview of detection and tracking techniques.
I'm doing research for my final project: I want to do object detection and motion classification like Amazon Go. I have read a lot of research, like object detection with SSD or YOLO and video classification using CNN+LSTM, and I want to propose a pipeline like this:
Real-time detection of multiple objects (in my case: persons) with SSD/YOLO
Get the object's boundary and crop the frame
Feed the cropped frame into a CNN+LSTM algorithm to predict the motion (whether the person is walking/taking items)
Is it possible to make this work in a real-time environment?
Or is there a better method for real-time detection and motion classification?
If you want to use it in a real-time application, several other things must be considered that do not show up until you implement the algorithm in a real environment.
About your 3-step proposed method: it could already result in a good method, but the first step would have to be very accurate. Since the type of motion is itself a good feature of a person, I think it is better to combine the three steps into one.
My idea is as follows (a minimal sketch appears after the list):
1. Build a video classification dataset that just tags the movement of the person or object.
2. Train a CNN+LSTM-based video classification method on it.
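For illustration, a minimal sketch of such a one-step CNN+LSTM classifier (PyTorch here; the layer sizes, clip shape and the two action classes are my assumptions, not a tuned design):

```python
import torch
import torch.nn as nn

class CnnLstm(nn.Module):
    def __init__(self, num_classes=2, feat_dim=128, hidden=64):
        super().__init__()
        # Small per-frame CNN backbone (replace with a pretrained one).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))  # run the CNN on every frame
        feats = feats.view(b, t, -1)           # back to (B, T, feat_dim)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])           # classify from the last time step

# Usage: a batch of 8 cropped person clips, 16 frames of 64x64 each.
logits = CnnLstm()(torch.randn(8, 16, 3, 64, 64))
```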
This could address your project properly.
This answer needs more detail; if you are interested, I can elaborate.
Had pretty much the same problem. Motion prediction does not work that well in complex real-life situations. Here is a simple one:
(See in action)
I'm building a 4K video processing tool (some examples). The current approach looks like the following:
do rough but super fast segmentation
extract bounding box and shape
apply some "meta vision magic"
do precise segmentation within identified area
(See in action)
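As a rough illustration of the coarse-to-fine idea (not the author's actual pipeline; the input file, Otsu thresholding and GrabCut refinement are assumptions):

```python
import cv2
import numpy as np

frame = cv2.imread('frame.png')  # placeholder input frame

# 1. Rough but fast segmentation on a 4x downscaled copy.
small = cv2.resize(frame, None, fx=0.25, fy=0.25)
gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
_, rough = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# 2. Extract bounding box and shape from the rough mask (OpenCV 4 signature).
contours, _ = cv2.findContours(rough, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
biggest = max(contours, key=cv2.contourArea)
x, y, w, h = (v * 4 for v in cv2.boundingRect(biggest))  # back to full scale

# 3. Precise segmentation only inside the identified area (GrabCut here).
mask = np.zeros(frame.shape[:2], np.uint8)
bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
cv2.grabCut(frame, mask, (x, y, w, h), bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
precise = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                   255, 0).astype(np.uint8)
```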
As of now, the approach looks far more flexible compared to motion tracking.
"Meta vision" is intended to properly track shape evolution:
(See in action)
Let's compare:
Meta vision disabled
Meta vision enabled
I am currently using a CNN-based object detection module which gives me objects that I then use as input for tracking with OpenCV. The object detection module has produced rectangles until now, but I want to shift to a segmentation module like Mask-RCNN, which outputs masks along with rectangles for each object. Masks are a more accurate representation of an object. All the trackers in OpenCV take rectangles as input. Is there any way to use the masks for tracking an object rather than the boxes? I can convert the masks to contours if that will help me track the object.
Sorry, there is no built-in, out-of-the-box solution in OpenCV for active contour models.
This segmentation model is widely used for computer vision problems (it was proposed by Kass et al. in 1988) and is the starting point for other energy-based segmentation models such as level sets, geodesic active contours, or the fuzzy-snake model.
So, to perform active contour segmentation with OpenCV there are several solutions, but I think you must understand the mathematical model in order to set the parameters properly for the context of your application.
There is a nice implementation (a bit obfuscated) by Eric Yuan
And other implementations from SO that could help you link theory and implementation:
Solution 1
Solution 2
My advice:
Read the original paper to understand the parameters.
Test some examples in Matlab to play a bit with the parameters and results.
Test some of the implementations using OpenCV that are linked here.
Determine the best parameters for your problem context and test them.
Think about contributing your results to OpenCV.
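OpenCV has no snake implementation, but scikit-image does; a minimal sketch of it (the image path and the circular initialization are assumptions):

```python
import numpy as np
from skimage import io, color, filters
from skimage.segmentation import active_contour

img = color.rgb2gray(io.imread('object.png'))  # placeholder input

# Initialize the snake as a circle around the expected object location
# ((row, col) coordinate order in recent scikit-image versions).
s = np.linspace(0, 2 * np.pi, 200)
init = np.array([100 + 80 * np.sin(s), 120 + 80 * np.cos(s)]).T

# alpha (elasticity), beta (rigidity) and gamma (time step) are the
# parameters from the Kass et al. paper mentioned above.
snake = active_contour(filters.gaussian(img, 3), init,
                       alpha=0.015, beta=10, gamma=0.001)
```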
Active contours can track using contours as input: https://www.ee.iitb.ac.in/uma/~krishnan/research.html
So you initialize the first frame using the contour from the CNN model, and in subsequent frames you don't need to call the expensive forward pass; you can instead update the contour to a new one based on this model.
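A sketch of that idea, reusing scikit-image's active_contour; get_mask_from_cnn stands in for your Mask-RCNN call and is hypothetical:

```python
import cv2
import numpy as np
from skimage.segmentation import active_contour

cap = cv2.VideoCapture('video.mp4')  # placeholder video
ok, frame = cap.read()

mask = get_mask_from_cnn(frame)  # hypothetical: binary uint8 mask from the CNN
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
# Largest contour as the initial snake, converted from (x, y) to (row, col).
snake = max(contours, key=cv2.contourArea).squeeze()[:, ::-1].astype(float)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(float) / 255.0
    # No forward pass here: evolve the previous snake on the new frame.
    snake = active_contour(gray, snake, alpha=0.015, beta=10, gamma=0.001)
```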
I have a slight confusion differentiating between object recognition and object detection. Some people say object detection is a sub-topic of object recognition? Can someone clarify the difference between these two topics?
To the best of my knowledge:
Object recognition is responding to the question "What is the object in the image?"
Whereas,
Object detection is answering the question "Where is that object?"
Hope someone can illustrate the difference, ideally by generously providing an example for each.
There is not a clear answer to this in the literature and many authors give these two terms different meanings or use them interchangeably, depending on the application. If I remember correctly, Szeliski in "Computer vision: Algorithms and applications" defines them in a way similar to this:
Object detection: to notice there is an object in an image and to know where it is in the image. So, you can outline the object but you may not know what object it is.
Object (or instance) recognition: to actually say what object you have detected, and maybe provide additional information, such as where the object is located in 3D space.
In some applications, such as recognizing an object to grasp it with a robotic arm, recognition is just a verification step done after detection: if you cannot recognize the object, you cannot verify the detection, so you discard it (it may be a false positive). For this reason "detection" and "recognition" are sometimes used for the same task.
Object recognition - which object is in the given image (which contains an object alone).
Object detection - which object is in the given image (which depicts a scene containing more than one object and is generally taken without constraints of background or view point) and where is it located.
If we take faces as a subset of objects => face detection is detecting a face in an image, and face recognition is then recognizing the face as, for example, Angelina Jolie.
When detecting objects using SURF, how can I plot a graph of false positives and hits using the good matches and the keypoints?
(A) How do I get the statistics of good matches, i.e. an ROC plot or true positives vs. false positives of detection, from so many of the line descriptors? Can somebody post code for plotting true positives vs. false positives?
(B) Secondly, there are many resources (vdo1, vdo2) and implementations and papers ("Object tracking using improved Camshift with SURF method"; "A Study on Moving Object Tracking Algorithm Using SURF Algorithm and Depth Information") which say that SURF and SIFT can be used for tracking in combination with CAMShift or meanshift.
But what I fail to understand is this: we need a prediction algorithm like the Kalman filter, or a tracking algorithm like CAMShift, meanshift, or template differencing, for tracking. So how come some video implementations and tutorials say that Lucas-Kanade optical flow, SIFT or SURF is tracking objects, whereas the papers say to combine them with CAMShift or meanshift? Am I missing some conceptual matter?
I would be obliged for pointers and a detailed explanation of how SURF, SIFT, or feature-based methods can be used for tracking alone.
Lucas-Kanade with pyramids (pyrLK) is a method that looks for a small shift in a single feature's location; it can do this for many features at once. CAMShift and meanshift track a statistic for a group of features. You can also just use a matcher to find where the features went in the next frame. GoodFeaturesToTrack, SIFT and SURF are algorithms that find points that should be easy to find and to tell apart from one another. SURF and SIFT also include descriptors, which characterize those features in a way that can ignore size change, orientation change, or both.
The Kalman filter is used to refine your results. It is able to shrink the area where the answer should lie, because the algorithms above are not perfect.
As for the code, I haven't done much tracking except Shi-Tomasi + pyrLK, so I don't think I can help.
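For reference, a minimal sketch of that Shi-Tomasi + pyrLK combination ('video.mp4' is a placeholder):

```python
import cv2

cap = cv2.VideoCapture('video.mp4')
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi: pick corners that are easy to find again.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # pyrLK: look for the small shift of each feature in the new frame.
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = new_pts[status.ravel() == 1].reshape(-1, 1, 2)  # keep only found points
    prev_gray = gray
```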
I would like to know if there is any code or any good documentation available for implementing HOG features. I tried to read the documentation here, but it's quite difficult to understand and it needs an SVM.
What I need is just to implement a HOG detector for objects, like what SIFT or SURF does.
By the way, I'm not interested in this work.
Thank you.
you can take a look at
http://szproxy.blogspot.com/2010/12/testtest.html
He also published a "tutorial" for HOG on SourceForge here:
http://sourceforge.net/projects/hogtrainingtuto/?_test=beta
I know this since I'm having the same problem as you. The tutorial, though, isn't what I would call a tutorial; it's a bunch of source code with no documentation, but I assume that it works and can at least get you somewhere.
In the end, and simplifying a bit, all that you need to detect specific objects in an image is:
Localize "points of interest" to extract patches:
To get points of interest, you can use an algorithm like the Harris corner detector, pick points randomly, or use something simple like a sliding window.
From these points, extract patches:
You will have to decide on the patch size.
From these patches, compute the feature descriptor (like HOG).
Instead of HOG you can use another feature descriptor like SIFT or SURF.
HOG's implementation is not too hard. You have to calculate the gradients of the extracted patch by applying Sobel X and Y kernels; after that, you divide the patch into NxM cells, 8x8 for instance, and compute a histogram of gradients with angle and magnitude. The following link gives a more detailed explanation (and a quick OpenCV shortcut is sketched after it):
HOG Person Detector Tutorial
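If you just want the descriptor without writing the histogram code yourself, OpenCV's built-in HOGDescriptor computes it; a minimal sketch ('patch.png' is a placeholder):

```python
import cv2

patch = cv2.imread('patch.png', cv2.IMREAD_GRAYSCALE)
patch = cv2.resize(patch, (64, 128))  # default HOG window size

hog = cv2.HOGDescriptor()        # defaults: 8x8 cells, 16x16 blocks, 9 bins
descriptor = hog.compute(patch)  # 3780-dimensional feature vector
```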
Check your feature vector against the previously trained classifier:
Once you have this vector, check whether it is the desired object or not with a previously trained classifier like an SVM. Instead of an SVM you could use neural networks, for instance.
An SVM implementation is more difficult, but there are libraries like OpenCV that you can use, as in the sketch below.
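A minimal sketch of this last step with OpenCV's ml module (the training data here is random, just to show the calls):

```python
import cv2
import numpy as np

# Stand-in training set: 20 HOG vectors with binary labels (object / not object).
hog_vectors = np.random.rand(20, 3780).astype(np.float32)
labels = np.random.randint(0, 2, (20, 1)).astype(np.int32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(hog_vectors, cv2.ml.ROW_SAMPLE, labels)

# Classify one feature vector (e.g. a HOG descriptor computed as above).
_, prediction = svm.predict(hog_vectors[:1])  # 1 = object, 0 = not object
```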
There is a function extractHOGFeatures in the Computer Vision System Toolbox for MATLAB.