After doing optical flow (lk) on a video what's the best way to find the objects based on this data and track them?
This probably sounds very noobish, but I would like to be able to define a clear outline around objects, so if it's a weirdly shaped bottle or something to be able to detect the edges.
I'm not sure LK is the best algorithm, since it computes the motion of a sparse set of corner-like points, and tracking behaves usually better from a dense optical flow result (such as Farneback or Horn Schunck). After computing the flow, as a first step, you can do some thresholding on its norm (to retain the moving parts), and try to extract connected regions from this result. But be warned that your tasks is not going to be easy if you don't have a model of the object you want to track.
On the other hand, if you are primarily interested in tracking and a bit of interactivity is acceptable, you can have a look at the camshift sample code to see how to select and track an image region based on its appearance.
--- EDIT ---
If your camera is static, then use background subtraction instead. Using OpenCV 2.4 beta, you have to look for the class BackgroundSubtractor and its subclasses in the video module documentation.
Note also that optical flow can be realtime (or not very far) with good choices of parameters, and also with GPU implementation. On windows, you can use flowlib from TU Graz/Gpu4Vision group. OpenCV also has some GPU dense optical flow, for example the class gpu::BroxOpticalFlow.
--- EDIT 2 ---
Joining single-pixel detections into big objects is a task called connected component labelling. There is a fast algorithm for that, implemented in OpenCV. So this gives you a pipeline which is:
motion detection (pix level) ---> connected comp. labeling ---> object tracking (adding motion information, possible trajectories for Kalman filtering...).
But we'll have to stop here, because we'll soon be far beyond the scope of your initial question ;-)
You can use TLD or CLM for doing object tracking (it is loosely based on the the idea of optical flow tracking and model learning at the same time).
You can find following links useful
https://www.gnebehay.com/tld/
https://www.gnebehay.com/cmt/
Related
I'm trying to create an application that will be able to track rapidly moving objects in video/camera feed, however have not found any CV/DL solution that is good enough. Can you recommend any computer vision solution for tracking fast moving objects on regular laptop computer and web cam? A demo app would be ideal.
For example see this video where the tracking is done in hardware (I'm looking for software solution) : https://www.youtube.com/watch?v=qn5YQVvW-hQ
Target tracking is a very difficult problem. In target tracking you will have two main issues: the motion uncertainty problem, and the origin uncertainty problem. The first one refers to the way you model object motion so you can predict its future state, and the second refers to the issue of data association(what measurement corresponds to what track, and the literature is filled with scientific ways in which this issue can be approached).
Before you can come up with a solution to your problem you will have to answer some questions yourself, regarding the tracking problem you want to solve. For example: what are the values that you what to track(this will define your state vector), how are those values related to one another, are you trying to perform single object tracking or multiple object tracking, how are the objects moving( do they have a relatively constant acceleration or velocity ) or not, do objects make turns, can objects also be occluded or not and so on.
The Kalman Filter is good solution to predict the next state of your system (once you have identified your process model). A deep learning alternative to the Kalman filter is the so called Deep Kalman Filter which essentially is used to do the same thing. In case your process or measurement models are not linear, you will have to linearize them before predicting the next state. Some solutions that deal with non-linear process or measurement models are the Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF).
Now related to fast moving objects, an idea you can use is to have a larger covariance matrix since the objects can move a lot more if they are fast, so the search space for the correct association has to be a bit larger. Additionally you can use multiple motion models in case your motion model cannot be satisfied with only one model.
In case of occlusions I will leave you this stack overflow thread, where I have given an answer covering more details regarding occlusion handling in case of tracking. I have added some references for you to read. You will have to provide more details in your question, if you would like to receive more information regarding a solution (for example you should define fast moving objects with respect to camera frame rate).
I personally do not think there is a silver bullet solution for the tracking problem, I prefer to tailor a solution to the problem I am trying to solve.
The tracking problem is complicated. It is also more in the realm of control systems than computer vision. It would be also helpful to know more about your situation, as the performance of the chosen method pretty much depends on your problem constraints. Are you interested in real-time tracking? Are you trying to reconstruct an existing trajectory? Are there multiple targets? Just one? Are the physical properties of the targets (i.e. velocity, direction, acceleration) constant?
One of the most basic tracking methods is implemented by a Linear Dynamic System (LDS) description, in concrete, a discrete implementation, since we’re working with discrete frames of information. This method is purely based on physics, and its prediction is very sensitive. Depending on your application, the error rate could be acceptable… or not.
A more robust solution is Kalman’s Filter, and it is pretty much the go-to answer when tracking is needed. It implements prediction based on all the measurements obtained so far during the model's lifetime. It mainly works on constant-based measurements (velocity and acceleration) although it can be extended to handle non-constant models. If you are working with targets that won't exhibit a drastic change in their velocity, this is what you (probably) should implement.
I'm sorry I can't provide you with more, but the topic is pretty extensive and, admittedly, the details are beyond my area of expertise. Hopefully, this info should give you a little bit of context for finding a solution.
The problem of tracking fast-moving objects (FMO) is a known research topic in computer vision. FMOs are defined as objects which move over a distance larger than their size in one video frame. The solutions which have been proposed use classical image processing and energy minimization to establish their trajectories and sharp appearance.
If you need a demo app, I would suggest this GitHub repository: https://github.com/rozumden/fmo-cpp-demo. The demo is written in OpenCV/C++ and runs in real-time. The authors also provide a mobile app version, which is still in testing mode. Using this demo app you can track any fast moving objects in real-time without even providing an object model. However, if you provide object size in real-world units, the app can also estimate object speed.
A more sophisticated algorithm is open-sourced here: https://github.com/rozumden/deblatting_python, written in Python and PyTorch for speed-up. The repository contains a solution to the deblatting (deblurring and matting) problem, exactly what happens when a Fast Moving Object appears in front of a camera.
currently I am struggling in implementing an algorithm to locate an object in an image. Suppose I have 100 training sets that all have a cat in there and each has the correct coordinate of each cat. My first idea is to create a fixed-sized square and traverse through the image. For each collection of pixels contained in the square, we can use it as a point for Support Vector Machine algorithm.
The problem is that I am not sure how to do that since usually, each point represents each class (has object or has no object) and it usually has a simple d features, while for this case, it has a dx3 matrix as its features (each features has RGB value).
A simple help will be welcome, thanks!
If I have understood your question well, applying machine learning for image Processing and computer vision is a bit different from Other kinds of problems. main difference is that you should somehow overcome the issue of locality and scale. does all kitties always appear in the specific coordinate (x,y) ?! of course not! they can be anywhere in the scene. so how is it possible to give a specific point to SVM for an object? it will not be generalized at all. This is the reason almost all basic operation in computer vision has something to do with convolution operation to extract features independent from their location. A pixel alone carries zero useful information, you need to analyse groups of pixels. there are 2 approach you can take:
classic methods:
use OpenCV to perform noise removal, edge detection, feature extraction using methods like SIFT and feed those features to a model like SVM, not the raw unprocessed pixels. feature extraction means to reach from d features to k more meaningful representation of inputs where usually (k < d) if not always.
Deep Learning:
Convolutional Neural networks(CNNs) Have shed lights on many Computer vision tasks which were far beyond reach until recently and more importantly with frameworks like Keras and tensorflow most problems in computer vision is just programming tasks to be honest and doesn't require much knowledge as one needed before. because (CNNs) extract features themselves and you don't need to do the feature engineering anymore which requires a well educated and knowledgeable person on the task.
so, choose whatever method you see fit for kitty detection =^.^= .
If I find optical flow between successive frames of a video and link all the optical flow vectors, does this mean I have implemented tracking?
If yes is this the simple way of tracking?
Tracking defines a computer vision task with the goal to track a given object, region of interesset or point in consequtive images of a video sequence. A very simple method would be based on the motion vectors estimated by an optical flow estimation method.
However, this would produce good results at very cooperative enviromental conditions. It would fail e.g. if the object get occluded. Recent state-of-the-art method are more robust and e.g. based on Kalman-Filter, Particel Filter or PHD filter technologies. This Survey on Object Tracking or Object tracking:A surveygives you an better overwiev of the challenges and solution of current object tracking methods.
I am working on an open source package for robot owners. I want to do a decent job of detecting when the robot is having movement problems. One of the problems the robot commonly has is that the back wheel gets "tucked underneath" in a bad way and makes it turn very slowly when on carpet. I believe that with a combination of accelerometer value inspection and (I hope) a relatively simple yet robust vision analysis technique, I will be able to tell when the robot is having this specific problem.
What I need is to be able to analyze two images, separated by about 1/2 second in time, and get a numerical value that tells about how close they are, but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis. I've heard of an algorithm called optical flow that is used in object and scene tracking, but I'm hoping I don't need something heavyweight.
Is there an machine vision algorithm/function that can analyze two JPEG's and tell if they belong to the same scene and viewpoint, yet can also deliver a numerical monotonically increasing value that tells me rough how different they are? If I could get that numerical value and compare it to the number of milliseconds past, while examining the current accelerometer activity, I believe I can detect when the robot is having the "slow turn of death" problem.
If so, please tell me the basic technique involved, and if you know of machine vision library that implements it, which one it is.
but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis
What you are suggesting is a complex problem by itself, so forget about 'lightweight' solutions. Probably you are going to need something like optical flow.
Other options I would recommend you looking into are:
Vanishing points detection and variation from image to image. This quite fits into your problem domain Wikipedia
Disparity map: related to optical flow. Used for stereographic vision, but I think you can use it for the kind of application you are looking for. Take a look at this
I want to do pedestrian detection and tracking.
Input: Video Stream from CCTV camera.
Output:
#(no of) people going from left to right
# people going from right to left
# No. of people in the middle
What have i done so far:
For pedestrian detection I am using HOG and SVM. The detection is decent with high false positive rate. And its very slow as i am running in android platform.
Question:
After detection how to do I calculate the required values listed above. Can anyone tell me what is the tracking algorithm I have to use and any good algorithm for pedestrian detection.
Or should I use tracking algorithm? Is there a way to do without it?
Any references to codes/blogs/technical papers is appreciated.
Platform: C++ & OpenCV / android.
--Thanks
This is somehow close to a research problem.
You may want to have a look to this website which gathers a lot of references.
In particular, the work done by the group from Oxford present therein is pretty close to what you are doing, since their are using HOG for detection. (That work has been extremely illuminating for me).
EPFL and Julich have as well work done in the field.
You may also want to give a look to this review which describes several detection/tracking techniques, often involving variants of the HOG algorithm.
Along with #Acorbe response, I suggest the publications section of this (archived) website.
A recent work at the end of last year also released a code base here:
https://bitbucket.org/rodrigob/doppia
There have also been earlier pedestrian detector works that have released code as well:
https://sites.google.com/site/wujx2001/home/c4
http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians
The best accurate way is to use tracking algorithm instead of statistic appearance counting of incoming people and detection occurred left right and middle..
You can use extended statistical models.. That produce how many inputs producing one of the outputs and back validate from output detection the input.
My experience is that tracking leads to better results than approach above. But is also little bit complicated. We talk about multi target tracking when the critical is match detection with tracked model which should be update based on detection. If tracking is matched with wrong model. The problems are there.
Here on youtube I developed some multi target tracker by simple LBP people detector, but multi model and kalman filter for tracking. Both capabilities are available in opencv. You need to when something is detected create new kalman filter for each object and update in case you match same detection. Predict in case detection is not here in frame and also remove the Kalman i it is not necessary to track any more.
1 Detect
2 Match detections with kalmans, hungarian algorithm and l2 norm. (for example)
3 Lot of work. Decide if kalman shoudl be established, remove, update, or results is not detected and should be predicted. This is lot of work here.
Pure statistic approach is less accurate, second one is for experience people at least one moth of coding and 3 month of tuning.. If you need to be faster and your resources are quite limited. You can by smart statistic achieve your results by pure detection much faster and little bit less accurate. People are judge the image and video tracking even multi target tracking is capable to beat human. Try to count and register each person in video and count exits point. You are not able to do this in some number of people. It is really repents on, what you want, application, customer you have, and results you show to customers. If this is 4 numbers income, left, right, middle and your error is 20 percent is still much more than one bored small paid guard should achieved by all day long counting..
https://www.youtube.com/watch?v=d-RCKfVjFI4
You can find on my BLOG Some dataset for people detection and car detection on my blog same as script for learning ideas, tutorials and tracking examples..
Opencv blog tutorials code and ideas
You can use KLT for this purpose as this will tell you the flow of person traveling from left to right then you can compute that by computing line length which in given example is drawn using cv2.line you can use input parameters of this functions to compute your case, little math involved. if there is a flow of pixels from left to right this is case 1 or right to left then case 3 and for no flow case 2. Or you can use this basic tutorial to track object movement. LINK