Does tracking mean nothing but linking optical flow vectors? - opencv

If I find optical flow between successive frames of a video and link all the optical flow vectors, does this mean I have implemented tracking?
If yes is this the simple way of tracking?

Tracking is a computer vision task whose goal is to follow a given object, region of interest, or point across consecutive images of a video sequence. A very simple method would be based on the motion vectors estimated by an optical flow method.
However, this only produces good results under very cooperative environmental conditions. It would fail, e.g., if the object gets occluded. Recent state-of-the-art methods are more robust and are based on, e.g., Kalman filter, particle filter, or PHD filter technologies. This Survey on Object Tracking or "Object Tracking: A Survey" gives you a better overview of the challenges and solutions of current object tracking methods.
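A minimal sketch of that simple flow-linking approach, using OpenCV's pyramidal Lucas-Kanade (the file name and parameter values are placeholders, not recommendations):

```python
import cv2

cap = cv2.VideoCapture("video.mp4")            # placeholder path
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Seed with corner-like points; each track is a chain of linked flow vectors.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                              qualityLevel=0.3, minDistance=7)
tracks = [[tuple(p.ravel())] for p in pts]

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    for track, p, st in zip(tracks, new_pts, status):
        if st:                                 # extend only tracks that were found
            track.append(tuple(p.ravel()))
    pts, prev_gray = new_pts, gray
cap.release()
# `tracks` now holds one linked trajectory per seed point -- exactly the
# "linking optical flow vectors" notion of tracking, with all its
# weaknesses (occlusion, drift) intact.
```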

Related

Lane tracking with a camera: how to get distance from camera to the lane?

I am doing my final year project on lane tracking using a camera. The most challenging task now is how to measure the distance between the camera (actually the car that carries it) and the lane.
The lane itself is easily recognized (Hough line transform), but I have found no way to measure the distance to it.
There is a known way to measure the distance to an object in front of the camera based on the pixel width of the object, but it does not work here because the nearest point of the line is in the camera's blind spot.
What you want is to directly infer the depth map with a monocular camera.
You can refer to my answer here:
https://stackoverflow.com/a/64687551/11530294
Usually, we need a photometric measurement from a different position in the world to form a geometric understanding of the world (a.k.a. a depth map). From a single image it is not possible to measure the geometry directly, but it is possible to infer depth from prior understanding.
One way to make a single image work is to use a deep learning-based method to infer depth directly. The deep learning-based approaches are usually based on Python, so if you are only familiar with Python, this is the approach you should go for. If the image is small enough, I think real-time performance is possible. There are many works of this kind using Caffe, TF, Torch, etc.; you can search GitHub for more options. The one I posted here is what I used recently.
Reference: Godard, Clément, et al. "Digging into Self-Supervised Monocular Depth Estimation." Proceedings of the IEEE International Conference on Computer Vision, 2019.
Source code: https://github.com/nianticlabs/monodepth2
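If you just want to try the single-image inference idea quickly, here is a minimal sketch; it uses the MiDaS model from torch.hub as a stand-in (monodepth2 itself is run by cloning the repo above and following its test_simple.py), and the image file name is a placeholder:

```python
import cv2
import torch

# MiDaS via torch.hub stands in for monodepth2 here; both infer
# relative depth from a single RGB image.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("road.jpg"), cv2.COLOR_BGR2RGB)  # placeholder
with torch.no_grad():
    pred = midas(transform(img))                # (1, H', W') relative depth
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False).squeeze().numpy()
# `depth` is relative, not metric: a scale calibration step is needed
# before you can read off the distance from the camera to the lane.
```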
The other way is to use a large-FOV video with single-camera SLAM. This approach has various constraints, such as needing good features, a large FOV, slow motion, etc. You can find many works of this kind, such as DTAM, LSD-SLAM, DSO, etc. There are a couple of other packages from HKUST or ETH that do the mapping given the position (e.g., if you have GPS/compass); some of the well-known names are REMODE+SVO and open_quadtree_mapping.
One typical example of single-camera SLAM is LSD-SLAM, which runs in real time.
It is implemented in ROS/C++, and I remember it publishes the depth image. You can write a Python node to subscribe to the depth directly, or to the globally optimized point cloud and project that into a depth map from any view angle.
Reference: Engel, Jakob, Thomas Schöps, and Daniel Cremers. "LSD-SLAM: Large-Scale Direct Monocular SLAM." European Conference on Computer Vision. Springer, Cham, 2014.
source code: https://github.com/tum-vision/lsd_slam
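A minimal sketch of such a Python subscriber node; note the topic name "/lsd_slam/depth" is an assumption (lsd_slam's native outputs are custom keyframe messages, so in practice you may need a small relay that republishes them as sensor_msgs/Image):

```python
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def on_depth(msg):
    # Convert the ROS image to a NumPy array (values are inverse depth
    # in LSD-SLAM's convention, up to scale for a monocular camera).
    depth = bridge.imgmsg_to_cv2(msg)
    h, w = depth.shape[:2]
    rospy.loginfo("depth at image center: %.3f", depth[h // 2, w // 2])

rospy.init_node("depth_listener")
rospy.Subscriber("/lsd_slam/depth", Image, on_depth)  # topic name assumed
rospy.spin()
```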

How to track Fast Moving Objects?

I'm trying to create an application that will be able to track rapidly moving objects in a video/camera feed, but I have not found any CV/DL solution that is good enough. Can you recommend any computer vision solution for tracking fast-moving objects on a regular laptop computer and webcam? A demo app would be ideal.
For example see this video where the tracking is done in hardware (I'm looking for software solution) : https://www.youtube.com/watch?v=qn5YQVvW-hQ
Target tracking is a very difficult problem. In target tracking you will have two main issues: the motion uncertainty problem and the origin uncertainty problem. The first refers to the way you model object motion so you can predict its future state, and the second refers to the issue of data association (which measurement corresponds to which track; the literature is filled with principled ways to approach this issue).
Before you can come up with a solution to your problem, you will have to answer some questions about the tracking problem you want to solve. For example: what are the values that you want to track (this will define your state vector), how are those values related to one another, are you performing single-object or multi-object tracking, how are the objects moving (do they have a relatively constant acceleration or velocity?), do objects make turns, can objects be occluded, and so on.
The Kalman filter is a good solution for predicting the next state of your system (once you have identified your process model). A deep learning alternative to the Kalman filter is the so-called Deep Kalman Filter, which essentially does the same thing. In case your process or measurement models are not linear, you will have to linearize them before predicting the next state. Solutions that deal with non-linear process or measurement models are the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF).
Now, regarding fast-moving objects, one idea you can use is a larger covariance matrix, since fast objects can move a lot more between frames, so the search space for the correct association has to be a bit larger. Additionally, you can use multiple motion models in case a single model cannot capture the motion.
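A minimal constant-velocity sketch with OpenCV's cv2.KalmanFilter; the noise magnitudes are assumptions to tune, and per the point above, a fast target calls for a larger process noise covariance:

```python
import cv2
import numpy as np

# State [x, y, vx, vy], measurement [x, y]: a constant-velocity model.
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
# Inflated process noise widens the association search region for fast
# targets (both values are assumptions you would tune on real data).
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-1
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

for x, y in [(10.0, 10.0), (14.0, 12.0), (19.0, 13.0)]:  # toy detections
    predicted = kf.predict()                      # predicted [x, y, vx, vy]
    kf.correct(np.array([[x], [y]], np.float32))  # fuse the new measurement
    print(predicted[:2].ravel())
```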
In case of occlusions, I will leave you this Stack Overflow thread, where I have given an answer covering more details on occlusion handling in tracking, with some references for you to read. You will have to provide more details in your question if you would like more specific information about a solution (for example, you should define fast-moving objects with respect to the camera frame rate).
I personally do not think there is a silver bullet solution for the tracking problem, I prefer to tailor a solution to the problem I am trying to solve.
The tracking problem is complicated. It is also more in the realm of control systems than computer vision. It would also be helpful to know more about your situation, as the performance of the chosen method depends heavily on your problem constraints. Are you interested in real-time tracking? Are you trying to reconstruct an existing trajectory? Are there multiple targets, or just one? Are the physical properties of the targets (i.e. velocity, direction, acceleration) constant?
One of the most basic tracking methods is implemented by a Linear Dynamic System (LDS) description, concretely a discrete implementation, since we're working with discrete frames of information. This method is purely based on physics, and its prediction is very sensitive to noise. Depending on your application, the error rate may or may not be acceptable.
A more robust solution is the Kalman filter, which is pretty much the go-to answer when tracking is needed. It predicts based on all the measurements obtained so far during the model's lifetime. It mainly assumes near-constant motion (constant velocity or acceleration), although it can be extended to handle non-constant models. If you are working with targets that won't exhibit a drastic change in their velocity, this is what you should (probably) implement.
I'm sorry I can't provide you with more, but the topic is pretty extensive and, admittedly, the details are beyond my area of expertise. Hopefully, this info should give you a little bit of context for finding a solution.
The problem of tracking fast-moving objects (FMOs) is a known research topic in computer vision. FMOs are defined as objects which move over a distance larger than their size within one video frame. The proposed solutions use classical image processing and energy minimization to establish their trajectories and recover their sharp appearance.
If you need a demo app, I would suggest this GitHub repository: https://github.com/rozumden/fmo-cpp-demo. The demo is written in OpenCV/C++ and runs in real time. The authors also provide a mobile app version, which is still in testing mode. Using this demo app you can track any fast-moving object in real time without even providing an object model. However, if you provide the object size in real-world units, the app can also estimate object speed.
A more sophisticated algorithm is open-sourced here: https://github.com/rozumden/deblatting_python, written in Python and PyTorch for speed-up. The repository contains a solution to the deblatting (deblurring and matting) problem, exactly what happens when a Fast Moving Object appears in front of a camera.

Implementation of state-of-the-art video shot boundary detection

I am working on a larger project involving object retrieval from videos.
According to "A Survey on Visual Content-Based Video Indexing and Retrieval", the most popular methods are divided into:
simple "threshold-based approaches" (global or adaptive)
supervised learning-based classifiers (with SVM or AdaBoost)
unsupervised learning-based algorithms (mainly k-means)
So far I have implemented on my own a very simple and old-fashioned method based on differences of color histograms between successive frames.
Nevertheless, I would like to try something more efficient and up to date, without spending too much time, considering that shot boundary detection is not the main topic of my research.
Does anybody know an implementation of an effective algorithm?
I've found the following implementation extremely useful and effective in my own research:
https://github.com/johmathe/Shotdetect
The workhorse of the codebase happens here:
https://github.com/johmathe/Shotdetect/blob/master/src/film.cc#L117-237
It mostly relies on color information for detection of shots.
Computing the chi-square distance between RGB histograms of adjacent frames is one of the fastest, simplest, and most robust methods for shot boundary detection. You can see my implementation and usage of this method here.
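Not the linked implementation, but a minimal sketch of the same idea in OpenCV/Python; the detection threshold is an assumption that must be tuned per video:

```python
import cv2

def shot_boundaries(path, threshold=0.5):
    """Flag frames whose RGB-histogram chi-square distance to the
    previous frame exceeds `threshold` (an assumed, per-video value)."""
    cap = cv2.VideoCapture(path)
    prev_hist, boundaries, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            dist = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CHISQR)
            if dist > threshold:
                boundaries.append(idx)          # cut between idx-1 and idx
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```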

opencv how to track objects after optical flow?

After doing optical flow (lk) on a video what's the best way to find the objects based on this data and track them?
This probably sounds very noobish, but I would like to be able to define a clear outline around objects, so if it's a weirdly shaped bottle or something, I want to be able to detect its edges.
I'm not sure LK is the best algorithm, since it computes the motion of only a sparse set of corner-like points; tracking usually behaves better on a dense optical flow result (such as Farnebäck or Horn-Schunck). After computing the flow, as a first step you can threshold its norm (to retain the moving parts) and try to extract connected regions from the result. But be warned that your task is not going to be easy if you don't have a model of the object you want to track.
On the other hand, if you are primarily interested in tracking and a bit of interactivity is acceptable, you can have a look at the camshift sample code to see how to select and track an image region based on its appearance.
--- EDIT ---
If your camera is static, then use background subtraction instead. Using OpenCV 2.4 beta, you have to look for the class BackgroundSubtractor and its subclasses in the video module documentation.
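In a modern OpenCV (3.x/4.x) the corresponding entry point is cv2.createBackgroundSubtractorMOG2; a minimal sketch for a static camera (the history/threshold values are defaults, not tuned):

```python
import cv2

cap = cv2.VideoCapture(0)                       # static webcam assumed
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)              # 255 = foreground, 127 = shadow
    # Morphological opening removes isolated noise pixels from the mask.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    cv2.imshow("foreground", mask)
    if cv2.waitKey(1) == 27:                    # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```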
Note also that optical flow can run in real time (or close to it) with good parameter choices, and also with a GPU implementation. On Windows, you can use FlowLib from the TU Graz GPU4Vision group. OpenCV also has some GPU dense optical flow implementations, for example the class gpu::BroxOpticalFlow.
--- EDIT 2 ---
Joining single-pixel detections into big objects is a task called connected component labelling. There is a fast algorithm for it, implemented in OpenCV. This gives you the pipeline:
motion detection (pix level) ---> connected comp. labeling ---> object tracking (adding motion information, possible trajectories for Kalman filtering...).
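A minimal sketch of the first two stages of that pipeline, using Farnebäck dense flow; the flow and area thresholds are assumptions to tune:

```python
import cv2
import numpy as np

def moving_object_candidates(frame1, frame2):
    """Pixel-level motion detection followed by connected component
    labelling; returns bounding boxes of moving regions."""
    prev = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    curr = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    moving = (mag > 1.0).astype(np.uint8)       # 1 px/frame: assumed threshold
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(moving)
    boxes = []
    for i in range(1, n):                       # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] > 100:    # drop tiny blobs (assumed area)
            x, y, w, h = stats[i, :4]
            boxes.append((x, y, w, h))          # feed centroids[i] to a Kalman filter
    return boxes
```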
But we'll have to stop here, because we'll soon be far beyond the scope of your initial question ;-)
You can use TLD or CMT for object tracking (both are loosely based on the idea of doing optical flow tracking and model learning at the same time).
You may find the following links useful:
https://www.gnebehay.com/tld/
https://www.gnebehay.com/cmt/

Simple technique to upsample/interpolate video features?

I'm trying to analyse audio and visual features in tandem. My audio speech features are mel-frequency cepstral coefficients sampled at 100 fps using the Hidden Markov Model Toolkit. My visual features come from a lip-tracking programme I built and are sampled at 29.97 fps.
I know that I need to interpolate my visual features so that the sample rate is also 100 fps, but I can't find a good explanation or tutorial on how to do this online. Most of the help I have found comes from the speech recognition community, which assumes knowledge of interpolation on the part of the reader, i.e. most cover the step with a simple "interpolate the visual features so that the sample rate equals 100 fps".
Can anyone point me in the right direction?
Thanks a million
Since face movement is not low-pass filtered prior to video capture, most of the classic DSP interpolation methods may not apply. You might as well try linear interpolation of your feature vectors to get from one set of time points to another. Just pick the two closest video frames and interpolate to get the data points in between. You could also try spline interpolation if your facial tracking algorithm measures accelerations in face motion.
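A minimal sketch of that linear resampling with NumPy/SciPy; switching kind to "cubic" gives the spline variant mentioned above:

```python
import numpy as np
from scipy.interpolate import interp1d

def resample_features(feats, src_fps=29.97, dst_fps=100.0):
    """Linearly interpolate an (n_frames, n_dims) feature array from
    src_fps to dst_fps over the same time span."""
    feats = np.asarray(feats)
    t_src = np.arange(len(feats)) / src_fps           # original sample times (s)
    t_dst = np.arange(0.0, t_src[-1], 1.0 / dst_fps)  # stays inside the range
    f = interp1d(t_src, feats, axis=0, kind="linear") # kind="cubic" for spline
    return f(t_dst)

visual_100fps = resample_features(np.random.rand(300, 20))  # toy ~10 s clip
```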
