I am using Opencv (Python) default trackers in an object detection and tracking task. As far as I know, we need to initialize some bounding boxes in order to make our trackers work so a detection algorithm will handle this task for us. The problem is if one (or even more) object was not detected in the initializing phase, the trackers will have to deal with it. The simplest solution is to perform the detection algorithm several times through frames to make sure that all objects are included, which is probably very computationally expensive.
I wonder if there exists an effective way to deal with this problem?
Most people do it manually on the first frame using the cv2.selectROI() method, which returns a bounding box. This is described in some detail here:
https://www.learnopencv.com/multitracker-multiple-object-tracking-using-opencv-c-python/
For cases where that is not practical (e.g., thousands of objects), you can automate the object detection part, and just make sure you find a good frame to start with that gets things right.
Related
I'm trying to create an application that will be able to track rapidly moving objects in video/camera feed, however have not found any CV/DL solution that is good enough. Can you recommend any computer vision solution for tracking fast moving objects on regular laptop computer and web cam? A demo app would be ideal.
For example see this video where the tracking is done in hardware (I'm looking for software solution) : https://www.youtube.com/watch?v=qn5YQVvW-hQ
Target tracking is a very difficult problem. In target tracking you will have two main issues: the motion uncertainty problem, and the origin uncertainty problem. The first one refers to the way you model object motion so you can predict its future state, and the second refers to the issue of data association(what measurement corresponds to what track, and the literature is filled with scientific ways in which this issue can be approached).
Before you can come up with a solution to your problem you will have to answer some questions yourself, regarding the tracking problem you want to solve. For example: what are the values that you what to track(this will define your state vector), how are those values related to one another, are you trying to perform single object tracking or multiple object tracking, how are the objects moving( do they have a relatively constant acceleration or velocity ) or not, do objects make turns, can objects also be occluded or not and so on.
The Kalman Filter is good solution to predict the next state of your system (once you have identified your process model). A deep learning alternative to the Kalman filter is the so called Deep Kalman Filter which essentially is used to do the same thing. In case your process or measurement models are not linear, you will have to linearize them before predicting the next state. Some solutions that deal with non-linear process or measurement models are the Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF).
Now related to fast moving objects, an idea you can use is to have a larger covariance matrix since the objects can move a lot more if they are fast, so the search space for the correct association has to be a bit larger. Additionally you can use multiple motion models in case your motion model cannot be satisfied with only one model.
In case of occlusions I will leave you this stack overflow thread, where I have given an answer covering more details regarding occlusion handling in case of tracking. I have added some references for you to read. You will have to provide more details in your question, if you would like to receive more information regarding a solution (for example you should define fast moving objects with respect to camera frame rate).
I personally do not think there is a silver bullet solution for the tracking problem, I prefer to tailor a solution to the problem I am trying to solve.
The tracking problem is complicated. It is also more in the realm of control systems than computer vision. It would be also helpful to know more about your situation, as the performance of the chosen method pretty much depends on your problem constraints. Are you interested in real-time tracking? Are you trying to reconstruct an existing trajectory? Are there multiple targets? Just one? Are the physical properties of the targets (i.e. velocity, direction, acceleration) constant?
One of the most basic tracking methods is implemented by a Linear Dynamic System (LDS) description, in concrete, a discrete implementation, since we’re working with discrete frames of information. This method is purely based on physics, and its prediction is very sensitive. Depending on your application, the error rate could be acceptable… or not.
A more robust solution is Kalman’s Filter, and it is pretty much the go-to answer when tracking is needed. It implements prediction based on all the measurements obtained so far during the model's lifetime. It mainly works on constant-based measurements (velocity and acceleration) although it can be extended to handle non-constant models. If you are working with targets that won't exhibit a drastic change in their velocity, this is what you (probably) should implement.
I'm sorry I can't provide you with more, but the topic is pretty extensive and, admittedly, the details are beyond my area of expertise. Hopefully, this info should give you a little bit of context for finding a solution.
The problem of tracking fast-moving objects (FMO) is a known research topic in computer vision. FMOs are defined as objects which move over a distance larger than their size in one video frame. The solutions which have been proposed use classical image processing and energy minimization to establish their trajectories and sharp appearance.
If you need a demo app, I would suggest this GitHub repository: https://github.com/rozumden/fmo-cpp-demo. The demo is written in OpenCV/C++ and runs in real-time. The authors also provide a mobile app version, which is still in testing mode. Using this demo app you can track any fast moving objects in real-time without even providing an object model. However, if you provide object size in real-world units, the app can also estimate object speed.
A more sophisticated algorithm is open-sourced here: https://github.com/rozumden/deblatting_python, written in Python and PyTorch for speed-up. The repository contains a solution to the deblatting (deblurring and matting) problem, exactly what happens when a Fast Moving Object appears in front of a camera.
I am working on an open source package for robot owners. I want to do a decent job of detecting when the robot is having movement problems. One of the problems the robot commonly has is that the back wheel gets "tucked underneath" in a bad way and makes it turn very slowly when on carpet. I believe that with a combination of accelerometer value inspection and (I hope) a relatively simple yet robust vision analysis technique, I will be able to tell when the robot is having this specific problem.
What I need is to be able to analyze two images, separated by about 1/2 second in time, and get a numerical value that tells about how close they are, but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis. I've heard of an algorithm called optical flow that is used in object and scene tracking, but I'm hoping I don't need something heavyweight.
Is there an machine vision algorithm/function that can analyze two JPEG's and tell if they belong to the same scene and viewpoint, yet can also deliver a numerical monotonically increasing value that tells me rough how different they are? If I could get that numerical value and compare it to the number of milliseconds past, while examining the current accelerometer activity, I believe I can detect when the robot is having the "slow turn of death" problem.
If so, please tell me the basic technique involved, and if you know of machine vision library that implements it, which one it is.
but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis
What you are suggesting is a complex problem by itself, so forget about 'lightweight' solutions. Probably you are going to need something like optical flow.
Other options I would recommend you looking into are:
Vanishing points detection and variation from image to image. This quite fits into your problem domain Wikipedia
Disparity map: related to optical flow. Used for stereographic vision, but I think you can use it for the kind of application you are looking for. Take a look at this
I'm starting a search to implement a system that must count people flow of some place.
The final idea is to have something like http://www.youtube.com/watch?v=u7N1MCBRdl0 . I'm working with OpenCv to start creating it, I'm reading and studying about. But I'd like to know if some one can give me some hints of source code exemples, articles and anything elese that can make me get faster on my deal.
I started with blobtrack.exe sample to study, but I got not good results.
Tks in advice.
Blob detection is the correct way to do this, as long as you choose good threshold values and your lighting is even and consistent; but the real problem here is writing a tracking algorithm that can keep track of multiple blobs, being resistant to dropped frames. Basically you want to be able to assign persistent IDs to each blob over multiple frames, keeping in mind that due to changing lighting conditions and due to people walking very close together and/or crossing paths, the blobs may drop out for several frames, split, and/or merge.
To do this 'properly' you'd want a fuzzy ID assignment algorithm that is resistant to dropped frames (ie blob ID remains, and ideally predicts motion, if the blob drops out for a frame or two). You'd probably also want to keep a history of ID merges and splits, so that if two IDs merge to one, and then the one splits to two, you can re-assign the individual merged IDs to the resulting two blobs.
In my experience the openFrameworks openCv basic example is a good starting point.
I'll not put this as the right answer.
It is just an option for those who are able to read in Portugues or can use a translator. It's my graduation project and there is the explanation of a option to count people in it.
Limitations:
It's do not behave well on envirionaments that change so much the background light.
It must be configured for each location that you will use it.
Advantages:
It's fast!
I used OpenCV to do the basic features as, capture screen, go trough the pixels, etc. But the algorithm to count people was done by my self.
You can check it on this paper
Final opinion about this project: It's not prepared to go alive, to became a product. But it works very well as base for study.
I am working on a computer vision application and I am stuck at a conceptual roadblock. I need to recognize a set of logos in a video, and so far I have been using feature matching methods like SIFT (and ASIFT by Yu and Morel), SURF, FERNS -- basically everything in the "Common Interfaces of Generic Descriptor Matchers" section of the OpenCV documentation. But recently I have been researching methods used in OCR/Random Trees classifier (I was playing with this dataaset: http://archive.ics.uci.edu/ml/datasets/Letter+Recognition) and thinking that this might be a better way to go about finding the logos. The problem is that I can't find a reliable way to automatically segment an arbitrary image.
My questions:
Should I bother looking into methods other than descriptor/keypoint, or is this the
best way to recognize a typical logo (stylized, few colors, sharp edges)?
How can I segment an arbitary image (or a video frame, in my case) so that I can properly
match against a sample database?
It would seem that HaarCascades work in a similar way (databases of samples), but I
can't figure out how the processes are related. Is there segmentation going on there?
Sorry of these questions are too broad. I'm trying to wrap my head around this stuff with little help. Thanks!
It seems like segmentation is not what you want. I think it has to do more with object detection and recognition. You want to detect the presence of a certain set of logos, in a certain set of images. This doesn't seem related to segmentation which is about labeling surfaces or areas of a common color, texture, shape, etc., although examining segmentation based methods may be useful.
I would definitely encourage you to look at problem and examine all possible methods that can be applied, not only the fashionable ones (such as SIFT, GLOH, SURF, etc). I would recommend you look at older, simpler methods like simple template matching, chamfering, etc.
Haar cascades became popular after a 2000 paper by Viola and Jones used for face detection (similar to what you see in modern point and click cameras). It does sound a bit similar to the problem you are interested in. You should perhaps also examine this part of the problem, but try not to focus too much on the learning part.
I had an idea for which I need to be able to recognize certain objects or models from a rendered three dimensional digital movie.
After limited research, I know now that what I need is called feature detection in the field of Computer Vision.
So, what I want to do is:
create a few screenshots of a certain character in the movie (eg. front/back/leftSide/rightSide)
play the movie
while playing the movie, continuously create new screenshots of the movie
for each screenshot, perform feature detection (SIFT?, with openCV?) to see if any of our character appearances are there (they must still be recognized if the character is further away and thus appears smaller, or if the character is eg. lying down).
give a notice whenever the character is found
This would be possible with OpenCV, right?
The "issue" is that I would have to learn c++ or python to develop this application. This is not a problem if my movie and screenshots are applicable for what I want to do.
So, I would like to first test my screenshots of the movie. Is there a GUI version of OpenCV that I can input my test data and then execute it's feature detection algorithms manually as a means of prototyping?
Any feedback is appreciated. Thanks.
There is no GUI of OpenCV able to do what you want. You will be able to use OpenCV for some aspects of your problem, but there is no ready-made solution waiting there for you.
While it's definitely possible to solve your problem, the learning curve for this problem is quite long. If you're a professional, then an alternative to learning about it yourself would be to hire an expert to do it for you. It would cost money, but save you time.
EDIT
As far as template matching goes, you wouldn't normally use it to solve such a problem because the thing you're looking for is changing appearance and shape. There aren't really any "dynamic parameters to set". The closest thing you could try is have a massive template collection that would try to cover the expected forms that your target may take. But it would hardly be an elegant solution. Plus it wouldn't scale.
Next, to your point about face recognition. This is kind of related, but most facial recognition applications deal with a controlled environment: lighting, distance, pose, angle, etc. Outside of that controlled environment face detection effectiveness drops significantly. If you're detecting objects in a movie, then your environment isn't really controlled.
You may want to first try a simpler problem of accurately detecting where the characters are, without determining who they are (video surveillance, essentially). While it may sound simple, you'll find that it's actually non-trivial for arbitrary scenes. The result of solving that problem may be useful in identifying the characters.
There is Find-Object by Mathieu Labbé. It was very helpful for me to start getting an understanding of the descriptors since you can change them while your video is running to see what happens.
This is probably too late, but might help someone else looking for a solution.
Well, using OpenCV you would of taking a frame of a video file and do any computations on it.
You can do several different methods of detecting a character on that image, but it's not so easy to have it as flexible so you can even get that person if it's lying on the floor for example, if you only entered reference images of that character standing.
Basically you could try extracting all important features from your set of reference pictures and have a (in your case supervised) learning algorithm that gets a good feature-vector of that character for classification.
You then need to write your code that plays the video and which takes a video frame let's say each 500ms (or other as you desire), gets a segmentation of the object you thing would be that character and compare it with the reference values you get from your learning algorithm. If there's a match, your code can yell "Yehaaawww!" or do other things...
But all this depends on how flexible you want this to be. You could also try a template match or cross-correlation which basically shifts the reference image(s) over the frame and checks how equal both parts are. But this unfortunately is very sensitive for rotation, deformations or other noise... so you wouldn't get that person if its i.e. laying down. And I doubt you can get all those calculations done in realtime...
Basically: Yes OpenCV is good to use for your image processing/computer vision tasks. But it offers a lot of methods and ways and you'd need to find a way that works for your images... it's not a trivial task though...
Hope that helps...
Have you tried looking at some of the work of the Oxford visual geometry group?
Their Video Google system describes to a large extent what you want, instance detection.
Their work into Naming People in TV shows is also pretty relevant. A face detection and facial feature pipeline is included that can be run from Matlab. Are you familiar with Matlab?
Have you tried computer vision frameworks like Cassandra? There you can exactly do that just by some mouse clicks.