I'm starting research into implementing a system that counts the flow of people through a location.
The final idea is to have something like . I'm working with OpenCV to start building it, and I'm reading and studying the subject. I'd like to know if someone can give me pointers to source code examples, articles, or anything else that would get me up to speed faster.
I started by studying the blobtrack.exe sample, but I did not get good results.
Blob detection is the correct way to do this, as long as you choose good threshold values and your lighting is even and consistent; but the real problem here is writing a tracking algorithm that can keep track of multiple blobs, being resistant to dropped frames. Basically you want to be able to assign persistent IDs to each blob over multiple frames, keeping in mind that due to changing lighting conditions and due to people walking very close together and/or crossing paths, the blobs may drop out for several frames, split, and/or merge.
To do this 'properly' you'd want a fuzzy ID assignment algorithm that is resistant to dropped frames (i.e., the blob ID persists, and ideally its motion is predicted, if the blob drops out for a frame or two). You'd probably also want to keep a history of ID merges and splits, so that if two IDs merge into one, and then the one splits into two, you can re-assign the original IDs to the resulting two blobs.
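As a rough illustration of the ID-assignment idea, here is a minimal sketch of a tracker that matches blob centroids to existing IDs by distance and lets an ID coast on a constant-velocity guess while its blob is missing. The class name, the `max_distance` and `max_missed` parameters, and the greedy nearest-first matching are all assumptions made for this example; merge/split history is not handled.

```python
import math


class CentroidTracker:
    """Greedy nearest-centroid tracker that tolerates a few dropped frames."""

    def __init__(self, max_distance=50.0, max_missed=2):
        self.max_distance = max_distance  # largest jump (pixels) accepted as the same blob
        self.max_missed = max_missed      # frames a blob may vanish before its ID is retired
        self.next_id = 0
        self.tracks = {}                  # id -> {"pos": (x, y), "vel": (vx, vy), "missed": n}

    def update(self, centroids):
        """Match this frame's centroids to known IDs; return {id: (x, y)}.

        IDs whose blob is temporarily missing are returned at their predicted position.
        """
        # Predict where every known blob should be now (constant-velocity guess).
        predicted = {
            tid: (t["pos"][0] + t["vel"][0], t["pos"][1] + t["vel"][1])
            for tid, t in self.tracks.items()
        }
        # Collect every (distance, id, detection index) pair that is close enough.
        pairs = []
        for tid, p in predicted.items():
            for j, c in enumerate(centroids):
                d = math.dist(p, c)
                if d <= self.max_distance:
                    pairs.append((d, tid, j))
        pairs.sort()

        used_ids, used_dets = set(), set()
        for _, tid, j in pairs:  # the closest pairs claim each other first
            if tid in used_ids or j in used_dets:
                continue
            old, new = self.tracks[tid]["pos"], centroids[j]
            self.tracks[tid] = {
                "pos": new,
                "vel": (new[0] - old[0], new[1] - old[1]),
                "missed": 0,
            }
            used_ids.add(tid)
            used_dets.add(j)

        # Unmatched IDs coast on their prediction, then expire.
        for tid in list(self.tracks):
            if tid in used_ids:
                continue
            track = self.tracks[tid]
            track["missed"] += 1
            if track["missed"] > self.max_missed:
                del self.tracks[tid]
            else:
                track["pos"] = predicted[tid]

        # Unmatched detections start new IDs.
        for j, c in enumerate(centroids):
            if j not in used_dets:
                self.tracks[self.next_id] = {"pos": c, "vel": (0.0, 0.0), "missed": 0}
                self.next_id += 1

        return {tid: t["pos"] for tid, t in self.tracks.items()}
```

Calling `update()` once per frame with the list of blob centroids keeps an ID alive through a short drop-out. A production tracker would likely replace the greedy matching with an optimal assignment and add the merge/split bookkeeping described above.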
In my experience the openFrameworks openCv basic example is a good starting point.

This is just an option for those who can read Portuguese or can use a translator. It's my graduation project, and it explains one way to count people.
It does not behave well in environments where the background lighting changes a lot.
It must be configured for each location where you will use it.
It's fast!
I used OpenCV for the basic features, such as capturing the screen, going through the pixels, etc. But I wrote the people-counting algorithm myself.
You can check it in this paper
Final opinion about this project: it's not ready to go live and become a product. But it works very well as a base for study.


How to track Fast Moving Objects?

I'm trying to create an application that can track rapidly moving objects in a video/camera feed; however, I have not found any CV/DL solution that is good enough. Can you recommend a computer vision solution for tracking fast-moving objects on a regular laptop with a webcam? A demo app would be ideal.
For example see this video where the tracking is done in hardware (I'm looking for software solution) :
Target tracking is a very difficult problem. In target tracking you will have two main issues: the motion uncertainty problem and the origin uncertainty problem. The first refers to the way you model object motion so you can predict its future state, and the second refers to the issue of data association (which measurement corresponds to which track; the literature is filled with ways in which this issue can be approached).
Before you can come up with a solution to your problem, you will have to answer some questions yourself regarding the tracking problem you want to solve. For example: what are the values that you want to track (this will define your state vector)? How are those values related to one another? Are you trying to perform single-object or multiple-object tracking? How are the objects moving (do they have a relatively constant acceleration or velocity, or not)? Do objects make turns? Can objects be occluded? And so on.
The Kalman filter is a good solution for predicting the next state of your system (once you have identified your process model). A deep learning alternative to the Kalman filter is the so-called Deep Kalman Filter, which is essentially used to do the same thing. If your process or measurement models are not linear, you will have to linearize them before predicting the next state. Some solutions that deal with non-linear process or measurement models are the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF).
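To make the predict/update cycle concrete, here is a minimal sketch of a linear Kalman filter for a single coordinate with a constant-velocity process model. The state is [position, velocity] and only the position is measured. The class name, the default noise values `q` and `r`, and the simplified diagonal process noise are assumptions for this example, not values tuned for any real tracker.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position-only measurements."""

    def __init__(self, position, velocity=0.0, p_var=100.0, q=1.0, r=4.0, dt=1.0):
        self.x = [float(position), float(velocity)]  # state estimate
        self.P = [[p_var, 0.0], [0.0, p_var]]         # state covariance
        self.q = q    # process noise added to each diagonal term (assumed simplification)
        self.r = r    # measurement noise variance
        self.dt = dt  # time between frames

    def predict(self):
        """Advance the state one frame: x = F x, P = F P F^T + Q."""
        dt = self.dt
        p, v = self.x
        (p00, p01), (p10, p11) = self.P
        self.x = [p + dt * v, v]
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with a measured position z."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r            # innovation variance
        k0, k1 = p00 / s, p10 / s   # Kalman gain for position and velocity
        y = z - self.x[0]           # innovation (measurement minus prediction)
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
        return self.x[0]
```

A typical loop calls `predict()` once per frame and `update(z)` whenever a measurement is available. When the object is briefly lost you can keep calling `predict()` alone and use its output as the search location. Tracking an (x, y) point can be done with one such filter per axis, or with a single 4-state filter.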
Now, regarding fast-moving objects, one idea is to use a larger covariance matrix: fast objects can move a lot more between frames, so the search space for the correct association has to be a bit larger. Additionally, you can use multiple motion models if the objects' motion cannot be described by a single model.
For occlusions, I will leave you this Stack Overflow thread, where I have given an answer covering more details on occlusion handling in tracking. I have added some references for you to read. You will have to provide more details in your question if you would like more information regarding a solution (for example, you should define "fast-moving objects" with respect to the camera frame rate).
Personally, I do not think there is a silver-bullet solution for the tracking problem; I prefer to tailor a solution to the problem I am trying to solve.
The tracking problem is complicated. It is also more in the realm of control systems than computer vision. It would also be helpful to know more about your situation, as the performance of the chosen method depends heavily on your problem constraints. Are you interested in real-time tracking? Are you trying to reconstruct an existing trajectory? Are there multiple targets? Just one? Are the physical properties of the targets (i.e. velocity, direction, acceleration) constant?
One of the most basic tracking methods is a Linear Dynamic System (LDS) description, specifically a discrete-time one, since we're working with discrete frames of information. This method is based purely on physics, and its predictions are very sensitive to errors in the measurements. Depending on your application, the error rate could be acceptable… or not.
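As a minimal sketch of what such a discrete, purely physics-based prediction can look like, the function below extrapolates the next position from the last two observed positions, assuming the velocity between them stays constant. The function name and parameters are made up for this illustration.

```python
def predict_next(p_prev, p_curr, dt=1.0, steps=1):
    """Extrapolate a 2-D position, assuming the last observed velocity stays constant."""
    vx = (p_curr[0] - p_prev[0]) / dt
    vy = (p_curr[1] - p_prev[1]) / dt
    return (p_curr[0] + vx * dt * steps, p_curr[1] + vy * dt * steps)


# An object seen at (0, 0) and then at (2, 1) is expected at (4.0, 2.0) next.
print(predict_next((0, 0), (2, 1)))
```

The sensitivity is easy to see here: if the second observation is off by one pixel, e.g. (3, 1) instead of (2, 1), the prediction becomes (6.0, 2.0), so the error doubles after a single step because nothing filters the measurements.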
A more robust solution is the Kalman filter, and it is pretty much the go-to answer when tracking is needed. Its predictions are based on all the measurements obtained so far during the model's lifetime. It mainly works with constant-velocity or constant-acceleration motion models, although it can be extended to handle non-constant models. If you are working with targets that won't exhibit a drastic change in their velocity, this is what you (probably) should implement.
I'm sorry I can't provide you with more, but the topic is pretty extensive and, admittedly, the details are beyond my area of expertise. Hopefully, this info should give you a little bit of context for finding a solution.
The problem of tracking fast-moving objects (FMO) is a known research topic in computer vision. FMOs are defined as objects which move over a distance larger than their size in one video frame. The solutions which have been proposed use classical image processing and energy minimization to establish their trajectories and sharp appearance.
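The definition above is easy to turn into a check you can run on your own footage to see whether your objects actually count as FMOs at your frame rate. This is only an illustrative sketch: the function name is made up, and "size" is taken here to be the object's diameter in pixels.

```python
import math


def is_fast_moving(prev_center, curr_center, object_size):
    """True if the object travelled farther than its own size between two frames."""
    return math.dist(prev_center, curr_center) > object_size


# A 20-pixel ball that moves 50 pixels between consecutive frames is an FMO.
print(is_fast_moving((0, 0), (30, 40), 20))
```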
If you need a demo app, I would suggest this GitHub repository: The demo is written in OpenCV/C++ and runs in real-time. The authors also provide a mobile app version, which is still in testing mode. Using this demo app you can track any fast moving objects in real-time without even providing an object model. However, if you provide object size in real-world units, the app can also estimate object speed.
A more sophisticated algorithm is open-sourced here:, written in Python and PyTorch for speed-up. The repository contains a solution to the deblatting (deblurring and matting) problem, exactly what happens when a Fast Moving Object appears in front of a camera.

Publishing images without their source

I have more than a million images that I would like to use as training data. How do I make this data freely available without compromising security?
I want users to be able to use it quickly for training purposes, without giving hackers a chance to rebuild the images from the open-source data. At the same time, I do not want training quality to be affected in any way.
In other words how do I safely open-source images?
For example, this code generates a NumPy array. I just want to make it very difficult to reconstruct the original image from the ndarray "x" in this case.
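from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
# Load the image from disk as a PIL image.
i = load_img('some_image.jpg')
# Convert it to an array of shape (height, width, channels).
x = img_to_array(i)
# Add a leading batch dimension: (1, height, width, channels).
x = x.reshape((1,) + x.shape)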
I can share the array x once I know that hackers cannot use the data to recreate the same image.
If you aim to publish open-source pictures, a good start would be to understand how Wikimedia Commons works. They have faced, and still face, many challenges of this kind, so there is a lot to learn from them.
If your audience needs the complete picture to make their models work, then no matter how you try to obfuscate the array containing the data, smart people with enough time and creativity will be able to reconstruct the original picture. This is not a viable solution; it only provides a false sense of security.
If you choose a destructive approach, serving not the actual picture but some digest/hash/fingerprint of it, then you will probably reduce the risk of the original picture being reconstructed (beware: there are very clever people with strong cryptographic skills). But then your audience will not be able to learn from the picture itself, so you may not achieve your goal.
A less destructive approach, which may not fit your requirements, is adding noise. It will not prevent disclosure of sensitive material (human eyes and brains are rather good at classification), and it is a well-known technique for confusing AI models. Not a good solution either.
In any case, if you carelessly serve sensitive material that is not suitable for open source, you may get yourself and other people in trouble. This is not a good option.
My advice:
If your pictures really are suitable for open source, then serve them as they are and do not worry about hackers; they are customers as well.
If your pictures are sensitive, then do not serve them as open source. Instead, provide a framework with a layer of security and implement the regulations you must take into account (ToS, IP, copyright, GDPR).
All machine learning algorithms take the real images and convert the images to tensors, and process them in batches (multiple images at a time).
A couple of options for you:
You can share your images with your teammates and rely on trust.
You can somehow obfuscate the images as a bunch of files, or you can write an algorithm that converts them to NumPy arrays (or tensors), obfuscates them, and comes with a procedure to revert them without loss.
But in all these cases, unwanted people can somehow guess your procedure/obfuscation.
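To illustrate what "obfuscate and revert without loss" can mean, and why it only protects you as long as the procedure and key stay secret, here is a minimal sketch that shuffles a flat list of pixel values with a keyed pseudo-random permutation and then undoes it. The function names and the idea of using a seeded shuffle are assumptions for this example; this is not cryptography, and it leaves the pixel value histogram untouched.

```python
import random


def shuffle_pixels(pixels, key):
    """Return the pixels in a key-dependent pseudo-random order."""
    order = list(range(len(pixels)))
    random.Random(key).shuffle(order)
    return [pixels[i] for i in order]


def unshuffle_pixels(shuffled, key):
    """Invert shuffle_pixels by rebuilding the same order from the key."""
    order = list(range(len(shuffled)))
    random.Random(key).shuffle(order)
    original = [None] * len(shuffled)
    for position, source_index in enumerate(order):
        original[source_index] = shuffled[position]
    return original


pixels = list(range(12))               # stand-in for a flattened image
scrambled = shuffle_pixels(pixels, key=1234)
restored = unshuffle_pixels(scrambled, key=1234)
print(restored == pixels)
```

Anyone who needs to train on the real images must be given the key and the revert procedure, at which point they hold the originals.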
The ideal would be to train a machine learning model (like VGG, ResNet, Inception) on your images, and then distribute the model that has learned what you planned from your images.
Bottom line, in ML you need images to learn something from them, and not the images per se.
Privacy is really a problem as we can see from this document dealing with how copyright is causing a decay in public datasets.
There are not many solutions to this problem, because privacy really matters. However, this idea with GANs may be encouraging.
If you don't use GANs, it is hard to tell which set of transforms you would need to apply to get around the privacy concerns.
Simply flipping the images, scaling them, removing the metadata, normalizing them, or changing one pixel is not enough. You would need to make them indistinguishable from the originals.

Image analysis technique to determine approximate change in view over a short period of time?

I am working on an open source package for robot owners. I want to do a decent job of detecting when the robot is having movement problems. One of the problems the robot commonly has is that the back wheel gets "tucked underneath" in a bad way and makes it turn very slowly when on carpet. I believe that with a combination of accelerometer value inspection and (I hope) a relatively simple yet robust vision analysis technique, I will be able to tell when the robot is having this specific problem.
What I need is to be able to analyze two images, separated by about 1/2 second in time, and get a numerical value that tells about how close they are, but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis. I've heard of an algorithm called optical flow that is used in object and scene tracking, but I'm hoping I don't need something heavyweight.
Is there a machine vision algorithm/function that can analyze two JPEGs and tell if they belong to the same scene and viewpoint, yet can also deliver a monotonically increasing numerical value that tells me roughly how different they are? If I could get that numerical value and compare it to the number of milliseconds elapsed, while examining the current accelerometer activity, I believe I can detect when the robot is having the "slow turn of death" problem.
If so, please tell me the basic technique involved, and if you know of a machine vision library that implements it, which one it is.
but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis
What you are suggesting is a complex problem by itself, so forget about 'lightweight' solutions. Probably you are going to need something like optical flow.
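Before committing to full optical flow, it may be worth trying a much cruder stand-in: asking by how many columns the whole view slid sideways between the two frames. The sketch below is not optical flow; it is a brute-force global shift search over small grayscale frames (lists of rows), written for this answer with made-up names. The returned shift grows with the amount of turning, and the returned error says how well the two frames still line up.

```python
def estimate_horizontal_shift(frame_a, frame_b, max_shift=8):
    """Return (shift, error) for the column shift that best aligns the two frames.

    Frames are equal-sized lists of rows of grayscale values. A positive shift
    means the content of frame_a appears `shift` columns further right in frame_b.
    """
    height, width = len(frame_a), len(frame_a[0])
    best_shift, best_error = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        total, count = 0, 0
        for y in range(height):
            for x in range(width):
                xb = x + shift
                if 0 <= xb < width:  # compare only the overlapping columns
                    total += abs(frame_a[y][x] - frame_b[y][xb])
                    count += 1
        if count == 0:
            continue
        error = total / count  # mean absolute difference over the overlap
        if error < best_error:
            best_shift, best_error = shift, error
    return best_shift, best_error
```

In practice you would downscale both JPEGs to something small (say 64 pixels wide) before calling this, divide the shift by the elapsed milliseconds to get a turn-rate proxy, and treat a large error as "not the same scene anymore". Whether this is robust enough for carpet and changing light is something only testing on the robot can tell.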
Other options I would recommend looking into are:
Vanishing point detection and its variation from image to image. This fits your problem domain quite well (see Wikipedia).
Disparity map: related to optical flow. Used for stereographic vision, but I think you can use it for the kind of application you are looking for. Take a look at this

Use Azure Machine learning to detect symbol within an image

Four years ago I posted this question and got a few answers that were unfortunately beyond my skill level. I just attended a Build tour conference where they spoke about machine learning, and this got me thinking about the possibility of using ML as a solution to my problem. I found this on the Azure site, but I don't think it will help me because its scope is pretty narrow.
Here is what I am trying to achieve:
I have a source image:
and I want to know which of the following symbols (if any) are contained in the image above:
The comparison needs to tolerate minor distortion, scaling, color differences, rotation, and brightness differences.
The number of symbols to match will ultimately be greater than 100.
Is ML a good tool to solve this problem? If so, any starting tips?
As far as I know, Project Oxford (MS Azure CV API) wouldn't be suitable for your task. Their APIs are very focused on face-related tasks (detection, verification, etc.), OCR, and image description. And apparently you can't extend their models or train new ones from the existing ones.
However, even though I don't know of an out-of-the-box solution for your object detection problem, there are easy enough approaches that you could try and that would give you some baseline results.
For instance, here is a naive method you could use:
1) Create your dataset:
This is probably the most tedious step and, paradoxically, a crucial one. I will assume you have a good number of images to work with. What you need to do is pick a fixed window size and extract positive and negative examples.
If some of the images in your dataset have different sizes, you will need to rescale them to a common size. You don't need to get too crazy about the size; 30x30 images would probably be more than enough. To make things easier, I would convert the images to grayscale too.
2) Pick a classification algorithm and train it:
There are an awful lot of classification algorithms out there. But if you are new to machine learning, I would pick the one you understand best. Keeping that in mind, I would check out logistic regression, which gives decent results, is easy enough for beginners, and has a lot of libraries and tutorials. For instance, this one or this one. At first I would focus on a binary classification problem (like whether there is a UD logo in the picture or not), and when you master that one you can jump to the multi-class case. There are resources for that too, or you can always have several models, one per logo, and run this recipe for each one separately.
To train your model, you just need to read the images generated in step 1, turn each of them into a vector, and label them accordingly. That is the dataset that will feed your model. If you are using grayscale images, then each position in the vector corresponds to a pixel value in the range 0-255. Depending on the algorithm, you might need to rescale those values to the range [0, 1] (some algorithms perform better with values in that range). Note that rescaling the range in this case is fairly easy (new_value = value/255).
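A minimal sketch of that vectorization step, assuming each example is already a grayscale image stored as a list of rows of 0-255 integers (the function names are made up for illustration):

```python
def image_to_feature_vector(gray_image):
    """Flatten a grayscale image (rows of 0-255 ints) into floats in [0, 1]."""
    return [pixel / 255 for row in gray_image for pixel in row]


def build_dataset(positive_images, negative_images):
    """Return (X, y): one feature vector per image, labelled 1 (logo) or 0 (no logo)."""
    X = [image_to_feature_vector(img) for img in positive_images + negative_images]
    y = [1] * len(positive_images) + [0] * len(negative_images)
    return X, y


# A tiny 2x2 "image" becomes a vector of four values in [0, 1].
print(image_to_feature_vector([[0, 255], [51, 102]]))
```

With 30x30 windows each vector has 900 entries, which is small enough for logistic regression in any common library.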
You also need to split your dataset, reserving some examples for training, a subset for validation and another one for testing. Again, there are different ways to do this, but I'm keeping this answer as naive as possible.
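One simple way to do the split, shown here as a sketch with arbitrary 70/15/15 proportions and a fixed seed so the split is repeatable (both are assumptions, not recommendations from any particular library):

```python
import random


def split_dataset(samples, train=0.7, validation=0.15, seed=0):
    """Shuffle the samples and split them into train / validation / test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever is left becomes the test set
    )
```

If `samples` is a list of (vector, label) pairs, the labels stay attached to their vectors through the shuffle.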
3) Perform the detection:
So now let's start the fun part. Given any image, you want to run your model and produce the coordinates in the picture where there is a logo. There are different ways to do this, and I will describe one that is probably neither the best nor the most efficient, but is easier to develop in my opinion.
You are going to scan the picture, extracting the pixels in a "window", rescaling those pixels to the size you selected in step 1 and then feed them to your model.
If the model gives you a positive answer, then you mark that window in the original image. Since the logo might appear at different scales, you need to repeat this process with different window sizes. You would also need to tweak the amount of space between windows.
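The scanning loop can be sketched as follows. The window sizes, the stride expressed as a fraction of the window size, and the `classify_window` callback are all placeholders for this illustration; the callback is where you would crop the window, rescale it to the size from step 1, and ask your trained model.

```python
def sliding_windows(image_width, image_height, window_sizes=(30, 60), stride_fraction=0.5):
    """Yield (x, y, size) for square windows at several scales."""
    for size in window_sizes:
        stride = max(1, int(size * stride_fraction))  # space between windows
        for y in range(0, image_height - size + 1, stride):
            for x in range(0, image_width - size + 1, stride):
                yield x, y, size


def detect(image_width, image_height, classify_window, **kwargs):
    """Return every window for which the (hypothetical) classifier answers True."""
    return [
        window
        for window in sliding_windows(image_width, image_height, **kwargs)
        if classify_window(*window)
    ]


# On a 60x30 image, 30-pixel windows with a 15-pixel stride give three positions.
print(list(sliding_windows(60, 30, window_sizes=(30,))))
```

A smaller stride finds logos more reliably but multiplies the number of windows the model has to score, which is the trade-off behind tweaking the spacing.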
4) Rinse and repeat:
On the first iteration it's very likely that you will get a lot of false positives. Take those as negative examples and retrain your model. This is an iterative process, and hopefully on each iteration you will have fewer and fewer false positives and fewer false negatives.
Once you are reasonably happy with your solution, you might want to improve it. You might want to try other classification algorithms, like SVMs or deep neural networks, or try better object detection frameworks like Viola-Jones. Also, you will probably need to use cross-validation to compare all your solutions (you can actually use cross-validation from the beginning). By this point I bet you will be confident enough to want to use OpenCV or another ready-to-use framework, in which case you will have a fair understanding of what is going on under the hood.
Also, you could just disregard this whole answer and go for an OpenCV object detection tutorial like this one. Or take another answer from another question like this one. Good luck!

GUI version of OpenCV for feature-detection (SIFT etc.) prototyping before actual project development?

I had an idea for which I need to be able to recognize certain objects or models from a rendered three dimensional digital movie.
After limited research, I know now that what I need is called feature detection in the field of Computer Vision.
So, what I want to do is:
create a few screenshots of a certain character in the movie (eg. front/back/leftSide/rightSide)
play the movie
while playing the movie, continuously create new screenshots of the movie
for each screenshot, perform feature detection (SIFT?, with openCV?) to see if any of our character appearances are there (they must still be recognized if the character is further away and thus appears smaller, or if the character is eg. lying down).
give a notice whenever the character is found
This would be possible with OpenCV, right?
The "issue" is that I would have to learn C++ or Python to develop this application. This is not a problem if my movie and screenshots are suitable for what I want to do.
So, I would like to first test my screenshots of the movie. Is there a GUI version of OpenCV into which I can feed my test data and then run its feature detection algorithms manually as a means of prototyping?
Any feedback is appreciated. Thanks.
There is no GUI of OpenCV able to do what you want. You will be able to use OpenCV for some aspects of your problem, but there is no ready-made solution waiting there for you.
While it's definitely possible to solve your problem, the learning curve for this problem is quite long. If you're a professional, then an alternative to learning about it yourself would be to hire an expert to do it for you. It would cost money, but save you time.
As far as template matching goes, you wouldn't normally use it to solve such a problem because the thing you're looking for is changing appearance and shape. There aren't really any "dynamic parameters to set". The closest thing you could try is have a massive template collection that would try to cover the expected forms that your target may take. But it would hardly be an elegant solution. Plus it wouldn't scale.
Next, to your point about face recognition. This is kind of related, but most facial recognition applications deal with a controlled environment: lighting, distance, pose, angle, etc. Outside of that controlled environment face detection effectiveness drops significantly. If you're detecting objects in a movie, then your environment isn't really controlled.
You may want to first try a simpler problem of accurately detecting where the characters are, without determining who they are (video surveillance, essentially). While it may sound simple, you'll find that it's actually non-trivial for arbitrary scenes. The result of solving that problem may be useful in identifying the characters.
There is Find-Object by Mathieu Labbé. It was very helpful for me to start getting an understanding of the descriptors since you can change them while your video is running to see what happens.
This is probably too late, but might help someone else looking for a solution.
Well, using OpenCV you would take a frame from a video file and do whatever computations you need on it.
There are several different methods for detecting a character in that image, but it's not easy to make the detection flexible enough to find the character lying on the floor, for example, if you only supplied reference images of the character standing.
Basically you could try extracting all the important features from your set of reference pictures and use a (in your case supervised) learning algorithm that produces a good feature vector of that character for classification.
You then need to write code that plays the video and takes a video frame, let's say every 500 ms (or whatever interval you like), segments the object you think might be the character, and compares it with the reference values you got from your learning algorithm. If there's a match, your code can yell "Yehaaawww!" or do other things...
But all this depends on how flexible you want this to be. You could also try template matching or cross-correlation, which basically shifts the reference image(s) over the frame and checks how similar the two parts are. Unfortunately this is very sensitive to rotation, deformation, and other noise... so you wouldn't find the character if they are, e.g., lying down. And I doubt you can get all those calculations done in real time...
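To show what "shifting the reference image over the frame" amounts to, here is a tiny pure-Python sketch that scores every position by the sum of squared differences (a simpler similarity measure than cross-correlation, used here only to keep the example short). The function name is made up; in a real project you would use OpenCV's `cv2.matchTemplate`, which does this far faster and offers several scoring methods.

```python
def match_template(frame, template):
    """Slide the template over the frame; return (x, y, score) of the best match.

    Both arguments are lists of rows of grayscale values. The score is the sum
    of squared differences, so 0 means a pixel-perfect match.
    """
    frame_h, frame_w = len(frame), len(frame[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best = (0, 0, float("inf"))
    for y in range(frame_h - tmpl_h + 1):
        for x in range(frame_w - tmpl_w + 1):
            score = sum(
                (frame[y + j][x + i] - template[j][i]) ** 2
                for j in range(tmpl_h)
                for i in range(tmpl_w)
            )
            if score < best[2]:
                best = (x, y, score)
    return best
```

Because the comparison is pixel by pixel at a fixed size and orientation, a rotated, scaled, or differently posed character produces a poor score everywhere, which is exactly the sensitivity described above.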
Basically: Yes OpenCV is good to use for your image processing/computer vision tasks. But it offers a lot of methods and ways and you'd need to find a way that works for your images... it's not a trivial task though...
Hope that helps...
Have you tried looking at some of the work of the Oxford visual geometry group?
Their Video Google system describes to a large extent what you want, instance detection.
Their work into Naming People in TV shows is also pretty relevant. A face detection and facial feature pipeline is included that can be run from Matlab. Are you familiar with Matlab?
Have you tried computer vision frameworks like Cassandra? There you can do exactly that with just a few mouse clicks.
