Estimate of the crowd head count in a static image (OpenCV)

We are trying to count the number of people in a static image; this number may be as large as 100-150. I did some research and found some ways to do this, but we are not sure which one will work best. So my question is this: will Haar training give us good results?
If you have more ideas please share with me.
Thank you.

I would say it depends on how your Haar training goes. Are you planning to train a classifier specifically to detect faces, and then detect the number of faces in the image and count them up? It's possible that your model will do one or both of these things: (a) count things that are not faces as faces, (b) count single faces multiple times. If you can get strong training sets for both positive and negative images, it could definitely be worth a shot for ballparking a number. I wouldn't expect it to be exact, though.
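For a concrete starting point, a minimal sketch with OpenCV's bundled frontal-face cascade might look like the following; the image path and the detectMultiScale parameters are placeholders you would tune for your crowd shots:

    # Minimal sketch: counting frontal faces with OpenCV's bundled Haar cascade.
    # "crowd.jpg" is a hypothetical input; tune the detection parameters yourself.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    img = cv2.imread("crowd.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Higher minNeighbors reduces double counts; smaller minSize catches distant heads.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                     minNeighbors=5, minSize=(20, 20))
    print("estimated head count:", len(faces))

Expect undercounting for turned or occluded heads in a dense crowd; the count is a rough estimate, not an exact tally.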

Related

How to get proper results detecting objects in images with YOLOv7?

This is a general question. I tried detecting objects in pictures downloaded from Roboflow datasets with YOLOv7; these were quite simple objects against a solid background (sun in the sky, balloons in the sky, rabbits on grass, etc.). Even though I had a good number of pictures (500 to 1000) and a good number of epochs, the results were not good. Many objects were not detected, and those that were detected had a bad bounding box that either contained too much of the picture or was too small and did not enclose the whole object. So there are some questions on generally improving YOLOv7:
How many pictures are generally needed to get good results for one object class?
How many epochs are needed?
Is there a way to improve the backbone, i.e. the neural network behind YOLOv7?
Is there a way to improve the hyperparameters ("learning parameters")?
I tried googling it and tried improving the arguments for train.py. It would be nice if you could help me out.
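For reference, a typical training run is launched through the repo's train.py; the sketch below adapts the example command from the official YOLOv7 README, with the dataset and hyperparameter paths as placeholders you would swap for your own files:

    # Hedged example of a YOLOv7 training invocation, adapted from the repo's
    # README; data/config paths and run names below are placeholders.
    python train.py --workers 8 --device 0 --batch-size 32 \
        --data data/custom.yaml --img 640 640 \
        --cfg cfg/training/yolov7.yaml --weights yolov7.pt \
        --hyp data/hyp.scratch.custom.yaml --epochs 300 --name yolov7-custom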

What is the appropriate way to train a face detector / classifier?

I want to build a face detector/classifier: a network that detects whether a face is present in an image/video.
I understand the basic concept, but what I have problems with is the choice of the number of classes.
Initially, I thought that two classes (with face / without face) would be sufficient. However, I was unsure which data I should use for the class 'without face'. So I threw together datasets of equipment, plants, and animals, whereupon the classes became very unbalanced, which is apparently not good.
Then I thought it would be better to use as many classes as possible.
But again, I am unsure: what would be the best/common approach to this problem?
You can experiment with any number of samples and different images for the negative class. If the equipment/plant/animal datasets you have are imbalanced, you can try to subsample, e.g. pick 100 images from each.
Just don't make the negative class too huge with respect to the number of face images you have. The rest is up to experimentation.
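As a concrete illustration of that subsampling idea (the folder names here are hypothetical):

    # Cap each negative-class folder at N images so the negatives stay
    # balanced against the face class. Folder names are placeholders.
    import os
    import random

    N = 100
    negative_dirs = ["equipment", "plants", "animals"]
    negatives = []
    for d in negative_dirs:
        files = [os.path.join(d, f) for f in os.listdir(d)]
        negatives.extend(random.sample(files, min(N, len(files))))
    print(len(negatives), "negative samples selected")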

What could be the reason that the model does not produce better results with lots more data?

So I was playing with a Keras GAN from Jeff Heaton's website.
The saying goes that the more data we have, the better the results we should get, and I wanted to test this hypothesis. I also wanted to know whether the GAN might just copy a sample from the training data.
That's why I created images with numbers ranging from 1 to 20000:
128 px × 128 px
Numbers are centered
The same colors are used for all (dark blue & yellow)
So to test this theory, I first trained the GAN with 5000 images. This is the result that I am getting:
And then I trained it with 20000 images:
I can't really see a big improvement. What gives? Do I need to try with many more images (50,000)? Do I need to improve the architecture of the GAN?
You do not need to modify the architecture of the GAN.
If I were you, I would first look for a good metric to check how your results improved or worsened.
Honestly, when I look at the two batches, the second one looks much more diverse, so it makes sense that the GAN can generate a wider range of numbers, since it has seen many more pictures.
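As one hedged, minimal example of such a metric (my own illustration, not a standard score): the mean pairwise distance between generated samples is a crude diversity proxy; established metrics such as FID are more robust, but this shows the idea:

    # Crude diversity proxy: mean pairwise L2 distance between generated samples.
    # Higher usually means more varied outputs. Illustrative only; prefer FID
    # or similar established scores for real evaluation.
    import numpy as np

    def mean_pairwise_distance(samples):
        # samples: array of shape (n, height, width, channels)
        flat = samples.reshape(len(samples), -1).astype(np.float64)
        total, count = 0.0, 0
        for i in range(len(flat)):
            for j in range(i + 1, len(flat)):
                total += np.linalg.norm(flat[i] - flat[j])
                count += 1
        return total / count

    batch = np.random.rand(8, 128, 128, 3)  # stand-in for generated images
    print("diversity proxy:", mean_pairwise_distance(batch))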

Grouping points that represent lines

I am looking for an algorithm that can solve this problem.
The problem:
I have the following set of points:
I want to group the points that represent a line (with some epsilon) into one group.
So, the optimal output will be something like:
Some notes:
Each point belongs to one and only one line.
If a point could belong to two lines, it should belong to the stronger one.
A line is considered stronger than another when it has more points belonging to it.
The algorithm need not cover all points, because some of them may be outliers.
The space contains many outliers; they may make up 50% of the total points.
Performance is critical, Real-Time is a must.
The solutions I found till now:
1) Dealing with it as clustering problem:
The main drawback of this method is that there is no direct distance metric between points. The distance metric is defined on the cluster itself (how linear it is). So I cannot use traditional clustering methods, and I would have to (as far as I can tell) use something like clustering with a genetic algorithm, where the evaluation occurs on the whole cluster rather than between two points. I also do not want to use something like a genetic algorithm, since I am aiming for a real-time solution.
2) Accumulating pairs and then clustering:
Since it is hard to cluster the points directly, I thought of extracting pairs of points and then trying to cluster them with other pairs. That way I have a distance between two pairs that can represent linearity (two pairs are really 4 points).
The drawback of this method is how to choose these pairs. If I rely on the Euclidean distance between them, it may not be accurate, because two points may be very near to each other yet far from forming a line with the others.
I would appreciate any solution, suggestion, clue, or note. Please feel free to ask for any clarification.
P.S. You may use any ready-made OpenCV function when thinking of a solution.
As Micka advised, I used sequential RANSAC to solve my problem. The results were fantastic and exactly what I wanted.
The idea is simple:
Apply RANSAC with a fit-line model to the points.
Delete all points that are inliers of the line RANSAC returned.
While there are 2 or more points left, go to step 1.
I implemented my own fit-line RANSAC, but unfortunately I cannot share the code because it belongs to the company I work for. However, there is an excellent fit-line RANSAC here on SO, implemented by Srinath Sridhar. The link of the post is: RANSAC-like implementation for arbitrary 2D sets.
It is easy to build a sequential RANSAC from the 3 simple steps I mentioned above.
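Since the original code cannot be shared, here is a minimal independent sketch of those 3 steps in Python/NumPy; epsilon, the iteration count, and the minimum inlier count are hypothetical tuning knobs, not the author's values:

    # Minimal sequential-RANSAC sketch following the 3 steps above.
    # epsilon, n_iters and min_inliers are hypothetical tuning parameters.
    import numpy as np

    def ransac_line(points, epsilon, n_iters=200):
        # One RANSAC round: sample 2 points, fit a line, keep the best inlier set.
        best_inliers = np.zeros(len(points), dtype=bool)
        rng = np.random.default_rng()
        for _ in range(n_iters):
            i, j = rng.choice(len(points), size=2, replace=False)
            p, q = points[i], points[j]
            dx, dy = q - p
            norm = np.hypot(dx, dy)
            if norm == 0:
                continue
            # Perpendicular distance of every point to the line through p and q.
            dist = np.abs(dy * (points[:, 0] - p[0]) - dx * (points[:, 1] - p[1])) / norm
            inliers = dist < epsilon
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        return best_inliers

    def sequential_ransac(points, epsilon, min_inliers=5):
        lines = []
        pts = points.copy()
        while len(pts) >= 2:                     # step 3: repeat while points remain
            inliers = ransac_line(pts, epsilon)  # step 1: fit one line
            if inliers.sum() < min_inliers:      # stop when no strong line is left
                break
            lines.append(pts[inliers])
            pts = pts[~inliers]                  # step 2: delete the inliers
        return lines

    # Demo: 20 points on the line y = 2x + 1 plus 20 uniform outliers.
    pts = np.vstack([
        np.column_stack([np.arange(20.0), 2.0 * np.arange(20.0) + 1.0]),
        np.random.uniform(0.0, 40.0, size=(20, 2)),
    ])
    groups = sequential_ransac(pts, epsilon=0.5)
    print(len(groups), "line groups found")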
Here are some results:

Human recognition using Template matching

I am using Emgu-CV to identify each person in a big room.
My camera is static and in-door.
I would like to count the number of people who visited the room; that is, I want to recognize each person even if I captured the images at different angles at different times of day.
I am using Haar classifiers to detect the face, head, and full body in the image, and then I am comparing these with the previously detected image portions using template matching, so that I can recognize the person. But I am getting very poor results.
Is this the right approach to this problem? Can anyone suggest a better approach?
Or are there any better libraries available that can solve this problem?
I think template matching is the weak point in your system. I would suggest training a Haar cascade for each person individually; that would, in effect, replace (detection + recognition) with (detecting one precise object). Of course, this only works if the number of people you want to recognize is rather small. Alternatively you can use features such as SURF, but note their license.
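To see why template matching is brittle here, note that it reduces to a single correlation over raw pixels; a minimal sketch (in Python/OpenCV rather than Emgu CV, with hypothetical file names) looks like this, and its sensitivity to pose and lighting is exactly why it breaks down across viewing angles:

    # Minimal template-matching sketch (Python/OpenCV rather than Emgu CV).
    # "room.jpg" and "person_template.jpg" are hypothetical file names.
    import cv2

    scene = cv2.imread("room.jpg", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("person_template.jpg", cv2.IMREAD_GRAYSCALE)
    # Normalized cross-correlation; scores near 1.0 mean a close pixel-level match.
    scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    print("best match score:", max_val, "at", max_loc)
    # A person seen from a new angle rarely scores high, which is why template
    # matching alone struggles with re-identification.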
