Can yolov7 detect very tiny objects from thermal images? - machine-learning

I was using yolov5 to detect animals on the sea surface, but YOLO had a difficult time doing it because the waves were similar in shape and color to the birds (lots of false positives). When the drone flew really high, or when non-whitish birds were involved, it didn't detect them at all: they were tiny and not really distinct to the eye. Will this issue be resolved if I use a thermal camera? I hope the thermal sensor can distinguish the cold sea from the warm animals,
so that YOLO can easily detect them even when they are tiny. Is my assumption valid? Can yolov7 be trained on thermal imagery of tiny birds and used to detect them?

Related

Image background recognition

I want to detect regions in an image, i.e. to cluster it. The main purpose is to say "this is sky", "this is forest", and a few other patterns (about 10-15).
I currently have a prototype for nature recognition that uses HSV color distance.
The sky pattern has an approximate HSV range, and for this photo it is detected this way:
But besides sky, forest, water, and snow, I'd like to recognize clothes, furniture, etc.
How can I detect the sweater in this photo (it has a specific texture of tiny 3-5 pixel squares):
Besides HSV, I know clustering (e.g. k-means), and I have also worked with FFT for image convolution (I'm told FFT can be used for feature detection: lines, patterns).
I've heard about image descriptors (SIFT?), but I haven't worked with anything related to them, so I don't have an intuitive understanding.
Important: this should be fast and use little memory, even if the quality is not ideal. :) Since I'm detecting features in real time, I can't store a large database of tagged images.
Which methods would you suggest using?
Thanks in advance.

best algorithm for face detection and pose estimation

I am looking for algorithms/publications on face detection. There are plenty on the web, but my scenario is somewhat specialized. I want to accurately detect faces in images taken by wearable devices (e.g. narrative clips), so there will be motion blur and the image quality will not be that good. I want to accurately detect faces that are within 15 feet of the camera. The next goal is to estimate the pose, primarily to find out whether the person is looking toward the camera (or, better, looking at the camera owner).
Any suggestions?
My go-to for this would be either a deep-learning framework using convolutional layers for pixel classification, or a k-means / k-nearest-neighbour algorithm.
This does depend on your data, however. From your post I am assuming that your data isn't labelled, meaning you are unable to feed the 'truth' to the algorithm for classification.
You could perhaps use a CNN (convolutional neural network) for pixel classification (image segmentation), which should identify the location of a person. Given this, perhaps you could run a 'local' CNN in a region close to the identified face to classify the region where the body is located as a certain pose.
This would probably be my first take on the problem but would depend on the exact structure of your data, and the structure of your labels (if you have any).
I have to say it does sound like a fun project!
I found OpenCV's Haar Cascades for Face Detection pretty accurate and robust to motion blur in "live" face recognition.
I'm saying that because I used them to implement an eye tracker in C++ with a laptop webcam (whose resolution was not excellent and where motion blur was naturally always present).
They work at multiple resolutions and are therefore able to detect faces of any size, but you can easily tune them for your distance of interest.
They might not be your final optimal solution, but since they are already implemented and come with the OpenCV package, they could constitute a good starting point.
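As a rough illustration of tuning the cascade for a distance of interest, here is a minimal sketch. It assumes a simple pinhole-camera model, a hypothetical focal length in pixels (`focal_px`), and an average face width of about 0.15 m; none of these values come from the posts above, and the helper names are made up for this example. The OpenCV calls (`CascadeClassifier`, `detectMultiScale` with `minSize`/`maxSize`) are the standard ones shipped with the Python package.

```python
def expected_face_px(distance_m, focal_px, face_width_m=0.15):
    """Pinhole-camera estimate of a face's width in pixels at a given distance.

    face_width_m=0.15 is an assumed average, not a value from the thread.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return focal_px * face_width_m / distance_m


def detect_faces(gray, focal_px, near_m=0.5, far_m=4.5):
    """Run OpenCV's bundled frontal-face Haar cascade on a grayscale image,
    only accepting face sizes plausible between near_m and far_m
    (4.5 m is roughly the 15 feet mentioned in the question)."""
    import cv2  # imported here so the size helper works without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    smallest = int(expected_face_px(far_m, focal_px))
    largest = int(expected_face_px(near_m, focal_px))
    # Returns a sequence of (x, y, w, h) boxes.
    return cascade.detectMultiScale(
        gray,
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(smallest, smallest),
        maxSize=(largest, largest),
    )
```

Restricting `minSize` and `maxSize` this way mostly cuts false positives and search time; the right `focal_px` would have to be measured or taken from the camera's calibration.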

Haar Cascade vs Hog Detection

I have been working with OpenCV for a few days now, and I have a project where I should detect cars and humans from the sky.
So here are my inputs:
A moving camera in the sky (mounted on a quadcopter) that is going to capture frames.
A set of objects I should detect (humans and cars)
And here is my output:
A detection of those objects outlined by a rectangle or some contours
Based on that, my question is as follows: which of Haar Cascade and HOG detection would you recommend for this, and why? Or is there anything else?
Many thanks for your answers
HOG is usually better than Haar for human detection. That is the only part I have experience with, so I thought I'd give some input on it. However, the limitation of HOG is that the human must be within a "perfect" area on the screen. Too close, it won't detect the human. Too far, it won't detect the human.
I have had better luck with HOG than Haar. Haar gave me too many false positives.
I have been trying to use Haar to detect humans, and it turns out to give too many false positives. I think Haar is only suitable for face or eye detection.
Since your camera is in the sky, the human is pretty small in the image and shows a whole-body shape, so HOG would be a better choice.
With a Haar cascade you need to change the scale factor and minimum neighbours, and the right values are not the same for every image. So it's better to use HOG.
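For reference, a minimal sketch of the HOG route with OpenCV's built-in people detector, including one possible workaround for the "too far" problem mentioned above. The default detector uses a 64x128 window, so people much smaller than that are missed unless the frame is enlarged first. The helper names and the 96-pixel target height are assumptions made for this example, and the default model was trained on roughly upright pedestrians, so a steep aerial view may well need a custom-trained detector instead.

```python
def upscale_factor(person_height_px, target_height_px=96):
    """Factor to enlarge the frame by so a person is about target_height_px tall.

    target_height_px=96 is an assumption: a person height that fits
    comfortably inside the 64x128 default HOG window.
    """
    if person_height_px <= 0:
        raise ValueError("person_height_px must be positive")
    return max(1.0, target_height_px / person_height_px)


def detect_people(frame, person_height_px):
    """Detect people with OpenCV's default HOG + linear SVM detector.

    person_height_px is a rough guess of how tall a person appears in the
    frame (e.g. from the flight altitude); it is a hypothetical input.
    """
    import cv2  # imported here so the scale helper works without OpenCV

    factor = upscale_factor(person_height_px)
    if factor > 1.0:
        frame = cv2.resize(frame, None, fx=factor, fy=factor,
                           interpolation=cv2.INTER_LINEAR)

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    rects, _weights = hog.detectMultiScale(
        frame, winStride=(8, 8), padding=(8, 8), scale=1.05
    )
    # Map the (x, y, w, h) boxes back to the original frame's coordinates.
    return [tuple(int(round(v / factor)) for v in r) for r in rects]
```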

Free Face Detection Algorithm for Video

I'm working on an application which needs to detect the location of a face in a video stream, using a web cam placed at desk height (and slightly off to the side of the user).
I've already implemented a version of OpenCV (using their Haar detection) and it works ok... the problem is that it tends to lose the position of the face if the user turns their head to the side (or looks up).
Since the webcam is sitting on the desk, it is tilted up at a 30 degree angle. The OpenCV detection algorithm is trained using fully frontal images, but not up-angle images like the ones I'm using. I know OpenCV also has a profile Haar file that can be used, but from my research it seems that the results are quite mixed on profile detection. In addition, I don't really have control over the background or lighting of the image, so this sometimes also affects the efficacy of the OpenCV detection algorithm.
So, I guess what I'm asking is... are there other face detection algorithms (that are hopefully free, as this is part of my university research) that are better for detecting faces for this type of setup? It seems like some of the built-in webcams (for Macs and PCs) actually have fairly robust algorithms for detecting faces (and then overlaying cheesy cartoon images over the faces)... but they seem to work well regardless of background or lighting. Do you have any recommendations?
Thanks.
For research purposes, you can use the Haar cascades in OpenCV; things are different if you want to go commercial (in which case you need to consider LBP cascades instead). Just be sure to cite the Viola-Jones paper in your references.
To improve the results of face detection, you have several paths:
individual image detection: you can send rotated images to a frontal cascade to account for some variability without training your own cascade
individual image detection (but more work): train your own cascade in operating conditions closer to those of your app
stability in video streams (as in webcams & co.): this is achieved by adding a layer of tracking around the face detection. Depending on your knowledge of this topic, you can use your own filter, have fun with OpenCV's particle or Kalman filter, implement a simple first- or second-order low-pass filter on the face position, or a PID tracker on the detected face...
Any of these tracking filters will greatly improve your results when processing video streams.
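To make the simplest of these options concrete, here is a minimal sketch of a first-order low-pass filter (exponential smoothing) on the detected face box. The class name, the `alpha` value, and the rule of holding the last box for a few missed frames are illustrative choices, not something prescribed by OpenCV or by this answer.

```python
class BoxSmoother:
    """First-order low-pass filter on an (x, y, w, h) face box.

    alpha close to 1 follows the detector closely; alpha close to 0 is
    smoother but lags more. max_missed is how many consecutive frames
    without a detection are tolerated before the track is dropped.
    """

    def __init__(self, alpha=0.4, max_missed=5):
        self.alpha = alpha
        self.max_missed = max_missed
        self.state = None
        self.missed = 0

    def update(self, box):
        """Feed one detection (or None if the detector found nothing).

        Returns the smoothed box, or None if there is no active track.
        """
        if box is None:
            self.missed += 1
            if self.missed > self.max_missed:
                self.state = None  # face lost for too long: drop the track
            return self.state

        self.missed = 0
        if self.state is None:
            self.state = tuple(float(v) for v in box)
        else:
            self.state = tuple(
                self.alpha * new + (1.0 - self.alpha) * old
                for new, old in zip(box, self.state)
            )
        return self.state
```

In a capture loop you would call `smoother.update(box)` once per frame with the cascade's best detection. Holding the last box through short dropouts is what hides the moments when the user briefly turns their head.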
Use CLM-framework for accurate realtime face detection and face landmark detection.
Example of the system in action: http://youtu.be/V7rV0uy7heQ
You may find it useful.

Rapid motion and object detection in opencv

How can we detect rapid motion and objects simultaneously? Let me give an example.
Suppose there is a soccer match video, and I want to detect the position of each and every player with maximum accuracy. I was thinking about human detection, but in a soccer match video there is nothing special about human detection, because we can consider the humans as objects. Maybe we can do this with blob detection, but there are many problems with blobs, such as:
1) I want to separate each and every player, so if players collide, blob detection will not help: it will be a problem to identify each player separately.
2) The second problem is the lights in the stadium.
So is there any particular algorithm, method, or library to do this?
I've seen some research papers but was not satisfied, so please suggest anything related to this: any article, algorithm, library, method, research paper, etc. Please share your views on this.
For fast and reliable human detection, Dalal and Triggs' Histograms of Oriented Gradients (HOG) is generally accepted as very good. Have you tried playing with that?
Since you mentioned rapid motion changes, are you worried about fast camera motion or fast player/ball motion?
You can do 2D or 3D video stabilization to fix camera motion (try the excellent Deshaker plugin for VirtualDub).
For fast player motion, background subtraction or other blob detection will definitely help. You can use that to get a rough kinematic estimate and use that as an estimate of your blur kernel. This can then be used to deblur the image chip containing the player.
You can do additional processing to establish identity, for example by OCRing jersey numbers.
You mentioned concern about lights on the stadium. Is the main issue that it will cast shadows? That can be dealt with by the HOG detector. Blob detection to get blur kernel should still work fine with the shadow.
If you have control over the camera, you may want to reduce exposure times to reduce blur. Denoising techniques can reduce the CCD noise that occurs in extremely low light, and dense optical flow approaches can align the frames so that adding the denoised frames boosts the signal back up to something reasonable.
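To illustrate the background-subtraction step and the "rough kinematic estimate" mentioned above, here is a minimal pure-Python sketch on small grayscale frames stored as lists of rows. It assumes a static (or already stabilized) camera and a single moving blob; the function names, the threshold, and the learning rate are illustrative assumptions. In practice you would likely use OpenCV's background subtractors and handle one blob per player.

```python
import math


def update_background(background, frame, rate=0.05):
    """Running-average background model: slowly blend each new frame in."""
    return [
        [(1.0 - rate) * b + rate * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def blob_centroid(mask):
    """Centroid (x, y) of all foreground pixels, or None if there are none."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, on in enumerate(row) if on]
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def velocity(prev, curr, dt=1.0):
    """Centroid velocity in pixels per unit of dt between two frames."""
    return ((curr[0] - prev[0]) / dt, (curr[1] - prev[1]) / dt)


def blur_length_px(v, exposure):
    """Rough linear motion-blur length: speed times exposure time
    (exposure in the same time unit as dt above)."""
    return math.hypot(v[0], v[1]) * exposure
```

The velocity direction and `blur_length_px` together give a first guess at a linear blur kernel for the image chip around the player; whether that is accurate enough for deblurring depends on how straight the motion is during the exposure.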
