Saving bounding box coordinates for each frame in a video - machine-learning

I have a video from a camera with humans in the scene. I need to go through each frame of that video and manually save the coordinates of the bounding box around each human (i.e., step through each frame and draw a square around each person) plus the coordinates of the center of the head - so basically top-left, bottom-right, and head-center coordinates. The bounding box has to be a square.
An additional program will then read a file with coordinates of the square and center of head and the frame number, and extract the boxes as an image.
For anybody who has experience with computer vision: is there any open-source software that can accomplish what I am describing? If not, what technology would you recommend building this tool with? Any starter code?

I don't know of any program that does specifically this, but I think it is an easy problem and you could code it yourself in no time.
Since you work in computer vision, you are probably already familiar with OpenCV. You can use it to extract the frames from the video and to select the box and the head center with mouse events.
Here are some links that can help you out:
Extract video frames
Detect mouse events
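To make that concrete, here is a minimal Python/OpenCV sketch of such an annotation loop. All file names are placeholders, and the interaction is deliberately simple: three clicks per frame (top-left, bottom-right, head centre), with the box side forced to be square before the row is written out.

```python
import csv
import cv2

clicks = []  # up to three (x, y) points for the current frame

def on_mouse(event, x, y, flags, param):
    # Record left-clicks: top-left, bottom-right, head centre
    if event == cv2.EVENT_LBUTTONDOWN and len(clicks) < 3:
        clicks.append((x, y))

cap = cv2.VideoCapture("input.avi")                     # placeholder video file
csv_file = open("annotations.csv", "w", newline="")
writer = csv.writer(csv_file)
writer.writerow(["frame", "x1", "y1", "x2", "y2", "head_x", "head_y"])

cv2.namedWindow("frame")
cv2.setMouseCallback("frame", on_mouse)

frame_no = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    clicks.clear()
    while len(clicks) < 3:                              # wait for the three clicks
        view = frame.copy()
        for p in clicks:
            cv2.circle(view, p, 3, (0, 0, 255), -1)
        cv2.imshow("frame", view)
        if cv2.waitKey(20) & 0xFF == ord("q"):          # 'q' aborts the session
            cap.release()
            csv_file.close()
            cv2.destroyAllWindows()
            raise SystemExit
    (x1, y1), (x2, y2), (hx, hy) = clicks
    side = max(abs(x2 - x1), abs(y2 - y1))              # force a square box
    writer.writerow([frame_no, x1, y1, x1 + side, y1 + side, hx, hy])
    frame_no += 1

cap.release()
csv_file.close()
cv2.destroyAllWindows()
```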

Related

Can I use OpenCV to analyze a video for how long a face is in the centre of the screen?

This is a frame from a video taken using the HTC Vive (it's from the user's perspective of a game I developed in Unity). I've overlaid those boxes in Paint.
I'm trying to determine which character the person is looking at (assuming the white box is where the user is focusing).
I know this can be done in Unity without the need for a video, but I want to know if the video can be analyzed using something like OpenCV to detect how long each character's face is in the white box. I just made this in Paint to get the idea across; the parameters aren't to scale or anything. I just have no idea where to start with a concept like this apart from OpenCV.
But to summarize, can I use OpenCV to detect how long each face is in the centre of the screen, i.e. how long the user looked at each character?
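There is no accepted answer recorded here, but as a rough sketch of the kind of loop involved: a Python/OpenCV script could run a face detector on each frame and count the frames whose detected face centre falls inside a fixed central box. Note that the stock Haar cascade below is trained on real faces and may do poorly on rendered game characters, and this sketch does not distinguish which character it is; both are assumptions to be replaced by a detector suited to the footage.

```python
import cv2

cap = cv2.VideoCapture("gameplay.mp4")                  # placeholder video file
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

fps = cap.get(cv2.CAP_PROP_FPS)
frames_in_box = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    box = (w // 3, h // 3, 2 * w // 3, 2 * h // 3)      # central "white box" (assumed size)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (fx, fy, fw, fh) in cascade.detectMultiScale(gray, 1.1, 5):
        cx, cy = fx + fw // 2, fy + fh // 2             # face centre
        if box[0] <= cx <= box[2] and box[1] <= cy <= box[3]:
            frames_in_box += 1
            break                                       # count each frame at most once

print("seconds with a face in the centre:", frames_in_box / fps)
```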

Screen detection using OpenCV / EmguCV

I'm working on computer screen detection using EmguCV (a C# OpenCV wrapper).
I want to detect my computer screen and draw a rectangle on it.
To help with this process, I placed 3 infrared LEDs on the screen of the computer, which I detect first; after the detection, I can find the screen area below those 3 LEDs.
Here are the results after the detection of the 3 LEDs.
The 3 red boxes are the detected LEDs.
And in general I have something like this
Does anyone have an idea of how I can proceed to detect the whole screen area?
This is just a suggestion, but if you know for a fact that your computer screen is below your LEDs, you could try using OpenCV's GrabCut algorithm. Draw a rectangle below the LEDs, large enough to contain the screen (maybe you could guess the size from the spacing between the LEDs), and use it to initialize GrabCut.
Let me know what kind of results you get.
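The question uses EmguCV, but the same call exists there; as a rough illustration, an OpenCV/Python version of the GrabCut initialization could look like this (the input file and the rectangle guessed below the LEDs are placeholder values):

```python
import cv2
import numpy as np

img = cv2.imread("screen.jpg")                           # placeholder input frame
mask = np.zeros(img.shape[:2], np.uint8)

# Rectangle guessed below the detected LEDs, e.g. sized from the LED spacing
rect = (100, 150, 400, 300)                              # (x, y, width, height), assumed

bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep the pixels labelled as (probable) foreground; that is the screen estimate
screen = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype("uint8")
x, y, w, h = cv2.boundingRect(screen)
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("screen_detected.jpg", img)
```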
You can try using a camera with no IR filter (most night-vision cameras are like this), so that the light from the LEDs is much more intense than anything the display emits, making them stand out; then it is simple blob detection to get their positions.
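For the blob-detection part, a short OpenCV/Python sketch might be (the file name and the area/brightness thresholds are assumptions to tune for the actual footage):

```python
import cv2

# Grayscale frame from a camera without an IR filter; the LEDs should show up
# as small, very bright blobs on a comparatively dark background.
gray = cv2.imread("ir_frame.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder input

params = cv2.SimpleBlobDetector_Params()
params.filterByColor = True
params.blobColor = 255          # look for bright blobs instead of dark ones
params.filterByArea = True
params.minArea = 5              # assumed blob size range, in pixels
params.maxArea = 500

detector = cv2.SimpleBlobDetector_create(params)
keypoints = detector.detect(gray)
led_positions = [kp.pt for kp in keypoints]                # (x, y) centres of the LEDs
print(led_positions)
```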
Another solution would be to use ArUco markers on the display. If the viewing angles you intend to work at are not very large, it should be a compelling option, and the relative pose of the camera with respect to the display can also be estimated, if that is something you need. From the detected ArUco markers you can recover the orientation of the display plane and use that to estimate the display area.
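Marker detection itself is only a couple of calls; a minimal OpenCV/Python sketch is below (the file name and dictionary choice are assumptions, and the exact aruco function names vary a little between OpenCV versions; this uses the classic contrib-module interface):

```python
import cv2

frame = cv2.imread("display_with_markers.jpg")             # placeholder input
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, rejected = cv2.aruco.detectMarkers(gray, aruco_dict)

if ids is not None:
    # Draw the detected markers; with a calibrated camera their pose (and hence
    # the orientation of the display plane) could be estimated as a next step.
    cv2.aruco.drawDetectedMarkers(frame, corners, ids)
    cv2.imwrite("markers_detected.jpg", frame)
```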

Fix a position in a webcam video using OpenCV

I'm trying to draw an arrow to moving video using OpenCV.
What I want to do is the following:
Select a position (e.g. with the mouse) in the video captured by my webcam. Then I want to draw an arrow at this position. While the camera is moving, the arrow should be drawn at the right position relative to the webcam video.
Can you give some hints on how to do this?
This is pretty much what I'm looking for, but it isn't that stable. Since I'm a newbie to OpenCV, any hints would be very helpful.
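No answer is recorded here, but one common approach, sketched below in Python/OpenCV purely as an illustration, is to track the clicked point from frame to frame with Lucas-Kanade optical flow and redraw the arrow at the tracked position (the arrow offset and window name are arbitrary choices, and plain point tracking like this will also drift and lose the point under fast motion):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)        # default webcam
point = None                     # set by the mouse callback

def on_mouse(event, x, y, flags, param):
    global point
    if event == cv2.EVENT_LBUTTONDOWN:
        point = np.array([[[x, y]]], dtype=np.float32)

cv2.namedWindow("video")
cv2.setMouseCallback("video", on_mouse)

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if point is not None:
        # Track the selected point from the previous frame into the current one
        new_pt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, point, None)
        if status[0][0] == 1:
            point = new_pt
            x, y = point[0][0]
            cv2.arrowedLine(frame, (int(x) - 40, int(y) - 40),
                            (int(x), int(y)), (0, 0, 255), 2)
    prev_gray = gray
    cv2.imshow("video", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```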

Project image onto notebook using OpenCV

I am trying to implement an application that projects an image onto a page of a notebook, using OpenCV, a webcam and a projector. To achieve that, I am doing the following steps:
I am using a webcam to detect the four corner points of a page.
A homography is learned between the four corner points of the camera image and their projections on my desk, as seen in the camera. By using the inverse transformation, I will be able to know where I should draw something in my camera image so that the projection "ends up" at a desired location.
I am applying the inverse transformation to the four detected corner points of the page.
I am warping the desired image to the new, transformed set of points.
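For reference, the chain of steps above maps onto OpenCV/Python calls roughly as follows; every point value and the projector resolution in this sketch are placeholders, not the asker's actual data:

```python
import cv2
import numpy as np

# Points sent to the projector and where they were observed in the camera (assumed values)
proj_pts = np.float32([[100, 100], [700, 100], [700, 500], [100, 500]])
cam_pts  = np.float32([[180, 140], [620, 120], [640, 460], [160, 480]])

# Homography from camera coordinates to projector coordinates (the "inverse transformation")
H_cam_to_proj, _ = cv2.findHomography(cam_pts, proj_pts)

# Four page corners detected in the camera image (assumed values)
page_cam = np.float32([[[200, 160]], [[580, 150]], [[590, 430]], [[210, 440]]])
page_proj = cv2.perspectiveTransform(page_cam, H_cam_to_proj)

# Warp the image to be projected so it lands on those corners
img = cv2.imread("overlay.png")                           # placeholder image
h, w = img.shape[:2]
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
H_img = cv2.getPerspectiveTransform(src, page_proj.reshape(4, 2))
out = cv2.warpPerspective(img, H_img, (1024, 768))        # projector resolution, assumed
```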
So far it works well, if the notebook is on my desk and wide open. Like in this picture:
But if I try to close one side (or both), the following happens:
See the problem? In the first picture the image is perfectly aligned with the edges of the page and remains so if you rotate or translate the notebook while keeping it on the desk. But that doesn't happen in the second image, where the top edge of the image is no longer parallel to the top edge of the page (the image becomes more and more skewed).
Can anyone explain why I get this projection problem, or at least point me to some resources where I can read about it? I should mention that the projector and the webcam are placed above and to the left of the notebook, not directly above it.
Any tips or suggestions are welcome. Thank you!
You want an effect that is called keystone correction. The problem you are experiencing is most probably due to the fact that the optical axes, positions, and focal lengths of the webcam and the projector are different. I suggest calibrating your setup so that you know their relative pose and can incorporate it into your inverse homography.

iOS Camera Color Recognition in Real Time: Tracking a Ball

I have been looking around for a bit and know that people are able to track faces with Core Image and OpenGL. However, I am not sure where to start the process of tracking a colored ball with the iOS camera.
Once I have a lead on tracking the ball, I hope to create something to detect when the ball changes direction.
Sorry I don't have source code, but I am unsure where to even start.
The key point is image preprocessing and filtering. You can use the camera APIs to get the video stream from the camera. Take a snapshot picture from it, then apply a Gaussian blur (spatial enhancement), then a luminance-average threshold filter (to make a black-and-white image). After that, morphological preprocessing is wise (opening and closing operators) to remove the small noise. Then apply an edge-detection algorithm (for example a Prewitt operator). After these steps only the edges remain, and your ball should be a circle (if the recording environment was ideal). After that you can use a Hough transform to find the center of the ball. Record the ball position, and in the next frame only a small part of the picture (around the ball) needs to be processed.
Other keyword could be: blob detection
A fast library for image processing (on the GPU with OpenGL) is Brad Larson's GPUImage library: https://github.com/BradLarson/GPUImage
It implements all the needed filters (except the Hough transform).
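The answer above is aimed at iOS/GPUImage; purely to illustrate the same pipeline, here is a rough OpenCV/Python equivalent (Otsu thresholding and Canny stand in for the luminance-average threshold and the Prewitt operator, and all parameter values are assumptions to tune):

```python
import cv2
import numpy as np

frame = cv2.imread("frame.jpg")                           # snapshot from the stream (placeholder)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

blur = cv2.GaussianBlur(gray, (9, 9), 2)                  # spatial enhancement / denoise
_, bw = cv2.threshold(blur, 0, 255,
                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # black-and-white image

kernel = np.ones((5, 5), np.uint8)
bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, kernel)         # opening: remove small noise
bw = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)        # closing: fill small holes

# Edge map, shown for completeness; HoughCircles applies its own Canny internally
edges = cv2.Canny(bw, 50, 150)

# Circle Hough transform to locate the ball centre (runs on the blurred grayscale)
circles = cv2.HoughCircles(blur, cv2.HOUGH_GRADIENT, dp=1.2, minDist=50,
                           param1=100, param2=30, minRadius=10, maxRadius=100)
if circles is not None:
    x, y, r = np.round(circles[0, 0]).astype(int)
    cv2.circle(frame, (x, y), r, (0, 255, 0), 2)
    cv2.imwrite("ball_detected.jpg", frame)
```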
The tracking process can be defined as follows:
Have the initial coordinates and dimensions of an object with given visual characteristics (image features).
In the next video frame, find the same visual characteristics near the coordinates from the last frame.
"Near" means considering basic transformations relative to the last frame:
translation in each direction;
scale;
rotation;
The variation of these transformations is strictly related to the frame rate: the higher the frame rate, the nearer the position will be in the next frame.
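As a bare-bones illustration of the "search near the last position" step (handling translation only, not scale or rotation), a Python/OpenCV sketch could use template matching inside a window around the previous coordinate; the function name and the search margin are arbitrary:

```python
import cv2

def track_next(prev_frame, next_frame, x, y, w, h, search=40):
    """Find the object's new top-left corner in next_frame, searching only a
    window of `search` pixels around its last position (x, y, w, h)."""
    template = prev_frame[y:y + h, x:x + w]
    # Clip the search window to the frame borders
    x0, y0 = max(0, x - search), max(0, y - search)
    x1 = min(next_frame.shape[1], x + w + search)
    y1 = min(next_frame.shape[0], y + h + search)
    window = next_frame[y0:y1, x0:x1]
    # Normalised cross-correlation between the template and the search window
    res = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(res)
    return x0 + max_loc[0], y0 + max_loc[1]
```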
Marvin Framework provides plug-ins and examples to perform this task. It is not compatible with iOS yet; however, it is open source, and I think you could port the source code easily.
This video demonstrates some tracking features, starting at 1:10.
