How to get the coordinates of the moving object with GPUImage motionDetection- swift - ios

How would I go about getting the screen coordinates of something that enters frame with motionDetection filter? I'm fairly new to programming, and would prefer a swift answer if possible.
Example - I have the iphone pointing at a wall - monitoring it with the motionDetector. If I bounce a tennis ball against the wall - I want the app to place an image of a tennis ball on the iphone display at the same spot it hit the wall.
To do this, I would need the coordinates of where the motion occurred.
I thought maybe the "centroid" argument did this.... but I'm not sure.

I should point out that the motion detector is pretty crude. It works by taking a low-pass filtered version of the video stream (a composite image generated by a weighted average of incoming video frames) and then subtracting that from the current video frame. Pixels that differ above a certain threshold are marked. The number of these pixels, along with the centroid of the marked pixels are provided as a result.
The centroid is a normalized (0.0-1.0) coordinate representing the centroid of all of these differing pixels. A normalized strength gives you the percentage of the pixels that were marked as differing.
Movement in a scene will generally cause a bunch of pixels to differ, and for a single moving object the centroid will generally be the center of that object. However, it's not a reliable measure, as lighting changes, shadows, other moving objects, etc. can also cause pixels to differ.
For true object tracking, you'll want to use feature detection and tracking algorithms. Unfortunately, the framework as published does not have a fully implemented version of any of these at present.

Related

Tracking of rotating objects using opencv

I need to track cars on the road from top-view video.
My application contain two main parts:
Detecting cars on the frame (Tensorflow trained network)
Tracking detected cars (opencv trackers)
I have troubles with opencv trackers. Initially i tried to different trackers, but only MOSSE is fast enough. This tracker works almost perfect for case with straight road, but i faced problems with rotating cars. This situation appears on crossroads.
As i understood, bounding box of rotated object is bigger that bbox of horizontal or vertical object. As result bbox contains big part of static background and the tracker lose target object.
Are there any alternative trackers which can track contours (not bounding boxes)?
Can i adjust quality of existing opencv trackers results by any settings or by adjusting picture?
Schema:
Real image:
If your camera is stationary the following scenario is feasible:
use ‌background subtraction methods to separate background image from foreground blobs.
Improve the foreground results using morphological operations.
Detect car blobs and remove other blobs.
Track foreground blobs in video i.e. binary track (simply use this or even apply KF).
A very basic but effective approach in this scenario might be to track the center coordinates of the bounding box, if the center coordinates only change along one axis (with a small tolerance for either axis), its a linear motion (not a rotation). If both x and y change, the car is moving in the roundabout.
This only has the weakness that it will detect diagonal motion, but since you are looking at a centered roundabout, that shouldn't be an issue.
It will also be very efficient memory-wise.
You should use PCA method, which can calculate the orientation of an detected object and which way it is facing. You can change the threshold of detection to select objects more like the cars (based upon shape and colour - a HSV conversion which in your case is red) in your picture.
Link to an introduction to Principal Component Analysis (PCA)
Method 1 :
- Detect bounding boxes and subtract the background to get blobs rotated rectangles.
Method 2 :
- implement your own version of detector with rotated boxes.
Method 3 :
- Use segmentation instead ... Unet for example.
There are no other trackers than the ones found in the library.
Your best bet is to filter the image and use findcontours.
Optical flow and background subtraction will help with this. You can combine optical flow with your car detector to rule out false positives.
https://docs.opencv.org/3.4/d4/dee/tutorial_optical_flow.html
https://docs.opencv.org/3.4/d1/dc5/tutorial_background_subtraction.html

Detecting balls on a pool table

I'm currently working on a project where I need to be able to very reliable get the positions of the balls on a pool table.
I'm using a Kinect v2 above the table as the source.
Initial image looks like this (after converting it to 8-bit from 16-bit by throwing away pixels which is not around table level):
Then a I subtract a reference image with the empty table from the current image.
After thresholding and equalization it looks like this: image
It's fairly easy to detect the individual balls on a single image, the problem is that I have to do it constantly with 30fps.
Difficulties:
Low resolution image (512*424), a ball is around 4-5 pixel in diameter
Kinect depth image has a lot of noise from this distance (2 meters)
Balls look different on the depth image, for example the black ball is kind of inverted compared to the others
If they touch each other then they can become one blob on the image, if I try to separate them with depth thresholding (only using the top of the balls) then some of the balls can disappear from the image
It's really important that anything other than balls should not be detected e.g.: cue, hands etc...
My process which kind of works but not reliable enough:
16bit to 8bit by thresholding
Subtracting sample image with empty table
Cropping
Thresholding
Equalizing
Eroding
Dilating
Binary threshold
Contour finder
Some further algorithms on the output coordinates
The problem is that a pool cue or hand can be detected as a ball and also if two ball touches then it can cause issues. Also tried with hough circles but with even less success. (Works nicely if the Kinect is closer but then it cant cover the whole table)
Any clues would be much appreciated.
Expanding comments above:
I recommend improving the IRL setup as much as possible.
Most of the time it's easier to ensure a reliable setup than to try to "fix" that user computer vision before even getting to detecting/tracking anything.
My suggestions are:
Move the camera closer to the table. (the image you posted can be 117% bigger and still cover the pockets)
Align the camera to be perfectly perpendicular to the table (and ensure the sensor stand is sturdy and well fixed): it will be easier to process a perfect top down view than a slightly tilted view (which is what the depth gradient shows). (sure the data can be rotated, but why waste CPU cycles when you can simply keep the sensor straight)
With a more reliable setup you should be able to threshold based on depth.
You can possible threshold to the centre of balls since the information bellow is occluded anyway. The balls do not deform, so it the radius decreases fast the ball probably went in a pocket.
One you have a clear threshold image, you can findContours() and minEnclosingCircle(). Additionally you should contrain the result based on min and max radius values to avoid other objects that may be in the view (hands, pool cues, etc.). Also have a look at moments() and be sure to read Adrian's excellent Ball Tracking with OpenCV article
It's using Python, but you should be able to find OpenCV equivalent call for the language you use.
In terms tracking
If you use OpenCV 2.4 you should look into OpenCV 2.4's tracking algorithms (such as Lucas-Kanade).
If you already use OpenCV 3.0, it has it's own list of contributed tracking algorithms (such as TLD).
I recommend starting with Moments first: use the simplest and least computationally expensive setup initially and see how robuts the results are before going into the more complex algorithms (which will take to understand and get the parameters right to get expected results out of)

OpenCV - calibrate camera using static images in water

I have a photocamera mounted vertically under water in a tank, looking downwards.
There is a flat grid on the bottom of the tank (approx 2m away from the camera).
I want to be able to place markers on the bottom, and use computer vision to know their real life exact position.
So, I need to map from pixels to mm.
If I am not mistaken, cv::calibrateCamera(...) does just this, but is dependent on moving a pattern in front of the camera.
I have just static pictures of the scene, and the camera never moves in relation to the grid. Thus, I have only a "single" image to find the parameters.
How can I do this using the grid?
Thank you.
Interesting problem! The "cute" part is the effect on the intrinsic parameters of the refraction at the water-glass interface, namely to increase the focal length (or, conversely, to reduce the field of view) compared to the same lens in air. In theory, you could calibrate in air and then correct for the difference in refraction index, but calibrating directly in water is likely to give you more accurate results.
Do know your accuracy requirements? And have you verified that your lens/sensor combination is adequate to meet them (with an adequate margin)? To answer the question you need to estimate (either by calculation from the lens and sensor specifications, or experimentally using a resolution chart) whether you can resolve in an image the minimal distances required by your application.
From the wording of your question I think that you are interested only in measurements on a single plane. So you only need to (a) remove the nonlinear (barrel or pincushion) lens distortion and (b) estimate the homography between the plane of interest and the image. Once you have the latter, you can directly convert from undistorted image coordinates to world ones by matrix multiplication. Additionally if (as I imagine) the plane of interest is roughly parallel to the image plane, you should not have any problem keeping the entire field-of-view in focus.
Of course, for all of this to work as expected, you should make sure that the tank bottom is really flat, within the measurement tolerances of your application. Otherwise you are really dealing with a 3D problem, and need to modify your procedures accordingly.
The actual procedure depends a lot on the size of the tank, which you don't indicate clearly. If it's small enough that it is practical to manufacture a chessboard-like movable calibration target, by all means go for it. You may want to take a look at this other answer for suggestions. In the following I'll discuss the more interesting case in which your tank is large, e.g. the size of a swimming pool.
I'd proceed by sticking calibration markers in a regular grid at the pool bottom. I'd probably choose checker-like markers like these, maybe printing them myself with a good laser printer on plastic with an adhesive backing (assuming you can leave them in place forever). You should plan on having quite a few of them, say, an 8x8 or 10x10 grid, covering as much as possible of the field of view of the camera in its operating position and pose. To help with lining up the grid nicely you might use a laser line projector of suitable fan angle, or a laser pointer attached to a rotating support. Note carefully that it is not necessary that they be affixed in a precise X-Y grid (which may be complicated, depending on the size of your pool), only that their positions with respect any arbitrarily chosen (but fixed) three of them be known. In other words, you can attach them to the bottom approximately in a grid, then measure the distances of three extreme corners from each other as accurately as you can, thus building a base triangle, then measure the distances of all the other corners from the vertices of the triangle, and finally reconstruct their true positions with a bit of trigonometry. It's basically a surveying problem and, depending on your accuracy requirements and budget, you may want to enroll a local friendly professional surveyor (and their tools) to get it done as precisely as necessary.
Once you have your grid, you can fill the pool, get your camera, focus and f-stop the lens as needed for the application. From now on you may not touch the focus and f-stop ever again, under penalty of miscalibrating - exposure can only be controlled by the exposure time, so make sure to have enough light. Disable any and all auto-focus and auto-iris functions, if any. If the camera has a non-rigid lens mount (e.g. a DLSR), you'll need some kind of mechanical rig to ensure that the lens-body pair stay rigid. F-stop as close as you can, given the available lighting and sensor, so to have a fair bit of depth of field available. Then take several photos (~ 10) of the grid, moving and rotating the camera, and going a bit closer and farther away than your expected operating distance from the plane. You'll want to "see" in some images some significant perspective foreshortening of the grid - this is needed to accurately calibrate the focal length. Avoid JPG and any other lossy compression format when storing the images - use lossless PNG or TIFF.
Once you have the images, you can manually mark and identify the checker markers in the images. For a once-off project like this I would not bother with automatic identification, just do it manually (e.g. in Matlab, or even in Photoshop or Gimp). To help identify the markers, you could, e.g. print a number next to them. Once you have the manual marks, you can refine them automatically to subpixel accuracy, e.g. using cv::findCornerSubpix.
You're almost done. Feed the "reference" measured position of the real corners, and the observed ones in all images, to your favorite camera calibration routine, e.g. cv::calibrateCamera. You use the nominal focal length of the camera (converted to pixels) for an initial estimate, along with null distortion. If all goes well, you will obtain the camera intrinsic parameters, which you will keep, and the camera poses at all images, which you'll throw away.
Now you can mount the camera in your final setup, as needed by your application, and take one further image of the grid. Mark and refine the corner positions as before. Undistort their image positions using the distortion parameters returned by the calibration. Finally compute the homography between the reference positions of the real markers (in meters) and their undistorted positions, and you're done.
HTH
To calibrate the camera you do need multiple images of the checkerboard (or one of the other patterns found here). What you can do, is calibrate the camera outside of the water or do a calibration sequence once.
Once you have that information (focal length, center of lens, distortion, etc). You can use the solvePNP function to estimate the orientation of a single board. This estimation provides you with a distance from the camera to the board.
A completely different alternative could be to find what kind of lens the camera uses and manually fill in the data. I've not tried this, so I'm uncertain how well this would work.

iOS Camera Color Recognition in Real Time: Tracking a Ball

I have been looking for a bit and know that people are able to track faces with core image and openGL. However I am not that sure where to start the process of tracking a colored ball with the iOS camera.
Once I have a lead to being able to track the ball. I hope to create something to detect. when the ball changes directions.
Sorry I don't have source code, but I am unsure where to even start.
The key point is image preprocessing and filtering. You can use the Camera API-s to get the video stream from the camera. Take a snapshot picture from it, then you should use a Gaussian-blur on it (spatial enhance), then a Luminance Average Threshold Filter (to make black and white image). After that a morphological preprocessing should be wise (opening, closing operators), to hide the small noises. Then an Edge detection algorithm (with for example a Prewitt-operator). After these processes only the edges remain, your ball should be a circle (when the recording environment was ideal) After that you can use a Hough-transform to find the center of the ball. You should record the ball position and in the next frame, the small part of the picture can be processed (around the ball only).
Other keyword could be: blob detection
A fast library for image processing (on GPU with openGL) is Brad Larsons: GPUImage library https://github.com/BradLarson/GPUImage
It implements all the needed filter (except Hough-transformation)
The tracking process can be defined as following:
Having the initial coordinate and dimensions of an object with a given visual characteristics (image features)
In the next video frame, find the same visual characteristics near the coordinate of the last frame.
Near means considering basic transformations related to the last frame:
translation in each direction;
scale;
rotation;
The variation of these tranformations are strictly related with the frame rate. Higher the frame rate, nearest the position will be in the next frame.
Marvin Framework provides plug-ins and examples to perform this task. It's not compatible with iOs yet. However, it is open source and I think you can port the source code easily.
This video demonstrates some tracking features, starting at 1:10.

What is the definition of a "disparity map"?

I've been asked to implement an edge-based disparity map, but I fundamentally don't understand what a disparity map is. What is the definition of a "disparity map"?
Disparity map refers to the apparent pixel difference or motion between a pair of stereo images. To experience this, try closing one of your eyes and then rapidly close it while opening the other. Objects that are close to you will appear to jump a significant distance while objects further away will move very little. That motion is the disparity.
In a pair of images derived from stereo cameras, you can measure the apparent motion in pixels for every point and make an intensity image out of the measurements.
See this for an example. You can see the objects in the foreground are brighter, denoting greater motion and lesser distance.

Resources