iOS Camera Color Recognition in Real Time: Tracking a Ball

I have been looking around for a bit and know that people are able to track faces with Core Image and OpenGL. However, I am not sure where to start the process of tracking a colored ball with the iOS camera.
Once I have a lead on tracking the ball, I hope to create something to detect when the ball changes direction.
Sorry I don't have source code, but I am unsure where to even start.

The key point is image preprocessing and filtering. You can use the camera APIs to get the video stream from the camera and take a snapshot from it. Then apply a Gaussian blur (spatial smoothing), followed by a luminance average threshold filter (to produce a black-and-white image). After that, a morphological preprocessing step (opening and closing operators) is wise, to suppress small noise. Then run an edge detection algorithm (for example with a Prewitt operator). After these steps only the edges remain, and your ball should appear as a circle (when the recording environment is ideal). You can then use a Hough transform to find the center of the ball. Record the ball's position, and in the next frame only a small part of the picture (around the ball) needs to be processed.
Another keyword to search for is: blob detection
A fast library for image processing (on the GPU, with OpenGL) is Brad Larson's GPUImage library: https://github.com/BradLarson/GPUImage
It implements all the needed filters (except the Hough transform).
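For reference, here is a rough sketch of that filter chain in OpenCV/Python (not GPUImage itself; the file name, kernel sizes, and Hough parameters are illustrative placeholders, and Canny stands in for the Prewitt operator):

```python
import cv2
import numpy as np

frame = cv2.imread("snapshot.png")                      # one frame grabbed from the camera stream
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

blurred = cv2.GaussianBlur(gray, (5, 5), 0)             # Gaussian blur (spatial smoothing)
_, bw = cv2.threshold(blurred, 0, 255,                  # luminance threshold -> black & white
                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)

kernel = np.ones((3, 3), np.uint8)
opened = cv2.morphologyEx(bw, cv2.MORPH_OPEN, kernel)   # opening/closing hide small noise
closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)

edges = cv2.Canny(closed, 50, 150)                      # edge detection step described above

# Hough transform to locate the circular ball. Note cv2.HoughCircles runs its own
# internal gradient/edge computation, so it is fed the cleaned image rather than `edges`.
circles = cv2.HoughCircles(closed, cv2.HOUGH_GRADIENT, dp=1.5, minDist=50,
                           param1=150, param2=30, minRadius=5, maxRadius=100)
if circles is not None:
    x, y, r = circles[0][0]
    print("ball center:", (x, y), "radius:", r)
```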

The tracking process can be defined as follows:
Having the initial coordinates and dimensions of an object with given visual characteristics (image features),
in the next video frame, find the same visual characteristics near the coordinates from the last frame.
Near means considering basic transformations related to the last frame:
translation in each direction;
scale;
rotation;
The variation of these transformations is closely tied to the frame rate: the higher the frame rate, the closer the position will be in the next frame.
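As a minimal illustration of the "search near the last position" idea, here is a sketch using OpenCV template matching in Python; it only handles translation, so scale and rotation would need extra candidates or a more capable tracker:

```python
import cv2

def track_in_next_frame(prev_gray, next_gray, box, margin=30):
    """Find the patch described by box=(x, y, w, h) in next_gray,
    searching only a window around its previous position."""
    x, y, w, h = box
    template = prev_gray[y:y+h, x:x+w]

    # Restrict the search to a neighborhood of the last known position;
    # the higher the frame rate, the smaller this margin can be.
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1 = min(next_gray.shape[1], x + w + margin)
    y1 = min(next_gray.shape[0], y + h + margin)
    window = next_gray[y0:y1, x0:x1]

    # Normalized cross-correlation; the best match gives the new top-left corner.
    result = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(result)
    return (x0 + max_loc[0], y0 + max_loc[1], w, h)
```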
Marvin Framework provides plug-ins and examples to perform this task. It's not compatible with iOS yet. However, it is open source and I think you can port the source code easily.
This video demonstrates some tracking features, starting at 1:10.

Related

Detecting balls on a pool table

I'm currently working on a project where I need to be able to very reliably get the positions of the balls on a pool table.
I'm using a Kinect v2 above the table as the source.
Initial image looks like this (after converting it to 8-bit from 16-bit by throwing away pixels which are not around table level):
Then I subtract a reference image of the empty table from the current image.
After thresholding and equalization it looks like this:
It's fairly easy to detect the individual balls on a single image, the problem is that I have to do it constantly with 30fps.
Difficulties:
Low-resolution image (512×424); a ball is around 4-5 pixels in diameter
Kinect depth image has a lot of noise from this distance (2 meters)
Balls look different on the depth image, for example the black ball is kind of inverted compared to the others
If they touch each other they can merge into one blob on the image; if I try to separate them with depth thresholding (only using the top of the balls), some of the balls can disappear from the image
It's really important that anything other than the balls is not detected, e.g. cue, hands, etc.
My process, which kind of works but is not reliable enough:
16-bit to 8-bit by thresholding
Subtracting the sample image of the empty table
Cropping
Thresholding
Equalizing
Eroding
Dilating
Binary threshold
Contour finder
Some further algorithms on the output coordinates
The problem is that a pool cue or a hand can be detected as a ball, and two balls touching can also cause issues. I also tried Hough circles, but with even less success. (It works nicely if the Kinect is closer, but then it can't cover the whole table.)
Any clues would be much appreciated.
Expanding on my comments above:
I recommend improving the IRL setup as much as possible.
Most of the time it's easier to ensure a reliable setup than to try to "fix" the data with computer vision before even getting to detecting/tracking anything.
My suggestions are:
Move the camera closer to the table. (the image you posted can be 117% bigger and still cover the pockets)
Align the camera to be perfectly perpendicular to the table (and ensure the sensor stand is sturdy and well fixed): it will be easier to process a perfect top down view than a slightly tilted view (which is what the depth gradient shows). (sure the data can be rotated, but why waste CPU cycles when you can simply keep the sensor straight)
With a more reliable setup you should be able to threshold based on depth.
You could possibly threshold at the centre of the balls, since the information below is occluded anyway. The balls do not deform, so if the radius decreases quickly the ball probably went into a pocket.
Once you have a clean thresholded image, you can use findContours() and minEnclosingCircle(). Additionally, you should constrain the results based on minimum and maximum radius values to exclude other objects that may be in view (hands, pool cues, etc.). Also have a look at moments() and be sure to read Adrian's excellent Ball Tracking with OpenCV article.
It's using Python, but you should be able to find the equivalent OpenCV calls for the language you use.
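A minimal sketch of that contour-plus-radius filtering in Python/OpenCV (the radius limits below are placeholders to tune for the roughly 4-5 pixel ball size):

```python
import cv2

MIN_RADIUS, MAX_RADIUS = 2.0, 4.0   # tune for your ball size in pixels

def find_balls(mask):
    """Return (x, y, radius) for each blob in a binary mask that looks like a ball."""
    # findContours returns 2 or 3 values depending on the OpenCV version;
    # [-2] picks the contour list in both cases.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    balls = []
    for c in contours:
        (x, y), radius = cv2.minEnclosingCircle(c)
        if MIN_RADIUS <= radius <= MAX_RADIUS:   # reject cues, hands, merged blobs
            balls.append((x, y, radius))
    return balls
```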
In terms of tracking:
If you use OpenCV 2.4, you should look into its tracking algorithms (such as Lucas-Kanade).
If you already use OpenCV 3.0, it has its own list of contributed tracking algorithms (such as TLD).
I recommend starting with moments() first: use the simplest and least computationally expensive setup initially and see how robust the results are before moving on to more complex algorithms (which will take time to understand and tune before they produce the expected results).
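For example, getting a blob's centroid from moments() takes only a few lines per contour (a sketch, reusing the contours found as above):

```python
import cv2

def centroid(contour):
    """Centroid of a contour computed from its spatial moments."""
    m = cv2.moments(contour)
    if m["m00"] == 0:          # degenerate contour with no area
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
```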

How to get the coordinates of the moving object with GPUImage motionDetection - Swift

How would I go about getting the screen coordinates of something that enters the frame with the motionDetection filter? I'm fairly new to programming, and would prefer a Swift answer if possible.
Example - I have the iPhone pointing at a wall, monitoring it with the motionDetector. If I bounce a tennis ball against the wall, I want the app to place an image of a tennis ball on the iPhone display at the same spot it hit the wall.
To do this, I would need the coordinates of where the motion occurred.
I thought maybe the "centroid" argument did this.... but I'm not sure.
I should point out that the motion detector is pretty crude. It works by taking a low-pass filtered version of the video stream (a composite image generated by a weighted average of incoming video frames) and then subtracting that from the current video frame. Pixels that differ above a certain threshold are marked. The number of these pixels and the centroid of the marked pixels are provided as a result.
The centroid is a normalized (0.0-1.0) coordinate representing the centroid of all of these differing pixels. A normalized strength gives you the percentage of the pixels that were marked as differing.
Movement in a scene will generally cause a bunch of pixels to differ, and for a single moving object the centroid will generally be the center of that object. However, it's not a reliable measure, as lighting changes, shadows, other moving objects, etc. can also cause pixels to differ.
For true object tracking, you'll want to use feature detection and tracking algorithms. Unfortunately, the framework as published does not have a fully implemented version of any of these at present.
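For reference, the same low-pass / difference / centroid scheme can be sketched with OpenCV in Python (this is not GPUImage's actual implementation; the 0.05 blending factor and the thresholds are illustrative):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
background = None                   # running low-pass filtered version of the stream

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

    if background is None:
        background = gray.copy()
    cv2.accumulateWeighted(gray, background, 0.05)   # weighted average of incoming frames

    diff = cv2.absdiff(gray, background)
    marked = diff > 25                               # pixels differing above a threshold

    strength = marked.mean()                         # fraction of marked pixels
    if strength > 0.01:
        ys, xs = np.nonzero(marked)
        cx, cy = xs.mean(), ys.mean()                # centroid in pixel coordinates
        # Normalized (0.0-1.0) like the filter's centroid output:
        print(cx / gray.shape[1], cy / gray.shape[0], strength)
```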

Stitching Aerial images with OpenCV with a warper that projects images to the ground

Have anyone done something like that?
My problem with the OpenCV stitcher is that it warps the images for panoramas, meaning the images get stretched a lot as one moves away from the first image.
From what I can tell, OpenCV also builds on top of the assumption that the camera is in the same position. I am seeking a little guidance on this: is it just the warper I need to change, or do I also need to relax the assumption about the camera position being fixed before that?
I noticed that OpenCV also uses a bundle adjuster; does it make the same assumption that the camera is fixed?
Aerial image mosaicing
The image warping routines that are used in remote sensing and digital geography (for example to produce GeoTIFF files or, more generally, orthoimages) rely on both:
estimating the relative image motion (often improved with some aircraft motion sensors such as inertial measurement units),
the availability of a Digital Elevation Model of the observed scene.
This makes it possible to estimate the exact projection of each measured pixel onto the ground.
Furthermore, this is well beyond what OpenCV will provide with its built-in stitcher.
OpenCV's Stitcher
OpenCV's Stitcher class is indeed dedicated to the assembly of images taken from the same point.
This would not be so bad, except that the functions try to estimate just a rotation (to be more robust) instead of plain homographies (this is where the fixed camera assumption will bite you).
It does, however, add more functionality that is useful in the context of panorama creation, especially the seam cut detection and the image blending in overlapping areas.
What you can do
With aerial sensors, it is usually sound to assume (except when creating orthoimages) that the camera-scene distance is big enough that you can approximate the inter-frame transform by homographies (especially if your application does not require very accurate panoramas).
You can try to customize OpenCV's stitcher to replace the transform estimate and the warper to work with homographies instead of rotations.
I can't guess whether it will be difficult or not; for the most part it will consist of using the intermediate transform results and bypassing the final rotation estimation part. You may have to modify the bundle adjuster too, however.
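To get a feel for the homography-based approach outside the Stitcher class, here is a minimal pairwise sketch in Python/OpenCV (ORB matching plus findHomography; the file names, feature counts, and output canvas size are placeholders):

```python
import cv2
import numpy as np

img1 = cv2.imread("frame1.jpg", cv2.IMREAD_GRAYSCALE)   # reference aerial frame
img2 = cv2.imread("frame2.jpg", cv2.IMREAD_GRAYSCALE)   # next frame to add to the mosaic

orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:500]

src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Estimate a full homography (instead of a pure rotation) with RANSAC outlier rejection.
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp the second frame into the first frame's coordinate system, then paste the reference on top.
h, w = img1.shape
mosaic = cv2.warpPerspective(img2, H, (w * 2, h * 2))
mosaic[0:h, 0:w] = img1
cv2.imwrite("mosaic.jpg", mosaic)
```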

OpenCV intrusion detection

For a project of mine, I'm required to process image differences with OpenCV. The goal is to detect an intrusion in a zone.
To be a little more clear, here are the inputs and outputs:
Inputs:
An image of reference
A second image from approximately the same point of view (there can be a margin of error)
Outputs:
Detection of new objects in the scene.
Bonus:
Recognition of those objects.
For me, the most difficult part is filtering out small differences (luminosity, camera position error margin, movement of trees...)
I already read a lot about OpenCV image processing (subtraction, erosion, threshold, SIFT, SURF...) and have some good results.
What I would like is a list of the steps you think are best for getting good detection (humans, cars...), and the algorithms to use for each step.
Many thanks for your help.
Track-by-Detect, human tracker:
You apply the HOG detector to detect humans.
You draw a respective rectangle as foreground area on the foreground mask.
You pass this mask to "The OpenCV Video Surveillance / Blob Tracker Facility".
You can now group the passing humans based on their blob.{x,y} values into public/restricted areas.
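A sketch of the HOG step in Python/OpenCV, using the built-in default people detector; painting the detections into a mask is the part that would then feed the blob tracker (the file name and detection parameters are illustrative):

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame.jpg")
rects, _ = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8), scale=1.05)

# Draw each detected human as a filled rectangle on the foreground mask.
mask = np.zeros(frame.shape[:2], dtype=np.uint8)
for (x, y, w, h) in rects:
    cv2.rectangle(mask, (x, y), (x + w, y + h), 255, thickness=-1)
```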
I had to deal with this problem the last year.
I suggest an adaptive background-foreground estimation algorithm which produces a foreground mask.
On top of that, you add a blob detector and tracker, and then calculate whether an intersection takes place between the blobs and your intrusion area.
OpenCV has samples of all of these within its legacy code. Of course, if you want, you can also use your own or other versions of these.
Links:
http://opencv.willowgarage.com/wiki/VideoSurveillance
http://experienceopencv.blogspot.gr/2011/03/blob-tracking-video-surveillance-demo.html
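A minimal sketch of the background/foreground estimation plus intrusion-zone check in Python, using OpenCV 3's built-in MOG2 subtractor as the adaptive background model (the polygon coordinates, area threshold, and video source are placeholders):

```python
import cv2
import numpy as np

intrusion_zone = np.array([[100, 100], [400, 100], [400, 300], [100, 300]], dtype=np.int32)

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
cap = cv2.VideoCapture("scene.mp4")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = subtractor.apply(frame)                       # adaptive background-foreground estimation
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    contours = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    for c in contours:
        if cv2.contourArea(c) < 500:                   # drop small blobs (noise, moving leaves...)
            continue
        x, y, w, h = cv2.boundingRect(c)
        cx, cy = x + w // 2, y + h // 2
        # A non-negative result means the blob's center lies inside the intrusion zone.
        if cv2.pointPolygonTest(intrusion_zone, (float(cx), float(cy)), False) >= 0:
            print("intrusion at", (cx, cy))
```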
I would definitely start with a running average background subtraction if the camera is static. Then you can use findContours() to find the intruding object's location and size. If you want to detect humans that are walking around in a scene, I would recommend looking at using the built-in haar classifier:
http://docs.opencv.org/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html#cascade-classifier
where you would just replace the xml with the upperbody classifier.
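And a sketch of swapping in the upper-body cascade (haarcascade_upperbody.xml ships with OpenCV; the cv2.data helper used below to locate it exists in recent opencv-python builds, otherwise point to your install's data directory):

```python
import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_upperbody.xml")

frame = cv2.imread("frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
people = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
for (x, y, w, h) in people:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)   # mark detected upper bodies
```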

Determine the corners of a sheet of paper with iOS 5 AV Foundation and core-image in realtime

I am currently building a camera app prototype which should recognize sheets of paper lying on a table. The catch is that it should do the recognition in real time, so I capture the video stream of the camera, which in iOS 5 can easily be done with AV Foundation. I looked here and here.
They are doing some basic object recognition there.
I have found that using the OpenCV library in this real-time environment does not perform well.
So what I need is an algorithm to determine the edges of an image without OpenCV.
Does anyone have some sample code snippets which lay out how to do this, or can you point me in the right direction?
Any help would be appreciated.
You're not going to be able to do this with the current Core Image implementation in iOS, because corner detection requires some operations that Core Image doesn't yet support there. However, I've been developing an open source framework called GPUImage that does have the required capabilities.
For finding the corners of an object, you can use a GPU-accelerated implementation of the Harris corner detection algorithm that I just got working. You might need to tweak the thresholds, sensitivities, and input image size to work for your particular application, but it's able to return corners for pieces of paper that it finds in a scene:
It also finds other corners in that scene, so you may need to use a binary threshold operation or some later processing to identify which corners belong to a rectangular piece of paper and which to other objects.
I describe the process by which this works over at Signal Processing, if you're interested, but to use this in your application you just need to grab the latest version of GPUImage from GitHub and make the GPUImageHarrisCornerDetectionFilter the target for a GPUImageVideoCamera instance. You then just have to add a callback to handle the corner array that's returned to you from this filter.
On an iPhone 4, the corner detection process itself runs at ~15-20 FPS on 640x480 video, but my current CPU-bound corner tabulation routine slows it down to ~10 FPS. I'm working on replacing that with a GPU-based routine which should be much faster. An iPhone 4S currently handles everything at 20-25 FPS, but again I should be able to significantly improve the speed there. Hopefully, that would qualify as being close enough to realtime for your application.
I use Brad's GPUImage library to do that; the result could still be improved, but it is good enough.
Among detected Harris corners, my idea is to select:
The detection furthest toward the upper left for the top-left corner of the sheet
The detection furthest toward the upper right for the top-right corner of the sheet
etc.
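One way to express that "most toward the upper left / upper right" rule in Python, using OpenCV's goodFeaturesToTrack in Harris mode as a stand-in for GPUImage's corner output (the corner count and quality parameters are arbitrary):

```python
import cv2
import numpy as np

gray = cv2.imread("sheet.jpg", cv2.IMREAD_GRAYSCALE)
corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01,
                                  minDistance=10, useHarrisDetector=True).reshape(-1, 2)

# Pick, for each corner of the image, the detected corner closest to it.
h, w = gray.shape
image_corners = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
labels = ["top-left", "top-right", "bottom-left", "bottom-right"]
for label, ref in zip(labels, image_corners):
    distances = np.linalg.norm(corners - ref, axis=1)
    print(label, corners[np.argmin(distances)])
```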
@Mirco - Have you found a better solution?
@Brad - In your screenshot, what parameters do you use for the Harris filter to detect just 5 corners? I get a lot more than that...
