Faster Alternatives to cvFindContour() - opencv

Contour detection takes up the majority of the time in my computer vision pipeline, and it needs to be faster. I've optimized everything else so heavily via NEON instructions and vectorization that contour detection now dominates the profile. Unfortunately, it isn't obvious to me how to optimize this.
I'm doing the classic rectangle detection process to find fiducial markers, i.e. cvFindContours(), followed by approximating squares from the contours. In cases with many, many markers visible (or catastrophically, when a dense grid of rectangles that aren't markers is visible), the call to cvFindContours() alone can take >30ms on an iPhone.
I've already replaced the incredibly expensive C++ cv::findContours() with the C cvFindContours(). Particularly when passed a vector<vector<Point>>, the C++ version spent longer allocating and populating vectors than its internal cvFindContours() call took!
Now, I'm bound completely by the time in cvFindContours, or more specifically in cvFindNextContour(). The code inside cvFindNextContour is branch-heavy, and not obviously easy to vectorize. It also implements a complex algorithm that I don't trust myself not to get wrong in any attempt to optimize.
I have already looked at cvBlobLib (for disambiguation, I mean this one: http://code.google.com/p/cvblob/) to see if it provided alternative algorithms that could do the same thing faster. A base download of the source is incredibly slow because it records the contours into a std::list and spends almost all of its time in memory allocation. Replacing that list with a std::vector pre-sized to 256 elements to eliminate the initial copies on push_back() still leaves you with a function that takes 3x longer than cvFindContours(), 66% of that directly in cvb::cvLabel(). So it doesn't seem viable to go this way.
Does anyone have any ideas for how I can optimize detection of many rectangles? My vague handwaving includes:
Are there any fast implementations equivalent to cvFindContours() out there, ideally as source code, as I'm multi-platform?
The majority of contours are not required; only the "successful" rectangles are useful, and in particular their internal contours are not. Theoretically, could I skip cvFindContours entirely and instead call cvStartFindContours/cvFindNextContour, testing each contour as it is found, and not recursing once I find a rectangle I'm looking for, since sub-rectangles are then guaranteed to be useless? (See the sketch after this list.)
Is there a completely different rectangle detection algorithm I can use instead of the classic FindContours()/ApproxPoly() approach?
Is there a way to "prime" cvFindContours with useful regions of interest? E.g., FAST corner detection almost always returns my fiducial marker corners, even with a very aggressive threshold. Is there a way to use that point set to limit detection? (Unfortunately, I'm not sure how much this helps in the case of many markers, or dense gridlines unrelated to the markers, which happens often in my app.)
In the same vein as above, since blob detection can (if I understand correctly) be implemented as recursive flood-filling, are there any fast vectorized implementations of this that could then be used to pull out interesting blob rectangles and seed contour detection from there?
Any ideas would be welcome!
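
To make the second idea concrete, here is a minimal sketch of what that scanner loop might look like (legacy C API; CV_RETR_EXTERNAL already skips internal contours entirely, and the epsilon and minimum-area values are placeholders to tune):

    #include <opencv2/imgproc/imgproc_c.h>
    #include <math.h>

    /* binaryImg must be 8-bit single-channel; the scan modifies it. */
    void findMarkerQuads(IplImage* binaryImg)
    {
        CvMemStorage* scanStorage = cvCreateMemStorage(0);
        CvMemStorage* polyStorage = cvCreateMemStorage(0);

        /* CV_RETR_EXTERNAL walks only outermost contours, so the useless
           sub-rectangle (hole) contours are never visited at all. */
        CvContourScanner scanner = cvStartFindContours(
            binaryImg, scanStorage, sizeof(CvContour),
            CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE, cvPoint(0, 0));

        CvSeq* contour;
        while ((contour = cvFindNextContour(scanner)) != NULL)
        {
            CvSeq* approx = cvApproxPoly(contour, sizeof(CvContour),
                                         polyStorage, CV_POLY_APPROX_DP,
                                         cvContourPerimeter(contour) * 0.02, 0);

            if (approx->total == 4 &&
                fabs(cvContourArea(approx, CV_WHOLE_SEQ, 0)) > 100.0 &&
                cvCheckContourConvexity(approx))
            {
                /* candidate marker quad -- record approx's 4 corners here */
            }
            cvClearMemStorage(polyStorage);
        }

        cvEndFindContours(&scanner);
        cvReleaseMemStorage(&polyStorage);
        cvReleaseMemStorage(&scanStorage);
    }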

Since your goal is rectangle detection and not contour detection, I would suggest making use of integral images for computation. An explanation of integral images can be found here. After computing the integral image of your desired image, calculating the pixel sum of a rectangle can be done with three operations.
Assuming you want to draw rectangles around every non-black object, you can use a method as follows: recursively keep dividing the image and its subimages into 4, and discard rectangles with a pixel sum below your desired threshold. You will be left with many small rectangles approximating your objects. Merging the neighbouring rectangles will yield a fast approximation of your detected objects.
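
For reference, a minimal sketch of that three-operation sum, assuming an 8-bit single-channel input (note that cv::integral() pads the sum image by one row and column):

    #include <opencv2/imgproc/imgproc.hpp>

    // Sum of the pixels inside r, in O(1) once the integral image exists.
    // 'sum' must come from cv::integral(), i.e. (rows+1) x (cols+1), CV_32S.
    int rectSum(const cv::Mat& sum, const cv::Rect& r)
    {
        return sum.at<int>(r.y, r.x)
             + sum.at<int>(r.y + r.height, r.x + r.width)
             - sum.at<int>(r.y, r.x + r.width)
             - sum.at<int>(r.y + r.height, r.x);
    }

    // Usage:
    //   cv::Mat integralImg;
    //   cv::integral(gray, integralImg, CV_32S);
    //   int s = rectSum(integralImg, cv::Rect(10, 10, 32, 32));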

Related

Detecting balls on a pool table

I'm currently working on a project where I need to be able to very reliably get the positions of the balls on a pool table.
I'm using a Kinect v2 above the table as the source.
The initial image looks like this (after converting it from 16-bit to 8-bit by throwing away pixels which are not around table level):
Then I subtract a reference image of the empty table from the current image.
After thresholding and equalization it looks like this: (image)
It's fairly easy to detect the individual balls in a single image; the problem is that I have to do it constantly at 30 fps.
Difficulties:
Low-resolution image (512×424); a ball is around 4-5 pixels in diameter
Kinect depth image has a lot of noise from this distance (2 meters)
Balls look different on the depth image, for example the black ball is kind of inverted compared to the others
If they touch each other they can become one blob on the image; if I try to separate them with depth thresholding (only using the top of the balls), then some of the balls can disappear from the image
It's really important that anything other than the balls is not detected, e.g. cue, hands, etc.
My process, which kind of works but isn't reliable enough:
16bit to 8bit by thresholding
Subtracting sample image with empty table
Cropping
Thresholding
Equalizing
Eroding
Dilating
Binary threshold
Contour finder
Some further algorithms on the output coordinates
The problem is that a pool cue or hand can be detected as a ball, and also if two balls touch it can cause issues. I also tried Hough circles, but with even less success. (It works nicely if the Kinect is closer, but then it can't cover the whole table.)
Any clues would be much appreciated.
Expanding comments above:
I recommend improving the IRL setup as much as possible.
Most of the time it's easier to ensure a reliable setup than to try to "fix" it using computer vision before even getting to detecting/tracking anything.
My suggestions are:
Move the camera closer to the table. (The image you posted could be 117% bigger and still cover the pockets.)
Align the camera to be perfectly perpendicular to the table (and ensure the sensor stand is sturdy and well fixed): it will be easier to process a perfect top-down view than a slightly tilted one (which is what the depth gradient shows). (Sure, the data can be rotated, but why waste CPU cycles when you can simply keep the sensor straight?)
With a more reliable setup you should be able to threshold based on depth.
You could possibly threshold to the centre of the balls, since the information below is occluded anyway. The balls do not deform, so if the radius decreases quickly the ball probably went into a pocket.
Once you have a clean thresholded image, you can use findContours() and minEnclosingCircle(). Additionally, you should constrain the results based on min and max radius values to avoid other objects that may be in the view (hands, pool cues, etc.). Also have a look at moments(), and be sure to read Adrian's excellent Ball Tracking with OpenCV article.
It's using Python, but you should be able to find the equivalent OpenCV calls for the language you use.
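
A minimal sketch of that step in C++ (the radius bounds are placeholders to tune for your 4-5 pixel balls):

    #include <opencv2/imgproc/imgproc.hpp>
    #include <vector>

    std::vector<cv::Point2f> detectBalls(const cv::Mat& binary,
                                         float minRadius, float maxRadius)
    {
        std::vector<std::vector<cv::Point> > contours;
        // clone(): findContours modifies its input before OpenCV 3.2
        cv::findContours(binary.clone(), contours,
                         cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        std::vector<cv::Point2f> centers;
        for (size_t i = 0; i < contours.size(); ++i)
        {
            cv::Point2f center;
            float radius;
            cv::minEnclosingCircle(contours[i], center, radius);

            // Reject blobs that are too small (noise) or too large
            // (hands, cue, merged balls).
            if (radius >= minRadius && radius <= maxRadius)
                centers.push_back(center);
        }
        return centers;
    }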
In terms of tracking:
If you use OpenCV 2.4, you should look into its tracking algorithms (such as Lucas-Kanade).
If you already use OpenCV 3.0, it has its own list of contributed tracking algorithms (such as TLD).
I recommend starting with moments first: use the simplest and least computationally expensive setup initially, and see how robust the results are before moving on to the more complex algorithms (which will take time to understand and tune before they produce the expected results).
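
For the tracking route, a minimal Lucas-Kanade sketch, assuming you feed it the centers found in the previous frame (the window size and pyramid depth are guesses for such small balls):

    #include <opencv2/video/tracking.hpp>
    #include <vector>

    void trackBalls(const cv::Mat& prevGray, const cv::Mat& currGray,
                    const std::vector<cv::Point2f>& prevCenters,
                    std::vector<cv::Point2f>& currCenters)
    {
        if (prevCenters.empty()) return;

        std::vector<unsigned char> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prevGray, currGray, prevCenters, currCenters,
                                 status, err,
                                 cv::Size(7, 7), // small window: balls are tiny
                                 2);             // pyramid levels

        // status[i] == 0 means the tracker lost point i; re-detect it
        // from the thresholded image instead of trusting the flow.
    }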

Water Edge Detection

Is there a robust way to detect the water line, like the edge of a river in this image, in OpenCV?
(source: pequannockriver.org)
This task is challenging because a combination of techniques must be used. Furthermore, for each technique, the numerical parameters may only work correctly for a very narrow range. This means either a human expert must tune them by trial-and-error for each image, or that the technique must be executed many times with many different parameters, in order for the correct result to be selected.
The following outline is highly-specific to this sample image. It might not work with any other images.
One bit of advice: As usual, any multi-step image analysis should always begin with the most reliable step, and then proceed down to the less reliable steps. Whenever possible, the less reliable step should make use of the result of more-reliable steps to augment its own accuracy.
Detection of sky
Convert the image to HSV colorspace, and find the cyan located in the upper half of the image.
Keep this HSV image, because it could be handy for the next few steps as well.
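A minimal sketch of this step, assuming the sky reads as a cyan/blue hue (the inRange() bounds are guesses that will need tuning per image):

    #include <opencv2/imgproc/imgproc.hpp>

    cv::Mat detectSky(const cv::Mat& bgr)
    {
        cv::Mat hsv, skyMask;
        cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

        // OpenCV hue runs 0-179; roughly 85-130 covers cyan through blue.
        cv::inRange(hsv, cv::Scalar(85, 40, 120),
                    cv::Scalar(130, 255, 255), skyMask);

        // Keep only the upper half of the frame, where the sky should be.
        int half = bgr.rows / 2;
        skyMask(cv::Rect(0, half, bgr.cols, bgr.rows - half)) = 0;
        return skyMask;
    }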
Detection of shrubs
Run Canny edge detection on the grayscale version of image, with suitably chosen sigma and thresholds. This will pick up the branches on the shrubs, which would look like a bunch of noise. Meanwhile, the water surface would be relatively smooth.
Grayscale is used in this technique in order to reduce the influence of reflections on the water surface (the green and yellow reflections from the shrubs). There might be other colorspaces (or preprocessing techniques) more capable of removing that reflection.
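A minimal sketch of this shrub test: measure local Canny edge density, on the assumption that busy branch texture scores high while smooth water scores low (the sigma, thresholds and window size are all placeholders):

    #include <opencv2/imgproc/imgproc.hpp>

    cv::Mat detectShrubs(const cv::Mat& bgr)
    {
        cv::Mat gray, edges, density, shrubMask;
        cv::cvtColor(bgr, gray, cv::COLOR_BGR2GRAY);
        cv::GaussianBlur(gray, gray, cv::Size(5, 5), 1.4); // "suitably chosen sigma"
        cv::Canny(gray, edges, 50, 150);

        // Mean edge response per 31x31 window; noisy texture => shrubs.
        cv::boxFilter(edges, density, CV_32F, cv::Size(31, 31));
        cv::threshold(density, shrubMask, 30.0, 255.0, cv::THRESH_BINARY);
        shrubMask.convertTo(shrubMask, CV_8U);
        return shrubMask;
    }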
Detection of water ripples from a lower elevation angle viewpoint
Firstly, mark off any image parts that are already classified as shrubs or sky. Since shrub detection would be more reliable than water detection, shrub detection's result should be used to inform the less-reliable water detection.
Observation
Because of the low elevation angle of the viewpoint, the water ripples appear horizontally elongated. In fact, every image feature appears stretched horizontally. This is called anisotropy. We could make use of this tendency to detect them.
Note: I am not experienced in anisotropy detection. Perhaps you can get better ideas from other people.
Idea 1:
Use maximally-stable extremal regions (MSER) as a blob detector.
The Wikipedia introduction appears intimidating, but it is really related to connected-component algorithms. A naive implementation can be done in a manner similar to Dijkstra's algorithm.
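Rather than hand-rolling it, a minimal sketch using OpenCV's built-in MSER detector (OpenCV 3.x-style API; the elongation test is an assumption):

    #include <opencv2/features2d.hpp>
    #include <vector>

    void detectRippleBlobs(const cv::Mat& gray)
    {
        cv::Ptr<cv::MSER> mser = cv::MSER::create();

        std::vector<std::vector<cv::Point> > regions; // pixels per stable region
        std::vector<cv::Rect> boxes;                  // bounding box per region
        mser->detectRegions(gray, regions, boxes);

        for (size_t i = 0; i < boxes.size(); ++i)
        {
            // Horizontally elongated regions are water-ripple candidates.
            if (boxes[i].width > 2 * boxes[i].height)
            {
                // mark boxes[i] as probable water
            }
        }
    }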
Idea 2:
Since the image features are horizontally stretched, a simpler approach is to just sum up the absolute values of horizontal gradients and compare that to the sum of the absolute values of vertical gradients.
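
A minimal sketch of idea 2 (the window size and the 2x dominance ratio are assumptions):

    #include <opencv2/imgproc/imgproc.hpp>

    cv::Mat detectHorizontalStreaks(const cv::Mat& gray)
    {
        cv::Mat gx, gy, ax, ay;
        cv::Sobel(gray, gx, CV_32F, 1, 0); // x-derivative
        cv::Sobel(gray, gy, CV_32F, 0, 1); // y-derivative
        ax = cv::abs(gx);
        ay = cv::abs(gy);

        // Local (unnormalized) sums over a 21x21 window.
        cv::boxFilter(ax, ax, CV_32F, cv::Size(21, 21), cv::Point(-1, -1), false);
        cv::boxFilter(ay, ay, CV_32F, cv::Size(21, 21), cv::Point(-1, -1), false);

        // Horizontally stretched ripples have mostly horizontal edges,
        // i.e. strong vertical gradients and weak horizontal ones.
        cv::Mat limit = 2.0f * ax;
        cv::Mat mask = ay > limit; // CV_8U, 255 where anisotropy dominates
        return mask;
    }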

how to find shapes that are slightly elongated oval / rectangle with curved corners / sometimes sector of a circle?

how to recognise a zebra crossing from top view using opencv?
In my previous question, the problem was to find a curved zebra crossing using OpenCV.
Now I think the following would be a much easier way to detect it:
(i) canny it
(ii) find the contours in it
(iii) find the black stripes in it; in my case they are slightly oval in shape
Now my question is: how do I find that slightly oval shape?
look here for images of the crossing: www.shaastra.org/2013/media/events/70/Tab/422/Modern_Warfare_ps_v1.pdf
Generally speaking, I believe there are two approaches you can consider.
One approach is the more brute force image analysis approach, as you described. Here you are applying heuristics based on your knowledge of the problem in order to identify the pixels involved in the parts of the path. Note that 'brute force' here is not a bad thing, just an adjective.
An alternative approach is to apply pattern recognition techniques to find the regions of the image which have high probability of being part of the path. Here you would be transforming your image into (hopefully) meaningful features: lines, points, gradient (eg: Histogram of Oriented Gradients (HOG)), relative intensity (eg: Haar-like features) etc. and using machine learning techniques to figure out how these features describe the, say, the road vs the tunnel (in your example).
As you are asking about the former, I'm going to focus on that here. If you'd like to know more about the latter have a look around the Internet, StackOverflow, or post specific questions you have.
So, for the 'brute force image analysis' approach, your first step would probably be to preprocess the image as you need;
Consider color normalization if you are going to analyze color later; this will help make your algorithm robust to lighting differences in your garage vs. the event studio. It'll also improve robustness to camera calibration differences, though the organization hosting the competition may provide specs for the camera they will use (don't ignore this bit of info).
Consider blurring the image to reduce noise if you're less interested in pixel by pixel values (eg edges) and more interested in larger structures (eg gradients).
Consider sharpening the image for the opposite reasons of blurring.
Do a bit of research on image preprocessing. It's definitely an explored topic but hardly 'solved' (whatever that would mean). There are lots of things to try at this stage and, of course, crap in => crap out.
After that you'll want to generate some 'features'..
As you mentioned, edges seem like an appropriate feature space for this problem. Don't forget that there are many other great edge detection algorithms out there besides Canny (see Prewitt, Sobel, etc.). After applying the edge detection algorithm, though, you still just have pixel data. To get to features, you'll probably want to extract lines from the edges. This is where the Hough transform will come in handy.
(Also, as an idea, you can think about colorspace in concert with the edge detectors. By that, I mean, edge detectors usually work in the B&W colorspace, but rather than converting your image to grayscale you could convert it to an appropriate colorspace and just use a single channel. For example, if the game board is red and the lines on the crosswalk are blue, convert the image to HSV and grab the hue channel as input for the edge detector. You'll likely get better contrast between the regions than just grayscale. For bright vs. dull use the value channel, for yellow vs. blue use the Opponent colorspace, etc.)
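A sketch tying the last two paragraphs together: run Canny on a single HSV channel, then pull out line segments with the probabilistic Hough transform (all thresholds are placeholders):

    #include <opencv2/imgproc/imgproc.hpp>
    #include <vector>

    std::vector<cv::Vec4i> extractLines(const cv::Mat& bgr)
    {
        cv::Mat hsv, edges;
        std::vector<cv::Mat> channels;
        cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
        cv::split(hsv, channels);

        // Hue channel instead of grayscale for better region contrast.
        cv::Canny(channels[0], edges, 50, 150);

        std::vector<cv::Vec4i> lines; // each line is (x1, y1, x2, y2)
        cv::HoughLinesP(edges, lines,
                        1, CV_PI / 180, // 1 px, 1 degree resolution
                        50,             // accumulator votes
                        30, 10);        // min length, max gap
        return lines;
    }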
You can also look at points. Algorithms such as the Harris corner detector or the Laplacian of Gaussian (LoG) will extract 'key points' (with a different definition for each algorithm, but generally reproducible).
There are many other feature spaces to explore, don't stop here.
Now, this is where the brute force part comes in..
The first thing that comes to mind is parallel lines. Even in a curve, the edges of the lines are 'roughly' parallel. You could easily develop an algorithm to find the track in your game by finding lines which are each roughly parallel to their neighbors. Note that line detectors like the Hough transform are usually applied such that they find 'peaks', or overrepresented angles in the dataset. Thus, if you generate a Hough transform for the whole image, you'll be hard pressed to find any of the lines you want. Instead, you'll probably want to use a sliding window to examine each area individually.
Speaking specifically to the curved areas, you can use the Hough transform to detect circles and ellipses quite easily. You could apply a heuristic like: two concentric semi-circles with a given difference in radius (~250 in your problem) would indicate a road.
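A minimal sketch of that heuristic (the HoughCircles parameters and the centre/radius tolerances are assumptions):

    #include <opencv2/imgproc/imgproc.hpp>
    #include <vector>
    #include <cmath>

    void findConcentricArcs(const cv::Mat& gray)
    {
        std::vector<cv::Vec3f> circles; // each is (x, y, radius)
        cv::HoughCircles(gray, circles, cv::HOUGH_GRADIENT,
                         1, 50,    // accumulator scale, min centre distance
                         100, 40); // Canny high threshold, accumulator votes

        for (size_t i = 0; i < circles.size(); ++i)
            for (size_t j = i + 1; j < circles.size(); ++j)
            {
                float dx = circles[i][0] - circles[j][0];
                float dy = circles[i][1] - circles[j][1];
                float dr = std::fabs(circles[i][2] - circles[j][2]);

                // A shared centre (within 20 px) and a radius gap near 250 px
                // suggests the inner and outer edge of the curved road.
                if (std::sqrt(dx * dx + dy * dy) < 20.0f &&
                    std::fabs(dr - 250.0f) < 30.0f)
                {
                    // candidate road section between circles i and j
                }
            }
    }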
If you're using points/corners you can try to come up with an algorithm to connect the corners of one line to the next. You can put a limit on the distance and degree in rotation from one corner to the next that will permit rounded turns but prohibit impossible paths. This could elucidate the edges of the road while being robust to turns.
You can probably start to see now why these hard-coded algorithms start off simple but become tedious to tweak and often don't have great results. Furthermore, they tend to be rigid and inapplicable to other, even similar, problems.
All that said, you're talking about solving a problem that doesn't have an out-of-the-box solution. Thinking about the solution is half the fun, and half the challenge. Everything I've described here is basic image analysis, computer vision, and problem solving. Start reading some papers on these topics and the ideas will come quickly. Good luck in the competition.

Object Detection using OpenCV in C++

I'm currently working on a vision system for a UAV I am building. The goal of the system is to find target objects, which are rather well defined (see below), in a video stream that will be a 2-D flyover view of the ground. So far I have tried training and using a Haar-like feature based cascade, a la Viola Jones, to do the detection. I am training it with 5000+ images of the targets at different angles (perspective shifts) and ranges (sizes in the frame), but only 1900 "background" images. This does not yield good results at all, as I cannot find a suitable number of stages to the cascade that balances few false positives with few false negatives.
I am looking for advice from anyone who has experience in this area, as to whether I should: 1) ditch the cascade, in favor of something more suitable to objects defined by their outline and color (which I've read the VJ cascade is not).
2) improve my training set for the cascade, either by adding positives, background frames, organizing/shooting them better, etc.
3) Some other approach I can't fathom currently.
A description of the targets:
Primary shapes: triangles, squares, circles, ellipses, etc.
Distinct, solid, primary (or close to) colors.
Smallest dimension between two and eight feet (big enough to be seen easily from a couple hundred feet AGL)
Large, single alphanumeric in the center of the object, with its own distinct, solid, primary or almost primary color.
My goal is to use something very fast, such as the VJ cascade, to find possible objects and their associated bounding boxes, and then pass these on to finer processing routines to determine their properties (color of the object and AN, value of the AN, actual shape, and GPS location). Any advice you can give me towards completing this goal would be much appreciated. The source code I currently have is a little lengthy to post here, but is freely available should you like to see it for reference. Thanks in advance!
-JB
I would recommend ditching Haar classification, since you know a lot about your objects. In these cases you should start by checking what features you can use:
1) Overhead flight means, as you said, you can basically treat these as fixed shapes on a 2D plane. There will be scaling, rotation and some minor affine transformations, which depend a lot on how wide-angled your camera is. If it isn't particularly wide-angled, that part can probably be ignored. Also, you probably know your altitude, from which you can also make a very good assumption about the target size (scaling).
2) You know the colors, which also makes it quite easy to find objects. If these are well defined as primary colors, then you can just filter the image based on color and find the resulting contours. If you want to do something a little more advanced (which to me doesn't seem necessary, though...) you can do a backprojection, which in my experience is very effective and fast. Note: if you're creating the objects, it would be better to use red, green and blue instead of primary colors (red, green and yellow). Then you can simply split the image into its respective channels and use a very high threshold (see the first sketch after this list).
3) You know the geometric shapes. I've never done this myself, but as far as I know, the options are using moments or using Hough transforms (although OpenCV only has Hough algorithms for lines and circles, so you'd have to write your own for other shapes...). You might already have sufficiently good results without this step, though. (The second sketch after this list shows the moments route.)
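Following up on point 2, a minimal sketch of the channel-split-and-threshold idea, assuming saturated red/green/blue targets (the threshold value is a placeholder):

    #include <opencv2/imgproc/imgproc.hpp>
    #include <vector>

    // channel: 0 = blue, 1 = green, 2 = red (OpenCV stores BGR)
    std::vector<std::vector<cv::Point> > findColoredTargets(const cv::Mat& bgr,
                                                            int channel)
    {
        std::vector<cv::Mat> ch;
        cv::split(bgr, ch);

        // A very high threshold on a single channel isolates saturated targets.
        cv::Mat mask;
        cv::threshold(ch[channel], mask, 200, 255, cv::THRESH_BINARY);

        std::vector<std::vector<cv::Point> > contours;
        cv::findContours(mask, contours,
                         cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        return contours;
    }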
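And for point 3's moments option, a minimal sketch using Hu-moment matching via cv::matchShapes() against pre-rendered template contours (the cutoff score is an assumption):

    #include <opencv2/imgproc/imgproc.hpp>
    #include <string>
    #include <vector>

    std::string classifyShape(const std::vector<cv::Point>& contour,
                              const std::vector<std::vector<cv::Point> >& templates,
                              const std::vector<std::string>& names)
    {
        double best = 1e9;
        std::string label = "unknown";
        for (size_t i = 0; i < templates.size(); ++i)
        {
            // Hu-moment distance: lower = more similar; invariant to
            // scale, rotation and translation.
            double score = cv::matchShapes(contour, templates[i],
                                           CV_CONTOURS_MATCH_I1, 0);
            if (score < best) { best = score; label = names[i]; }
        }
        return best < 0.1 ? label : "unknown"; // 0.1 is an assumed cutoff
    }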
If you want more specific recommendations, it would be very useful to upload a couple sample images. :)
This may already be solved, but I came across a paper recently with an open-source license for generic object detection using normalised gradient features: http://mmcheng.net/bing/comment-page-9/
The details of the algorithm's performance against illumination, rotation and scale may require a little digging. I can't remember off the top of my head where the original paper is.

EMGU OpenCV disparity only on certain pixels

I'm using the EMGU OpenCV wrapper for c#. I've got a disparity map being created nicely. However for my specific application I only need the disparity values of very few pixels, and I need them in real time. The calculation is taking about 100 ms now, I imagine that by getting disparity for hundreds of pixel values rather than thousands things would speed up considerably. I don't know much about what's going on "under the hood" of the stereo solver code, is there a way to speed things up by only calculating the disparity for the pixels that I need?
First of all, you don't mention what you are really trying to accomplish, nor which algorithm you are using. E.g., StereoGC is really slow (i.e. not real-time) but usually far more accurate than both StereoSGBM and StereoBM. Those last two can be used in real time, provided a few conditions are met:
The size of the input images is reasonably small;
You are not using an extravagant set of parameters (for instance, a larger value for numberOfDisparities will increase computation time).
Don't expect miracles when it comes to accuracy though.
Apart from that, there is the issue of "just a few pixels". As far as I understand, the algorithms implemented in OpenCV usually rely on information from more than one pixel to determine the disparity value. E.g., they need a neighborhood to detect which pixel in image A maps to which pixel in image B. As a result, in general it is not possible to just discard every other pixel of the image (by the way, if you already knew the locations in both images, you would not need the stereo methods at all). So unless you can discard a large border of your input images, for which you know that you'll never find your pixels of interest there, I'd say the answer to this part of your question is "no".
If you happen to know that your pixels of interest will always be within a certain rectangle of the input images, you can specify that rectangle as the input images' ROI (region of interest). Assuming OpenCV does not contain a bug here, this should speed up the computation a little.
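A minimal sketch of the ROI idea in plain OpenCV (2.4-style C++ API, which is what EmguCV wraps; the rectangle and StereoBM parameters are placeholders):

    #include <opencv2/calib3d/calib3d.hpp>

    void disparityInROI(const cv::Mat& left, const cv::Mat& right,
                        cv::Rect roi, cv::Mat& disparity)
    {
        // Pad the ROI on the left: block matching needs numberOfDisparities
        // extra columns to search for correspondences.
        const int numDisparities = 64;
        roi.x -= numDisparities;
        roi.width += numDisparities;
        roi &= cv::Rect(0, 0, left.cols, left.rows); // clamp to image bounds

        // Mat::operator() creates views, no copies.
        cv::Mat leftROI = left(roi);
        cv::Mat rightROI = right(roi);

        cv::StereoBM bm(cv::StereoBM::BASIC_PRESET,
                        numDisparities, // keep modest for speed
                        21);            // SAD window size
        bm(leftROI, rightROI, disparity);
    }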
With a bit of googling you can find real-time examples of finding stereo correspondences using EmguCV (or plain OpenCV) using the GPU on YouTube. Maybe this could help you.
Disclaimer: this might have been a more complete answer if your question had contained more detail.
