Fast image segmentation algorithms for automatic object extraction

I need to segment a set of unknown objects (books, cans, toys, boxes, etc.) standing on top of a surface (tabletop, floor…). I want to extract a mask (either binary or probabilistic) for each object in the scene.
I do know what the surface looks like (I have a color model for it). The size, geometry, number, and appearance of the objects are arbitrary, and they could be textureless as well. Multiple views might also be available. No user interaction is available.
I have been struggling to pick the best kind of algorithm for this scenario (graph-based, cluster-based, superpixels, etc.). This comes, naturally, from a lack of experience with the different methods. I'd like to know how they compare to one another.
I have some constraints:
I can’t use libraries other than OpenCV (it’s a legal constraint), so any algorithm must be implemented by me. I’d like to choose an algorithm that is simple enough to implement in a not-too-long period of time.
Performance is VERY important. There will be many other processes running at the same time, so I can’t afford to have a slow method.
A fast and simple method with lower resolution is much preferred over something complex and slow that provides better results.
Any suggestion on some approach suitable for this scenario would be appreciated.

For speed, I'd quickly segment the image into surface and non-surface (stuff). This at least gets you from 24 bits per pixel (color) down to 8 bits, or even 1 bit if you are brave.
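A minimal sketch of that first step, assuming the surface color model is a per-channel mean and standard deviation (a diagonal Gaussian). The question only says a color model exists, so the model form and the `max_dist` threshold are illustrative assumptions. Images are plain lists of rows of `(r, g, b)` tuples to keep the sketch dependency-free; with OpenCV/NumPy the same test vectorizes into a few array operations.

```python
def surface_mask(image, mean, std, max_dist=3.0):
    """Return a 0/1 mask: 1 where a pixel is NOT explained by the surface color model.

    image    -- list of rows, each row a list of (r, g, b) tuples
    mean/std -- per-channel mean and standard deviation of the surface color
    max_dist -- distance threshold in standard deviations (illustrative default)
    """
    mask = []
    for row in image:
        out_row = []
        for pixel in row:
            # Squared normalized distance to the surface color (diagonal Gaussian).
            d2 = sum(((p - m) / s) ** 2 for p, m, s in zip(pixel, mean, std))
            out_row.append(1 if d2 > max_dist ** 2 else 0)
        mask.append(out_row)
    return mask


table = (200, 180, 150)
img = [[table, table, (30, 30, 200)],
       [table, (35, 28, 190), table]]
print(surface_mask(img, mean=(200, 180, 150), std=(10, 10, 10)))
# [[0, 0, 1], [0, 1, 0]]
```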
From there you need to aggregate and filter the regions into blobs of appropriate size. For that, I'd try a morphological (or linear) filter keyed to a disk that would just fit inside the smallest object of interest: an opening and a closing. Perhaps start with smaller radii for better results.
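The opening and closing can be sketched with a digital disk as the structuring element. This is a plain-Python illustration of the operations themselves; in OpenCV, `cv2.morphologyEx` with a kernel from `cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ...)` would be the fast route. The radius is something you would tune to the smallest object.

```python
def disk(radius):
    """Offsets (dy, dx) of a digital disk used as the structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def erode(mask, se):
    h, w = len(mask), len(mask[0])
    # Keep a pixel only if every in-bounds disk offset lands on foreground.
    # Offsets outside the image are ignored so the border doesn't eat objects.
    return [[1 if all(mask[y + dy][x + dx] for dy, dx in se
                      if 0 <= y + dy < h and 0 <= x + dx < w) else 0
             for x in range(w)] for y in range(h)]


def dilate(mask, se):
    h, w = len(mask), len(mask[0])
    # Set a pixel if any in-bounds disk offset lands on foreground.
    return [[1 if any(mask[y + dy][x + dx] for dy, dx in se
                      if 0 <= y + dy < h and 0 <= x + dx < w) else 0
             for x in range(w)] for y in range(h)]


def opening(mask, se):
    return dilate(erode(mask, se), se)  # removes specks the disk cannot fit inside


def closing(mask, se):
    return erode(dilate(mask, se), se)  # fills holes and gaps smaller than the disk
```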
From there, you should have an image of blobs that can be found and discriminated. Each blob or region should then correspond to an object of interest.
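Finding and discriminating the blobs is connected-component labeling. A minimal breadth-first version (4-connectivity) that also returns each blob's pixel count, so undersized blobs can be dropped, might look like this; `cv2.connectedComponentsWithStats` does the same job in OpenCV.

```python
from collections import deque


def label_components(mask):
    """4-connected component labeling by breadth-first flood fill.

    Returns (labels, sizes): labels[y][x] is 0 for background or a blob id >= 1,
    and sizes[id] is the pixel count of that blob.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    sizes = {}
    next_id = 1
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not labels[sy][sx]:
                labels[sy][sx] = next_id
                queue = deque([(sy, sx)])
                count = 0
                while queue:
                    y, x = queue.popleft()
                    count += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
                sizes[next_id] = count
                next_id += 1
    return labels, sizes
```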
Note that if you can get down to a 1-bit image, things can go VERY fast. However, efficient tools that can make use of this data format (8 pixels per byte) are often not readily available.
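To illustrate why 1-bit images can be fast, here is a sketch of a 3x3 dilation on bit-packed rows. Python's arbitrary-width integers stand in for the packed bytes (one int per row, bit `x` = pixel `x`); in C or C++ you would use machine words and carry bits across word boundaries yourself. Each shift-and-OR processes a whole row at once instead of one pixel at a time.

```python
def pack_rows(mask):
    """Pack each row of a 0/1 mask into one int (bit x = pixel x)."""
    return [sum(bit << x for x, bit in enumerate(row)) for row in mask]


def dilate3x3_packed(rows, width):
    """3x3 square dilation on bit-packed rows using only shifts and ORs."""
    full = (1 << width) - 1  # keeps shifted bits inside the image width
    # Horizontal pass: OR each pixel with its left and right neighbours.
    horiz = [(r | (r << 1) | (r >> 1)) & full for r in rows]
    out = []
    for y, r in enumerate(horiz):
        up = horiz[y - 1] if y > 0 else 0
        down = horiz[y + 1] if y + 1 < len(horiz) else 0
        out.append(r | up | down)  # vertical pass
    return out


rows = pack_rows([[0] * 5, [0, 0, 1, 0, 0], [0] * 5])
print([format(r, "05b") for r in dilate3x3_packed(rows, 5)])
# ['01110', '01110', '01110']
```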


How to count tablets successfully?

My last question on image recognition seemed to be too broad, so I would like to ask a more concrete question.
First, the background. I have already developed a (round) pill counter. It uses something similar to this tutorial. After I made it, I also found something similar in this other tutorial.
However, my method fails for images like this one:
Although the segmentation process is a bit complicated (because of the semi-transparency of the tablets), I have managed to get it:
My problem is this: how can I count the elongated tablets, separating each one in the image, similar to the final results in the linked tutorials?
So far I have applied a distance transform and then my own version of watershed, and I got:
As you can see, it fails on adjacent tablets (distance transforms usually do).
Keep in mind that the solution has to work for this image and also for other arrangements of the tablets, the most difficult being, for example:
I am open to using OpenCV or, if necessary, implementing my own algorithms. So far I have tried both (I used OpenCV functions and also programmed my own libraries). I am also open to using C++, Python, or something else. (I programmed them in C++, and I have done it in C# too.)
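The failure on touching elongated tablets can be reproduced with a small city-block (L1) distance transform, computed in the usual two raster passes. This is an illustrative sketch, not the asker's code; OpenCV's `cv2.distanceTransform` offers L1 and L2 variants. In the example, two 3-pixel-tall bars touch along their long sides: each bar alone would peak along its own middle row, but the merged 6-row blob peaks on a single plateau straddling the seam, so a watershed seeded from the maxima sees one object.

```python
def distance_transform_cityblock(mask):
    """Two-pass city-block (L1) distance to the nearest background pixel.

    Background pixels get 0. The image border is not treated as background.
    """
    h, w = len(mask), len(mask[0])
    big = h + w  # larger than any L1 distance inside the image
    d = [[big if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: propagate from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    # Backward pass: propagate from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d


# Two touching bars: rows 1-3 and rows 4-6, columns 1-7, on background.
mask = [[1 if 1 <= y <= 6 and 1 <= x <= 7 else 0 for x in range(9)] for y in range(8)]
d = distance_transform_cityblock(mask)
peak = max(max(row) for row in d)
print(peak, sorted((y, x) for y in range(8) for x in range(9) if d[y][x] == peak))
# 3 [(3, 3), (3, 4), (3, 5), (4, 3), (4, 4), (4, 5)]
```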
I am also working on this pill-counting problem (I'm much earlier in the process than you are). To solve the part you are working on, touching pills, my general idea is to capture the contours of the pills once you have a good mask of them, and then calculate the area of a single pill.
For this approach I'm assuming that there are enough pills in the image that the untouching ones outnumber the touching ones, and that no pills overlap one another. For my application I think this restriction is reasonable: a human can take a quick look at the pills they've dumped out and, without too much work, at least roughly separate them. It's also possible that I could design a tray with some sort of dimples in it that would coerce the pills not to touch.
I do this by sorting the contour areas (which, with the right thresholding should lead to only pills and pill-groups being in the identified contours), and taking the median value.
Then, with a good value for the area of a pill, you can look for contours with areas that are a multiple of that median area (+/- some % error value).
I also use that median value to filter out contours that are clearly not big enough to be pills, and ones that are far too large to be a pill (the latter though could be more troublesome, since it could still be a grouping of touching pills).
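The median-area idea can be sketched as follows, assuming you already have the contour areas (for example from `cv2.contourArea` over the output of `cv2.findContours`). `min_frac` and `tol` are hypothetical tuning parameters, and the upper "far too large" filter is left out because, as noted, a large contour may still be a legitimate group of touching pills.

```python
import statistics


def count_pills_from_areas(areas, min_frac=0.5, tol=0.25):
    """Estimate a pill count, using the median contour area as one pill's area.

    Assumes most contours are single, non-overlapping pills.
    Returns (count, unit_area).
    """
    unit = statistics.median(areas)
    total = 0
    for area in areas:
        if area < min_frac * unit:
            continue  # clearly too small to be a pill: treat as noise
        n = round(area / unit)  # nearest whole number of pills
        if abs(area - n * unit) <= tol * unit:
            total += n  # close enough to a multiple of one pill's area
        # otherwise the blob is ambiguous; a real system might flag it
    return total, unit


print(count_pills_from_areas([100, 98, 103, 205, 300, 12, 101]))
# (9, 101)
```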
Given that the pills are all identical and don’t overlap, simply divide the total pill area by the area of a single pill.
The area is estimated by simply counting the number of “pill” pixels.
You do need to calibrate the method by giving it the area of a single pill. This can be trivially obtained by giving the correct solution to one of the images (manual counting), then all the other images can be counted automatically.
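A sketch of the calibrate-then-divide approach on a binary mask (1 = pill pixel); the function names are illustrative.

```python
def calibrate_unit_area(mask, known_count):
    """Single-pill area from a calibration image whose pill count is known."""
    return sum(map(sum, mask)) / known_count


def count_by_area(mask, unit_area):
    """Pill count = total pill pixels / area of one pill, rounded to an integer."""
    return round(sum(map(sum, mask)) / unit_area)


unit = calibrate_unit_area([[1] * 12], known_count=3)  # 12 pixels, 3 pills
print(unit, count_by_area([[1] * 10, [1] * 10], unit))
# 4.0 5
```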

How to defend thresholding technique

On a job for a customer, I am locating items within a grayscale scene with nonuniform background illumination. Once the items are located, I need to do another search within each one for details. The items are easy enough to locate by masking with the output of a variance filter; and within the items, if the threshold is correct, the details are easy to locate as well. But the mean and contrast of these items varies substantially.
I played around with threshold calculation for a while, and none of the techniques I implemented is perfect; but the one that turns out simplest, as accurate as any other, and quite low cost, is to take the mean pixel value and add one standard deviation.
My question is: is there some analytical way to defend this calculation other than "it works well"? I did sort of stumble on this technique by accident (only later did I find this answer), and using it seems arbitrary.
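For concreteness, here is a sketch of both pieces described in the question: a local variance filter (computed with summed-area tables, one common way to make it cheap) and the mean-plus-one-standard-deviation threshold applied per item. The window radius `r` and the multiplier `k` are illustrative parameters. One property can be checked directly: if an item's intensities change as `a * v + b` with `a > 0` (a contrast and brightness change), the threshold changes in exactly the same way, so the resulting mask is unchanged. That is at least part of why the rule copes with items whose mean and contrast vary, although it does not by itself show the rule is optimal.

```python
import statistics


def integral(img):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            s[y + 1][x + 1] = img[y][x] + s[y][x + 1] + s[y + 1][x] - s[y][x]
    return s


def local_variance(img, r):
    """Variance of each pixel's (2r+1)x(2r+1) window, clipped at the border."""
    h, w = len(img), len(img[0])
    s1 = integral(img)
    s2 = integral([[v * v for v in row] for row in img])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            n = (y1 - y0) * (x1 - x0)
            total = s1[y1][x1] - s1[y0][x1] - s1[y1][x0] + s1[y0][x0]
            total_sq = s2[y1][x1] - s2[y0][x1] - s2[y1][x0] + s2[y0][x0]
            # E[v^2] - E[v]^2, clamped to absorb tiny negative round-off.
            out[y][x] = max(0.0, total_sq / n - (total / n) ** 2)
    return out


def detail_mask(item, k=1.0):
    """Per-item threshold at mean + k * standard deviation; 1 = detail pixel."""
    flat = [v for row in item for v in row]
    t = statistics.fmean(flat) + k * statistics.pstdev(flat)
    return [[1 if v > t else 0 for v in row] for row in item]
```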

Image Segmentation for Color Analysis in OpenCV

I am working on a project that requires me to:
Look at images that contain relatively well-defined objects, e.g.
and pick out the colors of the n most prominent objects (n is generic: it could be 1, 2, 3, etc.) in some color space (RGB, HSV, whatever) and return them.
I am looking into ways to segment images like this into the independent objects. Once that's done, I'm under the impression that it won't be particularly difficult to find the contours of the segments and analyze them for average or centroid color, etc.
I looked briefly into the Watershed algorithm, which seems like it could work, but I was unsure of how to generate the marker image for an indeterminate number of blobs.
What's the best way to segment such an image, and if it's using Watershed, what's the best way to generate the corresponding marker image of integers?
Check out this possible approach:
Efficient Graph-Based Image Segmentation
Pedro F. Felzenszwalb and Daniel P. Huttenlocher
Here's what it looks like on your image:
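For reference, the core of that paper's algorithm is short: sort the 4-neighbour edges by weight and, using union-find, merge two components only when the connecting edge is no heavier than each side's internal difference plus `k / size`. The sketch below works on a grayscale list-of-rows image and omits the paper's Gaussian pre-smoothing and color handling; `k` and `min_size` play the roles of the paper's parameters, with illustrative defaults.

```python
def felzenszwalb_segment(img, k=300.0, min_size=0):
    """Graph-based segmentation in the style of Felzenszwalb and Huttenlocher.

    img: grayscale image as a list of rows. Returns a label per pixel;
    pixels with equal labels belong to the same component.
    """
    h, w = len(img), len(img[0])
    # 4-neighbour grid graph; edge weight = absolute intensity difference.
    edges = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((abs(img[y][x] - img[y][x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((abs(img[y][x] - img[y + 1][x]), i, i + w))
    edges.sort()

    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # Int(C): largest MST edge inside component C

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b, weight):
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a
        size[a] += size[b]
        internal[a] = max(internal[a], internal[b], weight)

    for weight, p, q in edges:
        a, b = find(p), find(q)
        # Merge only if the edge is no heavier than MInt = min(Int(C) + k / |C|).
        if a != b and weight <= min(internal[a] + k / size[a],
                                    internal[b] + k / size[b]):
            union(a, b, weight)

    if min_size:
        # Optional post-processing: absorb components smaller than min_size.
        for weight, p, q in edges:
            a, b = find(p), find(q)
            if a != b and (size[a] < min_size or size[b] < min_size):
                union(a, b, weight)

    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```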
I'm not an expert, but I really don't see how the Watershed algorithm can be very useful for your segmentation problem.
From my limited experience with this kind of problem, I would think the way to go is a sliding-window approach to segmentation. Basically, this entails walking over the image with a window of a set size and attempting to determine whether the window encompasses background or an object. You will want to try different window sizes and steps.
Doing this should allow you to detect the objects in the image, presuming that the images contain relatively well-defined objects. You might also attempt to perform segmentation after converting the image to black and white with a threshold that gives good separation of background and objects.
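A sketch of the sliding-window idea, assuming the black-and-white conversion just mentioned is used as the per-window test: a window counts as "object" when enough of its pixels fall below `threshold` (dark objects on a light background). That criterion, `min_fill`, and the window sizes are hypothetical; any per-window classifier could be swapped in.

```python
def sliding_windows(width, height, size, step):
    """Yield (x, y, size) for every window position on a regular grid."""
    for y in range(0, height - size + 1, step):
        for x in range(0, width - size + 1, step):
            yield x, y, size


def detect_objects(gray, threshold, sizes=(8, 16), step=4, min_fill=0.6):
    """Return windows in which at least min_fill of the pixels are 'dark'."""
    h, w = len(gray), len(gray[0])
    hits = []
    for size in sizes:
        for x, y, s in sliding_windows(w, h, size, step):
            dark = sum(1 for row in gray[y:y + s] for v in row[x:x + s] if v < threshold)
            if dark >= min_fill * s * s:
                hits.append((x, y, s))
    return hits


# 8x8 light image with a dark 4x4 object in the top-right corner.
gray = [[0 if y < 4 and x >= 4 else 255 for x in range(8)] for y in range(8)]
print(detect_objects(gray, 128, sizes=(4,), step=4))
# [(4, 0, 4)]
```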
Once you've identified the object(s) via the sliding window you can attempt to determine the most prominent color using one of the methods you mentioned.
Based on your comment, here's another potential approach that might work for you:
If you believe the objects will have mostly uniform color, you might attempt to process the image as follows:
remove noise;
map the original image to a reduced color space (e.g. 256 or even 16 colors);
detect connected components based on pixel color and determine which ones are large enough.
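The steps above can be sketched like this. Noise removal is left out (a median blur such as `cv2.medianBlur` would be a typical choice), and uniform per-channel binning stands in for the color reduction. Returning the components sorted by size, each with its mean color, gives the "n most prominent colors" directly. `levels` and `min_size` are illustrative.

```python
from collections import deque


def quantize(image, levels=4):
    """Map each (r, g, b) pixel (0..255) to a coarse bin per channel."""
    step = 256 // levels
    return [[tuple(c // step for c in px) for px in row] for row in image]


def color_components(image, levels=4, min_size=2):
    """Group 4-connected pixels sharing a quantized color; keep large groups.

    Returns [(size, mean_rgb), ...] sorted by size, largest first.
    """
    q = quantize(image, levels)
    h, w = len(q), len(q[0])
    seen = [[False] * w for _ in range(h)]
    results = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append(image[y][x])
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and q[ny][nx] == q[sy][sx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:
                mean = tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3))
                results.append((len(pixels), mean))
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```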
You might also benefit from resampling the image to a lower resolution (e.g. if the image is 1024 x 768, you might reduce it to 256 x 192) to help speed up the algorithm.
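Block averaging is one simple way to do that resampling; a factor of 4 takes 1024 x 768 to 256 x 192. In OpenCV, `cv2.resize` with `interpolation=cv2.INTER_AREA` does this kind of area averaging.

```python
def downsample(gray, factor):
    """Shrink a grayscale image by averaging non-overlapping factor x factor blocks.

    Trailing rows/columns that don't fill a whole block are dropped.
    """
    h, w = len(gray) // factor, len(gray[0]) // factor
    return [[sum(gray[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / (factor * factor)
             for x in range(w)]
            for y in range(h)]


print(downsample([[0, 2, 4, 6], [2, 4, 6, 8], [1, 1, 1, 1], [3, 3, 3, 3]], 2))
# [[2.0, 6.0], [2.0, 2.0]]
```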
The only thing left to do would be to determine which component is the background. This is where it might make sense to also attempt to do the background removal by converting to black/white with a certain threshold.

Object recognition and measuring size

I'd like to create a system for use in a factory to measure the size of the objects coming off the assembly line. The objects are slabs of stone, approximately rectangular, and I'd like the width and height. Each stone is photographed in the same position with a flash, so the conditions are pretty controlled. The tricky part is that the stones sometimes have patterns on their surface (often marble with ripples and streaks), and they are sometimes almost black, blending in with the shadows.
I tried simply subtracting each image from a reference image of the background, but there are enough small changes in the lighting and the positions of rollers and small bits of machinery that the output is really noisy.
The approach I plan to try next is to use Canny's edge detection algorithm and then use some kind of numerical optimization (Nelder-Mead maybe) to match a 4-sided polygon to the edges. Before I home-brew something, though, is there an existing approach that works well in this kind of situation?
If it helps, it would be possible to 'seed' the algorithm with a patch of the image known to be within the slab (they're always lined up in the corner) to help identify its surface pattern and colors. I could also produce a training set of annotated images if necessary.
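As a lower-tech alternative to Nelder-Mead, and assuming the slab is axis-aligned with one corner at a known position (the "lined up in the corner" remark), only the width and height remain unknown, so an exhaustive search over them is feasible at reduced resolution. The sketch scores each candidate rectangle by how many edge pixels (e.g. from `cv2.Canny`) lie on its free right and bottom sides. This is a swapped-in technique rather than what the question proposes, and rotated slabs would need an angle parameter as well.

```python
def fit_corner_rectangle(edges, x0=0, y0=0, min_size=2):
    """Exhaustively fit an axis-aligned rectangle anchored at corner (x0, y0).

    edges is a binary edge map (1 = edge pixel). Each candidate right column x1
    and bottom row y1 is scored by the edge pixels lying on the rectangle's
    right and bottom sides. Returns (width, height) of the best candidate.
    """
    h, w = len(edges), len(edges[0])
    best, best_score = None, -1
    for y1 in range(y0 + min_size - 1, h):
        for x1 in range(x0 + min_size - 1, w):
            right = sum(edges[y][x1] for y in range(y0, y1 + 1))
            bottom = sum(edges[y1][x] for x in range(x0, x1 + 1))
            score = right + bottom - edges[y1][x1]  # count the shared corner once
            if score > best_score:
                best, best_score = (x1 - x0 + 1, y1 - y0 + 1), score
    return best


# Slab 6 wide x 5 tall in the top-left corner, plus one stray edge pixel.
edges = [[0] * 10 for _ in range(8)]
for y in range(5):
    edges[y][5] = 1
for x in range(6):
    edges[4][x] = 1
edges[7][9] = 1
print(fit_corner_rectangle(edges))
# (6, 5)
```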
Some sample images of the background and some stone slabs:
Have you tried an existing image segmentation algorithm?
I would start with the maxflow algorithm for image segmentation by Vladimir Kolmogorov here:
In the papers they fix areas of an image to belong to a particular segment, which would help with your problem, but it may not be obvious how to do this in the software.
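To make the "fix areas to a segment" idea concrete, here is a small seeded min-cut segmentation. Seed pixels are tied to the source (object) or the sink (background) with infinite capacity, neighbouring pixels are linked with a weight that is cheap to cut across strong intensity changes, and the pixels still reachable from the source after max-flow form the object. It uses plain Edmonds-Karp max-flow rather than Kolmogorov's much faster algorithm, and the `smoothness / (1 + |difference|)` edge weight is an illustrative choice, so it is only practical on small or downsampled images.

```python
from collections import deque


def max_flow_source_side(cap, source, sink):
    """Edmonds-Karp max-flow; returns the source side of the minimum cut.

    cap is a dict-of-dicts of residual capacities and is modified in place.
    """
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # nodes still reachable from the source
        # Bottleneck of the augmenting path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0) + bottleneck
            v = u


def graph_cut_segment(gray, fg_seeds, bg_seeds, smoothness=10.0):
    """Binary segmentation (1 = object) from hard seeds via a minimum cut."""
    h, w = len(gray), len(gray[0])
    S, T = "S", "T"
    cap = {S: {}, T: {}}
    for y in range(h):
        for x in range(w):
            cap[(y, x)] = {}
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w:
                    # Expensive to cut between similar pixels, cheap across edges.
                    wgt = smoothness / (1.0 + abs(gray[y][x] - gray[ny][nx]))
                    cap[(y, x)][(ny, nx)] = wgt
                    cap[(ny, nx)][(y, x)] = wgt
    for p in fg_seeds:
        cap[S][p] = float("inf")  # hard constraint: p belongs to the object
    for p in bg_seeds:
        cap[p][T] = float("inf")  # hard constraint: p belongs to the background
    side = max_flow_source_side(cap, S, T)
    return [[1 if (y, x) in side else 0 for x in range(w)] for y in range(h)]
```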
Deep learning algorithms for parsing scenes by Richard Socher might also help:
And Eric Sudderth has at least one interesting method for visual scene understanding here:
I haven't actually used any of this software, which is mostly, if not entirely, research code and not particularly user-friendly.

How can I use computer vision to find a shape in an image?

I have a simple photograph that may or may not include a logo image. I'm trying to identify whether a picture includes the logo shape or not. The logo (a rectangular shape with a few extra features) could be of various sizes and could have multiple occurrences. I'd like to use computer vision techniques to identify the location of these logo occurrences. Can someone point me in the right direction (an algorithm or technique) to achieve this goal?
I'm quite a novice at computer vision, so any direction would be very much appreciated.
Practical issues
Since you need a scale-invariant method (that's the proper jargon for "could be of various sizes"), SIFT (as mentioned in Logo recognition in images, thanks overrider!) is a good first choice. It's very popular these days and is worth a try. You can find some code to download here. If you cannot use Matlab, you should probably go with OpenCV. Even if you end up discarding SIFT for some reason, trying to make it work will teach you a few important things about object recognition.
General description and lingo
This section is mostly here to introduce you to a few important buzzwords, by describing a broad class of object detection methods, so that you can go and look these things up. Important: there are many other methods that do not fall in this class. We'll call this class "feature-based detection".
So first you go and find features in your image. These are characteristic points of the image (corners and line crossings are good examples) that have a lot of invariances: whatever reasonable processing you do to your image (scaling, rotation, brightness change, adding a bit of noise, etc.), it will not change the fact that there is a corner at a certain point. "Pixel value" or "vertical lines" are bad features. Sometimes a feature will include some numbers (e.g. the prominence of a corner) in addition to a position.
Then you do some clean-up, like removing features that are not strong enough.
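A sketch of those two steps using the Harris corner measure, one classic corner detector (note that, unlike SIFT, it is not scale-invariant): gradients by central differences, a structure tensor summed over a 3x3 box window, response `det - k * trace^2`, then clean-up by thresholding and 3x3 non-maximum suppression. The box window and `k = 0.04` are common simplifications; OpenCV's `cv2.cornerHarris` is the library version.

```python
def harris_response(gray, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2 (0 near the image border)."""
    h, w = len(gray), len(gray[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            iy[y][x] = (gray[y + 1][x] - gray[y - 1][x]) / 2.0
    r = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = syy = sxy = 0.0  # structure tensor over a 3x3 box window
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            r[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return r


def strong_corners(r, threshold):
    """Clean-up: keep 3x3 local maxima whose response exceeds threshold."""
    h, w = len(r), len(r[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = r[y][x]
            if v > threshold and all(v >= r[y + dy][x + dx]
                                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out.append((y, x))
    return out


# A bright square filling the bottom-right quadrant has one corner at (6, 6).
gray = [[255 if y >= 6 and x >= 6 else 0 for x in range(12)] for y in range(12)]
r = harris_response(gray)
print(strong_corners(r, 0.1 * max(max(row) for row in r)))
# [(6, 6)]
```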
Then you go to your database. That's something you've built in advance, usually by taking several nice, clean images of whatever you are trying to find, running your feature detection on them, cleaning things up, and arranging them in some data structure for your next stage:
Look-up. You have to take a bunch of features from your image and try to match them against your database: do they correspond to an object you are looking for? This is pretty non-trivial, since on the face of it you have to consider all subsets of the features you've found, which is exponential. So there are all kinds of smart hashing techniques for doing it, such as the Hough transform and geometric hashing.
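A minimal Hough-style voting sketch, restricted to translation: every candidate match between a scene feature and a model feature votes for where the model's origin would be, and the fullest accumulator bin wins. Feature matching is reduced to equal labels, a hypothetical stand-in for descriptor matching; handling scale and rotation would add accumulator dimensions.

```python
from collections import Counter, defaultdict


def hough_vote_translation(model, scene, bin_size=5):
    """Vote for the model's position in the scene (translation only).

    model, scene: lists of (label, x, y). Features 'match' when labels are equal.
    Returns ((x, y) of the winning bin, vote count), or (None, 0) if no matches.
    """
    by_label = defaultdict(list)
    for label, mx, my in model:
        by_label[label].append((mx, my))
    votes = Counter()
    for label, sx, sy in scene:
        for mx, my in by_label.get(label, ()):
            # Each candidate match votes for where the model origin would be.
            votes[((sx - mx) // bin_size, (sy - my) // bin_size)] += 1
    if not votes:
        return None, 0
    (bx, by), count = votes.most_common(1)[0]
    return (bx * bin_size, by * bin_size), count


model = [("a", 0, 0), ("b", 10, 0), ("c", 0, 10)]
scene = [("a", 50, 20), ("b", 60, 20), ("c", 50, 30),  # model shifted by (50, 20)
         ("a", 5, 5), ("b", 90, 90)]                     # clutter
print(hough_vote_translation(model, scene))
# ((50, 20), 3)
```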
Now you should do some verification. You have found some places in the image that are suspect: it's probable that they contain your object. Usually you know the presumed size, orientation, and position of your object, and you can use something simple (like a convolution) to check whether it's really there.
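For the verification step, normalized cross-correlation is a common "something simple": it compares the template with the image patch at the presumed location while ignoring brightness and contrast differences. The sketch assumes the template has already been resized and rotated to the presumed size and orientation; `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` computes the same kind of score at every position.

```python
import math


def ncc_at(image, template, x0, y0):
    """Normalized cross-correlation of template with the patch at (x0, y0).

    Returns a value in [-1, 1]; near 1 means the patch matches the template
    up to a brightness and contrast change.
    """
    th, tw = len(template), len(template[0])
    patch = [image[y0 + y][x0 + x] for y in range(th) for x in range(tw)]
    tmpl = [v for row in template for v in row]
    mp, mt = sum(patch) / len(patch), sum(tmpl) / len(tmpl)
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, tmpl))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) * sum((t - mt) ** 2 for t in tmpl))
    return num / den if den else 0.0  # flat patch or template: no evidence
```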
You end up with a bunch of probabilities, basically: for a few locations, how probable it is that your object is there. Here you do some outlier detection. If you expect only 1-2 occurrences of your object, you'll look for the largest probabilities that stand out, and take only these points. If you expect many occurrences (like face detection on a photo of a bunch of people), you'll look for very low probabilities and discard them.
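The two selection strategies can be sketched as simple filters over a `{location: score}` map. The "stands out" rule (more than `k` standard deviations above the mean) and the `floor` value are illustrative assumptions.

```python
import statistics


def standout_detections(scores, k=2.0):
    """Few expected occurrences: keep scores more than k std devs above the mean."""
    values = list(scores.values())
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    return {loc: s for loc, s in scores.items() if s > mu + k * sigma}


def drop_unlikely(scores, floor=0.2):
    """Many expected occurrences: discard only the clearly improbable ones."""
    return {loc: s for loc, s in scores.items() if s >= floor}
```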
That's it, you are done!
