This question can be answered in any programming language, because I would like some help with the algorithms, but I prefer Delphi. I have the task of detecting and counting multiple shapes (between 1 and N, mostly circles or ellipses) in random pictures, calculating the middle of each one, and returning those middles as coordinates in the picture. The middle of each shape can have a filling (but it doesn't matter). The shapes are at least 1 pixel away from each other. None of the shapes will blend into another shape or into the edge of the picture.
The background of the picture always has the same color, which doesn't actually matter, because the borders/frames of the shapes are always a different color from the background. This makes the shapes easy to detect. I was thinking about going pixel by pixel, collecting the coordinates, and then drawing a kind of invisible rectangle/square around every shape to calculate its middle. I have also heard about ScanLine, but I don't think it would be faster in this case. So my question is, how can I calculate:
How many shapes are in the picture.
The (more or less) exact middle of each shape.
A few pictures to visualize the task:
This is a picture with random shapes (mostly closed circles):
As you can see, they are well separated from each other.
Then I could easily draw/calculate an imaginary rectangle/square around every shape and calculate its middle, like this:
Once I have the rectangles/squares, I can easily calculate the middle.
How do I start?
PS: I've drawn some circles in MS Paint. I should add that all shapes are CLOSED, which makes it possible to flood fill EVERY shape in the picture without any problems!
Thank you for your help.

Calculate MSER (maximally stable extremal regions) for the image. I can't explain the algorithm in detail here; refer to the article on maximally stable extremal regions for more information.
That will also give you the centroids.
The algorithm is available as a built-in function in OpenCV and in MATLAB R2012b.
Another method I can think of, possibly simpler than the previous one, is to apply a connected-components algorithm and count the number of objects. More information on this can be found in Gonzalez and Woods' book Digital Image Processing.
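The connected-components idea can be sketched in a few lines. This is a minimal Python sketch, not Delphi and not the OpenCV or MATLAB implementation; it assumes the picture has already been converted to a binary grid (a list of rows) in which shape-border pixels are 1 and background pixels are 0. It labels 8-connected pixels with a breadth-first flood fill and reports, for each shape, both the bounding-box centre described in the question and the centroid (mean pixel position). Because the shapes are closed outlines, the centroid of the border pixels of a circle or ellipse lies close to its middle; flood-filling the interior first would give the centroid of the filled area instead.

```python
from collections import deque


def label_components(img):
    """Label 8-connected foreground pixels (value 1) in a binary image.

    Returns one dict per component with its pixel count, bounding-box
    centre and centroid, both as (x, y) in pixel coordinates.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if img[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill starting at an unvisited foreground pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            components.append({
                "size": len(pixels),
                "bbox_center": ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2),
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
            })
    return components
```

`len(label_components(img))` is the number of shapes. Since the shapes are at least one background pixel apart, 8-connectivity will not merge neighbouring shapes.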


Recognition and counting of books from side using OpenCV

I just wish to receive some ideas on how I can solve this problem.
For a clearer picture, here are examples of some of the images that we are looking at:
I have tried thresholding (e.g. Otsu), blob detection, etc. However, I am still unable to segment the books and count them properly. Hardcovers are easy, of course, as the covers clearly separate the books, but with softcovers I have not been able to count the books successfully.
Does anybody have any suggestions on what I can do? Any help will be greatly appreciated. Thanks.
I ran a Sobel edge detector and used the Hough transform to detect lines on the last image, and it seemed to work okay for me. You can then link the edges in the output of the Sobel edge detector and count the number of horizontal lines, or do the same on the lines detected by the Hough transform.
You can further narrow down the area of interest by converting the image into a binary image. The outputs of all of these operators can be seen in the following figure (I couldn't upload an image, so I had to host it here).
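A simplified way to count horizontal lines from the Sobel output, without the Hough transform, is to average the vertical-gradient magnitude along each row and count groups of strong rows. This is a hypothetical Python sketch, not the answerer's code: it assumes the books are stacked horizontally, the image is grayscale stored as a list of rows, and `threshold` and `merge_gap` are made-up tuning parameters. Rows closer than `merge_gap` are merged because a thin dark cover line produces two edges (its top and its bottom).

```python
def sobel_y(img):
    """Absolute vertical Sobel gradient |Gy|; it responds to horizontal edges.
    Border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gy)
    return out


def count_horizontal_lines(img, threshold, merge_gap=2):
    """Count groups of rows whose mean |Gy| exceeds `threshold`.

    Strong rows at most `merge_gap` apart are treated as the same line.
    """
    grad = sobel_y(img)
    strong = [y for y, row in enumerate(grad) if sum(row) / len(row) > threshold]
    count, last = 0, None
    for y in strong:
        if last is None or y - last > merge_gap:
            count += 1
        last = y
    return count
```

A row projection like this only works for lines that are close to horizontal; the Hough transform the answer used also handles tilted books.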
Refer to for some more useful examples on how to do edge, line and corner detection.
Hope this helps.
I think audiohead's recommendation is good, but you should be careful when applying the Hough transform to images that contain the library's stamp, as it might mistake the stamp for another book (you can see that the letters form broken line segments that will be detected by Sobel).
Consider first applying an edge-preserving smoothing algorithm such as a bilateral filter. When tuned correctly (kernel settings), it can avoid these kinds of problems.
A Different Solution That Might Work (But can be slow)
Here is a different approach that is based on pixel marking strategy.
a) Using a very dark threshold, mark all black pixels as visited.
b) While there are unvisited pixels: pick the next unvisited pixel and apply a region-growing algorithm, marking its pixels with a unique number. At this stage you need to analyse the geometric shape that this region forms. A good criterion for detecting a book is that the region forms some kind of rectangle with width >> height. This will detect a book and mark all its pixels with the unique number.
Once there are no more unvisited pixels, the number of unique numbers is the number of books, and for each pixel in your image you will know which book it belongs to.
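The pixel-marking strategy can be sketched as below. This is a hypothetical Python sketch under several assumptions: a grayscale image as a list of rows, region growing that accepts 4-connected neighbours within `tol` of the seed's intensity, and a shape test that only checks the bounding-box aspect ratio (`min_aspect`) rather than full rectangularity. `dark_threshold`, `tol` and `min_aspect` are illustrative values, not tuned ones.

```python
from collections import deque


def count_books(img, dark_threshold=40, tol=30, min_aspect=3.0):
    """Return (number_of_books, label_map).

    In label_map, -1 marks dark or rejected pixels and 1, 2, ... are books.
    """
    h, w = len(img), len(img[0])
    # a) Mark very dark pixels (the gaps between books) as visited.
    labels = [[-1 if img[y][x] < dark_threshold else 0 for x in range(w)]
              for y in range(h)]
    books = 0
    for y0 in range(h):
        for x0 in range(w):
            if labels[y0][x0] != 0:
                continue
            # b) Region growing from the next unvisited pixel.
            label = books + 1
            seed = img[y0][x0]
            labels[y0][x0] = label
            region, queue = [(x0, y0)], deque([(x0, y0)])
            while queue:
                x, y = queue.popleft()
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h and labels[ny][nx] == 0
                            and abs(img[ny][nx] - seed) <= tol):
                        labels[ny][nx] = label
                        region.append((nx, ny))
                        queue.append((nx, ny))
            # Shape test: keep the region only if it is much wider than tall.
            xs = [p[0] for p in region]
            ys = [p[1] for p in region]
            width = max(xs) - min(xs) + 1
            height = max(ys) - min(ys) + 1
            if width >= min_aspect * height:
                books += 1
            else:
                for x, y in region:
                    labels[y][x] = -1
    return books, labels
```

As the heading says, this can be slow in pure Python; it visits every pixel once, but the per-pixel overhead is high compared with a vectorised library.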
Do you have to keep the books this way? If you can turn the books so that their backs face the camera, I think you can get more information from the different colors used by different books. The lines found by the Hough transform or edge detection will be more prominent this way.
There are more sophisticated methods that are much better at contour detection and segmentation; you can have a look at them here, though they are quite slow.
Once you have the ultrametric contour map, you can perform some computation on it to count the number of books.
I would try a completely different approach. With paperbacks, the covers appear as medium-dark lines, whilst the rest (assuming white pages) is fairly white and "bloomed", so I'd try to thicken the dark edges to make them easy to detect. That would give edges similar to those of the hardbacks, which you say you've already handled.
I'd try something like an erosion to thicken the edges. This is a nice, fast operation.
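Grayscale erosion replaces each pixel with the minimum of its neighbourhood, so bright areas shrink and dark strokes (the paperback covers) grow thicker. This minimal Python sketch assumes a grayscale image stored as a list of rows and a square window of side `2 * radius + 1`; a library such as OpenCV would do the same much faster.

```python
def erode(img, radius=1):
    """Grayscale erosion with a square window: each output pixel is the
    minimum of its (2*radius+1) x (2*radius+1) neighbourhood, clipped at
    the image border. Dark lines on a light background get thicker."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = min(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return out
```

Each pass with `radius=1` widens a dark line by one pixel on each side; increase the radius or repeat the pass for thicker edges.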

How to get the position (x,y) and number of particular objects or shapes in a hand-drawn image?

First, I've only been learning image processing, NNs, etc. by myself for a couple of weeks, so I'm really new and far from being a pro. Sorry for my bad English.
There's an image/photo of my drawing. I want to get the coordinates of the objects/shapes (black dots) and the number next to each one; the number indicates the sequence number of the dot.
How do I get them? How do I detect the dots? Shape recognition for the dots? Handwritten-digit recognition for the numbers? Then segmentation to get the positions? Or should I use template matching? But every dot has a slightly different shape because it is hand-drawn. Should I use a neural network? In an NN, the input neurons usually correspond to every pixel in order to recognize a character, right? Can I use a picture of a character or of a drawn dot for each neuron to recognize my whole picture?
I'm very new, so I really need your advice; correct me if I'm wrong! Please tell me what I must learn, what I must do, and what I must use.
Thank you very much. :'D
This is a difficult problem which can't be solved by a quick solution.
Here is how I would approach it:
Get a better picture. Your image is very noisy and is taken in low light with high ISO. Use a better camera and better lighting conditions so you can get the background to be as white as possible and the dots as black as possible. Try to maximize the contrast.
Threshold the image so that all the background is white and the dots and numbers are black. Maybe you could apply some erosion and/or dilation to help connect the dark edges together.
Detect the rectangle somehow and set your work area to be inside the rectangle (crop the rest of the image so that you are left with the area inside the rectangle). You could do this by detecting the contours in the image and then the contour that has the largest area is the rectangle (because it's the largest object in the image). Of course, this is not the only way. See this: OpenCV find contours
Once you are left with only the dots, circles and numbers, you need to find a way to detect them and discriminate between them. You could again find all contours (or maybe you've found them all in the previous step). You need to figure out a way to tell whether a certain contour is a circle, a filled circle (dot) or a number. This is a problem in its own right. Maybe you could count the white/black pixels in the contour's bounding box: dots have more black pixels than circles and numbers. You also need to do something about numbers that touch dots (like the number 5 in your image).
Once you know what is a dot, circle or number you could use an OCR library (Tesseract or any other OCR lib) to try and recognize the numbers. You could also use a neural network library (maybe trained with the MNIST dataset) to recognize the digits. A good one would be a convolutional neural network similar to LeNet-5.
As you can see, this is a problem that requires many different steps to solve, and many different components are involved. The steps I suggested might not be the best, but with some work I think it can be solved.
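The thresholding step and the black-pixel count from the dot/circle/number step can be sketched as follows. This is a hypothetical Python sketch, not tied to OpenCV: it assumes an 8-bit grayscale image stored as a list of rows, picks the threshold automatically with Otsu's method (a swapped-in choice; the answer does not name a thresholding method), marks dark pixels as 1, and measures what fraction of a contour's bounding box is black. The bounding box is assumed to come from whatever contour detection you use, and the 0.65 cut-off in `looks_like_dot` is an illustrative guess that needs tuning.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t maximising the between-class variance
    (Otsu's method); pixels <= t form the dark class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_b, w_b = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img):
    """1 = dark (dots, circles, digits), 0 = background."""
    t = otsu_threshold([p for row in img for p in row])
    return [[1 if p <= t else 0 for p in row] for row in img]


def black_fill_ratio(binary, bbox):
    """Fraction of black pixels inside an inclusive bbox (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    black = sum(binary[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1))
    return black / area


def looks_like_dot(binary, bbox, min_ratio=0.65):
    """Filled dots cover most of their bounding box; rings and digits do not."""
    return black_fill_ratio(binary, bbox) >= min_ratio
```

A filled disc covers about pi/4 (roughly 0.79) of its bounding box, while a thin ring covers much less, which is why a single ratio can separate them; digits vary more, so they may still need OCR to tell them from rings.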

Measuring an object from a picture using a known object size

What I need to do is measure the length of a foot from an image taken by an ordinary user. The image will contain a foot wearing a black sock, a coin (or another object of known size), and a sheet of white paper (e.g. A4) on which the other two objects are placed.
What do I already have?
- I have already worked with OpenCV, but only on simple projects;
- I have already started reading some articles about camera calibration ("Learn OpenCV"), but I still don't know whether I need to go that far.
What I need now is some orientation, because I still don't know whether I'm on the right track to solving this problem. I have some questions: Will I really need to calibrate the camera to get two or three measurements of the foot? How can I find the points of interest that define the lines to measure? Is every picture a different case, or are there techniques to follow?
PS: Sorry about my English, I really have to improve it :-/
First, some image acquisition things:
Can you count on the black sock and white background? The colors don't matter as much as the high contrast between the sock and background.
Can you standardize the viewing angle? Looking directly down at the foot will reduce perspective distortion.
Can you standardize the lighting of the scene? That will ease a lot of the processing discussed below.
Lastly, you'll get a better estimate if you zoom (or position the camera closer) so that the foot fills more of the image frame.
Analysis. (Note that this discussion is directed at your question of identifying the axes of the foot. Identifying and analyzing the coin would use a similar process, though some differences would arise.)
The next task is to isolate the region of interest (ROI). If your camera is looking down at the foot, then the ROI can be limited to the white rectangle. My answer to this Stack Overflow post is a good start to square/rectangle identification: What is the simplest *correct* method to detect rectangles in an image?
If the foot lies completely in the white rectangle, you can clip the image to the rect found in step #1. This will limit the image analysis to region inside the white paper.
"Binarize" the image using a threshold function: If you choose the threshold parameters well, you should be able to reduce the image to a black region (sock pixels) and white regions (non-sock pixel).
Now the fun begins: you might try matching contours, but if this were my problem, I would use bounding boxes for a quick solution or moments for a more interesting (and possibly robust) solution.
Use cvFindContours to find the contours of the black (sock) region:
Use cvApproxPoly to convert the contour to a polygonal shape
For the simple solution, use cvMinAreaRect2 to find an arbitrarily oriented bounding box for the sock shape. The short axis of the box should correspond to the line in largura.jpg and the long axis of the box should correspond to the line in comprimento.jpg.
If you want (possibly) more accuracy, you might try cvMoments to compute the moments of the shape.
Use cvGetSpatialMoment to determine the axes of the foot. More information on the spatial moment may be found here: and here
With the axes known, you can then rotate the image so that the long axis is axis-aligned (i.e. vertical). Then you can simply count pixels horizontally and vertically to obtain the lengths of the lines. Note that there are several assumptions in this moment-oriented process. It's a fun solution, but it may not provide any more accuracy, especially since the accuracy of your size measurements depends largely on the camera positioning issues discussed above.
Lastly, I've provided links to the older C interface. You might take a look at the new C++ interface (I simply have not gotten around to migrating my code to 2.4).
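The moment-based route can be sketched without rotating the image: compute the orientation of the long axis from the central moments, then project every region pixel onto the long and short axes and take the extents, which gives essentially the same two lengths as rotating and counting pixels. This is a hypothetical pure-Python sketch, not the OpenCV `cvMoments` API; it assumes the sock region is already available as a list of `(x, y)` pixel coordinates (e.g. from the binarized image) and uses the standard orientation formula theta = 0.5 * atan2(2 * mu11, mu20 - mu02).

```python
import math


def principal_axes_lengths(pixels):
    """pixels: list of (x, y) belonging to the sock region.

    Returns (theta, length, width): the angle of the long axis in radians
    (image coordinates, y pointing down) and the region's extent in pixels
    along the long and short axes.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    ux, uy = math.cos(theta), math.sin(theta)  # long-axis direction
    vx, vy = -uy, ux                           # short-axis direction
    along = [(x - cx) * ux + (y - cy) * uy for x, y in pixels]
    across = [(x - cx) * vx + (y - cy) * vy for x, y in pixels]
    # +1 counts the pixel itself, as when counting pixels along a row.
    return theta, max(along) - min(along) + 1, max(across) - min(across) + 1
```

The lengths are in pixels; converting them to real units still needs the scale from the coin or the paper, and the same caveats about camera positioning apply.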
Antonio Criminisi likely wrote the last word on this subject years ago. See his "Single View Metrology" paper, and his PhD thesis if you have time.
You don't have to calibrate the camera if you have a known-size object in your image. Well... at least if your camera doesn't distort too much and if you're not expecting high quality measurements.
A simple approach would be to detect a white (perspective-distorted) rectangle, mapping the corners to an undistorted rectangle (using e.g. cv::warpPerspective()) and use the known size of that rectangle to determine the size of other objects in the picture. But this only works for objects in the same plane as the paper, preferably not too far away from it.
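If only a length is needed, you do not have to warp the whole image: you can compute the same kind of 4-point perspective transform that `cv::getPerspectiveTransform()` produces and apply it just to the two end points of the measurement. This is a hypothetical pure-Python sketch, assuming the four paper corners have already been detected and ordered top-left, top-right, bottom-right, bottom-left, that the sheet is A4 in portrait (210 x 297 mm), and that both end points lie in the plane of the paper (the limitation noted above). The 8 unknowns of the homography (with the last entry fixed to 1) are solved by Gaussian elimination.

```python
import math


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """3x3 homography (flattened, h[8] = 1) mapping 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve(a, b) + [1.0]


def apply_h(h, x, y):
    w = h[6] * x + h[7] * y + h[8]
    return (h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w


def measure_mm(paper_corners_px, p, q, paper_mm=(210.0, 297.0)):
    """Distance in mm between image points p and q lying on the paper plane."""
    w, h = paper_mm
    H = homography(paper_corners_px, [(0, 0), (w, 0), (w, h), (0, h)])
    (ux, uy), (vx, vy) = apply_h(H, *p), apply_h(H, *q)
    return math.hypot(ux - vx, uy - vy)
```

For example, `measure_mm(corners, heel_px, toe_px)` would give the foot length, where `heel_px` and `toe_px` are hypothetical end points found by one of the methods above. Because the foot rises above the paper, the result is only an approximation.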
I am not sure whether you need to build this yourself, but if you just need to do it rather than code it, you can use KLONK Image Measurement. There are free and paid versions.

square detection, image processing

I am looking for an efficient way to detect the small boxes around the numbers (see images)?
I already tried the Hough transform with no success. Any ideas? I need some hints! I am using OpenCV...
For inspiration, you can have a look at the
Matlab video sudoku solver demo and explanation
Sudoku Grab, an Iphone App, whose author explains the computer vision part on his blog
Alternatively, if you are always hunting for the same grid you could deploy something like this:
Make a perfect artificial template of the grid and detect or save all coordinates from all corners.
In the target image, do the same thing, for example with Harris points. Be creative, you might also be able to use the distinct triangles that can be found in your images.
Using the coordinates from the template and the Harris points found, determine the affine transformation x = Ax' between the template and the target image. That transformation can then be used to map the template grid onto the target image. At the very least this will give you some prior information to help guide further segmentation.
The gist of the idea and examples of estimating the affine matrix A can be found on the website of Zisserman's book Multiple View Geometry in Computer Vision and on Peter Kovesi's site.
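Once template corners and Harris points have been paired up, the affine matrix in x = Ax' can be estimated by least squares, one output coordinate at a time. This is a minimal pure-Python sketch that assumes the correspondences are already known and correct; finding those pairs (and rejecting wrong ones, e.g. with RANSAC) is the harder part and is not shown. `A` is returned as a 2x3 matrix whose last column is the translation.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve a 3x3 system with Cramer's rule."""
    d = det3(m)
    out = []
    for i in range(3):
        mi = [row[:] for row in m]
        for r in range(3):
            mi[r][i] = b[r]
        out.append(det3(mi) / d)
    return out


def estimate_affine(template_pts, target_pts):
    """Least-squares 2x3 affine A such that target ~= A @ [x', y', 1]."""
    rows = [(xp, yp, 1.0) for xp, yp in template_pts]
    # Normal equations (M^T M) a = M^T b, shared by both output rows.
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    A = []
    for k in range(2):  # k = 0: x coordinate, k = 1: y coordinate
        mtb = [sum(r[i] * t[k] for r, t in zip(rows, target_pts)) for i in range(3)]
        A.append(solve3(mtm, mtb))
    return A


def apply_affine(A, x, y):
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])
```

With at least three non-collinear pairs the system is solvable; using every matched grid corner averages out small localisation errors in the Harris points.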
I'd start by trying to detect the rectangular boundary of the overall sheet, then applying a perspective transform to make it truly rectangular. Crop that portion of the image out. If possible, then try to make the alternating white and grey sub-rectangles have an equal background brightness - maybe try adaptive histogram equalization.
Then the Hough transform might perform better. Alternatively, you could then take an approach that's broadly similar to this demonstration by Robert Bemis on MATLAB Central (it's analysing a DNA microarray image rather than Lotto cards, but it's essentially finding bounding boxes of items arranged in a grid). At a high level, the approach is to calculate the autocorrelation along columns and rows of pixels to detect the periodicity of the items in the grid, and use that to impose a bounding box on each item.
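The autocorrelation idea can be illustrated in one dimension. This is a hypothetical Python sketch of the idea only, not a port of the MATLAB Central demo: it sums a grayscale image (list of rows) down each column to get an intensity profile, computes the mean-removed autocorrelation of that profile, and returns the first positive local maximum after lag 0 as the grid spacing in pixels. `min_lag` is an assumed parameter to skip tiny lags; the same function applied to row sums gives the vertical spacing.

```python
def column_profile(img):
    """Sum of intensities down each column."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def autocorrelation(signal):
    """Mean-removed (biased) autocorrelation for lags 0 .. n-1."""
    n = len(signal)
    mean = sum(signal) / n
    s = [v - mean for v in signal]
    return [sum(s[i] * s[i + lag] for i in range(n - lag)) for lag in range(n)]


def dominant_period(signal, min_lag=2):
    """First positive local maximum of the autocorrelation after lag 0."""
    ac = autocorrelation(signal)
    for lag in range(max(1, min_lag), len(ac) - 1):
        if ac[lag] > 0 and ac[lag] >= ac[lag - 1] and ac[lag] >= ac[lag + 1]:
            return lag
    return None
```

With the spacing known in both directions, and the phase taken from where the profile is brightest, you can lay a bounding box over each cell of the grid.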
Sorry, the above advice is mostly MATLAB-based; I'm afraid I'm not an OpenCV user, but hopefully it will give you some ideas at least.

How to remove the background image and get the foreground image

There are two images.
One is the background image; the other is a photo of a person in front of the same background, at the same size. What I want to do is remove the background from the second image and extract only the person's silhouette. The common method is to subtract the first image from the second one, but my problem is that if the color of the person's clothes is similar to the background, the result of the subtraction is awful and I cannot get the person's whole outline. If anyone has a good idea for removing the background, please give me some advice.
Thank you in advance.
If you have a good estimate of the image background, subtracting it from the image with the person is a good first step. But it is only the first step. After that, you have to segment the image, i.e. you have to partition the image into "background" and "foreground" pixels, with constraints like these:
in the foreground areas, the average difference from the background image should be high
in the background areas, the average difference from the background image should be low
the areas should be smooth. Outline length and curvature should be minimal.
the borders of the areas should have a high contrast in the source image
If you are mathematically inclined, these constraints can be modeled perfectly with the Mumford-Shah functional. See here for more information.
But you can probably adapt other segmentation algorithms to the problem.
If you want a fast and simple (but not perfect) version, you could try this:
subtract the two images
find the largest consecutive "blob" of pixels with a background-foreground difference greater than some threshold. This is the first rough estimate for the "person area" in the foreground image, but the segmentation does not meet the criteria 3 and 4 above.
Find the outline of the largest blob (EDIT: Note that you don't have to start at the outline. You can also start with a larger polygon, as the steps will automatically shrink it to the optimal position.)
now go through each point in the outline and smooth the outline. i.e. for each point find the point that minimizes the formula: c1*L - c2*G, where L is the length of the outline polygon if the point were moved here and G is the gradient at the location the point would be moved to, c1/c2 are constants to control the process. Move the point to that position. This has the effect of smoothing the contour polygon in areas of low gradient in the source image, while keeping it tied to high gradients in the source image (i.e. the visible borders of the person). You can try different expressions for L and G, for example, L could take the length and curvature into account, and G could also take the gradient in the background and subtracted images into account.
you probably will have to re-normalize the outline polygon, i.e. make sure that the points on the outline are spaced regularly. Either that, or make sure that the distances between the points stay regular in the step before. ("Geodesic Snakes")
repeat the last two steps until convergence
You now have an outline polygon that touches the visible person-background border and continues smoothly where the border is not visible or has low contrast.
Look up "Snakes" (e.g. here) for more information.
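The first two steps of the fast version (subtract the images, then keep the largest blob of pixels whose difference exceeds a threshold) can be sketched as below. This is a hypothetical Python sketch assuming grayscale images of the same size stored as lists of rows and 4-connected blobs; the outline smoothing (snake) steps are not included.

```python
from collections import deque


def largest_foreground_blob(background, frame, threshold):
    """Return the set of (x, y) pixels in the largest 4-connected blob whose
    absolute difference from the background exceeds `threshold`."""
    h, w = len(frame), len(frame[0])
    diff = [[abs(frame[y][x] - background[y][x]) > threshold for x in range(w)]
            for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    best = []
    for y0 in range(h):
        for x0 in range(w):
            if not diff[y0][x0] or seen[y0][x0]:
                continue
            blob, queue = [], deque([(x0, y0)])
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                blob.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and diff[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(blob) > len(best):
                best = blob
    return set(best)
```

This blob is only the rough first estimate; its outline would be the starting polygon for the smoothing iterations described above.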
Low-pass filter (blur) the images before you subtract them.
Then use that difference signal as a mask to select the pixels of interest.
A wide-enough filter will ignore the too-small (high-frequency) features that end up carving out "awful" regions inside your object of interest. It'll also reduce the highlighting of pixel-level noise and misalignment (the highest-frequency information).
In addition, if you have more than two frames, introducing some time hysteresis will let you form more stable regions of interest over time too.
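Blurring before subtracting can be sketched with a simple box filter. This is a hypothetical Python sketch assuming grayscale images stored as lists of rows; `radius` and `threshold` are illustrative values. A small region inside the person whose colour matches the background gets filled in by its neighbours, while a lone noisy pixel is averaged away.

```python
def box_blur(img, radius):
    """Mean of the (2*radius+1)^2 window around each pixel, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def blurred_difference_mask(background, frame, radius=2, threshold=20):
    """True where the blurred frame differs from the blurred background."""
    b, f = box_blur(background, radius), box_blur(frame, radius)
    return [[abs(fv - bv) > threshold for fv, bv in zip(frow, brow)]
            for frow, brow in zip(f, b)]
```

The trade-off is that a wider filter also rounds off the person's true outline, so the radius should be only as large as the holes you want to close.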
One technique that I think is common is to use a mixture model. Grab a number of background frames and for each pixel build a mixture model for its color.
When you apply a frame with the person in it you will get some probability that the color is foreground or background, given the probability densities in the mixture model for each pixel.
After you have P(pixel is foreground) and P(pixel is background) you could just threshold the probability images.
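A simplified version of this idea can be sketched with a single Gaussian per pixel instead of a full mixture model. This is a hypothetical Python sketch with stated assumptions: grayscale frames stored as lists of rows, a variance floor `min_var` so that perfectly static pixels do not get zero variance, a foreground likelihood that is uniform over 0..255, and a prior `prior_fg` for foreground. It returns P(pixel is foreground) by Bayes' rule, which you can then threshold (e.g. at 0.5).

```python
import math


def fit_background(frames, min_var=4.0):
    """Per-pixel mean and variance over a list of background-only frames."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    mean = [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]
    var = [[max(sum((f[y][x] - mean[y][x]) ** 2 for f in frames) / n, min_var)
            for x in range(w)] for y in range(h)]
    return mean, var


def foreground_probability(model, frame, prior_fg=0.5):
    """P(foreground | pixel value) for every pixel of `frame`."""
    mean, var = model
    p_fg = 1.0 / 256  # assumption: foreground colour uniform over 0..255
    out = []
    for y, row in enumerate(frame):
        out_row = []
        for x, v in enumerate(row):
            m, s2 = mean[y][x], var[y][x]
            p_bg = math.exp(-(v - m) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
            num = prior_fg * p_fg
            out_row.append(num / (num + (1 - prior_fg) * p_bg))
        out.append(out_row)
    return out
```

A mixture model replaces the single Gaussian with several weighted ones per pixel, which copes better with backgrounds that alternate between a few colours (flicker, moving leaves).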
Another possibility is to use the probabilities as inputs in some more clever segmentation algorithm. One example is graph cuts which I have noticed works quite well.
However, if the person is wearing clothes that are visually indistinguishable from the background, obviously none of the methods described above would work. You'd either have to get another sensor (like IR or UV) or have a quite elaborate "person model" that could "add" the legs in the right position if it finds what it thinks is a torso and head.
Good luck with the project!
Background vs Foreground detection is very subjective. The application scenario defines background or foreground. However in the application you detail, I guess you are implicitly saying that the person is the foreground.
Using the above assumption, what you seek is a person detection algorithm. A possible solution is:
Run a Haar feature detector + boosted cascade of weak classifiers (see the OpenCV wiki for details).
Compute inter-frame motion (differences).
If there is a positive face detection for a frame, cluster the motion pixels around the face (kNN algorithm).
Voilà... you should have a simple person detector.
Post the photo on Craigslist and tell them that you'll pay $5 for someone to do it.
Guaranteed you'll get hits in minutes.
Instead of a straight subtraction, you could step through both images pixel by pixel and only "subtract" the pixels that are exactly the same. That, of course, won't account for minor variations in color.
