I have performed watershed segmentation on a picture of clustered cells. There seem to be many clusters of cells that are undersegmented or not segmented at all. There are also single cells that have been oversegmented. What methods could I use to merge the oversegmented single cells and further split the undersegmented clusters of cells?
Edit: I plan to decide whether a cell is over- or undersegmented by checking whether its area falls within the typical range for normal-sized cells. I'm not sure if this is a good idea, though. Any help will be appreciated, thanks.
You cannot expect to make zero errors and segment everything perfectly. Maybe you have smaller or larger cells, or maybe your image quality was really bad.
If you know that cells have an area in a certain range, just adapt the watershed parameters (the threshold) until the average estimated area is consistent with your prior knowledge.
If you have really large segments (say, more than twice the average area), run the watershed again locally with a higher threshold.
If you have really small segments locally, run the watershed again locally with a lower threshold.
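A minimal sketch of the local re-run idea, assuming the standard OpenCV distance-transform watershed; the function name and the dist_frac parameter are illustrative, not from the original post:

    import cv2
    import numpy as np

    def resegment_region(bgr_roi, dist_frac):
        # Re-run a distance-transform watershed on one region of interest.
        # Raise dist_frac to split undersegmented clumps, lower it to
        # produce fewer markers (and thus fewer fragments).
        gray = cv2.cvtColor(bgr_roi, cv2.COLOR_BGR2GRAY)
        _, bw = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        dist = cv2.distanceTransform(bw, cv2.DIST_L2, 5)
        _, sure_fg = cv2.threshold(dist, dist_frac * dist.max(), 255, 0)
        sure_fg = sure_fg.astype(np.uint8)
        sure_bg = cv2.dilate(bw, np.ones((3, 3), np.uint8), iterations=3)
        unknown = cv2.subtract(sure_bg, sure_fg)
        _, markers = cv2.connectedComponents(sure_fg)
        markers += 1                 # background label becomes 1
        markers[unknown == 255] = 0  # watershed fills in the 0 region
        return cv2.watershed(bgr_roi, markers)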
Beyond that, I wouldn't do much more, except perhaps trying another tool such as ilastik, which offers semi-automated segmentation.
You have to decide what the ideal or expected cell looks like to you; apparently it is some roundish shape whose outline has no reversal of curvature (i.e. it goes around with no reversal of direction), in other words a simple shape. For that you can use shape features such as circularity: you need to determine the range of circularities that you accept.
For the watershed, I think it might be better to aim for oversegmentation; shapes that are near each other can then be merged based on whether the combined shape meets the criteria above. Other shape features can be used as well (elongation, etc.).
If you aim for undersegmentation, you have no choice (with the method you are using) but to repeat the segmentation on the remaining shapes.
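As an illustration, a hedged sketch of the circularity test with OpenCV; the accepted range is an assumption you would have to tune:

    import cv2
    import numpy as np

    def circularity(contour):
        # 4*pi*area / perimeter^2: 1.0 for a perfect circle,
        # smaller for elongated or ragged shapes
        area = cv2.contourArea(contour)
        perim = cv2.arcLength(contour, True)
        return 0.0 if perim == 0 else 4.0 * np.pi * area / (perim * perim)

    def looks_like_one_cell(contour, lo=0.6, hi=1.0):
        # Accept a merge of neighbouring fragments only if the combined
        # shape still passes this test
        return lo <= circularity(contour) <= hi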
I have an image with a collection of objects in K given perceived colors. Provided I extract those objects, how could I cluster them by their perceived color?
Let me give you an example. I am trying to cluster two football teams, so there will be two teams, referees and a keeper (or two, but that's a rare situation) in the image: 3, 4 or 5 clusters.
To the human eye, it's an easy task. In the picture above, we have white players, red players and a referee in black. But it turns out not to be so easy for automatic processing.
What I have tried so far:
1) I started working in the BGR colorspace, then tried HSV, and now I am exploring CIE Luv, as I read that its distances better reflect perceived differences between colors.
2) [BGR and HSV] Taking the most common color from the contour (not the bounding box). This didn't work at all because of noise (the green field getting in the way), the quality of the image, the position of the player, etc. The colors were pretty much random.
3) [CIE Luv] Resizing all players' boxes to a common size and taking a small portion of the image from the middle (as marked by a black rectangle in the example below).
Taking the mean value of all pixels in each player's window and adding it to a list (so there is one mean-value pixel per player), then using K-means (with a defined number of clusters) to find the clusters in that list. This has proven somewhat successful; for the image above I get reddish, white and blackish cluster centres.
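For reference, a minimal sketch of this pipeline as I understand it, assuming OpenCV; the box size, window size and function names are illustrative:

    import cv2
    import numpy as np

    def mean_luv(box_bgr, win=20):
        # Mean CIE Luv color of a small central window of a resized box
        patch = cv2.resize(box_bgr, (64, 128))
        luv = cv2.cvtColor(patch, cv2.COLOR_BGR2Luv)
        cy, cx = luv.shape[0] // 2, luv.shape[1] // 2
        window = luv[cy - win // 2:cy + win // 2, cx - win // 2:cx + win // 2]
        return window.reshape(-1, 3).mean(axis=0)

    def cluster_players(player_boxes, k=3):
        # One mean color per player, then k-means into k clusters
        samples = np.float32([mean_luv(b) for b in player_boxes])
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
        _, labels, centers = cv2.kmeans(samples, k, None, criteria, 10,
                                        cv2.KMEANS_RANDOM_CENTERS)
        return labels.ravel(), centers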
Unfortunately, the assignment of players back to these clusters is pretty much random. I am doing that by calculating the mean color for each player like I described above and then measuring the distance to each cluster. A player might be assigned to the white cluster on one frame and to the red one on the next. Part of the problem might be that the window in the middle of the player's box will sometimes catch a number, grass or shorts, instead of the jersey.
I have already spent a considerable amount of time trying to figure this out; I would be grateful for any help.
I may be overcomplicating the problem since you have just 3 classes, but try training an SVM classifier based on HOG descriptors. Maybe also try LDA to improve speed.
Some references:
1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.627.6465&rep=rep1&type=pdf (skip to the recognition part)
2] https://rodrigob.github.io/documents/2013_ijcnn_traffic_signs.pdf (skip to the recognition part)
3] https://www.learnopencv.com/handwritten-digits-classification-an-opencv-c-python-tutorial/ (if you want to jump into the code right away)
This should work well as long as your detection is good, and it can also help to identify individual players based on their shirt numbers (and maybe more) if you train it right.
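A minimal sketch of the HOG + SVM idea, assuming OpenCV for the descriptors and scikit-learn for the classifier; all names, sizes and labels are illustrative:

    import cv2
    import numpy as np
    from sklearn.svm import LinearSVC

    hog = cv2.HOGDescriptor()  # default 64x128 detection window

    def hog_feature(box_bgr):
        patch = cv2.resize(box_bgr, (64, 128))
        gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
        return hog.compute(gray).ravel()

    def train_classifier(boxes, labels):
        # labels: e.g. 0 = team A, 1 = team B, 2 = referee
        X = np.array([hog_feature(b) for b in boxes])
        return LinearSVC().fit(X, np.asarray(labels))

    def classify(clf, box_bgr):
        return clf.predict(hog_feature(box_bgr).reshape(1, -1))[0]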
EDIT: Okay, I have another idea, based on colour segmentation, since that was your original approach and requires less work (maybe not? Color segmentation is a pain! Also, LIGHTING! LIGHTING! LIGHTING!).
Create a green mask with a threshold so that you detect as little grass as possible before running your k-means. Then, instead of the mean, try the median: it will get you much closer to the true red, because outlier pixels drag the mean down drastically while the median is barely affected. It should be way more robust, and you should be able to sort players better (hair color and skin color shouldn't affect it too much).
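A hedged sketch of the green mask plus median idea; the HSV range for grass is a guess you would have to tune for your footage:

    import cv2
    import numpy as np

    def median_color_without_grass(box_bgr):
        hsv = cv2.cvtColor(box_bgr, cv2.COLOR_BGR2HSV)
        # Rough grass range in OpenCV HSV (hue runs 0..179)
        grass = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))
        pixels = box_bgr[grass == 0]      # keep only non-grass pixels
        if pixels.size == 0:
            return None
        return np.median(pixels, axis=0)  # robust against remaining outliers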
EDIT 2: I just noticed that if you use the black rectangle you will mostly capture the shirt number (which is white), which will mess up your classifier; use the original box with the green masked out instead.
EDIT 3: Also, you can simply create three thresholds for your required colors and split them up; you don't really need k-means here. Basically, you just need each detected box to yield a value inside one of those thresholds. Try the median method mentioned above; it should improve things. You might also need some minor tweaks here and there (blur, morphology, etc.) to improve detection.
I am interested in detecting a single object, more precisely a fire extinguisher, which has no inter-class variability (all fire extinguishers look the same). However, the application is supposed to run in real time, i.e. a robot explores the environment, and whenever it sees the object of interest it should detect it and return its pixel coordinates.
My question is: which algorithm would be a good choice for this task?
1. Is this a classification problem, and should we use features (SIFT/SURF, etc.) + BoW + SVM?
2. Some other solution (no idea yet).
Any kind of input will be appreciated.
Thanks.
(P.S. Bear with me, I am a newbie to computer vision and Stack Overflow.)
Update 1:
The height varies: all extinguishers are mounted on the wall, but at different heights. I tried SIFT features with BoW, but extracting BoW descriptors is expensive at test time. Moreover, I have no idea how to locate the object (pixel coordinates) inside the image after it has been classified as positive.
Update 2:
I finally used SIFT + BoW + SVM and am able to classify the object. But with this technique I only get output in terms of whether the object is present in the scene or not.
How can I localize the object, i.e. get its bounding box or centre? What approach is compatible with the above method for achieving this?
Thank you all.
I would suggest using color as the main feature to look for, and only try other features as needed. The fire extinguisher red is very distinctive, and should not occur too often elsewhere in an office environment. Other, more computationally expensive tests can then be performed only in regions of the right color.
Here is a good tutorial for color detection that also explains how to find good thresholds for your desired color.
I would suggest the following approach:
denoise your image with a median filter
convert the image to HSV format (Hue, Saturation, Value)
select pixels close to that particular shade of red with InRange()
Now you have a binary image that contains only the red pixels.
count the number of red pixels with CountNonZero()
If that number is too small, abort
remove noise from the binary image by morphological opening / closing
find contours of all blobs in your picture with findContours or the CvBlob library
check if there are blobs of the correct width, correct height and correct width/height ratio
since your fire extinguishers are vertical cylinders, the width/height ratio will be constant from every angle. The width and height will of course vary somewhat with distance to the camera.
if the width and height do not match, abort
repeat these steps to find the black-colored part at the bottom of the extinguisher
abort if there is no black region of the correct width/height below the red region
(perhaps also repeat these steps for the metallic top and the yellow rectangle)
These tests should all be very fast. If they are too slow, you could reduce the resolution of your input images.
Depending on your environment, it is possible that this is already a robust enough test. If not, you can proceed with SIFT/SURF feature matching, but only in a small region around the blobs with the correct color. You also do not necessarily have to do that for each frame; every n-th frame should be enough for confirmation.
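A sketch of the color-filtering steps above, assuming OpenCV 4; all threshold values and the width/height range are assumptions to tune:

    import cv2
    import numpy as np

    def find_red_candidates(frame_bgr, min_pixels=200):
        img = cv2.medianBlur(frame_bgr, 5)             # denoise
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)     # to HSV
        # Red wraps around hue 0 in OpenCV, so combine two ranges
        mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255)) | \
               cv2.inRange(hsv, (170, 120, 70), (179, 255, 255))
        if cv2.countNonZero(mask) < min_pixels:        # too little red: abort
            return []
        kernel = np.ones((5, 5), np.uint8)             # morphological cleanup
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        candidates = []
        for c in contours:                             # blob shape checks
            x, y, w, h = cv2.boundingRect(c)
            if h > 0 and 0.3 < w / h < 0.6:            # vertical cylinder-ish
                candidates.append((x, y, w, h))
        return candidates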
This is an old question, but I would still like to recommend the YOLO algorithm for this problem. YOLO fits this scenario very well.
I have two images which I know represent the exact same object. In the picture below, they are referred to as Reference and Match.
The image Match can undergo the following transformations compared to Reference:
The object may have changed its appearance locally by addition (e.g. dirt or lettering added to the side) or omission (e.g. a side mirror has been removed).
Stretched or shrunk horizontally only (it is not resized in the vertical direction)
Portions of Reference image are not present in Match (shaded in red in Reference Image).
Question: How can the regions which have "changed" in the ways mentioned above be identified?
Idea #1: Dynamic Time Warping seems like a good candidate once the beginning and end of the Match image (numbered 1 and 3 in the image) are aligned with the corresponding columns in the Reference image, but I am not sure how to proceed.
Idea #2: Match SIFT features across the images. The tessellation produced by the feature-point locations breaks each image into non-uniform tiles. Use the feature correspondences to determine which tiles to match across images, then use a similarity measure to detect any changes.
You might want to consider an iterative registration algorithm. Basically you want to perform optimization to find the parameters of the transform, in your case horizontal scaling and horizontal translation. Once you optimize the parameters you will have the transformation between the two images, transform one to match the other, and can then use a subtraction to identify the regions with differences.
For registration take a look at the ITK library.
You can probably do a gradient descent optimization using mutual information as the metric. ITK has a number of different transforms that will capture translation and scaling. The code should run quickly on the sample images you show.
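A hedged sketch of such a registration with SimpleITK (ITK's simplified Python interface); the file names are placeholders, and an affine transform is used as a superset of horizontal scaling plus translation:

    import SimpleITK as sitk

    fixed = sitk.ReadImage("reference.png", sitk.sitkFloat32)
    moving = sitk.ReadImage("match.png", sitk.sitkFloat32)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetOptimizerAsRegularStepGradientDescent(
        learningRate=1.0, minStep=1e-4, numberOfIterations=200)
    reg.SetInitialTransform(sitk.CenteredTransformInitializer(
        fixed, moving, sitk.AffineTransform(2),
        sitk.CenteredTransformInitializerFilter.GEOMETRY))
    reg.SetInterpolator(sitk.sitkLinear)
    transform = reg.Execute(fixed, moving)

    # Warp Match onto Reference and subtract: the residual highlights
    # the regions that actually changed
    resampled = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0)
    difference = sitk.Abs(fixed - resampled)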
There are two images:
http://bbs.shoucangshidai.com/attachments/month_1001/1001211535bd7a644e95187acd.jpg
http://bbs.shoucangshidai.com/attachments/month_1001/10012115357cfe13c148d3d8da.jpg
One is a background image; the other is a photo of a person against the same background, at the same size. What I want to do is remove the second image's background and extract only the person's silhouette. The common method is to subtract the first image from the second, but my problem is that if the color of the person's clothing is similar to the background, the result of the subtraction is awful and I cannot get the whole person's silhouette. If anyone has a good idea for removing the background, please give me some advice. Thank you in advance.
If you have a good estimate of the image background, subtracting it from the image with the person is a good first step. But it is only the first step. After that, you have to segment the image, i.e. you have to partition the image into "background" and "foreground" pixels, with constraints like these:
in the foreground areas, the average difference from the background image should be high
in the background areas, the average difference from the background image should be low
the areas should be smooth. Outline length and curvature should be minimal.
the borders of the areas should have a high contrast in the source image
If you are mathematically inclined, these constraints can be modeled perfectly with the Mumford-Shah functional. See here for more information.
But you can probably adapt other segmentation algorithms to the problem.
If you want a fast and simple (but not perfect) version, you could try this:
subtract the two images
find the largest connected "blob" of pixels whose background-foreground difference is greater than some threshold. This is a first rough estimate of the "person area" in the foreground image, but the segmentation does not yet meet criteria 3 and 4 above.
Find the outline of the largest blob (EDIT: Note that you don't have to start at the outline. You can also start with a larger polygon, as the steps will automatically shrink it to the optimal position.)
now go through each point in the outline and smooth it: for each point, find the position that minimizes the expression c1*L - c2*G, where L is the length of the outline polygon if the point were moved there and G is the gradient at that position; c1 and c2 are constants that control the process. Move the point to that position. This smooths the contour polygon in areas of low gradient in the source image, while keeping it tied to high gradients (i.e. the visible borders of the person). You can try different expressions for L and G; for example, L could take curvature into account as well, and G could also use the gradient in the background and subtracted images.
you will probably have to re-normalize the outline polygon, i.e. make sure that the points on the outline are spaced regularly. Either that, or make sure that the distances between the points stay regular in the previous step ("geodesic snakes").
repeat the last two steps until convergence
You now have an outline polygon that touches the visible person-background border and continues smoothly where the border is not visible or has low contrast.
Look up "Snakes" (e.g. here) for more information.
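A rough sketch of the "fast and simple" first steps (difference, threshold, largest blob), assuming OpenCV; the threshold value is an assumption:

    import cv2
    import numpy as np

    def rough_person_mask(background_bgr, frame_bgr, thresh=40):
        diff = cv2.absdiff(frame_bgr, background_bgr)      # step 1: subtract
        gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        _, bw = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
        # step 2: keep only the largest connected blob
        n, labels, stats, _ = cv2.connectedComponentsWithStats(bw)
        if n < 2:
            return None
        largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        return np.where(labels == largest, 255, 0).astype(np.uint8)

The outline of the returned mask (e.g. via cv2.findContours) could then seed the snake iterations described above; skimage.segmentation.active_contour implements a similar energy minimization if you prefer not to write your own.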
Low-pass filter (blur) the images before you subtract them.
Then use that difference signal as a mask to select the pixels of interest.
A wide-enough filter will ignore the too-small (high-frequency) features that end up carving out "awful" regions inside your object of interest. It'll also reduce the highlighting of pixel-level noise and misalignment (the highest-frequency information).
In addition, if you have more than two frames, introducing some time hysteresis will let you form more stable regions of interest over time too.
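A minimal sketch of the blur-then-subtract masking, assuming OpenCV; the kernel size and threshold are assumptions:

    import cv2

    def blurred_diff_mask(background_bgr, frame_bgr, ksize=21, thresh=30):
        # Low-pass both images before subtracting so pixel-level noise
        # and misalignment do not dominate the difference signal
        b1 = cv2.GaussianBlur(background_bgr, (ksize, ksize), 0)
        b2 = cv2.GaussianBlur(frame_bgr, (ksize, ksize), 0)
        diff = cv2.cvtColor(cv2.absdiff(b2, b1), cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
        return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)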
One technique that I think is common is to use a mixture model: grab a number of background frames and, for each pixel, build a mixture model of its color.
When you then process a frame with the person in it, the mixture model for each pixel gives you the probability that its color belongs to the foreground or the background.
After you have P(pixel is foreground) and P(pixel is background) you could just threshold the probability images.
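OpenCV's MOG2 background subtractor implements this kind of per-pixel Gaussian mixture model; a hedged sketch, with parameter values that are assumptions to tune:

    import cv2

    def foreground_mask(background_frames, person_frame):
        sub = cv2.createBackgroundSubtractorMOG2(history=len(background_frames),
                                                 varThreshold=16,
                                                 detectShadows=False)
        for f in background_frames:        # fit the mixture model per pixel
            sub.apply(f, learningRate=-1)  # -1 = automatic learning rate
        # learningRate=0: classify only, do not absorb the person
        # into the background model
        return sub.apply(person_frame, learningRate=0)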
Another possibility is to use the probabilities as inputs in some more clever segmentation algorithm. One example is graph cuts which I have noticed works quite well.
However, if the person is wearing clothes that are visually indistinguishable from the background, obviously none of the methods described above will work. You'd either have to get another sensor (like IR or UV) or have a quite elaborate "person model" which could "add" the legs in the right position if it finds what it thinks is a torso and head.
Good luck with the project!
Background vs. foreground detection is very subjective. The application scenario defines what counts as background or foreground. However, in the application you describe, I guess you are implicitly saying that the person is the foreground.
Using the above assumption, what you seek is a person detection algorithm. A possible solution is:
Run a Haar feature detector + boosted cascade of weak classifiers (see the OpenCV wiki for details)
Compute inter-frame motion (differences)
If there is a positive face detection for a frame, cluster the motion pixels around the face (kNN algorithm)
Voila, you have a simple person detector.
Post the photo on Craigslist and tell them that you'll pay $5 for someone to do it.
Guaranteed you'll get hits in minutes.
Instead of a straight subtraction, you could step through both images, pixel by pixel, and only "subtract" the pixels which are exactly the same. That of course won't account for minor variances in colors, though.
I implemented some adaptive binarization methods; they use a small window, and at each pixel the threshold value is calculated. There are problems with these methods:
If we select the window size too small, we get this effect (I think the reason is the small window size):
(source: piccy.info)
In the upper-left corner is the original image; in the upper-right corner, the global-threshold result. The bottom left shows an example of dividing the image into parts (but I am talking about analyzing a small neighborhood around each pixel, for example a 10x10 window).
So you can see the result of such algorithms in the bottom-right picture: we got a black area, but it should be white.
Does anybody know how to improve the algorithm to solve this problem?
There should be quite a lot of research going on in this area, but unfortunately I have no good links to give.
An idea, which might work but I have not tested, is to try to estimate the lighting variations and then remove that before thresholding (which is a better term than "binarization").
The problem is then moved from adaptive thresholding to finding a good lighting model.
If you know anything about the light sources then you could of course build a model from that.
Otherwise a quick hack that might work is to apply a really heavy low pass filter to your image (blur it) and then use that as your lighting model. Then create a difference image between the original and the blurred version, and threshold that.
EDIT: After quick testing, it appears that my "quick hack" is not really going to work at all. After thinking about it I am not very surprised either :)
    I = someImage
    Ib = blur(I, 'a lot!')
    Idiff = I - Ib
    It = threshold(Idiff, 'some global threshold')
EDIT 2
I got another idea that could work, depending on how your images are generated.
Try estimating the lighting model from the first few rows in the image:
Take the first N rows in the image
Create a mean row from the N collected rows. You now have one row as your background model.
For each row in the image subtract the background model row (the mean row).
Threshold the resulting image.
Unfortunately I am at home without any good tools to test this.
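A sketch of this row-based background model in Python/NumPy, assuming dark foreground on a bright background; N and the threshold are assumptions:

    import numpy as np

    def row_model_threshold(gray, n_rows=10, thresh=30):
        model = gray[:n_rows].astype(np.float32).mean(axis=0)  # mean row
        corrected = gray.astype(np.float32) - model   # subtract per row
        # dark foreground ends up clearly below the model after correction
        return np.where(corrected < -thresh, 255, 0).astype(np.uint8)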
It looks like you're doing adaptive thresholding wrong. Your images look as if you divided your image into small blocks, calculated a threshold for each block and applied that threshold to the whole block. That would explain the "box" artifacts. Usually, adaptive thresholding means finding a threshold for each pixel separately, with a separate window centered around the pixel.
Another suggestion would be to build a global model for your lighting: In your sample image, I'm pretty sure you could fit a plane (in X/Y/Brightness space) to the image using least-squares, then separate the pixels into pixels brighter (foreground) and darker than that plane (background). You can then fit separate planes to the background and foreground pixels, threshold using the mean between these planes again and improve the segmentation iteratively. How well that would work in practice depends on how well your lighting can be modeled with a linear model.
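A hedged sketch of the initial plane fit with NumPy least squares (the iterative refinement is left out):

    import numpy as np

    def fit_brightness_plane(gray):
        # Least-squares fit of brightness = a*x + b*y + c over the image;
        # pixels brighter than the plane are a first guess at the foreground
        h, w = gray.shape
        ys, xs = np.mgrid[0:h, 0:w]
        A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
        coeffs, *_ = np.linalg.lstsq(A, gray.ravel().astype(np.float64),
                                     rcond=None)
        return coeffs[0] * xs + coeffs[1] * ys + coeffs[2]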
If the actual objects you try to segment are "thinner" (you said something about barcodes in a comment), you could try a simple opening/closing operation to get a lighting model (i.e. close the image to remove the foreground pixels, then use [closed image + X] as the threshold).
Or, you could try mean-shift filtering to get the foreground and background pixels to the same brightness. (Personally, I'd try that one first)
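A sketch of the closing-based variant for thin dark objects, assuming OpenCV; the kernel size and offset are assumptions, and the sign of the offset depends on the polarity of your foreground:

    import cv2
    import numpy as np

    def close_and_threshold(gray, ksize=15, offset=10):
        # Grayscale closing removes thin dark structures (e.g. barcode
        # bars), leaving an estimate of the illumination
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (ksize, ksize))
        background = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
        # A pixel is foreground if it is clearly darker than the local
        # background estimate
        fg = gray.astype(np.int32) < background.astype(np.int32) - offset
        return np.where(fg, 0, 255).astype(np.uint8)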
You have very non-uniform illumination and a fairly large object (thus, there is no universal, easy way to extract the background and correct the non-uniformity). This basically means you cannot use global thresholding at all; you need adaptive thresholding.
You may want to try Niblack binarization. Matlab code is available here:
http://www.uio.no/studier/emner/matnat/ifi/INF3300/h06/undervisningsmateriale/week-36-2006-solution.pdf (page 4)
There are two parameters you'll have to tune by hand: the window size (N in the above code) and the weight.
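If you would rather not port the Matlab code, scikit-image ships an equivalent implementation; a minimal sketch, where the window size and k are the two hand-tuned parameters:

    from skimage.filters import threshold_niblack

    def niblack_binarize(gray, window_size=25, k=-0.2):
        # Niblack threshold: T(x, y) = mean(x, y) + k * std(x, y),
        # computed over a local window around each pixel
        t = threshold_niblack(gray, window_size=window_size, k=k)
        return gray > t   # True = background for dark text with negative k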
Try to apply a local adaptive threshold using this procedure:
convolve the image with a mean or median filter
subtract the original image from the convolved one
threshold the difference image
The local adaptive threshold method selects an individual threshold for each pixel.
I use this approach extensively and it works fine with images that have a non-uniform background.
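OpenCV's adaptiveThreshold does essentially this mean-filter comparison in one call; a minimal sketch (the block size and offset are assumptions):

    import cv2

    def local_mean_threshold(gray, block=31, c=10):
        # Each pixel is compared against the mean of its block x block
        # neighbourhood minus c, i.e. the convolve/subtract/threshold
        # steps above in a single call (block must be odd)
        return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                     cv2.THRESH_BINARY, block, c)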