How do you extract noisy connected components from an image? - opencv

I have a number of polygonal regions (red) in an image delineated by line segments (cyan). However, the lines are noisy and incomplete: they aren't perfectly straight and they have chunks missing. Is there a way to robustly extract the intended red polygons?
If the lines were clean and unbroken, connected components would solve this nicely. I've experimented with completing the line segments using the Hough transform, with little success.
EDIT: Another thought I had was to detect the intersection points of the line segments by first taking the medial axis transform of the cyan pixels, then sliding a window over the image and finding windows that contain three or more separate red regions, which would indicate locations of cyan intersections. But I'm not sure what to do next.

I know you probably tried this already, but did you apply some morphology? Maybe some dilations followed by some erosions, perhaps at a ratio of 5:2, to preserve and enhance the connections between the components. Did you test different structuring elements?
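A minimal sketch of that idea, assuming the cyan lines are already available as a boolean mask. It uses scipy.ndimage rather than OpenCV (OpenCV offers the same operation through cv2.morphologyEx with MORPH_CLOSE), applies one dilation and one erosion instead of the 5:2 ratio, and the function name and the 5x5 element are only illustrative:

```python
import numpy as np
from scipy import ndimage

def label_regions(line_mask, close_size=5):
    """Close small gaps in a boolean line mask, then label the enclosed regions."""
    structure = np.ones((close_size, close_size), dtype=bool)
    # Dilation followed by erosion (a closing) bridges gaps narrower than the element.
    dilated = ndimage.binary_dilation(line_mask, structure=structure)
    # border_value=1 keeps lines that touch the image border from being eroded away.
    closed = ndimage.binary_erosion(dilated, structure=structure, border_value=1)
    # The regions are the connected components of everything that is not a line.
    labels, count = ndimage.label(~closed)
    return labels, count

# Synthetic example: a vertical divider with a 3-pixel gap in a 20x20 image.
lines = np.zeros((20, 20), dtype=bool)
lines[:, 10] = True
lines[8:11, 10] = False                   # the missing chunk
_, n_before = ndimage.label(~lines)       # the gap merges both sides into one region
_, n_after = label_regions(lines)         # after closing, the two sides are separate
```

The element size has to be larger than the widest gap you want to bridge, but small enough not to merge lines that are genuinely separate, so it will need tuning on real images.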

Related

OpenCV: How to detect rhombus on image?

I have some images of a plane under a perspective transform.
I need to detect the center of each white rhombus, or the rhombus itself.
Here are examples:
As I understand it, the problem can be solved by simple template matching if we rectify the image, but I need to do it automatically.
Are there any functions in OpenCV suitable for this task? Any other ideas?
Here are two quick tests I just did without correcting the perspective issue.
Pure mathematical morphology:
1. Extract the red channel.
2. Apply a big white top-hat to detect all the bright areas, but without the big bright reflection.
3. Apply a small white top-hat to detect only the thin lines between the rhombuses.
4. Subtract the result of step 3 from the result of step 2. The lines between the rhombuses then become thinner or even disappear.
5. Apply an opening to clean the final result.
Here are two results: Image1 and Image2. The main issue is that the rhombuses do not all have the same size (different magnification and perspective), which can be problematic with mathematical morphology.
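The morphology steps above could be sketched roughly as follows. This is only an illustration with scipy.ndimage, not the code that produced the results; the function name and the three window sizes are assumptions, and the image is assumed to be an RGB array with red in channel 0:

```python
import numpy as np
from scipy import ndimage

def rhombus_response(rgb, big=15, small=3, clean=3):
    """Red channel -> big top-hat minus small top-hat -> opening."""
    red = rgb[..., 0].astype(np.int32)                           # extract the red channel
    big_hat = ndimage.white_tophat(red, size=(big, big))         # all bright structures
    small_hat = ndimage.white_tophat(red, size=(small, small))   # only the thin lines
    diff = np.clip(big_hat - small_hat, 0, None)                 # suppress the thin lines
    return ndimage.grey_opening(diff, size=(clean, clean))       # clean up leftovers
```

The big window has to be larger than the largest rhombus and the small one just wider than the lines, which is exactly where the varying magnification causes trouble.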
So here is another solution using the Hough transform:
Start with the resulting image of step 3 from the previous algorithm.
Apply a Hough transform.
Here are the results: Hough1 and Hough2. Then you have to separate the lines that touch a rhombus from those that do not, but you can use my first algorithm for that. Even if the first algorithm does not detect every rhombus, it will detect most of them, and that is enough to find the lines touching the rhombuses. The line intersections will then be the centroids that you are looking for.
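For the last step, here is a small sketch of intersecting two lines given in the (rho, theta) normal form that a standard Hough transform produces, i.e. x*cos(theta) + y*sin(theta) = rho. The function name is hypothetical, and filtering which lines to intersect is left out:

```python
import numpy as np

def hough_intersection(line_a, line_b, eps=1e-9):
    """Intersect two lines given as (rho, theta); return None if they are parallel."""
    (rho1, theta1), (rho2, theta2) = line_a, line_b
    # Each line satisfies x*cos(theta) + y*sin(theta) = rho, giving a 2x2 linear system.
    A = np.array([[np.cos(theta1), np.sin(theta1)],
                  [np.cos(theta2), np.sin(theta2)]])
    if abs(np.linalg.det(A)) < eps:
        return None
    x, y = np.linalg.solve(A, np.array([rho1, rho2]))
    return float(x), float(y)
```

For example, the vertical line x = 3 is (3, 0) and the horizontal line y = 4 is (4, pi/2); the function returns their crossing point.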

Extract coordinates from image file

How can I get an array of coordinates of a (drawn) line in an image? Coordinates should be relative to the image borders. Input: *.img. Output: an array of coordinates (with a fixed step). Is there any third-party software to do this? For example, there is a high contrast difference: a white background and a black line, or red and green, etc.
Example:
Oh, you mean non-straight lines. You need to define a "line". Intuitively, you might mean a connected area of the image with a high aspect ratio between the length of its medial axis and the distance between medial axis and edges (ie relatively long and narrow, even if it winds around). Possible approach:
1. Threshold or select by color. Perhaps select by color based on a histogram of colors, or posterize as described here: Adobe Photoshop-style posterization and OpenCV. Then call scipy.ndimage.label() (older SciPy releases also expose it as scipy.ndimage.measurements.label()).
2. For each area found above, skeletonize. Helpful tutorial: "Skeletonization using OpenCV-Python". However, you will likely need the distance to the edges as well, so use skimage.morphology.medial_axis(..., return_distance=True).
3. Do some kind of cleanup/filtering on the skeleton to remove short branches, etc. Thinking about your particular use, and assuming your lines don't loop around, you can just find the longest single path in the skeleton. This is also where you can decide whether a shape is a "line" or not, based on how long the longest path in its skeleton is relative to the distance to the edges. I'm not sure how best to do that in OpenCV, but "Analyze Skeleton" in Fiji/ImageJ will let you filter by branch length.
4. What is left is the most elongated medial axis of the original "line" shape. You can resample that to whatever step you prefer, or fit it with a spline, etc.
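The final resampling step is the easy part to sketch. Assuming the cleaned-up skeleton has already been turned into an ordered list of (x, y) points along the path (that ordering is not shown here), a fixed-step resampling by arc length with NumPy might look like this; the function name is hypothetical:

```python
import numpy as np

def resample_path(points, step):
    """Resample an ordered (x, y) polyline at a fixed arc-length step."""
    pts = np.asarray(points, dtype=float)
    # Length of each segment, then cumulative distance along the path.
    seg = np.hypot(*np.diff(pts, axis=0).T)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    # Target distances 0, step, 2*step, ... up to the total length.
    targets = np.arange(0.0, arc[-1] + 1e-9, step)
    x = np.interp(targets, arc, pts[:, 0])
    y = np.interp(targets, arc, pts[:, 1])
    return np.column_stack([x, y])
```

Coordinates stay in pixel units relative to the top-left corner of the image, as long as the input points were.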
Due to the nature of what you want to do, it is hard to come up with sample code that will work on a range of images. This is likely to require some careful tuning. I recommend using a small set of images (a corpus), running each version of your algorithm on them and checking the results manually until they are pretty good, then trying it on a large corpus.
EDIT: Original answer, only works for straight lines:
You probably want to use the Hough transform (OpenCV tutorial).
Python sample code: Horizontal Line detection with OpenCV
EDIT: Related question with sample code to skeletonize: How can I get a full medial-axis line with its perpendicular lines crossing it?

OpenCV: comparing simple images with small difference

I have a bunch of "simple" images and I want to check whether they are similar to each other. I compare them using template matching (cv::matchTemplate) and the results are quite good.
Now I want to fine-tune my program and I face a problem. For example, I have two images which look very much alike. The only differences are that one has a thicker line and the digit at the front of the item is different. When both images are small, a one-pixel difference in line thickness makes a big difference in the template matching result. When the line thicknesses are the same and the only difference is the front digit, I get a template matching result of about 0.98 with CV_TM_CCORR_NORMED when the match is successful. When the line thickness is different, the matching result is about 0.95.
I cannot decrease my threshold value below 0.98 because some other similar images have the same line thickness.
Here are example images:
So what options do I have?
I have tried:
dilate the original and template
erode also both
morphologyEx both
calculating keypoints and comparing them
finding corners
But no big success yet. Are those images so simple that detecting "good features" is hard?
Any help is very welcome.
Thank you!
EDIT:
Here are some other example images. Images that my program considers similar are put in the same zip folder.
ZIP
A possible way might be thinning the two images so that every line is one pixel wide, since the differing thickness is the main thing hurting the similarity score.
The procedure would be to first binarize/threshold the images, then apply a thinning operation to both, so that both have the same line thickness of 1 px. Then use the usual template matching that gave you good results before.
In case you'd like more details on the thinning/skeletonization of binary images here are a few OpenCV implementations posted on various discussion forums and OpenCV groups:
OpenCV code for thinning (Guo and Hall algo, works with CvMat inputs)
The JR Parker implementation using OpenCV
Possibly more efficient code here (uses OpenCV optimized access methods a lot, however most of the page is in Japanese!)
And lastly a brief overview of thinning in case you're interested.
You need something more elementary here, there isn't much reason to go for fancy methods. Your figures are already binary ones, and their shapes are very similar overall.
One initial idea: consider the upper points and bottom points in a given image and form an upper hull and a bottom hull (simply a hull, not a convex hull or anything else). A point is said to be an upper point (respectively, bottom point) if, given a column i, it is the first point starting at the top (bottom) of the image that is not a background point in i. Also, your image is mostly one single connected component (in some cases there are separate vertical bars, but that is fine), so you can discard small components easily. This step is important for your situation because I saw that some figures contain noise that is irrelevant to the rest of the image. Considering a connected component with fewer than 100 points to be small, these are the hulls you get for the respective images included in the question:
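A rough sketch of how these hulls could be computed, assuming a boolean foreground mask and using scipy.ndimage for the component labeling. The function names are made up for illustration, and the 100-point limit is the one mentioned above:

```python
import numpy as np
from scipy import ndimage

def remove_small_components(mask, min_size=100):
    """Drop connected components with fewer than min_size foreground pixels."""
    labels, _ = ndimage.label(mask)
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False                      # label 0 is the background
    return keep[labels]

def column_hulls(mask):
    """For every non-empty column, return the first foreground row from the top and from the bottom."""
    cols = np.flatnonzero(mask.any(axis=0))
    upper = mask.argmax(axis=0)[cols]                              # first True from the top
    lower = mask.shape[0] - 1 - mask[::-1].argmax(axis=0)[cols]    # first True from the bottom
    return cols, upper, lower
```

Rows are counted from the top of the image, so a "higher" point on the page has a smaller row value.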
The blue line indicates the upper hull and the green line the bottom hull. If it is not apparent, when we consider the regional maxima and regional minima of these hulls, we obtain the same number in both of them. Furthermore, they are all very close except for some displacement along the y axis. If we take the mean x position of the extrema and plot the lines of both images together, we get the following figure. In this case, the blue and green lines are for the second image, and the red and cyan lines for the first. Red dots mark the mean x coordinate of regional minima, and blue dots the same for regional maxima (these are our points of interest). (The following image has been resized for better visualization.)
As you can see, you get many nearly overlapping points without doing anything. We can do even less, i.e. not even care about this overlap, and classify your images in the trivial way: if an image a and another image b have the same number of regional maxima in the upper hull, the same number of regional minima in the upper hull, the same number of regional maxima in the bottom hull, and the same number of regional minima in the bottom hull, then a and b belong to the same class. Doing this for all your images, all of them are correctly grouped except in the following situation:
In this case we have only 3 maxima and 3 minima for the upper hull in the first image, while there are 4 maxima and 4 minima for the second. Below are the plots of the hulls and points of interest obtained:
As you can see, in the second upper hull there are two extrema very close together. Smoothing this curve eliminates both extrema, making the images match by the trivial method. Also, note that if you draw a rectangle around your images, this method will say they are all equal. In that case you will want to compare multiple hulls, discarding the points in the current hull and constructing further ones. Nevertheless, this method is able to group all your images correctly, given that they are all very simple and mostly noise-free.
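To make the trivial classification concrete, here is one possible way to count regional extrema of a hull with optional smoothing. It is a sketch under a few assumptions that are not from the answer: a plateau counts as a single extremum, the two end points of the hull can count as extrema, and the smoothing is a plain moving average with an odd window size. All names are hypothetical:

```python
import numpy as np

def smooth(values, window=3):
    """Moving average with edge padding; window is assumed to be odd."""
    v = np.asarray(values, dtype=float)
    padded = np.pad(v, window // 2, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def count_regional_extrema(values):
    """Return (number of regional maxima, number of regional minima) of a 1-D curve."""
    v = np.asarray(values, dtype=float)
    v = v[np.concatenate([[True], np.diff(v) != 0])]     # collapse plateaus
    hi = np.concatenate([[-np.inf], v, [-np.inf]])
    lo = np.concatenate([[np.inf], v, [np.inf]])
    n_max = np.sum((hi[1:-1] > hi[:-2]) & (hi[1:-1] > hi[2:]))
    n_min = np.sum((lo[1:-1] < lo[:-2]) & (lo[1:-1] < lo[2:]))
    return int(n_max), int(n_min)

def hull_signature(upper, lower, window=1):
    """Extrema counts of both hulls; two images with equal signatures fall in the same class."""
    if window > 1:
        upper, lower = smooth(upper, window), smooth(lower, window)
    return count_regional_extrema(upper) + count_regional_extrema(lower)
```

With window=1 a tiny wiggle in one hull adds an extra maximum and minimum, so the signatures differ; a small window such as 3 is often enough to remove it, at the risk of also removing genuine small features.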
As far as I can tell, the difficulty arises when the shape is the same and only the size differs. A simple hack could be:
- subtract the images, then erode. If the shapes are the same but one is slightly bigger, subtracting leaves only the edges, which are thin and vanish under erosion like noise.
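A minimal sketch of this hack, assuming both images are boolean masks of the same size and already aligned. For binary images the absolute difference is an XOR, and scipy.ndimage is used here instead of OpenCV; the function name and the single erosion pass are only illustrative:

```python
import numpy as np
from scipy import ndimage

def same_shape_after_erosion(a, b, iterations=1):
    """True if the difference between two masks is thin enough to vanish under erosion."""
    diff = np.logical_xor(a, b)                        # pixels where the images disagree
    residue = ndimage.binary_erosion(diff, iterations=iterations)
    return not residue.any()
```

A one-pixel difference in line thickness disappears after one erosion, while a different digit leaves a blob behind. Thick strokes in the differing part are needed for this to work, so very thin digits may vanish too.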
A somewhat more formal approach would be to take the contours, approximate them with polygons, and compare invariants (Hu moments, etc.).

How to create a single line/edge from a set of superimposing lines/edges in MATLAB?

I have a set of edges detected from an image using the edge detector of MATLAB's computer vision toolbox. All these edges (18 of them) just form two lines. How do I get the lines from these edges? All I am interested in is finding the intersection point of these two lines.
The edges look like this:
and the Hough lines look like this:
Peter Kovesi's CV website contains a great set of functions for line detection. Look at this example of using them.
Since you mentioned that the intention is to find the "center point" here goes a possible way (not MATLAB specific though):
Clarifications: when you mention
All these edges (18 of them) just form two lines
It's actually two components or contours that are formed. The Hough line transform will give you straight lines: not exactly what you wanted it seems.
Also, the two "lines" or "contours" do not intersect, at least from what can be seen in the picture. If you want to find the point of closest approach, traverse each point on one contour and compute the distance between that point and every point on the second contour. Find the minimum distance for each point on the first contour, then select the minimum of those.
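As the answer says, this is not MATLAB specific. Here is a brute-force sketch in Python with NumPy, assuming each contour is given as a list of (x, y) points; the function name is hypothetical:

```python
import numpy as np

def closest_approach(contour_a, contour_b):
    """Return the closest pair of points between two contours and their distance."""
    a = np.asarray(contour_a, dtype=float)
    b = np.asarray(contour_b, dtype=float)
    # Pairwise distances: entry (i, j) is the distance from a[i] to b[j].
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    return a[i], b[j], float(d[i, j])
```

The midpoint of the two returned points is one reasonable choice for the "center point". The distance matrix has len(a) * len(b) entries, which is fine for a few thousand edge pixels but not for very long contours.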
If you meant intersection of two straight lines, simply solve the two equations (you can get them from knowing the end-points of the lines).
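For the straight-line case, a short sketch that intersects two infinite lines, each defined by two end points (again in Python rather than MATLAB; the function name is hypothetical). It uses the usual determinant form of the solution:

```python
def line_intersection(p1, p2, p3, p4, eps=1e-12):
    """Intersect the line through p1, p2 with the line through p3, p4; None if parallel."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None                       # parallel or coincident lines
    det_a = x1 * y2 - y1 * x2
    det_b = x3 * y4 - y3 * x4
    x = (det_a * (x3 - x4) - (x1 - x2) * det_b) / denom
    y = (det_a * (y3 - y4) - (y1 - y2) * det_b) / denom
    return x, y
```

Note that the returned point may lie outside the two segments themselves, since the lines are extended to infinity.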

Image processing / super light OCR

I have 55 000 image files (in both JPG and TIFF format) which are pictures from a book.
The structure of each page is this:
some text
--- (horizontal line) ---
a number
some text
--- (horizontal line) ---
another number
some text
There can be from zero to 4 horizontal lines on any given page.
I need to find what the number is, just below the horizontal line.
BUT, numbers strictly follow each other, starting at one on page one, so in order to find the number, I don't need to read it: I could just detect the presence of horizontal lines, which should be both easier and safer than trying to OCR the page to detect the numbers.
The algorithm would be, basically:
for each image
count horizontal lines
print image name, number of horizontal lines
next image
The question is: what would be the best image library/language to do the "count horizontal lines" part?
Probably the easiest way to detect your lines is using the Hough transform in OpenCV (which has wrappers for many languages).
The OpenCV Hough transform will detect the lines in the image: HoughLines returns each line as a (rho, theta) pair, and the probabilistic variant HoughLinesP returns the start and end coordinates of line segments. You should keep only the ones that are close to horizontal and of adequate length.
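The filtering step might look like the sketch below. It assumes the segments come as (x1, y1, x2, y2) tuples, for example from cv2.HoughLinesP after reshaping its output with reshape(-1, 4). The function name and both thresholds are made-up starting points that would need tuning on the actual scans:

```python
import math

def horizontal_segments(segments, max_angle_deg=2.0, min_length=100.0):
    """Keep segments that are long enough and within max_angle_deg of horizontal."""
    kept = []
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        angle = abs(math.degrees(math.atan2(y2 - y1, x2 - x1)))
        angle = min(angle, 180.0 - angle)      # direction of travel does not matter
        if length >= min_length and angle <= max_angle_deg:
            kept.append((x1, y1, x2, y2))
    return kept
```

A single printed rule is often returned as several overlapping segments, so before counting you would probably still need to merge segments whose y coordinates are within a few pixels of each other.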
O'Reilly's Learning OpenCV explains in detail the function's input and output (p.156).
If you have good contrast, try running connected components and analyzing the result. It can be an alternative to finding lines through Hough, and it covers the cases where your structural elements are a bit curved or where a line algorithm picks up lines you don't want.
Connected components is a super fast, two-pass raster scan algorithm that gives you a mask in which every connected element is marked with a different label and accounted for. You can discard anything short (in terms of aspect ratio). Overall, this can be more general and faster, but probably a bit more involved, than running a Hough transform. The Hough transform, on the other hand, will be more tolerant of contrast artifacts and even accidental gaps in lines.
OpenCV has the function findContours() that find components for you.
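A sketch of the connected-components route for counting the horizontal rules on a page. It assumes the page has already been thresholded into a boolean mask where dark pixels are True, and it uses scipy.ndimage.label instead of OpenCV's findContours. The function name and both thresholds are assumptions to be tuned:

```python
import numpy as np
from scipy import ndimage

def count_horizontal_rules(dark_mask, min_width_frac=0.3, min_aspect=20.0):
    """Count connected components that are wide, thin, and span much of the page."""
    labels, _ = ndimage.label(dark_mask)
    page_width = dark_mask.shape[1]
    rules = 0
    for rows, cols in ndimage.find_objects(labels):    # bounding box of each component
        height = rows.stop - rows.start
        width = cols.stop - cols.start
        if width >= min_width_frac * page_width and width / height >= min_aspect:
            rules += 1
    return rules
```

If a rule touches a character of the surrounding text, the two merge into one component and the aspect ratio drops, so a small horizontal opening before labeling may be needed on noisy scans.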
You might want to try John Resig's OCR and Neural Nets in JavaScript.

Resources