Delphi - image colors detection [duplicate] - delphi

I'm writing a program that works with images and at some point I need to posterize the image. This means I need to bin the colors, but I'm having trouble deciding how to tell how close one color is to another.
Given a color in RGB, I can think of at least 2 ways to see how different they are:
|r1 - r2| + |g1 - g2| + |b1 - b2|
sqrt((r1 - r2)^2 + (g1 - g2)^2 + (b1 - b2)^2)
And if I move into HSV, I can think of other ways of doing it.
So I ask, ignoring speed, what is the best way to tell how similar two colors are? Best meaning most accurate to the human eye.

Well, if speed is not an issue, the most accurate way would be to take some sample images and apply the filter to them using various cutoff values for the distance (distance being determined by one of the equations on the Color_difference page that astander linked to, meaning you'd have to use one of those color spaces listed there with the calculations, then convert to sRGB or something [which also means that you'd need to convert the image into the other color space first if it's not in it to begin with]), and then have a large number of people examine the images to see what looks best to them, then go with the cutoff value for the images that the majority agrees looks best.
Basically, it's largely a matter of subjectiveness; in fact, it also depends on how stylized you want the images, and you might even want to add in some sort of control so that you can alter the cutoff distance on the fly.
If speed does become a bit of an issue and/or you want more simplicity, then just use your second choice for distance calculation (which is simply the CIE76 equation; just make sure to use the Lab* color space) with the cutoff being around 2 or 2.3.

What do you mean by "posterize the image"?
If you're trying to cluster the colors into bins, you should look at
cluster analysis

Just a comment if you are going to move to HSV (or similar spaces):
Diffing on H: difference between 0° and 359° is numerically big but perceptually is negligible.
H difference if V or S are small - is small.
For computer vision apps, more important not perceptual difference (used mostly by paint manufacturers) but are these colors belong to the same object/segment or not. Which means that we might partially ignore V, which can change from lighting conditions.

Related

Interpolation vs Average

I'm new of computer vision concepts and I'd like to know why, when we double the size of an image, we should use bilinear interpolation where pixels haven't values instead of average between nearest known values pixels.
I'm not sure I agree with the premise that you "should use bilinear interpolation". You shouldn't blindly use anything without thinking about it. For example, if your pixels represent the result of a classification and 1 represents wheat, and 2 represents water, and 3 represents barley, you certainly shouldn't take the average and assume that when you enlarge an image of wheat and barley that some ocean suddenly appears in the middle between the fields.
Bilinear interpolation is actually just averaging, except a) it is in 2 dimensions because images are inherently 2-dimensional and b) if you know you are nearer to one point than another, surely it isn't unreasonable to weight your "guesstimated" value (which, after all, you don't actually know) more towards the geometrically closer value?
I guess my answer is really that there are several types of interpolation, and you should apply some thinking to deciding which one is best for your particular circumstances. Sometimes you don't want to introduce new colours because of classification or palette issues, and in these circumstances you need "nearest neighbour". Sometimes "bilinear" is what you need, sometimes "bicubic".

Detecting common colors for groups of objects

I have an image with a collection of objects in K given perceived colors. Providing I extract those objects, how could I cluster them by their perceived color?
Let me give you an example. I am trying to cluster two football teams - so there will be two teams, referees and a keeper (or two, but that`s a rare situation) on the image - 3, 4 or 5 clusters.
For a human's eye, it`s an easy situation. On the picture above, we have white players, red players and a black ref. But it turns out not so easy for automatic processing.
What I have tried so far:
1) I've started working on the BGR colorspace, then tried HSV and now I am exploring CIE Luv, as I read it has unified distances describing the perceived differences between colors.
2) [BGR and HSV] taking the most common color from the contour (not the bounding box). this didn' work at all because of the noise (green field getting in the way), the quality of the image, the position of the player, etc. Colors were pretty much random.
3) [CIE Luv] Resizing all players' boxes to a common size and taking a small portion of the image from the middle (as marked by a black rectangle in the example below).
Taking the mean value of all pixels in each player's window and adding to the list (so, it`s one pixel with the mean value per player). Using K-means (with a defined number of clusters) to find out clusters on that list. This has proven somewhat successful, for the image above I have redish, white and blackish centres in the clusters.
Unfortunately, the assignment of players back to these clusters is pretty much random. I am doing that by calculating the mean color for each player like I described above and then measuring the distance to each cluster. A player might be assigned to the white cluster on one frame and to the red one on the next. Part of the problem might be that the window in the middle of the player's box will sometimes catch a number, grass or shorts, instead of the jersey.
I have already spent a considerable amount of time on trying to figure that out, grateful for any help.
I may be overcomplicating the problem since you just have 3 classes, but try training an SVM classifier based on HOG descriptors. maybe try LDA to improve speed
Some references -
1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.627.6465&rep=rep1&type=pdf - skip to recognition part
2] https://rodrigob.github.io/documents/2013_ijcnn_traffic_signs.pdf - skip to recognition part.
3] https://www.learnopencv.com/handwritten-digits-classification-an-opencv-c-python-tutorial/ - if you want to jump into the code right away
This will always work as long as your detection is good. and can also help to identify different players based on their shirt number
(maybe more???) if you train it right
EDIT: Okkay I have another idea, based on colour segmentation since that was your original approach and require less work (maybe not? color segmentation is a pain! also LIGHTING! LIGHTING! LIGHTING!).
Create a green mask and create a threshold so you detect as little grass as possible when doing your kmeans. Then instead of finding mean, try median instead, that will get you closer to red, coz white is detected as 0 and mean just drops drastically, median doesnt. So it'll be way more robust and you should be able to sort players better (hair color and skin color shouldnt affect it too much)
EDIT 2: Just noticed, if you use the black rectangle you'll get shirt number more (which is white), gonna mess up your classifier, use original box with green masked out
EDIT 3: Also. You can just create 3 thresholds for your required colors and split them up! don't really need Kmeans in this actually. Basically you just need your detected boxes to give out a value inside that threshold. Try the median method I mentioned above. Should improve. Also, might need some more minor tweaks here and there (blur, morphology etc to improve detection)

Image Segmentation for Color Analysis in OpenCV

I am working on a project that requires me to:
Look at images that contain relatively well-defined objects, e.g.
and pick out the color of n-most (it's generic, could be 1,2,3, etc...) prominent objects in some space (whether it be RGB, HSV, whatever) and return it.
I am looking into ways to segment images like this into the independent objects. Once that's done, I'm under the impression that it won't be particularly difficult to find the contours of the segments and analyze them for average or centroid color, etc...
I looked briefly into the Watershed algorithm, which seems like it could work, but I was unsure of how to generate the marker image for an indeterminate number of blobs.
What's the best way to segment such an image, and if it's using Watershed, what's the best way to generate the corresponding marker image of integers?
Check out this possible approach:
Efficient Graph-Based Image Segmentation
Pedro F. Felzenszwalb and Daniel P. Huttenlocher
Here's what it looks like on your image:
I'm not an expert but I really don't see how the Watershed algorithm can be very useful to your segmentation problem.
From my limited experience/exposure to this kind of problems, I would think that the way to go would be to try a sliding-windows approach to segmentation. Basically this entails walking the image using a window of a set size, and attempting to determine if the window encompasses background vs. an object. You will want to try different window sizes and steps.
Doing this should allow you to detect the object in the image, presuming that the images contain relatively well defined objects. You might also attempt to perform segmentation after converting the image to black and white with a certain threshold the gives good separation of background vs. objects.
Once you've identified the object(s) via the sliding window you can attempt to determine the most prominent color using one of the methods you mentioned.
UPDATE
Based on your comment, here's another potential approach that might work for you:
If you believe the objects will have mostly uniform color you might attempt to process the image to:
remove noise;
map original image to reduced color space (i.e. 256 or event 16 colors)
detect connected components based on pixel color and determine which ones are large enough
You might also benefit from re-sampling the image to lower resolution (i.e. if the image is 1024 x 768 you might reduce it to 256 x 192) to help speed up the algorithm.
The only thing left to do would be to determine which component is the background. This is where it might make sense to also attempt to do the background removal by converting to black/white with a certain threshold.

Remove Background Color or Texture Before OCR Processing

When a typical mobile phone user takes picture for a card-size object, some background texture is usually included in the image -- please refer to the attached samples. In certain cases, that background could pollute OCR's accuracy.
I am wondering that whether there are solutions or not to remove the background (am positive that there are), or detect the background regions so one can just crop them off before OCR. In case of the attached images, wood tables and counter-top presenting are the candidate being removed. I would imagine that contrasting colors could be a solution but not so sure.
There are certain cases where you, as a human, have trouble discerning between background and foreground, so certainly there is no method to do correctly what you want. Since you mention OCR, I assume you actually want to eliminate everything that is not text. This doesn't make the question any easier actually, so what I'm actually assuming is that you want to keep objects that are highly contrasted against other objects (like foreground and background, or black text on a white background, for example). Again, there is no perfect method for that.
So, all this answer is going to do is present a simple method that might help you in your task. The method is a combination of ready morphological tools and the Otsu method for binarization since it is statistically optimal. The result are the regions that are potentially worth to look at. Note that you will certainly need to combine these results with many other different analysis, a good OCR system goes much beyond these direct approaches.
The method: 1) convert the image to grayscale (not interested in the colors, but a different method can certainly use them); 2) Use the h-dome transform to remove irrelevant maxima; 3) Calculate the morphological gradient; 4) Binarize by otsu; 5) Remove small objects by area opening. Removing irrelevant maxima is important for your task since you can have pretty horrible regions caused by a combination of bad camera's with bad camera's flash together with a inexperienced photographer. H-dome transform is based on morphological reconstruction, so if your library has the latter but not the former, it is straightforward to implement it (otherwise you could learn how to efficiently implement the latter). Morphological gradient for discrete images is a very simple method to apply which tends to work fine even with bad illumination, since it is a local method. Threshold on its result by Otsu keeps the strongest edges (which possibly includes noise and other minor features). You could precede all this by a gaussian smoothing, which might serve as an initial tool for noise suppression. The small features are readily removed by area opening. In Matlab, this can be done as in:
f = rgb2gray(imread(yourimage));
se = strel('square', 3);
g = imhmax(f, 50); % h-dome with h = 50
g = imdilate(g, se) - imerode(g, se); % morphological gradient
h = im2bw(g, graythresh(g)); % graythresh applies Otsu's method
w = bwareaopen(h, 50);
assuming that objects smaller than 50 pixels are irrelevant (which might not always be the case for small text).
Here are the w images for your examples:
These outputs give an indication of where you should look for text, i.e., the interior of the connected components.

pattern recognition between two very different images

So, my problem is that I have to find common points between two images of a microchip. Here's an example of two images:
Between these two images, we can clearly see some common pattern like the wires on the bottom right of the first images that can be found in relatively the same place in the second image. Also, the sort of white Z shape in the first image can be seen in the second images, a bit harder, but it's there.
I tried to match them with SURF (OpenCV), found no common point at all. Tried to apply some filter on both images, like edge detection, thresholding, and other filter that I could found in GIMP, but whatever I tried, no common point were ever found.
I'd like to know if you have any idea to solve this problem ? My suggestion right now would be to manually match key features in both images with line segments, but preferably, it should be automated.
A solution that uses OpenCV would be preferable, but I'm looking for any suggestion possible. In OpenCV, all pattern matching situation that I saw were problems way more obvious that this one. No difference in color and so on.
Unless realtime is required, do a simple approach to test if rotation can be automated:
Circuit boards like the ones in the images, are often based on perpendicular straight line segments. Hence you can "despeckle" and remove stuff like coffee stains, by finding linesegments.
Think about creating a kernel, that have a line with dark pixels on one side, and bright pixels on the other. Fold it on the image (or cross-correlate it) to identify all pixels that have a sequence of bright/dark pixels which are nearly vertical or horizontal.
you may interlace to speed things up.
edges of stains and speckles may survive this, if you want angles close to 45* representatations!
The resulting image can be interpreted as a sparse pointcloud.
You can now use RANSAC or other similar approaches to describe many of the remaining correlations, as line segments.
* use a 2 point line segment as input model for RANSAC, Degrade if small.
* Determine infinite lines that have many inliers
* use growth or binninng approaches to segmentate lines.
benefits:
high likelyhood of line segment representations that are actually present as circuitry in image. 2 point description of segments, possible transforms are easy.
easy interpretation of data, as it can be overlayed in openCV
Rotation should be easily found as the rotation that matches most found lines to horizontal and/or vertical axis'es.
apply rotation.
repeat for both images.
now you can determine best translation between the images, by simple x,y cross correlation.
If the top image is always of that quality (quasi bilevel patterns, easy edge detection), I would try a good geometric matching algorithm (such as Cognex or Halcon), training with the top image and searching the bottom one.
Maybe it is worth to first compensate rotation (I hope there is no scaling). You would do that by determining the dominant edge direction, possibly using a Hough transform. Or, much better, by careful mechanical alignment of the sensors.
Anyway, chances of success are low, this is a difficult problem.

Resources