Given an image (2048x2018) in which more than 90% of the pixels are random noise, there are several pixels of the same color distributed across the entire image. I want to remove the noise and leave only the pixels of the repeated colors. Let's say that, out of all the random pixels, 10 red, 15 orange and 14 black pixels should be left.
However, the colors of the repeating pixels are not known beforehand, and there are multiple images with different colors. So I'm trying to write an algorithm that filters an image until only the repeating colors are left. The number of repeating colors it finds should be as close as possible to the actual number, but not necessarily exact.
I think this could be achieved by building a network to filter the noise out, but the data seem too random for conventional networks, since there is no pattern apart from the repeating pixels. Is there a way to classify distributed values without any real pattern? Or is there another way to do it without brute-force sorting of every color?
Sounds like you need to use some sort of pooling, specifically average pooling. You could use pooling first to find out which colors are dominant in the image, and then remove the pixels of all colors other than the dominant ones to leave only the colors you want.
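For instance, here is a rough sketch of that "find the dominant colors, then drop everything else" idea, using an exact per-color count (numpy's np.unique) rather than pooling; the file names and the repeat-count cutoff are assumptions to tune:

import numpy as np
from PIL import Image

# Hypothetical input file; any RGB image will do.
img = np.array(Image.open("noisy.png").convert("RGB"))
flat = img.reshape(-1, 3)

# Count how often each exact RGB triple occurs in the image.
colors, counts = np.unique(flat, axis=0, return_counts=True)

# Colors repeated at least `min_count` times are treated as signal;
# the cutoff is an assumption you would tune per image.
min_count = 5
keep_set = {tuple(c) for c in colors[counts >= min_count]}

# Keep only pixels whose color is in the repeated set (pure-Python loop:
# simple but slow on a full-size image).
mask = np.array([tuple(p) in keep_set for p in flat]).reshape(img.shape[:2])
out = np.full_like(img, 255)          # white background
out[mask] = img[mask]                 # repeated-color pixels stay in place
Image.fromarray(out).save("filtered.png")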
Related
I have an image with a collection of objects in K given perceived colors. Provided I can extract those objects, how could I cluster them by their perceived color?
Let me give you an example. I am trying to cluster two football teams, so there will be two teams, the referees and a keeper (or two, but that's a rare situation) in the image: 3, 4 or 5 clusters.
To a human eye it's an easy situation. In the picture above, we have white players, red players and a referee in black. But it turns out not to be so easy for automatic processing.
What I have tried so far:
1) I started working in the BGR colorspace, then tried HSV, and now I am exploring CIE Luv, as I read it has uniform distances describing the perceived differences between colors.
2) [BGR and HSV] Taking the most common color from the contour (not the bounding box). This didn't work at all because of the noise (the green field getting in the way), the quality of the image, the position of the player, etc. The colors were pretty much random.
3) [CIE Luv] Resizing all players' boxes to a common size and taking a small portion of the image from the middle (as marked by a black rectangle in the example below).
Then taking the mean value of all pixels in each player's window and adding it to a list (so, one pixel with the mean value per player), and using K-means (with a defined number of clusters) to find the clusters in that list. This has proven somewhat successful: for the image above I get reddish, white and blackish cluster centres.
Unfortunately, the assignment of players back to these clusters is pretty much random. I do that by calculating the mean color for each player as described above and then measuring the distance to each cluster centre. A player might be assigned to the white cluster in one frame and to the red one in the next. Part of the problem might be that the window in the middle of the player's box will sometimes catch a number, grass or shorts instead of the jersey.
I have already spent a considerable amount of time trying to figure this out, so I'm grateful for any help.
I may be overcomplicating the problem since you have just 3 classes, but try training an SVM classifier on HOG descriptors. You could also try LDA to improve speed.
Some references -
1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.627.6465&rep=rep1&type=pdf - skip to recognition part
2] https://rodrigob.github.io/documents/2013_ijcnn_traffic_signs.pdf - skip to recognition part.
3] https://www.learnopencv.com/handwritten-digits-classification-an-opencv-c-python-tutorial/ - if you want to jump into the code right away
This will always work as long as your detection is good, and it can also help you identify individual players based on their shirt numbers (maybe more?) if you train it right.
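For example, a minimal HOG + SVM sketch with OpenCV and scikit-learn; the 64x128 window size and the way the labelled crops are collected are assumptions:

import cv2
import numpy as np
from sklearn.svm import LinearSVC

# OpenCV's default HOG window is 64x128, so player crops are resized to match.
hog = cv2.HOGDescriptor()

def hog_features(bgr_crop):
    # Resize a detected player box and compute its HOG descriptor.
    gray = cv2.cvtColor(cv2.resize(bgr_crop, (64, 128)), cv2.COLOR_BGR2GRAY)
    return hog.compute(gray).ravel()

def train(crops, labels):
    # crops: list of BGR player boxes; labels: 0 = team A, 1 = team B, 2 = referee.
    # How you collect and label them is up to your detector.
    X = np.vstack([hog_features(c) for c in crops])
    clf = LinearSVC()
    clf.fit(X, labels)
    return clf

def predict(clf, crop):
    return int(clf.predict(hog_features(crop).reshape(1, -1))[0])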
EDIT: Okay, I have another idea, based on colour segmentation, since that was your original approach and it requires less work (maybe not? colour segmentation is a pain! also LIGHTING! LIGHTING! LIGHTING!).
Create a green mask and set a threshold so that you detect as little grass as possible when doing your K-means. Then, instead of the mean, try the median: it will get you closer to red, because white is detected as 0 and drags the mean down drastically, while the median doesn't move. It will be much more robust, and you should be able to sort players better (hair colour and skin colour shouldn't affect it too much).
EDIT 2: Just noticed: if you use the black rectangle you'll mostly pick up the shirt number (which is white), which will mess up your classifier; use the original box with the green masked out.
EDIT 3: Also, you could just create 3 thresholds for your required colours and split them up; you don't really need K-means for this at all. Basically you just need your detected boxes to give out a value inside one of those thresholds. Try the median method I mentioned above; it should improve things. You might also need some more minor tweaks here and there (blur, morphology, etc.) to improve detection.
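A rough OpenCV sketch of the green-mask plus median idea from EDIT 2/3; the HSV green range and the reference colours are assumptions to tune for your footage:

import cv2
import numpy as np

def player_color(bgr_box):
    # Median colour of a player box after masking out the grass.
    hsv = cv2.cvtColor(bgr_box, cv2.COLOR_BGR2HSV)
    # Rough green range in OpenCV's HSV (H runs 0..179); tune for your footage.
    grass = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))
    keep = bgr_box[grass == 0]
    if keep.size == 0:
        return None
    # The median per channel is far less sensitive to white numbers,
    # skin or hair than the mean.
    return np.median(keep, axis=0)

def assign(color, references):
    # Nearest of K reference colours (e.g. taken from a few hand-labelled boxes);
    # this replaces running K-means on every frame.
    return int(np.argmin([np.linalg.norm(color - r) for r in references]))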
I have seen many machine learning (CNN) tutorials that convert the input image to grayscale. I want to know how the model will understand the original colors, or use color as an identification criterion, if the images are converted to grayscale throughout model creation.
With respect to colours, there can be 2 cases in an image processing problem:
Colours are not relevant to object identification
In this case, converting a coloured image to a grayscale image will not matter, because the model will eventually be learning from the geometry present in the image. Image binarization can also help sharpen the image by separating the light and dark areas.
Colours are relevant to object identification
As you might know, all colours can be represented as some combination of the three primary RGB colours, and each of the R, G and B values usually varies from 0 to 255 per pixel. In grayscale, however, a pixel value is one-dimensional instead of three-dimensional and just varies from 0 to 255. So yes, there is some information loss in terms of actual colours, but that is the trade-off against image sharpness.
So there can be a combined score of the R, G and B values at each pixel (for example their mean, (R+G+B)/3), which gives a number between 0 and 255 that can be used as that pixel's representative. Instead of specific colour information, the pixel then just carries intensity information.
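A tiny sketch of that intensity-only representation: the plain channel average as described, plus, for comparison, the luminosity-weighted formula that libraries such as PIL actually use; the file name is a placeholder.

import numpy as np
from PIL import Image

rgb = np.array(Image.open("photo.png").convert("RGB"), dtype=np.float32)

# Plain average of the three channels, as described above: one value per pixel.
gray_mean = rgb.mean(axis=2).astype(np.uint8)

# For comparison, the luminosity-weighted version most libraries use
# (e.g. PIL's convert("L")) weights green more heavily:
gray_weighted = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1]
                 + 0.114 * rgb[..., 2]).astype(np.uint8)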
Reference:
https://en.wikipedia.org/wiki/Grayscale
I would like to add to Shashank's answer.
When a model is fed an image, it does not perceive it as we do. Humans perceive images through variations in colour, the saturation of those colours and their brightness, and we are also able to recognize objects and other shapes.
A model, however, sees an image as a matrix filled with a bunch of numbers (if it is a grayscale image). In the case of a colour image, it sees three matrices stacked on top of one another, filled with numbers from 0 to 255.
So how does it learn colour? Well, it doesn't. What it does learn is the variation in the numbers within this matrix (in the case of a grayscale image). These variations are crucial for detecting changes in the image. If the CNN is trained in this respect, it will be able to detect structure in the image and can also be used for object detection.
I have a sheet of paper on another surface. I want to figure out where the paper ends and the surface begins.
I'm using this approach:
1. Convert image to pixel array
2. Pick 3 random 20x20 squares and frequency count the colors
3. The color with the highest frequency is the background
However, the problem is that I get over 100 distinct colors every time I do this on an actual image taken by the camera.
I think I can fix it by reducing the image to a 16-color palette. Is there a way to make this change on a UIImage or CGImage?
Thanks
Your colours are probably very close together. How about calculating the distance (the cumulative absolute difference between the red, green and blue values) from each sampled colour to a reference colour? Just use the first one you sample as the reference. If the distance is large, you have a different colour; if the distance is small, you have the same colour with minor variations due to lighting or other camera artefacts.
Basically this is applying a filter in a very simple manner. It is up to you to decide how big the difference has to be for the colours to be considered different, but you could decide that by looking at the median difference of all the colours and grouping the samples into those over and under it.
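A minimal sketch of that distance-and-grouping idea in plain Python; the tolerance value is an assumption you would pick as described above.

def color_distance(c1, c2):
    # Cumulative absolute difference between two (r, g, b) tuples.
    return sum(abs(a - b) for a, b in zip(c1, c2))

def group_samples(samples, tolerance=30):
    # Same colour if within `tolerance` of a previously seen reference,
    # otherwise start a new group; `tolerance` is the knob described above.
    groups = []                       # list of (reference_color, members)
    for c in samples:
        for ref, members in groups:
            if color_distance(c, ref) <= tolerance:
                members.append(c)
                break
        else:
            groups.append((c, [c]))
    return groups

# The reference colour of the largest group is then your background colour:
# background, _ = max(group_samples(samples), key=lambda g: len(g[1]))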
You might also get good results from applying a Core Image filter to the sample images, such as CIColorClamp (CISpotColor looks better but is OS X only). If you can find a suitable filter, there is a good chance it will be simpler and faster than doing it yourself.
How can I get rid of uneven illumination in images that contain text, usually printed but possibly handwritten? The images can have bright spots because light was reflected when the picture was taken.
I've seen the Halcon program's segment_characters function do this job perfectly, but it is not open source.
I want to convert the image into one that has constant illumination in the background and darker regions of text, so that binarization is easy and noise-free.
The text is assumed to be darker than its background.
Any ideas?
Strictly speaking, assuming you have access to the image's pixels (how to do that in your programming language is abundantly documented online), the exercise involves going over the pixels once to determine a "darkness threshold". To do this, convert each pixel from RGB to HSL to get its lightness component; during this pass, calculate the average lightness of the whole image, which you can use as your "darkness threshold".
Once you have the image's average lightness level, go over the pixels once more: if a pixel's lightness is below the darkness threshold, set its color to full black, RGB(0,0,0); otherwise, set it to full white, RGB(255,255,255). This gives you a binary image in which the text should be black and the rest white.
Of course, the key is finding an appropriate darkness threshold, so if the average method doesn't give you good results you may have to come up with a different way to augment that step. One such method is to separate the image into its primary red, green and blue channels, compute a darkness threshold for each channel separately, and then use the most aggressive of the three.
And lastly, a better approach may be to compute the distribution of lightness levels, as opposed to simply the average, and then keep the range around the maximum. Again, go over each pixel: if its lightness falls within that band make it black, otherwise make it white.
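A minimal sketch of the first, average-lightness variant, using OpenCV's HLS conversion (its L channel) instead of a hand-rolled RGB-to-HSL step; the file name is a placeholder:

import cv2
import numpy as np

img = cv2.imread("page.jpg")                         # placeholder file name
lightness = cv2.cvtColor(img, cv2.COLOR_BGR2HLS)[:, :, 1]

# Pass 1: the "darkness threshold" is the mean lightness of the whole image.
threshold = lightness.mean()

# Pass 2: pixels darker than the threshold become black (text),
# everything else becomes white (background).
binary = np.where(lightness < threshold, 0, 255).astype(np.uint8)
cv2.imwrite("binary.png", binary)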
EDIT
For further reading about HSL, I recommend starting with the Wikipedia entry on the HSL and HSV color spaces.
Have you tried morphological techniques? Closing-by-reconstruction (as presented in Gonzalez, Woods and Eddins) can be used to create a grayscale estimate of the background illumination levels. You can more or less standardize the effective illumination by:
1) Calculating the mean intensity of all the pixels in the image
2) Using closing-by-reconstruction to estimate the background illumination levels
3) Subtracting the output of (2) from the original image
4) Adding the mean intensity from (1) to every pixel in the output of (3)
Basically what closing-by-reconstruction does is remove all image features that are smaller than a certain size, erasing the "foreground" (the text you want to capture) and leaving only the "background" (illumination levels) behind. Subtracting the result from the original image leaves behind only the small-scale deviations (the text). Adding the original average intensity to those deviations simply makes the text readable, so that the resulting picture looks like a light-normalized version of the original image.
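A sketch of steps 1) to 4) using scikit-image's grayscale reconstruction, which is one way to implement closing-by-reconstruction; the disk radius is an assumption that must exceed the stroke width, and the file names are placeholders:

import numpy as np
from skimage import io, morphology

img = io.imread("page.png", as_gray=True).astype(float)   # values in [0, 1]

# Step 2: closing by reconstruction, i.e. dilate, then reconstruct by erosion.
# The disk must be larger than the text strokes (radius 15 is a guess).
dilated = morphology.dilation(img, morphology.disk(15))
background = morphology.reconstruction(dilated, img, method="erosion")

# Steps 1, 3, 4: subtract the background estimate, then add the original
# mean intensity back so the result looks like a light-normalized page.
normalized = np.clip(img - background + img.mean(), 0, 1)
io.imsave("normalized.png", (normalized * 255).astype(np.uint8))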
Use local thresholding instead of a global thresholding algorithm.
Divide your (grayscale) image into a grid of smaller images (say 50x50 px) and apply the thresholding algorithm to each individual tile.
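For instance, a minimal sketch of that grid idea, using Otsu's method as the per-tile threshold; the choice of Otsu and the tile size are assumptions:

import numpy as np
from skimage.filters import threshold_otsu

def local_threshold(gray, tile=50):
    # Binarize each tile of a grayscale image with its own Otsu threshold.
    out = np.zeros_like(gray, dtype=np.uint8)
    h, w = gray.shape
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = gray[y:y + tile, x:x + tile]
            if block.max() == block.min():
                out[y:y + tile, x:x + tile] = 255    # flat tile: call it background
                continue
            t = threshold_otsu(block)
            out[y:y + tile, x:x + tile] = (block > t) * 255
    return out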
If the background features are generally larger than the letters, you can try to estimate and subsequently remove the background.
There are many ways to do that; a very simple one is to run a median filter over your image. You want the filter window to be large enough that text inside the window rarely makes up more than a third of the pixels, but small enough that several windows fit into the bright spots. This filter should produce an image without text, containing only the background. Subtract that from the original, and you should have an image that can be segmented with a global threshold.
Note that if the bright spots are much smaller than the text, you do the inverse: choose the filter window so that it removes only the light spots.
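A minimal OpenCV sketch of the median-filter version; the 51 px window is an assumption to size by the rule of thumb above, and the file names are placeholders:

import cv2
import numpy as np

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# A large median filter erases the text and keeps only the background/illumination.
# The window (51 px, must be odd) should be sized by the rule of thumb above.
background = cv2.medianBlur(gray, 51)

# Subtract the background estimate in a wider type so dark text stays negative,
# re-centre around mid-gray, then finish with a single global Otsu threshold.
diff = gray.astype(np.int16) - background.astype(np.int16) + 128
diff = np.clip(diff, 0, 255).astype(np.uint8)
_, bw = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite("binarized.png", bw)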
The first thing you should try is changing the lighting: use a dome light or some other source that gives you more diffuse, even illumination.
If that's not possible, you can try some of the ideas in this question or this one. You want to implement some type of "adaptive threshold", which applies a local threshold to individual parts of the image so that the change in contrast is less noticeable.
There is also a simple but effective method explained here. The outline of the algorithm is the following (a sketch follows the list):
1) Split the image up into NxN regions or neighbourhoods
2) Calculate the mean or median pixel value for each neighbourhood
3) Threshold the region based on the value calculated in 2), or on that value minus C (where C is a chosen constant)
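This mean-of-the-neighbourhood-minus-C scheme is what OpenCV's adaptiveThreshold implements; a minimal sketch, with blockSize and C as the values to tune and the file names as placeholders:

import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# blockSize is the NxN neighbourhood (must be odd) and C the constant from the
# outline above; 51 and 10 are starting guesses to tune.
bw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                           cv2.THRESH_BINARY, 51, 10)
cv2.imwrite("binarized.png", bw)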
It seems like what you're trying to do is improve local contrast while attenuating larger-scale lighting variations. I agree with the other posters that optimizing the image through better lighting should always be the first move.
After that, here are two tricks.
1) Use the smooth_image() operator to convolve a Gaussian with your original image. Use a relatively large kernel, like 20-50 px. Then subtract this blurred image from your original image. Apply scale and offset within the sub_image() operator, or use equ_histo() to equalize the histogram.
This basically subtracts the low-spatial-frequency information from the original, leaving the higher-frequency information intact.
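For readers without Halcon, a rough OpenCV equivalent of trick 1 might look like the sketch below; the sigma, the mid-gray offset and the file name are assumptions:

import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# A large Gaussian (sigma around 20 px here) keeps only the slow illumination changes.
blurred = cv2.GaussianBlur(gray, (0, 0), sigmaX=20)

# Original minus blurred, re-centred at mid-gray: low frequencies removed,
# the text (high frequency) preserved; 128 plays the role of the offset.
highpass = cv2.addWeighted(gray, 1.0, blurred, -1.0, 128)

# Optional contrast stretch, roughly what the histogram equalization step does.
highpass = cv2.equalizeHist(highpass)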
2) You could try the highpass_image() operator, or one of the Laplacian operators, to extract a gradient image.
I am trying to teach my camera to be a scanner: I take pictures of printed text and then convert them to bitmaps (and then to DjVu and OCR them). I need to compute a threshold for which pixels should be white and which black, but I'm stymied by uneven illumination. For example, if the pixels in the center are dark enough, I'm likely to wind up with a bunch of black pixels in the corners.
What I would like to do, under relatively simple assumptions, is compensate for uneven illumination before thresholding. More precisely:
Assume one or two light sources, maybe one with a gradual change in intensity across the surface (ambient light) and another following an inverse-square falloff (direct light).
Assume that the white parts of the paper all have the same reflectivity/albedo/whatever.
Find some algorithm to estimate degree of illumination at each pixel, and from that recover the reflectivity of each pixel.
From a pixel's reflectivity, classify it as white or black.
I have no idea how to write an algorithm to do this. I don't want to fall back on least-squares fitting, since I'd like to somehow ignore the dark pixels when estimating illumination. I also don't know whether the algorithm will work.
All helpful advice will be upvoted!
EDIT: I've definitely considered chopping the image into pieces that are large enough that they still look like "text on a white background" but small enough that the illumination of a single piece is more or less even. I think that if I then interpolate the thresholds so that there's no discontinuity across sub-image boundaries, I will probably get something halfway decent. This is a good suggestion, and I will have to give it a try, but it still leaves me with the problem of where to draw the line between white and black. More thoughts?
EDIT: Here are some screenshots from GIMP showing different histograms and the "best" threshold value (chosen by hand) for each histogram. In two of the three, a single threshold for the whole image is good enough. In the third, however, the upper-left corner really needs a different threshold:
I'm not sure if you still need a solution after all this time, but if you do: a few years ago my team and I photographed about 250,000 pages with a camera and converted them to (almost black-and-white) grayscale images, which we then turned into DjVu (and also made PDFs of).
(See The catalogue and complete collection of photographic facsimiles of the 1144 paper transcripts of the French Institute of Pondicherry.)
We also ran into the problem of uneven illumination. We came up with a simple, unsophisticated solution which worked very well in practice. This solution should also work for creating black-and-white images rather than grayscale (as I'll describe).
The camera and lighting setup
a) We taped an empty picture frame to the top of a table to keep our pages in exactly the same position.
b) We put a camera on a tripod, also on top of the table, pointing down at the taped picture frame. On a bar about a foot wide, attached to the external flash holder on top of the camera, we mounted two "modelling lights". These can be purchased at any good camera shop and are designed to provide even illumination. The camera was shaded from the lights by putting a small cardboard box around each modelling light. We photographed in grayscale, which we then processed further. (Our pages were old browned paper with blue ink writing, so your case should be simpler.)
Processing of the images
We used the free software package IrfanView.
This software has a batch mode which can simultaneously do color correction, change the bit depth and crop the images. We would take the photograph of a page and then, in interactive mode, adjust the brightness, contrast and gamma settings until it was close to black and white. (We used grayscale, but by setting the bit depth to 2 you will get black and white when you batch-process all the pages.)
After determining the best color correction we then interactively cropped a single image and noted the cropping settings. We then set all these settings in the batch mode window and processed the pages for one book.
Creating DjVu images.
We used the free DjVu Solo 3.1 to create the DjVu images. This has several modes to create the DjVu images. The mode which creates black and white images didn't work well for us for photographs, but the "photo" mode did.
We didn't OCR (since the images were handwritten Sanskrit), but as long as the letters are evenly illuminated, I think your OCR software should ignore big black areas like the gap in a two-page spread. You can always get rid of the black between a two-page spread, or at the edges, by cropping the pages twice, once for the left-hand pages and once for the right-hand pages; IrfanView will also let you number your pages cleverly so you can re-merge them in the correct order, i.e. rename your pages something like page-xxxA for left-hand pages and page-xxxB for right-hand pages, and the pages will then sort correctly by name.
If you still need a solution I hope some of the above is useful to you.
I would recommend calibrating the camera, given that your lighting setup is fixed (that is, the lights do not move between pictures) and your camera is grayscale (not color).
Take a picture of a white sheet of paper which covers the whole workable area of your "scanner". Store this picture: it tells you what white paper looks like at each pixel. Now, when you take a picture of a document to scan, you can reload your "white reference picture" and even out the illumination before thresholding.
Let's call the white reference REF, the picture DOC, the evened-out picture EVEN, and the maximum value of a pixel MAX (for 8-bit imaging it is 255). For each pixel:
EVEN = DOC * (MAX/REF)
Notes:
Beware of the parentheses: most image processing libraries use the image's pixel type for computations on pixel values, and a simple multiplication will overflow your pixels. If necessary, write the loop yourself and use a 32-bit integer for the intermediate computations.
The white reference image can be smoothed before being used in the process. Any smoothing or blurring filter will do, and don't hesitate to apply it aggressively.
The MAX value in the formula above represents the target pixel value in the resulting image. Using the maximum pixel value targets a bright white, but you can adjust this value to target a light gray instead.
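A minimal sketch of EVEN = DOC * (MAX / REF) in Python/OpenCV, doing the arithmetic in floating point to avoid the overflow mentioned in the notes; the file names are placeholders:

import cv2
import numpy as np

REF = cv2.imread("white_reference.png", cv2.IMREAD_GRAYSCALE)  # blank white page
DOC = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)         # page to scan
MAX = 255.0                                                    # 8-bit white target

# Smooth the reference aggressively, as suggested in the notes.
REF = cv2.GaussianBlur(REF, (0, 0), sigmaX=25).astype(np.float64)
REF = np.maximum(REF, 1.0)            # avoid division by zero

# EVEN = DOC * (MAX / REF), computed in float, then clipped back to 8 bit.
EVEN = np.clip(DOC.astype(np.float64) * (MAX / REF), 0, 255).astype(np.uint8)

# A single global threshold is now usually enough.
_, bw = cv2.threshold(EVEN, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)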
Well, usually the image processing I do is highly time-sensitive, so a complex algorithm like the one you're seeking wouldn't work for me. But have you considered chopping the image up into smaller pieces and re-scaling each sub-image? That should make the 'dark' pixels stand out fairly well even in an image with variable lighting conditions (I am assuming here that you are talking about a standard, mostly white page with dark text).
It's a cheat, but a lot easier than the 'right' way you're suggesting.
This might be horrendously slow, but what I'd recommend is to break the scanned surface into quarters or sixteenths and re-color them so that the average grayscale level is similar across the page. (This might break if you have pages with large margins, though.)
I assume that you are taking images of (relatively) small black letters on a white background.
One approach could be to "remove" the small black objects while keeping the illumination variations of the background. This gives an estimate of how the image is illuminated, which can then be used to normalize the original image. It is often enough to subtract the illumination estimate from the original image and then do a threshold-based segmentation.
This approach is based on grayscale morphological filters and could be implemented in MATLAB as below:
img = imread('filename.png');
% Grayscale closing removes the dark text, leaving the bright background estimate.
illumination = imclose(img, strel('disk', 10));
% Subtract in this order (bottom-hat) so the uint8 result stays non-negative:
% the text ends up bright on a flat background.
imgCorrected = illumination - img;
thresholdValue = graythresh(imgCorrected);   % normalized threshold in [0, 1]
bw = im2bw(imgCorrected, thresholdValue);
For an example with real images, take a look at this guide from MathWorks. For further reading on the use of morphological image analysis, this book by Pierre Soille can be recommended.
Two algorithms come to my mind:
A high-pass filter to alleviate the low-frequency illumination gradient
A local threshold with an appropriate radius
Adaptive thresholding is the keyword. To quote a 2003 article by R. Fisher, S. Perkins, A. Walker, and E. Wolfart: “This more sophisticated version of thresholding can accommodate changing lighting conditions in the image, e.g. those occurring as a result of a strong illumination gradient or shadows.”
ImageMagick's -lat option can do it, for example:
convert -lat 50x50-2000 input.jpg output.jpg
(The original post showed input.jpg and output.jpg as before-and-after images.)
You could try using an edge detection filter, then a flood-fill algorithm, to distinguish the background from the foreground. Interpolate over the flood-filled region to estimate the local illumination; you may also be able to modify the flood-fill algorithm to use the local background value to jump across lines and fill boxes and so forth.
You could also try a threshold hysteresis with a rate-of-change control. Here is the link to normal threshold hysteresis. Set the first threshold to a typical white value, and the second threshold to less than the lowest white value in the corners.
The difference is that you want to check the difference between pixels for all values between the first and second thresholds. Ideally, if the difference is positive, act normally; but if it is negative, only threshold if the difference is small.
This will compensate for lighting variations while ignoring the large changes between the background and the text.
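The rate-of-change control described here is custom, but plain hysteresis thresholding is available off the shelf (scikit-image) and can serve as a starting point; the two threshold values below are placeholders to read off your own histogram, and the file names are assumptions:

import numpy as np
from skimage import io
from skimage.filters import apply_hysteresis_threshold

gray = io.imread("page.png", as_gray=True)        # float image in [0, 1]

# high is roughly a typical white value, low just under the darkest corner white;
# both are placeholders to read off your own histogram.
low, high = 0.55, 0.80
background = apply_hysteresis_threshold(gray, low, high)

# Pixels connected (through the low threshold) to clearly white pixels count as
# background; everything else is treated as text.
text_mask = ~background
io.imsave("text_mask.png", (text_mask * 255).astype(np.uint8))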
Why don't you use simple opening and closing operations?
Try this, and just look at the results:
src = the source image
src - open(src)
close(src) - src
Look in particular at the close(src) - src result.
By using different window sizes, you will get the background of the image.
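In OpenCV terms, those two differences are the white top-hat (src - open(src)) and the black top-hat (close(src) - src); a minimal sketch, with the structuring-element size as the window size to experiment with and the file name as a placeholder:

import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# The structuring-element size is the "window size" to experiment with.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (31, 31))

tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)      # src - open(src)
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)  # close(src) - src

# For dark text on a bright, unevenly lit background, the black top-hat makes the
# text bright on a flat background and thresholds easily.
_, bw = cv2.threshold(blackhat, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)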
I think this helps.