Efficiently analyze dominant color in UIImage - ios

I am trying to come up with an efficient algorithm to query the top k most dominant colors in a UIImage (jpg or png). By dominant color I mean the color that is present in the largest number of pixels. The use case is mostly geared toward finding the top-1 (single most dominant) color to figure out the image's background. Below I document the algorithm I am currently trying to implement; I would like some feedback on it.
First, my algorithm takes an image and draws it into a bitmap context. I do this so I have control over how many bytes per pixel will be present in the image data I parse.
Second, after drawing the image, I loop through every pixel in the image buffer. I quickly realized that looping through every pixel would not scale; it became a huge performance bottleneck. To optimize this, I realized I need not analyze every pixel in the image. I also noticed that as I scaled down an image, its dominant colors became more prominent, and there were far fewer pixels to loop through. So the second step is actually to loop through every pixel in the resized image buffer.
Third, as I loop through each pixel I build up a CountedSet of UIColor objects, which basically keeps a histogram of color counts.
Finally, once I have my counted set it is very easy to then loop through it and respond to top-k dominant color queries.
So my algorithm, in short, is to resize the image (scale it down by some function proportional to the image's size), draw it into a bitmap context buffer once and cache it, then loop through all the data in the buffer and build a histogram.
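The histogram step described above can be sketched roughly as follows. This is only an illustration in Python rather than the asker's iOS code: it assumes the bitmap context produced a flat RGBA buffer with 4 bytes per pixel, and `dominant_colors` is a hypothetical helper name.

```python
from collections import Counter

def dominant_colors(rgba: bytes, k: int = 1):
    """Return the k most frequent (r, g, b) colors in a flat RGBA buffer."""
    # Step through the buffer four bytes at a time and ignore alpha.
    histogram = Counter(
        (rgba[i], rgba[i + 1], rgba[i + 2])
        for i in range(0, len(rgba) - 3, 4)
    )
    return histogram.most_common(k)

# Three white pixels and one red pixel.
buffer = bytes([255, 255, 255, 255] * 3 + [255, 0, 0, 255])
print(dominant_colors(buffer, k=2))
# [((255, 255, 255), 3), ((255, 0, 0), 1)]
```

A counted set of color objects plays the same role as the `Counter` here; keying on plain integer tuples just avoids allocating one object per pixel.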
My question to the Stack Overflow community is: how efficient is my algorithm, and are there any gains or further optimizations I can make? I just need something that gives me reasonable performance, and in my performance testing this seemed to work pretty well. Furthermore, how accurate will this be? The rescale operation in particular worries me. Am I trading a significant amount of accuracy for performance here? At the end of the day this will mostly just be used to determine the background color of an image.
Ideas for potential performance improvements:
1) When analyzing a single dominant color, do some math to determine whether I have already found the most dominant color based on the number of pixels analyzed, and exit early.
2) For the top-k query, answer it quickly by leveraging a binary heap (the typical top-k query algorithm).
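Both ideas can be sketched in a few lines. The early-exit test below uses one simple sufficient condition, a strict majority, which is an assumption on my part (a tighter bound would compare the leader against the runner-up plus the remaining pixels). The function names are hypothetical.

```python
import heapq
from collections import Counter

def most_dominant_early_exit(pixels):
    """Return (color, pixels_examined), stopping once a color has a strict majority."""
    if not pixels:
        raise ValueError("no pixels to analyze")
    total = len(pixels)
    counts = Counter()
    for examined, color in enumerate(pixels, start=1):
        counts[color] += 1
        # A color seen in more than half of all pixels cannot be overtaken.
        if counts[color] * 2 > total:
            return color, examined
    # No majority color: fall back to the full histogram.
    return counts.most_common(1)[0][0], total

def top_k(counts, k):
    """Top-k colors using a bounded heap: O(m log k) for m distinct colors."""
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])

pixels = ["white"] * 6 + ["red"] * 3 + ["blue"]
print(most_dominant_early_exit(pixels))  # ('white', 6)
print(top_k(Counter(pixels), 2))         # [('white', 6), ('red', 3)]
```

The early exit only pays off when the background really covers more than half of the image; otherwise the loop runs to the end as before.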

You can use some performance tweaks to avoid the downscaling altogether. Since we cannot see your implementation, it is hard to say where the bottleneck really is. So here are some pointers on what to check or improve. Keep in mind that I do not code for your environment, so take this with a grain of salt:
pixel access
Most pixel access functions I have seen are slow, especially functions named putpixel, getpixel, pixels, and so on, because on every single pixel access they do too many sanity/safety checks and color/space/address conversions. Use direct pixel access instead. Most image interfaces I have seen offer some kind of ScanLine[] access that gives you a direct pointer to a single line of the image. If you fill your own array of pointers with these, you get direct pixel access without the slowdowns. This usually speeds up the algorithm by a factor of 100 to 10000 on most platforms (depending on the usage).
To check for this, try to read or fill a 1024*1024 image at 32 bits per pixel and measure the time. On a standard PC it should take a few [ms] or less. With slow access it could take seconds. For more info see Display an array of color in C
Dominant color
If direct pixel access (#1) is still not fast enough, you can take advantage of the fact that the dominant color has the highest probability in the image. So in theory you do not need to sample the whole image. Instead you could:
sample every n-th pixel (which is downscaling with a nearest-neighbor filter) or use randomized pixel positions for sampling. Both approaches have their pros and cons, but if you combine them you could get much better results with far fewer pixels to process than the whole image. Of course this will lead to wrong results on some occasions (when you miss many of the dominant pixels), which is improbable but possible.
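A minimal sketch of both sampling strategies, assuming the pixels are already available as a flat Python list (the helper names and the fixed seed are mine, for illustration only):

```python
import random
from collections import Counter

def sample_strided(pixels, step):
    """Every step-th pixel, i.e. nearest-neighbor downscaling of a flat list."""
    return pixels[::step]

def sample_random(pixels, count, seed=0):
    """count pixels drawn at random positions (with replacement)."""
    rng = random.Random(seed)
    return [pixels[rng.randrange(len(pixels))] for _ in range(count)]

def dominant(sample):
    """Most frequent value in the sample."""
    return Counter(sample).most_common(1)[0][0]

# 90% background, 10% foreground.
pixels = ["bg"] * 9000 + ["fg"] * 1000
print(dominant(sample_strided(pixels, 100)))  # bg (from 100 samples)
print(dominant(sample_random(pixels, 200)))   # bg (from 200 samples)
```

Strided sampling is cache-friendly but can alias with periodic patterns in the image; random positions avoid that at the cost of scattered memory access.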
histogram structure
For low color counts, up to about 16 bits, you can use bucket-sort-style histogram acquisition, which is fast and runs in O(n), where n is the number of pixels. No searching is needed. So if you reduce colors from true color to 16 bit you can significantly boost the speed of histogram computation: you lower the constant time hugely, and the complexity goes from O(n*m) to O(n), which is a really big difference for a high color count m. See my C++ histogram example; it is in HSV, but in RGB it is almost the same.
If you need true color, you have 16.7M colors, which is not practical for the bucket-sort style. In that case you need a binary search or a dictionary to speed up the color lookup in the histogram. If you do not have this, it is your slowdown.
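As a rough sketch of the bucket approach, assuming 8-bit RGB input reduced to a 5-6-5 bit layout (one common 16-bit packing; the thread does not say which one is meant):

```python
def rgb565(r, g, b):
    """Pack 8-bit channels into a 16-bit index: 5 bits red, 6 green, 5 blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def bucket_histogram(pixels):
    """One counter per 16-bit color; a single O(n) pass, no searching."""
    counts = [0] * 65536
    for r, g, b in pixels:
        counts[rgb565(r, g, b)] += 1
    return counts

def dominant_bucket(counts):
    """Return ((r, g, b), count) for the fullest bucket."""
    index = max(range(len(counts)), key=counts.__getitem__)
    # Unpack to the darkest 8-bit color that falls into this bucket.
    r = (index >> 11) << 3
    g = ((index >> 5) & 0x3F) << 2
    b = (index & 0x1F) << 3
    return (r, g, b), counts[index]

pixels = [(255, 255, 255)] * 3 + [(250, 252, 250)] * 2 + [(10, 20, 30)] * 4
print(dominant_bucket(bucket_histogram(pixels)))  # ((248, 252, 248), 5)
```

Note the side effect: the two near-white shades land in the same bucket, so together they beat the dark color even though neither shade alone is the most frequent exact value. For background detection that merging is usually what you want.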
histogram sort
How did you sort the histogram? A poor sort implementation can take a lot of time for big color counts. I usually use bubble sort in my examples because it is less code to write and usually enough. But I have seen here on SO too many wrongly implemented bubble sorts that always take the worst-case time T(n^2), which is wrong (and even I sometimes do it). For time-sensitive code I use quicksort. See bubble sort in C++.
Also, your task really resembles color quantization (or is it just me?), so take a look at: Effective gif/image color quantization?
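For reference, a bubble sort that does not always hit the worst case looks roughly like this sketch: it stops as soon as a pass makes no swap, and each pass skips the already-settled tail. The `passes` counter is only there to make the early exit visible.

```python
def bubble_sort_desc(entries):
    """Sort (color, count) pairs by count, descending; returns (sorted, passes)."""
    items = list(entries)
    passes = 0
    end = len(items) - 1
    swapped = True
    while swapped and end > 0:
        swapped = False
        passes += 1
        for i in range(end):
            if items[i][1] < items[i + 1][1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        end -= 1  # the smallest remaining count has settled at the end
    return items, passes

print(bubble_sort_desc([("a", 1), ("b", 3), ("c", 2)]))
# ([('b', 3), ('c', 2), ('a', 1)], 2)
print(bubble_sort_desc([("b", 3), ("c", 2), ("a", 1)])[1])  # 1 pass when already sorted
```

If only the top few colors are needed, a full sort is unnecessary anyway; a single scan for the maximum or a small heap does the job.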

Downscaling an image requires looking at each pixel so you can pick a new pixel that is closest to the average color of some group of neighbors. The reason this appears to happen so fast compared to your implementation of iterating through all the pixels is that CoreGraphics hands the scaling task off to the GPU hardware, whereas your approach uses the CPU to iterate through each pixel which is much slower.
So what you need to do is write some GPU-based code that scans through your original image and looks at each pixel, tallying up the color counts as it goes. This has the advantage not only of being very fast, but also of giving you an accurate count of colors. As I mentioned, downsampling produces pixels that are color averages, so you won't end up with reliably correct color counts that correlate to your original image (unless you happen to be downscaling solid colors, but in the typical case you'll end up with something other than what you started with).
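A tiny example of that averaging effect, using a plain 2x2 box filter as a stand-in (this is an assumption for illustration; the interpolation Core Graphics actually applies may differ):

```python
def box_downscale_2x(image):
    """Halve a 2D grid of (r, g, b) tuples by averaging each 2x2 block."""
    out = []
    for y in range(0, len(image) - 1, 2):
        row = []
        for x in range(0, len(image[0]) - 1, 2):
            block = [image[y][x], image[y][x + 1],
                     image[y + 1][x], image[y + 1][x + 1]]
            row.append(tuple(sum(p[c] for p in block) // 4 for c in range(3)))
        out.append(row)
    return out

RED, BLUE = (255, 0, 0), (0, 0, 255)
checkerboard = [[RED, BLUE],
                [BLUE, RED]]
print(box_downscale_2x(checkerboard))  # [[(127, 0, 127)]]
```

The result is a purple that appears nowhere in the original, so a histogram built on the downscaled image would report a dominant color the source image does not contain.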
I recommend looking into Apple's Metal framework for an API that lets you write code directly for the GPU. It'll be a challenge to learn, but I think you'll find it interesting and when you're done your code will scan original images extremely fast without having to go through any extra downsampling effort.

Related

Image Segmentation for Color Analysis in OpenCV

I am working on a project that requires me to:
Look at images that contain relatively well-defined objects, e.g.
and pick out the colors of the n most prominent objects (n is generic: it could be 1, 2, 3, etc.) in some color space (whether it be RGB, HSV, or whatever) and return them.
I am looking into ways to segment images like this into the independent objects. Once that's done, I'm under the impression that it won't be particularly difficult to find the contours of the segments and analyze them for average or centroid color, etc...
I looked briefly into the Watershed algorithm, which seems like it could work, but I was unsure of how to generate the marker image for an indeterminate number of blobs.
What's the best way to segment such an image, and if it's using Watershed, what's the best way to generate the corresponding marker image of integers?
Check out this possible approach:
Efficient Graph-Based Image Segmentation
Pedro F. Felzenszwalb and Daniel P. Huttenlocher
Here's what it looks like on your image:
I'm not an expert but I really don't see how the Watershed algorithm can be very useful to your segmentation problem.
From my limited experience with and exposure to this kind of problem, I would think that the way to go would be to try a sliding-window approach to segmentation. Basically this entails walking the image using a window of a set size and attempting to determine whether the window encompasses background or an object. You will want to try different window sizes and steps.
Doing this should allow you to detect the objects in the image, presuming that the images contain relatively well-defined objects. You might also attempt to perform segmentation after converting the image to black and white with a threshold that gives good separation of background vs. objects.
Once you've identified the object(s) via the sliding window you can attempt to determine the most prominent color using one of the methods you mentioned.
UPDATE
Based on your comment, here's another potential approach that might work for you:
If you believe the objects will have mostly uniform color you might attempt to process the image to:
remove noise;
map the original image to a reduced color space (e.g. 256 or even 16 colors);
detect connected components based on pixel color and determine which ones are large enough.
You might also benefit from re-sampling the image to lower resolution (i.e. if the image is 1024 x 768 you might reduce it to 256 x 192) to help speed up the algorithm.
The only thing left to do would be to determine which component is the background. This is where it might make sense to also attempt to do the background removal by converting to black/white with a certain threshold.
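The reduce-colors-then-label idea can be sketched as below. This is a toy version on a grayscale grid using a plain breadth-first flood fill with 4-connectivity; the function names, the number of levels, and the connectivity are assumptions, and a real project would more likely use a library routine for connected-component labeling.

```python
from collections import deque

def quantize(value, levels=4):
    """Map an 8-bit value onto `levels` equally sized bins."""
    return value * levels // 256

def connected_components(grid, min_size=1):
    """Return (label, size) for each 4-connected region of equal labels."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if seen[y][x]:
                continue
            label, size = grid[y][x], 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and grid[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:
                components.append((label, size))
    return components

gray = [[10, 12, 200, 210],
        [11, 13, 205, 10],
        [12, 11, 14, 12]]
labels = [[quantize(v) for v in row] for row in gray]
print(connected_components(labels, min_size=2))  # [(0, 9), (3, 3)]
```

Here the largest component (9 pixels of the dark bin) would be the natural background candidate, and the bright 3-pixel region the object.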

GPUImage Taking sum of columns of image

I'm using GPUImage in my project and I need an efficient way of taking column sums. The naive way would obviously be to retrieve the raw data and add up the values of every column. Can anybody suggest a faster way to do that?
One way to do this would be to use the approach I take with the GPUImageAverageColor class (as described in this answer), only instead of reducing the total size of each frame at each step, only do this for one dimension of the image.
The average color filter determines the average color of the overall image by stepping down in a factor of four in both X and Y, averaging 16 pixels into one at each step. If operating in a single direction, you should be able to use hardware interpolation to get an 18X reduction in a single direction per step with good performance. Your final step might either require a quick CPU-based iteration on the much smaller image or a tweaked version of this shader that pulls the last few pixels in a column together into the final result pixel for that column.
You'll notice that I've been talking about averaging here. That is because the output values of any OpenGL ES operation need to be expressed as colors, which only have a 0-255 range per channel. A sum will easily overflow this, but you could use an average as an approximation of your sum, with a more limited dynamic range.
If you only care about one color channel, you could possibly encode a larger value into the RGBA channels and maintain a 32-bit sum that way.
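The encoding idea is just splitting one unsigned 32-bit integer across four 8-bit channels. A CPU-side sketch of the arithmetic follows; the channel order (red as the most significant byte) is an arbitrary assumption, and a shader version would have to perform the same arithmetic on normalized float channels.

```python
def pack_rgba(value):
    """Split an unsigned 32-bit integer into (r, g, b, a), r most significant."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return ((value >> 24) & 0xFF, (value >> 16) & 0xFF,
            (value >> 8) & 0xFF, value & 0xFF)

def unpack_rgba(r, g, b, a):
    """Inverse of pack_rgba."""
    return (r << 24) | (g << 16) | (b << 8) | a

# Worst case for one channel of a 1080-row column: every pixel is 255.
column_sum = sum([255] * 1080)
print(column_sum)                           # 275400
print(pack_rgba(column_sum))                # (0, 4, 51, 200)
print(unpack_rgba(*pack_rgba(column_sum)))  # 275400
```

A single 8-bit channel would overflow after the second pixel in this case, while the packed form has plenty of headroom.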
Beyond what I describe above, you could look at performing this sum with the help of the Accelerate framework. While probably not quite as fast as doing a shader-based reduction, it might be good enough for your needs.

why we should use gray scale for image processing

I think this may be a stupid question, but after reading and searching a lot about image processing, every example I see uses gray scale.
I understand that gray scale images use just one color channel, which normally needs just 8 bits to be represented, etc. But why use gray scale when we have a color image? What are the advantages of gray scale? I could imagine it is because there are fewer bits to process, but is this still necessary today with faster computers?
I am not sure if my question is clear; I hope someone can answer it.
Thank you very much.
As explained by John Zhang:
luminance is by far more important in distinguishing visual features
John also gives an excellent suggestion to illustrate this property: take a given image and separate the luminance plane from the chrominance planes.
To do so you can use ImageMagick's -separate operator, which extracts the current contents of each channel as a gray-scale image:
convert myimage.gif -colorspace YCbCr -separate sep_YCbCr_%d.gif
Here's what it gives on a sample image (top-left: original color image, top-right: luminance plane, bottom row: chrominance planes):
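For a single pixel, the separation amounts to the conversion sketched below. It assumes the full-range BT.601 coefficients used by JPEG; ImageMagick's exact scaling for its YCbCr colorspace may differ slightly, so treat the numbers as illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG-style) conversion of one 8-bit RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each plane to the 0-255 range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))

print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128)
print(rgb_to_ycbcr(100, 100, 100))  # (100, 128, 128)
print(rgb_to_ycbcr(255, 0, 0))      # (76, 85, 255)
```

Any neutral gray maps to Cb = Cr = 128, which is why the chrominance planes of a typical photo look flat while the luminance plane carries most of the visible structure.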
To elaborate a bit on deltheil's answer:
Signal to noise. For many applications of image processing, color information doesn't help us identify important edges or other features. There are exceptions. If there is an edge (a step change in pixel value) in hue that is hard to detect in a grayscale image, or if we need to identify objects of known hue (orange fruit in front of green leaves), then color information could be useful. If we don't need color, then we can consider it noise. At first it's a bit counterintuitive to "think" in grayscale, but you get used to it.
Complexity of the code. If you want to find edges based on luminance AND chrominance, you've got more work ahead of you. That additional work (and additional debugging, additional pain in supporting the software, etc.) is hard to justify if the additional color information isn't helpful for applications of interest.
For learning image processing, it's better to understand grayscale processing first and understand how it applies to multichannel processing rather than starting with full color imaging and missing all the important insights that can (and should) be learned from single channel processing.
Difficulty of visualization. In grayscale images, the watershed algorithm is fairly easy to conceptualize because we can think of the two spatial dimensions and one brightness dimension as a 3D image with hills, valleys, catchment basins, ridges, etc. "Peak brightness" is just a mountain peak in our 3D visualization of the grayscale image. There are a number of algorithms for which an intuitive "physical" interpretation helps us think through a problem. In RGB, HSI, Lab, and other color spaces this sort of visualization is much harder since there are additional dimensions that the standard human brain can't visualize easily. Sure, we can think of "peak redness," but what does that mountain peak look like in an (x,y,h,s,i) space? Ouch. One workaround is to think of each color variable as an intensity image, but that leads us right back to grayscale image processing.
Color is complex. Humans perceive color and identify color with deceptive ease. If you get into the business of attempting to distinguish colors from one another, then you'll either want to (a) follow tradition and control the lighting, camera color calibration, and other factors to ensure the best results, or (b) settle down for a career-long journey into a topic that gets deeper the more you look at it, or (c) wish you could be back working with grayscale because at least then the problems seem solvable.
Speed. With modern computers, and with parallel programming, it's possible to perform simple pixel-by-pixel processing of a megapixel image in milliseconds. Facial recognition, OCR, content-aware resizing, mean shift segmentation, and other tasks can take much longer than that. Whatever processing time is required to manipulate the image or squeeze some useful data from it, most customers/users want it to go faster. If we make the hand-wavy assumption that processing a three-channel color image takes three times as long as processing a grayscale image--or maybe four times as long, since we may create a separate luminance channel--then that's not a big deal if we're processing video images on the fly and each frame can be processed in less than 1/30th or 1/25th of a second. But if we're analyzing thousands of images from a database, it's great if we can save ourselves processing time by resizing images, analyzing only portions of images, and/or eliminating color channels we don't need. Cutting processing time by a factor of three to four can mean the difference between running an 8-hour overnight test that ends before you get back to work, and having your computer's processors pegged for 24 hours straight.
Of all these, I'll emphasize the first two: make the image simpler, and reduce the amount of code you have to write.
I disagree with the implication that gray scale images are always better than color images; it depends on the technique and the overall goal of the processing. For example, if you wanted to count the bananas in an image of a fruit bowl, it's much easier to segment when you have a color image!
Many images have to be in grayscale because of the measuring device used to obtain them. Think of an electron microscope. It measures the strength of an electron beam at various points in space. An AFM measures the amount of resonance vibration at various points across the topology of a sample. In both cases, these tools return a single value, an intensity, so they implicitly create a gray-scale image.
Image processing techniques based on brightness can often be applied adequately to the overall brightness (grayscale); however, there are many, many instances where having a color image is an advantage.
Binary might be too simple to represent the character of the picture.
Color might be too much and affect the processing speed.
Thus grayscale is chosen, as it sits between the two extremes.
Before starting image processing, whether on gray scale or color images, it is better to focus on the application at hand. If we choose one of them at random, it will cause accuracy problems in our results. For example, if I want to process an image of a waste bin, I prefer gray scale to color, because I only want to detect the shape of the bin using optimized edge detection. I do not care about the color of the image, but I do want to see the rectangular shape of the bin correctly.

Threshold values for binary filtering

How to determine good values for the two threshold values for binary filtering?
The images I want to filter are MRI or CT images like these http://pubimage.hcuge.ch:8080/, the images are also most likely gray scale images.
I'm trying to extract a surface model from a stack of 2D images using the marching cubes algorithm and binary filtering on the iPad. For the binary filtering I use a lower and an upper threshold value: the pixel is set to the inside value if lowerThreshold <= pixelValue <= upperThreshold.
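For concreteness, the band filter described above can be sketched like this (the inside/outside values 1 and 0 and the sample numbers are placeholders):

```python
def binary_filter(image, lower, upper, inside=1, outside=0):
    """Set a pixel to `inside` when lower <= value <= upper, else `outside`."""
    return [[inside if lower <= value <= upper else outside for value in row]
            for row in image]

slice_2d = [[0, 40, 120],
            [130, 250, 90]]
print(binary_filter(slice_2d, lower=100, upper=200))
# [[0, 0, 1], [1, 0, 0]]
```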
Thanks for your help, Manu
Update: I have now asked one of my image processing professors about this question. He said that if the histogram of the image is bimodal (which means there are two hills in the histogram), the solution is relatively easy, and that is the case for my images.
If your image background is black and your object of interest of any other shade, then you can try to guess a threshold from the histogram of your image (note though, that you may have to try hard to find a suitable percentage threshold that suits all your images).
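One common way to pick a threshold from a bimodal histogram is Otsu's method, which chooses the value that maximizes the variance between the two classes. Nobody in this thread names it, so take the following as one possible sketch rather than the approach the professor had in mind. It yields a single threshold, which could serve as the lower bound with the upper bound left at the maximum pixel value.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximizing between-class variance; class 0 holds values <= t."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight0, sum0 = 0, 0
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        if weight0 == 0:
            continue  # class 0 is still empty
        weight1 = total - weight0
        if weight1 == 0:
            break     # class 1 is empty from here on
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        between = weight0 * weight1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Dark background around 10-12, bright object around 200-210.
pixels = [10] * 50 + [12] * 50 + [200] * 30 + [210] * 30
print(otsu_threshold(pixels))  # 12
```

Any threshold between the two hills separates them equally well here; the sketch returns the first such value.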
This may not be sufficient however. A tool that would be interesting for this task is clearly active contours (aka snakes), but it's hard to guess if you can afford the time and effort needed to use them (there is an implementation of geodesic active contours in ITK, but I don't know how much effort it requires before use). If snakes are an option, then you can make the contour evolve from the boundary of your image until they meet your object and fit its contour.

EMGU OpenCV disparity only on certain pixels

I'm using the EMGU OpenCV wrapper for C#. I've got a disparity map being created nicely. However, for my specific application I only need the disparity values of very few pixels, and I need them in real time. The calculation takes about 100 ms now; I imagine that getting the disparity for hundreds of pixels rather than thousands would speed things up considerably. I don't know much about what's going on "under the hood" of the stereo solver code. Is there a way to speed things up by only calculating the disparity for the pixels that I need?
First of all, you do not mention what you are really trying to accomplish or, more importantly, which algorithm you are using. For example, StereoGC is really slow (i.e. not real-time) but usually far more accurate than both StereoSGBM and StereoBM. Those last two can be used in real time, provided a few conditions are met:
The size of the input images is reasonably small;
You are not using an extravagant set of parameters (for instance, a larger value for numberOfDisparities will increase computation time).
Don't expect miracles when it comes to accuracy though.
Apart from that, there is the issue of "just a few pixels". As far as I understand, the algorithms implemented in OpenCV usually rely on information from more than one pixel to determine the disparity value. For example, they need a neighborhood to detect which pixel in image A maps to which pixel in image B. As a result, in general it is not possible to just discard every other pixel of the image (by the way, if you already knew the locations in both images, you would not need the stereo methods at all). So unless you can discard a large border of your input images in which you know you'll never find your pixels of interest, I'd say the answer to this part of your question is "no".
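To illustrate why a neighborhood is needed, here is a brute-force sketch that estimates the disparity at one pixel by comparing a small window along the same row with a sum of absolute differences. This is not how OpenCV's StereoBM or StereoSGBM are implemented, and it assumes rectified grayscale images stored as 2D lists, with the window fully inside the image.

```python
def sad(left, right, y, x, d, radius):
    """Sum of absolute differences between the window at (y, x) in the
    left image and the window at (y, x - d) in the right image."""
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x - d + dx])
    return total

def disparity_at(left, right, y, x, max_disparity, radius=1):
    """Disparity with the lowest matching cost for a single pixel."""
    best_d, best_cost = 0, None
    for d in range(max_disparity + 1):
        if x - d - radius < 0:
            break  # the shifted window would leave the image
        cost = sad(left, right, y, x, d, radius)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# The left view is the right view shifted 2 pixels to the right.
right_row = [0, 10, 50, 90, 30, 70, 20, 60, 0, 0]
left_row = [0, 0] + right_row[:-2]
left, right = [left_row] * 3, [right_row] * 3
print(disparity_at(left, right, y=1, x=5, max_disparity=3))  # 2
```

With radius 0 the match would rest on a single intensity value, which is ambiguous in any real image; the window is what makes the correspondence reliable, and it is also why the cost per pixel is not negligible.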
If you happen to know that your pixels of interest will always be within a certain rectangle of the input images, you can set the input image ROIs (regions of interest) to this rectangle. Assuming OpenCV does not contain a bug here, this should speed up the computation a little.
With a bit of googling you can find real-time examples on YouTube of finding stereo correspondences with EmguCV (or plain OpenCV) on the GPU. Maybe this could help you.
Disclaimer: this may have been a more complete answer if your question contained more detail.

Resources