GPUImage: taking the sum of columns of an image - iOS

I'm using GPUImage in my project and I need an efficient way of taking column sums. The naive way would obviously be to retrieve the raw data and add up the values of every column. Can anybody suggest a faster way to do that?

One way to do this would be to use the approach I take with the GPUImageAverageColor class (as described in this answer), only instead of reducing the total size of each frame at each step, do this for just one dimension of the image.
The average color filter determines the average color of the overall image by stepping down by a factor of four in both X and Y, averaging 16 pixels into one at each step. If you operate in only one direction, you should be able to use hardware interpolation to get an 18X reduction per step in that direction with good performance. Your final step might require either a quick CPU-based iteration over the much smaller image or a tweaked version of this shader that pulls the last few pixels in a column together into the final result pixel for that column.
You'll notice that I've been talking about averaging here, because the output values of any OpenGL ES operation need to be expressed as colors, which only have a 0-255 range per channel. A sum would easily overflow this, but you could use an average as an approximation of your sum, with a more limited dynamic range.
If you only care about one color channel, you could possibly encode a larger value into the RGBA channels and maintain a 32-bit sum that way.
Beyond what I describe above, you could look at performing this sum with the help of the Accelerate framework. While probably not quite as fast as doing a shader-based reduction, it might be good enough for your needs.
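As a rough illustration of the reduction idea (not the actual shader), here is a CPU-side sketch in Python/numpy; the frame array and the factor-of-four step size are assumptions, and on the GPU each loop iteration would correspond to one render pass over a smaller texture:

```python
import numpy as np

# Hypothetical input: a grayscale frame as a float32 array.
frame = np.random.rand(480, 640).astype(np.float32)

# Naive baseline: touch every pixel of every column once.
naive_sums = frame.sum(axis=0)

# Reduction-style equivalent: repeatedly average groups of 4 rows into 1,
# the way each shader pass would, shrinking only the vertical dimension.
reduced = frame
original_rows = reduced.shape[0]
while reduced.shape[0] % 4 == 0 and reduced.shape[0] > 4:
    h = reduced.shape[0] // 4
    reduced = reduced.reshape(h, 4, -1).mean(axis=1)

# The averages times the original row count recover the column sums,
# up to the precision lost in each averaging step.
approx_sums = reduced.mean(axis=0) * original_rows
```

The averaging (rather than summing) at each step is what keeps intermediate values inside the representable range, which is the same constraint the 0-255 color channels impose on the GPU.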

Related

Efficiently analyze dominant color in UIImage

I am trying to come up with an efficient algorithm to query the top k most dominant colors in a UIImage (jpg or png). By dominant color I mean the color that is present in the greatest number of pixels. The use case is mostly geared toward finding the top-1 (single most dominant) color to figure out the image's background. Below I document the algorithm I am currently trying to implement and wanted some feedback.
First, my algorithm takes an image and draws it into a bitmap context. I do this so I have control over how many bytes per pixel will be present in my image data for processing/parsing.
Second, after drawing the image, I loop through every pixel in the image buffer. I quickly realized that looping through every pixel would not scale; it became a huge performance bottleneck. To optimize this, I realized I need not analyze every pixel in the image. I also noticed that as I scaled an image down, the dominant colors became more prominent while far fewer pixels had to be looped through. So the second step is actually to loop through every pixel in the resized image buffer.
Third, as I loop through each pixel I build up a CountedSet of UIColor objects; this basically keeps track of a histogram of color counts.
Finally, once I have my counted set it is very easy to then loop through it and respond to top-k dominant color queries.
So my algorithm, in short, is to resize the image (scaling it down by some function proportional to the image's size), draw it into a bitmap context buffer once and cache it, then loop through all the data in the buffer and build out a histogram.
My question to the Stack Overflow community is: how efficient is my algorithm, and are there any further optimizations I can make? I just need something that gives me reasonable performance, and after doing some performance testing this seemed to work pretty darn well. Furthermore, how accurate will it be? The rescale operation in particular worries me. Am I trading a significant amount of accuracy for performance here? At the end of the day this will mostly just be used to determine the background color of an image.
Ideas for potential performance improvements:
1) When querying only the single most dominant color, do some math to figure out whether I have already found the most dominant color based on the number of pixels analyzed, and exit early.
2) For the top-k query, answer it quickly by leveraging a binary-heap data structure (the typical top-k query algorithm).
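For what it's worth, here is a minimal Python/Pillow sketch of the pipeline described above (the original is presumably CoreGraphics/UIKit; the path argument and the 64-pixel target size are arbitrary assumptions):

```python
import numpy as np
from PIL import Image

def dominant_colors(path, k=1, target=64):
    # Steps 1-2: downscale, then draw once into a raw buffer.
    img = Image.open(path).convert("RGB")
    img.thumbnail((target, target))            # in-place, keeps aspect ratio
    pixels = np.asarray(img).reshape(-1, 3)

    # Steps 3-4: counted-set equivalent, then answer the top-k query.
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    order = np.argsort(counts)[::-1][:k]
    return [(tuple(colors[i]), int(counts[i])) for i in order]
```

The np.unique call plays the role of the CountedSet, and slicing the sorted counts answers the top-k query.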
You can use some performance tweaks to avoid the downscaling altogether. As we do not see your implementation, it is hard to say where the bottleneck really is, so here are some pointers on what to check or improve. Bear in mind that I do not code for your environment, so take this with extreme prejudice:
pixel access
Most pixel-access functions I have seen are SLOW, especially those called putpixel, getpixel, pixels, etc., because on every single pixel access they perform too many sanity/safety checks and color/space/address conversions. Instead, use direct pixel access. Most image interfaces I have seen have some kind of ScanLine[] access which gives you a direct pointer to a single line in an image. If you fill your own array of pointers with it, you obtain direct pixel access without any slowdown. This usually speeds up the algorithm by a factor of 100 to 10000 on most platforms (depending on usage).
To check for this, try to read or fill a 1024*1024*32bit image and measure the time. On a standard PC it should take a few [ms] or less; with slow access it can take seconds. For more info see Display an array of color in C.
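A Python/Pillow stand-in for the same contrast (the ScanLine[] idea maps to grabbing the whole buffer once and indexing it directly; the blank image here is a placeholder):

```python
import numpy as np
from PIL import Image

img = Image.new("RGB", (1024, 1024))

# Slow: one guarded function call per pixel (the getpixel/putpixel pattern).
total = 0
for y in range(img.height):
    for x in range(img.width):
        r, g, b = img.getpixel((x, y))
        total += r

# Fast: grab the whole buffer once, then index it with no per-pixel overhead.
buf = np.asarray(img)
total_fast = int(buf[..., 0].sum())
```

Timing both halves (e.g. with time.perf_counter) should show the orders-of-magnitude gap described above.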
Dominant color
If #1 is still not fast enough, you can take advantage of the fact that the dominant color has the highest probability of occurrence in the image. So in theory you do not need to sample the whole image; instead you could:
sample every n-th pixel (which is downscaling with a nearest-neighbor filter) or use randomized pixel positions for sampling. Both approaches have their pros and cons, but if you combine them you can get much better results with far fewer pixels to process than the whole image. Of course this will occasionally give wrong results (when you miss many of the dominant pixels), which is improbable but possible.
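Both sampling schemes are one-liners on a raw buffer; a small numpy sketch (the array contents and n are placeholders):

```python
import numpy as np

# buf: an H x W x 3 uint8 pixel buffer (placeholder data here).
buf = np.random.randint(0, 256, (1000, 1000, 3), dtype=np.uint8)

# Every n-th pixel: equivalent to nearest-neighbor downscaling.
n = 8
strided = buf[::n, ::n].reshape(-1, 3)

# Randomized positions: same sample budget, but no regular-grid bias.
rng = np.random.default_rng(0)
ys = rng.integers(0, buf.shape[0], strided.shape[0])
xs = rng.integers(0, buf.shape[1], strided.shape[0])
random_sample = buf[ys, xs]
```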
histogram structure
For a low color count, like up to 16 bits, you can use bucket-sort histogram acquisition, which is fast and can be done in O(n), where n is the number of pixels. No searching is needed. So if you reduce colors from true color to 16 bits, you can significantly boost the speed of the histogram computation, because you lower the constant factor hugely and the complexity goes from O(n*m) to O(n), which for a high color count m is a really big difference. See my C++ histogram example; it is in HSV, but RGB is almost the same...
If you need true color, you have 16.7M possible colors, which is not practical for a bucket-sort style. Then you need a binary search or a dictionary to speed up the color lookup in the histogram. If you do not have this, then this is your slowdown.
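A sketch of the 16-bit bucket idea from the previous paragraph (reduce true color to RGB565, then one O(n) counting pass; numpy here stands in for the C++ loop):

```python
import numpy as np

def rgb565_histogram(buf):
    """Bucket-style histogram over colors reduced to 16-bit RGB565."""
    r = (buf[..., 0] >> 3).astype(np.uint32)   # keep 5 bits of red
    g = (buf[..., 1] >> 2).astype(np.uint32)   # keep 6 bits of green
    b = (buf[..., 2] >> 3).astype(np.uint32)   # keep 5 bits of blue
    codes = (r << 11) | (g << 5) | b           # 0 .. 65535
    return np.bincount(codes.ravel(), minlength=65536)  # O(n), no searching
```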
histogram sort
How did you sort the histogram? With a bad sort implementation it can take a lot of time for big color counts. I usually use bubble sort in my examples because it is less code to write and usually enough, but I have seen too many wrongly implemented bubble sorts here on SO that always hit the worst-case O(n^2) time (and even I sometimes do it). For time-sensitive code I use quick-sort. See bubble sort in C++.
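If all you need is the top k entries rather than a fully sorted histogram, a selection structure avoids the sort entirely; a tiny Python sketch with placeholder counts:

```python
import heapq

# hist maps color -> count; nlargest runs in O(m log k) over m entries
# instead of the O(m log m) of a full sort.
hist = {(255, 255, 255): 9120, (10, 20, 30): 4500, (0, 0, 0): 312}
top_k = heapq.nlargest(2, hist.items(), key=lambda kv: kv[1])
```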
Also, your task really resembles color quantization (or is it just me?), so take a look at: Effective gif/image color quantization?
Downscaling an image requires looking at each pixel so you can pick a new pixel that is closest to the average color of some group of neighbors. The reason this appears to happen so fast compared to your implementation of iterating through all the pixels is that CoreGraphics hands the scaling task off to the GPU hardware, whereas your approach uses the CPU to iterate through each pixel, which is much slower.
So the thing you need to do is write some GPU-based code to scan through your original image and look at each pixel, tallying up the color counts as you go. This has the advantage not only of being very fast, but also of giving you an accurate count of colors. As I mentioned, downsampling produces pixels that are color averages, so you won't end up with color counts that reliably correlate with your original image (unless you happen to be downscaling solid colors; in the typical case you'll end up with something other than what you started with).
I recommend looking into Apple's Metal framework for an API that lets you write code directly for the GPU. It'll be a challenge to learn, but I think you'll find it interesting, and when you're done your code will scan original images extremely fast without having to go through any extra downsampling effort.

Is "color quantization" the right name for color quantization in OpenCV?

I'm looking for a way to get a complete list of all the RGB values for each pixel in a given image using OpenCV; for now I have been calling this "color quantization".
The problem is that, according to what I have found online so far, this "color quantization" thing seems to be about histograms, "color reduction", or similar discrete computation techniques.
Since I know what I want, and the "internet" seems to have a different opinion about what these words mean, I was wondering: maybe there is no real solution for this? Is there a workable way or a working algorithm in the OpenCV library?
Generally speaking, quantization is an operation that maps an input signal with real (mathematical) values to a set of discrete values. A possible algorithm for this process is to compute the histogram of the data, then retain the n values that correspond to the n bins of the histogram with the highest population.
What you are trying to do would perhaps be called color listing.
If you are working with 8-bit quantized images (type CV_8UC3), my guess is that you can do what you want by taking the histogram of the input image (bin width equal to 1) and then searching the result for non-empty bins.
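In OpenCV/Python terms, one hedged way to realize "bin width 1, keep the non-empty bins" is to pack each pixel into a 24-bit code and count the unique codes (the file name is a placeholder):

```python
import numpy as np
import cv2

img = cv2.imread("input.png")                  # CV_8UC3, BGR channel order
pixels = img.reshape(-1, 3).astype(np.uint32)

# Pack each pixel into one 24-bit code: one histogram bin per exact color.
codes = (pixels[:, 0] << 16) | (pixels[:, 1] << 8) | pixels[:, 2]

# The non-empty bins are exactly the colors present, with their counts.
present, counts = np.unique(codes, return_counts=True)
```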
Color quantization is the conversion of the infinite range of natural colors into a finite digital color space. Anyway, to create a full-color 'histogram' you can use OpenCV's sparse matrix implementation and write your own function to compute it. Of course you have to access the pixels one by one if you have no other structural or continuity information about the image.
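In Python the sparse structure can be as simple as a dictionary of counts, which mirrors the cv::SparseMat suggestion (a sketch, assuming an H x W x 3 uint8 array):

```python
from collections import Counter

# Only colors that actually occur get an entry, so memory stays proportional
# to the number of distinct colors rather than the 16.7M possible ones.
def sparse_color_histogram(img):
    return Counter(map(tuple, img.reshape(-1, 3).tolist()))
```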

Image retrieval - edge histogram

My lecturer has slides on edge histograms for image retrieval, in which he states that one must first divide the image into 4x4 blocks and then check for edges at the horizontal, vertical, +45°, and -45° orientations. He then states that this is represented in a 14x1 histogram. I have no idea how he arrived at the conclusion that a 14x1 histogram must be created. Does anyone know how he came up with this value, or how to create an edge histogram?
Thanks.
The thing you are referring to is called the Histogram of Oriented Gradients (HoG). However, the math doesn't work out for your example. Normally you will choose spatial binning parameters (the 4x4 blocks). For each block, you'll compute the gradient magnitude at some number of different directions (in your case, four directions: horizontal, vertical, +45°, and -45°). So, in each block you'll have N_{directions} measurements. Multiply this by the number of blocks (16 for you), and you see that you have 16*N_{directions} total measurements.
To form the histogram, you simply concatenate these measurements into one long vector. Any way to do the concatenation is fine as long as you keep track of the way you map the bin/direction combo into a slot in the 1-D histogram. This long histogram of concatenations is then most often used for machine learning tasks, like training a classifier to recognize some aspect of images based upon the way their gradients are oriented.
But in your case, the professor must be doing something special, because if you have 16 different image blocks (a 4x4 grid of image blocks), then you'd need to compute fewer than one measurement per block to end up with a total of 14 measurements in the overall histogram.
Alternatively, the professor might mean that you take the range of angles in between [-45,+45] and you divide that into 14 different values: -45, -45 + 90/14, -45 + 2*90/14, ... and so on.
If that is what the professor means, then in that case you get 14 orientation bins within a single block. Once everything is concatenated, you'd have one very long 14*16 = 224-component vector describing the whole image overall.
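As a toy illustration of the block-wise binning (not the professor's exact scheme), here is a sketch where the grid and the number of orientation bins are parameters: grid=4, n_bins=4 gives the 64-component histogram of the first reading, while n_bins=14 gives the 224-component vector just described:

```python
import numpy as np

def block_edge_histogram(gray, grid=4, n_bins=4):
    """Toy edge histogram: grid x grid blocks, n_bins orientation bins each."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180     # orientation in [0, 180)
    bh, bw = gray.shape[0] // grid, gray.shape[1] // grid
    hist = []
    for by in range(grid):
        for bx in range(grid):
            sl = (slice(by * bh, (by + 1) * bh),
                  slice(bx * bw, (bx + 1) * bw))
            bins = (ang[sl] // (180 / n_bins)).astype(int).clip(0, n_bins - 1)
            hist.extend(np.bincount(bins.ravel(),
                                    weights=mag[sl].ravel(),
                                    minlength=n_bins))
    return np.asarray(hist)                        # grid*grid*n_bins long
```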
Incidentally, I have done a lot of testing with Python implementations of the Histogram of Oriented Gradients, so you can see some of the work linked here or here. There is also some example code at that site, though a better-supported version of HoG appears in scikits.image.

How to match texture similarity in images?

What are the ways to quantify the texture of a portion of an image? I'm trying to detect areas that are similar in texture in an image, along with some measure of how closely similar they are.
So the question is what information about the image (edges, pixel values, gradients, etc.) can be taken as carrying its texture information.
Please note that this is not based on template matching.
Wikipedia doesn't give much detail on actually implementing any of the texture analyses.
Do you want to find two distinct areas in the image that look the same (same texture), or match a texture in one image to another?
The second is harder due to different radiometry.
Here is a basic scheme of how to measure similarity of areas.
1. Write a function which takes an area of the image as input and calculates a scalar value, such as average brightness. This scalar is called a feature.
2. Write more such functions to obtain about 8-30 features, which together form a vector that encodes information about the area of the image.
3. Calculate such a vector for both areas that you want to compare.
4. Define a similarity function which takes two vectors and outputs how alike they are.
You need to focus on steps 2 and 4.
Step 2: use features such as the std() of brightness, some kind of corner detector, an entropy filter, a histogram of edge orientations, and a histogram of FFT frequencies (x and y directions). Use color information if available.
Step 4: you can use cosine similarity, min-max, or weighted cosine.
After you implement about 4-6 such features and a similarity function, start running tests. Look at the results and try to understand why or where it doesn't work, then add a specific feature to cover that case.
For example, if you see that a texture with big blobs is regarded as similar to a texture with tiny blobs, then add a morphological filter that calculates the density of objects with size > 20 sq. pixels.
Iterate this process of identifying a problem and designing a specific feature for it about 5 times, and you will start to get very good results.
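A hedged sketch of steps 1-4 in Python (the particular features and the 32-bin histogram are illustrative choices, not a prescription):

```python
import numpy as np
from scipy.stats import entropy

def texture_features(area):
    """Steps 1-2: turn a 2-D grayscale patch into a small feature vector."""
    g = area.astype(np.float64)
    hist, _ = np.histogram(g, bins=32, range=(0, 255), density=True)
    gy, gx = np.gradient(g)
    return np.array([
        g.mean(),                          # average brightness
        g.std(),                           # brightness spread
        entropy(hist + 1e-12),             # brightness-histogram entropy
        np.hypot(gx, gy).mean(),           # mean gradient magnitude
        np.abs(np.fft.fft2(g)).std(),      # crude frequency-content feature
    ])

def cosine_similarity(a, b):
    """Step 4: compare the two feature vectors from step 3."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```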
I'd suggest using wavelet analysis. Wavelets are localized in both time and frequency and, via multiresolution analysis, give a better signal representation than the FT does.
There is a paper explaining a wavelet approach to texture description. There is also a comparison method.
You might need to slightly modify the algorithm to process images of arbitrary shape.
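For a concrete starting point, PyWavelets offers the multiresolution decomposition directly; a sketch using per-sub-band energies as texture features (the wavelet choice and level count are assumptions):

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_features(gray, wavelet="db2", levels=3):
    """Energy of each detail sub-band across the multiresolution pyramid."""
    coeffs = pywt.wavedec2(gray.astype(np.float64), wavelet, level=levels)
    feats = []
    for cH, cV, cD in coeffs[1:]:          # skip the coarse approximation band
        feats += [np.square(cH).mean(),
                  np.square(cV).mean(),
                  np.square(cD).mean()]
    return np.asarray(feats)               # 3 * levels features
```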
An interesting approach for this is to use Local Binary Patterns.
Here is a basic example and some explanation: http://hanzratech.in/2015/05/30/local-binary-patterns.html
See that method as one of the many different ways to get features from your pictures. It corresponds to the 2nd step of DanielHsH's method.
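scikit-image ships an implementation, so a feature function in the sense of step 2 can be a few lines (P and R are the standard LBP sampling parameters, chosen arbitrarily here):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1):
    """Uniform LBP codes pooled into a normalized histogram."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2                  # P+1 uniform patterns + one "other" bucket
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist
```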

Fast and quick pixel matching algorithm

I am stuck on a pixel-matching algorithm for finding symbols in an image. I have two images of symbols that I intend to find in a very high-resolution image.
Instead of a pixel-by-pixel matching algorithm, is there a fast algorithm that gives the same result as the pixel-matching algorithm? The result should be similar to: (pixels matched) divided by (total pixels).
My problem is that I wish to find certain symbols in a 1-bit image. The symbols appear essentially unchanged in the target image, with 95% of their pixels matching the target block, but the iterations take hours. The image is 10k x 10k and the symbol size is 20 x 20, so that is on the order of 10^10 calculations, which is too much to handle.
The point here is that the pixels are almost the same, but the image size is very large. I do not want complex features for noise handling, edges, fuzzy matching, etc., just a simple algorithm that does the pixel matching quickly, where the result is similar to: (pixels matched) divided by (total pixels).
Object recognition is tricky in that any simple algorithm is generally going to be way too slow, as you've apparently realized.
Luckily, if you have a rather large collection of these images on hand that are already correctly labeled, then I have a very simple solution for you.
Simply make a 3-layer feedforward network with one input unit per pixel, all of which connect to a much smaller hidden layer, which in turn connects to one output unit (representing whether the symbol is present in the image). Then just run the backpropagation algorithm on your dataset until the network learns to identify the symbols.
Unfortunately, this doesn't scale very well, so you might have to look into convolutional NNs for better performance.
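To make the suggestion concrete, here is a minimal sketch with scikit-learn's MLPClassifier standing in for the hand-rolled network; the training data here is entirely hypothetical placeholder data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder dataset: flattened 20x20 1-bit patches, labeled 1 where the
# symbol is present and 0 otherwise. Real labeled patches go here.
X = np.random.randint(0, 2, (500, 400))
y = np.random.randint(0, 2, 500)

# One small hidden layer; sklearn trains it with backpropagation.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
prediction = clf.predict(X[:1])            # 1 = symbol present
```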
Additionally, if you don't have any training data (i.e. labeled examples), then your best bet is probably to decompose your symbols into features and then sweep the image for those. If you can decompose them into lines, then a Hough transform can do this quite rapidly.
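If the symbols do decompose into line segments, OpenCV's probabilistic Hough transform recovers them directly from an edge map; a sketch (the file name and thresholds are placeholders to tune):

```python
import numpy as np
import cv2

img = cv2.imread("big_scan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
edges = cv2.Canny(img, 50, 150)   # for a true 1-bit image you can skip this

# Line segments come back as (x1, y1, x2, y2) rows, far faster than sliding
# a 20x20 template over every position of a 10k x 10k image.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=30,
                        minLineLength=10, maxLineGap=3)
```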
Maybe an ART-1 (Adaptive Resonance Theory) network could help.
The algorithm can also be written so that all prototypes are checked in parallel at the same time, and it can be blazing fast because it relies heavily on binary math.
