I have the following setup: a list of integers, where some of them repeat more than others.
I want to split them into bins, using a histogram, but such that, areas (on the Naturals) in which I have a lot of examples, will have smaller bins (and on the limit a bar per integer), while in areas which I have only a little information, the bar will include more integers, just like h in kernel estimations.
Any known method for such an example?
Related
I have a palette of colours, that's either 32 or 256. Then I have a stream of incoming colours (in RGB). I want to find out which colour in the palette the incoming colour matches most closely with. I believe this sort of algorithm is used in many image editing software.
So far, I came up with the following:
For each incoming colour, find the colour in the palette with the least distance, by finding out the distance from each colour in the palette.
For finding the distance, one of the following approaches:
Sum of squares of differences of R, G and B values ((R1-R2)² + (G1-G2)² + (B1-B2)²)
Convert the colour to HSV, use a weighted average of H, S and V values as the distance indicator. Something like 3 ✕ (H1-H2)² + 2 ✕ (S1-S2)² + (V1-V2)²
Distance in YCbCr
I am looking for two things in particular.
Is there a better way than to check the distance with each colour in the palette? I'm looking for some sort of binning algorithm to find the right colour from the palette.
If sticking with checking the distance with each item in the palette, are there andy standard formulae which are considered standard?
As often, the answer depends on why exactly you want to do that.
If the goal is to minimize the numerical approximation error, the sums of squared difference (or the maximum of the absolute differences) can do.
I would abstain from and arbitrary formula such a a weighted sum of differences in some colorimetric space, which have no precise justification and are hard to interpret.
If you want to keep the image as similar as possible to the original for a human eye, you can use a "perceptually uniform" metric, such as CIELAB ΔE*. Anyway, in my personal opinion, such metrics are overly sophisticated and bring little benefit. You will only see differences for colors that are distant from the colors in your palette, if at all.
I use example code to compare HSV histograms using EMD.
I want to find similar images in people's (mobile) picture library. It's quite common that people take several images of the same subject (in a row) with just slight changes: zooming in/out a bit, different angle, different exposure as a result of changing position, other pose, ....
I selected 4 sets of 4 similar images to test this algorithm. When comparing the images inside the sets, I get 22 EMD-L1 values between roughly 0.25 and 2.25 (average 1.47) and 2 outliers around 7.2.
When I cross-comparing between sets I get values between 2 and 15 with an average around 8.
Yes, there is a significant range difference between the two result sets. But I was disappointed that there was no (gap) between these ranges, and instead a small overlap [2.0, 2.25]. I'm hoping to improve the algorithm.
How can I optimise my comparison for my particular use-case? There are various histogram forms, various histogram comparison algorithms, and then each has various parameters.
Does OpenCV implement the fastest known EMD algorithm? I was surprised that the comparison of some histograms took up to a second; especially with the relatively small bin numbers.
Then, some cross-comparisons give good EMD results, but have totally different RGB histograms. Here are two images:
My current EMD-L1 says 1.95, but the RGB histograms are totally different.
Probably you've already refined your comparison method. But this might not be obvious, you could divide the image into overlapping subregions, and then compute the EMD for all 4 parts.
Is there anything faster than sliding window? I tried sort of binary search with overlapping rectangles - it kinda works but sometimes cuts off part of the blob (expected, right) - see the video in http://juick.com/lurker/2142051
Binary search makes no sense, because it is an algorithm for searching for specific values in a sorted structure.
Unless you have some apriori knowledge about the image, you need to check all possible locations, which is the sliding window method you suggested.
Chris is correct, unless you can say something about the statistics of the surrounding regions, e.g., "certain arrangements of pixels around the spot I'm looking for are unlikely". Note, this is different from saying "will never happen", and any algorithm based on statistical approaches will have an associated probability of (wrong box found).
If you think the statistics of the larger regions around your desired location might be informative, you might be able to do some block-processing on larger blocks before doing the fine-level sliding window. For example, if you can say with high probability that a certain 64 x 64 region doesn't contain the max, then, you can throw out a lot of [64 x 64] pixel regions, with 32 pixel overlap using (maybe) only a few features.
You can train something like AdaBoost to do this. See the classic Viola-Jones work which does this for face-detection http://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework
If you absolutely need the maxima location, then like Chris said, you need to search everywhere.
Im using GPUImage in my project and I need an efficient way of taking the column sums. Naive way would obviously be retrieving the raw data and adding values of every column. Can anybody suggest a faster way for that?
One way to do this would be to use the approach I take with the GPUImageAverageColor class (as described in this answer), only instead of reducing the total size of each frame at each step, only do this for one dimension of the image.
The average color filter determines the average color of the overall image by stepping down in a factor of four in both X and Y, averaging 16 pixels into one at each step. If operating in a single direction, you should be able to use hardware interpolation to get an 18X reduction in a single direction per step with good performance. Your final step might either require a quick CPU-based iteration on the much smaller image or a tweaked version of this shader that pulls the last few pixels in a column together into the final result pixel for that column.
You notice that I've been talking about averaging here, because the output values for any OpenGL ES operation will need to be in terms of colors, which only have a 0-255 range per channel. A sum will easily overflow this, but you could use an average as an approximation of your sum, with a more limited dynamic range.
If you only care about one color channel, you could possibly encode a larger value into the RGBA channels and maintain a 32-bit sum that way.
Beyond what I describe above, you could look at performing this sum with the help of the Accelerate framework. While probably not quite as fast as doing a shader-based reduction, it might be good enough for your needs.
My lecturer has slides on edge histograms for image retrieval, whereby he states that one must first divide the image into 4x4 blocks, and then check for edges at the horizontal, vertical, +45°, and -45° orientations. He then states that this is then represented in a 14x1 histogram. I have no idea how he came about deciding that a 14x1 histogram must be created. Does anyone know how he came up with this value, or how to create an edge histogram?
Thanks.
The thing you are referring to is called the Histogram of Oriented Gradients (HoG). However, the math doesn't work out for your example. Normally you will choose spatial binning parameters (the 4x4 blocks). For each block, you'll compute the gradient magnitude at some number of different directions (in your case, just 2 directions). So, in each block you'll have N_{directions} measurements. Multiply this by the number of blocks (16 for you), and you see that you have 16*N_{directions} total measurements.
To form the histogram, you simply concatenate these measurements into one long vector. Any way to do the concatenation is fine as long as you keep track of the way you map the bin/direction combo into a slot in the 1-D histogram. This long histogram of concatenations is then most often used for machine learning tasks, like training a classifier to recognize some aspect of images based upon the way their gradients are oriented.
But in your case, the professor must be doing something special, because if you have 16 different image blocks (a 4x4 grid of image blocks), then you'd need to compute less than 1 measurement per block to end up with a total of 14 measurements in the overall histogram.
Alternatively, the professor might mean that you take the range of angles in between [-45,+45] and you divide that into 14 different values: -45, -45 + 90/14, -45 + 2*90/14, ... and so on.
If that is what the professor means, then in that case you get 14 orientation bins within a single block. Once everything is concatenated, you'd have one very long 14*16 = 224-component vector describing the whole image overall.
Incidentally, I have done a lot of testing with Python implementations of Histogram of Gradient, so you can see some of the work linked here or here. There is also some example code at that site, though a more well-supported version of HoG appears in scikits.image.