When to use geometric vs arithmetic mean?

So I guess this isn't technically a code question, but it's something that I'm sure will come up for other folks as well as myself while writing code, so hopefully it's still a good one to post on SO.
The Google has directed me to plenty of nice lengthy explanations of when to use one or the other as regards financial numbers, and things like that.
But my particular context doesn't fit in, and I'm wondering if anyone here has some insight. I need to take a whole bunch of individual users' votes on how "good" a particular item is. I.e., some number of users each give a particular item a score between 0 and 10, and I want to report on what the 'typical' score is. What would be the intuitive reasons to report the geometric and/or arithmetic mean as the typical response?
Or, for that matter, would I be better off reporting the median instead?
I imagine there's some psychology involved in what the "best" method might be...
Anyway, there you have it.
Thanks!

Generally speaking, the arithmetic mean will suffice. It is much less computationally intensive than the geometric mean (which involves taking an n-th root).
As for the psychology involved, the geometric mean is never greater than the arithmetic mean, so arithmetic is the best choice if you'd prefer higher scores in general.
The median is most useful when the data set is relatively small and the chance of a massive outlier relatively high. Depending on how much precision these votes can take, the median can sometimes end up being a bit arbitrary.
If you really, really want the most accurate answer possible, you could calculate the arithmetic-geometric mean. However, this involves calculating both arithmetic and geometric means repeatedly, so it is far more computationally intensive in comparison.
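For a quick comparison, here's a small Python sketch with made-up scores (purely illustrative values):
import math
import statistics
votes = [7, 9, 4, 10, 8, 6]        # hypothetical user scores between 0 and 10
arithmetic = statistics.mean(votes)
median = statistics.median(votes)
# Geometric mean: the n-th root of the product. Note that a single 0 vote
# drives it to 0 (and log(0) blows up), which is one practical argument
# against it on a bounded 0-10 rating scale.
geometric = math.exp(sum(math.log(v) for v in votes) / len(votes))
print(arithmetic, geometric, median)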

You want the arithmetic mean, since you aren't measuring an average rate of change or anything like that.

Arithmetic mean is correct.
Your scale is artificial:
It is bounded, from 0 to 10
8.5 is intuitively between 8 and 9
But for other scales, you would need to consider the correct mean to use.
Some other examples
In counting money, it has been argued that wealth has logarithmic utility, so the natural "middle" is the geometric mean: the geometric mean of Bill Gates' wealth and that of a bum in the inner city would be a moderately successful business person. (The arithmetic average would give you Larry Page.)
In measuring sound level, decibels are already on a logarithmic scale, so you can take the arithmetic average of the decibel values.
But if you are measuring volume linearly (in watts), then use the quadratic mean (RMS).
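As a rough sketch of the difference (plain Python, made-up numbers):
import math
levels_db = [60.0, 70.0, 80.0]                       # made-up sound levels in dB
# Averaging in dB (arithmetic mean of logarithmic values) is the same as taking
# the geometric mean of the underlying linear intensities.
mean_db = sum(levels_db) / len(levels_db)
intensities = [10 ** (db / 10) for db in levels_db]
geo_mean_intensity = math.prod(intensities) ** (1 / len(intensities))
print(mean_db, 10 * math.log10(geo_mean_intensity))  # both come out as 70.0
# For raw amplitude-like samples, the quadratic mean / RMS is the usual
# average, since power goes with the square of the amplitude.
samples = [0.5, -1.2, 0.8, -0.3, 1.0]                # made-up signal samples
rms = math.sqrt(sum(x * x for x in samples) / len(samples))
print(rms)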

The answer depends on the context and your purpose. Percent changes were mentioned as a good time to use the geometric mean. I use the geometric mean when working with antennas and frequencies, since the percentage change matters more than the arithmetic middle of the frequency range or the average antenna size.
If you have wildly varying numbers, especially if most are similar but one or two are "flyers" (far from the range of the others), the geometric mean will "smooth" the results and not let the outliers pull the answer further than they should. This method is used to calculate bullet group sizes (the "flyer" was probably human error, not the equipment, so a plain average would be "unfair" in that case).
A related method is the root mean square (RMS): square the numbers, take the mean of those squares, and then take the square root of the result (note that, unlike the geometric mean, this weights the larger values more heavily rather than smoothing them). It is used throughout electrical work, and most electrical meters are calibrated to read RMS rather than average values. Hope this helps a little. Here is a web site that explains it pretty well: standardwisdom.com
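To make the frequency example concrete, here's a tiny Python sketch with hypothetical band edges:
import math
f_low, f_high = 88e6, 108e6                   # hypothetical band edges in Hz
centre_geometric = math.sqrt(f_low * f_high)  # ~97.5 MHz, same ratio to both edges
centre_arithmetic = (f_low + f_high) / 2      # 98.0 MHz, same offset from both edges
print(centre_geometric, centre_arithmetic)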

Related

What is Distance-Sensitive Data and how does it differ from other data? Any examples would be helpful

I was reading about the classification algorithm kNN and came across the term Distance-Sensitive Data. I was not able to find out what exactly Distance-Sensitive Data is, what its classifications are, or how to tell whether our data is distance-sensitive or not.
Suppose that xi and xj are vectors of observed features in cases i and j. Then, as you probably know, kNN is based on distances ||xi-xj||, such as the Euclidean one.
Now if xi and xj contain just a single feature, say an individual's height in meters, we are fine, as there are no other "competing" features. Suppose that next we add annual salary in dollars. Consequently, we look at distances between vectors like (1.7, 50000) and (1.8, 100000).
Then, in the case of the Euclidean distance, the salary feature clearly dominates height, and it's almost like we are using the salary feature alone. That is,
||xi - xj||_2 ≈ |50000 - 100000|.
However, if the two features actually have similar importance, then we are doing a poor job. It is even worse if salary is actually irrelevant and we should be using height alone. Interestingly, under weak conditions our classifier still has nice properties, such as universal consistency, even in such bad situations. The problem is that in finite samples the performance of our classifier is very bad, so the convergence is very slow.
So, to deal with that, one may want to consider different distances that do something about the scale. Commonly people standardize each feature (set the mean to zero and the variance to 1), but that's not a complete solution either. There are various proposals for what could be done (see, e.g., here).
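As a toy illustration of the scale problem and the standardization fix (made-up numbers, with a third case added just so the standardization has something to work with):
import numpy as np
X = np.array([[1.70, 50_000.0],    # height in metres, salary in dollars
              [1.80, 100_000.0],
              [1.65, 52_000.0]])
# Raw Euclidean distance: the salary column dominates completely.
raw_d = np.linalg.norm(X[0] - X[1])           # ~50000, height barely contributes
# Standardize each feature (zero mean, unit variance) before computing distances.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
std_d = np.linalg.norm(X_std[0] - X_std[1])   # both features now contribute comparably
print(raw_d, std_d)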
On the other hand, algorithms based on decision trees do not suffer from this. In those cases we just look for a point at which to split the variable. For instance, if salary takes values in [0, 100000] and the split is at 40000, then Salary/10 would be split at 4000, so the results would not change.

When are precision and recall inversely related?

I am reading about precision and recall in machine learning.
Question 1: When are precision and recall inversely related? That is, when does the situation occur where you can improve your precision but at the cost of lower recall, and vice versa? The Wikipedia article states:
Often, there is an inverse relationship between precision and recall, where it is possible to increase one at the cost of reducing the other. Brain surgery provides an obvious example of the tradeoff.
However, I have seen research experiment results where both precision and recall increase simultaneously (for example, as you use different or more features).
In what scenarios does the inverse relationship hold?
Question 2: I'm familiar with the precision and recall concept in two fields: information retrieval (e.g. "return 100 most relevant pages out of a 1MM page corpus") and binary classification (e.g. "classify each of these 100 patients as having the disease or not"). Are precision and recall inversely related in both or one of these fields?
The inverse relation only holds when you have some parameter in the system that you can vary in order to get more/fewer results. Then there's a straightforward relationship: you lower the threshold to get more results, and among them some are TPs and some FPs. This doesn't mean that precision and recall always move in strict opposition - the real relationship can be mapped using the ROC curve. As for Q2, likewise, in both of these tasks precision and recall are not necessarily inversely related.
So how do you increase recall or precision without hurting the other? Usually by improving the algorithm or model. That is, when you just change the parameters of a given model, the inverse relationship will usually hold, although bear in mind that it will usually be non-linear as well. But if you, for example, add more descriptive features to the model, you can increase both metrics at once.
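To make the threshold effect concrete, here's a toy Python sketch (the labels, scores and thresholds are all made up):
y_true = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]                        # ground-truth labels
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # classifier scores
for threshold in (0.25, 0.45, 0.75):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y_true) if p == 0 and t == 1)
    # With these numbers, raising the threshold trades recall for precision.
    print(f"threshold={threshold:.2f} precision={tp / (tp + fp):.2f} recall={tp / (tp + fn):.2f}")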
Regarding the first question, I interpret these concepts in terms of how restrictive your results must be.
If you're more restrictive, that is, more demanding about the correctness of the results, you want them to be more precise. For that, you might be willing to reject some correct results as long as everything you get is correct. Thus, you're raising your precision and lowering your recall. Conversely, if you do not mind getting some incorrect results as long as you get all the correct ones, you're raising your recall and lowering your precision.
On what concerns the second question, if I look at it from the point of view of the paragraphs above, I can say that yes, they are inversely related.
To the best of my knowledge, in order to increase both precision and recall, you'll need either a better model (more suitable for your problem) or better data (or both, actually).

Complex interpolation on an FPGA

I have a problem in that I need to implement an algorithm on an FPGA that requires a large array of data that is too large to fit into block or distributed memory. The array contains complex fixed-point values, and it turns out that I can do a good job by reducing the total number of stored values through decimation and then linearly interpolating the interim values on demand.
Though I have DSP blocks (and so fixed-point hardware multipliers) which could be used trivially for real and imaginary part interpolation, I actually want to do the interpolation on the amplitude and angle (of the polar form of the complex number) and then convert the result to real-imaginary form. The data can be stored in polar form if it improves things.
I think my question boils down to this: How should I quickly convert between polar complex numbers and real-imaginary complex numbers (and back again) on an FPGA (noting availability of DSP hardware)? The solution need not be exact, just close, but be speed optimised. Alternatively, better strategies are gladly received!
Edit: I know about CORDIC techniques, so that is how I would do it in the absence of a better idea. Are there refinements specific to this problem I could invoke?
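For reference, the CORDIC rotation I have in mind looks like this as a floating-point model (just a sketch to pin down the scheme, not FPGA code; without a quadrant pre-rotation it only converges for angles within roughly +/-99 degrees):
import math
def cordic_cos_sin(theta, iterations=16):
    # Rotation-mode CORDIC: rotate (K, 0) by theta to land on (cos theta, sin theta),
    # where K pre-compensates the accumulated CORDIC gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    K = 1.0
    for i in range(iterations):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = K, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0              # steer the residual angle towards zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y
r, theta = 2.5, 0.7                              # hypothetical polar sample
c, s = cordic_cos_sin(theta)
real, imag = r * c, r * s                        # polar -> rectangular, approx r*cos, r*sin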
Another edit: Following from #mbschenkel's question, and some more thinking on my part, I wanted to know if there were any known tricks specific to the problem of polar interpolation.
In my case, the dominant variation between samples is a phase rotation, with a slowly varying amplitude. Since the sampling grid is known ahead of time and is regular, one trick could be to precompute some complex interpolation factors. So, for two complex values a and b, if we wish to find (N-1) intermediate equally spaced values, we can precompute the factor
scale = (abs(b)/abs(a))**(1/N) * exp(1j*(angle(b)-angle(a))/N)
and then find each intermediate value iteratively as val[n] = scale * val[n-1] where val[0] = a.
This works well for me as I need the samples in order and I compute them all. For small variations in amplitude (i.e. abs(b)/abs(a) ~= 1) and 0 < n < N, (abs(b)/abs(a))**(n/N) is approximately linear (though linear is not necessarily better).
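In software the scheme looks like this (a plain Python model for checking the maths, not the FPGA implementation; a, b and N are made-up values, and the phase step is assumed to be small enough not to wrap):
import cmath
a, b = 1.0 + 0.5j, -0.3 + 1.2j     # hypothetical adjacent stored samples
N = 4                              # number of equal steps from a to b
scale = (abs(b) / abs(a)) ** (1.0 / N) * cmath.exp(1j * (cmath.phase(b) - cmath.phase(a)) / N)
val = a
for n in range(1, N + 1):
    val = scale * val              # one complex multiply per interpolated sample
print(val, b)                      # after N steps val lands back on b (up to rounding)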
The above is all very good, but still results in a complex multiplication. Are there other options for approximating this? I'm interested in resource and speed constraints, not accuracy. I know I can do the rotation with CORDIC, but still need a pair of multiplications for the scaling, so I'm adding lots of complexity and resource usage for potentially limited results. I don't really have a feel for the convergence of CORDIC, so perhaps I just truncate early, or use lots of resources to converge quickly.

Genetic Algorithm, large population vs small one

I'm wondering if there is a general rule of thumb for population sizing. I've read in a book that 2x the chromosome length is a good starting point. Am I correct in assuming then that if I had an equation with 5 variables, I should have a population of 10?
I'm also wondering if the following is correct:
Larger Population Size.
Pros:
Larger diversity so more likely to pick up on traits which return a good fitness.
Cons:
Requires longer to process.
vs
Smaller Population Size.
Pros:
Larger number of generations experienced per unit time.
Cons:
Mutation will have to be more prominent in order to compensate for the smaller population?
EDIT
A little additional info: say I have an equation which has 5 unknown parameters. For each parameter I have anywhere between 10 and 50 values I would like to try assigning. So, for example:
variable1 = 20 different values
variable2 = 15 different values
...
I thought a GA would be a decent approach to such a problem, as the search space is quite large: the worst case for the above is the product of the per-variable counts, i.e. 50 * 50 * 50 * 50 * 50 = 312,500,000 combinations (unless I have screwed up?).
Unfortunately the number of parameters and the range of values to check can vary a lot, so I was looking for some sort of rule of thumb as to how large I should set the population.
Thanks for your help, and if there is any more info you need, or you'd prefer to discuss this in one of the chat rooms, just give me a shout.
I'm not sure where you read that 2x the chromosome length is a good starting point, but I'm guessing it's a book that concentrated on larger problems.
If you only have five variables, a genetic algorithm is probably not the right choice for converging upon a solution. With a chromosome length of five you're probably going to find that you very quickly reach a non-deterministic local minimum (one that will change in subsequent runs) and slowly iterate around that space until you find the true local minimum.
However, if you are insistent on using a GA I would suggest abandoning that rule of thumb for this problem and really think about starting population as a measure of how far from the final solution you expect a random solution to be.
The reason that many rules of thumb depend on chromosome length is that it's a decent proxy for this: if I have a hundred variables, a randomly generated DNA sequence is going to be further from the ideal than if I had only one variable.
Additionally, if you're worried about computation intensity I'm going to go ahead and say that it shouldn't be an issue since you're dealing with such a small solution set. I think a better rule of thumb for smaller sets like this would be along the lines of:
(ln(chromosome_length*(solution_space/granularity)/mutation_rate))^2
Probably with a constant thrown in to scale for the particular problem.
It's definitely not a great rule of thumb (no rule is) but here's my logic for it:
Chromosome length is just a proxy for size of solution space, so taking into account the size of the solution space will necessarily increase the accuracy of this proxy
A smaller mutation rate necessitates a larger population size to compensate for the fact that you are more prone to get caught in local minima
Any rule of thumb should scale logarithmically since a genetic algorithm is akin to a tree search of your solution space.
The squared term was mostly the result of trying this out, but it looks like the logarithmic scaling was a little aggressive, though the general shape seemed right.
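To get a feel for the numbers, here's that rule of thumb evaluated in Python with values plugged in from the question (the granularity and mutation rate below are pure assumptions for illustration):
import math
chromosome_length = 5
solution_space = 312_500_000   # worst-case number of combinations from the question
granularity = 1                # the values are already discrete, so assume 1
mutation_rate = 0.01           # assumed mutation rate
population = math.log(chromosome_length * (solution_space / granularity) / mutation_rate) ** 2
print(round(population))       # about 660 with these made-up inputs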
However I think a better choice would be to start at a reasonable number (100) and try iterating up and down until you find a population size that seems to balance accuracy with execution speed.
As with most genetic algorithm parameters, population size is highly dependent on the problem. There are certain factors that can help point in the direction of whether you should have a large or small population size, but a lot of the time testing different values against a known solution before running it on your problem is a good idea (if this is possible, of course).
A population size of 10 does seem rather small though. You say you have an equation with five variables. Is your problem represented by a chromosome of 5 values? It seems small for a chromosome and if this is the case it's likely that using a genetic algorithm may not be the best way to solve the problem. Perhaps if you give a bit more detail on your problem and how you are representing it people may have a better idea of how to advise you.
I'd also add that your cons for large and small population sizes aren't exactly correct. A larger population size does take longer to process than a small one, but since it can often solve the problem in fewer generations, the overall processing time isn't necessarily longer. Again, it's highly dependent on the problem. With a smaller population size, mutation shouldn't have to be more prominent. Mutation is generally used to stop the genetic algorithm from becoming stuck in a local maximum and should usually be a very small value. A small population is more likely to become stuck in a local maximum, but if you have a mutation value which is too high you may be nullifying the natural improvement of the genetic algorithm.

Levenshtein Distance Algorithm better than O(n*m)?

I have been looking for an advanced Levenshtein distance algorithm, and the best I have found so far is O(n*m), where n and m are the lengths of the two strings. The reason the algorithm is at this scale is because of space, not time: it builds a matrix indexed by the two strings.
Is there a publicly available Levenshtein algorithm which is better than O(n*m)? I am not averse to looking at advanced computer science papers and research, but I haven't been able to find anything. I have found one company, Exorbyte, which supposedly has built a super-advanced and super-fast Levenshtein algorithm, but of course that is a trade secret. I am building an iPhone app in which I would like to use the Levenshtein distance calculation. There is an Objective-C implementation available, but with the limited amount of memory on iPods and iPhones, I'd like to find a better algorithm if possible.
Are you interested in reducing the time complexity or the space complexity? The average time complexity can be reduced to O(n + d^2), where n is the length of the longer string and d is the edit distance. If you are only interested in the edit distance, and not in reconstructing the edit sequence, you only need to keep the last two rows of the matrix in memory, so that will be O(n).
If you can afford to approximate, there are poly-logarithmic approximations.
For the O(n + d^2) algorithm, look for Ukkonen's optimization or its enhancement, Enhanced Ukkonen. The best approximation that I know of is the one by Andoni, Krauthgamer and Onak.
If you only want the threshold function - e.g., to test whether the distance is under a certain threshold - you can reduce the time and space complexity by only calculating the values within that threshold either side of the main diagonal in the array. You can also use Levenshtein automata to evaluate many words against a single base word in O(n) time - and the construction of the automata can be done in O(m) time, too.
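As a rough sketch of that banded idea (plain Python; within_distance and the parameter k are names I've made up for this example):
def within_distance(a, b, k):
    # Only fill DP cells within k of the main diagonal, so the work is O(len(a) * k).
    # Cells outside the band count as "too far" (k + 1); values above k are not exact,
    # but the yes/no answer is.
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return False
    INF = k + 1
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [INF] * (m + 1)
        if i <= k:
            curr[0] = i
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[m] <= k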
Look at the Wikipedia article - it has some ideas for improving this algorithm's space complexity:
Wiki-Link: Levenshtein distance
Quoting:
We can adapt the algorithm to use less space, O(m) instead of O(mn), since it only requires that the previous row and current row be stored at any one time.
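A minimal sketch of that two-row version in plain Python (the function name is just for this example):
def levenshtein(a, b):
    # Keep only the previous and current rows of the DP matrix: O(m) extra space.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]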
I found another optimization that claims to be O(max(m, n)):
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C
(the second C implementation)
