Most meaningful way to compare multiple time series

I need to write a program that performs arithmetic (+-*/) on multiple time series with different date ranges (mostly from 2007-2009) and frequencies (weekly, monthly, yearly...).
I came up with:
find the series with the highest frequency, then fill in the other series with zeros so they all have the same number of elements, then perform the operation.
How can I present the data in the most meaningful way?
I'm trying to think of all the possibilities.

If zero can be a meaningful value for this time series (e.g. temperature in degrees Celsius), filling all the gaps with zeros might not be a good idea, since you will not be able to distinguish the real values from the stub values afterwards. You might want to interpolate your time series instead. A basic data structure for this can be an array or a doubly linked list.
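For illustration, a minimal sketch of this idea (Python with pandas is my assumption here, since the question doesn't name a language): mark the gaps as missing and interpolate, rather than writing zeros into them.

    import pandas as pd
    import numpy as np

    # Hypothetical weekly series with two missing observations.
    idx = pd.date_range("2007-01-07", periods=8, freq="W")
    values = [20.1, 19.8, np.nan, np.nan, 21.5, 22.0, 21.7, 21.9]
    s = pd.Series(values, index=idx)

    # Filling with zeros makes the stub values indistinguishable from a real 0 reading.
    zero_filled = s.fillna(0.0)

    # Linear interpolation keeps the gap values consistent with their neighbours,
    # and the missing-value mask (s.isna()) still records which points were observed.
    interpolated = s.interpolate(method="linear")

    print(zero_filled)
    print(interpolated)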

You can take several approaches:
use the finest-grained time series data (for instance, seconds) and interpolate/fill data when needed
use the coarsest-grained (for instance, years) and summarize data when needed
any middle step between the two extremes
You should always know your data, because:
in case of interpolating you have to choose the best algorithm (linear or quadratic interpolation, splines, exponential...)
in case of summarizing you have to choose an appropriate aggregation function (sum, maximum, mean...)
Once you have the same time scales for all the time series you can perform your arithmetical magick, but be aware that interpolation generates extra information, and summarization removes available information.
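A rough sketch of both directions, again assuming Python/pandas; the frequencies, aggregation, and interpolation choices below are placeholders to adapt to your data:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)

    # Hypothetical series at different frequencies over roughly 2007-2008.
    weekly  = pd.Series(rng.random(104), index=pd.date_range("2007-01-07", periods=104, freq="W"))
    monthly = pd.Series(rng.random(24),  index=pd.date_range("2007-01-01", periods=24,  freq="MS"))

    # Summarize the finer series down to the coarser scale
    # (pick the aggregation that fits the data: mean, sum, max, ...).
    weekly_as_monthly = weekly.resample("MS").mean()
    diff_coarse = weekly_as_monthly - monthly          # arithmetic on the common monthly scale

    # Or interpolate the coarser series up to the finer scale
    # (pick the interpolation that fits the data: linear, spline, ...).
    monthly_as_weekly = monthly.resample("W").interpolate(method="linear")
    diff_fine = weekly - monthly_as_weekly             # pandas aligns on dates; non-overlapping dates give NaN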

I've studied this problem fairly extensively. The danger of interpolation methods is that you bias various measures - especially volatility - and introduce spurious correlation. I found that Fourier interpolation mitigated this to some extent, but the better approach is to go the other way: aggregate your more frequent observations to match the periodicity of your less frequent series, then compare these.

Related

Machine learning algorithm to predict/find/converge to correct parameters in mathematical model

I am currently trying to find a machine learning algorithm that can predict about 5 - 15 parameters used in a mathematical model (MM). The MM has 4 different ordinary differential equations (ODEs), and a few more will be added, so more parameters will be needed. Most of the parameters can be measured, but others need to be guessed. We know all 15 parameters, but we want the computer to guess 5 or even 10 of them. To test whether guessed parameters are correct, we fill them into the MM and solve the ODEs with a numerical method. Subsequently we calculate the error between the output of the model with the parameters we know (and want to guess) and the output of the MM with the guessed parameters. Calculating the values of the model's ODEs is done multiple times: the ODEs represent one minute in real time and we calculate for 24 hours, thus 1440 calculations.
Currently we are using a particle filter to guess the parameters. This works okay, but we want to see if there are any better methods out there for guessing parameters in a model. The particle filter draws a random value for each parameter to be guessed from a range we know for that parameter, e.g. 0.001 - 0.01.
If you can run a lot of full simulations (tens of thousands) you can try black-box optimization. I'm not sure if black-box is the right approach for you (I'm not familiar with particle filters). But if it is, CMA-ES is a clear match here and easy to try.
You have to specify a loss function (e.g. the total sum of squared errors for a whole simulation) and an initial guess (mean and sigma) for your parameters. Among black-box algorithms, CMA-ES is a well-established baseline. It is hard to beat if you have only a few (at most a few hundred) continuous parameters and no gradient information. However, anything less black-box that can, for example, exploit the ODE structure of your problem will do better.
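For concreteness, a minimal sketch assuming Python, the pycma package (pip install cma), scipy for the integration, and a toy two-equation stand-in for your model; the equations, parameter values, bounds, and loss below are placeholders, not your actual MM:

    import numpy as np
    import cma                                    # pip install cma
    from scipy.integrate import solve_ivp

    # Toy stand-in for the real model: two ODEs with three unknown parameters.
    def rhs(t, y, p0, p1, p2):
        return [p0 * y[0] - p1 * y[0] * y[1],
                p2 * y[0] * y[1] - y[1]]

    t_eval = np.linspace(0.0, 24.0 * 60.0, 1441)  # one point per simulated minute over 24 h

    def simulate(p):
        return solve_ivp(rhs, (t_eval[0], t_eval[-1]), y0=[1.0, 0.5],
                         t_eval=t_eval, args=tuple(p))

    target = simulate([0.004, 0.02, 0.01]).y      # reference run with the "known" parameters

    def loss(p):
        sol = simulate(p)
        if not sol.success:
            return 1e12                           # penalize candidates the integrator cannot handle
        return float(np.sum((sol.y - target) ** 2))   # total sum of squared errors

    # Initial guess (mean) and step size (sigma); bounds keep each parameter in its known range.
    es = cma.CMAEvolutionStrategy([0.005, 0.005, 0.005], 0.002,
                                  {"bounds": [[0.001] * 3, [0.1] * 3], "maxfevals": 2000})
    es.optimize(loss)
    print("recovered parameters:", es.result.xbest)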

When do you control for initial judgment vs. take the difference between first and second judgment?

I am analyzing data for my dissertation, and I have participants see initial information, make judgments, see additional information, and make the same judgments again. I don't know how or if I need to control for these initial judgments when doing analyses about the second judgments.
I understand that the first judgments cannot be covariates because they are affected by my IV/manipulations. Also, I only expect the second judgments to change for some conditions, so if I use the difference between first and second judgments, I only expect that to change for two of my four conditions.
A common way to handle comparisons between the first and second judgments would be as paired data. If condition is a between-subjects factor, then a between x within design using repeated measures ANOVA may apply, or, for judgments whose scaling doesn't justify the assumptions needed by linear models, a generalized linear model setup that handles repeated measurements. In SPSS, for linear models you can set up the judgments as two different variables and condition as a third, then use Analyze > General Linear Models > Repeated Measures. For generalized linear models you can work with generalized estimating equations (GEE) or mixed models, though these require a fair amount of data to be reliable; in the menus, these are Analyze > Generalized Linear Models > Generalized Estimating Equations and Analyze > Mixed Models > Generalized Linear, respectively. Each of these requires the repeated-measures data to be in the "long" or "narrow" format, where you have a subject ID variable, a time index, the judgment variable, and the condition variable, so you'd have two cases per subject, one for each time point.
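Outside SPSS, the same long-format setup and both model types can be sketched, for example in Python with pandas and statsmodels; the variable names and data below are hypothetical placeholders:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical wide-format data: one row per participant.
    rng = np.random.default_rng(0)
    n = 20
    wide = pd.DataFrame({
        "subject":   np.arange(n),
        "condition": np.repeat(["A", "B"], n // 2),
        "judgment1": rng.normal(4.0, 1.0, n),    # first judgment
        "judgment2": rng.normal(4.5, 1.0, n),    # second judgment
    })

    # Reshape to the "long"/"narrow" format: subject ID, time index, judgment, condition.
    long = wide.melt(id_vars=["subject", "condition"],
                     value_vars=["judgment1", "judgment2"],
                     var_name="time", value_name="judgment")
    long["time"] = long["time"].map({"judgment1": 1, "judgment2": 2})

    # Between x within model: random intercept per subject, condition x time interaction.
    mixed = smf.mixedlm("judgment ~ C(condition) * C(time)", long, groups=long["subject"]).fit()
    print(mixed.summary())

    # GEE alternative with an exchangeable working correlation within subject.
    gee = smf.gee("judgment ~ C(condition) * C(time)", groups="subject", data=long,
                  cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(gee.summary())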

What does it mean to have zero mean in the data?

I'm trying to find ways to normalize my dataset (represented as a matrix, with documents as rows and features as columns) and I came across a technique called feature scaling. I found a Wikipedia article on it here.
One of the methods listed is Standardization which says "Feature standardization makes the values of each feature in the data have zero-mean and unit-variance." What does that mean (no pun intended)?
In this method, "we subtract the mean from each feature. Then we divide the values (mean is already subtracted) of each feature by its standard deviation." When they say 'subtract the mean', is it the mean of the entire matrix or the mean of the column pertaining to that feature?
Also, if this feature scaling method is applied, does the mean not have to be subtracted from columns when performing Principal Component Analysis (PCA) on the data?
The basic idea is to do a simple (and reversible) transformation on your data set to make it easier to handle. You are subtracting a constant from each column and then dividing each column by a (different) constant. Those constants are column-specific.
When they say 'subtract the mean', is it the mean of the entire matrix
or the mean of the column pertaining to that feature?
The mean of the column pertaining to that feature.
...does the mean not have to be subtracted from columns when performing Principal Component Analysis (PCA) on the data?
Correct. PCA requires data with a mean of zero. Usually this is enforced by subtracting the mean as a first step. If the mean has already been subtracted, that step is not required. However, there is no harm in performing the "subtract the mean" operation twice: the second time the mean will already be zero, so nothing will change. Formally, we might say that standardization is idempotent.
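A small numpy sketch of both points; the matrix X here is just a random stand-in for your document-feature matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))   # hypothetical documents x features matrix

    # Column-wise standardization: subtract each feature's (column's) mean, divide by its std.
    def standardize(M):
        return (M - M.mean(axis=0)) / M.std(axis=0)

    Z = standardize(X)
    print(Z.mean(axis=0))                  # ~0 for every column
    print(Z.std(axis=0))                   # 1 for every column

    # Applying it again changes nothing (means are already 0, stds already 1).
    print(np.allclose(standardize(Z), Z))

    # PCA via the covariance matrix needs centered data; Z is already centered,
    # so subtracting the mean again would be a no-op.
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)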
From looking at the article, my understanding is that you would subtract the mean of that feature. This gives you data for the feature with the same layout as before, but normalized.
Imagine you added data for a new feature. You're probably going to want the data for your original features to remain the same, and not be influenced by the new feature.
I guess you would still get a "standardized" range of values if you subtracted the mean of the whole data set, but that would be something different - you're probably more interested in how the data of a single feature lies around its mean.
You could also have a look (or ask the question) on math.stackexchange.com.

Complex interpolation on an FPGA

I have a problem in that I need to implement an algorithm on an FPGA that requires a large array of data that is too large to fit into block or distributed memory. The array contains complex fixed-point values, and it turns out that I can do a good job by reducing the total number of stored values through decimation and then linearly interpolating the interim values on demand.
Though I have DSP blocks (and so fixed-point hardware multipliers) which could be used trivially for real and imaginary part interpolation, I actually want to do the interpolation on the amplitude and angle (of the polar form of the complex number) and then convert the result to real-imaginary form. The data can be stored in polar form if it improves things.
I think my question boils down to this: How should I quickly convert between polar complex numbers and real-imaginary complex numbers (and back again) on an FPGA (noting availability of DSP hardware)? The solution need not be exact, just close, but be speed optimised. Alternatively, better strategies are gladly received!
edit: I know about cordic techniques, so this would be how I would do it in the absence of a better idea. Are there refinements specific to this problem I could invoke?
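For reference, a behavioural sketch of rotation-mode CORDIC (plain Python floating point for clarity, not an FPGA-ready fixed-point design), showing the polar-to-rectangular conversion with shift-and-add iterations; the iteration count is the speed/accuracy knob:

    import math

    N_ITERS = 16                                         # accuracy vs. speed trade-off
    ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITERS)]
    GAIN = 1.0
    for i in range(N_ITERS):
        GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))         # CORDIC gain, ~1.6468 for many iterations

    def polar_to_rect(mag, angle):
        # Rotation mode: rotate (mag/GAIN, 0) by `angle` using only shifts and adds.
        x, y = mag / GAIN, 0.0                           # pre-scale to cancel the gain
        z = angle                                        # residual angle still to rotate by
        for i in range(N_ITERS):
            d = 1.0 if z >= 0.0 else -1.0
            x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
            z -= d * ANGLES[i]
        return x, y                                      # ~ (mag*cos(angle), mag*sin(angle))

    # Quick check against the reference (valid for |angle| up to ~1.74 rad; larger angles
    # need a quadrant pre-rotation first).
    print(polar_to_rect(2.0, 0.5))
    print(2.0 * math.cos(0.5), 2.0 * math.sin(0.5))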
Another edit: Following from @mbschenkel's question, and some more thinking on my part, I wanted to know if there were any known tricks specific to the problem of polar interpolation.
In my case, the dominant variation between samples is a phase rotation, with a slowly varying amplitude. Since the sampling grid is known ahead of time and is regular, one trick could be to precompute some complex interpolation factors. So, for two complex values a and b, if we wish to find (N-1) intermediate equally spaced values, we can precompute the factor
scale = (abs(b)/abs(a))**(1/N) * exp(1j*(angle(b)-angle(a))/N)
and then find each intermediate value iteratively as val[n] = scale * val[n-1] where val[0] = a.
This works well for me as I need the samples in order and I compute them all. For small variations in amplitude (i.e. abs(b)/abs(a) ~= 1) and 0 < n < N, (abs(b)/abs(a))**(n/N) is approximately linear (though linear is not necessarily better).
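A quick numerical check of this scheme (Python/numpy, arbitrary test values; note the phase step follows the principal value of the angle difference):

    import numpy as np

    a, b = 1.0 + 0.5j, -0.3 + 1.2j
    N = 8                                                # number of steps between a and b

    # Precomputed per-step factor: N-th root of the amplitude ratio, 1/N of the phase difference.
    scale = (abs(b) / abs(a)) ** (1.0 / N) * np.exp(1j * (np.angle(b) - np.angle(a)) / N)

    val = [a]
    for n in range(1, N + 1):
        val.append(scale * val[-1])                      # one complex multiply per sample

    print(val[N], b)                                     # val[N] lands on b (up to rounding)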
The above is all very good, but still results in a complex multiplication. Are there other options for approximating this? I'm interested in resource and speed constraints, not accuracy. I know I can do the rotation with CORDIC, but still need a pair of multiplications for the scaling, so I'm adding lots of complexity and resource usage for potentially limited results. I don't really have a feel for the convergence of CORDIC, so perhaps I just truncate early, or use lots of resources to converge quickly.

Levenshtein Distance Algorithm better than O(n*m)?

I have been looking for an advanced Levenshtein distance algorithm, and the best I have found so far is O(n*m), where n and m are the lengths of the two strings. The reason the algorithm is at this scale is space, not time: it builds an n-by-m matrix over the two strings.
Is there a publicly available Levenshtein algorithm that is better than O(n*m)? I am not averse to looking at advanced computer science papers and research, but I haven't been able to find anything. I have found one company, Exorbyte, which supposedly has built a super-advanced and super-fast Levenshtein algorithm, but of course that is a trade secret. I am building an iPhone app in which I would like to use the Levenshtein distance calculation. There is an Objective-C implementation available, but with the limited amount of memory on iPods and iPhones, I'd like to find a better algorithm if possible.
Are you interested in reducing the time complexity or the space complexity? The average time complexity can be reduced to O(n + d^2), where n is the length of the longer string and d is the edit distance. If you are only interested in the edit distance and not in reconstructing the edit sequence, you only need to keep the last two rows of the matrix in memory, so the space will be O(n).
If you can afford to approximate, there are poly-logarithmic approximations.
For the O(n + d^2) algorithm, look for Ukkonen's optimization or its enhancement, Enhanced Ukkonen. The best approximation that I know of is the one by Andoni, Krauthgamer and Onak.
If you only want the threshold function - e.g., to test whether the distance is under a certain threshold - you can reduce the time and space complexity by only calculating the n values on either side of the main diagonal in the array. You can also use Levenshtein automata to evaluate many words against a single base word in O(n) time - and the construction of the automata can be done in O(m) time, too.
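A sketch of the banded idea in Python (I use k for the threshold; two full-length rows are kept for clarity, since the point here is the time saving): only cells within k of the main diagonal are filled in, and the function reports the distance only if it is at most k.

    def levenshtein_within(a, b, k):
        # Edit distance if it is <= k, otherwise None; only cells within k of the diagonal are filled.
        if abs(len(a) - len(b)) > k:
            return None                            # the length difference alone already exceeds k
        INF = k + 1                                # stand-in for "definitely more than k"
        prev = [j if j <= k else INF for j in range(len(b) + 1)]
        for i in range(1, len(a) + 1):
            curr = [INF] * (len(b) + 1)
            if i <= k:
                curr[0] = i
            lo, hi = max(1, i - k), min(len(b), i + k)
            for j in range(lo, hi + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                curr[j] = min(prev[j] + 1,         # deletion
                              curr[j - 1] + 1,     # insertion
                              prev[j - 1] + cost)  # substitution / match
            if min(curr) > k:
                return None                        # the whole row already exceeds k
            prev = curr
        return prev[-1] if prev[-1] <= k else None

    print(levenshtein_within("kitten", "sitting", 3))   # 3
    print(levenshtein_within("kitten", "sitting", 2))   # None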
Look at the Wikipedia article - it has some ideas for improving this algorithm to a better space complexity:
Wiki-Link: Levenshtein distance
Quoting:
We can adapt the algorithm to use less space, O(m) instead of O(mn), since it only requires that the previous row and current row be stored at any one time.
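A sketch of that two-row version in Python:

    def levenshtein(a, b):
        # Full edit distance using only two rows of the DP matrix: O(len(b)) space.
        prev = list(range(len(b) + 1))               # distances from "" to prefixes of b
        for i, ca in enumerate(a, start=1):
            curr = [i]                               # distance from a[:i] to ""
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1,         # deletion
                                curr[j - 1] + 1,     # insertion
                                prev[j - 1] + cost)) # substitution / match
            prev = curr
        return prev[len(b)]

    print(levenshtein("kitten", "sitting"))          # 3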
I found another optimization that claims to be O(max(m, n)):
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C
(the second C implementation)
