I'd like to estimate the predictability of a time series that is possibly chaotic. For this reason I thought that the Lyapunov exponents would be a good candidate.
I've done some research on the Internet but the explanations that I've found are not enough (at least to me) to implement it in Java.
Do you know where I could get a detailed algorithm / pseudocode for the Lyapunov exponents estimation for time series?
Alternatively, other measurements of time series predictability are welcome.
Thank you!
You can use Thomas S. Parker and Leon Chua's book, Practical Numerical Algorithms for Chaotic Systems. It is a good, practical reference.
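Since the question asks for pseudocode, here is a rough sketch in Java of one common approach, the Rosenstein et al. (1993) method for estimating the largest Lyapunov exponent from a scalar time series. This is a minimal sketch under assumed parameters (embedding dimension, delay, Theiler window, number of divergence steps are all illustrative and would need tuning, e.g. via false nearest neighbours and mutual information), not a production implementation.

```java
// Minimal sketch of the Rosenstein (1993) method for estimating the largest
// Lyapunov exponent of a scalar time series. All parameter values are
// illustrative assumptions and must be tuned for real data.
public class LyapunovSketch {

    public static double largestLyapunov(double[] x, int dim, int tau,
                                         int theilerWindow, int maxSteps, double dt) {
        int n = x.length - (dim - 1) * tau;            // number of embedded points

        // 1. Delay embedding: point i = (x[i], x[i+tau], ..., x[i+(dim-1)*tau]).
        double[][] emb = new double[n][dim];
        for (int i = 0; i < n; i++)
            for (int d = 0; d < dim; d++)
                emb[i][d] = x[i + d * tau];

        // 2. For each point, find its nearest neighbour that is at least
        //    theilerWindow steps away in time (to avoid trivial neighbours).
        int[] nn = new int[n];
        for (int i = 0; i < n; i++) {
            double best = Double.MAX_VALUE;
            nn[i] = -1;
            for (int j = 0; j < n; j++) {
                if (Math.abs(i - j) <= theilerWindow) continue;
                double d2 = 0;
                for (int d = 0; d < dim; d++) {
                    double diff = emb[i][d] - emb[j][d];
                    d2 += diff * diff;
                }
                if (d2 < best) { best = d2; nn[i] = j; }
            }
        }

        // 3. Follow each pair forward in time and average the log divergence.
        double[] avgLogDist = new double[maxSteps];
        int[] counts = new int[maxSteps];
        for (int i = 0; i < n; i++) {
            int j = nn[i];
            if (j < 0) continue;
            for (int k = 0; k < maxSteps && i + k < n && j + k < n; k++) {
                double d2 = 0;
                for (int d = 0; d < dim; d++) {
                    double diff = emb[i + k][d] - emb[j + k][d];
                    d2 += diff * diff;
                }
                if (d2 > 0) {
                    avgLogDist[k] += 0.5 * Math.log(d2);   // log of the distance
                    counts[k]++;
                }
            }
        }
        for (int k = 0; k < maxSteps; k++)
            if (counts[k] > 0) avgLogDist[k] /= counts[k];

        // 4. The exponent is the slope of avgLogDist vs. time over the initial,
        //    roughly linear region; here a crude least-squares fit over all
        //    maxSteps points is used for brevity.
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int k = 0; k < maxSteps; k++) {
            double t = k * dt;
            sx += t; sy += avgLogDist[k]; sxx += t * t; sxy += t * avgLogDist[k];
        }
        return (maxSteps * sxy - sx * sy) / (maxSteps * sxx - sx * sx);
    }
}
```

A clearly positive result suggests sensitive dependence on initial conditions (limited predictability); a value near zero or negative suggests a more regular series. In practice you should inspect the divergence curve and fit the slope only over its linear portion rather than over all steps as this sketch does.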
Almost a year ago, a question (stackoverflow.com/questions/71027922) was asked concerning computing Hessians or other higher order derivatives using AutoDiff. I was wondering whether there has been any movement on this front.
I've found that for Jacobians, Drake's AutoDiff has much better performance (often over 10x faster in wall-clock time) than other frameworks like JAX for the complicated functions I'm differentiating. To compute second-order derivatives, I've had to analytically compute the first derivative and then differentiate that.
If there are any unofficial workarounds in the meantime that could circumvent the need to compute the analytical first derivative so that I could do something like hessian(function, x), that would be much appreciated!
I'm a programmer who is interested in processing and analyzing time-series data. I know basic statistics and math, but I'm afraid that's all.
Can you please recommend good books and/or articles that do not require a Ph.D. to understand?
As for my concrete tasks - I want to be able to spot trends, eliminate outliers, make predictions, and calculate statistics over a range of values. We have quite a lot of events coming off our systems.
I started reading "Introduction to Time Series and Forecasting" by Brockwell and Davis - and I'm completely lost in math.
Update on outliers: by outliers I mean data points that don't necessarily make sense, e.g. the exchange rate is $1.50 (±10 cents) per pound on average, but a guy around the corner offers $1.09 and says he's completely legit.
I've found the NIST Engineering Statistics Handbook's chapter on time series to be a simple and clear introduction to basic time series modeling. It discusses exponential smoothing, auto-regressive, moving average, and eventually ARMA time series modeling. These can be used for trend analysis and possibly prediction, subject to validation.
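To make the exponential smoothing part concrete, here is a minimal sketch of simple exponential smoothing in Java; the smoothing factor alpha = 0.3 and the toy series are purely illustrative assumptions (the NIST handbook discusses how to choose alpha properly).

```java
// Minimal sketch of simple exponential smoothing. alpha (0 < alpha <= 1)
// controls how quickly old observations are forgotten; 0.3 below is
// purely illustrative.
public class ExponentialSmoothing {

    // Returns the smoothed series; the last element can serve as a
    // one-step-ahead forecast.
    public static double[] smooth(double[] y, double alpha) {
        double[] s = new double[y.length];
        s[0] = y[0];                                    // initialise with the first observation
        for (int t = 1; t < y.length; t++)
            s[t] = alpha * y[t] + (1 - alpha) * s[t - 1];
        return s;
    }

    public static void main(String[] args) {
        double[] series = {10, 12, 11, 13, 15, 14, 16}; // made-up example data
        double[] smoothed = smooth(series, 0.3);
        System.out.println("one-step forecast ~ " + smoothed[smoothed.length - 1]);
    }
}
```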
Outlier/anomaly detection is a much different task; the NIST book doesn't have much on this. It would be helpful to know what kind of outliers you are trying to detect.
I've gone through numerous books and articles, and here are my findings. Maybe they will help others like me.
Regarding theory - I found the article "An Introductory Study on Time Series Modeling and Forecasting" very well written. That doesn't mean I understood all of its contents, but it's a really good overview of the available time series models.
If you're like me and like to see some actual code - there's an article series on QuantStart. Examples are in R, but I guess many of them are portable to Python.
I can highly recommend the QuantStart blog by Michael Halls-Moore; I found the articles easy to read, and the author has done a great job of not overwhelming the reader with math. I also read Michael's first book, and it's a good one for a beginner in the space like me.
Textbooks on the topic are extremely hard for me to read. I tried Time Series Analysis by Hamilton, but haven't gotten far.
Regarding outlier detection I mentioned - I've found this question on SO and its stats counterpart. By the looks of it, it's not something you can study and implement in a couple of evenings, at least not for me.
I'm looking to analyze and compare the following 'signals':
(Edit: better renderings here: oscillations good and here: oscillations bad)
What you see are plots of neuron activations from a type of artificial neural network plotted against time. Each line in the plot is a neuron's activation over time which can have a value between -1 and 1.
In the first plot, the activities are stable and consistent, while the second exemplifies more chaotic activity (for want of a better term); some kind of destructive interference seems to occur every so often.
Anyhow, I would like to do some kind of 'clever' analysis, but since signal analysis is really not my strong point, I thought I'd ask for some advice here...
EDIT: Let me clarify a bit. Ultimately, I would like to characterize the data. This could, for example, involve pinpointing correlations between the individual signals contained in each plot. I would also like to measure 'regularity' or data invariance: in the above examples, the upper plot is more regular than the lower plot. I could compute the variance of each signal and take that as a measure, but I was wondering whether some more comprehensive signal-processing technique would be better suited (I'm not sure). In fact, I'm not even sure signal processing is what I really want, now that I think about it. Perhaps some kind of wavelet or Fourier analysis...
For those interested, I am working on the computational modelling of worm locomotion.
You should consult some good books on nonlinear time series analysis. For instance, a measure for the regularity of your signal could be the Lyapunov spectrum. Another possibility would be entropy. If you are interested in the correlation between signals, you could use transfer entropy or Granger causality, or, for neurons, it would be good to look at some measure of phase synchronization. Bayesian approaches could also be worth trying.
But, most importantly, you first need a proper question about what you really want to know. Once you've got that, it is far easier to pick the right tool.
And one final hint: look for tools outside the engineering community. Their tools are mostly linear, but you are dealing with a highly nonlinear system. Wavelets, FFTs, and the like are useful if you don't know anything about your signal and want another perspective on it, but they are not well suited to your kind of problem.
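As one concrete instance of the entropy idea mentioned above, here is a minimal sketch of sample entropy (SampEn), a common regularity measure for short, noisy signals. The choice of SampEn, the template length m = 2, and the tolerance r are my assumptions, not something from the question; they are conventional starting values only.

```java
// Minimal sketch of sample entropy (SampEn) as a regularity measure:
// lower values indicate a more regular/predictable signal.
// m = 2 and r = 0.2 are conventional defaults, used purely as assumptions.
public class SampleEntropy {

    public static double sampEn(double[] x, int m, double r) {
        long countM  = countMatches(x, m, r);       // template matches of length m
        long countM1 = countMatches(x, m + 1, r);   // template matches of length m+1
        if (countM == 0 || countM1 == 0) return Double.POSITIVE_INFINITY;
        return -Math.log((double) countM1 / countM);
    }

    // Counts pairs of templates of the given length whose Chebyshev
    // distance is at most r (self-matches excluded).
    private static long countMatches(double[] x, int len, double r) {
        int n = x.length - len + 1;
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double maxDiff = 0;
                for (int k = 0; k < len; k++)
                    maxDiff = Math.max(maxDiff, Math.abs(x[i + k] - x[j + k]));
                if (maxDiff <= r) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] regular = new double[200];
        for (int t = 0; t < regular.length; t++)
            regular[t] = Math.sin(0.2 * t);           // smooth, regular signal
        System.out.println(sampEn(regular, 2, 0.2));  // a low value is expected
    }
}
```

Applied per neuron (or to a summary signal per plot), a sketch like this would give a single number per trace that should be lower for the stable plot than for the chaotic-looking one.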
Consider an optimization problem of some dimension n: given a set of linear equations (or inequalities) as constraints on the inputs, which together form a convex region, find the maximum/minimum value of some expression that is a linear combination of the inputs (dimensions).
For larger dimensions, these optimization problems can take a long time to solve exactly.
So, can we use machine learning techniques to get an approximate solution in less time?
If we can use machine learning techniques in this context, how should the training set look?
Do you mean "How big should the training set be?" If so, then that is very much a "how long is a piece of string" question. It needs to be large enough for the algorithm being used, and to represent the data that is being modeled.
This doesn't strike me as being especially focused on machine learning, as is typically meant by the term anyway. It's just a straightforward constrained optimization problem. You say that it takes too long to find solutions now, but you don't mention how you're trying to solve the problem.
The simplex algorithm is designed for this sort of problem, but it's exponential in the worst case. Is that what you're trying that's taking too long? If so, there are tons of metaheuristics that might perform well. Tabu search, simulated annealing, evolutionary algorithms, variable depth search, even simple multistart hill climbers. I would probably try something along those lines before I tried anything exotic.
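To illustrate the "simple multistart hill climber" option, here is a minimal sketch in Java for maximizing c·x subject to A·x ≤ b with x ≥ 0. The toy problem data, step size, and iteration counts are all illustrative assumptions; a real LP solver will usually do better on purely linear problems, so treat this only as a template for the metaheuristic idea.

```java
import java.util.Random;

// Minimal sketch of a multistart hill climber for maximizing c.x subject to
// A.x <= b and x >= 0. All problem data and tuning constants below are
// illustrative assumptions, not a recommended configuration.
public class MultistartHillClimber {

    static final Random RNG = new Random(42);

    static boolean feasible(double[][] A, double[] b, double[] x) {
        for (int i = 0; i < A.length; i++) {
            double lhs = 0;
            for (int j = 0; j < x.length; j++) lhs += A[i][j] * x[j];
            if (lhs > b[i]) return false;
        }
        for (double v : x) if (v < 0) return false;
        return true;
    }

    static double objective(double[] c, double[] x) {
        double val = 0;
        for (int j = 0; j < x.length; j++) val += c[j] * x[j];
        return val;
    }

    static double[] solve(double[] c, double[][] A, double[] b,
                          int restarts, int steps, double stepSize) {
        double[] best = null;
        double bestVal = Double.NEGATIVE_INFINITY;
        for (int r = 0; r < restarts; r++) {
            double[] x = new double[c.length];        // start at the origin (feasible if b >= 0)
            double val = objective(c, x);
            for (int s = 0; s < steps; s++) {
                double[] cand = x.clone();
                int j = RNG.nextInt(c.length);        // perturb one random coordinate
                cand[j] += (RNG.nextDouble() * 2 - 1) * stepSize;
                if (feasible(A, b, cand) && objective(c, cand) > val) {
                    x = cand;
                    val = objective(c, x);
                }
            }
            if (val > bestVal) { bestVal = val; best = x; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy problem: maximize 3x + 2y subject to x + y <= 4, x <= 3, y <= 2.
        double[] c = {3, 2};
        double[][] A = {{1, 1}, {1, 0}, {0, 1}};
        double[] b = {4, 3, 2};
        double[] x = solve(c, A, b, 20, 10_000, 0.5);
        System.out.printf("x = %.3f, y = %.3f, objective = %.3f%n",
                x[0], x[1], objective(c, x));
    }
}
```

The same skeleton (random restarts plus a local move operator and a feasibility check) also underlies simulated annealing and tabu search; those differ mainly in how they decide whether to accept a candidate move.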
I am working on testing several machine learning algorithm implementations, checking whether they work as efficiently as described in the papers and making sure they can offer real value to our statistical NLP (Natural Language Processing) platform.
Could you show me some methods for testing an algorithm implementation?
1) What aspects?
2) How?
3) Do I have to follow some basic steps?
4) Do I have to consider specific situations when using different programming languages?
5) Do I have to understand the algorithm? I mean, does it help if I really know what the algorithm is and how it works?
Basically, we are using C or C++ to implement the algorithms, and our working environment is Linux/Unix. Our testing methods only focus on black-box testing and testing the input/output of functions. I am eager to improve them, but I don't have any better ideas right now...
Thanks a lot!
For many machine learning and statistical classification tasks, the standard metric for measuring quality is Precision and Recall. Most published algorithms will make some kind of claim about these metrics, or you could implement them and run these tests yourself. This should provide a good indicative measure of the quality you can expect.
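For reference, here is a minimal sketch of computing precision and recall for a binary classifier; the label arrays in main are made up purely for illustration.

```java
// Minimal sketch of precision and recall for a binary classifier.
// precision = TP / (TP + FP), recall = TP / (TP + FN).
public class PrecisionRecall {

    public static double[] precisionRecall(boolean[] actual, boolean[] predicted) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] && actual[i]) tp++;        // true positive
            else if (predicted[i] && !actual[i]) fp++;  // false positive
            else if (!predicted[i] && actual[i]) fn++;  // false negative
        }
        double precision = (tp + fp == 0) ? 0 : (double) tp / (tp + fp);
        double recall    = (tp + fn == 0) ? 0 : (double) tp / (tp + fn);
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        boolean[] actual    = {true, true, false, true, false, false};
        boolean[] predicted = {true, false, false, true, true, false};
        double[] pr = precisionRecall(actual, predicted);
        System.out.printf("precision = %.2f, recall = %.2f%n", pr[0], pr[1]);
    }
}
```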
When you talk about efficiency of an algorithm, this is usually some statement about the time or space performance of an algorithm in terms of the size or complexity of its input (often expressed in Big O notation). Most published algorithms will report an upper bound on the time and space characteristics of the algorithm. You can use that as a comparative indicator, although you need to know a little bit about computational complexity in order to make sure you're not fooling yourself. You could also possibly derive this information from manual inspection of program code, but it's probably not necessary, because this information is almost always published along with the algorithm.
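If you also want an empirical sanity check of a published bound, a rough sketch is to time your implementation at several input sizes and see how the running time grows; the runAlgorithm method below is a hypothetical placeholder (here a deliberately quadratic loop) standing in for whatever you are actually testing.

```java
// Rough sketch of an empirical scaling check: time the implementation at
// increasing input sizes and inspect how the running time grows.
// runAlgorithm() is a hypothetical placeholder for the code under test.
public class ScalingCheck {

    static void runAlgorithm(int n) {
        // Placeholder workload; deliberately O(n^2) so the scaling is visible.
        long acc = 0;
        for (long i = 0; i < (long) n * n; i++) acc += i;
        if (acc == 42) System.out.println();   // prevent dead-code elimination
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 16_000; n *= 2) {
            long start = System.nanoTime();
            runAlgorithm(n);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("n = %6d  time = %6d ms%n", n, millis);
            // Doubling n should roughly quadruple the time for an O(n^2) algorithm.
        }
    }
}
```

This only complements, and does not replace, the published analysis: constant factors, caching, and warm-up effects can easily mask the asymptotic behaviour for small inputs.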
Finally, understanding the algorithm is always a good idea. It makes it easier to know what you need to do as a user of that algorithm to ensure you're getting the best possible results (and indeed to know whether the results you are getting are sensible or not), and it will allow you to apply quality measures such as those I suggested in the first paragraph of this answer.