Pydrake Autodiff Hessian - drake

Almost a year ago, a question (stackoverflow.com/questions/71027922) was asked concerning computing Hessians or other higher-order derivatives using AutoDiff. I was wondering whether there has been any movement on this front.
I've found that for Jacobians, Drake's AutoDiff has much better performance (often over 10x wall clock time) vs. other frameworks like Jax for complicated functions I'm differentiating. To compute second-order derivatives, I've had to analytically compute the first derivative and then differentiate that.
If there are any unofficial workarounds in the meantime that could circumvent the need to compute the analytical first derivative so that I could do something like hessian(function, x), that would be much appreciated!
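For concreteness, the kind of stopgap I can imagine (a sketch, not an official API) is to finite-difference Drake's exact AutoDiff gradient. This assumes `pydrake.forwarddiff.gradient` is available in your Drake build, and the step size `h` is a tunable assumption:

```python
import numpy as np
from pydrake.forwarddiff import gradient  # assumed available in your build

def hessian_fd(f, x, h=1e-6):
    """Approximate the Hessian of a scalar-valued f at x by central
    finite differences over the exact AutoDiff gradient."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = h
        g_plus = gradient(f, x + dx)    # exact gradient at x + h*e_i
        g_minus = gradient(f, x - dx)   # exact gradient at x - h*e_i
        H[i, :] = (g_plus - g_minus) / (2.0 * h)
    return 0.5 * (H + H.T)  # symmetrize to damp finite-difference noise

# Usage: hessian_fd(lambda q: (q ** 2).sum(), np.array([1.0, 2.0]))
```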

Related

How can I further analyze high frequency data from discrete wavelet transform?

I applied a discrete wavelet transform to horizontal wind speed data to receive the below plot. I'm basically trying to use the information from the detail coefficient (the turbulent flow) for further analysis, but I'm not sure the best direction to go in. I don't have much experience with Wavelet Transform, so forgive me if there are obvious options, but the examples I've seen usually discard the higher frequency information since it's the noise of the signal. Is there anything further I can do with this discrete wavelet transform like statistic analysis or forecasting?
The path to pursue really depends on the question that you are trying to answer.
First of all, I would suggest double checking that your DWT is actually doing what you expect it to do. The plot that you shared suggests that it is successful in separating the low frequency coherent (laminar?) flow from the high frequency turbulent flow, but it would be helpful to figure out which frequencies are present in the high frequency component in order to confirm that the processing parameters (e.g. decomposition level) were properly chosen.
Once convinced that your wavelet decomposition provides you with useful information about the turbulent flow, what should you do with these high pass filtered data?
I suggest computing their variance over 1 hour long intervals. This is a measure of the "energy" of the signal over the chosen interval. If you are dealing with large amounts of data this would allow you to boil down your time series into a single sample per hour. Maybe you will be able to spot diurnal variations in the turbulent flow (e.g. maybe turbulent flow is higher at dawn). If you have multiple stations it would be interesting to study if the turbulence variations share the same behavior.
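For concreteness, a minimal sketch of this hourly-variance idea using PyWavelets (pywt); the wavelet choice ('db4'), decomposition level, and sampling rate here are illustrative assumptions, not recommendations:

```python
import numpy as np
import pywt

fs = 1.0                                  # sampling rate in Hz (assumed)
wind = np.random.randn(24 * 3600)         # placeholder for your wind series

# Keep only the high-frequency detail: zero out the approximation
# coefficients and reconstruct.
coeffs = pywt.wavedec(wind, 'db4', level=4)
coeffs[0] = np.zeros_like(coeffs[0])
turbulent = pywt.waverec(coeffs, 'db4')[:wind.size]

# Variance of the high-pass signal over 1-hour windows: one "energy"
# sample per hour.
sph = int(3600 * fs)                      # samples per hour
n_hours = turbulent.size // sph
hourly_var = turbulent[:n_hours * sph].reshape(n_hours, sph).var(axis=1)
```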
Before venturing into time series forecasting, I would really take a closer look at your data and try to identify trends or nail down possible outliers.
Last but not least, I would suggest posting your question on Physics Stack Exchange (e.g. https://physics.stackexchange.com/) rather than on SO.

Recommended local search optimization algorithm for control domain

Background: I am trying to find a list of floating point parameters for a low-level controller that will keep a robot balanced while it is walking.
Question: Can anybody recommend local search algorithms that will perform well for the domain I just described? The main criterion for me is the speed of convergence to the right solution.
Any help will be greatly appreciated!
P.S. Also, I conducted some research and found out that "Evolutionary Strategy" algorithms are a good fit for continuous state spaces. However, I am not entirely sure whether they will fit my particular problem well.
More info: I am trying to optimize 8 parameters (although it is possible for me to reduce the number of parameters to 4). I do have a simulator, and the criterion for me is speed in number of trials, because simulation resets are costly (they take 10-15 seconds on average).
One of the best local search algorithms for a low number of dimensions (up to about 10 or so) is the Nelder-Mead simplex method. By the way, it is used as the default optimizer in MATLAB's fminsearch function. I personally used this method for finding the parameters of a textbook 2nd- or 3rd-degree dynamic system (though a very simple one).
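A minimal sketch with SciPy's Nelder-Mead, under the assumption that you can wrap one simulation rollout as a scalar cost; `run_simulation` and the tolerances are placeholders to adapt:

```python
import numpy as np
from scipy.optimize import minimize

def run_simulation(params):
    # Placeholder: run one walking trial with these controller parameters
    # and return a scalar cost (e.g. negative time before falling).
    return float(np.sum((params - 0.3) ** 2))

x0 = np.full(8, 0.5)  # initial guess for the 8 parameters
result = minimize(run_simulation, x0, method='Nelder-Mead',
                  options={'xatol': 1e-3, 'fatol': 1e-3, 'maxiter': 200})
print(result.x, result.fun)
```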
Another option is the already mentioned evolutionary strategies. Currently the best one is the Covariance Matrix Adaptation ES, or CMA-ES. There are variations of this algorithm, e.g. BI-POP CMA-ES, that are probably better than the vanilla version.
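And a corresponding sketch with the `cma` package (pip install cma); `sigma0`, the initial step size, is an assumption to tune, and `run_simulation` is again a hypothetical stand-in for your simulator:

```python
import numpy as np
import cma

def run_simulation(params):
    # Placeholder cost; substitute one simulated walking trial.
    return float(np.sum((np.asarray(params) - 0.3) ** 2))

es = cma.CMAEvolutionStrategy(x0=[0.5] * 8, sigma0=0.2)
while not es.stop():
    candidates = es.ask()    # sample a population of parameter sets
    es.tell(candidates, [run_simulation(c) for c in candidates])
print(es.result.xbest)       # best parameters found
```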
You just have to try what works best for you.
In addition to evolutionary algorithms, I recommend you also look into reinforcement learning.
The right method depends a lot on the details of your problem. How many parameters? Do you have a simulator? Do you work in simulation only, or also with real hardware? Is speed measured in number of trials, or in CPU time?

Apple Accelerate vDSP fft vs DFT and scaling factors

I am an experienced programmer but I don't have a lot of experience implementing DSP routines.
I've been banging my head against this for weeks if not months. My question is twofold, concerning Apple's Accelerate framework:
1)
In the vDSP.h header there are comments to the effect of: please use vDSP_DFT_XXX instead of the (I guess) older versions vDSP_fft_XXX. However, there are zero examples of this outside of Apple's https://developer.apple.com/library/prerelease/mac/samplecode/vDSPExamples/Listings/DemonstrateDFT_c.html#//apple_ref/doc/uid/DTS10004300-DemonstrateDFT_c-DontLinkElementID_6. Maybe it's just that the DFT functions are newer? If so, fine and dandy.
2)
Scaling factors. I can read the documentation (https://developer.apple.com/library/mac/documentation/Performance/Conceptual/vDSP_Programming_Guide/UsingFourierTransforms/UsingFourierTransforms.html#//apple_ref/doc/uid/TP40005147-CH202-16195); it says that in the case of an FFT on a real input, like the audio I am working with, the resulting value of each of the Fourier coefficients is 2x the actual, mathematical value.
And yet, in every example, including Apple's own, the scaling factor passed to the resulting vsmul() call looks like it is 1/(2N) instead of 1/2 as I would expect.
Further, there is no documentation about the scaling factors for the vDSP_DFT_XXX routines, but I assume that they just wrap the older ones?
Any insight into either of these questions would be greatly appreciated! Hopefully I'm just missing something basic about the way that FFT's are implemented in this framework (or in general).
There are at least 3 different FFT scaling options that produce "mathematical" results, and there is no single standard scaling. Energy preserving (see Parseval's theorem) FFT libraries need to be scaled by on the order of 1/N for input magnitude results, since a longer signal of the same magnitude will have proportionally more energy. vDSP uses an energy preserving forward FFT.
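To see the convention concretely (illustrated with NumPy rather than vDSP itself): an unnormalized forward FFT of a real cosine of amplitude A puts A*N/2 in each of the two symmetric bins, so recovering the mathematical amplitude requires a scale on the order of 1/N on top of the factor of 2, consistent with the 1/(2N)-looking factors in the examples:

```python
import numpy as np

N = 1024
A = 3.0
k = 5  # whole cycles over the window
x = A * np.cos(2 * np.pi * k * np.arange(N) / N)

X = np.fft.fft(x)          # unnormalized forward FFT
print(abs(X[k]))           # ~ A * N / 2 (= 1536), not A
print(2 * abs(X[k]) / N)   # ~ A, after the factor-2 and 1/N scaling
```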

Non-Mahout Java implementation of canopy clustering

I have my own Java-based implementation of clustering (k-NN). However, I am facing scalability issues. I do not plan to use Mahout because my requirements are very simple and Mahout requires a lot of work. I am looking for a Java-based canopy clustering implementation which I can plug into my algorithm and run in parallel.
Mahout-based canopy libraries are coupled with Vectors and indexes and do not work on plain strings. If you know of a way I can use canopy clustering on strings with a simple library, that would fix my issue.
My requirement is to pass a list of strings (say 10K) to the canopy clustering algorithm and have it return sublists based on T1 and T2.
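For reference, the procedure I mean is the standard canopy pass, sketched here in Python for brevity even though my code is Java; `distance` is a placeholder for whatever string distance you use, and T1 > T2 are the loose and tight thresholds:

```python
def canopy_clustering(points, distance, T1, T2):
    # Standard canopy pass: every point within T1 of a center joins that
    # canopy; every point within T2 is also removed from further
    # consideration as a center. Requires T1 > T2.
    canopies = []
    remaining = set(range(len(points)))
    while remaining:
        i = remaining.pop()              # arbitrary next center
        canopy = [points[i]]
        for j in list(remaining):
            d = distance(points[i], points[j])
            if d < T1:
                canopy.append(points[j])
            if d < T2:
                remaining.discard(j)
        canopies.append(canopy)
    return canopies

def distance(a, b):
    # Placeholder string distance; substitute a proper metric.
    return abs(len(a) - len(b)) + sum(c1 != c2 for c1, c2 in zip(a, b))

sublists = canopy_clustering(["foo", "food", "bars"], distance, T1=3.0, T2=1.0)
```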
Canopy clustering is mostly useful as a preprocessing step for parallelization. I'm not sure how much it will get you on a single node. I figure you might as well run the actual algorithm right away, or build an index such as an M-tree.
The strength of Canopy clustering is that you can run it independently on a number of nodes and then just overlap their results.
Also check whether it actually is compatible with your approach. I figure that canopy might need metric properties to be correct. Is your string distance a proper metric (i.e. does it satisfy the triangle inequality)?
10,000 data points, if that's all you're concerned with, should be no problem with standard k-means. I'd look at optimising that before you consider canopy clustering (which is really designed for millions or even billions of examples). Some things you may have missed:
pre-compute the feature vectors for each string. Don't do it every time you want to compare s_1 to s_2 or s_1 to a cluster centroid
you only need to keep the summary statistics in memory: the sum of all points assigned to a cluster and the number of points assigned to it. When you're done with an iteration, divide the sums by the counts and you have your new centroids (see the sketch after this list)
what's the dimensionality of your feature space? Be aware that you should use a distance metric in which dimensions where both vectors are zero have no impact, so you only need to compute over the non-zero dimensions. Store your points as sparse vectors to facilitate this.
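A minimal sketch of that summary-statistics update in plain NumPy; the data and assignments here are placeholders:

```python
import numpy as np

def update_centroids(points, assignments, k):
    # Keep only per-cluster sums and counts in memory; divide at the end.
    sums = np.zeros((k, points.shape[1]))
    counts = np.zeros(k)
    for x, c in zip(points, assignments):
        sums[c] += x
        counts[c] += 1
    return sums / np.maximum(counts, 1)[:, None]  # new centroids

pts = np.random.rand(10000, 50)                # placeholder feature vectors
assign = np.random.randint(0, 8, size=10000)   # placeholder assignments
centroids = update_centroids(pts, assign, k=8)
```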
Can you do some analysis and determine where the bottleneck in your implementation is? I'm a little perplexed by your comment about Mahout not working with plain strings.
You should give the clustering algorithms in ELKI a try. Sorry for so shamelessly promoting a project I'm closely affiliated with. But it is the largest collection of clustering and outlier detection algorithms that are implemented in a comparable fashion. (If you'd take all the clustering algorithms available in some R package, you might end up with more algorithms, but they won't be really comparable because of implementation differences)
And benchmarking showed enormous speed differences with different implementations of the same algorithm. See our benchmarking web site on how much performance can vary even on simple algorithms such as k-means.
We do not yet have canopy clustering. The reason is that it's more of a preprocessing index than an actual clustering algorithm. Kind of like a primitive variant of the M-tree, or of DBSCAN clustering. However, we would like to see canopy clustering contributed as such a preprocessing step.
ELKI's abilities to process strings are also a bit limited so far. You can load typical TF-IDF vectors just fine, and we have somewhat optimized sparse vector classes and similarity functions. They don't fully exploit sparsity for k-means yet, though, and there is no spherical k-means yet either. But there are various reasons why k-means results on sparse vectors cannot be expected to be very meaningful; it's more of a heuristic.
But it would be interesting if you could give it a try for your problem and report back your experiences. Was the performance competitive with your implementation? And we would love to see contributed modules for text processing, such as further optimized similarity functions or a spherical k-means variant.
Update: ELKI now actually includes canopy clustering: CanopyPreClustering (it will be part of 0.6.0). But as of now, it's just another clustering algorithm, not yet used to accelerate other algorithms such as k-means. I need to check how best to use it as some kind of index to accelerate algorithms. I can imagine it would also help speed up DBSCAN if you set T1=epsilon and T2=0.5*T1. The big issue with canopy clustering, IMHO, is how to choose a good radius.

What should I look for in the analysis of the attached signals?

I'm looking to analyze and compare the following 'signals':
(Edit: better renderings here: oscillations good and here: oscillations bad)
What you see are plots of neuron activations from a type of artificial neural network plotted against time. Each line in the plot is a neuron's activation over time which can have a value between -1 and 1.
In the first plot, the activities are stable and consistent, while the second exemplifies more chaotic activity (for want of a better term); some kind of destructive interference seems to occur every so often.
Anyhow, I would like to do some kind of 'clever' analysis, but since signal analysis is really not my strong point, I thought I'd ask for some advice here...
EDIT: Let me clarify a bit. Ultimately, I would like to characterize the data. This could, for example, involve pinpointing correlations between the individual signals contained in each plot. I would like to measure 'regularity' or data invariance: in the above examples, the upper plot is more regular than the lower plot. I guess I could therefore compute the variance of each signal and take that as a measure, but I was wondering whether some more comprehensive signal-processing technique could be better suited (I'm not sure). In fact, I'm not even sure signal processing is what I really want, now that I think about it. Perhaps some kind of wavelet or FT analysis...
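As a baseline, the naive characterization I have in mind looks something like this: per-signal variance as a crude regularity measure, plus the pairwise correlation matrix. `activations` here is a placeholder array of shape (n_neurons, n_timesteps):

```python
import numpy as np

activations = np.random.uniform(-1, 1, size=(20, 5000))  # placeholder data

per_neuron_var = activations.var(axis=1)  # higher variance ~ less regular
corr = np.corrcoef(activations)           # neuron-by-neuron correlations

# One-number summaries: mean variance, and mean absolute off-diagonal
# correlation between distinct neurons.
print(per_neuron_var.mean())
print(np.abs(corr[np.triu_indices_from(corr, k=1)]).mean())
```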
For those interested, I am working on the computational modelling of worm locomotion.
You should consult some good books on nonlinear time series analysis. For instance, a measure for the regularity of your signal could be the Lyapunov spectrum. Another possibility would be entropy. If you are interested in the correlation between signals, you could use transfer entropy or Granger causality, or for neurons it would be good to have a look at some measure of phase synchronization. The Bayesian stuff could also be worth trying.
But, most importantly, you first need a proper question about what you really want to know. Once you've got that, it is far easier to pick the right tool.
And one final hint: look for tools outside the engineering community. Their tools are mostly linear, but you are dealing with a highly nonlinear system. Wavelets, FFT and the like are useful if you don't know anything about your signal and want another perspective on it, but they are not suited to your kind of problem.
