Process and measurement covariance in EKF - robotics

What is a practical way to find the process/measurement covariances in an EKF? I may have some ground-truth data for this task.
Also, how do non-constant measurement time intervals influence these covariances?
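For concreteness, here is a minimal sketch of one way ground truth is commonly used (an illustration on my part, not from the question): estimate R as the sample covariance of measurement residuals against ground truth, and scale the process noise with the time step. The measurement function `h` and noise density `Q_c` are hypothetical placeholders.

```python
import numpy as np

# Sketch, assuming ground-truth states and a known measurement
# function h (both placeholders here) are available: estimate R
# as the sample covariance of the measurement residuals.
def estimate_R(measurements, ground_truth, h):
    residuals = np.array([z - h(x) for z, x in zip(measurements, ground_truth)])
    return np.cov(residuals, rowvar=False)

# For non-constant intervals, a common convention is to scale a
# continuous-time process-noise density Q_c by the elapsed time,
# so the process covariance grows with longer gaps between updates.
def Q_of_dt(Q_c, dt):
    return Q_c * dt
```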

Related

Isolation Forest for time series data

I just wonder whether the Isolation Forest (iForest) can work with time-series data. As far as I know, iForest is used for anomaly detection, and it is based on randomization techniques that randomly and recursively partition the data and then save the partitions in a tree structure.
My question is theoretical: since iForest relies on randomization, would applying it to time series violate the time-series characteristics, as the randomization may break the time dependencies?
Isolation Forest will help with detecting point anomalies by default, since in principle it just works on the rarity of the observations.
But let's say I am interested in anomalies in time series data. Isolation Forest will be able to pick out the extreme peaks and troughs that occur as point anomalies here, but for collective anomalies you may need to transform the data so that each observation represents a collection of observations (rolling-window operations, etc.).
The reason is that in time series data you are interested in additive outliers or temporal changes, so each observation must individually represent that temporal context if you plan to use Isolation Forest. You can also try other techniques such as STL decomposition, ARIMA, regression trees, or exponential smoothing. You should find a lot of material on how to use these for anomaly detection in time series.
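A minimal sketch of the rolling-window idea with scikit-learn's IsolationForest (the series, window length, and contamination rate are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative series: a sine wave with an injected collective anomaly.
rng = np.random.default_rng(0)
t = np.arange(1000)
x = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.1, t.size)
x[600:620] += 1.5                     # the collective anomaly to find

# Represent each observation by a rolling window, so the forest isolates
# temporal context rather than single point values.
window = 20
X = np.lib.stride_tricks.sliding_window_view(x, window)

clf = IsolationForest(contamination=0.05, random_state=0).fit(X)
flagged = np.where(clf.predict(X) == -1)[0]   # indices of anomalous windows
print(flagged)
```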

Kalman Filtering need

I have a swarm robotics project. The localization system is done using ultrasonic and infrared transmitters/receivers. The accuracy is +-7 cm. I was able to implement a follow-the-leader algorithm. However, I was wondering: why do I still have to use a Kalman filter if the sensors' raw data are good? What will it improve? Won't it just delay the coordinates being sent to the robots? (The coordinates won't be updated instantly, since the Kalman filter math takes time and each robot sends its coordinates four times a second.)
Sensor data is NEVER the truth, no matter how good it is. It will always be perturbed by some noise, and it has finite precision. So sensor data is nothing but an observation that you make, and what you want to do is estimate the true state based on these observations. In mathematical terms, you want to estimate a likelihood or joint probability based on those measurements. You can do that using different tools depending on the context. One such tool is the Kalman filter, which in the simplest case is just a moving average, but which is usually used in conjunction with dynamic models and some assumptions on the error/state distributions in order to be useful. The dynamic models describe the state propagation (e.g. motion given previous states) and the observation (measurements), and in robotics/SLAM one often assumes the error is Gaussian. A very important and useful product of such filters is an estimate of the uncertainty in terms of covariances.
Now, what are the potential improvements? Basically, you make sure that your sensor measurements are coherent with a mathematical model and that they are "smooth". For example, if you want to estimate the position of a moving vehicle, the kinematic equations will tell you where you expect the vehicle to be, and you have an associated covariance. Your measurements also come with a covariance. So, if you get measurements that have low certainty, you will end up trusting the mathematical model instead of the measurements, and vice versa.
Finally, if you are worried about the delay: note that the complexity of a standard extended Kalman filter is roughly O(N^3), where N is the size of the state (e.g. the number of landmarks in SLAM). So if you really don't have enough computational power, you can reduce the state to just pose and velocity, and the overhead will be negligible.
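To make the predict/update blending described above concrete, here is a minimal 1D constant-velocity Kalman filter sketch; the noise values are illustrative assumptions (only the +-7 cm measurement accuracy comes from the question):

```python
import numpy as np

# Minimal sketch: 1D constant-velocity Kalman filter for one coordinate.
# Q is an assumed process noise to be tuned; R uses the +-7 cm accuracy.
dt = 0.25                              # 4 updates per second
F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition for [position, velocity]
H = np.array([[1.0, 0.0]])             # we only measure position
Q = np.diag([1e-4, 1e-3])              # process noise covariance (assumption)
R = np.array([[0.07 ** 2]])            # measurement noise: +-7 cm accuracy

x = np.zeros((2, 1))                   # initial state estimate
P = np.eye(2)                          # initial state uncertainty

def kf_step(x, P, z):
    # Predict: propagate state and uncertainty through the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: blend the prediction with the measurement via the Kalman gain.
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # gain: how much to trust the measurement
    return x + K @ y, (np.eye(2) - K @ H) @ P

for z in [0.02, 0.05, 0.11, 0.18]:     # fake position readings in metres
    x, P = kf_step(x, P, np.array([[z]]))
    print(x[0, 0], x[1, 0])            # filtered position and velocity
```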
In general, a Kalman filter helps to improve sensor accuracy by blending (with the right gains) the measurement (sensor output) and a prediction of that output. The prediction is the hardest part, because you need to create a model that predicts the sensors' output in some way. In your case, I think it is unnecessary to spend time creating this model.
Although you are getting accurate data from the sensors, it will not always be consistent. The Kalman filter will not only identify outliers in the measurement data but can also predict the state when a measurement is missing. However, if you are really looking for something with lower computational requirements, you can go for a complementary filter.

In a group of correlated variables, how can I deduce which subset of variables best describe the remaining variables?

I have a data set of 60 sensors making 1684 measurements. I wish to decrease the number of sensors used during the experiment and use the remaining sensor data to predict (using machine learning) the readings of the removed sensors.
I have had a look at the data (see image) and uncovered several strong correlations between the sensors, which should make it possible to remove X sensors and use the remaining sensors to predict their behaviour.
How can I “score” which set of sensors (X) best predict the remaining set (60-X)?
Are you familiar with Principal Component Analysis (PCA)? It works by decomposing the overall variance in your data into orthogonal components. "Dimensionality reduction" is the broader term for this kind of process.
These methods are usually aimed at a set of inputs that predict a single output, rather than a set of peer measurements. To adapt your case to them, I would begin by considering each of the 60 sensors, in turn, as the "ground truth", to see which ones can be most reliably predicted by the remainder. Remove those and repeat the process until you reach your desired threshold of correlation (a sketch of this greedy winnowing follows below).
I would also suggest a genetic-algorithm approach to this winnowing; perhaps random forests would be of help in this phase.
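A minimal sketch of that greedy winnowing, using linear regression and cross-validated R^2 as the "score" (the data, model choice, and stopping budget are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Greedy winnowing: repeatedly drop the sensor that the remaining sensors
# predict best. X is a random stand-in for the real 1684 x 60 sensor matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(1684, 60))

kept = list(range(X.shape[1]))
removed = []
while len(kept) > 55:                           # illustrative sensor budget
    scores = {}
    for j in kept:
        others = [k for k in kept if k != j]
        r2 = cross_val_score(LinearRegression(), X[:, others], X[:, j],
                             cv=5, scoring="r2").mean()
        scores[j] = r2                          # how predictable sensor j is
    best = max(scores, key=scores.get)          # most predictable sensor
    kept.remove(best)
    removed.append((best, scores[best]))

print(removed)   # sensors dropped, with their cross-validated R^2
```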

Time Series DFT Signals Clustering

I have a number of time-series data sets, which I want to transform with the DFT in order to reduce dimensionality. After the transform, I want to cluster the resulting DFT representations using the k-means algorithm.
Since DFT coefficients are complex-valued, how can one cluster them?
You could simply treat the imaginary part as another component in your vectors. In other applications, you will want to ignore it!
But you'll be facing other, more severe challenges.
Data mining, and clustering in particular, is rarely as easy as applying function A (DFT) and function B (k-means) and then you have the result, hooray. Sorry - that is not how exploratory data mining works.
First of all, for many time series, DFT will not be helpful at all. On others, you will first have to do appropriate resampling, or segmentation, or get rid of uninteresting effects such as seasonality. Even if DFT works, it may emphasize artifacts such as the sampling frequency or some interferences.
And then you'll run into one major problem: k-means is based on the assumption that all attributes have the same importance. And DFT is based on the very opposite idea: the first components capture most of the signal, the later ones only minor deviations from it (and that is the very motivation for using this as dimensionality reduction).
So based on this intuition, you should perhaps never apply k-means to DFT coefficients at all. At the same time, data mining has repeatedly shown that approaches that are "statistical nonsense" can nevertheless provide useful results... so you can try, but verify your results with care, and avoid being too enthusiastic or optimistic.
With the help of the FFT, you can convert each data set into its DFT representation; the FFT is simply an efficient algorithm for computing the DFT of each small data set.
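A minimal sketch of the whole pipeline, keeping only the leading DFT coefficients and splitting their real/imaginary parts into separate k-means features (series length, number of coefficients, and k are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: 100 time series of length 256.
rng = np.random.default_rng(0)
series = rng.normal(size=(100, 256))

# Keep the first few DFT coefficients (the dimensionality reduction),
# treating real and imaginary parts as separate vector components.
n_coeffs = 8
coeffs = np.fft.rfft(series, axis=1)[:, :n_coeffs]
features = np.hstack([coeffs.real, coeffs.imag])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels))   # cluster sizes
```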

How to test the quality of a probabilities estimator?

I created a heuristic (an ANN, but that's not important) to estimate the probabilities of an event (the results of sports games, but that's not important either). Given some inputs, this heuristic tells me the probabilities of the event. Something like: given these inputs, team B has a 65% chance to win.
I have a large set of input data for which I know the result (games previously played). Which formula/metric could I use to quantify the accuracy of my estimator?
The problem I see is this: if the estimator says the event has a probability of 20% and the event actually does occur, I have no way to tell if my estimator is right or wrong. Maybe it's wrong and the event was more likely than that. Maybe it's right: the event had about a 20% chance to occur and did occur. Maybe it's wrong: the event had really low chances to occur, say 1 in 1000, but happened to occur this time.
Fortunately I have lots of this actual test data, so there is probably a way to use it to qualify my heuristic.
Anybody got an idea?
There are a number of metrics that you could use to quantify the performance of a binary classifier.
Do you care whether or not your estimator (ANN, e.g.) outputs a calibrated probability or not?
If not, i.e. all that matters is rank ordering, the area under the ROC curve (AUROC) is a pretty good summary of performance. Others are the "KS" statistic and lift. There are many in use, and they emphasize different facets of performance.
If you care about calibrated probabilities, then the most common metrics are the "cross entropy" (also known as the Bernoulli likelihood, the typical measure used in logistic regression) and the "Brier score". The Brier score is none other than the mean squared error comparing continuous predicted probabilities to binary actual outcomes.
Which is the right thing to use depends on the ultimate application of the classifier. For example, your classifier may estimate probability of blowouts really well, but be substandard on close outcomes.
Usually, the true metric that you're trying to optimize is "dollars made". That's often hard to represent mathematically, but starting from it is your best shot at coming up with an appropriate and computationally tractable metric.
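A minimal sketch of computing these metrics with scikit-learn (the outcomes and predicted probabilities are illustrative stand-ins for the historical games):

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

# Illustrative data: actual outcomes and the heuristic's predicted
# probabilities for eight past games.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.65, 0.20, 0.80, 0.55, 0.30, 0.45, 0.70, 0.10])

print("Brier score  :", brier_score_loss(y_true, y_prob))  # lower is better
print("Cross entropy:", log_loss(y_true, y_prob))          # lower is better
print("AUROC        :", roc_auc_score(y_true, y_prob))     # higher is better
```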
In a way it depends on the decision function you are using.
In the case of a binary classification task (predicting whether an event occurred or not [ex: win]), a simple implementation is to predict 1 if the probability is greater than 50%, 0 otherwise.
If you have a multiclass problem (predicting which one of K events occurred [ex: win/draw/lose]), you can predict the class with the highest probability.
And the way to evaluate your heuristic is to compute the prediction error by comparing the actual class of each input with the prediction of your heuristic for that instance.
Note that you would usually divide your data into train/test parts to get better (unbiased) estimates of the performance.
Other tools for evaluation exist, such as ROC curves, which depict the performance with regard to true/false positives.
As you stated, if you predict that an event has a 20% chance of happening (and 80% of not happening), observing a single isolated event will not tell you how good or poor your estimator is. However, if you had a large sample of events for which you predicted 20% success, but observed that over that sample 30% succeeded, you could begin to suspect that your estimator is off.
One approach would be to group your events by predicted probability of occurrence, observe the actual frequency in each group, and measure the difference. For instance, depending on how much data you have, group all events where you predicted a 20% to 25% chance of occurrence and compute the actual frequency of occurrence within that group. This should give you a good idea of whether your estimator is biased, and possibly for which ranges it's off.
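A minimal sketch of that binning idea using scikit-learn's calibration_curve (the predictions and outcomes are simulated stand-ins):

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Simulated data: predictions and outcomes that are consistent with them,
# so a well-calibrated curve is expected here.
rng = np.random.default_rng(0)
y_prob = rng.uniform(0, 1, 5000)      # illustrative predicted probabilities
y_true = rng.binomial(1, y_prob)      # outcomes drawn from those probabilities

# Bin events by predicted probability and compare each bin's mean
# prediction with the observed frequency of occurrence.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print(f"predicted {p:.2f} -> observed {f:.2f}  (gap {f - p:+.2f})")
```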