I'm trying to get the delta of a metric in the edges of a time interval in Graphite, but I couldn't find anything related to it in the documentation.
I'm not looking for the derivation, but the absolute difference.
Is using summarize(nonNegativeDerivative(a.metric), "30mins") will do the job ?
I be glad if someone can point me to the right function, thanks!
The graphite nonNegativeDerivative function isn't a "true" derivative, it will return the delta between successive points, which seems to be what you're looking for.
The "true" derivative function in graphite is perSecond which returns the delta normalized to a per-second rate.
So, try using nonNegativeDerivative without the summarize wrapper and see if that gives you what you're looking for.
Related
I want to process the value from InfluxDB on Grafana.
The final demand is to show how many miles the current vehicle has traveled in a certain time frame.
You can use the formula: average velocity * time.
Do the seniors have any good methods?
So what I'm thinking is: I've got the mean function for the average speed over a fixed period of time and the corresponding mileage, and then I want to add all the mileage together. How do I do that?
What if you only use SQL?
1.) InfluxDB uses InfluxQL, not a SQL
2.) Your approach average velocity * time is innacurate
3.) Use suitable InfluxDB functions, I would say INTEGRAL() is the best function for this case + some basic arithmetic. Don't expect the 100% accuracy. Accuracy depends heavily on the metric sampling, e.g. 1 minute sampling - but what if vehicle is driving 59 seconds and it is not moving for that second when sampling is happening. So don't be supprised, when even 10 sec sampling will be inacurrate.
In a database there are time-series data with records:
device - timestamp - temperature - min limit - max limit
device - timestamp - temperature - min limit - max limit
device - timestamp - temperature - min limit - max limit
...
For every device there are 4 hours of time series data (with an interval of 5 minutes) before an alarm was raised and 4 hours of time series data (again with an interval of 5 minutes) that didn't raise any alarm. This graph describes better the representation of the data, for every device:
I need to use RNN class in python for alarm prediction. We define alarm when the temperature goes below the min limit or above the max limit.
After reading the official documentation from tensorflow here, i'm having troubles understanding how to set the input to the model. Should i normalise the data beforehand or something and if yes how?
Also reading the answers here didn't help me as well to have a clear view on how to transform my data into an acceptable format for the RNN model.
Any help on how the X and Y in model.fit should look like for my case?
If you see any other issue regarding this problem feel free to comment it.
PS. I have already setup python in docker with tensorflow, keras etc. in case this information helps.
You can begin with a snippet that you mention in the question.
Any help on how the X and Y in model.fit should look like for my case?
X should be a numpy matrix of shape [num samples, sequence length, D], where D is a number of values per timestamp. I suppose D=1 in your case, because you only pass temperature value.
y should be a vector of target values (as in the snippet). Either binary (alarm/not_alarm), or continuous (e.g. max temperature deviation). In the latter case you'd need to change sigmoid activation for something else.
Should i normalise the data beforehand
Yes, it's essential to preprocess your raw data. I see 2 crucial things to do here:
Normalise temperature values with min-max or standardization (wiki, sklearn preprocessing). Plus, I'd add a bit of smoothing.
Drop some fraction of last timestamps from all of the time-series to avoid information leak.
Finally, I'd say that this task is more complex than it seems to be. You might want to either find a good starter tutorial on time-series classification, or a course on machine learning in general. I believe you can find a better method than RNN.
Yes you should normalize your data. I would look at differencing by every day. Aka difference interval is 24hours / 5 minutes. You can also try and yearly difference but that depends on your choice in window size(remember RNNs dont do well with large windows). You may possibly want to use a log-transformation like the above user said but also this seems to be somewhat stationary so I could also see that not being needed.
For your model.fit, you are technically training the equivelant of a language model, where you predict the next output. SO your inputs will be the preciding x values and preceding normalized y values of whatever window size you choose, and your target value will be the normalized output at a given time step t. Just so you know a 1-D Conv Net is good for classification but good call on the RNN because of the temporal aspect of temperature spikes.
Once you have trained a model on the x values and normalized y values and can tell that it is actually learning (converging) then you can actually use the model.predict with the preciding x values and preciding normalized y values. Take the output and un-normalize it to get an actual temperature value or just keep the normalized value and feed it back into the model to get the time+2 prediction
In order to clusterize a set of time series I'm looking for a smart distance metric.
I've tried some well known metric but no one fits to my case.
ex: Let's assume that my cluster algorithm extracts this three centroids [s1, s2, s3]:
I want to put this new example [sx] in the most similar cluster:
The most similar centroids is the second one, so I need to find a distance function d that gives me d(sx, s2) < d(sx, s1) and d(sx, s2) < d(sx, s3)
edit
Here the results with metrics [cosine, euclidean, minkowski, dynamic type warping]
]3
edit 2
User Pietro P suggested to apply the distances on the cumulated version of the time series
The solution works, here the plots and the metrics:
nice question! using any standard distance of R^n (euclidean, manhattan or generically minkowski) over those time series cannot achieve the result you want, since those metrics are independent of the permutations of the coordinate of R^n (while time is strictly ordered and it is the phenomenon you want to capture).
A simple trick, that can do what you ask is using the cumulated version of the time series (sum values over time as time increases) and then apply a standard metric. Using the Manhattan metric, you would get as a distance between two time series the area between their cumulated versions.
Another approach would be by utilizing DTW which is an algorithm to compute the similarity between two temporal sequences. Full disclosure; I coded a Python package for this purpose called trendypy, you can download via pip (pip install trendypy). Here is a demo on how to utilize the package. You're just just basically computing the total min distance for different combinations to set the cluster centers.
what about using standard Pearson correlation coefficient? then you can assign the new point to the cluster with the highest coefficient.
correlation = scipy.stats.pearsonr(<new time series>, <centroid>)
Pietro P's answer is just a special case of applying a convolution to your time series.
If I gave the kernel:
[1,1,...,1,1,1,0,0,0,0,...0,0]
I would get a cumulative series .
Adding a convolution works because you're giving each data point information about it's neighbours - it's now order dependent.
It might be interesting to try with a guassian convolution or other kernels.
Thanks for reading this. I am currently studying bayesoptimization problem and follow the tutorial. Please see the attachment.bayesian optimization tutorial
In page 11, about the acquisition function. Before I raise my question I need state my understanding about bayesian optimization to see if there is anything wrong.
First we need take some training points and assume them as multivariable gaussian ditribution. Then we need use acquisiont function to find the next point we want to sample. So for example we use x1....x(t) as training point then we need use acquisition function to find x(t+1) and sample it. Then we'll assume x1....x(t),x(t+1) as multivariable gaussian ditribution and then use acquisition function to find x(t+2) to sample so on and so forth.
In page 11, seems we need find the x that max the probability of improvement. f(x+) is from the sample training point(x1...xt) and easy to get. But how to get u(x) and that variance here? I don't know what is the x in the eqaution. It should be x(t+1) but the paper doesn't say that. And if it is indeed x(t+1), then how could I get its u(x(t+1))? You may say use equation at the bottom page 8, but we can use that equation on condition that we have found the the x(t+1) and put it into multivariable gaussian distribution. Now we don't know what is the next point x(t+1) so I have no way to calculate, in my opinion.
I know this is a tough question. Thanks for answering!!
In fact I have got the answer.
Indeed it is x(t+1). The direct way is we compute every u and varaince of the rest x outside of the training data and put it into acquisition function to find which one is the maximum.
This is time consuming. So we use nonlinear optimization like DIRECT to get the x that max the acquisition function instead of trying one by one
In linear regression with 1 variable I can clearly see on plot prediction line and I can see if it properly fits the training data. I just create a plot with 1 variable and output and construct prediction line based on found values of Theta 0 and Theta 1. So, it looks like this:
But how can I check validity of gradient descent results implemented on multiple variables/features. For example, if number of features is 4 or 5. How to check if it works correctly and found values of all thetas are valid? Do I have to rely only on cost function plotted against number of iterations carried out?
Gradient descent converges to a local minimum, meaning that the first derivative should be zero and the second non-positive. Checking these two matrices will tell you if the algorithm has converged.
We can think of gradient descent as of something solving a problem of f'(x) = 0 where f' denotes gradient of f. For checking this problem convergence, as far as I know, the standard approach is to calculate discrepancy on each iteration and see if it converges to 0.
That is, check if ||f'(x)|| (or its square) converges to 0.
There are some things you can try.
1) Check if your cost/energy function is not improving as your iteration progresses. Use something like "abs(E_after - E_before) < 0.00001*E_before", i.e. check if the relative difference is very low.
2) Check if your variables have stopped changing. You can opt a very similar strategy like above to check this.
There is actually no perfect way to fully make sure that your function has converged, but some of the things mentioned above are what usually people try.
Good luck!