How does Prometheus DB calculate average value - time-series

I have the following temperature values stored in Prometheus (one sample per minute):
4
7
11
52
97
19
95
89
43
19
. . .
Now I would like to get the average temperature for each 5-minute interval:
/api/v1/query_range?query=avg_over_time(current_temp[5m])&start=1475483802.739&end=1475498202.739&step=300&_=1475493021942
I get the following data back:
"values":[[1475488602.739,"4"],[1475488902.739,"37.2"],[1475489202.739,"51"],[1475489502.739,"79.6"] . . .
I really cannot relate these values (4, 37.2, 51, 79.6 ...) to averages of the stored data. Can someone help me with this?
Thanks
Here are two examples from the Prometheus graphing tool:

Let me answer my own question. With the query I gave above:
/api/v1/query_range?query=avg_over_time(current_temp[5m])&start=1475483802.739&end=1475498202.739&step=300&_=1475493021942
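the following happens:
At every 300-second step (the step parameter), Prometheus takes the current_temp samples from the five minutes preceding that evaluation time (the [5m] range) and averages them. It does this for every step in the time span between 1475483802.739 and 1475498202.739. The first value is 4 because only one sample falls inside the first window; the second is (7 + 11 + 52 + 97 + 19) / 5 = 37.2.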
More information here https://github.com/prometheus/prometheus/issues/2051
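To make the windowing concrete, here is a minimal Python sketch of the same calculation on the sample values from the question. It is not Prometheus code: the function name `avg_over_time` is only borrowed for readability, the samples are assumed to be exactly 60 seconds apart starting at t = 0, the first evaluation is assumed to coincide with the first sample, and the window is assumed to be left-open, (t - 300, t].

```python
def avg_over_time(samples, eval_time, window):
    """Average all sample values with eval_time - window < t <= eval_time.

    Hypothetical helper that mimics the windowing described above;
    it is not part of any Prometheus client library.
    """
    in_window = [v for t, v in samples if eval_time - window < t <= eval_time]
    return sum(in_window) / len(in_window) if in_window else None


# Values from the question, assumed to be scraped once per minute from t = 0.
values = [4, 7, 11, 52, 97, 19, 95, 89, 43, 19]
samples = [(60 * i, v) for i, v in enumerate(values)]

# Evaluate at the first two steps (step = 300 s, range = 5 m = 300 s).
results = [avg_over_time(samples, t, 300) for t in (0, 300)]
print(results)  # [4.0, 37.2]
```

Only the first two steps are evaluated because the third window needs a sample that is elided in the question.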

Related

missing data in time series

I'm new to this field and I'm trying to explore time series data: find the missing values, count them, study the distribution of the gap lengths, and fill in the gaps. I have, let's say, 10 .txt files, and each file has 2 columns as follows:
C1 C2
944 0
920 1
920 2
928 3
912 7
920 8
920 9
880 10
888 11
920 12
944 13
and so on, let's say up to 100. The 10 files do not necessarily have the same number of observations.
In this example the missing values are 4, 5 and 6 in C2, together with the corresponding values in the first column C1 (measured in milliseconds, so the value 928 ms is not a time neighbor of 912 ms). The same values are not necessarily missing in every file. I want to find those gaps (the total number of missing values across all 10 files) and show a histogram of their lengths.
I wrote a piece of code in R, but the problem is that I don't get the exact total number of missing values that I should have.
path = "files path"
out.file <- data.frame(TS = 0, Index = 0, File = '')
file.names <- dir(path, pattern = ".txt")
# Read every file and stack the rows into one data frame
for (i in 1:length(file.names)) {
  file <- cbind(read.table(file.path(path, file.names[i]),
                           header = FALSE,
                           sep = "\t",
                           stringsAsFactors = FALSE),
                file.names[i])
  colnames(file) <- c('TS', 'Index', 'File')
  out.file <- rbind(out.file, file)
}
d = dim(out.file)[1]
misDa = 0
# Count positions where consecutive Index values differ by more than 1
for (i in 2:(d - 1)) {
  if (abs(out.file$Index[i] - out.file$Index[i + 1]) > 1)
    misDa = misDa + 1
}
It is hard to give specific hints without a more extensive example of your data that contains some of the actual NAs.
If you are using R (as it seems), the naniar and imputeTS packages offer nice functions for visualizing missing data.
The naniar package is especially good for multivariate data, and the imputeTS package is especially good for time series data. Both come with plot examples in their documentation.
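One thing visible in the question's loop is that it adds 1 for every gap regardless of how many values the gap spans, and that it also compares the last index of one file with the first index of the next. As a minimal sketch (in Python rather than R, with the hypothetical helper name `gap_lengths`), the gap lengths for a single file's index column could be computed like this, assuming the index is an increasing sequence of integers:

```python
from collections import Counter


def gap_lengths(indices):
    """Return the length of every run of missing integers between
    consecutive entries of an increasing index column."""
    return [b - a - 1 for a, b in zip(indices, indices[1:]) if b - a > 1]


# Column C2 from the example in the question (4, 5 and 6 are missing).
index_column = [0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13]

gaps = gap_lengths(index_column)
total_missing = sum(gaps)   # number of missing values, not number of gaps
histogram = Counter(gaps)   # gap length -> how often it occurs

print(gaps, total_missing, dict(histogram))  # [3] 3 {3: 1}
```

Running this per file and summing the results avoids counting a spurious gap at each file boundary.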

Find last value in column A, if condition in column B is true

I've got hiking distance data from a start point in column A and a column with a yes/no condition (let's say a "Y" denotes a campsite, for example).
What I'm trying to achieve is to calculate the distance between each distance marker in column A that has the condition "Y" in column B. (Desired output is column C.)
A     B    C
----------------------------------
0     Y
12
26    Y    26   (26 - 0 = 26)
57
124   Y    98   (124 - 26 = 98)
137
152   Y    28   (152 - 124 = 28)
169
. . .
. . .
. . .
I can pull out the distance from column A with a simple IF statement, but that doesn't get me anywhere, of course.
I've searched the Internet extensively and there are a ton of threads out there about finding the last value or last non-empty value in a column.
So I've tried to use INDEX, FILTER, and LOOKUP in all sorts of combinations, but sadly nothing produces the result I'm looking for.
The tricky part, I guess, is to find the last value with a Y above the "current" Y (if that makes any sense).
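In C2, try
=ArrayFormula(if(B2:B="y", A2:A-iferror(vlookup(row(A2:A)-1, filter({row(A2:A), A2:A}, len(B2:B)),2)),))
and see if that works. For every row flagged in column B, the VLOOKUP searches the filtered list of flagged rows for the closest flagged row above the current one and returns its distance, which is then subtracted from the current distance.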
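The logic the formula is aiming for can be sketched outside the spreadsheet as well. The following Python snippet is only an illustration with a hypothetical function name. It walks down the two columns, remembers the last flagged distance, and outputs the difference at each flagged row:

```python
def distances_between_flagged(distances, flags):
    """For each row flagged "Y", return the distance since the previous
    flagged row; all other rows (and the first flagged row) get None."""
    result = []
    previous = None  # distance of the last flagged row seen so far
    for distance, flag in zip(distances, flags):
        if flag.upper() == "Y":
            result.append(None if previous is None else distance - previous)
            previous = distance
        else:
            result.append(None)
    return result


# Columns A and B from the question.
column_a = [0, 12, 26, 57, 124, 137, 152, 169]
column_b = ["Y", "", "Y", "", "Y", "", "Y", ""]

column_c = distances_between_flagged(column_a, column_b)
print(column_c)  # [None, None, 26, None, 98, None, 28, None]
```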

Missing Values per participant in a repeated measures design using SPSS

I've got a dataset with repeated measures that looks roughly like this:
ID v1 v2 v3 v4
1 3 4 2 NA
1 2 NA 6 7
2 4 3 6 4
2 NA 2 7 9
. . . . .
n . . . .
What I want to know is how many NAs there are for each participant across the variables v1 - v4 (e.g. participant 1 is missing 2 of 8 responses).
Missing values are always displayed per variable, not per participant, so how do I do this? Maybe there is a way using the AGGREGATE command with ID as BREAK?
Use COUNT to count the missing values in each row into a new variable, then aggregate that variable by ID, or split the file by ID and run frequencies on it.
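The two steps (count missing values per row, then sum per ID) can be illustrated with a short sketch. This is Python rather than SPSS syntax, the function name is hypothetical, and NA is represented as `None`. The data are the four rows shown in the question:

```python
from collections import defaultdict

# (ID, [v1, v2, v3, v4]) rows from the question; None stands for NA.
rows = [
    (1, [3, 4, 2, None]),
    (1, [2, None, 6, 7]),
    (2, [4, 3, 6, 4]),
    (2, [None, 2, 7, 9]),
]


def missing_per_participant(rows):
    """Step 1: count missing values in each row.
    Step 2: sum those counts (and the number of responses) per ID."""
    counts = defaultdict(lambda: {"missing": 0, "total": 0})
    for participant_id, values in rows:
        counts[participant_id]["missing"] += sum(v is None for v in values)
        counts[participant_id]["total"] += len(values)
    return dict(counts)


summary = missing_per_participant(rows)
print(summary)
# {1: {'missing': 2, 'total': 8}, 2: {'missing': 1, 'total': 8}}
```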

NMF Sparse Matrix Analysis (using SKlearn)

Just looking for some brief advice to put me back on the right track. I have been working on a solution to a problem where I have a very sparse input matrix (~25% of information filled, rest is 0's) stored in a sparse.coo_matrix:
sparse_matrix = sparse.coo_matrix((value, (rater, blurb))).toarray()
After some work on building this array from my data set and messing around with some other options, I currently have my NMF model fitter function defined as follows:
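import numpy as np
from sklearn.decomposition import NMF

def nmf_model(matrix):
    # Factorize matrix ~ W @ H and return the dense reconstruction
    model = NMF(init='nndsvd', random_state=0)
    W = model.fit_transform(matrix)
    H = model.components_
    result = np.dot(W, H)
    return result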
Now, the issue is that my output doesn't seem to account for the 0 values correctly. Any value that was 0 gets bumped to some value less than 1, and my known values deviate from the actual ones quite a bit (all data are ratings between 1 and 10). Can anyone spot what I am doing wrong? From the scikit-learn documentation, I assumed that using the nndsvd initialization would help account for the empty values correctly. Sample output:
#Row / Column / New Value
35 18 6.50746917334 #Actual Value is 6
35 19 0.580996641675 #Here down are all "estimates" of my function
35 20 1.26498699492
35 21 0.00194119935464
35 22 0.559623469753
35 23 0.109736902936
35 24 0.181657421405
35 25 0.0137801897011
35 26 0.251979684515
35 27 0.613055371646
35 28 6.17494590041 #Actual value is 5.5
Appreciate any advice any more experienced ML coders can offer!

How to change hours in duration to a number in google spreadsheet?

I want to subtract a number from a duration, but I'm not sure how to do it.
A1 : 137:47:00 (formatted as duration)
A2 : 126 (formatted as number)
When I subtract, I get an unexpected value:
=(A1-A2) = -120.26
I was expecting something similar to 11.
Subtracting a dimensionless number from a duration does not really make sense, but if 137:47:00 represents 137 hours and 47 minutes, then subtracting 126 hours from it does (and gives a result between 11 and 12 hours). To compare like with like, the duration can be converted to a number by using the fact that Google Sheets treats 24 hours as the number 1. Multiplying 137:47:00 (hours, minutes, seconds) by 24 therefore gives a number of hours from which another number can be subtracted meaningfully: here 11.7833333, representing 11 hours 47 minutes. Therefore:
=24*A1-A2
might suit.
Calculating time worked per day on Web Applications addresses a vaguely similar issue.
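The arithmetic behind both results can be checked with a small sketch. This is plain Python, not spreadsheet code, and the helper name is hypothetical. It only assumes that a duration is stored as a fraction of a day, as described above:

```python
def duration_to_days(hours, minutes, seconds=0):
    """Convert a duration to the day-based number a spreadsheet stores
    (24 hours == 1)."""
    return (hours + minutes / 60 + seconds / 3600) / 24


a1 = duration_to_days(137, 47)  # 137:47:00 stored as about 5.74 days
a2 = 126                        # plain number, meant as hours

naive = a1 - a2          # what =(A1-A2) computes
in_hours = 24 * a1 - a2  # what =24*A1-A2 computes

print(round(naive, 2))     # -120.26
print(round(in_hours, 4))  # 11.7833
```

The naive subtraction mixes days with hours, which is where the -120.26 in the question comes from.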

Resources