Understanding histogram_quantile based on rate in Prometheus

According to the Prometheus documentation, in order to get the 95th percentile from a histogram metric I can use the following query:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
Source: https://prometheus.io/docs/practices/histograms/#quantiles
Since each bucket of a histogram is a counter, we can calculate the rate of each bucket, i.e. the "per-second average rate of increase of the time series in the range vector".
See: https://prometheus.io/docs/prometheus/latest/querying/functions/#rate
So, for instance, if bucket value[t-5m] = 100 and bucket value[t] = 200, then bucket rate[t] = (200-100)/(5*60) ≈ 0.333
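A minimal Python sketch of the arithmetic above, assuming exactly two samples. The function name is hypothetical, and it ignores the extrapolation to the window edges and the counter-reset handling that Prometheus' real rate() performs.

```python
def simple_rate(value_start: float, value_end: float, window_seconds: float) -> float:
    """Per-second average increase of a counter between two samples.

    Simplification: no extrapolation to the range boundaries and no
    counter-reset handling, both of which Prometheus' rate() does.
    """
    return (value_end - value_start) / window_seconds


# Bucket went from 100 to 200 over a 5-minute (300 s) window.
print(round(simple_rate(100, 200, 5 * 60), 3))  # 0.333
```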
And finally, the most confusing part: how can the histogram_quantile function find the 95th percentile for a given metric knowing only the bucket rates?
Is there any code or algorithm I can look at to better understand it?

A solid example will explain histogram_quantile well.
Assumptions:
ONLY ONE series, for simplicity
10 buckets for the metric http_request_duration_seconds (plus the +Inf bucket that every Prometheus histogram has):
10ms, 50ms, 100ms, 200ms, 300ms, 500ms, 1s, 2s, 3s, 5s
Each bucket series (http_request_duration_seconds_bucket) is a COUNTER
time    value   delta   rate (per second)
t-10m   50      N/A     N/A
t-5m    100     50      50 / (5*60)
t       200     100     100 / (5*60)
...     ...     ...     ...
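rate() needs at least two samples of each bucket series within the 5-minute window to calculate that bucket's rate.
rate_xxx(t) = (value_xxx[t] - value_xxx[t-5m]) / (5*60) is the per-second rate over [t-5m, t]; it is proportional to the number of items recorded in that window.
We are looking at two samples here, value(t) and value(t-5m).
Say 10000 HTTP request durations (items) were recorded. For simplicity, the rate values below are treated as item counts (dividing every bucket by the same 5*60 seconds does not change the quantile), that is,
10000 = rate_10ms(t) + rate_50ms(t) + rate_100ms(t) + ... + rate_5s(t) + rate_+Inf(t).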
bucket (le)   range       rate_xxx(t)
10ms          ~10ms       3000
50ms          10~50ms     3000
100ms         50~100ms    1500
200ms         100~200ms   1000
300ms         200~300ms   800
500ms         300~500ms   400
1s            500ms~1s    200
2s            1~2s        40
3s            2s~3s       30
5s            3~5s        5
+Inf          5s~         5
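Buckets are the essence of a histogram: the rate_xxx(t) values above are all we need for the quantile calculation. (In Prometheus itself the le buckets are cumulative, e.g. le=500ms counts every item up to 500ms; the table lists the per-range differences between adjacent buckets.)
Let's take a close look at this expression (aggregations like sum() are omitted for simplicity):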
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
We are actually looking for the 95th-percentile item among the rate_xxx(t) values, counting from bucket=10ms up to bucket=+Inf. Since we have 10000 items in total, the 95th percentile is the 9500th item (10000 * 0.95).
From the table above, there are 9300 = 3000+3000+1500+1000+800 items in the buckets below bucket=500ms.
So the 9500th item is the 200th item (9500-9300) in bucket=500ms (range 300~500ms), which contains 400 items.
Prometheus assumes that the items within a bucket are spread evenly, in a linear pattern.
So the metric value for the 200th item in bucket=500ms is 400ms = 300 + (500-300) * (200/400).
That is, 95% is 400ms.
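The steps above can be sketched in Python. This is a simplified illustration of the linear interpolation, not Prometheus' own Go implementation, and edge cases such as NaN inputs are left out. One difference from the table: Prometheus' le buckets are cumulative, so the function takes (upper_bound, cumulative_count) pairs, i.e. running sums of the rate_xxx(t) values. The example input uses the first six buckets from the table (in seconds) and assumes the +Inf total of 10000 stated in the text; the buckets between 500ms and +Inf are omitted because they do not affect this particular result.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch of the interpolation described above. The last bucket
    must be +Inf, like the le="+Inf" bucket of a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)                 # order by upper bound (le)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]                    # +Inf bucket counts every item
    if total == 0:
        return math.nan
    rank = q * total                          # e.g. 0.95 * 10000 = 9500
    # Index of the first bucket whose cumulative count reaches the rank.
    b = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if b == len(buckets) - 1:
        # Rank falls in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    # The first bucket is assumed to start at 0 (durations are non-negative).
    lower, count_below = (0.0, 0.0) if b == 0 else buckets[b - 1]
    # Linear interpolation: items are assumed evenly spread inside the bucket.
    return lower + (upper - lower) * (rank - count_below) / (count - count_below)


# Running sums of the table values up to 500ms, plus an assumed +Inf total of 10000.
buckets = [(0.01, 3000), (0.05, 6000), (0.1, 7500), (0.2, 8500),
           (0.3, 9300), (0.5, 9700), (math.inf, 10000)]
print(round(histogram_quantile(0.95, buckets), 3))  # 0.4
```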
There are a few things to bear in mind:
The bucket series of a histogram metric are COUNTERs in nature
Series used for the quantile calculation must always have the label le defined
Items (data) within a specific bucket are assumed to be spread evenly in a linear pattern (e.g. across 300~500ms); at least, this is the assumption Prometheus makes
The quantile calculation requires the buckets to be ordered by their upper bounds in ascending order (e.g. 1ms < 5ms < 10ms < ...)
The result of histogram_quantile is an approximation
P.S.:
The metric value is not always accurate, because of the assumption that items (data) within a specific bucket are spread evenly in a linear pattern.
Say the maximum duration in reality (e.g. from the nginx access log) in bucket=500ms (range 300~500ms) is 310ms; we will still get 400ms from histogram_quantile with the above setup, which can be quite confusing.
The narrower the buckets, the more accurate the approximation.
So set up bucket boundaries that fit your needs.
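To make the effect of bucket width concrete, here is a small Python sketch with hypothetical numbers: 9300 items below 300ms and 400 items that, in reality, all lie between 300 and 310ms, as in the nginx example. With a single 300~500ms bucket the 9500th item is interpolated to 400ms; with an extra (hypothetical) 310ms boundary the estimate lands inside the real range.

```python
def interpolate(lower: float, upper: float, count_below: float,
                count_in_bucket: float, rank: float) -> float:
    """Value of the rank-th item, assuming items are spread evenly in the bucket."""
    return lower + (upper - lower) * (rank - count_below) / count_in_bucket


# Hypothetical: 9300 items below 300ms, 400 items that really lie in 300~310ms.
coarse = interpolate(300, 500, 9300, 400, 9500)  # one 300~500ms bucket
fine = interpolate(300, 310, 9300, 400, 9500)    # extra 310ms boundary
print(coarse, fine)  # 400.0 305.0
```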

You can refer to my reply here
Actually, the rate() function is only used to specify the time window; the denominator has no effect on the computation of the percentile value.

I believe this is the code for it in Prometheus.
The general idea is that you use the data in the buckets to extrapolate/approximate the quantiles.
Elasticsearch also does something similar (yet different, and much simpler) in its rollup capabilities.

You have to use rate() because counters can be reset; rate() automatically takes resets into account and gives you the correct per-second rate. Just remember to always apply rate() to counters before using them.
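A hedged Python sketch of the reset handling: whenever a sample is lower than the previous one, the counter is assumed to have restarted from 0, so the new value itself counts as increase. The function name and samples are hypothetical, and Prometheus' extrapolation to the window edges is left out.

```python
from typing import List


def increase_with_resets(samples: List[float]) -> float:
    """Total increase of a counter over consecutive samples, tolerating resets."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the counter restarted from 0, so all of `cur` is new.
        total += cur - prev if cur >= prev else cur
    return total


samples = [100, 150, 200, 20, 80]  # hypothetical reset between 200 and 20
print(increase_with_resets(samples))  # 180.0 (naive last - first would give -20)
```

Dividing the result by the window length in seconds gives the per-second rate.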

Related

Calculate the InfluxDB average

I want to process values from InfluxDB in Grafana.
The final goal is to show how many miles the vehicle has traveled in a certain time frame.
One option is the formula: average velocity * time.
Does anyone have a good method?
So what I'm thinking is: I get the average speed over fixed periods of time with the mean function, and from it the corresponding mileage, and then I want to add all the mileage together. How do I do that?
What if only SQL is used?
1.) InfluxDB uses InfluxQL, not SQL
2.) Your approach, average velocity * time, is inaccurate
3.) Use suitable InfluxDB functions; I would say INTEGRAL() is the best function for this case, plus some basic arithmetic. Don't expect 100% accuracy. Accuracy depends heavily on the metric sampling, e.g. with 1-minute sampling, what if the vehicle is driving for 59 seconds and is not moving during the one second when the sample is taken? So don't be surprised when even 10-second sampling is inaccurate.
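A hedged Python sketch of the idea behind INTEGRAL(): distance is the area under the speed curve. This version uses the trapezoidal rule on hypothetical (timestamp in seconds, speed in km/h) samples; it is not InfluxDB's implementation, and it shares the sampling limitation described above, because it assumes speed changes linearly between samples.

```python
from typing import List, Tuple


def distance_km(samples: List[Tuple[float, float]]) -> float:
    """Distance from (timestamp_seconds, speed_km_per_h) samples, trapezoidal rule."""
    total_km = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        hours = (t1 - t0) / 3600.0
        total_km += (v0 + v1) / 2.0 * hours  # average speed of the interval * time
    return total_km


# Hypothetical: constant 60 km/h for 10 minutes, sampled every minute.
samples = [(60 * i, 60.0) for i in range(11)]
print(round(distance_km(samples), 3))  # 10.0
```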

Prometheus query for last local peak value

What Prometheus query (PromQL) can be used to identify the last local peak value in the last X minutes of a graph?
A local peak is a point that is larger than both its previous and next data points. (So the current time is definitely not a local peak.)
(p: peak point, i: cron job interval, m: missed execution)
I want this value to find anomalies in the execution of a cron job. As you can see in the picture, I have written a query to calculate the elapsed time since the last execution of a job. Now, to set an alert rule that calculates the elapsed time since the last successful execution and finds missed executions, I need the time at which the last execution of the job occurred within that interval. The interval is unknown to the query (in other words, the job's interval is specified by another program), so I cannot compare the elapsed time with a fixed time.
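The local-peak definition in the question can be made concrete with a Python sketch over a plain list of samples. This is not PromQL; the function name and the elapsed-time sawtooth data are hypothetical.

```python
from typing import List, Optional, Tuple


def last_local_peak(values: List[float]) -> Optional[Tuple[int, float]]:
    """Return (index, value) of the last point strictly larger than both neighbours.

    The last sample has no next neighbour, so (as the question notes) the
    current time can never be a local peak.
    """
    for i in range(len(values) - 2, 0, -1):  # scan backwards, skipping both ends
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            return i, values[i]
    return None


# Hypothetical elapsed-time samples: rises, drops to 0 when the job runs.
print(last_local_peak([0, 60, 120, 0, 60, 120, 180, 0, 60]))  # (6, 180)
```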
Use a z-score to detect anomalies
If you know the average value and standard deviation (σ) of a series, you can use any sample in the series to calculate the z-score. The z-score is measured in the number of standard deviations from the mean. So a z-score of 0 means the sample is identical to the mean (in a data set with a normal distribution), while a z-score of 1 is 1.0 σ from the mean, etc.
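A small Python sketch of the z-score formula (sample - mean) / σ, with a hypothetical history. It uses the population standard deviation, which is what the Prometheus documentation says stddev_over_time computes.

```python
import statistics
from typing import List


def z_score(sample: float, history: List[float]) -> float:
    """Number of (population) standard deviations `sample` lies from the mean."""
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    if sd == 0:
        raise ValueError("history has zero variance; z-score is undefined")
    return (sample - mean) / sd


# Hypothetical history: mean 12, population standard deviation 2.
print(z_score(16, [10, 14, 10, 14]))  # 2.0
```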
First, calculate the average and standard deviation of the metric using data with a large sample size (here, one week):
# Per-job rate used by the z-score query below
- record: job:cronjob_duration_time_seconds_count:rate10m
  expr: sum(rate(cronjob_duration_time_seconds_count[10m]))
# Long-term average value for the series
- record: job:cronjob_duration_time_seconds_count:rate10m:avg_over_time_1w
  expr: avg_over_time(sum(rate(cronjob_duration_time_seconds_count[10m]))[1w:])
# Long-term standard deviation for the series
- record: job:cronjob_duration_time_seconds_count:rate10m:stddev_over_time_1w
  expr: stddev_over_time(sum(rate(cronjob_duration_time_seconds_count[10m]))[1w:])
Once you have the average and standard deviation for the aggregation, calculate the z-score in a Prometheus query:
# Z-score for the aggregation
(
  job:cronjob_duration_time_seconds_count:rate10m -
  job:cronjob_duration_time_seconds_count:rate10m:avg_over_time_1w
) / stddev_over_time(sum(rate(cronjob_duration_time_seconds_count[10m]))[1w:])
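Based on the statistical properties of normal distributions, you can treat any value that falls outside a chosen range of z-scores as an anomaly. A range of roughly +1 to -1 is tight: about 32% of normally distributed values fall outside ±1σ, so a wider range such as ±2 or ±3 produces fewer false alarms. For example, you can get an alert when the aggregation stays out of this range for more than five minutes.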
If what you want is an alert that fires when the elapsed time has been longer than a fixed duration, you can set up an alert similar to the up alert, based on a changes() > 0 expression, which is only true (i.e. > 0) while the job is running.
An example would be:
rules:
  - alert: CronJobNotRunning
    expr: |
      changes(
        sum(
          rate(
            cronjob_duration_time_seconds_count{
              status="ok", namespace="<namespace>", exported_job="<job>"
            }[1m]
          )
        )[1m:]
      ) == 0
    for: <alert_duration>
Note that subqueries ([1m:]) are expensive, and introducing a recording rule there can help performance, especially in a dashboard.
Also, in your case, the time since the last time the second derivative was non-zero can be used too, as that happens when a job starts/finishes (the drops in the graph, or when it starts to rise).

Is high label cardinality but low metric/label count and infrequent sampling an acceptable use-case for Prometheus?

I have a monitoring use case that I'm not entirely sure is a good match for Prometheus, and I wanted to ask for opinions before I delve deeper.
The numbers of what I'm going to store:
Only 1 metric.
That metric has 1 label with 1,000,000 to 2,000,000 distinct values.
The values are gauges (but does it make a difference if they are counters?)
Sample rate is once every 5 minutes. Retaining data for 180 days.
Estimated storage size if I have 1 million distinct label values
(according to the formula in Prometheus' documentation: retention_time_seconds * ingested_samples_per_second * bytes_per_sample):
(24*60)/5 = 288 5-minute intervals in a day.
(180*288 samples per label value) * (1,000,000 label values) * (2 bytes per sample) = 103,680,000,000 bytes ~= 100GB
So I assume 100-200GB will be required.
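The arithmetic above can be checked with a short Python sketch. The function name is hypothetical; 2 bytes per sample is the upper end of the 1-2 bytes per sample that the Prometheus storage documentation gives as an average, and, like the formula, this only counts sample data.

```python
def estimate_storage_bytes(retention_days: int, scrape_interval_s: int,
                           series: int, bytes_per_sample: int) -> int:
    """retention_time_seconds * ingested_samples_per_second * bytes_per_sample.

    Rearranged as samples per series * series * bytes per sample so that
    the calculation stays in integer arithmetic.
    """
    samples_per_day = 24 * 60 * 60 // scrape_interval_s  # 288 for 5 minutes
    samples_per_series = retention_days * samples_per_day
    return samples_per_series * series * bytes_per_sample


total = estimate_storage_bytes(180, 5 * 60, 1_000_000, 2)
print(f"{total:,} bytes = {total / 1e9:.1f} GB = {total / 2**30:.1f} GiB")
# 103,680,000,000 bytes = 103.7 GB = 96.6 GiB
```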
Is this estimation correct?
I read in multiple places about avoiding high-cardinality labels, and I would like to ask about this. Considering that I will be looking at one time series at a time, is the problem with high-cardinality labels, or with having a high number of time series, since each label value produces another time series? I also read in multiple places that Prometheus can handle millions of time series at once, so even if I have 1 label with one million distinct values, I should be fine in terms of time-series count. Do I have to worry about the labels having high cardinality in this case? I'm aware that it depends on the strength of the server, but assuming average capacity, I would like to know whether Prometheus' implementation has a problem handling this case efficiently.
And also, if it's a matter of time-series count, am I correct in assuming that there will be no significant difference between the following options?
1 metric with 1 label of 1,000,000 distinct label values.
10 metrics each with 1 label of 100,000 distinct label values.
X metrics each with 1 label of Y distinct label values.
where X * Y = 1,000,000
Thanks for the help!
That might work, but it's not what Prometheus is designed for and you'll likely run into issues. You probably want a database rather than a monitoring system, maybe Cassandra here.
How the cardinality is split across metrics won't affect ingestion performance; however, it'll be relatively slow to read 1M series in a query.
Note that VictoriaMetrics is an easy-to-configure storage backend for Prometheus that can reduce storage requirements significantly.

Throughput vs. Latency Confusion

According to this article about throughput and latency:
"When you go to buy a water pipe, there are two completely independent parameters that you look at: the diameter of the pipe and its length."
But I think these two parameters are related. Throughput is measured per unit time, so a long latency will affect throughput; say, if the droplets are fast, more of them will pass through the pipe in one second.
Can anyone help me understand this?
EDIT:
The confusion originated from counting queuing time as part of latency, which we should not do. Once a request is being handled, its latency is independent of the throughput.
Let me give you another analogy. Think of a car travelling on a single-lane road from location A to location B. The time taken by that car to travel from A to B is your latency, and the number of cars travelling per interval while maintaining that latency is your throughput.
The factors that affect this are your medium of travel, i.e. the road, and the number of lanes on the road.
You're thinking about frequency. Say you have a window into the water pipe at some given point, and you send water droplets at some constant interval (say 1 droplet every second). You count how often you see a single droplet pass by, and take the inverse (1/seconds). So if you count 1 second of elapsed time between droplets being observed, then you have a frequency of 1Hz.
Now say that you keep this frequency constant (1Hz), but you elongate the pipe. You send one droplet down and count how much time elapses before it reaches the end of the pipe. So say it takes 2 seconds for a single drop to travel from the beginning to the end of the pipe, then you have a latency of 2 seconds.
Now say that you widen the diameter of the pipe, and now you are able to send 2 droplets with a frequency of 1Hz. At the end of the pipe you will count 2 droplets coming out every second. So your throughput will be 2 droplets per second.
Here is my bit in a language which I can understand
When you go to buy a water pipe, there are two completely independent parameters that you look at: the diameter of the pipe and its length. The diameter determines the throughput of the pipe and the length determines the latency, i.e., the time it will take for a water droplet to travel across the pipe. The key point to note is that the length and diameter are independent; thus, so are the latency and throughput of a communication channel.
More formally, throughput is defined as the amount of water entering or leaving the pipe every second, and latency is the average time required for a droplet to travel from one end of the pipe to the other.
Let’s do some math:
For simplicity, assume that our pipe is a 4inch x 4inch square and its length is 12inches. Now assume that each water droplet is a 0.1in x 0.1in x 0.1in cube. Thus, in one cross section of the pipe, I will be able to fit 1600 water droplets. Now assume that water droplets travel at a rate of 1 inch/second.
Throughput: Each set of droplets will move into the pipe in 0.1 seconds. Thus, 10 sets will move in 1 second, i.e., 16000 droplets will enter the pipe per second. Note that this is independent of the length of the pipe.
Latency: At one inch/second, it will take 12 seconds for droplet A to get from one end of the pipe to the other, regardless of the pipe’s diameter. Hence the latency will be 12 seconds.
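The pipe arithmetic can be reproduced with a small Python sketch using the numbers from this answer (the function name and parameters are hypothetical). Changing only the length changes the latency but not the throughput, which is the point of the analogy.

```python
from typing import Tuple


def pipe_throughput_and_latency(side_in: float, length_in: float,
                                droplet_in: float,
                                speed_in_per_s: float) -> Tuple[float, float]:
    """(droplets per second, seconds) for a square pipe and cubic droplets."""
    droplets_per_layer = (side_in / droplet_in) ** 2     # 40 * 40 = 1600
    layers_per_second = speed_in_per_s / droplet_in      # a 0.1in layer every 0.1s
    throughput = droplets_per_layer * layers_per_second  # droplets per second
    latency = length_in / speed_in_per_s                 # end-to-end travel time
    return throughput, latency


print(pipe_throughput_and_latency(4, 12, 0.1, 1))  # (16000.0, 12.0)
print(pipe_throughput_and_latency(4, 24, 0.1, 1))  # (16000.0, 24.0)
```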

Number Generator wave cycle to graph output

I'm looking to generate a waveform from a cycle of numbers that increase and then decrease at a given rate. The frequency can vary between 1 and 40 per minute and the amplitude varies between 100 and 3000. The idea is to form a breathing-like pattern for "breaths per minute" (1-40) and an inhaled volume per breath (100-3000).
I'm new here and I can only find random number generators. I have looked at NSTimer and UIGraphs from the iOS-Developer Tesla tutorial app.
Could anyone point me in the right direction?
Many Thanks.
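The question targets iOS (NSTimer), but the waveform itself is just a formula, sketched here in Python. The shape is an assumption: a raised cosine, where each breath rises from 0 to the inhaled volume and back, v(t) = volume * (1 - cos(2π f t)) / 2 with f = breaths per minute / 60. The function name and parameters are hypothetical; the same formula could be evaluated on each timer tick in an app.

```python
import math
from typing import List


def breathing_wave(breaths_per_minute: float, volume: float,
                   duration_s: float, sample_rate_hz: float) -> List[float]:
    """Samples of a smooth inhale/exhale cycle (raised cosine between 0 and volume)."""
    if not 1 <= breaths_per_minute <= 40 or not 100 <= volume <= 3000:
        raise ValueError("expected 1-40 breaths/min and a volume of 100-3000")
    f = breaths_per_minute / 60.0           # breaths per second (Hz)
    n = int(duration_s * sample_rate_hz)    # number of samples to produce
    return [volume * (1 - math.cos(2 * math.pi * f * i / sample_rate_hz)) / 2
            for i in range(n)]


# 12 breaths/min (one breath every 5 s), 500 volume, 10 s sampled at 10 Hz.
wave = breathing_wave(12, 500, 10, 10)
```

At 12 breaths per minute the wave starts at 0, peaks at the full volume 2.5 s in, and returns to 0 at 5 s.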
