How to understand Prometheus query for Grafana - histogram_quantile, sum, and rate functions & WRONG grafana graph data

How to understand Prometheus query for Grafana - histogram_quantile, sum, and rate functions & WRONG grafana graph data - histogram

The query that I am using to grab 99th percentile of API request latency is:
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{method="POST",handler="TestController.testLatencyTime"}[1m])) by (handler, method, le))
My buckets for latency histogram buckets are defined as [0.05, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0] in my code or hitting the metrics endpoint for a sample API endpoint (i.e. TestController.java class, and testLatencyTime() method):
http_request_duration_seconds_bucket{method="POST",handler="TestController.testLatencyTime",code="200",le="0.05",}
http_request_duration_seconds_bucket{method="POST",handler="TestController.testLatencyTime",code="200",le="0.25",}
http_request_duration_seconds_bucket{method="POST",handler="TestController.testLatencyTime",code="200",le="0.5",}
http_request_duration_seconds_bucket{method="POST",handler="TestController.testLatencyTime",code="200",le="1.0",}
http_request_duration_seconds_bucket{method="POST",handler="TestController.testLatencyTime",code="200",le="2.0",}
http_request_duration_seconds_bucket{method="POST",handler="TestController.testLatencyTime",code="200",le="4.0",}
http_request_duration_seconds_bucket{method="POST",handler="TestController.testLatencyTime",code="200",le="8.0",}
http_request_duration_seconds_bucket{method="POST",handler="TestController.testLatencyTime",code="200",le="+Inf",}
So its a http_request_duration_seconds_bucket function, passed to a rate function. Per this stackoverflow post, they stated that the rate function should be a Cumulative Density Distribution function, that is "rate applied on buckets calculates a set of rate of increments that happened on all the buckets in the span of the last 1 minute. So, to answer your question, it is a cumulative density distribution on the rate of changes calculated in a given time frame". Per this YouTube video, is it correct to assume the Cumulative Density Distribution function is the area under the curve to the LEFT of a point of interest? https://www.youtube.com/watch?v=3xAIWiTJCvE
reference:
what's the math behind histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m])) in PromQL
Furthermore, this is passed to the sum function, where the values returned by the rate function are summed or aggregated. I'm trying to understand this sentence from the prometheus documentation - "The quantile is calculated for each label combination in http_request_duration_seconds"
reference:
https://prometheus.io/docs/prometheus/latest/querying/functions/
My problem is that when I use this dummy REST Controller (Spring Boot, TestController.java class, and testLatencyTime() method to visualize the data in a locally running Prometheus/Grafana instance using Docker, if I make a "dummy" request with Thread.sleep(4000) and see how Grafana plots it, its not making sense
#RestController
#RequestMapping("/test")
public class TestController {
#PostMapping("/latency/{wait}")
public ResponseEntity<String> testLatencyTime(#PathVariable Long wait) throws InterruptedException {
Thread.sleep(wait);
return new ResponseEntity("Request completed!", HttpStatus.OK);
}
}
For example, the above 4000ms sleep will get marked as an "8 second" spike in Grafana, for the 99th percentile query I posted above. Also, if I make an mock API call take 3seconds, it gets marked as 4seconds. It's almost as if Grafana is marking the graph at the UPPER BOUND bucket value the request falls into! (i.e. upper bound for 2sec or 3sec api call would be 4; upper bound for 4sec api call is 8) Is my understanding of the statistical mathematics behind how Grafana graphs this incorrect, or is this being plotted incorrectly or is my query is wrong/not accurate??? Please help! I can add any more information that is requested but I think I included a good amount!
example:

Related

SLO calculation for 90% of requests under 1000ms

I'm trying to figure out the PromQL for an SLO for latency, where we want 90% of all requests to be served in 1000ms or less.
I can get the 90th percentile of requests with this:
histogram_quantile( 0.90, sum by (le) ( rate(MyMetric_Request_Duration_bucket{instance="foo"}[1h]) ) )
And I can find what percentage of ALL requests were served in 1000ms or less with this.
((sum(rate(MyMetric_Request_Duration_bucket{le="1000",instance="foo"}[1h]))) / (sum (rate(MyMetric_Request_Duration_count{instance="foo"}[1h])))) *100
Is it possible to combine these into one query that tells me what percentage of requests in the 90th percentile were served in 1000ms or less?
I tried the most obvious (to me anyway) solution, but got no data back.
histogram_quantile( 0.90, sum by (le) ( rate(MyMetric_Request_Duration_bucket{le="1000",instance="foo"}[1h]) ) )
The goal is to get a measure that shows For the 90th percentile of requests, how many of those requests were under 1000ms? Seems like this should be simple but I can't find a PromQL query that allows me to do it.

Welcome to SO.
Out of all the requests how many are getting served under 1000ms, to find that I would divide the total number of requests under 1000ms by the total number of requests.. In my gcp world, it translates to a query like this:
You are basically measuring your
(sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination",namespace="abcxyz",le="1000"}[1m]))/sum(rate(istio_request_duration_milliseconds_count{reporter="destination",namespace="abcxyz"}[1m])))*100
Once you have a graph setup with the above query in grafana, you can setup an alert on anything below 93 that way you are alerted even before your reach your SLO of 90%.

Prometheus doesn't provide a function, which could be used for calculating the share (aka the percentage) of requests served in under one second from histogram buckets. But such a function exists in VictoriaMetrics - this is Prometheus-like monitoring system I work on. The function is histogram_share(). For example, the following query returns the share of requests with durations smaller than one second served during the last hour:
histogram_share(1s, sum(rate(http_request_duration_seconds_bucket[1h])) by (le))
Then the following query can be used for alerting when the share or requests, which are served in less than one second, drops below 90%:
histogram_share(1s, sum(rate(http_request_duration_seconds_bucket[1h])) by (le)) < 0.9
Please note that all the functions, which work over histogram buckets, return the estimated results. Their accuracy highly depends on the used histogram buckets' boundaries. See this article for details.

How to graph individual Summary metric instances in Prometheus?

I'm using Prometheus' Summary metric to collect the latency of an API call. Instead of making an actual API call, I'm simply calling Thread.sleep(1000) to simulate a 1 second api-call latency value -- this makes the Summary hold a value of .01 (for 1 second of latency). But if, for example, I invoke Thread.sleep(1000) twice in the same minute, the Summary metric ends up with a value of .02 (for 2 seconds of latency), instead of two individual instances of .01 latency that just happened to occur within the same minute. My problem is the Prometheus query. The Prometheus query I am currently using is: rate(my_custom_summary_sum[1m]).
What should my Prometheus query be, such that I can see the latency of each individual Thread.sleep(1000) invocation. As of right now, the Summary metric collects and displays the total latency sum per minute. How can I display the latency of each individual call to Thread.sleep(1000) (i.e. the API request)?
private static final Summary mySummary = Summary.build()
.name("my_custom_summary")
.help("This is a custom summary that keeps track of latency")
.register();
Summary.Timer requestTimer = mySummary.startTimer(); //starting timer for mySummary 'Summary' metric
Thread.sleep(1000); //sleep for one second
requestTimer.observeDuration(); //record the time elapsed
This is the graph that results from this query:
Prometheus graph

Prometheus is a metrics-based monitoring system, it cares about overall performance and behaviour - not individual requests.
What you are looking for is a logs-based system, such as Graylog or the ELK stack.

Measure service latency with Prometheus

I am new to Prometheus and Grafana. My primary goal is to get the response time per request.
For me it seemed to be a simple thing - but whatever I do I do not get the results I require.
I need to be able to analyse the service latency in the last minutes/hours/days. The current implementation I found was a simple SUMMARY (without definition of quantiles) which is scraped every 15s.
Is it possible to get the average request latency of the last minute from my Prometheus SUMMARY?
If YES: How? If NO: What should I do?
Currently I am using the following query:
rate(http_response_time_sum{application="myapp",handler="myHandler", status="200"}[1m])
/
rate(http_response_time_count{application="myapp",handler="myHandler", status="200"}[1m])
I am getting two "datasets". The value of the first is "NaN". I suppose this is the result from a division by zero.
(I am using spring-client).

Your query is correct. The result will be NaN if there have been no queries in the past minute.

What is the meaning of OneMinuteRate in JMX?

I am trying to calculate the Read/second and Write/Second in my Cassandra 2.1 cluster. After searching and reading, I came to know about JMX bean
org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency
Here I can see oneMinuteRate. I have started a brand new cluster and started collected these metrics from 0.
When I started my first record, I can see
Count = 1
OneMinuteRate = 0.01599111...
Does it mean that my write/s is 0.0159911?
Or does it mean that based on 1 minute data, my write latency is 0.01599 where Write Latency refers to the response time for writing a record?
Please help me understand the value.
Thanks.

It means that in the last minute, your writes per second were occuring at a rate of .01599 writes per second. Think about it this way: the rate of writes in the last 60 seconds would be
WritesInLastMinute ÷ 60
So in your case
1 ÷ 60 = 0.0166
Or more precisely, .01599.
If you observed no further writes after that, the value would descend down to zero over the next minute.

OneMinuteRate, FiveMinuteRate, and FifteenMinuteRate are exponential moving averages, meaning they are not simply dividing readings against time, instead, as the name implies they take an exponential series of averages as below:
result(t) = (1 - w) * result(t - 1) + (w) * event_this_period
where w is the weighting factor, t is the ticking time, in other words, simply they take 20% or the new reading and 80% of old readings, it's the same way UNIX systems measure CPU loads.
however, if this applies to requests that the server receives, below is a chart from one request to a server, measures taken by dropwizard.
as you can see, from one request a curve is drawn by time, it's really useful to determine trends, but not sure if they are great to monitor live traffic and especially critical one.

How does OpenTSDB downsample data

I have a 2 part question regarding downsampling on OpenTSDB.
The first is I was wondering if anyone knows whether OpenTSDB takes the last end point inclusive or exclusive when it calculates downsampling, or does it count the end data point twice?
For example, if my time interval is 12:30pm-1:30pm and I get DPs every 5 min starting at 12:29:44pm and my downsample interval is summing every 10 minute block, does the system take the DPs from 12:30-12:39 and summing them, 12:40-12:49 and sum them, etc or does it take the DPs from 12:30-12:40, then from 12:40-12:50, etc. Yes, I know my data is off by 15 sec but I don't control that.
I've tried to calculate it by hand but the data I have isn't helping me. The numbers I'm calculating aren't adding up to the above, nor is it matching what the graph is showing. I don't have access to the system that's pushing numbers into OpenTSDB so I can't setup dummy data to check.
The second question is how does downsampling plot its points on the graph from my time range and downsample interval? I set downsample to sum 10 min blocks. I set my range to be 12:30pm-1:30pm. The graph shows the first point of the downsampled graph to start at 12:35pm. That makes logical sense.I change the range to be 12:24pm-1:29pm and expected the first point to start at 12:30 but the first point shown is 12:25pm.
Hopefully someone can answer these questions for me. In the meantime, I'll continue trying to find some data in my system that helps show/prove how downsampling should work.
Thanks in advance for your help.

Downsampling isn't currently working the way you expect, although since this is a reasonable and commonly made expectations, we are thinking of changing this in a later release of OpenTSDB.
You're assuming that if you ask for a "10 min sum", the data points will be summed up within each "round" (or "aligned") 10 minute block (e.g. 12:30-12:39 then 12:40-12:49 in your example), but that's not what happens. What happens is that the code will start a 10-minute block from whichever data point is the first one it finds. So if the first one is at time 12:29:44, then the code will sum all subsequent data points until 600 seconds later, meaning until 12:39:44.
Within each 600 second block, there may be a varying number of data points. Some blocks may have more data points than others. Some blocks may have unevenly spaced data points, e.g. maybe all the data points are within one second of each other at the beginning of the 600s block. So in order to decide what timestamp will result from the downsampling operation, the code uses the average timestamp of all the data points of the block.
So if all your data points are evenly spaced throughout your 600s block, the average timestamp will fall somewhere in the middle of the block. But if you have, say, all the data points are within one second of each other at the beginning of the 600s block, then the timestamp returned will reflect that by virtue of being an average. Just to be clear, the code takes an average of the timestamps regardless of what downsampling function you picked (sum, min, max, average, etc.).
If you want to experiment quickly with OpenTSDB without writing to your production system, consider setting up a single-node OpenTSDB instance. It's very easy to do as is shown in the getting started guide.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart