Is there a way in grafana to get number of requests at a given time instance? - devops

I have one endpoint for which I would like to see number of requests at a given time (not period). For instance, how many requests were received at 9.30 a.m.
The function I believe I can make use of might be: echo_requests_total, but it just accumulates the count and if I try increase() or rate() functions even those do not produce the expected output, which is obvious.
I am not even sure about what I want is even possible or not.
Any help would be appreciated.

Related

Circumventing negative side effects of default request sizes

I have been using Reactor pretty extensively for a while now.
The biggest caveat I have had coming up multiple times is default request sizes / prefetch.
Take this simple code for example:
Mono.fromCallable(System::currentTimeMillis)
.repeat()
.delayElements(Duration.ofSeconds(1))
.take(5)
.doOnNext(n -> log.info(n.toString()))
.blockLast();
To the eye of someone who might have worked with other reactive libraries before, this piece of code
should log the current timestamp every second for five times.
What really happens is that the same timestamp is returned five times, because delayElements doesn't send one request upstream for every elapsed duration, it sends 32 requests upstream by default, replenishing the number of requested elements as they are consumed.
This wouldn't be a problem if the environment variable for overriding the default prefetch wasn't capped to minimum 8.
This means that if I want to write real reactive code like above, I have to set the prefetch to one in every transformation. That sucks.
Is there a better way?

Prometheus increase not handling process restarts

I am trying to figure out the behavior of Prometheus' increase() querying function with process restarts.
When there is a process restart within a 2m interval and I query:
sum(increase(my_metric_total[2m]))
I get a value less than expected.
For example, in a simple experiment I mock:
3 lcm_restarts
1 process restart
2 lcm_restarts
All within a 2 minute interval.
Upon querying:
sum(increase(lcm_restarts[2m]))
I receive a value of ~4.5 when I am expecting 5.
lcm_restarts graph
sum(increase(lcm_restarts[2m])) result
Could someone please explain?
Pretty concise and well-prepared first question here. Please keep this spirit!
When working with counters, functions as rate(), irate() and also increase() are adjusting on resets due to restarts. Other than the name suggests, the increase() function does not calculate the absolute increase in the given time frame but is a different way to write rate(metric[interval]) * number_of_seconds_in_interval. The rate() function takes the first and the last measurement in a series and calculates the per-second increase in the given time. This is the reason why you may observe non-integer increases even if you always increase in full numbers as the measurements are almost never exactly at the start and end of the interval.
For more details about this, please have a look at the prometheus docs for the increase() function. There are also some good hints on what and what not to do when working with counters in the robust perception blog.
Having a look at your label dimensions, I also think that counter resets don't apply to your constructed example. There is one label called reason that changed between the restarts and so created a second time series (not continuing the existing one). Here you are also basically summing up the rates of two different time series increases that (for themselves) both have their extrapolation happening.
So basically there isn't really anything wrong what you are doing, you just shouldn't rely on getting highly precise numbers out of prometheus for your use case.
Prometheus may return unexpected results from increase() function due to the following reasons:
Prometheus may return fractional results from increase() over integer counter because of extrapolation. See this issue for details.
Prometheus may return lower than expected results from increase(m[d]) because it doesn't take into account possible counter increase between the last raw sample just before the specified lookbehind window [d] and the first raw sample inside the lookbehind window [d]. See this article and this comment for details.
Prometheus skips the increase for the first sample in a time series. For example, increase() over the following series of samples would return 1 instead of 11: 10 11 11. See these docs for details.
These issues are going to be fixed according to this design doc. In the mean time it is possible to use other Prometheus-like systems such as VictoriaMetrics, which are free from these issues.

Measure service latency with Prometheus

I am new to Prometheus and Grafana. My primary goal is to get the response time per request.
For me it seemed to be a simple thing - but whatever I do I do not get the results I require.
I need to be able to analyse the service latency in the last minutes/hours/days. The current implementation I found was a simple SUMMARY (without definition of quantiles) which is scraped every 15s.
Is it possible to get the average request latency of the last minute from my Prometheus SUMMARY?
If YES: How? If NO: What should I do?
Currently I am using the following query:
rate(http_response_time_sum{application="myapp",handler="myHandler", status="200"}[1m])
/
rate(http_response_time_count{application="myapp",handler="myHandler", status="200"}[1m])
I am getting two "datasets". The value of the first is "NaN". I suppose this is the result from a division by zero.
(I am using spring-client).
Your query is correct. The result will be NaN if there have been no queries in the past minute.

how to cluster percentile of events by time delta?

After a mailing at t0, I will have several "delivered" (and open and click) events (schema and example)
mailing_name, timestamp, email_id, event_type
niceattack, 2016-07-14 12:11:00, 42, open
niceattack, 2016-07-14 12:11:08, 842, open
niceattack, 2016-07-14 12:11:34, 847, open
I would like to see for a mailing how long it takes to be delivered to half of the recipients. So say that I'm sending an email to 1000 addresses now, the first open event is in 2 min, the last one is going to be in a week (and min/max first last seems to be easy to find) but what I'd like to see is that half of the recipients opened it in the first 2 hours after it was sent.
The goal is to send being able to compare is sending now vs on sat morning makes a difference on how fast it's open on average, or if one specific mailing get quicker exposure, and correlate that with other events (how many click on a link, take a specific action on our site...)
I tried to use a cumulate function (how many open event for mailing for each point), but it seems that the cumulative function isn't yet implemented https://github.com/influxdata/influxdb/issues/813
How do you solve that problem with influxdb?
Solving this problem with InfluxDB alone is not currently possible, however if you're willing to add Kapacitor into the mix, then it should be possible. In particular you'll need to write a User Defined Function (UDF) for that cumulative function in Kapacitor.
The general process will look like the following:
Install and Configure Kapacitor
Create a UDF for the cumulative function you're looking for
Enable that UDF inside of Kapacitor
Write a TICKscript that uses the UDF and writes the results back to InfluxDB
Enable a task defined by the TICKscript you've written
Query the InfluxDB instance to get the results of the cumulative function.
My appoligies for being so high level on this. This is a fairly involved process, but should give you the result you're looking for.

Is there any point to using Any() linq expression for optimisation purposes?

I have a MVC application which returns 2 types of Json responses from 2 controller methods; AnyRemindersExist() and GetAllUserReminders(). The first returns a boolean, 2nd returns an array, both wrapped as Json.
I have a JavaScript timer checking for calendar reminders against a user. It makes the first call (AnyRemindersExist) to check whether reminders exist and whether the client should then make the 2nd call.
For example, if the result of the Json response is false from the Any() query, it doesn't then make the 2nd controller action which makes a LINQ select call. If there are reminders that exist, it then goes further and then requests them (making use of the LINQ SELECT).
Imagine a system ramped up where 100-1000s users use the system and on the client, every 30-60 seconds a request comes in to load in the reminders. Does this Any() call help in anyway in reducing load on the server?
If you're always going to get the actual values afterwards, then no - it would make more sense to have fewer requests, and just always give the full results. I very much doubt that returning no results is slower than returning an indication that there are no results.
EDIT: tvanfosson's latest comment (at the time of this writing) is worth promoting:
You can really only tell by measuring and I'd only resort to it IFF the performance of the select only option didn't meet the requirements.
That's the most important thing about performance: the value of a guess is much less than the value of test data.
I would say that it depends on how the underlying queries are translated. If the any call is translated into an indexed lookup when the select (perhaps due to a join to get related data) must do some sort of table scan, then it will save some work in the case when there are no reminders to be found. It will cause a little extra work when there are reminders. It might be useful if the majority of the calls don't result in any results.
In the general case, though, I would just select the data and only try to optimize IF that turns out to not be fast enough. The conditions under which it will actually save effort on the server are pretty narrow and might only apply if you hand-craft the SQL rather than depend on your ORM.
Any only checks to see if there is at least one item in the Collection that is being returned. Versus using something like Count > 0 which counts the total amount of items in the collection then yes this is more optimal.
If your AnyRemindersExist method is operating on a similar principle then not calling a second call to the server would reduce your load.
So you are asking if not doing work the application doesn't need to do would reduce the workload on the server?
Of course. How would this answer every be "yes, doing extra work for no reason won't effect the server load".
It ultimately depends on how much faster the Any check is compared to getting the results and how often it will be false.
If the Any call takes near as long as the select then it pretty
much never makes sense.
If the Any call is much faster than the select but 90% of the
time it's true, then it probably isn't worth it (best case you
get 10% improvement, worst case it's actually more work).
If the Any call is much faster than the select and 90% of the
time it's false, then it probably makes sense to check if there
are any before actually getting results.
So the answer is it depends on your specific scenario. Ultimately you're going to need to measure both the relative performance (on different typical loads, maybe some queries are more intensive than others) as well as the frequency that there are no results to return.
Actually it should almost never make sense to check Any in this case.
If Any returns false then you don't need to grab the results.
However this means it would have returned no results anyway, so
unless your Any check is significantly faster than a select
returning 0 results, there's no added benefit here.
On the other hand, if Any returns true, then you'll need to get the
results anyway, so in this case Any is purely additional work done.

Resources