How to show a labeled Prometheus Gauge as a Grafana time series or state timeline with instant values (no range)? - time-series

The Problem
Context
I'm trying to display the different values of my Prometheus Gauge over time (depending on label). I opted for a State Timeline panel.
Entry Data
Currently the entry data are the following Gauge series:
{"metric":{"__name__":"xxxx","instance":"xxxxx","name":"eck-operator","domain":"xxxxxx","infra":"xxx","productname":"xxxx","state":"Success","version":"x.x.x"},"values":[1,1,1,1,...],"timestamps":[....]}
{"metric":{"__name__":"xxxxx","instance":"xxxxxxx","name":"eck-operator","domain":"xxxxxxx","infra":"xx","productname":"xxxxxx","state":"OpenShift-fr01: Request timeout","version":"x.x.x"},"values":[0,0],"timestamps":[<VALUE>,<VALUE2>]}
{"metric":{"__name__":"xxxxx","instance":"xxxxxxx","name":"eck-operator","domain":"xxxxxxx","infra":"xx","productname":"xxxxxx","state":"Helm Client: Helm repo add failled","version":"x.x.x"},"values":[0],"timestamps":[<VALUE>]}
As you can see, over the last 90 days there are only two timestamps set to 0 for the state "OpenShift-fr01: Request timeout" and one for "Helm Client: Helm repo add failled". That is why there are two KO rows on different states.
Expected behavior
Data is sent every minute, so shouldn't the red row fill only a small bar rather than the whole space?
Can someone explain what I did wrong?
Thanks ~

Related

How can I take all targets' metrics in one page at Prometheus

I have Prometheus set up on AWS EC2. I have 11 targets configured, each with 2+ endpoints. I would like to set up an endpoint/query to gather all the metrics in one page. I am pretty stuck right now. I could use some help, thanks :)
my prometheus targets file
Prometheus adds a unique instance label to each scraped target according to these docs.
Prometheus provides the ability to select time series matching a given series selector. For example, the series selector {instance="1.2.3.4:56"} selects all the time series obtained from the target with that instance label.
Prometheus provides the /api/v1/series endpoint, which returns time series matching the provided match[] series selector.
So, if you need to obtain all the time series from a particular target my-target, issue the following request to /api/v1/series:
curl 'http://prometheus:9090/api/v1/series?match[]={instance="my-target"}'
If you need to obtain metrics from my-target at a given timestamp, issue a query with the series selector to /api/v1/query:
curl 'http://prometheus:9090/api/v1/query?query={instance="my-target"}&time=needed-timestamp'
If you need to obtain all the raw samples from my-target over a given time range (end_timestamp-d ... end_timestamp], use the following query:
curl 'http://prometheus:9090/api/v1/query?query={instance="my-target"}[d]&time=end_timestamp'
See these docs for details on how to read raw samples from Prometheus.
If you need to obtain all the metrics / series from all the targets, just use the following series selector: {__name__!=""}
See also /api/v1/query_range - this endpoint is used by Grafana for building graphs from Prometheus data.
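The curl commands above can also be built programmatically. A minimal Python sketch (standard library only; the Prometheus base URL is an assumption to adjust for your setup) that constructs the same /api/v1/series request with the match[] selector properly URL-encoded:

```python
from urllib.parse import urlencode

# Base URL of the Prometheus server (assumption: adjust to your setup).
PROMETHEUS = "http://prometheus:9090"

def series_url(selector: str) -> str:
    """Build an /api/v1/series URL for the given series selector."""
    return f"{PROMETHEUS}/api/v1/series?{urlencode({'match[]': selector})}"

# All series from one target:
print(series_url('{instance="my-target"}'))
# All series from all targets:
print(series_url('{__name__!=""}'))
```

Encoding the selector matters because it contains characters such as {, } and " that are not valid in a raw query string; urlencode handles that, which is what the quoting in the curl examples above relies on the shell to preserve.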

How to send non-aggregated metrics to Influx from a Spring Boot application?

I have a Spring Boot application that is under moderate load. I want to collect metric data for a few of the operations of my app. I am mainly interested in Counters and Timers.
I want to count the number of times a method was invoked (# of invocations over a window, for example, invocations over the last 1 day, 1 week, or 1 month)
If the method produces any unexpected result, increase a failure count and publish a few tags with that metric
I want to time a couple of expensive methods, i.e. I want to see how long each method took, and also publish a few tags with the metric to get more context
I have tried StatsD-SignalFx and Micrometer-InfluxDB, but both these solutions have some issues I could not solve
StatsD aggregates the data over the flush window, and due to that aggregation the metric tags get messed up. For example, if I send 10 events in a flush window with different tag values, the StatsD agent aggregates those events and publishes only one event with counter = 10, and I am not sure which tag values it sends with the aggregated data.
The Micrometer-InfluxDB setup has its own problems, one of them being that Micrometer sends 0 values for counters if no new metric is produced, and that fake (0-value) counter uses the same tag values as the last valid (non-zero) counter.
I am not sure how, but I believe Micrometer also does some sort of aggregation on the client side in the MeterRegistry, because I was getting a few counters with a value of 0.5 in InfluxDB.
Next, I am planning to explore Micrometer/StatsD + Telegraf + Influx + Grafana to see if it suits my use case.
Questions:
How can I avoid metric aggregation until the data reaches the data store (InfluxDB)? I can do the required aggregation in Grafana.
Is there any standard solution to the problem that I am trying to solve?
Any other suggestion or direction for my use case?

How to set a different resolution on different charts in netdata?

I have set up a netdata server to monitor application metrics, and would like to use statsd to collect the metric data. After researching for a few days, I still have no idea how to set a different resolution on different charts.
For example, I would like to show total sales every hour, but the request count every minute. It seems netdata will only refresh each chart every second (the global 'update every' setting). So how do I make netdata refresh the total sales chart per hour (a lower resolution), and the total request count per minute? Or have I just misunderstood netdata/statsd?
Thanks in advance.

InfluxDB: query to calculate average of StatsD "executionTime" values

I'm sending metrics in StatsD format to Telegraf, which forwards them to InfluxDB 0.9.
I'm measuring execution times (of some event) from multiple hosts. The measurement is called "execTime", and the tag is "host". Once Telegraf gets these numbers, it calculates mean/upper/lower/count, and stores them in separate measurements.
Sample data looks like this in InfluxDB:
TIME   FIELD            HOST   VALUE
t1     execTime.count   VM1    3
t1     execTime.mean    VM1    15
t1     execTime.count   VM2    6
t1     execTime.mean    VM2    22
(So at time t1, there were 3 events on VM1, with mean execution time 15ms, and on VM2 there were 6 events, and the mean execution time was 22ms)
Now I want to calculate the mean of the operation execution time across both hosts at time t1. Which is (3*15 + 6*22)/(3+6) ms.
But since the count and mean values are in two different series, I can't simply use "select mean(value) from execTime.mean"
Do I need to change my schema, or can I do this with the current setup?
What I need is essentially a new series, which is a combination of the execTime.count and execTime.mean across all hosts. Instead of calculating this on-the-fly, the best approach seems to be to actually create the series along with the others.
So now I have two timer stats being generated on each host for each event:
1. one event with actual hostname for the 'host' tag
2. second event with one tag "host=all"
I can use the first set of series to check mean execution times per host. And the second series gives me the mean time for all hosts combined.
It is possible to do mathematical operations on fields from two different series, provided both series are members of the same measurement. I suspect your schema is not optimized for your use case.
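As a sanity check of the weighted-mean arithmetic from the question, here is a minimal Python sketch (the per-host numbers are the hypothetical t1 sample above) that combines per-host count and mean into a single mean across hosts:

```python
# Per-host (count, mean) pairs at time t1, taken from the sample data above.
samples = {
    "VM1": (3, 15.0),   # 3 events, mean execution time 15 ms
    "VM2": (6, 22.0),   # 6 events, mean execution time 22 ms
}

def combined_mean(per_host):
    """Weighted mean across hosts: sum(count * mean) / sum(count)."""
    total_time = sum(count * mean for count, mean in per_host.values())
    total_count = sum(count for count, _ in per_host.values())
    return total_time / total_count

print(combined_mean(samples))  # (3*15 + 6*22) / (3 + 6)
```

This is exactly the computation the "host=all" series pre-materializes on the client side: each host contributes its mean weighted by its event count, which a plain mean over execTime.mean would get wrong.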

Kibana 4: how to aggregate by hour (or minute)?

I want to answer the question: in what time window are my servers least used?
My idea is to have a 24h histogram (with 10min buckets) with the count of requests made to my server. In other words, I want to ignore the date portion in datetime aggregations.
How do I create such a graph in Kibana 4?
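The underlying aggregation (drop the date, keep only the time of day, bucket into 10-minute slots) can be sketched outside Kibana. A minimal Python example over hypothetical request timestamps:

```python
from collections import Counter
from datetime import datetime

# Hypothetical request timestamps (the dates differ; only the time of day matters).
requests = [
    datetime(2015, 6, 1, 9, 3),
    datetime(2015, 6, 2, 9, 7),
    datetime(2015, 6, 3, 23, 55),
]

def bucket(ts: datetime) -> str:
    """Map a timestamp to its 10-minute time-of-day bucket, e.g. '09:00'."""
    return f"{ts.hour:02d}:{ts.minute // 10 * 10:02d}"

# Requests on different days fall into the same time-of-day bucket.
histogram = Counter(bucket(ts) for ts in requests)
print(histogram)
```

The point of the sketch is that bucketing on hour and minute alone collapses all days onto a single 24h axis, which is the behavior the question asks Kibana to reproduce.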