How can I get all targets' metrics on one page in Prometheus?

I have Prometheus set up on AWS EC2. I have 11 targets configured, which have 2+ endpoints. I would like to set up an endpoint/query etc. to gather all the metrics on one page. I am pretty stuck right now. I could use some help, thanks :)
my prometheus targets file

Prometheus adds a unique instance label to each scraped target, according to these docs.
Prometheus provides the ability to select time series matching a given series selector. For example, the series selector {instance="1.2.3.4:56"} selects all the time series obtained from the target with that instance label.
Prometheus provides the /api/v1/series endpoint, which returns time series matching the provided match[] series selector.
So, if you need to obtain all the time series from a particular target my-target, you can issue the following request to /api/v1/series:
curl 'http://prometheus:9090/api/v1/series?match[]={instance="my-target"}'
If you need to obtain metrics from my-target at a given timestamp, issue a query with that series selector to /api/v1/query:
curl 'http://prometheus:9090/api/v1/query?query={instance="my-target"}&time=needed-timestamp'
If you need to obtain all the raw samples from my-target over the given time range (end_timestamp-d ... end_timestamp], use the following query:
curl 'http://prometheus:9090/api/v1/query?query={instance="my-target"}[d]&time=end_timestamp'
See these docs for details on how to read raw samples from Prometheus.
If you need to obtain all the metrics / series from all the targets, just use the following series selector: {__name__!=""}
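For example, that selector can be plugged into the same /api/v1/series call as above (prometheus:9090 being the same placeholder host/port):
curl 'http://prometheus:9090/api/v1/series?match[]={__name__!=""}'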
See also /api/v1/query_range - this endpoint is used by Grafana for building graphs from Prometheus data.
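As a rough sketch, a range query over the earlier selector looks like this (START_TIMESTAMP, END_TIMESTAMP and the 15s step are placeholders):
curl 'http://prometheus:9090/api/v1/query_range?query={instance="my-target"}&start=START_TIMESTAMP&end=END_TIMESTAMP&step=15s'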

Related

How do you display most failed http requests in prometheus/grafana table?

I am monitoring my Node.js application using prometheus/grafana/express-prom-bundle, which exposes a counter metric called http_request_duration_seconds_count. The metric has three labels of interest: status_code, path and method.
I would like to display a table in my Grafana dashboard listing the most frequently failing paths/methods (status_code="500") within the dashboard date range.
Is that possible, and if so, what are the Prometheus query and Grafana table settings that I need to achieve this list?
Thank you in advance for your help.
Here you want the topk aggregator, so
topk(5,
  sum by (method, path) (
    rate(http_request_duration_seconds_count{status_code="500"}[5m])
  )
)
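If you want totals over the whole dashboard time range rather than a per-second rate, one variant (assuming a Grafana version that supports the $__range variable) is to run the following as an instant query and set the panel's format to Table:
topk(5,
  sum by (method, path) (
    increase(http_request_duration_seconds_count{status_code="500"}[$__range])
  )
)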

Create sum of multiple queries with influxdb

I have four singlestat panels which show my used space on different hosts (every host also has different type_instances):
The query for one of these singlestats is the following:
Question: Is there a way to create a fifth singlestat panel which shows the sum of the other 4 singlestats? (The sum of all "storj_value" where type=shared)
The Influx query language does not currently support aggregations across metrics (e.g. JOINs). It is possible with Kapacitor, but that requires writing code so that new aggregated values for all the measurements are written to the DB, and those values then need to be queried separately.
The only option currently is to use an API that does have cross-metric function support, for example Graphite with an InfluxDB storage back-end, InfluxGraph.
The two APIs are quite different (Influx's is query-language based, Graphite's is not), and tagged InfluxDB data will need to be configured as a Graphite metric path via templates; see the configuration examples.
After that, Graphite functions that act across series can be used; for the question above, sumSeries in particular.
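As a rough sketch, assuming InfluxGraph exposes the measurements under a Graphite metric path such as storj.<host>.value.shared (the exact path depends on your template configuration and is hypothetical here), the fifth panel's Graphite target could be:
sumSeries(storj.*.value.shared)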

Stackdriver: ElementCount from Specific Dataflow PCollection Output

I've got a dataflow job that pulls messages off of several Google Pub/Sub topics, does some parallel processing on the individual elements contained within those messages, then passes the collection on for further consumption by various resources. I'd like to put together a Stackdriver dashboard showing how many individual elements have been processed for each topic. Each ParDo step outputs a PCollection.
I've set up a dashboard using ElementCount, but I'm only able to filter by job, not by step. If I mouseover the line in the chart produced using ElementCount, I can see counts for every single step. Indeed, it does appear that the metrics for these are being reported, as I can use the gcloud commandline utility in the following manner:
gcloud beta dataflow metrics list [jobid] --filter ElementCount
...
name:
  context:
    original_name: extract_value_topic_1/Map-out0-ElementCount
    output_user_name: extract_value_topic_1/Map-out0
  name: ElementCount
  origin: dataflow/v1b3
scalar: 7000
updateTime: '2017-05-03T18:13:22.804Z'
---
name:
  context:
    original_name: extract_value_topic_2/Map-out0-ElementCount
    output_user_name: extract_value_topic_2/Map-out0
  name: ElementCount
  origin: dataflow/v1b3
scalar: 12000
updateTime: '2017-05-03T18:13:22.804Z'
I have several of these, but I don't see a straightforward way of building Stackdriver charts based on them (aside from logging to the console for every element processed and then using that to generate a log-based metric, but that seems like it'd be extremely inefficient on a number of levels). Am I missing something? How would one create a chart based on these ElementCounts?
Edit: Additionally, if I open up the Metrics Explorer I can enter dataflow/job/element_count into the search box and then pcollection into the filter box, but I'm unable to build a dashboard with this chart in it, as the filter selection in the dashboard chart builder does not allow filtering by pcollection.
Unfortunately, you currently cannot build a dashboard with a filter on a metric label. As you noticed, the new (Beta) Metric Explorer provides the filtering functionality and the Stackdriver team is actively working on providing that functionality to the dashboard charts as well.
I will follow up if I receive any further updates or details from the Stackdriver team.
--Andrea

Split a KV<K,V> PCollection into multiple PCollections

Hi, after performing a GroupByKey on a KV PCollection, I need to:
1) Make every element in that PCollection a separate individual PCollection.
2) Insert the records in those individual PCollections into a BigQuery Table.
Basically my intention is to create a dynamic date partition in the BigQuery table.
How can I do this?
An example would really help.
For Google Dataflow to be able to perform the massive parallelisation which makes it one of a kind (as a service on the public cloud), the job flow needs to be predefined before it is submitted to the Google Cloud console. Every time you execute the jar file that contains your pipeline code (which includes the pipeline options and the transforms), a JSON file with the description of the job is created and submitted to Google Cloud Platform. The managed service then uses this to execute your job.
The use case mentioned in the question demands that the input PCollection be split into as many PCollections as there are unique dates. For the split, the TupleTags needed to split the collection would have to be created dynamically, which is not possible at this time. Creating TupleTags dynamically is not allowed because it doesn't fit into creating the job description JSON file up front, and it defeats the design/purpose with which Dataflow was built.
I can think of a couple of solutions to this problem (each having its own pros and cons):
Solution 1 (a workaround for the exact use case in the question):
Write a Dataflow transform that takes the input PCollection and, for each element in the input:
1. Checks the date of the element.
2. Appends the date to a pre-defined BigQuery table name as a decorator (in the format yyyyMMdd).
3. Makes an HTTP request to the BQ API to insert the row into the table named with that decorator.
You will have to take the cost of this approach into consideration, because there is a single HTTP request for every element, rather than the BQ load job that would have been used if we had gone with the BigQueryIO Dataflow SDK module.
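For illustration only, step 3 could be sketched as a raw streaming-insert call like the one below; the project, dataset, table and field names are placeholders, and the $ in the partition decorator is URL-encoded as %24:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"rows": [{"json": {"event_date": "2017-05-03", "value": 42}}]}' \
  'https://www.googleapis.com/bigquery/v2/projects/MY_PROJECT/datasets/MY_DATASET/tables/MY_TABLE%2420170503/insertAll'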
Solution 2 (the best practice that should be followed in this type of use case):
1. Run the Dataflow pipeline in streaming mode instead of batch mode.
2. Define a time window with whatever duration is suitable to the scenario in which it is being used.
3. For the PCollection in each window, write it to a BQ table with the decorator being the date of the time window itself.
You will have to consider rearchitecting your data source to send data to Dataflow in real time, but you will get a dynamically date-partitioned BigQuery table with the results of your data processing available in near real time.
References:
Google BigQuery table decorators
Google BigQuery table insert using an HTTP POST request
How job description files work
Note: Please mention in the comments if you need code snippets and I will elaborate the answer.

Grafana dynamically display new hosts added by collectd

How do I get Grafana to dynamically add graphs for newly added hosts? For example, I have a Grafana chart to display load average for existing hosts. When I add a new host, collectd will send the new host's metrics to InfluxDB, but every time I have to manually add one more graph in Grafana, which is not desired. Is there a way to get Grafana to automatically plot the new host's metrics without changing Grafana?
You have to make use of the Grafana HTTP API and update your dashboard by adding the new graph that you want. In practice this means that you have to:
use the API to fetch the JSON of the dashboard
handle this data and add the extra JSON for the new panel that you want to add
use the API again to update the dashboard
The hierarchy is simple: a dashboard has rows, and rows have panels. You will probably have to add some JSON inside the panels. Check your JSON file and all of this will make sense to you...
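A minimal sketch with curl, assuming an API key in $GRAFANA_API_KEY and a dashboard slug of my-dashboard (older Grafana versions address dashboards by slug; newer ones use a UID):
# fetch the current dashboard JSON
curl -H "Authorization: Bearer $GRAFANA_API_KEY" \
  'http://grafana:3000/api/dashboards/db/my-dashboard' > dashboard.json
# edit dashboard.json to add the new panel, keep the "dashboard" object and add "overwrite": true,
# then push it back
curl -X POST -H "Authorization: Bearer $GRAFANA_API_KEY" -H "Content-Type: application/json" \
  -d @dashboard.json 'http://grafana:3000/api/dashboards/db'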
You can use regexp patterns in InfluxDB 0.8 (see also the 0.9 equivalent docs) to match all your newly added hosts. InfluxDB regexps use the Golang syntax.
For example, to match all series starting with stats.cpuNUMBER:
series: /^stats\.cpu\d+/
select: avg(load)
However, this way you won't get one new plot for each newly added host, but rather a line for every host in the same plot.
You have to add a regex to your SELECT clause.
SELECT mean(value) FROM /logstash.*.requests.count/ WHERE $timeFilter
GROUP BY time($interval)
The above query will automatically plot each series matching the regex for all hosts, without changing Grafana. For example, given these series:
logstash.ABC1.requests.count
logstash.ABC2.requests.count
logstash.ABC3.requests.count
When a host ABC4 is added and its data is shipped correctly, a new graph will be plotted automatically.
