How to get grafana to dynamically add graphs for newly added hosts? For example, I have grafana chart to display load average for existing hosts. When I add a new host, the collectd will send the new host metrics to influxdb. But every time I have to manually add one more graph in grafana which is not desired? Is there a way to get grafana automatically plot the new host metrics without changing grafana?
You have to make use of the Grafana HTTP api and update your dashboard by adding the new graph that you want. This practically means that you have to:
use the api to take the json of the dashboard
handle this data and add your extra code for the new panel that you want to add
use the api again to update the dashboard
The hierarchy is simple: a dashboard has rows and rows have panels. Probably you will have to add some json code inside panels. Go check your json file and all these will make sense to you...
You can use regexp patterns in InfluxDB 0.8 (see also the 0.9 equivalent docs) to match all your newly added hosts. InfluxDB regexps use the Golang syntax.
For example, to match all series starting with stats.cpuNUMBER:
series: /^stats\.cpu\d+/
select: avg(load)
However this way you won't get one new plot for each newly added host, but a line for every host in the same plot.
You have to add regex in your select clause.
SELECT mean(value) FROM /logstash.*.requests.count/ WHERE $timeFilter
GROUP BY time($interval)
Above script will plot each series matching above regex automatically for all hosts without changing the grafana.
logstash.ABC1.requests.count
logstash.ABC2.requests.count
logstash.ABC3.requests.count
When ABC4 host is added and it is shipped correctly, new graph will be plotted automatically.
Related
I have Prometheus setup on AWS EC2. I have 11 targets configured which have 2+ endpoints. I would like to setup a endpoint/query etc to gather all the metrics in one page. I am pretty stuck right now. I could use some help thanks:)
my prometheus targets file
Prometheus adds an unique instance label per each scraped target according to these docs.
Prometheus provides an ability to select time series matching the given series selector. For example, the following series selector selects time series containing {instance="1.2.3.4:56"} label, e.g. all the time series obtained from the target with the given instance label.
Prometheus provides the /api/v1/series endpoint, which returns time series matching the provided match[] series selector.
So, if you need obtaining all the time series from a particular target my-target, you can issue the following request to /api/v1/series:
curl 'http://prometheus:9090/api/v1/series?match[]={instance="my-target"}'
If you need obtaining metrics from the my-target at the given timestamp, then issue the query with the series selector to /api/v1/query:
curl 'http://prometheus:9090/api/v1/query?query={instance="my-target"}&time=needed-timestamp'
If you need obtaining all the raw samples from the my-target on the given time range (end_timestamp+d ... end_timestamp], then use the following query:
curl 'http://prometheus:9090/api/v1/query?query={instance="my-target"}[d]&time=end_timestamp'
See these docs for details on how to read raw samples from Prometheus.
If you need obtaining all the metrics / series from all the targets, then just use the following series selector: {__name__!=""}
See also /api/v1/query_range - this endpoint is used by Grafana for building graphs from Prometheus data.
I am monitoring my nodejs application using prometheus/grafana/express-prom-bundle which exposes a counter metric called http_request_duration_seconds_count. The metric has three labels of interest. status_code, path and method.
I would like to display a table in my grafana dashboard to list the most frequently failed paths/method (status_code="500") within the dashboard date range.
is that possible and if so what is prometheus query and grafana table settings that I need to achieve this list.
Thank you in advance for your help.
Here you want the topk aggregator, so
topk(5,
sum by (method, path) (
rate(http_request_duration_seconds_count{status_code="500"}[5m])
)
)
I have four singlestat panels which show my used space on different hosts (every host has also different type_instances):
The query for one of this singlestats is the following:
Question: Is there a way to create a fifth singlestat panel which sows the sum of the other 4 singlestats ? (The sum of all "storj_value" where type=shared)
The influx query language does not currently support aggregations across metrics (eg, JOINs). It is possible with Kapacitor but that requires that new aggregated values for all the measurements are written to the DB, by writing code to do it, which will need to be queried separately.
Only option currently is to use an API that does have cross-metric function support, for example Graphite with an InfluxDB storage back-end, InfluxGraph.
The two APIs are quite different - Influx's is query language based, Graphite is not - and tagged InfluxDB data will need to be configured as a Graphite metric path via templates, see configuration examples.
After that, Graphite functions that act across series can be used, in particular for the above question, sumSeries.
I'm not fully understanding the structure of influxDB and I'm wishing to drop some data from it. I have the following:
net,agent_host=10.0.1.1,host=debian
net,host=debian,interface=all
net,host=debian,interface=eth0
net,host=dns1,interface=all
net,host=dns1,interface=eth0
net,host=dns2,interface=all
net,host=dns2,interface=eth0
net,host=plex,interface=all
net,host=plex,interface=eth0
I've been able to empty the series (?) net,agent_host=10.0.1.1,host=debian, but as you can see the value is still listed when I do show series.
I don't want to nuke all of the net series, but I don't want the index(?) net,agent_host=10.0.1.1,host=debian showing up. It adds extra fields in Grafana and I just don't want it.
Can I get rid of this without nuking all of the net series?
I dropped the data by issuing: drop series from net where agent_host="10.0.1.1"
EDIT: Totally forgot. InfluxDB version 1.1.1
I'm using Telegraf/InfluxDB/Grafana to register and view metrics for my servers. Occasionally one of these components crash and metrics stop flowing into InfluxDB.
To be able to notice when this happens (on top of using Monit to restart the service) I would like to create a Grafana dashboard where I have a singlestat panel for each host that shows the most recent timestamp (or better, how much time has passed) since the last metric was received. I'd also like to colorize the background of the singlestat depending on how long it's been. I would like to be able to do this for any InfluxDB metric, as different metrics can have different reasons for lagging behind.
Right now, I've tried something like this in InfluxQL, but I just get an error that at least one non-time field must be present in the query:
SELECT last(time) FROM "system" WHERE "load1" > -1 GROUP BY "host"
If I try to change it to this I get a "Multiple series error":
SELECT last(time), last("load1") FROM "system" GROUP BY "host"
Is what I'm trying to do not easily doable or am I missing something obvious?
...the query syntax was taken from somewhere in this issue.
I haven't yet looked at the singlestat panel, but I suggest creating a 'scripted dashboard'.
Start by manually creating a singlestat dash with a dummy number. Then export it to see what the json looks like.
Then, recreate that same json, with the live result from the above query, using a scripted dashboard.
Look in grafana's /public/dashboards for 'scripted.js' or 'scripted_templated.js'. Make a copy of one of those in the same folder, then hack away to generate your json. Here and here and here are some nice examples for the javascript magic to submit the query to influxdb and parse the result into the json for your singlestat dash. Good luck.