Grafana - How to consolidate Metrics from InfluxDB and CollectD - influxdb

We've been setting up CollectD with InfluxDB to collect metrics. The problem is consolidating metrics, e.g. those from cpu1, cpu2 and cpu3. In collectd (from version 5.2 on) it is possible to enable the 'aggregation' plugin, which does exactly what I need. But we're using Debian 7 and, surprise, collectd is only available there in version 5.1.
Do you know how to write a regex in Grafana, along the lines of the query below, so that I don't need to specify each metric for each CPU? (The query as written does not work.)
SELECT mean("value") FROM ".cpu-{0-3}.cpu-idle" WHERE $timeFilter GROUP BY time($interval) fill(null)
Thank you very much!
EDIT:
I actually found out that I am able to specify multiple data sources ...FROM ".cpu-1.cpu-idle", ".cpu-2.cpu-idle" ... but that results in one data line per source (which is obviously way too many).

Thanks @AussieDan. It's actually kind of embarrassing that I hadn't even taken a look at their website. I only visited the debian.org website.
The answer solves my problem perfectly.
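On the regex part of the question: InfluxQL (InfluxDB 0.9 and later) accepts a regular expression between slashes in place of a measurement name in FROM. The following is only a sketch of how such a query string could be built; the helper names are hypothetical, and the measurement names are assumed to look like the ".cpu-1.cpu-idle" examples above. Note that a regex saves listing every measurement but still returns one series per matching measurement, so by itself it does not merge the CPUs into a single line.

```python
import re
from typing import Iterable, List


def measurement_pattern(first_cpu: int, last_cpu: int) -> str:
    """Regex matching '.cpu-N.cpu-idle' for single-digit N in the given range."""
    if not 0 <= first_cpu <= last_cpu <= 9:
        raise ValueError("this sketch only handles single-digit CPU numbers")
    # Dots are escaped so they match literally; [a-b] is a character class.
    return rf"\.cpu-[{first_cpu}-{last_cpu}]\.cpu-idle"


def cpu_idle_query(first_cpu: int, last_cpu: int) -> str:
    """Build the Grafana/InfluxQL query with a regex in the FROM clause."""
    pattern = measurement_pattern(first_cpu, last_cpu)
    return (
        f'SELECT mean("value") FROM /{pattern}/ '
        "WHERE $timeFilter GROUP BY time($interval) fill(null)"
    )


def matching(names: Iterable[str], first_cpu: int, last_cpu: int) -> List[str]:
    """Local check of which measurement names the pattern would select."""
    regex = re.compile(measurement_pattern(first_cpu, last_cpu))
    return [name for name in names if regex.search(name)]


if __name__ == "__main__":
    print(cpu_idle_query(0, 3))
```

$timeFilter and $interval are left untouched because Grafana substitutes them before the query is sent.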

Related

Influxdb high CPU usage jumping to 80 %?

I am relatively new to the time series DB world. I am running InfluxDB 1.8.x as a Docker container, and I have left the influxdb.conf file at its default configuration. Currently I am facing an issue of high CPU usage by InfluxDB: the CPU jumps to 80 to 90%, which creates a problem for other processes running on the same machine.
I have tried the solution given here ->> Influx high CPU issue, but unfortunately it did not work. I am unable to understand the reason behind the issue and am also struggling to find support in terms of documentation or community help.
What I have tried so far:
Updated the monitor section of the influxdb.conf file like this ->> monitor DB
Checked the series cardinality with SHOW SERIES CARDINALITY; at 9400 it looks well within limits (though I am not sure what number counts as a red flag for high cardinality)
I am looking for an approach that will help me understand the root cause of this problem.
Please let me know if you need any further information.
After reading about the InfluxDB debug and CPU profiling HTTP API, I was able to pin down the issue: the problem was in the way I was making the query. My query involved more complex functions and also a GROUP BY on a tag. I also tried query analysis using the EXPLAIN ANALYZE (query) command to check how much time a query takes to execute. I resolved that and noticed a huge improvement in CPU load.
Basically I can suggest the following:
Run the CPU profile analysis using the InfluxDB HTTP API with the command curl -o <file name> http://localhost:8086/debug/pprof/all?cpu=true and collect the result.
Visualize the result using a tool like pprof and find the problem.
Also, one can run basic commands like SHOW SERIES CARDINALITY and EXPLAIN ANALYZE <query> to understand the execution of the query.
Before designing any schema and Influx client, check the hardware recommendations ->> Hardware sizing guidelines
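The first and third suggestions can also be scripted. Below is a minimal sketch, assuming an InfluxDB 1.x instance on localhost:8086 as in the curl command above, and assuming EXPLAIN ANALYZE is sent through the standard 1.x /query endpoint. The function names and the database name "mydb" are placeholders, not anything from the original setup.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8086"  # same host and port as the curl command


def profile_url(base_url: str = BASE_URL) -> str:
    """URL of the profile archive, including the CPU profile."""
    return f"{base_url}/debug/pprof/all?cpu=true"


def explain_analyze_url(database: str, query: str, base_url: str = BASE_URL) -> str:
    """URL that runs EXPLAIN ANALYZE <query> via the 1.x /query endpoint."""
    params = urlencode({"db": database, "q": f"EXPLAIN ANALYZE {query}"})
    return f"{base_url}/query?{params}"


def download_profile(dest_path: str, base_url: str = BASE_URL, timeout: float = 120) -> None:
    """Equivalent of `curl -o <file name> .../debug/pprof/all?cpu=true`."""
    # Collecting the CPU profile takes a while, hence the generous timeout.
    with urlopen(profile_url(base_url), timeout=timeout) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)


if __name__ == "__main__":
    print(profile_url())
    print(explain_analyze_url("mydb", 'SELECT mean("value") FROM "cpu"'))
```

The downloaded archive can then be unpacked and its profiles opened with pprof, as described in the second suggestion.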

bosun and telegraf metrics meta information

Hello, I really want to use bosun/tsdbrelay/opentsdb with the telegraf collector, as it gets all the metrics we want to monitor out of the box.
I already have a small setup that pushes metrics from 5 servers to bosun for indexing and to opentsdb for storage.
I used the haproxy configs from Kyle Brandt's bosun infrastructure blog to make the tsdbs HA-ready.
But bosun reports that it cannot use the auto type for metrics, and the primary stats view does not show any graphs for cpu / mem etc.
What can I provide so that the graphs show up?
Kind regards.
Both of these features are mostly scollector specific. The "host" view (I've considered ripping that out, it was done in the early days, better to use something like grafana) depends on scollector specific metrics such as os.cpu.
As far as "Auto" for rate vs gauge, that is also metadata that comes from scollector and is sent to bosun. If you want to try to mimic the behavior, see https://github.com/bosun-monitor/bosun/blob/master/metadata/metadata.go#L30 and https://github.com/bosun-monitor/bosun/blob/master/metadata/metadata.go#L195 - you would need to create at least the "rate" key for each metric you are getting from telegraf.

How can we collect performance metrics from CAdvisor docker container?

Sorry, I have just started to learn Docker. My question may seem stupid to some of you.
I would like to know if there is a way to collect performance metrics from the "CAdvisor" container (not from cgroup) at runtime. I mean, to extract performance values from the curves drawn by cAdvisor, like memory usage or network traffic.
I need to record these values and save them in a database so that I can perform a statistical analysis on them (like comparing memory consumption for two Docker containers at t=50s).
Thanks in advance.
As other answers mention, cAdvisor doesn't provide its own performance data API, instead it exposes metrics which are typically handled in a separate database if one wants to derive performance data beyond "real time". For example, cAdvisor exports Prometheus metrics natively:
http://prometheus.io/docs/instrumenting/exporters/
The Prometheus metric types:
http://prometheus.io/docs/concepts/metric_types/
Prometheus supports a fairly rich functional expression language that can be used for querying and visualization:
http://prometheus.io/docs/querying/basics/
cAdvisor does provide a REST endpoint to get any stats in real time. By default, it keeps the latest two minutes of data. You can configure it to keep more or less. It also supports a storage backend that keeps dumping stats to an InfluxDB database.
REST API:
e.g. /api/v1.3/containers
doc: https://github.com/google/cadvisor/blob/master/docs/api.md
Doc on setting up InfluxDB:
https://github.com/google/cadvisor/blob/master/docs/influxdb.md
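As a rough sketch of working with that REST endpoint: the response for a container is JSON containing a "stats" list, and the code below pulls (timestamp, memory usage) pairs out of an already parsed response. The field names "stats", "timestamp" and "memory"/"usage" are assumed from the v1 API's container info format (check the API doc linked above), and the sample payload and its values are made up for illustration.

```python
import json
from typing import List, Tuple


def memory_samples(container_info: dict) -> List[Tuple[str, int]]:
    """Return (timestamp, memory usage in bytes) for each stats entry."""
    return [
        (sample["timestamp"], sample["memory"]["usage"])
        for sample in container_info.get("stats", [])
    ]


# Hypothetical, heavily trimmed response body from /api/v1.3/containers/...
SAMPLE_RESPONSE = """
{
  "name": "/docker/example",
  "stats": [
    {"timestamp": "2015-01-01T00:00:00Z", "memory": {"usage": 1048576}},
    {"timestamp": "2015-01-01T00:00:01Z", "memory": {"usage": 2097152}}
  ]
}
"""

if __name__ == "__main__":
    for timestamp, usage in memory_samples(json.loads(SAMPLE_RESPONSE)):
        print(timestamp, usage)
```

Rows produced this way could be written to any database for later comparison between containers; fetching the JSON over HTTP is left out here to keep the sketch self-contained.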
I think you could use https://github.com/tutumcloud/container-metrics for this. Basically what that would be doing is using influxdb http://influxdb.com/ as a time series data store.
There is some more information available here: http://blog.tutum.co/2014/08/25/panamax-docker-application-template-with-cadvisor-elasticsearch-grafana-and-influxdb/
A couple of people seemed to be looking into the ELK stack (Elasticsearch, Logstash, Kibana) for visualising some of this data here: https://github.com/google/cadvisor/issues/634

Graphite Network interface monitoring

I'm trying to monitor my network usage with Graphite but I can't figure out how to do this, could you help me please?
In addition to this, I would like to monitor some other services, such as nginx, mysql, etc.
Thanks for your help!
Best.
Of course there are multiple solutions possible. The solution I'm using is collectd. With collectd you can collect statistics through plugins and store them in RRD files. It has a lot of plugins, such as network, nginx and mysql.
It does not generate graphs by itself, but there are multiple ways to generate graphs. One of them is to send the collected data to Graphite with the graphite plugin.
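Whichever collector is used, data usually reaches Graphite through Carbon's plaintext protocol: one line per data point in the form "metric.path value timestamp", sent over TCP (port 2003 by default). Here is a minimal sketch of that protocol for sending a single value by hand; the metric path, host and helper names are made up for illustration, and this is not a description of how collectd itself is implemented.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: float) -> str:
    """Format one data point for Carbon's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path: str, value: float, host: str = "localhost",
                port: int = 2003, timestamp: Optional[float] = None) -> None:
    """Send a single data point to a Carbon plaintext listener."""
    ts = time.time() if timestamp is None else timestamp
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(graphite_line(path, value, ts).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name for received bytes on eth0.
    print(graphite_line("servers.web1.interface.eth0.rx_bytes", 1024, time.time()), end="")
```

A quick manual send like this is handy for checking that Carbon is reachable before wiring up collectd.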

Best tool to record CPU and memory usage with Grinder?

I am using grinder in order to generate reports for the performance tests for my application. But I noticed that it does not generate any report on CPU and memory usage. On further investigation, I found that Grinder does not provide this information. Now, my question is, is there any tool that can be hooked up with grinder, to record the CPU and memory usage details?
As you have discovered, this is not supported directly in The Grinder itself. You will need to use a collection of tools to accomplish this.
I use a combination of Quickstatd, Graphite, and Grinder to Graphite to get all my results in the same place where I can see them. If you need to support Windows, you can probably use collectd (with ssc-serv and the Graphite plugin) instead of Quickstatd, which is based on bash scripts.
You can also pull in server side metrics (like DB lookups per second, etc.) with tools like jmxtrans, statsd, and metrics.
Having all that information in the same place is really powerful, and can give you some good insights.
If you grind a Java server, you can get data via JMX from OperatingSystemMXBean and MemoryMXBean.
Then add the data to a Grinder user statistic, and the data will end up in the -data.log:
grinder.statistics.registerDataLogExpression("Load", "userDouble0")
..
grinder.statistics.forCurrentTest.setDouble("userDouble0", systemLoadAverage)
The -data.log can be fed directly into Gnuplot:
gnuplot> plot 'client-0-data.log' using 2:7 title "System Load"
