Collecting metrics from multiple Telegraf agents into Prometheus

Continuing on from the question Sending metrics from telegraf to prometheus, which covers the case of a single Telegraf agent, what is the suggested setup to collect metrics from multiple Telegraf agents into Prometheus?
In the end, I want Prometheus to chart (on the same graph) the CPU usage of server-1, server-2, ... through server-n, each as its own line.

Taking the configuration from the original post, you can simply add targets to your telegraf job, assuming the same Telegraf config is used on each server:
scrape_configs:
  - job_name: 'telegraf'
    scrape_interval: 5s
    static_configs:
      - targets: ['server-1:9126','server-2:9126',...]
It will produce the metrics (e.g. cpu_time_user) with a different instance label corresponding to each configured target. Typing the metric name in Prometheus will display all of them.
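For illustration, the resulting series look roughly like this (the values are made up and the exact label set depends on your Telegraf configuration):

cpu_time_user{job="telegraf", instance="server-1:9126"}  123.4
cpu_time_user{job="telegraf", instance="server-2:9126"}  567.8

Graphing cpu_time_user therefore plots one line per server, distinguished by the instance label.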
If you really want to see only the name of the server, you can use metric_relabel_configs to generate an additional label:
scrape_configs:
  - job_name: 'telegraf'
    ...
    metric_relabel_configs:
      - source_labels: [instance]
        regex: '(.*):\d+'
        target_label: server
Automatically adding servers to your Prometheus config is a matter of service discovery.
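As a minimal sketch of that idea, file-based service discovery lets you add servers without editing the main config; the file path and the env label below are illustrative placeholders, not values from the original question:

scrape_configs:
  - job_name: 'telegraf'
    scrape_interval: 5s
    file_sd_configs:
      - files: ['/etc/prometheus/telegraf-targets.json']
        refresh_interval: 1m

where /etc/prometheus/telegraf-targets.json contains something like:

[
  { "targets": ["server-1:9126", "server-2:9126"], "labels": { "env": "prod" } }
]

Prometheus re-reads the file on the given interval, so new servers only need to be appended to the JSON.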

Related

Prometheus with Dockerfile

I have the following Dockerfile:
FROM prom/prometheus
ADD prometheus.yml /etc/prometheus/
with prometheus.yml:
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'prometheus'
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'auth-service'
    scrape_interval: 15s
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['localhost:8080']
And run it with the following command:
docker build -t prometheus .
docker run -d -p 9090:9090 --rm prometheus
prometheus has status up
auth-service has status down (Get "http://localhost:8080/actuator/prometheus": dial tcp 127.0.0.1:8080: connect: connection refused)
How can I solve the problem with auth-service? From my local machine I can get metrics from http://localhost:8080/actuator/prometheus:
v.balun@macbook-vbalun Trainter-Prometheus % curl -X GET http://localhost:8080/actuator/prometheus
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{area="heap",id="G1 Survivor Space",} 4194304.0
jvm_memory_committed_bytes{area="heap",id="G1 Old Gen",} 3.145728E7
jvm_memory_committed_bytes{area="nonheap",id="Metaspace",} 3.0982144E7
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_committed_bytes{area="heap",id="G1 Eden Space",} 2.7262976E7
jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space",} 4325376.0
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 6291456.0
The issue you are having does not seem related to Prometheus; it is at the Docker network level.
Inside your prometheus container you are saying this:
static_configs:
  - targets: ['localhost:8080']
But remember that localhost is now NOT your physical host (as it was when you ran the service locally outside Docker); it now refers to the inside of the Prometheus container, and inside that container your service is most likely not running.
With the information provided, I suggest the following:
1. Instead of localhost, try your host's real IP first; depending on the network configuration you are using for your container, that may be enough.
2. Alternatively, use the IP address of your auth-service container, the one assigned by Docker; you can run docker inspect ... to get it.
3. If #1 and #2 didn't work and auth-service is running in another container on the same physical host, you can use a bridge network to make communication between the containers possible (a short sketch follows below); more details here: https://docs.docker.com/network/bridge/
Once both containers are running in the same network, you can use the container name instead of localhost to reference it, something like:
static_configs:
  - targets: ['auth-service:8080']
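As a minimal sketch of that setup, assuming both containers run on the same host (the network name and the auth-service image name below are placeholders):

# create a user-defined bridge network and attach both containers to it
docker network create monitoring
docker run -d --name auth-service --network monitoring my-auth-service-image
docker run -d --name prometheus --network monitoring -p 9090:9090 prometheus

On a user-defined bridge network, Docker's embedded DNS resolves the container name auth-service, so the targets entry above works without knowing the container's IP.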

Prometheus Operator change the scrape_interval

I want to set the scrape_interval for Prometheus to 15 seconds. My config below doesn't work; there is an error in the last line. How should I configure the 15-second scrape_interval?
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
spec:
  serviceAccountName: prometheus
  replicas: 1
  version: v1.7.1
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: rules
  resources:
    requests:
      memory: 400Mi
  scrape_interval: 15s   ## Error in this line.
I got this error message when applying the config above:
error: error validating "promethus.yml": error validating data: ValidationError(Prometheus): unknown field "scrape_interval" in com.coreos.monitoring.v1.Prometheus; if you choose to ignore these errors, turn validation off with --validate=false
Thanks!
scrape_interval is a parameter of the Prometheus config file itself, not of the Prometheus object in k8s (which is read by prometheus-operator and used to generate the actual config).
You can see in the prometheus-operator documentation that the parameter you are looking for is scrapeInterval. Ensure correct indentation; it is supposed to be part of spec:.
Note that you do not have to change the scrape interval globally; you can define per-scrape-target intervals in your ServiceMonitor objects. Both options are sketched below.
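A minimal sketch of both options (the ServiceMonitor name, selector, and endpoint port below are illustrative placeholders, not taken from the question):

# Option 1: default scrape interval on the Prometheus object (camelCase, under spec:)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
spec:
  serviceAccountName: prometheus
  scrapeInterval: 15s
---
# Option 2: per-target interval in a ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: frontend
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: frontend
  endpoints:
    - port: web
      interval: 15s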
In the plain Prometheus config file, scrape_interval goes under the global section:
Prometheus configuration is YAML. The Prometheus download comes with a sample configuration in a file called prometheus.yml that is a good place to get started.
Here is an example of a valid configuration YAML:
# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 30s
  # scrape_timeout is set to the global default (10s).
Your file named "promethus.yml" with apiVersion: monitoring.coreos.com/v1 is not the same as the prometheus.yml config file mentioned above, so adding scrape_interval to it results in a validation error. You cannot mix Prometheus configs with Prometheus Operator ones; these are different concepts.
I also recommend going through the official guide to get a better grip on Prometheus and its configuration options, or sticking with the Prometheus Operator.

cAdvisor prometheus integration returns container_cpu_load_average_10s as 0

I have configured Prometheus to scrape metrics from cAdvisor. However, the metric container_cpu_load_average_10s only ever returns 0. I can see the CPU metrics correctly in the cAdvisor web UI, but Prometheus receives only 0. It works fine for other metrics such as container_cpu_system_seconds_total. Could someone point out if I am missing something here?
Prometheus version: 2.1.0
Prometheus config:
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - 172.17.0.2:8080
cAdvisor version: 0.29.0
In order to get the container_cpu_load_average_10s metric, cAdvisor must run with the option
--enable_load_reader=true
which is set to false by default. This is described here.
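For example, a rough sketch of running cAdvisor in Docker with the load reader enabled (the mounts follow the usual cAdvisor setup and should be adjusted to your environment):

docker run -d --name=cadvisor -p 8080:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  google/cadvisor:v0.29.0 --enable_load_reader=true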
If the value is zero, it means the container is idle.
You don't need 'enable_load_reader'. I don't enable it as it may make cAdvisor unstable.
Some useful links:
Linux Load Averages: Solving the Mystery
High CPU utilization but low load average question
cAdvisor enable_load_reader

Prometheus create label from metric label

We are running node-exporter in containers. To quickly identify on which host each node-exporter is running, I created a metric that looks like this: host{host="$HOSTNAME",node="$CONTAINER_ID"} 1
I'm looking for a way to extract the hostname from the host= label and attach it to each node-exporter instance as a hostname label. I have tried numerous configurations and none seem to work. The current Prometheus config looks like this:
scrape_configs:
  - job_name: 'node'
    scrape_interval: 10s
    scrape_timeout: 5s
    metrics_path: /metrics
    scheme: http
    dns_sd_configs:
      - names:
          - tasks.master-nodeexporter
        refresh_interval: 30s
        type: A
        port: 9100
    relabel_configs:
      - source_labels: ['host']
        regex: '"(.*)".*'
        target_label: 'hostname'
        replacement: '$1'
This is not possible, as target relabelling happens before the scrape, so a metric label like host is not yet available at that stage.
What you want to do here is use service discovery to get the right hostname in the first place, which is not possible with dns_sd_configs. You might look at something like Consul and https://www.robustperception.io/controlling-the-instance-label/
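As a rough sketch of that approach, assuming the nodes are registered in Consul (the Consul server address and service name below are placeholders), the node name coming from service discovery can be copied into a label before the scrape:

scrape_configs:
  - job_name: 'node'
    consul_sd_configs:
      - server: 'consul.example.com:8500'
        services: ['node-exporter']
    relabel_configs:
      # __meta_consul_node carries the Consul node name; copy it into a hostname label
      - source_labels: [__meta_consul_node]
        target_label: hostname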
If someone comes across this:
Create this as docker-entrypoint.sh and make it executable.
#!/bin/sh -e
# Must be executable by others
NODE_NAME=$(cat /etc/nodename)
echo "node_meta{node_id=\"$NODE_ID\", container_label_com_docker_swarm_node_id=\"$NODE_ID\", node_name=\"$NODE_NAME\"} 1" > /etc/node-exporter/node-meta.prom
set -- /bin/node_exporter "$@"
exec "$@"
Then create a Dockerfile like this:
FROM prom/node-exporter:latest
ENV NODE_ID=none
USER root
COPY conf /etc/node-exporter/
ENTRYPOINT [ "/etc/node-exporter/docker-entrypoint.sh" ]
CMD [ "/bin/node_exporter" ]
Then build it and you will always get the hostname as a node_meta metric. A rough build-and-run sketch follows below.
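This is only a sketch under assumptions: the image tag and NODE_ID value are placeholders, /etc/nodename is assumed to be mounted (for example from a bind mount of /etc/hostname, or from Swarm node templating), and node_exporter must be told to read the textfile directory the entrypoint writes to:

docker build -t my-node-exporter .
docker run -d --name node-exporter -e NODE_ID=node-1 \
  -v /etc/hostname:/etc/nodename:ro \
  my-node-exporter \
  --collector.textfile.directory=/etc/node-exporter/

The node_meta{node_id="node-1", ...} series then appears on the node-exporter /metrics endpoint alongside the usual metrics.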
This answer explains how to export the node name via the node_name label on the node_meta metric. It is then possible to add the node_name label to any metric exposed by node_exporter with the group_left() modifier at query time. For example, the following PromQL query adds the node_name label from the node_meta metric to the node_memory_Active_bytes metric:
node_memory_Active_bytes
* on(job,instance) group_left(node_name)
node_meta
See more details about the group_left() modifier in these docs and this article.

Docker - Prometheus container dies immediately

I have cAdvisor running with port mapping 4000:8080 and I have to link it with a Prometheus container.
My prometheus.yml is:
scrape_configs:
  # Scrape Prometheus itself every 2 seconds.
  - job_name: 'prometheus'
    scrape_interval: 2s
    target_groups:
      - targets: ['localhost:9090', 'cadvisor:8080']
This file has path /home/test/prometheus.yml.
To run the container with prometheus, I do:
docker run -d -p 42047:9090 --name=prometheus -v /home/test/prometheus.yml:/etc/prometheus/prometheus.yml --link cadvisor:cadvisor prom/prometheus -config.file=/etc/prometheus/prometheus.yml -storage.local.path=/prometheus -storage.local.memory-chunks=10000
The container is created, but it dies immediately.
Can you tell me where the problem is?
Messages from docker events&:
2016-11-21T11:43:04.922819454+01:00 container start 69d03c68525c5955cc40757dc973073403b13fdd41c7533f43b7238191088a25 (image=prom/prometheus, name=prometheus)
2016-11-21T11:43:05.152141981+01:00 container die 69d03c68525c5955cc40757dc973073403b13fdd41c7533f43b7238191088a25 (exitCode=1, image=prom/prometheus, name=prometheus)
The config format has changed: targets now come under static_configs in the latest version.
scrape_configs:
  # Scrape Prometheus itself every 2 seconds.
  - job_name: 'prometheus'
    scrape_interval: 2s
    static_configs:
      - targets: ['localhost:9090', 'cadvisor:8080']
See the Prometheus documentation for further help.
Yes, target_groups is renamed to static_configs. Please use the latest Prometheus image with the following.
static_configs:
  - targets: ['localhost:9090', 'cadvisor:8080']
The above worked for me.
I think target_groups has been deprecated in scrape_configs in the latest version of Prometheus.
You can try static_configs or file_sd_config:
scrape_config
static_config
file_sd_config
scrape_configs:
  - job_name: node_exporter
    static_configs:
      - targets:
          - "stg-elk-app-01:9100"
          - "stg-app-02:9100"
The indentation isn't correct, try:
scrape_configs:
  # Scrape Prometheus itself every 2 seconds.
  - job_name: 'prometheus'
    scrape_interval: 2s
    target_groups:
      - targets: ['localhost:9090', 'cadvisor:8080']
As you said in your earlier comment:
from logs: time="2016-11-21T11:21:40Z" level=error msg="Error loading config: couldn't load configuration (-config.file=/etc/prometheus/prometheus.yml): unknown fields in scrape_config: target_groups" source="main.go:149"
This clearly means that the "target_groups" field is causing the problem. Newer versions of Prometheus (v1.5 onwards) have discarded the "target_groups" field; you simply provide the targets under static_configs. I faced this issue about 6 months ago as well. Please try it with a new version; the docker pull prom/prometheus might be getting you the old one.
Hope this helps...!!!
The name of the container is prometheus. Generally, when a container exits immediately after it starts, I would recommend adding -log.level=debug right after -config.file:
docker run -d -p 42047:9090 --name=prometheus -v /home/test/prometheus.yml:/etc/prometheus/prometheus.yml --link cadvisor:cadvisor prom/prometheus -config.file=/etc/prometheus/prometheus.yml -log.level=debug -storage.local.path=/prometheus -storage.local.memory-chunks=10000
Next, see the logs for the container:
docker logs prometheus
Any issues with the configuration will be there.
