We are running the node-exporter in containers. To quickly identify on which host each node-exporter is running, I created a metric that looks like this: host{host="$HOSTNAME",node="$CONTAINER_ID"} 1
I'm looking for a way to extract the hostname from the host= label and attach it to each node-exporter instance as a hostname label. I've tried numerous configurations and none of them seem to work. My current Prometheus config looks like this:
scrape_configs:
  - job_name: 'node'
    scrape_interval: 10s
    scrape_timeout: 5s
    metrics_path: /metrics
    scheme: http
    dns_sd_configs:
      - names:
          - tasks.master-nodeexporter
        refresh_interval: 30s
        type: A
        port: 9100
    relabel_configs:
      - source_labels: ['host']
        regex: '"(.*)".*'
        target_label: 'hostname'
        replacement: '$1'
This is not possible, as target relabelling happens before the scrape.
What you want to do here is use service discovery to have the right hostname in the first place, which is not possible with dns_sd_configs. You might look at something like Consul and https://www.robustperception.io/controlling-the-instance-label/
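For illustration, a rough sketch of what that can look like with Consul service discovery (the Consul server address and service name are assumptions; __meta_consul_node is the node name Consul registered, and relabel_configs copies it into a hostname label before the scrape):
scrape_configs:
  - job_name: 'node'
    consul_sd_configs:
      - server: 'consul:8500'
        services: ['node-exporter']
    relabel_configs:
      # Copy the Consul node name into a hostname label at discovery time
      - source_labels: ['__meta_consul_node']
        target_label: 'hostname'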
If someone comes across this:
Create this as docker-entrypoint.sh and make it executable.
#!/bin/sh -e
# Must be executable by others
NODE_NAME=$(cat /etc/nodename)
echo "node_meta{node_id=\"$NODE_ID\", container_label_com_docker_swarm_node_id=\"$NODE_ID\", node_name=\"$NODE_NAME\"} 1" > /etc/node-exporter/node-meta.prom
set -- /bin/node_exporter "$@"
exec "$@"
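Making it executable could look like this on the build host (assuming the script sits in the conf directory copied by the Dockerfile below):
chmod a+x conf/docker-entrypoint.sh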
Then create a Dockerfile like this:
FROM prom/node-exporter:latest
ENV NODE_ID=none
USER root
COPY conf /etc/node-exporter/
ENTRYPOINT [ "/etc/node-exporter/docker-entrypoint.sh" ]
CMD [ "/bin/node_exporter" ]
Then build it, and you will always get the hostname as a node_meta metric.
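Note that node_exporter only exposes the generated node-meta.prom if the textfile collector is pointed at that directory; a minimal sketch of the adjusted CMD line (the flag is node_exporter's textfile collector option; older 0.x releases use a single leading dash):
CMD [ "/bin/node_exporter", "--collector.textfile.directory=/etc/node-exporter/" ]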
This answer explains how to export the node name via the node_name label on the node_meta metric. It is then possible to add the node_name label to any metric exposed by node_exporter with the group_left() modifier at query time. For example, the following PromQL query adds the node_name label from the node_meta metric to node_memory_Active_bytes:
node_memory_Active_bytes
* on(job,instance) group_left(node_name)
node_meta
See more details about group_left() modifier in these docs and this article.
I have followed the instructions on this page: https://docs.ksqldb.io/en/latest/operate-and-deploy/monitoring/
So this is my ksqldb-server part of docker-compose:
ksqldb-server:
  image: confluentinc/ksqldb-server:0.15.0
  hostname: ksqldb-server
  container_name: ksqldb-server
  depends_on:
    - kafka
    - schema-registry
    - kafka-connect
  ports:
    - "8088:8088"
    - "1099:1099"
  environment:
    KSQL_LISTENERS: http://0.0.0.0:8088
    KSQL_BOOTSTRAP_SERVERS: kafka:29092
    KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
    KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
    KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081
    KSQL_KSQL_CONNECT_URL: http://kafka-connect:8083
    KSQL_KSQL_QUERY_PULL_METRICS_ENABLED: "true"
    KSQL_JMX_OPTS: >
      -Djava.rmi.server.hostname=localhost
      -Dcom.sun.management.jmxremote
      -Dcom.sun.management.jmxremote.port=1099
      -Dcom.sun.management.jmxremote.authenticate=false
      -Dcom.sun.management.jmxremote.ssl=false
      -Dcom.sun.management.jmxremote.rmi.port=1099
I have set up Prometheus in the same docker-compose file, and when I visit {prometheus-url}/targets, I see Get "http://ksqldb-server:1099/metrics": EOF
I have already tried plenty of configurations during my research, including changing -Djava.rmi.server.hostname to either the host's IP address or the ksqldb-server container's IP address, but none of them worked. Does anyone have a solution?
Six months later, after having dealt with this topic once again, I managed to set this up. It follows the approach suggested by swist in the GitHub issue I created back then.
You need the JMX Exporter. Download it here.
You need a YAML file telling the JMX Exporter which metrics to export. You can get it here. If you are only interested in the ksqlDB metrics, remove all other patterns, e.g. the Kafka patterns.
Place the JMX Exporter and the YAML file on every node on which you want to monitor a ksqlDB instance.
Before starting ksqlDB, create the environment variable KSQL_JMX_OPTS as follows:
export KSQL_JMX_OPTS="-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.util.logging.config.file=logging.properties \
-javaagent:[BLUB]/jmx_prometheus_javaagent.jar=7010:ksqldb.yml"
You need to either create this variable every time you start a new session or create it permanently. [BLUB] is the absolute path to your JMX Exporter JAR.
Now you can run ksqlDB and the metrics become available at port 7010 (you can specify any other free port). If you want to have a good dashboard, go with this one.
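A matching Prometheus scrape config could look roughly like this (the job name is an example; the target host and port must match your service name and the port given to the agent):
scrape_configs:
  - job_name: 'ksqldb'
    static_configs:
      - targets: ['ksqldb-server:7010']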
The jmxremote.port value is also not a proper Prometheus target; it's for JConsole, VisualVM, or other JMX monitoring tools, as the documentation you've linked to says.
If you want to use Prometheus, you need to download the JMX Exporter agent JAR, mount it into the container, and modify the JVM arguments to include the agent, the scraper port, and the MBeans config file...
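In docker-compose terms, that could look roughly like this sketch (paths, port 7010, and file names are assumptions; the agent JAR and the MBeans config must exist next to the compose file):
ksqldb-server:
  image: confluentinc/ksqldb-server:0.15.0
  ports:
    - "7010:7010"
  volumes:
    - ./jmx_prometheus_javaagent.jar:/opt/jmx_prometheus_javaagent.jar
    - ./ksqldb.yml:/opt/ksqldb.yml
  environment:
    # Replace the remote-JMX options from the question with the exporter agent
    KSQL_JMX_OPTS: >-
      -javaagent:/opt/jmx_prometheus_javaagent.jar=7010:/opt/ksqldb.yml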
You could also switch to using minikube and apply the Confluent ksqlDB Helm Chart, which does this for you
I have the following Dockerfile:
FROM prom/prometheus
ADD prometheus.yml /etc/prometheus/
with prometheus.yml:
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:
  - job_name: 'prometheus'
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'auth-service'
    scrape_interval: 15s
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['localhost:8080']
And run it with the following command:
docker build -t prometheus .
docker run -d -p 9090:9090 --rm prometheus
prometheus has status up
auth-service has status down (Get "http://localhost:8080/actuator/prometheus": dial tcp 127.0.0.1:8080: connect: connection refused)
How can I solve the problem with auth-service? From my local machine I can get metrics from http://localhost:8080/actuator/prometheus:
v.balun@macbook-vbalun Trainter-Prometheus % curl -X GET http://localhost:8080/actuator/prometheus
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{area="heap",id="G1 Survivor Space",} 4194304.0
jvm_memory_committed_bytes{area="heap",id="G1 Old Gen",} 3.145728E7
jvm_memory_committed_bytes{area="nonheap",id="Metaspace",} 3.0982144E7
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_committed_bytes{area="heap",id="G1 Eden Space",} 2.7262976E7
jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space",} 4325376.0
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 6291456.0
The issue you are having doesn't seem related to Prometheus; it is at the Docker network level.
Inside your Prometheus container you are saying this:
static_configs:
  - targets: ['localhost:8080']
But remember that localhost is no longer your physical host (as it was when you ran the service locally, outside Docker); it now refers to the inside of the Prometheus container, and your service is most likely not running inside that same container...
With the information provided, I suggest the following:
1. First, try your host's real IP address instead of localhost; depending on the network configuration you are using for your container, that may be enough...
2. Instead of localhost, you can use the IP address of your auth-service container, the one assigned by Docker; you can get it by running docker inspect...
3. If #1 and #2 didn't work, and auth-service is running in another container on the same physical host, you can use a bridge network to make communication between the containers possible; more details here: https://docs.docker.com/network/bridge/
Once both containers are running in the same network, you can reference the service by its container name instead of localhost, something like:
static_configs:
  - targets: ['auth-service:8080']
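A rough sketch of that setup (the network name and the auth-service image name are examples):
docker network create monitoring
docker run -d --name auth-service --network monitoring my-auth-service-image
docker run -d --name prometheus --network monitoring -p 9090:9090 prometheus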
Continuing on from the question Sending metrics from telegraf to prometheus, which covers the case of a single Telegraf agent, what is the suggested setup to collect metrics from multiple Telegraf agents into Prometheus?
In the end, I want Prometheus to chart (on the same graph) the CPU usage of server-1, server-2, ... server-n, each as its own line.
Taking the configuration from the original post, you can simply add targets to your telegraf job, supposing that the same Telegraf config is used on each server:
scrape_configs:
  - job_name: 'telegraf'
    scrape_interval: 5s
    static_configs:
      - targets: ['server-1:9126','server-2:9126',...]
This will produce the metrics (e.g. cpu_time_user) with a different instance label corresponding to each configured target. Typing the metric name in Prometheus will display all of them.
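For example, a query like the following (assuming cpu_time_user is exported as a cumulative counter) charts CPU usage with one line per instance:
rate(cpu_time_user[1m])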
If you really want to see only the name of the server, you can use metric_relabel_configs to generate an additional label:
scrape_configs:
  - job_name: 'telegraf'
    ...
    metric_relabel_configs:
      - source_labels: [instance]
        regex: '(.*):\d+'
        target_label: server
Automatically adding servers to your Prometheus config is a matter of service discovery.
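For example, file-based service discovery keeps the target list outside the main config, so new servers can be added without restarting Prometheus (file names below are examples):
scrape_configs:
  - job_name: 'telegraf'
    scrape_interval: 5s
    file_sd_configs:
      - files:
          - 'targets/telegraf.json'
with targets/telegraf.json containing something like:
[
  { "targets": ["server-1:9126", "server-2:9126", "server-3:9126"] }
]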
I have a Docker image for an FTP server in a repository. This image will be used on several machines, and I need to deploy the container and change the PORT variable depending on the destination machine.
This is my image (I've removed the proftpd installation lines because they are not relevant to this case):
FROM alpine:3.5
ARG vcs_ref="Unknown"
ARG build_date="Unknown"
ARG build_number="1"
LABEL org.label-schema.vcs-ref=$vcs_ref \
org.label-schema.build-date=$build_date \
org.label-schema.version="alpine-r${build_number}"
ENV PORT=10000
COPY assets/port.conf /usr/local/etc/ports.conf
COPY replace.sh /replace.sh
#It is for a proFTPD Server
CMD ["/replace.sh"]
My port.conf file (also with irrelevant information removed for this case):
# This is a basic ProFTPD configuration file (rename it to
# 'proftpd.conf' for actual use. It establishes a single server
# and a single anonymous login. It assumes that you have a user/group
# "nobody" and "ftp" for normal operation and anon.
ServerName "ProFTPD Default Installation"
ServerType standalone
DefaultServer on
# Port 21 is the standard FTP port.
Port {{PORT}}
.
.
.
And the replace.sh script is:
#!/bin/bash
set -e
[ -z "${PORT}" ] && echo >&2 "PORT is not set" && exit 1
sed -i "s#{{PORT}}#$PORT#g" /usr/local/etc/ports.conf
/usr/local/sbin/proftpd -n -c /usr/local/etc/proftpd.conf
... Is there any way to avoid using replace.sh and have Ansible replace the PORT variable in the /usr/local/etc/proftpd.conf file inside the container?
My actual ansible script for container is:
- name: (ftpd) Run container
docker_container:
name: "myimagename"
image: "myimage"
state: present
pull: true
restart_policy: always
env:
"PORT": "{{ myportUsingAnsible}}"
networks:
- name: "{{ network }}"
To sum up, all I need is to use Ansible to replace the configuration variable instead of using a shell script that replaces variables before running the service. Is that possible?
Many thanks
You are using the docker_container module, which needs a pre-built image, and the file port.conf is baked into the image. What you need to do is set a static port inside this file: inside the container you always use the static port 21, and depending on the machine, you map this port onto a different host port using Ansible.
Inside port.conf always use port 21
# Port 21 is the standard FTP port.
Port 21
The ansible task would look like:
- name: (ftpd) Run container
  docker_container:
    name: "myimagename"
    image: "myimage"
    state: present
    pull: true
    restart_policy: always
    networks:
      - name: "{{ network }}"
    ports:
      - "{{ myportUsingAnsible }}:21"
Now when you connect to the container, you need to use <hostname>:{{ myportUsingAnsible }}. This is the standard Docker way of doing things: the port inside the image is static, and you change the port mapping based on the available ports you have.
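For example, the per-machine port could come from host_vars (file names and values below are hypothetical):
# host_vars/machine-a.yml
myportUsingAnsible: 10021
# host_vars/machine-b.yml
myportUsingAnsible: 10022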
I have cAdvisor running with port mapping 4000:8080 and I have to link it with a Prometheus container.
My prometheus.yml is:
scrape_configs:
  # Scrape Prometheus itself every 2 seconds.
  - job_name: 'prometheus'
    scrape_interval: 2s
    target_groups:
      - targets: ['localhost:9090', 'cadvisor:8080']
This file has path /home/test/prometheus.yml.
To run the container with prometheus, I do:
docker run -d -p 42047:9090 --name=prometheus -v /home/test/prometheus.yml:/etc/prometheus/prometheus.yml --link cadvisor:cadvisor prom/prometheus -config.file=/etc/prometheus/prometheus.yml -storage.local.path=/prometheus -storage.local.memory-chunks=10000
The container is created, but it dies immediately.
Can you tell me where the problem is?
Messages from docker events &:
2016-11-21T11:43:04.922819454+01:00 container start 69d03c68525c5955cc40757dc973073403b13fdd41c7533f43b7238191088a25 (image=prom/prometheus, name=prometheus)
2016-11-21T11:43:05.152141981+01:00 container die 69d03c68525c5955cc40757dc973073403b13fdd41c7533f43b7238191088a25 (exitCode=1, image=prom/prometheus, name=prometheus)
The config format has changed: targets now come under static_configs in the latest version.
scrape_configs:
  # Scrape Prometheus itself every 2 seconds.
  - job_name: 'prometheus'
    scrape_interval: 2s
    static_configs:
      - targets: ['localhost:9090', 'cadvisor:8080']
See the Prometheus documentation for further help.
Yes, target_groups has been renamed to static_configs. Please use the latest Prometheus image with the following:
static_configs:
  - targets: ['localhost:9090', 'cadvisor:8080']
The above worked for me.
I think target_groups has been removed from scrape_configs in the latest version of Prometheus.
You can try static_configs or file_sd_config:
scrape_config
static_config
file_sd_config
scrape_configs:
  - job_name: node_exporter
    static_configs:
      - targets:
          - "stg-elk-app-01:9100"
          - "stg-app-02:9100"
The indentation isn't correct, try:
scrape_configs:
  # Scrape Prometheus itself every 2 seconds.
  - job_name: 'prometheus'
    scrape_interval: 2s
    target_groups:
      - targets: ['localhost:9090', 'cadvisor:8080']
As you said in your earlier comment:
from logs: time="2016-11-21T11:21:40Z" level=error msg="Error loading config: couldn't load configuration (-config.file=/etc/prometheus/prometheus.yml): unknown fields in scrape_config: target_groups" source="main.go:149"
This clearly means that the target_groups field is causing the problem. Newer versions of Prometheus (v1.5 onwards) have dropped the target_groups field in favor of static_configs, where you simply provide the targets. I faced this issue about 6 months ago myself. Please try it with a new version; docker pull prom/prometheus might be giving you an old, cached image.
Hope this helps...!!!
The name of the container is prometheus.
Generally, when a container exits immediately after it starts, I would recommend adding -log.level=debug right after -config.file:
docker run -d -p 42047:9090 --name=prometheus -v /home/test/prometheus.yml:/etc/prometheus/prometheus.yml --link cadvisor:cadvisor prom/prometheus -config.file=/etc/prometheus/prometheus.yml -log.level=debug -storage.local.path=/prometheus -storage.local.memory-chunks=10000
Next, see the logs for the container:
docker logs prometheus
Any issues with the configuration will be there.