I'm monitoring multiple computers in the same cluster, and for that I'm using Prometheus.
Here is my config file prometheus.yml:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "Server-monitoring-Api"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
      - targets: ["localhost:9182"]
      - targets: ["192.168.1.71:9182"]
      - targets: ["192.168.1.84:9182"]
I'm new to Prometheus. I want to show the name of each target, i.e. rather than using for example 192.168.1.71:9182, I only want the target name to be shown. I did some research and found this:
relabel_configs:
  - source_labels: [__meta_ec2_tag_Name]
    target_label: instance
But I don't know how to use it to relabel my targets (instances). Any help will be appreciated, thanks.
The snippet that you found only works if you're using the EC2 service discovery feature of Prometheus (which doesn't seem to be your case, since you're using static targets).
I see a couple of options. You could directly expose a separate metric (hostname) from your services carrying the value of the hostname. Or you could use the node exporter's textfile collector to expose the same kind of metric as a static value (on a different port), as sketched below.
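For the textfile collector route, a minimal sketch could look like this (the metric name, label values, and directory path are placeholders, and it assumes the node exporter is already running on each machine):
# Write a static metric into the textfile collector directory (path is an assumption).
echo 'machine_role{role="backend",hostname="server-a"} 1' > /var/lib/node_exporter/textfile_collector/machine_role.prom
# Point the node exporter at that directory so the metric is exposed on its /metrics endpoint.
node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile_collector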
I recommend reading this post which explains why having a different metric for the "name" or "role" of the machine is usually a better approach than having a hostname label in your metrics.
It is also possible to add a custom label directly in the Prometheus config (since you have static targets anyhow), something like the example below. Finally, if you are already using the Prometheus node exporter you could use the node_uname_info metric (its nodename label).
- job_name: 'Kafka'
  metrics_path: /metrics
  static_configs:
    - targets: ['10.0.0.4:9309']
      labels:
        hostname: hostname-a
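If you specifically want the instance label itself to show a friendly name instead of the address, you could also relabel your static targets explicitly. This is only a sketch based on your config; the replacement names are made up, and the block goes inside the job, at the same level as static_configs:
relabel_configs:
  # Rewrite the instance label for known addresses; add one rule per target.
  - source_labels: [__address__]
    regex: '192\.168\.1\.71:9182'
    target_label: instance
    replacement: 'server-71'
  - source_labels: [__address__]
    regex: '192\.168\.1\.84:9182'
    target_label: instance
    replacement: 'server-84'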
I am scraping logs from Docker with Promtail to Loki.
It works very well, but I would like to remove the timestamp from the log line once it has been extracted by Promtail.
The reason is that I end up with a log panel where half of the screen is occupied by the timestamp. If I want to display the timestamp in the panel I can do that, so I don't really need it in the log line.
I have been reading the documentation, but I am not sure how to approach it: logfmt? replace? timestamp?
https://grafana.com/docs/loki/latest/clients/promtail/stages/logfmt/
promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  # local machine logs
  - job_name: local logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log
  # docker containers
  - job_name: containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 15s
    pipeline_stages:
      - docker: {}
    relabel_configs:
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        regex: '(.*)'
        target_label: 'service'
Thank you
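If you do want to strip a leading timestamp out of the log line itself, Promtail's replace pipeline stage can rewrite the line after the docker stage has parsed it. The following is only a sketch for the pipeline_stages block of the containers job; it assumes the timestamp is the first whitespace-delimited token on the line, so the regex will likely need adjusting for your log format:
pipeline_stages:
  - docker: {}
  - replace:
      # Capture the leading timestamp plus trailing whitespace and replace it with nothing.
      expression: '^(\S+\s+)'
      replace: ''
Note that this removes the timestamp before the line is sent to Loki, so it will no longer be part of the stored log line.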
Actually I just realized I was looking for the wrong thing. I just wanted to display less in the Grafana log panel; the logs were formatted properly. I just had to select which fields to display.
Thanks!
One of the targets in static_configs in my prometheus.yml config file is secured with basic authentication. As a result, an error with the description "Connection refused" is always displayed against that target on the Prometheus Targets page.
I have researched how to set up Prometheus to provide the credentials when scraping that particular target, but couldn't find any solution. What I did find was how to set it up in the scrape_config section in the docs. This won't work for me because I have other targets that are not protected with basic_auth.
Please help me out with this challenge.
Here is the relevant part of my .yml config:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:5000']
        labels:
          service: 'Auth'
      - targets: ['localhost:5090']
        labels:
          service: 'Approval'
      - targets: ['localhost:6211']
        labels:
          service: 'Credit Assessment'
      - targets: ['localhost:6090']
        labels:
          service: 'Sweep'
      - targets: ['localhost:6500']
        labels:
I would like to add more details to @PatientZro's answer.
In my case, I needed to create another job (as suggested), but basic_auth needs to be at the same level of indentation as job_name. See example here.
Also, my basic_auth endpoints require a metrics path, since they are not served at the root of my domain.
Here is an example with an API endpoint specified:
- job_name: 'myapp_health_checks'
  scrape_interval: 5m
  scrape_timeout: 30s
  static_configs:
    - targets: ['mywebsite.org']
  metrics_path: "/api/health"
  basic_auth:
    username: 'email#username.me'
    password: 'cfgqvzjbhnwcomplicatedpasswordwjnqmd'
Best,
Create another job for the one that needs auth.
So just under what you've posted, add another job:
- job_name: 'prometheus-basic_auth'
  scrape_interval: 5s
  scrape_timeout: 5s
  static_configs:
    - targets: ['localhost:5000']
      labels:
        service: 'Auth'
  basic_auth:
    username: foo
    password: bar
I'm running Prometheus and Telegraf on the same host.
I'm using a few input plugins:
inputs.cpu
inputs.ntpq
I've configured the prometheus_client output plugin to expose data to Prometheus.
Here's my config:
[[outputs.prometheus_client]]
  ## Address to listen on.
  listen = ":9126"
  ## Use HTTP Basic Authentication.
  # basic_username = "Foo"
  # basic_password = "Bar"
  ## If set, the IP Ranges which are allowed to access metrics.
  ## ex: ip_range = ["192.168.0.0/24", "192.168.1.0/30"]
  # ip_range = []
  ## Path to publish the metrics on.
  path = "/metrics"
  ## Expiration interval for each metric. 0 == no expiration
  # expiration_interval = "0s"
  ## Collectors to enable, valid entries are "gocollector" and "process".
  ## If unset, both are enabled.
  # collectors_exclude = ["gocollector", "process"]
  ## Send string metrics as Prometheus labels.
  ## Unless set to false all string metrics will be sent as labels.
  # string_as_label = true
  ## If set, enable TLS with the given certificate.
  # tls_cert = "/etc/ssl/telegraf.crt"
  # tls_key = "/etc/ssl/telegraf.key"
  ## Export metric collection time.
  # export_timestamp = true
Here's my Prometheus config:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  # - job_name: 'node_exporter'
  #   scrape_interval: 5s
  #   static_configs:
  #     - targets: ['localhost:9100']
  - job_name: 'telegraf'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9126']
If I go to http://localhost:9090/metrics I don't see any metrics coming from Telegraf.
I've captured some logs from Telegraf as well:
/opt telegraf --config /etc/telegraf/telegraf.conf --input-filter filestat --test
➜ /opt tail -F /var/log/telegraf/telegraf.log
2019-02-11T17:34:20Z D! [outputs.prometheus_client] wrote batch of 28 metrics in 1.234869ms
2019-02-11T17:34:20Z D! [outputs.prometheus_client] buffer fullness: 0 / 10000 metrics.
2019-02-11T17:34:30Z D! [outputs.file] wrote batch of 28 metrics in 384.672µs
2019-02-11T17:34:30Z D! [outputs.file] buffer fullness: 0 / 10000 metrics.
2019-02-11T17:34:30Z D! [outputs.prometheus_client] wrote batch of 30 metrics in 1.250605ms
2019-02-11T17:34:30Z D! [outputs.prometheus_client] buffer fullness: 9 / 10000 metrics.
I don't see an issue in the logs.
The /metrics endpoint of your Prometheus server exports metrics about the server itself, not the metrics it scraped from targets such as the Telegraf exporter.
Go to http://localhost:9090/targets and you should see a list of targets that your Prometheus server is scraping. If configured correctly, the Telegraf exporter should be one of them.
To query Prometheus for Telegraf-generated metrics, navigate your browser to http://localhost:9090/graph and enter e.g. cpu_time_user in the query field. If the CPU input plugin is enabled, it should have that metric and more.
You should use the following Prometheus config file in order to scrape the metrics exported by the prometheus_client output of Telegraf:
scrape_configs:
  - job_name: telegraf
    static_configs:
      - targets:
          - "localhost:9126"
The path to this file must be passed to the --config.file command-line flag when starting Prometheus.
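For example (the config path here is just an assumption about where the file lives):
prometheus --config.file=/etc/prometheus/prometheus.yml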
See more details about Prometheus config in these docs.
P.S. There is an alternative solution: push the metrics collected by Telegraf directly to a Prometheus-like system such as VictoriaMetrics instead of InfluxDB - see these docs and the sketch below. These metrics can later be queried with MetricsQL, a PromQL-compatible query language.
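A minimal sketch of that setup on the Telegraf side, assuming VictoriaMetrics is reachable at victoriametrics:8428, could look like this:
[[outputs.influxdb]]
  ## VictoriaMetrics accepts the InfluxDB line protocol on its /write endpoint,
  ## so the stock influxdb output can push metrics to it directly.
  urls = ["http://victoriametrics:8428"]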
I have a Prometheus setup that monitors metrics exposed by my own services. This works fine for a single instance, but once I start scaling them, Prometheus gets completely confused and starts tracking incorrect values.
All services are running on a single node, through docker-compose.
This is the job in the scrape_configs:
- job_name: 'wowanalyzer'
  static_configs:
    - targets: ['prod:8000']
Each instance of prod tracks metrics in its own memory and serves them at /metrics. I'm guessing Prometheus picks a random container each time it scrapes, which leads to the huge increase in recorded counts building up over time. Instead, I'd like Prometheus to read /metrics on all instances simultaneously, regardless of the number of instances active at that time.
docker-gen (https://github.com/jwilder/docker-gen) was developed for this purpose.
You would need to create a sidecar container running docker-gen that generates a new set of targets.
If I remember correctly, the hostnames generated are prod_1, prod_2, prod_X, etc.
I tried hard to find something to help us with this issue, but it looks like an unsolved problem.
So I decided to create this tool, which helps with this kind of service discovery.
https://github.com/juliofalbo/docker-compose-prometheus-service-discovery
Feel free to contribute and open issues!
You can use the DNS service discovery feature. For example:
docker-compose.yml:
version: "3"
services:
myapp:
image: appimage:v1
restart: always
networks:
- back
prometheus:
image: "prom/prometheus:v2.32.1"
container_name: "prometheus"
restart: "always"
ports: [ "9090:9090" ]
volumes:
- "./prometheus.yml:/etc/prometheus/prometheus.yml"
- "prometheus_data:/prometheus"
networks:
- back
prometheus.yml sample:
global:
  scrape_interval: 15s
  evaluation_interval: 60s
scrape_configs:
  - job_name: 'monitoringjob'
    dns_sd_configs:
      - names: [ 'myapp' ]   # <-- service name from docker-compose
        type: 'A'
        port: 8080
    metrics_path: '/actuator/prometheus'
You can check your DNS records using the nslookup utility from any container in this network:
docker exec -it myapp bash
bash-4.2# yum install bind-utils
bash-4.2# nslookup myapp
Server: 127.0.0.11
Address: 127.0.0.11#53
Non-authoritative answer:
Name: myapp
Address: 172.22.0.2
Name: myapp
Address: 172.22.0.7
I have a Docker Swarm with a Prometheus container and 1-n containers for a specific microservice.
The microservice containers can be reached via a URL. I suppose requests to this URL are load-balanced (of course...).
Currently I have spawned two microservice containers. Querying the metrics now seems to toggle between the two containers. Example: number of total requests: 10, 13, 10, 13, 10, 13, ...
This is my Prometheus configuration. What do I have to do? I do not want to adjust the Prometheus config each time I kill or start a microservice container.
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['the-service-url:8080']
        labels:
          application: myapplication
UPDATE 1
I changed my configuration as follows, which seems to work. This configuration uses a DNS lookup inside the Docker Swarm and finds all instances running the specified service.
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
The question here is: Does this configuration recognize that a Docker instance is stopped and another one is started?
UPDATE 2
There is a parameter for what I am asking for:
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
        # The time after which the provided names are refreshed
        [ refresh_interval: <duration> | default = 30s ]
That should do the trick.
So the answer is very simple:
There are multiple, documented ways to scrape.
I am using the DNS lookup way:
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
        refresh_interval: 15s