I have a Docker Swarm with a Prometheus container and 1-n containers for a specific microservice.
The microservice containers can be reached via a URL. I assume requests to this URL are load-balanced across the containers.
Currently I have spawned two microservice containers. Querying the metrics now seems to toggle between the two containers. Example: Number of total requests: 10, 13, 10, 13, 10, 13,...
This is my Prometheus configuration. What do I have to do? I do not want to adjust the Prometheus config each time I kill or start a microservice-container.
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['the-service-url:8080']
        labels:
          application: myapplication
UPDATE 1
I changed my configuration as follows, which seems to work. This configuration uses a DNS lookup inside the Docker Swarm and finds all instances running the specified service.
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
The question here is: Does this configuration recognize that a Docker instance is stopped and another one is started?
UPDATE 2
There is a parameter for what I am asking for:
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
        # The time after which the provided names are refreshed
        [ refresh_interval: <duration> | default = 30s ]
That should do the trick.
So the answer is very simple:
There are multiple, documented ways to scrape.
I am using the dns-lookup-way:
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
        refresh_interval: 15s
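The earlier static_configs entry attached an application label, and dns_sd_configs has no labels field of its own. One possible way to keep that label, sketched here under the assumption that every discovered task should carry the same fixed value, is a relabel rule that writes a constant:

```yaml
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
        refresh_interval: 15s
    relabel_configs:
      # No source_labels are given, so the default regex matches
      # and the constant replacement is written to every target.
      - target_label: application
        replacement: myapplication
```

The label value myapplication is taken from the original static configuration; everything else is unchanged from the DNS-based job above.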
Related
One of the targets in static_configs in my prometheus.yml config file is secured with basic authentication. As a result, the error "Connection refused" is always shown for that target on the Prometheus Targets page.
I have researched how to set up Prometheus to send the credentials when scraping that particular target, but couldn't find a solution. What I found in the docs was how to set it in the scrape_config section. This won't work for me because I have other targets that are not protected with basic_auth.
Please help me out with this.
Here is the relevant part of my .yml config.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:5000']
        labels:
          service: 'Auth'
      - targets: ['localhost:5090']
        labels:
          service: 'Approval'
      - targets: ['localhost:6211']
        labels:
          service: 'Credit Assessment'
      - targets: ['localhost:6090']
        labels:
          service: 'Sweep'
      - targets: ['localhost:6500']
        labels:
I would like to add more details to @PatientZro's answer.
In my case, I needed to create another job (as specified), but basic_auth needs to be at the same level of indentation as job_name. See example here.
Also, my basic_auth targets require a path, as they are not served at the root of my domain.
Here is an example with an API endpoint specified:
- job_name: 'myapp_health_checks'
  scrape_interval: 5m
  scrape_timeout: 30s
  static_configs:
    - targets: ['mywebsite.org']
  metrics_path: "/api/health"
  basic_auth:
    username: 'email@username.me'
    password: 'cfgqvzjbhnwcomplicatedpasswordwjnqmd'
Create another job for the one that needs auth.
So just under what you've posted, add another job:
- job_name: 'prometheus-basic_auth'
  scrape_interval: 5s
  scrape_timeout: 5s
  static_configs:
    - targets: ['localhost:5000']
      labels:
        service: 'Auth'
  basic_auth:
    username: foo
    password: bar
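To make the split concrete, here is a rough sketch of how the two jobs might sit side by side. It assumes that only localhost:5000 needs credentials and that it is removed from the original job so it is not scraped twice; the username and password are the placeholder values from the answer above.

```yaml
scrape_configs:
  # Targets that need no credentials stay in the original job.
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 5s
    static_configs:
      - targets: ['localhost:5090']
        labels:
          service: 'Approval'
      - targets: ['localhost:6211']
        labels:
          service: 'Credit Assessment'

  # The protected target gets its own job; basic_auth applies to every target in this job.
  - job_name: 'prometheus-basic_auth'
    scrape_interval: 5s
    scrape_timeout: 5s
    basic_auth:
      username: foo
      password: bar
    static_configs:
      - targets: ['localhost:5000']
        labels:
          service: 'Auth'
```

Because basic_auth is a job-level setting, grouping targets by whether they need credentials is what keeps the unprotected targets unaffected.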
I have a Docker Swarm configured with nodeA as manager and nodeB as worker, and Prometheus installed as a Docker container on nodeA with the following prometheus.yml:
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
          - 'localhost:9090'
  - job_name: 'node resources'
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - cpu
        - meminfo
        - diskstats
        - netdev
        - netstat
  - job_name: 'node storage'
    scrape_interval: 1m
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - filefd
        - filesystem
        - xfs
  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080
In Prometheus, if I execute query - container_cpu_usage_seconds_total it gives results like -
container_cpu_usage_seconds_total{cpu="cpu15",id="/user.slice",instance="10.0.1.220:8080",job="cadvisor"}
container_cpu_usage_seconds_total{cpu="cpu15",id="/user.slice",instance="10.0.1.219:8080",job="cadvisor"}
I can see instance having values 10.0.1.219:8080 and 10.0.1.220:8080 but these are not ip addresses of machines in swarm.
How can I differentiate which instance is for which machine? Is there something which I should configure?
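The instance values here are most likely the tasks' addresses on the overlay network, which is what a tasks.<service> DNS lookup returns. One possible approach, offered only as a sketch, is Prometheus's Docker Swarm service discovery, which exposes the node hostname as a meta label. It assumes a Prometheus version that includes dockerswarm_sd_configs, that Prometheus runs on a manager node with the Docker socket mounted into its container, and that the service name ends in cadvisor (that pattern is a guess about your stack naming).

```yaml
scrape_configs:
  - job_name: 'cadvisor'
    dockerswarm_sd_configs:
      # Assumes /var/run/docker.sock is mounted into the Prometheus container.
      - host: unix:///var/run/docker.sock
        role: tasks
        # Used for tasks that have no published ports.
        port: 8080
    relabel_configs:
      # Keep only tasks that are meant to be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
      # Keep only the cadvisor service (the name pattern is an assumption).
      - source_labels: [__meta_dockerswarm_service_name]
        regex: .*cadvisor
        action: keep
      # Copy the swarm node's hostname onto every scraped series.
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: node
```

With this in place each series would carry a node label next to instance, so the overlay IP no longer has to be mapped to a machine by hand.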
I'm trying to configure Prometheus and Grafana with my Hyperledger Fabric v1.4 network to analyze the peer and chaincode metrics. I've mapped the peer container's port 9443 to my host machine's port 9443 after following this documentation. I've also changed the provider entry to prometheus under the metrics section in the peer's core.yml. I've configured Prometheus and Grafana in docker-compose.yml in the following way.
prometheus:
  image: prom/prometheus:v2.6.1
  container_name: prometheus
  volumes:
    - ./prometheus/:/etc/prometheus/
    - prometheus_data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--web.console.libraries=/etc/prometheus/console_libraries'
    - '--web.console.templates=/etc/prometheus/consoles'
    - '--storage.tsdb.retention=200h'
    - '--web.enable-lifecycle'
  restart: unless-stopped
  ports:
    - 9090:9090
  networks:
    - basic
  labels:
    org.label-schema.group: "monitoring"
grafana:
  image: grafana/grafana:5.4.3
  container_name: grafana
  volumes:
    - grafana_data:/var/lib/grafana
    - ./grafana/datasources:/etc/grafana/datasources
    - ./grafana/dashboards:/etc/grafana/dashboards
    - ./grafana/setup.sh:/setup.sh
  entrypoint: /setup.sh
  environment:
    - GF_SECURITY_ADMIN_USER={ADMIN_USER}
    - GF_SECURITY_ADMIN_PASSWORD={ADMIN_PASS}
    - GF_USERS_ALLOW_SIGN_UP=false
  restart: unless-stopped
  ports:
    - 3000:3000
  networks:
    - basic
  labels:
    org.label-schema.group: "monitoring"
When I curl 0.0.0.0:9443/metrics on my remote centos machine, I get all the list of metrics. However, when I run Prometheus with the above configuration, it throws the error Get http://localhost:9443/metrics: dial tcp 127.0.0.1:9443: connect: connection refused. This is what my prometheus.yml looks like.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'peer_metrics'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9443']
Even when I go to the endpoint http://localhost:9443/metrics in my browser, I get all the metrics. What am I doing wrong here? Why are Prometheus's own metrics shown in its interface but not the peer's?
Since the targets are not running inside the prometheus container, they cannot be accessed through localhost. You need to access them through the host private IP or by replacing localhost with docker.for.mac.localhost or host.docker.internal.
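As a rough illustration of the host.docker.internal option: Docker Desktop on Mac and Windows resolves that name out of the box, while on Linux it usually has to be mapped explicitly. The sketch below assumes Docker 20.10 or newer (for the host-gateway value) and a Compose file with a top-level services key; the job name and port are taken from the question.

```yaml
# docker-compose.yml (fragment)
services:
  prometheus:
    image: prom/prometheus:v2.6.1
    extra_hosts:
      # Maps the name to the host's gateway IP; needed on Linux, harmless on Docker Desktop.
      - "host.docker.internal:host-gateway"
---
# prometheus.yml (fragment)
scrape_configs:
  - job_name: 'peer_metrics'
    scrape_interval: 10s
    static_configs:
      # The host-mapped peer port, reached from inside the Prometheus container.
      - targets: ['host.docker.internal:9443']
```

This keeps the peer reachable through its published host port, so it only works while that port mapping exists.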
The problem: On Prometheus you added a service for scraping but on http://localhost:9090/targets the endpoint state is Down
with an error:
Get http://localhost:9091/metrics: dial tcp 127.0.0.1:9091: connect:
connection refused
Solution: In prometheus.yml, verify that:
the scrape details point to the right endpoint;
the YAML indentation is correct;
curl -v http://<serviceip>:<port>/metrics prints the metrics as plain text in your terminal.
Note: If you are pointing to a service in another Docker container, the target is not localhost but the service name (the container name shown in docker ps) or host.docker.internal (which resolves to the host running the Docker containers).
For this example, I'll be working with two Docker containers, prometheus and "myService".
sudo docker ps
CONTAINER ID IMAGE CREATED PORTS NAMES
abc123 prom/prometheus:latest 2 hours ago 0.0.0.0:9090->9090/tcp prometheus
def456 myService/myService:latest 2 hours ago 0.0.0.0:9091->9091/tcp myService
and then edit the file prometheus.yml (and rerun prometheus)
- job_name: myService
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  static_configs:
    - targets:                        # three options to try
        - localhost:9091              # simple localhost
        - host.docker.internal:9091   # the host that runs the Docker containers
        - myService:9091              # Docker container name (worked in my case)
Your prometheus container isn't running on host network. It's running on its own bridge (the one created by docker-compose). Therefore the scrape config for peer should point at the IP of the peer container.
Recommended way of solving this:
Run prometheus and grafana in the same network as the fabric network.
In your docker-compose file for the Prometheus stack you can reference it like this:
networks:
  default:
    external:
      name: <your-hyperledger-network>
(use docker network ls to find the network name )
Then you can use http://<peer_container_name>:9443 in your scrape config
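Put together, the scrape job might look roughly like the sketch below. It assumes both stacks share the external network, that the peer's metrics endpoint listens on an address reachable from other containers (not only 127.0.0.1), and it keeps the placeholder container name from the answer.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: 'peer_metrics'
    scrape_interval: 10s
    static_configs:
      # Container name and container-side port; the host port mapping is not involved.
      - targets: ['<peer_container_name>:9443']
```

Since the traffic stays on the Docker network, the 9443 port would not need to be published on the host for Prometheus to reach it.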
NOTE
This solution is not for Docker Swarm. It is for standalone containers (multi-container) meant to run on an overlay network.
We get the same error when using an overlay network, and here is the solution (static, NOT dynamic).
this config does not work:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'promswarm'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: [ 'localhost:9100' ]
Nor does this one. Even though http://docker.for.mac.localhost:9100/ is reachable, Prometheus still cannot find node-exporter:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'promswarm'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: [ 'docker.for.mac.localhost:9100' ]
But simply using its container ID we can have access to that service via its port number.
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a58264faa1a4 prom/prometheus "/bin/prometheus --c…" 5 minutes ago Up 5 minutes 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp unruffled_solomon
62310f56f64a grafana/grafana:latest "/run.sh" 42 minutes ago Up 42 minutes 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp wonderful_goldberg
7f1da9796af3 prom/node-exporter "/bin/node_exporter …" 48 minutes ago Up 48 minutes 0.0.0.0:9100->9100/tcp, :::9100->9100/tcp intelligent_panini
So we have 7f1da9796af3 prom/node-exporter ID and we can update our yml file to:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'promswarm'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: [ '7f1da9796af3:9100' ]
UPDATE
I was not happy with this hard-coded solution, so after some more searching I found a more reliable approach using --network-alias NAME, which makes the container routable by that name within the overlay network. So the yml looks like this:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: [ 'node_exporter:9100' ]
In which the name node_exporter is an alias which has been created with run subcommand. e.g.
docker run --rm -d -v "/:/host:ro,rslave" --network cloud --network-alias node_exporter --pid host -p 9100:9100 prom/node-exporter --path.rootfs=/host
In a nutshell, this means that on the overlay network named cloud you can reach node-exporter at node_exporter:<PORT>.
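If the exporter is started from a Compose file instead of docker run, roughly the same alias could be declared there. This is only a sketch: it assumes the cloud network already exists as an attachable overlay network, and the service name node-exporter is made up for illustration.

```yaml
# docker-compose.yml (fragment)
services:
  node-exporter:            # hypothetical service name
    image: prom/node-exporter
    command:
      - '--path.rootfs=/host'
    pid: host
    volumes:
      - '/:/host:ro,rslave'
    ports:
      - '9100:9100'
    networks:
      cloud:
        aliases:
          # Same alias as --network-alias in the docker run command.
          - node_exporter
networks:
  cloud:
    # Assumes the overlay network was created beforehand.
    external: true
```

The scrape target node_exporter:9100 stays the same either way, since Prometheus only sees the alias.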
Well I remember I resolved the problem by downloading Prometheus node exporter for windows.
Check out this link: https://medium.com/@facundofarias/setting-up-a-prometheus-exporter-on-windows-b3e45f1235a5
If you are pointing to a service in another Docker container, the target is not localhost but the service name (the container name shown in docker ps) or the internal IP of the host running the Docker container.
prometheus.yml
- job_name: "node-exporter"
  static_configs:
    - targets: ["nodeexporter:9100"]  # Docker container name
I'm trying to scrape my endpoint's metrics with a Prometheus Docker image. Below is my yml file. However, I'm getting the error Get http://localhost:8080/assessments/metrics: dial tcp 127.0.0.1:8080: connect: connection refused from Prometheus. The endpoint works when I open it in the browser, though. How can I map the port so that Docker recognises it?
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  # - "first.rules"
  # - "second.rules"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'assessments'
    metrics_path: /assessments/metrics
    static_configs:
      - targets: ['localhost:8080']
I was able to fix this by changing the target in my yml to docker.for.mac.localhost:8080. This made Prometheus look for port 8080 on the Mac host rather than inside its own container.
When I run this command, it creates the Docker container, but the container shows an exited status
and I am not able to get it started.
My goal is to replace the default prometheus.yml file with a custom prometheus.yml to monitor nginx running at http://localhost:70/nginx_status
docker run -it -d --name prometheus3 -p 9090:9090 -v /opt/docker/prometheus:/etc/prometheus prom/prometheus --config.file=/etc/prometheus/prometheus.yml
here is my prometheus.yml file
scrape_configs:
- job_name: 'prometheus'
scrape_interval: 5s
scrape_timeout: 5s
static_configs:
- targets: ['localhost: 9090']
- job_name: 'node'
static_configs:
- targets: ['localhost: 70/nginx_status']
You should be able to see the logs of the stopped container by running:
docker logs prometheus3
Anyway, there are (at least) two issues with your configuration:
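1.) The prometheus.yml file is invalid, so the prometheus process exits immediately.
The indentation was off, and each target must be a plain host:port with no spaces and no URL path. (scrape_interval and scrape_timeout may be set per job or in a global section; the example below uses a global section.) See below for an example of a correctly formatted yml file.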
2.) You can't just scrape the /nginx_status endpoint but need to use a nginx exporter which extracts the metrics for you. Then the Prometheus server will scrape the nginx_exporter to retrieve the metrics. You can find a list of exporters here and pick one that suits you.
Once you have the exporter running, you need to point Prometheus to the address of the exporter so it can be scraped.
Working prometheus.yml :
global:
  scrape_interval: 5s
  scrape_timeout: 5s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['<< host name and port of nginx exporter >>']
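For illustration only, here is one way the exporter and the scrape job could fit together, using the nginx-prometheus-exporter image, which by default listens on port 9113. The sketch assumes Prometheus, the exporter, and nginx share a Docker network, and that nginx is reachable there under the hypothetical name nginx; adjust the scrape URI to wherever your stub_status page actually lives.

```yaml
# docker-compose.yml (fragment)
services:
  nginx-exporter:            # hypothetical service name
    image: nginx/nginx-prometheus-exporter
    command:
      # Points the exporter at the nginx status page from the question.
      - '--nginx.scrape-uri=http://nginx:70/nginx_status'
    ports:
      - '9113:9113'
---
# prometheus.yml (fragment)
global:
  scrape_interval: 5s
  scrape_timeout: 5s
scrape_configs:
  - job_name: 'nginx'
    static_configs:
      # Prometheus scrapes the exporter, not nginx itself.
      - targets: ['nginx-exporter:9113']
```

The exporter translates the status page into Prometheus metrics at /metrics, which is why the target carries no /nginx_status path.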