How to identify the IP address for an instance in a Prometheus result? - docker

I have a Docker swarm configured with nodeA as manager and nodeB as worker, and I have Prometheus installed as a Docker container on nodeA with the following prometheus.yml file:
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
          - 'localhost:9090'
  - job_name: 'node resources'
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - cpu
        - meminfo
        - diskstats
        - netdev
        - netstat
  - job_name: 'node storage'
    scrape_interval: 1m
    dns_sd_configs:
      - names: ['tasks.node']
        type: 'A'
        port: 9100
    params:
      collect[]:
        - filefd
        - filesystem
        - xfs
  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080
In Prometheus, if I execute the query container_cpu_usage_seconds_total, it gives results like:
container_cpu_usage_seconds_total{cpu="cpu15",id="/user.slice",instance="10.0.1.220:8080",job="cadvisor"}
container_cpu_usage_seconds_total{cpu="cpu15",id="/user.slice",instance="10.0.1.219:8080",job="cadvisor"}
I can see that instance has the values 10.0.1.219:8080 and 10.0.1.220:8080, but these are not the IP addresses of the machines in the swarm.
How can I tell which instance belongs to which machine? Is there something I should configure?
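Some background on those values: with dns_sd_configs on tasks.cadvisor, Prometheus gets one A record per running task and, unless a relabeling rule overrides it, sets instance to <task IP>:<port>. Those are the task containers' addresses on the overlay network, not the nodes' addresses, so the label alone cannot say which node a series came from. The sketch below only illustrates the lookup side: it splits an instance value and looks the IP up in a task-IP-to-node table. The function names, node names, and table are hypothetical; such a table would have to be built separately from Docker (for example from `docker inspect` on the service's tasks).

```python
from typing import Dict, Tuple


def split_instance(instance: str) -> Tuple[str, int]:
    """Split an instance label such as '10.0.1.220:8080' into ('10.0.1.220', 8080)."""
    host, sep, port = instance.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"not a host:port instance label: {instance!r}")
    return host, int(port)


def node_for_instance(instance: str, ip_to_node: Dict[str, str]) -> str:
    """Return the swarm node for an instance label, or 'unknown' if its IP is not mapped."""
    ip, _port = split_instance(instance)
    return ip_to_node.get(ip, "unknown")


# Hypothetical table: overlay-network task IP -> swarm node name.
# These values are made up; a real table has to be collected from Docker.
TASK_IP_TO_NODE = {
    "10.0.1.219": "nodeA",
    "10.0.1.220": "nodeB",
}

if __name__ == "__main__":
    for inst in ("10.0.1.219:8080", "10.0.1.220:8080"):
        print(inst, "->", node_for_instance(inst, TASK_IP_TO_NODE))
```

Because task IPs change whenever a task is rescheduled, any such table goes stale and has to be refreshed along with the service.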

Related

Prometheus (in Docker container) Cannot Scrape Target on Host

Prometheus is running inside a Docker container (Docker version 18.09.2, build 6247962, docker-compose.yml below), and the scrape target is on localhost:8000, which is created by a Python 3 script.
The error shown for the failed scrape target (at localhost:9090/targets) is:
Get http://127.0.0.1:8000/metrics: dial tcp 127.0.0.1:8000: getsockopt: connection refused
Question: Why is Prometheus in the docker container unable to scrape the target which is running on the host computer (Mac OS X)? How can we get Prometheus running in docker container able to scrape the target running on the host?
Failed attempt: I tried replacing the following in docker-compose.yml
networks:
  - back-tier
  - front-tier
with
network_mode: "host"
but then we are unable to access the Prometheus admin page at localhost:9090.
I was unable to find a solution in similar questions, such as:
Getting error "Get http://localhost:9443/metrics: dial tcp 127.0.0.1:9443: connect: connection refused"
docker-compose.yml
version: '3.3'
networks:
  front-tier:
  back-tier:
services:
  prometheus:
    image: prom/prometheus:v2.1.0
    volumes:
      - ./prometheus/prometheus:/etc/prometheus/
      - ./prometheus/prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - 9090:9090
    networks:
      - back-tier
    restart: always
  grafana:
    image: grafana/grafana
    user: "104"
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    volumes:
      - ./grafana/grafana_data:/var/lib/grafana
      - ./grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
      - ./grafana/config.monitoring
    networks:
      - back-tier
      - front-tier
    restart: always
prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'my-project'
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'rigs-portal'
    scrape_interval: 5s
    static_configs:
      - targets: ['127.0.0.1:8000']
Output at http://localhost:8000/metrics
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 65.0
python_gc_objects_collected_total{generation="1"} 281.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 37.0
python_gc_collections_total{generation="1"} 3.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="7",patchlevel="3",version="3.7.3"} 1.0
# HELP request_processing_seconds Time spend processing request
# TYPE request_processing_seconds summary
request_processing_seconds_count 2545.0
request_processing_seconds_sum 1290.4869346540017
# TYPE request_processing_seconds_created gauge
request_processing_seconds_created 1.562364777766845e+09
# HELP my_inprorgress_requests CPU Load
# TYPE my_inprorgress_requests gauge
my_inprorgress_requests 65.0
Python3 script
from prometheus_client import start_http_server, Summary, Gauge
import random
import time

# Create a metric to track time spent and requests made
REQUEST_TIME = Summary("request_processing_seconds", 'Time spend processing request')

# Decorate the function so every call is timed and counted by the Summary
@REQUEST_TIME.time()
def process_request(t):
    """A dummy request handler that just takes some time."""
    time.sleep(t)

if __name__ == "__main__":
    # Expose the metrics on http://localhost:8000/metrics
    start_http_server(8000)
    g = Gauge('my_inprorgress_requests', 'CPU Load')
    g.set(65)
    # Generate fake requests forever
    while True:
        process_request(random.random())
While it is not a very common use case, you can indeed connect from your container to your host.
From https://docs.docker.com/docker-for-mac/networking/
I want to connect from a container to a service on the host
The host has a changing IP address (or none if you have no network
access). From 18.03 onwards our recommendation is to connect to the
special DNS name host.docker.internal, which resolves to the internal
IP address used by the host. This is for development purpose and will
not work in a production environment outside of Docker Desktop for
Mac.
For reference, for people who find this question through search: this is now supported on Linux as of Docker 20.10. See the following link:
How to access host port from docker container
and:
https://github.com/docker/for-linux/issues/264#issuecomment-823528103
Below is an example of running Prometheus on Docker for macOS which causes Prometheus to scrape a simple Spring Boot application running on localhost:8080:
Bash
docker run --rm --name prometheus -p 9090:9090 -v /Users/YourName/conf/prometheus.yml:/etc/prometheus/prometheus.yml -d prom/prometheus
/Users/YourName/conf/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'spring-boot'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['host.docker.internal:8080']
In this case, using the special domain host.docker.internal instead of localhost is what lets the Prometheus container resolve the macOS host; the config file itself is mapped into the Prometheus container.
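The same substitution applies to any static target that runs on the host rather than inside the Prometheus container. As a small illustration (the helper names are made up for this example), the function below rewrites loopback targets to host.docker.internal. It should not be applied to the job where Prometheus scrapes itself at localhost:9090, because that target really is inside the container.

```python
from typing import List

# Hostnames that, inside the Prometheus container, refer to the container itself.
LOOPBACK_NAMES = {"localhost", "127.0.0.1"}


def host_visible_target(target: str, host_alias: str = "host.docker.internal") -> str:
    """Rewrite 'localhost:PORT' or '127.0.0.1:PORT' so it points at the Docker host."""
    host, sep, port = target.rpartition(":")
    if sep and host in LOOPBACK_NAMES:
        return f"{host_alias}:{port}"
    return target  # container names, IPs and other hosts are left untouched


def rewrite_targets(targets: List[str]) -> List[str]:
    """Apply host_visible_target to every entry of a static_configs targets list."""
    return [host_visible_target(t) for t in targets]


if __name__ == "__main__":
    print(rewrite_targets(["localhost:8080", "127.0.0.1:8000", "myService:9091"]))
```

On older Docker Desktop for Mac setups, passing host_alias="docker.for.mac.localhost" gives the variant used in some of the answers below.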
Environment
Macbook Pro, Apple M1 Pro
Docker version 20.10.17, build 100c701
Prometheus 2.38

Getting error "Get http://localhost:9443/metrics: dial tcp 127.0.0.1:9443: connect: connection refused"

I'm trying to configure Prometheus and Grafana with my Hyperledger Fabric v1.4 network to analyze the peer and chaincode metrics. I've mapped the peer container's port 9443 to my host machine's port 9443 after following this documentation. I've also changed the provider entry to prometheus under the metrics section in the peer's core.yml. I've configured Prometheus and Grafana in docker-compose.yml in the following way.
prometheus:
  image: prom/prometheus:v2.6.1
  container_name: prometheus
  volumes:
    - ./prometheus/:/etc/prometheus/
    - prometheus_data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--web.console.libraries=/etc/prometheus/console_libraries'
    - '--web.console.templates=/etc/prometheus/consoles'
    - '--storage.tsdb.retention=200h'
    - '--web.enable-lifecycle'
  restart: unless-stopped
  ports:
    - 9090:9090
  networks:
    - basic
  labels:
    org.label-schema.group: "monitoring"
grafana:
  image: grafana/grafana:5.4.3
  container_name: grafana
  volumes:
    - grafana_data:/var/lib/grafana
    - ./grafana/datasources:/etc/grafana/datasources
    - ./grafana/dashboards:/etc/grafana/dashboards
    - ./grafana/setup.sh:/setup.sh
  entrypoint: /setup.sh
  environment:
    - GF_SECURITY_ADMIN_USER={ADMIN_USER}
    - GF_SECURITY_ADMIN_PASSWORD={ADMIN_PASS}
    - GF_USERS_ALLOW_SIGN_UP=false
  restart: unless-stopped
  ports:
    - 3000:3000
  networks:
    - basic
  labels:
    org.label-schema.group: "monitoring"
When I curl 0.0.0.0:9443/metrics on my remote CentOS machine, I get the full list of metrics. However, when I run Prometheus with the above configuration, it throws the error Get http://localhost:9443/metrics: dial tcp 127.0.0.1:9443: connect: connection refused. This is what my prometheus.yml looks like:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'peer_metrics'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9443']
Even when I go to the endpoint http://localhost:9443/metrics in my browser, I get all the metrics. What am I doing wrong here? Why are Prometheus's own metrics shown on its interface, but not the peer's?
Since the targets are not running inside the prometheus container, they cannot be accessed through localhost. You need to access them through the host private IP or by replacing localhost with docker.for.mac.localhost or host.docker.internal.
The problem: you added a service for scraping in Prometheus, but on http://localhost:9090/targets the endpoint state is Down
with an error:
Get http://localhost:9091/metrics: dial tcp 127.0.0.1:9091: connect: connection refused
Solution: verify that
the scrape details in prometheus.yml point to the right endpoint,
the YAML indentation is correct, and
curl -v http://<serviceip>:<port>/metrics prints the metrics as plain text in your terminal.
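The curl check can also be scripted. A minimal sketch using only the Python standard library (the function name is made up for this example): it fetches the URL and reports whether the failure is a connection problem, an HTTP error, or neither, counting non-comment lines rather than fully parsing the exposition format. Run it from a container attached to the same Docker network as Prometheus, and from the host to compare, since localhost means something different in each place.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Ignore any *_proxy environment variables so the check tests a direct connection.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_metrics(url: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Fetch a /metrics URL and report (ok, detail), roughly the way a scrape would."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. a wrong metrics_path)
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError (connection refused, DNS failure) and socket timeouts are OSError subclasses
        return False, f"unreachable: {getattr(exc, 'reason', exc)}"
    # Lines that are neither blank nor '#' comments are sample lines
    samples = [line for line in body.splitlines() if line and not line.startswith("#")]
    return True, f"{len(samples)} sample line(s)"
```

For example, check_metrics("http://myService:9091/metrics") returns (False, "unreachable: ...") as long as the name does not resolve or nothing listens on that port, which is the same situation Prometheus reports as Down.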
Note: If you are pointing to a service in another Docker container, its address might not be localhost but the service name (the name shown in docker ps) or host.docker.internal (the internal IP of the host running the Docker container).
For this example, I'll be working with two Docker containers: prometheus and "myService".
sudo docker ps
CONTAINER ID IMAGE CREATED PORTS NAMES
abc123 prom/prometheus:latest 2 hours ago 0.0.0.0:9090->9090/tcp prometheus
def456 myService/myService:latest 2 hours ago 0.0.0.0:9091->9091/tcp myService
Then edit prometheus.yml (and restart Prometheus):
- job_name: myService
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  static_configs:
    - targets:  # three options are shown here
        - localhost:9091  # plain localhost
        - host.docker.internal:9091  # the host that runs the Docker container
        - myService:9091  # Docker container name (worked in my case)
Your Prometheus container isn't running on the host network. It's running on its own bridge network (the one created by docker-compose). Therefore, the scrape config for the peer should point at the IP of the peer container.
Recommended way of solving this:
Run Prometheus and Grafana in the same network as the Fabric network.
In your docker-compose file for the Prometheus stack, you can reference it like this:
networks:
  default:
    external:
      name: <your-hyperledger-network>
(use docker network ls to find the network name )
Then you can use http://<peer_container_name>:9443 in your scrape config
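Container names only resolve through Docker's embedded DNS for containers attached to the same Docker network, which is why joining the external network matters. A small sketch to check this from a container on that network (the function name is made up for this example; IPv4 only):

```python
import socket
from typing import List


def resolve_target(target: str) -> List[str]:
    """Resolve the host part of 'host:port' to IPv4 addresses; [] if it does not resolve."""
    host, _, port = target.rpartition(":")
    try:
        infos = socket.getaddrinfo(host, int(port), socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (ip, port)); keep the unique IPs
    return sorted({info[4][0] for info in infos})
```

For example, resolve_target("<peer_container_name>:9443") should return the peer's address when run from a container on the Fabric network, and an empty list from a container that is not attached to it.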
NOTE
This solution is not for Docker swarm. It is for standalone containers (multi-container) meant to run on an overlay network.
We get the same error when using an overlay network, and here is the solution (static, NOT dynamic).
This config does not work:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'promswarm'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: [ 'localhost:9100' ]
Nor does this one: even though http://docker.for.mac.localhost:9100/ is available, Prometheus cannot find node-exporter. So the config below did not work either:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'promswarm'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: [ 'docker.for.mac.localhost:9100' ]
However, simply by using its container ID, we can reach that service on its port number.
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a58264faa1a4 prom/prometheus "/bin/prometheus --c…" 5 minutes ago Up 5 minutes 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp unruffled_solomon
62310f56f64a grafana/grafana:latest "/run.sh" 42 minutes ago Up 42 minutes 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp wonderful_goldberg
7f1da9796af3 prom/node-exporter "/bin/node_exporter …" 48 minutes ago Up 48 minutes 0.0.0.0:9100->9100/tcp, :::9100->9100/tcp intelligent_panini
So node-exporter has the container ID 7f1da9796af3, and we can update our yml file to:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'promswarm'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: [ '7f1da9796af3:9100' ]
UPDATE
I was not happy with this hard-coded solution myself, so after some more searching I found a more reliable approach: --network-alias NAME, which makes the container routable by that name within the overlay network. The yml then looks like this:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: [ 'node_exporter:9100' ]
Here, node_exporter is an alias created with the run subcommand, e.g.
docker run --rm -d -v "/:/host:ro,rslave" --network cloud --network-alias node_exporter --pid host -p 9100:9100 prom/node-exporter --path.rootfs=/host
In a nutshell, this says that on the overlay network cloud you can reach node-exporter at node_exporter:<PORT>.
I resolved the problem by downloading the Prometheus node exporter for Windows.
Check out this link: https://medium.com/@facundofarias/setting-up-a-prometheus-exporter-on-windows-b3e45f1235a5
If you are pointing to a service in another Docker container, its address might not be localhost but the service name (the name shown in docker ps) or the internal IP of the host running the Docker container.
prometheus.yml
- job_name: "node-exporter"
  static_configs:
    - targets: ["nodeexporter:9100"]  # Docker container name

Docker port issue with prometheus+springboot

I'm trying to get the endpoint metrics via a Prometheus Docker image. Below is my yml file. However, Prometheus gives me the error Get http://localhost:8080/assessments/metrics: dial tcp 127.0.0.1:8080: connect: connection refused. It works if I open the URL in the browser, though. How can I map the port so that Docker recognizes it?
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  # - "first.rules"
  # - "second.rules"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'assessments'
    metrics_path: /assessments/metrics
    static_configs:
      - targets: ['localhost:8080']
I was able to fix this by changing the target in my yml to docker.for.mac.localhost:8080. This made Prometheus look for port 8080 on the Mac host instead of inside its own container.

Docker prom/Prometheus container exits

When I run this command, it creates the Docker container, but the container shows an exited status
and I am not able to get it started.
My goal is to be able to replace the prometheus.yml file with a custom prometheus.yml to monitor nginx running at http://localhost:70/nginx_status.
docker run -it -d --name prometheus3 -p 9090:9090 -v /opt/docker/prometheus:/etc/prometheus prom/prometheus --config.file=/etc/prometheus/prometheus.yml
here is my prometheus.yml file
scrape_configs:
- job_name: 'prometheus'
scrape_interval: 5s
scrape_timeout: 5s
static_configs:
- targets: ['localhost: 9090']
- job_name: 'node'
static_configs:
- targets: ['localhost: 70/nginx_status']
You should be able to see the logs of the stopped container by running:
docker logs prometheus3
Anyway, there are (at least) two issues with your configuration:
1.) The prometheus.yml file is invalid, so the Prometheus process exits immediately.
Move scrape_interval and scrape_timeout into a global section and fix the indentation. See below for an example of a correctly formatted yml file.
2.) You can't scrape the /nginx_status endpoint directly; you need an nginx exporter that extracts the metrics for you. The Prometheus server then scrapes the nginx exporter to retrieve the metrics. You can find a list of exporters here and pick one that suits you.
Once you have the exporter running, you need to point Prometheus to the address of the exporter so it can be scraped.
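To show what the exporter adds, here is a toy version of its translation step. nginx's stub_status page is plain text in nginx's own layout, which Prometheus cannot parse, so an exporter reads it and re-emits the numbers in the Prometheus text format. This is an illustration only, assuming the standard stub_status layout; the metric names are chosen for the example, and in practice an existing nginx exporter is the way to go.

```python
import re
from typing import Dict

# Sample stub_status output in the layout produced by ngx_http_stub_status_module
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""

# (metric name, Prometheus type, key in the parsed dict); names are illustrative
METRICS = [
    ("nginx_connections_active", "gauge", "active"),
    ("nginx_connections_accepted", "counter", "accepted"),
    ("nginx_connections_handled", "counter", "handled"),
    ("nginx_http_requests_total", "counter", "requests"),
    ("nginx_connections_reading", "gauge", "reading"),
    ("nginx_connections_writing", "gauge", "writing"),
    ("nginx_connections_waiting", "gauge", "waiting"),
]


def parse_stub_status(text: str) -> Dict[str, int]:
    """Turn nginx stub_status text into a dict of integer values."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    rww = re.search(r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text)
    if not (active and totals and rww):
        raise ValueError("unexpected stub_status format")
    accepted, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, rww.groups())
    return {
        "active": int(active.group(1)),
        "accepted": accepted,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


def to_exposition(stats: Dict[str, int]) -> str:
    """Render the parsed values in the Prometheus text exposition format."""
    lines = []
    for name, mtype, key in METRICS:
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {stats[key]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_exposition(parse_stub_status(SAMPLE)), end="")
```

A real exporter also serves this text over HTTP and fetches stub_status on every scrape; this sketch only covers the parsing and formatting.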
Working prometheus.yml:
global:
  scrape_interval: 5s
  scrape_timeout: 5s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['<< host name and port of nginx exporter >>']
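The targets in the question also differ from the working file in a way that is easy to miss: each entry under targets should be a bare host:port, with no space after the colon and no URL path (the path belongs in metrics_path). A small, hypothetical pre-flight check for target strings, written for this example; it is not Prometheus's own validation, which `promtool check config` performs on the whole file.

```python
import re
from typing import List

# host:port, where host is a DNS name, an IPv4 address, or a bracketed IPv6 address
_TARGET = re.compile(r"^(\[[0-9A-Fa-f:.]+\]|[A-Za-z0-9._-]+):(\d{1,5})$")


def target_problems(target: str) -> List[str]:
    """List the reasons a static_configs target is not a plain 'host:port' (empty if fine)."""
    problems = []
    if any(ch.isspace() for ch in target):
        problems.append("contains whitespace")
    if "/" in target:
        problems.append("contains a path (set the path with metrics_path instead)")
    match = _TARGET.match(target)
    if not match:
        problems.append("is not of the form host:port")
    elif int(match.group(2)) > 65535:
        problems.append("port is out of range")
    return problems


if __name__ == "__main__":
    for t in ("localhost:9090", "localhost: 9090", "localhost: 70/nginx_status"):
        print(repr(t), target_problems(t) or "ok")
```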

Prometheus scrape from unknown number of (docker-)hosts

I have a Docker Swarm with a Prometheus container and 1-n containers for a specific microservice.
The microservice container can be reached by a URL. I assume the requests to this URL are load-balanced (of course...).
Currently I have spawned two microservice-container. Querying the metrics now seems to toggle between the two containers. Example: Number of total requests: 10, 13, 10, 13, 10, 13,...
This is my Prometheus configuration. What do I have to do? I do not want to adjust the Prometheus config each time I kill or start a microservice-container.
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['the-service-url:8080']
        labels:
          application: myapplication
UPDATE 1
I changed my configuration as follows, which seems to work. This configuration uses a DNS lookup inside the Docker Swarm and finds all instances running the specified service.
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
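Why this fixes the toggling: the plain service name resolves to Swarm's load-balanced virtual IP, so each scrape lands on whichever replica the balancer picks, whereas tasks.<service> returns one A record per running task, and Prometheus turns each record into its own target. A rough Python sketch of that expansion step (function names made up; Prometheus does this internally, and this is not its code):

```python
import socket
from typing import Callable, List


def resolve_a_records(name: str) -> List[str]:
    """Return all IPv4 addresses for a name (one per task for Swarm's tasks.<service>)."""
    _host, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses


def discover_targets(name: str, port: int,
                     resolve: Callable[[str], List[str]] = resolve_a_records) -> List[str]:
    """Mimic an A-record dns_sd_configs entry: every resolved IP becomes 'ip:port'."""
    return sorted(f"{ip}:{port}" for ip in set(resolve(name)))
```

Inside a container attached to the service's overlay network, discover_targets("tasks.myServiceName", 8080) would list one target per running task; the resolver is a parameter so the expansion can be tried without Swarm.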
The question here is: Does this configuration recognize that a Docker instance is stopped and another one is started?
UPDATE 2
There is a parameter for what I am asking for:
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
        # The time after which the provided names are refreshed
        refresh_interval: 30s  # the default is 30s
That should do the trick.
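With dns_sd_configs, Prometheus re-resolves the names every refresh_interval and replaces the target set with the new answer, so a stopped task drops out and a newly started one is picked up at the next refresh. A small simulation of that bookkeeping (illustrative only; the addresses are made up and this is not Prometheus's code):

```python
from typing import Iterable, List, Set, Tuple


def diff_targets(previous: Iterable[str], current: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare two consecutive DNS answers; return (added, removed) targets."""
    prev, cur = set(previous), set(current)
    return cur - prev, prev - cur


def replay(refreshes: List[List[str]]) -> List[Tuple[List[str], List[str]]]:
    """Walk through successive DNS answers and record what changes at each refresh."""
    known: Set[str] = set()
    changes = []
    for answer in refreshes:
        added, removed = diff_targets(known, answer)
        changes.append((sorted(added), sorted(removed)))
        known = set(answer)
    return changes


if __name__ == "__main__":
    # Hypothetical answers for tasks.myServiceName over three refreshes:
    # the task at .6 is stopped and replaced by a new task at .7
    answers = [
        ["10.0.1.5:8080", "10.0.1.6:8080"],
        ["10.0.1.5:8080", "10.0.1.6:8080"],
        ["10.0.1.5:8080", "10.0.1.7:8080"],
    ]
    for i, (added, removed) in enumerate(replay(answers)):
        print(f"refresh {i}: added={added} removed={removed}")
```

The practical consequence is a delay of roughly one refresh_interval before a replaced task shows up as a target, which is why lowering it (as in the next answer) makes discovery react faster.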
So the answer is very simple:
There are multiple documented ways to discover scrape targets.
I am using the DNS lookup approach:
scrape_configs:
  - job_name: 'myjobname'
    metrics_path: '/prometheus'
    scrape_interval: 15s
    dns_sd_configs:
      - names: ['tasks.myServiceName']
        type: A
        port: 8080
        refresh_interval: 15s

Resources