Docker Swarm - Prometheus cannot access cAdvisor: dial tcp 10.0.0.50:8090: connect: connection refused

On my Windows 10 Pro machine I have a complete Docker Swarm environment. The Docker Swarm stack includes Prometheus and cAdvisor. Step by step I am building up the monitoring tools, and later I will deploy the monitoring to a cloud solution.
In the Docker Swarm stack I can run Prometheus and cAdvisor, but Prometheus cannot connect to cAdvisor.
I get the message:
Get http://cadvisor:8090/metrics: dial tcp 10.0.0.50:8090: connect: connection refused
How can I give Prometheus access to cAdvisor?
In my browser I can open 'localhost:8090/metrics' and get all metrics, so cAdvisor is definitely running.
I have one stack file that creates the network (devhome_default). In my second stack I refer to this network.
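For reference, a minimal sketch of how the shared network could be declared in that first stack so that a second stack can reference it as external (everything here except the devhome_default name is an assumption; the attachable flag is optional and only matters for standalone containers):

networks:
  default:
    driver: overlay
    attachable: true   # optional; lets standalone containers join the network too

Deployed as part of a stack named devhome, this default network is created as devhome_default, which is exactly the name the second stack refers to.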
UPDATE: one way to solve this is to use the host IP address reported by ipconfig. Using that address in my Prometheus config works fine, but it makes the target hard-wired and not maintainable.
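For illustration, that workaround looks roughly like this in the scrape config; the IP address below is only a placeholder for whatever ipconfig reports:

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['10.0.75.1:8090']   # hard-wired host IP from ipconfig (placeholder)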
The stack / docker-compose file is:
version: '3'
services:
  cadvisor:
    image: google/cadvisor
    networks:
      - geosolutionsnet
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /:/rootfs
      - /var/run:/var/run
      - /sys:/sys
      - /var/lib/docker/:/var/lib/docker
    ports:
      - 8090:8080
    deploy:
      mode: global
      resources:
        limits:
          cpus: '0.10'
          memory: 128M
        reservations:
          cpus: '0.10'
          memory: 64M
  prometheus:
    image: prom/prometheus:v2.8.0
    ports:
      - "9090:9090"
    networks:
      - geosolutionsnet
    volumes:
      - //k/data/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    deploy:
      mode: replicated
      replicas: 1
networks:
  geosolutionsnet:
    external:
      name: devhome_default
The prometheus config file is:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  #- "alert.rules_nodes"
  #- "alert.rules_tasks"
  #- "alert.rules_service-groups"

scrape_configs:
  - job_name: 'prometheus'
    dns_sd_configs:
      - names:
          - 'tasks.prometheus'
        type: 'A'
        port: 9090
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8090']
        labels:
          alias: "cadvisor"
Alternatively, I also tried this for cAdvisor:
  - job_name: 'cadvisor'
    dns_sd_configs:
      - names:
          - 'tasks.cadvisor'
        type: 'A'
        port: 8090
And also:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['localhost:8090']

In different cloud environments the 'dns' solution below works like a charm. Because a 'real' cloud environment is the target environment for our Docker containers, this de facto standard solution suffices.
So, this works well in cloud environments:
  - job_name: 'cadvisor'
    dns_sd_configs:
      - names:
          - 'tasks.cadvisor'
        type: 'A'
        port: 8090
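One thing worth noting: with the stack file above, 8090 is only the port published on the swarm nodes; on the overlay network the cAdvisor tasks themselves listen on the container port 8080 (compare the first related question below, where switching the scrape target to the container port fixed the same "connection refused"). A sketch of the DNS-based job using the container port instead:

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names:
          - 'tasks.cadvisor'   # one A record per cAdvisor task on the overlay network
        type: 'A'
        port: 8080             # container-internal port, not the published 8090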

Related

Docker cadvisor connection refused

I'm having this issue with Docker Compose running cAdvisor and Prometheus.
DNS resolution is working for cadvisor.
When cadvisor is exposed to the internet, Prometheus is able to connect to the agent using public-ip:8081.
version: '3.7'

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data: {}

services:
  prometheus:
    image: prom/prometheus:latest
    user: "1000"
    environment:
      - PUID=1000
      - PGID=1000
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - /home/sammantar/sammantar/Project/OpenSourceMonitoring/promgrafnode/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /home/sammantar/sammantar/Project/OpenSourceMonitoring/promgrafnode/prometheus:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    networks:
      - monitoring
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    ports:
      - "127.0.0.1:8081:8080"
    # network_mode: "host"
    networks:
      - monitoring
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    depends_on:
      - redis
    deploy:
      mode: global
Is there an allowed-IP list in cAdvisor, or an issue with iptables?
global:
  scrape_interval: 1m

scrape_configs:
  - job_name: "prometheus"   # working
    scrape_interval: 1m
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"         # working
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"     # not working
    scrape_interval: 5s
    static_configs:
      - targets: ["cadvisor:8081"]
Tried using the private IP address of the cadvisor container: connection refused.
Tried using the container alias name: connection refused.
Published the cadvisor container to the internet: connection accepted on public-ip:8081.
I don't want to publish cadvisor over the internet.
[RESOLVED]
global:
  scrape_interval: 1m

scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 1m
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"
    scrape_interval: 5s
    static_configs:
      - targets: ["cadvisor:8080"]
I thought the mapped port on the left-hand side (8081 in 8081:8080) was the one used when containers connect to each other. I switched the target to 8080 and it worked, thanks to #anemyte.
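In other words, the left-hand side of 8081:8080 is only meaningful on the host; containers on the same Docker network reach each other on the container port on the right-hand side. A side-by-side sketch using the service names from the compose file above:

# docker-compose: host 127.0.0.1:8081 -> container 8080
cadvisor:
  ports:
    - "127.0.0.1:8081:8080"

# prometheus.yml: Prometheus is on the same network, so it scrapes the container port
- job_name: "cadvisor"
  static_configs:
    - targets: ["cadvisor:8080"]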

Docker Swarm - Elasticsearch instances cannot resolve each other on the same overlay

I am trying to create an Elasticsearch stack using Docker Swarm (I don't really need the swarm functionality, but the .yml has already been written).
The problem seems to be that when I start up the stack, the two masters can't resolve each other (despite confirming they're on the same network),
and it may be the fault of discovery.seed_hosts providing the masters with an empty list, or it could be the Docker network not working properly.
"message":"publish_address {10.0.120.16:9300}, bound_addresses {0.0.0.0:9300}"
"message":"bound or publishing to a non-loopback address, enforcing bootstrap checks",
"message":"failed to resolve host [es-master2]"
"message":"master not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover master-eligible nodes [es-master1, es-master2] to bootstrap a cluster: have discovered [{es-master1}{SmILpuyqTGCkzzG5KmMIRg}{ru7wfhJTRkSXitkT5Ubhgw}{10.0.115.6}{10.0.115.6:9300}{cdfhilmrstw}]; discovery will continue using [] from hosts providers and [{es-master1}{SmILpuyqTGCkzzG5KmMIRg}{ru7wfhJTRkSXitkT5Ubhgw}{10.0.115.6}{10.0.115.6:9300}{cdfhilmrstw}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
I'm not sure whether "discovery will continue using []" means that something went wrong with the discovery.seed_hosts setting. I suspect it, because without that setting the list would instead be populated with [127.0.0.1:9300, ...] etc.
Here are the relevant parts of my docker-compose file. It's part of a bigger file, but right now I just need the two masters to talk to each other first.
.
.
.
networks:
  es-internal:
    driver: overlay
  es-connection:
    driver: overlay
  external:
    driver: overlay
    name: external
.
.
.
  es-master1:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.2.3
    hostname: es-master1
    environment:
      - node.name=es-master1
      - cluster.name=${CLUSTER_NAME}
      - node_role="master,ingest"
      - cluster.initial_master_nodes=es-master1,es-master2
      - discovery.seed_hosts=es-master2
      # - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=/usr/share/elasticsearch/config/certs/es-master1.key
      - xpack.security.http.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master1.crt
      - xpack.security.http.ssl.certificate_authorities/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=/usr/share/elasticsearch/config/certs/es-master1.key
      - xpack.security.transport.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master1.crt
      - xpack.security.transport.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
    networks:
      - es-internal
    volumes:
      - ./data/es-master1:/usr/share/elasticsearch/data
    secrets:
      - source: ca-crt
        target: /usr/share/elasticsearch/config/certs/ca.crt
      - source: es-master1-crt
        target: /usr/share/elasticsearch/config/certs/es-master1.crt
      - source: es-master1-key
        target: /usr/share/elasticsearch/config/certs/es-master1.key
    configs:
      - source: jvm-coordination
        target: /usr/share/elasticsearch/config/jvm.options.d/jvm-coordination
    deploy:
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 1G
    depends_on:
      - es-coordination
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
  es-master2:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.2.3
    hostname: es-master2
    environment:
      - node.name=es-master2
      - cluster.name=${CLUSTER_NAME}
      - node_role="master,ingest"
      - cluster.initial_master_nodes=es-master1,es-master2
      - discovery.seed_hosts=es-master1
      # - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=/usr/share/elasticsearch/config/certs/es-master2.key
      - xpack.security.http.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master2.crt
      - xpack.security.http.ssl.certificate_authorities/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=/usr/share/elasticsearch/config/certs/es-master2.key
      - xpack.security.transport.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master2.crt
      - xpack.security.transport.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
    networks:
      - es-internal
    volumes:
      - ./data/es-master2:/usr/share/elasticsearch/data
    secrets:
      - source: ca-crt
        target: /usr/share/elasticsearch/config/certs/ca.crt
      - source: es-master2-crt
        target: /usr/share/elasticsearch/config/certs/es-master2.crt
      - source: es-master2-key
        target: /usr/share/elasticsearch/config/certs/es-master2.key
    configs:
      - source: jvm-coordination
        target: /usr/share/elasticsearch/config/jvm.options.d/jvm-coordination
    deploy:
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 1G
    depends_on:
      - es-coordination
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
.
.
.
I've tried:
Inspecting the network the stack creates to see if the containers are on the same network.
All sorts of ES config options.
A new swarm service meant to test pinging between containers on the same overlay network (they could resolve the hostnames given to them with the ping command).
Exposing ports (I know it's useless now).
If anyone has some input it would be great!

Custom metrics are not showing in the Prometheus web UI, nor in Grafana

First of all, I tried this solution; it didn't work for me.
I need to log some custom metrics using Prometheus.
docker-compose.yml
version: "3"
volumes:
prometheus_data: {}
grafana_data: {}
services:
prometheus:
image: prom/prometheus:latest
container_name: prometheus
hostname: my_service
ports:
- 9090:9090
depends_on:
- my_service
my-service:
build: .
ports:
- 8080:8080
grafana:
image: grafana/grafana:latest
container_name: grafana
hostname: grafana
ports:
- 3000:3000
depends_on:
- prometheus
prometheus.yml
global:
  scrape_interval: 5s
  scrape_timeout: 10s
  external_labels:
    monitor: 'my-project'
rule_files:
scrape_configs:
  - job_name: myapp
    scrape_interval: 10s
    static_configs:
      - targets:
          - my_service:8080
I tried the external IP as well, but I can't see my metrics in the Prometheus UI. Also, the targets page shows localhost:9090 as up.
What could be the problem? Can anyone correct the docker-compose and Prometheus files?
Thanks
So I found it. I have to set my scrape configs with the container name, something like this:
scrape_configs:
  - job_name: my-service
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    static_configs:
      - targets:
          - 'prometheus:9090'
          - 'my-service:8080'
Once you fix your Prometheus volumes so that your config and data are actually mounted, you will see your service up and running at http://localhost:9090/targets
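As that remark about volumes suggests, the compose file in the question never mounts prometheus.yml into the prometheus container, so Prometheus runs with its default configuration. A sketch of the missing mount, assuming the config file sits next to the compose file:

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - 9090:9090
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml   # assumed host path of the config shown above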

Alertmanager docker container refuses connections

I have a docker-compose file with one Django app, a Prometheus monitoring container, and an Alertmanager container.
Everything builds fine, the app is running, and Prometheus is monitoring, but when an alert is supposed to fire, it does not reach the Alertmanager container; I get the following error message:
prometheus_1 | level=error ts=2021-08-02T08:58:16.018Z caller=notifier.go:527 component=notifier alertmanager=http://0.0.0.0:9093/api/v2/alerts count=1 msg="Error sending alert" err="Post \"http://0.0.0.0:9093/api/v2/alerts\": dial tcp 0.0.0.0:9093: connect: connection refused"
Alertmanager also refuses a telnet test connection, like so:
klex#DESKTOP-PVC5EP:~$ telnet 0.0.0.0 9093
Trying 0.0.0.0...
Connected to 0.0.0.0.
Escape character is '^]'.
Connection closed by foreign host.
The docker-compose file is:
version: "3"
services:
web:
container_name: smsgate
build: .
command: sh -c "python manage.py migrate &&
python manage.py collectstatic --no-input &&
python manage.py runserver 0.0.0.0:15001"
volumes:
- .:/smsgate:rw
- static_volume:/home/app/smsgate/static
- /var/run/docker.sock:/var/run/docker.sock
ports:
- "15001:15001"
env_file:
- .env.prod
image: smsgate
restart: "always"
networks:
- promnet
prometheus:
image: prom/prometheus
volumes:
- ./prometheus/:/etc/prometheus/
depends_on:
- alertmanager
ports:
- "9090:9090"
networks:
- promnet
alertmanager:
image: prom/alertmanager
ports:
- "9093:9093"
volumes:
- ./alertmanager/:/etc/alertmanager/
restart: "always"
command:
- '--config.file=/etc/alertmanager/alertmanager.yml'
networks:
- promnet
volumes:
static_volume:
alertmanager_volume:
prometheus_volume:
networks:
promnet:
driver: bridge
And the prometheus.yml configuration file is:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "0.0.0.0:9093"
rule_files:
  - alert.rules.yml
scrape_configs:
  - job_name: monitoring
    metrics_path: /metrics
    static_configs:
      - targets:
          - smsgate:15001
There is very likely a network configuration problem of some kind, as the service does not seem to accept any connections.
The Prometheus and Alertmanager GUIs can be accessed via browser on
http://127.0.0.1:9090/ and
http://127.0.0.1:9093/ respectively.
Any help would be much appreciated.
Try using the service name instead of 0.0.0.0. Change the targets in the alerting block of the configuration to:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"
Given that they are on the same network, it should work just fine.
Update
I misunderstood the problem in the first place. Apologies. Please check the updated block above ☝🏽
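For completeness, the alertmanager.yml that the compose file mounts is not shown in the question. A minimal placeholder, with the receiver name and webhook URL purely as assumptions, could look like:

route:
  receiver: default
receivers:
  - name: default
    webhook_configs:
      - url: http://smsgate:15001/alerts/   # hypothetical alert endpoint on the Django app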

Cannot access host machine from docker container

Using Docker version 18.09.2 on Windows 10.
I am setting up a Prometheus and Grafana stack to monitor metrics of a service running on my localhost. Here's my docker-compose file.
version: '3.4'
networks:
  monitor-net:
    driver: bridge
  dockernet:
    external: true
volumes:
  prometheus_data: {}
  grafana_data: {}
services:
  prometheus:
    image: prom/prometheus:v2.7.1
    container_name: prometheus
    volumes:
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    expose:
      - 9090
    networks:
      - monitor-net
      - dockernet
    extra_hosts:
      - "localhost1:10.0.75.1"
    labels:
      org.label-schema.group: "monitoring"
  grafana:
    image: grafana/grafana:5.4.3
    container_name: grafana
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/datasources:/etc/grafana/datasources
      - ./grafana/dashboards:/etc/grafana/dashboards
      - ./grafana/setup.sh:/setup.sh
    entrypoint: /setup.sh
    environment:
      - GF_SECURITY_ADMIN_USER=${ADMIN_USER:-admin}
      - GF_SECURITY_ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: unless-stopped
    expose:
      - 3000
    networks:
      - monitor-net
      - dockernet
    labels:
      org.label-schema.group: "monitoring"
  caddy:
    image: stefanprodan/caddy
    container_name: caddy
    ports:
      - "3000:3000"
      - "9090:9090"
      - "9093:9093"
      - "9091:9091"
    volumes:
      - ./caddy/:/etc/caddy/
    environment:
      - ADMIN_USER=${ADMIN_USER:-admin}
      - ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
    restart: unless-stopped
    networks:
      - monitor-net
      - dockernet
    labels:
      org.label-schema.group: "monitoring"
Here is my prometheus.yml file.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'docker-host-alpha'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  - "alert.rules"

# A scrape configuration containing exactly one endpoint to scrape.
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'myapp'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['docker.for.win.localhost:32771']
  - job_name: 'myapp1'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['docker.for.win.host.internal:51626']
  - job_name: 'myapp2'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['docker.for.win.host.internal.localhost:51626']
  - job_name: 'myapp3'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['docker.for.win.host.localhost:51626']
  - job_name: 'myapp4'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['docker.for.win.localhost:51626']
  - job_name: 'myapp5'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['host.docker.internal:51626']
  - job_name: 'myapp6'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['host.docker.internal.localhost:51626']
  - job_name: 'myapp7'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['docker.for.win.localhost:51626']
  - job_name: 'myapp8'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['127.0.0.1:51626']
  - job_name: 'myapp9'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost:51626']
  - job_name: 'myapp10'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['10.0.75.1:51626']
  - job_name: 'myapp12'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost1:51626']
From what I understand, host.docker.internal should reference my host IP and give me access to my local app, but it didn't. So then I looked up my Docker NAT IP address with ipconfig (the 10.0.75.1 address), and that didn't work either.
Then I tried binding localhost1 to 10.0.75.1 via extra_hosts. I also tried setting up a bridge network called dockernet and connecting that way, and it didn't work. When I launch my app in a Docker container I can reach it through "docker.for.win.localhost:32771", but that container can't access my remote database, so that's why I need the app to run locally.
Prometheus gives the following responses for some of the respective endpoints:
http://docker.for.win.localhost:32771/metrics: UP
http://host.docker.internal:51626/metrics: server returned HTTP status 400 Bad Request
http://docker.for.win.localhost:51626/metrics: server returned HTTP status 400 Bad Request
http://host.docker.internal.localhost:51626/metrics: Get http://host.docker.internal.localhost:51626/metrics: dial tcp: lookup host.docker.internal.localhost on 127.0.0.11:53: no such host
http://docker.for.win.host.internal.localhost:51626/metrics: Get http://docker.for.win.host.internal.localhost:51626/metrics: dial tcp: lookup docker.for.win.host.internal.localhost on 127.0.0.11:53: no such host
I've tried everything and am out of ideas. Can anyone shed some light?
I had a similar problem. I run my own application locally on IIS Express on port 52562, and Prometheus inside the container showed that http://docker.for.win.localhost:52562/metrics returned 400 Bad Request.
The problem was that IIS Express listens only on localhost, so I edited the binding in my applicationhost.config from
<binding protocol="http" bindingInformation="*:52562:localhost" />
to
<binding protocol="http" bindingInformation="*:52562:" />
and restarted IIS Express.
This fixed the problem.
For me, this works:
version: "3"
networks:
sandbox:
driver: bridge
services:
prometheus:
restart: always
image: prom/prometheus:v2.3.2
volumes: ["./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml"]
ports: ["9090:9090"]
extra_hosts: ["host.docker.internal:172.17.0.1"] # from Gateway bridge and added /etc/hosts
networks: ["sandbox"]
grafana:
....
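A matching scrape job for that extra_hosts entry might look like the sketch below; the port is a placeholder for whatever the local service listens on:

scrape_configs:
  - job_name: 'myapp'
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ['host.docker.internal:51626']   # placeholder host port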
You might also check whether your Windows 10 firewall is blocking the connection.
To disable the firewall completely:
netsh advfirewall set allprofiles state off
To allow a connection on a specific port:
New-NetFirewallRule -Protocol TCP -LocalPort 44369 -Direction Inbound -Action Allow -DisplayName "Allow network TCP on port 44369"
