Docker userns-remap gives permission issues within defined range - docker
/etc/subuid
ubuntu:1000:1
ubuntu:165533:65536
/etc/subgid
ubuntu:999:1
ubuntu:165536:65536
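(For reference, the effective mapping can be read from inside any container via /proc/self/uid_map. This is a sketch; the sample output assumes the two subuid entries above are applied in order by the daemon.)

```shell
# Print the user-namespace uid mapping as seen from inside a container.
# Each line reads: <first container uid> <first host uid> <count>.
docker run --rm busybox cat /proc/self/uid_map
# With the /etc/subuid entries above, the output should look roughly like:
#   0      1000        1
#   1    165533    65536
```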
So I am expecting files created by root in the container to map to my username on the host, which avoids permission issues with bind-mounted directories on the host.
This works fine, except when I docker-compose up anchore-engine.
This creates a named volume with these permissions:
The anchore services immediately terminate and exit unless I manually correct the permissions with chown to ubuntu:docker on the _data directory.
I was expecting 166531 to be within the range defined in the subuid file. What's wrong?
docker-compose.yaml

version: '2.1'
volumes:
  anchore-db-volume:
    # Set this to 'true' to use an external volume. In which case, it must be created manually with "docker volume create anchore-db-volume"
    external: false
services:
  # The primary API endpoint service
  api:
    image: anchore/anchore-engine:v0.8.2
    depends_on:
      - db
      - catalog
    ports:
      - "8228:8228"
    logging:
      driver: "json-file"
      options:
        max-size: 100m
    environment:
      - ANCHORE_ENDPOINT_HOSTNAME=api
      - ANCHORE_DB_HOST=db
      - ANCHORE_DB_PASSWORD=mysecretpassword
    command: ["anchore-manager", "service", "start", "apiext"]
  # Catalog is the primary persistence and state manager of the system
  catalog:
    image: anchore/anchore-engine:v0.8.2
    depends_on:
      - db
    logging:
      driver: "json-file"
      options:
        max-size: 100m
    expose:
      - 8228
    environment:
      - ANCHORE_ENDPOINT_HOSTNAME=catalog
      - ANCHORE_DB_HOST=db
      - ANCHORE_DB_PASSWORD=mysecretpassword
    command: ["anchore-manager", "service", "start", "catalog"]
  queue:
    image: anchore/anchore-engine:v0.8.2
    depends_on:
      - db
      - catalog
    expose:
      - 8228
    logging:
      driver: "json-file"
      options:
        max-size: 100m
    environment:
      - ANCHORE_ENDPOINT_HOSTNAME=queue
      - ANCHORE_DB_HOST=db
      - ANCHORE_DB_PASSWORD=mysecretpassword
    command: ["anchore-manager", "service", "start", "simplequeue"]
  policy-engine:
    image: anchore/anchore-engine:v0.8.2
    depends_on:
      - db
      - catalog
    expose:
      - 8228
    logging:
      driver: "json-file"
      options:
        max-size: 100m
    environment:
      - ANCHORE_ENDPOINT_HOSTNAME=policy-engine
      - ANCHORE_DB_HOST=db
      - ANCHORE_DB_PASSWORD=mysecretpassword
    command: ["anchore-manager", "service", "start", "policy_engine"]
  analyzer:
    image: anchore/anchore-engine:v0.8.2
    depends_on:
      - db
      - catalog
    expose:
      - 8228
    logging:
      driver: "json-file"
      options:
        max-size: 100m
    environment:
      - ANCHORE_ENDPOINT_HOSTNAME=analyzer
      - ANCHORE_DB_HOST=db
      - ANCHORE_DB_PASSWORD=mysecretpassword
    volumes:
      - /analysis_scratch
    command: ["anchore-manager", "service", "start", "analyzer"]
  db:
    image: "postgres:9"
    volumes:
      - anchore-db-volume:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=mysecretpassword
    expose:
      - 5432
    logging:
      driver: "json-file"
      options:
        max-size: 100m
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
Logs from one of the stopped containers:
/usr/local/lib/python3.6/site-packages/yosai/core/conf/yosaisettings.py:100: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(stream)
Traceback (most recent call last):
File "/usr/local/bin/twistd", line 11, in <module>
sys.exit(run())
File "/usr/local/lib64/python3.6/site-packages/twisted/scripts/twistd.py", line 31, in run
app.run(runApp, ServerOptions)
File "/usr/local/lib64/python3.6/site-packages/twisted/application/app.py", line 674, in run
runApp(config)
File "/usr/local/lib64/python3.6/site-packages/twisted/scripts/twistd.py", line 25, in runApp
runner.run()
File "/usr/local/lib64/python3.6/site-packages/twisted/application/app.py", line 383, in run
self.logger.start(self.application)
File "/usr/local/lib64/python3.6/site-packages/twisted/application/app.py", line 184, in start
observer = self._observerFactory()
File "/usr/local/lib/python3.6/site-packages/anchore_engine/subsys/twistd_logger.py", line 14, in logger
f = logfile.LogFile(thefile, '/var/log/', rotateLength=10000000, maxRotatedFiles=10)
File "/usr/local/lib64/python3.6/site-packages/twisted/python/logfile.py", line 170, in __init__
BaseLogFile.__init__(self, name, directory, defaultMode)
File "/usr/local/lib64/python3.6/site-packages/twisted/python/logfile.py", line 45, in __init__
self._openFile()
File "/usr/local/lib64/python3.6/site-packages/twisted/python/logfile.py", line 175, in _openFile
BaseLogFile._openFile(self)
File "/usr/local/lib64/python3.6/site-packages/twisted/python/logfile.py", line 85, in _openFile
self._file = open(self.path, "wb+", 0)
PermissionError: [Errno 13] Permission denied: '/var/log/anchore/anchore-api.log'
[MainThread] [anchore_manager.cli.service/start()] [INFO] Loading DB routines from module (anchore_engine)
[MainThread] [anchore_manager.util.db/connect_database()] [INFO] DB params: {"db_connect_args": {"connect_timeout": 86400}, "db_pool_size": 30, "db_pool_max_overflow": 100, "db_echo": false, "db_engine_args": null}
[MainThread] [anchore_manager.util.db/connect_database()] [INFO] DB connection configured: True
[MainThread] [anchore_manager.util.db/connect_database()] [INFO] DB attempting to connect...
[MainThread] [anchore_manager.util.db/connect_database()] [INFO] DB connected: True
[MainThread] [anchore_manager.util.db/init_database()] [INFO] DB compatibility check: running...
[MainThread] [anchore_manager.util.db/init_database()] [INFO] DB compatibility check success
[MainThread] [anchore_manager.util.db/init_database()] [INFO] DB post actions: running...
[MainThread] [anchore_manager.cli.service/start()] [INFO] DB version and code version in sync.
[MainThread] [anchore_manager.cli.service/start()] [INFO] Starting services: ['anchore-api']
[MainThread] [anchore_manager.cli.service/terminate_service()] [INFO] Looking for pre-existing service (anchore-api) pid from pidfile (/var/run/anchore/anchore-api.pid)
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 0/30
[anchore-api] [anchore_manager.cli.service/startup_service()] [INFO] cleaning up service: anchore-api
[anchore-api] [anchore_manager.cli.service/terminate_service()] [INFO] Looking for pre-existing service (anchore-api) pid from pidfile (/var/run/anchore/anchore-api.pid)
[anchore-api] [anchore_manager.cli.service/startup_service()] [INFO] starting service: anchore-api
[anchore-api] [anchore_manager.cli.service/startup_service()] [INFO] /usr/local/bin/twistd --logger=anchore_engine.subsys.twistd_logger.logger --pidfile /var/run/anchore/anchore-api.pid -n anchore-api --config /config
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 1/30
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 2/30
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 3/30
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 4/30
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 5/30
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 6/30
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 7/30
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 8/30
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 9/30
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/anchore_manager/cli/service.py", line 165, in startup_service
raise Exception("process exited: " + str(rc))
Exception: process exited: 1
[anchore-api] [anchore_manager.cli.service/startup_service()] [ERROR] service process exited at (Tue Dec 1 16:30:54 2020): process exited: 1
[anchore-api] [anchore_manager.cli.service/startup_service()] [FATAL] Could not start service due to: process exited: 1
[anchore-api] [anchore_manager.cli.service/startup_service()] [INFO] exiting service thread
[MainThread] [anchore_manager.cli.service/start()] [INFO] waiting for service pidfile /var/run/anchore/anchore-api.pid to exist 10/30
[MainThread] [anchore_manager.cli.service/start()] [INFO] service thread has stopped anchore-api
[MainThread] [anchore_manager.cli.service/start()] [INFO] auto_restart_services setting: False
[MainThread] [anchore_manager.cli.service/start()] [INFO] checking for startup failure pidfile=False, is_alive=False
[MainThread] [anchore_manager.cli.service/start()] [WARN] service start failed - exception: service thread for (anchore-api) failed to start
[MainThread] [anchore_manager.cli.service/start()] [FATAL] one or more services failed to start. cleanly terminating the others
[MainThread] [anchore_manager.cli.service/terminate_service()] [INFO] Looking for pre-existing service (anchore-api) pid from pidfile (/var/run/anchore/anchore-api.pid)
166531 is indeed within the subuid range: it maps to container uid 999 (165533 + 999 - 1 = 166531), which matches the uid of the postgres user within the postgres image:
$ docker run -it --rm --entrypoint /bin/bash postgres:9
root@d99b2bbb3d48:/# ls -al /var/lib/postgresql/data
total 8
drwxrwxrwx 2 postgres postgres 4096 Nov 18 08:39 .
drwxr-xr-x 1 postgres postgres 4096 Nov 18 08:39 ..
root@d99b2bbb3d48:/# id postgres
uid=999(postgres) gid=999(postgres) groups=999(postgres),101(ssl-cert)
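The arithmetic can be sketched in shell (the numbers come from the /etc/subuid entry above; the second entry covers container uids 1..65536):

```shell
# Map a container uid to its host uid under userns-remap, given the
# /etc/subuid entry "ubuntu:165533:65536", which starts at container uid 1
# (container uid 0 is handled by the separate "ubuntu:1000:1" entry).
subuid_start=165533
container_uid=999        # uid of the postgres user in the image
host_uid=$((subuid_start + container_uid - 1))
echo "$host_uid"         # prints 166531
```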
By default, Docker initializes a new/empty named volume to match the contents of the image at the mount path, including file ownership and permissions. This is expected behavior for postgres. We'd need to see the logs of the failing containers to give more details on why you're seeing containers exit.
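This seeding behavior can be observed directly (a sketch; the volume name is illustrative and the host path assumes the default Docker data root):

```shell
# Create a fresh named volume and mount it where the image has content;
# Docker copies the image's files, ownership and permissions into it.
docker volume create demo-db-volume
docker run --rm -v demo-db-volume:/var/lib/postgresql/data postgres:9 \
  ls -nd /var/lib/postgresql/data    # numeric ids: uid/gid 999 (postgres)

# On a userns-remapped host, the same directory viewed from the host side
# shows the shifted uid (166531 with the subuid range from the question):
sudo ls -nd /var/lib/docker/volumes/demo-db-volume/_data
```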
Related
Adguard docker cannot access web ui
I'm working on a Raspberry Pi and trying to set up AdGuard Home with Docker Compose. The first initial setup works fine. When I start the container, I can access the web UI at "HOST:3000". But when I recreate the container, the web UI is no longer accessible. I found out that if I delete the "AdGuardHome.yaml" in the conf folder, it works again until I finish the initial setup again.

My compose file:

version: "3"
services:
  adguard:
    image: adguard/adguardhome:v0.108.0-b.25
    container_name: adguard
    restart: unless-stopped
    ports:
      - 53:53/tcp
      - 53:53/udp
      - 67:67/udp
      - 69:68/udp
      - 80:80/tcp
      - 443:443/tcp
      - 443:443/udp
      - 3000:3000/tcp
      - 853:853/tcp
      - 784:784/udp
      - 853:853/udp
      - 8853:8853/udp
      - 5443:5443/tcp
      - 5443:5443/udp
    environment:
      - TZ=Europe/Berlin
    volumes:
      - /home/pi/homematicDocker/adguard/work:/opt/adguardhome/work
      - /home/pi/homematicDocker/adguard/conf:/opt/adguardhome/conf
    network_mode: host

The container's log:

2023/01/29 06:22:12.243914 [info] AdGuard Home, version v0.108.0-b.25
2023/01/29 06:22:12.244302 [info] AdGuard Home updates are disabled
2023/01/29 06:22:12.253579 [info] tls: using default ciphers
2023/01/29 06:22:12.292132 [info] Initializing auth module: /opt/adguardhome/work/data/sessions.db
2023/01/29 06:22:12.293924 [info] auth: initialized. users:1 sessions:1
2023/01/29 06:22:12.294066 [info] web: initializing
2023/01/29 06:22:12.439396 [info] dnsproxy: cache: enabled, size 4096 b
2023/01/29 06:22:12.439482 [info] MaxGoroutines is set to 300
2023/01/29 06:22:12.442166 [info] AdGuard Home is available at the following addresses:
2023/01/29 06:22:12.447718 [info] Go to http://127.0.0.1:80
2023/01/29 06:22:12.447818 [info] Go to http://[::1]:80
2023/01/29 06:22:12.447854 [info] Go to http://192.168.178.37:80
2023/01/29 06:22:12.447889 [info] Go to http://[2003:f2:670b:5400:fe5f:c7b3:47e9:2db0]:80
2023/01/29 06:22:12.447926 [info] Go to http://[fe80::fac1:92f4:4829:1a7a%eth0]:80
2023/01/29 06:22:12.447961 [info] Go to http://172.17.0.1:80
2023/01/29 06:22:12.447997 [info] Go to http://[fe80::42:aff:fe19:feba%docker0]:80
2023/01/29 06:22:12.448032 [info] Go to http://172.18.0.1:80
2023/01/29 06:22:12.448235 [info] Go to http://[fe80::42:2bff:fee7:ea90%br-83f36fdc3e1b]:80
2023/01/29 06:22:12.448285 [info] Go to http://172.19.0.1:80
2023/01/29 06:22:12.448414 [info] Go to http://[fe80::42:bfff:feef:d231%br-d48134c39c76]:80
2023/01/29 06:22:12.448457 [info] Go to http://[fe80::24b3:79ff:fef6:548a%veth6e52584]:80
2023/01/29 06:22:12.448581 [info] Go to http://[fe80::94bf:17ff:fe2c:62ed%veth6138d5a]:80
2023/01/29 06:22:12.448623 [info] Go to http://[fe80::4ca6:93ff:fe33:c5bb%veth4a0eccf]:80
2023/01/29 06:22:12.449176 [info] Go to http://[fe80::ed15:c73e:7dd0:b08e%veth919a0eb]:80
2023/01/29 06:22:12.449234 [info] Go to http://[fe80::ccb4:3c9e:5dc4:ed0%vethcedf116]:80
2023/01/29 06:22:13.626359 [info] Starting the DNS proxy server
2023/01/29 06:22:13.626421 [info] Ratelimit is enabled and set to 20 rps
2023/01/29 06:22:13.626444 [info] The server is configured to refuse ANY requests
2023/01/29 06:22:13.626526 [info] dnsproxy: cache: enabled, size 4194304 b
2023/01/29 06:22:13.626605 [info] MaxGoroutines is set to 300
2023/01/29 06:22:13.626708 [info] Creating the UDP server socket
2023/01/29 06:22:13.627076 [info] Listening to udp://[::]:53
2023/01/29 06:22:13.627115 [info] Creating a TCP server socket
2023/01/29 06:22:13.627356 [info] Listening to tcp://[::]:53
2023/01/29 06:22:13.627645 [info] Entering the UDP listener loop on [::]:53
2023/01/29 06:22:13.627645 [info] Entering the tcp listener loop on [::]:53
I was able to access the web UI by hostname without any port. So I changed the port in the AdguardHome.yml to 3000, and now it's running as expected.
Mlflow UI can't show artifacts
I have MLflow running on an Azure VM and connected to Azure Blob as the artifact storage. After uploading artifacts to the storage from the client, I tried the MLflow UI and was successfully able to show the uploaded file. The problem happens when I try to run MLflow with Docker; I get the error:

Unable to list artifacts stored under {artifactUri} for the current run. Please contact your tracking server administrator to notify them of this error, which can happen when the tracking server lacks permission to list artifacts under the current run's root artifact directory

Dockerfile:

FROM python:3.7-slim-buster
# Install python packages
RUN pip install mlflow pymysql
RUN pip install azure-storage-blob
ENV AZURE_STORAGE_ACCESS_KEY="#########"
ENV AZURE_STORAGE_CONNECTION_STRING="#######"

docker-compose.yml:

web:
  restart: always
  build: ./mlflow_server
  image: mlflow_server
  container_name: mlflow_server
  expose:
    - "5000"
  networks:
    - frontend
    - backend
  environment:
    - AZURE_STORAGE_ACCESS_KEY="#####"
    - AZURE_STORAGE_CONNECTION_STRING="#####"
  command: mlflow server --backend-store-uri mysql+pymysql://mlflow_user:123456@db:3306/mlflow --default-artifact-root wasbs://etc..

I tried multiple solutions:
- Making sure that boto3 is installed (didn't do anything)
- Adding environment variables in the Dockerfile so the command runs after they're set
- I double-checked the URL of the storage blob

And MLflow doesn't show any logs; it just kills the process and restarts again.
Anyone got any idea what might be the solution, or how can I access the logs? Here are the docker logs of the container:

[2022-07-28 12:23:33 +0000] [10] [INFO] Starting gunicorn 20.1.0
[2022-07-28 12:23:33 +0000] [10] [INFO] Listening at: http://0.0.0.0:5000 (10)
[2022-07-28 12:23:33 +0000] [10] [INFO] Using worker: sync
[2022-07-28 12:23:33 +0000] [13] [INFO] Booting worker with pid: 13
[2022-07-28 12:23:33 +0000] [14] [INFO] Booting worker with pid: 14
[2022-07-28 12:23:33 +0000] [15] [INFO] Booting worker with pid: 15
[2022-07-28 12:23:33 +0000] [16] [INFO] Booting worker with pid: 16
[2022-07-28 12:24:24 +0000] [10] [CRITICAL] WORKER TIMEOUT (pid:14)
[2022-07-28 12:24:24 +0000] [14] [INFO] Worker exiting (pid: 14)
[2022-07-28 12:24:24 +0000] [21] [INFO] Booting worker with pid: 21
Connect consul agent to consul
I've been trying for 2 or 3 days to set up the Consul server and connect an agent to it. I'm using docker-compose. But after performing a join operation, the agent gets the message "Agent not live or unreachable".

Here are the logs:

root@e33a6127103f:/app# consul agent -join 10.1.30.91 -data-dir=/tmp/consul
==> Starting Consul agent...
==> Joining cluster...
    Join completed. Synced with 1 initial agents
==> Consul agent running!
           Version: 'v1.0.1'
           Node ID: '0e1adf74-462d-45a4-1927-95ed123f1526'
         Node name: 'e33a6127103f'
        Datacenter: 'dc1' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 172.17.0.2 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
    2017/12/06 10:44:43 [INFO] serf: EventMemberJoin: e33a6127103f 172.17.0.2
    2017/12/06 10:44:43 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2017/12/06 10:44:43 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2017/12/06 10:44:43 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
    2017/12/06 10:44:43 [INFO] agent: (LAN) joining: [10.1.30.91]
    2017/12/06 10:44:43 [INFO] serf: EventMemberJoin: consul1 172.19.0.2
    2017/12/06 10:44:43 [INFO] consul: adding server consul1 (Addr: tcp/172.19.0.2:8300) (DC: dc1)
    2017/12/06 10:44:43 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2017/12/06 10:44:43 [INFO] agent: started state syncer
    2017/12/06 10:44:43 [WARN] manager: No servers available
    2017/12/06 10:44:43 [ERR] agent: failed to sync remote state: No known Consul servers
    2017/12/06 10:44:54 [INFO] memberlist: Suspect consul1 has failed, no acks received
    2017/12/06 10:44:55 [ERR] consul: "Catalog.NodeServices" RPC failed to server 172.19.0.2:8300: rpc error getting client: failed to get conn: dial tcp <nil>->172.19.0.2:8300: i/o timeout
    2017/12/06 10:44:55 [ERR] agent: failed to sync remote state: rpc error getting client: failed to get conn: dial tcp <nil>->172.19.0.2:8300: i/o timeout
    2017/12/06 10:44:58 [INFO] memberlist: Marking consul1 as failed, suspect timeout reached (0 peer confirmations)
    2017/12/06 10:44:58 [INFO] serf: EventMemberFailed: consul1 172.19.0.2
    2017/12/06 10:44:58 [INFO] consul: removing server consul1 (Addr: tcp/172.19.0.2:8300) (DC: dc1)
    2017/12/06 10:45:05 [INFO] memberlist: Suspect consul1 has failed, no acks received
    2017/12/06 10:45:06 [WARN] manager: No servers available
    2017/12/06 10:45:06 [ERR] agent: Coordinate update error: No known Consul servers
    2017/12/06 10:45:12 [WARN] manager: No servers available
    2017/12/06 10:45:12 [ERR] agent: failed to sync remote state: No known Consul servers
    2017/12/06 10:45:13 [INFO] serf: attempting reconnect to consul1 172.19.0.2:8301
    2017/12/06 10:45:28 [WARN] manager: No servers available
    2017/12/06 10:45:28 [ERR] agent: failed to sync remote state: No known Consul servers
    2017/12/06 10:45:32 [WARN] manager: No servers available

My settings are:

docker-compose SERVER:

consul1:
  image: "consul.1.0.1"
  container_name: "consul1"
  hostname: "consul1"
  volumes:
    - ./consul/config:/config/
  ports:
    - "8400:8400"
    - "8500:8500"
    - "8600:53"
    - "8300:8300"
    - "8301:8301"
  command: "agent -config-dir=/config -ui -server -bootstrap-expect 1"

Please help me solve the problem.
I think you're using the wrong IP address for the consul server in "consul agent -join 10.1.30.91 -data-dir=/tmp/consul". 10.1.30.91 is not a Docker container IP; it is probably your host/VirtualBox address. Get the consul container's IP and use that to join in the consul agent command. For more info about how Consul and agents work, follow the link https://dzone.com/articles/service-discovery-with-docker-and-consul-part-1
Try to get the right IP address by executing this command:

docker inspect <container id> | grep "IPAddress"

where <container id> is the container ID of the consul server. Then use the obtained address instead of "10.1.30.91" in the command:

consul agent -join <IP ADDRESS CONSUL SERVER> -data-dir=/tmp/consul
Consul Empty reply from server
I'm trying to get a consul server cluster up and running. I have 3 dockerized consul servers running, but I can't access the Web UI, the HTTP API nor the DNS. $ docker logs net-sci_discovery-service_consul_1 ==> WARNING: Expect Mode enabled, expecting 3 servers ==> Starting Consul agent... ==> Consul agent running! Version: 'v0.8.5' Node ID: 'ccd38897-6047-f8b6-be1c-2aa0022a1483' Node name: 'consul1' Datacenter: 'dc1' Server: true (bootstrap: false) Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600) Cluster Addr: 172.20.0.2 (LAN: 8301, WAN: 8302) Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false ==> Log data will now stream in as it occurs: 2017/07/07 23:24:07 [INFO] raft: Initial configuration (index=0): [] 2017/07/07 23:24:07 [INFO] raft: Node at 172.20.0.2:8300 [Follower] entering Follower state (Leader: "") 2017/07/07 23:24:07 [INFO] serf: EventMemberJoin: consul1 172.20.0.2 2017/07/07 23:24:07 [INFO] consul: Adding LAN server consul1 (Addr: tcp/172.20.0.2:8300) (DC: dc1) 2017/07/07 23:24:07 [INFO] serf: EventMemberJoin: consul1.dc1 172.20.0.2 2017/07/07 23:24:07 [INFO] consul: Handled member-join event for server "consul1.dc1" in area "wan" 2017/07/07 23:24:07 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp) 2017/07/07 23:24:07 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp) 2017/07/07 23:24:07 [INFO] agent: Started HTTP server on 127.0.0.1:8500 2017/07/07 23:24:09 [INFO] serf: EventMemberJoin: consul2 172.20.0.3 2017/07/07 23:24:09 [INFO] consul: Adding LAN server consul2 (Addr: tcp/172.20.0.3:8300) (DC: dc1) 2017/07/07 23:24:09 [INFO] serf: EventMemberJoin: consul2.dc1 172.20.0.3 2017/07/07 23:24:09 [INFO] consul: Handled member-join event for server "consul2.dc1" in area "wan" 2017/07/07 23:24:10 [INFO] serf: EventMemberJoin: consul3 172.20.0.4 2017/07/07 23:24:10 [INFO] consul: Adding LAN server consul3 (Addr: tcp/172.20.0.4:8300) (DC: dc1) 2017/07/07 23:24:10 [INFO] consul: Found expected number of peers, attempting bootstrap: 
172.20.0.2:8300,172.20.0.3:8300,172.20.0.4:8300
2017/07/07 23:24:10 [INFO] serf: EventMemberJoin: consul3.dc1 172.20.0.4
2017/07/07 23:24:10 [INFO] consul: Handled member-join event for server "consul3.dc1" in area "wan"
2017/07/07 23:24:14 [ERR] agent: failed to sync remote state: No cluster leader
2017/07/07 23:24:17 [WARN] raft: Heartbeat timeout from "" reached, starting election
2017/07/07 23:24:17 [INFO] raft: Node at 172.20.0.2:8300 [Candidate] entering Candidate state in term 2
2017/07/07 23:24:17 [INFO] raft: Election won. Tally: 2
2017/07/07 23:24:17 [INFO] raft: Node at 172.20.0.2:8300 [Leader] entering Leader state
2017/07/07 23:24:17 [INFO] raft: Added peer 172.20.0.3:8300, starting replication
2017/07/07 23:24:17 [INFO] raft: Added peer 172.20.0.4:8300, starting replication
2017/07/07 23:24:17 [INFO] consul: cluster leadership acquired
2017/07/07 23:24:17 [INFO] consul: New leader elected: consul1
2017/07/07 23:24:17 [WARN] raft: AppendEntries to {Voter 172.20.0.3:8300 172.20.0.3:8300} rejected, sending older logs (next: 1)
2017/07/07 23:24:17 [WARN] raft: AppendEntries to {Voter 172.20.0.4:8300 172.20.0.4:8300} rejected, sending older logs (next: 1)
2017/07/07 23:24:17 [INFO] raft: pipelining replication to peer {Voter 172.20.0.3:8300 172.20.0.3:8300}
2017/07/07 23:24:17 [INFO] raft: pipelining replication to peer {Voter 172.20.0.4:8300 172.20.0.4:8300}
2017/07/07 23:24:18 [INFO] consul: member 'consul1' joined, marking health alive
2017/07/07 23:24:18 [INFO] consul: member 'consul2' joined, marking health alive
2017/07/07 23:24:18 [INFO] consul: member 'consul3' joined, marking health alive
2017/07/07 23:24:20 [INFO] agent: Synced service 'consul'
2017/07/07 23:24:20 [INFO] agent: Synced service 'messaging-service-kafka'
2017/07/07 23:24:20 [INFO] agent: Synced service 'messaging-service-zookeeper'

$ curl http://127.0.0.1:8500/v1/catalog/service/consul
curl: (52) Empty reply from server

$ dig @127.0.0.1 -p 8600 consul.service.consul

; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 consul.service.consul
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

$ dig @127.0.0.1 -p 8600 messaging-service-kafka.service.consul

; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 messaging-service-kafka.service.consul
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

I can't get my services to register via the HTTP API either; those shown above are registered using a config script when the container launches. Here's my docker-compose.yml:

version: '2'
services:
  consul1:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_1"
    hostname: "consul1"
    ports:
      - "8400:8400"
      - "8500:8500"
      - "8600:8600"
    volumes:
      - ./etc/consul.d:/etc/consul.d
    command: "agent -server -ui -bootstrap-expect 3 -config-dir=/etc/consul.d -bind=0.0.0.0"
  consul2:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_2"
    hostname: "consul2"
    command: "agent -server -join=consul1"
    links:
      - "consul1"
  consul3:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_3"
    hostname: "consul3"
    command: "agent -server -join=consul1"
    links:
      - "consul1"

I'm relatively new to both Docker and Consul. I've had a look around the web, and the options above reflect my understanding of what is required. Any suggestions on the way forward would be very welcome.

Edit: Result of docker container ps --all:

$ docker container ps --all
CONTAINER ID  IMAGE                   COMMAND                 CREATED         STATUS         PORTS                                                                                                           NAMES
e0a1c3bba165  consul:latest           "docker-entrypoint..."  38 seconds ago  Up 36 seconds  8300-8302/tcp, 8500/tcp, 8301-8302/udp, 8600/tcp, 8600/udp                                                      net-sci_discovery-service_consul_3
7f05555e81e0  consul:latest           "docker-entrypoint..."  38 seconds ago  Up 36 seconds  8300-8302/tcp, 8500/tcp, 8301-8302/udp, 8600/tcp, 8600/udp                                                      net-sci_discovery-service_consul_2
9e2dedaa224b  consul:latest           "docker-entrypoint..."  39 seconds ago  Up 38 seconds  0.0.0.0:8400->8400/tcp, 8301-8302/udp, 0.0.0.0:8500->8500/tcp, 8300-8302/tcp, 8600/udp, 0.0.0.0:8600->8600/tcp  net-sci_discovery-service_consul_1
27b34c5dacb7  messagingservice_kafka  "start-kafka.sh"        3 hours ago     Up 3 hours     0.0.0.0:9092->9092/tcp                                                                                          net-sci_messaging-service_kafka
0389797b0b8f  wurstmeister/zookeeper  "/bin/sh -c '/usr/..."  3 hours ago     Up 3 hours     22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp                                                              net-sci_messaging-service_zookeeper

Edit: Updated docker-compose.yml to use the long format for ports:

version: '3.2'
services:
  consul1:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_1"
    hostname: "consul1"
    ports:
      - target: 8400
        published: 8400
        mode: host
      - target: 8500
        published: 8500
        mode: host
      - target: 8600
        published: 8600
        mode: host
    volumes:
      - ./etc/consul.d:/etc/consul.d
    command: "agent -server -ui -bootstrap-expect 3 -config-dir=/etc/consul.d -bind=0.0.0.0 -client=127.0.0.1"
  consul2:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_2"
    hostname: "consul2"
    command: "agent -server -join=consul1"
    links:
      - "consul1"
  consul3:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_3"
    hostname: "consul3"
    command: "agent -server -join=consul1"
    links:
      - "consul1"
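One detail worth noting: dig queries over UDP by default, while the short `8600:8600` mapping publishes only TCP (the docker ps output above confirms 8600/udp is not published). A minimal sketch of publishing both protocols, an addition not present in the original compose file:

```yaml
# Publish Consul's DNS port on both TCP and UDP so host-side dig queries
# (which use UDP by default) can reach the container.
ports:
  - "8600:8600"
  - "8600:8600/udp"
```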
From the Consul Web GUI documentation: make sure you have launched the agent with the -ui parameter. The UI is available at the /ui path on the same port as the HTTP API; by default that is http://localhost:8500/ui. I do see 8500 mapped to your host on all interfaces (0.0.0.0). Also check (as in this answer) whether setting client_addr helps, at least for testing.
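A minimal sketch of the suggested change, assuming the official consul image and the flags from the question. `-client=0.0.0.0` (the command-line form of the `client_addr` setting) binds the HTTP and DNS listeners to all container interfaces, so the ports published to the host actually get a reply instead of only answering on the container's own loopback:

```yaml
services:
  consul1:
    image: "consul:latest"
    ports:
      - "8500:8500"
      - "8600:8600"
      - "8600:8600/udp"
    # -client controls which addresses the HTTP/DNS listeners bind to;
    # 0.0.0.0 answers on all container interfaces, not just 127.0.0.1.
    command: "agent -server -ui -bootstrap-expect 3 -config-dir=/etc/consul.d -bind=0.0.0.0 -client=0.0.0.0"
```

Note that `-client=127.0.0.1` (as in the updated compose file in the question) has the opposite effect: the listeners bind to the container's loopback only, which is unreachable through a published port.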
Consul docker - advertise flag ignored
Hi, I have configured a cluster with two nodes (two VMs in VirtualBox). The cluster starts correctly, but the advertise flag seems to be ignored by Consul.

vm1 (app) ip 192.168.20.10
vm2 (web) ip 192.168.20.11

docker-compose vm1 (app):

version: '2'
services:
  appconsul:
    build: consul/
    ports:
      - 192.168.20.10:8300:8300
      - 192.168.20.10:8301:8301
      - 192.168.20.10:8301:8301/udp
      - 192.168.20.10:8302:8302
      - 192.168.20.10:8302:8302/udp
      - 192.168.20.10:8400:8400
      - 192.168.20.10:8500:8500
      - 172.32.0.1:53:53/udp
    hostname: node_1
    command: -server -advertise 192.168.20.10 -bootstrap-expect 2 -ui-dir /ui
    networks:
      net-app:
  appregistrator:
    build: registrator/
    hostname: app
    command: consul://192.168.20.10:8500
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
    depends_on:
      - appconsul
    networks:
      net-app:
networks:
  net-app:
    driver: bridge
    ipam:
      config:
        - subnet: 172.32.0.0/24

docker-compose vm2 (web):

version: '2'
services:
  webconsul:
    build: consul/
    ports:
      - 192.168.20.11:8300:8300
      - 192.168.20.11:8301:8301
      - 192.168.20.11:8301:8301/udp
      - 192.168.20.11:8302:8302
      - 192.168.20.11:8302:8302/udp
      - 192.168.20.11:8400:8400
      - 192.168.20.11:8500:8500
      - 172.33.0.1:53:53/udp
    hostname: node_2
    command: -server -advertise 192.168.20.11 -join 192.168.20.10
    networks:
      net-web:
  webregistrator:
    build: registrator/
    hostname: web
    command: consul://192.168.20.11:8500
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
    depends_on:
      - webconsul
    networks:
      net-web:
networks:
  net-web:
    driver: bridge
    ipam:
      config:
        - subnet: 172.33.0.0/24

After startup there is no error about the advertise flag, but the services are registered with the private IPs of the internal networks instead of the IPs declared in advertise (192.168.20.10 and 192.168.20.11). Any idea? Attaching the log of node_1; node_2's log is the same:

appconsul_1 | ==> WARNING: Expect Mode enabled, expecting 2 servers
appconsul_1 | ==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
appconsul_1 | ==> Starting raft data migration...
appconsul_1 | ==> Starting Consul agent...
appconsul_1 | ==> Starting Consul agent RPC...
appconsul_1 | ==> Consul agent running!
appconsul_1 |          Node name: 'node_1'
appconsul_1 |         Datacenter: 'dc1'
appconsul_1 |             Server: true (bootstrap: false)
appconsul_1 |        Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 53, RPC: 8400)
appconsul_1 |       Cluster Addr: 192.168.20.10 (LAN: 8301, WAN: 8302)
appconsul_1 |     Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
appconsul_1 |              Atlas: <disabled>
appconsul_1 |
appconsul_1 | ==> Log data will now stream in as it occurs:
appconsul_1 |
appconsul_1 |     2017/06/13 14:57:24 [INFO] raft: Node at 192.168.20.10:8300 [Follower] entering Follower state
appconsul_1 |     2017/06/13 14:57:24 [INFO] serf: EventMemberJoin: node_1 192.168.20.10
appconsul_1 |     2017/06/13 14:57:24 [INFO] serf: EventMemberJoin: node_1.dc1 192.168.20.10
appconsul_1 |     2017/06/13 14:57:24 [INFO] consul: adding server node_1 (Addr: 192.168.20.10:8300) (DC: dc1)
appconsul_1 |     2017/06/13 14:57:24 [INFO] consul: adding server node_1.dc1 (Addr: 192.168.20.10:8300) (DC: dc1)
appconsul_1 |     2017/06/13 14:57:25 [ERR] agent: failed to sync remote state: No cluster leader
appconsul_1 |     2017/06/13 14:57:25 [ERR] agent: failed to sync changes: No cluster leader
appconsul_1 |     2017/06/13 14:57:26 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
appconsul_1 |     2017/06/13 14:57:48 [ERR] agent: failed to sync remote state: No cluster leader
appconsul_1 |     2017/06/13 14:58:13 [ERR] agent: failed to sync remote state: No cluster leader
appconsul_1 |     2017/06/13 14:58:22 [INFO] serf: EventMemberJoin: node_2 192.168.20.11
appconsul_1 |     2017/06/13 14:58:22 [INFO] consul: adding server node_2 (Addr: 192.168.20.11:8300) (DC: dc1)
appconsul_1 |     2017/06/13 14:58:22 [INFO] consul: Attempting bootstrap with nodes: [192.168.20.10:8300 192.168.20.11:8300]
appconsul_1 |     2017/06/13 14:58:23 [WARN] raft: Heartbeat timeout reached, starting election
appconsul_1 |     2017/06/13 14:58:23 [INFO] raft: Node at 192.168.20.10:8300 [Candidate] entering Candidate state
appconsul_1 |     2017/06/13 14:58:23 [WARN] raft: Remote peer 192.168.20.11:8300 does not have local node 192.168.20.10:8300 as a peer
appconsul_1 |     2017/06/13 14:58:23 [INFO] raft: Election won. Tally: 2
appconsul_1 |     2017/06/13 14:58:23 [INFO] raft: Node at 192.168.20.10:8300 [Leader] entering Leader state
appconsul_1 |     2017/06/13 14:58:23 [INFO] consul: cluster leadership acquired
appconsul_1 |     2017/06/13 14:58:23 [INFO] consul: New leader elected: node_1
appconsul_1 |     2017/06/13 14:58:23 [INFO] raft: pipelining replication to peer 192.168.20.11:8300
appconsul_1 |     2017/06/13 14:58:23 [INFO] consul: member 'node_1' joined, marking health alive
appconsul_1 |     2017/06/13 14:58:23 [INFO] consul: member 'node_2' joined, marking health alive
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_solr_1:8983'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8302'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8302:udp'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8301'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8500'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8300'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'consul'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_mysql_1:3306'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8400'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:53:udp'
appconsul_1 |     2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8301:udp'

Thanks for any reply.

UPDATE: I tried removing the networks section from the compose file, but the problem persisted. I resolved it by switching to compose file v1; this configuration works:

compose vm1 (app):

appconsul:
  build: consul/
  ports:
    - 192.168.20.10:8300:8300
    - 192.168.20.10:8301:8301
    - 192.168.20.10:8301:8301/udp
    - 192.168.20.10:8302:8302
    - 192.168.20.10:8302:8302/udp
    - 192.168.20.10:8400:8400
    - 192.168.20.10:8500:8500
    - 172.32.0.1:53:53/udp
  hostname: node_1
  command: -server -advertise 192.168.20.10 -bootstrap-expect 2 -ui-dir /ui
appregistrator:
  build: registrator/
  hostname: app
  command: consul://192.168.20.10:8500
  volumes:
    - /var/run/docker.sock:/tmp/docker.sock
  links:
    - appconsul

compose vm2 (web):

webconsul:
  build: consul/
  ports:
    - 192.168.20.11:8300:8300
    - 192.168.20.11:8301:8301
    - 192.168.20.11:8301:8301/udp
    - 192.168.20.11:8302:8302
    - 192.168.20.11:8302:8302/udp
    - 192.168.20.11:8400:8400
    - 192.168.20.11:8500:8500
    - 172.33.0.1:53:53/udp
  hostname: node_2
  command: -server -advertise 192.168.20.11 -join 192.168.20.10
webregistrator:
  build: registrator/
  hostname: web
  command: consul://192.168.20.11:8500
  volumes:
    - /var/run/docker.sock:/tmp/docker.sock
  links:
    - webconsul
The problem is the version of the compose file: v2 and v3 both show the same behaviour, because they place the containers on a user-defined bridge network. It works only with compose file v1.
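If staying on compose v2 is preferred, one alternative worth trying (an assumption on my part, not part of the answer above) is to pin the address that registrator publishes using its -ip flag, so the registered address no longer depends on which network the container sits on:

```yaml
# Sketch for vm1; 192.168.20.10 is the host IP from the question.
appregistrator:
  build: registrator/
  hostname: app
  # registrator's -ip flag forces the address it registers for mapped
  # ports, instead of an address inferred from the container network.
  command: -ip 192.168.20.10 consul://192.168.20.10:8500
  volumes:
    - /var/run/docker.sock:/tmp/docker.sock
```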