I am managing a swarm using a compose.yml as follows.
compose.yml
version: '3'
services:
  web:
    image: nginx
    depends_on:
      - db
      - api
    deploy:
      replicas: 2
      update_config:
        parallelism: 2
        delay: 10s
  db:
    image: mysql
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: main
      MYSQL_USER: root
      MYSQL_PASSWORD: password
    deploy:
      placement:
        constraints: [node.role == manager]
  api:
    image: node
    depends_on:
      - db
    deploy:
      replicas: 2
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
nginx.conf
upstream test-web {
    server web:5000 fail_timeout=5s max_fails=5;
}
proxy_pass http://test-web;
The problem I am having is that the startup order of the Docker containers is random, as shown below.
Unexpected startup order
docker stack deploy --compose-file compose.yml blue --with-registry-auth
Creating network main_default
Creating service main_web
Creating service main_db
Creating service main_api
Expected startup order
docker stack deploy --compose-file compose.yml blue --with-registry-auth
Creating network main_default
Creating service main_db
Creating service main_api
Creating service main_web
If the web container starts before the api container, nginx fails with a "host not found" error for api, because the web container does not yet know that the api container exists.
So I'm investigating ways to work around this problem at the following layers.
nginx
Is there an nginx option to keep retrying (or at least to start) even if the api hostname cannot be resolved? (See the first sketch after this list.)
docker-compose
Is there a way to reliably control the startup order of the containers other than links and depends_on? (See the second sketch after this list.)
supervisor
I currently start the processes inside the containers via supervisor.
Is there a supervisor option to keep retrying nginx even if the api container is not found and nginx exits with an error?
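For the nginx layer, one common workaround is to resolve the upstream name at request time instead of at startup. A minimal sketch (not from the question; it assumes Docker's embedded DNS at 127.0.0.11 and reuses the web:5000 address from the config above):

resolver 127.0.0.11 valid=10s;   # Docker's embedded DNS inside the overlay network

server {
    listen 80;

    location / {
        # Using a variable forces nginx to resolve "web" per request,
        # so nginx can start even while that service does not exist yet.
        set $upstream_web http://web:5000;
        proxy_pass $upstream_web;
    }
}

Note that with a variable in proxy_pass the upstream block (and its fail_timeout/max_fails tuning) is bypassed, so this trades static upstream settings for startup resilience.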
Thank you for reading my question.
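For the docker-compose layer, note that swarm mode ignores depends_on, so a common alternative is to make each container tolerate a missing dependency by retrying in its entrypoint. A rough sketch, assuming the images contain a POSIX shell and nc; the script name and the example command are hypothetical:

#!/bin/sh
# wait-and-start.sh: retry until host:port answers, then exec the real command.
HOST="$1"; PORT="$2"; shift 2
until nc -z "$HOST" "$PORT"; do
  echo "waiting for $HOST:$PORT..."
  sleep 2
done
exec "$@"

For example, the api service could use command: ["./wait-and-start.sh", "db", "3306", "node", "server.js"], where server.js stands in for whatever the api image actually runs.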
Related
I have a Docker Swarm cluster that hosts my Rails app and Sidekiq as separate containers.
The API application writes an incoming uploaded file into the public folder and sends the path to the Sidekiq worker, which uploads it to S3. I used Docker volume mapping for this.
Because of this dependency, I need a Sidekiq container running on every node where my API application is running.
Is there any way to tell Swarm to deploy a Sidekiq container whenever it deploys an API container on a new node?
Or is there a workaround that avoids the volume-mapping dependency in the first place?
My docker-stack.yml
version: "3.9"
services:
app:
image: rails_app
command: bundle exec rails s -e production
ports:
- 8000:8000
volumes:
- app-assets:/app/public/assets
networks:
- my-network
deploy:
replicas: 6
placement:
constraints:
- "node.role==worker"
update_config:
parallelism: 2
delay: 10s
restart_policy:
condition: on-failure
delay: 5s
worker:
image: rails_app
command: bundle exec sidekiq -c 2 -e production
networks:
- my-network
volumes:
- app-assets:/app/public/assets
deploy:
replicas: 6
placement:
constraints:
- "node.role==worker"
restart_policy:
condition: on-failure
delay: 5s
networks:
my-network:
volumes:
app-assets:
Even after 3 days of googling, I was not able to find any such configuration for Docker Swarm, but I was able to solve this bottleneck by using NFS for the volume mapping.
More info on NFS: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nfs-mount-on-ubuntu-16-04
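For reference, a sketch of what the NFS-backed volume can look like in the stack file; the NFS server address and export path below are placeholders, not values from the answer:

volumes:
  app-assets:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.10,rw,nfsvers=4"   # placeholder NFS server address and mount options
      device: ":/exports/app-assets"     # placeholder export path

With this, every node mounts the same share, so the API task can write a file and the Sidekiq task on any other node can read it.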
I am new to Docker and trying to build a Hadoop cluster with Docker Swarm. I tried building it with Docker Compose and it worked perfectly. However, I would like to add other services such as Hive, Spark, and HBase to it in the future, so Swarm seems like a better idea.
When I tried to run it with a version 3.7 YAML file, the namenode and datanodes started successfully. But when I visited the web UI, it showed that there are no nodes available on the "Datanodes" tab (nor on the "Overview" tab). It seems the datanodes failed to connect to the namenode. I checked the ports on each node with netstat -tuplen and both 7946 and 4789 appeared to be working fine.
Here is the yaml file I used:
version: "3.7"
services:
namenode:
image: flokkr/hadoop:latest
hostname: namenode
networks:
- hbase
command: ["hdfs","namenode"]
ports:
- target: 50070
published: 50070
- target: 9870
published: 9870
environment:
- NAMENODE_INIT=hdfs dfs -chmod 777 /
- ENSURE_NAMENODE_DIR=/tmp/hadoop-hadoop/dfs/name
env_file:
- ./compose-config
deploy:
mode: replicated
replicas: 1
restart_policy:
condition: on-failure
placement:
constraints:
- node.role == manager
datanode:
image: flokkr/hadoop:latest
networks:
- hbase
command: ["hdfs","datanode"]
env_file:
- ./compose-config
deploy:
mode: global
restart_policy:
condition: on-failure
volumes:
namenode:
datanode:
networks:
hbase:
name: hbase
Basically I just updated the YAML file from this repo to version 3.7 and tried to run it on GCP. Here is my repo in case you want to reproduce the case.
(Screenshots of the port status on the manager node and on the worker node are omitted here.)
Thank you for your help!
It seems to be a network-related issue: the containers are up and running, but they are not registering in your web UI, so the network communication is probably not getting through between them. Check your internal firewall rules and the OS firewall, and run some network tests on the specific ports.
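As a concrete starting point, here is a sketch of opening the ports swarm needs between nodes, assuming firewalld on the hosts (on GCP a matching VPC firewall rule between the instances is also required):

# Ports Docker Swarm uses between nodes (per the Docker docs):
#   2377/tcp  cluster management, 7946/tcp+udp  node-to-node gossip, 4789/udp  overlay (VXLAN) traffic
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp
firewall-cmd --permanent --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload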
I have two servers to use in a Docker Swarm cluster (testing only): one is a manager and the other is a worker. But when I run docker stack deploy --compose-file docker-compose.yml teste2, all the services run on the manager and the worker does not receive any containers. For some reason Swarm is not distributing the services across the cluster and is running everything on the manager server.
Could my docker-compose.yml be causing the problem, or might it be a network problem?
Here are some settings:
Servers: CentOS 7, Docker version 18.09.4;
I executed the commands systemctl stop firewalld && systemctl disable firewalld to disable the firewall;
I executed the command docker swarm join --token ... on the worker;
Result of docker node ls:
ID                            HOSTNAME              STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
993dko0vu6vlxjc0pyecrjeh0 *   name.server.manager   Ready    Active         Leader           18.09.4
2fn36s94wjnu3nei75ymeuitr     name.server.worker    Ready    Active                          18.09.4
File docker-compose.yml:
version: "3"
services:
web:
image: testehello
deploy:
replicas: 5
update_config:
parallelism: 2
delay: 10s
restart_policy:
condition: on-failure
# placement:
# constraints: [node.role == worker]
ports:
- 4000:80
networks:
- webnet
visualizer:
image: dockersamples/visualizer:stable
ports:
- 8080:8080
stop_grace_period: 1m30s
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
deploy:
placement:
constraints: [node.role == manager]
networks:
webnet:
I executed the command docker stack deploy --compose-file docker-compose.yml teste2
In the docker-compose.yml I commented out the placement and constraints parameters because with them the containers did not start on the servers at all; without them the containers do start, but only on the manager. In the visualizer they all appear on the manager.
I think the image is not accessible from the worker node, which is why it does not receive containers. Try this guide by Docker: https://docs.docker.com/engine/swarm/stack-deploy/
P.S. I think you solved it already, but just in case.
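A rough sketch of that guide's workflow (the 127.0.0.1:5000 address assumes the throwaway registry the guide runs as a swarm service; substitute your own registry):

# Build the image on the manager, push it to a registry every node can reach,
# and make docker-compose.yml reference the pushed image instead of the local "testehello".
docker build -t 127.0.0.1:5000/testehello:latest .
docker push 127.0.0.1:5000/testehello:latest
docker stack deploy --compose-file docker-compose.yml --with-registry-auth teste2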
I'm currently trying to deploy an application with Docker Swarm on 3 virtual machines. I'm using docker-compose to build the image; my files are the following:
Dockerfile:
FROM openjdk:8-jdk-alpine
WORKDIR /home
ARG JAR_FILE
ARG PORT
VOLUME /tmp
COPY ${JAR_FILE} /home/app.jar
EXPOSE ${PORT}
ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","/home/app.jar"]
and my docker-compose is:
version: '3'
services:
  service_colpensiones:
    build:
      context: ./colpensiones-servicio
      dockerfile: Dockerfile
      args:
        JAR_FILE: ColpensionesServicio.jar
        PORT: 8082
    volumes:
      - data:/home
    ports:
      - 8082:8082
volumes:
  data:
I'm using the command docker-compose up -d --build to build the image; it automatically creates a container, which is deleted later. To use Docker Swarm I use the 3 machines, one manager and two workers, and I have another file to deploy the service with replicas:
version: '3'
services:
  service_colpensiones:
    image: deploy_lyra_colpensiones_service_colpensiones
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
      restart_policy:
        condition: on-failure
    volumes:
      - data:/home
    ports:
      - 8082:8082
    networks:
      - webnet
  visualizer:
    image: dockersamples/visualizer:stable
    ports:
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    deploy:
      placement:
        constraints: [node.role == manager]
    networks:
      - webnet
networks:
  webnet:
volumes:
  data:
So far I think everything is fine: with the command docker service ls I can see the services created, and the visualizer dockersamples/visualizer:stable shows me the nodes correctly on port 8080. But when I make a request to the service URL as follows:
curl -4 http://192.168.99.100:8082/colpensiones/msg
the error appears:
curl: (7) Failed to connect to 192.168.99.100 port 8082: Connection refused
(A screenshot showing the images of the service is omitted here.)
I am following the docker tutorial: Get Started https://docs.docker.com/get-started/part5/
I hope you can help, thanks.
I had the same issue, but fixed it after changing the port mapping of the Spring Boot service to
ports:
  - "8082:8080"
The actual issue is that the Tomcat server listens on port 8080 by default, not the port specified in the compose file. I also increased the memory limit.
FYI: the internal (container) port can be the same for other containers as well, so using 8080 as the internal port for both the Spring Boot container and the visualizer container is not a problem.
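Alternatively, here is a sketch of making the application itself listen on 8082 instead of remapping the port; it assumes the service is a standard Spring Boot app, which honors the SERVER_PORT environment variable:

services:
  service_colpensiones:
    image: deploy_lyra_colpensiones_service_colpensiones
    environment:
      SERVER_PORT: 8082    # embedded Tomcat binds to 8082 instead of its default 8080
    ports:
      - "8082:8082"        # published and internal ports now really match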
I also faced the same issue with my application. I rebuilt my app after removing the -Djava.security.egd=file:/dev/./urandom Java command-line property from the Dockerfile, and it started working for me.
Please check docker service logs #containerid# (to see the container IDs, run docker stack ps #servicename#) for whichever task served your request at that time, and see if there is any error message.
PS: I recently started with Docker, so this might not be expert advice; just in case it helps.
I need to set the service mode to global while using compose files.
Is there any way to set this in a compose file?
I have a requirement that, for a given service, there should be exactly one container on every node/host.
This doesn't happen with Swarm's "spread" strategy: if a node goes down and comes back up, Swarm just rebalances to an equal number of containers on each host, irrespective of the service.
https://github.com/docker/compose/issues/3743
We can do this easily now with the version 3 compose file format, under the deploy section's mode key.
Prerequisites:
Docker Compose version 1.10.0+
Docker Engine version 1.13.0+
Example compose file:
version: "3"
services:
nginx:
image: nexus3.example.com/prd-nginx-sm:v1
ports:
- "80:80"
networks:
- cheers
volumes:
- logs:/rest/out/
deploy:
mode: global
labels:
feature.description: "Frontend"
update_config:
parallelism: 1
delay: 10s
restart_policy:
condition: any
command: "/usr/sbin/nginx"
networks:
cheers:
volumes:
logs:
data:
Deploy the compose file:
$ docker stack deploy -c sm-deploy-compose.yml --with-registry-auth CHEERS
This will deploy the nginx container on every node participating in the cluster.
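To verify the result (service name taken from the stack above):

$ docker service ls                 # the MODE column should show "global" for CHEERS_nginx
$ docker service ps CHEERS_nginx    # exactly one task should be listed per node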