Question
With the Ansible docker_compose module, is it possible to perform a docker-compose pull and/or docker-compose build without actually starting the services?
What have I tried?
I've attempted:
- name: Build & pull services
  become: yes
  docker_compose:
    project_src: "{{ installation_path }}"
    build: yes
    state: present
    stopped: yes
but this seems to start the services as well (even though I have stopped: yes).
Use case
The actual situation is that starting the services causes port conflicts with existing processes. So the idea is to:
Stop the conflicting processes
Start the docker services
The problem is that one of these processes is the one that resolves DNS queries, so stopping the processes and then starting the docker services leads to an attempt to fetch the docker images from the docker registry, which fails with a DNS resolution error.
My idea was to:
Pull every necessary image
Stop the conflicting processes
Start the docker services
According to this Github issue this is not possible and will likely remain so in the near future, given that the docker_* modules are not actively maintained.
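In the meantime, one workaround is to bypass the module and shell out to docker-compose from Ansible. A minimal sketch (task names are illustrative, and this loses the module's change reporting):

- name: Pull service images without starting them
  become: yes
  command: docker-compose pull
  args:
    chdir: "{{ installation_path }}"

- name: Build service images without starting them
  become: yes
  command: docker-compose build
  args:
    chdir: "{{ installation_path }}"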
Related
I have a docker-compose file with 4 services. Services 1, 2 and 3 are job executors. Service 4 is the job scheduler. After the scheduler has finished running all its jobs on the executors, it returns 0 and terminates. However, the executor services still need to be shut down. With standard docker-compose this is easy: just use the "--exit-code-from" option:
Terminate docker compose when test container finishes
However when a version 3.0+ compose file is deployed via Docker Stack, I see no equivalent way to wait for 1 service to complete and then terminate all remaining services. https://docs.docker.com/engine/reference/commandline/stack/
A few possible approaches are discussed here -
https://github.com/moby/moby/issues/30942
The solution from miltoncs seems reasonable at first:
https://github.com/moby/moby/issues/30942#issuecomment-540699206
The concept suggested is to query service status every second with docker stack ps, then remove all services with docker stack rm once they are done. I'm not sure how all the constant stack ps traffic would scale with thousands of jobs running in a cluster. Could it bog down the ingress network?
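As a rough sketch of that polling approach (stack and service names are placeholders, and the loop assumes the scheduler task is already in the Running state when it starts):

while docker stack ps mystack --filter name=mystack_scheduler \
      --format '{{.CurrentState}}' | grep -q Running; do
  sleep 1        # poll once per second, as suggested in the issue
done
docker stack rm mystack   # scheduler finished; tear down the remaining services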
Does anyone have experience / success with this or similar solutions?
Note: I've tried searching for existing answers in any way I could think of, but I don't believe there's any information out there on how to achieve what I'm after.
Context
I have an existing swarm running a bunch of networked services across multiple hosts. The deployment is done via docker-compose build && docker stack deploy. Several of the services contain important state necessary for the functioning of the main service this stack is for, including when interacting with it via CLI.
Goal
How can I create an ad-hoc container within the existing stack running on my swarm for interactive diagnostics and troubleshooting of my main service? The service has a CLI interface, but it needs access to the other components for that CLI to function, thus it needs to be run exactly as if it were a service declared inside docker-compose.yml. Requirements:
I need to run it in an ad-hoc fashion. This is for troubleshooting by an operator, so I don't know when exactly I'll need it
It needs to be interactive, since it's troubleshooting by a human
It needs to be able to run an arbitrary image (usually the image built for the main service and its CLI, but sometimes other diagnostics might be needed through other containers I won't know ahead of time)
It needs to have full access to the network and other resources set up for the stack, as if it were a regular predefined service in it
So far the best I've been able to do is:
Find an existing container running my service's image
SSH into the swarm host on which it's running
docker exec -ti into it to invoke the CLI
This however has a number of downsides:
I don't want to be messing with an already running container: it has an important job I don't want to accidentally interrupt, and its state might be unrelated to what I need to do, so I don't want to corrupt it
It relies on the service image also having the CLI installed. If I want to separate the two, I'm out of luck
It relies on some containers already running. If my service is entirely down and in a restart loop, I'm completely hosed because there's nowhere for me to exec in and run my CLI
I can only exec within the context of what I already have declared and running. If I need something I haven't thought to add beforehand, I'm sadly out of luck
Finding the specific host on which the container is running and going there manually is really annoying
What I really want is a version of docker run I could point to the stack and say "run in there", or docker stack run, but I haven't been able to find anything of the sort. What's the proper way of doing that?
Option 1
Deploy a diagnostic service as part of the stack - a container with useful tools in it, with an entrypoint of tail -f /dev/null - and use a placement constraint to deploy it to a known node.
services:
  diagnostics:
    image: nicolaka/netshoot
    command: tail -f /dev/null
    deploy:
      placement:
        constraints:
          - node.hostname == host1
NB. You do NOT have to deploy this service with your normal stack. It can be in a separate stack.yml file. You can simply stack deploy this file to your stack later, and as long as --prune is not used, the services are cumulative.
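For example, deploying and then using the diagnostics service might look like this (file and stack names are illustrative):

docker stack deploy -c diagnostics.yml mystack
# then, on host1, exec into the task's container:
docker exec -it $(docker ps -q -f name=mystack_diagnostics) bash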
Option 2
To allow regular containers to access your services, make your network attachable. If you haven't specified the network explicitly, you can just explicitly declare the default network.
networks:
  default:
    driver: overlay
    attachable: true
Now you can use docker run to attach a diagnostic container to the network:
docker -c manager run --rm --network <stack>_default -it nicolaka/netshoot
Option 3
The third option does not address the need to directly access the node running the service, and it does not address the need to have an instance of the service running, but it does allow you to investigate a service without affecting its state and without needing tooling in the container.
Start by executing the usual commands to discover the node and container name and id of the service task of interest:
docker service ps ${service} --no-trunc --format '{{.Node}} {{.Name}}.{{.ID}}' --filter desired-state=running
Then, assuming you have docker contexts matching your node names, pick one ${node} and ${container} pair from the {{.Node}} and {{.Name}}.{{.ID}} columns, and run a container such as ubuntu or netshoot attached to the network namespace of the target container.
docker -c ${node} run --rm -it --network container:${container} nicolaka/netshoot
This container can be used to perform diagnostics in the context of the running service task, and then closed without affecting it.
I am using docker-compose to deploy an application combining a number of different images.
Using Docker version 18.09.2, build 6247962
Docker-compose 1.117
Primarily, I have
ZooKeeper
Kafka
MYSQLDb
I noticed a strange problem where I could not start my application with docker-compose up due to a port already being assigned. I then checked docker stats and saw that there were three containers named:
"test_ZooKeeper.1slehgaior"
"test_Kafka.kgjdorgsr"
"test_MYSQLDB.kgjdorgsr"
I have tried killing the containers, removing them, and pruning the system. Whenever I kill one of these containers, it instantly restarts, and I cannot for the life of me determine where they are being created from!
Please help :)
If you look into your docker-compose.yaml, I'm pretty sure you'll find a restart: always somewhere. If you want to correctly shut down a running docker container managed by docker-compose, one way is to use docker-compose down from the directory where your yaml sits.
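For example (the project path is illustrative):

cd /path/to/project   # the directory containing docker-compose.yaml
docker-compose down   # stops and removes the containers defined in that file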
More information on the subject:
https://docs.docker.com/config/containers/start-containers-automatically/
Otherwise, you might try stopping a single running container instead of killing it, which, if I remember correctly, tells docker not to restart it again, while a killed container looks to the daemon like it has just crashed. Not too sure about the last part though.
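For reference, the two commands differ like this (container name taken from the question):

docker stop test_Kafka.kgjdorgsr   # graceful: SIGTERM, then SIGKILL after a grace period
docker kill test_Kafka.kgjdorgsr   # immediate SIGKILL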
I have a swarm stack deployed. I removed a couple of services from the stack and tried to deploy them again. These services now show with desired state Remove and current state Preparing, and their names changed from the custom service names to random docker names. Swarm also keeps trying to start these services, which are likewise stuck in Preparing. I ran docker system prune on all nodes and then removed the stack. All the services in the stack are gone except for the random ones. Now I can't delete them, and they are still in the Preparing state. The services are not running anywhere in the swarm, but I want to know if there is a way to remove them.
I had the same problem. Later I found that the current state Preparing indicates that docker is trying to pull images from Docker Hub, but there is no clear indicator of this in docker service logs <serviceName> (available with compose file versions above 3.1).
The pull can take a while due to network bandwidth or other docker-internal reasons.
Hope it helps! I will update the answer if I find more relevant information.
P.S. I found that docker stack deploy -c <your-compose-file> <appGroupName> was not actually stuck: when I switched the command to docker-compose up, it took 20+ minutes to download my image for some reason.
So this suggests there are no open issues with docker stack deploy itself.
Adding a reference from Christian to round out this answer.
Use docker-machine ssh to connect to a particular machine:
docker-machine ssh <nameOfNode/Machine>
Your prompt will change. You are now inside another machine. Inside this other machine do this:
tail -f /var/log/docker.log
You'll see the "daemon" log for that machine. There you'll see if that particular daemon is doing the "pull" or what's is doing as part of the service preparation. In my case, I found something like this:
time="2016-09-05T19:04:07.881790998Z" level=debug msg="pull progress map[progress:[===========================================> ] 112.4 MB/130.2 MB status:Downloading
This made me realise that it was just downloading some images from my docker account.
I've been deploying stacks to swarms with the start-first option for quite a while now.
So given the following api.yml file:
version: '3.4'
services:
  api:
    image: registry.gitlab.com/myproj/api:${VERSION}
    deploy:
      update_config:
        order: start-first
I would run the following command against a swarm manager:
env VERSION=x.y.z docker stack deploy -f api.yml api
This worked fine - the old service kept serving requests until the new one was fully available. Only then would it be torn down and enter shutdown state.
Recently - and I believe this started happening with docker v17.12.0-ce or v18.01.0-ce, or perhaps I just didn't notice it before - the old service sometimes isn't correctly stopped.
When that happens it hangs around and keeps serving requests, resulting in us running a mix of old and new versions side by side indefinitely.
This happens both on swarms that have the service replicated but also on one that runs it with scale=1.
What's worse, I cannot even kill the old containers. Here's what I've tried:
docker service rm api_api
docker stack rm api && docker stack deploy -f api.yml api
docker rm -f <container id>
Nothing allows me to get rid of the 'zombie' container. In fact docker rm -f <container id> even locks up and simply sits there.
The only way I've found to get rid of them is to restart the node. Thanks to replication I can actually afford to do that without downtime but it's not great for various reasons, least of which is what may happen if another manager were to go down while I do that.
Has anyone else seen this behaviour? What might be the cause and how could I debug this?
Try setting max_replicas_per_node (1 if you only need one replica per node) in the placement section.
Refer to https://docs.docker.com/compose/compose-file/compose-file-v3/
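A minimal sketch of what that looks like (max_replicas_per_node requires compose file version 3.8 or later; the service mirrors the question's api.yml):

version: '3.8'
services:
  api:
    image: registry.gitlab.com/myproj/api:${VERSION}
    deploy:
      replicas: 2   # illustrative replica count
      placement:
        max_replicas_per_node: 1   # at most one replica of this service per node
      update_config:
        order: start-first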