docker stack deploy leaves old service around after update

I've been deploying stacks to swarms with the start-first option for quite a while now.
So given the following api.yml file:
version: '3.4'
services:
  api:
    image: registry.gitlab.com/myproj/api:${VERSION}
    deploy:
      update_config:
        order: start-first
I would run the following command against a swarm manager:
env VERSION=x.y.z docker stack deploy -f api.yml api
This worked fine: the old service kept serving requests until the new one was fully available, and only then was it torn down and put into the shutdown state.
Recently - and I believe this started with Docker v17.12.0-ce or v18.01.0-ce, or perhaps I just didn't notice before - the old service sometimes isn't stopped correctly.
When that happens it hangs around and keeps serving requests, leaving us running a mix of old and new versions side by side indefinitely.
This happens both on swarms where the service is replicated and on one that runs it with scale=1.
What's worse, I cannot even kill the old containers. Here's what I've tried:
docker service rm api_api
docker stack rm api && docker stack deploy -f api.yml api
docker rm -f <container id>
Nothing allows me to get rid of the 'zombie' container. In fact docker rm -f <container id> even locks up and simply sits there.
The only way I've found to get rid of them is to restart the node. Thanks to replication I can actually afford to do that without downtime but it's not great for various reasons, least of which is what may happen if another manager were to go down while I do that.
Has anyone else seen this behaviour? What might be the cause and how could I debug this?

Try setting max_replicas_per_node (1 if you only need one replica per node) in the placement section.
Refer to https://docs.docker.com/compose/compose-file/compose-file-v3/
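A minimal sketch of that suggestion applied to the compose file above (max_replicas_per_node requires compose file format 3.8 or later; note that on a single-node swarm it can conflict with order: start-first, since the new task briefly runs alongside the old one):

version: '3.8'
services:
  api:
    image: registry.gitlab.com/myproj/api:${VERSION}
    deploy:
      replicas: 1
      placement:
        max_replicas_per_node: 1   # never schedule two replicas on the same node
      update_config:
        order: start-first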


How to spawn an interactive container in an existing docker swarm?

Note: I've tried searching for existing answers every way I could think of, but I don't believe there's any information out there on how to achieve what I'm after.
Context
I have an existing swarm running a bunch of networked services across multiple hosts. The deployment is done via docker-compose build && docker stack deploy. Several of the services contain important state necessary for the functioning of the main service this stack is for, including when interacting with it via CLI.
Goal
How can I create an ad-hoc container within the existing stack running on my swarm for interactive diagnostics and troubleshooting of my main service? The service has a CLI interface, but it needs access to the other components for that CLI to function, thus it needs to be run exactly as if it were a service declared inside docker-compose.yml. Requirements:
I need to run it in an ad-hoc fashion. This is for troubleshooting by an operator, so I don't know when exactly I'll need it
It needs to be interactive, since it's troubleshooting by a human
It needs to be able to run an arbitrary image (usually the image built for the main service and its CLI, but sometimes other diagnostics might be needed through other containers I won't know ahead of time)
It needs to have full access to the network and other resources set up for the stack, as if it were a regular predefined service in it
So far the best I've been able to do is:
Find an existing container running my service's image
SSH into the swarm host on which it's running
docker exec -ti into it to invoke the CLI
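As commands, that workaround looks roughly like this (the service name, node, and CLI command are all placeholders):

docker service ps mystack_myservice --filter desired-state=running   # find the node running a task
ssh <node from the output above>
docker ps --filter label=com.docker.swarm.service.name=mystack_myservice   # find the container
docker exec -ti <container id> my-cli   # hypothetical CLI command shipped in the image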
This however has a number of downsides:
I don't want to mess with an already-running container: it has an important job I don't want to accidentally interrupt, and its state might be unrelated to what I need to do - I don't want to corrupt it
It relies on the service image also having the CLI installed. If I want to separate the two, I'm out of luck
It relies on some containers already running. If my service is entirely down and in a restart loop, I'm completely hosed because there's nowhere for me to exec in and run my CLI
I can only exec within the context of what I already have declared and running. If I need something I haven't thought to add beforehand, I'm sadly out of luck
Finding the specific host on which the container is running and going there manually is really annoying
What I really want is a version of docker run I could point to the stack and say "run in there", or docker stack run, but I haven't been able to find anything of the sort. What's the proper way of doing that?
Option 1
Deploy a diagnostic service as part of the stack: a container with useful tools in it, with an entrypoint of tail -f /dev/null. Use a placement constraint to deploy it to a known node.
services:
  diagnostics:
    image: nicolaka/netshoot
    command: tail -f /dev/null
    deploy:
      placement:
        constraints:
          - node.hostname == host1
NB. You do NOT have to deploy this service with your normal stack. It can be in a separate stack.yml file. You can simply stack deploy this file to your stack later, and as long as --prune is not used, the services are cumulative.
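Once it's running, you can get a shell in it on the pinned node; a sketch, assuming it was deployed in a stack named tools:

ssh host1
docker ps --filter name=tools_diagnostics   # find the task's container
docker exec -it <container id> bash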
Option 2
To allow regular containers to access your services, make your network attachable. If you haven't specified the network explicitly, you can just declare the default network explicitly:
networks:
  default:
    driver: overlay
    attachable: true
Now you can use docker run to attach a diagnostic container to the network:
docker -c manager run --rm --network <stack>_default -it nicolaka/netshoot
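If you're unsure of the network's exact name, list it first; compose/stack prefixes the default network with the stack name:

docker -c manager network ls --filter name=<stack>_default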
Option 3
The third option does not address the need to directly access the node running the service, nor the need to have an instance of the service running, but it does allow you to investigate a service without affecting its state and without needing tooling inside the container.
Start by executing the usual commands to discover the node and container name and id of the service task of interest:
docker service ps ${service} --no-trunc --format '{{.Node}} {{.Name}}.{{.ID}}' --filter desired-state=running
Then, assuming you have Docker contexts matching your node names, pick a ${node} and ${container} pair from the {{.Node}} and {{.Name}}.{{.ID}} columns and run a container such as ubuntu or netshoot, attaching it to the network namespace of the target container:
docker -c ${node} run --rm -it --network container:${container} nicolaka/netshoot
This container can be used to perform diagnostics in the context of the running service task, and then closed without affecting it.
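A hypothetical end-to-end run, with made-up service, node, and task names for illustration:

docker service ps api_api --no-trunc --format '{{.Node}} {{.Name}}.{{.ID}}' --filter desired-state=running
# -> node-3 api_api.1.abc123def456
docker -c node-3 run --rm -it --network container:api_api.1.abc123def456 nicolaka/netshoot
# tools like ss, dig, and curl now operate inside the task's network namespace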

Docker Compose "Ghost Containers"

I am using docker-compose to deploy an application combining a number of different images.
Using Docker version 18.09.2, build 6247962
Docker-compose 1.117
Primarily, I have
ZooKeeper
Kafka
MYSQLDb
I noticed a strange problem where I could not start my application with docker-compose up due to a port already being assigned. I then checked docker stats and saw three containers named:
"test_ZooKeeper.1slehgaior"
"test_Kafka.kgjdorgsr"
"test_MYSQLDB.kgjdorgsr"
I have tried killing the containers, removing them, and pruning the system. Whenever I kill one of these containers, it instantly restarts, and I cannot for the life of me determine where they are being created from!
Please help :)
If you look into your docker-compose.yaml, I'm pretty sure you'll find a restart: always somewhere. If you want to correctly shut down a running docker container managed by docker-compose, one way is to run docker-compose down from the directory where your yaml sits.
More information on the subject:
https://docs.docker.com/config/containers/start-containers-automatically/
Otherwise, you might try stopping a single running container instead of killing it; if I remember correctly, that tells Docker not to restart it again, while a killed container looks to the daemon like it just crashed. Not too sure about the last part, though.
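For reference, the restart policy in question looks like this in a compose file (the service name is just an example):

services:
  zookeeper:
    image: zookeeper
    restart: always   # the docker daemon itself keeps restarting this container,
                      # independently of docker-compose, until it is explicitly
                      # stopped (docker stop / docker-compose down)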

Docker error: Cannot start service ...: network 7808732465bd529e6f20e4071115218b2826f198f8cb10c3899de527c3b637e6 not found

When starting a docker container (not developed by me), docker says a network has not been found.
Does this mean the problem is within the container itself (so only the developer can fix it), or is it possible to change some network configuration to fix this?
I'm assuming you're using docker-compose and seeing this error. I'd recommend
docker-compose up --force-recreate <name>
That should recreate the containers as well as supporting services such as the network in question (it will likely create a new network).
Shut down properly first, then restart:
docker-compose down
docker-compose up
I was facing a similar issue and this worked for me:
Run docker container ls -a and remove the stale entries with docker container rm ca877071ac10 (substitute your own container id).
The problem was that some old container instances had not been removed. Once all the old terminated instances are removed, you can start the containers with your docker-compose file.
This can be caused by an old service that has not been killed. Add the --remove-orphans flag when bringing down your containers to remove any leftover services, then bring everything back up:
docker-compose down --remove-orphans
docker-compose up
This is based on this answer.
In my case, the steps that produced the error were:
Server restart; containers from a docker-compose stack remained stopped.
Network prune ran, so the networks associated with the stack's containers were deleted.
Running docker-compose --project-name "my-project" up -d failed with the error described in this topic.
Solved by simply adding --force-recreate:
docker-compose --project-name "my-project" up -d --force-recreate
This presumably works because the containers are recreated and linked to the freshly recreated network (the old one having been pruned, as described in the preconditions).
Apparently a VPN was causing this. Turning off the VPN and resetting Docker to factory settings solved the problem on two computers in our company. A third, personal computer that did not have the VPN never showed the problem.
Amongst other things, docker system prune will remove 'all networks not used by at least one container', allowing them to be recreated on the next docker-compose up.
More narrowly, docker network prune can also be used.
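If you'd rather remove only the offending network by hand, a sketch (using a prefix of the id from the error message above):

docker network ls                      # list networks and spot the stale one
docker network inspect 7808732465bd   # shows which containers still reference it
docker network rm 7808732465bd        # remove it once nothing legitimate is attached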

How to rebuild and update a container without downtime with docker-compose?

I enjoy a lot using docker-compose.
E.g. on my server, when I want to update my app with minor changes, I only need git pull origin master && docker-compose restart, which works perfectly.
But sometimes, I need to rebuild (eg. I added an npm dependency, need to run npm install again).
In this case, I do docker-compose build --no-cache && docker-compose restart.
I would expect this to :
create a new instance of my container
stop the existing container (after the newer has finished building)
start the new one
optionally remove the old one, but this could be done manually
But in practice it seems to restart the former one again.
Is it the expected behavior?
How can I handle a rebuild and start the new one after it is built?
Maybe I missed a specific command? Or would it make sense to have it?
From the manual for docker-compose restart:
If you make changes to your docker-compose.yml configuration, these changes will not be reflected after running this command.
You should be able to do:
docker-compose up -d --no-deps --build <service_name>
The --no-deps will not start linked services.
The problem is that restart will restart your current containers, which is not what you want.
As an example, I just did this:
change the Dockerfile for one of the images
call docker-compose build to build the images
call docker-compose down[1] and docker-compose up
docker-compose restart will NOT work here
using docker-compose start instead also does not work
To be honest, I'm not completely sure you need to do a down first, but that should be easy to check.[1] The bottom line is that you need to call up. You will see the containers of unchanged images restarting, but for the changed image you'll see recreating.
The advantage of this over just calling up --build is that you can see the build process before you restart.
[1]: from the comments; down is not needed, you can just call up --build. down has some downsides, including possibly being destructive to your (volume) data.
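For reference, the sequence from that list as plain commands:

docker-compose build   # rebuild the images first, watching the build output
docker-compose up -d   # then recreate only the containers whose image changed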
Use the --build flag to the up command, along with the -d flag to run your containers in the background:
docker-compose up -d --build
This will rebuild all images defined in your compose file, then restart any containers whose images have changed.
-d assumes that you don't want to keep everything running in your shell foreground. This makes it act more like restart, but it's not required.
Don't manage your application environment directly. Use a deployment tool like Rancher or Kubernetes. With one of those you can upgrade your dockerized application without any downtime, and even downgrade it should you need to.
Running Rancher is as easy as running another Docker container, as the tool is available on Docker Hub.
You can use Swarm. Initialize swarm first with the docker swarm init command and use a healthcheck in docker-compose.yml.
Then run below command:
docker stack deploy -c docker-compose.yml project_name
instead of
docker-compose up -d.
When the docker-compose.yml file is updated, just run this command again:
docker stack deploy -c docker-compose.yml project_name
Docker Swarm will create the new version of each service and stop the old version after that.
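A minimal sketch of such a compose file; the image name and health-check command are placeholders (the check assumes curl exists in the image):

version: '3.4'
services:
  web:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 10s
      timeout: 5s
      retries: 3
    deploy:
      update_config:
        order: start-first   # bring the new task up before stopping the old one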
Though the accepted answer works for rebuilding the container before starting the new one as a replacement, and is fine for simple use cases, the container will still be down while the new one initializes. If initialization takes long, that can be an issue.
I managed to achieve rolling updates with docker-compose (along with a nginx reverse proxy), and detailed how I built that in this github issue: https://github.com/docker/compose/issues/1786#issuecomment-579794865
Hope it can help!
Run the following commands:
docker-compose pull
docker-compose up -d --no-deps --build <service_name>
As the top-rated answer mentioned,
docker-compose up -d --no-deps --build <service_name>
will restart a single service without taking down the whole compose project.
I just wanted to add to the top answer in case anyone is unsure how to update an image without restarting the container: the docker-compose pull step fetches the newer image first.
Another way:
docker-compose restart in your case could be replaced with docker-compose up -d --force-recreate, see https://docs.docker.com/compose/reference/up/
Running docker-compose up while the project is already up will recreate any containers whose configuration has changed.
That's the easiest way, and it will only affect containers whose configuration changed.
root@docker:~# docker-compose up
traefik is up-to-date
nginx is up-to-date
Recreating php ... done

My docker data-only container is empty

The scenario I most feared has happened: my data-only docker container is suddenly empty.
This is not serious: it's a development machine and I have back-up. But I fear this most because I know that I still have holes in my understanding of Docker.
I have read in this answer the following:
Docker containers will persist on disk until they are explicitly deleted with docker rm.
Here are the containers I'm interested in (from a docker ps command):
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
478e59ecd218 dockerlocal_mongo_instance "/entrypoint.sh mongo" About an hour ago Exited (137) 12 minutes ago dockerlocal_mongo_instance_1
0ca49f6629cb tianon/true "/true" 3 hours ago Exited (0) About an hour ago dockerlocal_mongo_data_1
I have 1) a mongo container which references the data-only container, and 2) the data-only container itself. I recently ran docker rm a couple of times on the mongo container dockerlocal_mongo_instance_1, which references the data-only container.
I can see from the output of the docker ps command (above) that the data-only container was created '3 hours ago'. But I created it about two weeks ago; somehow my original one is gone. My question is: how could this happen? What other possibilities are there?
I have checked my bash command history and the docker rm command was run only on the mongo container, not on the data-only container - which for obvious reasons I have been extremely careful not to touch.
Can anyone shed any light on this? I must have misunderstood something fundamental here.
I would be grateful for any other possible scenarios that could cause the data-container to be trashed and re-created in this way.
Docker compose .yml file (relevant bits):
mongo_data:
  image: tianon/true
  volumes:
    - /data/db

mongo_instance:
  build: mongodb
  volumes_from:
    - mongo_data
  ports:
    - "27017:27017"
  environment:
    - MONGODB_USER=$S_USER_NAME
    - MONGODB_PASS=$S_USER_PASSWORD
  # command: --auth
There are a couple of things you need to understand.
A data container doesn't need to be, and shouldn't be, running. It's really just a namespace for a set of volumes that can then be referred to from other containers. In your case, every time you start up your application, the data container will start, run true, then shut down. It would be better if you just created the container once and never ran it again.
Docker Compose defines the running services that make up your application. It has a lot of logic related to deciding when to recreate containers or reuse existing ones, which at some stage has decided to recreate your data container (I'm not sure why in this case). You should only put stuff in Compose that does not need to be persisted. Also note that Compose will attempt to copy volumes from old containers to new ones, which can cause confusion if you're not expecting it.
In your case, the solution is to define the data container outside Compose e.g:
docker run --name mongo_data mongodb echo "Data Container"
This will run the echo command then immediately exit. You can then remove the mongo_data entry from the Compose yaml. Note that I have intentionally used the mongodb image rather than tianon/true; as a data container isn't left running, it won't take up any extra space and using the mongodb image ensures file permissions etc are correct.
If you ever ran docker-compose rm (or, worse docker-compose rm -f), that would have deleted all extant containers defined in your docker-compose.yml. Note that, even if you only meant to start the mongo_instance container with docker-compose up mongo_instance, the mongo_data container would have been created as well, as mongo_instance depends on mongo_data, and so doing docker-compose rm while mongo_instance was around would delete both containers.
The answer for me in the end was to forget about using a docker data container and just set a normal volume on the mongo container.
mongo_instance:
  build: mongodb
  volumes:
    - /data/db:/data/db
  ports:
    - "27017:27017"
  environment:
    - MONGODB_USER=$S_USER_NAME
    - MONGODB_PASS=$S_USER_PASSWORD
Why?
The main reason to use Docker Compose, in my view, is portability: you run all your docker commands in one place, which means a simple installation. So if using a docker data container means moving its creation out of the .yml file, I am complicating everything - I'd first have to create the data container and then run docker-compose. I would much prefer to just upload my .yml file and run it.
I also ran into problems creating the data container on its own before calling docker-compose. You are advised to reuse your own images when creating data containers rather than using small third-party images like tianon/true. But my own images only get created when I run docker-compose, and I haven't run it yet - a chicken-and-egg situation.
I tried creating a data container using tianon/true (and others) and I always ran into permission problems.
So I have removed the data container and used a volume parameter on the mongo_instance. I hope this also solves my original problem ...
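For completeness, a named volume achieves the same persistence without binding to a host path; a minimal sketch, assuming compose file format v2+ (which introduced top-level named volumes):

version: '2'
services:
  mongo_instance:
    build: mongodb
    volumes:
      - mongo_data:/data/db   # docker-managed volume; survives container recreation
volumes:
  mongo_data: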
