Is it possible to remove a task in docker (docker swarm)?

Suppose I had 3 replicated images:
docker service create --name redis-replica --replicas=3 redis:3.0.6
Consider that there are two nodes connected (including the manager), and running the command docker service ps redis-replica yields this:
ID             NAME              IMAGE         NODE         DESIRED STATE   CURRENT STATE           ERROR   PORTS
x1uumhz9or71   redis-replica.1   redis:3.0.6   worker-vm    Running         Running 9 minutes ago
j4xk9inr2mms   redis-replica.2   redis:3.0.6   manager-vm   Running         Running 8 minutes ago
ud4dxbxzjsx4   redis-replica.3   redis:3.0.6   worker-vm    Running         Running 9 minutes ago
As you can see all tasks are running.
I have a scenario I want to fix:
Suppose I want to remove a redis container on the worker-vm. Currently there are two, but I want to make it one.
I could do this by going into the worker-vm and removing the container with docker rm. This poses a problem, however:
Once docker swarm sees that one of the tasks has gone down, it will immediately spin up another redis container on some node (manager or worker). As a result I will always have 3 tasks.
This is not what I want. Suppose I want to force docker not to respawn another container when one is removed.
Is this currently possible?

In Swarm mode, the orchestrator schedules the tasks for you. The task is the unit of scheduling, and each task invokes exactly one container.
What this means in practice is that you are not supposed to manage tasks manually. Swarm takes care of this for you.
Instead, you describe the desired state of your service. If you have placement preferences you can use the --placement-pref flag in docker service commands; you can specify the number of replicas, etc. E.g.
docker service create \
--replicas 9 \
--name redis_2 \
--placement-pref 'spread=node.labels.datacenter' \
redis:3.0.6
You can limit the set of nodes where the task will be placed using placement constraints (https://docs.docker.com/engine/reference/commandline/service_create/#specify-service-constraints---constraint). Here is an example from the Docker docs:
$ docker service create \
--name redis_2 \
--constraint 'node.labels.type == queue' \
redis:3.0.6
I think that's the closest you can get to controlling where tasks run.
Once you described your placement constraints/preferences, Swarm will make sure that the actual state of your service is in line with the desired state that you described in the create command. You are not supposed to manage any further details after describing the desired state.
If you change the actual state by killing a container for example, Swarm will re-align the state of your service to be in-line with your desired state again. This is what happened when you removed your container.
In order to change the desired state you can use the docker service update command.
The key point is that tasks are not equal to containers. Technically each task invokes exactly one container, but they are not the same thing: a task is like a scheduling slot into which the scheduler places a container.
The Swarm scheduler manages tasks (not you); that's why there is no command like docker task. You drive the mechanism by describing the desired state.
To answer your original question: yes, it is possible to remove a task, and you do it by updating the desired state of your service.
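For example, to go from 3 replicas down to 2 (a sketch; the two commands are equivalent):
docker service scale redis-replica=2
docker service update --replicas 2 redis-replica
Note that scaling down does not let you choose which task is removed. If the goal is specifically to keep tasks off a particular node, add a constraint instead (hostname taken from the question):
docker service update --constraint-add 'node.hostname != worker-vm' redis-replica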

How to spawn an interactive container in an existing docker swarm?

Note: I've tried searching for existing answers in any way I could think of, but I don't believe there's any information out there on how to achieve what I'm after
Context
I have an existing swarm running a bunch of networked services across multiple hosts. The deployment is done via docker-compose build && docker stack deploy. Several of the services contain important state necessary for the functioning of the main service this stack is for, including when interacting with it via CLI.
Goal
How can I create an ad-hoc container within the existing stack running on my swarm for interactive diagnostics and troubleshooting of my main service? The service has a CLI interface, but it needs access to the other components for that CLI to function, thus it needs to be run exactly as if it were a service declared inside docker-compose.yml. Requirements:
I need to run it in an ad-hoc fashion. This is for troubleshooting by an operator, so I don't know when exactly I'll need it
It needs to be interactive, since it's troubleshooting by a human
It needs to be able to run an arbitrary image (usually the image built for the main service and its CLI, but sometimes other diagnostics might be needed through other containers I won't know ahead of time)
It needs to have full access to the network and other resources set up for the stack, as if it were a regular predefined service in it
So far the best I've been able to do is:
Find an existing container running my service's image
SSH into the swarm host on which it's running
docker exec -ti into it to invoke the CLI
This however has a number of downsides:
I don't want to be messing with an already running container; it has an important job I don't want to accidentally interrupt, and its state might be unrelated to what I need to do, so I don't want to corrupt it
It relies on the service image also having the CLI installed. If I want to separate the two, I'm out of luck
It relies on some containers already running. If my service is entirely down and in a restart loop, I'm completely hosed because there's nowhere for me to exec in and run my CLI
I can only exec within the context of what I already have declared and running. If I need something I haven't thought to add beforehand, I'm sadly out of luck
Finding the specific host on which the container is running and going there manually is really annoying
What I really want is a version of docker run I could point to the stack and say "run in there", or docker stack run, but I haven't been able to find anything of the sort. What's the proper way of doing that?
Option 1
Deploy a diagnostic service as part of the stack: a container with useful tools in it, with an entrypoint of tail -f /dev/null. Use a placement constraint to deploy this to a known node.
services:
  diagnostics:
    image: nicolaka/netshoot
    command: tail -f /dev/null
    deploy:
      placement:
        constraints:
          - node.hostname == host1
NB. You do NOT have to deploy this service with your normal stack. It can be in a separate stack.yml file. You can simply stack deploy this file to your stack later, and as long as --prune is not used, the services are cumulative.
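For example (a sketch, assuming the snippet above is saved as diagnostics.yml and your stack is named mystack):
docker stack deploy -c diagnostics.yml mystack
# then, on host1:
docker exec -it $(docker ps -q -f name=mystack_diagnostics) bash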
Option 2
To allow regular containers to access your services, make your network attachable. If you haven't specified the network explicitly, you can just explicitly declare the default network.
networks:
  default:
    driver: overlay
    attachable: true
Now you can use docker run and attach a diagnostic container to the network:
docker -c manager run --rm --network <stack>_default -it nicolaka/netshoot
Option 3
The third option does not address the need to directly access the node running the service, and it does not address the need to have an instance of the service running, but it does allow you to investigate a service without affecting its state and without needing any tooling inside its container.
Start by executing the usual commands to discover the node and container name and id of the service task of interest:
docker service ps ${service} --no-trunc --format '{{.Node}} {{.Name}}.{{.ID}}' --filter desired-state=running
Then, assuming you have docker contexts matching your node names, pick one ${node}, ${container} pair from the list of {{.Node}} and {{.Name}}.{{.ID}} values, and run a container such as ubuntu or netshoot attached to the network namespace of the target container.
docker -c ${node} run --rm -it --network container:${container} nicolaka/netshoot
This container can be used to perform diagnostics in the context of the running service task, and then closed without affecting it.
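For instance, once inside the netshoot shell you can inspect the task's network namespace directly (illustrative commands; both tools ship with the netshoot image):
ss -tlnp              # list the sockets the service task has open
tcpdump -i any -c 20  # capture a few packets on the task's interfaces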

docker container lifecycle confusion

I am new to Docker, and I find the definitions of containers' lifecycle differ a lot.
here is what "Manning.Docker.in.Action.2016.3" shows:
here is what google gives me:
https://medium.com/@nagarwal/lifecycle-of-docker-container-d2da9f85959
here is what the official document says:
status: One of created, restarting, running, removing, paused, exited, or dead
https://docs.docker.com/engine/reference/commandline/ps/
So what's going on here? I guess some new states (and renamings) were introduced in newer versions of Docker?
Thanks in advance
Your linked diagram separates docker create from docker start, includes "die" as a state transition, and shows how to get to the "restarting" state. That's all valid, though it leads to a more complicated state machine.
(docker create wasn't in the very first versions of Docker but it appeared in Docker 1.3.0 in 2014, which should predate your diagram.)
Practically I might suggest an even simpler state machine:
       -------> running -+------> stopped ------>
         run             |  stop             rm
                         |
                         \------> exited ------>
                           process exits     rm
That is, never try to restart a container or make changes inside a running container; if you need to tweak anything, delete the existing container and create a new one. This gives you a consistent environment (when the main container process starts you always know what's in its filesystem, up to mounted data). It also matches what happens in cluster environments like Kubernetes, where the cluster manager will routinely create and delete containers for you.
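A minimal sketch of that recreate-instead-of-restart workflow (container and image names are illustrative):
docker rm -f web 2>/dev/null                # remove the old container, if any
docker run -d --name web -p 8080:80 nginx   # start fresh from a known image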
When you get into a situation where the internet gives you different answers, you should consider trying things yourself, especially with a technology like Docker, where it is pretty simple to run experiments. For example:
I want to run a container (I will use nginx):
docker run -d nginx
docker ps
CONTAINER ID   IMAGE   COMMAND                  CREATED         STATUS         PORTS    NAMES
258cd2edbed8   nginx   "nginx -g 'daemon of…"   3 seconds ago   Up 2 seconds   80/tcp   jolly_golick
Note: docker will keep a container running only if there is a process running in it.
If you started a debian container (for example), you would see it stop immediately, as there is nothing running in it. So you could do
docker run -d debian sleep 10
and see that the container is up for 10 seconds.
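You can watch that transition yourself (a small experiment; the container name is illustrative):
docker run -d --name sleeper debian sleep 10
docker ps       # the container shows as Up
sleep 11
docker ps -a    # the same container now shows as Exited (0)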
While a container is running, you can do some things with it but not others; removing it is one of the latter. To remove a container, you need to stop it first (or kill it), or force container removal.
Note: you would get all this info from Docker itself if you played around with it, as it returns informative errors. For example, if you tried to remove a running container, you would get this error:
Error response from daemon: You cannot remove a running container 258cd2edbed85bed23ab543312968bd893c1fbd9ba81de40366337f434daedff. Stop the container before attempting removal or force remove
I can't cover all possible combinations here. You would get a similar error if you tried removing a paused container. Just play with it, and you will get a clear picture of how it works.
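For example, the two ways out of that error (reusing the nginx container from above):
docker stop 258cd2edbed8 && docker rm 258cd2edbed8   # stop first, then remove
docker rm -f 258cd2edbed8                            # or force removal in one step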

Using docker swarm to execute singular containers rather than "services"

I really enjoy the concept of having a cluster of docker machines available to execute docker services. I also like the additional features not available to singular docker containers (such as docker secret).
But I really have no need for long-standing services. My use case is simply to execute a bash script: have the docker swarm take in an arbitrary number of finite commands and execute each as a running docker container based on the same image, while using the secrets loaded into the swarm with docker secret.
Can I do this?
I do not want to have this container be "long running". I want it to run, and then exit with the output when the bash script loaded into the container is finished.
You can apply the ideas presented in "One-shot containers on Docker Swarm" by Alex Ellis.
You still need to create a service, but with the right restart policy.
For instance, for a quick one-off job (a site crawler here):
docker service create --restart-condition=none --name crawler1 -e url=http://blog.alexellis.io -d crawl_site alexellis2/href-counter
(--restart-condition, not --restart-policy, as commented by ethergeist)
So by setting the restart condition to none, the container will be scheduled somewhere in the swarm as a task. The container will execute, and then, when done, it will exit.
If the container fails to start for a valid reason then the restart policy will mean the application code never executes. It would also be ideal if we could immediately return the exit code (if non-zero) and the accompanying log output, too.
For the last part, use his tool: alexellis/jaas.
Run your first one-shot container:
# jaas -rm -image alexellis2/cows:latest
The -rm flag removes the Swarm service that was used to run your container.
The exit code from your container will also be available, you can check it with echo $?.
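If you would rather stay with plain docker commands, the exit code and logs of a finished one-shot task can be read from the task itself. A sketch, to run on a manager node (the .Status.ContainerStatus.ExitCode format path reflects the task-inspect JSON as I understand it):
task=$(docker service ps crawler1 -q | head -n1)
docker inspect --format '{{.Status.ContainerStatus.ExitCode}}' "$task"
docker service logs crawler1   # the accompanying log output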

Docker: kill/stop/restart a container, parameters maintained?

I run a specific docker image for the first time:
docker run [OPTIONS] image [CMD]
Some of the options I supply include --link (link with other containers) and -p (expose ports)
I noticed that if I kill that container and simply do docker start <container-id>, Docker honors all the options that I specified during the run command including the links and ports.
Is this behavior explicitly documented and can I always count on the start command to reincarnate the container with all the options I supplied in the run command?
Also, I noticed that killing/starting a container which is linked to another container updates the upstream container's /etc/hosts file automatically:
A--(link)-->B (A has an entry in /etc/hosts for B)
If I kill B, B will normally get a new IP address. I notice that when I start B, the entry for B in A's /etc/hosts file is automatically updated... This is very nice.
I read here that --link does not handle container restarts... Has this been updated recently? If not, why am I seeing this behavior?
(I'm using Docker version 1.7.1, build 786b29d)
Yes, things work as you describe :)
You can rely on the behaviour of docker start as it doesn't really "reincarnate" your container; it was always there on disk, just in a stopped state. It will also retain any changes to files, but changes in RAM, such as process state, will be lost. (Note that kill doesn't remove a container; it just stops it with a SIGKILL rather than a SIGTERM. Use docker rm to truly remove a container.)
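You can verify this quickly (a small experiment; names and ports are illustrative):
docker run -d --name web -p 8080:80 nginx
docker kill web
docker start web
docker port web   # still reports 80/tcp -> 0.0.0.0:8080; the run options were kept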
Links are now updated when a container changes IP address due to a restart. This didn't use to be the case. However, that's not what the linked question is about - they are discussing whether you can replace a container with a new container of the same name and have links still work. This isn't possible, but that scenario will be covered by the new networking functionality and "service" objects which is currently in the Docker experimental channel.

Strategies for deciding when to use 'docker run' vs 'docker start' and using the latest version of a given image

I'm dockerizing some of our services. For our dev environment, I'd like to make things as easy as possible for our developers and so I'm writing some scripts to manage the dockerized components. I want developers to be able to start and stop these services just as if they were non-dockerized. I don't want them to have to worry about creating and running the container vs stopping and starting and already-created container. I was thinking that this could be handled using Fig. To create the container (if it doesn't already exist) and start the service, I'd use fig up --no-recreate. To stop the service, I'd use fig stop.
I'd also like to ensure that developers are running containers built from the latest images. In other words, something would check to see if there was a later version of the image in our Docker registry. If so, this image would be downloaded and run to create a new container from that image. At the moment it seems like I'd have to use docker commands to list the contents of the registry (docker search) and compare that to existing local containers (docker ps -a), with the addition of some grepping and awking, or use the Docker API to achieve the same thing.
Any persistent data will be written to mounted volumes so the data should survive the creation of a new container.
This seems like it might be a common pattern so I'm wondering whether anyone else has given these sorts of scenarios any thought.
This is what I've decided to do for now for our Neo4j Docker image:
I've written a shell script around docker run that accepts command-line arguments for the port, the database persistence directory on the host, and the log-file persistence directory on the host. It executes a docker run command that looks like:
docker run --rm -it -p ${port}:7474 -v ${graphdir}:/var/lib/neo4j/data/graph.db -v ${logdir}:/var/log/neo4j my/neo4j
By default port is 7474, graphdir is $PWD/graph.db and logdir is $PWD/log.
--rm removes the container on exit, however the database and logs are maintained on the host's file system. So no containers are left around.
-it allows the container and the Neo4j service running within it to receive signals, so that the service can be gracefully shut down (the Neo4j server shuts down gracefully on SIGINT) and the container exited by hitting ^C, or by sending it a SIGINT if the developer runs it in the background. No need for separate start/stop commands.
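A minimal sketch of such a wrapper, using the defaults described above (argument handling simplified; the real script's interface is up to you):
#!/bin/sh
# thin wrapper around docker run for the Neo4j image
port=${1:-7474}
graphdir=${2:-$PWD/graph.db}
logdir=${3:-$PWD/log}
exec docker run --rm -it \
  -p "${port}:7474" \
  -v "${graphdir}:/var/lib/neo4j/data/graph.db" \
  -v "${logdir}:/var/log/neo4j" \
  my/neo4j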
Although I certainly wouldn't do this in production, I think this is fine for a dev environment.
I am not familiar with fig but your scenario seems good.
Usually, I prefer to kill/delete + run my container instead of playing with start/stop, though. That way, if there is a new image available, Docker will use it. This works only for stateless services; as you are using volumes for persistent data, you could do something like this.
Regarding the image update, what about running docker pull <image> every N minutes and checking the "Status" that the command returns? If it is up to date, then do nothing, otherwise, kill/rerun the container.
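A sketch of that check, relying on the human-readable status line that docker pull prints (image and container names are illustrative):
if docker pull my/neo4j | grep -q 'Downloaded newer image'; then
  docker rm -f neo4j 2>/dev/null
  docker run -d --name neo4j my/neo4j
fi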
