How to prevent removal of unused Docker container images in a DC/OS cluster?

We are running our application on a DC/OS cluster in Azure Container Service. The Docker image of our Marathon app is approximately 7 GB. I know this is against best practices, but let's keep that debate aside for this question. We pull the latest image on worker nodes, which takes around 20 minutes. If no running container on a node currently uses this image, it gets deleted from that node by some routine cleanup job.
Is there a way to prevent this from happening?

The amount of time to wait before Docker containers are removed can be set using the --docker_remove_delay flag (this is a Mesos agent option):
--docker_remove_delay=VALUE The amount of time to wait before removing docker containers (e.g., 3days, 2weeks, etc). (default: 6hrs)
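A minimal sketch of raising the delay on a DC/OS agent, assuming your installation lets you pass extra Mesos flags through the agent's environment file (the file path and service name below are assumptions; adjust them for your setup):

    # On each worker node: Mesos agent flags can be supplied as MESOS_* env vars.
    # Append the flag to the agent's extra-config file (path is an assumption).
    echo 'MESOS_DOCKER_REMOVE_DELAY=2weeks' | sudo tee -a /var/lib/dcos/mesos-slave-common

    # Restart the agent so the new flag takes effect.
    sudo systemctl restart dcos-mesos-slave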

To prevent accidental deletion (or modification) of an Azure resource, you can create a lock that prevents users from deleting or modifying the resource while the lock is in place (even if they have the permissions to delete/modify it).
For more details, refer to "Lock resources to prevent unexpected changes".
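For example, a CanNotDelete lock can be created with the Azure CLI (the lock and resource group names below are placeholders):

    # Prevent deletion of the resource group holding the cluster nodes.
    az lock create --name protect-acs-nodes \
        --lock-type CanNotDelete \
        --resource-group my-acs-resource-group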

Related

Is there a way to set the "--rm" option for a docker container deployed in a GCP compute instance?

I'm admittedly very new to Docker, so this might be a dumb question, but here goes.
I have a Python ETL script that I've packaged in a Docker container, essentially following this tutorial. Then, using Cloud Functions and Cloud Scheduler, I have the instance start every hour, run the sync, and shut down again.
I've run into an issue, though: after this process has been running for a while, the VM runs out of hard drive space. The script doesn't require any storage or persistence of state; it pulls any state data from external systems and only uses temporary files, which are supposed to be deleted when the machine shuts down.
This has caused particular problems where updates I make to the script stop working because the machine doesn't have the space to download the latest version of the container.
I'm guessing it's either logs or perhaps files created automatically to try to persist state, either within the Docker container or on the VM.
I'm wondering whether, if I could get the VM to run the container with the "--rm" flag so that the container was removed when it finished, this would solve the problem. It would theoretically guarantee that I'm always starting with the most recent image.
The trouble is, I can't for the life of me find a way to configure the "--rm" option within the instance settings, and the documentation for container options only covers passing arguments to the container ENTRYPOINT, not the docker run options: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
I feel like I'm either missing something obvious or it's not designed to be used this way. Is this something that can be configured in the Dockerfile or is there a different way I have to set up the VM in the first place?
Basically I just want the docker image to be pulled fresh and run each time and not leave any remnants on the VM that will slowly run out of space.
Also, I know Cloud Run might work in some similar situations but I need the script to be able to run for as long as it needs to (particularly at the start when it's backfilling data) and so the 15 minute cap on runtime would be a problem.
Any suggestions would be appreciated!
Note: I'm posting this as an answer as I need more space than a comment. If anyone feels it is not a good answer and wants it deleted, I will be delighted to do so.
Recapping the story: we have a Compute Engine instance configured to start a Docker container. The instance runs the container and then we stop it. An hour later we restart it, let it run, and then we stop it again. This continues on into the future. What we seem to find is that the disk associated with the Compute Engine instance fills up and things end up breaking. The thinking is that the container within the instance is created at first launch and then, each time the instance is restarted, the same container is being "re-used" as opposed to a brand-new container instance being created. This means that resources consumed by the container from one run to the next (e.g., disk storage) continue to grow.
What we would like to happen is that when the Compute Engine starts, it will always create a brand new instance of the container with no history / resource usage of the past. This means that we won't consume resources over time.
One way to achieve this outside of GCP would be to start the container through Docker with the "--rm" flag. This means that when the container ends, it will be auto-deleted and hence there will be no previous container to start the next time the Compute Engine starts. Again ... this is a recap.
If we dig through how GCP Compute Engine instances work as they relate to containers, we come across a package called "Konlet". This is the package responsible for loading the container on the Compute Engine instance. It appears to itself be a Docker container application written in Go. It appears to read the metadata associated with the Compute Engine instance and, based on that, performs API calls to Docker to launch the target container. The first thing to see from this is that the launch of the target Docker container does not appear to be executed through a simple docker command line. This then implies that we can't "simply" edit a script.
Konlet is open source so in principle, we could study it in detail and see if there are special flags associated with it to achieve the equivalent of --rm. However, my immediate recommendation is to post an issue at the Konlet GitHub site and ask the author whether there is a --rm equivalent option for Konlet and, if not, could one be added (and if not, what is the higher level thinking).
In the meantime, let me offer you an alternative to your story. If I am hearing you correctly, every hour you fire a job to start a Compute Engine instance, do work, and then shut down the instance. This instance hosts your "leaky" Docker container. What if, instead of starting/stopping your Compute Engine instance, you created/destroyed it each cycle? While the creation/destruction steps may take a little longer to run, given that you are running this once an hour, a minute or two of delay might not be egregious.
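A rough sketch of that create/destroy cycle with the gcloud CLI (the instance name, image, zone, and project are placeholders):

    # Tear down the previous instance, if any (name/zone are placeholders).
    gcloud compute instances delete etl-runner --zone us-central1-a --quiet || true

    # Create a fresh instance that runs the container from scratch,
    # so no container state survives between runs.
    gcloud compute instances create-with-container etl-runner \
        --zone us-central1-a \
        --container-image gcr.io/my-project/etl:latest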

Limit starting containers after complete shutdown

We have a bare-metal Docker Swarm cluster with a lot of containers.
Recently we had a full stop of the physical server.
The main problem happened on Docker startup, when all containers tried to start at the same time.
I would like to know if there is a way to limit the number of containers starting at once.
Or is there another way to avoid overloading the physical server?
At present, I'm not aware of an ability to limit how fast swarm mode will start containers. There is a todo entry to add an exponential backoff in the code, and there are various open issues in swarmkit, e.g. 1201, that may eventually help with this scenario. Ideally, you would have an HA cluster with nodes spread across different AZs; when one node fails, the workload migrates to the remaining nodes and you do not end up with one overloaded node.
What you can use are resource constraints. You can configure each service with a minimum CPU and memory reservation. This would prevent swarm mode from scheduling more containers on a node than it can handle during a significant outage. The downside is that some services may go unscheduled during an outage, and you cannot prioritize which are more important to schedule.
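As an illustration, reservations can be set when creating a service (the service and image names are placeholders):

    # Reserve half a CPU and 256 MB per task; swarm mode will not place
    # a task on a node that cannot satisfy the reservation.
    docker service create --name app \
        --reserve-cpu 0.5 \
        --reserve-memory 256MB \
        myorg/app:latest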

How to delay Docker Swarm updating a stateful container until it's ready?

Problem domain
Imagine that a stateful container is being managed by Swarm, e.g. a database, and another container relies on it, e.g. a service executing a long-running job (minutes, sometimes hours) that does not tolerate the database (or even itself) going down while it's executing.
To give an example: a database importing a multi-GB dump.
There's also a CI/CD system in place which takes care of building new versions of the containers and deploying them to the Swarm, or pushing the image to Docker Hub which then calls a defined webhook which fires off the deployment event.
Question
Is there any way I can build my containers so that Swarm can know whether it's OK to update them or not? Similarly to how HEALTHCHECK reports whether a container needs to be restarted, is there something that would let Swarm know that "it's safe to restart this container now"?
Or is it the CI/CD system's responsibility to check whether the stateful containers are safe to restart, and only then issue the update command to swarm?
Thanks in advance!
Docker will not check with a container if it is ready to be stopped, once you give docker the command to stop a container it will perform that action. However it performs the stop in two steps. The first step is a SIGTERM that your container can trap and gracefully handle. By default, after 10 seconds, a SIGKILL is sent that the Linux kernel immediately applies and cannot be trapped by the container. For your goals, you'll want to make sure your app knows when it's safe to exit after receiving the first signal, and you'll probably want to extend the time to much longer than 10 seconds between signals.
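A minimal sketch of both halves, assuming your app can be wrapped so it finishes the current job before exiting (the entrypoint, worker command, and service name are hypothetical):

    #!/bin/sh
    # Hypothetical entrypoint: treat SIGTERM as "finish the current job,
    # then exit" rather than dying immediately.
    run_job &                     # hypothetical long-running worker
    JOB_PID=$!
    trap 'echo "SIGTERM received; finishing current job before exit"' TERM
    wait "$JOB_PID"               # a trapped SIGTERM interrupts this wait...
    wait "$JOB_PID"               # ...so wait again until the job really ends

Then give the service a stop grace period well beyond the default 10 seconds:

    docker service update --stop-grace-period 2h my-db-service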
The healthcheck won't tell docker that your container is at a safe point to stop. It does tell swarm when your container has finished starting, or when it's misbehaving and needs to be stopped and replaced. The healthcheck defines a command to run inside your container, and the exit code is checked for whether it's 0 (healthy) or 1 (unhealthy). No other exit codes are currently valid.
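For reference, a healthcheck is declared in the Dockerfile; this is a sketch assuming an HTTP service listening on port 80 inside the container:

    # Exit code 0 marks the container healthy, 1 marks it unhealthy.
    HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
        CMD curl -f http://localhost/ || exit 1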
If you need more than the simple signal handling inside the container, then yes, you're likely moving up the stack to a ci/cd tool to manage the deployment.

Launching jobs with large docker images in mesos via aurora can be slow

When launching a task over Mesos via Aurora that uses a rather large Docker image (~2 GB), there is a long wait before the task actually starts.
Even when the task has been previously launched and we would expect the Docker image to already be available on the worker node, there is still a wait, dependent on image size, before the task launches. Using Docker directly, you can launch a container almost instantly as long as the image is already in your image list. Does the Mesos containerizer not support this "caching" as well? Is this functionality something that can be configured?
I haven't tried using the Docker containerizer, but it is my understanding that it will be phased out soon anyway, and that GPU resource isolation, which we require, only works with the Mesos containerizer.
I am assuming you are talking about the unified containerizer running Docker images? What backend are you using? By default the Mesos agents use the copy backend, which is why you are seeing it being slow. You can see which backend the agent is using by hitting the flags endpoint on the agent. Switch the backend to aufs or overlayfs to see if it speeds up the launch. You can specify the backend through the flag --image_provisioner_backend=VALUE on the agent.
NOTE: There are a few bug fixes related to the aufs and overlayfs backends in the latest Mesos release, 1.2.0-rc1, that you might want to pick up. Not to mention that there is an auto-backend feature in 1.2.0-rc1 that will automatically select the fastest backend available.
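As a concrete sketch (the agent host is a placeholder; the default agent port is 5051, and the exact backend value name may vary by Mesos version, e.g. "overlay" rather than "overlayfs"):

    # Check which provisioner backend the agent is currently using.
    curl -s http://agent-host:5051/flags | grep image_provisioner_backend

    # Restart the agent with a faster backend (master/work_dir are placeholders).
    mesos-agent --master=zk://zk-host:2181/mesos \
        --work_dir=/var/lib/mesos \
        --image_provisioner_backend=overlay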

How to schedule jobs with monitoring of CPU, memory, disk IO, etc.

My problem is that I have a dedicated server, but its resources are still limited: IO, memory, CPU, etc.
I need to run a lot of jobs every day. Some jobs are IO-intensive, some are computation-intensive. Is there a way to monitor the current status and decide whether or not to start a new job from my job pool?
For example, when it knows the currently running jobs are IO-intensive, it could launch a job that does not rely much on IO. Or it could pick a running job that uses a lot of disk IO, stop it, and re-schedule it later.
I came up with a solution based on Docker, since it can monitor processes, but I do not know of such a scheduler built on top of Docker.
Thanks
You can check the docker stats command in order to get basic metrics on what is running in the containers managed by a docker daemon.
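For example (the format keys below are the standard docker stats template fields):

    # One-shot snapshot of CPU, memory, and disk/network IO per container.
    docker stats --no-stream \
        --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.BlockIO}}\t{{.NetIO}}"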
You cannot exactly assign a job to a node depending on its dynamic behavior. That would require knowing in advance what type of resources a job will use, which is not described in Docker at all.
Docker does provide a way to tag nodes, which enables Swarm filters; a cluster manager like Swarm can then select the right node based on criteria represented by a tag (see the sketch below).
But Docker doesn't know anything about the "job" about to be launched.
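As an illustration with the standalone Swarm of that era, a node can be labeled at daemon start and a container steered to it with a constraint filter (the label name and image are placeholders):

    # Start the Docker daemon on a node with a label describing its storage.
    dockerd --label storage=ssd

    # Ask standalone Swarm to schedule the container only on matching nodes.
    docker run -d -e constraint:storage==ssd myorg/io-heavy-job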
Depending on the Docker version you're on, you have a number of options for production: native Docker Swarm (it just went GA in v1.9), the more mature Kubernetes, HashiCorp's Nomad (early days), and of course Apache Mesos + Marathon. See also this comparison for more info on the topic.
