Question about safely starting and stopping docker container - docker

I'm learning to use RabbitMQ and I am hosting it in a Docker container on my laptop. I went this route instead of running it directly on my laptop because I also wanted to gain some experience with Docker. Is it best to start and stop the container on shutdown/startup or does Docker automatically start and stop the container for me? I'm having a hard time finding clear instructions on making sure the state of my container is maintained and ensuring the work I do in it isn't lost due to simple mistakes. Thanks in advance!!

Is it best to start and stop the container on shutdown/startup or does Docker automatically start and stop the container for me?
Docker will automatically start containers for you if you set an appropriate restart policy on the container. In general, this means using the unless-stopped policy, as in:
docker run --restart=unless-stopped ...
Your container will but "shut down" when your system shuts down the same way as any other process -- your service will probably first receive a SIGTERM, and may later receive a SIGKILL. This is sufficient for most applications.
I'm having a hard time finding clear instructions on making sure the state of my container is maintained and ensuring the work I do in it isn't lost due to simple mistakes.
Any data in your container will persist between a stop and a start as long as you don't remove the container. However, if you are generating data that you care about, your best choice is to mount a volume in the container and store the data there. This makes the life cycle of your data independent of the container, which means you can replace/upgrade your container while still preserving your data.
You can read more about volumes here.
Using volumes effectively requires you to know where your application stores data. This will often be documented in the README for the image, but not always.
Do you have any resources you recommend for learning how to use and develop in Docker?
The Docker documentation itself is a great place to start. Looking at the design of some of the official images can also be educational.

Related

Is there a way to set the "--rm" option for a docker container deployed in a GCP compute instance?

I'm admittedly very new to Docker so this might be a dumb question but here it goes.
I have a Python ETL script that I've packaged in a Docker container essentially following this tutorial, then using cloud functions and cloud scheduler, I have the instance turn start every hour, run the sync and then shut down the instance.
I've run into an issue though where after this process has been running for a while the VM runs out of hard drive space. The script doesn't require any storage or persistence of state - it pulls any state data from external systems and only uses temporary files which are supposed to be deleted when the machine shuts down.
This has caused particular problems where updates I make to the script stop working because the machine doesn't have the space to download the latest version of the container.
I'm guessing it's either logs or perhaps files created automatically to try to persist the state - either within the Docker container or on the VM.
I'm wondering whether if I could get the VM to run the instance with the "--rm" flag so that the image was removed when it was finished this could solve this problem. This would theoretically guarantee that I'm always starting with the most recent image.
The trouble is, I can't for the life of my find a way to configure the "rm" option within the instance settings and the documentation for container options only covers passing arguments to the container ENTRYPOINT and not the docker run options docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
I feel like I'm either missing something obvious or it's not designed to be used this way. Is this something that can be configured in the Dockerfile or is there a different way I have to set up the VM in the first place?
Basically I just want the docker image to be pulled fresh and run each time and not leave any remnants on the VM that will slowly run out of space.
Also, I know Cloud Run might work in some similar situations but I need the script to be able to run for as long as it needs to (particularly at the start when it's backfilling data) and so the 15 minute cap on runtime would be a problem.
Any suggestions would be appreciated!
Note: I'm posting this as an answer as I need more space than a comment. If anyone feels it is not a good answer and wants it deleted, I will be delighted to do such.
Recapping the story, we have a Compute Engine configured to start a Docker Container. The Compute Engine runs the container and then we stop it. An hour later we restart it, let it run and then we stop it again. This continues on into the future. What we seem to find is that the disk associated with the Compute Engine fills up and we end up breaking. The thinking is that the container contained within the Compute Engine is created at first launch of the Compute Engine and then each time it is restarted, it is being "re-used" as opposed to a brand new container instance being created. This means that resources consumed by the container from one run to the next (eg disk storage) continues to grow.
What we would like to happen is that when the Compute Engine starts, it will always create a brand new instance of the container with no history / resource usage of the past. This means that we won't consume resources over time.
One way to achieve this outside of GCP would be to start the container through Docker with the "--rm" flag. This means that when the container ends, it will be auto-deleted and hence there will be no previous container to start the next time the Compute Engine starts. Again ... this is a recap.
If we dig through how GCP Compute Engines work as they relate to containers, we come across a package called "Konlet" (Konlet). This is the package responsible for loading the container in the Compute engine. This appears to be itself a Docker container application written in Go. It appears to read the metadata associated with the Compute Engine and based on that, performs API calls to Docker to launch the target container. The first thing to see from this is that the launch of the target Docker container does not appear to be executed through simple docker command line. This then implies that we can't "simply" edit a script.
Konlet is open source so in principle, we could study it in detail and see if there are special flags associated with it to achieve the equivalent of --rm. However, my immediate recommendation is to post an issue at the Konlet GitHub site and ask the author whether there is a --rm equivalent option for Konlet and, if not, could one be added (and if not, what is the higher level thinking).
In the meantime, let me offer you an alternative to your story. If I am hearing you correctly, every hour you fire a job to start a compute engine, do work and then shutdown the compute engine. This compute engine hosts your "leaky" docker container. What if instead of starting/stopping your compute engine you created/destroyed your compute engine? While the creation/destruction steps may take a little longer to run, given that you are running this once an hour, a minute or two delay might not be egregious.

Can I migrate processes between fat containers?

I recently began getting into Docker and containers. Up to now, I understand that the philosophy behind containers is to run one process per container so we end up with applications which can be run easily and consistently regardless of the environment. Also, that containers are intrinsically connected to it's image, so if you want to save the changes to a container you need to commit and create a new image.
But let's say I want to run multiple processes inside a single container, AKA a fat container. I know it can be done and things like "Supervisord" and "Baseimage-docker" can help manage processes within fat containers.
Now we get to my question: Is there a way to have a fat container running, save the run state of a single process and migrate said process to another container?
I've looked online but I haven't really found anyone that has said that this is possible. So I'm turning to you guys in case one of you have thought about this problem or maybe I've missed something along the way.
I am not so sure if the question might be opinion based. but here is what i think you might be able to do, lets say you have a web application like a Django application that uses Redis within the same container, which will be considered as a fat container and you need to migrate redis to be a standalone service within its own container then you have to do the following:
1- Prepare a docker image that has Redis installed you might go with your own image or use the official redis docker image.
2- Copy the configuration that is being used with redis from the fat container so you can mount it later inside the new redis container
3- Change the Django application settings and make it point to that new redis container
4- Remove redis service and its configuration from the fat container or maybe build a new image.
and thats it, now you should start the redis container and restart the django application container to take effect or start a new one if you modified the fat image.
There's the famous Quake demo and the ability to migrate the state of an entire container with CRIU. That's probably the closest I've seen to what you're talking about. More here: https://criu.org/Docker
As far as a "single" process inside a container, maybe just migrate the entire container and kill the processes you want moved?
I would say the more common pattern in the Docker community is single process containers that are freely killed/updated/etc.

Is it best practice to daemonize a process within docker?

Many best practice guides emphasize making your process a daemon and having something watch it to restart in case of failure. This made sense for a while. A specific example can be sidekiq.
bundle exec sidekiq -d
However, with Docker as I build I've found myself simply executing the command, if the process stops or exits abruptly the entire docker container poofs and a new one is automatically spun up - basically the entire point of daemonizing a process and having something watch it (All STDOUT is sent to CloudWatch / Elasticsearch for monitoring).
I feel like this also tends to re-enforce the idea of a single process in a docker container, which if you daemonize would tend to in my opinion encourage a violation of that general standard.
Is there any best practice documentation on this even if you're running only a single process within the container?
You don't daemonize a process inside a container.
The -d is usually seen in the docker run -d command, using a detached (not daemonized) mode, where the the docker container would run in the background completely detached from your current shell.
For running multiple processes in a container, the background one would be a supervisor.
See "Use of Supervisor in docker" (or the more recent docker --init).
Some relevent 12 Factor app recommendations
An app is executed in the execution environment as one or more processes
Concurrency is implemented by running additional processes (rather than threads)
Website:
https://12factor.net/
Docker was open sourced by a PAAS operator (dotCloud) so it's entirely possible the authors were influenced by this architectural recommendation. Would explain why Docker is designed to normally run a single process.
The thing to remember here is that a Docker container is not a virtual machine, although it's entirely possible to make it quack like one. In practice a docker container is a jailed process running on the host server. Container orchestration engines like Kubernetes (Mesos, Docker Swarm mode) have features that will ensure containers stay running, replacing them should the need arise.
Remember my mention of duck vocalization? :-) If you want your container to run multiple processes then it's possible to run a supervisor process that keeps everything healthy and running inside (A container dies when all processes stop)
https://docs.docker.com/engine/admin/using_supervisord/
The ultimate expression of this VM envy would be LXD from Ubuntu, here an entire set of VM services get bootstrapped within LXC containers
https://www.ubuntu.com/cloud/lxd
In conclusion is it a best practice? I think there is no clear answer. Personally I'd say no for two reasons:
I'm fixated on deploying 12 factor compliant applications, so married to the single process model
If I need to run two processes on the same set of data, then in Kubernetes I can run containers within the same POD... Means Kubernetes manages the processes (running as separate containers with a common data volume).
Clearly my reasons are implementation specific.
There are multiple run supervisors that can help you take a foreground process (or multiple ones) run them monitored and restart them on failure (or exit the container).
one is runit (http://smarden.org/runit/), which I have not used myself.
my choice is S6 (http://skarnet.org/software/s6/). someone already built a container envelope for it, named S6-overlay (https://github.com/just-containers/s6-overlay) which is what I usually use if/when I need to have a user-space process run as daemon. it also has facets to do prep work on container start, change permissions and more, in runtime.
tl;dr: I can't find a best practices document that relates directly to this for docker, but I agree with you.
The only best "Best Practices" for docker I could find was at dockers own site, which states that containers should be one process. In my mind, that means foregrounded processes as well. So basically, I've drawn the same conclusion as you. (You've probably read that too, but this is for anyone else reading this).
Honestly, I think we are still in (relatively) new territory with best practices for docker. Anecdotally, it has been a best practice in the organizations I've worked with. The number of times I've felt more satisfied with a foregrounded process has been significantly greater then the times I've said to myself "Boy, I sure wish I backgrounded that one." In fact, I don't think I've ever said that.
The only exception I can think of is when you are trying to evaluate software and need a quick and dirty way to ship infrastructure off to someone. EG: "Hey, there is this new thing called LAMP stacks I just heard of, here is a docker container that has all the components for you to play around with". Again, though, that's an outlier and I would shudder if something like that ever made it to production or even any sort of serious development environment.
Additionally, it certainly forces a micro-architecture style, which I think is ultimately a good thing.

Should I use separate Docker containers for my web app?

Do I need use separate Docker container for my complex web application or I can put all required services in one container?
Could anyone explain me why I should divide my app to many containers (for example php-fpm container, mysql container, mongo container) when I have ability to install and launch all stuff in one container?
Something to think about when working with Docker is how it works inside. Docker replaces your PID 1 with the command you specify in the CMD (and ENTRYPOINT, which is slightly more complex) directive in your Dockerfile. PID 1 is normally where your init system lives (sysvinit, runit, systemd, whatever). Your container lives and dies by whatever process is started there. When the process dies, your container dies. Stdout and stderr for that process in the container is what you are given on the host machine when you type docker logs myContainer. Incidentally, this is why you need to jump through hoops to start services and run cronjobs (things normally done by your init system). This is very important in understanding the motivation for doing things a certain way.
Now, you can do whatever you want. There are many opinions about the "right" way to do this, but you can throw all that away and do what you want. So you COULD figure out how to run all of those services in one container. But now that you know how docker replaces PID 1 with whatever command you specify in CMD (and ENTRYPOINT) in your Dockerfiles, you might think it prudent to try and keep your apps running each in their own containers, and let them work with each other via container linking. (Update -- 27 April 2017: Container linking has been deprecated in favor of regular ole container networking, which is much more robust, the idea being that you simply join your separate application containers to the same network so they can talk to one another).
If you want a little help deciding, I can tell you from my own experience that it ends up being much cleaner and easier to maintain when you separate your apps into individual containers and then link them together. Just now I am building a Wordpress installation from HHVM, and I am installing Nginx and HHVM/php-fpm with the Wordpress installation in one container, and the MariaDB stuff in another container. In the future, this will let me drop in a replacement Wordpress installation directly in front of my MariaDB data with almost no hassle. It is worth it to containerize per app. Good luck!
When you divide your web application to many containers, you don't need to restart all the services when you deploy your application. Like traditionally you don't restart your mysql server when you update your web layer.
Also if you want to scale your application, it is easier if your application is divided separate containers. Then you can just scale those parts of your application that are needed to solve your bottlenecks.
Some will tell you that you should run only 1 process per container. Others will say 1 application per container. Those advices are based on principles of microservices.
I don't believe microservices is the right solution for all cases, so I would not follow those advices blindly just for that reason. If it makes sense to have multiples processes in one container for your case, then do so. (See Supervisor and Phusion baseimage for that matter)
But there is also another reason to separate containers: In most cases, it is less work for you to do.
On the Docker Hub, there are plenty of ready to use Docker images. Just pull the ones you need.
What's remaining for you to do is then:
read the doc for those docker images (what environnement variable to set, etc)
create a docker-compose.yml file to ease operating those containers
It is probably better to have your webapp in a single container and your supporting services like databases etc. in a separate containers. By doing this if you need to do rolling updates or restarts you can keep your database online while your application nodes are doing individual restarts so you wont experience downtime. If you have caching with something like Redis etc this is also useful for the same reason. It will also allow you to more easily add nodes to scale in a loosely coupled fashion. It will also allow you to manage the containers in a manner more suitable to a specific purpose. For the type of application you are describing I see very few arguments for running all services on a single container.
It depends on the vision and road map you have for your application. Putting all components of an application in one tier in this case docker container is like putting all eggs in one basket.
Whenever your application would require security, performance related issues then separating those three components in their own containers would be an ideal solution. It's needless to mention that this division of labor across containers would come at some cost and which would be related to wiring up those containers together for communication and security etc.

Is it "safe" to commit a running container in docker?

As the title goes, safe means... the proper way?
Safe = consistent, no data loss, professional, legit way.
Hope to share some experiences with pro docker users.
Q. Commit is safe for running docker containers (with the exception of rapidly changing realtime stuff and database stuff, your own commentary is appreciated.)
Yes or No answer is accepted with comment. Thanks.
All memory and harddisk storage is saved inside the container instance. You should, as long as you don't use any external mounts/docker volumes and servers (externally connected DBs?) never get in trouble for stopping/restarting and comitting dockers. Please read on to go more in depth on this topic.
A question that you might want to ask yourself initially, is how does docker store changes that it makes to its disk on runtime? What is really sweet to check out, is how docker actually manages to get this working. The original state of the container's hard disk is what is given to it from the image. It can NOT write to this image. Instead of writing to the image, a diff is made of what is changed in the containers internal state in comparison to what is in the docker image.
Docker uses a technology called "Union Filesystem", which creates a diff layer on top of the initial state of the docker image.
This "diff" (referenced as the writable container in the image below) is stored in memory and disappears when you delete your container. When you use docker commit, the writable container that is retained in the temporary "state" of the container is stored inside a new image, however: I don't recommend this. The state of your new docker image is not represented in a dockerfile and can not easily be regenerated from a rebuild. Making a new dockerfile should not be hard. So that is alway the way-to-go for me personally.
When your docker is working with mounted volumes, external servers/DBs, you might want to make sure you don't get out of sync and temporary stop your services inside the docker container. When you would use a dockerfile you can start up a bootstrap shell script inside your container to start up connections, perform checks and initialize the running process to get your application durably set up. Again, running a committed container makes it harder to do something like this.

Resources