Is it safe to export tarball of running docker container? - docker

I am playing a couple of days with docker.io. It is awesome!
Some simple containers already run in production servers.
At the moment I stuck in how to make container backup.
Assume I have running complicated docker container with supervisor+apache+memcached+mysql inside under hi load (10k requests per second).
Is it safe to make
docker export > backup.tar
Or I have to stop all processes inside container and only after that export container to tar file?

If by "safe" you mean "consistent", then the exported archive will be filesystem consistent.
The docker exportcommand, as any other classical backup method, won't solve the problem of being application consistent.
Apache and Memcached won't be an issue, since they don't need storage to maintain any kind of state.
But backuping Mysql this way will probably make your database restart in recovery mode, if you run a container from the image generated by docker export.
In particular, if backuped when having to perform write activity (insert, updates..),
as with any other filesystem-level backup, you will loose a few transactions.
If you need your Mysql backuped datafiles to be 100% consistent and to reflect the exact state of your data, you have to either:
Stop Mysql before running docker export
Stop the whole container
Have something connecting to Mysql before running the export command, and run flush tables with read lock;. When the export completes, you'd have to run unlock tables;The backuped data files (generally under /var/lib/mysql) will be consistent.
Use the classical Mysql backup tools (mysqldump...)

Related

Intro to Docker for FreeBSD Jail User - How and should I start the container with systemd?

We're currently migrating room server to the cloud for reliability, but our provider doesn't have the FreeBSD option. Although I'm prepared to pay and upload a custom system image for deployment, I nontheless want to learn how to start a application system instance using Docker.
in FreeBSD Jail, what I did was to extract an entire base.txz directory hierarchy as system content into /usr/jail/app, and pkg -r /usr/jail/app install apache24 php perl; then I configured /etc/jail.conf to start the /etc/rc script in the jail.
I followed the official FreeBSD Handbook, and this is generally what I've worked out so far.
But Docker is another world entirely.
To build a Docker image, there are two options: a) import from a tarball, b) use a Dockerfile. The latter of which lets you specify a "CMD", which is the default command to run, but
Q1. why isn't it available from a)?
Q2. where are information like "CMD ENV" stored? in the image? in the container?
Q3. How to start a GNU/Linux system in a container? Do I just run systemd and let it figure out the rest from configuration? Do I need to pass to it some special arguments or envvars?
You should think of a Docker container as a packaging around a single running daemon. The ideal Docker container runs one process and one process only. Systemd in particular is so heavyweight and invasive that it's actively difficult to run inside a Docker container; if you need multiple processes in a container then a lighter-weight init system like supervisord can work for you, but that's usually an exception more than a standard packaging.
Docker has an official tutorial on building and running custom images which is worth a read through; this is a pretty typical use case for Docker. In particular, best practice is to write a Dockerfile that describes how to build an image and check it into source control. Containers should avoid having persistent data if they can (storing everything in an external database is ideal); if you change an image, you need to delete and recreate any containers based on it. If local data is unavoidable then either Docker volumes or bind mounts will let you keep data "outside" the container.
While Docker has several other ways to create containers and images, none of them are as reproducible. You should avoid the import, export, and commit commands; and you should only use save and load if you can't use or set up a Docker registry and are forced to move images between systems via a tar file.
On your specific questions:
Q1. I suspect the best reason the non-docker build paths to create images don't easily let you specify things like CMD is just an implementation detail: if you look at the docker history of an image you'll see the CMD winds up being its own layer. Don't worry about it and use a Dockerfile.
Q2. The default CMD, any set ENV variables, and other related metadata are stored in the image alongside the filesystem tree. (Once you launch a container, it has a normal Unix process tree, with the initial process being pid 1.)
Q3. You don't "start a system in a container". Generally run one process or service in a container, and manage their lifecycles independently.

Install entire database (including binaries) inside A VOLUME in Docker

I need to containerize a JanusGraph database inside Docker, i don't know what files/directories needs to reside in volume to become persistent/writable. In order to make all the things simple and fast, can i install the entire database in a volume? Not only the data, but the entire app, all the binaries etc. I think this is a fast way to containerize some of my apps.
The janusGraph subdirectories of binaries, data, log resides inside a "janusgraph-hadoop" directory
For example: i will create a volume called /janusgraph-hadoop and run the command to install all the software inside that (it will be a volume).
This can be considered a bad practice or there are no problem in doing that?
I know, we have some JanusGraph already containerized, but they are not official, and my doubt is more general in order to containerize some apps in a more direct way without the need to research what directories need to be in volume and what not.
I will not redistribute any of this, it's just to my use.
At a technical level, nothing would stop you from launching a plain container with an attached volume and installing software there.
docker run -v my_opt:/opt -it --rm ubuntu sh
I wouldn't consider this an especially effective use of Docker. If your colleague wants to use your database installation, you have no way of giving it to them; if you leave the project for six months and come back to it, you'll have no record of how you built this setup. If you were set on this approach, you might find the networking and snapshot setups for more typical virtual machines to be better matched to it.

keep CDH container running

I am learning CDH and Docker and didn't have prior experiene in setting up both tools. After reading documentation i managed to run CDH docker in mac environment and also completed example given in quick start guid. But when next day when i started mac book again to learn something new but i didn't find my previous work which i found very strange and even couldn't see container running which seems fine to me.
What i really want to do is i don't want to loose my work even after stoping docker container. could you please guid me how do i configure docker so that i will not loose my work even after restarting docker again?
Every instance of a docker run will allocate a new filesystem, essentially starting from scratch.
If you actually want to "save" your work, then you need to volume mount (using -v docker flag) your local filesystem into the container for at least the following directories.
HDFS Data Directory
NameNode Data Directory
/home/cloudera
I think the hadoop data folders are somewhere under /var/lib/hadoop-*, by default
The better alternative for saving your workloads would be the CDH VM, where it actually has a persistent HDD associated with it.

Docker separation of concerns / services

I have a laravel project which I am using with docker. Currently I am using a single container to host all the services (apache, mySQL etc) as well as the needed dependencies (project files, git, composer etc) I need for my project.
From what I am reading the current best practice is to put each service into a separate container. So far this seems simple enough since these services are designed to run at length (apache server, mySQL server). When I spin up these 'service' containers using -d they remain running (docker ps) since their main process continuously runs.
However, when I remove all the services from my project container, then there is no main process left to continuously run. This means my container immediately exits once spun up.
I have read the 'hacks' of running other processes like tail -f /dev/null, sleep infinity, using interactive mode, installing supervisord (which I assume would end up watching no processes in such containers?) and even leaving the container to run in the foreground (taking up a terminal console...).
How do I network such a container to keep it running like the abstracted services but detached without these hacks? I cannot seem to find much information on this in the official docker docs nor can I find any examples of other projects (please link any)
EDIT: I am not talking about volumes / storage containers to store the data my project processes, but rather how I can use a container to store the project itself and its dependencies that aren't services (project files, git, composer)
when you run the container try running with the flags ...
docker run -dt ..... etc
you might even try .....
docker run -dti ..... etc
let me know if this brings any joy. has certainly worked for me on occassions.
i know you wanted to avoid hacks but if the above fails then also add ...
CMD cat
to the end of your Dockerfile - it is a hack but is the cleanest hack :)
So after reading this a few times along with Joachim Isaksson's comment, I finally get it. Tools don't need the containers to run continuously to use. Proper separation of the project files, services (mySQL, apache) and tools (git, composer) are done differently.
The project files are persisted within a data volume container. The services are networked since they expose ports. The tools live in their own containers which share the project files data volume - they are not networked. Logs, databases and other output can be persisted in different volumes.
When you wish to run one of these tools, you spin up the tool container by passing the relevant command using docker run. The tool then manipulates the data within the directory persisted within the shared volume. The containers only persist as long as the command to manipulate the data within the shared volume takes to run and then the container stops.
I don't know why this took me so long to grasp, but this is the aha moment for me.

What would be a good docker webdev workflow?

I have a hunch that docker could greatly improve my webdev workflow - but I haven't quite managed to wrap my head around how to approach a project adding docker to the stack.
The basic software stack would look like this:
Software
Docker image(s) providing custom LAMP stack
Apache with several modules
MYSQL
PHP
Some CMS, e.g. Silverstripe
GIT
Workflow
I could imagine the workflow to look somewhat like the following:
Development
Write a Dockerfile that defines a LAMP-container meeting the requirements stated above
REQ: The machine should start apache/mysql right after booting
Build the docker image
Copy the files required to run the CMS into e.g. ~/dev/cmsdir
Put ~/dev/cmsdir/ under version control
Run the docker container, and somehow mount ~/dev/cmsdir to /var/www/ on the container
Populate the database
Do work in /dev/cmsdir/
Commit & shut down docker container
Deployment
Set up remote host (e.g. with ansible)
Push container image to remote host
Fetch cmsdir-project via git
Run the docker container, pull in the database and mount cmsdir into /var/www
Now, this looks all quite nice on paper, BUT I am not quite sure whether this would be the right approach at all.
Questions:
While developing locally, how would I get the database to persist between reboots of the container instance? Or would I need to run sql-dump every time before spinning down the container?
Should I have separate container instances for the db and the apache server? Or would it be sufficient to have a single container for above use case?
If using separate containers for database and server, how could I automate spinning them up and down at the same time?
How would I actually mount /dev/cmsdir/ into the containers /var/www/-directory? Should I utilize data-volumes for this?
Did I miss any pitfalls? Anything that could be simplified?
If you need database persistance indepent of your CMS container, you can use one container for MySQL and one container for your CMS. In such case, you can have your MySQL container still running and your can redeploy your CMS as often as you want independently.
For development - the another option is to map mysql data directories from your host/development machine using data volumes. This way you can manage data files for mysql (in docker) using git (on host) and "reload" initial state anytime you want (before starting mysql container).
Yes, I think you should have a separate container for db.
I am using just basic script:
#!/bin/bash
$JOB1 = (docker run ... /usr/sbin/mysqld)
$JOB2 = (docker run ... /usr/sbin/apache2)
echo MySql=$JOB1, Apache=$JOB2
Yes, you can use data-volumes -v switch. I would use this for development. You can use read-only mounting, so no changes will be made to this directory if you want (your app should store data somewhere else anyway).
docker run -v=/home/user/dev/cmsdir:/var/www/cmsdir:ro image /usr/sbin/apache2
Anyway, for final deployment, I would build and image using dockerfile with ADD /home/user/dev/cmsdir /var/www/cmsdir
I don't know :-)
You want to use docker-compose. Follow the tutorial here. Very simple. Seems to tick all your boxes.
https://docs.docker.com/compose/
I understand this post is over a year old at this time, but I have recently asked myself very similar questions and have several great answers to your questions.
You can setup a MySQL docker instance and have data persist on a stateless data container, aka the data container does not need to be actively running
Yes I would recommend having a separate instance for your web server and database. This is the power of Docker.
Check out this repo I have been building. Basically it is as simple as make build & make run and you can have a web server and database container running locally.
You use the -v argument when running the container for the first time, this will link a specific folder on the container to the host running the container.
I think your ideas are great and it is currently possible to achieve all that you are asking.
Here is a turn key solution achieving all of the needs you have listed.
I've put together an easy to use docker compose setup that should match your development workflow requirements.
https://github.com/ehyland/docker-silverstripe-dev
Main Features
Persistent DB
Your choice of HHVM + NGINX or Apache2 + PHP5
Debug and set breakpoints with xDebug
The README.md should be clear enough to get you started.

Resources