How does Docker AUFS persist data? - docker

I have some doubts about how docker AUFS persists data. As i understand, data inside VOLUME can be persistent and others location can not. Here made an example, pull a mysql container whose dockerfile there and do below steps:
make a test directory under /opt directory which shouldn't be persistent in docker container
login mysql and create a database, so /var/lib/mysql directory added one additional file
stop such container and start again
commit such container and then remove the container
start the committed container and check whether database still exitst
Result
After step 3, all exists inside container including test directory
which i think shouldn't be persist in AUFS. So whether all i know about docker aufs is wrong?
After step 5 i can see database not exists in new container, but why? shouldn't file inside /var/lib/mysql be persist?
Here is my running docker command
docker build -td --name mysql_test mysqlImageId
Attention: i have not specified -v options

According to my understanding of docker, I would expect the following results from the process steps that you have described:
First up: data can be persisted in docker containers - in fact any operation on your container that creates, deletes or updates files, will trigger AUFS to write that file into the container layer. For deletes or updates this will hide entries for the same files in lower image layers. Of course any data written only lasts until you remove the container. If you start a new container based on the same image, this data will naturally not exist.
Also, the purpose of docker build is to build images from a Dockerfile. You can't use it start a container.
Regarding your observations
After step 3 I would expect all your data to still be there, since stopping and restarting a container leaves the container layer untouched. The stop merely kills the main process that keeps your container alive and the start rekindles the main process by executing the relevant entry point script.
After step 5 I still expect your data to be there, since you have committed the container to a new image. This image should now include the modified container layer. When you call docker commit, keep in mind that this is not updating the original image but creates a new one.

Related

How docker detects which changes should be saved and which not?

I know that when we stop docker our changes are lost. There are many answers how to prevent this - commit each time. Idea is that when docker runs it will spin up a fresh container based on the image. On the other hand container persists some data after it exists unless you start using --rm.
Just to simplify:
If you run apt-get install vim, you must commit to save the change
BUT If you change nginx.conf or upload new file to HDFS, you do not lose the data.
So, just curious:
How docker knows what to save and what not? Ex: At the end of apt-get-install we have new files in the system. The same is when I upload new file. for the container/image there is NO difference , Right? Just I/O modification. So how docker know which modification should be saved when we stop the image?
The basic rules here:
Anything you explicitly store outside the container — a database, S3 — will outlive the container.
If you attach a volume to the container when you create the container using a docker run -v option or a Docker Compose volumes: option, any data written to that directory outlives the container. (If it’s a named volume, it lasts until you docker volume rm it.)
Anything else in the container filesystem is lost as soon as you docker rm the container.
If you need things like your application source code or a helper tool installed in an image, write a Dockerfile to describe how to build the image and run docker build. Check the Dockerfile into source control alongside your application.
The general theory of working with Docker is that you always start from a clean slate. When you docker build an image, you start from a base image and install your application into it; you never try to upgrade an installed application. Similarly, when you docker run a container, you start from a fresh copy of its image.
So the clearest answer to the question you ask is really, if you consistently docker rm a container when you stop it, when you docker run a new container, it will have the base image plus the content from the mounted volumes. Docker will never automatically persist anything outside of this.
You should never run docker commit: this leads to magic images that can’t be recreated later (in six months when you discover a critical security issue that risks taking your site down). Similarly, you should never install software in a running container, because it will be lost as soon as the container exits; add it to your Dockerfile and rebuild.
For any Container working with the Docker platform by default all the data generated is temporary and all the file generation or data generation is temporary and no data will persist if you have not mounted the filesystem part of if you have not attached volumes to the container.
IF you are finding that the nginx.conf is getting reused even after changes i would suggest try to find what directories are you trying to mount or mapped to the docker volumes.
The configurations for nginx which reside at /etc/nginx/conf.d/* and you might be mapping the volume with this directory. So if you make any changes in a working container and then remove the container the data will still persist as the data gets written to the writable layer. If the new container which you deploy later with the same volume mapping you will find all the changes you had initially done in the previous case are reflected in the newer container as well.

When do I need Docker Volumes?

Trying to make sure I understand the proper usage of docker volumes. If I have a container running MongoDB that I plan to start and stop do I need a volume configured with I "docker run" the first time? My understanding is that if use Docker run once, then docker stop/start my data is saved inside the container. The volume is more useful if multiple containers want access to the data. Is that accurate or am I misunderstanding something?
Starting and stopping a container will not delete the container specific data. However, you upgrade containers by replacing them with new containers. Any changes to the container specific read/write layer will be lost when that happens, and the new container will go back to it's initial state. If there are files inside your container that you want to preserve when the container is replaced, then you need to store those files in a volume, and then mount that same volume in the new container.

Using docker to file-sandbox a program

I want to use docker to achieve the following:
Run a program with access to a certain directory and then after the run reset all the changes made to that directory. I therefore wanted to create a docker image that copies said directory into the docker volume so that the program has access to the files in the directory, but the directory has the same contents for each run.
However, this seems to add a lot of overhead - are there other (better) ways to achieve the described goal?
PS: I want to measure the behavior of the program, but not for security reasons, i.e. I trust the program.
You actually don't have to do the copying, that's exactly what Docker is for: keeping and discarding changes. Volumes are a way to preserve the files (and/or share them between containers), not for discarding them.
Once you build the image (using docker build and Dockerfile), the image acts as a starting point for containers. When you use docker start <image>, you create a container based the given image. The container runs a single program (called the entry point), and when it exits, any changes it made are either discarded (docker container rm) or you can start the program again from the last state (docker start). The same image is shared as read-only layer among all running containers so that each container only stores any differences the program makes. In the background, docker uses overlay filesystem to actieve this, so if you wanted, you could also do it yourself.
So the short answer is:
Create an image containing the untouched directory
Run the image with docker run --rm <image>.
(Alternatively, you can use just docker run <image> and delete container yourself when it exits with docker rm <container>).
Everythime you do docker run, the program will see the untouched directory as it is was stored in the image. When the program exits, any changes will be discarded. Even if you run two containers in parallel, their changes will be isolated.

Deploy web app in docker data container vs volume

I'm confused about common consensus that one shouldn't use data containers. I have specific use case that I want to accomplish.
I want to have docker nginx container and behind it some other container with application. To run newest version of my app I want to download ready container from my private docker registry. The application is for now purely static html, javascript something.
So my plan is to create docker image which will hold the files, and will specify a named volume in some /webapp folder. The nginx container will serve this volume. I do not see any other way how to move bunch of files to remote system the "docker containerized" way. Am I not actually creating cursed data container?
Anyway what happens during app containers exchange? When I stop the app container the volume remains accesible, as it is placed on host. When I pull and start new version of app container. The volume will be created again and prefiled with image files stored at the same location, replacing the content on host so the nginx container will server from now new version of the application.Right? What happens when I will reference volume that does not exist yet from the nginx container.
It seem that named values are not automatically filed with the content of the image. As well I'm not sure how to create named volume in docker file as this syntax taken from here doesn't work
FROM training/webapp
VOLUME webapp:/webapp
I think you might want what i have described here https://stackoverflow.com/a/41576040/3625317
The problem with volumes is, that when a container is recreated, not docker-compose down but rather docker-compose pull + up, the new container will not have your "new code stored in the volume" but rather, due to the recycled volume, still the old anon volume. The point is, you will need a anon-volume for the code anyway, since you want it redeployable, not a named volume since you want the code to be exchangeable.
On re-create the anon-volume is not removed, that said, lets say you have the image:v1 right now and you pull image:v2 and then do a docker-compose up. It will recreate your container based on image:v2 - when this finished, you will have a new container, but the code is still from the old container, which was based on image:v1, since the anon-volume has not been replaced, it was re-assigned. docker-compose down && docker-compose up will resolve that for you - but you have to keep this in mind when dealing with your idea. (down removes anon-volumes)
In general, there is a pro / con, see my other post.
Data-containers in general have a other meaning and have been replaced by so called named volumes. Data-containers have been used to establish a volume-mount which is "named" and not based on a anon-volume.
In the past, you had to create a container with a volume, and later use a container-name based mount of this volume ( the container would be the static / name part ), today, you just create a named volume name and mount by this volume-name, no need for a busybox killed after start based container-name based volume mount.

Deploy a docker app using volume create

I have a Python app using a SQLite database (it's a data collector that runs daily by cron). I want to deploy it, probably on AWS or Google Container Engine, using Docker. I see three main steps:
1. Containerize and test the app locally.
2. Deploy and run the app on AWS or GCE.
3. Backup the DB periodically and download back to a local archive.
Recent posts (on Docker, StackOverflow and elsewhere) say that since 1.9, Volumes are now the recommended way to handle persisted data, rather than the "data container" pattern. For future compatibility, I always like to use the preferred, idiomatic method, however Volumes seem to be much more of a challenge than data containers. Am I missing something??
Following the "data container" pattern, I can easily:
Build a base image with all the static program and config files.
From that image create a data container image and copy my DB and backup directory into it (simple COPY in the Dockerfile).
Push both images to Docker Hub.
Pull them down to AWS.
Run the data and base images, using "--volume-from" to refer to the data.
Using "docker volume create":
I'm unclear how to copy my DB into the volume.
I'm very unclear how to get that volume (containing the DB) up to AWS or GCE... you can't PUSH/PULL a volume.
Am I missing something regarding Volumes?
Is there a good overview of using Volumes to do what I want to do?
Is there a recommended, idiomatic way to backup and download data (either using the data container pattern or volumes) as per my step 3?
When you first use an empty named volume, it will receive a copy of the image's volume data where it's first used (unlike a host based volume that completely overlays the mount point with the host directory). So you can initialize the volume contents in your main image as a volume, upload that image to your registry and pull that image down to your target host, create a named volume on that host, point your image to that named volume (using docker-compose makes the last two steps easy, it's really 2 commands at most docker volume create <vol-name> and docker run -v <vol-name>:/mnt <image>), and it will be populated with your initial data.
Retrieving the data from a container based volume or a named volume is an identical process, you need to mount the volume in a container and run an export/backup to your outside location. The only difference is in the command line, instead of --volumes-from <container-id> you have -v <vol-name>:/mnt. You can use this same process to import data into the volume as well, removing the need to initialize the app image with data in it's volume.
The biggest advantage of the new process is that it clearly separates data from containers. You can purge all the containers on the system without fear of losing data, and any volumes listed on the system are clear in their name, rather than a randomly assigned name. Lastly, named volumes can be mounted anywhere on the target, and you can pick and choose which of the volumes you'd like to mount if you have multiple data sources (e.g. config files vs databases).

Resources