I'm a bit confused about data-only Docker containers. I read that it's bad practice to mount directories from the source OS directly: https://groups.google.com/forum/#!msg/docker-user/EUndR1W5EBo/4hmJau8WyjAJ
And I get how I make data-only containers: http://container42.com/2014/11/18/data-only-container-madness/
And I see a somewhat similar question to mine: How to deal with persistent storage (e.g. databases) in docker
But what if I have a LAMP server setup, and I have everything nicely set up with data containers, not linking them 'directly' to my source OS, and make a backup once in a while...
Then someone comes by and restarts my server. How do I set up my docker (data-only) containers again so I don't lose any data?
Actually, even though it was shykes who said it was considered a "hack" in the link you provided, note the date. Several eons' worth of Docker years have passed since that post about volumes, and it's no longer considered bad practice to mount volumes on the host. In fact, here is a link to the very same shykes saying that he has "definitely used them at large scale in production for several years with no issues". Mount a host OS directory as a docker volume and don't worry about it. This means that your data persists across docker restarts/deployments/whatever. It's right there on the disk of the host, and doesn't go anywhere when your container goes away.
I've been using docker volumes that mount host OS directories for data storage (database persistent storage, configuration data, et cetera) for as long as I've been using Docker, and it's worked perfectly. Furthermore, it appears shykes no longer considers this to be bad practice.
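As a concrete sketch (the host path and root password are just example values), running the official mysql image with its data directory mounted from the host looks like this:
docker run -d --name db -e MYSQL_ROOT_PASSWORD=secret -v /srv/mysql-data:/var/lib/mysql mysql
Everything MySQL writes to /var/lib/mysql ends up in /srv/mysql-data on the host, so it survives the container being removed or the server being restarted.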
Docker containers will persist on disk until they are explicitly deleted with docker rm. If your server restarts you may need to restart your service containers, but your data containers will continue to exist and their volumes will be available to other containers.
docker rm alone doesn't remove the actual data (which lives on in /var/lib/docker/vfs/dir)
Only docker rm -v would clear out the data as well.
The only issue is that, after a docker rm, a new docker run would re-create an empty volume in /var/lib/docker/vfs/dir.
In theory, you could use symlinks to redirect the new volume folders to the old ones, but that assumes you noted which volumes were associated with which data container... before the docker rm.
It's worth noting that the volumes you create with "data-only containers" are essentially still directories on your host OS, just in a different location (/var/lib/docker/...). One benefit is that you get to label your volumes with friendly identifiers and thus you don't have to hardcode your directory paths.
The downside is that administrative work like backing up specific data volumes is a bit of a hassle now since you have to manually inspect metadata to find the directory location. Also, if you accidentally wipe your docker installation or all of your docker containers, you'll lose your data volumes.
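For example, to find the host directory behind a data container's volume before backing it up, you can inspect the container (the name here is just a placeholder):
docker inspect my-data-container
The Mounts/Volumes section of that output shows the directory under /var/lib/docker/ that actually holds your data.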
I have been reading about Docker, and one of the first things I read was that it runs images in a read-only manner. This raised a question in my mind: what happens if I need users to upload files? Where would those files go (are they appended to the image)? In other words, how do I handle uploaded files?
Docker containers are meant to be immutable and replaceable - you should be able to stop a container and replace it with a newer version without any ill effects. It's bad practice to store any configuration or operational data inside the container.
The situation you describe with file uploads would typically be resolved with a volume, which mounts a folder from the host filesystem into the container. Any modifications performed by the container to the mounted folder would persist on the host filesystem. When the container is replaced, the folder is re-mounted when the new container is started.
It may be helpful to read up on volumes: https://docs.docker.com/storage/volumes/
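As a rough example (paths and the image name are illustrative), if your app writes uploads to /var/www/uploads inside the container, you could keep them on the host like this:
docker run -d -v /srv/uploads:/var/www/uploads my-app-image
A replacement container started with the same -v option sees the same uploads.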
Docker containers use a file system similar to that of their underlying operating system; in your case that seems to be Windows Nano Server (Windows optimized to be used in a container).
So any uploads to your container will be placed on the path you provided when uploading the file.
But this data is ephemeral: it only lives as long as the container does and is gone once the container is removed.
To use persistent storage you must provide a volume for your Docker container. You can think of volumes as external disks attached to a container that mount on a path inside the container; they persist data regardless of the container's state.
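For example, on a Windows host you could attach a named volume to whatever path your app writes uploads to (the volume name, container path and image name here are all illustrative):
docker volume create upload-data
docker run -d -v upload-data:C:\app\uploads my-windows-app-image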
If my container exits, then all my images are lost, and their respective data as well.
Could you please explain how to save the data? (In this case with GitLab, we have multiple branches.) How do I save those branches so that even if the container exits, I get all my old branches back the next time I restart the container?
This question is a bit light on specific details of your workflow, but the general answer to the need for persistent data in the ephemeral container world is volumes. Without a broader understanding of your workflow and infrastructure, it could be as simple as making sure that your gitlab data is in a named local volume. E.g. something you create with docker volume or an image that everyone uses that has a VOLUME location identified in the Dockerfile and is bind mounted to a host location at container run time.
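For instance, a minimal sketch with a named volume (the mount path below is what the official omnibus GitLab image uses for its data; other images may differ, and ports/config volumes are omitted for brevity):
docker volume create gitlab-data
docker run -d -v gitlab-data:/var/opt/gitlab gitlab/gitlab-ce
The repositories and branches then live in the gitlab-data volume and survive the container exiting or being replaced.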
Of course once you are in the world of distributed systems and orchestrating multi-node container environments, local volumes will no longer be a viable answer and you will need to investigate shared volume capabilities from a storage vendor or self-managed with NFS or some other global filesystem capabilities. A lot of good detail is provided in the Docker volume administrative guide if you are new to the volume concept.
Let's say you are trying to dockerise a database (couchdb for example).
Then there are at least two assets you consider volumes for:
database files
log files
Let's further say you want to keep the db-files private but want to expose the log-files for later processing.
As far as I understand the documentation, you have two options:
First option
define managed volumes for both, log- and db-files within the db-image
import these in a second container (you will get both) and work with the logs
Second option
create data container with a managed volume for the logs
create the db-image with a managed volume for the db-files only
import logs-volume from data container when running db-image
Two questions:
Are both options really valid/possible?
What is the better way to do it?
br volker
The answer to question 1 is that yes, both are valid and possible.
My answer to question 2 is that I would consider a different approach entirely; which one to choose depends on whether or not this is a mission-critical system where data loss must be avoided.
Mission critical
If you absolutely cannot lose your data, then I would recommend that you bind mount a reliable disk into your database container. Bind mounting is essentially mounting a part of the Docker Host filesystem into the container.
So taking the database files as an example, you could imagine these steps:
Create a reliable disk, e.g. an NFS share, that is backed up on a regular basis
Attach this disk to your Docker host
Bind mount this disk into your database container, which then writes its database files to this disk.
So, following the above example, let's say I have created a reliable disk that is shared over NFS and mounted on my Docker Host at /reliable/disk. To use that with my database I would run the following Docker command:
docker run -d -v /reliable/disk:/data/db my-database-image
This way I know that the database files are written to reliable storage. Even if I lose my Docker Host, I will still have the database files and can easily recover by running my database container on another host that can access the NFS share.
You can do exactly the same thing for the database logs:
docker run -d -v /reliable/disk/data/db:/data/db -v /reliable/disk/logs/db:/logs/db my-database-image
Additionally you can easily bind mount these volumes into other containers for separate tasks. You may want to consider bind mounting them as read-only into other containers to protect your data:
docker run -d -v /reliable/disk/logs/db:/logs/db:ro my-log-processor
This would be my recommended approach if this is a mission critical system.
Not mission critical
If the system is not mission critical and you can tolerate a higher potential for data loss, then I would look at the Docker volume API, which is used precisely for what you want to do: managing and creating volumes for data that should live beyond the lifecycle of a container.
The nice thing about the docker volume command is that it lets you create named volumes, and if you name them well it can be quite obvious to people what they are used for:
docker volume create db-data
docker volume create db-logs
You can then mount these volumes into your container from the command line:
docker run -d -v db-data:/db/data -v db-logs:/logs/db my-database-image
These volumes will survive beyond the lifecycle of your container and are stored on the filesystem of your Docker host. You can use:
docker volume inspect db-data
To find out where the data is being stored and back up that location if you want to.
You may also want to look at something like Docker Compose which will allow you to declare all of this in one file and then create your entire environment through a single command.
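For reference, a minimal sketch of what such a Compose file might look like, reusing the illustrative image and volume names from above:
version: "2"
services:
  db:
    image: my-database-image
    volumes:
      - db-data:/db/data
      - db-logs:/logs/db
volumes:
  db-data:
  db-logs:
Running docker-compose up -d then creates the named volumes (if they don't already exist) and starts the container with them mounted.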
I was reading Project Atomic's guidance for images, which states that the 2 main use cases for using a volume are:
sharing data between containers
when writing large files to disk
I have neither of these use cases in my example using an Nginx image. I intend to mount a host directory as a volume at the path of the Nginx docroot in the container. This is so that I can push changes to a website's contents to the host rather than addressing the container. I feel it is easier to use this approach since I can, for example, just add my ssh key once to the host.
My question is: is this an appropriate use of a data volume, and if not, can anyone suggest an alternative approach to updating data inside a container?
One of the primary reasons for using Docker is to isolate your app from the server. This means you can run your container anywhere and get the same result. This is my main use case for it.
If you look at it from that point of view, having your container depend on files on the host machine for a deployed environment is counterproductive: running the same container on a different machine may result in different output.
If you do NOT care about that, and are just using docker to simplify the installation of nginx, then yes you can just use a volume from the host system.
Think about this though...
#Dockerfile
FROM nginx
ADD . /myfiles
#docker-compose.yml
web:
  build: .
You could then use docker-machine to connect to your remote server and deploy a new version of your software with a couple of easy commands:
docker-compose build
docker-compose up -d
even better, you could do
docker build -t me/myapp .
docker push me/myapp
and then deploy with
docker pull
docker run
There are a number of ways to update data in containers. Host volumes are a valid approach and probably the simplest way to make your data available.
You can also copy files into and out of a container from the host. You may need to commit afterwards if you are stopping and removing the running web host container at all.
docker cp /src/www webserver:/www
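If you want to bake that change into the image as well (the target tag here is just an example), commit the container afterwards:
docker commit webserver me/webserver:www-update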
You can also copy files into a docker image at build time from your Dockerfile, which is effectively the same process as above (copy and commit). Then restart the webserver container from the new image.
COPY /src/www /www
But I think the host volume is a good choice.
docker run -v /src/www:/www webserver command
Docker data containers are also an option for mounted volumes but they don't solve your immediate problem of copying data into your data container.
If you ever find yourself thinking "I need to ssh into this container", you are probably doing it wrong.
I'm not sure I fully understand your request, but why do you need to do that to push files into the Nginx container?
Manage the volume in a separate docker container; that's my suggestion, and it's also what Docker.io recommends.
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
refer: Manage data in containers
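A rough sketch of that pattern with the stock nginx image (which serves /usr/share/nginx/html by default) and the /src/www directory mentioned in the answer above; the container names are illustrative:
# data-only container that owns the docroot volume
docker create -v /usr/share/nginx/html --name web-content nginx
# web container that shares the volume
docker run -d -p 80:80 --volumes-from web-content --name web nginx
# push new site content into the shared volume
docker cp /src/www/. web:/usr/share/nginx/html
Because the content lives in the volume owned by web-content, you can remove and recreate the web container without losing it.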
As said, one of the main reasons to use Docker is to always get the same result. A best practice is to use a data-only container.
With docker inspect <container_name> you can find the path of the volume on the host and update the data manually, but this is not recommended;
or you can retrieve data from an external source, like a git repository
I am looking at Managing Data in Containers. There are two ways to manage data in Docker.
Data Volumes, and
Data Volume Containers
https://docs.docker.com/userguide/dockervolumes/
My question is: What are the pros and cons of these two methods?
I wouldn't think of them as different methods.
Volumes are the mechanism to bypass the Union File System, thereby allowing data to be easily shared with other containers and the host. Data-containers simply wrap a volume (or volumes) to provide a handy name which you can use in --volumes-from statements to share data between containers. You can't have data-containers without data volumes.
There are basically three ways you can manage data within a container and it would perhaps be best to outline and provide some case-by-case examples as to when and why you would use these.
First, you have the option to use the Union File System. Each container that runs has an associated writable layer provided by the UFS, so if I run a container based on an image of my choice, the writes I perform during that session can be committed back to the image and persisted, becoming a permanent part of that image. So if you have a Debian image and do apt-get update && apt-get install -y python, you can commit that back to the image, share it with others, and save everyone the time required to perform all those network requests to get an up-to-date container with Python pre-installed.
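As a rough illustration of that workflow (the container and image names here are made up):
docker run -it --name python-base debian bash
# inside the container: apt-get update && apt-get install -y python, then exit
docker commit python-base me/debian-python
Anyone who then runs me/debian-python gets a container with those packages already installed.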
Secondly, you can use volumes. When the container runs, writes to the directories that are designated as volumes are kept separate from the UFS and remain associated with the container. As long as the associated container exists, so does the volume. Say you had a container whose entry point is a process that produces logs at /var/logs/myapp. Without volumes, the data written by the process could inadvertently be committed back to the image, needlessly adding to its size. Conversely, as long as the container exists, should the process crash and bring down the container, you can access the logs and inspect what happened. By its very nature, data stored in volumes associated with such containers is meant to be transient: discard the container and the data generated by the process is gone. If the container's image is updated and you're dealt a new one, or you have no need for the generated logs anymore, you can simply remove and recreate the container and effectively flush the generated logs from disk.
While this seems great, what happens with, say, data that's written by a database? Surely, it's not something you'd keep as part of the UFS, but you can't simply have it flushed if you update the DB image or switch over from foo/postgresql to bar/postgresql and end up with a new container in each case. Clearly, that's unacceptable, and that's where the third option comes in: having a persistent, named container with associated volumes, utilizing the full scope of volume capabilities, such as being able to share them with other containers even when the associated container isn't actually running. With this pattern, you can have a dbdata container with /var/lib/postgresql/data configured as a volume. You can then reliably have transient database containers and remove and re-create them freely without losing important data.
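A sketch of that dbdata pattern (the stock postgres image stands in here for whichever foo/postgresql image you use, and the password is just an example value):
# named data container owning the data volume
docker create -v /var/lib/postgresql/data --name dbdata postgres
# transient database container sharing that volume
docker run -d --volumes-from dbdata --name db1 -e POSTGRES_PASSWORD=secret postgres
# later: replace the database container without touching the data
docker rm -f db1
docker run -d --volumes-from dbdata --name db2 -e POSTGRES_PASSWORD=secret postgres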
So, to recap some facts about volumes:
Volumes are associated with containers
Writing to volume directories writes directly to the volume itself, bypassing the UFS
This makes it possible to share volumes independently across several containers
Volumes are destroyed when the last associated container is removed with docker rm -v
If you don't want to lose important data stored in volumes when removing transient containers, associate the volume with a permanent, named container and share it with the non-persisting containers to retain the data
Then, as a general rule of thumb:
Data which you want to become a permanent feature for every container environment should be written to UFS and committed to the image
Data which is generated by the container's managed process should be written to a volume
Data written to a volume which you don't want to accidentally lose if you remove a container should be associated with a named container which you intend to keep, and then shared with other transient containers which can be safely removed afterwards
Data containers offer:
A sensible layer of abstraction for your storage. The files that make up your volume are stored and managed for you by Docker.
Data containers are very handy to share using the special "--volumes-from" directive.
The data is cleaned up when the data container is deleted with docker rm -v.
Projects like Flocker demonstrate how eventually the storage associated with a container will be shared between docker hosts.
You can use the "docker cp" command to pull files out of the container onto the host.
Data volume mappings offer:
Simpler to understand
Explicit and direct control over where the data is stored.
I have experienced file ownership and permission issues. For example Docker processes running as root within a container create files owned by root on the file-system. (Yes, I understand that data volumes store their data the same way, but using the "docker cp" command pulls out the files owned by me :-))
In conclusion I think it really boils down to how much control you wish to exert over the underlying storage. Personally I like the abstraction and indirection provided by Docker, for the same reason I like port mapping.