How to restore docker postgres container?

I am a total newbie with Docker. Unfortunately, I have made a change: I set a new environment variable from the GUI and, astonishingly, it caused the container to be re-created! All PostgreSQL DBs have been lost.
So, two questions:
Why did it happen?
Is there a way to roll back? (There were no backups or anything similar.)

There is a fairly broad set of changes that require deleting and recreating containers. As you've discovered, this includes changing environment variables; it also includes changing published ports, host-mapped directories, and the image underneath the container. In turn, the image will change if there's ever any sort of security update, software patch release, or just a new application build.
In short: deleting Docker containers is very common and you need to make sure the data gets preserved.
The standard way to do this is to mount some additional storage into the container. Docker provides a named volume system, but the named volumes can be opaque and hard to manage; it's often easier to bind mount a host directory. (N.B.: the linked documentation advocates for named volumes; IME host directories are easier to inspect and manage with readily-available non-Docker tools.) You need to look at each image's documentation to know where to attach the storage, but for the standard postgres image it is /var/lib/postgresql/data (see "Where To Store Data" at the end of the linked page). In plain Docker you could run
docker run \
-d \
-p 5432:5432 \
-e POSTGRES_PASSWORD=change-me \
-v "$PWD/postgres:/var/lib/postgresql/data" \
postgres
but there's presumably a setting for that in your GUI tool.
Your previous data is probably lost. Docker doesn't keep snapshots of containers, and deleting a container actually deletes it and its underlying data. You still need to do things like take backups of your data in case Docker or some other part of your system fails.
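Since backups are the only real safety net here, a logical dump is one low-effort option. The following is only a minimal sketch and not part of the original answer: the function name backup_postgres and the container name used in the usage note are hypothetical, and it assumes the official postgres image with its default "postgres" superuser.

```shell
#!/bin/sh
# Sketch: dump every database in a running Postgres container to a SQL file.
# Assumptions: the container runs the official postgres image and the
# superuser is "postgres" (the image default). Names are illustrative.
backup_postgres() {
    container=$1   # name or ID of the running container
    outfile=$2     # where to write the plain-SQL dump on the host
    # pg_dumpall runs inside the container; its stdout is captured on the host.
    docker exec "$container" pg_dumpall -U postgres > "$outfile"
}
```

For a container named pg, this could be called as backup_postgres pg "backup-$(date +%F).sql". Because the dump lands on the host, it survives the container being re-created.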


Backing up docker volumes

I've created a separate volume on an Ubuntu machine with the intention to store docker volumes and persist data. So far, I've created volumes on the host machine for two services (jira and postgres), which I intend to back up offsite. I am using docker-compose like so:
- /var/dkr/pgdata:/var/lib/postgresql/data
And for jira:
- /var/dkr/jira:/var/atlassian/jira
My thinking is that I could just rsync the /var/dkr folder to a temporary location, tar it and send it to S3. Now that I've read a bit more about how host volumes work, I am worried that I might end up with messed-up GIDs and UIDs for the services when I restore from a backup.
My questions are: has Docker resolved this problem in the newer versions (I am using the latest)? Is it safe to take this approach? What would be a better way to back up my persistent volumes?
There's no magic solution to uid/gid mapping issues between containers and hosts. It would need to be implemented by the filesystem drivers in the Linux kernel, which is how NFS and some of the VM filesystem mappings work. For "bind" mounts, forcing a uid/gid is not an option from Linux, and Docker is just providing an easy to use interface on top of that.
With your backups, ensure that uid/gid is part of your backup (tar does this by default). Also ensure that the uid/gid being used in your container is defined in the image or specified to a static value in your docker run or compose file. As long as you don't depend on a host specific uid/gid, and restore preserving the uid/gid (default for tar as root), you won't have any trouble.
Worst case, you run something like find /var/dkr -uid "$old_uid" -exec chown "$new_uid" {} \; to change your UIDs. The tar command also has options to change the uid/gid on extract (see the man page for more details).
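As a rough illustration of the backup advice above, the sketch below archives a directory with numeric owner IDs and restores it. The function names are hypothetical; --numeric-owner is a GNU tar option, and ownership is only restored when the extraction runs as root.

```shell
#!/bin/sh
# Sketch: archive a bind-mounted data directory, keeping numeric uid/gid.

# backup_dir SRC ARCHIVE: store SRC's contents with numeric owner IDs so the
# archive does not depend on user names that exist only on this host.
backup_dir() {
    tar --numeric-owner -czf "$2" -C "$1" .
}

# restore_dir ARCHIVE DEST: unpack into DEST. When run as root, tar restores
# the recorded uid/gid; -p also keeps the recorded permission bits.
restore_dir() {
    mkdir -p "$2"
    tar --numeric-owner -xzpf "$1" -C "$2"
}
```

With the layout from the question, backup_dir /var/dkr /tmp/dkr.tar.gz would produce the archive to upload. For a database directory it is safer to stop the container first, since copying files that are being written can give an inconsistent backup.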

docker volume container strategy

Let's say you are trying to dockerise a database (couchdb for example).
Then there are at least two assets you consider volumes for:
database files
log files
Let's further say you want to keep the db-files private but want to expose the log-files for later processing.
As far as I understand the documentation, you have two options:
First option
define managed volumes for both, log- and db-files within the db-image
import these in a second container (you will get both) and work with the logs
Second option
create data container with a managed volume for the logs
create the db-image with a managed volume for the db-files only
import logs-volume from data container when running db-image
Two questions:
Are both options really valid/possible?
What is the better way to do it?
br volker
The answer to question 1 is that, yes both are valid and possible.
My answer to question 2 is that I would consider a different approach entirely; which one to choose depends on whether this is a mission-critical system where data loss must be avoided.
Mission critical
If you absolutely cannot lose your data, then I would recommend that you bind mount a reliable disk into your database container. Bind mounting is essentially mounting a part of the Docker Host filesystem into the container.
So taking the database files as an example, you could imagine these steps:
Create a reliable disk e.g. NFS that is backed-up on a regular basis
Attach this disk to your Docker host
Bind mount this disk into my database container which then writes database files to this disk.
So following the above example, let's say I have created a reliable disk that is shared over NFS and mounted on my Docker Host at /reliable/disk. To use that with my database I would run the following Docker command:
docker run -d -v /reliable/disk:/data/db my-database-image
This way I know that the database files are written to reliable storage. Even if I lose my Docker Host, I will still have the database files and can easily recover by running my database container on another host that can access the NFS share.
You can do exactly the same thing for the database logs:
docker run -d -v /reliable/disk/data/db:/data/db -v /reliable/disk/logs/db:/logs/db my-database-image
Additionally you can easily bind mount these volumes into other containers for separate tasks. You may want to consider bind mounting them as read-only into other containers to protect your data:
docker run -d -v /reliable/disk/logs/db:/logs/db:ro my-log-processor
This would be my recommended approach if this is a mission critical system.
Not mission critical
If the system is not mission critical and you can tolerate a higher potential for data loss, then I would look at the Docker volume API, which is used precisely for what you want to do: managing and creating volumes for data that should live beyond the lifecycle of a container.
The nice thing about the docker volume command is that it lets you create named volumes, and if you name them well it can be quite obvious to people what they are used for:
docker volume create db-data
docker volume create db-logs
You can then mount these volumes into your container from the command line:
docker run -d -v db-data:/db/data -v db-logs:/logs/db my-database-image
These volumes will survive beyond the lifecycle of your container and are stored on the filesystem of your Docker host. You can use:
docker volume inspect db-data
to find out where the data is being stored, and back up that location if you want to.
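Besides backing up the inspected location directly, a common alternative is to mount the volume into a throwaway container and tar it from there. That is a different technique from the one described above, sketched here under assumptions: the function names, the ubuntu helper image, and the archive naming are all illustrative.

```shell
#!/bin/sh
# Sketch: locate and back up a named Docker volume.

# volume_mountpoint NAME: print the host directory that backs the volume.
volume_mountpoint() {
    docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# backup_volume NAME DIR: write DIR/NAME.tar.gz using a temporary container.
# The volume is mounted read-only; DIR must be an absolute host path.
backup_volume() {
    docker run --rm \
        -v "$1":/source:ro \
        -v "$2":/backup \
        ubuntu tar -czf "/backup/$1.tar.gz" -C /source .
}
```

For example, backup_volume db-data /srv/backups would leave db-data.tar.gz in /srv/backups (a hypothetical directory) without needing root access to the Docker data directory on the host.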
You may also want to look at something like Docker Compose which will allow you to declare all of this in one file and then create your entire environment through a single command.

How to deal with files of web applications in docker?

How do you guys deal with files of web applications for your docker containers? We are using same application for >400 customers. It's the same application with enabled/disabled modules (there are extra files).
I am currently using this approach: build the images, e.g. for Mysql, nginx+php, and then start the container with specific prepared application folder:
docker create -v /dbdata --name dbstore x/mysql /bin/true
docker run -d --volumes-from dbstore --name db1 x/mysql
docker run -d -P --name web --link db1:db1 -v /webapp:/opt/webapp x/webapp php-start index.php
IMHO, this wastes disk space.
I also think it's a bit complex to create >100 tags (revisions) of a webapp docker data container.
Please advise on how to manage this problem.
First, recent versions of Docker let you create and use named volumes. This means that "data-only containers" are antiquated and no longer necessary, and in fact are considered an anti-pattern these days. It's pretty straightforward to create and use a named volume:
docker volume create foo
docker run -d -v "foo:/dbdata" --name "db1" x/mysql
You can view your volumes with:
docker volume ls
As far as your main question, you could take advantage of Docker's union filesystem (which could also more simply be called a "shared layer") design. What this means is that if you create two containers from the ubuntu image (e.g. docker run -d --name=one ubuntu and docker run -d --name=two ubuntu), both of those containers are going to use the same filesystem objects in the base ubuntu image. So for example the /etc/passwd file in both of those containers point to the same /etc/passwd data stored on disk. This is part of what is meant by the term "union filesystem" in the context of Docker.
So just take this knowledge a step further and "bake" those modules into your base image for use by all of the containers for your different customers. That just means creating your own image from a Dockerfile which uses FROM wordpress:latest at the top. Continuing with the WordPress example, if you wanted to make a bunch of WP plugins available, you could just store them in /var/www/html/wp-plugins (or whatever) and only enable certain ones in your configuration. Since they're baked into the image you have created (and you use that same image to create all of your different containers), all of those module files point to the exact same data stored on disk, via the union filesystem. Of course, if someone changes the code in one of their modules, that individual container will store the changes in its own writable layer, but the unchanged base files will all come from the same data and take up no extra space. You can substitute whichever CMS you're using.
Now, where I work, I've recently created a Docker-based hosting system for people to use. The issue is that we wanted each and every customer to have their own copy of the CMS filesystem. Even though the union filesystem means that each customer's changes to the base files would be stored in that container's own layer, that wasn't good enough for the guy that signs my paycheck. They wanted each customer to have their own EBS volume with their own copy of the CMS filesystem on it. So in that situation, where you want each and every customer to have their own volume (for example in order to transport them for backup, or move to a new host, etc), you won't be able to get around the issue of using extra storage for those files.
It depends:
If the files are static and you want to be able to move the container around easily, then I keep the files in the container by just copying them into the web location as a single directory.
If you have a reliable external location, and you change the files more regularly (for example by using some kind of CMS), you could just run an Apache or nginx container and mount the volume.

What is best practice for sharing database between containers in docker?

Does anyone know the best practice for sharing a database between containers in docker?
What I mean is that I want to create multiple containers in docker. These containers will then execute CRUD on the same database with the same identity.
So far, I have two ideas. One is to create a separate container that only runs the database. The other is to install the database directly on the host machine where Docker is installed.
Which one is better? Or, is there any other best practice for this requirement?
It is hard to answer a 'best practice' question, because it's a matter of opinion. And opinions are off topic on Stack Overflow.
So I will give a specific example of what I have done in a serious deployment.
I'm running ELK (Elasticsearch, Logstash, Kibana). It's containerised.
For my data stores, I have storage containers. These storage containers contain a local filesystem pass through:
docker create -v /elasticsearch_data:/elasticsearch_data --name ${HOST}-es-data base_image /bin/true
I'm also using etcd and confd, to dynamically reconfigure my services that point at the databases. etcd lets me store key-values, so at a simplistic level:
CONTAINER_ID=$(docker run -d --volumes-from ${HOST}-es-data elasticsearch-thing)
ES_IP=$(docker inspect "$CONTAINER_ID" | jq -r '.[0].NetworkSettings.Networks.dockernet.IPAddress')
etcdctl set /mynet/elasticsearch/${HOST}-es-0 "$ES_IP"
Because we register it in etcd, we can then use confd to watch the key-value store, monitor it for changes, and rewrite and restart our other container services.
I'm using haproxy for this sometimes, and nginx when I need something a bit more complicated. Both these let you specify sets of hosts to 'send' traffic to, and have some basic availability/load balance mechanisms.
That means I can be pretty lazy about restarting/moving/adding elasticsearch nodes, because the registration process updates the whole environment. A mechanism similar to this is what's used for openshift.
So to specifically answer your question:
DB is packaged in a container, for all the same reasons the other elements are.
Volumes for DB storage are storage containers passing through local filesystems.
'finding' the database is done via etcd on the parent host, but otherwise I've minimised my install footprint. (I have a common 'install' template for docker hosts, and try and avoid adding anything extra to it wherever possible)
It is my opinion that the advantages of docker are largely diminished if you're reliant on the local host having a (particular) database instance, because you've no longer got the ability to package-test-deploy, or 'spin up' a new system in minutes.
(The above example - I have literally rebuilt the whole thing in 10 minutes, and most of that was the docker pull transferring the images)
It depends. A useful thing to do is to keep the database URL and password in an environment variable and provide that to Docker when running the containers. That way you will be free to connect to a database wherever it may be located. E.g. running in a container during testing and on a dedicated server in production.
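A minimal sketch of that idea, with assumed names: DATABASE_URL is a conventional but hypothetical variable name, run_app is an illustrative helper, and the image name in the usage note is a placeholder.

```shell
#!/bin/sh
# Sketch: hand the database location to a container through its environment.
# DATABASE_URL and the image name are placeholders, not from the original post.
run_app() {
    image=$1
    # "-e DATABASE_URL" with no value copies the variable from the calling
    # environment, so the secret does not appear on the docker command line.
    docker run -d -e DATABASE_URL "$image"
}
```

The variable has to be exported before the call, for example export DATABASE_URL=postgres://user:secret@db.example.com:5432/app followed by run_app my-app-image. Pointing the same image at a test container or a production server is then only a matter of changing that one value.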
The best practice is to use Docker Volumes.
Official doc: Manage data in containers. This doc details how to deal with DB and container. The usual way of doing so is to put the DB into a container (which is actually not a container but a volume) then the other containers can access this DB-container (the volume) to CRUD (or more) the data.
Random article on "Understanding Docker Volumes"
Edit: I won't go into further detail, as the other answer is well made.

How persistent are docker data-only containers

I'm a bit confused about data-only docker containers. I read it's a bad practice to mount directories directly to the source-os:!msg/docker-user/EUndR1W5EBo/4hmJau8WyjAJ
And I get how I make data-only containers:
And I see somewhat similar question like mine: How to deal with persistent storage (e.g. databases) in docker
But what if I have a LAMP server setup, and I have everything nicely set up with data containers, not linking them 'directly' to my source OS, and make a backup once in a while...
Then someone comes by and restarts my server. How do I set up my Docker (data-only) containers again, so I don't lose any data?
Actually, even though it was shykes who said it was considered a "hack" in that link you provide, note the date. Several eons worth of Docker years have passed since that post about volumes, and it's no longer considered bad practice to mount volumes on the host. In fact, here is a link to the very same shykes saying that he has "definitely used them at large scale in production for several years with no issues". Mount a host OS directory as a docker volume and don't worry about it. This means that your data persists across docker restarts/deployments/whatever. It's right there on the disk of the host, and doesn't go anywhere when your container goes away.
I've been using docker volumes that mount host OS directories for data storage (database persistent storage, configuration data, et cetera) for as long as I've been using Docker, and it's worked perfectly. Furthermore, it appears shykes no longer considers this to be bad practice.
Docker containers will persist on disk until they are explicitly deleted with docker rm. If your server restarts you may need to restart your service containers, but your data containers will continue to exist and their volumes will be available to other containers.
docker rm alone doesn't remove the actual data (which lives on in /var/lib/docker/vfs/dir)
Only docker rm -v would clear out the data as well.
The only issue is that, after a docker rm, a new docker run would re-create an empty volume in /var/lib/docker/vfs/dir.
In theory, you could use symlinks to redirect the new volume folders to the old ones, but that supposes you noted which volumes were associated with which data container before the docker rm.
It's worth noting that the volumes you create with "data-only containers" are essentially still directories on your host OS, just in a different location (/var/lib/docker/...). One benefit is that you get to label your volumes with friendly identifiers and thus you don't have to hardcode your directory paths.
The downside is that administrative work like backing up specific data volumes is a bit of a hassle now since you have to manually inspect metadata to find the directory location. Also, if you accidentally wipe your docker installation or all of your docker containers, you'll lose your data volumes.
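The manual metadata inspection mentioned above can be scripted to some extent. This is only a sketch, assuming a Docker version whose docker inspect output includes a Mounts list; the function name list_mounts is hypothetical.

```shell
#!/bin/sh
# Sketch: print "host-path -> container-path" for every mount of a container,
# so the directories that need backing up do not have to be looked up by hand.
list_mounts() {
    docker inspect \
        --format '{{ range .Mounts }}{{ println .Source "->" .Destination }}{{ end }}' \
        "$1"
}
```

Running list_mounts against a data container before removing it gives a record of which host directory belonged to which mount point.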
