Backup a postgres Container with its databases - docker

So we have around 100 tests; each test connects to a postgres instance and consumes a database loaded with some data. The tests edit and change that data, so we reload the postgres database for each test.
This takes a really long time, so I thought of using Docker for it as follows. I'm new to Docker, so these are the steps I'm using:
1) I would create one postgres container, load it with the test database that I want and make it ready and polished.
2) Use this command to save my container as a tar file
docker save -o postgres_testdatabase.tar postgres_testdatabase
3) For each test I load a new tar into an image
docker load -i postgres_testdatabase.tar
4) Run the container with the postgres instance
docker run -i -p 5432 postgres_testdatabase
5) The test runs and changes the data.
6) Destroy the container and load a fresh container with a new test database.
7) Run the second test, and so on.
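In other words, the loop for each test would look roughly like this (illustrative, not my exact commands):
docker load -i postgres_testdatabase.tar
docker run -d -p 5432:5432 --name test_db postgres_testdatabase
REM ... run the test against the container ...
docker rm -f test_db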
My problem is that when I back up a container to a tar file, load it, and run a new container, I do not get my database; I basically get a fresh postgres installation with none of my databases.
What am I doing wrong?
EDIT:
I tried one of the suggestions to commit my changes before saving my container to an image, as follows:
I committed my updated container to a new image, saved that image to a tar file, deleted my existing container, loaded the tar file, and then ran a new container from my saved image. I still don't see my databases. I believe it has something to do with volumes. How do I do this without volumes? How do I force all my data to be in the container so it gets backed up with the image?
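For reference, this is roughly the sequence I ran (the container and image names are placeholders, not my exact ones):
docker commit postgres_testdatabase postgres_testdatabase_img
docker save -o postgres_testdatabase.tar postgres_testdatabase_img
docker rm -f postgres_testdatabase
docker load -i postgres_testdatabase.tar
docker run -d -p 5432:5432 postgres_testdatabase_img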
EDIT2
Warmoverflow suggested I use an SQL file to load all my data when the image starts. This won't work in my case, since the data is carefully authored using other software (ArcGIS), and the data has some complex blob fields (geometries), so loading from an SQL script won't work. He also suggested that I don't need to save the data as a tar file if I'm spawning containers on the same machine: once I'm satisfied with my data and commit it to an image, I can run new containers from that image. Thanks for clarifying this. Still, the problem is: how do I keep my database within my image so that when I run a container from the image, the database comes with it?
EDIT3
So I found a workaround inspired by warmoverflow's suggestion; it should solve my problem. However, I'm still looking for a cleaner way to do this.
The solution is to do the following:
Create a fresh postgres container.
Populate your database as you please; in my case I use ArcGIS to do so.
Use pg_dumpall to dump the entire postgres instance into a single file with the command below. We can run this command from any postgres client, and we don't have to copy the dump file into the container. I'm running this from Windows.
C:\Program Files\PostgreSQL\9.3\bin>pg_dumpall.exe -h 192.168.99.100 -p 5432 -U postgres > c:\Hussein\dump\pg_test_dump.dmp
You can now safely delete your container.
Create a new postgres container.
Run this command against your container's postgres instance to load the dump:
C:\Program Files\PostgreSQL\9.3\bin>psql -f c:\Hussein\dump\pg_test_dump.dmp -h 192.168.99.100 -p 5432 -U postgres
Run the test. The test will mess up the data, so to reload we simply repeat the steps above.
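To avoid repeating this by hand, the reload steps can be wrapped in a small batch file; the container name, wait time and image below are illustrative, not my exact setup:
REM recreate a clean postgres container and reload the dump
docker rm -f pg_test
docker run -d --name pg_test -p 5432:5432 postgres
REM give postgres a few seconds to start accepting connections
timeout /t 10
"C:\Program Files\PostgreSQL\9.3\bin\psql.exe" -f c:\Hussein\dump\pg_test_dump.dmp -h 192.168.99.100 -p 5432 -U postgres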
I would still really like the container image to have the database "in it", so that when I run a container from the image, I get the database with it. It would be great if anyone could suggest a solution for that; it would save me a huge amount of time.
Edit4
Finally Warmoverflow solved it! Answer below
Thanks

docker save is for images (saving an image as a tar file). What you need is docker commit, which commits container changes to an image; that image can then be saved to a tar file. But if your database is the same for all tests, you should build a custom image using a Dockerfile, and then run all your containers from that single image.
If your data is loaded from an sql file, you can follow the instructions in the "How to extend this image" section of the official postgres docker page https://hub.docker.com/_/postgres/. You can create a Dockerfile with the following content:
FROM postgres
RUN mkdir -p /docker-entrypoint-initdb.d
ADD data.sql /docker-entrypoint-initdb.d/
Put your data.sql file and Dockerfile in a new folder, and run docker build -t custom_postgres ., which will build a customized image for you, and every time you run a new container with it, it will load the sql file on boot.
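A quick usage sketch (the password is just an example): any *.sql or *.sh file placed in /docker-entrypoint-initdb.d is executed by the official postgres image when a container starts with an empty data directory, so a fresh container comes up with data.sql already applied.
docker build -t custom_postgres .
docker run -d --name pg_test -p 5432:5432 -e POSTGRES_PASSWORD=123456 custom_postgres
# data.sql has been loaded on first boot; connect as usual
psql -h 192.168.99.100 -p 5432 -U postgres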
[Update]
Based on the new information from the question, the cause of the issue is that the official postgres image defines a VOLUME at the postgres data folder /var/lib/postgresql/data. VOLUME is used to persist data outside the container (for example when you use docker run -v to mount a host folder into the container), and thus any data inside the VOLUME is not saved when you commit the container itself. While this is normally a good idea, in this specific situation we actually need the data not to be persistent, so that a fresh new container with the same unmodified data can be started every time.
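For reference, the line responsible in the official postgres Dockerfile is:
VOLUME /var/lib/postgresql/data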
The solution is to create your own version of the postgres image, with the VOLUME removed.
1) The files are at https://github.com/docker-library/postgres/tree/master/9.3
2) Download both files to a new folder.
3) Remove the VOLUME line from the Dockerfile.
4) In Docker Quickstart Terminal, switch to that folder and run docker build -t mypostgres ., which will build your own postgres image with the name mypostgres.
5) Use docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=123456 mypostgres to start your container. The postgres db is available at postgres:123456@192.168.99.100:5432
6) Put in your data as normal using ArcGIS.
7) Commit the container with docker commit container_id_from_step_5 mypostgres_withdata. This creates your own postgres image with data.
8) Stop and remove the intermediate container: docker rm -f container_id_from_step_5
9) Every time you need a new container, in Docker Quickstart Terminal run docker run -d -p 5432:5432 mypostgres_withdata to start a container, and remember to stop or remove the used container afterwards so that it won't occupy the 5432 port.

Related

How to store all container's data in docker?

I am trying to run Ubuntu in Docker. I use the command docker run -it ubuntu, and I want to install some packages and store some files. I know about volumes, but I have only used them in docker-compose. Is it possible to store all of the container's data, and how can I do that properly?
When you run a container, Docker creates a namespace and loads the image filesystem in that namespace. Any changes you apply in a running container, including installing some packages, only remain during the lifetime of the container; if you remove the container and rerun it, they are gone.
If you want your changes to be permanent, you have to commit the running container, which actually creates an image from it, using this command:
sudo docker commit [CONTAINER_ID] [new_image_name]
As David pointed out in the comments:
You should pretty much never run docker commit. It leads to images that can't be reproduced, and you'll be in trouble if there's a security fix you're required to take a year down the road.
If you have an app inside the container, like MySQL, and want the data stored by that app to be permanent, you should map a volume from the host like this:
docker run -d -v /home/username/mysql-data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw --name mysql mysql

Is there a dockerfile RUN command that executes the argument on the host?

We're trying to build a Docker stack that includes our complete application: a Postgres database and at least one web application.
When the stack is started, we expect the application to be immediately working - there should not be any delay due to database setup or data import. So the database schema (DDL) and the initial data have to be imported when the image is created.
This could be achieved by a RUN command in the dockerfile, for example
RUN psql.exe -f initalize.sql -h myhost -d mydatabase -U myuser
RUN data-import.exe myhost mydatabase myuser
However, AFAIU this would execute data-import.exe inside the Postgres container, which can only work if the Postgres container is a Windows container. Our production uses a Linux Postgres distribution, so this is not a good idea. We need the image to be a Linux Postgres container.
So the natural solution is to execute data-import.exe on the host, like this:
When we run docker build, a Linux Postgres container is started.
RUN psql.exe ... runs some SQL commands inside the Postgres container.
Now, our data-import.exe is executed on the host. Its Postgres client connects to the database in the container and imports the data.
When the data import is done, the data is committed to the image, and docker builds an image which contains the Postgres database together with the imported data.
Is there such a command? If not, how can we implement this scenario in docker?
Use the correct tool; a Dockerfile is not a hammer for everything.
Obviously you come from a state where you had postgres up and running before using some import tool. You can mimic that strategy by firing up a postgres container (without a Dockerfile, just docker/kubernetes), then running the import tool, stopping the postgres container, and making a snapshot of the result using docker commit. The committed image can then be used for the next stages of your deployment.
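A rough sketch of that flow, with placeholder names (note that docker commit does not capture data stored in volumes declared by the image, such as the postgres data directory):
docker run -d --name pg_seed -e POSTGRES_PASSWORD=secret -p 5432:5432 postgres
# run the import tool from the host against the published port
data-import.exe localhost mydatabase myuser
docker stop pg_seed
docker commit pg_seed mycompany/postgres-seeded:latest
docker rm pg_seed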
In Docker generally the application data is stored separately from images and containers; for instance you'd frequently use a docker run -v option to store data in a host directory or Docker volume so that it would outlive its container. You wouldn't generally try to bake data into an image, both for scale reasons and because any changes will be lost when a container exits.
As you've described the higher-level problem, I might distribute something like a "test kit" that included a docker-compose.yml and a base data directory. Your Docker Compose file would use a stock PostgreSQL container with data:
postgres:
  image: postgres:10.5
  volumes:
    - './postgres:/var/lib/postgresql/data'
To answer the specific question you asked, docker build steps only run individual commands within Docker container space; they can't run arbitrary host commands, read filesystem content outside of the tree containing the Dockerfile, or write any sort of host filesystem content outside the container.

Docker: change port binding on an already created container with no data loss

Assuming that I have a MongoDB or SQL Server container with a lot of data, and all of a sudden (which is very probable) I need to change the port! Maybe due to a sudden security issue! I need to stop the container and start it up again running on a different port. Why doesn't Docker allow me to do that? If I run the image again, a new container is created with no data inside, and that causes a lot of mess.
Is there a proper built-in solution? By proper I mean a solution that does not require me to back up the databases, move them out of the container volume and restore them again. Something logical, such as a command that allows me to change the forwarded port, for example from -p 1433:1234 to -p 27017:1234.
BLUF: Start your MongoDB container with a volume mapped in to keep the data persistent, using this format: docker run --name some-mongo -v /my/own/datadir:/data/db -d mongo
While I agree it would be great if Docker had the ability to switch port numbers on a running container, as others said, each container is a process, and I do not know of a way to change a port on a running process.
You do not need to re-import your data if you have set up your volumes properly. I do this all the time for MySQL databases. The MySQL image is just the database engine, separate from the database files, if you map in your volumes correctly. That's how Docker is designed.
The "Where to store data" section of the image documentation gives an example of mounting a volume to a folder on the host to keep your data. This should allow you to start a new container using the same data without having to re-import. But I'm not as familiar with MongoDB, which is a NoSQL database.
https://hub.docker.com/_/mongo/#!
You may need to back up your database first using this dump command:
docker exec some-mongo sh -c 'exec mongodump -d <database_name> --archive' > /some/path/on/your/host/all-collections.archive
Then start a new container with the volume mapped and the archive available:
docker run --name some-mongo -v /my/own/datadir:/data/db -v /some/path/on/your/host/all-collections.archive:/data/db/collections.archive -d mongo
You'll need to restore that backup:
docker exec some-mongo sh -c 'exec mongorestore --db <database_name> --archive=/data/db/collections.archive'
From that point on you should be able to simply stop and start a new container with the volumes mapped in. Your data should remain persistent, and you should not need to dump and restore any more (except, obviously, for normal backup purposes).
A container is an instantiation of an image.
The port mapping is part of the instantiation state of a container, so it can only be set while creating a container.
You can change the port mapping by directly editing the hostconfig.json file at /var/lib/docker/containers/[hash_of_the_container]/hostconfig.json
You can determine the [hash_of_the_container] via the docker inspect command; the value of the "Id" field is the hash. A sketch of the relevant JSON is shown after the steps below.
1) stop the container
2) change the file
3) restart your docker engine (to flush/clear config caches)
4) start the container
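For illustration, the relevant part of hostconfig.json looks roughly like this (the ports here are just an example); changing the HostPort value is what remaps the published port:
"PortBindings": {
    "1234/tcp": [
        {
            "HostIp": "",
            "HostPort": "27017"
        }
    ]
}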
Reference: How do I assign a port mapping to an existing Docker container?

How to handle database storage/backup and application logs with many linked containers?

I've just created my first dockerized app and I am using docker-compose to start it on my client's server:
web:
  image: user/repo:latest
  ports:
    - "8080:8080"
  links:
    - db
db:
  image: postgres:9.4.4
It exposes a REST API (node.js) over port 8080. The REST API makes use of a Postgres database. It works fine. My idea is that I will give this file (docker-compose.yml) to my client and he will just run docker-compose pull && docker-compose up -d each time he wants to pull fresh app code from the repo (assuming he has rights to access the user/repo repo).
However I have to handle two tasks: database backups and log backups.
How can I expose the database to the host (docker host) system so that I can, for example, define a cron job that makes a database dump and stores it on S3?
I've read some articles about Docker container storage and Docker volumes. As I understand it, in my setup all database files will be stored in "container memory", which will be lost if the container is removed from the host. So I should use a Docker volume to hold the database data on the "host side", right? How can I do this with the postgres image?
In my app I log all info to stdout and stderr (in case of errors). It would be cool (I think) if those logs were "streamed" directly to some file(s) on the host system, so they could be backed up to S3, for example (again by a cron job?). How can I do this? Or maybe there is a better approach?
Sorry for so many questions, but I am new to the Docker world and it's really hard for me to understand how it actually works or how it's supposed to work.
You could execute a command in the running container to create a backup, for example docker exec -it db <command> > sqlDump. I do not know much about postgres, but in that case <command> would create the dump on stdout, and > sqlDump would redirect it to the file sqlDump, created in the host's current working directory. Then you can include that dump file in your backup. That could be done perfectly well with a cron job defined on the host. But a much better solution is linked in the next paragraph.
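Purely as an illustration (container name, database, user and paths are placeholders; pg_dump is the standard postgres dump tool), such a host-side cron entry might look like:
# dump the database from the db container every night at 02:00
0 2 * * * docker exec db pg_dump -U postgres mydatabase > /backups/mydatabase-$(date +\%F).sql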
If you run your containers as described above, your volumes are deleted when you delete your container. You could go for a second approach and use volume containers as described here. In that case you could remove and re-create e.g. the db container without losing your data. A backup can then be created very easily via a temporary container instance, following these instructions. Assuming /dbdata is the place where your volume is mounted and contains the database data to be backed up:
docker run --volumes-from dbdata -v $(pwd):/backup <imageId> tar cvf /backup/backup.tar /dbdata
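The restore counterpart, adapted from the same Docker documentation (again with <imageId> as a placeholder for any image that contains tar), would be along the lines of:
docker run --volumes-from dbdata -v $(pwd):/backup <imageId> bash -c "cd /dbdata && tar xvf /backup/backup.tar --strip 1"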
Since version 1.6 you can define a log driver for your container instance. With that you could interact with e.g. your syslog to have the log entries appear in the host's /var/log/syslog file. I do not know S3, but maybe that gives you some ideas:
docker run --log-driver=syslog ...
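A slightly more concrete sketch (the syslog address is just an example; without the syslog-address option the driver logs to the local syslog daemon):
docker run -d --log-driver=syslog --log-opt syslog-address=udp://127.0.0.1:514 user/repo:latest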

Mysql installed and persisting data in docker images

I am a newbie to Docker. I created a Docker image with Java and MySQL installed in it. I tried running a normal Java application by copying it into the Docker image, and it ran successfully with the expected result. After that I tried to run a Java application that uses a MySQL database: I created a database, executed the program, and it ran successfully and gave the expected output.
But when I closed that container and tried to run it again, it required me to create a new database with the same name, and my existing data was lost. So every time it starts a fresh MySQL, and I need to create the database and tables again to store the data. Is there any way to save my data in the image, so that every time I run the Docker image it still has the data from the previous run in the same database?
If you're just starting out with docker, I would recommend mounting a local directory in your container for the database data. This assumes you'll only be running your application on a single machine, but it is easier than creating separate containers for your data. To do this, you would do something like this:
# Add VOLUMEs to allow backup of config and databases
VOLUME ["/etc/mysql", "/var/lib/mysql"]
and do
$ docker run -v /path/to/local/etc:/etc/mysql -v /path/to/local/db:/var/lib/mysql your-image
That said, running mysql in docker is a decidedly non-trivial first step in the docker world. This is a good example:
https://github.com/tutumcloud/tutum-docker-mysql
There is a difference between an image and a container.
If you want to save the changes you made in your container to a new image, you should use
docker commit <container name or id> <optional image name>
If you just want to relaunch your container for more modifications before committing, use:
docker start <container name or id>
docker attach <container name or id>
To see the list of all containers:
docker ps -a
You have two ways of managing data in containers:
Data volumes: A data volume is a specially-designated directory within one or more containers that bypasses the Union File System to provide several useful features for persistent or shared data
Data volume containers: If you have some persistent data that you want to share between containers, or want to use from non-persistent containers, it's best to create a named Data Volume Container, and then to mount the data from it.
For more info see official user guide: https://docs.docker.com/userguide/dockervolumes/
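A minimal sketch of the data volume container pattern (names and the password are placeholders): one container only owns the /var/lib/mysql volume, and the actual MySQL container mounts it with --volumes-from, so the data survives removing and recreating the MySQL container.
# container that exists only to own the /var/lib/mysql volume
docker create -v /var/lib/mysql --name mysql-data mysql /bin/true
# run MySQL using that volume; the data outlives this container
docker run -d --volumes-from mysql-data -e MYSQL_ROOT_PASSWORD=secret --name mysql-app mysql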
