Docker volume vs. persistent volume on official Docker beginner tutorial

The tutorial is at docker-curriculum
I am having trouble understanding the difference between the various volumes entries in this docker-compose.yml from the tutorial:
version: "3"
services:
es:
image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
container_name: es
environment:
- discovery.type=single-node
ports:
- 9200:9200
volumes:
- esdata1:/usr/share/elasticsearch/data
web:
build: . # replaced image with build
command: python app.py
environment:
- DEBUG=True # set an env var for flask
depends_on:
- es
ports:
- "5000:5000"
volumes:
- ./flask-app:/opt/flask-app
volumes:
esdata1:
driver: local
There are volumes keys under web and es, and then one by itself with esdata1: and driver: local underneath it. My newbie mind understands the ones under web and es to be mounts of external data to a directory within each container. The last one, then, is declaring a persistent volume on the host machine that will still be there even when containers are killed; in this case, it is the esdata1 data that will persist. My next question is: what does driver: local mean?

I just received advice from a mentor, and my guess is partially correct: Docker will create a directory at /var/lib/docker/volumes, and that directory can be mounted into containers. This persistent volume becomes the location where data from processes inside containers (e.g. MySQL, Elasticsearch, etc.) is permanently stored.
(I'm guessing until you kill and delete everything with the -v option... just a guess.)
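For example, a sketch of checking and removing that volume (the myproject_ prefix is hypothetical; Compose prefixes volume names with the project name):
docker volume ls                           # shows e.g. myproject_esdata1
docker volume inspect myproject_esdata1    # "Mountpoint": "/var/lib/docker/volumes/myproject_esdata1/_data"
docker-compose down                        # removes containers, keeps the volume
docker-compose down -v                     # removes the named volume and its data too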
The volume under web, on the other hand, is a directory created by the programmer, i.e. the same subdirectory that contains app.py, mounted into the container. This allows app.py to be rewritten on the fly: whatever changes are made on the local machine are reflected in the container.
As a newbie to containers and Docker, this all seems very efficient, although the learning curve is somewhat steep for me. Please see the comments above, with a link to another question, for the answer to the second question.

Related

How can I store data with Docker Compose containers?

I have this docker-compose.yml, and I have a Postgres database and Grafana running over it to make queries on data.
version: "3"
services:
db:
image: postgres
container_name: db
ports:
- "5432:5432"
environment:
- POSTGRES_PASSWORD=my_secret_password
grafana:
image: grafana/grafana
container_name: grafana
depends_on:
- db
ports:
- "3000:3000"
I start this compose with docker-compose up, but then, if I don't want to lose any data, I must run docker-compose stop instead of docker-compose down.
I also read about docker commit, but "the commit operation will not include any data contained in volumes mounted inside the container", so I guess it's no use for my needs.
What's the proper way to store the created volumes and reuse them across up/down, even when recreating the containers? Must I use some sort of backup method provided by every image (so, for example, a DB export for Postgres and some other type of export for Grafana), or is there a way to do this inside docker-compose.yml?
EDIT:
I also read about volumes, but is there a standard way to store everything?
In the link provided by @DannyB, setting the volume to ./postgres-data:/var/lib/postgresql instead of ./postgres-data:/var/lib/postgresql/data caused the container not to store the actual folder.
My question is: must every image follow a particular pattern like the one above? Is the path to the data that should live in the volume documented in every Docker image's README? Or is there something like:
volumes:
  - ./my_image_root:/
Docker provides volumes as the way to persist data between container invocations and to share data between containers.
They are quite simple to declare and use in compose files:
volumes:
  postgres:
  grafana:

services:
  db:
    image: postgres
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_PASSWORD=my_secret_password
    volumes:
      - postgres:/var/lib/postgresql/data
  grafana:
    image: grafana/grafana
    depends_on:
      - db
    volumes:
      - grafana:/var/lib/grafana
    ports:
      - "3000:3000"
Optionally, you can also set a local directory as your container volume,
with the added convenience of having the files easily accessible, not only from inside the container. This is especially helpful for mounting specific config files to their location in the container: you can edit the file locally like any other file and restart the container with the updated configuration (certificates and other similar files also make good use of this option). You do that like so:
volumes:
  - /home/myusername/postgres_data/:/var/lib/postgresql/data/
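The same form works for the config-file case mentioned above, typically mounted read-only (a sketch; /etc/grafana/grafana.ini is the Grafana image's default config location):
volumes:
  - ./grafana.ini:/etc/grafana/grafana.ini:ro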
PS. I have omitted the container_name and version directives from this compose.yml because (as of Docker 20.10) the compose spec determines the version automatically, and docker compose exposes enough functionality that accessing the containers directly by short names usually isn't necessary.

Docker volume associated to postgres image empty and not persistent

I have a docker-compose file to build a web server with Django and a Postgres database. It basically looks like this:
version: '3'
services:
  server:
    build:
      context: .
      dockerfile: ./docker/server/Dockerfile
    image: backend
    volumes:
      - ./api:/app
    ports:
      - 8000:8000
    depends_on:
      - postgres
      - redis
    environment:
      - PYTHONUNBUFFERED=1
  postgres:
    image: kartoza/postgis:11.0-2.5
    volumes:
      - pg_data:/var/lib/postgresql/data:rw
    environment:
      POSTGRES_DB: "gis,backend"
      POSTGRES_PORT: "5432"
      POSTGRES_USER: "user"
      POSTGRES_PASS: "pass"
      POSTGRES_MULTIPLE_EXTENSIONS: "postgis,postgis_topology"
    ports:
      - 5432:5432
  redis:
    image: "redis:alpine"
volumes:
  pg_data:
I'm using a volume to make my data persistent.
I managed to run my containers and add data to the database. A volume has been successfully created: docker volume ls
DRIVER VOLUME NAME
local server_pg_data
But this volume is empty as the output of docker system df -v shows:
Local Volumes space usage:
VOLUME NAME LINKS SIZE
server_pg_data 1 0B
Also, if I want or need to build the containers once again using docker-compose down and docker-compose up, data has been purged from my database. Yet, I thought that volumes were used to make data persistent on disk…
I must be missing something in the way I'm using Docker and volumes, but I don't get what:
why does my volume appear empty while there is some data in my postgres container?
why does my volume not persist after doing docker-compose down?
This thread (How to persist data in a dockerized postgres database using volumes) looked similar but the solution does not seem to apply.
The kartoza/postgis image isn't configured the same way as the standard postgres image. Its documentation notes (under "Cluster Initializations"):
By default, DATADIR will point to /var/lib/postgresql/{major-version}. You can instead mount the parent location like this: -v data-volume:/var/lib/postgresql
If you look at the Dockerfile in GitHub, you will also see that parent directory named as a VOLUME, which has some interesting semantics here.
With the setting you show, the actual data will be stored in /var/lib/postgresql/11.0; you're mounting the named volume on a different directory, /var/lib/postgresql/data, which is why it stays empty. Changing the volume mount to just /var/lib/postgresql should address this:
volumes:
  - pg_data:/var/lib/postgresql:rw # not .../data
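A quick way to confirm the fix (a sketch, using the service name postgres from the compose file above):
docker-compose up -d
docker-compose exec postgres ls /var/lib/postgresql/11.0   # the DATADIR, now inside the mounted volume
docker system df -v | grep pg_data                         # SIZE should no longer be 0B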

How fast do the files from a docker image get copied to a named volume after container initialization

I have a stack of containers that are sharing a named volume. The image that contains the files is built to contain code (multiple libraries, thousands of classes).
The issue I am facing is that when I deploy the stack to a docker swarm mode cluster, the containers initialize before the files are fully copied to the volume.
Is there a way to tell that the volume is ready and all files mounted have been copied? I would have assumed that the containers would only get created after the volume is ready, but this does not seem to be the case.
I have an install command that runs in one of the containers sharing that named volume and this fails because the files are not there yet.
version: '3.3'
services:
  php:
    image: code
    volumes:
      - namedvolume:/var/www/html
  web:
    image: nginx
    volumes:
      - namedvolume:/var/www/html
  install:
    image: code
    volumes:
      - namedvolume:/var/www/html
    command: "/bin/bash -c \"somecommand\""
volumes:
  namedvolume:
Or is there something I am doing wrong?
Thanks
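One hedged sketch of a workaround (not from the thread; index.php is a hypothetical sentinel file known to ship in the code image): have the install service poll the shared volume until the copy has landed before running its command:
install:
  image: code
  volumes:
    - namedvolume:/var/www/html
  # wait until the sentinel file appears in the volume, then run the install step
  command: /bin/bash -c "until [ -f /var/www/html/index.php ]; do sleep 1; done; somecommand"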

docker-elk - how is it persisting elasticsearch index?

I'm just getting to grips with Docker and docker-compose, trying to create a development environment for Elasticsearch which I will deploy later.
I've been using docker-elk as a reference, and I've managed to create a working Elasticsearch container, seed it, and use it in my project.
As I understand it, Docker containers don't persist data, unless you use the Volumes API and create a volume outside the container that the container then accesses (read that here).
However, docker-elk only uses volumes to share a config yml file, yet somehow my Elastic indices persist when I bring the container down and up again.
From the docker-elk readme:
The data stored in Elasticsearch will be persisted after container reboot but not after container removal.
Can someone please explain what part of the below configuration is allowing the docker container to persist the index?
docker-compose.yml
version: '2'
services:
  elasticsearch:
    build:
      context: build/elasticsearch/
    volumes:
      - ./build/elasticsearch/config.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - elk
networks:
  elk:
    driver: bridge
build/elasticsearch/Dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.0.0
build/elasticsearch/config.yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
discovery.zen.minimum_master_nodes: 1
discovery.type: single-node
As you may know, a container is a sandbox. It has a filesystem with a structure very similar to a typical Linux OS, and the container only sees the files and folders in this filesystem.
The process running inside the container writes its data and config to files in this filesystem. This process is unaware of whether it is running in a container or on a VM; the data is simply persisted in files and folders in this filesystem.
Now, when you remove a container using docker rm ..., those files are deleted with the container, and you lose the data unless you use volumes, which back the data up on the host.
On the other hand, stopping and starting the container does not remove the container's files, and thus the data is still there when you restart the container.
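A quick way to see the difference (a sketch; the container name es-demo is made up for illustration):
docker stop es-demo && docker start es-demo   # filesystem kept: indices survive
docker rm -f es-demo                          # writable layer deleted: indices are gone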
To supplement the accepted answer: for anyone looking to persist the data, add a volume as mentioned in the question.
version: '3'
services:
  elasticsearch: # Elasticsearch instance
    container_name: es-search
    image: docker.elastic.co/elasticsearch/elasticsearch:6.1.1
    volumes: # Persist ES data in separate "esdata" volume
      - esdata:/usr/share/elasticsearch/data
    environment:
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ports: # Expose Elasticsearch ports
      - "9300:9300"
      - "9200:9200"
volumes: # Define separate volume for Elasticsearch data
  esdata:
    driver: local # data is stored under /var/lib/docker/volumes/ on the host
I found a guide for elastic docker here: https://blog.patricktriest.com/text-search-docker-elasticsearch/
You can list the indices and their UUIDs with the command below.
curl 'localhost:9200/_cat/indices?v'

Mount a windows host directory in compose file version 3

I am trying to upgrade a docker-compose.yml from version 1 to version 3.
My main question is about volumes_from:
To share a volume between services, define it using the top-level volumes option and reference it from each service that shares it using the service-level volumes option.
Simplest example:
version "1"
data:
image: postgres:latest
volumes:
- ./pg_hba.conf/:/var/lib/postgresql/data/pg_hba.conf
postgres:
restart: always
image: postgres:latest
volumes_from:
- data
ports:
- "5432:5432"
If I have understood correctly, this should be converted to:
version: "3"
services:
db:
image: postgres:latest
restart: always
volumes:
- db-data:/var/lib/postgresql/data
ports:
- "5432:5432"
networks:
- appn
networks:
appn:
volumes:
db-data:?
Question: how can I now, in the top-level volumes option, set a relative path to the folder "example_folder" on the Windows host for "db-data"?
In this instance, you might consider not using volumes_from.
As mentioned in this docker 1.13 issue by Sebastiaan van Stijn (thaJeztah):
The volumes_from is basically a "lazy" way to copy volume definitions from one container to another, so;
docker run -d --name one -v myvolume:/foo image-one
docker run -d --volumes-from=one image-two
Is the same as running;
docker run -d --name one -v myvolume:/foo image-one
docker run -d --name two -v myvolume:/foo image-two
If you are deploying to AWS you should not use bind-mounts, but use named volumes instead (as in my example above), for example;
version: "3.0"
services:
db:
image: nginx
volumes:
- uploads-data:/usr/share/nginx/html/uploads/
volumes:
uploads-data:
Which you can run with docker-compose;
docker-compose up -d
Creating network "foo_default" with the default driver
Creating volume "foo_uploads-data" with default driver
Creating foo_db_1
Basically, it is not available in docker compose version 3:
There's a couple of reasons volumes_from is not ported to the compose-file "3";
In a swarm, there is no guarantee that the "from" container is running on the same node. Using volumes_from would not lead to the expected result.
This is especially the case with bind-mounts, which, in a swarm, have to exist on the host (are not automatically created)
There is still a "race" condition (as described earlier)
The "data" container has to use exactly the right paths for volumes as the "app" container that uses the volumes (i.e. if the "app" uses the volume in /some/path/in/container, then the data container also has to have the volume at /some/path/in/container). There are many cases where the volume may be shared by multiple services, and those may be consuming the volume in different paths.
But also, as mentioned in issue 19990:
The "regular" volume you're describing is a bind-mount, not a volume; you specify a path from the host, and it's mounted in the container. No data is copied from the container to that path, because the files from the host are used.
For a volume, you're asking docker to create a volume (persistent storage) to store data, and copy the data from the container to that volume.
Volumes are managed by docker (or through a plugin) and the storage path (or mechanism) is an implementation detail, as all you're asking is a storage, that's managed.
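You can see that storage location with docker volume inspect (using the volume from the output above; the exact Mountpoint varies by platform):
docker volume inspect foo_uploads-data
# ...
# "Driver": "local",
# "Mountpoint": "/var/lib/docker/volumes/foo_uploads-data/_data",
# ...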
For your question, you would need to define a docker volume container and copy your host content into it:
services:
  data:
    image: "nginx:alpine"
    volumes:
      - ./pg_hba.conf/:/var/lib/postgresql/data/pg_hba.conf
