I'm just getting to grips with Docker and docker-compose, trying to create a development environment for Elasticsearch which I will deploy later.
I've been using docker-elk as a reference, and I've managed to create a working Elasticsearch container, seed it, and use it in my project.
As I understand it, Docker containers don't persist data, unless you use the Volumes API and create a volume outside the container that the container then accesses (read that here).
However, docker-elk only uses a volume to share a config yml file, yet somehow my Elasticsearch indices persist when I bring the container down and up again.
From the docker-elk readme:
The data stored in Elasticsearch will be persisted after container reboot but not after container removal.
Can someone please explain what part of the below configuration is allowing the docker container to persist the index?
docker-compose.yml
version: '2'
services:
  elasticsearch:
    build:
      context: build/elasticsearch/
    volumes:
      - ./build/elasticsearch/config.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - elk
networks:
  elk:
    driver: bridge
build/elasticsearch/Dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.0.0
build/elasticsearch/config.yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
discovery.zen.minimum_master_nodes: 1
discovery.type: single-node
As you may know, a container is a sandbox. It has a filesystem with a structure very similar to a typical Linux OS, and the container only sees the files and folders in this filesystem.
The process running inside the container writes its data and config to files in this filesystem. The process is unaware that it is running in a container rather than on a VM, so the data is persisted in files and folders in this filesystem.
Now, when you remove a container using docker rm ..., those files are deleted along with the container, so you lose the data unless you use volumes, which store the data on the host outside the container.
On the other hand, stopping and starting the container does not remove the container files and thus the data is still there when you restart the container.
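To make the stop/start versus rm distinction concrete, here is a toy Python model. It is not how Docker is implemented; the Volume and Container classes and their methods are hypothetical, and they only mimic the behaviour described above: writes outside a mount go to the container's own writable layer, which survives stop/start but is discarded on removal, while writes under a mount go to a volume that outlives the container.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy named volume: its files outlive any container that mounts it."""
    files: dict = field(default_factory=dict)


@dataclass
class Container:
    """Toy container: a private writable layer plus volume mounts."""
    mounts: dict = field(default_factory=dict)          # mount path -> Volume
    writable_layer: dict = field(default_factory=dict)  # file path -> content
    running: bool = False

    def _volume_for(self, path):
        # Return the Volume mounted at or above `path`, if any.
        for mount_path, volume in self.mounts.items():
            prefix = mount_path.rstrip("/") + "/"
            if path == mount_path or path.startswith(prefix):
                return volume
        return None

    def write(self, path, content):
        volume = self._volume_for(path)
        target = volume.files if volume is not None else self.writable_layer
        target[path] = content

    def read(self, path):
        volume = self._volume_for(path)
        source = volume.files if volume is not None else self.writable_layer
        return source.get(path)

    def start(self):
        self.running = True

    def stop(self):
        # Like `docker stop`: the process ends, but the writable layer stays.
        self.running = False

    def remove(self):
        # Like `docker rm`: the writable layer is thrown away; volumes are untouched.
        self.running = False
        self.writable_layer.clear()
```

In this model, stopping and starting a container keeps everything it wrote, but after remove() a fresh Container that mounts the same Volume only sees the files that were written under the mount path.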
To supplement the accepted answer, for anyone looking for how to persist the data: add a volume, as mentioned in the question.
version: '3'
services:
  elasticsearch: # Elasticsearch instance
    container_name: es-search
    image: docker.elastic.co/elasticsearch/elasticsearch:6.1.1
    volumes: # Persist ES data in a separate "esdata" volume
      - esdata:/usr/share/elasticsearch/data
    environment:
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ulimits: # memory_lock only works if the container is allowed to lock memory
      memlock:
        soft: -1
        hard: -1
    ports: # Expose Elasticsearch ports
      - "9300:9300"
      - "9200:9200"
volumes: # Define a separate named volume for Elasticsearch data
  esdata: # Docker manages the host location; to use your own directory instead,
          # bind-mount it in the service, e.g. ./my/esdata:/usr/share/elasticsearch/data
I found a guide for elastic docker here: https://blog.patricktriest.com/text-search-docker-elasticsearch/
You can see each index and the UUID it maps to (on disk, the data directory stores indices in folders named by UUID) using the command below.
curl 'localhost:9200/_cat/indices?v'
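If you'd rather inspect the same listing from a script, here is a minimal Python sketch using only the standard library. parse_cat_table and fetch_indices are hypothetical helper names. It assumes Elasticsearch is reachable at localhost:9200 as in the curl command, and splitting each row on whitespace assumes no column is empty; for anything more robust, the cat APIs also accept a format=json parameter.

```python
import urllib.request


def parse_cat_table(text):
    """Parse the whitespace-aligned table a _cat API returns when called with ?v."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()  # e.g. health, status, index, uuid, ...
    # Pair each row's values with the header columns.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def fetch_indices(base_url="http://localhost:9200"):
    # Same request as the curl command above; needs a running Elasticsearch.
    with urllib.request.urlopen(f"{base_url}/_cat/indices?v") as resp:
        return parse_cat_table(resp.read().decode("utf-8"))
```

Running fetch_indices() before and after recreating the container is a quick way to check whether the indices, and their UUIDs, survived.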
I'm using Windows with Linux containers. I have a docker-compose file for an API and an MS SQL database. I'm trying to use volumes with the database so that my data will persist even if my container is deleted. My docker-compose file looks like this:
version: '3'
services:
  api:
    image: myimage/myimagename:myimagetag
    environment:
      - SQL_CONNECTION=myserverconnection
    ports:
      - 44384:80
    depends_on:
      - mydatabase
  mydatabase:
    image: mcr.microsoft.com/mssql/server:2019-latest
    environment:
      - ACCEPT_EULA=Y
      - SA_PASSWORD=mypassword
    volumes:
      - ./data:/data
    ports:
      - 1433:1433
volumes:
  sssvolume:
Everything spins up fine when I do docker-compose up. I enter data into the database and my API is able to access it. The issue I'm having is when I stop everything, delete my database container, and then do docker-compose up again: the data is no longer there. I've tried creating an external volume first and adding
external: true
to the volumes section, but that hasn't worked. I've also experimented with the path of the volume; for example, instead of ./data:/data I've had
sssvolume:/var/lib/docker/volumes/sssvolume/_data
but still the same thing happens. It was my understanding that if you name a volume and then reference it by name in a different container, it will use that volume.
I'm not sure if my config is wrong or if I'm misunderstanding the use case for volumes and they aren't able to do what I want them to do.
MSSQL stores data under /var/opt/mssql, so you should change your volume definition in your docker-compose file to
volumes:
  - ./data:/var/opt/mssql
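The fix hinges on the part after the first colon: that is the path inside the container, and it has to be where SQL Server actually writes. A minimal Python sketch of that check, assuming Compose's short syntax SOURCE:TARGET[:MODE] with POSIX paths; parse_short_volume and mounts_data_dir are hypothetical helpers, and the check compares the target exactly with the data directory.

```python
def parse_short_volume(spec):
    """Split a Compose short-syntax volume string into (source, target, mode).

    Assumes POSIX paths; Windows drive letters (C:\\...) would confuse the split.
    """
    parts = spec.split(":")
    if len(parts) == 1:
        return None, parts[0], "rw"  # anonymous volume: container path only
    if len(parts) == 2:
        return parts[0], parts[1], "rw"  # mode defaults to read-write
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    raise ValueError(f"unexpected volume spec: {spec!r}")


def mounts_data_dir(volume_specs, data_dir="/var/opt/mssql"):
    """True if one of the mounts targets exactly the directory the server writes to."""
    wanted = data_dir.rstrip("/")
    return any(parse_short_volume(s)[1].rstrip("/") == wanted for s in volume_specs)
```

With the question's ./data:/data the check fails, because SQL Server never writes under /data, so deleting the container loses everything. With ./data:/var/opt/mssql it passes.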
I have this docker-compose.yml, and I have a Postgres database and Grafana running over it to make queries on data.
version: "3"
services:
  db:
    image: postgres
    container_name: db
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_PASSWORD=my_secret_password
  grafana:
    image: grafana/grafana
    container_name: grafana
    depends_on:
      - db
    ports:
      - "3000:3000"
I start this compose with the command docker-compose up, but then, if I want to not lose any data, I must run docker-compose stop instead of docker-compose down.
I also read about docker commit, but "the commit operation will not include any data contained in volumes mounted inside the container", so I guess it's no use for my needs.
What's the proper way to store the created volumes and reuse them with up/down, even when the containers are recreated? Must I use some sort of backup method provided by each image (for example, a DB export for Postgres and some other type of export for Grafana), or is there a way to do this inside docker-compose.yml?
EDIT:
I also read about volumes, but is there a standard way to store everything?
In the link provided by @DannyB, setting volumes to ./postgres-data:/var/lib/postgresql instead of ./postgres-data:/var/lib/postgresql/data caused the container to not store the actual folder.
My question is: must every image follow a particular pattern like the one above? Is the data path to mount documented in every Docker image's README? Or is there something like:
volumes:
  - ./my_image_root:/
Docker provides volumes as the way to persist data between container invocations and to share data between containers.
They are quite simple to declare and use in compose files:
volumes:
  postgres:
  grafana:
services:
  db:
    image: postgres
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_PASSWORD=my_secret_password
    volumes:
      - postgres:/var/lib/postgresql/data
  grafana:
    image: grafana/grafana
    depends_on:
      - db
    volumes:
      - grafana:/var/lib/grafana
    ports:
      - "3000:3000"
Optionally, you can also use a local directory as your container volume, with the added convenience that the files are easily accessible from outside the container too. This is especially helpful for mounting specific config files at their location in the container: you can edit the file locally like any other file and restart the container to pick up the updated configuration (certificates and similar files also make good use of this option). You do that like so:
volumes:
  - /home/myusername/postgres_data/:/var/lib/postgresql/data/
P.S. I have omitted the container_name and version directives from this compose file because (as of Docker 20.10) the Compose specification determines the version automatically, and docker compose exposes enough functionality that addressing containers directly by short names usually isn't necessary.
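The two styles above differ only in the source part of the mount: postgres:/var/lib/postgresql/data uses a named volume, which must also be declared under the top-level volumes key, while /home/myusername/postgres_data/:/var/lib/postgresql/data/ is a bind mount of a host directory. Below is a minimal Python sketch of that rule, assuming (as Compose does for short syntax) that sources starting with /, . or ~ are host paths and anything else is a volume name; volume_source_kind and undeclared_named_volumes are hypothetical helpers, and the services dict is a hand-written stand-in for the parsed YAML.

```python
def volume_source_kind(source):
    """Classify a short-syntax volume source as a bind mount or a named volume.

    Host paths start with '/', '.' or '~'; anything else is a volume name.
    (Windows paths are not handled in this sketch.)
    """
    if source.startswith(("/", ".", "~")):
        return "bind"
    return "named"


def undeclared_named_volumes(services, top_level_volumes):
    """Return named volumes used by services but missing from the top-level volumes key."""
    missing = set()
    for service in services.values():
        for spec in service.get("volumes", []):
            parts = spec.split(":")
            if len(parts) < 2:
                continue  # anonymous volume: nothing to declare
            source = parts[0]
            if volume_source_kind(source) == "named" and source not in top_level_volumes:
                missing.add(source)
    return sorted(missing)
```

Compose refuses to start a file whose services use a named volume that isn't declared at the top level, which is why the answer lists postgres and grafana there; bind mounts need no declaration.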
I have a docker-compose file to build a web server with Django and a Postgres database. It basically looks like this:
version: '3'
services:
  server:
    build:
      context: .
      dockerfile: ./docker/server/Dockerfile
    image: backend
    volumes:
      - ./api:/app
    ports:
      - 8000:8000
    depends_on:
      - postgres
      - redis
    environment:
      - PYTHONUNBUFFERED=1
  postgres:
    image: kartoza/postgis:11.0-2.5
    volumes:
      - pg_data:/var/lib/postgresql/data:rw
    environment:
      POSTGRES_DB: "gis,backend"
      POSTGRES_PORT: "5432"
      POSTGRES_USER: "user"
      POSTGRES_PASS: "pass"
      POSTGRES_MULTIPLE_EXTENSIONS: "postgis,postgis_topology"
    ports:
      - 5432:5432
  redis:
    image: "redis:alpine"
volumes:
  pg_data:
I'm using a volume to make my data persistent.
I managed to run my containers and add data to the database. A volume has been created successfully, as docker volume ls shows:
DRIVER    VOLUME NAME
local     server_pg_data
But this volume is empty, as the output of docker system df -v shows:
Local Volumes space usage:
VOLUME NAME       LINKS   SIZE
server_pg_data    1       0B
Also, if I want or need to rebuild the containers using docker-compose down and docker-compose up, the data is purged from my database. Yet I thought volumes were used to make data persistent on disk…
I must be missing something in the way I'm using Docker and volumes, but I don't see what:
why does my volume appear empty while there is data in my postgres container?
why doesn't my data persist after doing docker-compose down?
This thread (How to persist data in a dockerized postgres database using volumes) looked similar but the solution does not seem to apply.
The kartoza/postgis image isn't configured the same way as the standard postgres image. Its documentation notes (under "Cluster Initializations"):
By default, DATADIR will point to /var/lib/postgresql/{major-version}. You can instead mount the parent location like this: -v data-volume:/var/lib/postgresql
If you look at the Dockerfile on GitHub, you will also see that parent directory declared as a VOLUME, which has some interesting semantics here.
With the setting you show, the actual data will be stored in /var/lib/postgresql/11.0; you're mounting the named volume on a different directory, /var/lib/postgresql/data, which is why it stays empty. Changing the volume mount to just /var/lib/postgresql should address this:
volumes:
  - pg_data:/var/lib/postgresql:rw  # not .../data
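Whether a mount captures the data comes down to one question: is the server's data directory equal to the mount target or somewhere underneath it? A minimal Python sketch of that test with pathlib, using the DATADIR the answer gives for this image (/var/lib/postgresql/11.0); mount_covers is a hypothetical helper.

```python
from pathlib import PurePosixPath


def mount_covers(mount_target, data_dir):
    """True if files written under data_dir land inside the mount at mount_target."""
    data = PurePosixPath(data_dir)
    target = PurePosixPath(mount_target)
    # Covered if the mount is the data dir itself or one of its ancestors.
    return data == target or target in data.parents


# DATADIR for kartoza/postgis:11.0-2.5, per the answer above.
DATADIR = "/var/lib/postgresql/11.0"
```

A mount on /var/lib/postgresql/data is a sibling of the data directory, so nothing is written into the volume, which matches the 0B size reported by docker system df -v. Mounting the parent /var/lib/postgresql covers it.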
The tutorial is at docker-curriculum
I am having trouble understanding the difference between volumes in this docker-compose.yml in the tutorial:
version: "3"
services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
    container_name: es
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200
    volumes:
      - esdata1:/usr/share/elasticsearch/data
  web:
    build: . # replaced image with build
    command: python app.py
    environment:
      - DEBUG=True # set an env var for flask
    depends_on:
      - es
    ports:
      - "5000:5000"
    volumes:
      - ./flask-app:/opt/flask-app
volumes:
  esdata1:
    driver: local
There are volumes under web, es and then by itself with esdata1: and driver: local underneath it. My newbie mind understands the ones under web and es to be mounts of external data to a directory within each container. Then the last volume is putting a persistent volume on the host machine that will be there even when containers are killed. In this case, it is esdata1: data that will persist. My next question is, what does the driver: local mean?
I just received advice from a mentor and yes, my guess is partially correct: Docker will create a directory under /var/lib/docker/volumes, and that directory can be mounted into containers. This persistent volume becomes the location where data from processes inside containers (e.g. MySQL, Elasticsearch) is permanently stored.
(I'm guessing it stays there until you delete everything with the -v option... just a guess.)
The volume under web is a directory created by the programmer, i.e. the same subdirectory that contains app.py. It is mounted into the container so that app.py can be rewritten on the fly: whatever changes are made on the local machine are reflected in the container.
As a newbie to containers and Docker, this all seems very efficient although the learning curve is somewhat steep for me. Please see above comments with link to another question for answer to the second question.
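To see where a volume like esdata1 ends up with driver: local, here is a minimal Python sketch of the naming convention visible in this thread: Compose prefixes volumes declared in the file with the project name (compare pg_data appearing as server_pg_data earlier, assuming that project was named server), and the local driver keeps the files under /var/lib/docker/volumes/<name>/_data. Both helpers are hypothetical. The sketch assumes a Linux host with Docker's default data root and a project name that Compose has already normalized (by default it is derived from the directory name). On Docker Desktop the path lives inside Docker's VM, so always confirm with docker volume inspect <name>, whose Mountpoint field shows the real location.

```python
from pathlib import PurePosixPath

# Default location on a Linux host; differs with a custom data-root or Docker Desktop.
DOCKER_VOLUMES_ROOT = PurePosixPath("/var/lib/docker/volumes")


def compose_volume_name(project, volume):
    # Volumes declared in a compose file are prefixed with the project name.
    return f"{project}_{volume}"


def local_volume_path(volume_name, root=DOCKER_VOLUMES_ROOT):
    # Where the default "local" driver stores a volume's files on the host.
    return root / volume_name / "_data"
```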
I have a Docker container that runs a simple web application. That container is linked to two other containers by Docker Compose with the following docker-compose.yml file:
version: '2'
services:
  mongo_service:
    image: mongo
    command: mongod
    ports:
      - '27017:27017'
  tomcat_service:
    image: 'bitnami/tomcat:latest'
    ports:
      - '8080:8080'
  web:
    # gain access to linked containers
    links:
      - mongo_service
      - tomcat_service
    # explicitly declare service dependencies
    depends_on:
      - mongo_service
      - tomcat_service
    # set environment variables
    environment:
      PYTHONUNBUFFERED: 'true'
    # use the image from the Dockerfile in the cwd
    build: .
    ports:
      - '8000:8000'
Once the web container starts, I want to write some content to /bitnami/tomcat/data/ on the tomcat_service container. I tried just writing to that disk location from within the web container but am getting an exception:
No such file or directory: '/bitnami/tomcat/data/'
Does anyone know what I can do to be able to write to the tomcat_service container from the web container? I'd be very grateful for any advice others can offer on this question!
You have to use Docker volumes if you want one service to write to another service's files. If web writes to /someFolderName, the same files will appear under /bitnami/tomcat/data/ in tomcat_service.
version: '2'
services:
  tomcat_service:
    image: 'bitnami/tomcat:latest'
    volumes:
      - my_shared_data:/bitnami/tomcat/data/
  web:
    build: . # plus the other keys from the question
    volumes:
      - my_shared_data:/someFolderName
volumes:
  my_shared_data:
Data in volumes persists and will still be available the next time you re-create the containers. You should use Docker volumes whenever containers write data that needs to be shared or kept.
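On the web side, nothing special is needed beyond writing under the mount point. Here is a minimal Python sketch of code running in the web container. The SHARED_DIR environment variable and write_shared_file are hypothetical additions; the default path /someFolderName is the mount target from the compose file above. Anything written there shows up in tomcat_service under /bitnami/tomcat/data/.

```python
import os
from pathlib import Path

# Mount point of my_shared_data in the web service (overridable for local testing).
SHARED_DIR = Path(os.environ.get("SHARED_DIR", "/someFolderName"))


def write_shared_file(name, content, shared_dir=SHARED_DIR):
    """Write a file into the shared volume and return its path."""
    shared_dir.mkdir(parents=True, exist_ok=True)  # no-op when the volume is mounted
    path = shared_dir / name
    path.write_text(content, encoding="utf-8")
    return path
```

The original attempt failed because /bitnami/tomcat/data/ only exists in the tomcat_service container's filesystem. The shared volume gives both containers a directory that points at the same data.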