Docker - How to access a volume not attached to a container?

I have (had) a data container which has a volume used by other containers (--volumes-from).
The data container has accidentally been removed.
Thankfully the volume was not removed.
Is there any way I can re run the data container and point it BACK to this volume?

Is there any way I can re run the data container and point it BACK to this volume?
Sure, I detailed it in "How to recover data from a deleted Docker container? How to reconnect it to the data?"
You need to create a new container with the same VOLUME (its path under /var/lib/docker/volumes/... will be empty or contain only the image's initial content).
Then move your legacy volume's data to the path of the new container's volume.
More generally, whenever I start a data volume container, I register its volume path in a file (so I can reuse that path later if the container is accidentally removed).
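For example, a minimal sketch of that bookkeeping (assuming the data container is named datacontainer and its volume is mounted at /data inside it):
# Record the host path backing the /data volume right after creating the container,
# so it can be found again even if the container is removed.
docker inspect -f '{{ range .Mounts }}{{ if eq .Destination "/data" }}{{ .Source }}{{ end }}{{ end }}' datacontainer >> volume-paths.txt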

Not entirely sure but you might try
docker run -i -t --volumes-from yourvolume ubuntu /bin/bash
You should then be able to access the directory, I think.

When I came to this question, my main concern was data loss. Here is how I copied data from a volume to AWS S3:
# Create a dummy container - I like Python
host$ docker run -it -v my_volume:/datavolume1 python:3.7-slim bash
# Prepare AWS stuff
# executing 'cat ~/.aws/credentials' on your development machine
# will likely show them
python:3.7-slim$ pip install awscli
python:3.7-slim$ export AWS_ACCESS_KEY_ID=yourkeyid
python:3.7-slim$ export AWS_SECRET_ACCESS_KEY=yoursecretaccesskey
# Copy
python:3.7-slim$ aws s3 cp /datavolume1/thefile.zip s3://bucket/key/thefile.zip
Alternatively you can use aws s3 sync.
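For example, a sketch using the same placeholder bucket and volume as above:
# Mirror the whole volume instead of a single file
python:3.7-slim$ aws s3 sync /datavolume1/ s3://bucket/key/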
MySQL / MariaDB
My specific example was about MySQL / MariaDB. If you want to back up a MySQL / MariaDB database, just execute:
$ mysqldump -u [username] -p [database_name] \
--single-transaction --quick --lock-tables=false \
> db1-backup-$(date +%F).sql
You might also want to consider
--skip-add-drop-table: omit the DROP TABLE statements that mysqldump adds before each CREATE TABLE by default, so existing tables are not dropped when you restore.
--complete-insert: include the column names in every INSERT statement. Without this flag, a column-order mismatch between the dump and the target table can corrupt the restore.
To restore the backup:
$ mysql -u [username] -p [database_name] < [filename].sql
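If the database itself runs in a container, you can run the same commands through docker exec; here is a sketch, where mariadb is a placeholder container name and the options mirror the dump above:
# Dump from inside the container to a file on the host
$ docker exec mariadb sh -c 'mysqldump -u [username] -p"[password]" [database_name] --single-transaction --quick --lock-tables=false' > db1-backup-$(date +%F).sql
# Restore by piping the dump back into the container (-i keeps stdin open)
$ docker exec -i mariadb sh -c 'mysql -u [username] -p"[password]" [database_name]' < [filename].sql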

Related

Google Cloud docker image file getting deleted

I am running a Docker image for Jupyter and TensorBoard. The data seems to get deleted every time the VM instance is stopped. Is there a way to stop this from happening? I couldn't find anything on the web that would allow me to do this.
TL;DR: You are not persisting your data.
Docker containers do not persist data out of the box; you need to explicitly tell Docker to persist any data created inside the container so that it survives when the container is deleted.
You can read more at Use volumes page at Docker documentation.
If you want to persist data, you need to follow these steps:
Create a local directory inside the VM where you want to persist data. This command should be executed on the GCE instance:
mkdir -p /opt/data/jupyterdata
Set the correct ownership of the folder to the user ID that the user inside your container uses. For example, let's imagine that your container lspvic/tensorboard-notebook runs the application as the user tensorflow with UID 1500. You then need to set the ownership of your folder to UID 1500:
chown 1500:1500 /opt/data/jupyterdata -R
Modify your docker run command to mount the local directory as a volume inside the container. For example, let's imagine that inside your container you want to save the files at /var/lib/jupyter (this is an example); you would modify the docker run command as follows:
docker run -it --rm -p 8888:8888 \
-v /opt/data/jupyterdata:/var/lib/jupyter:Z \
lspvic/tensorboard-notebook
NOTE: the :Z option is needed to avoid SELinux issues.
With these steps, data saved in /var/lib/jupyter inside the container is persisted to /opt/data/jupyterdata on the VM, so it is no longer lost when the instance stops.

Docker NodeRed committed container does not maintain flows and modules

I'm working on a project using NodeRed deployed with Docker, and I would like to save the state of my deployment, including flows, settings and newly added modules, so that I can save the image and load it on another host, replicating exactly the same NodeRed instance.
I created the container using:
docker run -itd --name my-nodered node-red
After implementing the flows and installing some custom modules, with the container running I used this command:
docker commit my-nodered my-project-nodered/my-nodered:version1
docker save my-project-nodered/my-nodered:version1 > tar-archive.tar.gz
And on another machine I imported the image using:
docker load < tar-archive.tar.gz
And run it using:
docker run -itd my-project-nodered/my-nodered:version1
And I obtain a vanilla NodeRed docker container with a default /data directory; only the default files are there, and my flows and modules are not maintained.
What am I missing? Could it be that my /data directory is overwritten, as well as my settings.js file in the home directory? And in that case, what is the best practice to achieve my goal?
Thanks a lot in advance.
commit will not work here, because there is a volume defined in the Dockerfile:
# User configuration directory volume
VOLUME ["/data"]
That makes it impossible to create a derived image with any different content in that directory tree. (This is the same reason you can't create a mysql or postgresql image with prepopulated data.)
docker commit doesn't consider volumes at all, so you'll get an unchanged image with nothing preloaded in it.
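If your original container still exists, one workaround (a sketch; my-nodered is the container name from the question) is to copy the volume contents out of it as plain files and carry those to the new host:
# Copy the user data (flows, settings.js, installed nodes) out of the running container
docker cp my-nodered:/data ./backup-config
# On the new host, mount that directory back in as /data, e.g. with the run command shown further below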
You can see the official documentation:
Managing User Data
Once you have Node-RED running with Docker, we need to ensure any
added nodes or flows are not lost if the container is destroyed. This
user data can be persisted by mounting a data directory to a volume
outside the container. This can either be done using a bind mount or a
named data volume.
Node-RED uses the /data directory inside the container to store user
configuration data.
nodered-user-data-in-docker
One way is to restore your config files on another machine, for example from a backup-config directory, then:
docker run -it -p 1880:1880 -v $PWD/backup-config/:/data --name mynodered nodered/node-red-docker
Or, if you want to pull the flows from some repo, you can try something like:
wget -O flows.json https://raw.githubusercontent.com/openenergymonitor/oem_node-red/master/flows_emonpi.json
docker run -it --rm -v "$PWD":/data nodered/node-red-docker

What is the use of VOLUME in this official Dockerfile of postgresql

I found the following code in the Dockerfile of official postgresql. https://github.com/docker-library/postgres/blob/master/11/Dockerfile
ENV PGDATA /var/lib/postgresql/data
RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA" # this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)
VOLUME /var/lib/postgresql/data
I want to know what is the purpose of VOLUME in this regard.
VOLUME /var/lib/postgresql/data
As per my understanding, it will create a new storage volume when we start a container, and that storage volume will be deleted permanently when the container is removed (docker stop containerid; docker rm containerid).
Then, if the data is not going to persist, why use this at all? VOLUME is supposed to be used when we want the data to persist.
My question is: what is its use if the postgres data only remains while the container exists and everything is wiped out after that? If I have done a lot of work and in the end everything is gone, then what's the use?
As per my understanding, it will create a new storage volume when we start a container, and that storage volume will be deleted permanently when the container is removed (docker stop containerid; docker rm containerid).
If you run a container with the --rm option, anonymous volumes are deleted when the container exits. If you do not pass the --rm option when creating the container, then the -v option to docker container rm will also delete volumes. Otherwise, these anonymous volumes will persist after a stop/rm.
That said, anonymous volumes are difficult to manage since it's not clear which volume contains what data. Particularly with images like postgresql, I would prefer if they removed the VOLUME line from their Dockerfile, and instead provided a compose file that defined the volume with a name. You can see more about what the VOLUME line does and why it creates problems in my answer over here.
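A minimal compose sketch of that alternative (service and volume names are illustrative, not taken from the official image):
version: "2"
services:
  db:
    image: postgres:11
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata: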
Your understanding of how volumes work is almost correct, but not completely.
As you stated, when you create a container from an image defining a VOLUME, docker will indeed create an anonymous volume (i.e. with a random name).
When you stop/remove the container the volume itself will not be deleted and will still be accessible by the docker volume family of commands.
Indeed, in order to remove a container and delete its associated volumes you have to use the -v flag, as in docker rm -v container-name. This command removes the container and deletes all the anonymous volumes associated with it (named volumes are never deleted unless explicitly requested via docker volume rm volume-name).
So, to sum up, the VOLUME directive inside a Dockerfile is used to identify the places that will host persistent data, ensuring the following:
data will survive the life of the container by default
data can be shared with other containers (i.e. --volumes-from)
The most important aspect to me is that it also serves as a sort of implicit documentation for your user to let them know where the persistent state is kept (so that they can name the volumes via the -v flag of docker run).
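For example, a sketch of naming the volume yourself so it is easy to find and reuse later (pg and pgdata are arbitrary names chosen here):
docker run -d --name pg -e POSTGRES_PASSWORD=example -v pgdata:/var/lib/postgresql/data postgres:11
docker rm -f pg                 # the container is gone...
docker volume inspect pgdata    # ...but the named volume survives and can be reattached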

How to migrate Docker volume between hosts?

Docker's documentation states that volumes can be "migrated" - which I'm assuming means that I should be able to move a volume from one host to another host. (More than happy to be corrected on this point.) However, the same documentation page doesn't provide information on how to do this.
Digging around on SO, I have found an older question (circa 2015-ish) that states that this is not possible, but given that it's 2 years on, I thought I'd ask again.
In case it helps, I'm developing a Flask app that uses TinyDB + local disk as its data storage - I have determined that I didn't need anything more fancy than that; this is a project done for learning at the moment, so I've decided to go extremely lightweight. The project is structured as such:
/project_directory
|- /app
|  |- __init__.py
|  |- ...
|- run.py  # assumes `data/databases/` and `data/files/` are present
|- Dockerfile
|- data/
|  |- databases/
|  |  |- db1.json
|  |  |- db2.json
|  |- files/
|     |- file1.pdf
|     |- file2.pdf
I have the folder data/* inside my .dockerignore and .gitignore, so that they are not placed under version control and are ignored by Docker when building the images.
While developing the app, I am also trying to work with database entries and PDFs that are as close to real-world as possible, so I seeded the app with a very small subset of real data, that are stored on a volume that is mounted directly into data/ when the Docker container is instantiated.
What I want to do is deploy the container on a remote host, but have the remote host seeded with the starter data (ideally, this would be the volume that I've been using locally, for maximal convenience); later on as more data are added on the remote host, I'd like to be able to pull that back down so that during development I'm working with up-to-date data that my end users have entered.
Looking around, the "hacky" way I'm thinking of doing is simply using rsync, which might work out just fine. However, if there's a solution I'm missing, I'd greatly appreciate guidance!
The way I would approach this is to generate a Docker container that stores a copy of the data you want to seed your development environment with. You can then expose the data in that container as a volume, and finally mount that volume into your development containers. I'll demonstrate with an example:
Creating the Data Container
Firstly we're just going to create a Docker container that contains your seed data and nothing else. I'd create a Dockerfile at ~/data/Dockerfile and give it the following content:
FROM alpine:3.4
ADD . /data
VOLUME /data
CMD /bin/true
You could then build this with:
docker build -t myproject/my-seed-data .
This will create you a Docker image tagged as myproject/my-seed-data:latest. The image simply contains all of the data you want to seed the environment with, stored at /data within the image. Whenever we create an instance of the image as a container, it will expose all of the files within /data as a volume.
Mounting the volume into another Docker container
I imagine you're running your Docker container something like this:
docker run -d -v $(pwd)/data:/data your-container-image <start_up_command>
You could now extend that to do the following:
docker run -d --name seed-data myproject/my-seed-data
docker run -d --volumes-from seed-data your-container-image <start_up_command>
What we're doing here is first creating an instance of your seed data container. We're then creating an instance of the development container and mounting the volumes from the data container into it. This means that you'll get the seed data at /data within your development container.
This gets to be a little bit of a pain now that you need to run two commands, so we could go ahead and orchestrate it a bit better with something like Docker Compose.
Simple Orchestration with Docker Compose
Docker Compose is a way of running more than one container at the same time. You can declare what your environment needs to look like and do things like define:
"My development container depends on an instance of my seed data container"
You create a docker-compose.yml file to layout what you need. It would look something like this:
version: "2"
services:
  seed-data:
    image: myproject/my-seed-data:latest
  my_app:
    build: .
    volumes_from:
      - seed-data
    depends_on:
      - seed-data
You can then start all containers at once using docker-compose up -d my_app. Docker Compose is smart enough to firstly start an instance of your data container, and then finally your app container.
Sharing the Data Container between hosts
The easiest way to do this is to push your data container as an image to Docker Hub. Once you have built the image, it can be pushed to Docker Hub as follows:
docker push myproject/my-seed-data:latest
It's very similar in concept to pushing a Git commit to a remote repository, except in this case you're pushing a Docker image. What this does mean, however, is that any environment can now pull this image and use the data contained within it. That means you can re-generate the data image when you have new seed data, push it to Docker Hub under the :latest tag, and when you restart your dev environment it will have the latest data.
To me this is the "Docker" way of sharing data and it keeps things portable between Docker environments. You can also do things like have your data container generated on a regular basis by a job within a CI environment like Jenkins.
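On the other host, a sketch of pulling and reusing that seed data would look like this (image and container names follow the example above):
docker pull myproject/my-seed-data:latest
docker run -d --name seed-data myproject/my-seed-data
docker run -d --volumes-from seed-data your-container-image <start_up_command>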
According to the Docker docs, you could also create a backup and restore it:
Backup volume
docker run --rm --volumes-from CONTAINER -v \
$(pwd):/backup ubuntu tar cvf /backup/backup.tar /MOUNT_POINT_OF_VOLUME
Restore volume from backup on another host
docker run --rm --volumes-from CONTAINER -v \
$(pwd):/backup ubuntu bash -c "cd /MOUNT_POINT_OF_VOLUME && \
tar xvf /backup/backup.tar --strip-components=1"
OR (what I prefer) just copy it to local storage
docker cp --archive CONTAINER:/MOUNT_POINT_OF_VOLUME ./LOCAL_FOLDER
then copy it to the other host and start with e.g.
docker run -v $(pwd)/LOCAL_FOLDER:/MOUNT_POINT_OF_VOLUME some_image
You can use this trick:
docker run --rm -v <SOURCE_DATA_VOLUME_NAME>:/from alpine ash -c "cd /from ; tar -cf - . " | ssh <TARGET_HOST> 'docker run --rm -i -v <TARGET_DATA_VOLUME_NAME>:/to alpine ash -c "cd /to ; tar -xpvf - " '
more information

Change volume configuration in docker-compose without losing the data

My docker-compose has a data container which isn't mapped to a local directory in the host machine, and I want to change it from:
volumes:
  - /var/www/html
to
volumes:
  - /html:/var/www/html
But when I restart the container, it removes the current data container and replaces it with a new one.
I know that the data is actually still there, but is there an easy way to do this without creating a new data container?
My docker-compose version is 1.7.1 (under boot2docker).
Thanks.
Try at your own risk:
create your host directory /html as you wish
docker inspect {container_name} | grep Source and grab your volume path on the host system. It'll be something like /var/lib/docker/volumes/abdb15a2eff[...]/_data
copy the content of that directory to your host directory
recreate the container as you wish.
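A hedged sketch of those steps (web is a placeholder container name; the mount point /var/www/html is from the question, and on boot2docker the volume path lives inside the VM, so run this there):
mkdir -p /html
# Find the host directory backing the anonymous volume
SRC=$(docker inspect -f '{{ range .Mounts }}{{ if eq .Destination "/var/www/html" }}{{ .Source }}{{ end }}{{ end }}' web)
# Copy its contents (including hidden files) into the new bind-mount directory
cp -a "$SRC/." /html/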
One safe way to do this is to create a backup of the data from inside the Docker image. Then restore that backup to the directory on your host machine. The Docker Volumes Tutorial mentions a process like this near the bottom.
Here's how you'd do it:
First, mount a directory from your host machine into the container if you don't already have one mounted in. Maybe a volume like ./:/backup. Next, run a backup command like this:
docker-compose run service-name tar czvf /backup/html_data.tar.gz /var/www/html
Now you have html_data.tar.gz in your current directory. Extract it wherever you want and be on your way!
(I'm assuming, based on the way you indicated your volumes, that you're using docker-compose. The process is similar for vanilla Docker.)
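To finish the move described in the question, you could then extract that archive into the new bind-mount directory; a sketch, where --strip-components=3 drops the leading var/www/html path stored in the archive:
mkdir -p /html
tar xzvf html_data.tar.gz -C /html --strip-components=3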
Alternate approach, with --volumes-from
Get the name (or hash) of the container with the data you want to copy. You can do this with docker ps. For this example, let's call it container1. Now run this command to back up its data:
docker run --rm --volumes-from container1 -v $(pwd):/backup ubuntu:latest tar czvf /backup/html_data.tar.gz /var/www/html
Note that the image you use (ubuntu:latest) is not important as long as it can tar things.
