Undo Dockerfile VOLUME directive from a base image [duplicate] - docker

This question already has answers here:
Docker base image includes a volume. How can I stop mounting it in my derived image
(2 answers)
Closed 2 years ago.
I have an image derived from the Postgres official image, whose Dockerfile includes the following:
VOLUME /var/lib/postgresql/data
I'd like to create my own image based on this official image, but I don't want it to reference any volume. I'd like the Postgres data to be inside my image.
Any ideas please?

You can't undo the VOLUME directive.
See this open issue on github: Reset properties inherited from parent image #3465
There are some solutions, based on the answers to these questions:
"Remove" a VOLUME in a Dockerfile
How to remove a volume in a Dockerfile
How to remove configure volumes in docker images
You can copy the base image and remove the VOLUME manually.
As a workaround, see docker-copyedit. The script will docker save an image into an archive, modify its metadata, and docker load it back into an image.
If you don't care about the volume and just want to put the data in the image, you'll have to use another location for the data, and you can use the environment variable PGDATA to define the new location.
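For example, a derived image might relocate the data directory like this (the postgres tag and the new path are illustrative; PGDATA is honored by the official image's entrypoint):

```dockerfile
FROM postgres:15
# Point PGDATA outside the inherited VOLUME path /var/lib/postgresql/data
# so the database files are not stored in the anonymous volume.
ENV PGDATA=/var/lib/postgresql/pgdata
```

Note that the inherited VOLUME declaration still exists, so an (empty) anonymous volume will be created at the old path; the actual data now lives under PGDATA, where it can be baked into the image.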

Related

How to switch from docker named volumes to path based volumes without losing the container data? [duplicate]

This question already has answers here:
How to copy data from docker volume to host?
(2 answers)
Closed 2 years ago.
I'm running several Docker containers on my Raspberry Pi. All containers use named volumes to store persistent data. But since I often need to edit config files etc. in the Docker volumes, I'd prefer to use path-based volumes instead of named volumes managed by Docker.
I first thought that I could just copy all the content from
/var/lib/docker/volumes/
to a folder in my home directory, remove all containers, and rerun them with the new path-based volumes.
But unfortunately this doesn't seem to work. For example, if I rerun Portainer with the new path-based volume (which is just the folder I copied from /var/lib/docker/volumes/), I need to create a new user etc., as if Portainer could not use the copied data. I already used chown to give the current user permissions.
Hope someone can help.
OK, I've noticed that every named volume's folder contains an extra folder called "_data". I had to copy the content of those underlying folders to the proper target folders to be able to run the containers with the data of the named volumes.
I still don't know if this is the "correct way" of migrating from named to path-based volumes, but at least it works now.
Why does Docker even create an extra folder named _data instead of just using the folder of the named volume?
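A sketch of that copy step (the volume name, target folder, and image are just examples); `cp -a` plus the trailing `/.` preserves hidden files, ownership, and permissions:

```shell
# The real files of a named volume live under its _data subfolder
sudo cp -a /var/lib/docker/volumes/portainer_data/_data/. /home/pi/volumes/portainer/

# Then recreate the container with a bind mount to the copied folder
docker run -d --name portainer -v /home/pi/volumes/portainer:/data portainer/portainer-ce
```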

How to create a docker volume from a recipe?

I have a bash script recipe that creates large assets. I'd like to create them once and use them in different Docker containers. I'm assuming the best way to handle this is to create a Docker volume containing these assets, but how do I do that? I'd prefer not to copy files into the volume directly from the host, as that's not really version controlled. Can this be achieved using Dockerfiles?
You can use docker-compose.yml to set up volume sharing between containers, and you can declare mount points in the respective Dockerfiles using the VOLUME instruction. A more detailed explanation is in the Volumes documentation.
I think this answers my question. In summary:
Create a Dockerfile that's along these lines:
FROM ubuntu
RUN mkdir /dataset
RUN ***populate /dataset***
VOLUME /dataset
Build the docker image
Create a container from that image
Mount /dataset in any container you need using the --volumes-from option
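Putting those steps together on the command line (the image, container, and consumer names are made up):

```shell
# 1. Build the image that bakes the assets into /dataset
docker build -t dataset-img .

# 2. Create a (never-started) container so its volume exists
docker create --name dataset-holder dataset-img

# 3. Mount /dataset into any consumer container
docker run --rm --volumes-from dataset-holder my-app ls /dataset
```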

What is the purpose of Dockerfile command "Volume"?

When a Dockerfile contains the VOLUME instruction, say VOLUME [/opt/apache2/www, ...] (hoping this path exists in a real installation), it means this path is going to be mounted to something (right?). And this VOLUME instruction applies to the image, not to just one instance of it (a container) but to every instance.
Anyway, irrespective of whether an image has a VOLUME defined or not, when starting a container the run command can create a volume by mapping a local host path to a container path.
docker run --name understanding_volumes -v /localhost/path1:/opt/apache2/www -v /localhost/path2:/any/container/path image_name
The above should make it clear that though /any/container/path is not defined as a VOLUME in the Dockerfile, we are able to mount it when running the container.
That said, this SO question throws some light on it - What is the purpose of defining VOLUME mount points within DockerFile rather than adhoc cmd-line -v?. One benefit of the VOLUME instruction mentioned there is that other containers can benefit from it, using --from-container (I could not find this option in docker run --help; not sure if the answer meant --volumes-from). Anyway, the mount point is thus accessible to other containers in some kind of automatic way. Great.
My first question is: is the other volume path /any/container/path, mounted onto the container understanding_volumes, also available to the second container using --from-container or --volumes-from (whichever option is correct)?
My next question is: is the use of the VOLUME instruction just to let other containers link to this path, that is, to make the data on /opt/apache2/www available to other containers through easy linking? So it's just sharing out. Or can data be made available to the first container too?
Defining a volume in a Dockerfile has the advantage of specifying the volume location inside the image definition as documentation from the image creator to the user of the image. That's just about the only upside.
It was added to docker very early on, quite possibly when data containers were the only way to persist data. We now have a solution for named volumes that has obsoleted data containers. We have also added the compose file to define how containers are run in an easy to understand and reuse syntax.
While there is the one upside of self documented images, there are quite a few downsides, to the point that I strongly recommend against defining a volume inside the image to my clients and anyone publishing images for general reuse:
The volume is forced on the end user, there's no way to undefine a volume in the image.
If the volume is not defined at runtime (with a -v or compose file), the user will see anonymous volumes in their docker volume ls that have no association to what created them. These are almost always useless wastes of disk space.
They break the ability to extend the image since any changes to a volume in an image after the VOLUME line are typically ignored by docker. This means a user can never add their own initial volume data, which is very confusing because docker gives no warning that it is ignoring the user changes during the image build.
If you need a volume as a user at runtime, you can always define it with a -v or compose file, even if that volume is not defined in the Dockerfile. Many users have the misconception that you must define it in the image to be able to make it a named volume at runtime.
The ability to use --volumes-from is unaffected by defining the volume in the image, but I'd encourage you to avoid this capability. It does not exist in swarm mode, and you can get all the same capabilities along with more granularity by using a named volume that you mount in two containers.
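The named-volume equivalent of --volumes-from is simply mounting the same volume in both containers (the volume, container, and image names here are illustrative):

```shell
docker volume create shared-www
docker run -d --name producer -v shared-www:/opt/apache2/www httpd
docker run --rm -v shared-www:/opt/apache2/www busybox ls /opt/apache2/www
```

This works in swarm mode and lets you mount only the volumes each container actually needs, rather than inheriting all of another container's mounts.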

Docker making image with Dockerfile and permanent helper files in its volume

New to Docker.
Is there a way to create a Docker image with some helper files that are permanently available in a certain folder of the container, without having to copy them from the host machine each time we build the image, since I may build on a host that doesn't contain these files?
Any help will be greatly appreciated
Yes, you can first create a base image where these files are placed, and push this image to a repository. After that you can create other images based on the first image.
Let me explain the idea with an example.
The base image has this Dockerfile:
FROM ubuntu:16.04
...
COPY /my_big_files /my_big_files/
Build this image with the tag my_image_with_files:latest and push it to your repository.
Other images based on the first image can then be built on another PC.
Dockerfile:
FROM my_image_with_files:latest
...
RUN ls /my_big_files/ # <- your files are already there!
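The build-and-push side of this might look like the following (the registry name is a placeholder):

```shell
docker build -t registry.example.com/my_image_with_files:latest .
docker push registry.example.com/my_image_with_files:latest
```

Any machine that can pull from the registry can then build derived images without having the original files on its own disk.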

Deploy a docker app using volume create

I have a Python app using a SQLite database (it's a data collector that runs daily by cron). I want to deploy it, probably on AWS or Google Container Engine, using Docker. I see three main steps:
1. Containerize and test the app locally.
2. Deploy and run the app on AWS or GCE.
3. Backup the DB periodically and download back to a local archive.
Recent posts (on Docker, Stack Overflow and elsewhere) say that since 1.9, volumes are now the recommended way to handle persisted data, rather than the "data container" pattern. For future compatibility, I always like to use the preferred, idiomatic method; however, volumes seem to be much more of a challenge than data containers. Am I missing something?
Following the "data container" pattern, I can easily:
Build a base image with all the static program and config files.
From that image create a data container image and copy my DB and backup directory into it (simple COPY in the Dockerfile).
Push both images to Docker Hub.
Pull them down to AWS.
Run the data and base images, using "--volumes-from" to refer to the data.
Using "docker volume create":
I'm unclear how to copy my DB into the volume.
I'm very unclear how to get that volume (containing the DB) up to AWS or GCE... you can't PUSH/PULL a volume.
Am I missing something regarding Volumes?
Is there a good overview of using Volumes to do what I want to do?
Is there a recommended, idiomatic way to backup and download data (either using the data container pattern or volumes) as per my step 3?
When you first mount an empty named volume, it receives a copy of the image's data at that location (unlike a host-based volume, which completely overlays the mount point with the host directory). So you can initialize the volume contents in your main image, upload that image to your registry, pull that image down to your target host, create a named volume on that host, and point your image to that named volume (using docker-compose makes the last two steps easy; it's really at most two commands: docker volume create <vol-name> and docker run -v <vol-name>:/mnt <image>), and it will be populated with your initial data.
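As a compose sketch of those last steps (the service, image, mount path, and volume names are assumptions):

```yaml
services:
  collector:
    image: registry.example.com/collector:latest
    volumes:
      - app-data:/var/lib/app   # named volume; populated from the image on first use
volumes:
  app-data:
```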
Retrieving the data from a container-based volume or a named volume is an identical process: you mount the volume in a container and run an export/backup to your outside location. The only difference is in the command line; instead of --volumes-from <container-id> you have -v <vol-name>:/mnt. You can use this same process to import data into the volume as well, removing the need to initialize the app image with data in its volume.
The biggest advantage of the new process is that it clearly separates data from containers. You can purge all the containers on the system without fear of losing data, and any volumes listed on the system are clear in their name, rather than a randomly assigned name. Lastly, named volumes can be mounted anywhere on the target, and you can pick and choose which of the volumes you'd like to mount if you have multiple data sources (e.g. config files vs databases).
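The export/backup step described above is commonly done by mounting the volume alongside a host directory in a throwaway container (the volume name app-data is an assumption):

```shell
# Archive the volume's contents into the current host directory
docker run --rm -v app-data:/mnt -v "$PWD":/backup busybox \
    tar czf /backup/app-data.tar.gz -C /mnt .

# Restore into a (possibly new) volume the same way
docker run --rm -v app-data:/mnt -v "$PWD":/backup busybox \
    tar xzf /backup/app-data.tar.gz -C /mnt
```

The resulting tar.gz is an ordinary file, so it can be downloaded from AWS/GCE with scp, gsutil, or similar for your step 3.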
