Using docker volumes in packer build - docker

Is it possible to use existing docker or external volumes in/during packer build?
I saw in https://www.packer.io/docs/builders/docker/:
"VOLUME /test1 /test2"
What does it exactly mean? "VOLUME String EX: "VOLUME FROM TO"" doesn't explain much. Is /test1 from host?
I also saw in https://www.packer.io/docs/builders/docker/#volumes:
volumes (map[string]string) - A mapping of additional volumes to mount into this container. The key of the object is the host path, the value is the container path.
How can I make use of that? Where and how do I declare it? Suppose I want to map the /etc/dnsmasq.d/ host path into the container, at build time and at run time as well.

It has the same meaning as the corresponding Dockerfile directive (indeed, all of the directives in that section of the Packer documentation are Dockerfile commands). You probably don't need or want it.
This is different from the docker run -v option to mount content into a container. You cannot specify mount options like this at container build time (whether using docker build or Packer). You don't need to specify a VOLUME to be able to mount content on some container directory.
The Dockerfile VOLUME directive isn't needed for most common uses and mostly only has confusing side effects. You do not need it to mount configuration into your application; you do not need it to overwrite application source code with a development tree; the most obvious thing it does do is prevent future RUN instructions from having an effect. I'd avoid it unless you understand in detail what it does and why you want it.
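For the run-time half of the question, a minimal sketch of mounting the host's /etc/dnsmasq.d into a container with no VOLUME declared in the image. The function name and the DOCKER override variable are my own inventions for illustration (the override only makes dry runs possible). The image name in the usage line is hypothetical.

```shell
# Assumption: DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Run an image with the host's /etc/dnsmasq.d bind-mounted read-only
# at the same path inside the container. Extra arguments are passed
# through as the container command.
run_with_dnsmasq_conf() {
    image=$1
    shift
    "$DOCKER" run --rm -v /etc/dnsmasq.d:/etc/dnsmasq.d:ro "$image" "$@"
}
```

Usage would look like `run_with_dnsmasq_conf my-packer-image`. Drop the `:ro` suffix if the container needs to write to the directory.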

Related

Docker force usage of volume

I know that you can specify a volume inside the dockerfile, but I see the problem that the user is not required to create such a volume.
What if they forget to specify a volume, and then many files, possibly expensive to create, are saved there but are not persistent because no volume was specified?
So my question is if it is possible to force the user to create a volume for that mountpoint, or at least check at start time (inside the container) if there is a volume mounted, so that it can react to the missing volume?
EDIT: With the new information that unnamed volumes are created automatically, I would also accept a user-side solution (not changing the container so that it checks the volume, but a Docker daemon setting that warns me about, or prevents me from, creating unnamed volumes by mistake).
I think the VOLUME declaration is the best you can do here.
In general, a container cannot force itself to be run with any particular options. You could make a similar argument that a container "must" be run with a published port or with an attached stdin to be useful, but Docker doesn't allow an image to force these either. (And more importantly, an image can't require direct access to the host filesystem, host networking, or privileged mode.)
As @masseyb notes in a comment, the key effect of the Dockerfile VOLUME directive is to create a new anonymous volume on the given directory if nothing else is mounted there. docker volume ls will show it, and you should be able to use the volume ID directly in docker run -v options, so you won't actually lose data here. (Surprisingly, there doesn't seem to be a command to give the volume a name.)
In principle it's possible to check some things in an entrypoint wrapper script, but that won't work well for this volume case. The container can't tell whether a directory is an automatically-created anonymous volume or a new empty named volume.
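To illustrate what such an entrypoint check can and cannot see, here is a rough sketch that only answers "is anything mounted on this directory?". It assumes a Linux container where /proc/self/mountinfo is readable, and it assumes the image does not declare VOLUME for that path (otherwise an anonymous volume is always mounted there and the check always passes). The function name and the /data path are hypothetical.

```shell
# Return 0 if $1 is a mount point according to a mountinfo-style file.
# $2 defaults to /proc/self/mountinfo; the fifth field of each line is
# the mount point. Pass the path without a trailing slash. Paths that
# contain spaces are escaped in mountinfo and are not handled here.
is_mount_point() {
    target=$1
    mountinfo=${2:-/proc/self/mountinfo}
    awk -v t="$target" '$5 == t { found = 1 } END { exit !found }' "$mountinfo"
}

# Hypothetical use in an entrypoint wrapper:
#   is_mount_point /data || echo "warning: nothing is mounted on /data" >&2
#   exec "$@"
```

This cannot tell a named volume from an anonymous one or from a bind mount; all three just look like a mount.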
(Also remember that volumes, including automatically-created anonymous volumes, are never committed to images. In your Dockerfile you can't change the directory content after you declare it a VOLUME; if an end user tries to docker commit a derived image it won't include the volume data. Unless you're sure it's what you want, I usually advise against declaring VOLUME. The case you describe in the question is pretty much the one case where it's useful.)

What is the purpose of Dockerfile command "Volume"?

When a Dockerfile contains VOLUME instruction (say) VOLUME [/opt/apache2/www, ...] (hope this path exists in real installation), it means this path is going to be mounted to something (right?). And this VOLUME instruction is for the image and not for one instance of it (container) but for every instance.
Anyway irrespective of whether an image contains a VOLUME defined or not, at the time of starting a container the run command can create a volume by mapping a local host path to a container path.
docker run --name understanding_volumes -v /localhost/path1:/opt/apache2/www -v /localhost/path2:/any/container/path image_name
The above should make it clear that though /any/container/path is not defined as a VOLUME in Dockerfile, we are able to mount it while running container.
That said, this SOF question throws some light on it - What is the purpose of defining VOLUME mount points within DockerFile rather than adhoc cmd-line -v?. Here one benefit of VOLUME instruction is mentioned. Which is, other containers can benefit from it. Using the --from-container (could not find this option for docker run --help, not sure if the answer meant --volumes-from) Anyway thus the mount point is accessible to other container in some kind of automatic way. Great.
My first question is, is the other volume path /any/container/path image_name mounted on to the container understanding_volumes also available to the second container using --from-container or --volumes-from (whichever option is correct)?
My next question is, is the use of VOLUME instruction just to let the other containers link to this path --> that is to make the data on /opt/apache2/www available to other containers through easy linking. So it's just sharing out. Or is there any data that can be made available to first container too.
Defining a volume in a Dockerfile has the advantage of specifying the volume location inside the image definition as documentation from the image creator to the user of the image. That's just about the only upside.
It was added to docker very early on, quite possibly when data containers were the only way to persist data. We now have a solution for named volumes that has obsoleted data containers. We have also added the compose file to define how containers are run in an easy to understand and reuse syntax.
While there is the one upside of self documented images, there are quite a few downsides, to the point that I strongly recommend against defining a volume inside the image to my clients and anyone publishing images for general reuse:
The volume is forced on the end user, there's no way to undefine a volume in the image.
If the volume is not defined at runtime (with a -v or compose file), the user will see anonymous volumes in their docker volume ls that have no association to what created them. These are almost always useless wastes of disk space.
They break the ability to extend the image since any changes to a volume in an image after the VOLUME line are typically ignored by docker. This means a user can never add their own initial volume data, which is very confusing because docker gives no warning that it is ignoring the user changes during the image build.
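To spot the anonymous volumes mentioned in the list above, a small filter like the following may help. It relies on Docker naming anonymous volumes with a 64-character hexadecimal string. The function name is my own, and a user could in principle give a named volume a name of the same shape.

```shell
# Read volume names on stdin (one per line) and print only those that
# look anonymous, i.e. consist of exactly 64 lowercase hex characters.
anonymous_volumes() {
    # grep exits non-zero when nothing matches; treat that as success.
    grep -E '^[0-9a-f]{64}$' || true
}

# Typical use (not executed here):
#   docker volume ls -q | anonymous_volumes
#   docker volume ls -q --filter dangling=true | anonymous_volumes
```

The second form restricts the list to volumes that no container references, which are the ones most likely to be wasted disk space.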
If you need a volume as a user at runtime, you can always define it with -v or a compose file, even if that volume is not defined in the Dockerfile. Many users have the misconception that you must define it in the image to be able to make it a named volume at runtime.
The ability to use --volumes-from is unaffected by defining the volume in the image, but I'd encourage you to avoid this capability. It does not exist in swarm mode, and you can get all the same capabilities along with more granularity by using a named volume that you mount in two containers.
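A minimal sketch of the named-volume alternative to --volumes-from, assuming nothing about the image. The container names writer and reader, the /data path, and the DOCKER override variable are all hypothetical choices for illustration.

```shell
# Assumption: DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Create a named volume and mount it at /data in two containers:
# read-write in the first, read-only in the second.
share_named_volume() {
    volume=$1
    image=$2
    "$DOCKER" volume create "$volume" &&
    "$DOCKER" run -d --name writer -v "$volume":/data "$image" &&
    "$DOCKER" run -d --name reader -v "$volume":/data:ro "$image"
}
```

Each container picks its own mount path and mode, which is the extra granularity compared to --volumes-from.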

How to do host-specific customization on docker images

Let me explain what I want with a silly example:
After I do "docker pull" to download an image to my host, I want to create a file /etc/myname on this image to have the exact name of this host. As a result, all containers running this image on this host can find the hostname by reading /etc/myname.
Plus, I want the file /etc/myname to be shared across all containers on this host. I know I can easily create this file separately in each container, but that's not what I want.
(Again, this is just a silly example. I don't actually need to store the hostname. I want to store a large amount of host-specific data in a shared file, without using a shared volume).
I can do that by manually creating the file myself, where $dir is the top-most layer of the image:
dir=17024e41f8b6c958c5c9e60bffa8b6c8b2da5a1235b6e18085d5059f9602f605
echo $HOSTNAME > /var/lib/docker/aufs/diff/$dir/etc/myname
But is there a less hacky way to do this?
The easiest way to do this would be to use a shared volume, and that is in fact the only way to do it currently. I assume you know about bind mounting in docker, but I'll show here just in case.
As well as passing -v <volume name>:<path in container> to the docker run command, you can also pass -v <path on host>:<path in container>. So you could keep your metadata in the same place on each host and then bind mount it into the containers.
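Applied to the /etc/myname example from the question, that could look roughly like this. The function name, the container names, and the DOCKER override variable are hypothetical. The metadata path must be absolute, since Docker treats a non-absolute source as a volume name.

```shell
# Assumption: DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Write the host name to a file on the host, then start one container
# per given name, each with that file bind-mounted read-only at
# /etc/myname.
start_with_host_metadata() {
    metadata=$1
    image=$2
    shift 2
    uname -n > "$metadata" || return 1
    for name in "$@"; do
        "$DOCKER" run -d --name "$name" \
            -v "$metadata":/etc/myname:ro "$image" || return 1
    done
}
```

Because every container mounts the same host file, updating that one file on the host changes what all of them see.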

Define Docker container volume bindings in a configuration file?

Is there a way to define all the volume bindings either in the Dockerfile or another configuration file that I am not aware of?
Since volume bindings are used when you create a container, you can't define them in the Dockerfile (which is used to build your Docker image, not the container).
If you want a way to define the volume bindings without having to type them every time, you have the following options:
Create a script that runs the docker command and includes all of the volume options.
If you want to run more than one container, you can also use Docker Compose and define the volume bindings in the docker-compose.yaml file: https://docs.docker.com/compose/compose-file/#/volumes-volumedriver
Out of the two, I prefer Docker Compose, since it includes lots of other cool functionality, e.g. allowing you to define the port bindings, having links between containers, etc. You can do all of that in a script as well, but as soon as you use more than one container at a time for the same application (e.g. a web server container talking to a database container), Docker Compose makes a lot of sense, since you have the configuration in one place, and you can start/stop all of your containers with one single command.
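For the first option (a script), a minimal sketch might keep the bindings in one variable and turn each into a -v flag. The paths, the volume name app-data, the function name, and the DOCKER override variable are all made up for illustration. Bindings containing whitespace are not supported by this simple word splitting.

```shell
# Assumption: DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# All volume bindings in one place, one "source:target" pair per line.
VOLUME_BINDINGS='/srv/app/config:/etc/app
app-data:/var/lib/app'

# Usage: run_with_bindings IMAGE [COMMAND...]
run_with_bindings() {
    # "$@" starts as IMAGE [COMMAND...]; prepend one -v flag per binding.
    for binding in $VOLUME_BINDINGS; do
        set -- -v "$binding" "$@"
    done
    "$DOCKER" run -d "$@"
}
```

This covers a single container well enough. Once several containers are involved, the compose file is the tidier place for the same information.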

Appropriate use of Volumes - to push files into container?

I was reading Project Atomic's guidance for images which states that the 2 main use cases for using a volume are:-
sharing data between containers
when writing large files to disk
I have neither of these use cases in my example using an Nginx image. I intended to mount a host directory as a volume at the path of the Nginx docroot in the container, so that I can push changes to a website's contents to the host rather than addressing the container. I feel this approach is easier since I can, for example, add my ssh key just once to the host.
My question is, is this an appropriate use of a data volume and if not can anyone suggest an alternative approach to updating data inside a container?
One of the primary reasons for using Docker is to isolate your app from the server. This means you can run your container anywhere and get the same result. This is my main use case for it.
If you look at it from that point of view, having your container depend on files on the host machine in a deployed environment is counterproductive: running the same container on a different machine may produce different output.
If you do NOT care about that, and are just using docker to simplify the installation of nginx, then yes you can just use a volume from the host system.
Think about this though...
# Dockerfile
FROM nginx
ADD . /myfiles
# docker-compose.yml
web:
  build: .
You could then use docker-machine to connect to your remote server and deploy a new version of your software with two simple commands:
docker-compose build
docker-compose up -d
Even better, you could do
docker build -t me/myapp .
docker push me/myapp
and then deploy with
docker pull me/myapp
docker run me/myapp
There's a number of ways to achieve updating data in containers. Host volumes are a valid approach and probably the simplest way to achieve making your data available.
You can also copy files into and out of a container from the host. You may need to commit afterwards if you are stopping and removing the running web host container at all.
docker cp /src/www webserver:/www
You can copy files into a docker image build from your Dockerfile, which is the same process as above (copy and commit). Then restart the webserver container from the new image.
COPY /src/www /www
But I think the host volume is a good choice.
docker run -v /src/www:/www webserver command
Docker data containers are also an option for mounted volumes but they don't solve your immediate problem of copying data into your data container.
If you ever find yourself thinking "I need to ssh into this container", you are probably doing it wrong.
I'm not sure I fully understand your request. Why do you need to do that to push files into the Nginx container?
My suggestion, which is also what Docker.io recommends, is to manage the volume in a separate Docker container.
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
Reference: Manage data in containers
As said, one of the main reasons to use Docker is to always get the same result. A best practice is to use a data-only container.
With docker inspect <container_name> you can find the path of the volume on the host and update the data manually, but this is not recommended;
or you can retrieve data from an external source, like a git repository
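The docker inspect lookup mentioned above can be narrowed to just the mounts with a format template. This is a sketch: the function name and the DOCKER override variable are hypothetical, while .Mounts, .Source and .Destination are fields of the inspect output.

```shell
# Assumption: DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Print one "host-path -> container-path" line per mount of a container.
list_mounts() {
    "$DOCKER" inspect \
        -f '{{ range .Mounts }}{{ println .Source "->" .Destination }}{{ end }}' \
        "$1"
}
```

For a volume, the host path shown is under Docker's own data directory, which is one reason editing it by hand is discouraged.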
