How to do host-specific customization on docker images - docker

Let me explain what I want with a silly example:
After I do "docker pull" to download an image to my host, I want to create a file /etc/myname on this image to have the exact name of this host. As a result, all containers running this image on this host can find the hostname by reading /etc/myname.
Plus, I want the file /etc/myname to be shared across all containers on this host. I know I can easily create this file separately in each container, but that's not what I want.
(Again, this is just a silly example. I don't actually need to store the hostname. I want to store a large amount of host-specific data in a shared file, without using a shared volume).
I can do that by manually creating the file myself, where $dir is the top-most layer of the image:
dir=17024e41f8b6c958c5c9e60bffa8b6c8b2da5a1235b6e18085d5059f9602f605
echo $HOSTNAME > /var/lib/docker/aufs/diff/$dir/etc/myname
But is there a less hacky way to do this?

The easiest way to do this would be to use a shared volume, and that is in fact the only way to do it currently. I assume you know about bind mounting in docker, but I'll show it here just in case.
To the docker run command, as well as passing -v <volume name>:<path in container>, you can also pass -v <path on host>:<path in container>. So you could keep your metadata in the same place on each host and then bind mount it into the containers.
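For the silly example above, that would look something like this (a minimal sketch; the image name is hypothetical, and the file is mounted read-only so containers cannot modify the shared data):
# on the host, once per host
echo "$HOSTNAME" > /etc/myname
# every container started with this bind mount sees the same file
docker run -v /etc/myname:/etc/myname:ro my-image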

Related

Using docker volumes in packer build

Is it possible to use existing docker or external volumes in/during packer build?
I saw in https://www.packer.io/docs/builders/docker/:
"VOLUME /test1 /test2"
What exactly does it mean? "VOLUME String EX: "VOLUME FROM TO"" doesn't explain much. Is /test1 from the host?
I also saw in https://www.packer.io/docs/builders/docker/#volumes:
volumes (map[string]string) - A mapping of additional volumes to mount into this container. The key of the object is the host path, the value is the container path.
How can I make use of that? Where/how can I put/declare it, supposing that I want to map the /etc/dnsmasq.d/ host path into the container, during build time as well as run time?
It has the same meaning as the corresponding Dockerfile directive (indeed, all of the directives in that section of the Packer documentation are Dockerfile commands). You probably don't need or want it.
This is different from the docker run -v option to mount content into a container. You cannot specify mount options like this at container build time (whether using docker build or Packer). You don't need to specify a VOLUME to be able to mount content on some container directory.
The Dockerfile VOLUME directive isn't needed for most common uses and mostly only has confusing side effects. You do not need it to mount configuration into your application; you do not need it to overwrite application source code with a development tree; the most obvious thing it does do is prevent future RUN instructions from having an effect. I'd avoid it unless you understand in detail what it does and why you want it.
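To make the run-time side concrete, a minimal sketch using the host path from the question (the image name is hypothetical, and its Dockerfile needs no VOLUME directive):
docker run -d -v /etc/dnsmasq.d:/etc/dnsmasq.d:ro my-dnsmasq-image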

What is the purpose of Dockerfile command "Volume"?

When a Dockerfile contains a VOLUME instruction, say VOLUME [/opt/apache2/www, ...] (hoping this path exists in a real installation), it means this path is going to be mounted to something (right?). And this VOLUME instruction is for the image, not for one instance (container) of it, but for every instance.
Anyway irrespective of whether an image contains a VOLUME defined or not, at the time of starting a container the run command can create a volume by mapping a local host path to a container path.
docker run --name understanding_volumes -v /localhost/path1:/opt/apache2/www -v /localhost/path2:/any/container/path image_name
The above should make it clear that though /any/container/path is not defined as a VOLUME in Dockerfile, we are able to mount it while running container.
That said, this SO question throws some light on it: What is the purpose of defining VOLUME mount points within DockerFile rather than adhoc cmd-line -v?. One benefit of the VOLUME instruction is mentioned there: other containers can benefit from it, using --from-container (I could not find this option in docker run --help; not sure if the answer meant --volumes-from). Anyway, the mount point is thus accessible to other containers in some kind of automatic way. Great.
My first question is: is the other volume path /any/container/path mounted onto the container understanding_volumes also available to the second container using --from-container or --volumes-from (whichever option is correct)?
My next question is: is the use of the VOLUME instruction just to let other containers link to this path, that is, to make the data on /opt/apache2/www available to other containers through easy linking? So it's just sharing out. Or is there any data that can be made available to the first container too?
Defining a volume in a Dockerfile has the advantage of specifying the volume location inside the image definition as documentation from the image creator to the user of the image. That's just about the only upside.
It was added to docker very early on, quite possibly when data containers were the only way to persist data. We now have a solution for named volumes that has obsoleted data containers. We have also added the compose file to define how containers are run in an easy to understand and reuse syntax.
While there is the one upside of self documented images, there are quite a few downsides, to the point that I strongly recommend against defining a volume inside the image to my clients and anyone publishing images for general reuse:
The volume is forced on the end user, there's no way to undefine a volume in the image.
If the volume is not defined at runtime (with a -v or compose file), the user will see anonymous volumes in their docker volume ls that have no association to what created them. These are almost always useless wastes of disk space.
They break the ability to extend the image, since any changes to a volume in an image after the VOLUME line are typically ignored by docker. This means a user can never add their own initial volume data, which is very confusing because docker gives no warning that it is ignoring the user's changes during the image build (see the sketch below).
If you need to have a volume as a user at runtime, you can always define it with a -v or compose file, even if that volume is not defined in the Dockerfile. Many users have the misconception that you must define it in the image to be able to make it a named volume at runtime.
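As a sketch of the extension pitfall mentioned above (the paths and file name are hypothetical): any data a build step writes under a path already declared as a VOLUME is typically discarded, with no warning.
FROM ubuntu
VOLUME /data
# typically discarded: /data was already declared a volume above, so this file does not survive the build
RUN touch /data/seed.txt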
The ability to use --volumes-from is unaffected by defining the volume in the image, but I'd encourage you to avoid this capability. It does not exist in swarm mode, and you can get all the same capabilities along with more granularity by using a named volume that you mount in two containers.
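A minimal sketch of that named-volume alternative (the volume, container, and image names are hypothetical):
docker volume create www-data
# mount the named volume in the main container; the image needs no VOLUME directive
docker run -d --name web -v www-data:/opt/apache2/www my-web-image
# mount the same named volume read-only in a second container, replacing --volumes-from
docker run -d --name www-reader -v www-data:/srv/www:ro my-other-image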

docker volume container strategy

Let's say you are trying to dockerise a database (couchdb for example).
Then there are at least two assets you consider volumes for:
database files
log files
Let's further say you want to keep the db-files private but want to expose the log-files for later processing.
As far as I understand the documentation, you have two options:
First option
define managed volumes for both log and db files within the db-image
import these in a second container (you will get both) and work with the logs
Second option
create data container with a managed volume for the logs
create the db-image with a managed volume for the db-files only
import logs-volume from data container when running db-image
Two questions:
Are both options realy valid/ possible?
What is the better way to do it?
br volker
The answer to question 1 is that, yes both are valid and possible.
My answer to question 2 is that I would consider a different approach entirely; which one to choose depends on whether or not this is a mission-critical system where data loss must be avoided.
Mission critical
If you absolutely cannot lose your data, then I would recommend that you bind mount a reliable disk into your database container. Bind mounting is essentially mounting a part of the Docker Host filesystem into the container.
So taking the database files as an example, you could imagine these steps:
Create a reliable disk e.g. NFS that is backed-up on a regular basis
Attach this disk to your Docker host
Bind mount this disk into your database container, which then writes database files to this disk.
So following the above example, let's say I have created a reliable disk that is shared over NFS and mounted on my Docker Host at /reliable/disk. To use that with my database I would run the following Docker command:
docker run -d -v /reliable/disk:/data/db my-database-image
This way I know that the database files are written to reliable storage. Even if I lose my Docker Host, I will still have the database files and can easily recover by running my database container on another host that can access the NFS share.
You can do exactly the same thing for the database logs:
docker run -d -v /reliable/disk/data/db:/data/db -v /reliable/disk/logs/db:/logs/db my-database-image
Additionally you can easily bind mount these volumes into other containers for separate tasks. You may want to consider bind mounting them as read-only into other containers to protect your data:
docker run -d -v /reliable/disk/logs/db:/logs/db:ro my-log-processor
This would be my recommended approach if this is a mission critical system.
Not mission critical
If the system is not mission critical and you can tolerate a higher potential for data loss, then I would look at Docker Volume API which is used precisely for what you want to do: managing and creating volumes for data that should live beyond the lifecycle of a container.
The nice thing about the docker volume command is that it lets you create named volumes, and if you name them well it can be quite obvious to people what they are used for:
docker volume create db-data
docker volume create db-logs
You can then mount these volumes into your container from the command line:
docker run -d -v db-data:/db/data -v db-logs:/logs/db my-database-image
These volumes will survive beyond the lifecycle of your container and are stored on the filesystem of your Docker host. You can use:
docker volume inspect db-data
to find out where the data is being stored and back up that location if you want to.
You may also want to look at something like Docker Compose which will allow you to declare all of this in one file and then create your entire environment through a single command.
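For example, a minimal docker-compose.yml sketch of the same named-volume setup (the service and image names are hypothetical):
version: "2"
services:
  db:
    image: my-database-image
    volumes:
      - db-data:/db/data
      - db-logs:/logs/db
volumes:
  db-data:
  db-logs: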

How can I have shared assets (pictures, text documents, etc) between my Docker container and host system?

I have a Docker container and I am trying to make it so that all of the files in /var/www/ on the container will be saved on the host system at a location (/home/me), and vice versa. Is it possible to have this shared space between the two?
Would you accomplish this with mount points, or is there a better method?
Thanks
You can use volumes for sharing between container and host.
docker run -v /home/me:/var/www <image>
If you have fixed files/data, you can add them to the image using a Dockerfile, or by committing after copying them into a container. If you want to share a read-write directory between host and container, you need to use volumes. Your data will also persist even if you remove the container and create a new one.
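For the fixed-files case, a minimal Dockerfile sketch (the base image and source path are hypothetical; the target is the /var/www path from the question):
FROM nginx
# bake a copy of the site into the image; every new container starts with this content
COPY ./www /var/www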
There are three ways that you can do this
Use volumes. Official docs
Burn the files into your image. Basically, include the creation of the files inside the Dockerfile. This means every container created from that image will have an initial state of sorts.
Use data-only containers. These are containers without a running process that contain the data that you need. This also uses volumes. But instead of mounting to the host, your containers mount on the data-only container (which in turn mounts on the host if you want to). This answer will be useful
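A minimal sketch of the data-only container pattern (the container and image names are hypothetical):
# a container whose only job is to own the volume; it never needs to be running
docker create -v /var/www --name www-data busybox
# other containers borrow its volumes instead of mounting a host path directly
docker run -d --volumes-from www-data my-web-image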

Appropriate use of Volumes - to push files into container?

I was reading Project Atomic's guidance for images which states that the 2 main use cases for using a volume are:-
sharing data between containers
when writing large files to disk
I have neither of these use cases in my example using an Nginx image. I intended to mount a host directory as a volume in the path of the Nginx docroot in the container. This is so that I can push changes to a website's contents into the host rather than addressing the container. I feel it is easier to use this approach since I can - for example - just add my ssh key once to the host.
My question is, is this an appropriate use of a data volume and if not can anyone suggest an alternative approach to updating data inside a container?
One of the primary reasons for using Docker is to isolate your app from the server. This means you can run your container anywhere and get the same result. This is my main use case for it.
If you look at it from that point of view, having your container depend on files on the host machine for a deployed environment is counterproductive: running the same container on a different machine may result in different output.
If you do NOT care about that, and are just using docker to simplify the installation of nginx, then yes you can just use a volume from the host system.
Think about this though...
#Dockerfile
FROM nginx
ADD . /myfiles
#docker-compose.yml
web:
  build: .
You could then use docker-machine to connect to your remote server and deploy a new version of your software with easy commands
docker-compose build
docker-compose up -d
even better, you could do
docker build -t me/myapp .
docker push me/myapp
and then deploy with
docker pull me/myapp
docker run -d me/myapp
There's a number of ways to achieve updating data in containers. Host volumes are a valid approach and probably the simplest way to achieve making your data available.
You can also copy files into and out of a container from the host. You may need to commit afterwards if you are stopping and removing the running web host container at all.
docker cp /src/www webserver:/www
You can copy files into a docker image build from your Dockerfile, which is the same process as above (copy and commit). Then restart the webserver container from the new image.
COPY /src/www /www
But I think the host volume is a good choice.
docker run -v /src/www:/www webserver command
Docker data containers are also an option for mounted volumes but they don't solve your immediate problem of copying data into your data container.
If you ever find yourself thinking "I need to ssh into this container", you are probably doing it wrong.
Not sure if I fully understand your request, but why do you need to do that to push files into the Nginx container?
Managing the volume in a separate docker container is my suggestion, and it is recommended by Docker.io.
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
refer: Manage data in containers
As said, one of the main reasons to use docker is to always achieve the same result. A best practice is to use a data-only container.
With docker inspect <container_name> you can find the path of the volume on the host and update the data manually, but this is not recommended (see the sketch below);
or you can retrieve data from an external source, like a git repository
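For the docker inspect route, a minimal sketch that prints where a container's volumes live on the host (the container name is hypothetical):
docker inspect -f '{{ json .Mounts }}' my-container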
