Docker temporary files strategy - docker

My Docker container produces some temporary files.
Is there an encouraged strategy regarding those?
If I put those to /tmp, I'm not sure they'll get cleared. (Edit note: the link is dead. The question was, "Are default cronjobs executed in a docker container?")
Or should I expose the volume /tmp from the host machine?

I am not aware of any encouraged way to manage temporary files with Docker as it will mostly depend on how you need to handle these temporary files with your application (should they be deleted on restart? Periodically?...)
You have several possibilities depending on your needs:
Use Docker tmpfs mount
You can mount a tmpfs volume which will persist data as long as the container is running (i.e. the data in the volume will be deleted when the container stops), for example:
docker run --mount type=tmpfs,destination=/myapp/tmpdir someimage
This may be useful if you (can) restart your containers regularly and the temporary data may be recreated on container restart. However if you need to be able to clean up temporary data while the container is running, this is not a good solution as you will need to stop the container to have your temporary data cleaned.
Edit: as per @alexander-azarov's comment, the tmpfs volume size is unlimited by default, with the risk of the container using up all the machine's memory. Using the tmpfs-size option (a value in bytes) is recommended to mitigate that risk, such as docker run --mount type=tmpfs,destination=/app,tmpfs-size=4096 someimage
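Because tmpfs-size is expressed in bytes, a small helper can make the limit easier to read. This is only a sketch: tmpfs_mount_arg is a hypothetical function name (not part of Docker), and the limit is passed in MiB as an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): build a --mount argument for a
# size-limited tmpfs. The second argument is the limit in MiB; the
# tmpfs-size option itself expects bytes.
tmpfs_mount_arg() {
    destination=$1
    size_mib=$2
    size_bytes=$((size_mib * 1024 * 1024))
    printf 'type=tmpfs,destination=%s,tmpfs-size=%s\n' "$destination" "$size_bytes"
}
```

It could then be used as docker run --mount "$(tmpfs_mount_arg /myapp/tmpdir 64)" someimage, which asks for a 64 MiB tmpfs.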
Writing into the container writable layer
The writable layer of the container is where all the data will be written in the container if no volume is mounted. It will persist on container restart, but will be deleted if the container is deleted.
This way the temporary data will be deleted only when the container is deleted. It may be a good solution for short-lived containers, but not for long-lived containers.
Mounting host machine /tmp in the container with a bind mount
For example:
docker run -v /tmp/myapp-tmp-dir:/myapp/tmpdir someimage
This will cause all data to be written to the host machine's /tmp/myapp-tmp-dir directory, and the result will depend on how the host machine manages /tmp (in most cases, data is cleared upon machine restart).
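If the host does not clear that directory often enough, a periodic cleanup can be run on the host (for example from cron). A minimal sketch, assuming a find that supports -mmin and -delete; purge_old_files is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: delete regular files under a directory that were last
# modified more than a given number of minutes ago.
# Assumes a find that supports -mmin and -delete (GNU and BSD find do).
purge_old_files() {
    dir=$1
    max_age_minutes=$2
    [ -d "$dir" ] || return 1
    find "$dir" -type f -mmin "+$max_age_minutes" -delete
}
```

For example, purge_old_files /tmp/myapp-tmp-dir 60 would remove files not modified in the last hour. Whether that is safe depends on how long the application expects its temporary files to live.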
Create and mount a volume to manage data into
You can create a volume which will contain your data, for example:
docker run --mount source=myappvol,target=/myapp/tmpdir someimage
You can then manage the data in the volume: mount it in another container and clean up the data, delete the volume, etc.
These are the most common solutions relying (almost) solely on Docker functionalities. Another possibility would be to handle temporary files directly from your software or app running in the container, but it's more an application-related issue than a Docker-related one.
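As an illustration of the application-side approach, a wrapper script can give each run its own scratch directory and remove it afterwards. This is a sketch under assumptions: run_with_scratch and the SCRATCH_DIR variable are hypothetical names, and the wrapped command is expected to write its temporary files under $SCRATCH_DIR.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a private scratch directory and
# remove that directory afterwards, whatever the command's exit status.
run_with_scratch() {
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/myapp.XXXXXX") || return 1
    # Run in a subshell so the exported variable does not leak to the caller.
    ( export SCRATCH_DIR="$scratch"; "$@" )
    rc=$?
    rm -rf -- "$scratch"
    return "$rc"
}
```

With this approach the cleanup does not depend on where /tmp lives (writable layer, tmpfs, or a mounted volume).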

Related

How does volume mount from container to host and vice versa work?

docker run -ti --rm -v DataVolume3:/var ubuntu
Let's say I have a volume DataVolume3 which pulls the contents of /var in the ubuntu container.
even after killing this ubuntu container the volume remains and I can use this volume DataVolume3 to mount it to other containers.
This means with the deletion of container the volume mounts are not deleted.
How does this work ?
Does that volume mount mean that it copies the contents of /var into some local directory because this does not look like a symbolic link ?
If I have the container running and I create a file in the container then the same file gets copied to the host path ?
How does this whole process of volume mount from container to host and host to container work ?
Volumes are used for persistent storage, and they persist independently of the lifecycle of the container.
We can go through a demo to understand it clearly.
First, let's create a container using the named volumes approach as:
docker run -ti --rm -v DataVolume3:/var ubuntu
This will create a docker volume named DataVolume3 and it can be viewed in the output of docker volume ls:
docker volume ls
DRIVER VOLUME NAME
local DataVolume3
Docker stores the information about these named volumes in the directory /var/lib/docker/volumes/ (*):
ls /var/lib/docker/volumes/
1617af4bce3a647a0b93ed980d64d97746878564b141f30b6110d0818bf32b76 DataVolume3
Next, let's write some data from the ubuntu container at the mounted path /var:
root@2b67a89a0050:/# echo "hello" > /var/file1
root@2b67a89a0050:/# cat /var/file1
hello
We can see this data with cat even after deleting the container:
cat /var/lib/docker/volumes/DataVolume3/_data/file1
hello
Note: Although we are able to access the volume's data as shown above, it is not a recommended practice.
Now, next time when another container uses the same volume then the data from the volume gets mounted at the container directory specified as part of -v flag.
(*) The location may vary based on the OS, as pointed out by David, and can probably be seen with the docker volume inspect command.
Docker has a concept of a named volume. By default the storage for this lives somewhere on your host system and you can't directly access it from outside Docker (*). A named volume has its own lifecycle, it can be independently docker volume rm'd, and if you start another container mounting the same volume, it will have the same persistent content.
The docker run -v option takes some unit of storage, either a named volume or a specific host directory, and mounts it (as in the mount(8) command) in a specific place in the container filesystem. This will hide what was originally in the image and replace it with the volume content.
As you note, if the thing you mount is an empty named volume, it will get populated from the image content at container initialization time. There are some really important caveats on this functionality:
Named volume initialization happens only if the volume is totally empty.
The contents of the named volume never automatically update.
If the volume isn't empty, the volume contents completely replace what's in the image, even if it's changed.
The initialization happens only on native Docker, and not for example in Kubernetes.
The initialization happens only on named volumes, and not for bind-mounted host directories.
With all of these caveats, I'd avoid relying on this functionality.
If you need to mount a volume into a container, assume it will be empty when your entrypoint or the main container command starts. If you need a particular directory layout or file structure there, an entrypoint script can create it; if you're expecting it to hold particular data, keep a copy of it somewhere else in your image and copy it in if it's not already there (or, perhaps, always).
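The "copy it in if it's not already there" idea can be sketched as a small function for an entrypoint script. This is an illustration only: seed_if_empty is a hypothetical name, and the directory paths are whatever your image uses.

```shell
#!/bin/sh
# Hypothetical entrypoint helper: copy default content into a mounted
# directory only when that directory is empty.
seed_if_empty() {
    src=$1   # copy of the defaults kept elsewhere in the image
    dst=$2   # mount point that may or may not already hold data
    mkdir -p "$dst" || return 1
    # ls -A lists dotfiles too and prints nothing for an empty directory.
    if [ -z "$(ls -A "$dst")" ]; then
        cp -a "$src/." "$dst/"
    fi
}
```

An entrypoint could call it (for instance seed_if_empty /app/defaults /var/www, both paths being assumptions) and then hand over to the main command. Dropping the emptiness check gives the "always copy" variant.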
(*) On native Linux you can find a filesystem location for it, but accessing this isn't a best practice. On other OSes this will be hidden inside a virtual machine or other opaque storage. If you need to directly access the data (or inject config files, or read log files) a docker run -v /host/path:/container/path bind mount is a better choice.
Volumes are part of neither the container nor the host. Well, technically everything resides in the host machine. But the docker directories are only accessible by users in "docker" group. The files in these directories are separately managed by docker.
"Volumes are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux)."
Hence volumes are like the union of files under the docker container and the host itself. Any addition on either end will be added to the volume (/var/lib/docker/volumes); it is not a hard copy, but rather something like a symbolic link.
As volumes can be shared across different containers, deleting a container does not cascade to the volumes associated with it.
To remove unused volumes:
docker volume prune

"a bind mount won't copy the container contents to the host automatically, unlike a named volume"

Need clarity on a comment here:
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
docs.docker.com/compose/compose-file/#volumes
Is this accurate? If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
how did Docker persist data across container restarts before there were named volumes?
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
Is this accurate?
Close to accurate, but I can see the confusion. Host volumes, aka bind mounts, do not have an initialization feature from docker. With anonymous and named volumes, docker will initialize the volume with the contents of the image at that path. This initialization includes ownership and permissions, which helps avoid permission errors. This initialization only runs when the container is created and the volume is new or empty, so subsequent containers will not pick up changes to the image made in newer image versions.
If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data
in case of a container restart)?
Reads and writes from the app in the container will continue through to the host filesystem used in the bind mount as expected. It's only the initialization step that doesn't run.
how did Docker persist data across container restarts before there were named volumes?
There were data containers, mounting volumes from other containers, but this was inflexible (all volume paths were fixed to the path in the data container) and mixed management of persistent data with ephemeral containers, and has therefore been phased out.
Volumes are used to handle data persistence between containers. A single container restarting (rather than being replaced) will still have all the container specific filesystem changes. The docker rm command deletes these filesystem changes, along with container logs and metadata/configuration of the container.
The container specific changes are the read/write top layer of an overlay filesystem used by docker. Volume mounts are all separate mounts into subdirectories of this overlay filesystem (just like /home or /var are often separate filesystem mounts in the / filesystem of a Linux host, all reads and writes to those other paths go to a separate underlying filesystem).
If you're going to mount a volume into a container, and you want that volume to reliably contain some content from the image, you need to manually copy it there at container startup time. One way to do this is with an entrypoint wrapper script:
#!/bin/sh
# Copy data into a possibly-mounted location
cp -a /app/static /var/www
# Then run the image's CMD
exec "$@"
You'd include this in your image's Dockerfile
# Must use JSON-array syntax
ENTRYPOINT ["/app/entrypoint.sh"]
CMD same as it was before
There are two important details about Docker named volumes' initialization behavior to be aware of here. The first, which you note, is that Docker only copies content into a volume for Docker named volumes; it doesn't happen for bind mounts, and it doesn't happen in other environments like Kubernetes.
The second, more subtle detail is that the initialization only happens the first time the container runs. If there's already content in a volume that you mount into a container, it will hide what was already there. In other SO questions you can see this manifest as, for example, "I added a package to my Node package.json file, but when I put the node_modules directory in a volume, it ignores the update" or "I'm using a volume to export content to an nginx proxy but it doesn't update".
I think @BMitch's accepted answer is correct, but I will try to add some details in the hope of being useful.
Is this accurate? If yes, then:
Given it is my claim being scrutinised - I totally defer to @BMitch here :)!
However I would also add:
https://github.com/docker/compose/issues/4581#issuecomment-389559090
Provides a layman explanation of how named volumes / host volumes behave
My explanation needs to be updated to reflect the notion of 'initialization'
https://stackoverflow.com/a/40030535/3080207
This is how I would recommend setting up volumes in docker-compose at the moment, courtesy of @kaiser
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
Both host volumes and named volumes can achieve this.
I think the point of contention is what you want to happen on the:
first run of the container
subsequent runs of the container and
the location/accessibility of the volume on the host system.
Once a volume is attached to a container (be it a named volume or bind mount), whatever is stored to that volume should be persisted between restarts - that effectively comes for free. This assumes the same docker-compose config, and no manual removal of volumes.
Previously it was a bit limiting using a named volume, as you couldn't tail logs, or edit code directly from the host as easily as you could with a bind mount - but it seems that problem is resolved / has a work around now.
Bind mounts are able to persist data between restarts. I personally find that bind volumes do what I want 99% of the time, that being said, named volumes can now 'do it all' and I'd be using those moving forward.
There are differences between them though, and I'm sure they'll still bite people occasionally, requiring them to reach out to actual experts, instead of users like me :).

What Is The Difference Between Binding Mounts And Volumes While Handling Persistent Data In Docker Containers?

I want to know why we have two different options to do the same thing, What are the differences between the two.
We basically have 3 types of volumes or mounts for persistent data:
Bind mounts
Named volumes
Volumes in dockerfiles
Bind mounts are basically just binding a certain directory or file from the host inside the container (docker run -v /hostdir:/containerdir IMAGE_NAME)
Named volumes are volumes which you create manually with docker volume create VOLUME_NAME. They are created in /var/lib/docker/volumes and can be referenced to by only their name. Let's say you create a volume called "mysql_data", you can just reference to it like this docker run -v mysql_data:/containerdir IMAGE_NAME.
And then there are volumes in dockerfiles, which are created by the VOLUME instruction. These volumes are also created under /var/lib/docker/volumes but don't have a specific name. Their "name" is just some kind of hash. The volume gets created when running the container and is handy for saving persistent data, whether you start the container with -v or not. The developer gets to say where the important data is and what should be persistent.
What should I use?
What you want to use comes mostly down to either preference or your management. If you want to keep everything in the "docker area" (/var/lib/docker) you can use volumes. If you want to keep your own directory-structure, you can use binds.
Docker recommends the use of volumes over the use of binds, as volumes are created and managed by docker and binds have a lot more potential of failure (also due to layer 8 problems).
If you use binds and want to transfer your containers/applications to another host, you have to rebuild your directory structure, whereas volumes are more uniform on every host.
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. While bind mounts are dependent on the directory structure of the host machine, volumes are completely managed by Docker. Volumes are often a better choice than persisting data in a container’s writable layer, because a volume does not increase the size of the containers using it, and the volume’s contents exist outside the lifecycle of a given container.
Differences between -v and --mount behavior
Because the -v and --volume flags have been a part of Docker for a long time, their behavior cannot be changed. This means that there is one behavior that is different between -v and --mount.
If you use -v or --volume to bind-mount a file or directory that does not yet exist on the Docker host, -v creates the endpoint for you. It is always created as a directory.
If you use --mount to bind-mount a file or directory that does not yet exist on the Docker host, Docker does not automatically create it for you, but generates an error.
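To avoid -v silently creating an empty directory for a mistyped host path, a script can check the path first and build the stricter --mount form. A minimal sketch; bind_mount_arg is a hypothetical helper, not a Docker command:

```shell
#!/bin/sh
# Hypothetical helper: print a --mount argument for a bind mount, but only
# if the host path already exists. This mirrors the stricter --mount
# behaviour instead of letting -v create an empty directory.
bind_mount_arg() {
    host_path=$1
    container_path=$2
    if [ ! -e "$host_path" ]; then
        echo "host path does not exist: $host_path" >&2
        return 1
    fi
    printf 'type=bind,source=%s,target=%s\n' "$host_path" "$container_path"
}
```

It could be used as docker run --mount "$(bind_mount_arg /host/path /container/path)" image-name, with the caller checking the helper's exit status first.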
Docker for Windows shared folders limitation
Docker for Windows does make much of the VM transparent to the Windows host, but it is still a virtual machine. For instance, when using -v with a mongo container, MongoDB needs something else supported by the file system. There is also this issue about volume mounts being extremely slow.
Bind mounts are like a superset of Volumes (named or unnamed).
Bind mounts are created by binding an existing folder in the host system (host system is native linux machine or vm (in windows or mac)) to a path in the container.
Volume command results in a new folder, created in the host system under /var/lib/docker
Volumes are recommended because they are managed by docker engine (prune, rm, etc).
A good use case for bind mount is linking development folders to a path in the container. Any change in host folder will be reflected in the container.
Another use case for bind mount is keeping the application log which is not crucial like a database.
Command syntax is almost the same for both cases:
bind mount:
note that the host path should start with '/'. Use $(pwd) for convenience.
docker container run -v /host-path:/container-path image-name
unnamed volume:
creates a folder in the host with an arbitrary name
docker container run -v /container-path image-name
named volume:
should not start with '/' as this is reserved for bind mount.
'volume-name' is not a full path here. the command will cause a folder to be created with path "/var/lib/docker/volumes/volume-name" in the host.
docker container run -v volume-name:/container-path image-name
A named volume can also be created before a container is run (docker volume create). But this is almost never needed.
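The three -v forms above can be told apart mechanically. The following sketch encodes just those rules; classify_volume_spec is a hypothetical helper, and it deliberately ignores Windows-style host paths:

```shell
#!/bin/sh
# Hypothetical helper: classify a `docker run -v` argument using the rules
# listed above. Simplified: it ignores Windows-style host paths.
classify_volume_spec() {
    case $1 in
        /*:*) echo "bind mount" ;;      # host path starts with '/'
        *:*)  echo "named volume" ;;    # source is a plain name
        /*)   echo "unnamed volume" ;;  # container path only
        *)    echo "unknown" ;;
    esac
}
```

For instance, classify_volume_spec volume-name:/container-path reports a named volume, while the same call with /host-path:/container-path reports a bind mount.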
As a developer, we always need to do comparison among the options provided by tools or technology. For Volume & Bind mounts, I would suggest to list down what kind of application you are trying to containerize.
Following are the parameters that I would consider before choosing Volume over Bind Mounts:
Docker provides various CLI commands to manage volumes easily outside containers.
For backup & restore, a volume is far easier than a bind mount, as a bind mount depends upon the underlying host OS.
Volumes are platform-agnostic, so they can work on Linux as well as on Windows containers.
With a bind mount, you have 2 technologies to take care of: your host machine's directory structure as well as Docker.
Migration of volumes is easier not only on local machines but on cloud machines as well.
Volumes can be easily shared among multiple containers.

docker volume container strategy

Let's say you are trying to dockerise a database (couchdb for example).
Then there are at least two assets you consider volumes for:
database files
log files
Let's further say you want to keep the db-files private but want to expose the log-files for later processing.
As far as I understand the documentation, you have two options:
First option
define managed volumes for both, log- and db-files within the db-image
import these in a second container (you will get both) and work with the logs
Second option
create data container with a managed volume for the logs
create the db-image with a managed volume for the db-files only
import logs-volume from data container when running db-image
Two questions:
Are both options really valid/possible?
What is the better way to do it?
br volker
The answer to question 1 is that, yes both are valid and possible.
My answer to question 2 is that I would consider a different approach entirely and which one to choose depends on whether or not this is a mission critical system and that data loss must be avoided.
Mission critical
If you absolutely cannot lose your data, then I would recommend that you bind mount a reliable disk into your database container. Bind mounting is essentially mounting a part of the Docker Host filesystem into the container.
So taking the database files as an example, you could imagine these steps:
Create a reliable disk e.g. NFS that is backed-up on a regular basis
Attach this disk to your Docker host
Bind mount this disk into my database container which then writes database files to this disk.
So following the above example, lets say I have created a reliable disk that is shared over NFS and mounted on my Docker Host at /reliable/disk. To use that with my database I would run the following Docker command:
docker run -d -v /reliable/disk:/data/db my-database-image
This way I know that the database files are written to reliable storage. Even if I lose my Docker Host, I will still have the database files and can easily recover by running my database container on another host that can access the NFS share.
You can do exactly the same thing for the database logs:
docker run -d -v /reliable/disk/data/db:/data/db -v /reliable/disk/logs/db:/logs/db my-database-image
Additionally you can easily bind mount these volumes into other containers for separate tasks. You may want to consider bind mounting them as read-only into other containers to protect your data:
docker run -d -v /reliable/disk/logs/db:/logs/db:ro my-log-processor
This would be my recommended approach if this is a mission critical system.
Not mission critical
If the system is not mission critical and you can tolerate a higher potential for data loss, then I would look at Docker Volume API which is used precisely for what you want to do: managing and creating volumes for data that should live beyond the lifecycle of a container.
The nice thing about the docker volume command is that it lets you create named volumes, and if you name them well it can be quite obvious to people what they are used for:
docker volume create db-data
docker volume create db-logs
You can then mount these volumes into your container from the command line:
docker run -d -v db-data:/db/data -v db-logs:/logs/db my-database-image
These volumes will survive beyond the lifecycle of your container and are stored on the filesystem of your Docker host. You can use:
docker volume inspect db-data
To find out where the data is being stored and back-up that location if you want to.
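Rather than reading Docker's storage directory directly, the backup itself can be reduced to archiving and unpacking a directory. The sketch below shows only that part; backup_dir and restore_dir are hypothetical helper names, and the directory is assumed to be wherever the volume is mounted.

```shell
#!/bin/sh
# Hypothetical helpers: archive the contents of a directory and restore
# them elsewhere. Inside a throwaway container, the directory would be the
# path where the volume is mounted.
backup_dir() {
    data_dir=$1
    archive=$2
    # -C makes the paths inside the archive relative to the data directory.
    tar -czf "$archive" -C "$data_dir" .
}

restore_dir() {
    archive=$1
    data_dir=$2
    mkdir -p "$data_dir" && tar -xzf "$archive" -C "$data_dir"
}
```

One common pattern is to run the same tar command in a short-lived container that mounts the volume and a host directory for the archive, for example docker run --rm -v db-data:/data -v "$PWD":/backup ubuntu tar -czf /backup/db-data.tgz -C /data . (the /data and /backup paths and the archive name are assumptions).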
You may also want to look at something like Docker Compose which will allow you to declare all of this in one file and then create your entire environment through a single command.

Appropriate use of Volumes - to push files into container?

I was reading Project Atomic's guidance for images which states that the 2 main use cases for using a volume are:-
sharing data between containers
when writing large files to disk
I have neither of these use cases in my example using an Nginx image. I intended to mount a host directory as a volume in the path of the Nginx docroot in the container. This is so that I can push changes to a website's contents into the host rather then addressing the container. I feel it is easier to use this approach since I can - for example - just add my ssh key once to the host.
My question is, is this an appropriate use of a data volume and if not can anyone suggest an alternative approach to updating data inside a container?
One of the primary reasons for using Docker is to isolate your app from the server. This means you can run your container anywhere and get the same result. This is my main use case for it.
If you look at it from that point of view, having your container depend on files on the host machine for a deployed environment is counterproductive: running the same container on a different machine may result in different output.
If you do NOT care about that, and are just using docker to simplify the installation of nginx, then yes you can just use a volume from the host system.
Think about this though...
# Dockerfile
FROM nginx
ADD . /myfiles
# docker-compose.yml
web:
  build: .
You could then use docker-machine to connect to your remote server and deploy a new version of your software with easy commands
docker-compose build
docker-compose up -d
even better, you could do
docker build -t me/myapp .
docker push me/myapp
and then deploy with
docker pull
docker run
There's a number of ways to achieve updating data in containers. Host volumes are a valid approach and probably the simplest way to achieve making your data available.
You can also copy files into and out of a container from the host. You may need to commit afterwards if you are stopping and removing the running web host container at all.
docker cp /src/www webserver:/www
You can copy files into a docker image build from your Dockerfile, which is the same process as above (copy and commit). Then restart the webserver container from the new image.
COPY /src/www /www
But I think the host volume is a good choice.
docker run -v /src/www:/www webserver command
Docker data containers are also an option for mounted volumes but they don't solve your immediate problem of copying data into your data container.
If you ever find yourself thinking "I need to ssh into this container", you are probably doing it wrong.
Not sure if I fully understand your request. But why do you need to do that to push files into the Nginx container?
Managing the volume in a separate docker container is my suggestion, and it is recommended by Docker.io.
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
refer: Manage data in containers
As said, one of the main reasons to use docker is to achieve always the same result. A best practice is to use a data only container.
With docker inspect <container_name> you can know the path of the volume on the host and update data manually, but this is not recommended;
or you can retrieve data from an external source, like a git repository
