I'm not 100% clear whether this post is appropriate for stack overflow, or if it should go somewhere else - please suggest where else if it shouldn't be here.
I am trying to understand how docker images work. The particular reference in this case is the Dockerfile at https://github.com/frappe/frappe_docker/blob/main/images/production/Containerfile
This file contains VOLUME directives, and some of the RUN commands in it modify the contents of the paths in the VOLUME directives.
If I pull this image from docker hub, what happens with the data in the volumes?
Is it somehow contained in the pulled image?
Or are the RUN commands only executed when you start the container in docker?
What happens if you bind mount a local directory onto one of the mount points mentioned in the VOLUME directive when you run the container?
It seems, on experimenting, that bind mounting will replace the data with the contents of the local folder (which means the data created by the RUN commands gets lost).
But, if you use regular named volumes when running the container, the data is there. But I thought regular volumes persist - so what happens when you pull a later version of the image, with perhaps different data in the volume directory? Does it change the original volume?
[Later]
The docs at https://docs.docker.com/storage/volumes/#:~:text=If%20you%20start%20a%20container%20which%20creates%20a%20new%20volume,are%20copied%20into%20the%20volume. say:
Populate a volume using a container
If you start a container which creates a new volume, and the container has files or directories in the directory to be mounted such as /app/, the directory’s contents are copied into the volume. The container then mounts and uses the volume, and other containers which use the volume also have access to the pre-populated content.
But this does not seem to work with bind mounts if the directory already exists.
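For example (a minimal sketch; myimage stands for any image that declares VOLUME /data and ships files in that directory):

# named volume: on first use, /data from the image is copied into the volume
docker run --rm -v mydata:/data myimage ls /data

# bind mount: the host directory is mounted over /data as-is,
# nothing from the image is copied into it
mkdir -p ./localdata
docker run --rm -v "$PWD/localdata":/data myimage ls /data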
[Later]
I have done some experiments to compare the behaviour of bind mount and docker volumes.
I used this Dockerfile:
FROM alpine
RUN mkdir /test \
&& echo 'echo "start `date`" >>/test/log' >/runtest \
&& echo 'echo "/log:"' >>/runtest \
&& echo 'cat /log' >>/runtest \
&& echo 'echo "/test/log:"' >>/runtest \
&& echo 'cat /test/log' >>/runtest \
&& echo 'sleep 99999999' >>/runtest \
&& chmod +x /runtest \
&& echo "build `date`" >/test/log \
&& echo "build `date`" >/log \
&& cat /runtest \
&& cat /log \
&& cat /test/log
CMD /runtest
VOLUME "/test"
This creates /test/log in a RUN command, containing the build date. The CMD (which is run every time the container starts) appends the start date to this file. The directory is made a VOLUME. (Note: this has to come AFTER the RUN command, because the VOLUME copies the current contents of the directory into the volume; any RUN commands after the VOLUME directive affect the original directory but not the volume - see https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#volume.)
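As an aside, a minimal sketch of the ordering pitfall the docs describe (not used in the experiment below, and behaviour may differ with newer builders such as BuildKit):

FROM alpine
# VOLUME declared too early:
VOLUME "/test"
# per the linked best-practices doc, the change below may be discarded,
# so the image can end up without any content in /test
RUN echo "build `date`" >/test/log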
I then ran it with the following docker-compose file:
version: "3"
services:
  tester:
    build: .
    volumes:
      - test:/test
volumes:
  test:
Build and run first time:
tester_1 | /log:
tester_1 | build Mon Feb 20 08:53:29 UTC 2023
tester_1 | /test/log:
tester_1 | build Mon Feb 20 08:53:29 UTC 2023
tester_1 | start Mon Feb 20 08:53:45 UTC 2023
Stop and run again:
tester_1 | /log:
tester_1 | build Mon Feb 20 08:54:28 UTC 2023
tester_1 | /test/log:
tester_1 | build Mon Feb 20 08:53:29 UTC 2023
tester_1 | start Mon Feb 20 08:53:45 UTC 2023
tester_1 | start Mon Feb 20 08:54:33 UTC 2023
Note /log (not in volume) is overwritten. /test/log (in volume) is not.
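(As a side note, to confirm where the named volume's data lives on the host, something like this should work - the exact volume name depends on the compose project name:)

docker volume ls
docker volume inspect <project>_test     # the "Mountpoint" field points at the data
sudo cat /var/lib/docker/volumes/<project>_test/_data/log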
I replaced the volume in the docker-compose file with a bind mount.
When I built and ran it, I got a complaint Service "tester" is using volume "/test" from the previous container.
I deleted the docker volume, and tried again:
tester_1 | /log:
tester_1 | build Mon Feb 20 09:00:50 UTC 2023
tester_1 | /test/log:
tester_1 | start Mon Feb 20 09:01:03 UTC 2023
Note the result of the RUN command is not copied to the bind mount.
My conclusion is that, if you (or other users of your Dockerfile) intend to use bind mounts, then you must not expect the contents of volumes to be copied if those contents are created within the Dockerfile (except, obviously, in the CMD directive).
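If a bind-mounted directory does need to be seeded, the usual workaround seems to be to do it from the ENTRYPOINT/CMD at container start (which is how official images such as postgres initialise their data directory). A rough sketch, assuming the seed data is baked into the image under /seed:

#!/bin/sh
# entrypoint.sh: seed the (possibly bind-mounted) directory on first start
if [ -z "$(ls -A /test 2>/dev/null)" ]; then
  cp -a /seed/. /test/
fi
exec "$@"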
I found this post that answers a lot of your questions. https://www.howtogeek.com/devops/understanding-the-dockerfile-volume-instruction/
If I pull this image from docker hub, what happens with the data in the volumes?
Is it somehow contained in the pulled image? Or are the RUN commands only executed when you start the container in docker?
The data is contained in the image, and a new volume with a unique id is created. It's a way to enforce persistence for containers started from the image.
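You can see both parts of this with docker inspect (a sketch; <image> and <container> are placeholders):

# volumes declared in the image metadata
docker image inspect -f '{{json .Config.Volumes}}' <image>

# the anonymous volume(s) actually created for a container started from it
docker inspect -f '{{json .Mounts}}' <container>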
What happens if you bind mount a local directory onto one of the mount points mentioned in the VOLUME directive when you run the container?
It seems, on experimenting, that bind mounting will replace the data with the contents of the local folder (which means the data created by the RUN commands gets lost).
But, if you use regular named volumes when running the container, the data is there.
See https://www.howtogeek.com/devops/understanding-the-dockerfile-volume-instruction/#overriding-volume-instructions-when-starting-a-container.
Specifying the volume manually at run time overrides the mount point; the VOLUME instruction from the build becomes irrelevant in that case.
But I thought regular volumes persist - so what happens when you pull a later version of the container, with perhaps different data in the volume? Does it change the original volume?
I couldn't find much information on this, but this post may be helpful: https://stackoverflow.com/a/52762779/6658374.
When you define a VOLUME in the Dockerfile, you can only define the target, not the source of the volume. During the build, you will only get an anonymous volume from this. That anonymous volume will be mounted at every RUN command, prepopulated with the contents of the image, and then discarded at the end of the RUN command. Only changes to the container are saved, not changes to the volume.
So it seems that any changes to the image are not persisted in the volume, and thus pulling a new image and starting a fresh container would create a new anonymous volume instead of reusing the previous one.
Related
Steps to reproduce:
Download and run postgres:9.6.24:
docker run --name my_container --restart=always -d -p 127.0.0.1:5432:5432 -e POSTGRES_PASSWORD=pgmypass postgres:9.6.24
Here is the result:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
879883bfc84a postgres:9.6.24 "docker-entrypoint.s…" 26 seconds ago Up 25 seconds 127.0.0.1:5432->5432/tcp my_container
OK.
Open the file /var/lib/postgresql/data/pg_hba.conf inside the container:
docker exec -it my_container bash
root@879883bfc84a:/# cat /var/lib/postgresql/data/pg_hba.conf
IPv4 local connections:
host all all 127.0.0.1/32 trust
Replace the file /var/lib/postgresql/data/pg_hba.conf inside the container with my file. Copy and overwrite it from the host to the container:
tar --overwrite -c pg_hba.conf | docker exec -i my_container /bin/tar -C /var/lib/postgresql/data/ -x
Make sure the file has been modified. Go inside the container and open the changed file:
docker exec -it my_container bash
root@879883bfc84a:/# cat /var/lib/postgresql/data/pg_hba.conf
IPv4 local connections:
host all all 0.0.0.0/0 trust
As you can see the content of file was changed.
Create a new image from the container:
docker commit my_container
See result:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> ee57ad4bc6b4 3 seconds ago 200MB
postgres 9.6.24 027ccf656dc1 12 months ago 200MB
Now tag my new image
docker tag ee57ad4bc6b4 my_new_image:1.0.0
See result:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
my_new_image 1.0.0 ee57ad4bc6b4 About a minute ago 200MB
postgres 9.6.24 027ccf656dc1 12 months ago 200MB
OK.
Stop and delete the old container:
docker stop my_container
docker rm my_container
See result:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
As you can see, no containers remain. OK.
Create a new container from the new image:
docker run --name my_new_container --restart=always -d -p 127.0.0.1:5432:5432 -e POSTGRES_PASSWORD=pg1210 my_new_image:1.0.0
See result:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3a965dbbd991 my_new_image:1.0.0 "docker-entrypoint.s…" 7 seconds ago Up 6 seconds 127.0.0.1:5432->5432/tcp my_new_container
Open the file /var/lib/postgresql/data/pg_hba.conf inside the container:
docker exec -it my_new_container bash
root@879883bfc84a:/# cat /var/lib/postgresql/data/pg_hba.conf
IPv4 local connections:
host all all 127.0.0.1/32 trust
As you can see, my changes to the file are lost; the content of the file is the original, not my modified version.
P.S. This problem occurs only with the file pg_hba.conf. E.g. if I create a folder and file /Downloads/myfile.txt in the container, that file is not lost in my container "my_new_container".
Editing files inside a container with docker exec will, in general, cause you to lose work. You mention docker commit, but that's almost never a best practice. (If this was successful, but then you discovered that PostgreSQL 9.6.24 had some critical bug and you had to upgrade, could you recreate the exact same image?)
In the case of the postgres image, the files in /var/lib/postgresql/data are always stored in a Docker volume or mount point. In your case you didn't use a docker run -v option, but the image is configured to create an anonymous volume in that directory. The volume is not included in docker commit, which is why you're not seeing it on the rebuilt container. (Also see docker postgres with initial data is not persisted over commits.)
For editing a configuration file, the easiest thing to do is to store the data on the host system. Create a directory to hold it, and extract the configuration file from the image. (Since the data directory is created by the image's startup script, you need a slightly longer path to get it out.)
mkdir pgdata
docker run -d --name pgtmp postgres:9.6.24
docker cp pgtmp:/var/lib/postgresql/data/pg_hba.conf ./pgdata
docker stop pgtmp
docker rm pgtmp
$EDITOR pgdata/pg_hba.conf
Now when you run the container, provide this data directory as a bind mount. That will inject the configuration file, but also cause the database data to persist over container exits.
docker run -v "$PWD/pgdata:/var/lib/postgresql/data" -u $(id -u) ... postgres:9.6.24
Note that this sequence doesn't use docker exec or "go inside" containers at all, and you haven't created an image without corresponding source. Everything is run with commands from the host. If you do need to reset the database data, in this setup, it's just files, and you can rm -rf pgdata, maybe saving the modified configuration file along the way.
(If I'm reading this configuration change correctly, you're trying to globally disable passwords and instead allow trust authentication for all inbound connections. That's not usually a good idea, especially since username/password authentication is standard in every database library I've encountered. You probably still want the volume to persist data, but I might not make this change to pg_hba.conf.)
A Docker container is built from a read-only image, which means that if you create a file inside the container, then remove and re-create the container, the file will not be there.
What you want to do is one of two things:
Map your container to a local directory (volume)
Create a Dockerfile based on the postgres image, and apply these modifications in a script that your Dockerfile adds to the image (see the sketch after the reference links below).
docker volume usages
Dockerfile Reference
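For the second option, a rough sketch (the file names are just examples): the official postgres image runs scripts from /docker-entrypoint-initdb.d once, when the database cluster is first initialised, so the modification can be applied there:

FROM postgres:9.6.24
COPY my-pg_hba.conf /etc/postgresql/my-pg_hba.conf
COPY setup-hba.sh /docker-entrypoint-initdb.d/

# where setup-hba.sh is roughly:
#   #!/bin/sh
#   cp /etc/postgresql/my-pg_hba.conf "$PGDATA/pg_hba.conf"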
Host OS: Red Hat Enterprise Linux Server release 7.9 (Maipo)
Guest OS (i.e. the Docker container being run): OpenSuse 15.2
Docker Version (on Host): Docker version 19.03.5, build 633a0ea
On the host, when I git clone the repository "utilities_scripts", I have valid access for the user (due to umask).
I think the issue here is not permission related, but rather WHY the USER defined in the Dockerfile is not being set as the owner of the folders/files (which are getting mounted) inside the docker container when I issue the following docker run ... command. Setting 755/775 etc. is not an option, as I can't chown inside the container as the target docker user, and setting 777 is bad practice.
NOTE:
When I try the same docker image from a different Linux machine, the folder/files are mounted as the user "docker_non_root_user" which is defined in the Dockerfile as USER docker_non_root_user.
docker build ... runs successfully and creates an image, and the container works on a different machine (when I mount the git clone repos inside the container using the -v <host>:<container> docker CLI option syntax).
Code snippet from Dockerfile is:
# Define any mount points references
VOLUME ["/home/docker_non_root_user/git"]
USER docker_non_root_user
WORKDIR /home/docker_non_root_user/git
This is what I see on the host where I have the Dockerfile USER ... ownership issue:
[gigauser@jenkins-projectABC bitbucket_workspace]$ whoami
gigauser
[gigauser@jenkins-projectABC bitbucket_workspace]$ id
uid=gigauser(gigauser) gid=21520(jenkins) groups=21520(jenkins),3000(ectx)
[gigauser@jenkins-projectABC bitbucket_workspace]$ umask
0077
[gigauser@jenkins-projectABC bitbucket_workspace]$ ls -l
total 12
drwx------ 5 gigauser jenkins 4096 Feb 3 16:36 utilities_scripts
[gigauser@jenkins-projectABC bitbucket_workspace]$
[gigauser@jenkins-projectABC bitbucket_workspace]$ sudo docker image ls
Active Directory Password:
REPOSITORY TAG IMAGE ID CREATED SIZE
project-im-opensuse 15.2 0c9ee31464cd 43 hours ago 2.39GB
[gigauser@jenkins-projectABC bitbucket_workspace]$
[gigauser@jenkins-projectABC bitbucket_workspace]$
[gigauser@jenkins-projectABC bitbucket_workspace]$ sudo docker run -v $PWD/utilities_scripts:/home/docker_non_root_user/git/utilities_scripts/ -it project-im-opensuse:15.2 bash -c "whoami; id; which bash; bash --version; ls -l; echo; ls -l utilities_scripts; ls -l /home/docker_non_root_user/git/utilities_scripts; id gigauser; echo"
WARNING: IPv4 forwarding is disabled. Networking will not work.
docker_non_root_user
uid=1000(docker_non_root_user) gid=487(docker_non_root_user) groups=487(docker_non_root_user),100(users)
/bin/bash
GNU bash, version 4.4.23(1)-release (x86_64-suse-linux-gnu)
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
total 8
drwx------ 5 gigauser 21520 4096 Feb 4 00:36 utilities_scripts
ls: cannot open directory 'utilities_scripts': Permission denied
ls: cannot open directory '/home/docker_non_root_user/git/utilities_scripts': Permission denied
id: 'gigauser': no such user
[gigauser@jenkins-projectABC bitbucket_workspace]$
When I run the same command on another Linux machine, I see that the mounted utilities_scripts folder's owner is docker_non_root_user.
Question: WHY am I seeing the host user's USER-ID (the user running the docker run command) being set inside the docker container on the folder utilities_scripts, when no such user ID was created in the Dockerfile / exists inside the container? (See the 2nd-last line in the output above.) The folder currently gets the host's owner and folder-level permissions inside the docker container.
gigauser, i.e. the host's user, does not exist in the docker container, but the ls -l output shows gigauser as the owner of the utilities_scripts folder/files in the container. This issue does not occur on the other host machine.
I even checked the /etc/subuid file; it looks OK to me. Changing the value inside it to docker's user didn't help. Also, I don't see anything related to this in the /etc/docker/daemon.json file.
$ cat /etc/subuid
gigauser:165536:65536
Running the same docker run ... command above from the other Linux host machine shows the folder ownership inside the docker container as:
drwx------ 5 docker_non_root_user 1000 272 Jan 26 21:52 utilities_scripts
User gigauser's numeric ID is not 1000; it is 21520. It works on the other host because there the local user probably has the numeric ID 1000.
This is because we're mounting the folder, not copying it. When you mount it, it gets shared into the container with exactly the same permissions/IDs as set on the host - because it is still the host's filesystem. Containers aren't like VMs with totally separate resources, and even on a VM, if you mount something like an NFS directory you'll get numeric IDs that may or may not match your local IDs.
Using /etc/subuid requires passing a flag to the run command, and you'd have to do maths to work out the offsets for your user.
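Short of user namespaces, the usual pragmatic fix is to make the numeric IDs line up, either at run time or at build time (a sketch; the build-arg name is my own, and the useradd syntax varies by base distro):

# option 1: run as the host user's numeric uid/gid, so the bind mount is readable
# even though that uid has no name inside the container
sudo docker run -u "$(id -u):$(id -g)" -v $PWD/utilities_scripts:/home/docker_non_root_user/git/utilities_scripts/ -it project-im-opensuse:15.2 bash

# option 2: rebuild the image so docker_non_root_user gets the host user's uid,
# e.g. with an ARG HOST_UID and "RUN useradd -u $HOST_UID docker_non_root_user" in the Dockerfile
sudo docker build --build-arg HOST_UID=21520 -t project-im-opensuse:15.2 .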
We are deploying a stack, the compose file list two services:
mediawiki
mySQL
We saw that the mysql Dockerfile has a VOLUME directive, to persist the databases to a Docker volume
Yet, if we docker stack rm then docker stack deploy our compose file, we lose all the database content.
Is that the expected behaviour? What would be the rationale for it?
Swarm mode doesn't build docker images, so it's safe to say that docker swarm is ignoring every line of your Dockerfile. The resulting image will have a volume defined, which will be created when the container is run, whether that is inside or outside of swarm mode. If you do not specify a volume mount at runtime, docker will give you an anonymous volume at that location, which is difficult to identify later on and is local to the node where the container is running. With a new stack, the previous anonymous volume won't be used, and in various situations the anonymous volume can be automatically deleted (e.g. when containers are configured to be automatically deleted on exit, though I'm not sure whether that applies to swarm mode).
A named volume can make the data easier to reuse in a single-node cluster. When you get to a multi-node cluster, you need to move this data off of the node where the container is running. For details on how to use something like NFS for external data storage, see this answer to a related question.
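For illustration, a named volume backed by NFS can be declared directly in a compose file (a sketch; the server address and export path are placeholders):

volumes:
  db-data:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=nfs.example.com,rw"
      device: ":/exports/db-data"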
The Dockerfile has a volume but you probably haven't mounted that volume as accessible outside of that container, which means it probably isn't being pulled out to the persistent overlay storage to be shared between containers/container deployments.
In your database service in the docker-compose.yml file you're going to need a volumes section. I'm omitting your current setup since you didn't provide it and I'm going to replace it with ellipses ....
mySQL:
  ...
  volumes:
    - mySQL-data:/var/lib/mysql
This tells the system that you want to persist the /var/lib/mysql directory outside of the container in your shared overlay that I randomly named mySQL-data.
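Note that a named volume used this way also needs a matching entry in the top-level volumes: section of the compose file, roughly:

volumes:
  mySQL-data: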
Being in swarm mode is not relevant in this case.
If you are only relying on the anonymous volume defined in the Dockerfile, running a new container will create and mount a new fresh volume. You need to specifically mount a named volume at container start (in your case add it in your compose file) to remount the same volume between runs.
If you need to remount a lost data volume, it might still be possible if you have not pruned data on your server. You will just need to find the relevant volume (it has a hash as its name), possibly rename it, and remount it in your new container.
I ran through the following scenario to illustrate my point:
First get the image and have a look:
$ docker pull mysql:latest
$ docker image inspect mysql:latest
From this last command we can see there is a volume declared for /var/lib/mysql
I'm on my dev machine. Cleaned up everything so I have no volumes at this time
$ docker volume ls
DRIVER VOLUME NAME
$
Start a container then look at volumes again
$ docker run -d --name test -e MYSQL_ALLOW_EMPTY_PASSWORD=1 mysql:latest
37a92341f52b189d00636d1f03ecfbd4e3e7e5d55b685f5ec254971d7732566c
$ docker volume ls
DRIVER VOLUME NAME
local 50f9810d13271c7c91b7e025e139db480f288a936f0a55f85d9580ef3aa83af7
Now add some data to the container
$ docker exec -it test bash
root@37a92341f52b:/# mysql
Welcome to the MySQL monitor. [... snip ...]
mysql> create database testso;
Query OK, 1 row affected (0.03 sec)
mysql> use testso;
Database changed
mysql> create table test (id int not null primary key);
Query OK, 0 rows affected (0.08 sec)
mysql> insert into test values (1);
Query OK, 1 row affected (0.05 sec)
mysql> select * from test;
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)
mysql> exit
Bye
root@37a92341f52b:/# exit
$
Create a new container
$ docker rm -f test
test
$ docker run -d --name test -e MYSQL_ALLOW_EMPTY_PASSWORD=1 mysql:latest
79148de09d7a3e13db338da133cfd7d44fe3590dc1c7ffe6129722c5c6baea21
$ docker volume ls
DRIVER VOLUME NAME
local 50f9810d13271c7c91b7e025e139db480f288a936f0a55f85d9580ef3aa83af7
local ef6fb2647c11c10ef98d25ca0dc2bd43231729ed18386c65768a0ad808fca93b
As you can see we have a second volume for the new container. I will not show this here but if I connect, the data is empty as in your case. Now let's try to recover.
First a little cleanup
$ docker rm -f test
test
$ docker volume rm ef6fb2647c11c10ef98d25ca0dc2bd43231729ed18386c65768a0ad808fca93b
ef6fb2647c11c10ef98d25ca0dc2bd43231729ed18386c65768a0ad808fca93b
We want to have a human readable name but we cannot rename a volume. What I did is mount the old one and a new named volume in a busybox container to transfer the data over.
$ docker run -it --rm -v 50f9810d13271c7c91b7e025e139db480f288a936f0a55f85d9580ef3aa83af7:/mysql/old -v mysql_data:/mysql/new busybox:latest
/ # cd /mysql/
/mysql # mv old/* new/
/mysql # exit
We now have this new volume and we can get rid of the anonymous one
$ docker volume ls
DRIVER VOLUME NAME
local 50f9810d13271c7c91b7e025e139db480f288a936f0a55f85d9580ef3aa83af7
local mysql_data
$ docker volume rm 50f9810d13271c7c91b7e025e139db480f288a936f0a55f85d9580ef3aa83af7
50f9810d13271c7c91b7e025e139db480f288a936f0a55f85d9580ef3aa83af7
And finally we remount the named volume in a fresh mysql container to get our data back
$ docker run -d --name test -e MYSQL_ALLOW_EMPTY_PASSWORD=1 -v mysql_data:/var/lib/mysql mysql:latest
3f6eff0b7660f3f8e9518564affc6555acb17184845156099d18300b3e76f4a2
$ docker exec -it test bash
root@3f6eff0b7660:/# mysql
Welcome to the MySQL monitor. [... snip ...]
mysql> use testso
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from test;
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.01 sec)
I am trying to understand the functioning of docker well enough to be reasonably confident that I am using it securely. One piece of advice for this is to always use a USER statement in a Dockerfile. In trying to understand the effect of this I ran into some trouble.
Concrete questions:
What mechanism allows the host kernel to deal with users that only exist in the container?
Why does run 2 below show the directory belonging to testuser, but not allow ls when in the directory?
Why does run 3 below show the directory belonging to testuser?
Version information at the bottom of this question.
Setup
I have the following Dockerfile
FROM alpine@sha256:1354db23ff5478120c980eca1611a51c9f2b88b61f24283ee8200bf9a54f2e5c
LABEL version 2.0
LABEL description "Test image for setting user"
RUN adduser -D testuser1 ## sometimes removed
RUN adduser -D testuser2 ## sometimes removed
RUN adduser -D testuser
USER testuser
CMD sh
I build this with
docker build -t kasterma/testuser:1 .
Then run with
docker run -ti -v /home/kasterma/test-user/:/test-home kasterma/testuser:1
The directory /home/kasterma/test-user/ is the directory that contains the Dockerfile.
Run 1: remove both lines marked ##sometimes removed in the Dockerfile.
[root@datalocal01 test-user]# docker run -ti -v /home/kasterma/test-user/:/test-home kasterma/testuser:1
/ $ ls -lh
...
drwx------ 2 1001 1001 40 Dec 30 14:08 test-home
...
Here it shows the user and group both as 1001, which is the uid and gid of kasterma on the host. In this context testuser has uid and gid 1000.
Also
/ $ cd test-home
sh: cd: can't cd to test-home
Run 2: remove only the second line marked ##sometimes removed in the Dockerfile.
/ $ ls -lh
...
drwx------ 2 testuser testuser 40 Dec 30 14:12 test-home
...
and
/ $ cd test-home
/test-home $ ls
ls: can't open '.': Permission denied
Now testuser and kasterma have the same uid and gid (though the one has them in the container, and the other on the host). Why can I cd, but not ls?
Run 3: remove neither line marked ##sometimes removed in the Dockerfile.
/ $ ls -lh
...
drwx------ 2 testuser testuser 40 Dec 30 14:15 test-home
...
and
/ $ cd test-home
sh: cd: can't cd to test-home
Now testuser has uid and gid 1002, so not the same as kasterma. The listing shows it as testuser, yet the cd command fails.
Version information
the OS version (running on a VM in VirtualBox)
[root@datalocal01 test-user]# uname -a
Linux datalocal01 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
and for docker
[root@datalocal01 test-user]# docker version
Client:
Version: 1.10.3
API version: 1.22
Package version: docker-common-1.10.3-59.el7.centos.x86_64
Go version: go1.6.3
Git commit: 3999ccb-unsupported
Built: Thu Dec 15 17:24:43 2016
OS/Arch: linux/amd64
Server:
Version: 1.10.3
API version: 1.22
Package version: docker-common-1.10.3-59.el7.centos.x86_64
Go version: go1.6.3
Git commit: 3999ccb-unsupported
Built: Thu Dec 15 17:24:43 2016
OS/Arch: linux/amd64
When the host is running SELinux it's possible you can't access the file system content if it isn't labeled.
From man docker-run
Labeling systems like SELinux require that proper labels are placed on
volume content mounted into a container. Without a label, the security
system might prevent the processes running inside the container from
using the content. By default, Docker does not change the labels set
by the OS. To change a label in the container context, you can add
either of two suffixes :z or :Z to the volume mount. These suffixes
tell Docker to relabel file objects on the shared volumes. The z
option tells Docker that two containers share the volume content. As a
result, Docker labels the content with a shared content label. Shared
volume labels allow all containers to read/write content. The Z
option tells Docker to label the content with a private unshared
label. Only the current container can use a private volume.
So, instead of disabling SELinux you could try
docker run -ti -v /home/kasterma/test-user/:/test-home:Z kasterma/testuser:1
See Using Volumes with Docker can Cause Problems with SELinux for more details.
I tried your use cases on my box (without SELinux, running Docker version 1.12.5): I always get the right "testuser" ownership and I'm able to change directory and list its contents (my local uid is 1000 and I don't have any more users above it). So maybe your problem is due to the older Docker version.
If it is related neither to SELinux nor to the old Docker version, the behavior you described seems related to User Namespaces.
Check whether your host's kernel enables User Namespaces (CentOS 7, which seems to be the distro you are using, doesn't enable them by default).
Look at Using User Namespaces on Docker that describes how to enable User namespaces on CentOS 7 and how to check the correct behavior.
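For reference, a minimal sketch of what enabling user-namespace remapping looks like at the daemon level (the daemon must be restarted afterwards):

# /etc/docker/daemon.json
{
  "userns-remap": "default"
}

# then restart the daemon
sudo systemctl restart docker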
About User Namespace details, look at several sites like:
Introduction to User Namespaces in Docker Engine
Docker Security
User namespaces have arrived in Docker!
Docker for your users - Introducing user namespace
You can find a clear description about permissions in Docker volumes before introduction of User Namespaces (before Docker 1.10) at Deni Bertovic blog - Handling Permissions with Docker Volumes.
Hope it helps.
Reading these links:
https://docs.docker.com/userguide/dockervolumes/#backup-restore-or-migrate-data-volumes
Backing up data volume containers off machine
My understanding is I can take a data volume container and archive its backup.
However reading the first link I can't seem to get it to work.
docker create -v /sonatype-work --name sonatype-work sonatype/nexus /bin/true
I launch sonatype/nexus image in a container using:
--volumes-from sonatype-work
All good: after running nexus, I inspect the data volume and can see the innards created; I can stop and remove nexus and start it again, and all changes are saved.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f84abb054d2e sonatype/nexus "/bin/sh -c 'java -" 22 seconds ago Up 21 seconds 0.0.0.0:8081->8081/tcp nexus
1aea2674e482 sonatype/nexus "/bin/true" 25 seconds ago Created sonatype-work
I want to now back up sonatype-work, but with no luck.
[root@ansible22 ~]# pwd
/root
[root@ansible22 ~]# docker run --volumes-from sonatype-work -v $(pwd):/backup ubuntu tar cvf /backup/sonatype-work-backup.tar /sonatype-work
tar: /backup/sonatype-work-backup.tar: Cannot open: Permission denied
tar: Error is not recoverable: exiting now
I have tried running as -u root; I also tried with:
/root/sonatype-work-backup.tar
When doing so, I can see it tarring stuff, but I don't see the tar file. Based on the example and my understanding, I don't think that's right anyway.
Can anyone see what I'm doing wrong?
EDIT: Linux Version Info
Fedora release 22 (Twenty Two)
NAME=Fedora
VERSION="22 (Twenty Two)"
ID=fedora
VERSION_ID=22
PRETTY_NAME="Fedora 22 (Twenty Two)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:22"
HOME_URL="https://fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=22
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=22
PRIVACY_POLICY_URL=https://fedoraproject.org/wiki/Legal:PrivacyPolicy
VARIANT="Server Edition"
VARIANT_ID=server
Fedora release 22 (Twenty Two)
Fedora release 22 (Twenty Two)
The reason for this is related to SELinux labelling. There are a couple of good Project Atomic pages on this:
Docker and SELinux
The default type for a confined container process is svirt_lxc_net_t. This type is permitted to read and execute all files types under /usr and most types under /etc. svirt_lxc_net_t is permitted to use the network but is not permitted to read content under /var, /home, /root, /mnt … svirt_lxc_net_t is permitted to write only to files labeled svirt_sandbox_file_t and docker_var_lib_t. All files in a container are labeled by default as svirt_sandbox_file_t.
Then in Using Volumes with Docker can Cause Problems with SELinux:
This will label the content inside the container with the exact MCS label that the container will run with, basically it runs chcon -Rt svirt_sandbox_file_t -l s0:c1,c2 /var/db where s0:c1,c2 differs for each container.
(In this case not /var/db but /root)
If you volume mount a image with -v /SOURCE:/DESTINATION:z docker will automatically relabel the content for you to s0. If you volume mount with a Z, then the label will be specific to the container, and not be able to be shared between containers.
So either z or Z is suitable in this case, but one would usually prefer Z for the isolation.
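Applied to the command from the question, that would be something like:

docker run --volumes-from sonatype-work -v $(pwd):/backup:Z ubuntu tar cvf /backup/sonatype-work-backup.tar /sonatype-work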
The reason I'm getting permission denied is SELinux. I am not sure why yet, but I will edit this answer when/if I find out. After disabling SELinux and restarting, I was able to take a backup.