I'm working on a build step that handles common deployment tasks in a Docker Swarm Mode cluster. As this is a common problem for us and for others, we've shared this build step as a Bitbucket pipe: https://bitbucket.org/matchory/swarm-secret-pipe/
The pipe needs to use the docker command to work with a remote Docker installation. This doesn't work, however, because the docker executable cannot be found when the pipe runs.
The following holds true for our test repository pipeline:
The docker option is set to true:
options:
  docker: true
The docker service is enabled for the build step:
main:
  - step:
      services:
        - docker
Docker works fine in the repository pipeline itself, but not within the pipe.
The pipeline log shows the docker path being mounted into the pipe container:
docker container run \
--volume=/opt/atlassian/pipelines/agent/build:/opt/atlassian/pipelines/agent/build \
--volume=/opt/atlassian/pipelines/agent/ssh:/opt/atlassian/pipelines/agent/ssh:ro \
--volume=/usr/local/bin/docker:/usr/local/bin/docker:ro \
--volume=/opt/atlassian/pipelines/agent/build/.bitbucket/pipelines/generated/pipeline/pipes:/opt/atlassian/pipelines/agent/build/.bitbucket/pipelines/generated/pipeline/pipes \
--volume=/opt/atlassian/pipelines/agent/build/.bitbucket/pipelines/generated/pipeline/pipes/matchory/swarm-secret-pipe:/opt/atlassian/pipelines/agent/build/.bitbucket/pipelines/generated/pipeline/pipes/matchory/swarm-secret-pipe \
--workdir=$(pwd) \
--label=org.bitbucket.pipelines.system=true \
radiergummi/swarm-secret-pipe:1.3.7@sha256:baf05b25b38f2a59b044e07f4ad07065de90257a000137a0e1eb71cbe1a438e5
The pipe is pretty standard and uses a recent Alpine image; nothing special in that regard. The PATH is never overwritten. Now for the fun part: If I do ls /usr/local/bin/docker inside the pipe, it shows an empty directory:
ls /usr/local/bin
total 16K
drwxr-xr-x 1 root root 4.0K May 13 13:06 .
drwxr-xr-x 1 root root 4.0K Apr 4 16:06 ..
drwxr-xr-x 2 root root 4.0K Apr 29 09:30 docker
ls /usr/local/bin/docker
total 8K
drwxr-xr-x 2 root root 4.0K Apr 29 09:30 .
drwxr-xr-x 1 root root 4.0K May 13 13:06 ..
ls: /usr/local/bin/docker/docker: No such file or directory
As far as I understand pipelines and Docker, /usr/local/bin/docker should be the docker binary file. Instead, it appears to be an empty directory for some reason.
What is going on here?
I've also looked at other, official pipes. They don't do anything differently, but seem to be using the docker command just fine (e.g. the Azure pipe).
After talking to Bitbucket support, I solved the issue. As it turns out, once the Docker context is changed, every docker command is sent straight to the remote Docker daemon, and on our servers the docker binary lives at a different path than in Bitbucket Pipelines!
We had switched the Docker context before using the pipe, so the pipe container was started against the remote host. Its bind mount of /usr/local/bin/docker was therefore resolved on the remote host, where nothing exists at that path. Docker creates an empty directory when a bind-mount source is missing, which is exactly the empty /usr/local/bin/docker directory seen inside the pipe, and hence the No such file or directory error.
TL;DR: Always restore the default docker host/context before passing control to a pipe, e.g.:
script:
  - export DEFAULT_DOCKER_HOST=$DOCKER_HOST
  - unset DOCKER_HOST
  - docker context create remote --docker "host=ssh://${DEPLOY_SSH_USER}@${DEPLOY_SSH_HOST}"
  - docker context use remote
  # do your thing
  - export DOCKER_HOST=$DEFAULT_DOCKER_HOST # <------ restore the default host
  - pipe: matchory/swarm-secret-pipe:1.3.16
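If you'd rather not switch the global context at all, one option is to scope the remote context to individual commands. Below is a minimal sketch, assuming a context named `remote` created as in the script above; the helper name `with_docker_context` is hypothetical. It runs the given command in a subshell with DOCKER_HOST unset and DOCKER_CONTEXT set, so the caller's DOCKER_HOST (and therefore what a later pipe sees) is never modified.

```shell
# Hypothetical helper: run one command against a named Docker context without
# changing the caller's environment. The function body is a subshell "( ... )",
# so the unset/export below are discarded when the command finishes.
with_docker_context() (
  context="$1"
  shift
  # DOCKER_HOST wins over DOCKER_CONTEXT and the current context, so clear it here.
  unset DOCKER_HOST
  export DOCKER_CONTEXT="$context"
  "$@"
)

# Usage (assumes a context named "remote" exists):
#   with_docker_context remote docker service ls
# One-off alternative using the CLI's global flag:
#   docker --context remote service ls
```

Because the parent shell's DOCKER_HOST is never touched, there is nothing to restore before the `pipe:` line.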
Related
How does docker create a file system for a Linux container? And how are permissions set up on the root file system?
I encountered a situation when starting a docker container on a particular machine with Ubuntu Server. For some reason, /tmp in the container doesn't have write permissions:
$ docker run -it python:3.11-slim-buster /bin/bash
root@5d5fefe9b9a2:/# ls -la /tmp
total 8
drwxr-xr-t 1 root root 4096 Jan 26 06:58 .
drwxr-xr-x 1 root root 4096 Jan 29 04:31 ..
Note that this is mode 1755 (sticky bit set, but no write permission for group or others).
However, when I start the same docker image as a container on WSL, I get 1777 (world-writable, as /tmp normally is):
$ docker run -it python:3.11-slim-buster /bin/bash
root@201dfe147e5a:/# ls -la /tmp
total 8
drwxrwxrwt 1 root root 4096 Nov 16 06:56 .
drwxr-xr-x 1 root root 4096 Jan 29 04:36 ..
This was fine a few weeks ago on the Ubuntu machine. I recently moved all the files from /var/lib/ubuntu to /ubuntu because the partition mounted at /var was full. Would this have caused the behavior with the permissions of /tmp inside a container? If so, why? And how do I fix it? If not, what else would cause this and...how do I fix it?
Docker uses a so-called union file system for a running container. The recommended driver on Linux is called overlay2. The files and directories for each layer of an image are stored under /var/lib/docker/overlay2, assuming the default config. The directory structure for each layer is combined to create the final file system for the container. See https://docs.docker.com/storage/storagedriver/overlayfs-driver/ for more details.
As for the permissions for the files in the container, they are derived from the permissions of the files in this directory in the host file system. When I copied the files from /var/lib/docker to /docker, I failed to preserve ownership and permissions. My best guess is that umask was applied as each new file was created.
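One way to redo such a move without losing modes and ownership is an archive copy. A minimal sketch, assuming GNU coreutils (`cp -a`, `stat -c`) and that the Docker daemon is stopped while copying; `copy_preserving` is a hypothetical helper name and the paths in the usage comment are examples.

```shell
# Hypothetical helper: copy a directory tree while keeping permission bits
# (including the sticky bit on dirs like /tmp), timestamps and symlinks.
# Ownership is also kept when run as root. Refuses to overwrite an existing
# destination, so the top-level directory's own mode is copied too.
copy_preserving() {
  src="$1"
  dst="$2"
  if [ -e "$dst" ]; then
    echo "copy_preserving: $dst already exists" >&2
    return 1
  fi
  # -a (archive) = -dR --preserve=all; without it, cp applies the umask
  # to the modes of the files it creates.
  cp -a "$src" "$dst"
}

# Usage sketch (as root, with the daemon stopped):
#   systemctl stop docker
#   copy_preserving /var/lib/docker /docker
#   # then set "data-root": "/docker" in /etc/docker/daemon.json
#   systemctl start docker
```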
I am Dockerising an old project. A feature in the project pulls in user-specified Git repos, and since the size of a repo could cause the filing system to be overwhelmed, I created a local filing system of a fixed size, and then mounted it. This was intended to prevent the web host from having its file system filled up.
The general approach is this:
IMAGE=filesystem/image.img
MOUNT_POINT=filesystem/mount
SIZE=20
PROJECT_ROOT=$(pwd)
# Number of MiB to set aside for this filing system
dd if=/dev/zero of="$IMAGE" bs=1M count="$SIZE" > /dev/null 2>&1
# Format: the -F permits creation even though it's not a "block special device"
mkfs.ext3 -F -q "$IMAGE"
# Mount if the filing system is not already mounted
# (field 3 of the mount listing is the mount point)
if ! $MOUNTCMD | cut -d ' ' -f 3 | grep -q "^${PROJECT_ROOT}/${MOUNT_POINT}$"; then
    # -p Create all parent dirs as necessary
    mkdir -p "$MOUNT_POINT"
    /bin/mount -t ext3 "$IMAGE" "$MOUNT_POINT"
fi
This works fine in a local or remote Linux VM. However, I'd like to run this shell code, or something like it, inside a container. Part of the reason is to contain all the fiddly stuff inside a container, so that building a new host machine is kept as simple as possible (in my view, setting up custom mounts and cron-restart rules on the host works against that).
So, this command does not work inside a container ("filesystem" is an on-host Docker volume):
mount -t ext3 filesystem/image.img filesystem/mount
mount: can't setup loop device: No space left on device
It also does not work on a container folder ("filesystem2" is a container directory):
dd if=/dev/zero of=filesystem2/image.img bs=1M count=20
mount -t ext3 filesystem2/image.img filesystem2/mount
mount: can't setup loop device: No space left on device
I wonder whether containers just don't have the right internal machinery to do mounting, and thus whether I should change course. I'd prefer not to spend too much time on this (I'm just moving a project to a Docker-only server) which is why I would like to get mount working if I can.
Other options
If that's not possible, then a size-limited Docker volume, that works with both Docker and Swarm, may be an alternative I'd need to look into. There are conflicting reports on the web as to whether this actually works (see this question).
There is a suggestion here to say this is supported in Flocker. However, I am hesitant to use that, as it appears to be abandoned, presumably having been affected by ClusterHQ going bust.
This post indicates I can use --storage-opt size=120G with docker run. However, it does not look like it is supported by docker service create (unless perhaps the option has been renamed).
Update
As per the comment convo, I made some progress; I found that adding --privileged to the docker run enables mounting, at the cost of removing security isolation. A helpful commenter says that it is better to use the more fine-grained control of --cap-add SYS_ADMIN, allowing the container to retain some of its isolation.
However, Docker Swarm has not yet implemented either of these flags, so I can't use this solution. This lengthy feature request suggests to me that this feature is not going to be added in a hurry; it's been pending for two years already.
You won't be able to safely do this inside of a container. Docker removes the mount privilege from containers because using this you could mount the host filesystem and escape the container. However, you can do this outside of the container and mount the filesystem into the container as a volume using the default local driver. The size option isn't supported by most filesystems, tmpfs being one of the few exceptions. Most of them use the size of the underlying device which you defined with the image file creation command:
dd if=/dev/zero of=filesystem/image.img bs=1M count=$SIZE
I had trouble getting docker to create the loop device dynamically, so here's the process to create it manually:
$ sudo losetup --find --show ./vol-image.img
/dev/loop0
$ sudo mkfs -t ext3 /dev/loop0
mke2fs 1.43.4 (31-Jan-2017)
Creating filesystem with 10240 1k blocks and 2560 inodes
Filesystem UUID: 25c95fcd-6c78-4b8e-b923-f808517b28df
Superblock backups stored on blocks:
8193
Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done
When defining the volume, the mount options are passed almost verbatim to the mount command you would run on the command line:
docker volume create --driver local --opt type=ext3 \
--opt device=filesystem/image.img app_vol
docker service create --mount type=volume,src=app_vol,dst=/filesystem/mount ...
or in a single service create command:
docker service create \
--mount type=volume,src=app_vol,dst=/filesystem/mount,volume-driver=local,volume-opt=type=ext3,volume-opt=device=filesystem/image.img ...
With docker run, the command looks like:
$ docker run -it --rm --mount type=volume,dst=/data,src=ext3vol,volume-driver=local,volume-opt=type=ext3,volume-opt=device=/dev/loop0 busybox /bin/sh
/ # ls -al /data
total 17
drwxr-xr-x 3 root root 1024 Sep 19 14:39 .
drwxr-xr-x 1 root root 4096 Sep 19 14:40 ..
drwx------ 2 root root 12288 Sep 19 14:39 lost+found
The only prerequisite is that you create this file and loop device before creating the service, and that this file is accessible wherever the service is scheduled. I would also suggest making all of the paths in these commands fully qualified rather than relative to the current directory. I'm pretty sure there are a few places that relative paths don't work.
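The manual steps above (image file, loop device, filesystem, volume) can be collected into one script. A minimal sketch, assuming util-linux `losetup`, root privileges and the `local` driver options shown above; the function names, the DRY_RUN switch and the example paths are hypothetical. With DRY_RUN=1 it only prints the commands, using the placeholder /dev/loopX because no loop device is actually allocated.

```shell
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Attach the image to the first free loop device and print its path.
attach_loop() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "losetup --find --show $1" >&2
    echo /dev/loopX   # placeholder: no device is allocated in dry-run mode
  else
    losetup --find --show "$1"
  fi
}

# Create a fixed-size ext3 filesystem and expose it as a local Docker volume.
# Use a fully qualified image path, since relative paths may not work everywhere.
make_loop_volume() {
  image="$1"    # e.g. /srv/volumes/app.img
  size_mb="$2"  # filesystem size in MiB
  volume="$3"   # Docker volume name
  run dd if=/dev/zero of="$image" bs=1M count="$size_mb" || return 1
  loopdev=$(attach_loop "$image") || return 1
  run mkfs -t ext3 "$loopdev" || return 1
  run docker volume create --driver local \
    --opt type=ext3 --opt device="$loopdev" "$volume"
}

# Usage (as root):  make_loop_volume /srv/volumes/app.img 20 app_vol
# Preview only:     DRY_RUN=1 make_loop_volume /srv/volumes/app.img 20 app_vol
```

Loop device attachments do not survive a reboot, so on a real host the losetup step would have to be repeated at boot; that part is left out of this sketch.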
I have found a size-limiting solution I am happy with, and it does not use the Linux mount command at all. I've not implemented it yet, but the tests documented below are satisfying enough. Readers may wish to note the minor warnings at the end.
I had not tried mounting Docker volumes prior to asking this question, since part of my research stumbled on a Stack Overflow poster casting doubt on whether Docker volumes can be made to respect a size limitation. My test indicates that they can, but you may wish to test this on your own platform to ensure it works for you.
Size limit on Docker container
The below commands have been cobbled together from various sources on the web.
To start with, I create a volume like so, with a 20m size limit:
docker volume create \
--driver local \
--opt o=size=20m \
--opt type=tmpfs \
--opt device=tmpfs \
hello-volume
I then create an Alpine Swarm service that mounts this volume:
docker service create \
--mount source=hello-volume,target=/myvol \
alpine \
sleep 10000
We can check that the volume is mounted by getting a shell on the single container in this service:
docker exec -it amazing_feynman.1.lpsgoyv0jrju6fvb8skrybqap sh
/ # ls -l /myvol
total 0
OK, great. So, while remaining in this shell, let's try slowly overwhelming this disk, in 5m increments. We can see that it fails on the fifth try, which is what we would expect:
/ # cd /myvol
/myvol # ls
/myvol # dd if=/dev/zero of=image1 bs=1M count=5
5+0 records in
5+0 records out
/myvol # dd if=/dev/zero of=image2 bs=1M count=5
5+0 records in
5+0 records out
/myvol # ls -l
total 10240
-rw-r--r-- 1 root root 5242880 Sep 16 13:11 image1
-rw-r--r-- 1 root root 5242880 Sep 16 13:12 image2
/myvol # dd if=/dev/zero of=image3 bs=1M count=5
5+0 records in
5+0 records out
/myvol # dd if=/dev/zero of=image4 bs=1M count=5
5+0 records in
5+0 records out
/myvol # ls -l
total 20480
-rw-r--r-- 1 root root 5242880 Sep 16 13:11 image1
-rw-r--r-- 1 root root 5242880 Sep 16 13:12 image2
-rw-r--r-- 1 root root 5242880 Sep 16 13:12 image3
-rw-r--r-- 1 root root 5242880 Sep 16 13:12 image4
/myvol # dd if=/dev/zero of=image5 bs=1M count=5
dd: writing 'image5': No space left on device
1+0 records in
0+0 records out
/myvol #
Finally, let's see if we can get an error by overwhelming the disk in one go, in case the limitation only applies to newly opened file handles in a full disk:
/ # cd /myvol
/myvol # rm *
/myvol # dd if=/dev/zero of=image1 bs=1M count=21
dd: writing 'image1': No space left on device
21+0 records in
20+0 records out
It turns out we can, so that looks pretty robust to me.
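The manual fill test above can be scripted so it is easy to repeat on your own platform. A minimal sketch; `fill_until_full` and its safety cap are hypothetical additions, and it assumes a `dd` that accepts `bs=1M` (as the BusyBox `dd` in the transcript does).

```shell
# Hypothetical helper: write STEP_MB-sized files into DIR until a write fails
# (e.g. "No space left on device") or MAX_FILES files exist, then print how
# many complete files were written. MAX_FILES keeps it from filling an
# unlimited disk. A partially written file may remain after a failure.
fill_until_full() {
  dir="$1"
  step_mb="$2"
  max_files="$3"
  n=0
  while [ "$n" -lt "$max_files" ]; do
    if ! dd if=/dev/zero of="$dir/fill$((n + 1))" bs=1M count="$step_mb" 2>/dev/null; then
      break
    fi
    n=$((n + 1))
  done
  echo "$n"
}

# Usage inside the container:  fill_until_full /myvol 5 10
```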
Nota bene
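The volume is created with a type and a device of "tmpfs", which sounded to me worryingly like a RAM disk. I've successfully checked that the volume remains connected and intact after a system reboot, so it looks good to me, at least for now. Bear in mind, though, that tmpfs is memory-backed: whatever is stored in it lives in RAM (or swap) and is lost when the filesystem is unmounted, for example on a reboot, even though the volume definition itself persists.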
However, I'd say that when it comes to organising your data persistence systems, don't just copy what I have. Make sure the volume is robust enough for your use case before you put it into production, and of course, make sure you include it in your back-up process.
(This is for Docker version 18.06.1-ce, build e68fc7a).
When I start nexus3 in a docker container I get the following error messages.
$ docker run --rm sonatype/nexus3:3.8.0
Warning: Cannot open log file: ../sonatype-work/nexus3/log/jvm.log
Warning: Forcing option -XX:LogFile=/tmp/jvm.log
Java HotSpot(TM) 64-Bit Server VM warning: Cannot open file ../sonatype-work/nexus3/log/jvm.log due to Permission denied
Unable to update instance pid: Unable to create directory /nexus-data/instances
/nexus-data/log/karaf.log (Permission denied)
Unable to update instance pid: Unable to create directory /nexus-data/instances
It indicates that there is a file permission issue.
I am using Red Hat Enterprise Linux 7.5 as host machine and the most recent docker version.
On another machine (Ubuntu) it works fine.
The issue occurs in the persistent volume (/nexus-data). However, I do not mount a specific volume; I let Docker use an anonymous one.
If I compare the volumes on both machines I can see the following permissions:
On Red Hat, where it is not working, the directories belong to root.
$ docker run --rm sonatype/nexus3:3.8.0 ls -l /nexus-data
total 0
drwxr-xr-x. 2 root root 6 Mar 1 00:07 etc
drwxr-xr-x. 2 root root 6 Mar 1 00:07 log
drwxr-xr-x. 2 root root 6 Mar 1 00:07 tmp
On Ubuntu, where it is working, they belong to nexus. nexus is also the default user in the container.
$ docker run --rm sonatype/nexus3:3.8.0 ls -l /nexus-data
total 12
drwxr-xr-x 2 nexus nexus 4096 Mar 1 00:07 etc
drwxr-xr-x 2 nexus nexus 4096 Mar 1 00:07 log
drwxr-xr-x 2 nexus nexus 4096 Mar 1 00:07 tmp
Changing the user with the -u option is not an option.
I could solve it by deleting all local docker images: docker image prune -a
Afterwards it downloaded the image again and it worked.
This is strange because I also compared the fingerprints of the images and they were identical.
An example docker-compose file for Nexus:
version: "3"
services:
  #Nexus
  nexus:
    image: sonatype/nexus3:3.39.0
    expose:
      - "8081"
      - "8082"
      - "8083"
    ports:
      # UI
      - "8081:8081"
      # repositories http
      - "8082:8082"
      - "8083:8083"
      # repositories https
      #- "8182:8182"
      #- "8183:8183"
    environment:
      - VIRTUAL_PORT=8081
    volumes:
      - "./nexus/data/nexus-data:/nexus-data"
Set up the volume:
mkdir -p ./nexus/data/nexus-data
sudo chown -R 200 nexus/ # 200 because it's the UID of the nexus user inside the container
Start Nexus
sudo docker-compose up -d
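Since the container's nexus user (UID 200) must own the bind-mounted directory, a quick pre-flight check can catch a wrong owner before Nexus starts. A minimal sketch, assuming GNU `stat` (`-c %u`; BSD/macOS would need `stat -f %u`); `check_owner` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if DIR is owned by the numeric UID WANT_UID.
check_owner() {
  dir="$1"
  want_uid="$2"
  have_uid=$(stat -c %u "$dir") || return 1
  if [ "$have_uid" != "$want_uid" ]; then
    echo "check_owner: $dir is owned by UID $have_uid, expected $want_uid" >&2
    return 1
  fi
}

# Usage: check_owner ./nexus/data/nexus-data 200 && sudo docker-compose up -d
```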
You should give the correct rights to the folder where the persistent volume is located.
chmod -R a+rwx <folder of /nexus-data volumes>
Be careful: this command gives read, write and execute rights to all users. If you want to give more restricted rights, you should modify the command (note that u+rwx alone would only change the rights of the folder's owner, typically root, which does not help the nexus user).
I am trying to set up a multi-container service with docker-compose.
Some of the containers need to be restarted from a fresh container (i.e. with the file system as it is in the image) when they restart.
How can I achieve this?
I've found the restart: always option I can put on my service in the docker-compose.yml file, but that doesn't give me a fresh file system as it uses the same container.
I've also seen the --force-recreate option of docker-compose up, but that doesn't apply, as it only recreates the containers when the command is run.
EDIT:
This is probably not a docker-compose issue, but more of a general docker question: What is the best way to make sure a container is in a fresh state when it is restarted? With fresh state, I mean a state identical to that of a brand new container from the same image. Restarted is the docker command docker restart or docker stop and docker start.
In docker, immutability typically refers to the image layers. They are immutable, and any changes are pushed to a container specific copy-on-write layer of the filesystem. That container specific layer will last for the lifetime of the container. So to have those files not-persist, you have two options:
Recreate the container instead of just restart it
Don't write the changes to the container filesystem, and don't write them to any persistent volumes.
You cannot do #1 with a restart policy, by its very definition. A restart policy gives you the same container filesystem, with the application restarted. But if you use Docker's swarm mode, it will recreate containers when they exit, so if you can migrate to swarm mode, you can achieve this result.
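If the recreation can be triggered explicitly (by hand, from a script or from a scheduler) rather than by a restart policy, Compose can do it for a single service. A minimal sketch; `recreate_service` and the DRY_RUN switch are hypothetical, while `--force-recreate` and `--no-deps` are existing `docker-compose up` flags.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: replace SERVICE's container with a brand-new one from
# the same image, which resets its container filesystem (data in volumes is kept).
recreate_service() {
  # --force-recreate: create a new container even if nothing changed
  # --no-deps: do not touch the services it depends on
  run docker-compose up -d --force-recreate --no-deps "$1"
}

# Usage: recreate_service web
```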
Option #2 looks more difficult than it is. If you aren't writing to the container filesystem, or to a volume, then where? The answer is a tmpfs volume that is only stored in memory and is lost as soon as the container exits. In compose, this is a tmpfs: /data/dir/to/not/persist line. Here's an example on the docker command line.
First, let's create a container with a tmpfs mounted at /data, add some content, and exit the container:
$ docker run -it --tmpfs /data --name no-persist busybox /bin/sh
/ # ls -al /data
total 4
drwxrwxrwt 2 root root 40 Apr 7 21:50 .
drwxr-xr-x 1 root root 4096 Apr 7 21:50 ..
/ # echo 'do not save' >>/data/tmp-data.txt
/ # cat /data/tmp-data.txt
do not save
/ # ls -al /data
total 8
drwxrwxrwt 2 root root 60 Apr 7 21:51 .
drwxr-xr-x 1 root root 4096 Apr 7 21:50 ..
-rw-r--r-- 1 root root 12 Apr 7 21:51 tmp-data.txt
/ # exit
Easy enough: it behaves like a normal container. Now let's restart it and check the directory contents:
$ docker restart no-persist
no-persist
$ docker attach no-persist
/ # ls -al /data
total 4
drwxr-xr-x 2 root root 40 Apr 7 21:51 .
drwxr-xr-x 1 root root 4096 Apr 7 21:50 ..
/ # echo 'still do not save' >>/data/do-not-save.txt
/ # ls -al /data
total 8
drwxr-xr-x 2 root root 60 Apr 7 21:52 .
drwxr-xr-x 1 root root 4096 Apr 7 21:50 ..
-rw-r--r-- 1 root root 18 Apr 7 21:52 do-not-save.txt
/ # exit
As you can see, the directory came back empty, and we can add data back to it as needed. The only downside is that the directory will be empty even if the image has content at that location. I've tried combinations of named volumes, and the mount syntax with the volume-nocopy option set to 0, without luck. So if you need the directory to be initialized, you'll need to do that as part of your container entrypoint/cmd by copying from another location.
In order to not persist any changes to your containers it is enough that you don't map any directory from host to the container.
In this way, every time the containers runs (with docker run or docker-compose up ), it starts with a fresh file system.
docker-compose down also removes the containers, deleting any data.
The best solution I have found so far, is for the container itself to make sure to clean up when starting or stopping. I solve this by cleaning up when starting.
I copy my app files to /srv/template with the docker COPY directive in my Dockerfile, and have something like this in my ENTRYPOINT script:
rm -rf /srv/server/
cp -r /srv/template /srv/server
cd /srv/server
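The same idea as a slightly more defensive entrypoint fragment. A minimal sketch; `reset_workdir` and the TEMPLATE_DIR/SERVER_DIR overrides are hypothetical (their defaults are the paths above), and `cp -a` is used instead of `cp -r` so the copy keeps the template's permissions.

```shell
# Hypothetical helper: throw away the working copy and recreate it from the
# pristine template baked into the image, so every start begins fresh.
reset_workdir() {
  template="${TEMPLATE_DIR:-/srv/template}"
  server="${SERVER_DIR:-/srv/server}"
  rm -rf "$server" || return 1
  # -a keeps modes and timestamps from the template
  cp -a "$template" "$server"
}

# In the ENTRYPOINT script:
#   reset_workdir || exit 1
#   cd "${SERVER_DIR:-/srv/server}" || exit 1
#   exec "$@"   # exec lets the app receive signals such as SIGTERM from docker stop
```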
The following command works perfectly and the riak service starts as expected:
docker run --name=riak -d -p 8087:8087 -p 8098:8098 -v $(pwd)/schemas:/etc/riak/schema basho/riak-ts
The local schemas directory is mounted successfully and the sql file in it is read by riak. However, if I try to mount riak's data or log directories, the riak service does not start and times out after 15 seconds:
docker run --name=riak -d -p 8087:8087 -p 8098:8098 -v $(pwd)/logs:/var/log/riak -v $(pwd)/schemas:/etc/riak/schema basho/riak-ts
Output of docker logs riak:
+ /usr/sbin/riak start
riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
Why does riak not start when its logs or data directories are mounted to local directories?
The issue is the owner of the mounted log directory. The directory's user and group are expected to be riak, as follows:
root@20e489124b9a:/var/log# ls -l
drwxr-xr-x 2 riak riak 4096 Jul 19 10:00 riak
but with volumes you are getting:
root@3546d261a465:/var/log# ls -l
drwxr-xr-x 2 root root 4096 Jul 19 09:58 riak
One way to solve this is to give the host directories riak's user and group ownership before starting the container. I looked up the UID/GID (in /etc/passwd) in the docker image, and they were:
riak:x:102:105:Riak user,,,:/var/lib/riak:/bin/bash
Now change the ownership of the host directories before starting the container:
sudo chown 102:105 logs/
sudo chown 102:105 data/
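Instead of reading the UID/GID off the image by hand, they can be looked up from the image's /etc/passwd. A minimal sketch; `passwd_uid_gid` is a hypothetical helper, and the usage assumes the image has `cat` and a `riak` entry in /etc/passwd (as shown above).

```shell
# Hypothetical helper: read passwd-format lines on stdin and print "UID:GID"
# for the given user name (fields 3 and 4 of the matching line).
# Exits non-zero if the user is not found.
passwd_uid_gid() {
  awk -F: -v user="$1" '$1 == user { print $3 ":" $4; found = 1; exit } END { exit !found }'
}

# Usage:
#   ids=$(docker run --rm --entrypoint cat basho/riak-ts /etc/passwd | passwd_uid_gid riak)
#   sudo chown "$ids" logs/ data/
```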
This should solve it, at least for now. Details here.