Synchronizing numeric user IDs between Dockerfiles and docker-compose.yml? - docker

I have a docker-compose newbie question. We have an existing Jenkins build that creates Docker images and pushes them to an in house Artifactory repository. This is driven by using Maven/Docker and two Dockerfiles, one for the app and one for a volume/data container. The Dockerfiles look something like this:
App:
FROM centos
RUN useradd -u 6666 -ms /bin/bash foouser
COPY src/main/resources/home/foouser/.bashrc /home/foouser/
RUN chown -R foouser:foouser /home/foouser
USER foouser
COPY src/main/resources/opt/myapp/bin/startup.sh /opt/myapp/bin/
WORKDIR /home/foouser
ENTRYPOINT /opt/myapp/bin/startup.sh && /bin/bash
Data container:
FROM centos
# Environment variable for the path to mount/create. Defaults to /opt/data
ENV DATA_VOL_PATH="/opt/data"
# Make sure the user id is the same as the container using the volume, otherwise we may run into permission issues on
# the container mounting the volume.
RUN useradd -u 6666 -ms /bin/bash foouser && \
mkdir -p "$DATA_VOL_PATH" && \
chown -R foouser:foouser "$DATA_VOL_PATH"
VOLUME [ "$DATA_VOL_PATH" ]
I omitted stuff like labels, etc for brevity. So the images produced by the build from these Dockerfiles will end up in the local Artifactory repo. We're using Rancher/Cattle to instantiate these images, and I've added the Artifactory repo to Rancher so it can pull from there. The docker-compose.yml file in Rancher looks something like this:
# The data/volumes container for the data.
data:
  image: data-image
# App
myapp:
  image: app-image
  environment:
    DATA_VOL_PATH:
  volumes_from:
    - data
I know that I can pass environment variables from docker-compose (as in the DATA_VOL_PATH above), but I'm confused as to how things work. My understanding is that the commands in the Dockerfile are executed when I run docker build, and after that the image is immutable. When I instantiate a container based on the image, it creates a new writable union-filesystem layer on top of it, if I've understood things correctly. So in the case of the data container, I can't really change the volume once it's created, right? If that assumption is correct, it boils down to 1) how do I best synchronize user IDs across two different systems (Maven for creating the Docker images, and Rancher for instantiating container clusters), and 2) is it better to drive the creation of the data volume container entirely from docker-compose.yml? How would I then be able to replicate the data container's Dockerfile content in docker-compose.yml?
I assume this is a fairly common scenario, so there must be a few "best practices" solutions out there. Thanks.

I've run into this issue as well: in my case there was a non-root process inside the Docker container, and this process needed to access files mounted from the host, so I had to pay attention to the UID/GID values both inside the Docker image and on the host.
I'm afraid there is no good solution for this. Docker images really are immutable, so you'll have to establish some agreements or use third-party software to control both the build and the deployment process.
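One way to formalise such an agreement is to treat the UID as a single shared build argument that both Dockerfiles accept, so the Jenkins/Maven build (and anyone rebuilding the images) only has to agree on one number. A rough sketch, where APP_UID is a name I'm making up and 6666 is the value from the question:
# in both the app and the data Dockerfile
ARG APP_UID=6666
RUN useradd -u ${APP_UID} -ms /bin/bash foouser
# in the Jenkins/Maven build step
docker build --build-arg APP_UID=6666 -t app-image .
docker build --build-arg APP_UID=6666 -t data-image .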
I would also strongly encourage you to stop using docker-compose and give SaltStack a try.

Related

docker-compose user mapping [duplicate]

I would like to volume mount a directory from a Docker container to my workstation, so that when I edit the content in the volume mount from my workstation it is updated in the container as well. That would be very useful for testing and developing web applications in general.
However, I get permission denied in the container, because the UIDs in the container and on the host aren't the same. Isn't the original purpose of Docker that it should make development faster and easier?
This answer works around the issue I am facing when volume mounting a Docker container to my workstation. But by doing this, I make changes to the container that I don't want in production, and that defeats the purpose of using Docker during development.
The container is Alpine Linux, work station Fedora 29, and editor Atom.
Question
Is there another way, so both my work station and container can read/write the same files?
There are multiple ways to do this, but the central issue is that bind mounts do not include any UID mapping capability: the UID on the host is what appears inside the container and vice versa. If those two UIDs do not match, you will read/write files with different UIDs and likely run into permission issues.
Option 1: get a Mac or deploy docker inside of VirtualBox. Both of these environments have a filesystem integration that dynamically updates the UIDs. For Mac, that is implemented with OSXFS. Be aware that this convenience comes with a performance penalty.
Option 2: Change your host. If the UID on the host matches the UID inside the container, you won't experience any issues. You'd just run a usermod on your user on the host to change your UID there, and things will happen to work, at least until you run a different image with a different UID inside the container.
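As a rough sketch of that, assuming your host user is called dev and the image runs as UID 1000:
# change the host user's UID to match the container's
sudo usermod -u 1000 dev
# usermod fixes ownership under /home/dev automatically;
# anything owned by the old UID elsewhere needs a manual chown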
Option 3: Change your image. Some will modify the image to a static UID that matches their environment, often to match a UID in production. Others will pass a build arg with something like --build-arg UID=$(id -u) as part of the build command, and then use it in the Dockerfile with something like:
FROM alpine
ARG UID=1000
RUN adduser -u ${UID} app
The downside of this is each developer may need a different image, so they are either building locally on each workstation, or you centrally build multiple images, one for each UID that exists among your developers. Neither of these are ideal.
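The matching build command would then be something like:
docker build --build-arg UID=$(id -u) -t your_image .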
Option 4: Change the container UID. This can be done in the compose file, or on a one off container with something like docker run -u $(id -u) your_image. The container will now be running with the new UID, and files in the volume will be accessible. However, the username inside the container will not necessarily map to your UID, which may look strange to any commands you run inside the container. More importantly, any files owned by the user inside the container that you have not hidden with your volume will have the original UID and may not be accessible.
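In a compose file the same idea might look like the snippet below; this is a sketch that assumes you export UID and GID in your shell before running docker-compose, since compose only substitutes variables that are actually present in the environment:
services:
  app:
    image: your_image
    user: "${UID}:${GID}"
    volumes:
      - ./code:/code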
Option 5: Give up, run everything as root, or change permissions to 777 allowing everyone to access the directory with no restrictions. This won't map to how you should run things in production, and the container may still write new files with limited permissions making them inaccessible to you outside the container. This also creates security risks of running code as root or leaving filesystems open to both read and write from any user on the host.
Option 6: Setup an entrypoint that dynamically updates your container. Despite not wanting to change your image, this is my preferred solution for completeness. Your container does need to start as root, but only in development, and the app will still be run as the user, matching the production environment. However, the first step of that entrypoint will be to change the user's UID/GID inside the container to match your volume's UID/GID. This is similar to option 4, but now files inside the image that were not replaced by the volume have the right UIDs, and the user inside the container will now show with the changed UID, so commands like ls show the username inside the container, not a UID that may map to another user or to no one at all. While this is a change to your image, the code only runs in development, and only as a brief entrypoint to set up the container for that developer, after which the process inside the container will look identical to that in a production environment.
To implement this I make the following changes. First the Dockerfile now includes a fix-perms script and gosu from a base image I've pushed to the hub (this is a Java example, but the changes are portable to other environments):
FROM openjdk:jdk as build
# add this copy to include fix-perms and gosu or install them directly
COPY --from=sudobmitch/base:scratch / /
RUN apt-get update \
&& apt-get install -y maven \
&& useradd -m app
COPY code /code
RUN mvn build
# add an entrypoint to call fix-perms
COPY entrypoint.sh /usr/bin/
ENTRYPOINT ["/usr/bin/entrypoint.sh"]
CMD ["java", "-jar", "/code/app.jar"]
USER app
The entrypoint.sh script calls fix-perms and then exec and gosu to drop from root to the app user:
#!/bin/sh
if [ "$(id -u)" = "0" ]; then
  # running on a developer laptop as root
  fix-perms -r -u app -g app /code
  exec gosu app "$@"
else
  # running in production as a user
  exec "$@"
fi
The developer compose file mounts the volume and starts as root:
version: '3.7'
volumes:
  m2:
services:
  app:
    build:
      context: .
      target: build
    image: registry:5000/app/app:dev
    command: "/bin/sh -c 'mvn build && java -jar /code/app.jar'"
    user: "0:0"
    volumes:
      - m2:/home/app/.m2
      - ./code:/code
This example is taken from my presentation available here: https://sudo-bmitch.github.io/presentations/dc2019/tips-and-tricks-of-the-captains.html#fix-perms
Code for fix-perms and other examples are available in my base image repo: https://github.com/sudo-bmitch/docker-base
Since the UIDs in your containers are baked into the container definition, you can safely assume that they are relatively static. In this case, you can create a user on your host system with the matching UID and GID. Switch to the new account, and then make your edits to the files. Your host OS will not complain, since it thinks it's just that user accessing its own files, and your container OS will see the same.
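As a sketch, assuming the container user is UID/GID 1000:1000 and that appuser, appgroup and the file path are names I'm making up:
# create a matching group and user on the host
sudo groupadd -g 1000 appgroup
sudo useradd -u 1000 -g appgroup -m appuser
# make your edits as that user so host and container agree on ownership
sudo -u appuser vi ./code/somefile.py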
Alternatively, you can consider editing these files as root.

Best practice to connect my own code into a standard docker image in kubernetes

I have a lot of standard runtime Docker images, like python3 with TensorFlow 1.7 installed, and I want to use these standard images to run customer code that lives outside of them. The scenario seems quite similar to serverless. So what is the best way to put the code into the runtime containers?
Right now I am trying to use a persistent volume to mount the code into the runtime. But that takes a lot of work. Is there an easier solution for this?
UPDATE
What is the workflow for Google Machine Learning Engine or FloydHub? I think what I want is similar. They have command line tools to combine local code with a standard environment.
Following cloud-native practices, code should be immutable, and releases and their dependencies uniquely identifiable for repeatability, replicability, etc - in short: you should really create images with your src code.
In your case, that would mean basing your Dockerfile on the upstream python3 or TF images. There are a couple of projects that may help with the workflow above (code + build-release-run):
https://github.com/Azure/draft -- looks like better suited for your case
https://github.com/GoogleContainerTools/skaffold -- more golang friendly afaics
Hope it helps --jjo
One of the best practices is NOT to mount the code from a volume into it, but create a client-specific image that uses your TensorFlow image as a base image:
# Your base image comes in here.
FROM aisensiy/tensorflow:1
# Copy the client into your image.
COPY src /src
# As Kubernetes will run your containers with an
# arbitrary UID, we set the user to nobody.
USER nobody
# ... and they will run with GID 0, so we
# need to change the group to 0 and make
# your stuff accessible to GID 0.
RUN \
  chgrp -R 0 /src && \
  chmod -R g=u /src && \
  true
CMD ["/usr/bin/python", ...]
Some more best practices:
Always log to stdout instead of log files.
One process per container. If you need multiple local processes, co-locate them into a single pod (see the sketch below).
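For the pod point above, a minimal sketch of two co-located containers in one pod (all names and images are made up):
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: model-server
      image: registry.example.com/tf-runtime:1.7
    - name: log-shipper
      image: registry.example.com/log-shipper:latest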
Even more best practices are provided in the OpenShift documentation: https://docs.openshift.org/latest/creating_images/guidelines.html
The code file can be passed from stdin when the container is being started. This way you can run arbitrary code when starting the container.
Please see below for example:
root@node-1:~# cat hello.py
print("This line will be printed.")
root@node-1:~#
root@node-1:~# docker run --rm -i python python < hello.py
This line will be printed.
root@node-1:~#
If this is your case,
You have a docker image with code in it.
Aim: To update the code inside docker image.
Solution:
Run a bash session with the docker image with a directory in your file system mounted as volume.
Place the updated code in the volume directory.
From the docker bash session replace the real code with updated code from the volume.
Save the current state of container as new docker image.
Sample Commands
Assume ~/my-dir in your file system has the new code updated-code.py
$ docker run -it --volume ~/my-dir:/workspace --workdir /workspace my-docker-image bash
Now a new bash session will start inside docker container.
Assuming the code is at /code/code.py inside the docker container, you can simply update it by
$ cp /workspace/updated-code.py /code/code.py
Or you can create a new directory and place the code there.
$ mkdir /my-new-dir && cp /workspace/updated-code.py /my-new-dir/code.py
Now the docker container contains the updated code. But the changes will be reset if you close the container and run the image again. To create a docker image with the latest code, save this state of the container using docker commit.
Open a new tab in the terminal.
$ docker ps
Will list all running docker containers.
Find CONTAINER ID of your docker container and save it.
$ docker commit id-of-your-container new-docker-image-name
Now run the docker image with latest code
$ docker run -it new-docker-image-name
Note: It is recommended to remove the old docker image using docker rmi command as docker images are heavy.
We're dealing with a similar challenge also. Our approach is to build a static docker image where Tensorflow, Python, etc are built once and maintained.
Each user has a PVC (persistent volume claim) where large files that may change such as datasets and workspaces live.
Then we have a bash shell that launches the cluster resources and syncs the workspace using ksync (like rsync for a kubernetes cluster).

Path interpretation in a Dockerfile

I want to run a container, by mounting on the fly my ~/.ssh path (so as to be able to clone some private gitlab repositories).
The
COPY ~/.ssh/ /root/.ssh/
directive did not work out, because the Dockerfile interpreted paths relative to a tmp dir it creates for the builds, e.g.
/var/lib/docker/tmp/docker-builder435303036/
So my next shot was to try and take advantage of the ARG directive as follows:
ARG CURRENTUSER
COPY /home/$CURRENTUSER/.ssh/ /root/.ssh/
and run the build as:
docker build --build-arg CURRENTUSER=pkaramol <whatever follows ...>
However, I am still faced with the same issue:
COPY failed: stat /var/lib/docker/tmp/docker-builder435303036/home/pkaramol/.ssh: no such file or directory
1: How to make Dockerfile access a specific path inside my host?
2: Is there a better pattern for accessing private git repos from within ephemeral running containers, than copying my .ssh dir? (I just need it for the build process)
Docker Build Context
A build for a Dockerfile can't access specific paths outside the "build context" directory. This is the last argument to docker build, normally ".". The docker build command tars up the build context and sends it to the Docker daemon to build the image from. Only files that are within the build context can be referenced in the build. To include a user's .ssh directory, you would need to base the build in either the .ssh directory or a parent directory like /home/$USER.
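For example, you could keep the Dockerfile in your project but pass your home directory as the build context, assuming the project lives somewhere under it (paths are made up; note that this tars up and sends your entire home directory to the daemon unless you add a .dockerignore):
# build context is $HOME, the Dockerfile stays in the project
docker build -f "$HOME/project/Dockerfile" -t myimage "$HOME"
Inside the Dockerfile, COPY paths are then relative to $HOME:
COPY .ssh/ /root/.ssh/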
Build Secrets
COPYing or ADDing credentials in at build time is a bad idea, as the credentials will be saved in the image layers for anyone who has access to the image to see. There are a couple of caveats here: the credentials don't end up in the final image if you flatten the image layers after removing the sensitive files during the build, or if you create a multi-stage build (17.05+) that only copies non-sensitive artefacts into the final image.
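A rough sketch of the multi-stage variant, assuming the key sits next to the Dockerfile and the GitLab URL is made up; the key still ends up in the build context and in the intermediate stage's layers on the build host, just not in the final image:
# first stage: the key only ever exists here
FROM alpine/git AS fetch
COPY id_rsa /root/.ssh/id_rsa
RUN chmod 600 /root/.ssh/id_rsa && \
    GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no" \
    git clone git@gitlab.example.com:me/private-repo.git /src
# final stage: only the cloned source is copied across
FROM python:3
COPY --from=fetch /src /src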
Using ENV or ARG is also bad as the secrets will end up in the image history.
There is a long and involved GitHub issue about secrets that covers most of the variations on the idea. It's long, but worth reading through the comments in there.
The two main solutions are to obtain secrets via the network or a volume.
Volumes are not available in standard builds, so that makes them tricky.
Docker has added secrets functionality, but this is only available at container run time for swarm-based containers.
Network Secrets
Custom
The secrets github issue has a neat little net cat example.
nc -l 10.8.8.8 8080 < $HOME/.ssh/id_rsa &
And using curl to collect it in the Dockerfile, use it, and remove it in the one RUN step.
RUN set -uex; \
    mkdir -p /root/.ssh; \
    curl -s http://10.8.8.8:8080 > /root/.ssh/id_rsa; \
    chmod 600 /root/.ssh/id_rsa; \
    ssh -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa root@wherever priv-command; \
    rm /root/.ssh/id_rsa;
Since these network services are unsecured, you might want to add an alias IP address to your loopback interface so your build container or local services can access them, but no one external can.
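On a Linux host that alias could be added (and removed after the build) like this, reusing the 10.8.8.8 address from the example:
# add a loopback alias so only local processes can reach the service
sudo ip addr add 10.8.8.8/32 dev lo
# ... run the build ...
sudo ip addr del 10.8.8.8/32 dev lo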
HTTP
Simply running a web server with your keys mounted could suffice.
docker run -d \
-p 10.8.8.8:80:80 \
-v /home/me/.ssh:/usr/share/nginx/html:ro \
nginx
You may want to add TLS or authentication depending on your setup and security requirements.
Hashicorp Vault
Vault is a tool built specifically for managing secrets. It goes beyond the requirements for a Docker build. It's written in Go and also distributed as a container.
Build Volumes
Rocker
Rocker is a custom Docker image builder that extends Dockerfiles to support some new functionality. The MOUNT command they added allows you to mount a volume at build time.
Packer
The Packer Docker Builder also allows you to mount arbitrary volumes at build time.

How to migrate Docker volume between hosts?

Docker's documentation states that volumes can be "migrated" - which I'm assuming means that I should be able to move a volume from one host to another host. (More than happy to be corrected on this point.) However, the same documentation page doesn't provide information on how to do this.
Digging around on SO, I have found an older question (circa 2015-ish) that states that this is not possible, but given that it's 2 years on, I thought I'd ask again.
In case it helps, I'm developing a Flask app that uses TinyDB + local disk as its data storage - I have determined that I didn't need anything more fancy than that; this is a project done for learning at the moment, so I've decided to go extremely lightweight. The project is structured as such:
/project_directory
|- /app
|  |- __init__.py
|  |- ...
|- run.py  # assumes `data/databases/` and `data/files/` are present
|- Dockerfile
|- data/
|  |- databases/
|  |  |- db1.json
|  |  |- db2.json
|  |- files/
|  |  |- file1.pdf
|  |  |- file2.pdf
I have the folder data/* inside my .dockerignore and .gitignore, so that they are not placed under version control and are ignored by Docker when building the images.
While developing the app, I am also trying to work with database entries and PDFs that are as close to real-world as possible, so I seeded the app with a very small subset of real data, that are stored on a volume that is mounted directly into data/ when the Docker container is instantiated.
What I want to do is deploy the container on a remote host, but have the remote host seeded with the starter data (ideally, this would be the volume that I've been using locally, for maximal convenience); later on as more data are added on the remote host, I'd like to be able to pull that back down so that during development I'm working with up-to-date data that my end users have entered.
Looking around, the "hacky" way I'm thinking of doing is simply using rsync, which might work out just fine. However, if there's a solution I'm missing, I'd greatly appreciate guidance!
The way I would approach this is to generate a Docker container that stores a copy of the data you want to seed your development environment with. You can then expose the data in that container as a volume, and finally mount that volume into your development containers. I'll demonstrate with an example:
Creating the Data Container
Firstly we're just going to create a Docker container that contains your seed data and nothing else. I'd create a Dockerfile at ~/data/Dockerfile and give it the following content:
FROM alpine:3.4
ADD . /data
VOLUME /data
CMD /bin/true
You could then build this with:
docker build -t myproject/my-seed-data .
This will create you a Docker image tagged as myproject/my-seed-data:latest. The image simply contains all of the data you want to seed the environment with, stored at /data within the image. Whenever we create an instance of the image as a container, it will expose all of the files within /data as a volume.
Mounting the volume into another Docker container
I imagine you're running your Docker container something like this:
docker run -d -v $(pwd)/data:/data your-container-image <start_up_command>
You could now extend that to do the following:
docker run -d --name seed-data myproject/my-seed-data
docker run -d --volumes-from seed-data your-container-image <start_up_command>
What we're doing here is first creating an instance of your seed data container. We're then creating an instance of the development container and mounting the volumes from the data container into it. This means that you'll get the seed data at /data within your development container.
This gets to be a little bit of a pain now that you need to run two commands, so we could go ahead and orchestrate it a bit better with something like Docker Compose.
Simple Orchestration with Docker Compose
Docker Compose is a way of running more than one container at the same time. You can declare what your environment needs to look like and do things like define:
"My development container depends on an instance of my seed data container"
You create a docker-compose.yml file to layout what you need. It would look something like this:
version: '2'
services:
  seed-data:
    image: myproject/my-seed-data:latest
  my_app:
    build: .
    volumes_from:
      - seed-data
    depends_on:
      - seed-data
You can then start all containers at once using docker-compose up -d my_app. Docker Compose is smart enough to first start an instance of your data container, and then your app container.
Sharing the Data Container between hosts
The easiest way to do this is to push your data container as an image to Docker Hub. Once you have built the image, it can be pushed to Docker Hub as follows:
docker push myproject/my-seed-data:latest
It's very similar in concept to pushing a Git commit to a remote repository, except in this case you're pushing a Docker image. What this does mean, however, is that any environment can now pull this image and use the data contained within it. That means you can re-generate the data image when you have new seed data, push it to Docker Hub under the :latest tag, and when you restart your dev environment it will have the latest data.
To me this is the "Docker" way of sharing data and it keeps things portable between Docker environments. You can also do things like have your data container generated on a regular basis by a job within a CI environment like Jenkins.
According to the Docker docs, you could also create a backup and restore it:
Backup volume
docker run --rm --volumes-from CONTAINER -v \
$(pwd):/backup ubuntu tar cvf /backup/backup.tar /MOUNT_POINT_OF_VOLUME
Restore volume from backup on another host
docker run --rm --volumes-from CONTAINER -v \
$(pwd):/backup ubuntu bash -c "cd /MOUNT_POINT_OF_VOLUME && \
tar xvf /backup/backup.tar --strip 1"
OR (what I prefer) just copy it to local storage
docker cp --archive CONTAINER:/MOUNT_POINT_OF_VOLUME ./LOCAL_FOLDER
then copy it to the other host and start with e.g.
docker run -v "$(pwd)/LOCAL_FOLDER":/MOUNT_POINT_OF_VOLUME some_image
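The copy to the other host can be plain rsync or scp, e.g. (host and destination path are made up):
rsync -az ./LOCAL_FOLDER/ user@other-host:/srv/LOCAL_FOLDER/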
You can use this trick:
docker run --rm -v <SOURCE_DATA_VOLUME_NAME>:/from alpine ash -c "cd /from ; tar -cf - . " | ssh <TARGET_HOST> 'docker run --rm -i -v <TARGET_DATA_VOLUME_NAME>:/to alpine ash -c "cd /to ; tar -xpvf - " '
more information

Docker - How to access a volume not attached to a container?

I have (had) a data container which has a volume used by other containers (--volumes-from).
The data container has accidentally been removed.
Thankfully the volume was not removed.
Is there any way I can re run the data container and point it BACK to this volume?
Is there any way I can re-run the data container and point it BACK to this volume?
Sure, I detailed it in "How to recover data from a deleted Docker container? How to reconnect it to the data?"
You need to create a new container with the same VOLUME (but its path under /var/lib/docker/volumes/... would be empty or contain only the initial content).
Then you move your legacy volume's content to the path of the volume of the new container.
More generally, whenever I start a data volume container, I register its volume path in a file (to reuse that path later if my container is accidentally removed)
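A rough sketch of that recovery, assuming the volume path inside the container was /opt/data; the volume IDs are placeholders you would read from your own /var/lib/docker/volumes:
# recreate a data container exposing the same volume path
docker create --name data -v /opt/data busybox true
# find the host directory backing the new (empty) volume
docker inspect -f '{{ (index .Mounts 0).Source }}' data
# copy the legacy volume's content into that directory
sudo cp -a /var/lib/docker/volumes/<OLD_VOLUME_ID>/_data/. \
        /var/lib/docker/volumes/<NEW_VOLUME_ID>/_data/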
Not entirely sure but you might try
docker run -i -t --volumes-from yourvolume ubuntu /bin/bash
You should then be able to access the directory, I think.
When I came to this question, my main concern was data loss. Here is how I copied data from a volume to AWS S3:
# Create a dummy container - I like Python
host$ docker run -it -v my_volume:/datavolume1 python:3.7-slim bash
# Prepare AWS stuff
# executing 'cat ~/.aws/credentials' on your development machine
# will likely show them
python:3.7-slim$ pip install awscli
python:3.7-slim$ export AWS_ACCESS_KEY_ID=yourkeyid
python:3.7-slim$ export AWS_SECRET_ACCESS_KEY=yoursecrectaccesskey
# Copy
python:3.7-slim$ aws s3 cp /datavolume1/thefile.zip s3://bucket/key/thefile.zip
Alternatively you can use aws s3 sync.
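The sync variant would look like this (same made-up bucket and prefix):
python:3.7-slim$ aws s3 sync /datavolume1/ s3://bucket/key/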
MySQL / MariaDB
My specific example was about MySQL / MariaDB. If you want to back up a MySQL / MariaDB database, just execute
$ mysqldump -u [username] -p [database_name] \
--single-transaction --quick --lock-tables=false \
> db1-backup-$(date +%F).sql
You might also want to consider
--skip-add-drop-table: don't add a DROP TABLE statement before each CREATE TABLE. Without this flag, existing tables are dropped when the dump is restored.
--complete-insert: add the column names to the INSERT statements. Without this flag, restoring into a table whose columns differ can cause a column mismatch.
To restore the backup:
$ mysql -u [username] -p [database_name] < [filename].sql
