How to automatically extract build artifacts from a docker container keeping ownership of the calling process?

I'm writing some automated build scripts which use a docker container as a build environment.
One thing that's been bugging me is finding a way to extract the build artifacts from the container while retaining the user ownership of the calling process.
Usually this is automatic: when a process creates a file, the file is owned by the user running the process. But when a process invokes a docker container, the container runs as a different user (often root). I see no simple way for the container to run as the same user as the calling process. So if I map a local directory when invoking docker (docker run --volume $(pwd)/target:/target), then when the build script in the image writes its files, they turn up in the host's build directory owned by root.
The other alternative I can see is to run the container, wait for it to complete, then use docker cp to extract the build artifacts. The trouble with this is I don't see a way to run a container to completion and then get the container ID of the recently created container.
Is there a common way to automatically / programmatically extract build artifacts from a docker container keeping the ownership of the calling process?
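Two possible approaches, as a minimal sketch under stated assumptions rather than a definitive answer. The first function runs the container with the caller's UID and GID through docker run --user, so files written to the bind mount are owned by the caller; this assumes the build inside the image also works as a non-root user. The second uses docker create, which prints the ID of the new container, followed by docker cp; files copied to the host are created with the UID:GID of the user who invoked docker cp. The function names and their arguments are hypothetical.

```shell
# Variant 1: bind-mount a host directory and run as the calling user.
# $1 = image, $2 = host directory mapped to /target in the container.
run_build_as_caller() {
    image=$1 hostdir=$2
    docker run --rm --user "$(id -u):$(id -g)" \
        --volume "$hostdir:/target" "$image"
}

# Variant 2: run to completion, then copy the artifacts out.
# $1 = image, $2 = artifact path in the container, $3 = host destination.
extract_artifacts() {
    image=$1 src=$2 dest=$3
    # docker create prints the new container's ID on stdout.
    cid=$(docker create "$image") || return 1
    # Start attached so this call blocks until the build has finished.
    docker start --attach "$cid" || { docker rm "$cid" >/dev/null; return 1; }
    # Files copied to the host are owned by the user invoking docker cp.
    docker cp "$cid:$src" "$dest"
    status=$?
    # Clean up the stopped container whether or not the copy succeeded.
    docker rm "$cid" >/dev/null
    return $status
}
```

For example, extract_artifacts my-build-image /target ./target would copy /target out of a container created from the (hypothetical) image my-build-image.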

Related

How to include files outside of build context without specifying a different Dockerfile path?

This is basically a follow-up question to How to include files outside of Docker's build context?: I'm using large files in all of my projects (several GBs) which I keep on an external drive, only used for development.
I want to COPY or ADD these files to my docker container when building it. The answer linked above allows one to specify a different path to a Dockerfile, potentially extending the build context. I find this impractical, since it would require setting the build context to the system root (?) just to be able to include a single file.
Long story short: Is there any way or workaround to include a file that is far removed from the docker build context?
Three suggestions on things you could try:
"include a file that is far removed from the docker build context?"
You could construct your own build context by copying files on the host (with cp or tar) into a dedicated directory tree. It doesn't have to be your actual source tree or build tree.
rm -rf docker-build
mkdir docker-build
cp -a Dockerfile build/the-binary docker-build
cp -a /mnt/external/support docker-build
docker build ./docker-build
# reads docker-build/Dockerfile, and the files in the
# docker-build directory, but nothing else; only sends
# the docker-build directory to Docker as the build context
"large files [...] (several GBs)"
Docker doesn't deal well with build contexts this large. In the past I've at least seen docker build take a long time just on the step of sending the build context to itself, and docker push and docker pull have network issues when trying to send the gigabyte+ layer around.
It's a little hacky and breaks the "self-contained image" model a little bit, but you can provide these files as a Docker bind-mount instead of including them in the image. Your application needs to know what to do if the data isn't there. When you go to deploy the application, you also need to separately distribute the files alongside the Docker image and other deployment artifacts.
docker run \
  -v /mnt/external/support:/app/support \
  ... \
  the-image-without-the-support-files
"only used for development"
Potentially you can get away with not using Docker at all during this phase of development. Use a local source tree and local development tools; run your unit tests against these large test fixtures as needed. Build a Docker image only when you're about to run pre-commit integration tests; that may be late enough in the development cycle that you don't need these files.
I think your main concern is that you do not want to send every file in a directory to the Docker daemon while it builds the image.
When the directory is that big (several GB), building an image takes a long time.
If you only need those files while building something inside Docker, you can mount them into the container.
A tricky way:
Run a container from the base image with the directories mounted inside it: docker run -d -v local-path:container-path
Get a shell inside the container: docker exec -it CONTAINER_ID bash
Run the build step: ./build-something.sh
Create an image from the running container: docker commit CONTAINER_ID
Tag the image: docker tag IMAGE_ID tag:v1. You can get the image ID from the output of the previous command.
In the long term this method may seem very tedious, but if you only need to build the image once or twice, you can try it.
I tried this for one of my Docker images, because I wanted to avoid sending a large number of files to the Docker daemon during the image build.
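The manual steps above could be scripted roughly as follows. This is a sketch under assumptions: the function name and its arguments are hypothetical, tail -f /dev/null is just one way to keep the container alive (it assumes the base image has sh and tail), and docker commit is given the tag directly, which folds the commit and tag steps into one. Note that docker commit does not include data from mounted volumes, so only what the build writes outside the mount point ends up in the image.

```shell
# $1 = base image, $2 = host directory, $3 = mount point in the container,
# $4 = build command to run inside the container, $5 = tag for the result.
build_by_commit() {
    base=$1 hostdir=$2 mountpoint=$3 buildcmd=$4 tag=$5
    # Keep a container running in the background so we can exec into it.
    cid=$(docker run -d -v "$hostdir:$mountpoint" "$base" tail -f /dev/null) || return 1
    # Run the build step inside the running container.
    if docker exec "$cid" sh -c "$buildcmd"; then
        # Snapshot the container's filesystem as a new, tagged image.
        docker commit "$cid" "$tag" >/dev/null
        status=$?
    else
        status=1
    fi
    # Remove the helper container in either case.
    docker rm -f "$cid" >/dev/null
    return $status
}
```

A call might look like build_by_commit base-image /mnt/external/support /support ./build-something.sh tag:v1, where base-image stands in for whatever base image is used.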
The COPY command takes a source and a destination.
Note that the source is resolved relative to the build context, so a full absolute path to your hard-drive mount point only works if the build context is the filesystem root:
COPY /absolute_path/to/harddrive /container/path

How docker detects which changes should be saved and which not?

I know that when we stop docker our changes are lost. There are many answers on how to prevent this: commit each time. The idea is that when docker runs, it spins up a fresh container based on the image. On the other hand, a container persists some data after it exits unless you use --rm.
Just to simplify:
If you run apt-get install vim, you must commit to save the change.
BUT if you change nginx.conf or upload a new file to HDFS, you do not lose the data.
So, just curious:
How does docker know what to save and what not? Example: at the end of apt-get install we have new files in the system. The same is true when I upload a new file. For the container/image there is NO difference, right? It is just an I/O modification. So how does docker know which modifications should be saved when we stop the container?
The basic rules here:
Anything you explicitly store outside the container — a database, S3 — will outlive the container.
If you attach a volume to the container when you create the container using a docker run -v option or a Docker Compose volumes: option, any data written to that directory outlives the container. (If it’s a named volume, it lasts until you docker volume rm it.)
Anything else in the container filesystem is lost as soon as you docker rm the container.
If you need things like your application source code or a helper tool installed in an image, write a Dockerfile to describe how to build the image and run docker build. Check the Dockerfile into source control alongside your application.
The general theory of working with Docker is that you always start from a clean slate. When you docker build an image, you start from a base image and install your application into it; you never try to upgrade an installed application. Similarly, when you docker run a container, you start from a fresh copy of its image.
So the clearest answer to the question you ask is really, if you consistently docker rm a container when you stop it, when you docker run a new container, it will have the base image plus the content from the mounted volumes. Docker will never automatically persist anything outside of this.
You should never run docker commit: this leads to magic images that can’t be recreated later (in six months when you discover a critical security issue that risks taking your site down). Similarly, you should never install software in a running container, because it will be lost as soon as the container is removed; add it to your Dockerfile and rebuild.
By default, all data generated by a container is temporary: nothing persists unless you have mounted part of the host filesystem or attached a volume to the container.
If you are finding that nginx.conf is reused even after changes, I would suggest checking which directories are mounted or mapped to Docker volumes.
The nginx configuration resides in /etc/nginx/conf.d/*, and you might be mapping a volume to this directory. In that case, if you make changes in a running container and then remove the container, the data still persists because it was written to the volume rather than to the container's writable layer. If you later deploy a new container with the same volume mapping, all the changes you made previously are reflected in the new container as well.
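The rules from the answers above can be tried out with a small experiment. This is a sketch only: the alpine image, the volume name demo-data and the file paths are illustrative choices, not something taken from the question.

```shell
# Write one file into a named volume and one into the container's own
# filesystem, then look for both from a second, fresh container.
# $1 = image (default: alpine), $2 = volume name (default: demo-data).
persistence_demo() {
    image=${1:-alpine} volume=${2:-demo-data}
    # First container: one write goes to the volume, one to the writable layer.
    docker run --rm -v "$volume:/data" "$image" \
        sh -c 'echo kept > /data/in-volume; echo lost > /tmp/in-layer'
    # Second container, same image and volume: the volume file is still
    # there, the file from the first container's writable layer is not.
    docker run --rm -v "$volume:/data" "$image" \
        sh -c 'ls /data/in-volume; ls /tmp/in-layer || echo "in-layer is gone"'
    # A named volume lasts until it is removed explicitly.
    docker volume rm "$volume" >/dev/null
}
```

Because both containers are started with --rm, each one is removed when it exits, which is what discards the file written outside the volume.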

Using docker to file-sandbox a program

I want to use docker to achieve the following:
Run a program with access to a certain directory and then after the run reset all the changes made to that directory. I therefore wanted to create a docker image that copies said directory into the docker volume so that the program has access to the files in the directory, but the directory has the same contents for each run.
However, this seems to add a lot of overhead - are there other (better) ways to achieve the described goal?
PS: I want to measure the behavior of the program, but not for security reasons, i.e. I trust the program.
You actually don't have to do the copying, that's exactly what Docker is for: keeping and discarding changes. Volumes are a way to preserve the files (and/or share them between containers), not for discarding them.
Once you build the image (using docker build and a Dockerfile), the image acts as a starting point for containers. When you use docker run <image>, you create a container based on the given image. The container runs a single program (called the entry point), and when it exits, any changes it made are either discarded (docker container rm) or you can start the program again from the last state (docker start). The same image is shared as a read-only layer among all running containers, so each container only stores the differences the program makes. In the background, docker uses an overlay filesystem to achieve this, so if you wanted, you could also do it yourself.
So the short answer is:
Create an image containing the untouched directory
Run the image with docker run --rm <image>.
(Alternatively, you can use just docker run <image> and delete container yourself when it exits with docker rm <container>).
Every time you do docker run, the program will see the untouched directory as it was stored in the image. When the program exits, any changes will be discarded. Even if you run two containers in parallel, their changes will be isolated.
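As a small sketch of the two steps above: the function names and the image name sandbox-image are made up, and the build context is assumed to contain a Dockerfile that copies the untouched directory into the image.

```shell
# Build the image that holds the pristine directory.
# $1 = image name, $2 = build context containing the Dockerfile.
build_sandbox_image() {
    docker build -t "$1" "$2"
}

# Run a command in a throwaway container created from that image.
# $1 = image name, remaining arguments = command to run in the container.
run_sandboxed() {
    image=$1
    shift
    # --rm removes the container, and with it every change, on exit.
    docker run --rm "$image" "$@"
}
```

Calling run_sandboxed sandbox-image ./program twice in a row gives the program the same starting directory contents both times.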

Dealing with data in Docker Containers with Gitlab-Ci

So I am using gitlab-ci to deploy my websites in docker containers. Because the gitlab-ci docker runner doesn't seem to do what I want, I am using the shell executor and let it run docker-compose up -d. Here comes the problem.
I have 2 volumes in my docker container: ./:/var/www/html/ (the content of my git repo, so files I want to replace on build) and a mount that is "inside" of this mount, /srv/data:/var/www/html/software/permdata (a persistent mount on my server).
When the gitlab-ci runner starts, it tries to remove all files while the container is running, but because of this mount within a mount it gets a "device busy" error and aborts. So I have to manually stop and remove the container before I can run my build (which kind of defeats the point of build automation).
Options I thought about to fix this problem:
stop and remove the container before gitlab-ci-multi-runner starts (seems not possible)
add the git data to my docker container and only mount my permdata (seems like you can't add data to a container without the volume option with docker compose like you can in a Dockerfile)
Option 2 would be ideal because then it would also sort out my issues with permissions on the files.
Maybe someone has gone through the same problem and could give me some advice.
"seems like you can't add data to a container without the volume option with docker compose like you can in a Dockerfile"
That's correct. The Compose file is not meant to replace the Dockerfile, it's meant to run multiple images for an application or project.
You can modify the Dockerfile to copy in the git files.

Docker commit without running

When I run docker build . the ID that is printed is that of the image, which is what I thought was being committed to the docker repo. But when I run docker commit <id>, it says that it is not a valid container ID. I usually get around this by starting the image in a container and then committing that ID. But what should I do if the container requires linked containers to run? Running the container can take a long time, especially when the build process is in the run script. If this fails, or requires a linked container to succeed, the process will exit and my container will shut down, which does not allow me to create my new image. Is there a way to build your Dockerfile and commit to the repo at the same time? Alternatives?
A Dockerfile is designed to provide a completely host-independent way to repeatably build images without depending on any aspect of the host's configuration. This is why linking is not included in individual build steps, as it would render the build dependent on the other containers on the host at the time of build. Because of this, Dockerfiles are not the only way to build images.
When you must have a host dependent build environment, use a Dockerfile for the base part, installing dependencies etc, then use docker run from a script/configuration management system of your choice to setup the other containers and do the actual build. Once the build is complete, you can commit the resulting container, tag it with a name, and then push it to the repo.
To address the question at the top of the post: if you want to give a name to an image produced by a Dockerfile, use docker tag image-id name.
Committing takes a container and produces an image.
Tagging takes an image and gives it a name.
Pushing takes an image and a name and makes it available to pull later.
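The workflow described above might be scripted along these lines. It is a sketch, not the answerer's exact procedure: the function name and all arguments are hypothetical, any links or networks to other containers would have to be added to the docker create call, and docker commit is given the final name directly instead of running docker tag as a separate step. docker commit also works on a container that has already exited.

```shell
# $1 = build context with the Dockerfile, $2 = name for the base image,
# $3 = build command to run in the container, $4 = final image name.
build_run_commit_push() {
    context=$1 base_tag=$2 buildcmd=$3 final_name=$4
    # Host-independent part: build the base image from the Dockerfile.
    docker build -t "$base_tag" "$context" || return 1
    # Host-dependent part: create a container for the actual build.
    cid=$(docker create "$base_tag" sh -c "$buildcmd") || return 1
    # Run it attached so we only continue once the build has finished.
    if docker start --attach "$cid"; then
        # Commit the stopped container under its final name, then push it.
        docker commit "$cid" "$final_name" >/dev/null &&
            docker push "$final_name"
        status=$?
    else
        status=1
    fi
    docker rm "$cid" >/dev/null
    return $status
}
```

An invocation could look like build_run_commit_push . app-base ./build.sh registry.example/app:1, with every name in that call being a placeholder.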

Resources