How to make docker download external file references at initialisation

New user of Docker here, I tried to figure it out myself with Google but...
I have a Node.js/Express app with this structure:
|- modules
|- lib
|- node_modules: nodejs modules
|- server.js
|- docker-compose.yml
|- dockerfile
In the lib directory I have reference files (like, for example, the geo2ip database: https://dev.maxmind.com/geoip/geoip2/geolite2/ ) that I don't want to include in my docker image, as it would be huge and quickly outdated.
Instead I want my docker image to download the latest version of the file from a URL every time it runs, so I don't have to build a new image each time this reference file needs updating.
I tried adding a curl command to the dockerfile like this:
CMD curl https://test.com/huge_file.mmdb --output lib/huge_file.mmdb
I also tried executing a shell script, but either it has no effect, or it adds the file to the image, which I don't want.
Any idea how I can ask docker to download a file from the internet and add it to the container at startup?
Thanks !

The first thing you need to do for this case is to mount some external storage into your container.
docker run -v $PWD/huge_files:/app/lib ...
This causes whatever is in the huge_files directory locally to be visible inside the container, replacing what was in the lib directory before.
Once you've done this, the easiest thing to do is to manage this whole process from the host. Download the files at your convenience; possibly have a shell script to update the files and start the container. The advantage of doing this is that you control exactly when it happens; the downside is that it's a manual step.
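For example, a small wrapper script on the host could refresh the reference file and then start the container. This is only a sketch, reusing the example URL and mount path from above; my-node-app is a placeholder for whatever your image is actually called:
#!/bin/sh
# refresh-and-run.sh -- download the latest reference file, then start the container
set -e
mkdir -p "$PWD/huge_files"
curl -fsSL https://test.com/huge_file.mmdb --output "$PWD/huge_files/huge_file.mmdb"
# my-node-app is a placeholder for your image name
docker run -v "$PWD/huge_files:/app/lib" my-node-app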
If you want the container to do it on startup, you can write an entrypoint script to do this. A Docker ENTRYPOINT is a program that gets run as the main container process; it gets run instead of CMD, but it gets passed the CMD as arguments. Frequently an entrypoint will be a shell script that ends by replacing itself with the CMD, exec "$@". You can use this for any first-time setup your container needs:
#!/bin/sh
# Download the file if it doesn't exist
if [ ! -f lib/huge_file.mmdb ]; then
  curl https://test.com/huge_file.mmdb --output lib/huge_file.mmdb
fi
# Switch to the container command
exec "$@"
The downside to this approach is that it does a potentially large download outside the user's control, and, if nothing is mounted into the container, it may repeat the same download on every container startup.
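For completeness, the Dockerfile wiring for this might look like the sketch below; the script name, the WORKDIR, and the node command are assumptions based on the project layout in the question:
# assumes the script above is saved as docker-entrypoint.sh next to the Dockerfile
WORKDIR /app
COPY docker-entrypoint.sh ./docker-entrypoint.sh
RUN chmod +x ./docker-entrypoint.sh
ENTRYPOINT ["/app/docker-entrypoint.sh"]
# the CMD is passed to the entrypoint as "$@" and exec'd at the end of the script
CMD ["node", "server.js"]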

Related

Supervisor as a way to run multiple entrypoints for my container

I have a need to run a script when my container starts up. (My script creates a json file from the passed in deployment environment variables for my SPA app to use as configuration.) My container is based on the Nginx container.
But a Dockerfile does not allow for more than one ENTRYPOINT and the Nginx container already has one (that I assume I need to keep in there so it will work right.)
I found this question: Docker multiple entrypoints. Its accepted answer suggests that I use Supervisor to accomplish what I need. I was ready to try that out, but one of the alternate answers just uses a script (which was my plan before I saw this answer). In the comments, a concern is raised about this solution (of using a script). It says:
Supervisor is arguably better, since it can restart your processes if they die, has configurable logging options, can receive commands remotely over RPC, etc.
The reference here to "it can restart your processes if they die" concerns me a bit. I am not sure what that means. I don't want my script to be re-run once it is done. (It creates the json file at container startup and is not needed after that.)
Will supervisor cause my script to be re-run over and over?
Or am I fine to use supervisor to run both Nginx's /docker-entrypoint.sh script and my script? Or should I just chain them together and leave supervisor out of it?
My script creates a json file from the passed in deployment environment variables for my SPA app to use as configuration. My container is based on the Nginx container.
If you look at the Dockerfile you link to, that launches a docker-entrypoint.sh script as the ENTRYPOINT. That script is in the same directory in the same GitHub repository, and runs
find "/docker-entrypoint.d/" -follow -type f -print | sort -n | while read -r f; do
case "$f" in
*.sh)
if [ -x "$f" ]; then
echo >&3 "$0: Launching $f";
"$f"
...
fi
;;
esac
done
So, for this particular base image, if you put your initialization script in /docker-entrypoint.d, it's named *.sh, and it's executable, then the existing ENTRYPOINT will run it for you at container startup time; you don't have to override anything.
FROM nginx:1.19
COPY build-config-json /docker-entrypoint.d/30-build-config-json.sh
RUN chmod +x /docker-entrypoint.d/30-build-config-json.sh
# End of file
The default packaging also includes a script that will run envsubst on any *.template file in /etc/nginx/templates to produce a corresponding config file in /etc/nginx/conf.d, if that's what your startup sequence happens to need.
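For example, a hypothetical default.conf.template (the NGINX_PORT variable here is just an illustration) would be rendered to /etc/nginx/conf.d/default.conf at startup:
# /etc/nginx/templates/default.conf.template
server {
    listen ${NGINX_PORT};
    location / {
        root /usr/share/nginx/html;
    }
}
Starting the container with docker run -e NGINX_PORT=8080 ... fills in the port before nginx starts.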
My script creates a json file from the passed in deployment environment variables for my SPA app to use as configuration.
This is straightforward to do in an entrypoint script. You can write a script that does the initial setup, and then launches the main process (there's a "but" for your particular case):
#!/bin/sh
/app/build-config-json -o /etc/nginx/config.json # or whatever
# --more--
My container is based on the Nginx container ... and the Nginx container already has [an ENTRYPOINT]
Normally you'd end the entrypoint script with exec "$@" to launch the main container process. In this case, you can launch the original entrypoint script instead:
# ... the rest of the first-time setup ...
exec /docker-entrypoint.sh "$@"
In your Dockerfile you can specify your new ENTRYPOINT, but when you do you need to repeat the original CMD as well. That means this approach generally requires some knowledge of the underlying image; its Dockerfile is ideal but you can find the ENTRYPOINT and CMD from docker history or docker inspect.
ENTRYPOINT ["/wrapped-entrypoint.sh"]
CMD ["nginx", "-g", "daemon off;"]
I would avoid supervisord if possible. The options it has around restarting processes and managing logging are things Docker will already do for you. It's not necessary for just startup-time container initialization.

Docker Parameterize Files Passed Inside

I am trying to pass a directory into the container, in a way that can eventually be automated. However I don't see any alternative other than physically editing the Dockerfile and manually typing the specific directory to be added.
Note: I have tried mounted volumes; however, that solution doesn't help my issue, as I want to eventually call the container on a directory which will then have a script run on it inside the container--not simply copy the local directory into the container.
Method 1:
$ --build-arg project_directory=/path/to/dir
ARG project_directory
ADD $project_directory .
My unsuccessful solution assumes that I can use the argument's value as a basic string that the ADD command can interpret just as if I were manually entering the path.
not simply copying the local directory inside the container
That's exactly what you're doing now, by using ADD $project_directory. If you need to make changes from the container and have them reflected onto the host, use:
docker run -v $host_dir:$container_dir image:tag
The command above launches a new container, and it's quite possible for you to launch it with different directory names. You can do so in a loop, from a Jenkins pipeline, a shell script, or whatever suits your development environment.
#!/bin/bash
container_dir=/workspace
for directory in /src /realsrc /kickasssrc
do
  docker run -v "$directory:$container_dir" image:tag
done

docker ubuntu sourcing after starting image

I built myself an image for ROS. I run it while mounting my original home from the host, plus some tricks to get graphics working as well. After starting the shell inside docker I always need to execute two source commands. One of the files to be sourced is actually inside the container, but the other resides in my home, which only gets mounted when the container starts. I would like to have these two files sourced automatically.
I tried adding
RUN bash -c "source /opt/ros/indigo/setup.bash"
to the image file, but this did not actually source it. Using CMD instead of RUN didn't drop me into the container's shell (I assume it finished executing source and then exited?). I don't even have an idea how to source the file that is only available after startup. What would I need to do?
TL;DR: you need to perform this step as part of your CMD or ENTRYPOINT, and for something like a source command, you need a step after that in the shell to run your app, or whatever shell you'd like. If you just want a bash shell as your command, then put your source command inside something like your .bashrc file. Or you can run something like:
bash -c "source /opt/ros/indigo/setup.bash && bash"
as your command.
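For instance, assuming the interactive user in the image is root, the in-image setup file could be wired into .bashrc at build time; this is only a sketch, not something taken from the original image:
# in the Dockerfile: source the ROS setup file for every interactive bash shell
RUN echo "source /opt/ros/indigo/setup.bash" >> /root/.bashrc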
One of the files to be sourced is actually inside the container, but the other resides in my home, which only gets mounted when the container starts.
...
I tried adding ... to the image file
Images are built using temporary containers that only see your Dockerfile instructions and the build context sent along with them. Containers use that built image and all of your runtime configuration, like volumes, to run your application. There's a hard divider between those two steps, image build and container run, and your volumes are not available during the image build step.
Each of the RUN steps performed during the image build runs in a temporary container that only keeps the state of the filesystem when it's finished. Changes to your environment, a cd into another directory, processes or services spawned in the background, or anything else not written to the filesystem when the command spawned by RUN exits will be lost. This is one reason you will see commands chained together in a single long RUN command, and it's why the Dockerfile has ENV and WORKDIR commands.
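Since the file in your home directory only exists once the volume is mounted at container start, both source commands belong in the command or in an entrypoint script that runs at that point. A sketch, where the second path is a placeholder for wherever the file actually lives in your mounted home:
#!/bin/bash
# entrypoint.sh -- source both setup files, then run the requested command
source /opt/ros/indigo/setup.bash
source /home/user/catkin_ws/devel/setup.bash   # hypothetical path inside the mounted home
# run whatever was passed as the command (e.g. bash for an interactive shell)
exec "$@"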

How to keep changes inside a container on the host after a docker build?

I have a docker-compose dev stack. When I run docker-compose up --build, the container is built and it executes
Dockerfile:
RUN composer install --quiet
That command writes a bunch of files inside the ./vendor/ directory, which is then only available inside the container, as expected. The vendor/ directory that also exists on the host is not touched and is therefore out of date.
Since I use that container for development and want my changes to be available, I mount the current directory inside the container as a volume:
docker-compose.yml:
my-app:
volumes:
- ./:/var/www/myapp/
This loads an outdated vendor directory into my container, forcing me to rerun composer install either on the host or inside the container in order to have the up-to-date version.
I wonder how I could manage my docker-compose stack differently, so that the changes during the docker build on the current folder are also persisted on the host directory and I don't have to run the command twice.
I do want to keep the vendor folder mounted, as some of the vendors are my own and I like being able to modify them in my current project. So only mounting the folders I need to run my application would not be the best solution.
I am looking for a way to tell docker-compose: Write all the stuff inside the container back to the host before adding the volume.
You can run a short side container after docker-compose build:
docker run --rm -v "$PWD/vendor":/target my-app cp -a vendor/. /target/.
The cp could also be something more efficient like an rsync. Then, after that container exits, you do your docker-compose up, which mounts the now up-to-date vendor/ from the host.
Write all the stuff inside the container back to the host before adding the volume.
There isn't any way to do this directly, but there are a few options to do it as a second command.
As already suggested, you can run a container and copy or rsync the files.
Use docker cp to copy the files out of a container, without using a volume (see the sketch after this list).
Use a tool like dobi (disclaimer: dobi is my own project) to automate these tasks. You can use one image to update vendor, and another image to run the application. That way updates are done on the host, but can be built into the final image. dobi takes care of skipping unnecessary operations when the artifact is still fresh (based on the modified time of files or resources), so you never run unnecessary operations.
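For the docker cp route, a hypothetical sequence could look like this; my-app-image stands in for the image docker-compose built, and the path comes from the compose file above:
# create a container from the built image without starting it, copy vendor out, clean up
docker create --name vendor-export my-app-image
mkdir -p ./vendor
docker cp vendor-export:/var/www/myapp/vendor/. ./vendor/
docker rm vendor-export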

Programmatically remove files/folders residing in a docker container

I am currently exploring how we can programmatically remove files/folders that reside inside a docker container. I know we can copy files from the container to the host using docker cp. However, I am looking for something like docker mv or docker rm which would allow me to move or remove files/folders inside docker.
The scenario is: we are writing automated test cases in Java, and for one case we need to copy the log file from the server's log folder to assert the log written by the test case. We are using docker cp to copy the log files. However, they contain the logs of old test cases as well. So I was thinking I could remove the log files before executing my test case, to make sure the log I copy was written by my test case only. Is there any other way around this?
You can use the command below to remove the files from a program running on the host:
docker exec <container> rm -rf <YourFile>
However, if old files exist because the container was never removed, then the general practice is to remove the container once the whole test suite execution is complete; a new job should create a new container.
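For example, a hypothetical pre-test step (the container name and log path are placeholders) could clear the logs, let the test run, and then copy only the fresh log out:
# clear old logs inside the running container before the test starts
docker exec my-server sh -c 'rm -f /opt/server/logs/*.log'
# ... run the test case ...
# copy the freshly written log back out for assertions
docker cp my-server:/opt/server/logs/app.log ./target/app.log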
In the Dockerfile definition you can use
RUN rm [folder-path]
However, anything added using ADD or COPY in an earlier layer will still increase the size of your image, even if you delete it in a later step.
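If a file is fetched during the build rather than copied in, downloading and deleting it within a single RUN keeps it out of the image entirely; a sketch with an illustrative URL and paths:
# fetch, unpack, and delete the archive in one layer so it never adds to the image size
RUN curl -fsSL https://example.com/data.tar.gz -o /tmp/data.tar.gz \
 && mkdir -p /opt/data \
 && tar -xzf /tmp/data.tar.gz -C /opt/data \
 && rm /tmp/data.tar.gz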
EDIT: To access a running container from an external program running on your host, try this:
Mount a host folder as a volume when starting the instance.
Run a script or program on the host to delete the desired folders in that host folder, which will affect the container's file system as well.
You can access the container's bash console:
#console control over container
docker exec -it a5866aee4e90 bash
Then, when you are inside the container, you can do anything with the console.
I'm using this command to find and rename files in my JBoss deployments directory. Modify it to suit your needs. You can also delete files; instead of mv, use rm.
find /home/jboss/jbossas/standalone/deployments -depth -name "*.failed" -exec sh -c 'mv "$1" "${1%.failed}.dodeploy"' _ {} \;
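The same pattern also works non-interactively from the host via docker exec; the container ID is reused from above and the log path is a placeholder:
# delete instead of rename, run from the host without opening a shell
docker exec a5866aee4e90 find /opt/server/logs -name "*.log" -type f -exec rm -f {} \;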
