Is there a good, standard way to "bootstrap" a containerized application? - docker

I'm working on an application which needs to be initialized the first time it is run.
Practically, what this will do is initialize a database with some starter values, and save some files in a persistent volume. If I stop the container and then restart it, I don't want to re-run that bootstrapping routine. In other words, if the container is present and populated - skip the initialization routine.
The way I was going to implement this was have an entry-point script which checks if the configuration files are present, and if so will skip the bootstrapping routine, however, I was wondering if there is a better way to do it?
For example, is there a way to run a script which is specifically triggered by the need to create a volume? If I could d that, the only circumstance under which I'd run the bootstrapper would be when the application was initializing for the first time.
Or, is there a better, more Dockerish pattern that defines how I should go about this problem?

"Do the initialization in an entrypoint script if the files don't already exist" seems to be reasonably idiomatic. For example, the standard postgres:9.6 image checks for a $PGDATA/PG_VERSION file.
Hypothetically this can look something like:
#!/bin/sh
if [ ! -f /data/config.ini ]; then
/opt/myapp/setup-data.sh /data
fi
exec "$#"
Remember that it's very routine to delete and recreate containers for a variety of reasons (IME stop and start as actions are rare, but some of this is habits born of an earlier age of Docker);this ties well into your intuition to use the entrypoint for this, since it will get launched on every docker run. From within your container you can't really tell if a directory is or isn't a volume and there aren't any hooks you can tie into; at the point the entrypoint begins, the container environment is fully set up, with whatever networks and volumes already attached.

Related

Deriving FROM existing Dockerfile + setting USER to non-root

I'm trying to find a generic best practice for how to:
Take an arbitrary (parent) Dockerfile, e.g. one of the official Docker images that run their containerized service as root,
Derive a custom (child) Dockerfile from it (via FROM ...),
Adjust the child in the way that it runs the same service as the parent, but as non-root user.
I've been searching and trying for days now but haven't been able to come up with a satisfying solution.
I'd like to come up with an approach e.g. similar to the following, simply for adjusting the user the original service runs as:
FROM mariadb:10.3
RUN chgrp -R 0 /var/lib/mysql && \
chmod g=u /var/lib/mysql
USER 1234
However, the issue I'm running into again and again is whenever the parent Dockerfile declares some path as VOLUME (in the example above actually VOLUME /var/lib/mysql), that effectively makes it impossible for the child Dockerfile to adjust file permissions for that specific path. The chgrp & chmod are without effect in that case, so the resulting docker container won't be able to start successfully, due to file access permission issues.
I understand that the VOLUME directive works that way by design and also why it's like that, but to me it seems that it completely prevents a simple solution for the given problem: Taking a Dockerfile and adjusting it in a simple, clean and minimalistic way to run as non-root instead of root.
The background is: I'm trying to run arbitrary Docker images on an Openshift Cluster. Openshift by default prevents running containers as root, which I'd like to keep that way, as it seems quite sane and a step into the right direction, security-wise.
This implies that a solution like gosu, expecting the container to be started as root in order to drop privileges during runtime isn't good enough here. I'd like to have an approach that doesn't require the container to be started as root at all, but only as the specified USER or even with a random UID.
The unsatisfying approaches that I've found until now are:
Copy the parent Dockerfile and adjust it in the way necessary (effectively duplicating code)
sed/awk through all the service's config files during build time to replace the original VOLUME path with an alternate path, so the chgrp and chmod can work (leaving the original VOLUME path orphaned).
I really don't like these approaches, as they require to really dig into the logic and infrastructure of the parent Dockerfile and how the service itself operates.
So there must be better ways to do this, right? What is it that I'm missing? Help is greatly appreciated.
Permissions on volume mount points don't matter at all, the mount covers up whatever underlying permissions were there to start with. Additionally you can set this kind of thing at the Kubernetes level rather than worrying about the Dockerfile at all. This is usually though a PodSecurityPolicy but you can also set it in the SecurityContext on the pod itself.

Automatically Configure Config inside Docker Container

While setting up and configure some docker containers I asked myself how I could automatically edit some config files inside the container after the containerized service finished installing (since the config files are created at the installation).
I have tried that using a shell file and adding it as the entrypoint in the Dockerfile. However, as I have said the config file does not exist right at the beginning and hence the sed commands in the script fail.
Linking an config files with - ./myConfig.conf:/xy/myConfig.conf is also not an option because the config contains some installation dependent options.
The most reasonable solution I have found was running a script, which edits the config, manually after the installation has finished with docker exec -i mycontainer sh < editconfig.sh
EDIT
My question is formulated in general terms. However, the question arose while working with Nextcloud in a docker-compose setup similar to the official example. That container contains a config.php file which is the general config file of Nextcloud and is generated during the installation. Certain properties of that files have to be changed (there are only a very limited number of environmental variables to specify). Since I am conducting some tests with this container I have to repeatedly reinstall it and thus reedit the config file.
Maybe you can try another approach and have your config file/application pick its settings from the environmental variables. That would be consistent with the 12factor app methodology see here
How I understand your case you need to start your container from creating config by some template.
I see a number of options to do it:
Use some script that generates a config from template and arguments from a command line or environment variables. (Jinja2 and python for example or Mustache and node.js ). In this case, your entrypoint generate the template and after this start application. For change config, you will be forced restart service (container).
Run some service can save the configuration and render you configuration in run time. Personally, I like consul template, we active use this engine in our environment, and have no problems for while. In this case, config is more dynamic and able to be changed "on the fly". In your container, you will have two processes, application, and consul-template daemon. Obviously, you will need to run and maintain consul. For reloading config restart of an application process is enough.
Run a custom script to create the config. :)

Is there a reason why people write everything to the DockerFile instead of a separate shell script?

I somehow don't like the RUN x && y && z ... syntax we currently use in DockerFile. As far as I understand I could just run a shell script instead like RUN xyz.sh and do the same tasks on my favorite language. Does the latter have any disadvantage?
Update:
In additional to the point made by David about the complexity, I believe writing everything to the Dockerfile makes it easier for share (thus creating a survivorship bias for you). Namely on the DockerHub, most of the time, you have a "Dockerfile" tab to quickly get the idea on how the image is built. If the author uses COPY and RUN xyz.sh, he/she would have to host the script elsewhere or the Dockerfile alone becomes meaningless.
CMD is executed at runtime, that is when the container is created from the image. RUN is a build time instruction. So the question is actually why people run things with RUN instead of CMD at runtime. (You can of course COPY script.sh /script.sh then RUN bash /script.sh)
If you do things like installing dependencies, it could take a lot of time, in case of scaling up your service, this would make auto-scaling useless because it can't be fast enough to absorb the peak.
At build time, RUN can be cached, so next time the build will be a lot faster.
Because the way docker file system works, creating 10 containers from the same image takes only a few more space than creating 1 container. So you can save disk space by installing packages in the image, while if you install them at runtime, they will all occupy a part of disk space.
RUN executes commands in a new layer and creates a new image. This happens when you build the image using docker build.
CMD specifies a default command an parameters to be run when a container is launched from the image.
In summary. Run and cmd is not interchangeable, RUN runs when an image is created, CMD when a container is launched.

Modifying volume data inherited from parent image

Say there is an image A described by the following Dockerfile:
FROM bash
RUN mkdir "/data" && echo "FOO" > "/data/test"
VOLUME "/data"
I want to specify an image B that inherites from A and modifies /data/test. I don't want to mount the volume, I want it to have some default data I specify in B:
FROM A
RUN echo "BAR" > "/data/test"
The thing is that the test file will maintain the content it had at the moment of VOLUME instruction in A Dockerfile. B image test file will contain FOO instead of BAR as I would expect.
The following Dockerfile demonstrates the behavior:
FROM bash
# overwriting volume file
RUN mkdir "/volume-data" && echo "FOO" > "/volume-data/test"
VOLUME "/volume-data"
RUN echo "BAR" > "/volume-data/test"
RUN cat "/volume-data/test" # prints "FOO"
# overwriting non-volume file
RUN mkdir "/regular-data" && echo "FOO" > "/regular-data/test"
RUN echo "BAR" > "/regular-data/test"
RUN cat "/regular-data/test" # prints "BAR"
Building the Dockerfile will print FOO and BAR.
Is it possible to modify file /data/test in B Dockerfile?
It seems that this is intended behavior.
Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded.
VOLUMEs are not part of your IMAGE. So what is the use case to seed data into it. When you push the image into another location, it is using an empty volume at start. The dockerfile behaviour does remind you of that.
So basically, if you want to keep the data along with the app code, you should not use VOLUMEs. If the volume declaration did exist in the parent image then you need to remove the volume before starting your own image build. (docker-copyedit).
There are a few non-obvious ways to do this, and all of them have their obvious flaws.
Hijack the parent docker file
Perhaps the simplest, but least reusable way, is to simply use the parent Dockerfile and modify that. A quick Google of docker <image-name:version> source should find the github hosting the parent image Dockerfile. This is good for optimizing the final image, but destroyes the point of using layers.
Use an on start script
While a Dockerfile can't make further modifications to a volume, a running container can. Add a script to the image, and change the Entrypoint to call that (and have that script call the original entry point). This is what you will HAVE to do if you are using a singleton-type container, and need to partially 'reset' a volume on start up. Of course, since volumes are persisted outside the container, just remember that 1) your changes may already be made, and 2) Another container started at the same time may already be making those changes.
Since volumes are (virtually) forever, I just use one time setup scripts after starting the containers for the first time. That way I easily control when the default data is setup/reset. (You can use docker inspect <volume-name> to get the host location of the volume if you need to)
The common middle ground on this one seems to be to have a one-off image whose only purpose is to run once to do the volume configurations, and then clean it up.
Bind to a new volume
Copy the contents of the old volume to a new one, and configure everything to use the new volume instead.
And finally... reconsider if Docker is right for you
You probably already wasted more time on this than it was worth. (In my experience, the maintenance pain of Docker has always far outweighed the benefits. However, Docker is a tool, and with any tool, you need to sometimes take a moment to reflect if you are using it right, and if there are better tools for the job.)

how to swap solr core from shell

I have a solr setup with two cores. I want to schedule a core(core1, backend) for full import frequently(e.g. after every 5 mins), then swap with the live(core0, serving) core from shell command through a shceduler.
For full-import command, I am using following shell command
wget -o - -q -t 1 http://localhost:8080/solr/core1/dataimport?command=full-import
Which works fine. If I do a core swap from browser by hitting
http://localhost:8080/solr/admin/cores?action=SWAP&core=core1&other=core0, I get latest update instantly on search. But if I schedule this URL as shell command similar to dataimport, it doesn't do that swap.
Did you try with
curl
"http://'localhost':8080/solr/admin/cores?action=SWAP&core=core1&other=core0"
from shell?
There is catch with the SWAPs
Apache Solr allows to swap two cores around for non-Cloud configurations. They take each other’s name, so it is a good way to push an updated core into a production without downtime.
But an interesting question is how this is achieved. Normally, core name is it’s directory name too. So, does Solr rename the directory on the filesystem too?
Not really! Instead name property in the core.properties file is updated to use the name of the other core. Usually that property is used to give an alternave name of the core for when the directory naming conventions are not suitable.
The gotcha is - of course - that you still have two directories with right looking names for the cores you see in the Admin UI. So, it is very easy to forget that extra redirection/rename step when troubleshooting somebody else’s - or even your own old - setup.

Resources