What's the difference between RUN and bash script in a dockerfile? - docker

I've seen many dockerfiles include all build steps in a RUN statement, like:
RUN echo "Hello" &&
cd /tmp &&
mv a.txt b.txt &&
...
and so on...
My question is: what are the benefits/drawbacks of replacing these instructions with a single bash script, which gives me syntax highlighting, loop capabilities, etc.?
Something like:
COPY ./script.sh /tmp
RUN bash /tmp/script.sh
and then
#!/bin/bash
echo "hello" ;
cd /tmp ;
mv a.txt b.txt ;
...
Thanks!

The primary difference is that when you COPY the bash script into the image it will be available for inspection in the running container, whereas the RUN command is a little more opaque. Putting your commands in a file like that is arguably more manageable for other reasons: changes in your VCS history will be a little more clear, and for longer or more complex scripts you will probably find it easier to format things cleanly with the script in a separate file rather than embedded in your Dockerfile in a RUN command.
Otherwise the result is the same (in both cases, you are executing the same set of commands), although the COPY and RUN will result in an extra image layer (vs. just the RUN by itself).
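If you want to see that extra layer for yourself, docker history lists one entry per Dockerfile instruction, so the COPY shows up separately from the RUN (the image tag here is just a placeholder):
docker build -t myimage:test .
docker history myimage:test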

I guess running it as a shell script gives you more control.
For instance, you can use if-else statements to check whether a command has failed and provide a code path to handle it, whereas RUN is more straightforward: when the return code is not 0, it fails the build immediately.
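As a minimal sketch of that kind of handling, reusing the files from the question (default.txt is made up for illustration):
#!/bin/bash
# try the rename, but fall back to a packaged default if a.txt is missing
if ! mv /tmp/a.txt /tmp/b.txt; then
    echo "a.txt not found, falling back to the packaged default"
    cp /tmp/default.txt /tmp/b.txt
fi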
Obviously the case you have there is a relatively simple one, and it would not have made a huge difference. The only impact I can see here is readability: someone would have to open the shell script to know what is happening, compared to having everything in the Dockerfile itself.
I guess it all comes down to using the right tool for the job. If it is a simple command and you don't need complex logic handling, then use RUN.


Dockerfile RUN layers vs script

Docker version 19.03.12, build 48a66213fe
So in a dockerfile, if I have the following lines:
RUN yum install aaa \
        bbb \
        ccc && \
    <some cmd> && \
    <etc> && \
    <some cleanup>
is that a best practice? Should I keep yum part separate than when I call other <commands/scripts>?
If I want a cleaner (vs. traceable) Dockerfile, what if I put those lines in a .sh script and just call that script (i.e. a COPY followed by a RUN statement)? Will the build step run each time, even though nothing has changed inside the .sh script? Looking for some gotchas here.
I'm thinking that whatever packages are stable should go in their own RUN instruction, i.e. one layer, and the lines which depend on things that change frequently (e.g. user-defined, docker-build-time CLI args) should go in a separate RUN layer, so I can use the layer cache effectively.
Wondering if you think keeping a cleaner Dockerfile (calling RUN some.sh) would be less efficient than a traceable Dockerfile (where everything that makes up the image is listed in the Dockerfile itself).
Thanks.
In terms of the final image filesystem, you will notice no difference whether you RUN the commands directly, RUN a script, or have multiple RUN commands. The number of layers and the size of the command string don't really make any difference at all.
What can you observe?
Particularly on the "classic" Docker build system, each RUN command becomes an image layer. In your example, you RUN yum install && ... && <some cleanup>; if this were split into multiple RUN commands, the un-cleaned-up content would be committed as part of the image and would take up space even though it's removed in a later layer.
"More layers" isn't necessarily bad on its own, unless you have so many layers that you hit an internal limit. The only real downside here is creating a layer with content that you're planning to delete, in which case its space will still be in the final image.
As a more specific example of this, there's an occasional pattern where an image installs some development-only packages, runs an installation step, and uninstalls the packages. An Alpine-based example might look like
RUN apk add --virtual .build-deps \
      gcc make \
 && make \
 && make install \
 && apk del .build-deps
In this case you must run the "install" and "uninstall" in the same RUN command; otherwise Docker will create a layer that includes the build-only packages.
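For contrast, a split-up version of those same steps (a sketch of what not to do) would bake the build dependencies into an intermediate layer even though a later RUN deletes them:
RUN apk add --virtual .build-deps gcc make   # this layer permanently contains gcc and make
RUN make && make install
RUN apk del .build-deps                      # removes them from the filesystem, but not from the earlier layer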
(A multi-stage build may be an easier way to accomplish the same goal of needing build-only tools, but not including them in the final image.)
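A rough sketch of that multi-stage approach for the same kind of build (the project layout, make targets and paths are invented for illustration):
FROM alpine AS build
RUN apk add gcc make musl-dev          # build tools exist only in this stage
COPY . /src
WORKDIR /src
RUN make && make install DESTDIR=/out  # assumes a Makefile that honours DESTDIR
FROM alpine
COPY --from=build /out/ /              # only the installed files land in the final image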
The actual text of the RUN command is visible in docker history and similar inspection commands.
And...that's kind of it. If you think it's more maintainable to keep the installation steps in a separate script (maybe you have some way to use the same script in a non-Docker context) then go for it. I'd generally default to keeping the steps spelled out in RUN commands, and in general try to keep those setup steps as light-weight as possible.
I guess the question is somewhat opinion based.
It depends on what you are after. It's ultimately a tradeoff between development experience and an optimized image.
If you put everything in one RUN instruction, you are reducing the number of layers and therefore the image size to some degree. Also, each layer is stored in the registry, so pushing and pulling would get more time-consuming and expensive.
On the other hand, it means that each small change causes everything in the RUN instruction to run again, as it invalidates the cache for that single layer.
If you are creating temporary files with a RUN instruction that are removed by a later RUN instruction, then it would be better to run both commands in a single instruction to not create a layer with temporary files.
For a production image, I would opt for a single RUN instruction, as optimization is more important than build speed and caching, IMO. If you can, you could also use a multi-stage build, where the first stage uses individual RUN instructions to take advantage of layer caching. In the second stage, some artefacts from the first stage are copied over and the number of layers is aggressively kept to a minimum. Only the final stage will be pushed to and pulled from a registry.
For example, in the Dockerfile below, the builder stage uses more instructions than strictly required in order to get better caching. Even the template file is copied into the first stage, even though it's not used at all there (it's only read at runtime), but this way the final stage can get the output binary and the template with a single COPY instruction.
FROM golang as builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY *.go /src/
RUN mkdir -p /dist/templates
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o /dist/run .
COPY haproxy.cfg.template /dist/templates/
FROM alpine
WORKDIR /manager
COPY --from=builder /dist ./
ENTRYPOINT ["./run"]
In terms of script vs RUN instruction, I think it is more idiomatic to use a RUN instruction and concatenate multiple commands with the double ampersand &&. If things get very complex, then it may be better to use a dedicated script to make better use of shell syntax/features. It depends on what you are doing there.
Will the build step run each time, even though nothing has changed inside the .sh script?
The build step would only run once and then get cached. As long as the content of the script does not change, docker will use the cached layer. You need to get the file into the image somehow before you can run it, so the real cache invalidation would already happen at the COPY instruction if the file has changed.
As mentioned in the previous paragraph, using a script will cost you at minimum one extra COPY or ADD instruction, introducing an additional layer that could have been avoided if a RUN instruction had been used.
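A small sketch of how the script approach looks in practice, with the caching noted per instruction (the script name is just an example):
COPY install.sh /tmp/install.sh    # cache busts here whenever install.sh changes
RUN sh /tmp/install.sh             # re-runs only if the COPY above (or an earlier layer) changed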

Script to automate timeshift backup and azuracast update

I'm running an Azuracast docker instance on Linode and want to find a way to automate my updates. Right now my routine is: when I notice there are updates by checking the Azuracast web panel, I run timeshift to create a backup using the following command
timeshift --create --comment "azuracast update"
And then I use the following to update azuracast
cd /var/azuracast/
./docker.sh update-self
./docker.sh update
Then it asks me to ensure the Azuracast installation is backed up before updating, to which I usually just press enter.
After that is completed, it asks me if I want to clean up all stopped docker containers and images to save space, which I usually say no to.
What I’m wondering is if there is a way to create a bash script, or python or something to automate all of this, and then have it run on a schedule?
Sure, you can write a shell script to execute these commands and then run it on a schedule using crontab(5).
For example your script might look like:
#! /bin/sh
# Back up azuracast, then update it
timeshift --create --comment "azuracast update" && \
cd /var/azuracast/ && \
./docker.sh update-self && \
(yes | ./docker.sh update)
It sounds like this docker.sh program takes some user inputs. See if there are options you can pass to it that will allow you to run it non-interactively. (Seems there isn't, see edit.)
To setup your cron job, you can put the script in /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, or /etc/cron.monthly. Or if you need more control, you can get started configuring a cron job with crontab -e. Better explanation.
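For example, a weekly entry added with crontab -e might look like this (the script path and schedule are just placeholders):
# run the backup-and-update script every Sunday at 03:00 and keep a log
0 3 * * 0 /usr/local/bin/azuracast-update.sh >> /var/log/azuracast-update.log 2>&1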
EDIT: Assuming this is the script you're using, it doesn't seem to have a way to run update non-interactively. Fear not though, there's a program for this: yes(1). This will answer yes to both of the questions, but honestly running docker system prune -f is probably a good idea. If you really want to answer no to that, you could probably substitute printf "y\nn" for yes, to answer yes to the first and no to the second.
Also note that there's at least one other y/n question it could ask you, which you probably want to answer yes to.
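That printf variation would look something like this (a sketch, assuming docker.sh reads its y/n answers from standard input; add more answers in order if it asks further questions):
# answer "y" to the backup prompt and "n" to the cleanup prompt
printf 'y\nn\n' | ./docker.sh update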

How to run repo from a script inside a container in a jenkins job

I am unable to run repo non-interactively inside a container as part of a freestyle job.
It prompts for the user-name and email. I got round that by doing a git config --global inside the job.
But then it does the color test, and that hangs indefinitely.
Looking at the source code for repo I see this
if os.isatty(0) and os.isatty(1) and not self.manifest.IsMirror:
  if opt.config_name or self._ShouldConfigureUser():
    self._ConfigureUser()
  self._ConfigureColor()
So, I ran the following inside the container:
python -C "import os; print os.isatty(0), os.isatty(1)"
and, sure enough, it printed out True True
Looking at the Jenkins log, it launches the container with --tty specified, and there seems no way to configure that option.
I can't find a bash option to force a script to be run in a non-interactive shell. If I put the above python line in a file and execute it with almost any combination of commands and options, it still prints out True True
The only way I see something different is if I use I/O redirection
bash <a.sh
which prints out False True - i.e. stdin is not a tty, and
bash <a.sh >a.log
which prints False False.
For a complex script, are there any problems using the bash <script approach?
Does anyone know any jenkins magic to prevent docker being launched using --tty?
I know that the --tty is the culprit. I built the container locally and ran the following
$ docker run repotest python -c "import os;print os.isatty(0), os.isatty(1)"
False False
$ docker run --tty repotest python -c "import os;print os.isatty(0), os.isatty(1)"
True True
Running Versions:
repo: 1.12.37 (per Ubuntu 16.04 apt-get)
Jenkins: 2.149
Cloudbees Docker Plugin: 1.7.3
Container base is ubuntu:xenial
I'm using the "Build inside a docker container" option.
To run the bash script repo_script.sh "non-interactively", or more precisely without terminals associated with its standard streams, you could run your script simply as
repo_script.sh < /dev/null 2>&1 | cat
assuming you want to see the output the way you would see it when running repo_script.sh directly. By piping standard output and error to a different process, those file descriptors appear to repo_script.sh as pipes rather than TTYs. You could also direct output to a file, or even to /dev/null if you do not care about the output:
log_file=/dev/null
repo_script.sh < /dev/null > "${log_file}" 2>&1
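If you want to confirm what the script will see, the POSIX test -t operator reports whether a given file descriptor is attached to a terminal; a quick sketch:
# stdin comes from /dev/null and stdout goes into a pipe, so both report "not a tty"
sh -c 'test -t 0 && echo "stdin: tty" || echo "stdin: not a tty"
       test -t 1 && echo "stdout: tty" || echo "stdout: not a tty"' < /dev/null | cat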
Running the script as
bash < repo_script.sh | cat
might work too, though it is a very unorthodox and, to my mind, hackish way of running a script just to break the association of a TTY with the standard input. From the script engine's point of view, reading a script program from a file is different from reading it from standard input (which, if it is a terminal, is typically not seekable), so there might be some subtle differences that could bite you in unexpected ways. This way also does not clearly communicate your intention to the next person who needs to understand your code, and may lead to partial hair loss in that person due to extraneous head scratching.
There is no need for any bash options; just using output redirections from within the interpreting shell, as described above, is an easy-to-comprehend, multi-platform-compatible standard convention for changing the standard stream associations.
P.S. I think it should be enough for your repo script to just test whether the standard input is a TTY. It looks to me like the author of that script did not think deeply enough there. There is simply no use waiting for input if you do not have a terminal device associated with standard input; you could determine from that alone that everything needs to run without user interaction, or stop with an error if that is not possible.

Alpine not loading /etc/profile [duplicate]

I'm trying to write (what I thought would be) a simple bash script that will:
run virtualenv to create a new environment at $1
activate the virtual environment
do some more stuff (install django, add django-admin.py to the virtualenv's path, etc.)
Step 1 works quite well, but I can't seem to activate the virtualenv. For those not familiar with virtualenv, it creates an activate file that activates the virtual environment. From the CLI, you run it using source
source $env_name/bin/activate
Where $env_name, obviously, is the name of the dir that the virtual env is installed in.
In my script, after creating the virtual environment, I store the path to the activate script like this:
activate="`pwd`/$ENV_NAME/bin/activate"
But when I call source "$activate", I get this:
/home/clawlor/bin/scripts/djangoenv: 20: source: not found
I know that $activate contains the correct path to the activate script, in fact I even test that a file is there before I call source. But source itself can't seem to find it. I've also tried running all of the steps manually in the CLI, where everything works fine.
In my research I found this script, which is similar to what I want but is also doing a lot of other things that I don't need, like storing all of the virtual environments in a ~/.virtualenv directory (or whatever is in $WORKON_HOME). But it seems to me that he is creating the path to activate, and calling source "$activate" in basically the same way I am.
Here is the script in its entirety:
#!/bin/sh
PYTHON_PATH=~/bin/python-2.6.1/bin/python
if [ $# = 1 ]
then
    ENV_NAME="$1"
    virtualenv -p $PYTHON_PATH --no-site-packages $ENV_NAME
    activate="`pwd`/$ENV_NAME/bin/activate"
    if [ ! -f "$activate" ]
    then
        echo "ERROR: activate not found at $activate"
        return 1
    fi
    source "$activate"
else
    echo 'Usage: djangoenv ENV_NAME'
fi
DISCLAIMER: My bash script-fu is pretty weak. I'm fairly comfortable at the CLI, but there may well be some extremely stupid reason this isn't working.
If you're writing a bash script, call it by name:
#!/bin/bash
/bin/sh is not guaranteed to be bash. This caused a ton of broken scripts in Ubuntu some years ago (IIRC).
The source builtin works just fine in bash; but you might as well just use dot like Norman suggested.
In the POSIX standard, which /bin/sh is supposed to respect, the command is . (a single dot), not source. The source command is a csh-ism that has been pulled into bash.
Try
. $env_name/bin/activate
Or if you must have non-POSIX bash-isms in your code, use #!/bin/bash.
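Applied to the script above, the POSIX-friendly version keeps #!/bin/sh and just swaps the source line (everything else unchanged):
#!/bin/sh
# ...virtualenv creation as before...
. "$activate"   # the POSIX "dot" command; works under dash/sh as well as bash
Alternatively, change the first line to #!/bin/bash, keep source "$activate", and run the script as ./djangoenv instead of sh djangoenv.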
In Ubuntu if you execute the script with sh scriptname.sh you get this problem.
Try executing the script with ./scriptname.sh instead.
It's best to add the full path of the file you intend to source, e.g.
source ./.env instead of source .env
or source /var/www/html/site1/.env

Is there any reason to favour concatenated RUN directives over RUNning a script?

When I'm creating a Dockerfile to generate an image, I have some options when it comes to installing and building stuff.
I could do
RUN a && \
b && \
c
Or
COPY install.sh /install.sh
RUN /install.sh
Where install.sh is
a
b
c
Are there any substantial reasons to favour one approach over the other?
In contrast to the other answer, I generally prefer:
RUN a && \
b && \
c
The main reason being that it is immediately clear what is happening. If you instead use a script, you've effectively hidden the code. For a new user to understand what's happening, they now need to find the project with the build context before they can look into your script.
It is a trade-off and once things get too complex, you should refactor into a script. However, you might prefer to curl the script from a known location rather than COPY it, so that the Dockerfile remains standalone.
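That might look something like the following (the URL is purely illustrative; in practice you would want to pin a version or verify a checksum before piping anything into a shell):
RUN curl -fsSL https://example.com/project/install.sh | sh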
RUN a && b && c
vs
RUN install.sh
From Docker's perspective, both approaches are the same.
However, the second approach (running a single script and wrapping everything under the hood) is cleaner. It gives you a better handle on managing a, b and c and their dependencies on each other. You can also update install.sh without updating the Dockerfile, keeping the Dockerfile simple.
