Can I run scripts from a docker build context without a copy?

I want to build on top of a Windows Docker container by installing a couple of programs. The files total about 0.5 GB, and I want to keep the layers as small as possible. I am hoping I can run the setup files from the build context and then have the build context swept away at the end, so I don't have a needless copy of the source files for the setup.exe embedded in my image layers. However, I have not found an example where this is the case. Instead, I mostly see people run a COPY command to a temporary build folder, run their setup, then remove the folder. Won't those files still be in the image layers, because the COPY command creates a new layer when it's done?
I don't know if the container can see the build context directly. I was hoping for some magical folder filled with the build-context files so I could run a script using it, but I haven't found anything.
It seems like the alternative is to create a private file server and perform a RUN that downloads the files from that server, unpacks them, runs the install, and removes them (all as one Docker step). I understand this would make the files more available to others who need to rerun the build, but I'm not convinced we'll need to rerun it. It's not likely to change, as the container will build patches for a legacy application. It just seems like a lot of effort to host files on a private, public-facing server for something that will get called once every couple of years, if ever.
So are these my two options?
Make a container with needless copies of source files embedded within
Host the files on a private file server and download/install/remove them
Or am I missing another option or point about how the containers work?

It's a long shot, as Windows containers are tricky when it comes to the file system, but you could do it this way:
In your Dockerfile, use a COPY command, install, then RUN del ... to remove the installation files
Build your image: docker build -t my-large-image:latest .
Run your image: docker run --name my-large-container my-large-image:latest
Stop the container: docker stop my-large-container
Export your container filesystem: docker export my-large-container > my-large-container.tar
Import the filesystem into a new image: cat my-large-container.tar | docker import - my-small-image
The caveat is that you need to run the container once, which might not be what you want. Also, I haven't tested this with Windows containers, sorry.
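A minimal bash sketch of the export/import steps above, assuming a shell with the docker CLI on PATH. It swaps docker run for docker create, on the assumption that docker export only needs the container's filesystem, not a running process. docker import keeps none of the original image's CMD, ENTRYPOINT or ENV settings, so the optional third argument re-applies a CMD via --change. Like the answer, this is untested with Windows containers; check that your Docker version can export them before relying on it. The function name flatten_image is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flatten an image into a single layer with docker export/import.
# Usage: flatten_image SOURCE_IMAGE TARGET_IMAGE [CMD_JSON]
flatten_image() {
  local src="$1" dst="$2" cmd="${3:-}"
  local tmp_name="flatten-tmp-$$"
  local status

  # Create (but do not start) a throwaway container from the source image;
  # docker export only needs its filesystem.
  docker create --name "$tmp_name" "$src" >/dev/null || return 1

  # Stream the filesystem straight into a new single-layer image.
  # docker import drops CMD/ENTRYPOINT/ENV, so optionally restore CMD.
  if [ -n "$cmd" ]; then
    docker export "$tmp_name" | docker import --change "CMD $cmd" - "$dst"
  else
    docker export "$tmp_name" | docker import - "$dst"
  fi
  # Exit status of the import (last command of the pipeline).
  status=$?

  # Remove the throwaway container whether or not the import worked.
  docker rm "$tmp_name" >/dev/null
  return "$status"
}
```

For example, flatten_image my-large-image:latest my-small-image '["/bin/sh"]'; the CMD value is a placeholder for whatever the original image declared.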

I usually do the download or copy in one step, then in the next step I do the silent installation and remove the installer.
# escape=`
FROM mcr.microsoft.com/dotnet/framework/wcf:4.8-windowsservercore-ltsc2016
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
ADD https://download.visualstudio.microsoft.com/download/pr/6afa582f-fa26-4a73-8cb9-194321e85f8d/ecea51ead62beb7acc73ad9799511ffdb3083ad384fe04ec50e2cbecfb426482/VS_RemoteTools.exe VS_RemoteTools_x64.exe
RUN Start-Process .\VS_RemoteTools_x64.exe -ArgumentList @('/install','/quiet','/norestart') -NoNewWindow -Wait; `
Remove-Item -Path C:/VS_RemoteTools_x64.exe -Force;
Otherwise, I don't think you can mount a custom volume while the image is being built.

I didn't find a satisfactory answer to this. Docker seems designed only for the modern era and assumes you'll be able to download what you need via scripts and tools that hit APIs and file servers. The easiest option I found, and the one I eventually went with, was to host the files on a private file server or service (in my case, AWS S3).
I really wish the Docker daemon could host files in some way, e.g., by acting as a temporary server that you could fetch data from over HTTP, instead of needing to COPY the files and create a layer. Alas, I found no such feature.
Taking this route made my container about a GB smaller.

Related

Separating Docker files and application source files to optimize production environment

I have a bunch of (Ruby) scripts stored on a server. Up until now, my team has used them by opening an accessor app that launches a list of the script names, and they select the script they want to run in that instance on the files in their working folder. The scripts are run directly from the server, so updates made to the script files are automatically reflected when a user runs the script.
The scripts require a fair amount of specific dependencies, so I'm trying to move to a Docker-based workflow to eliminate the problems we encounter with incongruent computer environments. I've been able to successfully build an image with our script library and run an instance of it on my computer.
However, all of the documentation and tutorials include the application source files when building an image, so that all the files are copied over by the Dockerfile. From my understanding, this means that any time the code in the application files needs to be updated, all the users will need to rebuild the image before trying to run anything. I would very rarely ever need to make changes to the environment settings/dependencies, but the app code is changed relatively frequently, so it seems like having every user rebuild an image every single time a line of app code is changed would actually slow down everyone's workflow considerably.
My question is this: Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored? And does a new container need to be created every single time a user wants to run any one of the scripts? (The users are not tech-savvy.)
Generally you'd do this by using a Docker image instead of the checked-out tree of scripts. You can use a Docker registry to store a built copy of the image somewhere on the network; Docker Hub works for this, most large public-cloud providers have some version of this (AWS ECR, Google GCR, Azure ACR, ...), or you can run your own. The workflow for using this would generally look like
# Get any updates to the "latest" version of the image
# (can be run infrequently)
docker pull ourorg/scripts
# Actually run the script, injecting config files and credentials
docker run --rm \
-v $PWD/config:/config \
-v $HOME/.ssh:/config/.ssh \
ourorg/scripts \
some_script.rb
# Nothing in this example actually requires a local copy of the scripts
I'm envisioning a directory that has kind of a mix of scripts and support files and not a lot of organization to it. Still, you could write a simple Dockerfile that looks like
FROM ruby:2.7
WORKDIR /opt/scripts
# As of Bundler 2.1, there is no compatibility between Bundler
# versions; this must match exactly what is in Gemfile.lock
RUN gem install bundler -v 2.1.4
# Copy the scripts in and do basic installation
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .
ENV PATH=/opt/scripts:$PATH
# Prefix all commands with...
ENTRYPOINT ["bundle", "exec"]
# The default command to run is...
CMD ["ls"]
On the back end you'd need a continuous integration service (Jenkins is popular if a little unwieldy; there is a large selection of cloud-hosted ones) that can rebuild the Docker image whenever there's a commit to the source repository. You can generally rig this up so that it happens automatically whenever anybody pushes anything.
This process makes more sense if most people are just using the set of scripts and few of them are developing them. It's also a little bit difficult to discover what the scripts are (you might be able to run docker run --rm ourorg/scripts ls, though).
Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored?
This always strikes me as an ineffective use of Docker. You have all of the fiddly steps of your current workflow that require everyone to run a git pull or equivalent routinely, but you also have to inject the host source tree into the container. If there are OS incompatibilities in, for example, native gems in the vendor tree, you have to work around that.
# You still need to do this periodically
git pull
# And you also need to
sudo docker run \
--rm \
-v $PWD:/app \
-v $HOME/config:/config \
-v $HOME/.ssh:/config/.ssh \
-w /app \
ruby:2.7 \
bundle exec ./some_script.rb
Some of these details (especially the config file and credentials) you'd have to deal with even if you did build an image; some others of the details you could improve by building an image. Inside the image you need to correct the ownership and permissions on the ssh keys and replace the $PWD/vendor tree with something the container can run, without modifying the mounted host directories.
Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored?
You can build an image with all the environment already installed then mount the directory with the scripts so the container can read the scripts from the host. Something like
docker run -it --rm -v /opt/myscripts:/myscripts myimage somescript.rb
Then your image Dockerfile would end with:
WORKDIR /myscripts
ENTRYPOINT ["/usr/bin/ruby"]
And does a new container need to be created every single time a user wants to run any one of the scripts?
Yes, but that's cheap: a container is just an isolated process managed by Docker, and with --rm it is removed as soon as the script exits. You could make a wrapper so the users wouldn't need to type the full docker run command.
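A wrapper can be a single shell function. This bash sketch reuses the myimage image and /opt/myscripts host directory from the example above; the function name run_script and the SCRIPTS_DIR override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: `run_script somescript.rb [ARGS...]` instead of the
# full docker run command. Assumes myimage ends with WORKDIR /myscripts and
# ENTRYPOINT ["/usr/bin/ruby"], as in the Dockerfile above.
run_script() {
  if [ "$#" -lt 1 ]; then
    echo "usage: run_script SCRIPT [ARGS...]" >&2
    return 2
  fi
  # Host directory with the scripts; SCRIPTS_DIR is a hypothetical override.
  local scripts_dir="${SCRIPTS_DIR:-/opt/myscripts}"
  # --rm removes the container as soon as the script exits.
  # -it only makes sense when a user is typing at a terminal.
  if [ -t 0 ]; then
    docker run -it --rm -v "$scripts_dir:/myscripts" myimage "$@"
  else
    docker run --rm -v "$scripts_dir:/myscripts" myimage "$@"
  fi
}
```

Users would then type run_script somescript.rb; each call starts a fresh container, and --rm cleans it up afterwards.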

Alternative to using --squash when building docker images on Windows using local files

We have some local installers and zip files that we use to build our docker images. It is easy to get this to work in a Dockerfile:
FROM mcr.microsoft.com/windows/nanoserver
COPY myinstaller.exe .
RUN myinstaller.exe && \
del myinstaller.exe
The problem here is that it produces a layer for the COPY line, which increases the size of the image. A common work-around for this is to have one RUN line, that downloads the file from the Internet, runs commands, and then deletes the installation file. The problem, as written above, is that the installers are on the local filesystem.
I found that there is a --squash option for docker build:
docker build --squash -t mytestimage .
This does exactly what I want: It gives me an image without this extra installer file that is not necessary. To run this command, you need to enable experimental features though. There is also an open issue to simply remove this feature:
https://github.com/moby/moby/issues/34565
Is there some alternative way of using local installers in a Dockerfile when running on Windows, that doesn't involve setting up a server to provide the files?
We ended up setting up nginx to provide files when building. On our build server, the machine building our docker images and the server that has the installer files have a very good connection between them, so downloading huge files is not a real problem.
When it comes to --squash, it is bugged for Docker on Windows. Here is the relevant issue for it:
https://github.com/moby/moby/issues/31468
There is an issue to move --squash out of experimental, but it doesn't seem to have a lot of support:
https://github.com/moby/moby/issues/38657
The alternative that some people propose instead of --squash is multi stage build, discussion here:
https://github.com/moby/moby/issues/34565
There is an alternative to --squash, if you have local installer files, you don't want to set up a web server, and you would like your docker image to be small, and you are running Windows: Use mapped drives.
In Windows, you can share folders with other users on your network. Docker containers are like another computer that is running on your physical machine, and it can access these network drives.
First set up a new user, for example username share and password password1. Create a folder somewhere on your computer. Then right click it, click properties, and then go to the Sharing tab and click "Share". Find the user that you have just created, using the little dropdown menu and Find people ..., and share the folder with this user.
Create a folder somewhere for your test project. Create a batch file setupshare.bat that looks like this:
@echo off
for /f "tokens=2 delims=:" %%i in ('ipconfig ^| findstr /c:"Default Gateway"') do (
set hostip=%%i
goto :end
)
:end
set hostip=%hostip: =%
net use O: \\%hostip%\vms /USER:share password1
The first part of this file only finds the IP address that the Docker container can use to reach its host computer. It is not the prettiest thing I've ever put together, so let me know if there's a better way!
It uses a for-loop, as that is the way to save the output of a command to a variable in batch files. The command is ipconfig, and we pipe it to findstr to search for Default Gateway; the /c: switch makes findstr match the whole phrase literally instead of any line containing either word. We need to use ^| instead of just | because it is inside a for-loop. The for-loop splits each output line on the delimiter, which is : in this case, and we only take the second token. Because of the goto :end, only the first matching line is used, so this script doesn't work if there are multiple entries and the first one is not the correct one.
The line set hostip=%hostip: =% removes the space at the start of the string (in fact it removes every space).
We then have the IP address that we want to use stored in hostip. We use this in the net use command, which will map O:\ to shared folder vms on the machine with IP hostip. We use the username share and the password password1. Note that this is a very bad way of handling passwords, as they kind of should be secret!
With a batch file like this, we can set up a Dockerfile in this way:
# escape=`
FROM mcr.microsoft.com/dotnet/core/sdk:3.0
COPY setupshare.bat .
RUN setupshare.bat && `
copy O:\file.txt file.txt
The RUN command will first call setupshare.bat that sets up the network share properly. We can then use any file that we shared, for example a huge installer, and install the things that we want. In this case I have only shared a test file file.txt to see that it works, so just change that line.
I would still advise everyone to just set up a little web server, for example nginx, and use the standard way of writing Dockerfiles, downloading the files and running them in the same RUN command. That's what people expect when they see a Dockerfile, and it should be a more robust solution.
We can also hope that the Docker developers either make a COPY command that can copy, run, and delete installers in the same layer, or implement --squash properly.

Copying an exe and composing it as a docker image and making it platform independent

I need to create a Docker image which, when run, installs an exe into the directory specified in my Dockerfile.
Basically, I need the ImageMagick application. The Dockerfile should be platform independent: if I run it on Windows it should use the Windows distribution, and on Linux the Linux distribution. It would be great if it also added an environment variable to the system. I searched for a solution but couldn't find an appropriate one.
I know it's a bit late but maybe someone (like me) was still searching.
I ended up using a java-imagemagick docker version from https://hub.docker.com/r/cpaitsupport/java-imagemagick/dockerfile
You can run docker pull cpaitsupport/java-imagemagick to get this docker image to your docker machine.
Now comes the tricky part: I needed to run ImageMagick inside a Docker container for my main app. You can COPY the files from cpaitsupport/java-imagemagick into your custom image. Example:
COPY --from=cpaitsupport/java-imagemagick:latest . /some/dir/imagemagick
Now you should have the file structure of your custom app, plus the ImageMagick file structure under /some/dir/imagemagick/. This contains all ImageMagick-related files (convert, magick, the libraries, etc.).
Now, if you want to use ImageMagick in your code, you need to set some ENV variables in your Dockerfile with the "new" path to the ImageMagick directory. Example:
ENV IM4JAVA_TOOLPATH=/some/dir/imagemagick/usr/bin \
LD_LIBRARY_PATH=/usr/lib:/some/dir/imagemagick/usr/lib \
MAGICK_CONFIGURE_PATH=/some/dir/imagemagick/etc/ImageMagick-7 \
MAGICK_CODER_MODULE_PATH=/some/dir/imagemagick/usr/lib/ImageMagick-7.0.5/modules-Q16HDRI/coders \
MAGICK_HOME=/some/dir/imagemagick/usr
Next, remove ProcessStarter.setGlobalSearchPath(imPath); from your Java code if it is set, so that im4java uses IM4JAVA_TOOLPATH instead.
Now the ConvertCmd cmd = new ConvertCmd(); and cmd.run(op); should be working.
Maybe it's not the best way but worked for me and I was struggling a lot.
Hope this helps!
Feel free to correct or add additional info.
You can install (extract files) onto the host system by mounting a host directory or volume into the container;
however, you cannot change system settings, such as the host's environment variables, from inside a container.

Move file downloaded in Dockerfile to harddrive

First off, I really lack a lot of knowledge regarding Docker itself and its structure. I know that it'd be way more beneficial to learn the basics first, but I do require this to work in order to move on to other things for now.
So within a Dockerfile I installed wget and used it to download a file from a website; authentication and download are successful. However, when I later try to move said file, it can't be found, and it doesn't show up in e.g. Explorer either (the path was specified).
I thought it might have something to do with RUN and how it executes the wget command; I read that the container ID can be used to copy it to the hard drive, but how would I do that within a Dockerfile?
RUN wget -P ./path/to/somewhere http://something.com/file.txt --username xyz --password bluh
ADD ./path/to/somewhere/file.txt /mainDirectory
Download is shown & log-in is successful, but as I mentioned I am having trouble using that file later on as it's not to be located on the harddrive. Probably a basic error, but I'd really appreciate some input that might lead to a solution.
Obviously the error is produced when trying to execute ADD as there is no file to move. I am trying to find a way to mount a volume in order to store it, but so far in vain.
Edit:
Though the question is similar to the "move to hard drive" one, I am searching for ways to get the ID of the container created within the Dockerfile in order to move the file; while that thread provides such answers, I haven't had any luck using them within the Dockerfile itself.
Short answer is that it's not possible.
The Dockerfile builds an image, which you can run as a short-lived container. During the build, you don't have (write) access to the host and its file system. Which kinda makes sense, since you want to build an immutable image from which to run ephemeral containers.
What you can do is run a container and mount a path from your host as a volume into the container. This is the main way to share files between the host and a running container (docker cp can additionally copy files out of an existing container).
Here is an example how you could do this with the sherylynn/wget image:
docker run -v /path/on/host:/path/in/container sherylynn/wget wget -O /path/in/container/file http://my.url.com
The -v HOST:CONTAINER parameter allows you to specify a path on the host that is mounted inside the container at a specified location.
For wget, I would prefer -O over -P when downloading a single file, since it makes it really explicit where your download ends up. When you point -O to the location of the volume, the downloaded file ends up on the host system (in the folder you mounted).
Since I have no idea what your image or your environment looks like, you might need to tweak one or two things to work well with your own image. As a general recommendation: For basic commands like wget or curl, you can find pre-made images on Docker Hub. This can be quite useful when you need to set up a Continuous Integration pipeline or so, where you want to use wget or curl but can't execute it directly.
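That said, once the build is finished, a file that a RUN step downloaded lives inside the image, and it can be copied to the host with docker cp from a container that is created but never started. A bash sketch; copy_from_image is a hypothetical name, and the image name and paths in the usage line are examples. It assumes the image defines a CMD or ENTRYPOINT, since docker create may refuse an image without one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a file out of an already-built image to the host.
# Usage: copy_from_image IMAGE PATH_IN_IMAGE HOST_DEST
copy_from_image() {
  local image="$1" src="$2" dest="$3" id status
  # docker create prints the new container's ID; the container never runs.
  id="$(docker create "$image")" || return 1
  docker cp "$id:$src" "$dest"
  status=$?
  # Remove the temporary container whether or not the copy worked.
  docker rm "$id" >/dev/null
  return "$status"
}
```

For example, copy_from_image my-image /tmp/new_file.txt ./new_file.txt.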
Use wget -O instead of -P to download a specific file, e.g.:
RUN wget -O /tmp/new_file.txt --user xyz --password bluh http://something.com/new_file.txt

How to specify different .dockerignore files for different builds in the same project?

I used to list the tests directory in .dockerignore so that it wouldn't get included in the image, which I used to run a web service.
Now I'm trying to use Docker to run my unit tests, and in this case I want the tests directory included.
I've checked docker build -h and found no option related.
How can I do this?
Docker 19.03 shipped a solution for this.
The Docker client tries to load <dockerfile-name>.dockerignore first and then falls back to .dockerignore if it can't be found. So docker build -f Dockerfile.foo . first tries to load Dockerfile.foo.dockerignore.
Setting the DOCKER_BUILDKIT=1 environment variable was required to use this feature at the time of writing (BuildKit has been the default builder on Linux since Docker Engine 23.0). This also works with docker-compose since 1.25.0-rc3 if you additionally set COMPOSE_DOCKER_CLI_BUILD=1.
See also comment0, comment1, comment2
From Mugen's comment, please note:
the custom .dockerignore should be in the same directory as the Dockerfile, not in the root of the build context like the original .dockerignore.
That is, when calling
DOCKER_BUILDKIT=1 docker build -f /path/to/custom.Dockerfile ...
your .dockerignore file should be at
/path/to/custom.Dockerfile.dockerignore
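The lookup rule can be sketched in bash as follows. This assumes BuildKit is in use (set DOCKER_BUILDKIT=1 as above on older engines); dockerignore_for and build_with_ignore are hypothetical helper names, and the warning only restates the fallback to .dockerignore described in the answer.

```shell
#!/usr/bin/env bash
# The ignore file BuildKit looks for first: the Dockerfile path plus
# ".dockerignore", i.e. next to the Dockerfile, not at the context root.
dockerignore_for() {
  printf '%s.dockerignore\n' "$1"
}

# Hypothetical helper: build with a custom Dockerfile and warn when its
# companion ignore file is missing (Docker then falls back to the
# .dockerignore at the root of the context).
# Usage: build_with_ignore DOCKERFILE CONTEXT [EXTRA docker build ARGS...]
build_with_ignore() {
  local dockerfile="$1" context="$2" ignore
  shift 2
  ignore="$(dockerignore_for "$dockerfile")"
  if [ ! -f "$ignore" ]; then
    echo "warning: $ignore not found, falling back to $context/.dockerignore" >&2
  fi
  DOCKER_BUILDKIT=1 docker build -f "$dockerfile" "$@" "$context"
}
```

For example, build_with_ignore path/to/test.Dockerfile . -t myapp:test.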
At the moment, there is no way to do this. There is a lengthy discussion about adding an --ignore flag to Docker to provide the ignore file to use - please see here.
The options you have at the moment are mostly ugly:
Split your project into subdirectories that each have their own Dockerfile and .dockerignore, which might not work in your case.
Create a script that copies the relevant files into a temporary directory and run the Docker build there.
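A minimal bash sketch of the second option, assuming the listed paths are relative to the current directory. build_subset is a hypothetical name, and the Dockerfile is copied to the root of the temporary context so docker build finds it by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "copy into a temporary directory" option:
# only the listed paths end up in the build context, then the temporary
# copy is deleted.
# Usage: build_subset DOCKERFILE TAG PATH...
build_subset() {
  local dockerfile="$1" tag="$2" ctx p status
  shift 2
  ctx="$(mktemp -d)" || return 1
  for p in "$@"; do
    # Recreate the parent directories so relative paths in COPY still work.
    if ! { mkdir -p "$ctx/$(dirname "$p")" && cp -R "$p" "$ctx/$p"; }; then
      rm -rf "$ctx"
      return 1
    fi
  done
  # The Dockerfile goes to the root of the temporary context.
  cp "$dockerfile" "$ctx/Dockerfile" || { rm -rf "$ctx"; return 1; }
  docker build -t "$tag" "$ctx"
  status=$?
  rm -rf "$ctx"
  return "$status"
}
```

For example, build_subset Dockerfile myapp:prod src would send only src/ and the Dockerfile to the daemon, leaving tests/ out.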
Adding the removed tests back as a volume mount could be an option here. After you build the clean image, if you are running it for testing, mount the source code that contains the tests on top of the cleaned-up code.
services:
  tests:
    image: my-clean-image
    volumes:
      - '../app:/opt/app' # Add removed tests
I tried activating DOCKER_BUILDKIT as suggested by @thisismydesign, but I ran into other problems (outside the scope of this question).
As an alternative, I create an intermediary tar using the -T flag, which takes a text file listing the files to include in the tar, so it's not so different from a whitelist .dockerignore.
I pipe this tar into the docker build command and specify my Dockerfile, which can live anywhere in my file hierarchy. In the end it looks like this:
tar -czh -T files-to-include.txt | docker build -f path/to/Dockerfile -
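The same whitelist can be previewed before building. A bash sketch, assuming files-to-include.txt lists one path per line relative to the current directory; context_tar and preview_context are hypothetical names, and the explicit -f - is added so the archive goes to stdout whatever tar's compiled-in default is.

```shell
#!/usr/bin/env bash
# The whitelisted context from the answer: -h follows symlinks, and the
# explicit "-f -" writes the gzipped archive to stdout.
context_tar() {
  tar -czh -f - -T "$1"
}

# Hypothetical convenience: list what would be sent to docker build,
# without building anything.
preview_context() {
  context_tar "$1" | tar -tzf -
}
```

Then preview_context files-to-include.txt shows what would be sent, and context_tar files-to-include.txt | docker build -f path/to/Dockerfile - runs the answer's build.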
Another option is to have a further build process that includes the tests. The way I do it is this:
If the tests are unit tests then I create a new Docker image that is derived from the main project image; I just stick a FROM at the top, and then ADD the tests, plus any required tools (in my case, mocha, chai and so on). This new 'testing' image now contains both the tests and the original source to be tested. It can then simply be run as is or it can be run in 'watch mode' with volumes mapped to your source and test directories on the host.
If the tests are integration tests--for example the primary image might be a GraphQL server--then the image I create is self-contained, i.e., is not derived from the primary image (it still contains the tests and tools, of course). My tests use environment variables to tell them where to find the endpoint that needs testing, and it's easy enough to get Docker Compose to bring up both a container using the primary image, and another container using the integration testing image, and set the environment variables so that the test suite knows what to test.
Sadly it isn't currently possible to point to a specific file to use for .dockerignore, so we generate it in our build script based on the target/platform/image. As a docker enthusiast it's a sad and embarrassing workaround.
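A minimal bash sketch of generating the file per target in a build script. The targets prod and test and the ignore patterns are hypothetical examples; adapt them to your own images.

```shell
#!/usr/bin/env bash
# Hypothetical build-script step: (re)generate .dockerignore for a target
# right before docker build. The patterns are examples only.
# Usage: write_dockerignore TARGET [OUTPUT_FILE]
write_dockerignore() {
  local target="$1" out="${2:-.dockerignore}"
  case "$target" in
    prod|test) ;;
    *) echo "unknown target: $target" >&2; return 1 ;;
  esac
  # Patterns shared by every target.
  printf '%s\n' '.git' '*.log' > "$out"
  if [ "$target" = prod ]; then
    # The production image should not contain the test suite.
    printf '%s\n' 'tests' >> "$out"
  fi
}
```

For example: write_dockerignore prod && docker build -t myapp:prod .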
