Alternative to using --squash when building docker images on Windows using local files - docker

We have some local installers and zip files that we use to build our docker images. It is easy to get this to work in a Dockerfile:
FROM mcr.microsoft.com/windows/nanoserver
COPY myinstaller.exe .
RUN myinstaller.exe && \
    del myinstaller.exe
The problem here is that the COPY line produces a layer of its own, which increases the size of the image. A common work-around is to have a single RUN line that downloads the file from the Internet, runs it, and then deletes the installation file. The problem, as written above, is that our installers are on the local filesystem, not on a web server.
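To make that work-around concrete, the single-RUN version looks something like this; the base image, internal URL and /quiet switch are placeholders, and a PowerShell-capable base image is assumed, since nanoserver does not ship PowerShell by default:
# escape=`
FROM mcr.microsoft.com/dotnet/framework/runtime:4.8
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop';"]
# download, install and delete in a single RUN, so the installer never ends up in a layer
RUN Invoke-WebRequest -Uri 'http://fileserver.local/myinstaller.exe' -OutFile 'C:\myinstaller.exe'; `
    Start-Process 'C:\myinstaller.exe' -ArgumentList '/quiet' -NoNewWindow -Wait; `
    Remove-Item 'C:\myinstaller.exe' -Force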
I found that there is a --squash flag for docker build:
docker build --squash -t mytestimage .
This does exactly what I want: it gives me an image without the unnecessary installer file. You need to enable experimental features to use it, though, and there is also an open issue to simply remove this feature:
https://github.com/moby/moby/issues/34565
Is there some alternative way of using local installers in a Dockerfile when running on Windows, that doesn't involve setting up a server to provide the files?

We ended up setting up nginx to provide files when building. On our build server, the machine building our docker images and the server that has the installer files have a very good connection between them, so downloading huge files is not a real problem.
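For reference, one quick way to serve a folder of installers over HTTP is the stock nginx image, which serves whatever is mounted at /usr/share/nginx/html; the folder path and port below are just examples, and this assumes the file-server host runs Linux containers:
docker run -d --name buildfiles -p 8080:80 -v /srv/installers:/usr/share/nginx/html:ro nginx
The Dockerfile on the build machine then downloads from http://<server>:8080/<installer> and deletes the file in the same RUN step.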
When it comes to --squash, it is bugged for Docker on Windows. Here is the relevant issue for it:
https://github.com/moby/moby/issues/31468
There is an issue to move --squash out of experimental, but it doesn't seem to have a lot of support:
https://github.com/moby/moby/issues/38657
The alternative that some people propose instead of --squash is a multi-stage build; discussion here:
https://github.com/moby/moby/issues/34565
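Applied to the local-installer case, a multi-stage build looks roughly like this (image tags, installer name and install path are illustrative): the installer only ever exists in the first stage, so it never becomes a layer of the final image. The catch is that this only helps when the installed result is a self-contained directory you can copy across; registry changes and the like don't carry over to the final stage.
# escape=`
FROM mcr.microsoft.com/windows/servercore:ltsc2019 AS install
COPY myinstaller.exe C:\myinstaller.exe
RUN C:\myinstaller.exe /quiet

FROM mcr.microsoft.com/windows/nanoserver:1809
# only the installed files make it into the final image
COPY --from=install ["C:/Program Files/MyApp", "C:/Program Files/MyApp"]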
There is an alternative to --squash if you have local installer files, don't want to set up a web server, would like your docker image to stay small, and are running Windows: use mapped drives.
In Windows, you can share folders with other users on your network. A Docker container behaves like another computer running on your physical machine, so it can access these network shares.
First, set up a new user, for example with username share and password password1. Create a folder somewhere on your computer, right-click it, click Properties, go to the Sharing tab and click "Share". Find the user you just created using the little dropdown menu and "Find people ...", and share the folder with that user.
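If you prefer the command line over the Properties dialog, the same share can be created with net share from an administrator prompt; the folder path here is an example, and the share name vms matches what the batch file below expects:
net share vms=C:\docker-share /GRANT:share,READ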
Create a folder somewhere for your test project. Create a batch file setupshare.bat that looks like this:
@echo off
for /f "tokens=2 delims=:" %%i in ('ipconfig ^| findstr "Default Gateway"') do (
set hostip=%%i
goto :end
)
:end
set hostip=%hostip: =%
net use O: \\%hostip%\vms /USER:share password1
The first part of this file is only there to find the IP address that the docker container can use to reach its host computer. It is not the prettiest thing I've ever put together, so let me know if there's a better way!
It uses a for loop, since that is the way to capture the output of a command into a variable in batch files. The command is ipconfig, and we pipe it to findstr to search for Default Gateway. We need ^| instead of just | because the pipe is inside a for loop. The options at the start of the for loop split each output line on the delimiter, : in this case, and keep only the second token. The goto makes the loop stop after the first matching line, so if there are multiple entries with a Default Gateway and the first one is not the correct one, this script won't work.
The line set hostip=%hostip: =% is to remove a space at the start of the string.
We then have the IP address we want stored in hostip. We use it in the net use command, which maps O: to the shared folder vms on the machine with IP address hostip, using the username share and the password password1. Note that this is a very bad way of handling passwords, as they really should be kept secret!
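As an aside, one way to avoid hardcoding the password is to hand it to the batch file as a build argument; note that build-arg values still show up in docker history, so this is only a mild improvement. A sketch, assuming setupshare.bat is changed to take the password as its first parameter:
net use O: \\%hostip%\vms /USER:share %1
The Dockerfile below would then declare ARG SHARE_PASSWORD, call setupshare.bat %SHARE_PASSWORD%, and the build command becomes:
docker build --build-arg SHARE_PASSWORD=password1 -t mytestimage .
The examples that follow keep the hardcoded password for simplicity.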
With a batch file like this, we can set up a Dockerfile in this way:
# escape=`
FROM mcr.microsoft.com/dotnet/core/sdk:3.0
COPY setupshare.bat .
RUN setupshare.bat && `
copy O:\file.txt file.txt
The RUN command first calls setupshare.bat, which sets up the network share. We can then use any file that we shared, for example a huge installer, and install whatever we want. In this case I have only shared a test file, file.txt, to check that it works, so just change that line.
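For a real installer instead of the test file, that copy line just becomes the installer invocation; the installer name and /quiet switch are made up here, and the net use /delete at the end simply drops the mapping again:
RUN setupshare.bat && `
    O:\myinstaller.exe /quiet && `
    net use O: /delete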
I would still advise everyone to just set up a small web server, for example nginx, and write the Dockerfile the standard way: download the file and run it in the same RUN command. That's what people expect when they see a Dockerfile, and it should be a more robust solution.
We can also hope that the Docker people either add a COPY variant that can copy, run, and delete an installer within the same layer, or implement --squash properly.

Related

My docker container keeps instantly closing when trying to run an image for bigcode-tools

I'm new to Docker, and I'm not sure how to quite deal with this situation.
So I'm trying to run a docker container in order to replicate some results from a research paper, specifically from here: https://github.com/danhper/bigcode-tools/blob/master/doc/tutorial.md
(image link: https://hub.docker.com/r/tuvistavie/bigcode-tools/).
I'm using a Windows machine, and every time I try to run the docker image (via docker run -p 80:80 tuvistavie/bigcode-tools), it instantly closes. I've tried running other images, such as getting-started, and that image doesn't close instantly.
I've looked at some other potential workarounds, like using -dit, but since the instructions require setting an alias/doskey for a docker run command, using the alias and chaining it with other commands multiple times results in creating a queue for the docker container since the port is tied to the alias.
Like in the instructions from the GitHub link, I'm trying to set an alias/doskey to make api calls to pull data, but I am unable to get any data nor am I getting any errors when performing the calls on the command prompt.
Sorry for the long question, and thank you for your time!
Going in order of the instructions:
0. I can run this, it added the image to my Docker Desktop
1.
Since I'm using a Windows machine, I had to use 'set' instead of 'export'
I'm not exactly sure what the $ is meant for in UNIX, and whether or not it has significant meaning, but from my understanding, the whole purpose is to create a directory named 'bigcode-workspace'
Instead of 'alias,' I needed to use doskey.
Since -dit prevented my image from instantly closing, I added that in as well, but I'm not 100% sure what it means. Running docker run (...) resulted in the docker image instantly closing.
When it came to using the doskey alias + another command, I've tried:
(doskey macro) (another command)
(doskey macro) ^& (another command)
(doskey macro) $T (another command)
This also seemed to be using a GitHub API call, so I also added a --token=(github_token), but that didn't change anything either
Because the later steps require expected data pulled from here, I am unable to progress any further.
It looks like this image is designed to be used as a command-line utility, so it should not be running continuously; instead you run it via the docker-bigcode alias for your individual tasks.
$BIGCODE_WORKSPACE is an environment variable expansion here, so on a Windows machine it's %BIGCODE_WORKSPACE%. You might want to set this variable in Settings -> System -> About -> Advanced system settings, because variables set with the SET command only apply to the current command prompt session. Or you can specify the path directly, without an environment variable.
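For example, to make the variable stick across sessions you can set it once with setx (the path is an example) and then open a new command prompt:
setx BIGCODE_WORKSPACE "C:\Users\me\bigcode-workspace"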
As for the alias, I would just create a batch file with the following content:
docker run -p 6006:6006 -v %BIGCODE_WORKSPACE%:/bigcode-tools/workspace tuvistavie/bigcode-tools %*
This will run the specified command, appending the batch file's parameters at the end. You might need to add double quotes if the BIGCODE_WORKSPACE path contains spaces.
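Assuming you save it as docker-bigcode.bat in a folder that is on your PATH, you can then call it just like the alias from the tutorial and pass any arguments straight through; the --help argument here is only a placeholder for whatever bigcode-tools subcommand you actually need:
docker-bigcode --help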

Can I run scripts from a docker build context without a copy?

I want to build on top of a Windows docker container by installing a couple of programs. The files total 0.5 GB and I want to keep the layers as small as possible. I am hoping I can run the setup files from the build context and then have the build context swept away at the end, so I don't end up with a needless copy of the source files for the setup.exe embedded in my container layers. However, I have not found an example where this is the case. Instead I mostly see people run a COPY command to a temporary build folder, run their setup, then remove the folder. Won't those files still be in the container layers, because the COPY command creates a new layer when it's done?
I don't know if the container can see the build context directly. I was hoping for some magical folder filled with the build-context files so I could run a script against it, but I haven't found anything.
It seems like the alternative is to create a private file-server and perform a RUN that can download them from that private server and unpack them, run the install, and remove them (all as 1 docker step). I understand this would make it more available to others who need to rerun the build, but I'm not convinced we'll need to rerun it. It's not likely to change as the container will build patches for a legacy application. Just seems like a lot to host files on a private, public-facing server for something that will get called once every couple years if ever.
So are these my two options?
Make a container with needless copies of source files embedded within
Host the files on a private file server and download/install/remove them
Or am I missing another option or point about how the containers work?
It's a long shot, as Windows is tricky when it comes to the file system, but you could do it this way:
In your Dockerfile use a COPY command, install then RUN del ... to remove the installation files
Build your image docker build -t my-large-image:latest .
Run your image docker run --name my-large-container my-large-image:latest
Stop the container
Export your container filesystem docker export my-large-container > my-large-container.tar
Import the filesystem to a new image cat my-large-container.tar | docker import - my-small-image
The caveat is that you need to run the container once, which might not be what you want. Also, I haven't tested this with Windows containers, sorry.
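Put together, and using docker import directly on the tar file (which also works in a Windows command prompt, where cat is not available), the sequence is roughly:
docker build -t my-large-image:latest .
docker run --name my-large-container my-large-image:latest
docker stop my-large-container
docker export my-large-container -o my-large-container.tar
docker import my-large-container.tar my-small-image
Keep in mind that docker import only imports the filesystem, so metadata such as CMD, ENTRYPOINT and ENV from the original image is lost; you can re-apply some of it with the -c/--change flag of docker import.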
I usually do the download or copy in one step, then in the next step I do the silent installation and remove the installer.
# escape=`
FROM mcr.microsoft.com/dotnet/framework/wcf:4.8-windowsservercore-ltsc2016
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
ADD https://download.visualstudio.microsoft.com/download/pr/6afa582f-fa26-4a73-8cb9-194321e85f8d/ecea51ead62beb7acc73ad9799511ffdb3083ad384fe04ec50e2cbecfb426482/VS_RemoteTools.exe VS_RemoteTools_x64.exe
RUN Start-Process .\\VS_RemoteTools_x64.exe -ArgumentList @('/install','/quiet','/norestart') -NoNewWindow -Wait; `
Remove-Item -Path C:/VS_RemoteTools_x64.exe -Force;
But otherwise, I don't think you can mount a custom volume while the image is being built.
I didn't find a satisfactory answer to this. Docker seems designed only for the modern era, where it is assumed you can download what you need via scripts and tools hitting APIs and file servers. The easiest option I found, and the one I eventually went with, was to host the files on a private file server or service (in my case, AWS S3).
I really wish there were a way to have files hosted by the docker daemon in some way, e.g. if it acted like a temporary server that you could fetch data from via HTTP instead of needing to COPY the files and create a layer. Alas, I found no such feature.
Taking this route made my container about a GB smaller.
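As a sketch of that route: the URL (for example a pre-signed S3 link) is passed in at build time, and download, install and delete all happen in one RUN, so nothing of the installer survives in a layer. The base image, installer name and /quiet switch below are placeholders.
# escape=`
FROM mcr.microsoft.com/dotnet/framework/runtime:4.8
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop';"]
ARG INSTALLER_URL
RUN Invoke-WebRequest -Uri $env:INSTALLER_URL -OutFile 'C:\setup.exe'; `
    Start-Process 'C:\setup.exe' -ArgumentList '/quiet' -NoNewWindow -Wait; `
    Remove-Item 'C:\setup.exe' -Force
Build it with something like:
docker build --build-arg INSTALLER_URL="https://example-bucket.s3.amazonaws.com/setup.exe" -t my-image .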

Copying an exe and composing it as a docker image and making it platform independent

I need to create a Docker image which, when run, should install an exe into the directory specified in my Dockerfile.
Basically, I need the ImageMagick application. The Dockerfile should be platform independent: if I run it on Windows it should use the Windows distribution, and on Linux the Linux distribution. It would also be great if it added an environment variable to the system. I browsed for a solution, but I couldn't find an appropriate one.
I know it's a bit late, but maybe someone (like me) is still searching.
I ended up using a java-imagemagick docker version from https://hub.docker.com/r/cpaitsupport/java-imagemagick/dockerfile
You can run docker pull cpaitsupport/java-imagemagick to get this docker image to your docker machine.
Now comes the tricky part: I needed to run ImageMagick inside a docker container for my main app. You can COPY the files from cpaitsupport/java-imagemagick into your custom container. Example:
COPY --from=cpaitsupport/java-imagemagick:latest . ./some/dir/imagemagick
Now you should have the file structure for your custom app, and also, under some/dir/imagemagick/, the file structure for ImageMagick. There you'll find all the ImageMagick-related files (convert, magick, the libraries, etc.).
Now if you want to use ImageMagick in your code, you need to set some ENV variables in your docker container with the "new" path to the ImageMagick directory. Example:
ENV IM4JAVA_TOOLPATH=/some/dir/imagemagick/usr/bin \
    LD_LIBRARY_PATH=/usr/lib:/some/dir/imagemagick/usr/lib \
    MAGICK_CONFIGURE_PATH=/some/dir/imagemagick/etc/ImageMagick-7 \
    MAGICK_CODER_MODULE_PATH=/some/dir/imagemagick/usr/lib/ImageMagick-7.0.5/modules-Q16HDRI/coders \
    MAGICK_HOME=/some/dir/imagemagick/usr
Now, in your Java code, delete the ProcessStarter.setGlobalSearchPath(imPath); call if it is there, so that IM4JAVA_TOOLPATH is used instead.
After that, ConvertCmd cmd = new ConvertCmd(); and cmd.run(op); should work.
Maybe it's not the best way but worked for me and I was struggling a lot.
Hope this helps!
Feel free to correct or add additional info.
You can install (extract files) onto the hosting system using docker mounts or volumes - however, you cannot change system settings on the host, such as its environment variables, from inside a container.

What does "From image" do in dockerfiles

I see that Dockerfiles usually have a line beginning with the "FROM" keyword, for example:
FROM composer/composer:1.1-alpine AS composer
As far as I know, a Dockerfile is a set of commands that helps to build and run containers in Docker.
As in the example above, docker uses an image named composer/composer:1.1-alpine from Docker Hub. The AS composer part just creates an alias, so we can refer to it more conveniently.
When I looked for the image, I found a couple of links describing it.
The thing I dont really understand is that:
I guess docker will use the image to build something, but how exactly does it use it? Does docker run the image, or just prepare it for use when needed? Sometimes I don't see the Dockerfile use the image in the following lines (like in this example, where there are no lines using the keyword "composer" except the first one). It confuses me.
Any help would be appreciated.
Thanks.
Dockerfiles describe layers: each command creates its own layer. For example:
RUN touch test.txt
RUN cp test.txt foo.txt
would create two layers: the first one adds the file test.txt, and the second one adds foo.txt on top of it (test.txt from the first layer is still there).
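You can inspect the layers of any image, together with the instruction that created each one and its size, with docker history (the image name is a placeholder):
docker history my-image:latest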
Each layer adds something to a container. When we walk down the layers, we find that the very first (bottom) layer is essentially empty - an image does not even ship its own kernel, it reuses the host's Linux (or Windows) kernel. That alone is not really useful - we need a lot of tools (e.g. bash) to be able to run an app. So common base images like alpine add such tools and core OS functions.
It would be annoying as hell if we had to do this setup in every container, so there are lots of base images which do exactly this kind of setup.
If you want to see what a base image does, just search the name on hub.docker.com - there you will find the Dockerfile, describing the build process.
Additionally, containers can be extended, e.g. you can use the elasticsearch container as a base image and add your own functionality - that's the second use case for base images.
For your second question: you have to decide whether you need to replicate the steps from your base image or not. If you inherit from a general OS image like alpine - probably not, since Linux normally ships with these tools. If you inherit from a more specialized container, it depends: if your machine matches the environment in the container, you don't need to, but if not, you will have to apply these steps to your machine too. E.g. if you don't have elasticsearch installed, you have to install it.
As for multiple FROMs in one Dockerfile: please look up the documentation for multi-stage builds. Essentially, they encapsulate multiple containers in a single Dockerfile. This can be very useful if you need a different toolset to build an app than to run it: the first container is responsible for building your app, while the second one takes the compiled output and just runs it.
Watch for COPY --from= lines; these copy files from one stage (container) to another.
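A minimal sketch of that build/run split (the image tags, paths and MyApp.dll name are illustrative, not from the question):
# build stage: has the full SDK and compiles the app
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -o /app

# runtime stage: only the published output is copied over, the SDK and sources stay behind
FROM mcr.microsoft.com/dotnet/aspnet:6.0
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "MyApp.dll"]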
The FROM instruction initializes a new build stage and sets the Base Image for subsequent instructions. As such, a valid Dockerfile must start with a FROM instruction. The image can be any valid image – it is especially easy to start by pulling an image from the Public Repositories.
FROM can appear multiple times within a single Dockerfile to create multiple images or use one build stage as a dependency for another. Simply make a note of the last image ID output by the commit before each new FROM instruction. Each FROM instruction clears any state created by previous instructions.
Optionally a name can be given to a new build stage by adding AS name to the FROM instruction. The name can be used in subsequent FROM and COPY --from= instructions to refer to the image built in this stage.
The tag or digest values are optional. If you omit either of them, the builder assumes a latest tag by default. The builder returns an error if it cannot find the tag value.
Taken from : https://docs.docker.com/engine/reference/builder/#from

Move file downloaded in Dockerfile to harddrive

First off, I really lack a lot of knowledge regarding Docker itself and its structure. I know it would be way more beneficial to learn the basics first, but I do need this to work for now in order to move on to other things.
So within a Dockerfile I installed wget and used it to download a file from a website; authentication and download are successful. However, when I later try to move said file it can't be found, and it doesn't show up using e.g. Explorer either (the path was specified).
I thought it might have something to do with RUN and how it executes the wget command; I read that the container ID can be used to copy files to the hard drive, but how would I do that within a Dockerfile?
RUN wget -P ./path/to/somewhere http://something.com/file.txt --user xyz --password bluh
ADD ./path/to/somewhere/file.txt /mainDirectory
The download is shown and the log-in is successful, but as I mentioned, I am having trouble using that file later on, as it is not located on the hard drive. Probably a basic error, but I'd really appreciate some input that might lead to a solution.
Obviously the error is produced when trying to execute ADD, as there is no file to move. I am trying to find a way to mount a volume in order to store it, but so far in vain.
Edit:
Though the question is similar to the "move to harddrive" one, I am searching for ways to get the ID of the container created within the Dockerfile in order to move the file; while that thread provides such answers, I haven't had any luck using them within the Dockerfile itself.
Short answer is that it's not possible.
The Dockerfile builds an image, which you can run as a short-lived container. During the build, you don't have (write) access to the host and its file system. Which kinda makes sense, since you want to build an immutable image from which to run ephemeral containers.
What you can do is run a container, and mount a path from your host as a volume into the container. This is the only way how you can share files between the host and a container.
Here is an example how you could do this with the sherylynn/wget image:
docker run -v /path/on/host:/path/in/container sherylynn/wget wget -O /path/in/container/file http://my.url.com
The -v HOST:CONTAINER parameter allows you to specify a path on the host that is mounted inside the container at a specified location.
For wget, I would prefer -O over -P when downloading a single file, since it makes it really explicit where your download ends up. When you point -O to the location of the volume, the downloaded file ends up on the host system (in the folder you mounted).
Since I have no idea what your image or your environment looks like, you might need to tweak one or two things to make it work well with your own image. As a general recommendation: for basic commands like wget or curl, you can find pre-made images on Docker Hub. This can be quite useful when you need to set up a Continuous Integration pipeline or similar, where you want to use wget or curl but can't execute it directly.
Use wget -O instead of -P to download to a specific file, e.g.:
RUN wget -O /tmp/new_file.txt --user xyz --password bluh http://something.com/new_file.txt
Thanks
