Merging a split file inside a Windows docker container

Merging a split file inside a Windows docker container - docker

I am new to Docker so trying to figure something out.
At an absolute basic, I am trying to add some split files in a docker container, add and install 7-zip, and merge these split files inside. I managed to do all but the last, which are present in an image / container.
For the last part, I need a separate image / dockerfile, so if I choose, I can decide which split packet to build, so I derive from the first image (containing split files, and installed 7zip) and use CMD to pass arguments listing out which files it needs to merge into an archive, but it does not seem to work.
I know 7-Zip is installed correctly, since when I pass --help as a parameter to 7z, it works fine, the output is shown in PowerShell, but merging with this command line :
CMD ["7z", "e -y -o"c:\" "c:\foo-foo.7z.001""]
tells me that 7z is not an internal command or batch file etc.etc.
I suppose it's something to do with the argument list formatting, but several attempted modifications haven't worked so far.

Related

Can I run scripts from a docker build context without a copy?

I want to build on top of a windows docker container by installing a couple programs. The files total .5 GB and I want to keep the layers as small as possible. I am hoping I can run the setup files from the build-context, and then have the build-context swept away at the end so I don't have a needless copy of the source files for the setup.exe embedded in my container layers. However, I have not found an example where this is the case. Instead I mostly see people run a COPY command to a temporary build folder, run their setup, then remove the folder. Won't those files still be in the container layers because the COPY command creates a new layer when it's done?
I don't know if the container can see the build-context directly. I was hoping for some magical folder filled with the build-context files so I could run a script using it, but haven't found anything.
It seems like the alternative is to create a private file-server and perform a RUN that can download them from that private server and unpack them, run the install, and remove them (all as 1 docker step). I understand this would make it more available to others who need to rerun the build, but I'm not convinced we'll need to rerun it. It's not likely to change as the container will build patches for a legacy application. Just seems like a lot to host files on a private, public-facing server for something that will get called once every couple years if ever.
So are these my two options?
Make a container with needless copies of source files embedded within
Host the files on a private file server and download/install/remove them
Or am I missing another option or point about how the containers work?

It's a long shot as Windows is a tricky thing with file system, but you could do this way:
In your Dockerfile use a COPY command, install then RUN del ... to remove the installation files
Build your image docker build -t my-large-image:latest .
Run your image docker run --name my-large-container my-large-image:latest
Stop the container
Export your container filesystem docker export my-large-container > my-large-container.tar
Import the filesystem to a new image cat my-large-container.tar | docker import - my-small-image
Caveat is you need to run the container once which might not be what you want. And also I haven't tested with windows container, sorry.

I usually do the download or copy in one step, then in the next step I do the silent installation and remove the installer.
# escape=`
FROM mcr.microsoft.com/dotnet/framework/wcf:4.8-windowsservercore-ltsc2016
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
ADD https://download.visualstudio.microsoft.com/download/pr/6afa582f-fa26-4a73-8cb9-194321e85f8d/ecea51ead62beb7acc73ad9799511ffdb3083ad384fe04ec50e2cbecfb426482/VS_RemoteTools.exe VS_RemoteTools_x64.exe
RUN Start-Process .\\VS_RemoteTools_x64.exe -ArgumentList #('/install','/quiet','/norestart') -NoNewWindow -Wait; `
Remove-Item -Path C:/VS_RemoteTools_x64.exe -Force;
But otherwise, I don't think you can mount a custom volume while it's being built.

I didn't find a satisfactory answer to this. Docker seems designed for only the modern era and assumes you'll be able to download what you need via scripts and tools hitting APIs and file servers. The easiest option I found that I eventually went with was to host the files on a private file server or service (in my case, AWS S3).
I really wish there was a way to have files hosted by the docker daemon in some way, eg. if it acted like a temporary server that you could get data from via http instead of needing to COPY the files and create a layer. Alas, I found no such feature.
Taking this route made my container about a GB smaller.

Is there a way to force a particular line in my Dockerfile to always rebuild, whilst still benefiting from caching on the preceding layers? [duplicate]

This question already has answers here:
Disable cache for specific RUN commands
(9 answers)
Closed 1 year ago.
I frequently seem to have to write Dockerfiles like this (line numbers added for clarity):
1. FROM somebase
2. RUN cp /some/local/stuff /some/docker/container/path
3. RUN some-other-local-commands
4. RUN wget http://some.remote.server/some.remote.path.for.example.json
5. RUN some-other-local-commands-which-may-depend-on-the-json
On line (4), I'm fetching a remote resource. Let's assume for now that's a JSON file. It might change from time-to-time, maybe not on every build, but perhaps every few hours or days.
What this means is that every time I build my container, I want to ensure the freshest JSON file is fetched. One way to force this is to add the --no-cache parameter to my docker build command, but this forces all of the lines/layers to rebuild, including (1)-(3), where that is likely not necessary. Is there a pattern or technique to automatically 'taint' or 'mark' line (4) so that Docker knows it always has to re-run the wget (presumably this would also have to force a rebuild of line 5), whilst still getting the layer caching behaviour for lines (1)-(3) when Docker detects the pre-req files haven't changed?

If the specific thing you're trying to trigger rebuilds is the result of RUN wget ... a specific URL, Docker does actually have native support for this.
There are two similar commands to copy files into a container. COPY only copies files from the build context. ADD can also fetch external URLs and unpack local archives (but not both at the same time). The general recommendation is to use COPY, unless you need one of the specific things ADD does differently.
So you should be able to say
ADD http://some.remote.server/some.remote.path.for.example.json .
RUN some-other-local-commands-which-may-depend-on-the-json
and the RUN command will use the Docker layer cache based on the contents of the fetched file.
If this approach doesn't work for you (maybe you need special authentication to fetch the file) you can also fetch the file outside of Docker before you run docker build, and then COPY it in. Again, it will work like any other file you COPY in, and layer caching will take effect based on whether the file has changed or not.

Docker image layer: What does `ADD file:<some_hash> in /` mean?

In Docker Hub images there are lists of commands that being run for each image layer. Here is a golang example.
Some applications also provide their Dockerfile in GitHub. Here is a golang example.
According to the Docker Hub image layer, ADD file:4b03b5f551e3fbdf47ec609712007327828f7530cc3455c43bbcdcaf449a75a9 in / is the first command. The image layer doesn't have any "FROM" command included, and it doesn't seem to be suffice the ADD definition too.
So here are the questions:
What does ADD file:<HASH> in / means? What is this format?
Is there any way I could trace upwards using the hash? I suppose that hash represents the FROM image, but it seems there are no API for that.
Why it is not possible to build a dockerfile using the ADD file:<HASH> in / syntax? Is there any way I could build an image using such syntax, OR do a conversion between two format?

That Docker Hub history view doesn't show the actual Dockerfile; instead, it shows content essentially extracted from the docker history of the image. That doesn't preserve the specific details you're looking for: it doesn't remember the names of base images, or the build-context file names of things that get ADDed or COPYed in.
Chasing through GitHub and Docker Hub links, the golang:*-buster Dockerfile is built FROM buildpack-deps:...-scm; buildpack-deps:buster-scm is FROM buildpack-deps:buster-curl; that is FROM debian:buster; and that has a very simple Dockerfile (quoted here in its entirety):
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
FROM scratch starts from a completely totally empty image; that is the base of the Docker image tree (and what tells docker history and similar tools to stop). The ADD line unpacks a tar file of a Debian system image.
If you look at docker history or the Docker Hub history view you cite, you should be able to see these same steps happening. The ADD file:4b0... in / corresponds to the ADD rootfs.tar.gz /, and the second line is the CMD ["bash"]. It is not split up by Dockerfile or image, and the original filenames from ADD aren't saved. (You couldn't reproduce the image anyways without the contents of the rootfs.tar.gz, so it's merely slightly helpful to know its filename but not essential.)
The ADD file:hash in /path syntax is not standard Dockerfile syntax (the word in in particular is not part of it). I'm not sure there's a reliable way to translate from the host file or URL to the hash, but building the image and looking at its docker history would tell you (assuming you've got a perfect match for the file metadata). There's no way to get back to the original filename or syntax, and definitely no way to get back to the file contents.

ADD or COPY means that files are append to the images.
That are files, you cannot "trace" them.
You cannot just copy the commands, because the hashes are not the original files. See https://forums.docker.com/t/how-to-extract-file-from-image/96987 to get the file.

Alternative to using --squash when building docker images on Windows using local files

We have some local installers and zip files that we use to build our docker images. It is easy to get this to work in a Dockerfile:
FROM mcr.microsoft.com/windows/nanoserver
COPY myinstaller.exe .
RUN myinstaller.exe; \
del myinstaller.exe
The problem here is that it produces a layer for the COPY line, which increases the size of the image. A common work-around for this is to have one RUN line, that downloads the file from the Internet, runs commands, and then deletes the installation file. The problem, as written above, is that the installers are on the local filesystem.
I found that there is a --squash command for docker:
docker build --squash -t mytestimage .
This does exactly what I want: It gives me an image without this extra installer file that is not necessary. To run this command, you need to enable experimental features though. There is also an open issue to simply remove this feature:
https://github.com/moby/moby/issues/34565
Is there some alternative way of using local installers in a Dockerfile when running on Windows, that doesn't involve setting up a server to provide the files?

We ended up setting up nginx to provide files when building. On our build server, the machine building our docker images and the server that has the installer files have a very good connection between them, so downloading huge files is not a real problem.
When it comes to --squash, it is bugged for Docker on Windows. Here is the relevant issue for it:
https://github.com/moby/moby/issues/31468
There is an issue to move --squash out of experimental, but it doesn't seem to have a lot of support:
https://github.com/moby/moby/issues/38657
The alternative that some people propose instead of --squash is multi stage build, discussion here:
https://github.com/moby/moby/issues/34565
There is an alternative to --squash, if you have local installer files, you don't want to set up a web server, and you would like your docker image to be small, and you are running Windows: Use mapped drives.
In Windows, you can share folders with other users on your network. Docker containers are like another computer that is running on your physical machine, and it can access these network drives.
First set up a new user, for example username share and password password1. Create a folder somewhere on your computer. Then right click it, click properties, and then go to the Sharing tab and click "Share". Find the user that you have just created, using the little dropdown menu and Find people ..., and share the folder with this user.
Create a folder somewhere for your test project. Create a batch file setupshare.bat that looks like this:
#echo off
for /f "tokens=2 delims=:" %%i in ('ipconfig ^| findstr "Default Gateway"') do (
set hostip=%%i
goto :end
)
:end
set hostip=%hostip: =%
net use O: \\%hostip%\vms /USER:share password1
The first part of this file is only to find the ip address that the docker container can use to access its host computer. It is not the most pretty thing I've ever put together, so let me know if there's a better way!
It uses a for-loop, as that is the way to save the output of a command to a variable in batch files. The command is ipconfig, and we pipe it to findstr and searches for Default Gateway. We need to use ^| instead of just | because it is in a for-loop. The first part of the for-loop divides each line from the command on the delimiter, which is : in this case, and we only take the second token. The for-loop only handles the first line, if there are multiple entries with a Default Gateway. This script doesn't work if there are multiple entries and the first one is not the correct one.
The line set hostip=%hostip: =% is to remove a space at the start of the string.
We then have the IP address that we want to use stored in hostip. We use this in the net use command, which will map O:\ to shared folder vms on the machine with IP hostip. We use the username share and the password password1. Note that this is a very bad way of handling passwords, as they kind of should be secret!
With a batch file like this, we can set up a Dockerfile in this way:
# escape=`
FROM mcr.microsoft.com/dotnet/core/sdk:3.0
COPY setupshare.bat .
RUN setupshare.bat && `
copy O:\file.txt file.txt
The RUN command will first call setupshare.bat that sets up the network share properly. We can then use any file that we shared, for example a huge installer, and install the things that we want. In this case I have only shared a test file file.txt to see that it works, so just change that line.
I would still advice everyone to just set up a little web server, for example nginx, and use the standard way of writing Dockerfiles, with downloading files and running it in the same RUN command. That's what people expect when they see a Dockerfile, and it should be a more robust solution.
We can also hope that the Docker people either makes a COPY command that can copy, run, and delete installers in the same layer, or that --squash is implemented properly.

What does "From image" do in dockerfiles

I see that dockerfiles usually have a line beginning with "from" keywork, for example:
FROM composer/composer:1.1-alpine AS composer
As far as I know, dockerfiles are a set of commands that help to build and run many containers in docker.
As the example above, docker uses a image named composer/composer:1.1-alpine from docker hub. The As composer just make an alias, so we can use it more convenient.
When I looked for the image, I found the link enter link description here and then enter link description here.
The thing I dont really understand is that:
I guess docker will use the image to build something, but how exactly does it use the image? Does docker run the image or just prepare to use it when in need. Sometimes I dont see the dockerfiles use the image in following line (like this example, there are no lines using the keyword "composer" except the first line). It makes me confused.
Any help would be appreciated.
Thanks.

DockerFiles describes layers: Each command creates it's own layer. For example:
RUN touch test.txt
RUN cp test.txt foo.txt
would create two layers - the first one with the file test.txt, the second one without test.txt but with foo.txt
Each layer adds something to a container. When we walk the layers "up" we find that the very first layer is the empty layer, e.g. it contains only the linux (or windows) kernel itself - nothing else. But that's not really useful - we need a lot of tools (e.g. bash) to be able to run an app. So common base images like alpine add suc tools and core os functions.
It would be annoying as hell if we had to do this setup in every container so there a lots of base images, which do exactly this kind of setup.
If you want to see what a base image does, just search the name on hub.docker.com - there you will find the Dockerfile, describing the build process.
Aditionally, containers can be extenend, e.g. you use the elasticsearch container as a base image, and add your own functionality - that's the second use case for base images.
For your second question: You have to decide if you have to replicate the steps in your base image or not. If you inherit from a general OS image like apline - probably not, since linux normally ships with these tools. If you inherit from a more specialized container, it depends - if your machine matches the environment in the container, you don't need to, but if not you will have to apply these steps to your machine, too. E.g. if you don't have elasticsearch installed, you have to install it.
As for multiple froms in one Dockerfile: Please look up the documentation for Multi Stage builds. Essentially, they encapsulate multiple containers in a single dockerfile. Which can be very useful if you need a different set to build an app and to run the app. The first Container is responsible to build your app, while the second one takes the compiled source code and just runs it.
Watch for COPY --from= lines, these are copying files from one container to another.

The FROM instruction initializes a new build stage and sets the Base Image for subsequent instructions. As such, a valid Dockerfile must start with a FROM instruction. The image can be any valid image – it is especially easy to start by pulling an image from the Public Repositories.
FROM can appear multiple times within a single Dockerfile to create multiple images or use one build stage as a dependency for another. Simply make a note of the last image ID output by the commit before each new FROM instruction. Each FROM instruction clears any state created by previous instructions.
Optionally a name can be given to a new build stage by adding AS name to the FROM instruction. The name can be used in subsequent FROM and COPY --from= instructions to refer to the image built in this stage.
The tag or digest values are optional. If you omit either of them, the builder assumes a latest tag by default. The builder returns an error if it cannot find the tag value.
Taken from : https://docs.docker.com/engine/reference/builder/#from

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart