Share go modules with docker builder stage - docker

[EDIT - added clarity]
Here is my current env setup :
$GOPATH = /home/fzd/go
projectDir = /home/fzd/go/src/github.com/fzd/amazingo
amazingo has a go.mod file that lists several (let's say thousands) dependencies.
Until now I have built it with go build -o bin/amazingo cmd/main.go, but I want to share this with other people and have a build command that is environment-independent. Using go build has the advantage of downloading each dependency only once and then reusing it from ${GOPATH}/pkg/mod, which saves time and bandwidth.
I want to build in a multistage docker image, so I go with
> cat /home/fzd/go/src/github.com/fzd/amazingo/Dockerfile
FROM golang:1.17 as builder
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /bin/amazingo cmd/main.go
FROM alpine:latest
COPY --from=builder /bin/amazingo /amazingo
ENTRYPOINT ["/amazingo"]
As you might expect, the builder is "naked" when it starts, so it has to download all my dependencies when I run docker build -t amazingo:0.0.1 . . And it does so every time I call it, which can be several times a day.
Fortunately, I already have most of these dependencies on my disk. I would be happy to share these files (that are located in my $GOPATH/pkg/mod) with the builder, and help it build faster on my machine.
So the question is: how can I share my ${GOPATH} (or ${GOPATH}/pkg/mod) with the builder?
I tried adding the following to the builder
ARG SRC_GOPATH
COPY ${SRC_GOPATH} /go
and called docker build --build-arg SRC_GOPATH=${GOPATH} -t amazingo:0.0.1 ., but it wasn't good enough: I got an error (COPY failed: file not found in build context or excluded by .dockerignore: stat home/fzd/go: file does not exist)
I hope this update brings a bit more clarity to the problem.
=======
I have a project with a go.mod file.
I want to build that project using a multistage docker image.
(this article is a perfect example)
The issue is that I have "lots" of dependencies, and each of them will be downloaded inside my Docker builder stage.
Is there a way to "share" my GOPATH/pkg/mod with the docker build... command (in some ways, having a local cache) ?

Your end goal isn't completely clear, but the way that I use a multistage build would look something like this for a (dirt-simple) go app, assuming that you ultimately want the docker container to run your go app. You will need to get your source into the build container somehow as well - that is not shown here:
FROM golang:1.17.2-alpine3.14 as builder
WORKDIR /my/app/source/dir
RUN go get && go build -o /path/to/my/app/binary
FROM alpine:3.14 AS release
# install runtime deps, if any
# create necessary files and folders, if any
COPY --from=builder /path/to/my/app/binary /usr/local/bin
ENTRYPOINT /usr/local/bin/binary --options
In this way, the source of your application and all dependencies will not be present in the released image, only the compiled binary.
Of course you don't have to specify an output path for that, I think it just makes it a little clearer in this example. And of course you can use whatever base image/images you want to - I'm treating this as though you don't need the go runtime on your release image.
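For the dependency downloads specifically, one common pattern is to copy go.mod and go.sum on their own before the rest of the source, so Docker's layer cache can reuse the download step. This is a minimal sketch, assuming the project has a go.sum next to go.mod and keeps the layout from the question; the /src working directory is an arbitrary choice:

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.17 AS builder

# Work in a directory outside /go (the image's GOPATH); /src is an assumption.
WORKDIR /src

# Copy only the module manifests first. This layer and the download below
# are reused from cache until go.mod or go.sum changes.
COPY go.mod go.sum ./
RUN go mod download

# Copy the remaining source. Edits here do not invalidate the download layer.
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /bin/amazingo cmd/main.go

FROM alpine:latest
COPY --from=builder /bin/amazingo /amazingo
ENTRYPOINT ["/amazingo"]
```

With this ordering the go mod download layer is rebuilt only when go.mod or go.sum changes. It does not share the host's ${GOPATH}/pkg/mod; it only avoids repeating the download inside Docker.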

Related

docker-compose ignores ubuntu:latest in Dockerfile [duplicate]

I want to build a docker image for the Linkurious project on github, which requires both the Neo4j database, and Node.js to run.
My first approach was to declare a base image for my image, containing Neo4j. The reference docs do not define "base image" in any helpful manner:
Base image:
An image that has no parent is a base image
from which I read that I may only have a base image if that image has no base image itself.
But what is a base image? Does it mean, if I declare neo4j/neo4j in a FROM directive, that when my image is run the neo database will automatically run and be available within the container on port 7474?
Reading the Docker reference I see:
FROM can appear multiple times within a single Dockerfile in order to create multiple images. Simply make a note of the last image ID output by the commit before each new FROM command.
Do I want to create multiple images? It would seem what I want is to have a single image that contains the contents of other images e.g. neo4j and node.js.
I've found no directive to declare dependencies in the reference manual. Are there no dependencies like in RPM where in order to run my image the calling context must first install the images it needs?
I'm confused...
As of May 2017, multiple FROMs can be used in a single Dockerfile.
See "Builder pattern vs. Multi-stage builds in Docker" (by Alex Ellis) and PR 31257 by Tõnis Tiigi.
The general syntax involves adding FROM additional times within your Dockerfile - whichever is the last FROM statement is the final base image. To copy artifacts and outputs from intermediate images use COPY --from=<base_image_number>.
FROM golang:1.7.3 as builder
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
The result would be two images, one for building, one with just the resulting app (much, much smaller)
REPOSITORY TAG IMAGE ID CREATED SIZE
multi latest bcbbf69a9b59 6 minutes ago 10.3MB
golang 1.7.3 ef15416724f6 4 months ago 672MB
what is a base image?
A set of files, plus EXPOSE'd ports, ENTRYPOINT and CMD.
You can add files and build a new image based on that base image, with a new Dockerfile starting with a FROM directive: the image mentioned after FROM is "the base image" for your new image.
does it mean that if I declare neo4j/neo4j in a FROM directive, that when my image is run the neo database will automatically run and be available within the container on port 7474?
Only if you don't overwrite CMD and ENTRYPOINT.
But the image in itself is enough: you would use a FROM neo4j/neo4j if you had to add files related to neo4j for your particular usage of neo4j.
Let me summarize my understanding of the question and the answer, hoping that it will be useful to others.
Question: Let’s say I have three images, apple, banana and orange. Can I have a Dockerfile that has FROM apple, FROM banana and FROM orange that will tell docker to magically merge all three applications into a single image (containing the three individual applications) which I could call smoothie?
Answer: No, you can't. If you do that, you will end up with four images, the three fruit images you pulled, plus the new image based on the last FROM image. If, for example, FROM orange was the last statement in the Dockerfile without anything added, the smoothie image would just be a clone of the orange image.
Why Are They Not Merged? I Really Want It
A typical docker image will contain almost everything the application needs to run (leaving out the kernel) which usually means that they’re built from a base image for their chosen operating system and a particular version or distribution.
Merging images successfully without considering all possible distributions, file systems, libraries and applications, is not something Docker, understandably, wants to do. Instead, developers are expected to embrace the microservices paradigm, running multiple containers that talk to each other as needed.
What’s the Alternative?
One possible use case for image merging would be to mix and match Linux distributions with our desired applications, for example, Ubuntu and Node.js. This is not the solution:
FROM ubuntu
FROM node
If we don’t want to stick with the Linux distribution chosen by our application image, we can start with our chosen distribution and use the package manager to install the applications instead, e.g.
FROM ubuntu
RUN apt-get update &&\
apt-get install package1 &&\
apt-get install package2
But you probably knew that already. Often times there isn’t a snap or package available in the chosen distribution, or it’s not the desired version, or it doesn't work well in a docker container out of the box, which was the motivation for wanting to use an image. I’m just confirming that, as far as I know, the only option is to do it the long way, if you really want to follow a monolithic approach.
In the case of Node.js for example, you might want to manually install the latest version, since apt provides an ancient one, and snap does not come with the Ubuntu image. For neo4j we might have to download the package and manually add it to the image, according to the documentation and the license.
One strategy, if size does not matter, is to start with the base image that would be hardest to install manually, and add the rest on top.
When To Use Multiple FROM Directives
There is also the option to use multiple FROM statements and manually copy stuff between build stages or into your final one. In other words, you can manually merge images, if you know what you're doing. As per the documentation:
Optionally a name can be given to a new build stage by adding AS name
to the FROM instruction. The name can be used in subsequent FROM and
COPY --from=<name> instructions to refer to the image built in this
stage.
Personally, I’d only be comfortable using this merge approach with my own images or by following documentation from the application vendor, but it’s there if you need it or you're just feeling lucky.
A better application of this approach though, would be when we actually do want to use a temporary container from a different image, for building or doing something and discard it after copying the desired output.
Example
I wanted a lean image with gpgv only, and based on this Unix & Linux answer, I installed the whole gpg with yum and then copied only the binaries required, to the final image:
FROM docker.io/photon:latest AS builder
RUN yum install gnupg -y
FROM docker.io/photon:latest
COPY --from=builder /usr/bin/gpgv /usr/bin/
COPY --from=builder /usr/lib/libgcrypt.so.20 /usr/lib/libgpg-error.so.0 /usr/lib/
The rest of the Dockerfile continues as usual.
The first answer is too complex, historic, and uninformative for my tastes.
It's actually rather simple. Docker provides a feature called multi-stage builds. The basic idea is to:
Free you from having to manually remove what you don't want, by forcing you to allowlist what you do want,
Free resources that would otherwise be taken up because of Docker's implementation.
Let's start with the first. Very often with something like Debian you'll see.
RUN apt-get update \
&& apt-get dist-upgrade \
&& apt-get install <whatever> \
&& apt-get clean
We can explain all of this in terms of the above. The above command is chained together so it represents a single change with no intermediate Images required. If it was written like this,
RUN apt-get update ;
RUN apt-get dist-upgrade;
RUN apt-get install <whatever>;
RUN apt-get clean;
It would result in 3 more temporary intermediate Images. Having it reduced to one image, there is one remaining problem: apt-get clean doesn't clean up artifacts used in the install. If a Debian maintainer includes in his install a script that modifies the system that modification will also be present in the final solution (see something like pepperflashplugin-nonfree for an example of that).
By using a multi-stage build you get all the benefits of a single changed action, but it will require you to manually allowlist and copy over files that were introduced in the temporary image using the COPY --from syntax documented here. Moreover, it's a great solution where there is no alternative (like an apt-get clean), and you would otherwise have lots of un-needed files in your final image.
See also
Multi-stage builds
COPY syntax
Here is probably one of the most fundamental use cases of using multiple FROMs, aka, multi stage builds.
I want one Dockerfile, and I want to change one word and, depending on what I set that word to, get a different image for whether I want to Run, Dev or Publish the application!
Run - I just want to run the app
Dev - I want to edit the code and run the app
Publish - Run the app in production
Let's suppose we're working in the dotnet environment. Here's one single Dockerfile. Without multi-stage builds, there would be multiple files (builder pattern).
#See https://aka.ms/containerfastmode to understand how Visual Studio uses this Dockerfile to build your images for faster debugging.
FROM mcr.microsoft.com/dotnet/runtime:5.0 AS base
WORKDIR /app
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /src
COPY ["ConsoleApp1/ConsoleApp1.csproj", "ConsoleApp1/"]
RUN dotnet restore "ConsoleApp1/ConsoleApp1.csproj"
COPY . .
WORKDIR "/src/ConsoleApp1"
RUN dotnet build "ConsoleApp1.csproj" -c Release -o /app/build
FROM build AS publish
RUN dotnet publish "ConsoleApp1.csproj" -c Release -o /app/publish
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "ConsoleApp1.dll"]
Want to run the app? Leave FROM base AS final as it currently is in the dockerfile above.
Want to dev the source code in the container? Change the same line to FROM build AS final
Want to release into prod? Change the same line to FROM publish AS final
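If editing the Dockerfile each time is inconvenient, the stage name can, as far as I know, be supplied as a build argument when building with BuildKit. A sketch, assuming a hypothetical argument named STAGE (not part of the original file); the stages are abbreviated from the Dockerfile above:

```dockerfile
# syntax=docker/dockerfile:1
# STAGE is a hypothetical build argument: base, build or publish.
# An ARG declared before the first FROM can be used in FROM lines.
ARG STAGE=base

FROM mcr.microsoft.com/dotnet/runtime:5.0 AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /src
COPY . .
WORKDIR "/src/ConsoleApp1"
RUN dotnet build "ConsoleApp1.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "ConsoleApp1.csproj" -c Release -o /app/publish

# The final stage starts from whichever stage STAGE names.
FROM ${STAGE} AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "ConsoleApp1.dll"]
```

Running docker build --build-arg STAGE=build . would then base the final image on the build stage. Passing --target build to docker build is another way to stop at a named stage without touching the file.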
I agree with the OP, that this feature is useful for docker! Here is a different view into the same problem:
If you had multiple FROMs (or a "FROM" and multiple "MERGE"'s, for example) then you can use the docker registry versioning system for the base docker image AND other container elements, and that is the win here: I have third party development tools which do not exist in .deb format, so these tools must be installed by un-tarring a tarball that is HUGE, so caching on the docker host will be important but versioning/change control of the image is equally important. I (think I) can simply use "RUN git ....", and docker will deal with the caching of the new layer for me, which is what I want; because another container will have the same base image but a different set of HUGE third party tools, so the caching of the base image and the tools image is really important (the 3rd party tools tar can be as big as the base image of say ubuntu so caching of these is really important too). The (suggested) feature just allows all these elements to be managed in a central repo. versioning system.
Said a different way, why do we use FROM at all? If I were to simply git clone an ubuntu image using the RUN command for my "base image/layer", this would create a new layer and docker would cache this anyway...so is there any difference/advantage in using FROM, other than it uses dockers internal versioning system/syntax?

Download built modules to GOBIN using go.mod for Docker caching

I have a multi-stage Dockerfile (using BuildKit) which contains an initial stage to go get several tools that I need to use as binaries in a later stage.
The following example is the gist of it:
# syntax = docker/dockerfile:1.0-experimental
# Go build stage
FROM golang:1.14-alpine3.12 AS gobuild
RUN apk add --no-cache git
RUN GO111MODULE=on go get -v github.com/tool1/tool1
RUN GO111MODULE=on go get -v github.com/tool2/tool2
RUN GO111MODULE=on go get -v github.com/tool3/tool3
# ...
# Release stage
FROM base AS release
# Copy Go binaries
COPY --from=gobuild /go/bin/tool1 /usr/local/bin/tool1
COPY --from=gobuild /go/bin/tool2 /usr/local/bin/tool2
COPY --from=gobuild /go/bin/tool3 /usr/local/bin/tool3
# ...
This works well. The only issue I have is that every docker build requires the Go Modules to be downloaded again, i.e. they aren't cached.
After some research, I read about go mod download which is meant to cache Go Modules locally according to go.mod.
The go.mod file is a good solution for me, in that it states exact module versions; so when using Docker, caching will be much simpler since layers can be re-used unless go.mod has changed.
I easily achieve this by running go mod init github.com/me/myproject and then subsequent go get calls add the relevant modules to go.mod.
But I am missing the final piece, something similar to go mod download but with the same "output" as go get, which saves the built binaries to GOBIN.
Just to clarify, I am using binaries of tools built with Go, but my project itself isn't a Go app, it only makes use of these tools.
As suggested in the comments, using Go 1.16 and go install package@version serves as the best solution in my case.
My Dockerfile now looks like this:
# syntax = docker/dockerfile:1.0-experimental
# Go build stage
FROM golang:1.16-alpine3.12 AS gobuild
RUN apk add --no-cache git
RUN go install github.com/tool1/tool1@latest
RUN go install github.com/tool2/tool2@latest
RUN go install github.com/tool3/tool3@v1.0.0
# ...
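If the downloads themselves should also survive a rebuilt layer, BuildKit cache mounts might help. A sketch, assuming the default paths of the official golang image (/go/pkg/mod for the module cache and /root/.cache/go-build for the build cache); the tool path is the placeholder from the question:

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.16-alpine3.12 AS gobuild
RUN apk add --no-cache git

# Cache mounts persist between builds on the same builder,
# but their contents are not stored in the image layers.
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go install github.com/tool1/tool1@latest
```

The go install lines for tool2 and tool3 would follow the same shape. The binaries still land in /go/bin, which is outside the mounted paths, so the later COPY --from=gobuild lines keep working.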

Dockerfile ADD variable not expandable

I am setting up a docker image, in the dockerfile I have an ADD command where source of the ADD command is a variable.
Dockerfile takes a build argument, I want to use that arg as source of the ADD command.
But ADD command is not expanding the variable and I get an error
Please share any workaround that comes in your mind
FROM ubuntu
ARG source_dir
RUN echo ${source_dir}
ADD ${source_dir} ./ContainerDir
Build command
docker build . -t image --build-arg source_dir=/home/john/Desktop/data
Error
Step 3/3 : ADD ${source_dir} ./ContainerDir
ADD failed: stat /var/lib/docker/tmp/docker-builder311119108/home/john/Desktop/data: no such file or directory
However, the directory (/home/john/Desktop/data) exists
From the error message, the variable expanded and complained that you don't have the path in your build context:
stat /var/lib/docker/tmp/docker-builder311119108/a/b/c: no such file or directory
In your example, the build context is . (the current directory), so you need a/b/c in the current directory for this to not error. It also must not be excluded by a ./.dockerignore file if you have one.
From your second edit:
docker build . -t image --build-arg source_dir=/home/john/Desktop/data
It looks like you are trying to include a directory inside your build from outside of the build context. That is explicitly not allowed in docker builds. All files needed for the ADD and COPY commands need to be included in your context, and the entire content of the context is sent to the build server in the first step, so you want to keep this small (rather than sending the entire home directory). The source is always relative to this context, so /home is looking for ./home since your context is . in the build command.
The fix is to move the data directory to be a sub directory of . where you are building your docker images. You can also switch to COPY since there is no functionality of ADD that you need.
Disclaimer: there are two pieces of over simplification here:
The COPY command can include files from different contexts using the --from option to COPY.
The entire context is sent before the build starts with the classic build command. The newer BuildKit implementation is much more selective about how much and what parts of the context to send.
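To illustrate the first point with the directory from the question: newer Buildx releases accept additional named contexts, which COPY --from can then read. A sketch, assuming Dockerfile frontend 1.4 or later and a hypothetical context name data:

```dockerfile
# syntax=docker/dockerfile:1.4
FROM ubuntu

# "data" is a hypothetical named context supplied on the command line
# with --build-context; it is separate from the main build context (.).
COPY --from=data . ./ContainerDir
```

The matching command would be docker buildx build --build-context data=/home/john/Desktop/data -t image . so the directory is sent as its own context and does not have to live under the current directory.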

How to use big file only to build the container without adding it?

I have a big tar/executable (over 30GB) I COPY/ADD it but this is used only for the installation. Once the application is installed I don't need it anymore.
How can I do? I am trying to use it but:
Everytime I run a build, it takes minutes to define the build context.
I'd like to share this image, if I create a tar with docker save, Is the final version or each layer included in it?
I found some solutions that said I can use RUN wget tar ... && rm tar but I don't want to create webserver for that.
Why isn't it possible to mount a volume during the build process?! It would be very useful.
Use Docker's multi-stage builds. This mechanism allows you to drop intermediate artifacts and therefore achieve a lightweight image.
Example:
FROM alpine:latest as build
# copy large file
# build
FROM alpine:latest as output
# copy necessary files built in the previous stage
COPY --from=build app /app
Anything built in the build stage will not be included in the final image, unless you explicitly COPY them.
Docs: https://docs.docker.com/develop/develop-images/multistage-build/
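If BuildKit is available, a bind mount on the RUN instruction is another option worth trying: the file is visible only while that instruction runs and is not written into a layer. A sketch, assuming a hypothetical installer.tar in the build context and a hypothetical /opt/app install directory:

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:latest

# installer.tar is a hypothetical file name in the build context.
# The bind mount exists only for this RUN; only /opt/app ends up in the layer.
RUN --mount=type=bind,source=installer.tar,target=/tmp/installer.tar \
    mkdir -p /opt/app && tar -xf /tmp/installer.tar -C /opt/app
```

This can be combined with a multi-stage build if the installation itself leaves behind files that should not reach the final image.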
This is solvable using two different contexts.
Please follow the steps below.
The objective is to create:
a docker image that will have your large build file.
a docker image that will have your real codebase/executables.
For this you have to create 2 folders (BUILD & CodeBase) as follows.
Application
|---> BUILD
|      |---> Large-File
|      |---> Dockerfile
|---> CodeBase
|      |---> SRC+Other stuff
|      |---> Dockerfile
The BUILD and CodeBase folders each have their own Dockerfile; arrange the files accordingly.
Dockerfile(Build)
FROM **Base-Image**
COPY Large-File /tmp/Large-File
Build this and tag it with some name like (base-build-app-image)
#>cd Application <==Application root folder as mentioned above==>
#>docker build -t base-build-app-image BUILD <==path of your build-folder==>
Dockerfile(Codebase)
FROM base-build-app-image
RUN *****
CMD *****
RUN rm -f /tmp/Large-File
RUN rm -f **Remove installation files that is not required**
ENTRYPOINT *****
Now build the CodeBase image. base-build-app-image is already in your local Docker image store, and your large file is not in the current build context.
#>cd Application <==Application root folder as mentioned above==>
#>docker build CodeBase <==path of your code-base==>
This time since the context size is only your code base and since this doesn't include that Large file - it will definitely reduce your build time.
You can also take advantage of docker-compose to do both operations together, so you will not have to execute 2 separate commands.
If you need help on preparing this docker-compose file then do let me know in comments.
If anything is not clear then leave a comment or come over chat to fix this issue.

I'm trying to make the perfect docker build file, do i need to build it from scratch each time?

For an assignment, the marker requires me to create a Dockerfile that builds my project's container. I have a fairly complex set of tasks that must work together in the right way for the Dockerfile to be of any use to me, so each build currently takes 30 minutes just to see whether a minor change affects the outcome in the right way. Is there a better way of doing this?
The Dockerfile best practices, or an earlier question might help: Creating a Dockerfile - docker starts from scratch on each new build
In my experience, a full build every time means you're working against docker's caching mechanism, usually by having COPY . . early in the Dockerfile.
If the files copied into the image are then used to drive a package manager, or download other sources - try copying just the script or requirements file, then using it, then copying the rest of the sources.
a simplified python example, restated from the best practices link:
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
With that structure, as long as requirements.txt does not change, the first COPY and following RUN command use cached layers and rebuilds are much faster.
The first tip is to use COPY/ADD for artifacts that would otherwise need to be downloaded while docker builds.
The second tip is, you can create one Dockerfile for each step and reuse them in next steps.
for example, if you want to install postgres db, and install wildfly in your image. You can start creating a Dockerfile for postgresDB only, and build it to make your-postgres docker image.
Then create another Dockerfile which reuse your-postgres image by
FROM your-postgres
.....
and so on...
