Exclude path at Docker image build time to improve size

Exclude path at Docker image build time to improve size - docker

In our project, we have an ASP.NET Core project with an Angular2 client. At Docker build time, we launch:
FROM microsoft/dotnet:latest
COPY . /app
WORKDIR /app
RUN ["dotnet", "restore"]
RUN apt-get -qq update ; apt-get -qqy --no-install-recommends install \
git \
unzip
RUN curl -sL https://deb.nodesource.com/setup_7.x | bash -
RUN apt-get install -y nodejs build-essential
RUN ["dotnet", "restore"]
RUN npm install
RUN npm run build:prod
RUN ["dotnet", "build"]
EXPOSE 5000/tcp
ENV ASPNETCORE_URLS http://*:5000
ENTRYPOINT ["dotnet", "run"]
Since restoring the npm packages is essential to be able to build the Angular2 client using npm run build, our Docker image is HUGE, I mean almost 2GB. Built Angular2 client is only 1.7Mb itself.
Our app does nothing fancy: simple web API writing to MongoDB and displaying static files.
In order to improve the size of our image, is there any way to exclude path which are useless at run time? For example node_modules or any .NET Core source?

Dotnet may restore much, especially if you have multiple targets platforms (linux, mac, windows).
Depending on how your application is configured (i.e. as portable .NET Core app or as self-contained), it can also pull the whole .NET Core Framework for one, or multiple platforms and/or architectures (x64, x86). This is mainly explained here.
When "Microsoft.NETCore.App" : "1.0.0" is defined, without the type platform, then then complete framework will be fetched via nuget. Then if you have multiple runtimes defined
"runtimes": {
"win10-x64": {},
"win10-x86": {},
"osx.10.10-x86": {},
"osx.10.10-x64": {}
}
it will get native libraries for all this platforms too. But not only in your project directory but also in ~/.nuget and npm-cache additionally to node_modules in your project + eventual copies in your wwwdata.
However, this is not how docker works. Everything you execute inside the Dockerfile is written to the virtual filesystem of the container! That's why you see this issues.
You should follow my previous comment on your other question:
Run dotnet restore, dotne build and dotnet publish outside the Dockerfile, for example in a bash or powershell/batch script.
Once finished call copy the content of the publish folder in your container with
dotnet publish
docker build bin\Debug\netcoreapp1.0\publish ... (your other parameters here)
This will generate publish files on your file system, only containing the required dll files, Views and wwwroot content without all the other build files, artifacts, caches or source and will run the docker process from the bin\Debug\netcoreapp1.0\publish folder.
You also need to change your docker files, to copy the files instead of running the commands you have during container building.
Scott uses this Dockerfile for his example in his blog:
FROM ... # Your base image here
ENTRYPOINT ["dotnet", "YourWebAppName.dll"] # Application to run
ARG source=. # An argument from outside, here store the path from real filesystem
WORKDIR /app
ENV ASPNETCORE_URLS http://+:82 # Define the port it should listen
EXPOSE 82
COPY $source . # copy the files from defined folder, here bin\Debug\netcoreapp1.0\publish to inside the docker container
This is the recommended approach for building docker containers. When you run the build commands inside, all the build and publish artifacts remain in the virtual file system and the docker image grows unexpectedly.

Related

Dockerfile (multistage build?) and container with multiple executable files

I'm having some issues with producing the following setup.
I've implemented a Java application that can start a process with any executable file (with Runtime.exec({whatever-file-here})....), where the file path is provided via external configuration. I have then created a Docker image with the said application, the idea being that the external executable file will be part of a second Docker image, containing all the necessary dependencies. This will leave the option to easily swap the file being executed by the Java app.
So from one side there is the Java image that should look like:
FROM openjdk:14
WORKDIR /app
COPY /build/some.jar /app/some.jar
And let's say I build a service-image out of it. The next step would be to use the aforementioned image as a base image in either a second Dockerfile or a single file with multiple stages.
The way I imagine it being a second Dockerfile for let's say a Python executable will be:
FROM python:latest #python so I can run the script
COPY --from=service-image / / #to get the runtime environment + app directory + jar
COPY some-file.py /app/some-file.py #copying the file for the jar to run
CMD ["java", "-jar", "/app/some.jar"] #the command that will start the java app
And running a container with an image build from the second file should have both a JRE to run the jar file and python to run the .py file as well as the actual .jar and .py files. I'm ignoring any details such as environment variables necessary for the java app to work. But that doesn't seem right, as the resulting image is absolutely massive.
What would you recommend as an approach? Until now I haven't dealt with complex Docker scenarios.

I really do not think that you will be able to create a proper container by replacing the root folder with the one of an other image.
Here is how you could do:
Build your jar file using an openjdk image
Create an image with python and Java installed and copy the .jar from the previous image
You can start from a python image and install Java or the opposite.
Here is an example:
FROM openjdk:14 AS build
WORKDIR /app
COPY . .
RUN build-my-app.sh
FROM openjdk:14-alpine AS runner
WORKDIR /app
# Install python
ENV PYTHONUNBUFFERED=1
RUN apk add --update --no-cache python3 && ln -sf python3 /usr/bin/python
RUN python3 -m ensurepip
RUN pip3 install --no-cache --upgrade pip setuptools
COPY --from=builder /app/dist/myapp.jar myapp.jar
COPY some-file.py some-file.py
CMD ["java", "-jar", "/app/some.jar"] #the command that will start the java app
EDIT: Apparently you are not using Docker to build your jar so you can simply copy it from your host machine (like that py file) and skip the build step.

Undefined References - Golang CGO build fails using Docker, but not on host machine

I'm trying to use the lilliput library for Go. It is only made to run on Linux and OS X.
On my Linux (Debian 10.3) host machine as well as my WSL2 setup (Ubuntu 20.04.1), I have no problems running and building my code (excerpt below).
// main.go
package main
import (
"github.com/discordapp/lilliput"
)
func main() {
...
decoder, err := lilliput.NewDecoder(data)
...
}
However, when I try to put it in a Docker container, with the following configuration, it fails to build.
# Dockerfile v1
FROM golang:1.14.4-alpine AS build
RUN apk add build-base
WORKDIR /src
ENV CGO_ENABLED=1
COPY go.mod .
COPY go.sum .
RUN go mod download
COPY . .
RUN go build -o /out/api .
ENTRYPOINT ["/out/api"]
EXPOSE 8080
I already tried adjusting the Dockerfile with different approaches, for example:
FROM alpine:edge AS build
RUN apk update
RUN apk upgrade
RUN apk add --update go=1.15.3-r0 gcc=10.2.0-r5 g++=10.2.0-r5
WORKDIR /app
RUN go env
ENV GOPATH /app
ADD . /app/src
WORKDIR /app/src
RUN go get -d -v
RUN CGO_ENABLED=1 GOOS=linux go build -o /app/bin/server
FROM alpine:edge
WORKDIR /app
RUN cd /app
COPY --from=build /app/bin/server /app/bin/server
CMD ["bin/server"]
Both result in the following build log:
https://pastebin.com/zMEbEac3
For completeness, the go env of the host machine.
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/kingofdog/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/kingofdog/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/go-1.11"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.11/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/kingofdog/{PROJECT FOLDER}/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build589460337=/tmp/go-build -gno-record-gcc-switches"
I already searched online for this error, but all I could find dealt with errors in the way others imported C libraries in their Go project. Yet, in my case I'm quite sure that it is not a mistake regarding the source code but rather a configuration mistake of the docker container, as the code itself works perfectly fine outside Docker and I couldn't find a similar issue on the lilliput repository.

The alpine docker image is a minimalistic Linux version - using musl-libc instead of glibc - and is typically used for building tiny images.
To get the more featureful glibc - and resolve your missing CGO dependencies - use the non-alpine version of the golang Docker image to build your asset:
#FROM golang:1.14.4-alpine AS build
#RUN apk add build-base
FROM golang:1.14.4 AS build

Did you build the dependencies?
You have to run the script to build the dependencies on Linux.
Script: https://github.com/discord/lilliput/blob/master/deps/build-deps-linux.sh
Their documentation mentions:
Building Dependencies
Go does not provide any mechanism for arbitrary building of dependencies, e.g. invoking make or cmake. In order to make lilliput usable as a standard Go package, prebuilt static libraries have been provided for all of lilliput's dependencies on Linux and OSX. In order to automate this process, lilliput ships with build scripts alongside compressed archives of the sources of its dependencies. These build scripts are provided for OSX and Linux.
In case it still fails, then issue might be linked to glibc-musl because alpine images have musl libc instead of glibc (GNU's libc). So, you can try it with maybe Ubuntu/ CentOS/etc. minimal images or find a way to get glibc on alpine.

How to reduce Docker image size of Nuxt app?

I am using docker to deploy my nuxt app. However my docker image size is 260MB. Is it too big for a docker image. I've used node alpine to reduce docker size.
This is the dockerfile.
FROM node:10-alpine
RUN mkdir -p /usr/src/nuxt-app
WORKDIR /usr/src/nuxt-app
# copy the app, note .dockerignore
COPY package*.json ./
COPY . .
RUN npm install
RUN npm run build
EXPOSE 3000
ENV NUXT_HOST=0.0.0.0
# set app port
ENV NUXT_PORT=3000
# start the app
CMD [ "npm", "start" ]
I want to have an docker image of size <100MB. Is there any more configuration needed for nuxt app or docker commands to be added?

You have to do multi stage docker build.
Idea is, you use one image for build, and then just copy plain javascript files to alphine image.
Check good example here - https://github.com/nuxt/nuxt.js/issues/2871
Also, as JMLizano mentioned, at run image you can install packages without dev ones -
npm install --production
(example above just copy all build modules to run image)

I do not know Nuxt, but some things you can try are:
Group the two COPY statements, seems like it should be enough with COPY . .
Group the two RUN statements (Ex. RUN npm install && npm build)
Avoid installing dev packages -> Use the --production flag of npm install.
The two first will reduce the amount in layers in the image, but do not expect a huge size reduction. The third one is where you can save more space (in case you have a lot of dev packages).

Cache Cargo dependencies in a Docker volume

I'm building a Rust program in Docker (rust:1.33.0).
Every time code changes, it re-compiles (good), which also re-downloads all dependencies (bad).
I thought I could cache dependencies by adding VOLUME ["/usr/local/cargo"]. edit I've also tried moving this dir with CARGO_HOME without luck.
I thought that making this a volume would persist the downloaded dependencies, which appear to be in this directory.
But it didn't work, they are still downloaded every time. Why?
Dockerfile
FROM rust:1.33.0
VOLUME ["/output", "/usr/local/cargo"]
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
COPY src/ ./src/
RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
Built with just docker build ..
Cargo.toml
[package]
name = "mwe"
version = "0.1.0"
[dependencies]
log = { version = "0.4.6" }
Code: just hello world
Output of second run after changing main.rs:
...
Step 4/6 : COPY Cargo.toml .
---> Using cache
---> 97f180cb6ce2
Step 5/6 : COPY src/ ./src/
---> 835be1ea0541
Step 6/6 : RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
---> Running in 551299a42907
Updating crates.io index
Downloading crates ...
Downloaded log v0.4.6
Downloaded cfg-if v0.1.6
Compiling cfg-if v0.1.6
Compiling log v0.4.6
Compiling mwe v0.1.0 (/)
Finished dev [unoptimized + debuginfo] target(s) in 17.43s
Removing intermediate container 551299a42907
---> e4626da13204
Successfully built e4626da13204

A volume inside the Dockerfile is counter-productive here. That would mount an anonymous volume at each build step, and again when you run the container. The volume during each build step is discarded after that step completes, which means you would need to download the entire contents again for any other step needing those dependencies.
The standard model for this is to copy your dependency specification, run the dependency download, copy your code, and then compile or run your code, in 4 separate steps. That lets docker cache the layers in an efficient manner. I'm not familiar with rust or cargo specifically, but I believe that would look like:
FROM rust:1.33.0
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
RUN cargo fetch # this should download dependencies
COPY src/ ./src/
RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
Another option is to turn on some experimental features with BuildKit (available in 18.09, released 2018-11-08) so that docker saves these dependencies in what is similar to a named volume for your build. The directory can be reused across builds, but never gets added to the image itself, making it useful for things like a download cache.
# syntax=docker/dockerfile:experimental
FROM rust:1.33.0
VOLUME ["/output", "/usr/local/cargo"]
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
COPY src/ ./src/
RUN --mount=type=cache,target=/root/.cargo \
["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
Note that the above assumes cargo is caching files in /root/.cargo. You'd need to verify this and adjust as appropriate. I also haven't mixed the mount syntax with a json exec syntax to know if that part works. You can read more about the BuildKit experimental features here: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
Turning on BuildKit from 18.09 and newer versions is as easy as export DOCKER_BUILDKIT=1 and then running your build from that shell.

I would say, the nicer solution would be to resort to docker multi-stage build as pointed here and there
This way you can create yourself a first image, that would build both your application and your dependencies, then use, only, in the second image, the dependency folder from the first one
This is inspired by both your comment on #Jack Gore's answer and the two issue comments linked here above.
FROM rust:1.33.0 as dependencies
WORKDIR /usr/src/app
COPY Cargo.toml .
RUN rustup default nightly-2019-01-29 && \
mkdir -p src && \
echo "fn main() {}" > src/main.rs && \
cargo build -Z unstable-options --out-dir /output
FROM rust:1.33.0 as application
# Those are the lines instructing this image to reuse the files
# from the previous image that was aliased as "dependencies"
COPY --from=dependencies /usr/src/app/Cargo.toml .
COPY --from=dependencies /usr/local/cargo /usr/local/cargo
COPY src/ src/
VOLUME /output
RUN rustup default nightly-2019-01-29 && \
cargo build -Z unstable-options --out-dir /output
PS: having only one run will reduce the number of layers you generate; more info here

Here's an overview of the possibilities. (Scroll down for my original answer.)
Add Cargo files, create fake main.rs/lib.rs, then compile dependencies. Afterwards remove the fake source and add the real ones. [Caches dependencies, but several fake files with workspaces].
Add Cargo files, create fake main.rs/lib.rs, then compile dependencies. Afterwards create a new layer with the dependencies and continue from there. [Similar to above].
Externally mount a volume for the cache dir. [Caches everything, relies on caller to pass --mount].
Use RUN --mount=type=cache,target=/the/path cargo build in the Dockerfile in new Docker versions. [Caches everything, seems like a good way, but currently too new to work for me. Executable not part of image. Edit: See here for a solution.]
Run sccache in another container or on the host, then connect to that during the build process. See this comment in Cargo issue 2644.
Use cargo-build-deps. [Might work for some, but does not support Cargo workspaces (in 2019)].
Wait for Cargo issue 2644. [There's willingness to add this to Cargo, but no concrete solution yet].
Using VOLUME ["/the/path"] in the Dockerfile does NOT work, this is per-layer (per command) only.
Note: one can set CARGO_HOME and ENV CARGO_TARGET_DIR in the Dockerfile to control where download cache and compiled output goes.
Also note: cargo fetch can at least cache downloading of dependencies, although not compiling.
Cargo workspaces suffer from having to manually add each Cargo file, and for some solutions, having to generate a dozen fake main.rs/lib.rs. For projects with a single Cargo file, the solutions work better.
I've got caching to work for my particular case by adding
ENV CARGO_HOME /code/dockerout/cargo
ENV CARGO_TARGET_DIR /code/dockerout/target
Where /code is the directory where I mount my code.
This is externally mounted, not from the Dockerfile.
EDIT1: I was confused why this worked, but #b.enoit.be and #BMitch cleared up that it's because volumes declared inside the Dockerfile only live for one layer (one command).

You do not need to use an explicit Docker volume to cache your dependencies. Docker will automatically cache the different "layers" of your image. Basically, each command in the Dockerfile corresponds to a layer of the image. The problem you are facing is based on how Docker image layer caching works.
The rules that Docker follows for image layer caching are listed in the official documentation:
Starting with a parent image that is already in the cache, the next
instruction is compared against all child images derived from that
base image to see if one of them was built using the exact same
instruction. If not, the cache is invalidated.
In most cases, simply comparing the instruction in the Dockerfile with
one of the child images is sufficient. However, certain instructions
require more examination and explanation.
For the ADD and COPY instructions, the contents of the file(s) in the
image are examined and a checksum is calculated for each file. The
last-modified and last-accessed times of the file(s) are not
considered in these checksums. During the cache lookup, the checksum
is compared against the checksum in the existing images. If anything
has changed in the file(s), such as the contents and metadata, then
the cache is invalidated.
Aside from the ADD and COPY commands, cache checking does not look at
the files in the container to determine a cache match. For example,
when processing a RUN apt-get -y update command the files updated in
the container are not examined to determine if a cache hit exists. In
that case just the command string itself is used to find a match.
Once the cache is invalidated, all subsequent Dockerfile commands
generate new images and the cache is not used.
So the problem is with the positioning of the command COPY src/ ./src/ in the Dockerfile. Whenever there is a change in one of your source files, the cache will be invalidated and all subsequent commands will not use the cache. Therefore your cargo build command will not use the Docker cache.
To solve your problem it will be as simple as reordering the commands in your Docker file, to this:
FROM rust:1.33.0
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
COPY src/ ./src/
Doing it this way, your dependencies will only be reinstalled when there is a change in your Cargo.toml.
Hope this helps.

With the integration of BuildKit into docker, if you are able to avail yourself of the superior BuildKit backend, it's now possible to mount a cache volume during a RUN command, and IMHO, this has become the best way to cache cargo builds. The cache volume retains the data that was written to it on previous runs.
To use BuildKit, you'll mount two cache volumes, one for the cargo dir, which caches external crate sources, and one for the target dir, which caches all of your built artifacts, including external crates and the project bins and libs.
If your base image is rust, $CARGO_HOME is set to /usr/local/cargo, so your command looks like this:
RUN --mount=type=cache,target=/usr/local/cargo,from=rust,source=/usr/local/cargo \
--mount=type=cache,target=target \
cargo build
If your base image is something else, you will need to change the /usr/local/cargo bit to whatever is the value of $CARGO_HOME, or else add a ENV CARGO_HOME=/usr/local/cargo line. As a side note, the clever thing would be to set literally target=$CARGO_HOME and let Docker do the expansion, but it
doesn't seem to work right - expansion happens, but buildkit still doesn't persist the same volume across runs when you do this.
Other options for achieving Cargo build caching (including sccache and the cargo wharf project) are described in this github issue.

I figured out how to get this also working with cargo workspaces, using romac's fork of cargo-build-deps.
This example has my_app, and two workspaces: utils and db.
FROM rust:nightly as rust
# Cache deps
WORKDIR /app
RUN sudo chown -R rust:rust .
RUN USER=root cargo new myapp
# Install cache-deps
RUN cargo install --git https://github.com/romac/cargo-build-deps.git
WORKDIR /app/myapp
RUN mkdir -p db/src/ utils/src/
# Copy the Cargo tomls
COPY myapp/Cargo.toml myapp/Cargo.lock ./
COPY myapp/db/Cargo.toml ./db/
COPY myapp/utils/Cargo.toml ./utils/
# Cache the deps
RUN cargo build-deps
# Copy the src folders
COPY myapp/src ./src/
COPY myapp/db/src ./db/src/
COPY myapp/utils/src/ ./utils/src/
# Build for debug
RUN cargo build

I'm sure you can adjust this code for use with a Dockerfile, but I wrote a dockerized drop-in replacement for cargo that you can save to a package and run as ./cargo build --release. This just works for (most) development (uses rust:latest), but isn't set up for CI or anything.
Usage: ./cargo build, ./cargo build --release, etc
It will use the current working directory and save the cache to ./.cargo. (You can ignore the entire directory in your version control and it doesn't need to exist beforehand.)
Create a file named cargo in your project's folder, run chmod +x ./cargo on it, and place the following code in it:
#!/bin/bash
# This is a drop-in replacement for `cargo`
# that runs in a Docker container as the current user
# on the latest Rust image
# and saves all generated files to `./cargo/` and `./target/`.
#
# Be sure to make this file executable: `chmod +x ./cargo`
#
# # Examples
#
# - Running app: `./cargo run`
# - Building app: `./cargo build`
# - Building release: `./cargo build --release`
#
# # Installing globally
#
# To run `cargo` from anywhere,
# save this file to `/usr/local/bin`.
# You'll then be able to use `cargo`
# as if you had installed Rust globally.
sudo docker run \
--rm \
--user "$(id -u)":"$(id -g)" \
--mount type=bind,src="$PWD",dst=/usr/src/app \
--workdir /usr/src/app \
--env CARGO_HOME=/usr/src/app/.cargo \
rust:latest \
cargo "$#"

Accessing file outside build context

I am aware that you cannot step outside of Docker's build context and I am looking for alternatives on how to share a file between two folders (outside the build context).
My folder structure is
project
- server
Dockerfile
- client
Dockerfile
My client folder needs to access a file inside the server folder for some code generation, where the client is built according to the contract of the server.
The client Dockerfile looks like the following:
FROM node:10-alpine AS build
WORKDIR /app
COPY . /app
RUN yarn install
RUN yarn build
FROM node:10-alpine
WORKDIR /app
RUN yarn install --production
COPY --from=build /app ./
EXPOSE 5000
CMD [ "yarn", "serve" ]
I run docker build -t my-name . inside the client directory.
During the RUN yarn build step, a script is looking for a file in ../server/src/schema/schema.graphql which can not be found, as the file is outside the client directory and therefore outside Docker's build context.
How can I get around this, or other suggestions to solving this issue?

The easiest way to do this is to use the root of your source tree as the Docker context directory, point it at one or the other of the Dockerfiles, and be explicit about whether you're using the client or server trees.
cd $HOME/project
docker build \
-t project-client:$(git rev-parse --short HEAD) \
-f client/Dockerfile \
.
FROM node:10-alpine AS build
WORKDIR /app
COPY client/ ./
ET cetera
In the specific case of GraphQL, depending on your application and library stack, it may be possible to avoid needing the schema at all and just make unchecked client calls; or to make an introspection query at startup time to dynamically fetch the schema; or to maintain two separate copies of the schema file. Some projects I work on use GraphQL interfaces but the servers and clients are in actual separate repositories and there's no choice but to store separate copies of the schema, but if you're careful about changes, this isn't been a problem in practice.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart