Dockerfile build remove source code from final image - docker

I am new to Docker. I want to build a Docker image that builds a C++ library with make. My Dockerfile currently does the following:
copy the source code from the host
install the required packages
run make
copy the libraries (.so) into a different folder inside the image
delete the source code
The Dockerfile is shown below.
The problem I am facing is that even after deleting the source code, the final image is still large.
Since each Dockerfile instruction creates a separate layer, one workaround is to download the source code with curl or wget and delete it within the same RUN instruction, but I don't like that solution.
FROM alpine
RUN apk update && apk add <required_packages>
COPY source_code /tmp/source_code
RUN make -C /tmp/source_code && \
    mkdir /libraries/ && \
    cp /tmp/lib/* /libraries/ && \
    rm -rf /tmp/*
I just want to minimize the final image size. Is the way I am doing this right, or is there a better way? Please help.

You can do a multi-stage build and copy the artifacts from the previous stage into a new image. Also install any required runtime dependencies (if any).
FROM alpine AS builder
RUN apk add --no-cache <build_dependencies>
COPY source_code /tmp/source_code
RUN make -C /tmp/source_code && \
    mkdir /libraries/ && \
    cp /tmp/lib/* /libraries/ && \
    rm -rf /tmp/*
FROM alpine
RUN apk add --no-cache <runtime_dependencies>
COPY --from=builder /libraries/ /libraries/
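As a quick sanity check, you can build the image and compare its size with the single-stage version (the tag here is just a placeholder):
docker build -t mylib:latest .
docker image ls mylib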

Another way to compact the resulting image, aside from using a multi-stage Docker build, is the --squash build option. Example image build command line:
docker image build --squash -t your-image .
When deleting files in the Docker image, the files themselves aren't truly gone, but remain in previous Docker filesystem layers, so they still take up space.
Squashing collapses all filesystem layers of your image, so files that are deleted with rm will be removed from the resulting single layer. This is an effective way to remove the source code from your image and compact it.
Note that squashing is an experimental Docker feature and has to be enabled in the Docker daemon configuration.
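For reference, a minimal sketch of enabling it on a systemd-based Linux host, assuming you have no existing settings in /etc/docker/daemon.json (merge by hand if you do):
echo '{ "experimental": true }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
docker image build --squash -t your-image .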
For more details on docker build --squash, see:
Docker image build reference
How does the new Docker squash option work?

Related

Docker question: Does Docker link FROM images at runtime or during build?

Does Docker link images at build time or execution time? I am presenting a very simple case here to demonstrate my dilemma, but I am presently rebuilding a chain of Dockerfiles every time I change the base Docker image, and I'm not sure if I even need to.
I have two docker files:
my base-docker-file contains
# this is not based on another image
RUN build commands here
which I build with
docker build --no-cache -t base-image -f base-docker-file .
and my from-docker-file here:
# this is based on the base-image
FROM base-image
RUN build commands here
which I build with
docker build --no-cache -t from-image -f from-docker-file .
Based on this simple setup, do I need to rebuild my from-image whenever I make a change to my base-image or does image linking occur at run time?
You will have to rebuild each time you make a change to the base-image; the FROM image is not linked at run time. You could put these individual build commands in a script and execute them together rather than running them separately every time, as sketched below.
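A minimal sketch of such a script, reusing the exact commands from the question:
#!/bin/sh
set -e
# rebuild the base image first, then the image that builds FROM it
docker build --no-cache -t base-image -f base-docker-file .
docker build --no-cache -t from-image -f from-docker-file .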
1. Multi-stage build
A multi-stage build allows you to isolate images within a single Dockerfile.
Example:
# syntax=docker/dockerfile:1
FROM alpine:latest AS builder
RUN apk --no-cache add build-base
FROM builder
COPY source1.cpp source.cpp
RUN g++ -o /binary source.cpp
2. Turn off --no-cache when building the base-image.
Simple: if nothing has changed in the base image, the build reuses the cache and it will not be rebuilt.

Create multistage docker file

I have two different Dockerfiles (for the production and test environments). I want to combine these two Dockerfiles into a single multi-stage Dockerfile.
First dockerfile as below:
FROM wildfly:17.0.0
USER jboss
RUN mkdir /opt/wildfly/install && mkdir /opt/wildfly/install/config
COPY --chown=jboss:jboss install /opt/wildfly/install
COPY --chown=jboss:jboss install.sh /opt/wildfly/bin
RUN mkdir -p $JBOSS_HOME/standalone/data/datastorage
CMD ["/opt/wildfly/bin/install.sh"]
Second dockerfile as below:
FROM wildfly:17.0.0
USER jboss
RUN mkdir /opt/wildfly/install && mkdir /opt/wildfly/install/config
COPY --chown=jboss:jboss ./install /opt/wildfly/install
COPY --chown=jboss:jboss install.sh /opt/wildfly/bin
RUN rm /opt/wildfly/install/wildfly-scripts/Setup.cli
RUN mv /opt/wildfly/install/wildfly-scripts/SetupforTest.cli /opt/wildfly/install/wildfly-scripts/Setup.cli
RUN rm /opt/wildfly/install/wildfly-scripts/Properties.properties
RUN mv /opt/wildfly/install/wildfly-scripts/Properties-test.properties /opt/wildfly/install/wildfly-scripts/Properties.properties
RUN mkdir -p $JBOSS_HOME/standalone/data/datastorage
CMD ["/opt/wildfly/bin/install.sh"]
Question: How do I create a multi-stage Dockerfile from these two Dockerfiles?
Thanks in advance.
I'm not sure that what you are trying to do requires a multi-stage build. Instead, what you may want to do is use a custom base image (for dev), which would be your first code block, and use that base image for production.
If the first image was tagged as shanmukh/wildfly:dev:
FROM shanmukh/wildfly:dev
USER jboss
RUN rm /opt/wildfly/install/wildfly-scripts/Setup.cli
RUN mv /opt/wildfly/install/wildfly-scripts/SetupforTest.cli /opt/wildfly/install/wildfly-scripts/Setup.cli
RUN rm /opt/wildfly/install/wildfly-scripts/Properties.properties
RUN mv /opt/wildfly/install/wildfly-scripts/Properties-test.properties /opt/wildfly/install/wildfly-scripts/Properties.properties
This could be tagged as shanmukh/wildfly:prod.
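For example, assuming the two Dockerfiles are saved as Dockerfile.dev and Dockerfile.prod (the file names are just placeholders), the builds could look like:
docker build -t shanmukh/wildfly:dev -f Dockerfile.dev .
docker build -t shanmukh/wildfly:prod -f Dockerfile.prod .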
The reason why I don't think you want a multi-stage build is because you mentioned you are trying to handle two environments (test and production).
Even if you did want to, say, use a multi-stage build for production, from what I can see there is no reason to. A multi-stage build is efficient when your initial build stage installs build dependencies such as a compiler, copies the code, and builds it, because the final image then does not include those initial dependencies (such as the compiler and other potentially dangerous dev tools) or any unneeded build artifacts that were produced.
Hope this helps.

Cache Cargo dependencies in a Docker volume

I'm building a Rust program in Docker (rust:1.33.0).
Every time code changes, it re-compiles (good), which also re-downloads all dependencies (bad).
I thought I could cache dependencies by adding VOLUME ["/usr/local/cargo"]. Edit: I've also tried moving this directory with CARGO_HOME, without luck.
I thought that making this a volume would persist the downloaded dependencies, which appear to be in this directory.
But it didn't work, they are still downloaded every time. Why?
Dockerfile
FROM rust:1.33.0
VOLUME ["/output", "/usr/local/cargo"]
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
COPY src/ ./src/
RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
Built with just docker build ..
Cargo.toml
[package]
name = "mwe"
version = "0.1.0"
[dependencies]
log = { version = "0.4.6" }
Code: just hello world
Output of second run after changing main.rs:
...
Step 4/6 : COPY Cargo.toml .
---> Using cache
---> 97f180cb6ce2
Step 5/6 : COPY src/ ./src/
---> 835be1ea0541
Step 6/6 : RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
---> Running in 551299a42907
Updating crates.io index
Downloading crates ...
Downloaded log v0.4.6
Downloaded cfg-if v0.1.6
Compiling cfg-if v0.1.6
Compiling log v0.4.6
Compiling mwe v0.1.0 (/)
Finished dev [unoptimized + debuginfo] target(s) in 17.43s
Removing intermediate container 551299a42907
---> e4626da13204
Successfully built e4626da13204
A volume inside the Dockerfile is counter-productive here. That would mount an anonymous volume at each build step, and again when you run the container. The volume during each build step is discarded after that step completes, which means you would need to download the entire contents again for any other step needing those dependencies.
The standard model for this is to copy your dependency specification, run the dependency download, copy your code, and then compile or run your code, in 4 separate steps. That lets docker cache the layers in an efficient manner. I'm not familiar with rust or cargo specifically, but I believe that would look like:
FROM rust:1.33.0
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
RUN cargo fetch # this should download dependencies
COPY src/ ./src/
RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
Another option is to turn on some experimental features with BuildKit (available in 18.09, released 2018-11-08) so that docker saves these dependencies in what is similar to a named volume for your build. The directory can be reused across builds, but never gets added to the image itself, making it useful for things like a download cache.
# syntax=docker/dockerfile:experimental
FROM rust:1.33.0
VOLUME ["/output", "/usr/local/cargo"]
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
COPY src/ ./src/
RUN --mount=type=cache,target=/root/.cargo \
["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
Note that the above assumes cargo is caching files in /root/.cargo. You'd need to verify this and adjust as appropriate. I also haven't mixed the mount syntax with a json exec syntax to know if that part works. You can read more about the BuildKit experimental features here: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
Turning on BuildKit from 18.09 and newer versions is as easy as export DOCKER_BUILDKIT=1 and then running your build from that shell.
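For example (the tag is just a placeholder):
export DOCKER_BUILDKIT=1
docker build -t myapp .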
I would say the nicer solution is to resort to a Docker multi-stage build, as pointed out here and there.
This way you can create a first image that builds both your application and your dependencies, and then reuse, in the second image, only the dependency folder from the first one.
This is inspired by both your comment on @Jack Gore's answer and the two issue comments linked above.
FROM rust:1.33.0 as dependencies
WORKDIR /usr/src/app
COPY Cargo.toml .
RUN rustup default nightly-2019-01-29 && \
mkdir -p src && \
echo "fn main() {}" > src/main.rs && \
cargo build -Z unstable-options --out-dir /output
FROM rust:1.33.0 as application
# Those are the lines instructing this image to reuse the files
# from the previous image that was aliased as "dependencies"
COPY --from=dependencies /usr/src/app/Cargo.toml .
COPY --from=dependencies /usr/local/cargo /usr/local/cargo
COPY src/ src/
VOLUME /output
RUN rustup default nightly-2019-01-29 && \
cargo build -Z unstable-options --out-dir /output
PS: having only one RUN will reduce the number of layers you generate; more info here.
Here's an overview of the possibilities. (Scroll down for my original answer.)
Add Cargo files, create a fake main.rs/lib.rs, then compile dependencies. Afterwards remove the fake source and add the real one (see the sketch after this list). [Caches dependencies, but needs several fake files with workspaces].
Add Cargo files, create fake main.rs/lib.rs, then compile dependencies. Afterwards create a new layer with the dependencies and continue from there. [Similar to above].
Externally mount a volume for the cache dir. [Caches everything, relies on caller to pass --mount].
Use RUN --mount=type=cache,target=/the/path cargo build in the Dockerfile in new Docker versions. [Caches everything, seems like a good way, but currently too new to work for me. Executable not part of image. Edit: See here for a solution.]
Run sccache in another container or on the host, then connect to that during the build process. See this comment in Cargo issue 2644.
Use cargo-build-deps. [Might work for some, but does not support Cargo workspaces (in 2019)].
Wait for Cargo issue 2644. [There's willingness to add this to Cargo, but no concrete solution yet].
Using VOLUME ["/the/path"] in the Dockerfile does NOT work, this is per-layer (per command) only.
Note: one can set ENV CARGO_HOME and ENV CARGO_TARGET_DIR in the Dockerfile to control where the download cache and the compiled output go.
Also note: cargo fetch can at least cache downloading of dependencies, although not compiling.
Cargo workspaces suffer from having to manually add each Cargo file, and for some solutions, having to generate a dozen fake main.rs/lib.rs. For projects with a single Cargo file, the solutions work better.
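To illustrate option 1, here is a minimal sketch for a single-crate project, reusing the toolchain pin from the question (Cargo.lock handling and workspaces are left out):
FROM rust:1.33.0
WORKDIR /usr/src/app
RUN rustup default nightly-2019-01-29
# compile only the dependencies against a fake main.rs so they end up in a cached layer
COPY Cargo.toml .
RUN mkdir -p src && echo "fn main() {}" > src/main.rs && cargo build
# now add the real sources; a code change only invalidates the layers from here on
COPY src/ ./src/
RUN touch src/main.rs && cargo build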
I've got caching to work for my particular case by adding
ENV CARGO_HOME /code/dockerout/cargo
ENV CARGO_TARGET_DIR /code/dockerout/target
Where /code is the directory where I mount my code.
This is externally mounted, not from the Dockerfile.
EDIT1: I was confused about why this worked, but @b.enoit.be and @BMitch cleared up that it's because volumes declared inside the Dockerfile only live for one layer (one command).
You do not need to use an explicit Docker volume to cache your dependencies. Docker will automatically cache the different "layers" of your image. Basically, each command in the Dockerfile corresponds to a layer of the image. The problem you are facing is based on how Docker image layer caching works.
The rules that Docker follows for image layer caching are listed in the official documentation:
Starting with a parent image that is already in the cache, the next
instruction is compared against all child images derived from that
base image to see if one of them was built using the exact same
instruction. If not, the cache is invalidated.
In most cases, simply comparing the instruction in the Dockerfile with
one of the child images is sufficient. However, certain instructions
require more examination and explanation.
For the ADD and COPY instructions, the contents of the file(s) in the
image are examined and a checksum is calculated for each file. The
last-modified and last-accessed times of the file(s) are not
considered in these checksums. During the cache lookup, the checksum
is compared against the checksum in the existing images. If anything
has changed in the file(s), such as the contents and metadata, then
the cache is invalidated.
Aside from the ADD and COPY commands, cache checking does not look at
the files in the container to determine a cache match. For example,
when processing a RUN apt-get -y update command the files updated in
the container are not examined to determine if a cache hit exists. In
that case just the command string itself is used to find a match.
Once the cache is invalidated, all subsequent Dockerfile commands
generate new images and the cache is not used.
So the problem is the positioning of the COPY src/ ./src/ command in your Dockerfile. Whenever one of your source files changes, the cache is invalidated and none of the subsequent commands use it, so your cargo build command will not use the Docker cache.
To solve your problem, simply reorder the commands in your Dockerfile to this:
FROM rust:1.33.0
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
COPY src/ ./src/
Doing it this way, your dependencies will only be reinstalled when there is a change in your Cargo.toml.
Hope this helps.
With the integration of BuildKit into docker, if you are able to avail yourself of the superior BuildKit backend, it's now possible to mount a cache volume during a RUN command, and IMHO, this has become the best way to cache cargo builds. The cache volume retains the data that was written to it on previous runs.
To use BuildKit, you'll mount two cache volumes, one for the cargo dir, which caches external crate sources, and one for the target dir, which caches all of your built artifacts, including external crates and the project bins and libs.
If your base image is rust, $CARGO_HOME is set to /usr/local/cargo, so your command looks like this:
RUN --mount=type=cache,target=/usr/local/cargo,from=rust,source=/usr/local/cargo \
--mount=type=cache,target=target \
cargo build
If your base image is something else, you will need to change the /usr/local/cargo bit to whatever the value of $CARGO_HOME is, or else add an ENV CARGO_HOME=/usr/local/cargo line. As a side note, the clever thing would be to set literally target=$CARGO_HOME and let Docker do the expansion, but it doesn't seem to work right: the expansion happens, yet BuildKit still doesn't persist the same volume across runs when you do this.
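A minimal sketch of that variant: pin CARGO_HOME to a dedicated directory and point the cache mount at the same literal path, so the (initially empty) cache does not shadow the toolchain that the rust image keeps under /usr/local/cargo (the paths and the --release flag are just illustrative):
FROM rust:1.33.0
# keep the registry/download cache separate from the toolchain directory
ENV CARGO_HOME=/opt/cargo-cache
WORKDIR /usr/src/app
COPY . .
RUN --mount=type=cache,target=/opt/cargo-cache \
    --mount=type=cache,target=/usr/src/app/target \
    cargo build --release
Because target/ lives in the cache mount, the finished binary is not part of the image; copy it somewhere else in the same RUN if you need it in the final image.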
Other options for achieving Cargo build caching (including sccache and the cargo wharf project) are described in this github issue.
I figured out how to get this also working with cargo workspaces, using romac's fork of cargo-build-deps.
This example has my_app, and two workspaces: utils and db.
FROM rust:nightly as rust
# Cache deps
WORKDIR /app
RUN sudo chown -R rust:rust .
RUN USER=root cargo new myapp
# Install cache-deps
RUN cargo install --git https://github.com/romac/cargo-build-deps.git
WORKDIR /app/myapp
RUN mkdir -p db/src/ utils/src/
# Copy the Cargo tomls
COPY myapp/Cargo.toml myapp/Cargo.lock ./
COPY myapp/db/Cargo.toml ./db/
COPY myapp/utils/Cargo.toml ./utils/
# Cache the deps
RUN cargo build-deps
# Copy the src folders
COPY myapp/src ./src/
COPY myapp/db/src ./db/src/
COPY myapp/utils/src/ ./utils/src/
# Build for debug
RUN cargo build
I'm sure you can adjust this code for use with a Dockerfile, but I wrote a dockerized drop-in replacement for cargo that you can save in your package and run as ./cargo build --release. This just works for (most) development (it uses rust:latest), but isn't set up for CI or anything.
Usage: ./cargo build, ./cargo build --release, etc
It will use the current working directory and save the cache to ./.cargo. (You can ignore the entire directory in your version control and it doesn't need to exist beforehand.)
Create a file named cargo in your project's folder, run chmod +x ./cargo on it, and place the following code in it:
#!/bin/bash
# This is a drop-in replacement for `cargo`
# that runs in a Docker container as the current user
# on the latest Rust image
# and saves all generated files to `./cargo/` and `./target/`.
#
# Be sure to make this file executable: `chmod +x ./cargo`
#
# # Examples
#
# - Running app: `./cargo run`
# - Building app: `./cargo build`
# - Building release: `./cargo build --release`
#
# # Installing globally
#
# To run `cargo` from anywhere,
# save this file to `/usr/local/bin`.
# You'll then be able to use `cargo`
# as if you had installed Rust globally.
sudo docker run \
--rm \
--user "$(id -u)":"$(id -g)" \
--mount type=bind,src="$PWD",dst=/usr/src/app \
--workdir /usr/src/app \
--env CARGO_HOME=/usr/src/app/.cargo \
rust:latest \
cargo "$#"

Docker image for built Golang binary

I have a Go application that I build into a binary and distribute as a Docker image.
Currently, I'm using ubuntu as my base image, but this causes an issue where if a user tries to use a Timezone other than UTC or their local timezone, they get an error stating:
pod error: panic: open /usr/local/go/lib/time/zoneinfo.zip: no such file or directory
This error is caused because the LoadLocation package in Go requires that file.
I can think of two ways to fix this issue:
Continue using the ubuntu base image, but in my Dockerfile add the commands: RUN apt-get install -y tzdata
Use one of Golang's base images, eg. golang:1.7.5-alpine.
What would be the recommended way? I'm not sure if I need to or should be using a Golang image since this is the container where the pre-built binary runs. My understanding is that Golang images are good for building the binary in the first place.
I prefer to use a multi-stage build. In the first stage you use a dedicated golang build container to install all dependencies and build the application; in the second stage you copy the binary into an empty alpine container. This gives you all the required tooling and a minimal Docker image at the same time (in my case 6MB instead of 280MB).
Example of Dockerfile:
# build stage
FROM golang:1.8
ADD . /src
RUN set -x && \
cd /src && \
go get -t -v github.com/lisitsky/go-site-search-string && \
CGO_ENABLED=0 GOOS=linux go build -a -o goapp
# final stage
FROM alpine
WORKDIR /app
COPY --from=0 /src/goapp /app/
ENTRYPOINT /app/goapp
EXPOSE 8080
Since not all OS images have the timezone data installed, this is what I did to make it work:
ADD https://github.com/golang/go/raw/master/lib/time/zoneinfo.zip /usr/local/go/lib/time/zoneinfo.zip
The full example of multi-step Docker image for adding timezone is here
This is more of a vote, but apt-get is what we (my company's tech group) do in situations like this. It gives us complete control over the hierarchy of images, but this is assuming you may have future images based on this one.
You can use the system's tzdata. Or you can copy $GOROOT/lib/time/zoneinfo.zip into your image, which is a trimmed version of the system one.
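For instance, building on the multi-stage Dockerfile shown earlier in this thread, the final stage could pull that trimmed zip out of the build stage (the binary name and stage index follow that example; treat it as a sketch):
# final stage
FROM alpine
WORKDIR /app
COPY --from=0 /src/goapp /app/
# LoadLocation falls back to $GOROOT/lib/time/zoneinfo.zip, which is this path in the golang image
COPY --from=0 /usr/local/go/lib/time/zoneinfo.zip /usr/local/go/lib/time/zoneinfo.zip
ENTRYPOINT ["/app/goapp"]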

Docker container with build output and no source

I have a build process that converts typescript into javascript, minifies and concatenates css files, etc.
I would like to put those files into an nginx docker container, but I don't want the original javascript / css source to be included, nor the tools that I use to build them. Is there a good way to do this, or do I have to run the build outside docker (or in a separately defined container), then COPY the relevant files in?
This page talks about doing something similar in a manual way, but doesn't explain how to automate the process e.g. with docker-compose or something.
Create a Docker image with all the tools required to build your code, one that can clone the code and build it. After the build, it has to copy the output into a Docker volume, for example a volume named /opt/webapp.
Launch a build container using the build image from step 1:
docker run -d -P --name BuildContainer -v /opt/webapp:/opt/webapp build_image_name
Launch an nginx container that uses the shared volume of the build container, in which your built code resides:
docker run -d -P --name Appserver -v /opt/webapp:/usr/local/nginx/html nginx_image_name
After building and shipping your built code to Appserver, you can delete BuildContainer because it is no longer required.
Advantages of the above steps:
Your built code lives on the host machine, so if an Appserver container fails or stops, the build output is safe on the host and you can launch a new container from it.
If you create a Docker image for building the code, you don't have to install the required tools every time you launch a container.
You can also build your code on the host machine, but if you want the code to be built in a fresh environment every time, this approach is better; if you use the same host machine to build/compile every time, stale source code, git clone errors, etc. may cause problems.
EDIT:
You can append :ro (read-only) to the volume so that one container cannot affect the other. You can read more about Docker volumes here. Thanks @BMitch for the suggestion.
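For example, the Appserver container from above would become:
docker run -d -P --name Appserver -v /opt/webapp:/usr/local/nginx/html:ro nginx_image_name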
The latest version of Docker supports multi-stage builds, where build products can be copied from one stage to another.
https://docs.docker.com/engine/userguide/eng-image/multistage-build/
This is an ideal scenario for a multi-stage build. You perform the compiling in the first stage, copy the output of that compile to the second stage, and only ship that second stage. Each stage is an independent image that begins with a FROM line. And to transfer files between stages, there's now a COPY --from syntax. The result looks roughly like:
# first stage with your full compile environment, e.g. maven/jdk
FROM maven as build
WORKDIR /src
COPY src /src
RUN mvn install
# second stage starts below with just a jre base image
FROM openjdk:jre
# copy the jar from the first stage here
COPY --from=build /src/result.jar /app
CMD java -jar /app/result.jar
Original answer:
Two common options:
As mentioned, you can build outside and copy the compiled result into the container.
You merge your download, build, and cleanup step into a single RUN command. This is a common best practice to minimize the size of each layer.
An example Dockerfile for the second option would look like:
FROM mybase:latest
RUN apt-get update && apt-get install tools \
&& git clone https://github.com/myproj \
&& cd myproj \
&& make \
&& make install \
&& cd .. \
&& apt-get remove tools && apt-get clean \
&& rm -rf myproj
The lines would be a little more complicated than that, but that's the gist.
As @dnephin suggested in his comments on the question and on @pl_rock's answer, the standard Docker tools are not designed to do this, but you can use a third-party tool like one of the following:
dobi (48 GitHub stars)
packer (6210 GitHub stars)
rocker (759 GitHub stars)
conveyor (152 GitHub stars)
(GitHub stars correct when I wrote the answer)
We went with dobi as it was the first one we heard of (because of this question), but it looks like packer is the most popular.
Create a Dockerfile that runs your build process and then runs your cleanup code.
Example:
FROM node:latest
# Provides cached layer for node_modules
ADD package.json /tmp/package.json
RUN cd /tmp && npm install
RUN mkdir -p /dist && cp -a /tmp/node_modules /dist/
RUN cp /tmp/package.json /dist
ADD . /tmp
RUN cd /tmp && npm run build
RUN mkdir -p /dist && cp -a /tmp/. /dist
#run some clean up code here
RUN npm run cleanup
# Define working directory
WORKDIR /dist
# Expose port
EXPOSE 4000
# Run app
CMD ["npm", "run", "start"]
In your docker compose file
web:
  build: ../project_path
  environment:
    - NODE_ENV=production
  restart: always
  ports:
    - "4000"
