Might be a noob question here. I'm playing around with Docker builds for my Meteor project and noticed that in this repo for a popular base image, the author suggests using the devbuild during development and the onbuild for production.
devbuild
ONBUILD RUN bash $METEORD_DIR/lib/install_meteor.sh # install dependencies
ONBUILD COPY ./ /app
ONBUILD RUN bash $METEORD_DIR/lib/build_app.sh # build the app
vs onbuild
ONBUILD COPY ./ /app
ONBUILD RUN bash $METEORD_DIR/lib/install_meteor.sh # install dependencies
ONBUILD RUN bash $METEORD_DIR/lib/build_app.sh # build the app
I assume the former takes advantage of Docker's layer caching to speed up the build, and the author warns that using the devbuild for the final build would result in a much larger image than necessary, since it contains the full Meteor installation.
This seems to contradict what I've read in guides like this one from the Docker Quickstart, and this one, which recommend installing dependencies first so they can be cached.
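For reference, the ordering those guides show looks roughly like this; this is a generic Node sketch of mine, not the Meteor image's helper scripts:
FROM node:14
WORKDIR /app
# Copy only the dependency manifests first, so this layer (and the install
# below) stays cached until package.json or package-lock.json changes
COPY package.json package-lock.json ./
RUN npm ci
# Copy the rest of the source; edits here don't invalidate the install layer
COPY . .
CMD ["npm", "start"]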
Is there a difference between the situation presented in the meteor guide vs the node guides, and what's the best way to build these dependencies in production?
Related
I wish to build a docker image that can start a container where I can use both node version 14 and lz4. The dockerfile I have so far is:
FROM node:14-alpine
WORKDIR /app
RUN apk update
RUN apk add --upgrade lz4
node --version and lz4 --help seem to run fine with the docker run command, but I wanted to ask whether there is a specific WORKDIR I should be using in the Dockerfile to follow best practices (if any exist), or does it not matter what I set the WORKDIR to? Note that I'm not sure of all my future requirements, but I may need to use this image to build other images later, so I want to ensure WORKDIR is set appropriately.
WORKDIR sets the working directory for the subsequent instructions in the Dockerfile, which makes things a little easier to follow because relative paths are resolved against it.
By default, the root directory / is the working directory. If you don't set any other WORKDIR, all the commands can use absolute paths, which arguably makes them even easier to understand.
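For example, these two fragments do roughly the same thing; the paths and package manager here are just illustrative:
# With WORKDIR, relative paths are resolved against /app
WORKDIR /app
COPY package.json ./
RUN npm install

# Without WORKDIR, the same steps spelled out with absolute paths
COPY package.json /app/package.json
RUN cd /app && npm install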
It doesn't really matter much. Besides, you could always change it for your future builds.
Being relatively new to Docker development, I've seen a few different ways that apps and dependencies are installed.
For example, in the official Wordpress image, the WP source is downloaded in the Dockerfile and extracted into /usr/src and then this is installed to /var/www/html in the entrypoint script.
Other images download and install the source in the Dockerfile, meaning the entrypoint just deals with config issues.
Either way, the scripts have to be updated when a new version of the source is available, so one approach doesn't seem to make updating to a new version any more efficient than the other.
What are the pros and cons of each approach? Is one recommended over the other for any specific sorts of setup?
Generally you should install application code and dependencies exclusively in the Dockerfile. The image entrypoint should never download or install anything.
This approach is simpler (you often don't need an ENTRYPOINT line at all) and more reproducible. You might run across some setups that run commands like npm install in their entrypoint script; this work will be repeated every time the container runs, and the container won't start up if the network is unreachable. Installing dependencies in the Dockerfile only happens once (and generally can be cached across image rebuilds) and makes the image self-contained.
The Docker Hub wordpress image is unusual in that the underlying WordPress libraries, the custom PHP application, and the application data are all stored in the same directory tree, and it's typical to use a volume mount for that application tree. Its entrypoint script looks for a wp-includes/index.php file inside the application source tree, and if it's not there it copies it in. That's a particularly complex entrypoint script.
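The general shape of that pattern is a "seed the volume on first run" script like the simplified sketch below; this is not the actual wordpress entrypoint, which does considerably more:
#!/bin/sh
set -e
# Working directory is the (possibly empty) mounted application tree;
# copy in the sources that were baked into the image at build time.
if [ ! -e wp-includes/index.php ]; then
    cp -a /usr/src/wordpress/. .
fi
# Hand off to the main container command
exec "$@"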
A generally useful pattern is to keep an application's data somewhere separate from the application source tree. If you're installing a framework, install it as a library using the host application's ordinary dependency system (for example, list it in a Node package.json file rather than trying to include it in a base image). This is good practice in general; in Docker it specifically lets you mount a volume on the data directory and not disturb the application.
For a typical Node application, for example, you might install the application and its dependencies in a Dockerfile, and not have an ENTRYPOINT declared at all:
FROM node:14
WORKDIR /app
# Install the dependencies
COPY package.json yarn.lock ./
RUN yarn install
# Install everything else
COPY . ./
# Point at some other data directory
RUN mkdir /data
ENV DATA_DIR=/data
# Application code can look at process.env.DATA_DIR
# Usual application metadata
EXPOSE 3000
CMD yarn start
...and then run this with a volume mounted for the data directory, leaving the application code intact:
docker build -t my-image .
docker volume create my-data
docker run -p 3000:3000 -d -v my-data:/data my-image
I'm executing the same docker-compose build command and I see that it misses the cache
Building service1
Step 1/31 : FROM node:10 as build-stage
---> a7366e5a78b2
Step 2/31 : WORKDIR /app
---> Using cache
---> 8a744e522376
Step 3/31 : COPY package.json yarn.lock ./
---> Using cache
---> 66c9bb64a364
Step 4/31 : RUN yarn install --production
---> Running in 707365c332e7
yarn install v1.21.1
..
..
..
As you can see, the cache was missed, but I couldn't understand why.
What is the best method to debug what changed and to figure out why?
EDIT: The question is not to debug my specific problem. But how can I generally debug a problem of this kind. How can I know WHY docker-compose thinks things changed (although I'm pretty sure NOTHING changed), which files/commands/results are different?
how can I generally debug a problem of this kind. How can I know WHY docker-compose thinks things changed (although I'm pretty sure NOTHING changed), which files/commands/results are different
In general, as shown here:
I'm a bit bummed that I can't seem to find any way to make the Docker build more verbose
But when it comes to docker-compose, it depends on your version and the options used.
moby/moby issue 30081 (by Sebastiaan van Stijn, thaJeztah) explains:
Current versions of docker-compose and docker build in many (or all) cases will not share the build cache, or at least not produce the same digest.
The reason for that is that when sending the build context with docker-compose, it will use a slightly different compression (docker-compose is written in Python, whereas the docker cli is written in Go).
There may be other differences due to them being a different implementation (and language).
(that was also discussed in docker/compose issue 883)
The next release of docker compose will have a (currently opt-in) feature to make it use the actual docker cli to perform the build (by setting a COMPOSE_DOCKER_CLI_BUILD=1 environment variable). This was implemented in docker/compose#6865 (1.25.0-rc3+, Oct. 2019).
With that feature, docker compose can also use BuildKit for building images (setting the DOCKER_BUILDKIT=1 environment variable).
I would highly recommend using buildkit for your builds if possible.
When using BuildKit (requires Docker 18.09 or up, and at this point is not supported for building Windows containers), you will see a huge improvement in build-speed, and the duration taken to send the build-context to the daemon in repeated builds (buildkit uses an interactive session to send only those files that are needed during build, instead of uploading the entire build context).
So double-check first if your docker-compose uses BuildKit, and if the issue (caching not reused) persists then:
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose build
Sebastiaan added in issue 4012:
BuildKit is still opt-in (because there's no Windows support yet), but is production quality, and can be used as the default for building Linux images.
Finally, I realize that for the Azure pipelines, you (probably) don't have control over the versions of Docker and Docker Compose installed, but for your local machines, make sure to update to the latest 19.03 patch release; for example, Docker 19.03.3 and up have various improvements and fixes in the caching mechanisms of BuildKit (see, e.g., docker#373).
Note, in your particular case, even though this is not the main issue in your question, it would be interesting to know if the following helps:
yarnpkg/yarn/issue 749 suggests:
You wouldn't mount the Yarn cache directory. Instead, you should make sure you take advantage of Docker's image layer caching.
These are the commands I am using:
COPY package.json yarn.lock ./
RUN yarn --pure-lockfile
Then try your yarn install command, and see if docker still doesn't use its cache:
RUN yarn install --frozen-lockfile --production && yarn cache clean
Don't forget a yarn cache clean in order to prevent the yarn cache from winding up in docker layers.
If the issue persists, switch to buildkit directly (for testing), with a buildctl build --progress=plain to see a more verbose output, and debug the caching situation.
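Concretely, that could look something like this; the image tag is just an example, and the buildctl invocation assumes a running buildkitd with the buildctl CLI installed:
# Plain-text build output from a BuildKit-enabled docker build
DOCKER_BUILDKIT=1 docker build --progress=plain -t service1 .

# Or drive BuildKit directly through buildctl
buildctl build \
    --frontend=dockerfile.v0 \
    --local context=. \
    --local dockerfile=. \
    --progress=plain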
Typically, a multi-stage approach, as shown here, can be useful:
FROM node:alpine
WORKDIR /usr/src/app
COPY . /usr/src/app/
# We don't need to do this cache clean, I guess it wastes time / saves space: https://github.com/yarnpkg/rfcs/pull/53
RUN set -ex; \
    yarn install --frozen-lockfile --production; \
    yarn cache clean; \
    yarn run build
FROM nginx:alpine
WORKDIR /usr/share/nginx/html
COPY --from=0 /usr/src/app/build/ /usr/share/nginx/html
As noted by nairum in the comments:
I just found that it is required to use cache_from to make caching working when I use my multi-stage Dockerfile with Docker Compose.
From the documentation:
cache_from defines a list of sources the Image builder SHOULD use for cache resolution.
Cache location syntax MUST follow the global format [NAME|type=TYPE[,KEY=VALUE]].
Simple NAME is actually a shortcut notation for type=registry,ref=NAME.
build:
  context: .
  cache_from:
    - alpine:latest
    - type=local,src=path/to/cache
    - type=gha
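In a compose file that could look something like the sketch below (the image and registry names are made up). You would normally pull the previously pushed image first so the builder has something to resolve the cache against, and with BuildKit that image may also need to have been built with BUILDKIT_INLINE_CACHE=1 so it carries cache metadata:
services:
  web:
    image: registry.example.com/myapp:latest
    build:
      context: .
      cache_from:
        - registry.example.com/myapp:latest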
I want to use a docker image in production to run a Phoenix container. However, since Elixir is just a layer on top of Erlang, it feels like it might be a waste of space to have Elixir running in my production environment.
Ideally, I would be able to compile an entire Phoenix application into Erlang, and then use an image from erlang:alpine to actually run the app in production. Something like this...
FROM elixir:alpine as builder
(install dependencies and copy files)
RUN mix compile_app_to_erlang
FROM erlang:alpine
COPY --from=builder /path/to/compiled/erlang /some/other/path
CMD ["erlang", "run"]
note: compile_app_to_erlang is not a real command, but I'm looking for something like it. Also, I have no idea how erlang runs, so all the code in there is completely made up.
Also, from what I know, there is a project called distillery that kind of does this, but it seems like the type of thing that shouldn't be too complicated (if I knew how erlang worked), and I'd rather not rely on another dependency if I don't have to. Plus it looks like if you use distillery you also have to use custom-made docker images to run the code, which is something I try to avoid.
Is something like this even possible?
If so, anyone know a DIY solution?
Elixir 1.9 added the concept of a "release" to Mix. (This was released about 11 months after the question was initially asked.) Running mix release will generate a tree containing the BEAM runtime, your compiled applications, and all of their dependencies. There is extensive documentation for the mix release task on hexdocs.pm.
In a Docker context, you can combine this with a multi-stage build to do exactly what you're requesting: start from the complete elixir image, create a tree containing the minimum required to run the image, and COPY it into a runtime image. I've been working with a Dockerfile like:
FROM elixir:1.13 AS build
WORKDIR /build
ENV MIX_ENV=prod
# Install two tools needed to build other dependencies.
RUN mix do local.hex --force, local.rebar --force
# Download dependencies.
COPY mix.exs mix.lock ./
RUN mix deps.get --only prod
# Compile dependencies (could depend on config/config.exs)
COPY config/ config/
RUN mix deps.compile
# Build the rest of the application.
COPY lib/ lib/
COPY priv/ priv/
RUN mix release --path /app
FROM ubuntu:20.04
# Get the OpenSSL runtime library
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive \
    apt-get install --no-install-recommends --assume-yes \
      libssl1.1
# Get the compiled application.
COPY --from=build /app /app
ENV PATH=/app/bin:$PATH
# Set ordinary metadata to run the container.
EXPOSE 4000
CMD ["myapp", "start"]
If you're using Phoenix, the Phoenix documentation has a much longer example. On the one hand that covers some details like asset compilation; on the other, its runtime image seems to have a bit more in it than may be necessary. That page also has some useful discussion on running Ecto migrations; with the Elixir fragment described there you could docker run a temporary container to do migrations, run them in an entrypoint wrapper script, or use any other ordinary Docker technique.
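For example, with the release image built above, running the migrations as a one-off container could look roughly like this; the MyApp.Release.migrate/0 helper and the DATABASE_URL variable are the conventions from the Phoenix docs, and the image tag is made up:
# One-off container that runs the migrations and exits
docker run --rm \
    -e DATABASE_URL="$DATABASE_URL" \
    my-phoenix-image \
    myapp eval "MyApp.Release.migrate()"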
I suggest you use distillery to build a binary.
Then just run an alpine container, mount the distillery release into it, and run the binary. You can even use supervisor to run it.
You can use distillery's remote_console to connect to the console of this binary.
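Mounting and running the release could look roughly like the sketch below. The path and app name are made up, the release has to have been built against a musl-based (Alpine) system, and the container may also need bash and shared libraries such as openssl and ncurses that the Erlang runtime expects:
# Assumes a distillery release already built into _build/prod/rel/myapp
docker run -d \
    -v "$PWD/_build/prod/rel/myapp:/opt/myapp" \
    alpine:3.12 \
    /opt/myapp/bin/myapp foreground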
I read in the Docker documentation how the ONBUILD instruction can be used, but it is not clear to me at all.
Can someone please explain it to me?
The ONBUILD instruction is very useful for automating the build of your chosen software stack.
Example
The Maven container is designed to compile Java programs. Magically, all your project's Dockerfile needs to do is reference the base container containing the ONBUILD instructions:
FROM maven:3.3-jdk-8-onbuild
CMD ["java","-jar","/usr/src/app/target/demo-1.0-SNAPSHOT-jar-with-dependencies.jar"]
The base image's Dockerfile tells all:
FROM maven:3-jdk-8
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
ONBUILD ADD . /usr/src/app
ONBUILD RUN mvn install
There's a base image that has both Java and Maven installed and a series of instructions to copy files and run Maven.
The following answer gives a Java example
How to build a docker container for a java app
As stated by the docker docs:
The ONBUILD instruction adds to the image a trigger instruction to be executed at a later time, when the image is used as the base for another build. The trigger will be executed in the context of the downstream build, as if it had been inserted immediately after the FROM instruction in the downstream Dockerfile.
So what does that mean? Let's take this Nodejs Dockerfile:
FROM node:0.12.6
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
ONBUILD COPY package.json /usr/src/app/
ONBUILD RUN npm install
ONBUILD COPY . /usr/src/app
CMD [ "npm", "start" ]
In your own Dockerfile, when you do FROM node:0.12.6-onbuild you're getting an image whose build command has already been run, so its instructions have ALREADY been executed as well, all except those starting with ONBUILD. Those have been deferred to another time: when the downstream build (your image being built from your own Dockerfile) uses this image as the base (FROM node:0.12.6-onbuild).
You can’t just call ADD and RUN now, because you don’t yet have access to the application source code, and it will be different for each application build.
That's right! The image containing onbuild instructions wasn't built on your machine, so it doesn't yet have access to package.json.
Then when you build your own Dockerfile, before executing any instruction in your file, the builder will look for ONBUILD triggers, which were added to the metadata of the parent image when it was built.
That spares you the hassle of executing these commands yourself; it really is as though these commands were written in your own Dockerfile.
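So the Dockerfile of an application built on top of that image can be as short as the sketch below; the EXPOSE port is just an example, and the CMD comes from the parent image:
FROM node:0.12.6-onbuild
# The three ONBUILD triggers from the parent fire right here, before anything
# else in this file: package.json is copied, npm install runs, and then the
# rest of the source is copied in.
EXPOSE 3000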
Finally, they add:
You could simply provide application developers with a boilerplate Dockerfile to copy-paste into their application, but that is inefficient, error-prone and difficult to update because it mixes with application-specific code.
The thing is that if these instructions are modified in the boilerplate Dockerfile, you would have to modify them in your Dockerfile as well. Thanks to the ONBUILD instruction, we don't have to worry about that.