For Node applications, what is the better option for building lightweight images?
Single Docker image: might require build tools and would expose all build-time environment variables to the container.
yarn install # install deps and devDeps
yarn build # build our application
yarn test # perform tests
yarn install --production --ignore-scripts --prefer-offline # Remove devDeps
rm -rf src # Remove source files
yarn start
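As a single Dockerfile, this flow might look like the following sketch (the base image and the src/dist layout are assumptions, not from the original):

FROM node:8.12.0-alpine
WORKDIR /app
COPY . .
# install deps, build, test, then strip devDeps and sources in one layer
RUN yarn install \
    && yarn build \
    && yarn test \
    && yarn install --production --ignore-scripts --prefer-offline \
    && rm -rf src
CMD ["yarn", "start"]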
Multiple Docker Images
In one Docker container that has the build tools, run:
yarn install
yarn build
yarn test
Then take the build assets and package.json, and copy them into a new container that has only runtime environment variables. The result is a much smaller image (perhaps based on node:alpine) that contains only very limited source files.
yarn install --production --ignore-scripts --prefer-offline
yarn start
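A minimal multi-stage sketch of the same flow (the stage name and the dist output directory are assumptions):

# build stage: has devDeps and build tools
FROM node:8.12.0-alpine AS build
WORKDIR /app
COPY . .
RUN yarn install && yarn build && yarn test

# runtime stage: only the build assets, package.json, and production deps
FROM node:8.12.0-alpine
WORKDIR /app
COPY --from=build /app/package.json ./
COPY --from=build /app/dist ./dist
RUN yarn install --production --ignore-scripts --prefer-offline
CMD ["yarn", "start"]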
If you want to build a lightweight image for your application, keep the following in mind:
Try to use Alpine images, e.g. node:8.12.0-alpine, as Alpine images are among the lightest base OS images. If you want to install packages, do RUN apk add --no-cache your_packages...; with --no-cache, apk leaves no index cache behind, so there is nothing in /var/cache/apk/* to remove.
Try to reduce the number of layers by running multiple commands in the same RUN statement, e.g. RUN yarn install && yarn build && yarn test && yarn install --production --ignore-scripts --prefer-offline && rm -rf src (yarn start belongs in CMD rather than in a build-time RUN).
Try to combine commands that cancel each other out, e.g. RUN apk update && apk add ... && rm -rf /var/cache/apk/*. Here apk update creates a cache and rm -rf /var/cache/apk/* clears it. There is no need to run these two commands in separate RUN statements: two layers that negate each other's work only inflate the size of the final image.
Note: Having multiple Dockerfiles instead of one is not going to reduce the number of layers or shrink the size. It only gives you a logical separation of the tasks you want to handle individually.
Related
I'm building a container for a binary like this:
Basically the container will run an executable Go program.
FROM myrepo/ubi8/go-toolset:latest AS build
COPY --chown=1001:0 . /build
RUN cd /build && \
go env -w GO111MODULE=auto && \
go build
#---------------------------------------------------------------
FROM myrepo/ubi8/ubi-minimal:latest AS runtime
RUN microdnf update -y --nodocs && microdnf clean all && \
microdnf install go -y && \
microdnf install cronie -y && \
groupadd -g 1000 usercontainer && adduser -u 1000 -g usercontainer usercontainer && chmod 755 /home/usercontainer && \
microdnf clean all
ENV XDG_CACHE_HOME=/home/usercontainer/.cache
COPY executable.go /tmp/executable.go
RUN chmod 0555 /tmp/executable.go
USER usercontainer
WORKDIR /home/usercontainer
However, when running the container in Jenkins I'm getting this error:
failed to initialize build cache at /.cache/go-build: mkdir /.cache: permission denied
When running the container manually in a Kubernetes deployment I don't get any issue, but in Jenkins the pod goes into CrashLoopBackOff and the container shows the permissions error above.
Also, I'm not sure if I'm building the container correctly. Maybe I need to compile the Go program into a binary in the build stage and only create the runtime image from that?
Any clear example would be appreciated.
Go is a compiled language, which means that you don't actually need the go tool to run a Go program. In a Docker context, a typical setup is to use a multi-stage build to compile an application, and then copy the built application into a final image that runs it. The final image doesn't need the Go toolchain or the source code, just the compiled binary.
I might rewrite the final stage as:
FROM myrepo/ubi8/go-toolset:latest AS build
# ... as you have it now ...
FROM myrepo/ubi8/ubi-minimal:latest AS runtime
# Do not install `go` in this sequence
RUN microdnf update -y --nodocs && \
microdnf install cronie -y && \
microdnf clean all
# Create a non-root user, but not a home directory;
# specific uid/gid doesn't matter
RUN adduser --system usercontainer
# Get the built binary out of the first container
# and put it somewhere in $PATH
COPY --from=build /build/build /usr/local/bin/myapp
# Switch to a non-root user and explain how to run the container
USER usercontainer
CMD ["myapp"]
This sequence doesn't use go run or any other go command in the final image, which hopefully gets around the issue of needing a $HOME/.cache directory. (It will also give you a smaller container and a faster startup time.)
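To sanity-check the result (the image tag is arbitrary):

docker build -t myapp .
docker run --rm myapp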
I'm not a Docker expert, and I've been searching for an answer to this since it seems like it should be pretty simple, specifically as a multi-stage build. But if so, it's still not clear to me how to pull off what I'm trying to do within the multi-stage build framework.
original Dockerfile:
FROM python:3.8.5
RUN mkdir /src
WORKDIR /src
RUN apt-get update && apt-get install -y --no-install-recommends \
git
COPY api/requirements.txt /src/
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
COPY . /src/
Additional commands that I'd like effectively inserted after the two RUN pip install lines:
COPY api/requirements-dev.txt /src/
RUN pip install -r requirements-dev.txt
Ideally those couple of lines (with whatever FROM ... AS statements might be needed) would live in Dockerfile-dev, and then I could just build from Dockerfile-dev to capture whatever changes might be in Dockerfile and tack on my dev dependencies.
Obviously I could just copy the original Dockerfile, add the extra lines, call the result Dockerfile-dev, and build from that. However I'm trying to corral all of the dev dependencies into their own files that explicitly inherit the "prod" files as much as possible, as with docker-compose.yml-like inheritance/overrides. That lets me leave the "prod" code untouched and avoid e.g. conflicts when I merge it in, and makes it clear via additional files what stuff is being added to make my dev environment.
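One way to get that kind of inheritance (a sketch; the myapp:prod tag is hypothetical) is to build the prod image first and then extend it:

# Dockerfile-dev: extends the already-built prod image
FROM myapp:prod
COPY api/requirements-dev.txt /src/
RUN pip install -r requirements-dev.txt

Build with docker build -t myapp:prod . and then docker build -t myapp:dev -f Dockerfile-dev .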
I know that every RUN line adds a layer to the Docker image and that it is recommended to chain RUN commands with &&. But my question is:
Is this better:
RUN apk update && apk upgrade \
&& apk add openssh \
&& apk add --update nodejs nodejs-npm \
&& npm install -g @angular/cli \
&& apk add openjdk11 \
&& apk add maven \
&& apk add git
Or this:
RUN apk update && apk upgrade
RUN apk add openssh
RUN apk add --update nodejs nodejs-npm
RUN npm install -g @angular/cli
RUN apk add openjdk11
RUN apk add maven
RUN apk add git
The first one creates just one layer, but when the version of anything changes the image has to be rebuilt from the beginning, not from cache. The second approach creates more layers, but when just the version of git changes only the git layer needs to be rebuilt, and all previous layers can come from cache.
I'd recommend:
Install all the OS packages in a single apk invocation: there is some overhead in starting the package manager (more noticeable with dpkg/apt), and it is faster if you start it once and install several packages.
If you need to run an update command, always run it in the same RUN command as your other package-manager steps. This avoids some trouble with Docker layer caching (again, very noticeable with apt): docker build doesn't re-run the update step, but it does re-run a changed install step, and when that step tries to install a package using yesterday's package index, today's upload of that package will have deleted yesterday's file, so the download fails.
Don't npm install individual packages in the Dockerfile. That means your package.json file is incomplete; add the dependency there instead.
I've seen recommendations both ways as to whether or not to run a full upgrade. Keeping up-to-date on security fixes is important; the underlying base images on Docker Hub also update pretty regularly. So if your image is FROM alpine:latest, doing a docker build --pull will get you much of the effect of an explicit apk upgrade.
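For example, to fetch the newest base image on every build:

docker build --pull -t myimage:latest .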
Stylistically, if I need any substantial number of packages, I find the list a little more maintainable if I sort it alphabetically and put one package on a line, but this is purely personal preference.
Putting this all together would transform your example into:
RUN apk update \
&& apk upgrade \
&& apk add \
git \
maven \
nodejs \
nodejs-npm \
openjdk11 \
openssh
# package.json includes @angular/cli
COPY package.json package-lock.json ./
RUN npm ci
Don't be afraid to use multiple containers, if that makes sense. (What's your application that uses both Java and Node together; can it be split into two single-language parts?)
Don't install unnecessary developer-oriented tools in your image. (Does your application invoke git while it's running; do you install a dependency directly from GitHub; or can you remove git?)
Don't try to run an ssh daemon in your container. (It breaks the "one process per container" rule, which instantly makes things harder to manage; 90% of the SO examples have a hard-coded user password plus sudo rights, which is not really a security best practice; and managing the credentials is essentially impossible.)
Both approaches are at the extremes; you need to organize layers for reusability while at the same time optimizing for a low layer count.
Based on your example, the build can be organized as follows:
RUN apk update && apk upgrade \
&& apk add openssh \
&& apk add --update nodejs nodejs-npm \
&& apk add openjdk11 \
&& apk add maven \
&& apk add git
RUN npm install -g @angular/cli
Now there are only two layers: the first brings in the OS packages and the second handles the Node.js packages. These can be better reused in other builds.
Once you have done this modification, you can move to a multi-stage build, where you can better control and reuse the intermediate stages, as in the sketch below.
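A minimal sketch of that multi-stage reuse (stage names are hypothetical):

# OS packages layer, reusable across builds
FROM alpine AS os-deps
RUN apk update && apk upgrade \
    && apk add git maven nodejs nodejs-npm openjdk11 openssh

# Node.js tooling layer on top of it
FROM os-deps AS node-tools
RUN npm install -g @angular/cli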
I am running my monolith application in a Docker container on k8s on GKE.
The application has Python and Node dependencies, plus webpack for the front-end bundle.
We have implemented CI/CD, which takes around 5-6 minutes to build and deploy a new version to the k8s cluster.
The main goal is to reduce the build time as much as possible. The Dockerfile is multi-stage.
Webpack takes the most time to generate the bundle. To build the Docker image I am already using a high-spec worker.
To reduce time I tried using the Kaniko builder.
Issue:
Docker caches the layers for the Python code, and that works perfectly. But when there is any change in a JS or CSS file we have to generate the bundle again.
When there is a change in a JS or CSS file, instead of generating a new bundle it uses the cached layer.
Is there any way to separate out the bundle build, or to control the cache, by passing some value to the Dockerfile?
Here is my Dockerfile:
FROM python:3.5 AS python-build
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt &&\
pip3 install Flask-JWT-Extended==3.20.0
ADD . /app
FROM node:10-alpine AS node-build
WORKDIR /app
COPY --from=python-build ./app/app/static/package.json app/static/
COPY --from=python-build ./app ./
WORKDIR /app/app/static
RUN npm cache verify && npm install && npm install -g --unsafe-perm node-sass && npm run sass && npm run build
FROM python:3.5-slim
COPY --from=python-build /root/.cache /root/.cache
WORKDIR /app
COPY --from=node-build ./app ./
RUN apt-get update -yq \
&& apt-get install curl -yq \
&& pip install -r requirements.txt
EXPOSE 9595
CMD python3 run.py
I would suggest creating separate build pipelines for your Docker images for the parts where you know the npm and pip requirements don't change often.
This will dramatically improve the speed, by reducing the time spent accessing the npm and pip registries.
Use a private Docker registry (the official one, or something like VMware Harbor or Sonatype Nexus OSS).
You store those build images in your registry and use them whenever something in the project changes.
Something like this:
First Docker Builder // python-builder:YOUR_TAG [gitrev, date, etc.]
docker build --no-cache -t python-builder:YOUR_TAG -f Dockerfile.python.build .
FROM python:3.5
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt &&\
pip3 install Flask-JWT-Extended==3.20.0
Second Docker Builder // js-builder:YOUR_TAG [gitrev, date, etc.]
docker build --no-cache -t js-builder:YOUR_TAG -f Dockerfile.js.build .
FROM node:10-alpine
WORKDIR /app
COPY app/static/package.json /app/app/static
WORKDIR /app/app/static
RUN npm cache verify && npm install && npm install -g --unsafe-perm node-sass
Your Application Multi-stage build:
docker build --no-cache -t app_delivery:YOUR_TAG -f Dockerfile.app .
FROM python-builder:YOUR_TAG as python-build
# Nothing to do here, already baked in another build process
FROM js-builder:YOUR_TAG AS node-build
ADD ##### YOUR JS/CSS files only here, required from npm! ###
RUN npm run sass && npm run build
FROM python:3.5-slim
# your original clean app
COPY . /app
COPY --from=python-build #### only the files installed with the pip command
WORKDIR /app
COPY --from=node-build ##### Only the generated files from npm here! ###
RUN apt-get update -yq \
&& apt-get install curl -yq \
&& pip install -r requirements.txt
EXPOSE 9595
CMD python3 run.py
One question: why do you install curl and run the pip install -r requirements.txt command again in the final Docker image?
Running apt-get update and install every time without cleaning the apt cache (the /var/cache/apt folder) produces a bigger image.
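A sketch of the same step with the cache cleaned in the same layer (package names taken from the Dockerfile above):

RUN apt-get update -yq \
    && apt-get install -yq --no-install-recommends curl \
    && pip install -r requirements.txt \
    && rm -rf /var/cache/apt /var/lib/apt/lists/*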
As a suggestion, use the docker build command with the --no-cache option to avoid cached results:
docker build --no-cache -t your_image:your_tag -f your_dockerfile .
Remarks:
You'll have 3 separate Dockerfiles, as I listed above.
Build the Docker images 1 and 2 only if you change your python-pip and node-npm requirements, otherwise keep them fixed for your project.
If any dependency requirement changes, then update the docker image involved and then the multistage one to point to the latest built image.
You should always rebuild only the source code of your project (CSS, JS, Python); this way you also get reproducible builds.
To optimize your environment and make it easier to copy files across the multi-stage builders, try using a virtualenv for the Python build, as in the sketch below.
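A minimal sketch of the virtualenv idea (paths and tags are assumptions, not from the original setup):

# build stage: install dependencies into a self-contained virtualenv
FROM python:3.5 AS python-build
RUN python -m venv /venv
ENV PATH=/venv/bin:$PATH
COPY requirements.txt ./
RUN pip install -r requirements.txt

# runtime stage: copy the whole virtualenv across
FROM python:3.5-slim
COPY --from=python-build /venv /venv
ENV PATH=/venv/bin:$PATH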
I'm working on a Dockerfile with a multi-stage build. The general idea is to build the binary for the backend, build the javascript bundle for the frontend, and then put these two things in a final container for the app.
Here's the Dockerfile:
# go binary
FROM golang:alpine as build-go
RUN apk --no-cache add git bzr mercurial
ENV D=/go/src/github.com/tamuhack-org/quack
RUN go get -d -v golang.org/x/net/html
RUN go get -d -v github.com/gorilla/handlers
RUN go get -d -v github.com/gorilla/mux
COPY ./main.go $D/main.go
COPY ./frontend/dist $D/frontend/dist
RUN rm -rf $D/frontend/dist/index.html
RUN rm -rf $D/frontend/dist/index.js
RUN cd $D && go build -o main && cp main /tmp/
# ui
FROM node:alpine AS build-node
RUN mkdir -p /src/ui
COPY ./frontend/package.json /src/ui/
RUN cd /src/ui && yarn install
COPY ./frontend /src/ui
# Replace the dev instance of index.html with the prod version.
RUN rm -rf /src/ui/dist/index.html
RUN mv /src/ui/dist/index-prod.html /src/ui/dist/index.html
RUN cd /src/ui && yarn build
# final
FROM alpine
RUN apk --no-cache add ca-certificates
WORKDIR /app/server/
COPY --from=build-go /tmp/main /app/server/
COPY --from=build-node /src/ui/dist /app/server/frontend/dist
EXPOSE 8080
CMD ["./main"]
What I've noticed is that when I update the frontend source code and build the Docker container, the new version of the container doesn't pick up the new bundle. Are there any obvious errors in the Dockerfile that may be the reason why I'm not seeing any file changes? If I run yarn build locally, the bundle is accurate, but the Docker container seems to be caching an older version. Thoughts?