Docker image is very large when including a small node library - docker

I'm building a custom Docker image for use in CI. It's based on Node Alpine, and only includes git and the node CLI for publishing the app, expo-cli - but the resulting image takes up 1.27GB of space.
This is the CLI:
https://github.com/expo/expo-cli
The corresponding folder in my node_modules is 16.4 MB
The Dockerfile:
FROM node:alpine
RUN set -xe \
&& apk add --no-cache git \
&& git --version && node -v && yarn -v
RUN yarn global add expo-cli@2.18.3
The build context itself is tiny:
docker build . -t no-unsafe
Sending build context to Docker daemon 4.096kB
Step 1/4 : FROM node:alpine
...
Using Dive to analyze the image, it reports that RUN yarn global add expo-cli@2.18.3 adds 1.1GB to the image. Inspecting that layer specifically shows which folders are the culprits.
I don't understand how this image ends up so huge.
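For reference, the same per-layer sizes can also be confirmed without Dive, using the tag from the build above:
docker history no-unsafe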

Docker CMD and WORKDIR getting converted to RUN

I've created a Docker image with Artifactory and Terraform to be used by pods in a K8s cluster, but it won't persist: the pod gets deleted immediately after spinning up and isn't able to execute the job it was assigned. Upon checking, the pushed image in Artifactory shows the WORKDIR and CMD converted to RUN. Is there anything I'm missing?
Here's the Dockerfile:
FROM alpine:3.16.2
LABEL maintainer="Platform Engineering"
# install dependencies
RUN apk add terraform
RUN apk add curl
RUN apk add tree
RUN curl -fL https://install-cli.jfrog.io | sh
RUN curl --location --output /usr/local/bin/release-cli "https://release-cli-downloads.s3.amazonaws.com/latest/release-cli-linux-amd64"
RUN chmod +x /usr/local/bin/release-cli
# check version of installed dependencies
RUN terraform -v
RUN jf -v
RUN release-cli -v
# target ci workspace under /tmp directory
WORKDIR /tmp/ci-workspace
CMD ["/bin/sh"]
Here's how the layers look in Artifactory:
I tried rebuilding on Windows and on another machine, and installing one binary at a time; nothing worked.
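One sanity check (not from the original post; registry, repo, and tag are placeholders) is to pull the pushed image back and inspect its config, since WORKDIR and CMD are stored there; if they survive, the Artifactory layer view is just rendering the metadata instructions differently:
docker pull <registry>/<repo>:<tag>
docker inspect --format '{{.Config.WorkingDir}} {{json .Config.Cmd}}' <registry>/<repo>:<tag>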

Copy file from dockerfile build to host - bandit

I just started learning docker. To teach myself, I managed to containerize bandit (a python code scanner) but I'm not able to see the output of the scan before the container destroys itself. How can I copy the output file from inside the container to the host, or otherwise save it?
Right now I'm just using bandit to scan itself, basically :)
Dockerfile
FROM python:3-alpine
WORKDIR /
RUN pip install bandit
RUN apk update && apk upgrade
RUN apk add git
RUN git clone https://github.com/PyCQA/bandit.git ./code-to-scan
CMD [ "python -m bandit -r ./code-to-scan -o bandit.txt" ]
You can mount a volume on your host to share the output of bandit.
For example, you can run your container with:
docker run -v $(pwd)/output:/tmp/output -t your_awesome_container:latest
And in your Dockerfile:
...
CMD [ "python -m bandit -r ./code-to-scan -o /tmp/bandit.txt" ]
This way the bandit.txt file will be found in the output folder.
Better to place the code in your image somewhere other than the root directory.
I made some adjustments to your Dockerfile.
FROM python:3-alpine
WORKDIR /usr/myapp
RUN pip install bandit
RUN apk update && apk upgrade
RUN apk add git
RUN git clone https://github.com/PyCQA/bandit.git .
CMD [ "bandit","-r",".","-o","bandit.txt" ]`
This clones the repository into your WORKDIR.
Note the CMD: it is an array, so separate the command and each of its arguments, as in the Dockerfile above.
I put the Dockerfile in my D:\test directory (Windows).
docker build -t test .
docker run -v D:/test/:/usr/myapp test
It will generate bandit.txt in the test folder.
After the code executes, the container exits, as there is nothing else to do.
You can also pass --rm to remove the container once it finishes.
docker run --rm -v D:/test/:/usr/myapp test
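As an aside, not part of the answer above: if you skip --rm, the report can also be copied out of the exited container with docker cp; the container name here is hypothetical:
docker run --name bandit-scan test
docker cp bandit-scan:/usr/myapp/bandit.txt .
docker rm bandit-scan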

Why docker image built locally and on github actions have different digest

I am building and pushing a Docker image from GitHub Actions to AWS. Both locally and on GitHub, I use the same command to build the image; however, it results in different image digests.
Command I used:
docker buildx build --build-arg DOCKER_REGISTRY=$ECR_REGISTRY --platform=linux/amd64 -t "$ECR_REGISTRY/$ECR_REPO_NAME:$LATEST_TAG" -t "$ECR_REGISTRY/$ECR_REPO_NAME:$ECR_TAG" -f Dockerfile .
My Dockerfile:
FROM node:16
RUN apt-get update \
&& apt-get install -y chromium \
fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
--no-install-recommends
RUN mkdir -p /usr/src/app
RUN chown -R node: /usr/src/app
USER node
WORKDIR /usr/src/app
COPY --chown=node:node package.json package-lock.json /usr/src/app/
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
ENV PUPPETEER_EXECUTABLE_PATH /usr/bin/chromium
RUN npm config set registry https://registry.npmjs.org/
RUN npm install
# Copy crawl scripts
RUN mkdir -p /usr/src/app/scripts
RUN mkdir -p /usr/src/app/collected
COPY --chown=node:node lib/ /usr/src/app/lib/
COPY --chown=node:node scripts/ /usr/src/app/scripts/
You're basing your image on the node:16 image, which is a moving target. Depending on when you last pulled that image, you might actually be using node:16.8.0 or node:16.9.1.
Depending on which base image you start with, you'll end up with a different final image digest.
If you want to make absolutely sure you're building the same image in both environments, build using the digest rather than using a tag, e.g:
FROM node@sha256:0c672d547405fe64808ea28b49c5772b1026f81b3b716ff44c10c96abf177d6a
...
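To find a digest you can pin to, one option is to inspect the tag you're currently building from, e.g.:
docker buildx imagetools inspect node:16
docker images --digests node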
Docker image builds from docker build and buildx are not reproducible (at least for now). Everything pushed to a registry is content addressable, referenced by the digest of the content itself. The image digest is from a json manifest that contains digests of various components, similar to a Merkle tree. If any of those components change, the top level digest also changes. And the components include things that change. Among the things that change are:
image creation time and the time for each history step; there is a timestamp directly in the image config (see the example after this list)
file timestamps, which will be different if they are created at the time of the image build
base image, if you don't pin that base image to a digest
results of RUN steps, when those steps pull from external resources that give you the "latest" of some resource, e.g. apt-get install some_pkg will change every time the package or one of its dependencies gets updated
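As an illustration of the first point above, the creation timestamp is visible directly in the image config; the image name here is a placeholder:
docker inspect --format '{{.Created}}' myimage:latest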
I've been making efforts to modify an image after the fact, back-dating timestamps, proxying all network requests, etc., but this is still pre-alpha and depends on each image being built to find what changes between builds. For some of the debugging using regctl, I've been adding regctl manifest diff, regctl blob diff-config, and regctl blob diff-layer. And then to change the image, there's an experimental regctl image mod command.
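A sketch of that diffing workflow, assuming the two builds were pushed under hypothetical tags build-local and build-ci (check regctl's help for the exact argument forms):
regctl manifest diff registry.example.com/app:build-local registry.example.com/app:build-ci
regctl blob diff-config registry.example.com/app:build-local registry.example.com/app:build-ci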

docker image run failing, ipmi_exporter for Prometheus

I'm trying to create a Docker image of soundcloud/ipmi_exporter to run with Prometheus on Ubuntu Bionic, with Docker 19.03.6, build 369ce74a3c. Docker on my OS X laptop is version 20.10.2, build 2291f61. I am forced to build the (customized) image on my laptop because Bionic has a version of golang that's older than what ipmi_exporter wants, and I'm not allowed to update the Ubuntu server.
Anyway, can someone tell me what I'm doing wrong in my Dockerfile?
# Container image
FROM quay.io/prometheus/golang-builder:1.13-base AS builder
ADD . /go/src/github.com/soundcloud/ipmi_exporter/
RUN cd /go/src/github.com/soundcloud/ipmi_exporter && make
# Container image
FROM ubuntu:18.04
WORKDIR /
RUN apt-get update && apt-get install freeipmi-tools -y --no-install-recommends && rm -rf /var/lib/apt/lists/*
COPY --from=builder /go/src/github.com/soundcloud/ipmi_exporter/ipmi_exporter /bin/ipmi_exporter
EXPOSE 8888
ENTRYPOINT ["ipmi_exporter"]
CMD ["--config.file", "/ipmi_remote.yml"]
CMD ["--web.listen-address=":8889"" "--freeipmi.path=/etc/freeipmi" "--log.level="debug""]
When I run the image all I see is
ipmi_exporter: error: unexpected /bin/sh, try --help
I have ipmi_exporter running on the OS directly and I never configured a config.yml. What config.yml is the Dockerfile author talking about? It's mentioned in the last line of https://github.com/soundcloud/ipmi_exporter/blob/master/Dockerfile
The image lives here: https://github.com/soundcloud/ipmi_exporter. The sample/example Dockerfile refers to a config.yaml which this software does not use.
I just can't figure out how to make the image pull in the config file I specify.

Docker container with build output and no source

I have a build process that converts typescript into javascript, minifies and concatenates css files, etc.
I would like to put those files into an nginx docker container, but I don't want the original javascript / css source to be included, nor the tools that I use to build them. Is there a good way to do this, or do I have to run the build outside docker (or in a separately defined container), then COPY the relevant files in?
This page talks about doing something similar in a manual way, but doesn't explain how to automate the process e.g. with docker-compose or something.
Create a Docker image with all the tools required to build your code, one that can also clone the code and build it. After the build, copy the output
into a Docker volume, for example a volume at /opt/webapp.
Launch a build container using the build image from step 1:
docker run -d -P --name BuildContainer -v /opt/webapp:/opt/webapp build_image_name
Launch an nginx container that uses the shared volume from the build container, in which your built code resides:
docker run -d -P --name Appserver -v /opt/webapp:/usr/local/nginx/html nginx_image_name
After building and shipping your built code to Appserver, you can delete BuildContainer because it is no longer required.
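For example:
docker stop BuildContainer
docker rm BuildContainer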
Advantages of the above steps:
Your built code lives on the host machine, so if the Appserver container fails or stops, the code remains safe on the host and you can launch a new container using it.
If you create a Docker image for building the code, you don't need to install the required tools every time you launch a container.
You can also build your code on the host machine, but this approach is better if you want the code built in a fresh environment every time; reusing the same host machine for every build can lead to stale source code, git clone errors, and so on.
EDIT:
You can append :ro (read-only) to the volume mount so that one container cannot affect the other; see the Docker documentation on volumes for more detail. Thanks @BMitch for the suggestion.
The latest version of docker supports multi-stage builds, where build products can be copied from one stage to another.
https://docs.docker.com/engine/userguide/eng-image/multistage-build/
This is an ideal scenario for a multi-stage build. You perform the compiling in the first stage, copy the output of that compile to the second stage, and only ship that second stage. Each stage is an independent image that begins with a FROM line. And to transfer files between stages, there's now a COPY --from syntax. The result looks roughly like:
# first stage with your full compile environment, e.g. maven/jdk
FROM maven as build
WORKDIR /src
COPY src /src
RUN mvn install
# second stage starts below with just a jre base image
FROM openjdk:jre
# copy the jar from the first stage here
COPY --from=build /src/result.jar /app
CMD java -jar /app/result.jar
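Applied to the nginx scenario in the question, a minimal sketch might look like the following, assuming an npm build script that writes the compiled assets to ./dist:
# first stage: full node toolchain to compile the typescript/css
FROM node as build
WORKDIR /src
COPY . /src
RUN npm install && npm run build
# second stage: ship only the built assets, no source or build tools
FROM nginx
COPY --from=build /src/dist /usr/share/nginx/html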
Original answer:
Two common options:
As mentioned, you can build outside and copy the compiled result into the container.
You merge your download, build, and cleanup step into a single RUN command. This is a common best practice to minimize the size of each layer.
An example Dockerfile for the second option would look like:
FROM mybase:latest
RUN apt-get update && apt-get install tools \
&& git clone https://github.com/myproj \
&& cd myproj \
&& make \
&& make install \
&& cd .. \
&& apt-get remove tools && apt-get clean \
&& rm -rf myproj
The lines would be a little more complicated than that, but that's the gist.
As @dnephin suggested in his comments on the question and on @pl_rock's answer, the standard docker tools are not designed to do this, but you can use a third-party tool like one of the following:
dobi (48 GitHub stars)
packer (6210 GitHub stars)
rocker (759 GitHub stars)
conveyor (152 GitHub stars)
(GitHub stars correct when I wrote the answer)
We went with dobi as it was the first one we heard of (because of this question), but it looks like packer is the most popular.
Create a Dockerfile to run your build process, then run cleanup code.
Example:
FROM node:latest
# Provides cached layer for node_modules
ADD package.json /tmp/package.json
RUN cd /tmp && npm install
RUN mkdir -p /dist && cp -a /tmp/node_modules /dist/
RUN cp /tmp/package.json /dist
ADD . /tmp
RUN cd /tmp && npm run build
RUN mkdir -p /dist && cp -a /tmp/. /dist
# run some cleanup code here
RUN npm run cleanup
# Define working directory
WORKDIR /dist
# Expose port
EXPOSE 4000
# Run app
CMD ["npm", "run", "start"]
In your docker-compose file:
web:
  build: ../project_path
  environment:
    - NODE_ENV=production
  restart: always
  ports:
    - "4000"
