How can I reduce the size of Docker images?

I have a Node.js app which I am running as a Docker container. Here is the Dockerfile for that application:
FROM ubuntu
ARG ENVIRONMENT
ARG PORT
RUN apt-get update -qq
RUN apt-get install -y build-essential nodejs npm nodejs-legacy vim
RUN mkdir /consumer_portal
ADD . /consumer_portal
WORKDIR /consumer_portal
RUN npm install -g express
RUN npm install -g path
RUN npm cache clean
RUN npm install
EXPOSE $PORT
ENTRYPOINT [ "node", "server.js" ]
CMD [ $PORT, $ENVIRONMENT ]
Can I modify something in this Dockerfile to reduce the Docker image size?

Using the official node alpine image as a base image, as most here suggested, is a simple way to reduce the overall size of the image, because even the base Alpine image is much smaller than the base Ubuntu image.
A Dockerfile could look like this:
FROM node:alpine
ARG ENVIRONMENT
ARG PORT
RUN mkdir /consumer_portal \
&& npm install -g express path
COPY . /consumer_portal
WORKDIR /consumer_portal
RUN npm install \
&& npm cache clean --force
EXPOSE $PORT
CMD [ "node", "server.js" ]
It's nearly the same and should work as expected. Most of the commands from your Ubuntu image can be applied the same way in the Alpine image.
When I add mock data to create a project similar to yours, the Ubuntu image ends up at 491 MB while the Alpine version is only 62.5 MB:
REPOSITORY TAG IMAGE ID CREATED SIZE
alpinefoo latest 8ca6f338475e 5 minutes ago 62.5MB
ubuntufoo latest 38620a1bd5a6 6 minutes ago 491MB

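Try to pack all RUN instructions together; it will reduce the number of intermediate images. (Merging by itself won't reduce the size, though: cleanup only saves space when it runs in the same RUN instruction that created the files.)
Adding rm -rf /var/lib/apt/lists/* at the end of the RUN instruction that runs apt-get update and apt-get install will reduce the image size by removing the package lists apt-get downloaded.
You may also remove vim from the apt-get install line if you don't need an editor in the container.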
FROM ubuntu
ARG ENVIRONMENT
ARG PORT
RUN apt-get update \
&& apt-get install -y build-essential nodejs npm nodejs-legacy vim \
&& rm -rf /var/lib/apt/lists/* \
&& mkdir /consumer_portal
ADD . /consumer_portal
WORKDIR /consumer_portal
RUN npm install -g express \
&& npm install -g path \
&& npm install \
&& npm cache clean --force
EXPOSE $PORT
ENTRYPOINT [ "node", "server.js" ]
CMD [ $PORT, $ENVIRONMENT ]
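Why the cleanup has to happen inside the same RUN instruction can be shown with a toy model of layered images. This is a simplification, not Docker's real storage driver: each RUN becomes a layer holding only the files it leaves behind, deleting a file that an earlier layer created hides it without freeing its bytes, and the image size is the sum of its layers. The paths and sizes (in MB) are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical file sizes in MB, for illustration only.
APT_LISTS = {"/var/lib/apt/lists/index": 40}
PACKAGES = {"/usr/bin/node": 60}

# One RUN instruction = a sequence of ("add" | "rm", files) operations.
RunStep = List[Tuple[str, Dict[str, int]]]


def layer_sizes(steps: List[RunStep]) -> List[int]:
    """Return the size of the layer each RUN step produces (toy model).

    A layer records only the files its own step leaves behind. Removing a
    file that an earlier layer created just hides it (a zero-size
    "whiteout"), so the earlier layer keeps its bytes.
    """
    sizes = []
    for step in steps:
        kept: Dict[str, int] = {}
        for op, files in step:
            if op == "add":
                kept.update(files)
            elif op == "rm":
                for path in files:
                    kept.pop(path, None)  # only affects files from this step
        sizes.append(sum(kept.values()))
    return sizes


# apt-get update / install / rm -rf lists as three separate RUN instructions
separate = [[("add", APT_LISTS)], [("add", PACKAGES)], [("rm", APT_LISTS)]]
# ...and chained with && in a single RUN instruction
combined = [[("add", APT_LISTS), ("add", PACKAGES), ("rm", APT_LISTS)]]

print(sum(layer_sizes(separate)))  # 100
print(sum(layer_sizes(combined)))  # 60
```

In this model, merging the steps without the rm changes nothing (40 + 60 either way); the saving comes only from deleting the lists in the step that created them.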

1) Moving to Alpine is probably the best bet. I just ported an Ubuntu Dockerfile to Alpine and went from 1.5GB to 585MB. I followed these instructions. Note, you'll be using apk instead of apt-get and the Alpine package names are a bit different.
2) It is also possible to reduce layers by merging RUN commands (each new RUN command creates a new layer).
RUN apt-get update -qq && apt-get install -y build-essential nodejs npm nodejs-legacy vim
RUN npm install -g express path && npm install && npm cache clean --force
3) You may also be interested in multi-stage builds, in which you copy only the necessary components into the final image.

Consider passing --no-install-recommends to apt-get install. This skips optional "recommended" packages and results in a smaller image. For more information, see this blog post
There is a good blog post describing a few steps to reduce image size:
Tips to Reduce Docker Image Sizes
https://hackernoon.com/tips-to-reduce-docker-image-sizes-876095da3b34

The first stage builds the application in an image aliased as builder.
The second stage copies only the product of the first stage into a fresh image with a single COPY layer, so none of the first stage's intermediate layers end up in the final image.
FROM node:10-alpine as builder
WORKDIR /web-console
ADD . /web-console
RUN npm install
RUN npm run build
FROM node:10-alpine
WORKDIR /web-console
COPY --from=builder /web-console /web-console
CMD ["npm", "run", "docker-start"]
Here is an example for Java with Maven, in two stages:
FROM maven:3.5.2-jdk-8-alpine as BUILD
WORKDIR /app
COPY pom.xml /app/
COPY src /app/src/
RUN mvn clean package -DskipTests
FROM openjdk:8-jre-alpine
WORKDIR /app
COPY --from=BUILD /app/target/*.jar application.jar
ENTRYPOINT ["java", "-jar", "/app/application.jar"]
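The size benefit of a multi-stage build depends on how little you copy out of the builder. The toy model below treats each stage's filesystem as a dict of path to size and mimics COPY --from by keeping only paths under a prefix. It assumes, hypothetically, that npm run build writes its output to /web-console/dist and that only that directory is copied; all sizes (in MB) are invented for illustration.

```python
from typing import Dict

# Hypothetical sizes in MB, for illustration only.
NODE_BASE = {"/usr/local/bin/node": 70}


def stage(base: Dict[str, int], *layers: Dict[str, int]) -> Dict[str, int]:
    """Toy model: a stage's filesystem is its base plus every layer on top."""
    files = dict(base)
    for layer in layers:
        files.update(layer)
    return files


def copy_from(source: Dict[str, int], prefix: str) -> Dict[str, int]:
    """Mimic COPY --from=<stage> <prefix>: keep only paths under prefix."""
    return {path: size for path, size in source.items() if path.startswith(prefix)}


builder = stage(
    NODE_BASE,
    {"/web-console/node_modules": 200},  # npm install, incl. dev dependencies
    {"/web-console/dist": 5},            # npm run build (hypothetical output dir)
)
final = stage(NODE_BASE, copy_from(builder, "/web-console/dist"))

print(sum(builder.values()), sum(final.values()))  # 275 75
```

Copying the whole /web-console directory instead would carry node_modules along and most of the saving would disappear.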

If you base your image on Ubuntu, a smart move is to do this:
RUN apt-get update && apt-get install -y \
build-essential \
whatever-you-want \
vim \
&& rm -rf /var/lib/apt/lists/*
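The last line clears the downloaded package lists, which saves a lot :)
You should always run apt-get update in the same RUN line as apt-get install; otherwise the update step is cached and not re-run on later builds when you add another library to install, so the install runs against a stale package index.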
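The caching behavior behind that advice can be sketched with a toy build cache. As a simplification, it assumes a step's cache key is just its parent layer plus its instruction text (real Docker also hashes file contents for COPY/ADD). Changing the install line leaves a separate apt-get update step cached, while a chained line is re-run as a whole.

```python
import hashlib
from typing import List, Set, Tuple


def build(instructions: List[str], cache: Set[str]) -> List[Tuple[str, bool]]:
    """Toy build cache: a step is reused when the same instruction text was
    already run on top of the same parent layer.

    Returns (instruction, ran) pairs; ran=False means the cached layer was used.
    """
    parent = "FROM ubuntu"
    result = []
    for instruction in instructions:
        key = hashlib.sha256(f"{parent}|{instruction}".encode()).hexdigest()
        ran = key not in cache
        cache.add(key)
        result.append((instruction, ran))
        parent = key  # the next step builds on this layer
    return result


split_cache: Set[str] = set()
build(["RUN apt-get update", "RUN apt-get install -y gcc"], split_cache)
# Later you add vim to the install line:
split = build(["RUN apt-get update", "RUN apt-get install -y gcc vim"], split_cache)
print(split)
# [('RUN apt-get update', False), ('RUN apt-get install -y gcc vim', True)]

joined_cache: Set[str] = set()
build(["RUN apt-get update && apt-get install -y gcc"], joined_cache)
joined = build(["RUN apt-get update && apt-get install -y gcc vim"], joined_cache)
print(joined)
# [('RUN apt-get update && apt-get install -y gcc vim', True)]
```

In the split version, the cached (old) package index is reused while the install step runs again, which is exactly the stale-index problem described above.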

The image size of a container is an issue that should be addressed properly.
Some suggest using the Alpine distribution to save space.
That is, in principle, a good suggestion, as there is a Node.js image for Alpine that is ready to be used.
But you have to be careful here, because you have to build all the binaries. Even though node_modules usually contains just JavaScript packages, in some cases it contains binaries that have to be built.
If your Dockerfile works right now, this should not be your case, but as you're moving from Ubuntu to a different kind of image, keep in mind that any binaries you need in the future have to be compiled in an Alpine image.
That said, you should consider how you use your image before choosing where to cut size.
Is your application a single application that lives alone in its own container, without any other Node app around?
If the answer is no, be aware that the space used by the local Docker registry is not the sum of the sizes of the individual images.
Instead, you have to split each image into its layers and sum each unique layer.
What I mean here is that the size of a single image is not so important if you have many Node apps running on one host.
You can save space by sharing node_modules, exporting it as a volume that contains all the needed dependencies.
Or, better, you can start from an official Node.js image to create an intermediate image that contains the dependencies common to your apps, for example express and path, and then install the app-specific dependencies in each application image.
That way the common layers are shared, reducing the total space used by the local Docker registry.
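The "sum each unique layer" rule can be made concrete with a small sketch. It identifies layers by made-up names instead of content digests and uses invented sizes (in MB); it assumes two apps built on a shared intermediate image holding express and path, as suggested above.

```python
from typing import Dict, List

# Hypothetical layer sizes in MB, for illustration only.
LAYER_MB: Dict[str, int] = {
    "node-base": 900,    # official Node.js image
    "shared-deps": 10,   # intermediate image with express and path
    "app1-deps": 30,
    "app1-code": 1,
    "app2-deps": 20,
    "app2-code": 1,
}

IMAGES: Dict[str, List[str]] = {
    "app1": ["node-base", "shared-deps", "app1-deps", "app1-code"],
    "app2": ["node-base", "shared-deps", "app2-deps", "app2-code"],
}


def naive_total(images: Dict[str, List[str]]) -> int:
    """Add up each image's size as if no layer were shared."""
    return sum(LAYER_MB[layer] for layers in images.values() for layer in layers)


def disk_usage(images: Dict[str, List[str]]) -> int:
    """Count every distinct layer once, the way shared layers are stored."""
    unique = {layer for layers in images.values() for layer in layers}
    return sum(LAYER_MB[layer] for layer in unique)


print(naive_total(IMAGES), disk_usage(IMAGES))  # 1872 962
```

Each extra app built this way adds only its own deps and code layers to the real disk usage.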
Minor considerations
You don't need to install express and path globally inside a container image.
Do you really need vim in a container?
Consider that modifying a container is not safe, even in development. You can use a volume to point to resources on your host's file system.
Or copy a file or folder in or out of your container while it is running (docker cp).
And if you just need to read something, just use commands like less, more or cat.

Related

Docker build on top existing docker image

I have a Node.js app wrapped in a Docker image. Whenever I make any change to the code, even adding a few console.logs, I need to rebuild the whole image during the CI process, which takes over 10 minutes.
Is there a way to build one image on top of another, say adding only the delta?
Thanks
EDIT in response to The Fool's comment:
I'm running the CI using AWS tools, mainly CodeBuild. I can use the latest Docker image I created (all are stored in AWS ECR) and build based on it. Would it know to take only the delta, even if it conflicts with the new code?
Following is my Dockerfile:
FROM node:15.3.0
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY package.json /usr/src/app
RUN apt-get update
RUN apt-get install -y build-essential libcairo2-dev libpango1.0-dev libjpeg-dev libgif-dev librsvg2-dev
RUN npm install
COPY . /usr/src/app
EXPOSE 3000
CMD bash -c "npm run sequelize db:migrate&&npm run sequelize db:seed:all&&npm run prod"

How would I set up a Dockerfile that compiles using the rust image but depends on another image?

I have a rust service that is dependent on gstreamer and I would like to containerize it. There exists a rust image with debian and a gstreamer image with ubuntu on docker hub. I am still new to creating docker files so I'm not sure what the correct way to go about this is.
This is what I got from following along with this LogRocket article: https://blog.logrocket.com/packaging-a-rust-web-service-using-docker/. I modified it a bit to fit my needs.
FROM rust:1.51.0 as builder
RUN USER=root cargo new --bin ahps
WORKDIR /ahps
RUN touch ./src/lib.rs
RUN mv ./src/main.rs ./src/bin.rs
COPY ./Cargo.toml ./Cargo.toml
RUN cargo build --release
RUN rm -r src/*
ADD . ./
RUN rm ./target/release/deps/ahps*
RUN cargo build --release
FROM restreamio/gstreamer:latest-prod
ARG APP=/usr/src/ahps
COPY --from=builder /ahps/target/release/ahps ${APP}/ahps
WORKDIR ${APP}
CMD ["./ahps"]
This doesn't work because the rust build depends on gstreamer. I was able to get a working file by using apt-get to install gstreamer on top of the rust image, but the size went from ~700MB to 2.6GB due to the gstreamer image being much more optimized. This is the docker file that makes a 2.6gb image:
FROM rust:1.51.0
RUN apt-get update
RUN apt-get -y install libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav gstreamer1.0-doc gstreamer1.0-tools gstreamer1.0-x gstreamer1.0-alsa gstreamer1.0-gl gstreamer1.0-gtk3 gstreamer1.0-qt5 gstreamer1.0-pulseaudio
WORKDIR /ahps
COPY . .
RUN cargo build --release
WORKDIR /ahps/target/release
CMD ["./ahps"]
Overall I'm looking for a way to utilize both images where the gstreamer one is a dependency for my rust build. Thanks in advance.

When does docker-compose use already-built images, and when does it rebuild the local image?

With a Dockerfile like this:
FROM python:3.5.5-alpine
ARG CONTAINER_HOME_DIR
ENV CONTAINER_HOME_DIR=$CONTAINER_HOME_DIR
WORKDIR $CONTAINER_HOME_DIR
COPY wall_e/src/requirements.txt .
RUN apk add --update alpine-sdk libffi-dev && \
apk add freetype-dev && \
apk add postgresql-dev && \
pip install --no-cache-dir Cython && \
pip install --no-cache-dir -r requirements.txt && \
apk --update add postgresql-client
COPY wall_e/src ./
CMD ["./wait-for-postgres.sh", "db", "python", "./main.py" ]
If I then use this Dockerfile with docker-compose, at what point does docker-compose decide that it needs to rebuild the Docker image it uses to create the container?
Will it rebuild the image if I change the wall_e/src/requirements.txt file? Will it rebuild it if I change the RUN line, change any file located in wall_e/src, or change the COPY line or the CMD line entirely?
Let's assume that I am using docker-compose up without the --force-recreate option.
Once the image is built, docker-compose will not look into your dockerfile.
It will use that image, and only apply its configuration (docker-compose.yml).
So let's say you run docker-compose up, then edit your docker-compose.yml file, your requirements.txt and your Dockerfile. When you run docker-compose up again, only the docker-compose.yml changes will be taken into account (docker-compose restart doesn't even pick up those; it only restarts the existing containers).
If you need to rebuild your image, you have to ask for it explicitly:
docker-compose build [my_service]
Docker will not rebuild images unless 1) instructed to do so, or 2) the named image doesn't exist.
That being said, when you rebuild, it will use any cached information it has and will not re-process a step unless that instruction in the Dockerfile has been modified, or a file referenced via COPY has been modified. Every step after a re-processed step is also re-processed, because the build basically creates each new sub-image on top of the sub-image built by the previous step.
However, if you specify --no-cache, it will re-process all steps in the Dockerfile.
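The cascade described above can be sketched with a toy cache for the steps of the Dockerfile in the question. As a simplification, each step's key hashes its parent's key, its instruction text and, for COPY, a string standing in for the copied source's contents; the file contents used here are hypothetical. Because every key includes the parent's key, one changed step changes all later keys.

```python
import hashlib
from typing import Dict, List, Set

STEPS = [
    "COPY wall_e/src/requirements.txt .",
    "RUN pip install --no-cache-dir -r requirements.txt",
    "COPY wall_e/src ./",
]


def build(files: Dict[str, str], cache: Set[str], no_cache: bool = False) -> List[bool]:
    """Return, for each step in STEPS, whether it was re-run (True) or cached.

    Toy cache key: hash of the parent key, the instruction text and, for COPY,
    the content of the copied source. A changed key changes every later key.
    """
    parent = "FROM python:3.5.5-alpine"
    reran = []
    for step in STEPS:
        payload = step
        if step.startswith("COPY "):
            payload += files[step.split()[1]]  # content of the COPY source
        key = hashlib.sha256(f"{parent}|{payload}".encode()).hexdigest()
        reran.append(no_cache or key not in cache)
        cache.add(key)
        parent = key
    return reran


cache: Set[str] = set()
# Hypothetical file contents, for illustration only.
v1 = {"wall_e/src/requirements.txt": "pkg==1.0", "wall_e/src": "main.py v1"}
build(v1, cache)  # first build: every step runs
print(build({**v1, "wall_e/src": "main.py v2"}, cache))  # [False, False, True]
print(build({**v1, "wall_e/src/requirements.txt": "pkg==2.0"}, cache))  # [True, True, True]
print(build(v1, cache, no_cache=True))  # [True, True, True]
```

Editing only the source code re-runs just the last COPY, while editing requirements.txt re-runs the pip install and everything after it.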

How to Edit Docker Image?

I did a basic search in the community and could not find a suitable answer, so I am asking here. Sorry if it was asked earlier.
Basically, I am working on a project where we keep changing the code at regular intervals. So we need to build the Docker image every time, and because of that we need to install the dependencies from requirement.txt from scratch, which takes around 10 minutes every time.
How can I make changes directly to a Docker image, and how do I configure the entrypoint (in the Dockerfile) so that it reflects changes in a pre-built Docker image?
You don't edit an image once it's been built. You always run docker build from the start; it always runs in a clean environment.
The flip side of this is that Docker caches built images. If you had image 01234567, ran RUN pip install -r requirements.txt, and got image 2468ace0 out, then the next time you run docker build it will see the same source image and the same command, skip doing the work, and jump directly to the output image. A COPY or ADD whose files have changed invalidates the cache for that step and all the steps after it.
So the standard pattern is:
# arbitrary choice of language
FROM node:10
WORKDIR /app
# Copy in _only_ the requirements and package lock files
COPY package.json yarn.lock ./
# Install dependencies (once)
RUN yarn install
# Copy in the rest of the application and build it
COPY src/ src/
RUN yarn build
# Standard application metadata
EXPOSE 3000
CMD ["yarn", "start"]
If you only change something in your src tree, docker build will reuse the cached layers up to the COPY src/ step, since the package.json and yarn.lock files haven't changed, and only re-run the steps from there on.
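The effect of instruction order can be checked with a small sketch of the rule "the first step that reads a changed file, plus every step after it, re-runs". It compares the pattern above with a hypothetical Dockerfile that copies everything before installing; src/index.js is an invented file name, and metadata steps like EXPOSE and CMD are left out.

```python
from typing import List, Set, Tuple

# (instruction, source files it reads from the build context)
Step = Tuple[str, Set[str]]

DEPS_FIRST: List[Step] = [
    ("COPY package.json yarn.lock ./", {"package.json", "yarn.lock"}),
    ("RUN yarn install", set()),
    ("COPY src/ src/", {"src/index.js"}),
    ("RUN yarn build", set()),
]

COPY_ALL_FIRST: List[Step] = [
    ("COPY . .", {"package.json", "yarn.lock", "src/index.js"}),
    ("RUN yarn install", set()),
    ("RUN yarn build", set()),
]


def steps_rerun(steps: List[Step], changed: Set[str]) -> List[str]:
    """Return the instructions that re-run when the `changed` files are edited:
    the first step that reads one of them, plus every step after it."""
    for i, (_, inputs) in enumerate(steps):
        if inputs & changed:
            return [instruction for instruction, _ in steps[i:]]
    return []


print(steps_rerun(DEPS_FIRST, {"src/index.js"}))
# ['COPY src/ src/', 'RUN yarn build']
print(steps_rerun(COPY_ALL_FIRST, {"src/index.js"}))
# ['COPY . .', 'RUN yarn install', 'RUN yarn build']
```

With the dependencies copied first, a source-only change never re-runs the slow yarn install.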
In my case, I was facing the same problem: after minor changes, I was building the image again and again.
My old Dockerfile:
FROM python:3.8.0
WORKDIR /app
# Install system libraries
RUN apt-get update && \
apt-get install -y git && \
apt-get install -y gcc
# Install project dependencies
COPY ./requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt --use-deprecated=legacy-resolver
# Don't use terminal buffering, print all to stdout / err right away
ENV PYTHONUNBUFFERED 1
COPY . .
So what I did was create a base image first, like this (leaving out the last line, so my code is not copied in):
FROM python:3.8.0
WORKDIR /app
# Install system libraries
RUN apt-get update && \
apt-get install -y git && \
apt-get install -y gcc
# Install project dependencies
COPY ./requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt --use-deprecated=legacy-resolver
# Don't use terminal buffering, print all to stdout / err right away
ENV PYTHONUNBUFFERED 1
and then built this image using:
docker build -t my_base_img:latest -f base_dockerfile .
Then the final Dockerfile:
FROM my_base_img:latest
WORKDIR /app
COPY . .
When I was not able to bring up a container from this image because of issues in my copied Python code, I edited the code inside the container to fix them; this way I avoided building images again and again.
Once my code was fixed, I copied the changes from the container back to my code base and finally built the final image.
There are 4 steps:
1) Start the image you want to edit (e.g. docker run ...).
2) Modify the running container by shelling into it with docker exec -it <container-id> bash (you can get the container ID with docker ps).
3) Make any modifications (install new things, make a directory or file).
4) In a new terminal tab/window, run docker commit c7e6409a22bf my-new-image (substituting the ID of the container you want to save).
An example
# Run an existing image
docker run -dt existing-image
# See that it's running
docker ps
# CONTAINER ID IMAGE COMMAND CREATED STATUS
# c7e6409a22bf existing-image "R" 6 minutes ago Up 6 minutes
# Shell into it
docker exec -it c7e6409a22bf bash
# Make a new directory for demonstration purposes
# (note that this is inside the running container)
mkdir NEWDIRECTORY
# Open another terminal tab/window, and save the running container you modified
docker commit c7e6409a22bf my-new-image
# Inspect to ensure it saved correctly
docker image ls
# REPOSITORY TAG IMAGE ID CREATED SIZE
# existing-image latest a7dde5d84fe5 7 minutes ago 888MB
# my-new-image latest d57fd15d5a95 2 minutes ago 888MB

Docker-compose cannot find Java

I am trying to use a Python wrapper for a Java library called Tabula. I need both Python and Java images within my Docker container. I am using the openjdk:8 and python:3.5.3 images. I am trying to build the file using Docker-compose, but it returns the following message:
/bin/sh: 1: java: not found
when it reaches the line RUN java -version within the Dockerfile. The line RUN find / -name "java" also doesn't return anything, so I can't even find where Java is being installed in the Docker environment.
Here is my Dockerfile:
FROM python:3.5.3
FROM openjdk:8
FROM tailordev/pandas
RUN apt-get update && apt-get install -y \
python3-pip
# Create code directory
ENV APP_HOME /usr/src/app
RUN mkdir -p $APP_HOME/temp
WORKDIR /$APP_HOME
# Install app dependencies
ADD requirements.txt $APP_HOME
RUN pip3 install -r requirements.txt
# Copy source code
COPY *.py $APP_HOME/
RUN find / -name "java"
RUN java -version
ENTRYPOINT [ "python3", "runner.py" ]
How do I install Java within the Docker container so that the Python wrapper class can invoke Java methods?
This Dockerfile cannot work, because the multiple FROM statements at the beginning don't mean what you think they mean. They don't mean that all the contents of the images you refer to in the FROM statements will somehow end up in the image you're building; over the history of Docker they have meant two different things:
In newer versions of Docker: multi-stage builds, which are a very different thing from what you're trying to achieve (but very interesting nonetheless).
In earlier versions of Docker: simply the ability to build multiple images in one Dockerfile.
The behavior you are describing makes me assume you are using such an earlier version. Let me explain what's actually happening when you run docker build on this Dockerfile:
FROM python:3.5.3
# Docker: "The User wants me to build an Image that is based on python:3.5.3. No Problem!"
# Docker: "Ah, the next FROM Statement is coming up, which means that the User is done with building this image"
FROM openjdk:8
# Docker: "The User wants me to build an Image that is based on openjdk:8. No Problem!"
# Docker: "Ah, the next FROM Statement is coming up, which means that the User is done with building this image"
FROM tailordev/pandas
# Docker: "The User wants me to build an Image that is based on tailordev/pandas. No Problem!"
# Docker: "A RUN Statement is coming up. I'll put this as a layer in the Image the user is asking me to build"
RUN apt-get update && apt-get install -y \
python3-pip
...
# Docker: "EOF Reached, nothing more to do!"
As you can see, this is not what you want.
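The same conclusion can be expressed as a toy model: with several FROM lines and no COPY --from, each FROM starts from a fresh filesystem, and only the last stage becomes the resulting image. The single "marker file" per base image is a hypothetical stand-in for its real contents, and /usr/bin/pip3 stands in for what apt-get install python3-pip adds.

```python
from typing import Dict, List, Set

# Hypothetical contents of each base image, reduced to one marker file.
BASE_FILES: Dict[str, Set[str]] = {
    "python:3.5.3": {"/usr/local/bin/python3"},
    "openjdk:8": {"/usr/bin/java"},
    "tailordev/pandas": {"/usr/bin/python3"},
}


def output_image(from_lines: List[str], run_adds: Set[str]) -> Set[str]:
    """Toy model of a Dockerfile with several FROM lines and no COPY --from.

    Every FROM starts from a fresh filesystem, so the resulting image holds
    only the last base image plus whatever the later RUN steps add.
    """
    filesystem: Set[str] = set()
    for base in from_lines:
        filesystem = set(BASE_FILES[base])  # reset: nothing is merged
    return filesystem | run_adds


image = output_image(
    ["python:3.5.3", "openjdk:8", "tailordev/pandas"],
    {"/usr/bin/pip3"},  # e.g. from apt-get install python3-pip
)
print("/usr/bin/java" in image)  # False
```

This is why RUN java -version fails: nothing from openjdk:8 survives into the stage that the RUN steps build on.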
What you should do instead is build a single image where you first install your runtimes (Python, Java, ...) and then your application-specific dependencies. You're already doing the second part; here's how you could go about installing the runtimes:
# Let's start from the Alpine Java Image
FROM openjdk:8-jre-alpine
# Install Python runtime
RUN apk add --update \
python \
python-dev \
py-pip \
build-base \
&& pip install virtualenv \
&& rm -rf /var/cache/apk/*
# Install your framework dependencies
RUN pip install numpy scipy pandas
# ... do the rest ...
Note that I haven't tested the above snippet; you may have to adapt a few things.
