Docker image size comes to 1.7 GB for Ubuntu with Python packages

Here is my Dockerfile:
FROM ubuntu:18.04 AS builder
RUN apt update -y
RUN apt install python3.8 -y && apt install python3-pip -y
RUN apt install build-essential automake pkg-config libtool libffi-dev libgmp-dev -y
RUN apt install libsecp256k1-dev -y
RUN apt install openjdk-8-jre -y
RUN apt install git -y
RUN apt install libkrb5-dev -y
RUN apt install vim -y
RUN mkdir /opt/app
RUN chown -R root:root /opt/app
COPY ["requirements.txt","/opt/app/requirements.txt"]
SHELL ["/bin/bash", "-c"]
WORKDIR /opt/app
RUN pip3 install -r requirements.txt && apt-get -y clean all
RUN mkdir /opt/app/
RUN chown -R root:root /opt/app/
RUN cd /opt/app/
RUN git clone -b master https://bitbucket.org/heroes/test.git
CMD ["bash","/opt/app/bin/connect.sh"]
The resulting image is 1.7 GB. I need OpenJDK, so I cannot use a standard Python image as the base. When I run docker history, I can see two or three layers (the ones installing Python 3.8, OpenJDK and libsecp256k1-dev) taking up 400 MB to 500 MB. The Ubuntu base image takes only 64 MB; the rest of the size comes from my Dockerfile layers.
I believe I need to rewrite the Dockerfile to reduce the size. I tried, but it made no real difference.
Please help me get the image below 1 GB at least.
[Update]
Below is my updated Dockerfile:
FROM ubuntu:18.04 AS builder
WORKDIR /opt/app
COPY requirements.txt /opt/app/aws/requirements.txt
RUN mkdir -p /opt/app/aws \
&& apt-get update -yq \
&& apt-get install -y python3.8 python3-pip openjdk-8-jre -yq && apt-get -y clean all \
&& chown -R root:root /opt/app && cd /opt/app/aws && pip3 install -r requirements.txt
FROM alpine
COPY --from=builder /opt/app /opt/app
SHELL ["/bin/bash", "-c"]
CMD ["bash","/opt/app/aws/bin/connector/connect.sh"]
After removing unneeded packages such as git and switching to a multi-stage build, the image is now approximately 1.7 GB, which I believe is a lot. Any suggestions to improve this?

You have several issues going on.
First, each RUN apt install creates its own layer and adds to the image size. Put them all in a single RUN instruction and delete the cached apt files at the end of that same instruction; files deleted in a later layer still count toward the image size.
Second, you're installing unnecessary packages. Why would you need vim and git, for instance? Why install build-essential and other build tooling if you're not building anything?
Third, it seems you tried a multi-stage build but ended up putting everything into the same image. Read up on Python multi-stage builds.
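A minimal sketch of the three points above, assuming the Python dependencies can be installed into a virtual environment in one stage and copied into the next. The /opt/venv path, the use of python3-venv, and copying the application from the build context instead of cloning it with git are assumptions for illustration, not part of the original Dockerfile. Any shared libraries your packages load at run time would still have to be installed in the final stage.

```dockerfile
# Build stage: compilers and development headers live only here.
FROM ubuntu:18.04 AS builder
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      python3 python3-venv python3-dev build-essential \
      automake pkg-config libtool libffi-dev libgmp-dev \
      libsecp256k1-dev libkrb5-dev \
 && rm -rf /var/lib/apt/lists/*
COPY requirements.txt /tmp/requirements.txt
# /opt/venv is a hypothetical location for the virtual environment.
RUN python3 -m venv /opt/venv \
 && /opt/venv/bin/pip install --no-cache-dir -r /tmp/requirements.txt

# Final stage: same base image, runtime packages only.
FROM ubuntu:18.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 openjdk-8-jre-headless \
 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /opt/app
# Assumes the application code is available in the build context.
COPY . /opt/app
CMD ["bash", "/opt/app/bin/connect.sh"]
```

Both stages use the same base image on purpose: the virtual environment points at the system python3, so the interpreter in the final stage has to match the one used to build it.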

Following best practices, use a single RUN instead of multiple RUN instructions.
For example:
RUN apt-get update -yq \
&& apt-get install -yq python3-dev build-essential curl \
&& pip install -r requirements.txt \
&& apt-get purge -y --auto-remove gcc python3-dev build-essential
You can also use a multi-stage build: if you don't need git in the final image, leave it out of the final stage.
If possible, you can also use an Alpine-based image.

Try disabling APT's recommended packages with --no-install-recommends.
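A minimal sketch of what that looks like, using package names from the question. As far as I know, python3-pip lists build-essential and python3-dev among its recommended packages on Debian and Ubuntu, so skipping recommendations can save a lot of space. The trade-off is that Python packages which need compiling will fail to install unless you add the build tools explicitly.

```dockerfile
FROM ubuntu:18.04
# --no-install-recommends installs hard dependencies only;
# removing the apt lists in the same RUN keeps them out of the layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      python3-pip python3-setuptools openjdk-8-jre-headless \
 && rm -rf /var/lib/apt/lists/*
```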

Now the image is smaller:
FROM ubuntu:18.04 AS builder
RUN apt update -y
RUN apt install python3-pip -y
RUN apt install build-essential automake pkg-config libtool libffi-dev libgmp-dev -y
RUN apt install libsecp256k1-dev -y
RUN apt install openjdk-8-jre-headless -y
RUN apt install git -y
RUN apt install libkrb5-dev -y
RUN apt install vim -y
RUN mkdir /opt/app
RUN chown -R root:root /opt/app
COPY ["requirements.txt","/opt/app/requirements.txt"]
SHELL ["/bin/bash", "-c"]
WORKDIR /opt/app
RUN pip3 install -r requirements.txt && apt-get -y clean all
RUN mkdir /opt/app/
RUN chown -R root:root /opt/app/
RUN cd /opt/app/
RUN git clone -b master https://bitbucket.org/heroes/test.git
CMD ["bash","/opt/app/bin/connect.sh"]

Related

Error running python code via docker image

I have Python code that runs fine when pulling data from an API, but I have trouble running it in Docker. I use pyodbc to load the data into SQL Server. Here is my Dockerfile:
FROM python:3.9.2
RUN apt-get update -y && apt-get install -y --no-install-recommends \
unixodbc-dev \
unixodbc \
libpq-dev
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD ["python3","LoadAPI_data.py"]
After building the image, I get the following error when I try to run it:
Error !!!!: ('01000', "[01000] [unixODBC][Driver Manager]Can't open
lib 'ODBC Driver 17 for SQL Server' : file not found (0)
(SQLDriverConnect)")
Can anyone tell me how to get rid of this error?
I was able to get my code running by updating my Dockerfile to install the Microsoft ODBC driver for SQL Server (msodbcsql17) as well as Python. Here is my new Dockerfile:
FROM ubuntu:18.04
RUN apt-get update -y && \
apt-get install -y \
libpq-dev \
gcc \
python3-pip \
unixodbc-dev
RUN apt-get update && apt-get install -y \
curl apt-utils apt-transport-https debconf-utils gcc build-essential g++-5\
&& rm -rf /var/lib/apt/lists/*
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install -y --allow-unauthenticated msodbcsql17
RUN pip3 install pyodbc
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD ["python3","LoadAPI_data.py"]
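The same driver can probably be installed without leaving the python base image. The sketch below assumes that python:3.9.2 is based on Debian 10 (buster) and that Microsoft's Debian 10 package list is the matching one; both are assumptions to verify against your image before relying on this.

```dockerfile
FROM python:3.9.2
# Register Microsoft's package repository (Debian 10 list assumed), then
# install the driver named in the error message plus the unixODBC headers.
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
 && curl https://packages.microsoft.com/config/debian/10/prod.list \
      > /etc/apt/sources.list.d/mssql-release.list \
 && apt-get update \
 && ACCEPT_EULA=Y apt-get install -y --no-install-recommends \
      msodbcsql17 unixodbc-dev \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "LoadAPI_data.py"]
```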

Docker image build fails on file add

This is my Dockerfile:
FROM debian:latest
LABEL MAINTAINER DINESH
LABEL version="1.0"
LABEL description="First image with Dockerfile & DINESH."
RUN apt-get clean
RUN apt-get update
RUN apt-get install -qy git
RUN apt-get install -qy locales
RUN apt-get install -qy nano
RUN apt-get install -qy tmux
RUN apt-get install -qy wget
RUN apt-get install -qy python3
RUN apt-get install -qy python3-psycopg2
RUN apt-get install -qy python3-pystache
RUN apt-get install -qy python3-yaml
RUN apt-get -qy autoremove
# ** ERROR IS BELOW **
ADD .bashrc /root/.bashrc
ADD .profile /root/.profile
ADD app /app
RUN locale-gen C.UTF-8 && /usr/sbin/update-locale LANG=C.UTF-8
ENV PYTHONIOENCODING UTF-8
ENV PYTHONPATH /app/
When I run docker build -t myimage ., I get the error below:
"Step 17/20 : ADD app /app
ADD failed: stat /var/lib/docker/tmp/docker-builder687980062/.bashrc: no such file or directory"
I granted permissions on the path shown above, but that did not resolve it. Please let me know how I can solve this.
First, make sure the file exists in the right directory; the error says there is no such file or directory.
Also, try COPY instead of ADD; that works for me:
COPY .bashrc /root/
COPY .profile /root/
Make sure the file exists at the source location and that the destination is correct.
As a best practice, you can also merge the lines into a single command:
RUN apt-get update -yq \
&& apt-get install -yq python3-dev build-essential curl \
&& pip install -r requirements.txt \
&& apt-get purge -y --auto-remove gcc python3-dev build-essential
Change to:
ADD .bashrc /root/
ADD .profile /root/
ADD app /
From the documentation:
ADD <src>... <dest>
The <dest> is an absolute path, or a path relative to WORKDIR, into which the source will be copied inside the destination container.
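Both answers rely on the source files being present in the build context. A minimal sketch, assuming .bashrc, .profile and the app directory sit next to the Dockerfile and the build is started from that directory. It is also worth checking that a .dockerignore file does not exclude them.

```dockerfile
FROM debian:latest
# Source paths are resolved relative to the build context (the "." in
# `docker build -t myimage .`), not relative to the host filesystem root.
# With more than one source, the destination must end with a slash.
COPY .bashrc .profile /root/
COPY app /app
```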

How to install AWS CLI in docker container based on image “java:8”

I have a Dockerfile that is like:
FROM java:8
LABEL maintainer="CMS"
RUN apt-get install python-pip
RUN pip install awscli
....
.....
[Error: Unable to locate package python-pip]
My end goal is to have Java 8 and the AWS CLI installed. I also don't want to use curl statements in the Dockerfile, and I don't want to use the plain ubuntu image.
How should I go about doing it?
The error means apt cannot find the python-pip package, so pip never gets installed. The package lists are usually empty in a fresh image, so run apt-get update before installing.
Try updating your Dockerfile to:
FROM java:8
LABEL maintainer="CMS"
RUN apt-get update && apt-get install -y \
software-properties-common
RUN add-apt-repository universe
RUN apt-get update && apt-get install -y \
python3.4 \
python3-pip
RUN pip3 install awscli
....
.....
If you want to base it on top of openjdk:8 image, try the following:
FROM openjdk:8
RUN set -eux; \
apt-get update; \
apt-get install -y --no-install-recommends \
python3-setuptools \
python3-pip \
; \
rm -rf /var/lib/apt/lists/*
RUN pip3 --no-cache-dir install -U awscli
RUN apt-get clean
The other option is to use the Alpine distribution:
FROM openjdk:8-alpine
RUN set -eux; \
apk add --no-cache python3 ; \
pip3 --no-cache-dir install -U awscli
Sources:
https://bitbucket.org/vodkaseledka/openjdk8-awscli
https://bitbucket.org/vodkaseledka/openjdk8-awscli-alpine
Or you can get pre-builds from here:
https://hub.docker.com/repository/docker/savnn/openjdk8-awscli
https://hub.docker.com/repository/docker/savnn/openjdk8-awscli-alpine
This works for me. Create a Dockerfile:
FROM openjdk:8-alpine
RUN apk update;
RUN set -eux; \
apk add python3 ; \
pip3 --no-cache-dir install -U awscli; \
pip3 install --upgrade pip;
RUN apk add groff
Build it with docker build . -t aws, then run it with docker run -it aws /bin/sh

How to reduce multistage build duplicate steps time cost issue?

I have a Go application that depends on cgo. It needs libsodium-dev, libzmq3-dev and libczmq-dev at build time, and it needs the same three packages at run time.
Currently I use the following multi-stage build: a golang build environment as the first stage and debian slim as the second stage. As you can see, the three packages are installed twice, which wastes time (and I may add more such packages later).
FROM golang:1.12.9-buster AS builder
WORKDIR /src/pigeon
COPY . .
RUN apt-get update && \
apt-get install -y --no-install-recommends libsodium-dev && \
apt-get install -y --no-install-recommends libzmq3-dev && \
apt-get install -y --no-install-recommends libczmq-dev && \
go build cmd/main/pgd.go
FROM debian:buster-slim
RUN apt-get update && \
apt-get install -y --no-install-recommends libsodium-dev && \
apt-get install -y --no-install-recommends libzmq3-dev && \
apt-get install -y --no-install-recommends libczmq-dev && \
apt-get install -y --no-install-recommends python3 && \
apt-get install -y --no-install-recommends python3-pip && \
pip3 install jinja2
WORKDIR /root/
RUN mkdir logger
COPY --from=builder /src/pigeon/pgd .
COPY --from=builder /src/pigeon/logger logger
CMD ["./pgd"]
Of course, I could give up the multi-stage build and use golang:1.12.9-buster for both building and running, but that would make the final image bigger (avoiding that is the advantage of a multi-stage build).
Am I missing something, or do I have to choose between the two?
This is my take on your question:
FROM debian:buster-slim as base
RUN mkdir /debs /debs_tmp \
&& chmod 777 /debs /debs_tmp
WORKDIR /debs
RUN apt-get update \
&& apt-get install -y -d \
--no-install-recommends \
-o dir::cache::archives="/debs_tmp/" \
libsodium-dev \
libzmq3-dev \
libczmq-dev \
&& mv /debs_tmp/*.deb /debs \
&& rm -rf /debs_tmp \
&& apt-get install -y --no-install-recommends \
python3 \
python3-pip \
&& pip3 install jinja2 \
&& rm -rf /var/lib/apt/lists/*
##################
FROM golang:1.12.9-buster AS builder
COPY --from=base /debs /debs
WORKDIR /debs
RUN dpkg -i *.deb
WORKDIR /src/pigeon
COPY . .
RUN go build cmd/main/pgd.go
##################
FROM base
# The base stage only downloaded the packages (-d), so install them here.
RUN dpkg -i /debs/*.deb && rm -rf /debs
WORKDIR /root/
RUN mkdir logger
COPY --from=builder /src/pigeon/pgd .
COPY --from=builder /src/pigeon/logger logger
CMD ["./pgd"]
You can download the required packages into a temporary folder, move the .deb files to a new location, and then COPY them into the next stage. The final stage simply builds on the first image you created.
By the way, the containers will run as root. This might be an issue depending on what the software does; you may want to use an unprivileged user instead.
EDIT: sorry for the edits, but I ran a couple of examples locally and didn't have a Go program ready.
At the COPY . . step, any time your source changes, the cache will bust and you will run all later steps again. You can reorder the steps to allow docker to cache the install of your dependencies. You can also join the apt-get install commands into one to reduce overhead of processing the package manager db.
FROM golang:1.12.9-buster AS builder
WORKDIR /src/pigeon
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libsodium-dev \
libzmq3-dev \
libczmq-dev
COPY . .
RUN go build cmd/main/pgd.go
FROM debian:buster-slim
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libsodium-dev \
libzmq3-dev \
libczmq-dev \
python3 \
python3-pip \
&& pip3 install jinja2
WORKDIR /root/
RUN mkdir logger
COPY --from=builder /src/pigeon/pgd .
COPY --from=builder /src/pigeon/logger logger
CMD ["./pgd"]
You will still install the packages twice, but now those installs are cached for future builds. The way to reuse the install of the libraries is to reorder the steps, installing the libraries in a common base image, and then install the go compiler on your build stage, but that will almost certainly be more overhead than installing libraries twice.
With BuildKit, you could share the apt cache between builds using an experimental syntax, but this requires that all builds use BuildKit (the syntax is not backwards compatible), and modifying docker's Debian image to preserve the apt package cache. From the BuildKit experimental documentation, there's the following example for apt:
# syntax = docker/dockerfile:experimental
FROM ubuntu
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
apt update && apt install -y gcc
https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
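In later releases the cache mount no longer needs the experimental frontend; to my knowledge it is available under the stable docker/dockerfile:1 syntax. Here is a sketch of the builder stage from the question using it, assuming BuildKit is enabled (for example with DOCKER_BUILDKIT=1). The sharing=locked option is added so that parallel builds do not use the apt cache at the same time.

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.12.9-buster AS builder
# Stop the base image from deleting downloaded packages after each install,
# otherwise the cache mounts below would stay empty.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
      > /etc/apt/apt.conf.d/keep-cache
# The apt caches persist between builds but are not part of the image.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update \
 && apt-get install -y --no-install-recommends \
      libsodium-dev libzmq3-dev libczmq-dev
WORKDIR /src/pigeon
COPY . .
RUN go build cmd/main/pgd.go
```

The same two preparation steps and mounts would have to be repeated in the debian:buster-slim stage for it to reuse the downloaded packages.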

E: Unable to locate package in multistage Docker build

When I build just the main image, all the packages install. But as soon as I turn it into a multi-stage build and it gets to RUN apt-get install -y python3-pip, I get "E: Unable to locate package in multistage Docker build"
FROM gcc:8.2.0 as builder
# FROM ownyourbits/debiandev:latest
RUN apt-get update
# RUN apt-get install -y libxerces-c-dev automake cmake libboost-all-dev build-essential
RUN apt-get install -y libxerces-c-dev automake cmake libboost-all-dev build-essential
RUN git clone https://github.com/mypackage/mypackage-d.git
WORKDIR /mypackage-d/
RUN autoreconf -if
RUN ./configure --enable-silent-rules 'CFLAGS=-g -O0 -w' 'CXXFLAGS=-g -O0 -w' 'LDFLAGS=-g -O0 -w'
RUN make
RUN make install
RUN ls .
# Main Image
FROM library/python:3.7-stretch
COPY --from=builder /mypackage-d/mypackaged.bin /mypackage-d
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN apt-get install -y postgresql-client
RUN apt-get install -y libxerces-c-dev
# For VIM
RUN apt-get install -y apt-file
RUN apt-file update
RUN apt-get install -y vim
RUN pip install --upgrade pip
COPY requirements.txt /
RUN pip3 install --trusted-host pypi.org -r /requirements.txt
WORKDIR /code
ENTRYPOINT ["/bin/bash", "start.sh"]
Moving the COPY --from=builder command below the apt-get install and pip install statements worked for me.

Resources