Is there a lightweight GCC distribution that I can install in Alpine?
I am trying to make a small Docker image. For that reason, I am using Alpine as the base image (5MB). The standard GCC install dwarfs this in comparison (>100MB).
So is there a lightweight GCC distribution that I can install on Alpine?
Note: Clang is much worse (475MB last I checked).
There isn't such an image available, AFAIK, but you can make GCC slimmer by deleting unneeded GCC binaries.
It very much depends on what capabilities are required from GCC.
As a starting point, I'm assuming you need C support only, which means installing the gcc and musl-dev packages (for the standard headers); that results in a ~100MB image on Alpine 3.8.
If you don't need Objective-C support, you could remove cc1obj, the Objective-C compiler proper. On Alpine 3.8, it is located at /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/cc1obj and takes up 17.6MB.
If you don't need link-time optimization (LTO), you could remove the LTO wrapper and main executables, lto-wrapper and lto1, which take up 700KB and 16.8MB respectively.
While LTO can be powerful, on most applications it is likely to yield only minor speed and size improvements (a few percent). You also have to opt in to LTO explicitly, which most applications don't do, so it may be a good candidate for removal.
You could remove the Java front end, gcj, which doesn't seem to be working anyway. It is located at /usr/bin/x86_64-alpine-linux-musl-gcj and weighs 812KB.
By removing these, and squashing the resulting image, it shrinks to 64.4MB, which is still considerably large. You may be able to shrink it further by removing additional files, but then you may lose some desired functionality at a less appealing tradeoff.
Here's an example Dockerfile:
FROM alpine:3.8
RUN set -ex && \
    apk add --no-cache gcc musl-dev
RUN set -ex && \
    rm -f /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/cc1obj && \
    rm -f /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/lto1 && \
    rm -f /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/lto-wrapper && \
    rm -f /usr/bin/x86_64-alpine-linux-musl-gcj
Tested using:
sudo docker image build --squash -t alpine-gcc-minimal .
Related
I am experimenting with docker's buildx and noticed that everything seems to be straightforward except for one thing. My Dockerfile needs to pull certain packages depending on the architecture.
For example, here's a piece of the Dockerfile:
FROM XYZ
# Set environment variable for non-interactive install
ARG DEBIAN_FRONTEND=noninteractive
# Run basic commands to update the image and install basic stuff.
RUN apt update && \
    apt dist-upgrade -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" && \
    apt autoremove -y && \
    apt clean -y && \
    ...
    # Install amazon-ssm-agent
    mkdir /tmp/ssm && \
    curl https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/debian_amd64/amazon-ssm-agent.deb -o /tmp/ssm/amazon-ssm-agent.deb && \
As you can see from above, the command is set to pull down the Amazon SSM agent using a hard-coded link.
What's the best way to approach this? Should I just modify this Dockerfile to create a bunch of if conditions?
Docker automatically defines a set of ARGs for you when you're using the BuildKit backend (which is now the default). You need to declare that ARG, and then (within the RUN command) you can use an environment variable $TARGETOS to refer to the target operating system (the documentation suggests linux or windows).
FROM ...
# Must be explicitly declared, and after FROM
ARG TARGETOS
# Then it can be used like a normal environment variable
RUN curl https://s3.amazonaws.com/ec2-downloads-$TARGETOS/...
There is a similar $TARGETARCH if you need to build either x86 or ARM images, but its values (amd64, arm64, arm) don't necessarily match what's in this URL, so you may need to reconstruct the Debian architecture string from it. You can set a shell variable within a single RUN command and it will last until the end of that command, but no longer. For example (the arm-to-armhf mapping is illustrative; use whatever Debian architecture names your URLs actually need):
ARG TARGETARCH
RUN DEBARCH="$TARGETARCH"; \
    if [ "$DEBARCH" = "arm" ]; then DEBARCH=armhf; fi; \
    curl .../debian_$DEBARCH/...
We use Docker to give us a well-defined build environment and to help with deterministic builds, but on my machine I get a tiny change in the build results when I build with Docker, which does not occur when I build without it.
I did pretty extensive testing and am out of ideas :(
I tested on the following systems:
A: My new PC without Docker
AD1: My new PC with Docker, using our Dockerfile based on ubuntu:18.04 compiled "a year ago"
AD2: My new PC with Docker, using our Dockerfile based on ubuntu:19.10 compiled now
B: My laptop (that I had copied the disk from to my new PC) without Docker
BD: My laptop with Docker
CD1: Co-worker's laptop with Docker, using our Dockerfile based on ubuntu:18.04 compiled "a year ago"
CD2: Co-worker's laptop with Docker, using our Dockerfile based on ubuntu:19.10 compiled now
DD: A Digital Ocean VPS with our Dockerfile based on ubuntu:18.04 compiled now
In all scenarios we got one of two build results, which I will call variant X and variant Y.
We got variant X using A, B, CD1, CD2 and DD.
We got variant Y using AD1, AD2 and BD.
The issue has been 100% reproducible across several releases of our Android app. It did not go away when I updated my Docker from 19.03.6 to 19.03.8 to match my co-worker's version. We both had Ubuntu 19.10 back then, and I still get the issue on Ubuntu 20.04.
I always freshly cloned our project into a new folder, used disorderfs to eliminate file-system ordering issues, and mounted the folder into the Docker container.
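For reference, a typical disorderfs invocation that forces a deterministic directory order looks something like this (the two paths are placeholders for the fresh clone and the mount point; --sort-dirents and --reverse-dirents are standard disorderfs options):
disorderfs --sort-dirents=yes --reverse-dirents=no ./project-clone ./project-sorted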
I doubt it's relevant but we are using this Dockerfile:
FROM ubuntu:18.04
RUN dpkg --add-architecture i386 && \
    apt-get update -y && \
    apt-get install -y software-properties-common && \
    apt-get update -y && \
    apt-get install -y wget \
        openjdk-8-jre-headless=8u162-b12-1 \
        openjdk-8-jre=8u162-b12-1 \
        openjdk-8-jdk-headless=8u162-b12-1 \
        openjdk-8-jdk=8u162-b12-1 \
        git unzip && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get autoremove -y && \
    apt-get clean
# download and install Android SDK
ARG ANDROID_SDK_VERSION=4333796
ENV ANDROID_HOME /opt/android-sdk
RUN mkdir -p /opt/android-sdk && cd /opt/android-sdk && \
    wget -q https://dl.google.com/android/repository/sdk-tools-linux-${ANDROID_SDK_VERSION}.zip && \
    unzip *tools*linux*.zip && \
    rm *tools*linux*.zip && \
    yes | $ANDROID_HOME/tools/bin/sdkmanager --licenses
Also, here are the build instructions I run that produce the different results. The diff itself can be found here.
Edit: I also filed it as a bug on the docker repo.
Docker is not fully architecture-independent. Different CPU architectures can produce more or less minute differences. Usually these shouldn't affect anything important, but they may change some optimisation decisions of a compiler and similar things. The effect is more visible between very different CPUs, like AMD64 vs ARM. For Java it shouldn't matter, but it seems that at least sometimes it does.
Another thing is network and DNS. When you run apt-get, wget and similar commands, they download code or binaries over the network. The result may differ depending on which DNS you use (which may lead to a different server or a different repo URL), and there can be minute differences between mirrors. In theory there should be no difference, but in practice there sometimes is, for example when a new version is being rolled out and is visible only on some nodes, when something bad has happened, or when you connect through some cache/proxy in between.
The latter can also create differences that appear over time: an app is compiled one month, someone tries to verify it a few weeks or months later, apt-get installs other versions of libraries, and in effect there are minute differences.
I'm not sure which of these applies here, but I have some ideas:
You may try making some small changes to the app so that it again builds identically on the most popular CPUs, do extensive testing, and then list the architectures on which it can be verified.
You could make the verification process a little more complex and non-free, so that users have to run a server instance with a specified architecture (on AWS, Google, Azure, Rackspace or another provider) and build and verify there. You may try to specify on exactly which types of machines the result will be the same and what the minimal requirements are (it may or may not run on free-plan instances).
You could check a diff of the created images' contents (not only the apk but the full system image); maybe something important differs between the Docker images on the machines that produce different results.
You could try to find as small an initial image as possible, and not let apt-get or other tools automatically install dependencies at their newest versions; instead, specify all dependencies and pin their versions, as in the sketch below.
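As a hedged illustration of that last point, version pinning in apt looks like this (the git and unzip versions below are placeholders; use apt-cache madison <package> to find the versions actually available to you, as the Dockerfile above already does for the OpenJDK packages):
FROM ubuntu:18.04
RUN apt-get update -y && \
    # pin exact versions instead of accepting whatever is newest
    apt-get install -y \
        openjdk-8-jdk-headless=8u162-b12-1 \
        git=1:2.17.1-1ubuntu0.1 \
        unzip=6.0-21ubuntu1 && \
    rm -rf /var/lib/apt/lists/*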
We have our Flask API in a Docker image; we push this image to a Bitbucket repository, and then a Bitbucket pipeline starts deploying.
Everything works as expected, but the compilation of OpenCV takes 15 minutes on average.
I would like to know if there is any way to avoid this compilation on every push to Bitbucket, something like caching.
I have read about caching on Bitbucket Pipelines, but it did not work as I expected.
This is part of my Dockerfile I would like to improve:
RUN mkdir /opt && cd /opt && \
    wget -q https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip && \
    unzip ${OPENCV_VERSION}.zip && \
    rm -rf ${OPENCV_VERSION}.zip && \
    mkdir -p /opt/opencv-${OPENCV_VERSION}/build && \
    cd /opt/opencv-${OPENCV_VERSION}/build && \
    CXX=/usr/bin/clang++ CC=/usr/bin/clang cmake \
        -D CMAKE_BUILD_TYPE=RELEASE \
        -D CMAKE_INSTALL_PREFIX=/usr/local \
        -D WITH_FFMPEG=NO \
        -D WITH_IPP=NO \
        -D WITH_OPENEXR=NO \
        -D WITH_TBB=YES \
        -D BUILD_EXAMPLES=NO \
        -D BUILD_ANDROID_EXAMPLES=NO \
        -D INSTALL_PYTHON_EXAMPLES=NO \
        -D BUILD_DOCS=NO \
        -D BUILD_opencv_python2=NO \
        -D BUILD_opencv_python3=ON \
        -D ENABLE_PYTHON3=ON \
        -D PYTHON3_EXECUTABLE=/usr/bin/python3 \
        .. && \
    make VERBOSE=1 -j8 && \
    make && \
    make install && \
    rm -rf /opt/opencv-${OPENCV_VERSION}
I expect some solution like just pointing to a pre-compiled version of the OpenCV API.
I have recently faced this problem and agree that the cache doesn't seem to work as expected. However, without looking at your entire Dockerfile it's hard to say. ADDs and COPYs will invalidate the cache, so I'd suggest you move this section up to the top if you can, before any files are added.
A better solution (if there is no pre-compiled version) is to use the concept of a base image, which is what I have done to cut my build time in half. Basically, you build a base image, flask-api-base, that installs all your packages and compiles OpenCV, and then your actual final image pulls FROM flask-api-base:latest and builds only your application-specific code. Just remember that if the base image changes, you may need to wipe your Bitbucket cache. A sketch of this split follows.
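A rough sketch of the two Dockerfiles (the image names and the python:3.7-slim base are placeholders, and the OpenCV step stands in for the long RUN from the question):
# Dockerfile.base -- rebuilt only when dependencies change
FROM python:3.7-slim
# ... the OpenCV download/cmake/make/make install RUN from the question ...

# Dockerfile -- rebuilt on every push
FROM flask-api-base:latest
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "app.py"]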
I'm unfamiliar with OpenCV but assume that, if there is a binary that you can use, that would be the ideal option.
I'm curious as to why this layer (RUN ...) isn't being cached between builds. It appears that you're cleanly separating the OpenCV make from the other statements in your Dockerfile, so this RUN should generate a distinct layer that's stable and thus reused across builds.
Does this statement occur after earlier statements (e.g. other RUNs) that do change? If so, you may want to reorder it and place it earlier in the Dockerfile so that its layer stays constant. See the best practices for the Dockerfile statements that generate layers.
Alternatively, you could make a separate image containing OpenCV and then FROM this image in your code builds. You may do this either using distinct Dockerfiles or multi-stage builds. This way, this image containing the OpenCV build would only be built on (your) demand and reused across subsequent builds.
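A minimal multi-stage sketch, assuming OpenCV was installed under /usr/local (the CMAKE_INSTALL_PREFIX in the question) in a hypothetical opencv-build image built beforehand from the OpenCV compilation steps:
# stage 1: the image in which OpenCV was already compiled
FROM opencv-build:latest AS opencv

# stage 2: the application image; copy only the installed artifacts
FROM python:3.7-slim
COPY --from=opencv /usr/local /usr/local
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt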
The solution I used was to create my own image, upload it to Docker Hub, and create a new image based on that.
The first Docker image contains all the basic libraries my system uses.
The second has the environment variables and the API itself.
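The build-and-push side of that approach might look like this (the Docker Hub account and image names are placeholders):
docker build -t myaccount/flask-api-base:latest -f Dockerfile.base .
docker push myaccount/flask-api-base:latest
# the application's Dockerfile then starts with:
# FROM myaccount/flask-api-base:latest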
Consider the following Dockerfile:
FROM alpine:edge
EXPOSE \
# web portal
8080 \
# backdoor
8081
Built like so:
docker build .
We observe such output:
Sending build context to Docker daemon 17.1TB
Step 1/2 : FROM alpine:edge
---> 7463224280b0
Step 2/2 : EXPOSE 8080 8081
---> Using cache
---> 7953f8df04d9
[WARNING]: Empty continuation line found in:
EXPOSE 8080 8081
[WARNING]: Empty continuation lines will become errors in a future release.
Successfully built 7953f8df04d9
So, given that it'll soon become illegal to put comments in the middle of a multi-line section: what's the new recommended way to comment multi-line commands?
This is particularly important for RUN commands, since we are encouraged to reduce image layers by &&ing commands together.
Not sure exactly when this was introduced, but I'm currently experiencing this in version:
🍔 docker --version
Docker version 17.07.0-ce, build 8784753
I'm using Docker's edge release stream, so maybe this will not yet look familiar if you are using Docker stable.
17.07.0-ce started to warn on empty continuation lines. However, it incorrectly treated comment-only lines as empty. This is fixed in moby#35004 and is included in 17.10.0-ce.
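In other words, once that fix is in, comment-only lines inside a multi-line instruction should no longer trigger the warning, and the pattern from the question becomes safe again:
EXPOSE \
    # web portal
    8080 \
    # backdoor
    8081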
On top of what others have said above (the warning might be caused by comments inside continuation blocks and/or by Windows CR/LF line endings, which dos2unix fixes), this message can also show up when your last command ends with a backslash \ character. For example, if you have this:
RUN apt-get update \
    && apt-get upgrade \
    && apt-get -y install build-essential curl gnupg libfontconfig ca-certificates bzip2 \
    && curl -sL https://deb.nodesource.com/setup_16.x | bash - \
    && apt-get -y install nodejs \
    && apt-get clean \
    && rm -rf /tmp/* /var/lib/apt/lists/* \
Notice the last \ at the end. This will get you the same error:
docker [WARNING]: Empty continuation line found in:
So, just remove that last \ and you're all set.
You could break the RUN commands out onto separate lines, and then use the experimental (at the time of writing*) --squash flag.
* note that it's been suggested that multi-stage builds might make --squash redundant. That is actively being discussed here, with a proposal open here.
If, like me, you came here with the same error but no comments in your Dockerfile's RUN instruction, you have either mixed or DOS line endings. Run dos2unix on your Dockerfile and that'll fix it.
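Assuming the dos2unix tool is installed, it converts the file's line endings in place:
dos2unix Dockerfile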
I'm trying to write a Dockerfile to build Kaldi (an open source speech recognition system) based on the "buildpack-deps:jessie-scm" image. This is my Dockerfile:
FROM buildpack-deps:jessie-scm
RUN apt-get update
RUN apt-get install -y python2.7 libtool python libtool-bin make
RUN mkdir /opt/kaldi
RUN git clone https://github.com/kaldi-asr/kaldi.git /opt/kaldi --depth=1
RUN ln -s -f bash /bin/sh
WORKDIR /opt/kaldi
RUN cd tools/extras && ./check_dependencies.sh
RUN cd tools && ./install_portaudio.sh
RUN cd tools && make -j 4 && make clean
RUN cd src && ./configure --shared --use-cuda=no && make depend && make -j 4 && make -j 4 online onlinebin online2 && make clean
This fails at the "check_dependencies.sh" script, which is complaining that various base dependencies aren't installed (g++, zlib, automake, autoconf, patch, bzip2) ... but the description of the image that I'm basing this on (https://github.com/docker-library/buildpack-deps/blob/587934fb063d770d0611e94b57c9dd7a38edf928/jessie/Dockerfile) suggests that all of these dependencies should be available in the base image. Why is my build failing here?
I should note that I've attempted these build steps on a bare Debian Jessie system with the required dependencies installed and they were successful there, so I don't think it's a problem with the build scripts provided with Kaldi, but definitely a Docker-related issue.
Looks like I've misunderstood the different tags for the buildpack-deps image. The *-scm tags don't add source-control tools on top of the bundled build tools and libraries; they add only the source-control tools, and the build tools are then layered on top of those in the full image. So I should just be using buildpack-deps:jessie, not buildpack-deps:jessie-scm (the latter is basically a bare Debian system with git etc. installed but nothing else).
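So the fix is a one-line change at the top of the Dockerfile:
FROM buildpack-deps:jessie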