If I want to install code from a release version in Github in Docker, how can I do that taking up the least space possible in the image? Currently, I've done something like:
RUN wget https://github.com/some/repo/archive/v1.5.1.tar.gz¬
RUN tar -xvzf v1.5.1.tar.gz¬
WORKDIR /unzipped-1.5.1/¬
RUN make; make install
Issue here is the final image will have the downloaded tar, the unzipped version, and everything that gets created during make. I don't need the vast majority of this. How do I install my library in my image without keeping all of this extra data?
This is the textbook definition of the problem that the docker multi-stage build aims to solve.
The idea is to use a separate build with the dependencies and use that docker image to build the final product.
Note that this is available only in the new versions of Docker (17.05 onwards).
Related
I have a multistage Docker build file. The first stage creates an installable package (e.g. a .deb file). The second stage needs only to install the .deb package.
Here's an example:
FROM debian:buster AS build_layer
COPY src/ /src/
WORKDIR /src
RUN ./build_deb.sh
# A lot of stuff happens here and a big huge .deb package pops out.
# Suppose the package is 300MB.
FROM debian:buster AS app_layer
COPY --from=build_layer /src/myapp.deb /
RUN dpkg -i /myapp.deb && rm /myapp.deb
ENTRYPOINT ["/usr/bin/myapp"]
# This image will be well over 600MB, since it contains both the
# installed package as well as the deleted copy of the .deb file.
The problem with this is that the COPY stage runs in its own layer, and drops the large .deb package into the final build context. Then, the next step installs the package and removes the .deb file. However, since the COPY stage has to execute independently, the .deb package still takes up room in the final image. If it were a small package you might just deal with it, but in my case the package file is hundreds of MB, so its presence in the final build layers does increase the image size appreciably with no benefit.
There are related posts on SO, such as this one which discusses files containing secrets, and this one which is for copying a large installer from outside the container into it (and the solution for that one is still kinda janky, requiring you to run a temporary local http server). However neither of these address the situation of needing to copy from another build stage but not retain the copied file in the final package.
The only way I could think of to do this would be to extend the idea of a web server and make available an SFTP or similar server so that the build layer can upload the package somewhere. But this also requires extra infrastructure, and now you're also dealing with SSH secrets and such, and this starts to get real complex and is a lot less reproducible on another developer's system or in a CI/CD environment.
Alternatively I could use the --squash option in BuildKit, but this ends up killing the advantages of the layer system. I can't then reuse similar layers across multiple images (e.g. the image now can't take advantage of the fact that the Debian base image might exist on the end user's system already). This would minimize space usage, but wouldn't be ideal for a lot of other reasons.
What's the recommended way to approach this?
I would to build a Docker image using multi-stage.
We are using yarn 2 and Zero installs feature which stores dependencies in .yarn/cache under zip format.
To minimize the size of my Docker image, I would like to only have the production dependencies.
Previsously, we would do
yarn install --non-interactive --production=true
But by doing that with a former version of yarn, we don't benefit from the .yarn/cache folder and it takes time to download dependencies whereas there are already here but not readable by the former version of yarn.
Is there a way to tell yarn 2 to get only production dependencies from the .yarn/cache folder and put it into another one ? Thus I could copy this folder inside my image and save time and space.
How to configure docker specific artifact dependencies which are managed in a different source code repository. My docker image depends on jar files (say project-auth), configuration (say project-theme) which is actually maintained in a different repository than the docker image.
What would be the best way to copy dependencies for a docker image (say project-deploy repo), prior to building the image. i.e in the above case project-deploy needs jar files and configuration which needs to be mounted as a volume from the current folder.
I don't want this to be committed as the dependencies tend to get stale and I want the docker image creation to be part of the build process itself.
You can use Docker multi-stage builds for this purpose.
With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.
For example:
Suppose that the source code for dependencies is present in repo - "https://github.com/demo/demo.git"
Using multi stage builds, you can create a stage in which you'll clone the git repo and create the dependency Jar (or anything else that you need) at runtime.
At last, you can copy the jar into your final image.
# Use any base image. I took centos here
FROM centos:7 as builder
# Install only those packages which are required.
RUN yum install -y maven git \
&& git clone <YOUR_GIT_REPO_URL>
WORKDIR /myfolder
# Create jar at run time. You can update this step according to your project requirements.
RUN mvn clean package
# From here our normal Dockerfile steps starts.
FROM centos:7
# Add all the necessary steps required to build your image
.
.
.
# This is how you can copy the jar which was created above (Step 4) in your final docker image.
COPY --from=builder SOURCE_PATH DESTINATION_PATH
Please refer this to get a better understanding about multi stage builds in docker.
I have a Django app that uses Docker and has a bunch of library dependencies in the requirements.txt Any time I add a new dependency, I have to re-build the image and it downloads all of the dependencies from scratch. Is there a way to cache dependencies when building a docker image?
The most common solution is to create a new base image on top the one that already has all the dependencies. However, if you update all your dependencies very regularly, it might be easier to set a CI process where you build a new base image every so often (every week? every day?)
Multistage might not work in Docker because the dependencies are part of your base image, so then you do docker build . it will always want to pull all the dependencies when you do a pip3 install -r requirements.txt
Is there a way to only download the dependencies but do not compile source.
I am asking because I am trying to build a Docker build environment for my bigger project.
The Idear is that during docker build I clone the project, download all dependencies and then delete the code.
Then use docker run -v to mount the frequently changing code into the docker container and start compiling the project.
Currently I just compile the code during build and then compile it again on run. The problem ist that when a dependencie changes I have to build from scratch and that takes a long time.
Run sbt's update command. Dependencies will be resolved and retrieved.