How can I build an image in Docker without downloading all dependencies every time?

I have a Django app that uses Docker and has a bunch of library dependencies in requirements.txt. Any time I add a new dependency, I have to rebuild the image, and it downloads all of the dependencies from scratch. Is there a way to cache dependencies when building a Docker image?

The most common solution is to create a new base image on top of the one that already has all the dependencies. However, if you update your dependencies very regularly, it might be easier to set up a CI process where you build a new base image every so often (every week? every day?).
A multi-stage build might not help here, because the dependencies are part of your base image: when you run docker build . it will still want to download all the dependencies as soon as it hits pip3 install -r requirements.txt.
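A minimal sketch of that base-image approach (the Python version, image names, and CMD are placeholders, not taken from the question):
# Dockerfile.base: rebuild only when requirements.txt changes
# (or on the CI schedule mentioned above), then push it to your registry.
FROM python:3.9-slim
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt

# Dockerfile: the day-to-day build; it reuses the dependency layers
# already baked into the base image instead of downloading them again.
FROM myregistry/myapp-base:latest
WORKDIR /app
COPY . /app
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]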

Related

Build .deb package in one Docker build stage, install in another stage, but without the .deb package itself taking up space in the final image?

I have a multistage Docker build file. The first stage creates an installable package (e.g. a .deb file). The second stage needs only to install the .deb package.
Here's an example:
FROM debian:buster AS build_layer
COPY src/ /src/
WORKDIR /src
RUN ./build_deb.sh
# A lot of stuff happens here and a big huge .deb package pops out.
# Suppose the package is 300MB.
FROM debian:buster AS app_layer
COPY --from=build_layer /src/myapp.deb /
RUN dpkg -i /myapp.deb && rm /myapp.deb
ENTRYPOINT ["/usr/bin/myapp"]
# This image will be well over 600MB, since it contains both the
# installed package as well as the deleted copy of the .deb file.
The problem with this is that the COPY instruction runs in its own layer, which puts the large .deb package into the final image. The next step installs the package and removes the .deb file, but since the COPY layer still exists underneath, the .deb package still takes up room in the final image. If it were a small package you might just deal with it, but in my case the package file is hundreds of MB, so its presence in the final image's layers increases the image size appreciably with no benefit.
There are related posts on SO, such as this one which discusses files containing secrets, and this one which is about copying a large installer from outside the container into it (and the solution for that one is still kinda janky, requiring you to run a temporary local HTTP server). However, neither of these addresses the situation of needing to copy from another build stage without retaining the copied file in the final image.
The only way I could think of to do this would be to extend the idea of a web server and make available an SFTP or similar server so that the build layer can upload the package somewhere. But this also requires extra infrastructure, and now you're also dealing with SSH secrets and such, and this starts to get real complex and is a lot less reproducible on another developer's system or in a CI/CD environment.
Alternatively I could use the --squash option in BuildKit, but this ends up killing the advantages of the layer system. I can't then reuse similar layers across multiple images (e.g. the image now can't take advantage of the fact that the Debian base image might exist on the end user's system already). This would minimize space usage, but wouldn't be ideal for a lot of other reasons.
What's the recommended way to approach this?
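For reference, one way to avoid the extra COPY layer without any extra infrastructure is a BuildKit bind mount, which exposes a file from another stage for a single RUN instruction without it ever landing in a layer. This is only a sketch, not part of the original posts; it assumes BuildKit with Dockerfile syntax 1.2 or later and reuses the stage names from the example above:
# syntax=docker/dockerfile:1
FROM debian:buster AS build_layer
COPY src/ /src/
WORKDIR /src
RUN ./build_deb.sh

FROM debian:buster AS app_layer
# The .deb is only mounted for the duration of this RUN instruction,
# so no layer of the final image ever contains it.
RUN --mount=type=bind,from=build_layer,source=/src/myapp.deb,target=/tmp/myapp.deb \
    dpkg -i /tmp/myapp.deb
ENTRYPOINT ["/usr/bin/myapp"]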

Get only production dependencies from .yarn/cache to build Docker image

I would like to build a Docker image using a multi-stage build.
We are using Yarn 2 and the Zero-Installs feature, which stores dependencies as zip files in .yarn/cache.
To minimize the size of my Docker image, I would like to have only the production dependencies.
Previously, we would run
yarn install --non-interactive --production=true
But doing that with an older version of Yarn means we don't benefit from the .yarn/cache folder, and it takes time to download dependencies that are already there but not readable by that older version of Yarn.
Is there a way to tell Yarn 2 to take only the production dependencies from the .yarn/cache folder and put them into another one? Then I could copy that folder into my image and save time and space.
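One direction worth testing (an assumption on my part, not something stated in the question) is Yarn's workspace-tools plugin, whose yarn workspaces focus --production command installs only production dependencies; whether the local .yarn/cache ends up pruned of dev-only zips depends on your cache settings, so check the resulting image size. A rough multi-stage sketch:
FROM node:16 AS deps
WORKDIR /app
# With Zero-Installs the copied .yarn/cache already holds every dependency zip.
COPY . .
# Assumes the workspace-tools plugin has been imported
# (yarn plugin import workspace-tools); this re-runs the install
# with devDependencies excluded.
RUN yarn workspaces focus --all --production

FROM node:16
WORKDIR /app
# Carry over the project with its production-only install state.
COPY --from=deps /app/ ./
CMD ["yarn", "node", "src/index.js"]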

How to configure docker specific image dependencies which are managed in a different source code repository

How do I configure Docker-specific artifact dependencies which are managed in a different source code repository? My Docker image depends on jar files (say project-auth) and configuration (say project-theme) which are actually maintained in a different repository than the Docker image.
What would be the best way to copy the dependencies for a Docker image (say, a project-deploy repo) prior to building the image? I.e. in the above case, project-deploy needs the jar files and configuration, which need to be mounted as a volume from the current folder.
I don't want these to be committed, as the dependencies tend to get stale, and I want the Docker image creation to be part of the build process itself.
You can use Docker multi-stage builds for this purpose.
With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.
For example:
Suppose that the source code for the dependencies is present in the repo "https://github.com/demo/demo.git".
Using multi-stage builds, you can create a stage in which you clone the git repo and create the dependency jar (or anything else that you need) at build time.
Finally, you copy the jar into your final image.
# Use any base image. I took centos here
FROM centos:7 as builder
# Install only those packages which are required.
RUN yum install -y maven git \
 && git clone <YOUR_GIT_REPO_URL> /myfolder
WORKDIR /myfolder
# Build the jar at image build time. You can update this step according to your project requirements.
RUN mvn clean package
# From here, your normal Dockerfile steps start.
FROM centos:7
# Add all the necessary steps required to build your image
.
.
.
# This is how you copy the jar that was created in the builder stage above into your final Docker image.
COPY --from=builder SOURCE_PATH DESTINATION_PATH
Please refer to the Docker documentation to get a better understanding of multi-stage builds.

Build multiple docker images without building binaries in each Dockerfile

I have a .NET Core solution with 6 runnable applications (APIs) and multiple netstandard projects. In a build pipeline on Azure DevOps I need to create 6 Docker images and push them to the Azure Registry.
Right now what I do is build image by image, and every one of these 6 Dockerfiles builds the solution from scratch (restores, builds, publishes). This takes a few minutes, and the whole pipeline runs for almost 30 minutes.
My goal is to optimize the build time. I figured out two possible, parallel ways of doing that:
remove restore and build, run just publish (because it restores references and does the same thing as build)
publish the code once (for all runnable applications) and in Dockerfiles just copy binaries, without building again
Are both ways doable? I can't figure out how to make the second one work: should I just run dotnet publish for each runnable application and then gather all the Dockerfiles within the folder with the binaries and run docker build? My concern is that I will need to copy the required .dll files into the image, but how do I choose which ones without explicitly specifying them?
EDIT:
I'm using Linux containers. I don't write my Dockerfiles - they are autogenerated by Visual Studio. I'll show you one example:
FROM mcr.microsoft.com/dotnet/core/aspnet:2.2-stretch-slim AS base
WORKDIR /app
EXPOSE 80
EXPOSE 443
FROM mcr.microsoft.com/dotnet/core/sdk:2.2-stretch AS build
WORKDIR /src
COPY ["Application.WebAPI/Application.WebAPI.csproj", "Application.WebAPI/"]
COPY ["Processing.Dependency/Processing.Dependency.csproj", "Processing.Dependency/"]
COPY ["Processing.QueryHandling/Processing.QueryHandling.csproj", "Processing.QueryHandling/"]
COPY ["Model.ViewModels/Model.ViewModels.csproj", "Model.ViewModels/"]
COPY ["Core.Infrastructure/Core.Infrastructure.csproj", "Core.Infrastructure/"]
COPY ["Model.Values/Model.Values.csproj", "Model.Values/"]
COPY ["Sql.Business/Sql.Business.csproj", "Sql.Business/"]
COPY ["Model.Events/Model.Events.csproj", "Model.Events/"]
COPY ["Model.Messages/Model.Messages.csproj", "Model.Messages/"]
COPY ["Model.Commands/Model.Commands.csproj", "Model.Commands/"]
COPY ["Sql.Common/Sql.Common.csproj", "Sql.Common/"]
COPY ["Model.Business/Model.Business.csproj", "Model.Business/"]
COPY ["Processing.MessageBus/Processing.MessageBus.csproj", "Processing.MessageBus/"]
COPY ["Processing.CommandHandling/Processing.CommandHandling.csproj", "Processing.CommandHandling/"]
COPY ["Processing.EventHandling/Processing.EventHandling.csproj", "Processing.EventHandling/"]
COPY ["Sql.System/Sql.System.csproj", "Sql.System/"]
COPY ["Application.Common/Application.Common.csproj", "Application.Common/"]
RUN dotnet restore "Application.WebAPI/Application.WebAPI.csproj"
COPY . .
WORKDIR "/src/Application.WebAPI"
RUN dotnet build "Application.WebAPI.csproj" -c Release -o /app
FROM build AS publish
RUN dotnet publish "Application.WebAPI.csproj" -c Release -o /app
FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "Application.WebApi.dll"]
One more thing: the problem is that Azure DevOps has a job which builds an image, and I just copied this job 6 times, pointing every copy to a different Dockerfile. That's why they don't reuse the code. I would love to change that so they are based on the same binaries. Here are the steps in Azure DevOps:
Get sources
Build and push image no. 1
Build and push image no. 2
Build and push image no. 3
Build and push image no. 4
Build and push image no. 5
Build and push image no. 6
Every single 'Build and push image' does:
dotnet restore
dotnet build
dotnet publish
I want to get rid of this overhead - is it possible?
It's hard to say without seeing your Dockerfiles, but you probably are making some mistakes that are adding time to the image build. For example, each command in a Dockerfile results in a layer. Docker caches these layers and only rebuilds the layer if it or previous layers have changed.
A very common mistake people make is to copy their entire project with all the files within first, and then run dotnet restore. When you do that, any change to any file invalidates that copy layer and thus also the dotnet restore layer, meaning that you have to restore packages every single build. The only thing necessary for the dotnet restore is the project file(s), so if you copy just those, run dotnet restore, and then copy all the files, those layers will be cached, unless the project file itself changes. Since that normally only happens when you change packages (add, update, remove, etc.), most of the time, you will not have to repeat the restore step, and the build will go much quicker.
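Distilled to its essentials, the cache-friendly ordering looks like this (the project name is a placeholder, not one of the projects above):
# Copy only the project file(s) first so this layer, and the restore
# layer below, stay cached until a .csproj actually changes.
COPY ["MyApp/MyApp.csproj", "MyApp/"]
RUN dotnet restore "MyApp/MyApp.csproj"
# Now copy the rest of the source; code edits no longer invalidate the restore.
COPY . .
RUN dotnet publish "MyApp/MyApp.csproj" -c Release -o /app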
Another issue can occur when you're using npm and Linux images on Windows. This one bit me personally. In order to support Linux images, Docker uses a Linux VM (MobyLinux). At the start of a build, Docker first lifts the entire filesystem context (i.e. the directory where you run the docker command) into the MobyLinux VM, since all the Dockerfile commands will actually be run in the VM and the files therefore need to reside there. If you have a node_modules directory, it can take a significant amount of time to move all of that over. You can solve this by adding node_modules to your .dockerignore file.
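For example, a .dockerignore along these lines keeps the heaviest regenerable folders out of the build context (the exact entries depend on your project; only node_modules comes from the paragraph above, the rest are typical additions):
# .dockerignore
node_modules
bin
obj
.git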
There are other, similar types of mistakes you might be making; we'd really need to see your Dockerfiles to help you further. Regardless, you should not go with either of your proposed approaches. Just running publish will suffer from the same issues described above and gives you no recourse to alleviate the problem at that point. Publishing outside of the image can lead to platform inconsistencies and other problems unless you're very careful, and it adds a bunch of manual steps to the image-building process, which defeats a lot of the benefit Docker provides. Your images will be larger as well, unless you just happen to publish on exactly the same architecture as the one the image will use. If you're developing on Windows but using Linux images, for example, you'll have to include the full ASP.NET Core runtime. If you build and publish within the image, you can keep the SDK in a build-and-publish stage only, and then target something like Alpine Linux with a self-contained, architecture-specific publish.

Installing Git Release in Docker

If I want to install code from a release version on GitHub in Docker, how can I do that while taking up the least space possible in the image? Currently, I've done something like:
RUN wget https://github.com/some/repo/archive/v1.5.1.tar.gz
RUN tar -xvzf v1.5.1.tar.gz
WORKDIR /unzipped-1.5.1/
RUN make; make install
The issue here is that the final image will contain the downloaded tarball, the unzipped source, and everything that gets created during make. I don't need the vast majority of this. How do I install my library in my image without keeping all of this extra data?
This is the textbook definition of the problem that Docker multi-stage builds aim to solve.
The idea is to do the build in a separate stage that has all the build dependencies, and then use that stage's output to produce the final image.
Note that this is available only in newer versions of Docker (17.05 onwards).
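Applied to the example above, a sketch might look like this. It reuses the placeholder URL and directory from the question, assumes a Debian base image, and assumes the project's make install supports a DESTDIR-style staged install, which not every Makefile does:
FROM debian:buster AS build
RUN apt-get update && apt-get install -y --no-install-recommends wget ca-certificates build-essential
RUN wget https://github.com/some/repo/archive/v1.5.1.tar.gz \
 && tar -xvzf v1.5.1.tar.gz
WORKDIR /unzipped-1.5.1/
# Stage the installed files under /staging instead of the image root.
RUN make && make install DESTDIR=/staging

FROM debian:buster
# Only the installed files are copied across; the tarball, the source tree,
# and the build toolchain never reach this image.
COPY --from=build /staging/ /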
