I'm using Poetry in my project and am now working on a feature that will allow the app to run inside a Docker container. My Dockerfile currently looks like this:
COPY pyproject.toml /
...
RUN poetry install
The last command takes around 4 minutes, which is quite a lot, so I thought of caching these dependencies somehow. I'm trying to convert my pyproject.toml to requirements.txt so I can feed it to Docker, which would cache the install as long as the file hasn't changed since the last build.
Now I'm trying:
poetry export -f requirements.txt --output requirements.txt
It only writes dependencies from the [tool.poetry.dependencies] section, but I have other sections and would like their dependencies in my requirements.txt file as well. How should I modify the command above so that it takes dependencies from the other sections too?
P.S. Maybe you might know other ways of how to cache poetry install in docker, I'd really appreciate that!
I think you can do a two-step dependency install with Poetry to cache the dependencies, as in the example at https://pythonspeed.com/articles/poetry-vs-docker-caching/, so there is no need to migrate to requirements.txt. The idea is to copy only pyproject.toml and install the dependencies (this layer is cached and is rebuilt only when the toml changes), then copy your source files (which change more often than the toml) and run the install again. The linked article explains this in more detail.
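A minimal sketch of that two-step install might look like the following. The base image, the pip-based Poetry install, the /app working directory and the presence of a poetry.lock file are assumptions, not details from the question. The --no-root flag tells Poetry to install only the dependencies and skip the project itself.

```dockerfile
# Assumed base image; use whatever Python version the project targets.
FROM python:3.11-slim

# Assumed way of getting Poetry into the image.
RUN pip install poetry

WORKDIR /app

# Step 1: copy only the dependency manifests, so this layer and the
# install below are reused until one of these files changes.
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root

# Step 2: copy the source, which changes far more often, and install
# the project itself. The dependencies are already present.
COPY . .
RUN poetry install
```

With this ordering, a change to the source code should only invalidate the last two instructions, so the slow dependency install is skipped on most rebuilds.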
Related
I have two local (custom) NPM packages that I've used before. When I install them with npm i parentFolder/package1 parentFolder/package2, they install just fine. However, when I add them to my package.json file, and then copy / npm install in my Dockerfile, I get an error saying "Could not install from "x" as it does not contain a package.json file.". I honestly don't know what to make of this, since it does have a package.json file, and it installs just fine otherwise.
Is there a special step that I need to take for custom packages to work in docker, or am I just lost somehow? I've been staring at this error so long that my brain's a mess, so I probably forgot to add some details. If I forgot something you need to know in order to help, please let me know.
Docker-compose file:
volumes:
- ./sprinklers:/app
# - sprinklers_node_modules:/app/node_modules/
- ./sprinklers/node_modules_temp:/app/node_modules/
# - sprinklers_persistent:/app/.node-persist/
- ./sprinklers/.node-persist:/app/.node-persist/
As you can see, I was using named volumes, but tried going to mounted ones to see what happened.
Dockerfile:
FROM node:14.16.0-slim
WORKDIR /app
RUN npm install
The COPY statements here were removed, since all files are mounted in the docker-compose file.
I have a multi-stage Dockerfile (using BuildKit) which contains an initial stage to go get several tools that I need to use as binaries in a later stage.
The following example is the gist of it:
# syntax = docker/dockerfile:1.0-experimental
# Go build stage
FROM golang:1.14-alpine3.12 AS gobuild
RUN apk add --no-cache git
RUN GO111MODULE=on go get -v github.com/tool1/tool1
RUN GO111MODULE=on go get -v github.com/tool2/tool2
RUN GO111MODULE=on go get -v github.com/tool3/tool3
# ...
# Release stage
FROM base AS release
# Copy Go binaries
COPY --from=gobuild /go/bin/tool1 /usr/local/bin/tool1
COPY --from=gobuild /go/bin/tool2 /usr/local/bin/tool2
COPY --from=gobuild /go/bin/tool3 /usr/local/bin/tool3
# ...
This works well. The only issue I have is that every docker build requires the Go Modules to be downloaded again, i.e. they aren't cached.
After some research, I read about go mod download which is meant to cache Go Modules locally according to go.mod.
The go.mod file is a good solution for me, in that it states exact module versions; so when using Docker, caching will be much simpler since layers can be re-used unless go.mod has changed.
I easily achieve this by running go mod init github.com/me/myproject and then subsequent go get calls add the relevant modules to go.mod.
But I am missing the final piece, something similar to go mod download but with the same "output" as go get, which saves the built binaries to GOBIN.
Just to clarify, I am using binaries of tools built with Go, but my project itself isn't a Go app, it only makes use of these tools.
As suggested in the comments, using Go 1.16 and go install package@version serves as the best solution in my case.
My Dockerfile now looks like this:
# syntax = docker/dockerfile:1.0-experimental
# Go build stage
FROM golang:1.16-alpine3.12 AS gobuild
RUN apk add --no-cache git
RUN go install github.com/tool1/tool1@latest
RUN go install github.com/tool2/tool2@latest
RUN go install github.com/tool3/tool3@v1.0.0
# ...
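If the module downloads themselves are still slow, a BuildKit cache mount is another option that could be combined with go install. This is only a sketch: the cache mount is not part of the solution above, it assumes a Dockerfile frontend that supports RUN --mount=type=cache (for example docker/dockerfile:1.2 or later), and the tool paths are the same placeholders used in the question.

```dockerfile
# syntax = docker/dockerfile:1.2
# Go build stage
FROM golang:1.16-alpine3.12 AS gobuild
RUN apk add --no-cache git
# /go/pkg/mod is the module cache in the official golang images
# (GOPATH=/go). The cache mount keeps it between builds without
# adding it to an image layer.
RUN --mount=type=cache,target=/go/pkg/mod \
    go install github.com/tool1/tool1@latest
RUN --mount=type=cache,target=/go/pkg/mod \
    go install github.com/tool2/tool2@latest
```

The built binaries still end up in /go/bin, so the COPY --from=gobuild lines in the release stage would not need to change.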
I see that bundle install and yarn install are usually done in Dockerfile as:
RUN bundle install && yarn install
This means that if I modify the Gemfile or yarn.lock, I need to rebuild the image. I know that there is layer caching and that docker build will not rebuild layers other than the bundle install && yarn install layer, but it still means I have to run docker-compose up -d --build.
But I was wondering if it is ok to put these commands inside an entry script of docker-compose or in command as:
command: bundle install && yarn install && rails s
In this way, I believe, whenever I do docker-compose up -d, bundle install and yarn install will be executed without having to build the image.
I'm not sure whether this has any advantage over the conventional bundle install in the Dockerfile, other than not having to append --build to docker-compose up. I realize that if I do this, bundle install and yarn install will run even when there are no changes to the Gemfile or Yarn files; I guess that is one of the downsides.
Please correct me if it is not the ideal way to go.
New to docker world.
It wastes several minutes of time and uses up network bandwidth every time you start your application. When you're doing local development, it'd be the equivalent of doing this, every time you run the application:
rm -rf vendor node_modules
bundle install # from scratch
yarn install # from scratch
bundle exec rails s
A core part of Docker is rebuilding images (in the same way that languages like Go, Java, Typescript, etc. have a "build" phase). Trying to avoid image rebuilds isn't usually advisable. With a well-written Dockerfile, and particularly for an interpreted language, running docker build should be fairly efficient.
The one important trick is to copy the files that specify dependencies separately from the rest of your application. As soon as a Dockerfile COPY instruction encounters a file that has changed, layer caching is disabled for every instruction after it. Since dependencies change relatively infrequently, a sequence that first copies the dependency file, then installs the dependencies, then copies the application can jump straight to the last step if the dependency file hasn't changed.
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY package.json yarn.lock ./
RUN yarn install
COPY . ./
(Make sure to include the Bundler vendor directory and the node_modules directory in a .dockerignore file so the last COPY step doesn't overwrite what previously got installed.)
This question is opinion based. As you already found out yourself, it is a common practice to install dependencies (bundle, yarn, others) during the image build process, and not image run process.
The rationale is that you run more times than you build, and you want the run operation to start quickly.
In the same way that you do apt install... or yum install... in the build stage, you should normally do bundle install in the build stage as well.
That said, if it makes sense to you to bundle install as a part of the entrypoint, that is your choice. I suspect that after you do it, you will see that it is less common for a reason.
Another note about Docker layers: if the Gemfile changes, not only the layer that refers to it will be rebuilt, but all subsequent layers as well. For that reason, it is common to separate the copy of the dependency manifest (Gemfile.*) from the copy of the app, like this:
# Pre-install gems
COPY Gemfile* ./
RUN gem install bundler && \
bundle install --jobs=3 --retry=3
# Copy the rest of the app
COPY . .
So this way, if your app files change, but not the dependencies, the build will be faster.
Currently I am copying pre-downloaded packages into the Docker image and then installing them. The COPY layer has the same size as the directory being copied, even though the directory is erased in a later layer. The Dockerfile looks as follows:
COPY python-packages /tmp/python-packages
RUN pip install -f /tmp/python-packages --no-index <pkg-name> \
&& rm -rf /tmp/*
Is there a way to copy files without having a layer the same size as the directory being copied? Any way to reduce COPY layer size?
Unfortunately, at this time you cannot reduce the size of the layer or eliminate it: RUN, COPY and ADD each create a layer every time.
What you can do is use pip to install directly from your version control
e.g. pip install git+https://git.example.com/MyProject#egg=MyProject
More info: https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support
The downside is that pip needs access to the repository if your code is private, and the build needs network connectivity, either to your private network or to the internet depending on where the code is hosted, at docker build time.
You could also use a multi-stage build: install the Python module with pip in one image and then copy only the artifacts into the final image. I strongly advise against this unless you have no other choice and understand the risks. You would have to copy every folder and file that pip touches during the install, possibly create others that it expects to be present, and get the permissions right in the final image. That is hard to get right without diving deep into pip internals, and hard to maintain in the long run, since pip might change the location or structure of its files and folders in the future.
More on multi stage builds: https://docs.docker.com/develop/develop-images/multistage-build/
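One way to limit those risks is to install into a virtual environment, so that everything pip writes lives under a single directory. A rough sketch, assuming the same base image in both stages; the image tag, the /opt/venv path and the package name mypackage (standing in for <pkg-name>) are hypothetical:

```dockerfile
# Build stage: the pre-downloaded packages only ever exist here.
FROM python:3.11-slim AS builder
COPY python-packages /tmp/python-packages
RUN python -m venv /opt/venv \
 && /opt/venv/bin/pip install --no-index -f /tmp/python-packages mypackage

# Final stage: same base image, so the interpreter path the virtual
# environment points at is still valid.
FROM python:3.11-slim
# Only the virtual environment is copied, so the pre-downloaded
# packages never become a layer of the final image.
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
```

Virtual environments are not meant to be moved, so this relies on /opt/venv keeping the same path and the same Python installation in both stages.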
I am trying to set up a docker-compose architecture for local development and production, and I can't figure out at which point in a container's life it is best to install library dependencies. At the same time, I am not sure whether these should be placed in the container or in an external volume.
All my code is mounted in external volumes, so that changes are picked up immediately without rebuilding the containers, but I am not sure about libraries that need to be installed by pip (I am running a Python backend) and npm/yarn (for the webpack front end).
Placing requirements.txt and package.json into the containers and running pip install and yarn install in the container build process means that I have to rebuild the container any time the dependencies change, which is too much overhead.
Putting them in an external volume and running pip install and yarn install as part of the command of each container when it is started seems to solve the issue.
The build process of each container then contains only platform dependencies (e.g. installing Python, webpack or other platform tools), while libraries are installed after the container has started (with the CMD directive).
Is this the correct approach? I have seen a lot of examples doing exactly the opposite and running npm install in the build process of the container, but I don't see any advantage in that. Am I missing something?
Installing dependencies is usually part of the build process. Mounting code is a good trick when developing in order to get changes directly reflected.
As for adding requirements.txt or package.json: installing dependencies takes time, so you need to take advantage of Docker layer caching. In particular, you want to avoid cache invalidation.
For pip I suggest the following in the development phase: install the dependencies that you are unlikely to change in a separate RUN instruction. Your Dockerfile will look something like this:
FROM ..
RUN pip install package1 package2 package3 ...
ADD requirements.txt requirements.txt
RUN pip install -r requirements.txt
...
Keep only dependencies that might be changed in requirements.txt. Once you are done developing, add the packages back to the requirements.txt and build using the requirements file.
A similar approach would be adding two requirements files, and at the end combining them.
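The two-file variant could be sketched as follows. The file name requirements-base.txt, the base image and the /app directory are hypothetical; the point is only that the stable dependencies get their own cached layer ahead of the ones that change.

```dockerfile
# Assumed base image.
FROM python:3.11-slim
WORKDIR /app

# Dependencies that rarely change: this layer stays cached until
# requirements-base.txt itself is edited.
COPY requirements-base.txt ./
RUN pip install -r requirements-base.txt

# Dependencies still in flux: only this install is repeated when
# requirements.txt changes.
COPY requirements.txt ./
RUN pip install -r requirements.txt

COPY . .
```

Once development settles, the contents of the two files can be merged back into a single requirements.txt.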