We have Jenkins running within ECS, and we are using pipelines for our build and deploy process. The pipeline uses the Docker plugin to pull an image that has some dependencies for testing etc.; all of our steps then occur within this Docker container.
The issue we currently have is that our npm install takes about 8 minutes, and we would like to speed this process up. Because containers are torn down at the end of each build, the node_modules that are generated are disposed of. I've considered npm caching, but given the way Docker works this seemed pointless unless we pre-install the dependencies into the Docker image (which almost triples the size of the image). Are there simple solutions that will help our npm install speeds?
You should be using package caching, but not by caching node_modules directly. Instead, mount the cache directories that your package installer uses, and your installs will be blazing fast. Docker makes that possible by letting you mount host directories into a container so that they persist across builds.
For yarn, mount ~/.cache (or ~/.cache/yarn).
For npm, mount ~/.npm.
docker run -it -v ~/.npm:/.npm -v ~/.cache:/.cache -v /my-app:/my-app testing-image:1.0.0 bash -c 'npm ci && npm test'
Note: I'm using npm ci here, which always deletes node_modules and reinstalls using the exact versions in package-lock.json, so you get very consistent builds. (In yarn, the equivalent is yarn install --frozen-lockfile.)
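If the install step ever moves into the image build itself (rather than a docker run inside the pipeline), BuildKit cache mounts give the same effect without baking node_modules into the image. A minimal sketch, assuming a node:16 base image and BuildKit enabled; adapt the paths to your project:

# syntax=docker/dockerfile:1
FROM node:16
WORKDIR /app
COPY package.json package-lock.json ./
# the npm cache persists across builds without ending up in an image layer
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .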
You could set up an HTTP proxy and cache all dependencies (*)(**).
Then use --build-arg to set the HTTP_PROXY variable:
docker build --build-arg HTTP_PROXY=http://<cache ip>:3128 .
*: This will not improve performance for dependencies that need to be compiled (i.e. C/C++ bindings)
**: I use a Squid container to share cache configuration
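A rough sketch of that setup (the ubuntu/squid image and the 172.17.0.1 address, which is the default docker0 gateway on Linux, are assumptions, not part of the original configuration):

# start a caching proxy on the host
docker run -d --name squid -p 3128:3128 ubuntu/squid
# point the image build at it
docker build --build-arg HTTP_PROXY=http://172.17.0.1:3128 .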
In my case the culprit was corporate software installed on my computer: an antivirus was apparently scanning every node_modules file in the container, because I had mounted the project folder from the host machine. What I did was avoid mounting node_modules locally, and the install immediately sped up from 25 minutes to 5.
I have explained what I did, with a possible implementation, here. Instead of package-lock.json I used the npm ls command to check for changes in the node_modules folder, so that I could potentially skip the step of re-uploading the cached modules to the bind mount.
@bkucera's answer points you in the right direction with the bind mount. In general, the easiest option in a containerized environment is to create a volume storing the cached packages. These packages can be archived in a tarball, which is the most common option, or even compressed if necessary (files in a .tar are not compressed on their own).
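A rough sketch of the tarball-in-a-volume approach, reusing the testing-image from above (the npm_cache volume name is made up for the example):

docker run -v npm_cache:/cache -v "$PWD":/my-app -w /my-app testing-image:1.0.0 bash -c '
  # restore the cached node_modules if a previous build saved one
  if [ -f /cache/node_modules.tar ]; then tar -xf /cache/node_modules.tar; fi
  npm install
  # refresh the cache for the next build, then run the tests
  tar -cf /cache/node_modules.tar node_modules
  npm test'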
Related
I installed oyente using the Docker installation described at https://github.com/enzymefinance/oyente, using the following command.
docker pull luongnguyen/oyente && docker run -i -t luongnguyen/oyente
I can analyse older smart contracts, but I get a compilation error when I try it on newer contracts. I need to update the version of solc, but I haven't been able to.
On the container the current version is:
solc, the solidity compiler commandline interface
Version: 0.4.21+commit.dfe3193c.Linux.g++
I read that the best way to update it is to use npm, so I executed the following command, but I am getting errors; I assume the npm version is outdated as well.
docker exec -i container_name bash -c "npm install -g solc"
I would appreciate any help, as I have been trying to solve this for hours now. Thanks in advance,
Ferda
Docker's standard model is that an image is immutable: it contains a fixed version of your application and its dependencies, and if you need to update any of this, you need to build a new image and start a new container.
The first part of this, then, looks like any other Node package update. Install Node in the unlikely event you don't have it on your host system. Run npm update --save solc to install the newer version and update your package.json and package-lock.json files. This is the same update you'd do if Docker weren't involved.
Then you can rebuild your Docker image with docker build. This is the same command you ran to initially build the image. Once you've created the new image, you can stop, delete, and recreate your container.
# If you don't already have Node, get it
# brew install node
# Update the dependency
npm update --save solc
npm run test
# Rebuild the image
docker build -t image_name .
# Recreate the container
docker stop container_name
docker rm container_name
docker run -d --name container_name image_name
npm run integration
git add package*.json
git commit -m 'update solc version to 0.8.14'
Some common Docker/Node setups try to store the node_modules library tree in an anonymous volume. This can't be easily updated, and it hides the node_modules tree that gets built into the image. If you have this setup (maybe in a Compose volumes: block), I'd recommend deleting any volumes or mounts that hide the image contents.
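The setup being described usually looks something like this (illustrative, not taken from the question); the second volumes: entry is the one masking the image's node_modules and should be removed:

services:
  app:
    build: .
    volumes:
      - .:/app
      - /app/node_modules   # anonymous volume hiding the node_modules built into the image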
Note that this path doesn't use docker exec at all. Think of this like getting a debugger inside your running process: it's very useful when you need it, but anything you do there will be lost as soon as the process or container exits, and it shouldn't be part of your normal operational toolkit.
Use npm link for authoring multiple packages simultaneously in docker dev containers
PkgA is a dependency of PkgB, and I'm making changes to both. The goal is to be able to link PkgA into PkgB without publishing each small update and re-installing. npm/yarn link solves this, but I'm developing in Docker containers.
https://github.com/npm/npm/issues/14325
1. Create a directory on the host machine to serve as the global repo
(I like to make a docker dir and put all of my volumes in it)
mkdir -p ~/docker/volumes/yalc
2. Mount the volume in both (or more) dev containers
https://code.visualstudio.com/docs/remote/containers-advanced
devcontainer.json
...
"mounts": ["source=/Users/evan/docker/volumes/yalc,target=/yalc,type=bind,consistency=cached"],
...
and rebuild the container
3. Install yalc and publish the package (In dependency repo container)
https://www.npmjs.com/package/yalc
npm i yalc -g
yalc publish --store-folder /yalc
--store-folder tells yalc to publish the package to our volume
4. Link to the package in consuming repo
consider adding yalc to .gitignore first:
.yalc
yalc.lock
Run the link command
npm i yalc -g
yalc link PkgA --store-folder /yalc
Where PkgA is the name of the package as defined in its package.json
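5. (Optional) Pick up later changes to PkgA
This step is my addition rather than part of the original write-up; it reuses the --store-folder flag from the steps above, so treat the exact flags as an assumption:

# in PkgA's container, after each change
yalc publish --store-folder /yalc

# in PkgB's container
yalc update PkgA --store-folder /yalc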
I am in the process of creating a Dockerfile that can build a Haskell program. The Dockerfile uses Ubuntu Focal as a base image, installs ghcup, and then builds the Haskell program. There are multiple reasons why I am doing this: it can support a low-configuration CI environment, and it can help new developers who are trying to build a complicated project.
In order to speed up build times, I am using docker v20 with buildkit. I have a sequence of events like this (it's quite a long file, but this excerpt is the relevant part):
# installs haskell
WORKDIR $HOME
RUN git clone https://github.com/haskell/ghcup-hs.git
WORKDIR ghcup-hs
RUN BOOTSTRAP_HASKELL_NONINTERACTIVE=NO ./bootstrap-haskell
#RUN source ~/.ghcup/env # Uh-oh: can't do this.
# We recreate the contents of ~/.ghcup/env
ENV PATH=$HOME/.cabal/bin:$HOME/.ghcup/bin:$PATH
# builds application
COPY application $HOME/application
WORKDIR $HOME/application
RUN mkdir -p logs
RUN --mount=type=cache,target=$HOME/.cabal \
    --mount=type=cache,target=$HOME/.ghcup \
    --mount=type=cache,target=$HOME/application/dist-newstyle \
    cabal build |& tee logs/configure.log
But when I change some non-code files (README.md for example) in application, and build my docker image ...
DOCKER_BUILDKIT=1 docker build -t application/application:1.0 .
... it takes quite a bit of time and the output from cabal build includes a lot of Downloading [blah] followed by Building/Installing/Completed messages from cabal install.
However when I go into my container and type cabal build, it is much faster (it is already built):
host$ docker run -it application/application:1.0
container$ cabal build # this is fast
I would expect it to be just as fast in the prior case as well, since I have not really changed the code files, the dependencies are all downloaded, and I am using RUN --mount.
Are there files somewhere that my --mount=type=cache entries are not covering? Is there a package registry file somewhere that I need to include in its own --mount=type=cache line? As far as I can tell, my builds ought to be nearly instant instead of taking several minutes to complete.
I have a Docker container that I use to build software and generate shared libraries in. I would like to use those libraries in another Docker container for actually running applications. To do this, I run the build container with a mounted volume so that the libraries end up on the host machine.
My Dockerfile for the runtime container looks like this:
FROM openjdk:8
RUN apt update
ENV LD_LIBRARY_PATH /build/dist/lib
RUN ldconfig
WORKDIR /build
and when I run with the following:
docker run -u $(id -u ${USER}):$(id -g ${USER}) -it -v $(realpath .):/build runtime_docker bash
I do not see any of the libraries from /build/dist/lib in the ldconfig -p cache.
What am I doing wrong?
You need to COPY the libraries into the image before you RUN ldconfig; volumes won't help you here.
Remember that first you run a docker build command. That runs all of the commands in the Dockerfile, without any volumes mounted. Then you take that image and docker run a container from it. Volume mounts only happen when the docker run happens, but the RUN ldconfig has already happened.
In your Dockerfile, you should COPY the files into the image. There's no particular reason to not use the "normal" system directories, since the image has an isolated filesystem.
FROM openjdk:8
# Copy shared-library dependencies in
COPY dist/lib/libsomething.so.1 /usr/lib
RUN ldconfig
# Copy in the actual binary to run and set it as the default container command
COPY dist/bin/something /usr/bin
CMD ["something"]
If your shared libraries are only available at container run-time, the conventional solution (as far as I can tell) would be to include the ldconfig command in a startup script, and use the dockerfile ENTRYPOINT directive to make your runtime container execute this script every time the container runs.
This should achieve your desired behaviour, and (I think) should avoid needing to generate a new container image every time you rebuild your code. This is slightly different from the common Docker use case of generating a new image for every build by running docker build at build-time, but I think it's a perfectly valid use case, and quite compatible with the way Docker works. Docker has historically been used as a CI/CD tool to streamline post-build workflows, but it is increasingly being used for other things, such as the build step itself. This naturally means people are coming up with slightly different ways of using Docker to facilitate various new and different types of workflow.
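A minimal sketch of that approach (the entrypoint file name and the ld.so.conf.d entry are illustrative, not taken from the question):

# entrypoint.sh
#!/bin/sh
# register the bind-mounted library directory and refresh the linker cache at startup
echo /build/dist/lib > /etc/ld.so.conf.d/build.conf
ldconfig
exec "$@"

# Dockerfile
FROM openjdk:8
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["bash"]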
I'm executing the same docker-compose build command and I see that it misses the cache
Building service1
Step 1/31 : FROM node:10 as build-stage
---> a7366e5a78b2
Step 2/31 : WORKDIR /app
---> Using cache
---> 8a744e522376
Step 3/31 : COPY package.json yarn.lock ./
---> Using cache
---> 66c9bb64a364
Step 4/31 : RUN yarn install --production
---> Running in 707365c332e7
yarn install v1.21.1
..
..
..
As you can see, the cache was missed, but I couldn't understand why.
What is the best method to debug what changed and figure out why?
EDIT: The question is not about debugging my specific problem, but about how to generally debug a problem of this kind. How can I know WHY docker-compose thinks things changed (although I'm pretty sure NOTHING changed), and which files/commands/results are different?
how can I generally debug a problem of this kind. How can I know WHY docker-compose thinks things changed (although I'm pretty sure NOTHING changed), which files/commands/results are different
In general, as shown here:
I'm a bit bummed that I can't seem to find any way to make the Docker build more verbose
But when it comes to docker-compose, it depends on your version and option used.
moby/moby issue 30081 explains (by Sebastiaan van Stijn (thaJeztah):
Current versions of docker-compose and docker build in many (or all) cases will not share the build cache, or at least not produce the same digest.
The reason for that is that when sending the build context with docker-compose, it will use a slightly different compression (docker-compose is written in Python, whereas the docker cli is written in Go).
There may be other differences due to them being a different implementation (and language).
(that was also discussed in docker/compose issue 883)
The next release of docker compose will have a (currently opt-in) feature to make it use the actual docker CLI to perform the build (by setting a COMPOSE_DOCKER_CLI_BUILD=1 environment variable). This was implemented in docker/compose#6865 (1.25.0-rc3+, Oct. 2019).
With that feature, docker compose can also use BuildKit for building images (setting the DOCKER_BUILDKIT=1 environment variable).
I would highly recommend using buildkit for your builds if possible.
When using BuildKit (requires Docker 18.09 or up, and at this point is not supported for building Windows containers), you will see a huge improvement in build-speed, and the duration taken to send the build-context to the daemon in repeated builds (buildkit uses an interactive session to send only those files that are needed during build, instead of uploading the entire build context).
So double-check first if your docker-compose uses BuildKit, and if the issue (caching not reused) persists then:
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose build
Sebastiaan added in issue 4012:
BuildKit is still opt-in (because there's no Windows support yet), but is production quality, and can be used as the default for building Linux images.
Finally, I realize that for the Azure pipelines, you (probably) don't have control over the versions of Docker and Docker Compose installed, but for your local machines, make sure to update to the latest 19.03 patch release; for example, Docker 19.03.3 and up have various improvements and fixes in the caching mechanisms of BuildKit (see, e.g., docker#373).
Note, in your particular case, even though this is not the main issue in your question, it would be interesting to know if the following helps:
yarnpkg/yarn/issue 749 suggests:
You wouldn't mount the Yarn cache directory. Instead, you should make sure you take advantage of Docker's image layer caching.
These are the commands I am using:
COPY package.json yarn.lock ./
RUN yarn --pure-lockfile
Then try your yarn install command, and see whether Docker still fails to use its cache:
RUN yarn install --frozen-lockfile --production && yarn cache clean
Don't forget a yarn cache clean in order to prevent the yarn cache from winding up in docker layers.
If the issue persists, switch to BuildKit directly (for testing), with buildctl build --progress=plain for more verbose output, and debug the caching situation from there.
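For reference, a plain buildctl invocation looks roughly like this (assuming a running buildkitd and a Dockerfile in the current directory):

buildctl build --frontend dockerfile.v0 \
  --local context=. --local dockerfile=. \
  --progress=plain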
Typically, a multi-stage approach, as shown here, can be useful:
FROM node:alpine
WORKDIR /usr/src/app
COPY . /usr/src/app/
# We don't need to do this cache clean, I guess it wastes time / saves space: https://github.com/yarnpkg/rfcs/pull/53
RUN set -ex; \
    yarn install --frozen-lockfile --production; \
    yarn cache clean; \
    yarn run build
FROM nginx:alpine
WORKDIR /usr/share/nginx/html
COPY --from=0 /usr/src/app/build/ /usr/share/nginx/html
As noted by nairum in the comments:
I just found that it is required to use cache_from to make caching working when I use my multi-stage Dockerfile with Docker Compose.
From the documentation:
cache_from defines a list of sources the Image builder SHOULD use for cache resolution.
Cache location syntax MUST follow the global format [NAME|type=TYPE[,KEY=VALUE]].
Simple NAME is actually a shortcut notation for type=registry,ref=NAME.
build:
  context: .
  cache_from:
    - alpine:latest
    - type=local,src=path/to/cache
    - type=gha