Concourse CI: leverage docker image cache - docker

I totally understand that Concourse is meant to be stateless, but nevertheless is there any way to re-use already pulled docker images?
In my case, I build ~10 docker images which share the same base image, but each time a build is triggered Concourse pulls that base image 10 times.
Is it possible to pull that image once and re-use it later (at least within the scope of the same build) using the standard docker resource?
Yes, it should be possible to do that with a custom image and a shell script, but I'm not fond of reinventing the wheel.
If standard docker resource does not allow that, is it possible to extend it somehow to enable such behaviour?
--cache-from is not helpful, as the CI spends most of its time pulling the image, not building new layers.

Theory
First, some Concourse theory (at least as of v3.3.1):
People often talk about Concourse having a "cache", but misinterpret what that means. Every concourse worker has a set of volumes on disk which are left around, forming a volume cache. This volume cache contains volumes that have been populated by resource get and put and task outputs.
People also often misunderstand how the docker-image-resource uses Docker. There is no global docker server running with your Concourse installation; in fact, Concourse containers are not Docker containers, they are runC containers. Every docker-image-resource process (check, get, put) is run inside its own runC container, inside of which there is a local docker server running. This means that there's no global docker server that is pulling docker images and caching the layers for further use.
What this implies is that when we talk about caching with the docker-image-resource, it means loading or pre-pulling images into the local docker server.
Practice
Now to the options for optimizing build times:
load_base
Background
The load_base param in your docker-image-resource put tells the resource to first docker load an image (retrieved via a get) into its local docker server, before building the image specified via your put params.
This is useful when you need to pre-populate an image into your "docker cache." In your case, you would want to preload the image used in the FROM directive. This is more efficient because it uses Concourse's own volume caching to only pull the "base" once, making it available to the docker server during the execution of the FROM command.
Usage
You can use load_base as follows:
Suppose you want to build a custom python image, and you have a git repository with a file ci/Dockerfile as follows:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python python-pip
If you wanted to automate building/pushing of this image while taking advantage of Concourse volume caching as well as Docker image layer caching:
resources:
- name: ubuntu
  type: docker-image
  source:
    repository: ubuntu
- name: python-image
  type: docker-image
  source:
    repository: mydocker/python
- name: repo
  type: git
  source:
    uri: ...
jobs:
- name: build-image-from-base
  plan:
  - get: repo
  - get: ubuntu
    params: {save: true}
  - put: python-image
    params:
      load_base: ubuntu
      dockerfile: repo/ci/Dockerfile
cache & cache_tag
Background
The cache and cache_tag params in your docker-image-resource put tell the resource to first pull a particular image+tag from your remote source, before building the image specified via your put params.
This is useful when it's easier to pull down the image than it is to build it from scratch, e.g. when you have a very long build process with expensive compilation steps.
This DOES NOT utilize Concourse's volume caching, and utilizes Docker's --cache-from feature (which runs the risk of needing to first perform a docker pull) during every put.
Usage
You can use cache and cache_tag as follows:
Suppose you want to build a custom ruby image, where you compile ruby from source, and you have a git repository with a file ci/Dockerfile as follows:
FROM ubuntu
# Install Ruby
RUN mkdir /tmp/ruby;\
    cd /tmp/ruby;\
    curl ftp://ftp.ruby-lang.org/pub/ruby/2.0/ruby-2.0.0-p247.tar.gz | tar xz;\
    cd ruby-2.0.0-p247;\
    chmod +x configure;\
    ./configure --disable-install-rdoc;\
    make;\
    make install;\
    gem install bundler --no-ri --no-rdoc
RUN gem install nokogiri
If you wanted to automate building/pushing of this image while taking advantage of only Docker image layer caching:
resources:
- name: compiled-ruby-image
  type: docker-image
  source:
    repository: mydocker/ruby
    tag: 2.0.0-compiled
- name: repo
  type: git
  source:
    uri: ...
jobs:
- name: build-image-from-cache
  plan:
  - get: repo
  - put: compiled-ruby-image
    params:
      dockerfile: repo/ci/Dockerfile
      cache: mydocker/ruby
      cache_tag: 2.0.0-compiled
Recommendation
If you want to increase the efficiency of building Docker images, my personal belief is that load_base should be used in most cases. Because it uses a resource get, it takes advantage of Concourse volume caching and avoids needing to do extra docker pulls.

Related

How to setup Docker in Docker (DinD) on CloudBuild?

I am trying to run a script (unit tests) that uses Docker behind the scenes on a CI. The script works as expected on droneci, but after switching to Cloud Build it is not clear how to set up DinD.
For droneci I basically use DinD as shown here; my question is, how do I translate that setup to Google Cloud Build? Is it even possible?
I searched the internet for Cloud Build syntax for DinD and couldn't find anything.
Cloud Build lets you create Docker container images from your source code. The Cloud SDK provides the builds subcommand for using this service easily.
For example, here is a simple command to build a Docker image:
gcloud builds submit -t gcr.io/my-project/my-image
This command sends the files in the current directory to Google Cloud Storage; then, on one of the Cloud Build VMs, it fetches the source code, runs docker build, and uploads the image to Container Registry.
By default, Cloud Build runs the docker build command for building the image. You can also customize the build pipeline with custom build steps. Since you can use any arbitrary Docker image as a build step, and the source code is available, you can run unit tests as a build step (see the sketch after the links below). By doing so, you always run the tests with the same Docker image. There is a demonstration repository at cloudbuild-test-runner-example. This tutorial uses the demonstration repository as part of its instructions.
I would also recommend having a look at these informative links with similar use cases:
Running Integration test on Google cloud build
Google cloud build pipeline
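As a rough illustration of that approach, a minimal cloudbuild.yaml sketch that runs unit tests as a build step before building the image (the test image, commands, and image name below are placeholders, not taken from the question):

steps:
  # Run the unit tests inside an arbitrary image that contains the test toolchain.
  - id: run-unit-tests
    name: python:3.11
    entrypoint: bash
    args: ['-c', 'pip install -r requirements.txt && pytest']
  # Build the application image only after the test step succeeds.
  - id: build-image
    name: gcr.io/cloud-builders/docker
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-image', '.']
images:
  - 'gcr.io/$PROJECT_ID/my-image'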
I managed to figure out a way to run Docker-in-Docker (DinD) in Cloud Build. To do that we need to launch a service in the background with docker-compose. Your docker-compose.yml file should look something like this.
version: '3'
services:
  dind-service:
    image: docker:<dind-version>-dind
    privileged: true
    ports:
      - "127.0.0.1:2375:2375"
      - "127.0.0.1:2376:2376"
networks:
  default:
    external:
      name: cloudbuild
In my case, I had no problem using versions 18.03 or 18.09; later versions should also work. Secondly, it is important to attach the container to the cloudbuild network. This way the dind container will be on the same network as every container spawned during your build steps.
To start the service you need to add a step to your cloudbuild.yml file.
- id: start-dind
  name: docker/compose
  args: ['-f', 'docker-compose.yml', 'up', '-d', 'dind-service']
To validate that the dind service works as expected, you can just create a ping step.
- id: 'Check service is listening'
  name: gcr.io/cloud-builders/curl
  args: ["dind-service:2375"]
  waitFor: [start-dind]
Now if it works you can run your script as normal with dind in the background. What is important is to pass the DOCKER_HOST env variable so that the docker client can locate the docker engine.
- id: my-script
  name: my-image
  script: myscript
  env:
    - 'DOCKER_HOST=tcp://dind-service:2375'
Take note: any container spawned by your script will run inside dind-service, so if you need to make requests to it, address it at http://dind-service rather than http://localhost. Moreover, if you use private images you will need some form of authentication before running your script. For that, run gcloud auth configure-docker --quiet before running your script, and make sure your Docker image has gcloud installed. This creates the required authentication credentials to run your app. The credentials are saved in a path relative to the $HOME variable, so make sure your app is able to access it; you might have some problems if you use tox, for example.
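For instance, a minimal sketch of such a step, assuming the image contains both gcloud and the docker client (the step id, image name, and script name are placeholders):

- id: my-script
  name: my-image            # must contain gcloud and the docker client
  entrypoint: bash
  # Authenticate against the registry, then run the script against the dind daemon.
  args: ['-c', 'gcloud auth configure-docker --quiet && ./myscript']
  env:
    - 'DOCKER_HOST=tcp://dind-service:2375'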

Docker "artifact image" vs "services image" vs "single FROM image" vs "multiple FROM image"

I'm trying to understand the pros and cons of these four methods of packaging an application using Docker after development:
Use a very lightweight image (such as Alpine) as the base of the image containing the main artifact, then update the original docker compose file to use it along with the other services when creating and deploying the final containers.
Another option is to first docker commit, then use the resulting image as the base image of my artifact image.
A third method could be to use a single FROM only, basing my image on one of the required services, and then use RUN commands to install the other required services as Linux packages (e.g. apt-get install another-service) inside the container.
Or should I use multiple FROMs for those images? Wouldn't that be complicated and only needed in more complex projects? It also seems unclear in what order those FROMs should be written if none of them is more important than the others as far as my application is concerned.
For context: in the development phase, I used a docker compose file to run multiple containers, and I developed a web application against them (accessing files on the host machine through a bind mount). Now I want to write a Dockerfile to create an image that will contain my application's artifact, plus those services present in the initial docker compose file.
I'd suggest these rules of thumb:
A container only runs one program. If you need multiple programs (or services) run multiple containers.
An image contains the minimum necessary to run its application, and no more (and no less -- do not depend on bind mounts for the application to be functional).
I think these best match your first option. Your image is built FROM a language runtime, COPYs its code in, and does not include any other services. You can then use Compose or another orchestrator to run multiple containers in parallel.
Using Node as an example, a super-generic Dockerfile for almost any Node application could look like:
# Build the image FROM an appropriate language runtime
FROM node:16
# Install any OS-level packages, if necessary.
# RUN apt-get update \
# && DEBIAN_FRONTEND=noninteractive \
# apt-get install --no-install-recommends --assume-yes \
# alphabetical \
# order \
# packages
# Set (and create) the application directory.
WORKDIR /app
# Install the application's library dependencies.
COPY package.json package-lock.json ./
RUN npm ci
# Install the rest of the application.
COPY . .
# RUN npm run build
# Set metadata for when the application is run.
EXPOSE 3000
CMD npm run start
A matching Compose setup that includes a PostgreSQL database could look like:
version: '3.8'
services:
app:
build: .
ports: ['3000:3000']
environment:
PGHOST: db
db:
image: postgres:14
volumes:
- dbdata:/var/lib/postgresql/data
# environment: { ... }
volumes:
dbdata:
Do not try to (3) run multiple services in a container. This is complex to set up, it's harder to manage if one of the components fails, and it makes it difficult to scale the application under load (you can usually run multiple application containers against a single database).
Option (2) suggests doing setup interactively and then docker commit an image from it. You should almost never run docker commit, except maybe in an emergency when you haven't configured persistent storage on a running container; it's not part of your normal workflow at all. (Similarly, minimize use of docker exec and other interactive commands, since their work will be lost as soon as the container exits.) You mention docker save; that's only useful to move built images from one place to another in environments where you can't run a Docker registry.
Finally, option (4) describes multi-stage builds. The most obvious use of these is to remove build tools from the final image; for example, in the Node example above, we could RUN npm run build in a first stage, then have a final stage, also FROM node, that runs NODE_ENV=production npm ci to skip the devDependencies from package.json and COPY --from=build-stage the built application. This is also useful with compiled languages, where a first stage contains the (very large) toolchain and the final stage contains only the compiled executable. This is largely orthogonal to the other parts of the question; you could update the Dockerfile shown above to use a multi-stage build without changing the Compose setup at all.
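A rough sketch of that approach, assuming the build output lands in dist/ (the stage name and output directory are illustrative, not prescriptive):

# Build stage: install everything, including devDependencies, and build the app.
FROM node:16 AS build-stage
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Final stage: production dependencies plus the built output only.
FROM node:16
WORKDIR /app
ENV NODE_ENV=production
COPY package.json package-lock.json ./
RUN npm ci
COPY --from=build-stage /app/dist ./dist
EXPOSE 3000
CMD npm run start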
Do not bind-mount your application code into the container. This hides the work that the Dockerfile does, and it's possible the host filesystem will have a different layout from the image (possibly due to misconfiguration). It means you're "running in Docker", with the complexities that entails, but it's not actually the image you'll deploy. I'd recommend using a local development environment (try running docker-compose up -d db to get a database) and then using this Docker setup for final integration testing.

How to use helm charts without internet access

All of us know that helm charts are amazing and make our lives easier.
However, I have a use case where I would like to use helm charts WITHOUT INTERNET ACCESS.
And there are two steps:
Downloading chart from Git
Pulling Docker images from Dockerhub (specified in values.yaml files)
How can I do this?
Using a helm chart offline involves pulling the chart from the internet first, then installing it:
$ helm pull <chart name>
$ ls #The chart will be pulled as a tar to the local directory
$ helm install <whatever release name you want> <chart name>.tgz
For this method to work, you'll also need all the Docker images the chart uses to be available locally, as you mentioned.
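As a rough sketch of getting those images across the air gap, using the grafana image from the values file further down as an example (any image/tag works the same way):

$ docker pull grafana/grafana:5.0.4          # on a machine with internet access
$ docker save grafana/grafana:5.0.4 -o grafana_5.0.4.tar
$ # copy the tar file to the offline environment, then:
$ docker load -i grafana_5.0.4.tar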
I know the answer to the first part of my question.
You can actually git clone https://github.com/kubernetes/charts.git to get all the official charts from GitHub, and then specify the path to the chart (folder) on your filesystem that you want to install.
This can be done in the form of a helmfile;
execute the command like this:
helmfile -f deployment.yaml sync
cat deployment.yaml
...
repositories:
  - name: roboll
    url: http://roboll.io/charts

context: example.int.com # kube-context (--kube-context)

releases:
  # Prometheus deployment
  - name: my_prometheus # name of this release
    namespace: monitoring # target namespace
    chart: /opt/heml/charts/stable/prometheus # the chart being installed to create this release, referenced by `repository/chart` syntax
    values: ["values/values_prometheus_ns_monitoring.yaml"]
    set: # values (--set)
      - name: rbac.create
        value: true

  # Grafana deployment
  - name: my_grafana # name of this release
    namespace: monitoring # target namespace
    chart: /opt/heml/charts/stable/grafana
    values: ["values/values_grafana_ns_monitoring.yaml"]
So as you can see I have specified some custom values_<software>_ns_monitoring.yaml files.
The second part of my original question is still unanswered.
I want to be able to tell docker to use a local docker image in this section
cat values_grafana_ns_monitoring.yaml
replicas: 1
image:
  repository: grafana/grafana
  tag: 5.0.4
  pullPolicy: IfNotPresent
I have managed to manually copy the image over and then docker load it, so it is visible on my machine, but I can't figure out how to convince docker + helmfile to use my image. The goal is a totally offline installation. Any ideas?
sudo docker images
[sudo] password for jantoth:
REPOSITORY TAG IMAGE ID CREATED SIZE
my_made_up_string/custom_grafana/custom_grafana 5.1.2 917f46a60761 6 days ago 238 MB
Pulling docker images from Dockerhub WITHOUT INTERNET ACCESS
Obviously, that is not possible. However, the problem can be addressed by splitting it into stages:
Pull the docker images from Dockerhub on a system that has internet access.
Save the docker images with docker save, copy them to the destination environment where you want to do the offline installation, and load them back with docker load.
Set up a docker registry in the destination environment and tag / push the docker images into this registry (see: Local docker registry).
In the helm charts / kubernetes yaml files, update the image references to point to the local docker registry (see: Kubernetes and private docker registry), for example as in the sketch below.
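A minimal sketch of what the grafana values file shown above might look like once it points at a local registry (the registry host and port are placeholders):

# values_grafana_ns_monitoring.yaml, adjusted for the offline environment
replicas: 1
image:
  repository: my-local-registry:5000/grafana/grafana   # local registry instead of Dockerhub
  tag: 5.0.4
  pullPolicy: IfNotPresent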
Alternatively, you can look at offline packaging / deployment tools like Gravity

Mixing local and remote Docker images repo?

I work on a Kubernetes cluster based CI-CD pipeline.
The pipeline runs like this:
An ECR machine has Docker.
Jenkins runs as a container.
"Builder image" with Java, Maven etc is built.
Then this builder image is run to build an app image(s)
Then the app is run in kubernetes AWS cluster (using Helm).
Then the builder image is run with params to run Maven-driven tests against the app.
Now part of these steps doesn't require the image to be pushed. E.g. the builder image can be cached or disposed at will - it would be rebuilt if needed.
So these images are named like mycompany/mvn-builder:latest.
This works fine when used directly through Docker.
When Kubernetes and Helm come in, they want the image URIs and try to fetch them from the remote repo. So using the "local" name mycompany/mvn-builder:latest doesn't work:
Error response from daemon: pull access denied for collab/collab-services-api-mvn-builder, repository does not exist or may require 'docker login'
Technically, I can name it <AWS-repo-ID>/mvn-builder and push it, but that breaks the ability to run all this locally in minikube, because it's quite hard to keep authenticated against the silly AWS 12-hour token (remember, it all runs in a cluster).
Is it possible to mix the remote repo and local cache? In other words, can I have Docker look at the remote repository and if it's not found or fails (see above), it would take the cached image?
So that if I use foo/bar:latest in a Kubernetes resource, it will try to fetch, find out that it can't, and would take the local foo/bar:latest?
I believe an initContainer would do that, provided it had access to /var/run/docker.sock (and your cluster allows such a thing) by conditionally pulling (or docker load-ing) the image, such that when the "main" container starts, the image will always be cached.
Approximately like this:
spec:
  initContainers:
    - name: prime-the-cache
      image: docker:18-dind
      command:
        - sh
        - -c
        - |
          if something_awesome; then
            docker pull from/a/registry
          else
            docker load -i some/other/path
          fi
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
          readOnly: true
  containers:
    - name: primary
      image: a-local-image
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock

TravisCI/Docker: parameterized start of docker containers with matrix feature

I have a piece of software that should be tested against a series of WebDAV backends that are available as Docker containers. The lame approach is to start all containers within the before_install section like
before_install:
- docker run image1
- docker run image2
- ...
This does not make much sense and wastes system resources, since I only need one particular docker container running as part of a test run.
My test configuration uses a matrix... is it possible to configure the docker image to be run using an environment variable as part of the matrix specs?
This boils down to two questions:
can I use environment variables inside steps of the before_install section?
is the 'matrix' evaluated before the before_install section, in order to make use of environment variables defined inside the matrix?
The answer to both of your questions is yes.
I have been able to build independent Dockerfiles using the matrix configuration. A sample .travis.yml might look like:
sudo: required
services:
  - docker
env:
  - DOCKERFILE=dockerfile-1
  - DOCKERFILE=dockerfile-2
before_install:
  - docker build -f $DOCKERFILE .
In this case there would be two independent runs, each building a separate image. You could also use a docker pull command if your images are on Docker Hub.
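Applied to the original question, a minimal sketch that starts only the backend needed for each matrix entry (WEBDAV_IMAGE and the container name are hypothetical; image1/image2 stand in for the actual backend images):

sudo: required
services:
  - docker
env:
  # One entry per backend image; each matrix entry runs in its own VM.
  - WEBDAV_IMAGE=image1
  - WEBDAV_IMAGE=image2
before_install:
  # Start only the backend required for this matrix entry.
  - docker run -d --name webdav-backend $WEBDAV_IMAGE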
