What is 'gitlab/gitlab-runner-helper' docker image used for? - docker

My overall goal is to install a self-hosted gitlab-runner that is restricted to use prepared docker images from my own docker registry only.
For that I have a systemd drop-in configuration that looks like this:
/etc/systemd/system/docker.service.d/allow-private-registry-only.conf
BLOCK_REGISTRY='--block-registry=all'
ADD_REGISTRY='--add-registry=my.private.registry:8080'
With this in place, docker pull is allowed to pull images from my.private.registry only.
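For illustration, with this in place pulls should behave roughly as follows (the image name is just an example, and these flags belong to Red Hat's docker fork, not genuine Docker CE):
docker pull some-image:1.0                     # unqualified name, resolved against my.private.registry:8080
docker pull docker.io/library/some-image:1.0   # fully qualified docker.io name, should be refused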
After I had managed to get this working, I wanted to clean up my local registry and remove old docker images. During that process I stumbled over a docker image named gitlab/gitlab-runner-helper, which is presumably some component used by the gitlab-runner itself and was presumably pulled from docker.io.
Now I'm wondering if it is even possible/advisable to block images from docker.io when using a gitlab-runner?
Any hints are appreciated!

I feel it my solemn duty to extend the accepted answer (it is great, by the way), because the word 'handle' on its own tells us very little; it is too abstract. Let me explain the whole flow in far more detail:
When the build is about to begin, gitlab-runner creates a docker volume (you can observe it with docker volume ls if you want). This volume will serve as storage for the caches and artifacts that you use during the build.
The second thing: you will have at least 2 containers involved in each stage: the gitlab-runner-helper container and the container created from the image you specified (in .gitlab-ci.yml or in config.toml). What the gitlab-runner-helper container does is, essentially, clone the remote git repository (the one you are building) into the aforementioned docker volume, along with handling caches and artifacts.
It can do this because the gitlab-runner-helper image itself contains 2 important utilities: git (obviously, to clone the repo) and the gitlab-runner-helper binary (this utility can pull and push artifacts and caches).
The gitlab-runner-helper container starts before each stage for a couple of seconds to pull artifacts and caches, and then terminates. After that, the container created from the image you specified is launched, and it also has this volume (from step 1) attached; this is how it receives the artifacts, by the way.
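If you want to watch this happen, you can run the following on the runner host while a job is executing; the exact container names vary by runner version, so treat this as a sketch:
docker volume ls                      # the volume holding the checkout, caches and artifacts
docker ps -a --filter name=runner-    # the short-lived helper container plus your build container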
The rest of the details about the registry from which gitlab-runner-helper gets pulled are described by @Nicolas pretty well. I append this comment just for someone who, perhaps, wants to know what exactly this sneaky word 'handle' means.
Hope it helps, have a nice day, my friend!

The gitlab-runner-helper image is used by GitLab Runner to handle Git, artifact, and cache operations for the docker, docker+machine or kubernetes executors.
Since you prefer pulling images from a private registry, you can override the helper image. Your configuration could be:
[[runners]]
  (...)
  executor = "docker"
  [runners.docker]
    (...)
    helper_image = "my.private.registry:8080/gitlab/gitlab-runner-helper:tag"
Please ensure the image is present in your registry, or that your configuration enables proxying Docker Hub or registry.gitlab.com. For the latter, you need to run at least GitLab Runner version 13.7 and have the FF_GITLAB_REGISTRY_HELPER_IMAGE feature flag enabled.
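If you go the first route, a simple way to get the helper image into your private registry is to retag and push it from a machine that can still reach Docker Hub. The tag below is only an example; pick the one matching your runner version and architecture:
docker pull gitlab/gitlab-runner-helper:x86_64-v13.7.0
docker tag gitlab/gitlab-runner-helper:x86_64-v13.7.0 my.private.registry:8080/gitlab/gitlab-runner-helper:x86_64-v13.7.0
docker push my.private.registry:8080/gitlab/gitlab-runner-helper:x86_64-v13.7.0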

Related

Docker local registry - Image naming [duplicate]

By default, if I issue command:
sudo docker pull ruby:2.2.1
it will pull from the official docker.io registry by default.
Pulling repository docker.io/library/ruby
How do I change it to my private registry? That means if I issue
sudo docker pull ruby:2.2.1
it will pull from my own private registry, the output is something like:
Pulling repository my_private.registry:port/library/ruby
UPDATE: Following your comment, it is not currently possible to change the default registry, see this issue for more info.
You should be able to do this, substituting the host and port to your own:
docker pull localhost:5000/registry-demo
If the server is remote/has auth you may need to log into the server with:
docker login https://<YOUR-DOMAIN>:8080
Then running:
docker pull <YOUR-DOMAIN>:8080/test-image
There is the use case of a mirror of Docker Hub (such as Artifactory or a custom one), which I haven't seen mentioned here. This is one of the most valid cases where changing the default registry is needed.
Luckily, Docker (at least version 19.03.3) allows you to set a mirror (tested in Docker CE). I don't know if this will work with additional images pushed to that mirror that aren't on Docker Hub, but I do know it will use the mirror instead. Docker documentation: https://docs.docker.com/registry/recipes/mirror/#configure-the-docker-daemon.
Essentially, you need to add "registry-mirrors": [] to the /etc/docker/daemon.json configuration file. So if you have a mirror hosted at https://my-docker-repo.my.company.com, your /etc/docker/daemon.json should contain:
{
"registry-mirrors": ["https://my-docker-repo-mirror.my.company.com"]
}
Afterwards, restart the Docker daemon. Now if you do a docker pull postgres:12, Docker should fetch the image from the mirror instead of directly from Docker Hub. This is much better than prepending all images with my-docker-repo.my.company.com.
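For completeness, restarting the daemon and verifying that the mirror is picked up might look like this (the systemctl command assumes a systemd-based host):
sudo systemctl restart docker
docker info | grep -A1 'Registry Mirrors'   # should list your mirror URL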
It turns out this is actually possible, but not using the genuine Docker CE or EE version.
You can either use Red Hat's fork of docker with the '--add-registry' flag or you can build docker from source yourself with registry/config.go modified to use your own hard-coded default registry namespace/index.
The short answer to this is you don't, or at least you really shouldn't.
Yes, there are some container runtimes that allow you to change the default namespace, specifically those from RedHat. However, RedHat now regrets this functionality and discourages customers from using it. Docker has also refused to support this.
The reason this is so problematic is that it results in an ambiguous namespace of images. The same command run on two different machines could pull different images depending on which registry they are configured to use. Since compose files, helm templates, and other ways of running containers are shared between machines, this actually introduces a security vulnerability.
An attacker could squat on well known image names in registries other than Docker Hub with the hopes that a user may change their default configuration and accidentally run their image instead of the one from Hub. It would be trivial to create a fork of a tool like Jenkins, push the image to other registries, but with some code that sends all the credentials loaded into Jenkins out to an attacker server. We've even seen this causing security vulnerability reports this year for other package managers like PyPI, NPM, and RubyGems.
Instead, the direction of container runtimes like containerd is to make all image names fully qualified, removing the Docker Hub automatic expansion (tooling on top of containerd like Docker still apply the default expansion, so I doubt this is going away any time soon, if ever).
Docker does allow you to define registry mirrors for Docker Hub that it will query before querying Hub; however, this assumes everything is still within the same namespace and the mirror is just a copy of upstream images, not a different namespace of images. The TL;DR on how to set that up is to put the following in /etc/docker/daemon.json and then run systemctl reload docker:
{
"registry-mirrors": ["https://<my-docker-mirror-host>"]
}
For most, this is a non-issue (the issue to me is that the docker engine doesn't have an option to mirror non-Hub registries). The image name is defined in a configuration file or a script, so typing it once in that file is easy enough. And with tooling like compose files and Helm templates, the registry can be turned into a variable, allowing organizations to explicitly pull their deployment images from a configurable registry name.
If you are using the Fedora distro, you can change the file
/etc/containers/registries.conf
adding the domain docker.io.
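I don't have a Fedora box at hand, but the older (v1) syntax of that file looks roughly like this; treat it as a sketch and check the registries.conf man page for your version (the private registry name is an example):
[registries.search]
registries = ['my.private.registry:8080']
[registries.block]
registries = ['docker.io']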
Docker's official position is explained in issue #11815:
Issue 11815: Allow to specify default registries used in pull command
Resolution:
Like pointed out earlier (#11815), this would fragment the namespace, and hurt the community pretty badly, making dockerfiles no longer portable.
[the Maintainer] will close this for this reason.
Red Hat had a specific implementation that allowed it (see this answer), but it was refused by the Docker upstream project. It relied on the --add-registry argument, which was set in /etc/containers/registries.conf on RHEL/CentOS 7.
EDIT:
Actually, Docker supports registry mirrors (also known as "Run a Registry as a pull-through cache").
https://docs.docker.com/registry/recipes/mirror/#configure-the-docker-daemon
It seems it won't be supported due to the fragmentation it would create within the community (i.e. two users would get different images pulling ubuntu:latest). You simply have to add the host in front of the image name. See this github issue to join the discussion.
(Note, this is not intended as an opinionated comment, just a very short summary of the discussion that can be followed in the mentioned github issue.)
I tried adding the following options to /etc/docker/daemon.json
(I used CentOS 7):
"add-registry": ["192.168.100.100:5001"],
"block-registry": ["docker.io"],
After that, I restarted the docker daemon.
And it's working without docker.io.
I hope this will be helpful to someone.
Earlier this could be achieved using DOCKER_OPTS in the /etc/default/docker config file, which worked on Ubuntu 14.04 but had some issues on Ubuntu 15.04. Not sure if this has been fixed.
The line below needs to go into the file /etc/default/docker on the host that runs the docker daemon. The change points to the private registry installed on your local network. Note: you will need to restart the docker service after this change.
DOCKER_OPTS="--insecure-registry <priv registry hostname/ip>:<port>"
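Then restart the daemon, e.g.:
sudo service docker restart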
I'm adding to the original answer given by Guy, which is still valid today (soon 2020).
Overriding the default docker registry, like you would do with maven, is actually not a good practice.
When using maven, you pull artifacts from the Maven Central Repository through your local repository management system, which acts as a proxy. These artifacts are plain, raw libs (jars), and it is quite unlikely that you will push jars with the same name.
On the other hand, docker images are fully operational, runnable environments, and it makes total sense to pull an image from Docker Hub, modify it, and push it to your local registry management system under the same name, because it is exactly what its name says it is, just in your enterprise context. In this case, the only distinction between the two images would precisely be their path!
Hence the need for the following rule: the prefix of an image indicates its origin; by default, if an image does not have a prefix, it is pulled from Docker Hub.
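A small example of that rule in practice (the private hostname is made up):
docker pull nginx:1.21                           # no prefix, comes from Docker Hub (docker.io/library/nginx)
docker pull my.company.registry:8080/nginx:1.21  # prefixed, comes from the enterprise registry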
I didn't see an answer for macOS, so I want to add it here.
There are 2 methods:
Option 1 (through the Docker Desktop GUI):
Preferences -> Docker Engine -> edit the file -> Apply & Restart
Option 2:
Directly edit the file ~/.docker/daemon.json
I haven't tried it, but maybe hijacking the DNS resolution process by adding a line to /etc/hosts for hub.docker.com or something similar (docker.io?) could work?

How to improve automation of running container's base image updates?

I want all running containers on my server to always use the latest version of an official base image e.g. node:16.3 in order to get security updates. To achieve that I have implemented an image update mechanism for all container images in my registry using a CI workflow which has some limitations described below.
I have read the answers to this question but they either involve building or inspecting images on the target server which I would like to avoid.
I am wondering whether there might be an easier way to achieve the container image updates or to alleviate some of the caveats I have encountered.
Current Image Update Mechanism
I build my container images using the FROM directive with the minor version I want to use:
FROM node:16.13
COPY . .
This image is pushed to a registry as my-app:1.0.
To check for changes in the node:16.13 image compared to when I built the my-app:1.0 image, I periodically compare the SHA256 digests of the layers of node:16.13 with those of the first n (= number of layers of node:16.13) layers of my-app:1.0, as suggested in this answer. I retrieve the SHA256 digests with docker manifest inspect <image>:<tag> -v.
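For reference, the digest extraction could look roughly like this; it assumes jq is installed and that you pick one platform out of a multi-arch manifest list, so adjust the JSON paths to whatever docker manifest inspect -v actually returns for your images:
docker manifest inspect -v node:16.13 | jq '.[0].SchemaV2Manifest.layers[].digest'
docker manifest inspect -v my-app:1.0 | jq '.SchemaV2Manifest.layers[].digest'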
If they differ, I rebuild my-app:1.0 and push it to my registry, thus ensuring that my-app:1.0 always uses the latest node:16.13 base image.
I keep the running containers on my server up to date by periodically running docker pull my-app:1.0 on the server using a cron job.
Limitations
When I check for updates I need to download the manifests for all my container images and their base images. For images hosted on Docker Hub this unfortunately counts against the download rate limit.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. This information is especially important when the update process breaks a service. I keep track of the updates by logging the output of the docker pull command from the cron job.
To be able to revert the container image on the server I have to keep previous versions of the my-app:1.0 images as well. I do that by pushing incremental patch version tags along with the my-app:1.0 tag to my registry e.g. my-app:1.0.1, my-app:1.0.2, ...
Because of the way the layers of the base image and the app image are compared it is not possible to detect a change in the base image where only the uppermost layers have been removed. However I do not expect this to happen very frequently.
Thank you for your help!
There are a couple of things I'd do to simplify this.
docker pull already does essentially the sequence you describe, of downloading the image's manifest and then downloading layers you don't already have. If you docker build a new image with an identical base image, an identical Dockerfile, and identical COPY source files, then it won't actually produce a new image, just put a new name on the existing image ID. So it's possible to unconditionally docker build --pull images on a schedule, and it won't really use additional space. (It could cause more redeploys if neither the base image nor the application changes.)
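Under that assumption, the scheduled rebuild can be as simple as the following (tag and paths are examples); if nothing has changed, the resulting image ID stays the same:
docker build --pull -t my-app:1.0 .
docker push my-app:1.0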
[...] this unfortunately counts against the download rate limit.
There's not a lot you can do about that beyond running your own mirror of Docker Hub or ensuring your CI system has a Docker Hub login.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. [...] To be able to revert the container image on the server [...]
I'd recommend always using a unique image tag per build. A sequential build ID as you have now works; date stamps or source-control commit IDs are usually easy to come up with as well. When you go to deploy, always use the full image tag, not the abbreviated one.
docker pull registry.example.com/my-app:1.0.5
docker stop my-app
docker rm my-app
docker run -d ... registry.example.com/my-app:1.0.5
docker rmi registry.example.com/my-app:1.0.4
Now you're absolutely sure which build your server is running, and it's easy to revert should you need to.
(If you're using Kubernetes as your deployment environment, this is especially important. Changing the text value of a Deployment object's image: field triggers Kubernetes's rolling-update mechanism. That approach is much easier than trying to ensure that every node has the same version of a shared tag.)
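For example (sticking with the registry.example.com/my-app:1.0.5 name from above, and assuming your Deployment and container are both named my-app), a scripted rolling update can be as small as:
kubectl set image deployment/my-app my-app=registry.example.com/my-app:1.0.5
kubectl rollout status deployment/my-app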


Why Use a Private Docker Registry?

Why would someone use a private docker registry when they could just share their dockerfile's in source control and have docker image consumers build directly from the dockerfile with docker build?
To my untrained eye, the private docker registry seems to serve the same purpose as source control, except it adds complexity because it is decoupled from the branch of code that you're in, so you (or, more to the point, your CI/CD server) have to reconstruct which tag to pull.
When you deploy on real machines (say you have a job that deploys on 10 machines), usually you don't really want to rebuild the image over and over again.
Instead you can build the image once, store it somewhere (where?), then maybe deploy it to a test environment, run some tests, make sure it is indeed a good image (it starts, tests pass, etc.), and then deploy it to production.
So this "somewhere" means you should have some registry, and if you don't want to use Docker Hub, you can use a private docker registry.
This makes sense both for security (don't publish to the cloud if you don't need to) and for performance (moving images between servers on a private network is faster).
If you're running kubernetes you can also configure it to pull the images from the docker registry and run pods based on these images as specified in the kubernetes deployment files.
If you're running on AWS you can use a docker registry with services like Fargate or ECS. They will take care of scaling out across machines (pretty much like k8s does), but they still need to get the images from somewhere, and that somewhere is a private registry (in this case the cloud-hosted one called ECR, Elastic Container Registry).
Bottom line: in many (simple) cases you can live without it, but in some other cases it comes in pretty handy.

How can I save any changes of containers?

If I have one ubuntu container and I ssh into it and create a file, then after the container is destroyed or I reboot the container, the new file is gone, because Kubernetes loads the ubuntu image, which does not contain my changes.
My question is what should I do to save any changes?
I know it can be done because some cloud providers do that.
For example:
ssh ubuntu@POD_IP
mkdir new_file
ls
new_file
reboot
after reboot I have
ssh ubuntu@POD_IP
ls
ls shows nothing
But I want it to save my current state.
And I want to do it automatically.
If I use docker commit I cannot manage my images, because it creates hundreds of images; I would have to make a new image for every change.
If I want to use storage I would have to mount /, but Kubernetes does not allow me to mount / and gives me this error:
Error: Error response from daemon: invalid volume specification: '/var/lib/kubelet/pods/26c39eeb-85d7-11e9-933c-7c8bca006fec/volumes/kubernetes.io~rbd/pvc-d66d9039-853d-11e9-8aa3-7c8bca006fec:/': invalid mount config for type "bind": invalid specification: destination can't be '/'
You can try to use docker commit, but you will need to ensure that your Kubernetes cluster picks up the latest image that you committed:
docker commit [OPTIONS] CONTAINER [REPOSITORY[:TAG]]
This is going to create a new image out of your container which you can feed to Kubernetes.
Ref - https://docs.docker.com/engine/reference/commandline/commit/
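As a rough illustration (the container and image names are made up), the manual flow would be:
docker commit my-running-container my.registry.example.com/ubuntu-with-changes:v1
docker push my.registry.example.com/ubuntu-with-changes:v1
and then point your Kubernetes deployment at the new tag.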
Update 1 -
In case you want to do it automatically, you might need to store the changed state or files on a centralized file system like NFS, and then mount it into all running containers whenever required, with the relevant permissions.
K8s ref - https://kubernetes.io/docs/concepts/storage/persistent-volumes/
Docker and Kubernetes don't work this way. Never run docker commit. Usually you have very little need for an ssh daemon in a container/pod and you need to do special work to make both the sshd and the main process both run (and extra work to make the sshd actually be secure); your containers will be simpler and safer if you just remove these.
The usual process involves a technique known as immutable infrastructure. You never change code in an existing container; instead, you change a recipe to build a container, and tell the cluster manager that you want an update, and it will tear down and rebuild everything from scratch. To make changes in an application running in a Kubernetes pod, you typically:
Make and test your code change, locally, with no Docker or Kubernetes involved at all.
docker build a new image incorporating your code change. It should have a unique tag, often a date stamp or a source control commit ID.
(optional but recommended) docker run that image locally and run integration tests.
docker push the image to a registry.
Change the image tag in your Kubernetes deployment spec and kubectl apply (or helm upgrade) it.
Often you'll have an automated continuous integration system do steps 2-4, and a continuous deployment system do the last step; you just need to commit and push your tested change.
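In concrete commands, steps 2 through 5 could look roughly like this (the registry name, tag, and test command are examples):
docker build -t registry.example.com/my-app:20200301 .
docker run --rm registry.example.com/my-app:20200301 ./run-tests.sh
docker push registry.example.com/my-app:20200301
kubectl apply -f deployment.yaml   # after changing image: to the new tag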
Note that when you docker run the image locally in step 3, you are running the exact same image your production Kubernetes system will run. Resist the temptation to mount your local source tree into it and try to do development there! If a test fails at this point, reduce it to the simplest failing case, write a unit test for it, and fix it in your local tree. Rebuilding an image shouldn't be especially expensive.
Your question hints at the unmodified ubuntu image. Beyond some very early "hello world" type experimentation, there's pretty much no reason to use this anywhere other than the FROM line of a Dockerfile. If you haven't yet, you should work through the official Docker tutorial on building and running custom images, which will be applicable to any clustering system. (Skip all of the later tutorials that cover Docker Swarm, if you've already settled on Kubernetes as an orchestrator.)
