How to indicate a private registry not using the Dockerfile? - docker

I have a Git repo with a simple Dockerfile. First row goes like this:
FROM python:3.7
My company has an internal registry with the base images. Because of this, the DevOps guys want me to change the Dockerfile to:
FROM registry.company.com:5000/python:3.7
I don't want this infrastructure detail baked in my code. URLs may change, I may want to build this image in another environment, etc. If possible, I would rather indicate the server in the pipeline, but the documentation regarding docker build has no parameter for this.
Is there a way to avoid editing the Dockerfile in this situation?

You would use a build arg for this:
ARG registry=docker.io/library
FROM ${registry}/python:3.7
Then for the build process:
docker build --build-arg registry=registry.company.com:5000 ...
Use docker.io for the registry name for the default Docker Hub, and library is the repository for official docker images, both of which you normally don't see when using the short format. Note that I usually include the library part in the local mirror so that official docker images and other repos that are mirrored can all use the same registry variable:
ARG registry=docker.io
FROM ${registry}/library/python:3.7
That means your local registry would need to have registry.company.com:5000/library/python:3.7.
To force users to specify the registry as part of the build, then don't provide a default value to the arg (or you could default the value of registry to something internal if that's preferred):
ARG registry
FROM ${registry}/python:3.7

You can work around the situation by manually pulling and re-tagging the image. docker build (and docker run) won't try to pull an image that already appears to be present locally, but that also means there's no verification that it actually matches what Docker Hub has. That means you can pull the image from your mirror, then docker tag it to look like a Docker Hub image:
docker pull registry.company.com:5000/python:3.7
docker tag registry.company.com:5000/python:3.7 python:3.7

Related

Dockerfile FROM command - Does it always download from Docker Hub?

I just started working with docker this week and came across a 'dockerfile'. I was reading up on what this file does, and the official documentation basically mentions that the FROM keyword is needed to build a "base image". These base images are pulled from Docker hub, or downloaded from there.
Silly question - Are base images always pulled from docker hub?
If so and if I understand correctly I am assuming that running the dockerfile to create an image is not done very often (only when needing to create an image) and once the image is created then the image is whats run all the time?
So the dockerfile then can be migrated to which ever enviroment and things can be set up all over again quickly?
Pardon the silly question I am just trying to understand the over all flow and how dockerfile fits into things.
If the local (on your host) Docker daemon (already) has a copy of the container image (i.e. it's been docker pull'd) specified by FROM in a Dockerfile then it's cached and won't be repulled.
Container images include a tag (be wary of ever using latest) and the image name e.g. foo combined with the tag (which defaults to latest if not specified) is the full name of the image that's checked i.e. if you have foo:v0.0.1 locally and FROM:v0.0.1 then the local copy is used but FROM foo:v0.0.2 will pull foo:v0.0.2.
There's an implicit docker.io prefix i.e. docker.io/foo:v0.0.1 that references the Docker registry that's being used.
You could repeatedly docker build container images on the machines where the container is run but this is inefficient and the more common mechanism is that, once a container image is built, it is pushed to a registry (e.g. DockerHub) and then pulled from there by whatever machines need it.
There are many container registries: DockerHub, Google Artifact Registry, Quay etc.
There are tools other than docker that can be used to interact with containers e.g. (Red Hat's) Podman.

Use ghcr in Dockerfile in GHA

I would like to use ghcr as cache to store docker image with part which almost do not change in my project (Ubuntu, miniconda and bunch of Python packages) and then use this image in Dockerfile which adds volumes and code of the project to it. Dockerfile is run by Github Actions. How could I reference to ghcr stored image in From statement of Dockerfile?
How could I reference to ghcr stored image in From statement of Dockerfile?
Image references have the registry in front of them, and when not included, will default to Docker Hub. So for a registry like ghcr you want:
FROM ghcr.io/path/to/image:tag

Modifying Docker Registry based on environment

My local development environment is behind a corporate proxy, with our own Docker registry, etc. However, we deploy on public infrastructure, meaning we can't access the corporate registries, and so have to pull from a public one (eg DockerHub).
Is there any way (eg via environment variables) for me to configure Docker to pull from a private registry when developing locally, and from a public registry when it goes through our CI/CD pipeline?
For example, let's say we're deploying a Node.JS application - locally, I would want the FROM node:16 line to get interpreted as FROM corporate.proxy/node:16.
There are a couple methods that would probably work - having two separate Dockerfiles, eg Dockerfile.dev and Dockerfile.prod, or wrapping it in some sort of script that will take care of making the change. I'm looking for a way to do it via Docker's configuration, if it's possible at all.
You can use ARG instruction to change the FROM line in Dockerfile.
ARG IMG
FROM ${IMG}
Then you can build image like this:
docker build --build-arg IMG=node:16 .
or
docker build --build-arg IMG=corporate.proxy/node:16 .
From Dockerfile reference document:
ARG is the only instruction that may precede FROM in the Dockerfile
https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact

Creating Docker image hierarchy with private registry

TL;DR Must you put your private Docker URI in your Dockerfile FROM command if the parent image is in a private registry?
I thought this should be an easily answerable question, but cannot find a good set of Google keywords...
Detail:
I have three repos, all built with separate CI calls; the order of these CI executions is correct for this DAG (i.e. parent image will be available when a child needs it):
Repo 1 holds a Dockerfile that constructs a base image with dependencies. It is slow moving
Repo 2 & 3 hold applications that are built in a Dockerfile that pulls FROM the image built in Repo 1. These change frequently.
As I understand it, if you don't specify a repo URI in your FROM command, docker assumes you are pulling from DockerHub. Is this correct?
If these images are stored in a private registry, is it true that the private registry must be explicitly included in the child Dockerfiles? Is there another way?
Docker looks for the image name, on your local repository, if it does not find the image there, it would pull from the docker hub.
At first thought, it feels intuitive that we should be able to configure a private repository as the default, and then use it just as we would use docker hub.
However, this seems to be a topic of lengthy discussion, you can follow the discussion here.
Unfortunately, at the time of writing this, to build from a private repo, you will need to specify the complete URI in your Dockerfile:
FROM <aws_account_id>.dkr.ecr.<aws region>.amazonaws.com/<private image name>:latest
You will need to configure your Docker daemon to authenticate and pull from your private ECR repository:
aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
and then go for
docker build .
Alternatively, you can use arguments, to construct your ECR URI. This will keep things clean and parameterised in your Dockerfile while being explicit that you are using a private repo.
Eg:
In your Dockerfile
ARG PRIVATE_REPO
FROM ${PRIVATE_REPO}your_image_name
And build the docker image with:
docker build . --build-arg PRIVATE_REPO=aws_account_id.dkr.ecr.region.amazonaws.com/
Your basic assumptions are correct: if your base image is in a private registry, you generally must include the registry name in downstream Dockerfiles' FROM lines, docker run commands, Compose and Kubernetes image: lines, and so on. If the registry part of the image name is absent, it defaults to docker.io.
# Expands to docker.io/library/alpine:latest
FROM alpine
This only actually matters if the image isn't already present on the local machine, in which case Docker will automatically pull it. If it's already present, Docker doesn't worry about where it came from and just uses it as-is. If you have a manual multi-stage build, but don't actually push any of the intermediate stages anywhere, it doesn't actually matter if you tag it for any particular repository.
# "step 2" Dockerfile
# Build with: `docker build -t step2 -f step2.Dockerfile .`
# Starts from image in internal repository
FROM registry.example.com/img/step1
# Application Dockerfile
# Depends on "step 2" image being built first
# Since we expect step2:latest image to be present locally, won't pull
FROM step2

Pull docker images from a private repository during docker build?

Is there any way of pulling images from a private registry during a docker build instead of docker hub?
I deployed a private registry and I would like to be able to avoid naming its specific ip:port in the Dockerfile's FROM instruction. I was expecting a docker build option or a docker environment variable to change the default registry.
The image name should include the FQDN of the registry host.
So if you want to FROM <some private image> you must specifiy it as FROM registry_host:5000/foo/bar
In the future this won't be a requirement, but unfortunately for now it is.
I was facing the same issue in 2019. I solved this using arguments (ARG).
https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
Arguments allow you to set optional parameters (with defaults) that can be used in your FROM line.
Dockerfile-project-dev
ARG REPO_LOCATION=privaterepo.company.net/
ARG BASE_VERSION=latest
FROM ${REPO_LOCATION}project/base:${BASE_VERSION}
...
For my use-case I normally want to pull from the private repo, but if I'm working on the Dockerfiles I may want to be able to build from an image on my own machine, without having to modify the FROM line in my Dockerfile. To tell Docker to search my local machine for the image at build time I would do this:
docker build -t project/dev:latest -f ./Dockerfile-project-dev --build-arg REPO_LOCATION='' .
The docker folks generally want to ensure that if you run docker pull foo/bar you'll get the same thing (i.e., the foo/bar image from Docker Hub) regardless of your local environment.
This means that there are no options available to have Docker use anything else without an explicit hostname/port.

Resources