Check if stage exists in Dockerfile - docker

I have a CI script that builds Dockerfiles. My plan is that unit tests should be run in a test stage in each Dockerfile, for example:
FROM alpine AS build
WORKDIR /app
COPY src .
...
FROM build AS test
RUN mvn clean test
FROM build AS package
COPY --from=build ...
So, for a given Dockerfile, I would like to check if it has a test stage and, if so, run docker build --target test .... If it doesn't have a test stage, I don't want to run docker build (which would fail).
How can I check if a Dockerfile contains a certain stage without actually building it?
I do realize this question has some XY problem vibes to it, so feel free to enlighten me. But I also think the question can be generally useful anyway.

I'm going to shy away from trying to parse the Dockerfile since there are a lot of ways to inject false positives or negatives. E.g.
RUN echo \
FROM base as test
or
FROM base \
as test
So instead, I'm going to favor letting docker do the hard work, and modifying the file so it doesn't fail when the test stage is missing. This can be done by adding a test stage to the file even when it already has a test stage: the classic builder uses the first occurrence of a duplicated stage name, while BuildKit uses the last, so whether you put the extra stage at the beginning or end of the Dockerfile depends on whether you are running BuildKit:
$ cat df.dup-target
FROM busybox as test
RUN exit 1
FROM busybox as test
RUN exit 0
$ DOCKER_BUILDKIT=0 docker build --target test -f df.dup-target .
Sending build context to Docker daemon 20.99kB
Step 1/2 : FROM busybox as test
---> be5888e67be6
Step 2/2 : RUN exit 1
---> Running in 9f96f42bc6d8
The command '/bin/sh -c exit 1' returned a non-zero code: 1
$ DOCKER_BUILDKIT=1 docker build --target test -f df.dup-target .
[+] Building 0.1s (6/6) FINISHED
=> [internal] load build definition from df.dup-target 0.0s
=> => transferring dockerfile: 114B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 34B 0.0s
=> [internal] load metadata for docker.io/library/busybox:latest 0.0s
=> [test 1/2] FROM docker.io/library/busybox 0.0s
=> CACHED [test 2/2] RUN exit 0 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:8129063cb183c1c1aafaf3eef0c8671e86a54f795092fa7a918145c14da3ec3b 0.0s
Then you can prepend (for BuildKit) or append (for the classic builder) the always-successful test stage, passing the modified Dockerfile to docker build on stdin:
$ cat df.simple
FROM busybox as build
RUN exit 0
$ cat - df.simple <<EOF | DOCKER_BUILDKIT=1 docker build --target test -f - .
FROM busybox as test
RUN exit 0
EOF
[+] Building 0.1s (6/6) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 109B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 34B 0.0s
=> [internal] load metadata for docker.io/library/busybox:latest 0.0s
=> [test 1/2] FROM docker.io/library/busybox 0.0s
=> CACHED [test 2/2] RUN exit 0 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:8129063cb183c1c1aafaf3eef0c8671e86a54f795092fa7a918145c14da3ec3b 0.0s
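Wrapped up for the CI scenario in the question, the whole step could look like this sketch (the services/*/Dockerfile glob is illustrative; BuildKit is assumed, so the dummy stage goes first and a real test stage, when present, wins):
# build the test stage for every Dockerfile; the prepended dummy stage
# makes the build a no-op success when no real test stage exists
for df in services/*/Dockerfile; do
    cat - "$df" <<EOF | DOCKER_BUILDKIT=1 docker build --target test -f - "$(dirname "$df")"
FROM busybox as test
RUN exit 0
EOF
done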

This is a simple grep invocation:
egrep -i -q '^FROM .* AS test$' Dockerfile
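If the fragility described in the other answer is acceptable, a CI script can use that check as a guard, e.g. this sketch (the build command is illustrative):
if grep -E -i -q '^FROM .* AS test$' Dockerfile; then
    docker build --target test .
fi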
You also might consider running your unit tests outside of Docker, before you start building containers. (Or, if your CI system supports running steps inside containers, use a container to get a language runtime, but not necessarily run the Dockerfile.) You'll still need a Docker-based setup to run larger integration tests, but you can run these on your built production-ready containers.

Related

Removed Docker image is reappearing again upon new build command

Scenario:
I made a working dockerfile, and I want to test it from scratch. However, the remove command only removes the image temporarily; running the build command again makes it reappear as if it was never removed in the first place.
Example:
This is what my terminal looks like (screenshot omitted; the first two images in it are irrelevant to this question):
The ***_seis image is removed using the docker rmi ***_seis command, and as a result, running docker images shows that the ***_seis image was deleted.
However, when I run the following build command:
docker build -f dockerfile -t ***_seis:latest .
It builds successfully, but gives a surprising result: even though the image was removed seconds ago, the build took less than a minute and the created date indicates that it was made 3 days ago.
Log:
This is what my build log looks like:
docker build -f dockerfile -t ***_seis:latest .
[+] Building 11.3s (14/14) FINISHED
=> [internal] load build definition from dockerfile 0.0s
=> => transferring dockerfile: 38B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/jupyter/base-notebook:latest 11.2s
=> [1/9] FROM docker.io/jupyter/base-notebook:latest@sha256:bc9ad73498f21ae716ba0e58d660063eae1677f6dd2bd5b669248fd0bf22dc79 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 32B 0.0s
=> CACHED [2/9] RUN apt update && apt install --no-install-recommends -y software-properties-common git zip unzip wget v 0.0s
=> CACHED [3/9] RUN conda install -c conda-forge jupyter_contrib_nbextensions jupyter_nbextensions_configurator jupyter-resource-usage 0.0s
=> CACHED [4/9] RUN mkdir /home/jovyan/environment_ymls 0.0s
=> CACHED [5/9] COPY seis.yml /home/jovyan/environment_ymls/seis.yml 0.0s
=> CACHED [6/9] RUN conda env create -f /home/jovyan/environment_ymls/seis.yml 0.0s
=> CACHED [7/9] RUN python -m ipykernel install --name seis --display-name "seis" 0.0s
=> CACHED [8/9] WORKDIR /home/jovyan/***_seis 0.0s
=> CACHED [9/9] RUN chown -R jovyan:users /home/jovyan 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:16a8e90e47c0adc1c32f28e32ad17a8bc72795c3ca9fc39e792fa383793c3bdb 0.0s
=> => naming to docker.io/library/***_seis:latest
Troubleshooting: So far, I've tried different ways of removing the image, such as
docker rmi <image_name>
docker image prune
and manually removing from docker desktop.
I made sure that all containers are deleted by using:
docker ps -a
Expected result: If successful, it should rebuild from scratch, take longer than a minute to build, and the creation date should reflect the time it was actually created.
Question:
I would like to know what is the issue here in terms of image not being deleted completely. Why does it recreate image from the past rather than just starting new build?
Thank you in advance for your help.
It's building from the cache. Since no inputs appear to the build engine to have changed, and it still has the steps from the previous build, those steps are reused, including the image creation date.
You can delete the build cache. But I'd recommend instead to run:
docker build --pull --no-cache -f dockerfile -t ***_seis:latest .
The --pull option pulls a new base image should you have an old version pulled locally. And the --no-cache option skips the caching for various steps (in particular a RUN step that may fetch the latest external dependency).
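If you do want to delete the build cache itself instead, a sketch:
# remove dangling build cache entries
docker builder prune
# or remove all build cache, not just dangling entries
docker builder prune --all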

Install pyspark + pytest in docker container

I'm trying to unit test my pyspark code using pytest but can't figure out the proper steps and method of installation. I was able to get this working locally on my Mac using this tutorial. I've tried 2 methods to accomplish this:
Try to replicate what I did on my Mac in the Dockerfile, i.e. install pyspark, apache-spark, java 8, scala, and pytest, and make sure I get the ENV paths correct.
Use an image from docker like bitnami.
I attempted (1) but could not find the right RUN command to install java properly.
For (2), is there any way in the Dockerfile for me to install bitnami separately from pytest since bitnami does not give root access?
Note:
Bitnami does not put py4j in the PYTHONPATH so I had to add this line to the docker file:
ENV PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.3-src.zip:${PYTHONPATH}"
How about building your image FROM bitnami/spark and adding pytest?
I created test_spark.py:
from pyspark.sql import SparkSession
def test1():
    spark = SparkSession.builder.getOrCreate()
    data = spark.sql("SELECT 1").collect()
    assert data == [(1,)]
and a Dockerfile:
FROM bitnami/spark:latest
RUN pip install pytest py4j
COPY test_spark.py .
CMD python -m pytest test_spark.py
Now I can build and run my container and execute the pytests:
docker build . -t pytest_spark && docker run pytest_spark
[+] Building 0.1s (8/8) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 36B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/bitnami/spark:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 35B 0.0s
=> [1/3] FROM docker.io/bitnami/spark:latest 0.0s
=> CACHED [2/3] RUN pip install pytest py4j 0.0s
=> CACHED [3/3] COPY test_spark.py . 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:33b5f945afb750aecb0a8e1b2e811eb71b2bb2e67752e1b73a2c321bcc433841 0.0s
=> => naming to docker.io/library/pytest_spark 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
08:13:35.34
08:13:35.34 Welcome to the Bitnami spark container
08:13:35.35 Subscribe to project updates by watching https://github.com/bitnami/containers
08:13:35.35 Submit issues and feature requests at https://github.com/bitnami/containers/issues
08:13:35.35
============================= test session starts ==============================
platform linux -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0
rootdir: /opt/bitnami/spark
collected 1 item
test_spark.py . [100%]
============================== 1 passed in 10.11s ==============================
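If you'd rather keep the tests out of the image, the same image (minus the COPY and CMD lines) can run them from a bind mount instead; a sketch, assuming test_spark.py is in the current directory:
$ docker run --rm -v "$PWD":/tests -w /tests pytest_spark python -m pytest test_spark.py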

Echo SSH key into the container messes docker build output

I need to use SSH keys inside a container during build stage and I do that with
RUN echo "${SSH_KEY}" > /root/.ssh/id_rsa
Where SSH_KEY is a build arg. The problem is, once this command is done, the output is messed up:
=> [internal] load build definition from Dockerfile 0.0s
[+] Building 733.0s (21/22)
=> [internal] load build definition from Dockerfile 0.0s
[+] Building 733.2s (21/22)
=> [internal] load build definition from Dockerfile 0.0s
[+] Building 733.3s (21/22)
=> [internal] load build definition from Dockerfile 0.0s
[+] Building 733.5s (21/22)
=> [internal] load build definition from Dockerfile 0.0s
[+] Building 733.6s (21/22)
=> [internal] load build definition from Dockerfile 0.0s
[+] Building 733.6s (22/22) FINISHED
Above is printed repeatedly until the build is done. Is there anything I can do about that?
Otherwise, the container building works fine.
As commenters suggested, using the --mount=type=ssh flag for the RUN git clone lines works a lot better.
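For reference, a sketch of that approach (BuildKit required; the base image and repository URL are placeholders):
# syntax=docker/dockerfile:1
FROM alpine:3.18
RUN apk add --no-cache git openssh-client \
 && ssh-keyscan github.com >> /etc/ssh/ssh_known_hosts
# the agent socket is mounted only for this step; no key material ends up in a layer
RUN --mount=type=ssh git clone git@github.com:example/repo.git /src
Build it with the key loaded into a running ssh-agent:
$ docker build --ssh default .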

Docker Go image: starting container process caused: exec: "app": executable file not found in $PATH: unknown

I have been reading a lot of similar issues on different languages, none of them are Go.
I just created a Dockerfile with the instructions I followed on official Docker hub page:
FROM golang:1.17.3
WORKDIR /go/src/app
COPY . .
RUN go get -d -v ./...
RUN go install -v ./...
CMD ["app"]
This is my folder structure:
users-service
|-> .gitignore
|-> Dockerfile
|-> go.mod
|-> main.go
|-> README.md
If anyone needs to see some code, this is what my main.go looks like:
package main

import "fmt"

func main() {
    fmt.Println("Hello, World!")
}
I ran docker build -t users-service .:
$ docker build -t users-service .
[+] Building 5.5s (11/11) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 154B 0.1s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/golang:1.17.3 3.3s
=> [auth] library/golang:pull token for registry-1.docker.io 0.0s
=> [1/5] FROM docker.io/library/golang:1.17.3@sha256:6556ce40115451e40d6afbc12658567906c9250b0fda250302dffbee9d529987 0.3s
=> [internal] load build context 0.1s
=> => transferring context: 2.05kB 0.0s
=> [2/5] WORKDIR /go/src/app 0.1s
=> [3/5] COPY . . 0.1s
=> [4/5] RUN go get -d -v ./... 0.6s
=> [5/5] RUN go install -v ./... 0.7s
=> exporting to image 0.2s
=> => exporting layers 0.1s
=> => writing image sha256:1f0e97ed123b079f80eb259dh3e34c90a48bf93e8f55629d05044fec8bfcaca6 0.0s
=> => naming to docker.io/library/users-service 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
Then I ran docker run users-service but I get that error:
$ docker run users-service
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "app": executable file not found in $PATH: unknown.
I remember I had some trouble with the GOPATH environment variable in Visual Studio Code on Windows, maybe it's related... Any suggestions?
The official Docker documentation has useful instructions for building a Go image: https://docs.docker.com/language/golang/build-images/
In summary, you need to build your Go binary and you need to configure the CMD appropriately, e.g.:
FROM golang:1.17.3
WORKDIR /app
COPY main.go .
COPY go.mod ./
RUN go build -o /my-go-app
CMD ["/my-go-app"]
Build the container:
$ docker build -t users-service .
Run the docker container:
$ docker run --rm -it users-service
Hello, World!
Your "app" executable binary should be available in your $PATH to call globally without any path prefix. Otherwise, you have to supply your full path to your executable like CMD ["/my/app"]
Also, I recommend using an ENTRYPOINT instruction. ENTRYPOINT indicates the direct path to the executable, while CMD indicates arguments supplied to the ENTRYPOINT.
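As an illustration of that split (the extra argument is a placeholder; the hello-world program above ignores its arguments):
ENTRYPOINT ["/my-go-app"]
CMD ["--default-arg"]
docker run users-service then always executes /my-go-app, and anything passed after the image name replaces the CMD arguments rather than the executable.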
Also, combining RUN instructions keeps the number of image layers minimal, so your overall image size becomes a little smaller compared to using multiple RUN instructions.

Using ARG in FROM in dockerfile

Problem statement: I need to pull a docker image (projectA or projectB) from two different URLs based on the arg provided.
ARG url=docker-local.artifactory.com/projectA # By default it's for A.
RUN echo ${url}
FROM $url
Ideal Solution:
docker build -t hello . should build docker of project A
docker build --build-arg url="docker-local.artifactory.com/projectB" -t hello . should build docker of project B.
Current Issue:
"base name ($url) should not be blank"
Using the docs for reference, if you want to use ARG before FROM, don't use anything in-between. See this section for details.
This minimal Dockerfile works:
ARG url=docker-local.artifactory.com/projectA
FROM $url
Built using this command with a build arg:
docker build -t from --build-arg url=alpine:3.9 .
[+] Building 0.1s (5/5) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 116B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:3.9 0.0s
=> CACHED [1/1] FROM docker.io/library/alpine:3.9 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:352159a49b502edb1c17a3ad142b320155bd541830000c02093b79f4058a3bd1 0.0s
=> => naming to docker.io/library/from
The docs also show an example if you want to re-use the ARG value after the first FROM command:
ARG url=docker-local.artifactory.com/projectA
FROM $url
ARG url
RUN echo $url
Using the following build file,
ARG VERSION=busybox:latest
FROM $VERSION
ARG VERSION
RUN echo $VERSION
Running with the default value
docker build -t test .
Sending build context to Docker daemon 16.38kB
Step 1/4 : ARG VERSION=busybox:latest
Step 2/4 : FROM $VERSION
latest: Pulling from library/busybox
Running with value changed during build
docker build -t test --build-arg VERSION="ubuntu:20.04" .
Sending build context to Docker daemon 16.38kB
Step 1/4 : ARG VERSION=busybox:latest
Step 2/4 : FROM $VERSION
20.04: Pulling from library/ubuntu