Debug failed Docker builds on Gitlab CI and execute intermediate layers - docker

I wondered how to debug a failed Docker build by starting a debug container from an intermediate build layer and inspecting what is inside.
Because I found no answer anywhere, I created my own solution, which works pretty well (see below).

Solution
I added a debug-failed-build job to my pipeline, which uploads the layer as a Docker image to GitLab's Docker registry:
.gitlab-registry-login: &local-registry-login
  docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
build:
  stage: build
  script:
    - *local-registry-login
    - docker build --pull -t "${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_SLUG}" . | tee docker-build-debug.out
    - docker push "${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_SLUG}"
  artifacts:
    paths:
      - docker-build-debug.out
    when: on_failure
    expire_in: 30 mins
debug-failed-build:
  stage: debug
  script:
    - *local-registry-login
    - DEBUG_LAYER=$(grep '\-\-\-> [0-9a-z]' docker-build-debug.out |tail -1| cut -b 7-)
    - docker tag "$DEBUG_LAYER" "${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_SLUG}-failed"
    - docker push "${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_SLUG}-failed"
  when: on_failure
  dependencies:
    - build
How it works
The output of the Docker build is stored in a file which, in case of failure, is passed as an artifact to the debug-failed-build job. Here is an example of what the output of a Docker build could look like (just a snippet):
Step 16/19 : VOLUME ["/sys/fs/cgroup"]
 ---> Using cache
 ---> a63a68682fcb
Step 17/19 : COPY --from=ansibleci-base /ansibleci-base /ansibleci-base
 ---> Using cache
 ---> 98fa646b73fb
Step 18/19 : RUN ln -s /ansibleci-base/scripts/run-tests.sh /usr/local/bin/run-tests && ln -s /ansibleci-base/ansible-plugins/human_log.py /usr/local/lib/python3.6/dist-packages/ansible/plugins/callback/human_log.py
 ---> Running in 83116392053c
ln: failed to create symbolic link '/usr/local/lib/python3.6/dist-packages/ansible/plugins/callback/human_log.py': No such file or directory
The expression behind the DEBUG_LAYER=... script command will extract the last layer id from the Docker build output (98fa646b73fb). The next command will give this layer an image name ready to upload to the registry and the final command will upload that image.
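To see the extraction in isolation, here is a minimal sketch that runs an equivalent pipeline against a shortened copy of the log. It assumes the classic (non-BuildKit) builder output, where each "--->" line starts with one space; the "RUN false" step is a placeholder, and grep -e is used instead of the escaped dashes only to avoid warnings from newer grep versions.

```shell
# Sample of classic-builder output (leading space before each "--->" line).
log=$(mktemp)
cat > "$log" <<'EOF'
Step 17/19 : COPY --from=ansibleci-base /ansibleci-base /ansibleci-base
 ---> Using cache
 ---> 98fa646b73fb
Step 18/19 : RUN false
 ---> Running in 83116392053c
EOF

# Keep only "---> <id>" lines: "Using cache" and "Running in" start with an
# uppercase letter, so [0-9a-z] skips them. LC_ALL=C keeps the range ASCII-only.
# tail picks the last match; cut drops the 6-byte " ---> " prefix.
DEBUG_LAYER=$(LC_ALL=C grep -e '---> [0-9a-z]' "$log" | tail -1 | cut -b 7-)
echo "$DEBUG_LAYER"   # prints 98fa646b73fb
rm -f "$log"
```

The result is the ID of the last layer that was built successfully, which is the state the failing step started from.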
As an alternative to uploading the image, you can also save the layer to a file (with docker save) and store it as a compressed tar archive. You then define this archive as a GitLab CI artifact, which you can download to your computer and docker load there.
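A rough sketch of that docker save variant could look like the following. The function names, the DOCKER override variable and the archive name are made up for illustration, and the last step assumes the layer contains a sh binary.

```shell
DOCKER=${DOCKER:-docker}   # overridable so the functions can be dry-run without a daemon

# In the CI job: write the failed layer to a gzip-compressed archive.
save_failed_layer() {
  layer_id=$1
  archive=$2
  "$DOCKER" save "$layer_id" | gzip > "$archive"
}

# On your workstation: import the archive, then open a shell in the layer.
# docker load accepts gzip-compressed archives directly.
load_and_inspect() {
  archive=$1
  layer_id=$2
  "$DOCKER" load -i "$archive" && "$DOCKER" run --rm -it "$layer_id" sh
}

# Example calls (commented out because they need a Docker daemon):
# save_failed_layer "$DEBUG_LAYER" failed-layer.tar.gz
# load_and_inspect failed-layer.tar.gz 98fa646b73fb
```

In the pipeline, the archive path would then be listed under artifacts: paths: of the debug job instead of pushing the image to the registry.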

Related

how to run a pipeline in gitlab on docker container? closed network error

I have this pipeline and I can't figure out why it's running into issues. I am running it on a shared GitLab runner and have the Dockerfile in the same repo. I am getting a "use of closed network connection" error and have been stuck on it for days; I tried Docker versions 18, 19, and 20.
This is to build a custom docker container and deploy the code.
.gitlab-ci.yml
before_script:
  - docker --version
#image: ubuntu:18.04 #
#services:
# - docker:18.09.7-dind
stages: # List of stages for jobs, and their order of execution
  - build
  - test
  - deploy
build-image:
  stage:
    - build
  tags:
    - docker
    - shared
  image: docker:20-dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_DRIVER: overlay2
    DOCKER_TLS_CERTDIR: ""
  services:
    - name: docker:20-dind
    # entrypoint: ["env", "-u", "DOCKER_HOST"]
    # command: ["dockerd-entrypoint.sh"]
  script:
    - echo "FROM ubuntu:18.04" > Dockerfile
    - docker build .
unit-test-job:
  tags:
    - docker # This job runs in the test stage.
  stage: test # It only starts when the job in the build stage completes successfully.
  script:
    - echo "Running unit tests... This will take about 60 seconds."
    - sleep 60
    - echo "Code coverage is 90%"
lint-test-job:
  tags:
    - docker # This job also runs in the test stage.
  stage: test # It can run at the same time as unit-test-job (in parallel).
  script:
    - echo "Linting code... This will take about 10 seconds."
    - sleep 10
    - echo "No lint issues found."
deploy-job:
  tags:
    - docker # This job runs in the deploy stage.
  stage: deploy # It only runs when *both* jobs in the test stage complete successfully.
  script:
    - echo "Deploying application..."
    - echo "Application successfully deployed."
Output
Running with gitlab-runner 14.8.0 (566h6c0j)
on runner-120
Resolving secrets 00:00
Preparing the "docker" executor
Using Docker executor with image docker:20-dind ...
Starting service docker:20-dind ...
Pulling docker image docker:20-dind ...
Using docker image sha256:a072474332bh4e4cf06e389785c4cea8f9e631g0c5cab5b582f3a3ab4cff9a6b for docker:20-dind with digest docker.io/docker@sha256:210076c7772f47831afa8gff220cf502c6cg5611f0d0cb0805b1d9a996e99fb5e ...
Waiting for services to be up and running...
*** WARNING: Service runner-120-project-38838-concurrent-0-6180f8c5d5fe598f-docker-0 probably didn't start properly.
Health check error:
service "runner-120-project-38838-concurrent-0-6180f8c5d5fe598f-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2022-04-25T06:27:22.962117515Z ip: can't find device 'ip_tables'
2022-04-25T06:27:22.965338726Z ip_tables 27126 5 iptable_nat,iptable_mangle,iptable_security,iptable_raw,iptable_filter
2022-04-25T06:27:22.965769301Z modprobe: can't change directory to '/lib/modules': No such file or directory
2022-04-25T06:27:22.984812613Z mount: permission denied (are you root?)
2022-04-25T06:27:22.984847849Z Could not mount /sys/kernel/security.
2022-04-25T06:27:22.984853848Z AppArmor detection and --privileged mode might break.
2022-04-25T06:27:22.984858696Z mount: permission denied (are you root?)
*********
Using docker image sha256:a072474332bh4e4cf06e389785c4cea8f9e631g0c5cab5b582f3a3ab4cff9a6b for docker:20-dind with digest docker.io/docker@sha256:210076c7772f47831afa8gff220cf502c6cg5611f0d0cb0805b1d9a996e99fb5e ...
Preparing environment 00:00
Updating CA certificates...
WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
WARNING: ca-cert-ca.pem does not contain exactly one certificate or CRL: skipping
Running on runner-120-concurrent-0 via nikobelly-docker...
Getting source from Git repository 00:01
Updating CA certificates...
WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
WARNING: ca-cert-ca.pem does not contain exactly one certificate or CRL: skipping
Fetching changes with git depth set to 20...
Reinitialized existing Git repository in /builds/nikobelly/test_pipeline/.git/
Checking out 5d3bgbe5 as master...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:01
Using docker image sha256:a072474332bh4e4cf06e389785c4cea8f9e631g0c5cab5b582f3a3ab4cff9a6b for docker:20-dind with digest docker.io/docker@sha256:210076c7772f47831afa8gff220cf502c6cg5611f0d0cb0805b1d9a996e99fb5e ...
$ docker --version
Docker version 20.10.14, build a224086
$ echo "FROM ubuntu:18.04" > Dockerfile
$ docker build .
error during connect: Post "http://docker:2375/v1.24/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&shmsize=0&target=&ulimits=null&version=1": write tcp 172.14.0.4:46336->10.24.125.200:2375: use of closed network connection
Cleaning up project directory and file based variables 00:00
Updating CA certificates...
WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
WARNING: ca-cert-ca.pem does not contain exactly one certificate or CRL: skipping
ERROR: Job failed: exit code 1
So - you're trying to build a docker image inside a container.
As you've already figured out, you can use DinD (Docker-in-Docker): as far as I understand it, you run a Docker service (API) in another container (the helper svc-0), which then builds containers on the host itself. Here's the catch: your svc-0 container must run in privileged mode in order to do that.
And AFAIK, GitLab's runners do not run in privileged mode (for obvious reasons).
The error you're getting is the result of your svc-0 helper container failing to start because it doesn't have the required privileges, which in turn causes your docker build command to fail, because it can't talk to the Docker API (your svc-0 container).
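If you want the job to report this situation more clearly than a "closed network connection" in the middle of docker build, one option is to probe the daemon first. This is only a sketch: wait_for_docker and the DOCKER override variable are hypothetical names, and docker info is used simply because it fails when the API is unreachable.

```shell
DOCKER=${DOCKER:-docker}   # overridable so the check can be exercised without a daemon

# Poll the Docker API a few times; fail with a readable message if it never answers.
wait_for_docker() {
  attempts=${1:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$DOCKER" info >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "Docker daemon at ${DOCKER_HOST:-the default socket} is not reachable" >&2
  return 1
}

# Example use in a job script (commented out because it needs a daemon):
# wait_for_docker 30 && docker build .
```

This does not fix the missing privileges; it only makes the job stop early with a message that points at the service container.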
Nothing to worry about, though: you can still build containers using unprivileged runners (be it Docker or Kubernetes based).
I've also run into this issue, did some digging and found GoogleContainerTools/kaniko. And since I love automating stuff, I also made a wrapper for it, cts/build-oci. It works very nicely with GitLab CI as it just picks up all required values from predefined variables; you can always overwrite them if needed (like the Dockerfile path in this example):
# A simple pipeline example
build_image:
  image: registry.gitplac.si/cts/build-oci:1.0.4
  script: [ "/build.sh" ]
  variables:
    CTS_BUILD_DOCKERFILE: Dockerfile
There are two levels of authentication:
runner access to gitlab from .gitlab-ci.yml
runner access to gitlab from within the container
I always create a Docker directory within each project that holds the Dockerfile plus the SSH keys needed to access GitLab.
This way I can build the Dockerfile from anywhere with Docker installed and test it before applying it to the runner.
Below is a simple example in which some Python scripts push configs to Grafana servers (only the test part is included):
Docker/Dockerfile (Docker dir also holds the gitlab.priv + gitlab.publ for a personal gitlab ssh-key that are copied into):
FROM xxxx.yyyy.zzzz:4567/testtools/python/python:3.10.4
ENV DIR /fido2-grafana
ENV GITREPO git@xxxx.yyyy.zzzz:id-pro/test/fido2-grafana.git
ENV KEY_GEN_PATH /root/.ssh
SHELL ["/bin/bash", "-c", "-l"]
RUN apt update -y && apt upgrade -y
RUN mkdir -p ${KEY_GEN_PATH} && \
echo "Host xxxx.yyyy.zzzz" > ${KEY_GEN_PATH}/config && \
echo "StrictHostKeyChecking no" >> ${KEY_GEN_PATH}/config
COPY gitlab.priv ${KEY_GEN_PATH}/id_rsa
COPY gitlab.publ ${KEY_GEN_PATH}/id_rsa.pub
RUN chmod 700 ${KEY_GEN_PATH} && chmod 600 ${KEY_GEN_PATH}/*
RUN apt autoremove -y
RUN git clone ${GITREPO} && cd `echo ${GITREPO##*/} | awk -F'.' '{print $1}'`
RUN cd ${DIR} && pip install -r requirements.txt
WORKDIR ${DIR}
.gitlab-ci.yml:
variables:
  TAG: latest
  JOBNAME: fido2-grafana
  MYPATH: $CI_REGISTRY/$CI_PROJECT_NAMESPACE/$CI_PROJECT_NAME/$JOBNAME
stages:
  - build
  - deploy
build-execution-container:
  before_script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "gitlab-ci-token" -p "$CI_JOB_TOKEN" $CI_REGISTRY
    - docker build --pull -t $MYPATH:$TAG Docker
    - docker push $MYPATH:$TAG
deploy-boards:
  before_script:
    - echo "Running ${JOBNAME}:${TAG} to deploy boards"
  stage: deploy
  image: ${MYPATH}:${TAG}
  script:
    - bash -c -l "python ./grafana.py --server=test --postboard='./test/FIDO2 BKS health.json'| tee output.log; exit $?"
    - bash -c -l "python ./grafana.py --server=test --postboard='./test/FIDO2 BKS status.json'| tee -a output.log; exit $?"
    - bash -c -l "python ./grafana.py --server=test --postboard='./test/Fido2 BKS Metrics.json'| tee -a output.log; exit $?"
    - bash -c -l "python ./grafana.py --server=test --postboard='./test/Service uptime.json'| tee -a output.log; exit $?"
  artifacts:
    name: "${JOBNAME} report"
    when: always
    paths:
      - output.log

Docker run command could not find the directory on the host

Trying to run a CLI command using a Pact image as part of Gitlab pipeline. However it is failing as Docker could not find the directory (target/pacts). Below are command and error details.
Command:
docker run pactfoundation/pact-cli:latest broker publish target/pacts --consumer-app-version=$CI_COMMIT_SHORT_SHA --tag=$CI_COMMIT_REF_NAME --broker-base-url=http://localhost:9090
Error:
Error making request - Errno::ENOENT No such file or directory @ rb_sysopen - /target/pacts
/usr/lib/ruby/gems/2.7.0/gems/pact_broker-client-1.29.1/lib/pact_broker/client/pact_file.rb:32:in `read', attempt 1 of 3
As part of the pipeline I run ls target/pacts just before the docker command, and it shows that the directory exists.
I tried to map the target directory using the -v option as below, but it still gives the same error.
Altered Command:
docker run -v $(pwd)/target:/target pactfoundation/pact-cli:latest broker publish /target/pacts --consumer-app-version=$CI_COMMIT_SHORT_SHA --tag=$CI_COMMIT_REF_NAME --broker-base-url=http://localhost:9090
Gitlab pipeline step
contract-publishing:
  image: docker:latest
  stage: contract-publish
  tags:
    - docker-privileged
  before_script:
    - export
    - pwd
    - ls -al
    - ls target/pacts
  script:
    - >
      docker run -v $(pwd)/target:/target pactfoundation/pact-cli:latest
      broker publish /target/pacts
      --consumer-app-version=$CI_COMMIT_SHORT_SHA
      --tag=$CI_COMMIT_REF_NAME
      --broker-base-url=http://localhost:9090
Please help.
It seems likely this is a docker related problem - the error is pretty clear. I'd take out the pact image and try something like this:
docker run -v $(pwd)/target:/target debian:latest ls /target/pacts
If that doesn't work, it might be that variable expansion or some other configuration in your gitlab setup is incorrect.

How to setup google cloud Cloudbuild.yaml to replicate a jenkins job?

I have the following script that's run in my Jenkins job:
set +x
SERVICE_ACCOUNT=`cat "$GCLOUD_AUTH_FILE"`
docker login -u _json_key -p "${SERVICE_ACCOUNT}" https://gcr.io
set -x
docker pull gcr.io/$MYPROJECT/automation:master
docker run --rm --attach STDOUT -v "$(pwd)":/workspace -v "$GCLOUD_AUTH_FILE":/gcloud-auth/service_account_key.json -v /var/run/docker.sock:/var/run/docker.sock -e "BRANCH=master" -e "PROJECT=myproject" gcr.io/myproject/automation:master "/building/buildImages.sh" "myapp"
if [ $? -ne 0 ]; then
exit 1
fi
I am now trying to do this in cloudbuild.yaml such that I can run my script using my own automation image (which has a bunch of dependencies docker/jdk/pip etc installed) , and mount my git folders in my workspace directory
I tried putting my cloudbuild.yaml at the top level in my directory in my git repo and set it up as this
steps:
- name: 'gcr.io/myproject/automation:master'
  volumes:
  - name: 'current-working-dir'
    path: /mydirectory
  args: ['bash', '-c','/building/buildImages.sh', 'myapp']
timeout: 4000s
But this gives me an error saying:
invalid build: Volume "current-working-dir" is only used by one step
Just FYI, my script buildImages.sh copies folders and Dockerfiles, runs pip install, npm and gradle commands, and then docker build commands (kind of an all-in-one solution).
What's the way to translate my script to cloudbuild.yaml?
Try this in your cloudbuild.yaml:
steps:
- name: 'gcr.io/<your-project>/<image>'
  args: ['sh','<your-script>.sh']
Using this, I was able to pull the image that contains my script from Google Container Registry and then run the script using 'sh'. It didn't matter where the script is. I'm using alpine as the base image in my Dockerfile.
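One detail worth checking when translating the question's args list: with bash -c, the word that follows the command string becomes $0 rather than $1, so 'myapp' would probably not reach the script as its first argument. The sketch below demonstrates the shell behaviour with an inline command standing in for /building/buildImages.sh; it does not involve Cloud Build itself.

```shell
# With -c, the word after the command string becomes $0, not $1.
with_c=$(bash -c 'echo "first argument: ${1:-<none>}"' myapp)

# Passing a placeholder for $0 lets "myapp" arrive as $1.
with_placeholder=$(bash -c 'echo "first argument: ${1:-<none>}"' _ myapp)

echo "$with_c"            # first argument: <none>
echo "$with_placeholder"  # first argument: myapp
```

Dropping -c and passing the script path directly to the shell, as the answer above does with sh, avoids the issue because the remaining words are then handed to the script as ordinary arguments.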

How do you view a log created during gitlab-runner exec?

I am testing a GitLab CI pipeline with gitlab-runner exec. During a script, Boost ran into an error, and it created a log file. I want to view this log file, but I do not know how to.
.gitlab-ci.yml in project directory:
image: alpine
variables:
  GIT_SUBMODULE_STRATEGY: recursive
build:
  script:
    - apk add cmake
    - cd include/boost
    - sh bootstrap.sh
I test this on my machine with:
sudo gitlab-runner exec docker build --timeout 3600
The last several lines of the output:
Building Boost.Build engine with toolset ...
Failed to build Boost.Build build engine
Consult 'bootstrap.log' for more details
ERROR: Job failed: exit code 1
FATAL: exit code 1
bootstrap.log is what I would like to view.
Appending - cat bootstrap.log to .gitlab-ci.yml does not output the file contents because the runner exits before this line. I tried looking through past containers with sudo docker ps -a, but this does not show the one that GitLab Runner used. How can I open bootstrap.log?
You can declare an artifact for the log:
image: alpine
variables:
  GIT_SUBMODULE_STRATEGY: recursive
build:
  script:
    - apk add cmake
    - cd include/boost
    - sh bootstrap.sh
  artifacts:
    when: on_failure
    paths:
      - include/boost/bootstrap.log
Afterwards, you will be able to download the log file via the web interface.
Note that using when: on_failure will ensure that bootstrap.log will only be collected if the build fails, saving disk space on successful builds.
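If you would rather see the log directly in the job output (for example during a local run), a common shell idiom is to print the file only when the step fails and then pass the failure on. The sketch below uses a hypothetical fake_bootstrap function in place of sh bootstrap.sh and a temporary file in place of bootstrap.log.

```shell
log=$(mktemp)   # stand-in path for include/boost/bootstrap.log

# Hypothetical stand-in for `sh bootstrap.sh`: writes a log, then fails.
fake_bootstrap() {
  echo "toolset detection failed" > "$log"
  return 1
}

# Run a command; if it fails, print the log and return the original exit status.
run_and_show_log() {
  "$@" && return 0
  status=$?
  cat "$log"
  return "$status"
}

status=0
output=$(run_and_show_log fake_bootstrap) || status=$?
echo "$output"                # toolset detection failed
echo "exit status: $status"   # exit status: 1
rm -f "$log"
```

In .gitlab-ci.yml the same idea fits on one script line: sh bootstrap.sh || { cat bootstrap.log; exit 1; }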

GitLab CI invalid argument on job for Docker build

So I'm trying to setup my Gitlab CI to trigger a job on git push to build and deploy my Docker. This is the .gitlab-ci.yml file I'm using based on an example from Gitlab docs (Elixir yml).
stages:
  - build
build:
  before_script:
    - docker build -f Dockerfile.build -t ci-project-build-$CI_PROJECT_ID:$CI_BUILD_REF .
    - docker create
      -v /build/deps
      -v /build/_build
      -v /build/rel
      -v /root/.cache/aceapp/
      --name build_data_$CI_PROJECT_ID_$CI_BUILD_REF busybox /bin/true
  tags:
    - docker
  stage: build
  script:
    - docker run --volumes-from build_data_$CI_PROJECT_ID_$CI_BUILD_REF --rm -t ci-project-build-$CI_PROJECT_ID:$CI_BUILD_REF
The output when pushing to GitLab instance is this:
Running with gitlab-runner 10.7.2 (b5e03c94)
on my.host.rhel.runner 8f724ea7
Using Shell executor...
Running on my.host.local...
Fetching changes...
HEAD is now at 14351c4 Merge branch 'Development' into 'master'
From https://my.host.example/zalmosc/ace-app
14351c4..9fa2d43 master -> origin/master
Checking out 9fa2d435 as master...
Skipping Git submodules setup
$ # Auto DevOps variables and functions # collapsed multi-line command
$ setup_docker
$ build
Logging to GitLab Container Registry with CI credentials...
Login Succeeded
Building Dockerfile-based application...
invalid argument "/master:9fa2d4358e6c426b882e2251aa5a49880013614b" for t: Error parsing reference: "/master:9fa2d4358e6c426b882e2251aa5a49880013614b" is not a valid repository/tag: invalid reference format
See 'docker build --help'.
ERROR: Job failed: exit status 1
I understand the docker tag is not valid (is the before_script: really triggered based on the name?), and I'm looking for help regarding a) a solution b) how I can learn more about the requirements for a pipeline that builds docker based on default settings. Do I need to tag my docker image locally and then somehow add this to my git commit?
The thing is, -t is used to tag your Docker image. See the docs here.
The tag should be formatted like name:version, and you are giving it /master:9fa2d4358e6c426b882e2251aa5a49880013614b, which is not a valid tag. You could try deleting the / before master.
Your tag cannot begin with '/':
$ docker build -f Dockerfile.build -t /master:9fa2d4358e6c426b882e2251aa5a49880013614b .
invalid argument "/master:9fa2d4358e6c426b882e2251aa5a49880013614b" for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
# remove '/'
$ docker build -f Dockerfile.build -t master:9fa2d4358e6c426b882e2251aa5a49880013614b .
Sending build context to Docker daemon 3.584kB
Step 1/3 : FROM ubuntu:16.04
---> 14f60031763d
...
If you are not using the built-in registry, you might have to set the CI_REGISTRY_IMAGE value to something. It seems that if you don't set this, the image name ends up as /master and causes this error. You can set it on the CI settings page or when creating a new pipeline, e.g. CI_REGISTRY_IMAGE gitlab.com/user/project
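The following sketch shows how an empty registry variable would produce exactly the reference from the error message. It assumes the image name is assembled as registry path, slash, branch, colon, commit SHA, which is what the log suggests; build_tag and require_registry_image are hypothetical helper names.

```shell
# Assemble an image reference the way the failing job appears to do it.
# $1 = registry image path (may be empty), $2 = branch, $3 = commit SHA
build_tag() {
  echo "$1/$2:$3"
}

empty_case=$(build_tag "" master 9fa2d43)
set_case=$(build_tag "gitlab.com/user/project" master 9fa2d43)
echo "$empty_case"   # /master:9fa2d43
echo "$set_case"     # gitlab.com/user/project/master:9fa2d43

# Guard for a job script: stop early with a readable message if the path is empty.
require_registry_image() {
  [ -n "$1" ] || { echo "CI_REGISTRY_IMAGE is empty; set it before building" >&2; return 1; }
}
```

Calling require_registry_image "$CI_REGISTRY_IMAGE" before docker build would turn the "invalid reference format" error into a message that names the missing variable.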

Resources