I am building and pushing a Docker image from GitHub Actions to AWS. Both locally and on GitHub, I use the same command to build the image; however, it results in different image digests.
Command I used:
docker buildx build \
  --build-arg DOCKER_REGISTRY=$ECR_REGISTRY \
  --platform=linux/amd64 \
  -t "$ECR_REGISTRY/$ECR_REPO_NAME:$LATEST_TAG" \
  -t "$ECR_REGISTRY/$ECR_REPO_NAME:$ECR_TAG" \
  -f Dockerfile .
My Dockerfile:
FROM node:16
RUN apt-get update \
&& apt-get install -y chromium \
fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
--no-install-recommends
RUN mkdir -p /usr/src/app
RUN chown -R node: /usr/src/app
USER node
WORKDIR /usr/src/app
COPY --chown=node:node package.json package-lock.json /usr/src/app/
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
ENV PUPPETEER_EXECUTABLE_PATH /usr/bin/chromium
RUN npm config set registry https://registry.npmjs.org/
RUN npm install
# Copy crawl scripts
RUN mkdir -p /usr/src/app/scripts
RUN mkdir -p /usr/src/app/collected
COPY --chown=node:node lib/ /usr/src/app/lib/
COPY --chown=node:node scripts/ /usr/src/app/scripts/
You're basing your image on the node:16 image, which is a moving target. Depending on when you last pulled that image, you might actually be using node:16.8.0 or node:16.9.1.
Depending on which base image you start with, you're going to end up with a different final image digest.
If you want to make absolutely sure you're building the same image in both environments, build using the digest rather than a tag, e.g.:
FROM node@sha256:0c672d547405fe64808ea28b49c5772b1026f81b3b716ff44c10c96abf177d6a
...
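To find the digest a given tag currently points to, you can ask the registry; a quick sketch using standard docker commands (node:16 here stands in for whichever tag you're pinning):

# Show the manifest digest behind a tag without pulling the image
docker buildx imagetools inspect node:16
# Or list the digests of images you've already pulled
docker images --digests node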
Docker image builds from docker build and buildx are not reproducible (at least for now). Everything pushed to a registry is content addressable, referenced by the digest of the content itself. The image digest comes from a JSON manifest that contains digests of various components, similar to a Merkle tree. If any of those components change, the top-level digest also changes, and the components include things that change between builds. Among them are:
the image creation time, and the time for each history step; there is a timestamp directly in the image config
file timestamps, which will be different if the files are created at the time of the image build
the base image, if you don't pin that base image to a digest
results of RUN steps, when those steps pull from external resources that give you the "latest" of some resource; e.g. apt-get install some_pkg will change every time the package or one of its dependencies gets updated (see the sketch after this list)
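One way to reduce that last source of drift is to pin package versions in the RUN step; a minimal sketch (some_pkg=1.2.3 is a placeholder version, and this narrows but does not eliminate the differences, since timestamps still change):

# Pin the package version so apt resolves the same package on every build
RUN apt-get update \
 && apt-get install -y --no-install-recommends some_pkg=1.2.3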
I've been making efforts to modify an image after the fact, back-dating timestamps, proxying all network requests, etc., but this is still pre-alpha and depends on debugging each image to find what changes between builds. For that debugging I've been adding regctl manifest diff, regctl blob diff-config, and regctl blob diff-layer commands to regctl. And then to change the image, there's an experimental regctl image mod command.
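As a usage sketch, comparing two pushed builds might look like the following (the repository and tags are placeholders; I'm assuming the diff subcommand takes two image references):

# Compare the manifests of two builds to see which components differ
regctl manifest diff registry.example.com/app:build1 registry.example.com/app:build2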
Related
I created a multi-stage Dockerfile where, in the base image, I prepare an Anaconda environment with the required packages, and in the final image I copy the Anaconda environment and install the local package.
I noticed that on every CI build and push, all of the layers are recomputed and pushed, including the one big Anaconda layer.
Here is how I build it:
DOCKER_BUILDKIT=1 docker build -t my_image:240beac6 \
-f docker/dockerfiles/Dockerfile . \
--build-arg BASE_IMAGE=base_image:240beac64 --build-arg BUILDKIT_INLINE_CACHE=1 \
--cache-from my_image:latest
docker push my_image:240beac6
ARG BASE_IMAGE
FROM $BASE_IMAGE as base
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
# enable conda
ENV PATH=/root/miniconda3/bin/:${PATH}
COPY --from=base /opt/fast_align/* /usr/bin/
COPY --from=base /usr/local/bin/yq /usr/local/bin/yq
COPY --from=base /root/miniconda3 /root/miniconda3
COPY . /opt/my_package
# RUN pip install --no-deps /opt/my_package
If I leave the last RUN command commented out, Docker only rebuilds the last COPY layer (if some file in the context changed).
However, if I try to install the package, it invalidates everything.
Is it because I change /root/miniconda3 with the pip install?
If so, I am surprised by that; I was hoping the later RUN commands couldn't mess up the earlier commands.
Is there a way to copy the conda environment from the base image, install the local package in a separate command, and still benefit from the caching?
Any help is much appreciated.
One solution, albeit a bit hacky, would be to replace the last RUN with CMD and install the package on start of the container. It would be almost instant, as the requirements are already installed in the base image.
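A minimal sketch of that workaround, assuming the container's real command is python -m my_package (adjust to whatever your image actually runs):

# Defer the install to container start; dependencies are already baked into the base layers
CMD pip install --no-deps /opt/my_package && exec python -m my_package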
Please see the command below:
docker build -t iansbasecontainer:v1 -f DotnetDebug.Dockerfile .
It creates one image.
DotnetDebug.Dockerfile looks like this:
FROM microsoft/aspnetcore:2.0 AS base
# Install the SSHD server
RUN apt-get update \
&& apt-get install -y --no-install-recommends openssh-server \
&& mkdir -p /run/sshd \
&& echo "root:Docker!" | chpasswd
# Copy settings files. See elsewhere to find them.
COPY sshd_config /etc/ssh/sshd_config
COPY authorized_keys /root/.ssh/authorized_keys
# Install Visual Studio Remote Debugger
RUN apt-get install -y zip unzip
RUN curl -sSL https://aka.ms/getvsdbgsh | bash /dev/stdin -v latest -l ~/vsdbg
EXPOSE 2222
I then run this command:
docker build -t iansimageusesbasecontainer:v1 -f DebugASP.Dockerfile .
However, two images appear.
DebugASP.Dockerfile looks like this:
FROM iansbasecontainer:v1 AS base
WORKDIR /app
MAINTAINER Vladimir Vladimir@akopyan.me
FROM microsoft/aspnetcore-build:2.0 AS build
WORKDIR /src
COPY ./DebugSample .
RUN dotnet restore
FROM build AS publish
RUN dotnet publish -c Debug -o /app
FROM base AS final
COPY --from=publish /app /app
COPY ./StartSSHAndApp.sh /app
EXPOSE 5000
CMD /app/StartSSHAndApp.sh
#If you wish to only have SSH running and start
#your service when you start debugging
#then use just the SSH server, you don't need the script
#CMD ["/usr/sbin/sshd", "-D"]
Why do two images appear? Please note I am relatively new to Docker so this may be a simple answer. I have spent the last few hours Googling it.
Also, why are the repository and tag set to <none>?
Why do two images appear?
As mentioned here:
When using multi-stage builds, each stage produces a new image. That image is stored in the local image cache and will be used on subsequent builds (as part of the caching mechanism). You can run each build-stage (and/or tag the stage, if desired).
Read more about multi-stage builds here.
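For instance, you can build (and tag) a single stage of the multi-stage Dockerfile above with --target; the image tag here is made up:

docker build --target build -t debug-sample:build-stage -f DebugASP.Dockerfile .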
Docker produces intermediate (aka <none>:<none>) images for each layer, which are later used for the final image. You can actually see them if you execute the docker images -a command.
But what you see is called a dangling image. It happens because some intermediate image is no longer used by the final image. In the case of multi-stage builds, the images for previous stages are not used in the final image, so they become dangling.
Dangling images are useless and take up space, so it's recommended to regularly get rid of them (this is called pruning). You can do that with the command:
docker image prune
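For example, to preview what would be removed (dangling=true is the same filter prune applies by default):

# List dangling images first
docker images -f dangling=true
# Then remove them, skipping the confirmation prompt
docker image prune -f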
I did a basic search in the community and could not find a suitable answer, so I am asking here. Sorry if it was asked earlier.
Basically, I am working on a project where we change code at regular intervals, so we need to build the Docker image every time, and because of that we need to install the dependencies from requirements.txt from scratch, which takes around 10 minutes every time.
How can I make changes directly to a Docker image, and how can I configure the entrypoint (in the Dockerfile) so it reflects changes in a pre-built Docker image?
You don't edit an image once it's been built. You always run docker build from the start; it always runs in a clean environment.
The flip side of this is that Docker caches built images. If you had image 01234567, ran RUN pip install -r requirements.txt, and got image 2468ace0 out, then the next time you run docker build it will see the same source image and the same command, skip doing the work, and jump directly to the output image. COPYing or ADDing files that have changed invalidates the cache for all later steps.
So the standard pattern is:
# arbitrary choice of language
FROM node:10
WORKDIR /app
# Copy in _only_ the requirements and package lock files
COPY package.json yarn.lock ./
# Install dependencies (once)
RUN yarn install
# Copy in the rest of the application and build it
COPY src/ src/
RUN yarn build
# Standard application metadata
EXPOSE 3000
CMD ["yarn", "start"]
If you only change something in your src tree, docker build will use the cache through the RUN yarn install step and only start doing work at the COPY src/ src/ step, since the package.json and yarn.lock files haven't changed.
In my case I was facing the same issue: after minor changes, I was building the image again and again.
My old Dockerfile:
FROM python:3.8.0
WORKDIR /app
# Install system libraries
RUN apt-get update && \
apt-get install -y git && \
apt-get install -y gcc
# Install project dependencies
COPY ./requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt --use-deprecated=legacy-resolver
# Don't use terminal buffering, print all to stdout / err right away
ENV PYTHONUNBUFFERED 1
COPY . .
So what I did was create a base Dockerfile first, like this (avoiding the last line, i.e. I did not copy my code):
FROM python:3.8.0
WORKDIR /app
# Install system libraries
RUN apt-get update && \
apt-get install -y git && \
apt-get install -y gcc
# Install project dependencies
COPY ./requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt --use-deprecated=legacy-resolver
# Don't use terminal buffering, print all to stdout / err right away
ENV PYTHONUNBUFFERED 1
and then built this image using:
docker build -t my_base_img:latest -f base_dockerfile .
then the final Dockerfile:
FROM my_base_img:latest
WORKDIR /app
COPY . .
At first I was not able to bring the container up from this image because of issues with my copied Python code, so I edited the code inside the container to fix those issues; by this means I avoided the task of building images again and again.
When my code got fixed, I copied the changes from the container back to my code base, and then finally I created the final image.
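Copying the fixed code back out can be done with docker cp; a short sketch (the container id is a placeholder, and /app is the WORKDIR from the Dockerfile above):

# Copy the corrected application code from the container back to the host
docker cp <container-id>:/app/. .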
There are 4 steps:
Start the image you want to edit (e.g. docker run ...)
Shell into the running container with docker exec -it <container-id> bash (you can get the container id with docker ps)
Make any modifications (install new things, make a directory or file)
In a new terminal tab/window, run docker commit c7e6409a22bf my-new-image (substituting in the container id of the container you want to save)
An example
# Run an existing image
docker run -dt existing_image
# See that it's running
docker ps
# CONTAINER ID IMAGE COMMAND CREATED STATUS
# c7e6409a22bf existing-image "R" 6 minutes ago Up 6 minutes
# Shell into it
docker exec -it c7e6409a22bf bash
# Make a new directory for demonstration purposes
# (note that this is inside the existing image)
mkdir NEWDIRECTORY
# Open another terminal tab/window, and save the running container you modified
docker commit c7e6409a22bf my-new-image
# Inspect to ensure it saved correctly
docker image ls
# REPOSITORY TAG IMAGE ID CREATED SIZE
# existing-image latest a7dde5d84fe5 7 minutes ago 888MB
# my-new-image latest d57fd15d5a95 2 minutes ago 888MB
Let's say I decided to make the following multi-stage build:
FROM node:8.6-alpine AS build1
#some other commands
FROM node:8.5-alpine AS build2
# yet more commands
Definitely there are some layers which will be common between build1 and build2. Will Docker duplicate those layers, or will it somehow reference the already-built layers?
I believe ordinary docker build layer caching will apply, though the other answers here go into more detail. Consider this example:
FROM ubuntu:18.04 AS first
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install -y --no-install-recommends \
python3
RUN echo first
FROM ubuntu:18.04 AS second
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install -y --no-install-recommends \
python3
RUN echo second
The rules are that you must start from the exact same base image (in your example, nothing is shared) and you must repeat the exact same commands (or COPY the exact same file content); as soon as you stray from this path, nothing further is shared, including any later identical commands.
You can use the AS alias in later FROM directives, so if you really want to share some base layers it's better to do it explicitly:
FROM ubuntu:18.04 AS base
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install -y --no-install-recommends \
python3
FROM base AS first
RUN echo first
FROM base AS second
RUN echo second
The more common case with a multi-stage build is to have very different "build" and "runtime" images, so this often doesn't apply.
FROM golang:1.11 AS build
WORKDIR /go/src/github.com/me/myapp
COPY ./ ./
RUN go install .
FROM alpine
COPY --from=build /go/bin/myapp /usr/bin
CMD ["myapp"]
Here is the result of building your Dockerfile on a fresh machine:
# docker build -t test:1 .
Sending build context to Docker daemon 2.048kB
Step 1/2 : FROM node:8.6-alpine AS build1
8.6-alpine: Pulling from library/node
88286f41530e: Pull complete
d0e8a23136b3: Pull complete
5ad5b12a980e: Pull complete
Digest: sha256:60cd58a7a2bd9fec161f53f8886e451f92db06b91f4f72d9188eeea040d195eb
Status: Downloaded newer image for node:8.6-alpine
---> b7e15c83cdaf
Step 2/2 : FROM node:8.5-alpine AS build2
8.5-alpine: Pulling from library/node
88286f41530e: Already exists
aa0be12c5610: Pull complete
719d346e6de2: Pull complete
Digest: sha256:945cf56668d3e58a3b045291564963ccde29a68da9c1483e19d8a0b06749db06
Status: Downloaded newer image for node:8.5-alpine
---> 7a779c246a41
Successfully built 7a779c246a41
Successfully tagged test:1
From the output you can see that layer 88286f41530e was reused (Already exists).
And docker images output:
REPOSITORY TAG IMAGE ID CREATED SIZE
node 8.6-alpine b7e15c83cdaf 13 months ago 67.2MB
node 8.5-alpine 7a779c246a41 14 months ago 67MB
So the base image of the first stage in the multi-stage build is also kept in the cache.
And from this post:
Since Docker v1.10, generally, images and layers are no longer synonymous.
Instead, an image directly references one or more layers that eventually contribute to a derived container's filesystem.
So, since images are reused, the layers are surely reused.
Of course, this depends on the base images you use in the multi-stage build; they need to have something in common to reuse.
Anyway, I think multi-stage builds just add some tricks on top of the traditional build, but the layer reuse mechanism is the same.
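If you want to verify which layers two images actually share, you can compare their layer digests; a quick sketch:

# Print each image's layer digests; identical entries are stored only once
docker inspect --format '{{json .RootFS.Layers}}' node:8.6-alpine
docker inspect --format '{{json .RootFS.Layers}}' node:8.5-alpine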
I have a Go application that I build into a binary and distribute as a Docker image.
Currently, I'm using ubuntu as my base image, but this causes an issue where if a user tries to use a Timezone other than UTC or their local timezone, they get an error stating:
pod error: panic: open /usr/local/go/lib/time/zoneinfo.zip: no such file or directory
This error is caused because the LoadLocation function in Go's time package requires that file.
I can think of two ways to fix this issue:
Continue using the ubuntu base image, but in my Dockerfile add the command: RUN apt-get install -y tzdata
Use one of Golang's base images, e.g. golang:1.7.5-alpine.
What would be the recommended way? I'm not sure if I need to or should be using a Golang image since this is the container where the pre-built binary runs. My understanding is that Golang images are good for building the binary in the first place.
I prefer to use a multi-stage build. In the 1st stage you use a special golang build container to install all dependencies and build the application. In the 2nd stage you copy the binary file to an empty alpine container. This allows having all the required tooling and a minimal Docker image simultaneously (in my case 6MB instead of 280MB).
Example of Dockerfile:
# build stage
FROM golang:1.8
ADD . /src
RUN set -x && \
cd /src && \
go get -t -v github.com/lisitsky/go-site-search-string && \
CGO_ENABLED=0 GOOS=linux go build -a -o goapp
# final stage
FROM alpine
WORKDIR /app
COPY --from=0 /src/goapp /app/
ENTRYPOINT /app/goapp
EXPOSE 8080
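Assuming this file is saved as Dockerfile at the project root, building and running it looks something like:

# Build the two-stage image, then run it with the exposed port published
docker build -t goapp .
docker run -p 8080:8080 goapp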
Since not all OSes have timezone data installed, this is what I did to make it work:
ADD https://github.com/golang/go/raw/master/lib/time/zoneinfo.zip /usr/local/go/lib/time/zoneinfo.zip
The full example of a multi-stage Docker image for adding timezone support is here.
This is more of a vote, but apt-get is what we (my company's tech group) do in situations like this. It gives us complete control over the hierarchy of images, though this assumes you may have future images based on this one.
You can use the system's tzdata. Or you can copy $GOROOT/lib/time/zoneinfo.zip into your image, which is a trimmed version of the system one.
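A minimal sketch of the copy approach, assuming a multi-stage layout like the one above (Go's time package also honors the ZONEINFO environment variable):

# Copy the trimmed zoneinfo database from the build image into the runtime image
COPY --from=build /usr/local/go/lib/time/zoneinfo.zip /usr/local/go/lib/time/zoneinfo.zip
ENV ZONEINFO=/usr/local/go/lib/time/zoneinfo.zip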