Dockerfiles and COPY/ADD - docker

I've been trying to build an application called Crowd for a client but my Docker experience isn't great.
This is the DockerHub page
Now, if I run docker run -d -p 8095:8095 --name crowd blacklabelops/crowd
It runs and I have no problem, but if copy and paste their Dockerfile
I get an error telling me that splash.xml doesn't exist. My understanding is that the ADD command in Dockerfile copies files from the source into the container. But obviously I don't have those files because I'm just running the Dockerfile.
So if the docker run command is running based on that Dockerfile, how would the Dockerfile work as a standalone? Please help me understand. Many thanks.

Dockerfile contains instructions on how to build an image. Note that in many cases, these instructions include copying (adding) files into the built image.
docker run runs a complete, built image.
If you wish to build from the Dockerfile source, you should probably clone/download the entire repository, and then docker build locally.

Related

Copy files to a Docker image using entrypoint instead of dockerfile for GitHub actions

I created a Dockerfile for my website development (Jekyll in this case, but I do not think that matters much).
In case this information is helpful, I code locally using Visual Studio Code and the Remote Containers extension. This extension allows me to manage my code locally while keeping it in sync with the container.
To publish my website, I run a GitHub Action that creates a container from my Dockerfile and then runs all the build code from an entrypoint.sh file. Here is the pertinent code from my Dockerfile:
FROM ruby:alpine as jekyll
ENV env_workspace_directory=$workspace_directory
... more irrelevant code ...
RUN echo "#################################################"
RUN echo "Copy the GitHub repo to the Docker container"
RUN echo "COPY . ${env_workspace_directory}"
COPY . ${env_workspace_directory}
RUN echo "#################################################"
RUN echo "Run the entrypoint "
ENTRYPOINT ["/entrypoint.sh"]
Because I am using the Remote Containers VS Code extension, I do not want the Dockerfile to contain the COPY . ${env_workspace_directory} code. Instead, I only want that code to run when used as a GitHub Action.
So I guess I have two questions, with the first being ideal:
Is it possible to write like-type code that will copy the contents of the currently open GitHub branch (or at least the main branch), including all files and subfolders into the Docker container using the entrypoint.sh file instead? If so, what would that entrypoint.sh code look like?
Is it possible to leave the COPY command in the Dockerfile and make it conditional? For example "Only run the COPY command if running from a GitHub Action"?
For #1 above, I reviewed this Stack Overflow article that says you can use the docker cp command, but I am unsure if that is (a) correct and (b) how to be sure I am using the $workspace_directory.
I am very new to Dockerfiles, writing shell commands, and GitHub Actions, so apologies if this question is an easy one or if more clarifications are required.
Here is the Development repo if that is useful.
A Docker volume mount happens after the image is built but before the main container process is run. So if you do something like
docker run -v "$PWD/site:/site" your-image
then the entrypoint.sh script will see the host content in the container's /site directory, even if something different had been COPYed in the Dockerfile.
There are no conditionals or flow control in Dockerfiles, beyond shell syntax within individual RUN instructions. You could in principle access a Git repository in your container process, but managing the repository location, ssh credentials, branches, uncommitted files, etc. can get complex.
Depending on how you're using the image, I could suggest two approaches here.
If you have a deploy-time action that uses this image in its entirety to build the site, then just leave your Dockerfile as-is. In development use a bind mount to inject your host content; don't especially worry about skipping the image COPY here.
Another approach is to build the image containing the Jekyll tool, but to treat the site itself as data. In that case you'd always run the image with a docker run -v or Compose volumes: option to inject the data. In the Dockerfile you might create an empty directory to be safe
RUN mkdir "${env_workspace_directory}" # consider a fixed path
and in your entrypoint script you can verify the site exists
if [ ! -f "$env_workspace_directory/_site.yml" ]; then
cat >&2 <<EOF
There does not seem to be a Jekyll site in $env_workspace_directory.
Please re-run this container with the site mounted.
EOF
exit 1
fi

Combining multiple images in docker-compose [duplicate]

I have a few Dockerfiles right now.
One is for Cassandra 3.5, and it is FROM cassandra:3.5
I also have a Dockerfile for Kafka, but t is quite a bit more complex. It is FROM java:openjdk-8-fre and it runs a long command to install Kafka and Zookeeper.
Finally, I have an application written in Scala that uses SBT.
For that Dockerfile, it is FROM broadinstitute/scala-baseimage, which gets me Java 8, Scala 2.11.7, and STB 0.13.9, which are what I need.
Perhaps, I don't understand how Docker works, but my Scala program has Cassandra and Kafka as dependencies and for development purposes, I want others to be able to simply clone my repo with the Dockerfile and then be able to build it with Cassandra, Kafka, Scala, Java and SBT all baked in so that they can just compile the source. I'm having a lot of issues with this though.
How do I combine these Dockerfiles? How do I simply make an environment with those things baked in?
You can, with the multi-stage builds feature introduced in Docker 1.17
Take a look at this:
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
Then build the image normally:
docker build -t alexellis2/href-counter:latest
From : https://docs.docker.com/develop/develop-images/multistage-build/
The end result is the same tiny production image as before, with a significant reduction in complexity. You don’t need to create any intermediate images and you don’t need to extract any artifacts to your local system at all.
How does it work? The second FROM instruction starts a new build stage with the alpine:latest image as its base. The COPY --from=0 line copies just the built artifact from the previous stage into this new stage. The Go SDK and any intermediate artifacts are left behind, and not saved in the final image.
You can't combine dockerfiles as conflicts may occur. What you want to do is to create a new dockerfile or build a custom image.
TL;DR;
If your current development container contains all the tools you need and works, then save it as an image and upon it to a repo and create a dockerfile to pull from that image off that repo.
Details:
Building a custom image is by far easier than creating a dockerfile using a public image as you can store whatever hacks and mods into the image. To do so, start a blank container with a basic Linux image (or broadinstitute/scala-baseimage), install whatever tools you need and configure them until everything works correctly, then save it (the container) as an image. Create a new container off this image and test to see if you can build your code on top of it via docker-compose (or however you want to do/build it). If it works, than you have a working base image that you can upload to a repo so others can pull it.
To build a dockerfile with a public image, you will need to put all hacks, mods and setup on the dockerfile itself. That is, you will need to place every command line that you used into a text file and reduce whatever hacks, mods and setup into command lines. At the end, your dockerfile will create an image automatically and you don't need to store this image into a repo and all you need to do is to give others the dockerfile and they can spin the image up at their own docker.
Note that once you have a working dockerfile, you can tweak it easily as it will create a new image every time you use the dockerfile. With a custom image, you may run into issues where you need to rebuild the image due to conflicts. For example, all of your tools work with openjdk until you install one that doesn't work. The fix may involve uninstalling openjdk and use the oracle one, but all configuration you did for all the tools that you have installed broke.
The following answer applies to docker 1.7 and above:
I would prefer to use --from=NAME and from image as NAME
Why?
You can use --from=0 and above but this might get little hard to manage when you have many docker stages in dockerfile.
sample example:
FROM golang:1.7.3 as backend
WORKDIR /backend
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN #install some stuff, compile assets....
FROM golang:1.7.3 as assets
WORKDIR /assets
RUN ./getassets.sh
FROM nodejs:latest as frontend
RUN npm install
WORKDIR /assets
COPY --from=assets /asets .
CMD ["./app"]
FROM alpine:latest as mergedassets
WORKDIR /root/
COPY --from=frontend . /
COPY --from=backend ./backend .
CMD ["./app"]
Note: Managing dockerfile properly will help to build a docker image much faster. Internally docker usings docker layer caching to help with this process, incase the image have to be rebuilt.
Yes, you can roll a whole lot of software into a single Docker image (GitLab does this, with one image that includes Postgres and everything else), but generalhenry is right - that's not the typical way to use Docker.
As you say, Cassandra and Kafka are dependencies for your Scala app, they're not part of the app, so they don't all belong in the same image.
Having to orchestrate many containers with Docker Compose adds an extra admin layer, but it gives you much more flexibility:
your containers can have different lifespans, so when you have a new version of your app to deploy, you only need to run a new app container, you can leave the dependencies running;
you can use the same app image in any environment, using different configurations for your dependencies - e.g. in dev you can run a basic Kafka container and in prod have it clustered on many nodes, your app container is the same;
your dependencies can be used by other apps too - so multiple consumers can run in different containers and all work with the same Kafka and Cassandra containers;
plus all the scalability, logging etc. already mentioned.
When might you want to "combine" Docker images?
As others are pointing out here, you typically don't want to put your database and you application into the same Docker image. Ideally you want a Docker image to wrap a "single process"/"runtime". This allows each process to be scaled up/down and restarted individually.
Let's say you want to use some shared C-libraries/executables that are not available in the package manager of the image you are using, but someone else has created an image where they are precompiled - and you might not want to recompile these binaries as part of your build (depending on how long this takes). Is there a way to quickly create a POC-Docker image containing all of these executables/libraries based on the existing images?
Docker and Composition
Relevant discussion: https://github.com/moby/moby/issues/3378
What Docker lacks is a good way of composing images. You can copy individual files or entire file systems from other images into your own using COPY --from=<image> <from-path> <to-path>. There is no builtin way of copying the environment variables from another image into your own.
That said, I have personally created a custom frontend/parser for Dockerfiles that adds an INCLUDE <image>-keyword. This copies the entire filesystem, along with the environment variables into your image:
DOCKER_BUILDKIT=1 docker build -t myimage .
#syntax=bergkvist/includeimage
FROM alpine:3.12.0
INCLUDE rust:1.44-alpine3.12
INCLUDE python:3.8.3-alpine3.12
nixpkgs.dockerTools
if you want truly composable Docker builds, I recommend checking out dockerTools in nixpkgs. This will also result in more reproducible (and typically very small) images. See https://nix.dev/tutorials/building-and-running-docker-images
docker load < $(nix-build docker-image.nix)
# docker-image.nix
let
pkgs = import <nixpkgs> {};
python = pkgs.python38;
rustc = pkgs.rustc;
in pkgs.dockerTools.buildImage {
name = "myimage";
tag = "latest";
contents = [ python rustc ];
}
Docker doesn't do merges of the images, but there isn't anything stopping you combining the dockerfiles if available, and rolling into them into a fat image which you'd need to build. There's times where this makes sense, however, as for running multiple processes in a container most Docker dogma will point to this as less desirable especially with microservice architecture (however rules are there to be broken right?)
You could not combine docker images into 1 container. See the detail discussions in Moby issue, How do I combine several images into one via Dockerfile.
For your case, it is better to not include the whole Cassandra and Kafka images. The application would only need the Cassandra Scala driver and Kafka Scala driver. The container should include the drivers only.
I needed docker:latest and python:latest images for Gitlab CI. Here is what I came up with:
FROM ubuntu:latest
RUN apt update
RUN apt install -y sudo
RUN sudo apt install -y docker.io
RUN sudo apt install -y python3-pip
RUN sudo apt install -y python3
RUN docker --version
RUN pip3 --version
RUN python3 --version
After I've build and pushed it to my Docker Hub repo:
docker build -t docker-hub-repo/image-name:latest path/to/Dockerfile
docker push docker-hub-repo/image-name:latest
Don't forget to docker login before push
Hope it helps

How to change source code without rebuilding image in Docker?

What is the best practice to use Docker container for dev/prod.
Let's say I want my changes to be applied automatically during development without rebuilding and restarting images. As far as I understand I can inject volume for this when running container.
docker run -v `pwd`/src:/src --rm -it username/node-web-app0
Where pwd/src stands for the directory source code. It's working fine so far.
But how to delivery code to production? I think it worse to keep code along with binaries into the docker container. Do I need to create another similar docker file which will use COPY instead? Or it's better to deploy source-code separately like for dev-mode and mount it to docker.
The best practice is to build a new docker image for every version of your code. That has many advantages in production environments as faster deployments, independence from other systems, easier rollbacks, exportability, etc.
It is possible to do it within the same Dockerfile, using multi-stage builds.
The following is a simple example for a NodeJS app:
FROM node:10 as dev
WORKDIR /src
CMD ["myapp.js"]
FROM node:10
COPY package.json .
RUN npm install
COPY . .
Note that this Dockerfile is only for demo purposes, it can be improved in many ways.
When working on dev environment use the following commands to build the base image and run your code with a mounted folder:
docker build --target dev -t username/node-web-app0 .
docker run -v `pwd`/src:/src --rm -it username/node-web-app0
And when you're ready for production, just exec docker run without the --target argument to build the full image, that contains the code:
docker build -t username/node-web-app0:v0.1 .
docker push username/node-web-app0:v0.1

How does this Docker command, which makes a small GoLang container, work?

New to docker here. I followed the instructions here to make a slim & trim container for my Go Project. I do not fully understand what it's doing though, hopefully someone can enlighten me.
Specifically there are two steps to generating this docker container.
docker run --rm -it -v "$GOPATH":/gopath -v "$(pwd)":/app -e "GOPATH=/gopath" -w /app golang:1.8 sh -c 'CGO_ENABLED=0 go build -a --installsuffix cgo --ldflags="-s" -o hello'
docker build -t myDockerImageName .
The DockerFile itself just contains
FROM iron/base
WORKDIR /app
COPY hello /app/
ENTRYPOINT ["./hello"]
I understand (in a broad sense) that the 1st step is compiling the go program and statically linking the C-dependencies (and doing all this inside an unnamed docker container). The 2nd step just generates the docker image according to the instructions in the DockerFile.
What I don't understand is why the first command starts with docker run (why does it need to be run inside a docker container? Why are we not just generating the Go binary outside of it, and then copying it in?)
And if it's being run inside a docker container, how is binary generated in the docker container being dropped on my local machine's file system?(e.g. why do I need to copy the binary back into the image - as it seems to be doing on line 3 of the DockerFile?)
You're actually using 2 different docker containers, each with a different image. The first container is only around during the compilation... it uses the image golang:1.8. What you're doing is mounting your current working directory into that image and compiling it with the version of GO contained in the image.
The second command builds your custom image that uses the iron/base image as its base. You then copy your built application into that image and run it.
Using a golang container to build the binary is usually done for reproducibility of the build process, i.e.:
it ensures that always the same Go version is used,
compilation takes place in an alway clean environment,
and the build host needs no Go installed at all, or can have a different version,
This way, all parts needed to build the "hello" image can be tracked in a version control system.
However, this example mounts the whole local GOPATH, defeating above purpose. Dependencies must be available to the build container, e.g. by vendoring them. Maybe the author considered vendoring out of scope for his example.
(note: this should be a comment, but my reputation does not allow that)

Is there a way to combine Docker images into 1 container?

I have a few Dockerfiles right now.
One is for Cassandra 3.5, and it is FROM cassandra:3.5
I also have a Dockerfile for Kafka, but t is quite a bit more complex. It is FROM java:openjdk-8-fre and it runs a long command to install Kafka and Zookeeper.
Finally, I have an application written in Scala that uses SBT.
For that Dockerfile, it is FROM broadinstitute/scala-baseimage, which gets me Java 8, Scala 2.11.7, and STB 0.13.9, which are what I need.
Perhaps, I don't understand how Docker works, but my Scala program has Cassandra and Kafka as dependencies and for development purposes, I want others to be able to simply clone my repo with the Dockerfile and then be able to build it with Cassandra, Kafka, Scala, Java and SBT all baked in so that they can just compile the source. I'm having a lot of issues with this though.
How do I combine these Dockerfiles? How do I simply make an environment with those things baked in?
You can, with the multi-stage builds feature introduced in Docker 1.17
Take a look at this:
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
Then build the image normally:
docker build -t alexellis2/href-counter:latest
From : https://docs.docker.com/develop/develop-images/multistage-build/
The end result is the same tiny production image as before, with a significant reduction in complexity. You don’t need to create any intermediate images and you don’t need to extract any artifacts to your local system at all.
How does it work? The second FROM instruction starts a new build stage with the alpine:latest image as its base. The COPY --from=0 line copies just the built artifact from the previous stage into this new stage. The Go SDK and any intermediate artifacts are left behind, and not saved in the final image.
You can't combine dockerfiles as conflicts may occur. What you want to do is to create a new dockerfile or build a custom image.
TL;DR;
If your current development container contains all the tools you need and works, then save it as an image and upon it to a repo and create a dockerfile to pull from that image off that repo.
Details:
Building a custom image is by far easier than creating a dockerfile using a public image as you can store whatever hacks and mods into the image. To do so, start a blank container with a basic Linux image (or broadinstitute/scala-baseimage), install whatever tools you need and configure them until everything works correctly, then save it (the container) as an image. Create a new container off this image and test to see if you can build your code on top of it via docker-compose (or however you want to do/build it). If it works, than you have a working base image that you can upload to a repo so others can pull it.
To build a dockerfile with a public image, you will need to put all hacks, mods and setup on the dockerfile itself. That is, you will need to place every command line that you used into a text file and reduce whatever hacks, mods and setup into command lines. At the end, your dockerfile will create an image automatically and you don't need to store this image into a repo and all you need to do is to give others the dockerfile and they can spin the image up at their own docker.
Note that once you have a working dockerfile, you can tweak it easily as it will create a new image every time you use the dockerfile. With a custom image, you may run into issues where you need to rebuild the image due to conflicts. For example, all of your tools work with openjdk until you install one that doesn't work. The fix may involve uninstalling openjdk and use the oracle one, but all configuration you did for all the tools that you have installed broke.
The following answer applies to docker 1.7 and above:
I would prefer to use --from=NAME and from image as NAME
Why?
You can use --from=0 and above but this might get little hard to manage when you have many docker stages in dockerfile.
sample example:
FROM golang:1.7.3 as backend
WORKDIR /backend
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN #install some stuff, compile assets....
FROM golang:1.7.3 as assets
WORKDIR /assets
RUN ./getassets.sh
FROM nodejs:latest as frontend
RUN npm install
WORKDIR /assets
COPY --from=assets /asets .
CMD ["./app"]
FROM alpine:latest as mergedassets
WORKDIR /root/
COPY --from=frontend . /
COPY --from=backend ./backend .
CMD ["./app"]
Note: Managing dockerfile properly will help to build a docker image much faster. Internally docker usings docker layer caching to help with this process, incase the image have to be rebuilt.
Yes, you can roll a whole lot of software into a single Docker image (GitLab does this, with one image that includes Postgres and everything else), but generalhenry is right - that's not the typical way to use Docker.
As you say, Cassandra and Kafka are dependencies for your Scala app, they're not part of the app, so they don't all belong in the same image.
Having to orchestrate many containers with Docker Compose adds an extra admin layer, but it gives you much more flexibility:
your containers can have different lifespans, so when you have a new version of your app to deploy, you only need to run a new app container, you can leave the dependencies running;
you can use the same app image in any environment, using different configurations for your dependencies - e.g. in dev you can run a basic Kafka container and in prod have it clustered on many nodes, your app container is the same;
your dependencies can be used by other apps too - so multiple consumers can run in different containers and all work with the same Kafka and Cassandra containers;
plus all the scalability, logging etc. already mentioned.
When might you want to "combine" Docker images?
As others are pointing out here, you typically don't want to put your database and you application into the same Docker image. Ideally you want a Docker image to wrap a "single process"/"runtime". This allows each process to be scaled up/down and restarted individually.
Let's say you want to use some shared C-libraries/executables that are not available in the package manager of the image you are using, but someone else has created an image where they are precompiled - and you might not want to recompile these binaries as part of your build (depending on how long this takes). Is there a way to quickly create a POC-Docker image containing all of these executables/libraries based on the existing images?
Docker and Composition
Relevant discussion: https://github.com/moby/moby/issues/3378
What Docker lacks is a good way of composing images. You can copy individual files or entire file systems from other images into your own using COPY --from=<image> <from-path> <to-path>. There is no builtin way of copying the environment variables from another image into your own.
That said, I have personally created a custom frontend/parser for Dockerfiles that adds an INCLUDE <image>-keyword. This copies the entire filesystem, along with the environment variables into your image:
DOCKER_BUILDKIT=1 docker build -t myimage .
#syntax=bergkvist/includeimage
FROM alpine:3.12.0
INCLUDE rust:1.44-alpine3.12
INCLUDE python:3.8.3-alpine3.12
nixpkgs.dockerTools
if you want truly composable Docker builds, I recommend checking out dockerTools in nixpkgs. This will also result in more reproducible (and typically very small) images. See https://nix.dev/tutorials/building-and-running-docker-images
docker load < $(nix-build docker-image.nix)
# docker-image.nix
let
pkgs = import <nixpkgs> {};
python = pkgs.python38;
rustc = pkgs.rustc;
in pkgs.dockerTools.buildImage {
name = "myimage";
tag = "latest";
contents = [ python rustc ];
}
Docker doesn't do merges of the images, but there isn't anything stopping you combining the dockerfiles if available, and rolling into them into a fat image which you'd need to build. There's times where this makes sense, however, as for running multiple processes in a container most Docker dogma will point to this as less desirable especially with microservice architecture (however rules are there to be broken right?)
You could not combine docker images into 1 container. See the detail discussions in Moby issue, How do I combine several images into one via Dockerfile.
For your case, it is better to not include the whole Cassandra and Kafka images. The application would only need the Cassandra Scala driver and Kafka Scala driver. The container should include the drivers only.
I needed docker:latest and python:latest images for Gitlab CI. Here is what I came up with:
FROM ubuntu:latest
RUN apt update
RUN apt install -y sudo
RUN sudo apt install -y docker.io
RUN sudo apt install -y python3-pip
RUN sudo apt install -y python3
RUN docker --version
RUN pip3 --version
RUN python3 --version
After I've build and pushed it to my Docker Hub repo:
docker build -t docker-hub-repo/image-name:latest path/to/Dockerfile
docker push docker-hub-repo/image-name:latest
Don't forget to docker login before push
Hope it helps

Resources