1. Scenario with ONBUILD
Base Dockerfile
FROM ubuntu:latest
RUN apt-get update && apt-get install -y python3
ONBUILD COPY test.py test.py
Obviously, when we build the Dockerfile above (as test-image:latest), the COPY has no effect (test.py is not copied).
Now the onbuild (child) Dockerfile
FROM test-image:latest
Now, when we build this Dockerfile, the COPY takes effect and test.py is copied.
2. Scenario without ONBUILD
I can achieve the same thing without using ONBUILD.
Base Dockerfile
FROM ubuntu:latest
RUN apt-get update && apt-get install -y python3
The Dockerfile above builds an image that has python3 (test-image2:latest).
Now the child image's Dockerfile
FROM test-image2:latest
COPY test.py /test.py
So, my question is: why or when should I use ONBUILD? Is there any performance difference?
I think that the answer is simple: you want to use ONBUILD when your parent image has to be used in various child images, so you
- avoid repetition
- constrain the user of the image to have test.py copied
In general you shouldn't use ONBUILD at all. Having a later Dockerfile FROM line do something other than simply incorporate its contents violates the principle of least surprise.
If the thing you're trying to do ONBUILD is something like a RUN or ENV instruction, semantically it makes no difference whether you do it in the base image or the derived image. It will be more efficient if you do it in the base image (once ever, as opposed to once each time a derived image is built).
If you're trying to ONBUILD COPY ... then you're trying to force a specific file to be on the host system at the point you run docker build, which is a little strange for a consumer of the image. Docker's "Best practices for writing Dockerfiles" notes:
Be careful when putting ADD or COPY in ONBUILD. The “onbuild” image fails catastrophically if the new build’s context is missing the resource being added. Adding a separate tag, as recommended above, helps mitigate this by allowing the Dockerfile author to make a choice.
As that page notes, if you must use ONBUILD, you should call it out in the image tag so it's clear when you build a Dockerfile FROM that image, something strange is going on. Most current Docker Hub images don't have -onbuild variants at all, even for things like tomcat that generally have extremely formulaic uses.
Related
I have two Dockerfiles.
Dockerfile1
FROM centos:centos7
WORKDIR /root
ONBUILD COPY ./onbuilddemo.txt /tmp/onbuilddemo.txt
Dockerfile2
FROM onbuilddemo:latest
FROM adoptopenjdk/openjdk8:jre8u352-b05-ea-ubuntu-nightly
EXPOSE 8080
WORKDIR /root
CMD ["npm", "start"]
The image created from Dockerfile1 is onbuilddemo:latest.
Now, when I run a container from the image built from Dockerfile2, I don't see the file (onbuilddemo.txt) in the /tmp folder.
Can someone please help? What am I missing? Thanks.
You never used the onbuilddemo:latest image for anything, and if built with BuildKit, this first stage would be skipped entirely:
FROM onbuilddemo:latest
FROM adoptopenjdk/openjdk8:jre8u352-b05-ea-ubuntu-nightly
A multi-stage build is used to split build dependencies from the runtime image. It does not merge multiple images together (there's no way to do this universally with arbitrary Linux filesystems without breaking a lot of use cases).
You need to remove the second FROM line, or copy the file from the first stage to the second (using COPY --from), or add the ONBUILD definition to the other base image.
Note that ONBUILD tends to be a bad idea: it's hard to debug and is rarely documented in the places where someone would look to explain the behavior of their build. If you can't run the steps in an entrypoint, consider templating the Dockerfile so that it's clear exactly what's being performed in the build.
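As a rough sketch of the second option above (copying the file across stages): this assumes onbuilddemo.txt sits in the build context of Dockerfile2 so the ONBUILD COPY trigger can fire, and the stage name demo is only an illustrative choice.

```dockerfile
# Stage 1: using onbuilddemo:latest as a base fires its ONBUILD COPY trigger,
# so onbuilddemo.txt must exist in this build's context.
FROM onbuilddemo:latest AS demo

# Stage 2: the image that is actually produced.
FROM adoptopenjdk/openjdk8:jre8u352-b05-ea-ubuntu-nightly
# Pull the file out of the first stage; without this line nothing from
# stage 1 ends up in the final image.
COPY --from=demo /tmp/onbuilddemo.txt /tmp/onbuilddemo.txt
# ... remaining instructions (EXPOSE, WORKDIR, CMD) as in the original Dockerfile2
```

Because the COPY --from line references the first stage, BuildKit has to build that stage instead of skipping it.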
I wish to build a docker image that can start a container where I can use both node version 14 and lz4. The dockerfile I have so far is:
FROM node:14-alpine
WORKDIR /app
RUN apk update
RUN apk add --upgrade lz4
node --version and lz4 --help seem to run ok with the docker run command - but I wanted to ask whether there is a specific WORKDIR I should be using in the dockerfile to follow any best practices (if any exist), or does it not matter what I set the WORKDIR to? Note I'm not sure of all my future requirements, but I may need to use this image to build other images in the future, so I want to ensure WORKDIR is set appropriately.
WORKDIR sets the working directory for the subsequent instructions in the Dockerfile (RUN, CMD, ENTRYPOINT, COPY and ADD), which makes things a little easier to follow because relative paths are resolved against it.
By default, the working directory is the root directory, /, unless the base image sets one. If you don't set any other WORKDIR, you can use absolute paths in every command, which is arguably even easier to understand.
It doesn't really matter much. Besides, you can always change it in future builds.
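To make the effect of WORKDIR concrete, here is a minimal sketch based on the Dockerfile from the question. The /app path is only a common convention, and the package.json file and npm install step are assumptions added for illustration.

```dockerfile
FROM node:14-alpine
# /app is a convention, not a requirement; WORKDIR creates the
# directory if it does not already exist.
WORKDIR /app
RUN apk add --no-cache lz4
# Relative destinations below resolve against /app.
# package.json is a hypothetical file in the build context.
COPY package.json ./
RUN npm install
COPY . .
```

An image built FROM this one inherits /app as its working directory and can override it with its own WORKDIR instruction.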
I have a few Dockerfiles right now.
One is for Cassandra 3.5, and it is FROM cassandra:3.5
I also have a Dockerfile for Kafka, but it is quite a bit more complex. It is FROM java:openjdk-8-jre and it runs a long command to install Kafka and Zookeeper.
Finally, I have an application written in Scala that uses SBT.
For that Dockerfile, it is FROM broadinstitute/scala-baseimage, which gets me Java 8, Scala 2.11.7, and SBT 0.13.9, which are what I need.
Perhaps, I don't understand how Docker works, but my Scala program has Cassandra and Kafka as dependencies and for development purposes, I want others to be able to simply clone my repo with the Dockerfile and then be able to build it with Cassandra, Kafka, Scala, Java and SBT all baked in so that they can just compile the source. I'm having a lot of issues with this though.
How do I combine these Dockerfiles? How do I simply make an environment with those things baked in?
You can, with the multi-stage builds feature introduced in Docker 17.05.
Take a look at this:
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
Then build the image normally:
docker build -t alexellis2/href-counter:latest .
From: https://docs.docker.com/develop/develop-images/multistage-build/
The end result is the same tiny production image as before, with a significant reduction in complexity. You don’t need to create any intermediate images and you don’t need to extract any artifacts to your local system at all.
How does it work? The second FROM instruction starts a new build stage with the alpine:latest image as its base. The COPY --from=0 line copies just the built artifact from the previous stage into this new stage. The Go SDK and any intermediate artifacts are left behind, and not saved in the final image.
You can't combine dockerfiles as conflicts may occur. What you want to do is to create a new dockerfile or build a custom image.
TL;DR;
If your current development container contains all the tools you need and works, then save it as an image, upload it to a repo, and create a Dockerfile that pulls from that image in that repo.
Details:
Building a custom image is by far easier than writing a Dockerfile on top of a public image, because you can store whatever hacks and mods you like in the image. To do so, start a container from a basic Linux image (or broadinstitute/scala-baseimage), install whatever tools you need and configure them until everything works correctly, then save the container as an image. Create a new container from this image and test whether you can build your code on top of it via docker-compose (or however you want to build it). If it works, then you have a working base image that you can upload to a repo so others can pull it.
To write a Dockerfile on top of a public image, you need to put all hacks, mods and setup in the Dockerfile itself. That is, you need to reduce every hack, mod and setup step to command lines and place them in a text file. In the end, your Dockerfile creates the image automatically, so you don't need to store the image in a repo; you only need to give others the Dockerfile and they can build the image on their own Docker host.
Note that once you have a working Dockerfile, you can tweak it easily, as it creates a new image every time you build from it. With a custom image, you may run into issues where you need to rebuild the image because of conflicts. For example, all of your tools work with OpenJDK until you install one that doesn't. The fix may involve uninstalling OpenJDK and using the Oracle JDK instead, which breaks the configuration you did for all the tools you had already installed.
The following answer applies to Docker 17.05 and above:
I would prefer to use COPY --from=NAME together with FROM image AS NAME.
Why?
You can use --from=0 (and higher indexes), but this gets a little hard to manage when the Dockerfile has many stages.
Example:
FROM golang:1.7.3 AS backend
WORKDIR /backend
RUN go get -d -v golang.org/x/net/html
COPY app.go .
# install some stuff, compile the binary...
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM golang:1.7.3 AS assets
WORKDIR /assets
# getassets.sh is assumed to be in the build context
COPY getassets.sh .
RUN ./getassets.sh
FROM node:latest AS frontend
WORKDIR /assets
COPY --from=assets /assets .
RUN npm install
FROM alpine:latest AS mergedassets
WORKDIR /root/
# take only what is needed from the earlier stages
COPY --from=frontend /assets ./assets
COPY --from=backend /backend/app .
CMD ["./app"]
Note: Structuring the Dockerfile well helps Docker build the image much faster. Internally, Docker uses layer caching to speed things up when the image has to be rebuilt.
Yes, you can roll a whole lot of software into a single Docker image (GitLab does this, with one image that includes Postgres and everything else), but generalhenry is right - that's not the typical way to use Docker.
As you say, Cassandra and Kafka are dependencies for your Scala app, they're not part of the app, so they don't all belong in the same image.
Having to orchestrate many containers with Docker Compose adds an extra admin layer, but it gives you much more flexibility:
- your containers can have different lifespans, so when you have a new version of your app to deploy, you only need to run a new app container, you can leave the dependencies running;
- you can use the same app image in any environment, using different configurations for your dependencies - e.g. in dev you can run a basic Kafka container and in prod have it clustered on many nodes, your app container is the same;
- your dependencies can be used by other apps too - so multiple consumers can run in different containers and all work with the same Kafka and Cassandra containers;
- plus all the scalability, logging etc. already mentioned.
When might you want to "combine" Docker images?
As others are pointing out here, you typically don't want to put your database and your application into the same Docker image. Ideally you want a Docker image to wrap a "single process"/"runtime". This allows each process to be scaled up/down and restarted individually.
Let's say you want to use some shared C-libraries/executables that are not available in the package manager of the image you are using, but someone else has created an image where they are precompiled - and you might not want to recompile these binaries as part of your build (depending on how long this takes). Is there a way to quickly create a POC-Docker image containing all of these executables/libraries based on the existing images?
Docker and Composition
Relevant discussion: https://github.com/moby/moby/issues/3378
What Docker lacks is a good way of composing images. You can copy individual files or entire file systems from other images into your own using COPY --from=<image> <from-path> <to-path>. There is no built-in way of copying the environment variables from another image into your own.
That said, I have personally created a custom frontend/parser for Dockerfiles that adds an INCLUDE <image>-keyword. This copies the entire filesystem, along with the environment variables into your image:
DOCKER_BUILDKIT=1 docker build -t myimage .
#syntax=bergkvist/includeimage
FROM alpine:3.12.0
INCLUDE rust:1.44-alpine3.12
INCLUDE python:3.8.3-alpine3.12
nixpkgs.dockerTools
If you want truly composable Docker builds, I recommend checking out dockerTools in nixpkgs. This will also result in more reproducible (and typically very small) images. See https://nix.dev/tutorials/building-and-running-docker-images
docker load < $(nix-build docker-image.nix)
# docker-image.nix
let
pkgs = import <nixpkgs> {};
python = pkgs.python38;
rustc = pkgs.rustc;
in pkgs.dockerTools.buildImage {
name = "myimage";
tag = "latest";
contents = [ python rustc ];
}
Docker doesn't merge images, but nothing stops you from combining the Dockerfiles, if they are available, and rolling them into a fat image that you'd need to build yourself. There are times when this makes sense; however, as for running multiple processes in a container, most Docker dogma treats this as less desirable, especially in a microservice architecture (then again, rules are there to be broken, right?).
You cannot combine Docker images into one container. See the detailed discussion in the Moby issue "How do I combine several images into one via Dockerfile".
For your case, it is better to not include the whole Cassandra and Kafka images. The application would only need the Cassandra Scala driver and Kafka Scala driver. The container should include the drivers only.
I needed docker:latest and python:latest images for Gitlab CI. Here is what I came up with:
FROM ubuntu:latest
RUN apt update
RUN apt install -y sudo
RUN sudo apt install -y docker.io
RUN sudo apt install -y python3-pip
RUN sudo apt install -y python3
RUN docker --version
RUN pip3 --version
RUN python3 --version
After that I built it and pushed it to my Docker Hub repo:
docker build -t docker-hub-repo/image-name:latest -f path/to/Dockerfile .
docker push docker-hub-repo/image-name:latest
Don't forget to run docker login before pushing.
Hope it helps
I am presently working with a third party Docker image whose Dockerfile is based on the empty image, starting with the FROM scratch directive.
How can Bash be installed on such an image? I tried adding some extra commands to the Dockerfile, but apparently the RUN directive itself requires a shell.
When you start a Docker image FROM scratch you get absolutely nothing. Usually the way you work with one of these is by building a static binary on your host (or these days in an earlier Dockerfile build stage) and then COPY it into the image.
FROM scratch
COPY mybinary /
ENTRYPOINT ["/mybinary"]
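The "earlier build stage" variant mentioned above might look roughly like the following. It is only a sketch: it assumes a hypothetical main.go in the build context that uses nothing beyond the Go standard library, and the golang:1.22 tag and the stage name build are illustrative choices.

```dockerfile
# Build stage: has the full Go toolchain.
FROM golang:1.22 AS build
WORKDIR /src
# main.go is a hypothetical single-file program.
COPY main.go .
# CGO_ENABLED=0 avoids linking against libc, so the binary can run
# in an image that contains no shared libraries.
RUN CGO_ENABLED=0 go build -o /mybinary main.go

# Final stage: empty image plus the one binary.
FROM scratch
COPY --from=build /mybinary /mybinary
ENTRYPOINT ["/mybinary"]
```

Only the final stage ends up in the resulting image; the toolchain from the build stage is left behind.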
Nothing would stop you from creating a derived image and COPYing additional binaries into it. Either you'd have to specifically build a static binary or install a full dynamic library environment.
If you're doing this to try to debug the container, there is probably nothing else in the image. One thing this means is that the set of things you can do with a shell is pretty boring. The other is that you're not going to have the standard tool set you're used to (there is not an ls or a cp). If you can live without bash's various extensions, BusyBox is a small tool designed to be statically built and installed in limited environments that provides minimal versions of most of these standard tools.
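One possible way to add BusyBox to such an image for debugging is sketched below. The image name example/third-party-image is a placeholder, and the sketch assumes that the busybox:musl tag ships a statically linked binary at /bin/busybox with the --install applet enabled, and that the third-party image runs as root at build time.

```dockerfile
# Placeholder standing in for the third-party scratch-based image.
FROM example/third-party-image:latest
# Copy the single multi-call BusyBox binary out of the busybox image.
COPY --from=busybox:musl /bin/busybox /bin/busybox
# Exec-form RUN does not go through /bin/sh, which does not exist yet.
# This asks BusyBox to create symlinks such as /bin/sh and /bin/ls
# that point back at the busybox binary.
RUN ["/bin/busybox", "--install", "-s", "/bin"]
```

With that in place, something like docker run --rm -it --entrypoint /bin/sh on the derived image should give a minimal shell, though not Bash itself.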
The question is old, but I came here from a similar question, so I am posting how to deal with such a case below.
I am presently working with a third party Docker image whose
Dockerfile is based on the empty image, starting with the FROM scratch
directive.
As mentioned by @David, there is nothing in an image that is based on scratch. If an image is based on scratch, its authors just copy the binaries into the image and that's it.
So the workaround for such an image is to copy its binaries into your own extended image and use them there.
For example postgres_exporter
FROM scratch
ARG binary
COPY $binary /postgres_exporter
EXPOSE 9187
ENTRYPOINT [ "/postgres_exporter" ]
So this is based on scratch, and I cannot install bash or anything else; I can only copy in binaries to run.
So here is the workaround: use the scratch-based image as a stage in a multi-stage build, and copy its binaries into an image that has the packages you need.
In the example below we need to add wait-for-it:
FROM wrouesnel/postgres_exporter
# use the above base image
FROM debian:7.11-slim
RUN useradd -u 20001 postgres_exporter
USER postgres_exporter
# copy binaries
COPY --from=0 /postgres_exporter /postgres_exporter
EXPOSE 9187
COPY wait-for-it.sh wait-for-it.sh
USER root
RUN chmod +x wait-for-it.sh
USER postgres_exporter
RUN pwd
ENTRYPOINT ["./wait-for-it.sh", "db:5432", "--", "./postgres_exporter"]