Dockerfile with multiple base image - docker

I'm trying to create a simple Dockerfile that builds my Node.js project in multiple steps:
Installing and caching my dependencies
Running my unit tests
Running my acceptance tests
Building my project
to ensure that my project works correctly. Here's what I have for now:
FROM node:6.9
# Environment variables
ENV HOMEDIR /data
RUN mkdir -p ${HOMEDIR}
WORKDIR ${HOMEDIR}
# install all dependencies
ADD package.json ./
RUN npm install
# ... some stuff goes here without any importance
# add node content initially
ADD . .
CMD CI=true npm test && npm run test:acceptance && npm run build
My acceptance tests use a Selenium server, which needs Java.
Java isn't installed in my image, and I'd like to get it from a "standard" image (like https://hub.docker.com/_/openjdk/) while keeping my current node:6.9 image, so that I can switch easily from version to version. In other words, I don't want to install Java manually on my current image.
My problem is that I can't use multiple FROM sources inside my Dockerfile, and I don't know if what I need is even possible.
Any suggestions?

The Docker way is to keep images as small and lightweight as possible. Your production image does not need Java, Selenium, etc.
Building and testing the application should happen outside the production container. That can be done in another image (with Selenium, Java, etc.), or in a build cluster of multiple containers (Selenium, Java, etc.), which then builds the production images.

I would recommend having a base image that contains only the base OS and the software your application requires to run.
Use that base image to create multiple images for the different tests.
Once you are done with all the testing, use the same base image to package and dockerize your application.
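A minimal sketch of the test-image idea, assuming the base image has been built from the question's Dockerfile and tagged myorg/app-base (a hypothetical name), and that it is Debian-based so that apt-get and the default-jre package are available:

```dockerfile
# Dockerfile.test (hypothetical file name): test image layered on the shared base.
# myorg/app-base is an assumed tag for the image holding the app and its dependencies.
FROM myorg/app-base:latest

# Java is only needed by the Selenium server used in the acceptance tests,
# so it is installed here and never reaches the production image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends default-jre \
    && rm -rf /var/lib/apt/lists/*

# Run the unit and acceptance tests when the container starts.
CMD CI=true npm test && npm run test:acceptance
```

The production image would then be built from the same base without the Java layer, so only the test image carries the extra weight.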

Related

docker-compose ignores ubuntu:latest in Dockerfile [duplicate]

I want to build a docker image for the Linkurious project on github, which requires both the Neo4j database, and Node.js to run.
My first approach was to declare a base image for my image, containing Neo4j. The reference docs do not define "base image" in any helpful manner:
Base image:
An image that has no parent is a base image
from which I read that I may only have a base image if that image has no base image itself.
But what is a base image? Does it mean, if I declare neo4j/neo4j in a FROM directive, that when my image is run the neo database will automatically run and be available within the container on port 7474?
Reading the Docker reference I see:
FROM can appear multiple times within a single Dockerfile in order to create multiple images. Simply make a note of the last image ID output by the commit before each new FROM command.
Do I want to create multiple images? It would seem what I want is to have a single image that contains the contents of other images e.g. neo4j and node.js.
I've found no directive to declare dependencies in the reference manual. Are there no dependencies like in RPM where in order to run my image the calling context must first install the images it needs?
I'm confused...
As of May 2017, multiple FROMs can be used in a single Dockerfile.
See "Builder pattern vs. Multi-stage builds in Docker" (by Alex Ellis) and PR 31257 by Tõnis Tiigi.
The general syntax involves adding FROM additional times within your Dockerfile - whichever is the last FROM statement is the final base image. To copy artifacts and outputs from intermediate images use COPY --from=<base_image_number>.
FROM golang:1.7.3 as builder
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
The result would be two images, one for building, one with just the resulting app (much, much smaller)
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
multi        latest   bcbbf69a9b59   6 minutes ago   10.3MB
golang       1.7.3    ef15416724f6   4 months ago    672MB
what is a base image?
A set of files, plus EXPOSE'd ports, ENTRYPOINT and CMD.
You can add files and build a new image based on that base image, with a new Dockerfile starting with a FROM directive: the image mentioned after FROM is "the base image" for your new image.
does it mean that if I declare neo4j/neo4j in a FROM directive, that when my image is run the neo database will automatically run and be available within the container on port 7474?
Only if you don't overwrite CMD and ENTRYPOINT.
But the image in itself is enough: you would use a FROM neo4j/neo4j if you had to add files related to neo4j for your particular usage of neo4j.
Let me summarize my understanding of the question and the answer, hoping that it will be useful to others.
Question: Let’s say I have three images, apple, banana and orange. Can I have a Dockerfile that has FROM apple, FROM banana and FROM orange that will tell docker to magically merge all three applications into a single image (containing the three individual applications) which I could call smoothie?
Answer: No, you can't. If you do that, you will end up with four images, the three fruit images you pulled, plus the new image based on the last FROM image. If, for example, FROM orange was the last statement in the Dockerfile without anything added, the smoothie image would just be a clone of the orange image.
Why Are They Not Merged? I Really Want It
A typical docker image will contain almost everything the application needs to run (leaving out the kernel) which usually means that they’re built from a base image for their chosen operating system and a particular version or distribution.
Merging images successfully without considering all possible distributions, file systems, libraries and applications, is not something Docker, understandably, wants to do. Instead, developers are expected to embrace the microservices paradigm, running multiple containers that talk to each other as needed.
What’s the Alternative?
One possible use case for image merging would be to mix and match Linux distributions with our desired applications, for example, Ubuntu and Node.js. This is not the solution:
FROM ubuntu
FROM node
If we don’t want to stick with the Linux distribution chosen by our application image, we can start with our chosen distribution and use the package manager to install the applications instead, e.g.
FROM ubuntu
RUN apt-get update && \
    apt-get install -y package1 && \
    apt-get install -y package2
But you probably knew that already. Often times there isn’t a snap or package available in the chosen distribution, or it’s not the desired version, or it doesn't work well in a docker container out of the box, which was the motivation for wanting to use an image. I’m just confirming that, as far as I know, the only option is to do it the long way, if you really want to follow a monolithic approach.
In the case of Node.js for example, you might want to manually install the latest version, since apt provides an ancient one, and snap does not come with the Ubuntu image. For neo4j we might have to download the package and manually add it to the image, according to the documentation and the license.
One strategy, if size does not matter, is to start with the base image that would be hardest to install manually, and add the rest on top.
When To Use Multiple FROM Directives
There is also the option to use multiple FROM statements and manually copy stuff between build stages or into your final one. In other words, you can manually merge images, if you know what you're doing. As per the documentation:
Optionally a name can be given to a new build stage by adding AS name
to the FROM instruction. The name can be used in subsequent FROM and
COPY --from=<name> instructions to refer to the image built in this
stage.
Personally, I’d only be comfortable using this merge approach with my own images or by following documentation from the application vendor, but it’s there if you need it or you're just feeling lucky.
A better application of this approach though, would be when we actually do want to use a temporary container from a different image, for building or doing something and discard it after copying the desired output.
Example
I wanted a lean image with gpgv only, and based on this Unix & Linux answer, I installed the whole gpg with yum and then copied only the binaries required, to the final image:
FROM docker.io/photon:latest AS builder
RUN yum install gnupg -y
FROM docker.io/photon:latest
COPY --from=builder /usr/bin/gpgv /usr/bin/
COPY --from=builder /usr/lib/libgcrypt.so.20 /usr/lib/libgpg-error.so.0 /usr/lib/
The rest of the Dockerfile continues as usual.
The first answer is too complex, historic, and uninformative for my tastes.
It's actually rather simple. Docker provides a feature called multi-stage builds. The basic idea is to:
Free you from having to manually remove what you don't want, by forcing you to allowlist what you do want,
Free resources that would otherwise be taken up because of Docker's implementation.
Let's start with the first. Very often, with something like Debian, you'll see:
RUN apt-get update \
    && apt-get dist-upgrade -y \
    && apt-get install -y <whatever> \
    && apt-get clean
We can explain all of this in terms of the two points above. The command above is chained together, so it represents a single change with no intermediate images required. If it were written like this:
RUN apt-get update ;
RUN apt-get dist-upgrade;
RUN apt-get install <whatever>;
RUN apt-get clean;
It would result in three more temporary intermediate images. With it reduced to one image, one problem remains: apt-get clean doesn't clean up artifacts used in the install. If a Debian maintainer includes in the install a script that modifies the system, that modification will also be present in the final image (see something like pepperflashplugin-nonfree for an example of that).
By using a multi-stage build you get all the benefits of a single change, but you have to manually allowlist and copy over the files that were introduced in the temporary image, using the COPY --from syntax (see the links below). Moreover, it's a great solution where there is no alternative (like an apt-get clean) and you would otherwise have lots of unneeded files in your final image.
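As a rough illustration of the allowlisting idea, here is a sketch that compiles a small C program in a throwaway stage and copies only the resulting binary into the final image. The file hello.c, the paths, and the Debian tags are assumptions chosen for the example, not part of the answer above:

```dockerfile
# Build stage: the compiler and headers live only here.
FROM debian:bookworm AS builder
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libc6-dev
# hello.c is a hypothetical source file in the build context.
COPY hello.c /src/hello.c
# Link statically so the binary does not depend on libraries from this stage.
RUN gcc -static -o /src/hello /src/hello.c

# Final stage: starts from a clean image and allowlists a single file.
FROM debian:bookworm-slim
COPY --from=builder /src/hello /usr/local/bin/hello
CMD ["hello"]
```

Nothing installed by apt-get in the builder stage ends up in the final image, so there is nothing to clean up afterwards.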
See also
Multi-stage builds
COPY syntax
Here is probably one of the most fundamental use cases for multiple FROMs, a.k.a. multi-stage builds.
I want one Dockerfile, and I want to change one word in it so that, depending on what I set that word to, I get a different image for running, developing, or publishing the application:
Run - I just want to run the app
Dev - I want to edit the code and run the app
Publish - Run the app in production
Let's suppose we're working in the .NET environment. Here's one single Dockerfile. Without multi-stage builds, there would be multiple files (the builder pattern).
#See https://aka.ms/containerfastmode to understand how Visual Studio uses this Dockerfile to build your images for faster debugging.
FROM mcr.microsoft.com/dotnet/runtime:5.0 AS base
WORKDIR /app
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /src
COPY ["ConsoleApp1/ConsoleApp1.csproj", "ConsoleApp1/"]
RUN dotnet restore "ConsoleApp1/ConsoleApp1.csproj"
COPY . .
WORKDIR "/src/ConsoleApp1"
RUN dotnet build "ConsoleApp1.csproj" -c Release -o /app/build
FROM build AS publish
RUN dotnet publish "ConsoleApp1.csproj" -c Release -o /app/publish
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "ConsoleApp1.dll"]
Want to run the app? Leave FROM base AS final as it currently is in the dockerfile above.
Want to dev the source code in the container? Change the same line to FROM build AS final
Want to release into prod? Change the same line to FROM publish AS final
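Instead of editing the FROM line by hand, the stage can be chosen with a build argument. The following is only a sketch of that variation: STAGE is a hypothetical argument name, and the csproj handling is simplified compared with the Dockerfile above.

```dockerfile
# STAGE is a hypothetical build argument. It is declared before the first FROM
# so that it can be used in a later FROM line.
ARG STAGE=base

FROM mcr.microsoft.com/dotnet/runtime:5.0 AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /src
COPY . .
RUN dotnet build "ConsoleApp1/ConsoleApp1.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "ConsoleApp1/ConsoleApp1.csproj" -c Release -o /app/publish

# The final image is based on whichever stage STAGE names (base, build or publish).
FROM ${STAGE} AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "ConsoleApp1.dll"]
```

With this layout, something like docker build --build-arg STAGE=build -t consoleapp1:dev . (the tag is made up) would base the final image on the SDK stage without touching the file.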
I agree with the OP, that this feature is useful for docker! Here is a different view into the same problem:
If you had multiple FROMs (or a "FROM" and multiple "MERGE"s, for example), then you could use the Docker registry versioning system for the base Docker image AND other container elements, and that is the win here. I have third-party development tools which do not exist in .deb format, so they must be installed by untarring a tarball, which is HUGE. Caching on the Docker host is therefore important, but versioning/change control of the image is equally important. I (think I) can simply use "RUN git ....", and Docker will deal with caching the new layer for me, which is what I want: another container will have the same base image but a different set of HUGE third-party tools, so caching both the base image and the tools image is really important (the third-party tools tar can be as big as a base image such as ubuntu). The (suggested) feature just allows all these elements to be managed in a central repository versioning system.
Said a different way, why do we use FROM at all? If I were to simply git clone an ubuntu image using the RUN command for my "base image/layer", this would create a new layer and Docker would cache it anyway. So is there any difference/advantage in using FROM, other than that it uses Docker's internal versioning system/syntax?

Separating Docker files and application source files to optimize production environment

I have a bunch of (Ruby) scripts stored on a server. Up until now, my team has used them by opening an accessor app that launches a list of the script names, and they select the script they want to run in that instance on the files in their working folder. The scripts are run directly from the server, so updates made to the script files are automatically reflected when a user runs the script.
The scripts require a fair amount of specific dependencies, so I'm trying to move to a Docker-based workflow to eliminate the problems we encounter with incongruent computer environments. I've been able to successfully build an image with our script library and run an instance of it on my computer.
However, all of the documentation and tutorials include the application source files when building an image, so that all the files are copied over by the Dockerfile. From my understanding, this means that any time the code in the application files needs to be updated, all the users will need to rebuild the image before trying to run anything. I would very rarely ever need to make changes to the environment settings/dependencies, but the app code is changed relatively frequently, so it seems like having every user rebuild an image every single time a line of app code is changed would actually slow down everyone's workflow considerably.
My question is this: Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored? And does a new container need to be created every single time a user wants to run any one of the scripts? (The users are not tech-savvy.)
Generally you'd do this by using a Docker image instead of the checked-out tree of scripts. You can use a Docker registry to store a built copy of the image somewhere on the network; Docker Hub works for this, most large public-cloud providers have some version of this (AWS ECR, Google GCR, Azure ACR, ...), or you can run your own. The workflow for using this would generally look like
# Get any updates to the "latest" version of the image
# (can be run infrequently)
docker pull ourorg/scripts
# Actually run the script, injecting config files and credentials
docker run --rm \
-v $PWD/config:/config \
-v $HOME/.ssh:/config/.ssh \
ourorg/scripts \
some_script.rb
# Nothing in this example actually requires a local copy of the scripts
I'm envisioning a directory that has kind of a mix of scripts and support files and not a lot of organization to it. Still, you could write a simple Dockerfile that looks like
FROM ruby:2.7
WORKDIR /opt/scripts
# As of Bundler 2.1, there is no compatibility between Bundler
# versions; this must match exactly what is in Gemfile.lock
RUN gem install bundler -v 2.1.4
# Copy the scripts in and do basic installation
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .
ENV PATH /opt/scripts:$PATH
# Prefix all commands with...
ENTRYPOINT ["bundle", "exec"]
# The default command to run is...
CMD ["ls"]
On the back end you'd need a continuous integration service (Jenkins is popular if a little unwieldy; there is a large selection of cloud-hosted ones) that can rebuild the Docker image whenever there's a commit to the source repository. You can generally rig this up so that it happens automatically whenever anybody pushes anything.
This process makes more sense if most people are just using the set of scripts and few of them are developing them. It's also a little bit difficult to discover what the scripts are (you might be able to docker run --rm ourorg/scripts ls though).
Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored?
This always strikes me as an ineffective use of Docker. You have all of the fiddly steps of your current workflow that require everyone to run a git pull or equivalent routinely, but you also have to inject the host source tree into the container. If there are OS incompatibilities in, for example, native gems in the vendor tree, you have to work around that.
# You still need to do this periodically
git pull
# And you also need to
sudo docker run \
--rm \
-v $PWD:/app \
-v $HOME/config:/config \
-v $HOME/.ssh:/config/.ssh \
-w /app \
ruby:2.7 \
bundle exec ./some_script.rb
Some of these details (especially the config file and credentials) you'd have to deal with even if you did build an image; some others of the details you could improve by building an image. Inside the image you need to correct the ownership and permissions on the ssh keys and replace the $PWD/vendor tree with something the container can run, without modifying the mounted host directories.
Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored?
You can build an image with all the environment already installed then mount the directory with the scripts so the container can read the scripts from the host. Something like
docker run -it --rm -v /opt/myscripts:/myscripts myimage somescript.rb
Then your image Dockerfile would end with:
WORKDIR /myscripts
ENTRYPOINT ["/usr/bin/ruby"]
And does a new container need to be created every single time a user wants to run any one of the scripts?
Yes, but a container is just an isolated process managed by Docker, so that is cheap. You could write a wrapper so that the users don't need to type the full docker run command.

How do I use environment variables in a static site inside docker?

I have a react app built with webpack that I want to deploy inside a docker container. I'm currently using the DefinePlugin to pass the api url for the app along with some other environment variables into the app during the build phase. The relevant part of my webpack config looks something like:
plugins: [
new DefinePlugin({
GRAPHQL_API_URL: JSON.stringify(process.env.GRAPHQL_API_URL),
DEBUG: process.env.DEBUG,
...
}),
...
]
Since this strategy requires the environment variables at build time, my docker file is a little icky, since I need to actually put the webpack build call as part of the CMD command:
FROM node:10.16.0-alpine
WORKDIR /usr/app/
COPY . ./
RUN npm install
# EXPOSE and serve -l ports should match
EXPOSE 3000
CMD npm run build && npm run serve -- -l 3000
I'd love for the build step in webpack to be a layer in the docker container (a RUN command), so I could potentially clean out all the source files after the build succeeds, and so start up is faster. Is there a standard strategy for dealing with this issue of using information from the docker environment when you are only serving static files?
How do I use environment variables in a static site inside docker?
This question is broader than your specific problem I think. The generic answer to this is, you can't, by nature of the fact that the content is static. If you need the API URL to be dynamic and modifiable at runtime then there needs to be some feature to support that. I'm not familiar enough with webpack to know if this can work but there is a lot of information at the following link that might help you.
Passing environment-dependent variables in webpack
Is there a standard strategy for dealing with this issue of using information from the docker environment when you are only serving static files?
If you are happy to have the API URL baked into the image then the standard strategy with static content in general is to use a multistage build. This generates the static content and then copies it to a new base image, leaving behind any dependencies that were required for the build.
https://docs.docker.com/develop/develop-images/multistage-build/
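A sketch of what such a multi-stage build could look like for the Dockerfile in the question. It assumes that webpack writes its output to dist/, that npm run build reads GRAPHQL_API_URL from the environment as in the DefinePlugin config above, and that nginx is an acceptable server for the static files; none of these are confirmed by the question.

```dockerfile
# Build stage: needs Node, the sources and all npm dependencies.
FROM node:10.16.0-alpine AS build
WORKDIR /usr/app/
COPY package*.json ./
RUN npm install
COPY . ./
# The API URL is baked in at build time via a build argument.
ARG GRAPHQL_API_URL
ENV GRAPHQL_API_URL=${GRAPHQL_API_URL}
RUN npm run build

# Final stage: only the generated static files are copied over.
# /usr/app/dist is an assumed webpack output directory.
FROM nginx:alpine
COPY --from=build /usr/app/dist /usr/share/nginx/html
EXPOSE 80
```

The value would be supplied with docker build --build-arg GRAPHQL_API_URL=..., which means one image per API URL. That is the trade-off of baking the URL into static content.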

Selecting different code branches when using a shared base image in Docker

I am containerising a codebase that serves multiple applications. I have created three images;
app-base:
FROM ubuntu
RUN apt-get install package
COPY ./app-code /code-dir
...
app-foo:
FROM app-base:latest
RUN foo-specific-setup.sh
and app-buzz which is very similar to app-foo.
This works currently, except I want to be able to build versions of app-foo and app-buzz for specific code branches and versions. It's easy to do that for app-base and tag appropriately, but app-foo and app-buzz can't dynamically select that tag, they are always pinned to app-base:latest.
Ultimately I want this build process automated by Jenkins. I could just dynamically re-write the Dockerfile, or not have three images and just have two nearly-but-not-quite identical Dockerfiles for each app that would need to be kept in sync manually (later increasing to 4 or 5). Each of those solutions has obvious drawbacks however.
I've seen lots of discussions in the past about things such as an INCLUDE statement, or dynamic tags. None seemed to come to anything.
Does anyone have a working, clean(ish) solution to this problem? As long as it means Dockerfile code can be shared across images, I'd be happy. If it also means that the shared layers of images don't need to be rebuilt for each app, then even better.
You could still use build args to do this.
Dockerfile:
FROM ubuntu
ARG APP_NAME
RUN echo $APP_NAME-specific-setup.sh >> /root/test
ENTRYPOINT cat /root/test
Build:
docker build --build-arg APP_NAME=foo -t foo .
Run:
$ docker run --rm foo
foo-specific-setup.sh
In your case you could run the correct script in the RUN using the argument you just set before. You would have one Dockerfile per app-base variant and run the correct set-up based on the build argument.
FROM ubuntu
RUN apt-get install package
COPY ./app-code /code-dir
ARG APP_NAME
RUN $APP_NAME-specific-setup.sh
Any layers before setting the ARG would not need to be rebuilt when creating other versions.
You can then push the built images to separate docker repositories for each app.
If your apps need different ENTRYPOINT instructions, you can have an APP_NAME-entrypoint.sh per app and rename it to entrypoint.sh within your APP_NAME-specific-setup.sh (or pass it through as an argument to run).
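For the part of the question about pinning app-foo to a specific app-base tag, a build argument can also be used in the FROM line itself (supported since Docker 17.05). A minimal sketch, where BASE_TAG is a hypothetical argument name and the set-up script is assumed to be on the PATH inside app-base, as in the question:

```dockerfile
# BASE_TAG is a hypothetical build argument. An ARG declared before FROM
# may be used in the FROM line.
ARG BASE_TAG=latest
FROM app-base:${BASE_TAG}

# An ARG declared before FROM is not visible inside the build stage,
# so APP_NAME is declared after FROM to make it usable in RUN.
ARG APP_NAME=foo
RUN ${APP_NAME}-specific-setup.sh
```

A Jenkins job could then run something like docker build --build-arg BASE_TAG=my-branch --build-arg APP_NAME=foo -t app-foo:my-branch . without rewriting the Dockerfile.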

Development dependencies in Dockerfile or separate Dockerfiles for production and testing

I'm not sure if I should create different Dockerfiles for my Node.js app: one for production without the development dependencies, and one for testing with the development dependencies included.
Or should I have just one file, which is basically the development Dockerfile.dev? The main difference between the two files is the npm install command:
Production:
FROM ...
...
RUN npm install --quiet --production
...
CMD ...
Development/Test:
FROM ...
...
RUN npm install
...
CMD ...
The question arises because I want to be able to run my tests inside the container via docker run command. Therefore I need the test dependencies (typically dev dependencies for me).
It seems a little bit odd to put dependencies not needed in production into the image. On the other hand, creating and maintaining a second Dockerfile.dev with just minor differences also doesn't seem right. So what is good practice for this kind of problem?
No, you don't need to have different Dockerfiles and in fact you should avoid that.
The goal of Docker is to ship your app as an immutable, well-tested artifact (a Docker image) that is identical for production, test, and even dev.
Why? Because if you build different artifacts for test and production, how can you guarantee that what you have already tested works in production too? You can't, because they are two different things.
Given all that, if by test you mean unit tests, then you can mount your source code inside a Docker container and run the tests without building any Docker images. And that's fine. You could build an image for tests, but that is terribly slow and makes development quite difficult, which is not good at all. Then, if your tests pass, you can build your app container safely.
But if you mean acceptance tests that actually need to run against your running application, then you should create one image for your app (only one) and run the tests in another container (mounting the test source code, for example) against that app container. This obviously means that the npm install for your app's build is different from the npm install for your tests.
I hope this gives you an overview.
Well, then you'll have to support several Dockerfiles that are almost identical. Instead, I recommend using a Node.js feature such as the production profile. One more recommendation, regarding
RUN npm install --quiet --production
It is better to create a separate .sh file and do something like this instead:
ADD ./scripts/run.sh /run.sh
RUN chmod +x /*.sh
Also think about starting to use Gulp.
UPD #1
By default, npm install installs devDependencies. To get around this, use npm install --production OR set the NODE_ENV environment variable to production.
Putting script lines in a separate file is good practice, so that you don't have to change the Dockerfile often. If you need changes next time, you only have to update the script file and you're done. In the future you may also have some additional work to do there.
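The NODE_ENV approach can be sketched in a single Dockerfile with a build argument. The base image tag, the working directory and the npm start command are assumptions made for the example, since the question elides them:

```dockerfile
# node:lts-alpine is an assumed base image. The question's FROM line is elided.
FROM node:lts-alpine

# NODE_ENV defaults to production, so devDependencies are skipped
# unless the build overrides it.
ARG NODE_ENV=production
ENV NODE_ENV=${NODE_ENV}

WORKDIR /usr/app
COPY package*.json ./
RUN npm install --quiet
COPY . .

# npm start is a placeholder for the application's real start command.
CMD ["npm", "start"]
```

Building with --build-arg NODE_ENV=development would produce an image that includes the devDependencies needed for tests, from the same file.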
