Additional steps in Dockerfile - docker

I have a Docker image which is a server for a web IDE (Jupyter notebook) for Haskell.
Each time I want to allow the usage of a library in the IDE, I have to go to the Dockerfile and add the install command into it, then rebuild the image.
Another drawback of this, I have to fork the original image on Github, not allowing me to contribute to it.
I was thinking about writing another Dockerfile which pulls the base one with the FROM directive and then RUNs the commands to install the libraries. But, as they are in separate layers, the guest system does not find the Haskell package manager command.
TL;DR: I want to run stack install <library> (stack is like npm or pip, but for Haskell) from the Dockerfile, but I dont want to have a fork of the base image.
How could I solve this problem?

I was thinking about writing another Dockerfile which pulls the base one with the FROM directive and then RUNs the commands to install the libraries. But, as they are in separate layers, the guest system does not find the Haskell package manager command.
This is indeed the correct way to do this, and it should work. I'm not sure I understand the "layers" problem here - the commands executed by RUN should be running in an intermediate container that contains all of the layers from the base image and the previous RUN commands. (Ignoring the possibility of multi-stage builds, but these were added in 17.05 and did not exist when this question was posted.)
The only scenario I can see where stack might work in the running container but not in the Dockerfile RUN command would be if the $PATH variable isn't set correctly at this point. Check this variable, and make sure RUN is running as the correct user?

Related

Why do we build "inside" docker?

When I first learned Docker I expected a config file, image producer, CLI, and options for mounting and networks. That's all there.
I did not expect to put build commands inside a Dockerfile. I thought docker would wrap/tar/include a prebuilt task I made. Why give build commands in Docker?
Surely it can import a task thus keeping Jenkins/Bazel etc. distinct and apart for making an image/container?
I guess we are dealing with a misconception here. Docker is NOT a lighweight version of VMware/Xen/KVM/Parallels/FancyVirtualization.
Disclaimer: The following is heavily simplified for the sake of comprehensiveness.
So what is Docker?
In one sentence: Docker is a system to isolate processes from the other processes within an operating system as much as possible while still providing all means to run them. Put differently:
Docker is a package manager for isolated processes.
One of its closest ancestors are chroot and BSD jails. What those basically do is to isolate (more in the case of BSD, less in the case of chroot) a part of your OS resources and have a complete environment running independently from the rest of the OS - except for the kernel.
In order to be able to do that, a Docker image obviously needs to contain everything except for a kernel. So you need to provide a shell (if you choose to do so), standard libraries like glibc and even resources like CA certificates. For reference: In order to set up chroot jails, you did all this by hand once upon a time, preinstalling your chroot environment with each and every piece of software required. Docker is basically taking the heavy lifting from you here.
The mentioned isolation even down to the installed (and usable software) sounds cumbersome, but it gives you several advantages as a developer. Since you provide basically everything except for a (compatible) kernel, you can develop and test your code in the same environment it will run later down the road. Not a close approximation, but literally the same environment, bit for bit. A rather famous proverb in relation to Docker is:
"Runs on my machine" is no excuse any more.
Another advantage is that can add static resources to your Docker image and access them via quite ordinary file system semantics. While it is true that you can do that with virtualisation images as well, they usually do not come with a language for provisioning. Docker does - the Dockerfile:
FROM alpine
LABEL maintainer="you#example.com"
COPY file/in/host destination/on/image
Ok, got it, now why the build commands?
As described above, you need to provide all dependencies (and transitive dependencies) your application has. The easiest way to ensure that is to build your application inside your Docker image:
FROM somebase
RUN yourpackagemanager install long list of dependencies && \
make yourapplication && \
make install
If the build fails, you know you have missing dependencies. Now you can tweak and tune your Dockerfile until it compiles and is tested. So now your Docker image is finished, you can confidently distribute it, since you know that as long as the docker daemon runs on the machine somebody tries to run your image on, your image will run.
In the Go ecosystem, you basically assure your go.mod and go.sum are up to date and working and your work stay's reproducible.
Again, this works with virtualisation as well, so where is the deal?
A (good) docker image only runs what it needs to run. In the vast majority of docker images, this means exactly one process, for example your Go program.
Side note: It is very bad practise to run multiple processes in one Docker image, say your application and a database server and a cache and whatnot. That is what docker-compose is there for, or more generally container orchestration. But this is far too big of a topic to explain here.
A virtualised OS, however, needs to run a kernel, a shell, drivers, log systems and whatnot.
So the deal basically is that you get all the good stuff (isolation, reproducibility, ease of distribution) with less waste of resources (running 5 versions of the same OS with all its shenanigans).
Because we want to have enviroment for reproducible build. We don't want to depend on version of language, existence of compiler, version of libraires and so on.
Building inside a Dockerfile allows you to have all the tools and environment you need inside independently of your platform and ready to use. In a development perspective is easier to have all you need inside the container.
But you have to think about the objective of building inside a Dockerfile, if you have a very complex build process with a lot of dependencies you have to be worried about having all the tools inside and it reflects on the final size of your resulting image. Because this is not the same building to generate an artifact than building to produce the final container.
Thinking about this two aspects you have to learn to use the multistage build process in Docker here. The main idea is closer to your question because you can have a as many stages as you need depending on your build process and use different FROM images to ensure you have the correct requirements and dependences on each stage, to finally generate the image with the minimum dependencies and smaller size.
I'll add to the answers above:
Doing builds in or out of docker is a choice that depends on your goal. In my case I am more interested in docker containers for kubernetes, and in addition we have mature builds already.
This link shows how you take prebuilt tasks and add them to an image. This strategy together with adding libs, env etc leverages docker well and shows that indeed docker is flexible. https://medium.com/#chemidy/create-the-smallest-and-secured-golang-docker-image-based-on-scratch-4752223b7324

Docker containers with multiple bases?

I don't really understand something basic about Docker, specifically if I wanted to build from multiple bases within the same Dockerfile. For example, I know that these two lines wouldn't work:
FROM X
FROM Y
(well it would compile but then it seems to only include the image from X in the final build). Or perhaps I'm wrong and this is correct but I still haven't seen any other Dockerfiles like this.
Why would I want to do this? For instance, if X and Y are images I found on DockerHub that I would like to build from. For a concrete example if I wanted ubuntu and I also wanted python:
FROM python:2
FROM ubuntu:latest
What is the best way to go about it? Am I just limited to one base? If I want the functionality from both am I supposed to go into the docker files until I find something in common to both of them and build the image myself by copying the one of the dockerfile's code manually all the way through sub images until I reach the common base and add those lines to the other Dockerfile? I imagine this is not the correct way to do this as it seems quite involved and not in line with the simplicity that Docker aims to provide.
For a concrete example if I wanted ubuntu and I also wanted python:
FROM python:2
FROM ubuntu:latest
Ubuntu is Os, not a python. so what you need a Ubuntu base image which has python installed.
you can check offical python docker hub are based on ubuntu, so at one image you will get ubuntu + python, then why bother with two FROM? which is not also not working.
Some of these tags may have names like buster or stretch in them.
These are the suite code names for releases of Debian and indicate
which release the image is based on. If your image needs to install
any additional packages beyond what comes with the image, you'll
likely want to specify one of these explicitly to minimize breakage
when there are new releases of Debian.
So for you below question
What is the best way to go about it? Am I just limited to one base? If
I want the functionality from both am I supposed to go into the docker
files
yes, limit it one base image suppose your base image
python:3.7-stretch
So with this base image, you have python and ubuntu both. you do not need to make Dockerfile that have two FROM.
Also, you do need to maintain and build the image from scratch, use the offical one and extend as per your need.
For example
FROM python:3.7-stretch
RUN apt-get update && apt-get install -y vim
RUN pip install mathutils
Multiple FROM lines in a Dockerfile are used to create a multi-stage build. The result of a build will still be a single image, but you can run commands in one stage, and copy files to the final stage from earlier stages. This is useful if you have a full compile environment with all of the build tools to compile your application, but only want to ship the runtime environment with the image you deploy. Your first stage would be something like a full JDK with Maven or similar tools, while your final stage would be just your JRE and the JAR file copied from the first stage.

What is the best practices for getting code into a docker container?

What is the best practices for getting code into a docker container?
Here are some possible approaches:
ADD call in docker container
git clone or git wget from repo in docker container
Mount an external device
Usually, we use mount for dev/local environment with docker to have the code changes instantly applied.
You can use RUN git clone but you will need to install git and have access to the repository.
The easiest and most used/recommended way is putting the Dockerfile inside your repo and use the ADD call. But use the COPY directive instead because it’s more explicit
You should always COPY your application code in the Dockerfile. You generally shouldn't mount something over that, and you almost definitely shouldn't run git in your Dockerfile.
Running Git has a number of obvious issues. You can only build whatever specific commit the Dockerfile mentions; in real development it's not unusual to want to build a branch or a past commit. You need to install both git itself and relevant credentials in the image, with corresponding space and security issues. Docker layer caching means that docker build by default won't repeat a git clone step if it thinks it's already done that, which means you need to go out of your way to get a newer build.
It's very common in SO questions to mount code into a container. This very much defeats the point of Docker, IMHO. If you go to run your application in production you want to just distribute the image and not also separately distribute its code, but if you've been developing in an environment where you always hide the code built into the image, you've never actually run the image itself. There are also persistent performance and consistency problems in various environments. I don't really recommend docker run -v or Docker Compose volumes: as a way of getting code into a container.
I've seen enough SO questions that have a Dockerfile that's essentially just FROM python, for example, and then use Docker Compose volumes: and command: options to get a live-development environment on top of that. It's much easier to just install Python, and have the tools you need right in front of you without needing to go through a complex indirection sequence to run them in Docker.

Buiding docker containers for golang applications without calling go build

I'm building my first Dockerfile for a go application and I don't understand while go build or go install are considered a necessary part of the docker container.
I know this can be avoided using muilt-stage but I don't know why it was ever put in the container image in the first place.
What I would expect:
I have a go application 'go-awesome'
I can build it locally from cmd/go-awesome
My Dockerfile contains not much more than
COPY go-awesome .
CMD ["go-awesome"]
What is the downside of this configuration? What do I gain by instead doing
COPY . .
RUN go get ./...
RUN go install ./..
Links to posts showing building go applications as part of the Dockerfile
https://www.callicoder.com/docker-golang-image-container-example/
https://blog.codeship.com/building-minimal-docker-containers-for-go-applications/
https://www.cloudreach.com/blog/containerize-this-golang-dockerfiles/
https://medium.com/travis-on-docker/how-to-dockerize-your-go-golang-app-542af15c27a2
You are correct that you can compile your application locally and simply copy the executable into a suitable docker image.
However, there are benefits to compiling the application inside the docker build, particularly for larger projects with multiple collaborators. Specifically the following reasons come to mind:
There are no local dependencies (aside from docker) required to build the application source. Someone wouldn't even need to have go installed. This is especially valuable for projects in which multiple languages are in use. Consider someone who might want to edit an HTML template inside of a go project and see what that looked like in the container runtime..
The build environment (version of go, dependency managment, file paths...) is constant. Any external dependencies can be safely managed and maintained via the Dockerfile.

Install ansible application to docker container

I have an application which can be installed with ansible. No I want to create docker image that includes installed application.
My idea is to up docker container from some base image, after that start installation from external machine, to this docker container. After that create image from this container.
I am just starting with dockers, could you please advise if it is good idea and how can I do it?
This isn’t the standard way to create a Docker image and it isn’t what I’d do, but it will work. Consider looking at a tool like Hashicorp’s Packer that can automate this sequence.
Ignoring the specific details of the tools, the important thing about the docker build sequence is that you have some file checked into source control that an automated process can use to build a Docker image. An Ansible playbook coupled with a Packer JSON template would meet this same basic requirement.
The important thing here though is that there are some key differences between the Docker runtime environment and a bare-metal system or VM that you’d typically configure with Ansible: it’s unlikely you’ll be able to use your existing playbook unmodified. For example, if your playbook tries to configure system daemons, install a systemd unit file, add ssh users, or other standard system administrative tasks, these generally aren’t relevant or useful in Docker.
I’d suggest making at least one attempt to package your application using a standard Dockerfile to actually understand the ecosystem. Don’t expect to be able to use an Ansible playbook unmodified in a Docker environment; but if your organization has a lot of Ansible experience and you can easily separate “install the application” from “configure the server”, the path you’re suggesting is technically fine.
You can use multi-stage builds in Docker, which might be a nice solution:
FROM ansible/centos7-ansible:stable as builder
COPY playbook.yaml .
RUN ansible-playbook playbook.yaml
FROM alpine:latest # Include whatever image you need for your application
# Add required setup for your app
COPY --from=builder . . # Copy files build in the ansible image, aka your app
CMD ["<command to run your app>"]
Hopefully the example is clear enough for you to create your Dockerfile

Resources