Is `FROM` clause required in Dockerfile? - docker

For all the Dockerfiles I've come across thus far (which admittedly is not many), every one has used a FROM clause to base itself on an existing image, even if it's FROM scratch.
Is this clause required? Is it possible to have a Dockerfile with no FROM clause? Would a container created this way be able to do anything?
EDIT
I read
A Dockerfile with no FROM directive has no parent image, and is called
a base image.
https://docs.docker.com/glossary/?term=parent%20image
But I think this may be an error.

Based on the official documentation it's required:
The FROM instruction initializes a new build stage and sets the Base
Image for subsequent instructions. As such, a valid Dockerfile MUST
start with a FROM instruction. The image can be any valid image – it
is especially easy to start by pulling an image from the Public
Repositories.
https://docs.docker.com/engine/reference/builder/#from

The short answer is yes, the FROM clause is required. But it's easier to come to this conclusion if you think about the image building process a bit.
A Dockerfile is just a way to describe a sequence of commands to be executed by the Docker build subsystem to create an image. And an image is just a bunch of regular files, most notably the userland files of a particular Linux distribution, possibly with some extra files on top. Every Docker image is based on a parent image and adds its own files to the parent's set. Every image has to start FROM something, i.e. specify its parent. And the parent of all parents is the scratch image, defined as a no-op, i.e. an empty set of files.
Take a look at busybox image:
FROM scratch
ADD busybox.tar.xz /
CMD ["sh"]
It starts from scratch, i.e. an empty set of files, and adds to this set a bunch of files from the busybox.tar.xz archive (ADD unpacks a local tar archive into the target directory).
Now, if you want to create your own image, you can start from the busybox image and describe what files (and how) you are going to add:
FROM busybox:latest
ADD myfile.txt /
But every time a new image has to start FROM something.
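As a rough illustration of the "parent's files plus my own files" idea, here is a toy Python model. It is only a sketch: real images are stacks of filesystem layers, and the file names used here are made-up assumptions, not the actual contents of busybox.tar.xz.

```python
# Toy model: an image is its parent's files plus the files it adds.
SCRATCH = frozenset()  # "FROM scratch": the empty set of files


def build(parent, added_files):
    """Return a new image: the parent's files plus the added ones."""
    return frozenset(parent) | frozenset(added_files)


# File names below are illustrative assumptions, not real archive contents.
busybox = build(SCRATCH, {"/bin/busybox", "/bin/sh"})
my_image = build(busybox, {"/myfile.txt"})

print(sorted(my_image))
```

Every image in this model can be traced back to SCRATCH, which is the point the answer makes about FROM.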

Yes, it is. It defines the layers on which the image you are building is based.
If you want to start an image from scratch, Docker offers an image called scratch.
The documentation also says:
A parent image is the image that your image is based on
also
A base image has FROM scratch in its Dockerfile.
Refer to base images documentation

Related

Docker - workflow for updating container

I'm just getting to grips with Docker. I need to update a base image for my image.
Questions
Do I need to completely recreate all the changes I made on top of the
new base image and save it as a new image?
What do people do to remember the changes they've made to their
image?
Do I need to completely recreate all the changes I made on top of the new base image and save it as a new image?
You don't. It is up to you whether to completely rebuild the image or to use your old one as a new base. Unless we are talking about a generic base image, such as one where you just preinstall things that you want available to all the derived images, it is probably better to rebuild the image from scratch. Otherwise you might end up cluttering images with stuff they don't need, which is never a good thing (from the perspective of both size and security).
What do people do to remember the changes they've made to their image?
Right out of the box you can use the history command to see what went into the image
docker image history <image>
which lists the image's filesystem layers.
Personally, when I build images, I copy the Dockerfile into the image so that I can quickly cat it from a running container.
docker exec <container> cat Dockerfile
It is more convenient for me than looking through the history output (I don't include anything sensitive in a Dockerfile, and all the information that it has is already available within the container if someone breaks in).

Is it good practice to commit docker container frequently?

I'm using WebSphere Liberty inside a container. As WebSphere Liberty requires frequent XML editing, which is impossible with Dockerfile commands, I have to docker-commit the container from time to time for others to make use of my images.
The command is like:
docker commit -m "updated sa1" -a "Song" $id company/wlp:v0.1
Colleagues are doing similar things to the image; they continue to docker commit the container several times every day.
One day we're going to deploy the image to production.
Q1: Is the practice of frequent docker-committing advised?
Q2: Does it leave any potential problem behind?
Q3: Does it create an extra layer? I read the docker-commit document, which didn't mention if it creates another layer, I assume it means no.
I wouldn't use docker commit.
It seems like a really good idea, but you can't reproduce the image at will like you can with a Dockerfile, and you can't change the base image once this is done either. That makes it very hard to apply, for example, a security patch to the underlying OS base image.
If you go the full Dockerfile approach you can re-run docker build and you'll get the same image again. And you are able to change the base image.
So my rule of thumb is: if you are creating a temporary tool and you don't care about reuse or reproducing the image at will, then commit is convenient to use.
As I understand Docker, every container has two parts: a group of read-only layers making up the bulk of the image, and a thin writable layer on top where any changes are recorded.
When you run commit, Docker creates a new, distinct image: the base image's layers plus one new read-only layer that captures the changes from the container's thin writable layer. So yes, each commit adds another layer on top of the existing ones.
Don't just take my word for it, take Red Hat's advice.
For clarity, step 5 of that article says:
5) Don’t create images from running containers – In other terms, don’t
use “docker commit” to create an image. This method to create an image
is not reproducible and should be completely avoided. Always use a
Dockerfile or any other S2I (source-to-image) approach that is totally
reproducible, and you can track changes to the Dockerfile if you store
it in a source control repository (git).

Best practice for maintaining a Dockerfile?

I have a Dockerfile something like the following:
FROM openjdk:8u151
# others here
I have 2 questions about the base image:
1. How to get the tags?
Usually I get them from Docker Hub; for openjdk:8u151, say, I can look at Docker Hub's openjdk repository.
If I could get all the tags from a local docker command, I wouldn't need to visit the web to find them, which is rather inefficient. Is that possible?
2. Is the base image safe?
I mean, will my base image always be there?
Look at the openjdk repo above; it is an official repo.
I found there is only 8u151 left for me to choose. But I think there must have been a lot of JDK 8 releases along the way, so there should also be a lot of JDK 8 images, something like 8u101, 8u163, etc.
So can I guess that the maintainers delete some old openjdk images?
If this happens, how will my Dockerfile work? Should I change my base image every time upstream deletes their image? That would be really terrible to maintain.
Even if openjdk really only produced one JDK 8 release, my worry remains, as Docker Hub does offer users a delete button.
What's the best practice? Please suggest, thanks.
How to get the tags?
See "How to list all tags for a Docker image on a remote registry?".
The API is enough
For instance, visit:
https://registry.hub.docker.com/v2/repositories/library/java/tags/?page_size=100&page=2
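If you would rather script this than open the URL in a browser, a minimal Python sketch could look like the following. It assumes the endpoint returns JSON with a "results" list whose entries carry a "name" field; the helper names (tags_url, tag_names, fetch_tags) are made up for this example, and fetch_tags needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

HUB = "https://registry.hub.docker.com/v2/repositories"


def tags_url(repository, page=1, page_size=100):
    """Build the tag-listing URL for a repository such as 'library/java'."""
    query = urlencode({"page_size": page_size, "page": page})
    return f"{HUB}/{repository}/tags/?{query}"


def tag_names(payload):
    """Extract tag names from one decoded page (assumed response shape)."""
    return [entry["name"] for entry in payload.get("results", [])]


def fetch_tags(repository, page=1):
    """Fetch one page of tag names; requires network access."""
    with urlopen(tags_url(repository, page)) as response:
        return tag_names(json.load(response))
```

Calling tags_url("library/java", page=2) reproduces the URL shown above, so the same helper should work for other repositories by changing the first argument.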
Is the base image safe?
As long as you save your own built image in a registry (either a public one or a self-hosted one), yes: you will at least be able to build new images based on the one you have made.
Or, even if the base image disappears, you still have its layers in your own image, and can re-tag it (provided the build cache is available).
See for instance "Is there a way to tag a previous layer in a docker image or revert a commit?".
See caveats in "can I run an intermediate layer of docker image?".

When would a Docker image and its repository have different names?

The standard usage of the docker tag command is:
docker tag <image> <username>/<repository>:<tag>
So for example: docker tag friendlyhello john/get-started:part1.
Coming from Java-land, I'm used to Maven/Gradle-style coordinates of group:artifact:version, so to me, it makes sense for the image and the repository to be one and the same:
The image is the artifact you're producing, and in Java-land there's usually a 1:1 relationship between the generated artifact and the source repo its code lives inside of. So to me, it makes more sense for the command to be just:
docker tag <username>/<repository>:<tag>
So for example: docker tag john/get-started:part1, where john is the username/group, get-started is the artifact/repo and part1 is the tag/version.
TO BE CLEAR: I am not asking what the difference is between an image and a repository! I understand that a repository is a location where images are stored, and I understand that an image is a Docker executable consisting of your Dockerized app and its dependencies. But from a naming standpoint, I'm confused as to why/when they should ever be different from one another.
So I ask: what is the difference between an image and a repository from a naming convention standpoint? For example if I wanted to make my own MySQL Docker image, I'd choose to make the image named "myapp-db", and that would also be the name of the repository where it lived (smeeb/myapp-db:v1, smeeb/myapp-db:v2, etc.).
So under what circumstances are/should image and repository names be different?
First a prerequisite: a tag is a pointer to an image, and an image is a sha256 reference to a manifest of configuration and layers that docker uses to make containers. What that means is that friendlyhello is not the name of an image, it's a tag that points to an image. The image is the id, something like c75bebcdd211.....
Next, each image can have zero, one, or multiple tags all pointing to it. When it doesn't have any tags pointing to it, that's referred to as a dangling image. That can happen if you build an image with a tag, and then rebuild it. The previous image is now untagged because the tag is pointing to the new image. Similarly you can have the tags image:latest, image:v1, image:1.0.1, and myrepo:5000/image:1.0 all pointing to the same image id.
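To make the "tags are pointers" idea concrete, here is a small Python toy model of a Docker host. It is not how Docker stores anything internally; the class name and the image IDs other than the one quoted above are invented for illustration.

```python
class LocalImageStore:
    """Toy model of a Docker host: tags are pointers to image IDs."""

    def __init__(self):
        self.images = set()  # every image ID present on the host
        self.tags = {}       # tag name -> image ID

    def build(self, image_id, tag):
        """Record a new image and point `tag` at it, moving it if in use."""
        self.images.add(image_id)
        self.tags[tag] = image_id

    def tag(self, source, new_tag):
        """Add a tag; `source` may be an image ID or an existing tag."""
        self.tags[new_tag] = self.tags.get(source, source)

    def dangling(self):
        """Image IDs that no tag points at any more."""
        return self.images - set(self.tags.values())


store = LocalImageStore()
store.build("c75bebcdd211", "image:latest")
store.tag("image:latest", "image:v1")               # tag from another tag
store.tag("c75bebcdd211", "myrepo:5000/image:1.0")  # tag from an image ID

store.build("aaaa1111", "foo:latest")  # hypothetical IDs
store.build("bbbb2222", "foo:latest")  # rebuild: the tag moves to the new image

print(store.dangling())  # the first foo build is now untagged
```

The three image tags all resolve to the same ID, while the first foo build ends up dangling because its only tag was moved by the rebuild.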
Tags have a dual use. They can be for convenience. But they are also used by docker push and docker pull to look up where to send or retrieve the image. If you don't do a push or a pull, then you can name it whatever you want and no one will know the difference. But if you do want to store it on a registry, the tag needs to identify which registry, or else it defaults to Docker Hub. That tag also needs to identify the path on the registry, called the repository, and the versioning after the colon.
One confusing bit is that the short name at the end of the repository name is often called an "image name", and the versioning after the colon is often called a "tag", and I think this is much easier to understand if you forget those terms were ever overloaded like that.
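A rough Python sketch of how such a full tag breaks down into registry, repository and version may help here. It follows the usual convention that the first path component is treated as a registry host only if it contains a dot or a colon or is "localhost", and that bare names default to Docker Hub's library/ namespace. The function name is made up, and digest references (name@sha256:...) are deliberately not handled.

```python
def parse_reference(reference):
    """Split an image reference into (registry, repository, tag)."""
    # A tag is whatever follows a colon in the last path component.
    last_slash = reference.rfind("/")
    colon = reference.rfind(":")
    if colon > last_slash:
        name, tag = reference[:colon], reference[colon + 1:]
    else:
        name, tag = reference, "latest"

    # The first component is a registry only if it looks like a host name.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", name
        if "/" not in repository:
            repository = "library/" + repository  # official images
    return registry, repository, tag


print(parse_reference("myrepo:5000/image:1.0"))
print(parse_reference("john/get-started:part1"))
```

Note how the colon in myrepo:5000 is read as a port rather than a version separator, which is one reason the overloaded terms get confusing.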
Now with all that background (sorry, that was a lot), here are a few corrections to the question:
Instead of:
docker tag <image> <username>/<repository>:<tag>
Think of the syntax as:
docker tag <source> <tag>
Where <source> can be an image id, or another tag name. This means the following command won't make sense:
docker tag <username>/<repository>:<tag>
Because docker tag needs a source to tag, and it has no sense of context for what image you are currently working with.
Lastly, why would you use a name other than your repository name for an image? Here are a few reasons I've encountered:
The image won't be pushed to a repository. It could be for local testing, or an intermediate step in a workflow, or you build and run your images on the same system.
You may have multiple names for the same image. registry/repo/image:v1 and registry/repo/image:v1.0.1 is a common example. I'll also tag the current image in a specific environment with registry/repo/image:STAGE to note that it made it through dev and CI and is now in the staging environment.
You may be moving images between registries. We pull images from hub.docker.com and retag them locally with a local registry. That gives us both a local cache and also a way to control when we update our base images to the next version. That's preferable to having an upstream image update in the middle of a production rollout.
I've also used tags to override upstream images. So instead of changing all my build scripts for an issue I have with an upstream image, I can just make my change and tag it with the upstream name. Then as long as I don't run a pull on that docker host, the builds will run using my modified base image.
One situation where you can have an image with a different tag than the repository name is if you have an image in use that is outdated.
For instance, you download and run a mysql:5 image. This container is still running when you pull a newer version of the mysql:5 image. At that point the old image will be untagged (identifiable only by its hash), but not deleted, because it is still in use by the running MySQL container.
Another situation is that you can have intermediate images while building a new image. Basically each line gets committed as a new image, but they are not named with the name you specify as the final image name.
When using docker tag you don't even have to use the image name as the first parameter. You can even use the hash of the image that you want to tag as the first parameter, so it's more flexible than just namespace/repository:tag.
The difference between an image and repository must be stated:
An image is a tagged repository. That's the only difference. The <username> is part of the repository name.
From the overview of the Docker Registry Distribution API:
Classically, repository names have always been two path components
where each path component is less than 30 characters. The V2 registry
API does not enforce this. The rules for a repository name are as
follows:
A repository name is broken up into path components. A component of a
repository name must be at least one lowercase, alpha-numeric
characters, optionally separated by periods, dashes or underscores.
More strictly, it must match the regular expression
[a-z0-9]+(?:[._-][a-z0-9]+)*. If a repository name has two or more
path components, they must be separated by a forward slash ("/"). The
total length of a repository name, including slashes, must be less
than 256 characters.
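The rules quoted above translate fairly directly into a check. This is a small Python sketch using the regular expression from the quote for each path component; the function name is made up, and a real registry may apply further restrictions beyond these rules.

```python
import re

# Per-component pattern, taken from the rules quoted above.
COMPONENT = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*")


def is_valid_repository_name(name):
    """Check a repository name against the quoted V2 naming rules."""
    if not name or len(name) >= 256:
        return False
    # Components are separated by "/" and each must match in full.
    return all(COMPONENT.fullmatch(part) for part in name.split("/"))


print(is_valid_repository_name("smeeb/myapp-db"))
print(is_valid_repository_name("Smeeb/MyApp"))
```

The first name passes, while the second fails because uppercase letters are not allowed in a component.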
Just use meaningful names for your images and tags. You could have smeeb/myapp and smeeb/myapp-db. For tags, the convention is to use versioned tags and a latest one.

Where do untagged Docker images come from?

I'm creating some very simple Docker containers. I understand that after each step a new container is created. However, when using other Dockerfiles from the Hub I don't wind up with untagged images. So where do they come from? After browsing around online I have found out how to remove them but I want to gain a better understanding where they come from. Ideally I would like to prevent them from ever being created.
From their documentation
This will display untagged images, that are the leaves of the images
tree (not intermediary layers). These images occur when a new build of
an image takes the repo:tag away from the IMAGE ID, leaving it
untagged. A warning will be issued if trying to remove an image when a
container is presently using it. By having this flag it allows for
batch cleanup.
I don't quite understand this. Why are builds taking the repo:tag away from the IMAGE ID?
Whenever you assign a tag that is already in use to a new image (say, by building image foo, making a change in its Dockerfile, and then building foo again), the old image will lose that tag but will still stay around, even if all of its tags are deleted. These older versions of your images are the untagged entries in the output of docker images, and you can safely delete them with docker rmi <IMAGE HASH> (though the deletion will be refused if there's an extant container still using that image).
Docker builds images on a union file system (AUFS was the original storage driver; overlay2 is the common default on current Linux installs). Pretty much each line of a Dockerfile creates a new image layer, and when you stack them all on top of each other you get your final Docker image. This is essentially a way of caching, so if you change only the 9th line of your Dockerfile, the first eight steps are reused and only the steps from line 9 onward are rebuilt. (It depends on what commands you have in your Dockerfile: for example, a COPY or ADD invalidates the cache from that point on whenever the copied files have changed.)
The final image will get tagged with whatever label it has, but all these intermediary images are necessary in order to create the final image so it doesn't make sense to delete them or prevent them from being created. Hope that makes sense.
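The caching behaviour can be sketched in a few lines of Python. This is a simplified model, not Docker's actual algorithm: it keys each step only on the instruction text and the step before it, whereas the real builder also checksums files brought in by COPY and ADD. The instruction lists are invented for the example.

```python
import hashlib


def layer_ids(instructions):
    """Chain-hash instructions: each ID depends on all earlier steps."""
    ids, parent = [], ""
    for instruction in instructions:
        digest = hashlib.sha256((parent + "\n" + instruction).encode())
        parent = digest.hexdigest()[:12]
        ids.append(parent)
    return ids


def steps_to_rebuild(old, new):
    """Count steps in `new` that cannot reuse a layer cached from `old`."""
    cached = set(layer_ids(old))
    return sum(1 for layer in layer_ids(new) if layer not in cached)


old = ["FROM busybox:latest", "RUN mkdir /app", "ADD myfile.txt /app/", 'CMD ["sh"]']
new = old[:2] + ["ADD other.txt /app/", 'CMD ["sh"]']

print(steps_to_rebuild(old, new))
```

Changing the third instruction forces that step and everything after it to be rebuilt, while the first two steps are served from the cache. The intermediate images left over from such builds are what make this reuse possible.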

Resources