Extending an existing Docker Image on Docker Hub - docker

I'm new to Docker and trying to get my head around extending existing Images.
I understand you can extend an existing Docker image using the FROM command in a Dockerfile (e.g. How to extend an existing docker image?), but my question is -- in general, how can I install additional software / packages without knowing what the base operating system is of the base image or which package manager is available?
Or am I thinking about this the wrong way?

The best practice is to start a container from the base image you want to build FROM (for example with docker run -it <image> sh, or docker exec if a container is already running) and see which package managers are available, if any. Then you can write your Dockerfile with the correct software installation procedure.
Think of it the same way you'd add software to any computer: you'd either log into it yourself and poke around, or write an installation program that can handle all of the expected variations.
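The "installation program that can handle all of the expected variations" could look roughly like the Dockerfile below. It is only a sketch: the BASE_IMAGE default and the curl package are placeholders picked for illustration, package names can differ between distributions, and it assumes the base image ships a POSIX shell (minimal "scratch"-style images do not).

```dockerfile
# Sketch: install one package on a base image whose distribution is unknown.
# BASE_IMAGE and the package "curl" are illustrative placeholders.
ARG BASE_IMAGE=redis:latest
FROM ${BASE_IMAGE}

# Some images switch to a non-root user; package installation needs root.
USER root

# Try the common package managers in turn and fail loudly if none is present.
RUN set -eu; \
    if command -v apt-get >/dev/null 2>&1; then \
        apt-get update; \
        apt-get install -y --no-install-recommends curl; \
        rm -rf /var/lib/apt/lists/*; \
    elif command -v apk >/dev/null 2>&1; then \
        apk add --no-cache curl; \
    elif command -v dnf >/dev/null 2>&1; then \
        dnf install -y curl; \
        dnf clean all; \
    elif command -v yum >/dev/null 2>&1; then \
        yum install -y curl; \
        yum clean all; \
    else \
        echo "no supported package manager found" >&2; \
        exit 1; \
    fi
```

Once you know which distribution the base image uses, it is usually clearer to drop the branching and keep only the one branch that applies.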

In most cases, the source Dockerfile is provided and you can walk the chain backwards and gain a better understanding as you do.
For example, if we look at the official Redis image we see the information tab says
Supported tags and respective Dockerfile links
2.6.17, 2.6 (2.6/Dockerfile)
2.8.19, 2.8, 2, latest (2.8/Dockerfile)
So if you're interested in building off redis:latest you'd follow the second link and see that it in turn is built off of debian:wheezy.
Most user-created images will either include their Dockerfile on the hub page or from a link there.

Related

Docker best practice: use OS or application as base image?

I would like to build a docker image which contains Apache and Python3. What is the suggested base image to use in this case?
There is an official apache:2.4.43-alpine image I can use as the base image, or I can install Apache on top of an alpine base image.
What would be the best approach in this case?
Option1:
FROM apache:2.4.43-alpine
<Install python3>
Option2:
FROM alpine:3.9.6
<Install Apache>
<Install Python3>
Here are my rules.
rule 1: If the images are official images (such as node, python, apache, etc.), it is fine to use them directly as your application's base image rather than building your own.
rule 2: If the image is built by the software's owner (for example hashicorp/terraform, where HashiCorp owns Terraform), it is better to use it than to build your own.
rule 3: If you only want to save time, choose the most-downloaded image that has similar applications installed as your base image.
Make sure you can view its Dockerfile. Otherwise, don't use it at all, however many downloads it has.
rule 4: If your company has security compliance concerns, never pull images from public registries; build your own.
Another reason to build your own is that the existing image is not built on the operating system you prefer. For example, some images provided by AWS are built on Amazon Linux 2; in most cases I rebuild those on my own base.
rule 5: When you build your own, whatever the base image, there is no need to reinvent the wheel: reuse the existing image's Dockerfile from github.com if you can.
Avoid Alpine; it will often make Python library installs much slower (https://pythonspeed.com/articles/alpine-docker-python/).
In general, the Python version matters more than the Apache version. The Apache packaged by a stable Linux distro is fine even if it is not the latest release, but the distro's Python may be annoyingly old. When 3.9 comes out, do you want to be on 3.7?
As such, I would recommend python:3.8-slim-buster (or whatever Python version you want), and install Apache with apt-get.
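A minimal sketch of that recommendation follows. The tag is the one named above; the Debian package name apache2, the exposed port and the foreground command are assumptions about a typical setup, not something the answer specifies. On a Debian release whose package archive has since been moved, apt-get update may fail, in which case a newer python:<version>-slim tag is needed.

```dockerfile
# Sketch: Python base image with Apache added from the distro's packages.
FROM python:3.8-slim-buster

# Install Apache from Debian's repositories and drop the apt cache afterwards
# to keep the layer small.
RUN apt-get update \
    && apt-get install -y --no-install-recommends apache2 \
    && rm -rf /var/lib/apt/lists/*

# Assumed defaults: Apache listening on port 80, run in the foreground so the
# container stays alive.
EXPOSE 80
CMD ["apache2ctl", "-D", "FOREGROUND"]
```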

What is the difference between docker FROM and RUN apt-get?

I see that some containers are created FROM the official Apache Docker image while others are created from a Debian image with RUN apt-get install. What is the difference? What is the best practice here, and which one should I prefer?
This is really basic. The two instructions serve different purposes.
When you want to create an image of your own for a specific purpose, you go through two steps:
Find a suitable base image to start from. There are a lot of images out there. That is where you use the FROM instruction: to get a starting point.
Specialize the image for a more specific purpose. That is where you use RUN to install new things into the new image, and often also COPY to add scripts and configuration to it.
So in your case: if you want to control the installation of Apache, you start off with a basic Debian image (FROM) and install Apache yourself (RUN). Or, if you want to make it easy, you find an image where Apache is already there, ready to run.
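The two routes can be put side by side as two named stages in one Dockerfile. This is only an illustration: the stage names, the debian:bookworm-slim tag and the ./public-html directory are made up for the example; httpd is the name under which the official Apache image is published on Docker Hub.

```dockerfile
# Route 1: start from a plain OS image (FROM) and install Apache yourself (RUN).
FROM debian:bookworm-slim AS apache-from-os
RUN apt-get update \
    && apt-get install -y --no-install-recommends apache2 \
    && rm -rf /var/lib/apt/lists/*
# COPY adds your own content; ./public-html is a hypothetical local directory.
COPY ./public-html/ /var/www/html/
EXPOSE 80
CMD ["apache2ctl", "-D", "FOREGROUND"]

# Route 2: start from the official image, where Apache is already installed
# and configured to run; only your content is added.
FROM httpd:2.4 AS apache-official
COPY ./public-html/ /usr/local/apache2/htdocs/
```

Building with docker build --target apache-from-os . or --target apache-official . selects one route; in a real project you would keep only the stage you chose.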

How to find a Docker image on Docker Hub?

I am new to Docker. Using Kitematic, how can I setup a Docker container containing the following?
Apache, Memcached, MySQL, Nginx, PHP FPM
Should I find one single image with all these? If so, how do I find that on https://hub.docker.com? It doesn't seem possible to filter by above requirements.
Or should I install these as separate containers?
Bart,
I don't know anything about kitematic but I can give you some general information though to clear things up.
The general consensus is to run only a single process per container. There are lots of discussions about why this is good or bad; one example is https://devops.stackexchange.com/questions/447/why-it-is-recommended-to-run-only-one-process-in-a-container.
That said, these are the images I would choose for an environment with the software you described above:
Memcached: https://hub.docker.com/_/memcached
MySQL: https://hub.docker.com/_/mysql
Nginx: https://hub.docker.com/_/nginx
PHP FPM: https://hub.docker.com/_/php
How do I get these images? I go to hub.docker.com and search for the software I want. I start with the official images and see if they suit my needs. If they do, great! Otherwise I look for non-official images, and if I still don't find what I want I extend an existing image by creating a custom image based on one from hub.docker.com.
Some more explanation about the last one, PHP. PHP is distributed with multiple tags. On the Docker Hub page ('description' tab) you can see the supported tags. Clicking the tag you are interested in leads you to a GitHub repo where the Dockerfile is hosted. This file contains the commands used to construct the image you are researching. You can check all the tags to see which one installs the software you need. For example, there are PHP tags where Apache is installed (e.g. 7-apache) and tags where FPM is installed (e.g. 7-fpm).
Hope this will help you with the research about what images to use!
You need to run those images within the same Docker network, through docker-compose (and its associated docker-compose.yml) such as this one.
docker-compose support in the Kitematic UI, though, is still an open issue.
You can't find all of these as one image. What you can do is create a docker-compose file and add all of those independent images to it.
That way you can manage all your containers as services in a single file, along with their dependencies.
For further info refer to https://docs.docker.com/compose/

How to compose it?

Target: build opencv docker
Dockerfile creation:
From Ubuntu14.04
or
From Python3.7
Which to choose and why?
I was trying to write the Dockerfile from scratch, without copying and pasting from other people's Dockerfiles.
I would usually pick the highest-level Docker Hub library image that matches what I need. It's also worth searching the https://hub.docker.com/ search box which will often find relevant things, though of rather varied ownership and maintenance levels.
The official Docker Hub images tend to have thought through a lot of issues around persistence and configuration and first-time setup. Compare "I'll just apt-get install mysql-server" with all of the parts that go into the official mysql image; just importing that real-world experience and reusing it can save you some trouble.
I'd consider building my own from an OS base like ubuntu:16.04 if:
There is a requirement that Docker images must be built from some specific distribution base ("my job requires everything to be built off of CentOS so I need a CentOS-based MySQL image")
I need a combination of software versions or patches that the Docker Hub image no longer supports (jruby:9.1.16.0 is no longer being built, so if I need OS updates, I need to build my own base image)
I need an especially exotic set of build options for whatever reason ("I have a C extension that only works if the interpreter is specifically built with UTF-16 Unicode support")
I need or want very detailed control over what version(s) of software are embedded; for example if it's something Java-based where there's a JVM version and a runtime version and an application version that all could matter
In my opinion you should choose From Python3.7.
Since you are writing a Dockerfile for OpenCV, which is an open source computer vision and machine learning library, you will probably need Python in your container as well.
If you use From Ubuntu14.04 you may have to install Python in the Dockerfile yourself, whereas with From Python3.7 that step becomes redundant, which also makes the Dockerfile a bit shorter.
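As a rough illustration of how short the Python-based variant can be: the sketch below assumes OpenCV is installed from the prebuilt opencv-python-headless wheel on PyPI, which is one option among several and not something stated above. Building OpenCV from source would need considerably more steps.

```dockerfile
# Sketch: OpenCV on top of the official Python image.
FROM python:3.7

# Assumption: the prebuilt headless wheel is sufficient (no GUI functions).
RUN pip install --no-cache-dir opencv-python-headless

# Print the installed OpenCV version as a quick smoke test.
CMD ["python", "-c", "import cv2; print(cv2.__version__)"]
```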

Dockerfile or Registry? Which is the preferred strategy for distribution?

If you are making a service with a Dockerfile is it preferred for you to build an image with the Dockerfile and push it to the registry -- rather than distribute the Dockerfile (and repo) for people to build their images?
What use cases favour Dockerfile+repo distribution, and what use case favour Registry distribution?
I'd imagine the same question could be applied to source code versus binary package installs.
Pushing to a central shared registry allows you to freeze and certify a particular configuration and then make it available to others in your organisation.
At DevTable we were initially using a Dockerfile that was run when we deployed our servers in order to generate our Docker images. As our Docker image became more complex and gained more dependencies, it was taking longer and longer to generate the image from the Dockerfile. What we really needed was a way to generate the image once and then pull the finished product to our servers.
Normally, one would accomplish this by pushing their image to index.docker.io, however we have proprietary code that we couldn't publish to the world. You may also end up in such a situation if you're planning to build a hosted product around Docker.
To address this need in the community, we built Quay, which aims to be the GitHub of Docker images. Check it out and let us know if it solves a need for you.
Private repositories on your own server are also an option.
To run the server, clone https://github.com/dotcloud/docker-registry to your own server.
To use your own server, prefix the tag with the address of the registry's host. For example:
# Tag to create a repository with the full registry location.
# The location (e.g. localhost.localdomain:5000) becomes
# a permanent part of the repository name
$ sudo docker tag 0u812deadbeef your_server.example.com:5000/repo_name
# Push the new repository to its home location on your server
$ sudo docker push your_server.example.com:5000/repo_name
(see http://docs.docker.io.s3-website-us-west-2.amazonaws.com/use/workingwithrepository/#private-registry)
I think it depends a little bit on your application, but I would prefer the Dockerfile:
A Dockerfile...
... in the root of a project makes it super easy to build and run it, it is just one command.
... can be changed by a developer if needed.
... is documentation about how to build your project
... is very small compared with an image which could be useful for people with a slow internet connection
... is in the same location as the code, so when people checkout the code, they will find it.
An Image in a registry...
... is already built and ready!
... must be maintained. If you commit new code or update your application you must also update the image.
... must be crafted carefully: Can the configuration be changed? How do you handle the logs? How big is it? Do you package an NGINX within the image, or is that part of the outer world? As Mark O'Connor said, you will freeze a certain configuration, but that may not be the configuration someone else wants to use.
This is why I would prefer the Dockerfile. It is the same with a Vagrantfile: I would prefer the Vagrantfile to the VM image. And it is the same with an Ant or Maven script: I would prefer the build script to the packaged artifact (at least if I want to contribute code to the project).

Resources