Identifying files contained within a docker image (or Application dependencies) - docker

I'm currently running a project with Docker, and I have a question: is there any way to know exactly which files are contained in a Docker image, without running that image?
Put differently, I'm scheduling Docker containers to run in the cloud based on the binaries/libraries they use, so that containers with common dependencies (i.e. shared binaries and libraries) are scheduled on the same host and can share those dependencies via the host OS. Is there a way to identify those dependencies and do that?

You could run docker diff on each layer to see what files were added. It is not very succinct, but it should be complete.
Alternatively your distro and programming language may have tools that help identify which dependencies have been added. For example, Python users can check the output of pip freeze and Debian users can check the output of dpkg --get-selections to check on what system packages have been installed.
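For example, assuming you can start a container from the image (the container name here is a placeholder):

# files added, changed, or removed in the container relative to its image
docker diff my_container

# language- and distro-level dependency listings from inside the container
docker exec my_container pip freeze
docker exec my_container dpkg --get-selections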

Related

Red Hat base image - list packages installed in an image with "rpm", without running it

Our org builds apps on top of some Red Hat base images, and we use rpm -qa to check installed system packages and run security checks. Here "packages" means system libraries such as libxml2 or expat (for software dependencies we have Maven). From the Dockerfiles we know that nothing is downloaded and installed from source, so in theory all packages are detectable by rpm.
Some base images always point to latest, so I have to check regularly which package is at which level. I need to automate this.
When it's impossible to run the image first (for example, when the entrypoint is defined as java -jar), I cannot simply start the container and run rpm -qa. I have to create a project, write a main class that does some long-running job, set everything up in the Maven jib plugin, build and run it, and then docker exec xxx rpm -qa. It's inconvenient.
I have tried dive; I can see files but cannot view their contents. I can docker save -o foo.tar and try to extract files from there, but that is inconvenient too. Besides, I am not aware of any single file that lists the installed packages. Is there one?
I tried docker history; it was not very helpful.
I would like a Docker feature that lists all packages and versions for vulnerability checks, delegating the listing to rpm or dpkg (for Ubuntu-based images, maybe in the future), depending on which is available.
The Dockerfile can be inaccessible when the image comes from a remote registry, so the solution needs to analyze the image itself.
If the default ENTRYPOINT of a given image is in your way for a particular operation, you can unilaterally decide to change it to whatever you want at run time, or even drop it entirely.
In your particular case, this should do the trick:
docker run -it --rm --entrypoint '' <your_image> rpm -qa
Change the final command to the one you need. You can even run bash and interactively enter your own set of commands to inspect.
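For instance, assuming the image ships a bash shell, you can drop into it interactively and run rpm -qa (or anything else) from there:

docker run -it --rm --entrypoint /bin/bash <your_image>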

Best practice for spinning up container-based (development) environments

OCI containers are a convenient way to package a suitable toolchain for a project, so that development environments are consistent and new project members can start quickly by simply checking out the project and pulling the relevant containers.
Of course I am not talking about projects that simply need a C++ compiler or Node.js. I am talking about projects that need specific compiler packages that don't work on anything newer than Fedora 22, projects with special tools that need to be installed manually into strange places, working on multiple projects whose tools are not co-installable, and so on. For this kind of thing it is easier to have a container than to follow twenty installation steps and then pray that the bits left over from the previous project don't break things for you.
However, starting a container with a compiler to build a project requires quite a few options on the docker (or podman) command line (a combined invocation is sketched after this list). Besides the image name, usually:
mount of the project working directory
user id (because the container should access the mounted files as the user running it)
if the tool needs access to some network resources, it might also need:
    some credentials, via environment or otherwise
    ssh agent socket (mount and environment variable)
if the build process involves building docker containers:
    docker socket (mount); buildah may work without special setup though
and if it is a graphical tool (e.g. an IDE):
    X socket mount and environment variable
    --ipc host to make shared memory work
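To give a sense of how these options add up, here is a sketch of the kind of invocation that results; the image name, mount paths, and final command are placeholders rather than anything from a real project:

# project mount with matching uid, ssh agent forwarding, docker socket,
# and X forwarding for graphical tools:
docker run --rm -it \
  -v "$PWD":/work -w /work \
  -u "$(id -u):$(id -g)" \
  -e SSH_AUTH_SOCK=/ssh-agent \
  -v "$SSH_AUTH_SOCK":/ssh-agent \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e DISPLAY="$DISPLAY" \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  --ipc=host \
  my-toolchain-image make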
And then it can get more complicated by other factors. E.g. if the developers are in different departments and don't have access to the same docker repository, their images may be called differently, because docker does not support symbolic names of repositories (podman does though).
Is there some standard(ish) way to handle these options or is everybody just using ad-hoc wrapper scripts?
I use the Visual Studio Code Remote - Containers extension to connect the source code to a Docker container that holds all the tools needed to develop, build, and test the code (e.g. npm modules, Ruby gems, ESLint, Node.js, Java).
Additionally, you can put the VS Code extensions into the Docker container to keep the IDE tooling portable as well.
https://code.visualstudio.com/docs/remote/containers#_managing-extensions
You can provide a Dockerfile in the source code for newcomers to build the Docker image themselves or attach VSCode to an existing Docker container.
If you need to run a server inside the Docker container for testing purposes, you can expose a port on the container via VSCode, and start hitting the server inside the container with a browser or cURL from the host machine.
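As a rough illustration, a minimal .devcontainer/devcontainer.json might look like the following; the extension ID and port are only examples, and the exact schema (for instance whether extensions live at the top level or under customizations.vscode) depends on your version of the extension:

{
  "name": "project-toolchain",
  "build": { "dockerfile": "Dockerfile" },
  "extensions": ["dbaeumer.vscode-eslint"],
  "forwardPorts": [3000]
}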
Be aware of the known limitations of the Visual Studio Code Remote - Containers extension. The one that impacts me the most is the beta support for Alpine Linux; I have often noticed that some of the popular Docker Hub images are based on Alpine.

Install ansible application to docker container

I have an application which can be installed with Ansible. Now I want to create a Docker image that includes the installed application.
My idea is to bring up a Docker container from some base image, then run the Ansible installation against that container from an external machine, and afterwards create an image from the container.
I am just starting with Docker. Could you please advise whether this is a good idea and how I can do it?
This isn’t the standard way to create a Docker image and it isn’t what I’d do, but it will work. Consider looking at a tool like Hashicorp’s Packer that can automate this sequence.
Ignoring the specific details of the tools, the important thing about the docker build sequence is that you have some file checked into source control that an automated process can use to build a Docker image. An Ansible playbook coupled with a Packer JSON template would meet this same basic requirement.
There are some key differences, though, between the Docker runtime environment and a bare-metal system or VM that you'd typically configure with Ansible, so it's unlikely you'll be able to use your existing playbook unmodified. For example, if your playbook tries to configure system daemons, install a systemd unit file, add ssh users, or perform other standard system-administration tasks, these generally aren't relevant or useful in Docker.
I’d suggest making at least one attempt to package your application using a standard Dockerfile to actually understand the ecosystem. Don’t expect to be able to use an Ansible playbook unmodified in a Docker environment; but if your organization has a lot of Ansible experience and you can easily separate “install the application” from “configure the server”, the path you’re suggesting is technically fine.
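If you do go down that path, the sequence is roughly the following sketch; the container, image, and playbook names are placeholders, and it assumes Ansible's docker connection plugin is available:

# start a long-lived container from a base image that Ansible can target
docker run -d --name app-target centos:7 sleep infinity
# run the playbook against the container over the docker connection
ansible-playbook -i 'app-target,' -c docker install-app.yml
# snapshot the result as an image, then clean up
docker commit app-target myorg/app:manual
docker rm -f app-target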
You can use multi-stage builds in Docker, which might be a nice solution:
FROM ansible/centos7-ansible:stable as builder
COPY playbook.yaml .
RUN ansible-playbook playbook.yaml
# Include whatever base image you need for your application
FROM alpine:latest
# Add required setup for your app
# Copy files built in the ansible image, i.e. your app
COPY --from=builder . .
CMD ["<command to run your app>"]
Hopefully the example is clear enough for you to create your own Dockerfile.
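Assuming the playbook builds and installs your app in the first stage, you then build and run it as usual (the tag is arbitrary):

docker build -t myapp .
docker run --rm myapp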

Is there any way by which we can list out all the dependencies or libraries installed in running docker container?

Is there any way by which we can list out all the dependencies or libraries installed in running docker container?
Not exactly.
You can inspect the history of the container's image: that will give you an idea of the various operations (RUN, COPY, ADD) performed on a base image in order to build it.
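For instance (the image name is a placeholder):

docker history --no-trunc your/image:tag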
But don't forget that a container can be run from an image as simple as scratch (no files) plus a single executable (making only system calls): no OS, no bash, nothing except that one executable.
In that case, there are no dependencies or libraries to list.

Handling software updates in Docker images

Let's say I create a Docker image called foo that contains the apt package foo. foo is a long-running service inside the image, so the container isn't restarted very often. What's the best way to go about updating the package inside the container?
I could tag my images with the version of foo that they're running and install a specific version of the package inside the container (i.e. apt-get install foo=0.1.0 and tag my image foo:0.1.0), but this means keeping track of the package version and creating a new image/tag every time the package updates. I would be perfectly happy with this if there were some way to automate it, but I haven't seen anything like that yet.
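For reference, the pinned variant described here might look roughly like this; the base image and package name are illustrative:

FROM ubuntu:22.04
# pin the package version so the image content is reproducible
RUN apt-get update && \
    apt-get install -y foo=0.1.0 && \
    rm -rf /var/lib/apt/lists/*
CMD ["foo"]

# built and tagged with:
# docker build -t foo:0.1.0 .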
The alternative is to install (and update) the package on container startup, however that means a varying delay on container startup depending on whether it's a new container from the image or we're starting up an existing container. I'm currently using this method but the delay can be rather annoying for bigger packages.
What's the (objectively) best way to go about handling this? Having to wait for a container to start up and update itself is not really ideal.
If you need to update something in your container, you need to build a new container. Think of the container as a statically compiled binary, just like you would with C or Java. Everything inside your container is a dependency. If you have to update a dependency, you recompile and release a new version.
If you tamper with the contents of the container at startup time, you lose all the benefits of Docker: that you have a traceable build process and that each container is verifiably bit-for-bit identical everywhere and every time you copy it.
Now let's address why you need to update foo. The only reason you should have to update a dependency outside of the normal application delivery cycle is to patch a security vulnerability. If you have a CVE notice that Ubuntu just released a security patch then, yep, you have to rebuild every container based on Ubuntu.
There are several services that scan and tell you when your containers are vulnerable to published CVEs. For example, Quay.io and Docker Hub scan containers in your registry. You can also do this yourself using Clair, which Quay uses under the hood.
For any other type of update, just don't do it. Docker is a 100% fossilization strategy for your application and the OS it runs on.
Because of this your Docker container will work even if you copy it to 1000 hosts with slightly different library versions installed, or run it alongside other containers with different library versions installed. Your container will continue to work 2 years from now, even if its dependencies can no longer be downloaded from the internet.
If for some reason you can't rebuild the container from scratch (e.g. it's already 2 years old and all the dependencies went missing) then yes, you can download the container, run it interactively, and update dependencies. Do this in a shell and then publish a new version of your container back into your registry and redeploy. Don't do this at startup time.
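A rough sketch of that last-resort sequence (image names and tags are placeholders):

docker run -it --name patch-me myorg/foo:0.1.0 /bin/bash
# inside the shell: apt-get update && apt-get install --only-upgrade foo, then exit
docker commit patch-me myorg/foo:0.1.1
docker push myorg/foo:0.1.1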
