Check for unused/unnecessary packages in Docker - docker

TL;DR: Is there a convenient way to scan docker images for unused/unnecessary packages?
Question: Given an enormous list of docker images & files, is it possible to scan them and check whether or not a package is actively being used? For security purposes, it would be best to remove all unnecessary packages and reduce the attack surface. In particularly large applications it's not uncommon for a developer to accidentally leave behind a previously useful package.
Potential dirty approach: Remove packages one by one; if the application fails to build, we put that package back and consider it necessary. However, if the Dockerfile builds successfully, it could trigger a notification indicating that the package is potentially unused.
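A minimal sketch of that trial-and-error approach, assuming a Debian-based image whose Dockerfile accepts a REMOVE_PKG build argument (e.g. ARG REMOVE_PKG followed by RUN if [ -n "$REMOVE_PKG" ]; then apt-get remove -y "$REMOVE_PKG"; fi). The names packages.txt, myapp-probe, and /smoke-test.sh are all hypothetical placeholders:
#!/usr/bin/env bash
# For each candidate package, rebuild the image with that package removed and
# run a smoke test; flag packages whose removal still builds and passes.
while read -r pkg; do
  if docker build -t myapp-probe --build-arg REMOVE_PKG="$pkg" . >/dev/null 2>&1 \
     && docker run --rm myapp-probe /smoke-test.sh >/dev/null 2>&1; then
    echo "possibly unused: $pkg"
  else
    echo "required: $pkg"
  fi
done < packages.txt
Note that "builds and passes a smoke test" is only a heuristic; a package can be unused at build time but still needed at runtime, so flagged packages should be reviewed by hand.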

Concerning the unused images, you can use the command docker image prune.
Here is a link to the documentation that might help you.
nabil@LAPTOP:~$ docker image help

Usage:  docker image COMMAND

Manage images

Commands:
  build      Build an image from a Dockerfile
  history    Show the history of an image
  import     Import the contents from a tarball to create a filesystem image
  inspect    Display detailed information on one or more images
  load       Load an image from a tar archive or STDIN
  ls         List images
  prune      Remove unused images
  pull       Pull an image or a repository from a registry
  push       Push an image or a repository to a registry
  rm         Remove one or more images
  save       Save one or more images to a tar archive (streamed to STDOUT by default)
  tag        Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE

Run 'docker image COMMAND --help' for more information on a command.

Related

How to delete cached/intermediate docker images after the cache gets invalidated?

I have a CI pipeline that builds a docker image for my app on every run (the pipeline is triggered by a code push to the git repository).
The docker image consists of several intermediate layers which progressively become very large. Most of the intermediate images are identical for each run, so Docker's caching mechanism is heavily utilized.
However, the problem is that the final couple of layers are different for each run, as they result from a COPY statement in the Dockerfile, where the built application artifacts are copied into the image. Since the artifacts are modified on every run, the already cached bottommost images will ALWAYS be invalidated. These images are about 800 MB each.
What docker command can I use to identify (and delete) the images that get replaced by newer ones, i.e. when they get invalidated?
I would like my CI pipeline to remove them at the end of the run so they don't end up dangling on the CI server and wasting a lot of disk space.
If I understand correctly: with every code push, the CI pipeline creates a new image containing the new version of the application. As a result, the previously created image becomes outdated, so you want to remove it. To do so, you have to:
Get rid of all outdated containers that were created from the outdated image:
display all containers with the command docker ps -a
if still running, stop outdated containers with the command docker stop [containerID]
remove them with the command docker rm [containerID]
Remove outdated images with the command docker rmi [imageID]
To sum up why this process is needed: you cannot remove an image while it is still used by any existing container (even stopped containers still require their images). For this reason, you should first stop and remove old containers, and then remove old images; a sketch of this is shown below.
The detection part, and the automation of the deletion process, should be based on the image versions and container names that the CI pipeline generates while creating new images.
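For example, a cleanup step at the end of the pipeline could look like this; myapp:previous is a placeholder for whatever tag your pipeline assigned to the previous run's image:
#!/usr/bin/env bash
# Stop and remove every container based on the outdated image, then the image itself.
OLD_IMAGE="myapp:previous"   # placeholder; substitute your pipeline's naming scheme

docker ps -a -q --filter "ancestor=$OLD_IMAGE" | xargs -r docker stop
docker ps -a -q --filter "ancestor=$OLD_IMAGE" | xargs -r docker rm
docker rmi "$OLD_IMAGE"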
Edit 1
To list all images which have no relationship to any tagged images, you can use the command docker images -f dangling=true. You can delete them with the command docker images purge.
Just one thing to remember here: if you build an image without tagging it, the image will appear in the list of "dangling" images. You can avoid this situation by providing a tag when you build it.
Edit 2
The command for image purging has changed. Right now the proper command is:
docker image prune
Here is a link to the documentation.
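In a CI run this can be done non-interactively, for example (the until filter is optional and the 24-hour window is just an illustrative value):
docker image prune -f                           # remove dangling images without a prompt
docker image prune -a -f --filter "until=24h"   # remove all unused images older than 24 hours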

Docker change existing image

Docker novice here. Is Docker analogous to GitHub in that you can commit changes to an image without having to re-build the image from scratch? If yes, what commands are used to do that?
Right now, every time I make a change to my code I delete the current Docker image using docker system prune -a and re-build the image using docker build -t appname .
There's no need to delete the existing image first; you can rebuild and tag the result with the same image name that already exists. Images themselves resolve to an immutable image ID that does not change. To change the contents of an image, you must build a new image that has a new image ID, and then, to use it, you need to start new containers referencing that new image.
A rebuild from scratch will reuse the cache, so only commands in your Dockerfile that changed, or that come after a change, will result in a rebuild. The layers at the beginning of your Dockerfile that are the same as in previous builds will be reused between images. Those layers need to have been built previously on the same host (or there's a --cache-from option if you are building in ephemeral cloud environments). Order matters with the build cache, as does the exact hash of the files and their metadata that you copy into your image.
The docker image prune command is useful after you rebuild an image with the same image name. In that scenario, docker will delete old image IDs that no longer have a reference (image name) pointing to them and do not currently have a container using them. Note that this also removes those old images from the build cache, so you may want to keep some old images around to speed up builds should a change get reverted in a commit.
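Putting that together, a typical edit-rebuild cycle might look like the following (appname mirrors the tag from the question; the prune at the end is optional):
docker build -t appname .    # rebuild; unchanged Dockerfile steps come from the cache
docker run --rm appname      # start a new container from the new image ID
docker image prune -f        # delete old, now-unreferenced image IDs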

The best way to create docker image "offline installer"

I use a docker-compose file to set up an Elasticsearch/Logstash/Kibana stack. Everything works fine: the
docker-compose build
command creates three images, about 600 MB each, downloading the needed layers from the docker repository.
Now I need to do the same on a machine with no Internet access, where downloading from repositories is impossible. I need to create an "offline installer". The best way I found is
docker save image1 image2 image3 -o archivebackup.tar
but the created file is almost 2 GB. During the
docker-compose build
command some data is downloaded from the Internet, but it is definitely less than 2 GB.
What is a better way to create my "offline installer" and avoid making it so big?
The save command is the way to go for running docker images offline.
The size difference that you are noticing is because when you pull images from a registry, some layers might already exist locally and are thus not pulled. So you are not pulling all the image layers, only the ones that you don't have locally.
On the other hand, when you save the image to a tar, all the layers need to be stored.
The best way to create the Docker offline installer is to:
Get the CI/CD pipeline to generate the TAR files as part of the build process
Later, create a local folder with the required TAR files
Write a script to load these TAR files on the target machine (a sketch follows below)
The same script can fire the docker-compose up -d command to bring up the whole service ecosystem
Note: it is important to load the images before bringing up the services.
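A minimal sketch of such a load script, assuming the pipeline's tar files sit in an images/ folder next to the docker-compose.yml (both paths are placeholders):
#!/usr/bin/env bash
set -e
for tarball in images/*.tar; do
  docker load -i "$tarball"   # images must exist locally before compose starts them
done
docker-compose up -d          # bring up the whole service ecosystem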
Regarding the size issue, the answer by Yamenk specifically points to the reason why the size increases: docker does not re-pull shared layers when downloading, but docker save has to store all of them.

How can I see Dockerfile for each docker image?

I have the following docker images.
$ docker images
REPOSITORY        TAG      IMAGE ID       CREATED         SIZE
hello-world       latest   48b5124b2768   2 months ago    1.84 kB
docker/whalesay   latest   6b362a9f73eb   22 months ago   247 MB
Is there a way I can see the Dockerfile of each docker image on my local system?
The answer at Where to see the Dockerfile for a docker image? does not help me because it does not exactly show the Dockerfile but the commands run to create the image. I want the Dockerfile itself.
As far as I know, no, you can't. Because a Dockerfile is used for building the image, it is not packed with the image itself. That means you would have to reverse engineer it. You can use docker inspect on an image or container to get some insight into, and a feel for, how it is configured. The layers of an image are also visible, since you pull them when you pull a specific image, so they are no secret either.
However, you can usually see the Dockerfile in the repository of the image itself on Docker Hub. I can't say most repositories have Dockerfiles attached, but most of the repositories I have seen do.
Different repository maintainers may opt for different ways to document their Dockerfiles. You can see the Dockerfile tab on the repository page if automatic builds are set up. But when multiple parallel versions are available (like for Ubuntu), maintainers usually opt to put links to the Dockerfiles for the different versions in the description. If you take a look here: https://hub.docker.com/_/ubuntu/, under "Supported tags" (again, for Ubuntu), you can see links to multiple Dockerfiles, one for each respective Ubuntu version.
Since images are downloaded directly from Docker Hub, only the image itself is pulled onto your machine. If you want to see the Dockerfile, you can go to Docker Hub and search for the image in name:tag format (e.g. ubuntu:14.04); this will open the image page along with the Dockerfile details. Also keep in mind that you can only see the Dockerfile if the owner of the image has shared it; otherwise you cannot. Most official images will not provide you with a Dockerfile.
Hope it helps!
You can also regenerate the Dockerfile from an image, or use the docker history <image name> command to see what is inside.
check this: Link to answer
TL;DR
So if you have a docker image that was built from a Dockerfile, you can recover this information (all except the original FROM command, which is important, I'll grant that. But you can often guess it, especially by entering the container and asking "What OS are you?"). However, the maker of the image could have used manual steps that you'd never know about, plus they COULD just export an image and re-import it, and there would be no intermediate images at that point.
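For example, docker history can print the full, untruncated command behind each layer (this recovers the instructions recorded in the image config, not the original Dockerfile itself):
docker history --no-trunc docker/whalesay
docker history --no-trunc --format '{{.CreatedBy}}' docker/whalesay   # just the creating commands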
Another approach could be to save the image to an image.tar file, then extract the file and try to find a Dockerfile in any of the layer directories.
docker image save -o hello.tar hello-world
This will output a hello.tar file.
hello.tar is the output image archive and hello-world is the name of the image you are saving.
After that, extract the archive and explore the image layer directories. You may find a Dockerfile in one of them.
There is one caveat, though: a Dockerfile only ends up inside a layer if it was copied into the image in the first place, so if the Dockerfile was excluded via .dockerignore when the image was built, you will not find it with this approach.
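The extraction step could look like this, assuming the classic docker save layout where each layer directory contains a layer.tar (newer Docker versions may use an OCI layout instead; the directory name hello-extracted is arbitrary):
docker image save -o hello.tar hello-world
mkdir hello-extracted
tar -xf hello.tar -C hello-extracted
# search every layer's filesystem diff for a Dockerfile
for layer in hello-extracted/*/layer.tar; do
  tar -tf "$layer" | grep -i dockerfile && echo "found in $layer"
done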

How to see tree view of docker images?

I know Docker has deprecated the --tree flag of the docker images command, but I could not find any handy command to get the same output as docker images --tree. I found dockviz, but it seems to require running yet another container. Is there any built-in CLI command to see a tree view of images without using dockviz?
Update Nov. 2021: for public online images, you have the online service contains.dev.
Update Nov. 2018, docker 18.09.
You now have wagoodman/dive, a tool for exploring each layer in a docker image.
To analyze a Docker image simply run dive with an image tag/id/digest:
dive <your-image-tag>
or if you want to build your image then jump straight into analyzing it:
dive build -t <some-tag> .
The workaround current as of Sept. 2015 (docker 1.8), mentioned in issue 5001, remains dockviz indeed:
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz images -t
The -t option allows you to remain in the CLI (no graphics needed).
Update Sept. 2016 (post docker 1.10: docker 1.11, soon 1.12), one year later, as mentioned in the same issue 5001 by Michael Härtl:
Since 1.10 the way layer IDs worked has changed fundamentally. For a lengthy explanation of this topic see #20399. There's also #20451 but I'm not sure, if this could be used by the nate/dockviz image.
Personally I find the way the new layers work very very confusing and much less transparent than before. And it's not really well documented either.
AFAIK @tonistiigi's comments in the issue above are the only public explanation available.
Tõnis Tiigi:
Pre v1.10 there was no concept of layers; or, the other way to think about it, every image had only one layer. You built a chain of images, and you pushed and pulled a chain. All these images in the chain had their own config.
Now there is a concept of a layer that is a content addressable filesystem diff. Every image configuration has an array of layer references that make up the root filesystem of the image and no image requires anything from its parent to run. Push and pull only move a single image, the parent images are only generated for a local build to use for the cache.
If you build an image with a Dockerfile, every command adds a history item to the image configuration. This stores the command so you can see it in docker history. As this is part of the image configuration, it also moves with push/pull and is included in the checksum verification.
Here are some examples of content addressable configs:
https://gist.github.com/tonistiigi/6447977af6a5c38bbed8
Terms in v1.10 (the terms really have not changed in implementation, but previously our docs probably simplified things):
Layer is a filesystem diff: a bunch of files that, when stacked on top of each other, make up a root filesystem. Layers are managed by graphdrivers; they don't know anything about images.
Image is something you can run, and it shows up in docker images -a. It needs to have a configuration object. When a container starts, it needs some way to generate a root filesystem from the image info. On build, every Dockerfile command creates a new image.
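Both concepts are visible from the CLI: docker inspect shows the image configuration's array of content-addressable layer digests, and docker history shows the per-command history items (ubuntu here is just an example image):
docker inspect --format '{{json .RootFS.Layers}}' ubuntu   # layer digests in the image config
docker history ubuntu                                      # one history item per Dockerfile command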
You can also refer to the more recent project TomasTomecek/sen, which:
had to understand the new 1.10 layer format (commit 82b224e)
includes an image tree representation.
