Remove older docker images but not layers that are still referenced - docker

Say I have a build machine which builds many docker images myimage:myversion. I consume about 1GB disk space per 100 images created and I certainly don't need all of them. I'd like to keep, say, the most recent 10 images (and delete everything older) but I want to make sure I have all of the cached layers from the 10 builds/image. If I have all of the layers cached, then I'm more likely to have a fast build on my next run.
The problem is all of the images (very old and brand new) share a lot of layers so I can't blindly delete the old ones as there is a ton of overlap with the new ones.
I don't want to use docker image prune (https://docs.docker.com/config/pruning/#prune-images), because what it removes depends on which containers exist (regardless of state). I delete my containers, so prune would end up deleting far too much.
Is there a simple command I can run periodically to achieve the state I described above?

Simple, no, but some shell wizardry is possible. I think this shell script will do what you want:
#!/bin/sh
docker images --filter 'reference=myimage:*' \
--format '{{ .CreatedAt }}/{{ .ID }}/{{ .Repository }}:{{ .Tag }}' \
| sort -r \
| tail -n +11 \
| cut -d / -f 2 \
| xargs docker rmi
(You might try running this one step at a time to see what comes out.)
In smaller pieces:
List all of the myimage:* images in a format that starts with their date. (If you're using a private registry you must include the registry name as a separate part and you must explicitly include the tag; for instance to list all of your GCR images you need -f 'reference=gcr.io/*/*:*'.)
Sort them, by the date, newest first.
Skip the first 10 lines and start printing at the 11th.
Take only the second slash-separated field (which from the --format option is the hex image ID).
Convert that to command-line arguments to docker rmi.
The extended docker images documentation lists all of the valid --format options.
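On the layer question: docker rmi deletes only the layers that no other image still references, so removing the old images by ID leaves in place every layer shared with the images you keep. Below is a minimal sketch of the same pipeline wrapped in functions, so the selection step can be checked without deleting anything. The function names ids_beyond_newest and prune_old_images are hypothetical, and xargs -r assumes GNU xargs.

```shell
#!/bin/sh
# Hypothetical helper: reads "CreatedAt/ID/Repository:Tag" lines on stdin and
# prints the IDs of every image except the N newest (N is the first argument).
ids_beyond_newest() {
    sort -r | tail -n "+$(( $1 + 1 ))" | cut -d / -f 2
}

# Hypothetical wrapper: keeps the N newest images matching a reference pattern.
# It is only defined here, not called. Example: prune_old_images 'myimage:*' 10
prune_old_images() {
    docker images --filter "reference=$1" \
        --format '{{ .CreatedAt }}/{{ .ID }}/{{ .Repository }}:{{ .Tag }}' \
        | ids_beyond_newest "$2" \
        | xargs -r docker rmi   # -r (GNU xargs): do nothing when the list is empty
}
```

Sorting the CreatedAt text only gives chronological order when all timestamps carry the same UTC offset, which is an assumption of this sketch as well as of the original script.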

Related

How to prune old docker images, but only for a selected container?

We know that docker image prune has a --filter argument that can be used to select (and remove) images older than a given number of hours (e.g. --filter "until=168h" for 7 days).
We also know that docker images has a similar --filter argument that supports a before key (e.g. docker images --filter "before=ubuntu:22.04"), but that can only filter images created before a given image or reference (not a date).
But pruning as described above would apply to all images, which is rather too broad. What if we wanted to prune the "old" images more selectively, restricting the pruning to the images of a single repository (called a "container" below), e.g. to spare older base images?
I've come up with something that looks ugly but appears to be rather effective.
The example below forcefully removes all images older than 2 weeks (the shortest period for which this implementation works, though it can be tweaked to any longer period) of the mirekphd/ml-cache container (caution: as a special case it can remove all images of this container):
$ MAX_WEEK_NUM=2 && REPO=mirekphd && CONTAINER=ml-cache && docker images --format "{{.ID}} {{.CreatedSince}}" --filter=reference="$REPO/$CONTAINER" | awk -v min="$MAX_WEEK_NUM" '($3 == "weeks" && $2 + 0 >= min + 0) || $3 == "months" || $3 == "years" {print $1}' | xargs docker rmi -f

Delete docker images with same name

A server stores multiple docker images with the same name but different tags. I am looking for a command that finds the images sharing a name and deletes all of them except the newest.
Example.
TestImage:v1
TestImage:v2
TestImage:v3
TestImage:v4
The command will delete all images which have name "TestImage" but save TestImage:v4 (which is newest by date).
What would the actual command be?
Provided you have the docker cli installed, the following should work:
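# $name holds the repository name whose older images should be removed
image_ids=$(docker image ls "$name" -q | tail -n +2 | xargs)
# Leave $image_ids unquoted so that each ID becomes a separate argument
docker image rm $image_ids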
The first command loads all local images with the given name. Using the -q flag means that only the image ids are being outputted.
The tail program then skips the first of these IDs (in other words, it starts reading at line 2). Since images are sorted from new to old, this means the newest image is skipped.
The xargs then brings all the outputted ids into the same line, so that docker can interpret them easily.
The ids obtained in that step may look something like this:
ae3c4906b72c 650d66a00131 18cac929dc4f
docker image rm will then remove all images with these ids.
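The same idea can be extended to several repositories at once. Here is a minimal sketch, assuming the listing arrives newest first (the order the answer above relies on). The function name ids_to_drop_per_repo is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "Repository ID" lines on stdin, newest first,
# and prints the IDs of all but the first N lines seen for each repository.
# N is the first argument.
ids_to_drop_per_repo() {
    awk -v keep="$1" '++seen[$1] > keep { print $2 }'
}

# Possible use (not executed here; xargs -r assumes GNU xargs):
#   docker image ls --format '{{.Repository}} {{.ID}}' \
#       | ids_to_drop_per_repo 1 | xargs -r docker image rm
```

The counter is kept per repository name, so the newest N images of each name survive regardless of how the names are interleaved in the listing.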

Docker - remove all but last N images

I am trying to build a small script that removes all docker images besides a small "cache" of N last images (for rolling back to one of the last working versions).
Is there an idiomatic way to do this?
You can use the tail command to accomplish this.
Let's say you only want to keep the most recent 5 images. You can tell tail to show you the list starting with the nth line. For 5 images, you would want tail to start on the 6th line:
tail -n +6
Pair this with docker to show a list of your image IDs, which are sorted by most recent, by default.
docker images -q | tail -n +6
You can pass all of that to the remove images command. This assumes you're using the bash shell; if you use a csh-derived shell, you may need different syntax.
docker rmi $(docker images -q | tail -n +6)

How to delete unused docker images in swarm?

We have a system where users may install docker containers, and we don't limit what they can install. After some time we need to clean up: delete all the images that are not in use in the swarm.
What would be the solution for that using docker remote API?
Our idea is to have background image-garbage-collector thread that:
lists all the images
try to delete some
if it fails, just ignore
Would this make sense? Would this affect swarm somehow?
Cleaner way to list and (try to) remove all images
The command docker rmi $(docker images -q) would do the same as the answer by @tpbowden but in a cleaner way. The -q|--quiet option lists only the image IDs.
It may delete frequently used images (not running at the cleaning date)
If you do this, then when a user tries to swarm run deleted-image, it will:
Either pull the image (< insert network consumption warning here />)
Or just block, as the pull action is not automatic in swarm if I remember it right (< insert frequent support request warning here about misunderstood Swarm behavior />).
"dangling=true" filter:
A useful option is --filter "dangling=true". Executing swarm images -q --filter "dangling=true" lists dangling images, i.e. untagged images that no tagged image references. Note that it says nothing about whether an image is currently running.
Tough challenge
Your issue reminds me of memory management in a computer. Your real issue is:
How to remove image that won't be used in the future?
Which is really hard and really depends on your policy. If your policy is that old images are to be deleted, the command that could help is: docker images --format='{{.CreatedSince}}:{{ .ID}}'. But then the hack starts... You may need to grep "months" and then cut -d ':' -f 2.
The whole command would be:
docker rmi $(docker images --format='{{.CreatedSince}}:{{ .ID}}' | grep months | cut -d ':' -f 2)
Note that this command will need to be run on every Swarm agent as well as the Swarm manager, not only the Swarm manager.
Swarm and registry
Be aware that a swarm pull image:tag will not pull the image on Swarm agents! Each Swarm agent must pull the image itself. Thus deleting images that are still in use will result in network load.
I hope this answer helps. At this time there is no means to query "image not used since a month" AFAIK.
All you need is 'prune'
$ docker image prune --filter until=72h --force --all
docker images | tail -n+2 | awk '{print $3}' | xargs docker rmi
This will list all images, strip the top line with column headings, grab the 3rd column (image ID hash) and then attempt to remove them all. Docker will prevent you from removing any images that are currently used by running containers.
If you want to do this in a slightly less 'hacky' way, you could use Docker's API to get images which aren't being used and delete them that way.
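The "list, try to delete, ignore failures" collector proposed in the question can be sketched with the CLI instead of the remote API (a substitution made here for brevity, not the asker's design). The function name gc_images and the RMI_CMD override are hypothetical; the override only exists so that the loop can be dry-run.

```shell
#!/bin/sh
# Hypothetical sketch: reads image IDs on stdin, tries to remove each one,
# and ignores failures (for example, images still used by a container).
# RMI_CMD is an assumed override for dry runs; it defaults to "docker rmi".
gc_images() {
    removed=0
    skipped=0
    while read -r id; do
        [ -n "$id" ] || continue
        if ${RMI_CMD:-docker rmi} "$id" >/dev/null 2>&1; then
            removed=$((removed + 1))
        else
            skipped=$((skipped + 1))
        fi
    done
    echo "removed=$removed skipped=$skipped"
}

# Possible use on each node (not executed here):
#   docker images -q | sort -u | gc_images
```

Because no -f flag is passed, images that a container still uses are refused by docker and simply counted as skipped.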

How to use docker images filter

I can write
docker images --filter "dangling=true"
What other filters can I use?
I can use something like this?
docker images --filter "running=false"
Docker v1.13.0 supports the following conditions:
-f, --filter value Filter output based on conditions provided (default [])
- dangling=(true|false)
- label=<key> or label=<key>=<value>
- before=(<image-name>[:tag]|<image-id>|<image@digest>)
- since=(<image-name>[:tag]|<image-id>|<image@digest>)
- reference=(pattern of an image reference)
Or use grep to filter images by some value:
$ docker images | grep somevalue
You can also use the REPOSITORY argument to docker images to filter the images.
For example, suppose we have the images:
$ docker images
REPOSITORY        TAG       IMAGE ID       CREATED        SIZE
local-foo         latest    17864104b328   2 months ago   100 MB
example.com/bar   latest    b94c37de2801   9 months ago   285 MB
example.com/baz   latest    a004e3ac682c   2 years ago    221 MB
We can explicitly filter for all images with a given name:
$ docker images example.com/bar
REPOSITORY        TAG       IMAGE ID       CREATED        SIZE
example.com/bar   latest    b94c37de2801   9 months ago   285 MB
Docker also supports globbing:
$ docker images "example.com/*"
REPOSITORY        TAG       IMAGE ID       CREATED        SIZE
example.com/bar   latest    b94c37de2801   9 months ago   285 MB
example.com/baz   latest    a004e3ac682c   2 years ago    221 MB
Official docs here.
In Docker v1.7:
The currently supported filters are:
dangling (boolean - true or false)
label (label=<key> or label=<key>=<value>)
For me,
docker images -q | while read -r IMAGE_ID; do
    docker inspect --format='{{.Created}}' --type=image "${IMAGE_ID}"
done
did the trick. The date command is able to produce output in the same format via
date -Ins --date='10 weeks ago'
which allows me to compare timestamps. I still use the filter for dangling images for convenience, though.
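A minimal sketch of the comparison step, assuming both sides are UTC timestamps in the same ISO 8601 layout, so that a plain string comparison orders them chronologically. docker inspect typically reports .Created in UTC with a trailing Z. A cutoff produced with a local offset would not compare correctly as a string, so the usage note below generates it with date -u. The function name ids_created_before is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "ID CREATED" lines on stdin, where CREATED is a
# UTC timestamp such as 2024-01-01T10:00:00.123456789Z, and prints the IDs
# created strictly before the cutoff given as the first argument.
# Only the first 19 characters (YYYY-MM-DDTHH:MM:SS) are compared, as strings.
ids_created_before() {
    awk -v cutoff="$1" 'substr($2, 1, 19) < substr(cutoff, 1, 19) { print $1 }'
}

# Possible use (not executed here; "date --date" assumes GNU date):
#   cutoff=$(date -u --date='10 weeks ago' +%Y-%m-%dT%H:%M:%SZ)
#   docker images -q | sort -u | while read -r id; do
#       echo "$id $(docker inspect --format='{{.Created}}' --type=image "$id")"
#   done | ids_created_before "$cutoff"
```

Truncating to whole seconds avoids surprises from the fractional part, which docker prints but the cutoff does not contain.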
I wanted to find a match for both local images and images that were tagged with a remote repo (my-repo.example.com in the example below).
For example,
docker images
REPOSITORY                             TAG       IMAGE ID       CREATED        SIZE
my-good-app                            latest    9a8742ad82d3   24 hours ago   126MB
my-repo.example.com/mine/my-good-app   latest    9a8742ad82d3   24 hours ago   126MB
my-repo.example.com/mine/demo-docker   latest    c10baf5231a1   2 weeks ago    200MB
I got tired of trying to figure out how filtering worked, so I just fell back to what I knew.
docker images | grep my-good-app | awk '{print $3}' | uniq
This would match any image names that had the pattern my-good-app. I could not get other answers to include both (images without a repo and images with a reponame like my-repo.example.com in my example).
Then to delete the images matched above, I ran:
docker rmi -f $(docker images | grep my-good-app | awk '{print $3}' | uniq)
sudo docker images --filter "running=false"
For cleaning up old stopped containers you can use:
docker container prune
To remove untagged images you can use:
docker image prune
The Docker CLI lacks proper filtering if you're looking for a way to remove images based on both the creation date and the repository/tag:
docker image prune does accept timestamps via the until filter (e.g. --filter until=24h), but does not allow filtering by repo/tag.
docker images and docker image ls do accept repo/tag filters (e.g. --filter=reference='registry.gitlab.com/group/*/*') but do not accept timestamps, only other image IDs or references in the since and before filters (e.g. --filter since=586026f10754).
I wrote a (seven-line) one-liner for what I believe is a common need: removing images based on both their repo and creation date. Here you go:
docker image rm $(docker images \
  --filter=reference='registry.gitlab.com/group/*/*/*:*' \
  --format "{{.ID}}-{{.CreatedAt}}" | \
  cut -d " " -f 1 | \
  sed 's/-/ /' | \
  awk -v date="$(date --date='3 days ago' +%Y-%m-%d)" '$NF < date' | \
  cut -d " " -f 1)
Customizations:
Name filtering: 2nd line, value of reference= - see docs
Date filtering: 6th line, value of date= - see docs
Requirements: bash, awk, cut, sed and obviously docker.
In Powershell use this example:
docker images --format "{{.Repository}}:{{.Tag}}" | findstr "some_name"
To delete images you can combine this with the delete command like so:
docker rmi $(docker images --format "{{.Repository}}:{{.Tag}}"|findstr "some_name")
To add to the original answer on how to use the images filter, here is a use case for a similar scenario.
My CI pipeline rebuilds docker images, tags them with the latest commit number every time, and pushes them to a docker repository.
However, this leaves residual, unused images on the CI build machine.
As a post-build step, I need to clean them all up, even the ones built just now, while keeping the base images I downloaded (such as OpenJDK and PostgreSQL) so they don't have to be downloaded every time.
Add a label in the Dockerfile (one that is unique and not expected to be present in my base images):
LABEL built=XYZ
Use the images filter to get just the identifiers of the images I created:
docker images --quiet --filter label=built=XYZ
Delete them as a post build action
docker rmi -f $(docker images --quiet --filter label=built=XYZ)
Here's another example, which works with version 17.09 and later:
sudo docker rmi $(sudo docker images -f=reference="registry.gitlab.com/example-app" -f "dangling=true" -q)
Explanation:
reference - we are referencing images by repository name;
dangling=true - we are removing untagged images;
-q - means quiet, showing only the image IDs instead of a whole line.
This command removes all images that have the repository name "registry.gitlab.com/example-app" and are untagged (have <none> in the tag column).
Reference link: https://docs.docker.com/engine/reference/commandline/images/#filtering
Docker's built-in filtering simply doesn't cut it for certain use cases. Mainly, the glob pattern does not match the forward slash, which makes it impossible to match images that don't share the same number of forward slashes.
I find it's easier to rely on an external tool such as awk. For example, here I'm pulling/updating all the images from a certain repository and its subdirectories tagged with any SNAPSHOT version :
docker images --filter=since=65f20cac3aa5 | awk '$1~repo && $2~tag { print $1 ":" $2}' repo=my.repo.com/subdirectory tag=SNAPSHOT | xargs -r -L 1 docker pull
Notice that I combined it with the builtin filter "since".
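A variant of the awk approach, sketched under the assumption that a literal prefix match is wanted. It uses index() instead of the ~ operator, so the dots in a registry name are not treated as regex wildcards, and it reads --format output rather than the default table. The function name ids_matching_prefix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "Repository Tag ID" lines on stdin and prints the
# IDs whose repository starts with the literal prefix $1 (any number of path
# segments may follow) and whose tag contains the literal string $2.
ids_matching_prefix() {
    awk -v repo="$1" -v tag="$2" \
        'index($1, repo) == 1 && index($2, tag) > 0 { print $3 }'
}

# Possible use (not executed here):
#   docker images --format '{{.Repository}} {{.Tag}} {{.ID}}' \
#       | ids_matching_prefix my.repo.com/subdirectory SNAPSHOT
```

Since the match is on a prefix, repositories nested at any depth under my.repo.com/subdirectory are selected, which is the case the glob pattern cannot express.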
FYI, without a filter: to delete all images, e.g. when you are just testing or learning,
docker image rm -f $(docker image ls -q)
Greetings.

Resources