Managing Docker images over time - docker

Folks that are building Docker images on push - how do you manage the different versions over time?
Do they all get their own tag? Do you tag based on the git hash?
Do you delete old images after some time? Don't they take up a lot of space (1GB+) each?

how do you manage the different versions over time?
The first thing to note is that tags can't be trusted over time. They are not guaranteed to refer to the same thing or to continue to exist, so use Dockerfile LABELs, which remain consistent and are always stored with the image. label-schema.org is a good starting point.
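As a rough sketch of what that can look like (the label names follow the label-schema.org convention; the base image, build args, and image name are only illustrative):

FROM alpine:3.19
ARG VCS_REF
ARG BUILD_DATE
LABEL org.label-schema.schema-version="1.0" \
      org.label-schema.vcs-ref=$VCS_REF \
      org.label-schema.build-date=$BUILD_DATE

# at build time, something like:
# docker build --build-arg VCS_REF=$(git rev-parse --short HEAD) \
#              --build-arg BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ) -t myapp:dev .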
Do they all get their own tag? Do you tag based on the git hash?
If you need something unique to refer to every build, just use the image's sha256 digest. If you want to attach extra build metadata to an image, use a LABEL as previously mentioned and include the git hash and whatever versioning information you want. The digest is unwieldy to use by hand, though, and tags are still needed to refer to the images you release, so you will need some tagging scheme.
Git tags, datetimes, and build numbers all work. Each has its pros and cons depending on your environment and what you are trying to tie together as a "release". It's worth noting that a Docker image might come from a Dockerfile at a given git hash, but building from that git hash will not necessarily produce the same image over time if the Dockerfile sources its base image FROM elsewhere.
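A sketch of how those pieces fit together (the registry and image names are placeholders):

# tag and push a build identified by its git hash
docker build -t registry.example.com/myapp:$(git rev-parse --short HEAD) .
docker push registry.example.com/myapp:$(git rev-parse --short HEAD)
# after the push, record the immutable sha256 digest if you need a truly unique reference
docker inspect --format '{{index .RepoDigests 0}}' registry.example.com/myapp:$(git rev-parse --short HEAD)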
Do you delete old images after some time?
Retention entirely depends on your software/system/company requirements or policy. I've seen environments where audit requirements were high, which increases build/release retention time, down to the "I want to re-run these tests on this build" level. Other environments have minimal audit requirements, which tends to lower retention requirements. Some places don't even try to impose any release management at all (this is bad). No one out here can answer this for your specific environment, but there are minimums that are a good idea to stick to.
The base requirement is storing an artefact for each production release. This is generally "forever" for historical purposes. Actively looking back more than a release or two is pretty rare (again, this can depend on your app), so archival is a good idea and easy to do with a second registry on cheap storage/hosting that receives a copy of everything (i.e. not on your precious SSDs).
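Copying a release into an archive registry can be as simple as the following (the registry hostnames and tag are placeholders):

docker pull registry.example.com/myapp:1.4.2
docker tag registry.example.com/myapp:1.4.2 archive.example.com/myapp:1.4.2
docker push archive.example.com/myapp:1.4.2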
I've never seen a requirement to keep all development builds over time. Retention generally follows your development/release cycle. It's rare that you need access to dev builds outside of your current release plus the next release. Just remember to LABEL and tag dev builds appropriately so cleanup is simple: -dev, -snapshot, -alpha.0, whatever.
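For example, if dev builds carry a label marking them as snapshots (the label name here is purely illustrative), cleanup on a build host can be a one-liner:

# remove local images labelled as snapshot builds
docker images --filter "label=com.example.release-type=snapshot" -q | xargs -r docker rmi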
Don't they take up a lot of space (1GB+) each?
It's normally less than you think, but yes, they can be large, since on top of your application you ship an OS userland. That's why lots of people start with alpine: it's tiny compared to most distros, as long as you don't have anything incompatible with musl libc.
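The base image usually dominates the size of a small app, so the FROM line matters more than anything else; a rough sketch (tags and the installed package are illustrative):

# alpine base: a few MB before your app is added
FROM alpine:3.19
RUN apk add --no-cache nodejs
# a full-distro base such as ubuntu or debian typically starts at tens of MB or more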

Related

Github actions docker caching

I think this will be useful for many others.
I am using https://github.com/phips28/gh-action-bump-version to automatically bump NPM versions in GitHub Actions.
Is there a way to cache the Docker image of this action so it doesn't have to build each time? It takes a long time to run, and it runs up front before the rest of the steps. I am sure this is common for similar types of GitHub Actions that pull Docker images.
The Docker image looks pretty slim, so I am not sure there is much benefit in trying to optimise the image itself. It's more a question of how to configure GitHub Actions.
Any suggestions?
TLDR
Somewhat! You can change the GitHub workflow file to pull an image from a registry instead of building it on each run. While this doesn't cache the image, it is significantly faster. This can be achieved by editing your workflow to look like the following:
- name: 'Automated Version Bump'
  id: version-bump
  uses: 'docker://phips28/gh-action-bump-version:master'
  with:
    tag-prefix: 'v'
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Please note the docker:// prefix to the uses: statement, coupled with the change from #master to :master in order to convert the name to a valid image name.
I have opened up a PR on that repository with a suggested fix :^)
Original Response
A very good question, and one on which little information can be found in the official documentation (though GitHub acknowledges the lag in its docs).
GitHub Staff response
I had a search for you and managed to find an article from September 2019 on the GitHub community forum about this exact topic. It inevitably linked to this article from July 2019.
There is a wonderful explanation of how building each time still utilises the Docker build cache, reducing build times, yet allows the flexibility of always using the latest version of the base image, and so on.
There is a proposed solution if you aren't bothered about the flexibility of updates and just want the shortest build time possible, though I am not certain that the syntax is still valid today:
But let’s say that I don’t want my Action to even evaluate the Dockerfile every time the Action runs, because I want the absolute fastest runtime possible. I want a predefined Docker container to be spun up and get right to work. You can have that too! If you create a Docker image and upload it to Docker Hub or another public registry, you can instruct your Action to use that Docker image specifically using the docker:// form in the uses key. See the GitHub Actions documentation for specifics. This requires a bit more up-front work and maintenance, but if you don’t need the flexibility that the above Dockerfile evaluation system provides, then it may be worth the tradeoff.
Unfortunately the link to the GitHub Actions documentation is broken, but this does suggest that the author of the action could allow this behaviour if they modified their action.
Alternate Ideas
If you require the ability to control the cache on the executor host (to truly cache an image), you may want to consider hosting your own GitHub runner, as you would have total control over the images there. Though I imagine this is potentially a deterrent, given that GitHub Actions is largely a free service (with limits, and this is perhaps one of them!).
You might also consider adding a task that utilises the file cache action, exporting the gh-action-bump-version image to a file through docker commit or docker save, and then reinflating it on your next run. However, this introduces complexity and may not save you time in the long run. Edit: this is a horrible idea now that we know actions can support pulling images from registries.
I hope this helps you, and anyone else searching for more information 👍
There is an excellent article on the Docker blog explaining how to cache Docker images with actions/cache and buildx (which allows you to specify a custom cache path).
It can be found here: https://www.docker.com/blog/docker-github-actions/.
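The pattern from that post looks roughly like the following workflow steps (action versions, image name, and the cache path are illustrative and may have changed since the article was written):

- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: actions/cache@v4
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-
- uses: docker/build-push-action@v5
  with:
    context: .
    tags: myapp:latest
    push: false
    cache-from: type=local,src=/tmp/.buildx-cache
    cache-to: type=local,dest=/tmp/.buildx-cache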

What services should I use for autobuild of computationally intensive dockers?

I have a repo with a piece of software, and a Docker image for users who have installation problems. I need to rebuild the image every time I publish a new version, and I also want to run automated tests afterwards. Docker Hub has such functionality, but the builds take too long and are killed by the timeout. I also can't run the tests there, as some tests use ~8 GB of RAM.
Are there any other services for these tasks? I'm fine with paying for it, but I don't want to spend time on lengthy configuration and maintenance (e.g. running my own build server).
TravisCI.
It's a hosted CI service that's fairly easy to get started with, and it's free as long as you keep the repo public.
It's well known and common, and you will find thousands of helpful questions and answers under the [travisci] tag.
I'm adding a link to their documentation with an example of how to build from a Dockerfile:
https://docs.travis-ci.com/user/docker/#building-a-docker-image-from-a-dockerfile
Also, I've tried to find memory and time limitations but couldn't find any in a quick search.
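For reference, the setup in those docs boils down to something like the following .travis.yml (the image name and test command are placeholders):

services:
  - docker
script:
  - docker build -t myorg/myapp .
  - docker run --rm myorg/myapp ./run-tests.sh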

Is it a good practice to have an extra docker container for build tasks?

I have a simple web app with nginx as web server. I use grunt (node module) to prepare my assets for production (minifying etc.).
Now I wonder if I should run the build task in its own container or if one container is enough.
Which approach is the best and why?
Having separate images for the build and the finished app is a good practice - it means your final app image is clean and has a minimal feature set, only what you need to run the app. That makes for a smaller image with (more importantly) a smaller attack surface. There's a good write-up of that - it's called the Docker builder pattern.
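A minimal multi-stage Dockerfile sketch of that idea (multi-stage builds are the built-in form of the builder pattern; the base image tags, Grunt task, and output path here are assumptions):

# build stage: node + grunt produce the minified assets
FROM node:20-alpine AS build
WORKDIR /src
COPY package*.json ./
RUN npm ci
COPY . .
RUN npx grunt build

# final stage: nginx serves only the built assets
FROM nginx:alpine
COPY --from=build /src/dist /usr/share/nginx/html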
Alternatively, the benefit of having a single image that contains your app and the build tools is that you reduce your management overhead during development - you don't have to chain builds together or manage multiple versions of multiple images. But the cost of a more bloated final image with more potential for exploits may not be worth it.
One container is enough. Containers running on a single machine share the same operating system kernel. Maintenance may be a problem if you want to run the same build in multiple containers, but you can spin up another container from the image at any point. It is advisable to keep snapshots after each successful build, though.

Creating a Docker UAT/Production image

Just a quick question about best practices for creating Docker images for critical environments. As we know, in the real world the team/company deploying to internal test is often not the same as the one deploying to client test environments and production. This becomes a problem because not all app configuration may be available when the Docker UAT/production image is created, e.g. with Jenkins. And then there is the question of passwords that are stored in app configuration.
So my question is, how "fully configured" should the Docker image be? The way I see it, it is in practice not possible to fully configure the Docker image - some app passwords etc. must be left out. But then again, doesn't this slightly defeat the purpose of a Docker image?
how "fully configured" should the Docker image be? The way I see it, it is in practice not possible to fully configure the Docker image, but some app passwords etc. must be left out. But then again this slightly defies the purpose of a Docker image?
There will always be tradeoffs between convenience, security, and flexibility.
An image that works with zero runtime configuration is very convenient to run, but it is not very flexible, and sensitive config like passwords will be baked in and exposed.
An image that takes all configuration at runtime is very flexible and doesn't expose sensitive info, but it can be inconvenient to use if default values aren't provided. If a user doesn't know some values, they may not be able to use the image at all.
Sensitive info like passwords usually lands on the runtime side when deciding what configuration to bake into images and what to require at runtime. However, this isn't always the case. As an example, you may want to build test images with zero runtime configuration that only point at test environments. Everyone has access to the test environment credentials anyway, zero configuration is more convenient for testers, and no one can accidentally run a build against the wrong database.
For configuration other than credentials (e.g. app properties, loglevel, logfile location) the organizational structure and team dynamics may dictate how much configuration you bake in. In a devops environment making changes and building a new image may be painless. In this case it makes sense to bake in as much configuration as you want to. If ops and development are separate it may take days to make minor changes to the image. In this case it makes sense to allow more runtime configuration.
Back to the original question, I'm personally in favor of choosing reasonable defaults for everything except credentials and allowing runtime overrides only as needed (convention with reluctant configuration). Runtime configuration is convenient for ops, but it can make tracking down issues difficult for the development team.
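A small sketch of that approach, assuming a generic app (all variable, host, and image names here are illustrative): bake reasonable defaults in with ENV, leave credentials out, and let ops override only what differs at runtime.

# Dockerfile fragment: defaults baked in, no credentials
ENV LOG_LEVEL=info \
    DB_HOST=db.test.internal
# deliberately no default for DB_PASSWORD

# at deploy time (shell): override what differs and supply the secret
docker run -e DB_HOST=db.prod.internal -e DB_PASSWORD="$DB_PASSWORD" myapp:1.4.2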

TFS 2010: Rolling CI Builds

I've been looking around online at ways of improving our build time (which is currently ~30-40 minutes, depending on which build agent gets the task), and one common theme I've seen is to use CI builds.
I understand the logic behind this, and it makes sense that it would reduce the time each build takes. Our problem, however, is that building on every check-in is a pointless use of our resources, because in our development branch we only keep the latest successful build. This means that if two people check in within a short space of time, whoever checked in last will be the one whose build is kept.
It's for this reason (along with disk space limitations) that we changed to using rolling builds, so that we only build the development branch a maximum of once every 45 minutes (obviously we can manually trigger builds on top of that).
What I want to know (and haven't been able to find anywhere) is whether there's a way of combining rolling builds AND continuous integration. So keep building only once every 45 minutes, but only get and build files that have changed.
I'm not even sure it's possible, and if not then I'll look into other ways, but this seems like something that should be possible.
