I think this will be useful for many others.
I am using https://github.com/phips28/gh-action-bump-version to automatically bump NPM versions in GitHub Actions.
Is there a way to cache the Docker image of this action so it doesn't have to build each time? It takes a long time to run, and it runs up front before the rest of the steps. I am sure this is common for similar GitHub Actions that use Docker images.
The Docker image looks pretty slim, so I am not sure there will be much benefit in trying to optimise the image itself; this is more about how to configure GitHub Actions.
Any suggestions?
TLDR
Somewhat! You can change the GitHub workflow file to pull an image from a registry instead of building it each run. While this doesn't cache the image, it is significantly faster. This can be achieved by editing your workflow to look like the following:
- name: 'Automated Version Bump'
  id: version-bump
  uses: 'docker://phips28/gh-action-bump-version:master'
  with:
    tag-prefix: 'v'
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Please note the docker:// prefix to the uses: statement, coupled with the change from #master to :master in order to convert the name to a valid image name.
I have opened up a PR on that repository with a suggested fix :^)
Original Response
A very good question, and one on which little information can be found in the official documentation (though GitHub acknowledge the delay in their docs).
GitHub Staff response
I had a search for you and managed to find an article from September 2019 on the GitHub community forum about this exact topic. It inevitably linked to this article from July 2019.
It contains a wonderful explanation of how building each time will still utilise the Docker build cache, reducing build times, while still allowing the flexibility of always using the latest version of the base image, etc.
There is a proposed solution if you aren't bothered about the flexibility of updates and just want the shortest build time possible, though I am not certain whether that syntax is still valid:
But let’s say that I don’t want my Action to even evaluate the Dockerfile every time the Action runs, because I want the absolute fastest runtime possible. I want a predefined Docker container to be spun up and get right to work. You can have that too! If you create a Docker image and upload it to Docker Hub or another public registry, you can instruct your Action to use that Docker image specifically using the docker:// form in the uses key. See the GitHub Actions documentation for specifics. This requires a bit more up-front work and maintenance, but if you don’t need the flexibility that the above Dockerfile evaluation system provides, then it may be worth the tradeoff.
Unfortunately the link to the GitHub Actions documentation is broken, but this does suggest that the author of the action could allow this behaviour if they modified their action.
Alternate Ideas
If you require the ability to control the cache on the executor host (to truly cache an image), then you may want to consider hosting your own GitHub runner, as you would have total control over the images there. Though I imagine this is potentially a deterrent, given that GitHub Actions is largely a free service (with limits, and this is perhaps one of them!).
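For what it's worth, once a self-hosted runner is registered, pointing the job at it is just a matter of the runs-on key, and the host's local Docker cache then survives between runs:

jobs:
  bump-version:
    # GitHub-hosted runners start from a clean VM each run; a self-hosted
    # runner keeps its local Docker image/layer cache between jobs.
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v2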
You might want to consider adding a task that uses the file cache action, exporting the gh-action-bump-version image to a file through docker commit or docker save, and then reinflating it on your next run. However, this introduces complexity and may not save you time in the long run. Edit: This is a horrible idea now that we know actions can support pulling images from registries.
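For completeness, a rough sketch of what that docker save/load approach might look like (the cache path, key, and image name here are my own guesses, not a tested recipe):

- name: Cache docker image tarball
  uses: actions/cache@v2
  with:
    path: /tmp/bump-version.tar
    key: gh-action-bump-version-master
- name: Load cached image if present
  # Reload the saved image into the local daemon; ignore a cache miss.
  run: docker load -i /tmp/bump-version.tar || true
- name: Save image for the next run
  # Export the image (once it exists locally) so actions/cache picks it up
  # when the job finishes.
  run: docker save -o /tmp/bump-version.tar phips28/gh-action-bump-version:master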
I hope this helps you, and anyone else searching for more information 👍
There is an excellent article on the docker blog explaining how to cache docker images with actions/cache and buildx (which allows you to specify a custom cache path).
It can be found here: https://www.docker.com/blog/docker-github-actions/.
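The gist of that article is to let buildx write its layer cache to a local directory and have actions/cache restore that directory on the next run. A condensed sketch (the image name and cache key are illustrative, not taken from the article verbatim):

- uses: actions/checkout@v2
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v1
- name: Cache buildx layers
  uses: actions/cache@v2
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-
- name: Build image using the cached layers
  uses: docker/build-push-action@v2
  with:
    context: .
    push: false
    tags: myorg/myimage:latest
    cache-from: type=local,src=/tmp/.buildx-cache
    cache-to: type=local,dest=/tmp/.buildx-cache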
Related
I have a private Docker registry that I'm using for my own images. I would like the containers that run these images (via docker-compose) to be updated immediately when I push a new version.
I know that there are Watchtower (https://containrrr.dev/watchtower/) and Diun (https://crazymax.dev/diun/), but these tools only poll at a defined interval (I'm using Watchtower now, and it is not as fast as I would like, even with a poll every minute).
I found that the Docker registry sends notifications when an image is pushed (https://docs.docker.com/registry/notifications/) and was looking for a service that uses this. But I didn't find any tool, except for a Jenkins plugin (https://github.com/jenkinsci/dockerhub-notification-plugin). Am I looking in the wrong places, or is there just no tool that works with the notifications from the registry?
Since webhooks aren't part of OCI's distribution-spec, any solution that implements this will be registry specific. That means you won't be able to easily change registries in the future without breaking this implementation. Part of the value of registries and the standards they use is allowing portability, both of the images, and of tooling that works with those registries. What works on distribution/distribution specific APIs may not work on Nexus, Artifactory, or any of the cloud/SaaS solutions like Docker Hub, Quay, ECR, ACR, GCR, etc. (I'm leaving Harbor out of this list because they are based on distribution/distribution, so that's the one registry where it will probably work.)
If you want the solution to be portable, then it either needs to be designed into a higher level workflow (e.g. the same CI pipeline that does the push of the image triggers the deploy), or you are left with polling which a lot of GitOps solutions implement. Also realize that the S3 backend many registries use is eventually consistent, so with a distributed/HA registry implementation, trying to pull the image before the data store has had a chance to replicate may trigger race conditions.
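For reference, with a self-hosted distribution/distribution registry those notifications are enabled in the registry's config.yml; a minimal sketch, where the listener URL is a placeholder for a service you would have to write yourself:

notifications:
  endpoints:
    - name: deploy-hook
      # The registry POSTs an event envelope here on every push.
      url: https://deploy.example.com/registry-events
      timeout: 1s
      # threshold/backoff tune how delivery failures are retried.
      threshold: 5
      backoff: 10s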
I think you have to look at the problem from a different angle. If you shift your focus away from containers, you will notice that GitOps might be the perfect fit: you can achieve the same thing with a CI/CD pipeline that triggers a redeployment.
If you want to stick with containers only, I can recommend taking a look at Harbor, which can call a webhook after a push. See the docs (https://container-registry.com/docs/user-manual/projects/configuration/webhooks/).
We are using Uber Kraken to increase container download speeds in a Kubernetes cluster, and it works great.
However, we commonly mutate tags (upload a new version of :latest). In the limitations section of the Uber Kraken GitHub page they state:
Mutating tags (e.g. updating a latest tag) is allowed, however, a few things will not work: tag lookups immediately afterwards will still return the old value due to Nginx caching, and replication probably won't trigger. We are working on supporting this functionality better. If you need tag mutation support right now, please reduce the cache interval of the build-index component. If you also need replication in a multi-cluster setup, please consider setting up another Docker registry as Kraken's backend.
What do they mean by "reduce the cache interval of the build-index component"? I don't quite understand what they are referring to in the docker universe.
This sentence comes from PR 61.
It offers a better alternative than the previous documentation, which stated:
Mutating tags is allowed, however the behavior is undefined.
A few things will go wrong:
replication probably won't trigger, and
most tag lookups will probably still return the old tag due to caching.
We are working on supporting this functionality better.
If you need mutation (e.g. updating a latest tag) right now, please consider implementing your own index component on top of a consistent key-value store.
That last sentence was replaced with:
please consider setting up another Docker registry as Kraken's backend,
and reduce the cache interval of the build-index component
That would replace the current Storage Backend For Origin And Build-Index.
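To unpack what "cache interval" refers to: the build-index component caches tag-to-digest mappings, so a mutated tag is only re-resolved once the cached entry expires. I haven't verified the exact key names in Kraken's build-index YAML config, so the snippet below is purely a hypothetical illustration of the kind of setting you would lower:

# Hypothetical build-index config sketch -- the real key names in your
# deployment's build-index.yaml may differ; check Kraken's config examples.
tag_cache:
  ttl: 30s   # lower TTL so lookups of a mutated tag (e.g. :latest) refresh sooner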
I have a repo with a piece of software, and a Docker image for users who have installation problems. I need to rebuild the image every time I publish a new version, and I also want to run automated tests afterwards. Docker Hub has such functionality, but the builds are too long and are killed by a timeout. Also, I can't run the tests there, as some of them use ~8 GB of RAM.
Are there any other services for these tasks? I'm fine with paying for it, but I don't want to spend time on lengthy configuration and maintenance (e.g. running my own build server).
TravisCI.
It's a fairly easy-to-start hosted CI service, which is free as long as you keep the repo public.
It's well known and common, and you will find thousands of helpful questions and answers under the [travisci] tag.
I'm adding a link to their documentation with an example of how to build from a Dockerfile.
https://docs.travis-ci.com/user/docker/#building-a-docker-image-from-a-dockerfile
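Based on that page, a minimal .travis.yml might look like the sketch below (the image name and test command are placeholders for your own):

language: generic
services:
  - docker
script:
  # Build the image from the Dockerfile in the repo root.
  - docker build -t myuser/mysoftware:latest .
  # Hypothetical smoke test run inside the freshly built image.
  - docker run --rm myuser/mysoftware:latest ./run-tests.sh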
Also, I've tried to find memory and time limitations but couldn't find any in a quick search.
Building an AWS serverless solution (Lambda, S3, CloudFormation, etc.) I need an automated build solution. The application should be stored in a Git repository (preferably Bitbucket or CodeCommit). I looked at Bitbucket Pipelines, AWS CodePipeline, CodeDeploy, and hosted CI/CD solutions, but it seems that all of these do something static: they receive a dumb signal that something changed and rebuild the whole environment, as if it were one app rather than a distributed application.
I want to define ordered steps of what to do depending on the file type of each change.
E.g.
1. every updated .js file containing Lambda code should first be used to update the existing Lambda
2. after that, every new or changed CloudFormation file/stack should be used to update or create stacks; there may be a required order (they import values from each other)
3. after that, the code for new Lambdas in .js files should be used to update the code of the Lambdas created in the previous step.
Resources that were not updated should NOT be updated or recreated!
It seems that my pipelines should be ordered AND have the ability to filter input (e.g. only .js files from a certain path) and receive as input the names of the changed resources.
I don't seem to find this functionality within AWS or hosted Git solutions like Bitbucket, or in CI/CD pipelines like CircleCI, Codeship, AWS CodePipeline, CodeDeploy, etc.
How come? Doesn't anyone need this? Seems like a basic requirement in my eyes....
I looked again at available AWS tooling and got to the following conclusion:
When coupling CodePipeline to a CodeCommit repository, every commit puts a package of the whole repository on S3 as input for the pipeline. So not only the changes, but everything.
In CodePipeline there is the orchestration functionality I was looking for. You can have actions for every component, like create-change-set for the SAM component, execute-change-set, etc., and have control over the order of them all.
But:
Since all the code is given as input, I assume all actions in the pipeline will be triggered even for a small change that does not affect 99% of the resources. Under the hood, SAM or CloudFormation will determine for themselves what did or did not change, but it is not very efficient. See my post here.
I cannot see in the pipeline overview which pipeline ran last and what its status was...
I cannot temporarily disable a pipeline or trigger it with custom input
In the end I think I will make a main pipeline with custom Lambda code that determines what actually changed using the CodeCommit API, and split all the actions into sub-pipelines. From the main pipeline I will push the input they need to S3 and execute them.
(I'm not allowed to comment, so I'll try to provide an answer instead - probably not the one you were hoping for :) )
There is definitely a need and at Codeship we're looking into how best to support FaaS/Serverless workflows. It's been a bit of a moving target over the last years, but more common practices etc. are starting to emerge/mature to a point where it makes more sense to start codifying them.
For now, it seems most people working in this space have resorted to scripting (either the Serverless framework, or directly against the FaaS providers), but everyone is struggling with the issue of deploying just what changed vs. deploying everything, as you point out. Adding further complexity with sequencing obviously just makes things harder.
Most services (Codeship included) will allow you some form of sequenced/stepped approach to deploying, but you'll have to do all the heavy lifting of working out what has changed etc.
As to your question of "How come?", I think it's purely down to how fast the tooling has been changing lately, combined with how few are really doing it. There's a huge push for larger companies to move to K8s, and I think they've basically just drowned out the FaaS adopters. Not that it should be like that, or that we at Codeship don't want to change that; it's just how I personally see things.
Folks that are building Docker images on push - how do you manage the different versions over time?
Do they all get their own tag? Do you tag based on the git hash?
Do you delete old images after some time? Don't they take up a lot of space (1GB+) each?
how do you manage the different versions over time?
The first thing to note is that tags can't be trusted over time. They are not guaranteed to refer to the same thing or to continue to exist, so use Dockerfile LABELs, which remain consistent and are always stored with the image. label-schema.org is a good starting point.
Do they all get their own tag? Do you tag based on the git hash?
If you need something unique to refer to every build with, just use the image sha256 sum. If you want to attach extra build metadata to an image, use a LABEL as previously mentioned and include the git hash and whatever versioning system you want. If using the sha256 sum sounds hard, tags are still needed to refer to multiple image releases so you will need some system.
Git tags, datetimes, build numbers all work. Each have their pros and cons depending on your environment and what you are trying to tie together as a "release". It's worthwhile to note that a Docker image might come from a Dockerfile with a git hash, but building from that git hash will not produce a consistent image over time, if you source an image FROM elsewhere.
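As a concrete illustration (a GitHub Actions step is used only to match the earlier snippets; the image name and version are placeholders), build metadata can be baked in with --label and the image tagged with both the git hash and a human-readable version:

- name: Build and label image
  run: |
    docker build \
      --label "org.label-schema.vcs-ref=${GITHUB_SHA}" \
      --label "org.label-schema.build-date=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      --label "org.label-schema.version=1.2.3" \
      -t myorg/myapp:${GITHUB_SHA} \
      -t myorg/myapp:1.2.3 \
      .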
Do you delete old images after some time?
Retention entirely depends on your software, system, or company requirements and policy. I've seen environments where audit requirements were high, which increases build/release retention time, down to the "I want to re-run these tests on this build" level. Other environments have minimal audit requirements, which tends to reduce retention requirements. Some places don't even try to impose any release management at all (this is bad). This can't really be answered by someone out here for your specific environment, but there are minimums that would be a good idea to stick to.
The base requirement is having an artefact stored for each production release. This is generally "forever" for historical purposes. Actively looking back more than a release or two is pretty rare (again, this can depend on your app), so archival is a good idea and easy to do with a second registry on cheap storage/hosting that is pushed a copy of everything (i.e. not on your precious SSDs).
I've never seen a requirement to keep all development builds over time. Retention generally follows your development/release cycle. It's rare you need access to dev builds out of your current release + next release. Just remember to LABEL + tag dev builds appropriately so clean up is simple. -dev -snapshot -alpha.0 whatever.
Don't they take up a lot of space (1GB+) each?
It's normally less than you think but yes they can be large as on top of your application you have an OS image. That's why lots of people start with alpine as it's tiny compared to most distros, as long as you don't have anything incompatible with musl libc.