docker hub automated build does not use cache, does this mean layers are regenerated every time and that docker clients have to redownload them?

I have noticed that Docker Hub automated builds do not use the cache.
I guess that's because the builds run on different machines and the cache cannot be shared between them.
So that would mean that every time I push to my git repo, Docker Hub regenerates all layers, even if they are identical to last time? And that would require Docker clients to redownload the layers that were regenerated?
So in practice, should I prefer building locally (or on a CI machine like Jenkins) and pushing to Docker Hub, rather than using Docker Hub automated builds?
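For illustration, the local alternative the question refers to is a plain build-and-push, where only layers missing from the registry are uploaded (myuser/myapp is a placeholder):

docker build -t myuser/myapp:latest .    # reuses the local layer cache
docker login
docker push myuser/myapp:latest          # only layers not already in the registry are uploaded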

Related

Use cache docker image for gitlab-ci

I was wondering: is it possible to use cached Docker images from the GitLab registry for gitlab-ci?
For example, I want to use the node:16.3.0-alpine Docker image. Can I cache it in my GitLab registry and pull it from there to speed up my GitLab CI, instead of pulling it from Docker Hub?
Yes, GitLab's Dependency Proxy feature allows you to configure GitLab as a "pull-through cache". This is also useful for working around rate limits of upstream sources like Docker Hub.
It should be faster in most cases to use the Dependency Proxy, but not necessarily so. It's possible that Docker Hub is more performant than a small self-hosted server, for example. GitLab runners are also remote with respect to the registry and not necessarily any "closer" to the GitLab registry than to any other registry over the internet, so keep that in mind.
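For illustration, a job that pulls its image through the Dependency Proxy typically just prefixes the image name with GitLab's predefined CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX variable; a minimal .gitlab-ci.yml sketch (the job name is illustrative):

# .gitlab-ci.yml (illustrative job)
test:
  image: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/node:16.3.0-alpine
  script:
    - node --version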
As a side note, the absolute fastest way to use cached images is to self-host your GitLab runners and keep the images directly on the host. That way, when a job starts and the image already exists on the host, the job starts immediately because it does not need to pull the image (depending on your pull policy). (That is, assuming you're using the image in the image: declaration for your job.)
I'm using a corporate GitLab instance where, for some reason, the Dependency Proxy feature has been disabled. The other option you have is to build a new Docker image on your local machine and push it to the Container Registry of your personal GitLab project.
# Create a one-line Dockerfile for the image and build it
echo "FROM node:16.3.0-alpine" > Dockerfile
docker pull node:16.3.0-alpine
docker build . -t registry.example.com/group/project/image
docker login registry.example.com -u <username> -p <token>
docker push registry.example.com/group/project/image
where the image tag should be constructed based on the example given on your project's private Container Registry page.
Now, in your CI job, you just change image: node:16.3.0-alpine to image: registry.example.com/group/project/image. You may have to run the docker login command (using a deploy token for credentials; see Settings -> Repository) in the before_script section. I think newer versions of GitLab have the runner authenticate to the private Container Registry using system credentials, but that could vary depending on how it's configured.
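Putting it together, the job definition would look roughly like this sketch (the job name is illustrative; authentication for the image: pull is normally handled by the runner or a deploy token rather than by the job itself):

# .gitlab-ci.yml (illustrative job)
test:
  image: registry.example.com/group/project/image   # instead of node:16.3.0-alpine
  script:
    - npm --version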

What is "PARALLEL BUILD" in docker hub private registry?

While trying to sign up with Docker Hub and selecting a suitable plan, I see that pricing is based on the number of Private Repositories required and Parallel Builds desired.
What is a PARALLEL BUILD in this context?
PS:
After a bit of internet searching, I found that Docker Hub can pull my source code from external repositories, build an image by itself, and later publish it to the Hub. If this is true and I don't want to use the Docker Hub build service, can I ignore the PARALLEL BUILD part entirely?
Docker Hub is a service provided by Docker for finding and sharing container images with your team. It provides the following major features:
Repositories: Push and pull container images.
Teams & Organizations: Manage access to private repositories of container images.
Official Images: Pull and use high-quality container images provided by Docker.
Publisher Images: Pull and use high-quality container images provided by external vendors. Certified images also include support and guarantee compatibility with Docker Enterprise.
Builds: Automatically build container images from GitHub and Bitbucket and push them to Docker Hub.
Webhooks: Trigger actions after a successful push to a repository to integrate Docker Hub with other services.
More info here.
If you look at Docker Hub's pricing page, there are two things you should know:
PARALLEL BUILD specifies the number of images that you can build in parallel (concurrently). The parallelism is across all of the repos owned by you.
Private Repositories specifies the number of repositories that are private and not exposed publicly.
If you're new to Docker and trying it out for the first time, it's fine to go with the Docker Hub free plan, which gives you at most 1 private repository and 1 parallel build.
If you want to store the Docker images of your project privately and the project is hosted on a public cloud like AWS, then I suggest using the registry provided by that cloud provider, such as AWS ECR, Azure ACR, Google Container Registry, and so on.
Or else you can host Docker images privately by running a Docker registry inside a container. Check this.
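For reference, self-hosting that registry boils down to running the official registry image and pushing re-tagged images to it; a minimal sketch (the image name is a placeholder):

docker run -d -p 5000:5000 --name registry registry:2
docker tag myimage:latest localhost:5000/myimage:latest
docker push localhost:5000/myimage:latest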
Hope this helps.

Cache for docker build in gitlab-ci

I want to build Docker images in a CI task, using the configuration from
https://docs.gitlab.com/ee/ci/docker/using_docker_build.html.
Runs of the CI task don't share the Docker build cache, so each CI run takes a very long time.
How should I configure the CI workers and volumes so that the Docker build cache is shared between CI tasks from different commits?
GitLab offers a cache-sharing mechanism, which you can use to share the Docker build cache (usually /var/lib/docker) between unrelated pipeline runs.
While this sounds straightforward and easy, you may need to configure your runners depending on how exactly they are set up.
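As one example, on self-managed runners using the Docker executor you can persist the Docker data directory through a volume in the runner's config.toml, so layers built by one job are still there for the next job on the same host. A sketch under that assumption (the image tag and host path are illustrative, and sharing /var/lib/docker is only safe when jobs don't run concurrently against the same daemon):

# inside the relevant [[runners]] section of config.toml
[runners.docker]
  image = "docker:24.0"
  privileged = true                                            # needed for docker-in-docker builds
  # keep the build cache on the host between jobs
  volumes = ["/cache", "/srv/runner-docker-cache:/var/lib/docker"]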

How to get transferable docker compose stack without dockerhub

I have a few Docker images composed together in a stack using docker-compose.yml.
Now I want to transfer the whole Docker Compose stack to another host machine without uploading it to Docker Hub,
and deploy it on Docker Swarm.
I saw there is a thing called docker compose bundle; would that help?
If you’re deploying on a multi-host swarm (or something similar like Kubernetes or Nomad) you all but need a Docker registry. It doesn’t specifically have to be Docker Hub — quay.io, Amazon’s ECR, Google’s GCR, and self-hosted registries all work fine — but you do need to have pushed the built images somewhere where the orchestrator can retrieve them by name.
I’ve never used docker-compose bundle myself, but its documentation also notes that its operation “requires interaction with a Docker registry”.
The only real alternative is using docker save and docker load to manually move images between machines, but as a manual process it will get tedious very quickly, and you need to make sure an identical set of images is on every machine for consistency. Using a registry will be vastly easier.
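If you do go the manual route, the save/load flow looks roughly like this (the image name and host are placeholders):

docker save myimage:1.0 | gzip > myimage-1.0.tar.gz
scp myimage-1.0.tar.gz user@other-host:/tmp/
ssh user@other-host 'gunzip -c /tmp/myimage-1.0.tar.gz | docker load'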
The easiest way to do it is to use a Docker registry. The problem with Docker Hub is that you can only have one private repository; the rest must be public or paid.
Thankfully, there are other (free) alternatives:
Deploy your own private registry. Here is a nice tutorial where you can try it in the browser.
Use a free private registry. I personally use Codefresh. It can automatically build your image from a private repo (like Bitbucket, which has a free plan too), but you can also just use it as a "simple" Docker registry and push and pull your Docker images there.

Is git pull, docker-compose build and docker-compose up -d a good way to deploy a complete solution on an empty machine

Recently, we finished a web application solution using Docker.
https://github.com/yccheok/celery-hello-world/tree/nginx (The actual solution is hosted in a private repository. This example is just a quick glance at how our project structure looks.)
We plan to purchase 1 empty Linux machine and deploy on it. We might purchase more machines in the future, but with the current traffic, 1 machine will be sufficient.
My plan for deployment on the single empty machine is:
git pull <from private code repository>
docker-compose build
docker-compose up -d
Since we are going to deploy to multiple machines in the near future, I was wondering: is it common practice to deploy a Docker application onto a fresh, empty machine like this?
Is there anything we can utilize from https://hub.docker.com/ so that we don't have to perform git pull during the deployment stage?
You don't want to perform git pull on each machine; your intuition is correct.
Instead, you want to use a remote Docker registry (such as Docker Hub).
So the right flow, each time your source code (git repo) changes, is:
git pull from all relevant repos.
docker-compose build to build all relevant images.
docker-compose push to push all images (the diff) to the remote registry.
docker-compose pull on your production machines, to get the latest updated images.
docker-compose up to start all containers.
The first 3 steps should be done on your CI machine (for example, as a Jenkins job), and steps 4-5 on your production machines.
EDIT: one thing to consider. I think building via docker-compose is bad. Consider building directly with docker build -f Dockerfile -t repo/image:tag . and, in docker-compose, just specifying the image name.
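In that setup the compose file only references images by name and the build happens outside compose; roughly (names and tags are placeholders):

docker build -f Dockerfile -t repo/image:1.2.3 .
docker push repo/image:1.2.3

and in docker-compose.yml:

services:
  web:
    image: repo/image:1.2.3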
My opinion is that you should not BUILD images on production machines, because the image might be different from what you expect, and you should limit what you do on production machines. With that being said, I would recommend:
updating the code on your local computer (development)
when you push code to git, using some software to build your images from that push, for example GitLab CI (a continuous integration tool)
gitlab-ci will build the image, then it can run some tests on that image, and then deploy it (this built image) to production
on your production machine, just running docker-compose pull && docker-compose up -d, and that is it.
I strongly recommend building images on a machine other than the production machines, and using a CI tool to test your images before deploying. For example: https://docs.gitlab.com/ce/ci/README.html
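A minimal GitLab CI build-and-push job for that flow might look like the following sketch, assuming a runner configured for docker-in-docker (the stage and job names are illustrative; the CI_REGISTRY* and CI_COMMIT_SHORT_SHA variables are provided by GitLab when the Container Registry is enabled):

build-image:
  stage: build
  image: docker:24.0
  services:
    - docker:24.0-dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"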
Deploying it on a fresh machine (or the other way around) would be fine.
The best way to go about it is to make a private repo on https://hub.docker.com/ and push your images there.
Building and shipping the image:
git pull
docker build -t repo/image .
docker login
docker push repo/image
Pulling the shipped image and deploying:
docker login (on the server)
docker pull repo/image
docker-compose up -d
Though I would recommend you look at container scheduling using Kubernetes and setting up your CI/CD stack with Jenkins to automate this process; in case something bad happens, it can be a life saver.

Resources