What is the purpose of pushing an image in a CI/CD pipeline? - docker

Context: Reading through this blog post.
Pushing images to a registry seems to be the "right thing to do" ... but I don't understand why.
What purpose does this serve? Is it because the server I ssh into needs to have a local copy of the image? And to do that, one approach is to pull an image from a registry?

What purpose does this serve? Is it because the server I ssh into needs to have a local copy of the image? And to do that, one approach is to pull an image from a registry?
From the CI/CD perspective, a docker registry is the equivalent of an artifact repository for images. You want a central source of these images to download from as you go from one docker host to another since your build server is most likely different than your dev and prod servers.
Couldn't I just upload an image from one machine (say a CI/CD server) via ssh? using dockerhub seems needlessly ceremonious to me. Like in this example (I know this api is deprecated but it illustrates my point).
It is possible to save/load images directly to a docker host, but there a few major downsides. First, you lose any benefit from docker's layered filesystem. When building an app in CI/CD, most of the time only the last few layers should need to be rebuilt with your application changes. There should be the same previous base image and various common layers to build your app that remain identical. With a registry, these common layers are seen, only the difference is pushed and pulled, making your deploys faster and saving you disk space. With a save/load command, all layers are sent every time since you do not know the state of the remote server when you run the save.
Second, this doesn't scale as you add hosts to run images. Every host would need the image copied on the chance you want to run it on that host, e.g. to handle failover or load balancing. It also won't work if you move to swarm mode or kubernetes since you could easily add new nodes to the cluster that won't have your image. Swarm mode defaults to looking up the sha256 of the image on the registry to guarantee the same image is always used even if the tag is modified on the registry after the initial deploy.
Keep in mind you can run your own registry server (there's a docker image and the api is open). Many artifact repositories (e.g. artifactory and nexus) include support for a docker registry. And many cloud providers include a registry with their container offerings. So you do not need to push to a remote docker hub to deploy locally.
One last point is that a registry server is useful to developers who can now pull the same image used in dev and prod to test against other microservices they are writing locally without the need to build everything locally or ssh to a CI/CD server or even prod to save and scp images back to their laptops.

Usually, you use a CI, CD pipeline when you want to streamline your build / test/ deploy process, and usually this happens if you have a production infrastructure to maintain that is actually critical to your business.
There is no need for a CI/CD pipeline if you're just playing around / prototyping IMO, in which case you can build you docker images on the machine directly, or ssh an image over. That's perfectly reasonable.
Look at the 'registry' as a repository of your binary image (i.e. a fixed version of your code that ideally is versioned and you know works)
Then deploying is as simple as telling your servers to pull the image and run it, from anywhere.
On a flexible architecture, you might have nodes coming up or going down at any time, and they need to be able to pull the latest code from somewhere to get back up and running automatically, at any time, without intervention.

Registry is single source of truth in this case. It means, that you can have multiple nodes (servers), cluster(s) and have the single place from where you can get your images. Also if of your nodes drop-down - you can fast start your image in the new one. Also you can automate your image's updating using registry's webhook, for example when you add new version of image registry gonna send webhook to any service that can upgrade your containers to the newest version.

Consider docker image as a new way of distribution of your software to your servers and docker-registry as a centralized storage of shared images(the like npm.org for js, maven.org for java).
For example,
if you develop java application, years before docker you may use .jar files to do it. The way docker image is better is that also include all OS level dependencies like JDK/JRE and system configurations. So this helps you to avoid "it works on my machine" effect.
To distribute docker image you can also use just docker file and build it all the time on every machine. Docker-Repository allows you to have centralized storage of pre-build images.
Pushing to docker-repository in your CI/CD allows to build your distributive once and further work with the same distributive both on integration and prod environments.
Using just Dockerfile will not guarantee you the same state on every build in every moment of time because you may install external dependencies in your Dockerfile script which may be updated or even removed between two sequential builds.

Related

Docker dealing with images instead of Dockerfiles

Can someone explain to me why the normal Docker process is to build an image from a Dockerfile and then upload it to a repository, instead of just moving the Dockerfile to and from the repository?
Let's say we have a development laptop and a test server with Docker.
If we build the image, that means uploading and downloading all of the packages inside the Dockerfile. Sometimes this can be very large (e.g. PyTorch > 500MB).
Instead of transporting the large imagefile to and from the server, doesn't it make sense to, perhaps compile the image locally to verify it works, but mostly transport the small Dockerfile and build the image on the server?
This started out as a comment, but it got too long. It is likely to not be a comprehensive answer, but may contain useful information regardless.
Often the Dockerfile will form part of a larger build process, with output files from previous stages being copied into the final image. If you want to host the Dockerfile instead of the final image, you’d also have to host either the (usually temporary) processed files or the entire source repo & build script.
The latter is often done for open source projects, but for convenience pre-built Docker images are also frequently available.
One tidy solution to this problem is to write the entire build process in the Dockerfile using multi-stage builds (introduced in Docker CE 17.05 & EE 17.06). But even with the complete build process described in a platform-independent manner in a single Dockerfile, the complete source repository must still be provided.
TL,DR: Think of a Docker image as a regular binary. It’s convenient to download and install without messing around with source files. You could download the source for a C application and build it using the provided Makefile, but why would you if a binary was made available for your system?
Instead of transporting the large imagefile to and from the server,
doesn't it make sense to, perhaps compile the image locally to verify
it works, but mostly transport the small Dockerfile and build the
image on the server?
Absolutely! You can, for example, set up an automated build on Docker Hub which will do just that every time you check in an updated version of your Dockerfile to your GitHub repo.
Or you can set up your own build server / CI pipeline accordingly.
IMHO, one of the reason for building the images concept and putting into repository is sharing with people too. For example we call Python's out of the box image for performing all python related stuff for a python program to run in Dockerfile. Similarly we could create a custom code(let's take example I did for apache installation with some custom steps(like ports changes and additionally doing some steps) I created its image and then finally put it to my company's repository.
I came to know after few days that may other teams are using it too and now when they are sharing it they need NOT to make any changes simply use my image and they should be done.

Web development workflow using Github and Docker

I learnt the basics of github and docker and both work well in my environment. On my server, I have project directories, each with a docker-compose.yml to run the necessary containers. These project directories also have the actual source files for that particular app which are mapped to virtual locations inside the containers upon startup.
My question is now- how to create a pro workflow to encapsulate all of this? Should the whole directory (including the docker-compose files) live on github? Thus each time changes are made I push the code to my remote, SSH to the server, pull the latest files and rebuild the container. This rebuilding of course means pulling the required images from dockerhub each time.
Should the whole directory (including the docker-compose files) live on github?
It is best practice to keep all source code including dockerfiles, configuration ... versioned. Thus you should put all the source code, dockerfile, and dockercompose in a git reporitory. This is very common for projects on github that have a docker image.
Thus each time changes are made I push the code to my remote, SSH to the server, pull the latest files and rebuild the container
Ideally this process should be encapsulated in a CI workflow using a tool like Jenkins. You basically push the code to the git repository,
which triggers a jenkins job that compiles the code, builds the image and pushes the image to a docker registry.
This rebuilding of course means pulling the required images from dockerhub each time.
Docker is smart enough to cache the base images that have been previously pulled. Thus it will only pull the base images once on the first build.

DevOps vs Docker

I am wondering how exactly does docker fit into CI /CD .
I understand that with help of containers, you may focus on code , rather than dependencies/environment. But once you check-in your code, you will expect tools like TeamCity, Jenkins or Bamboo to take care of integration build , integration test/unit tests and deployment to target servers ( after approvals) where you will expect same Docker container image to run the built code.
However, in all above, Docker is nowhere in the CI/CD cycle , though it comes into play when execution happens at server. So, why do I see articles listing it as one of the things for DevOps.
I could be wrong , as I am not a DevOps guru, please enlighten !
Docker is just another tool available to DevOps Engineers, DevOps practitioners, or whatever you want to call them. What Docker does is it encapsulates code and code dependencies in a single unit (a container) that can be run anywhere where the Docker engine is installed. Why is this useful? For multiple reasons; but in terms of CI/CD it can help Engineers separate Configuration from Code, decrease the amount of time spent doing dependency management etc., can use it to scale (with the help of some other tools of course). The list goes on.
For example: If I had a single code repository, in my build script I could pull in environment specific dependencies to create a Container that functionally behaves the same in each environment, as I'm building from the same source repository, but it can contain a set of environment specific certificates and configuration files etc.
Another example: If you have multiple build servers, you can create a bunch of utility Docker containers that can be used in your CI/CD Pipeline to do a certain operation by pulling down a Container to do something during a stage. The only dependency on your build server now becomes Docker Engine. And you can change, add, modify, these utility containers independent of any other operation performed by another utility container.
Having said all of that, there really is a great deal you can do to utilize Docker in your CI/CD Pipelines. I think an understanding of what Docker is, and what Docker can do is more important that a "how to use Docker in your CI/CD" guide. While there are some common patterns out there, it all comes down to the problem(s) you are trying to solve, and certain patterns may not apply to a certain use case.
Docker facilitates the notion of "configuration as code". I can write a Dockerfile that specifies a particular base image that has all the frameworks I need, along with the custom configuration files that are checked into my repository. I can then build that image using the Dockerfile, push it to my docker registry, then tell my target host to pull the latest image, and then run the image. I can do all of this automatically, using target hosts that have nothing but Linux installed on them.
This is a simple scenario that illustrates how Docker can contribute to CI/CD.
Docker is also usefull for building your applications. If you have multiple applications with different dependencies you can avoid having a lot of dependencies and conflicts on your CI machine by building everything in docker containers that have the necessary dependencies. If you need to scale in the future all you need is another machine running your CI tool (like jenkins slave), and an installation of docker.
When using microservices this is very important. One applicatio can depend on an old version of a framework while another needs the new version. With containers thats not problem.
Docker is a DevOps Enabler, Not DevOps Itself: Using Docker, developers can support new development, enhancement, and production support tasks easily. Docker containers define the exact versions of software in use, this means we can decouple a developer’s environment from the application that needs to be serviced or enhanced.
Without Pervasive Automation, Docker Won’t Do Much for You : You can’t achieve DevOps with bad code. You must first ensure that the code being delivered is of the highest quality by automating all developer code delivery tasks, such as Unit testing, Integration testing, Automated acceptance testing (AAT), Static code analysis, code review sign offs & pull request workflow, and security analysis.
Leapfrogging to Docker without Virtualization Know-How Won’t Work : Leapfrogging as an IT strategy rarely works. More often than not new technologies bring about abstractions over existing technologies. It is true that such abstractions increase productivity, but they are not an excuse to skip the part where we must understand how a piece of technology works.
Docker is a First-Class Citizen on All Computing Platforms : This is the right time to jump on to the Docker bandwagon. For the first time ever Docker is supported on all major computing platforms in the world. There are two kinds of servers: Linux servers and Windows servers. Native Docker support for Linux existed from Day 1, since then Linux support has been optimized to the point of having access to the pint-sized.
Agile is a Must to Achieve DevOps : DevOps is a must to achieve Agile. The point of Agile is adding and demonstrating value iteratively to all stakeholders without DevOps you likely won’t be able to demonstrate the value you’re adding to stakeholders in a timely manner. So why is Agile also a must to achieve DevOps? It takes a lot of discipline to create a stream of continuous improvement and an Agile framework like Scrum defines fundamental qualities that a team must possess to begin delivering iteratively.
Docker saves the wastage for your organization capital and resources by containerizing our application. Containers on a singe host are isolated from each other and thy uses same OS resources. This frees up RAM, CPU and storage etc. Docker makes it easy to package our application along with all the required dependencies in an image. For most of the application we have readily available base images. One can create customized base image as well. We build our own custom image by writing simple Dockerfile. We can have this image shipped to central registry from where we can PULL it to deploy into various environments like QA, STAGE and PROD. This All these activities can be automated by CI tools like Jenkins.
In a CI/CD pipeline you can expect the Docker coming into picture when the build is ready. Initially CI server (Jenkins) will checkout the code from SCM in a temporary workspace where the application is built. Once you have the build artifact ready, you can package it as an image with the dependencies. Jenkins does this by executing simple docker build commands.
Docker removes what we all know the matrix from hell problem, making the environments independent with its container technology. An open source project Docker changed the game by simplifying container workflows and this has resulted in a lot of excitement around using containers in all stages of the software delivery lifecycle, from development to production.
It is not just about containers, it involves building Docker images, managing your images and dependencies on any Docker registry, deploying to an orchestration platform, etc. and it all comes under CI/CD process.
DevOps is a culture or methodology or procedure to deliver our development is very fast. Docker is a one of the tool in our devops culture to deploy application as container technology (use less resources to deploy our application).
Docker just package devloper environment to run on other system so that developer need not to worry about whether there code work in there system and not work in production due to differences in environment and operating system.
It just make the code portable to other environments.

How do you put your source code into Kubernetes?

I am new to Kubernetes and so I'm wondering what are the best practices when it comes to putting your app's source code into container run in Kubernetes or similar environment?
My app is a PHP so I have PHP(fpm) and Nginx containers(running from Google Container Engine)
At first, I had git volume, but there was no way of changing app versions like this so I switched to emptyDir and having my source code in a zip archive in one of the images that would unzip it into this volume upon start and now I have the source code separate in both images via git with separate git directory so I have /app and /app-git.
This is good because I do not need to share or configure volumes(less resources and configuration), the app's layer is reused in both images so no impact on space and since it is git the "base" is built in so I can simply adjust my dockerfile command at the end and switch to different branch or tag easily.
I wanted to download an archive with the source code directly from repository by providing credentials as arguments during build process but that did not work because my repo, bitbucket, creates archives with last commit id appended to the directory so there was no way o knowing what unpacking the archive would result in, so I got stuck with git itself.
What are your ways of handling the source code?
Ideally, you would use continuous delivery patterns, which means use Travis CI, Bitbucket pipelines or Jenkins to build the image on code change.
that is, every time your code changes, your automated build will get triggered and build a new Docker image, which will contain your source code. Then you can trigger a Deployment rolling update to update the Pods with the new image.
If you have dynamic content, you likely put this a persistent storage, which will be re-mounted on Pod update.
What we've done traditionally with PHP is an overlay on runtime. Basically the container will have a volume mounted to it with deploy keys to your git repo. This will allow you to perform git pull operations.
The more buttoned up approach is to have custom, tagged images of your code extended from fpm or whatever image you're using. That way you would run version 1.3 of YourImage where YourImage would contain code version 1.3 of your application.
Try to leverage continuous integration and continuous deployment. You can use Jenkins as CI/CD server, and create some jobs for building image, pushing image and deploying image.
I recommend putting your source code into docker image, instead of git repo. You can also extract configuration files from docker image. In kubernetes v1.2, it provides new feature 'ConfigMap', so we can put configuration files in ConfigMap. When running a pod, configuration files will be mounted automatically. It's very convenience.

Can I put my docker repository/image on GitHub/Bitbucket?

I know that Docker hub is there but it allows only for 1 private repository. Can I put these images on Github/Bitbucket?
In general you don't want to use version control on large binary images (like videos or compiled files) as git and such was intended for 'source control', emphasis on the source. Technically, here's nothing preventing you from doing so and putting the docker image files into git (outside the limits of the service you're using).
One major issue you'll have is git/bitubucket have no integration with Docker as neither provide the Docker Registry api needed for a docker host to be able to pull down the images as needed. This means you'll need to manually pull down out of the version control system holding the image files if you want to use it.
If you're going to do that, why not just use S3 or something like that?
If you really want 'version control' on your images (which docker hub does not do...) you'd need to look at something like: https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-problem-of-versioning-large-binaries-with-git/
Finally, docker hub only allows one FREE private repo. You can pay for more.
So the way to go is:
Create a repository on Github or Bitbucket
Commit and push your Dockerfile (with config files if necessary)
Create an automated build on Docker Hub which uses the Github / Bitbucket repo as source.
In case you need it all private you can self-host a git service like Gitlab or GOGS and of course you can also selfhost a docker registry service for the images.
Yes, since Sept. 2020.
See "Introducing GitHub Container Registry" from Kayla Ngan:
Since releasing GitHub Packages last year (May 2019), hundreds of millions of packages have been downloaded from GitHub, with Docker as the second most popular ecosystem in Packages behind npm.
Available today as a public beta, GitHub Container Registry improves how we handle containers within GitHub Packages.
With the new capabilities introduced today, you can better enforce access policies, encourage usage of a standard base image, and promote innersourcing through easier sharing across the organization.
Our users have asked for anonymous access for public container images, similar to how we enable anonymous access to public repositories of source code today.
Anonymous access is available with GitHub Container Registry today, and we’ve gotten things started today by publishing a public image of our own super-linter.
GitHub Container Registry is free for public images.
With GitHub Actions, publishing to GitHub Container Registry is easy. Actions automatically suggests workflows based for you based on your work, and we’ve updated the “Publish Docker Container” workflow template to make publishing straightforward.
GitHub is in the process of releasing something similar to ECR or Docker Hub. At the time of writing this, it's in Alpha phase and you can request access.
From GitHub:
"GitHub Package Registry is a software package hosting service, similar to npmjs.org, rubygems.org, or hub.docker.com, that allows you to host your packages and code in one place. You can host software packages privately or publicly and use them as dependencies in your projects."
https://help.github.com/en/articles/about-github-package-registry
I guess you are saying about docker images. You can setup your own private registry which will contain the docker images. If you are not pushing only dockerfiles, but are interested in pushing the whole image, then pushing the images as a whole to github is a very bad idea. Consider a case you have 600 MB of docker image, pushing it to github is like putting 600 MB of data to a github repo, and if you keep on pushing more images there, it will get terribly bad.
Also, docker registry does the intelligent mapping of storing only a single copy of a layer (this layer can be referenced by multiple images). If you use github, you are not going to use this use-case. You will end up storing multiple copies of large files which is really really bad.
I would definitely suggest you to go with a private docker registry rather than going with github.
If there is a real need of putting docker image to github/bitbucket you can try to save it into archive (by using https://docs.docker.com/engine/reference/commandline/save/) and commit/push it to your repository.

Resources