Is "differentially" updating a local Docker image with the latest changes possible? - docker

This is my first time using Docker, so excuse my potentially improper terminology. Our team uses a Docker image that contains the entire development environment, minus the git repo that needs to be cloned manually within a persistent container based on this image. The Docker image hosted on our repository is frequently updated, e.g. to include new 3rd party dependencies and other files, making my local environment - which includes a directory with the git repo where I do my work - out-of-date.
Is there a way to update my local environment with the latest image changes without having to pull the entire image with docker pull image:latest and start from scratch? Assuming there are no conflicts, I would like to preserve my local changes (git repo clone, local filesystem modifications etc.); so I'm looking for something like git pull for Docker, if that makes sense.
I must have missed something, but a search on this issue didn't yield any viable solutions.
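For what it's worth, docker pull already transfers only the image layers that have changed; what is not carried over is the container's writable layer. A common pattern, sketched below with a placeholder host directory and the image:latest name from the question, is to keep the clone on a bind mount so it survives recreating the container from the updated image:

```
# Sketch only: "image" and the host path are placeholders.
# Pull downloads just the layers that differ from what is already local:
docker pull image:latest

# Keep the git clone (and other local work) on a bind mount so it is
# untouched when the container is recreated from the updated image:
docker run -it --name devenv -v "$HOME/work:/work" image:latest
```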

Related

What is the purpose of pushing an image in a CI/CD pipeline?

Context: Reading through this blog post.
Pushing images to a registry seems to be the "right thing to do" ... but I don't understand why.
What purpose does this serve? Is it because the server I ssh into needs to have a local copy of the image? And to do that, one approach is to pull an image from a registry?
From the CI/CD perspective, a docker registry is the equivalent of an artifact repository for images. You want a central source of these images to download from as you go from one docker host to another, since your build server is most likely different from your dev and prod servers.
Couldn't I just upload an image from one machine (say a CI/CD server) via ssh? Using Docker Hub seems needlessly ceremonious to me. Like in this example (I know this API is deprecated, but it illustrates my point).
It is possible to save/load images directly to a docker host, but there are a few major downsides. First, you lose any benefit from docker's layered filesystem. When building an app in CI/CD, most of the time only the last few layers should need to be rebuilt with your application changes. There should be the same previous base image and various common layers to build your app that remain identical. With a registry, these common layers are recognized and only the difference is pushed and pulled, making your deploys faster and saving you disk space. With a save/load command, all layers are sent every time, since you do not know the state of the remote server when you run the save.
Second, this doesn't scale as you add hosts to run images. Every host would need the image copied on the chance you want to run it on that host, e.g. to handle failover or load balancing. It also won't work if you move to swarm mode or kubernetes since you could easily add new nodes to the cluster that won't have your image. Swarm mode defaults to looking up the sha256 of the image on the registry to guarantee the same image is always used even if the tag is modified on the registry after the initial deploy.
Keep in mind you can run your own registry server (there's a docker image for it and the API is open). Many artifact repositories (e.g. Artifactory and Nexus) include support for a docker registry. And many cloud providers include a registry with their container offerings. So you do not need to push to a remote Docker Hub to deploy locally.
One last point is that a registry server is useful to developers, who can now pull the same image used in dev and prod to test against other microservices they are writing locally, without needing to build everything locally or to ssh to a CI/CD server or even prod to save and scp images back to their laptops.
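To make the save/load versus registry contrast above concrete, here is a rough sketch; myapp, prod-host and the self-hosted registry address are placeholders, and registry:2 is the official registry image mentioned above:

```
# Without a registry: the full image (every layer) is shipped each time.
docker save myapp:latest | ssh prod-host 'docker load'

# With a self-hosted registry: only layers the registry does not already
# have are pushed, and hosts pull only the layers they are missing.
docker run -d -p 5000:5000 --name registry registry:2
docker tag myapp:latest myregistry.example.com:5000/myapp:latest
docker push myregistry.example.com:5000/myapp:latest
docker pull myregistry.example.com:5000/myapp:latest   # on any target host
```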
Usually, you use a CI/CD pipeline when you want to streamline your build/test/deploy process, and usually this happens if you have a production infrastructure to maintain that is actually critical to your business.
There is no need for a CI/CD pipeline if you're just playing around / prototyping IMO, in which case you can build your docker images on the machine directly, or ssh an image over. That's perfectly reasonable.
Look at the 'registry' as a repository of your binary image (i.e. a fixed version of your code that ideally is versioned and you know works).
Then deploying is as simple as telling your servers to pull the image and run it, from anywhere.
On a flexible architecture, you might have nodes coming up or going down at any time, and they need to be able to pull the latest code from somewhere to get back up and running automatically, at any time, without intervention.
The registry is the single source of truth in this case. It means you can have multiple nodes (servers) or clusters and a single place from which to get your images. Also, if one of your nodes goes down, you can quickly start your image on a new one. You can also automate image updates using the registry's webhooks: for example, when you add a new version of an image, the registry sends a webhook to any service that can upgrade your containers to the newest version.
Consider a docker image as a new way of distributing your software to your servers, and a docker registry as centralized storage for shared images (like npm.org for JS or maven.org for Java).
For example,
if you develop a Java application, in the years before Docker you might have used .jar files to distribute it. A docker image is better in that it also includes all OS-level dependencies, such as the JDK/JRE and system configuration. This helps you avoid the "it works on my machine" effect.
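A minimal sketch of what that looks like in practice (the base image tag and jar name are placeholders, not taken from the answer above):

```
# The JRE and OS-level dependencies ship inside the image together
# with the application jar.
FROM eclipse-temurin:17-jre
COPY target/myapp.jar /opt/myapp.jar
CMD ["java", "-jar", "/opt/myapp.jar"]
```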
To distribute a docker image you could also just use the Dockerfile and build it every time on every machine. A docker registry allows you to have centralized storage of pre-built images.
Pushing to a docker registry in your CI/CD pipeline allows you to build your artifact once and then work with the same artifact in both integration and prod environments.
Using just a Dockerfile will not guarantee the same state on every build at every moment in time, because your Dockerfile may install external dependencies which can be updated or even removed between two sequential builds.
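For example, rebuilding a Dockerfile like the following sketch on different days can yield different images, because both the base tag and the package index move over time (the names are hypothetical):

```
FROM ubuntu:latest              # "latest" may point to a newer base tomorrow
RUN apt-get update && \
    apt-get install -y curl     # the installed version depends on build time
COPY app/ /opt/app/
```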

Web development workflow using Github and Docker

I learnt the basics of github and docker and both work well in my environment. On my server, I have project directories, each with a docker-compose.yml to run the necessary containers. These project directories also have the actual source files for that particular app which are mapped to virtual locations inside the containers upon startup.
My question is now- how to create a pro workflow to encapsulate all of this? Should the whole directory (including the docker-compose files) live on github? Thus each time changes are made I push the code to my remote, SSH to the server, pull the latest files and rebuild the container. This rebuilding of course means pulling the required images from dockerhub each time.
Should the whole directory (including the docker-compose files) live on github?
It is best practice to keep all source code, including Dockerfiles, configuration and so on, versioned. Thus you should put all the source code, the Dockerfile, and the docker-compose file in a git repository. This is very common for projects on github that have a docker image.
Thus each time changes are made I push the code to my remote, SSH to the server, pull the latest files and rebuild the container
Ideally this process should be encapsulated in a CI workflow using a tool like Jenkins. You basically push the code to the git repository, which triggers a Jenkins job that compiles the code, builds the image and pushes the image to a docker registry.
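As a rough sketch, such a Jenkins job might run something like the following (the registry address and image name are placeholders, and GIT_COMMIT stands for the commit id being built):

```
# In the CI job: build and push a uniquely tagged image.
docker build -t registry.example.com/myapp:${GIT_COMMIT} .
docker push registry.example.com/myapp:${GIT_COMMIT}

# On the server, deploying then reduces to pulling and restarting:
docker-compose pull && docker-compose up -d
```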
This rebuilding of course means pulling the required images from dockerhub each time.
Docker is smart enough to cache the base images that have been previously pulled. Thus it will only pull the base images once on the first build.

How do you put your source code into Kubernetes?

I am new to Kubernetes, so I'm wondering what the best practices are when it comes to putting your app's source code into a container run in Kubernetes or a similar environment?
My app is a PHP app, so I have PHP (fpm) and Nginx containers (running on Google Container Engine).
At first, I had a git volume, but there was no way of changing app versions like this, so I switched to emptyDir and keeping my source code in a zip archive in one of the images, which would unzip it into this volume upon start. Now I have the source code separately in both images via git, with a separate git directory, so I have /app and /app-git.
This is good because I do not need to share or configure volumes (fewer resources and less configuration), the app's layer is reused in both images so there is no impact on space, and since it is git the "base" is built in, so I can simply adjust my Dockerfile command at the end and switch to a different branch or tag easily.
I wanted to download an archive with the source code directly from the repository by providing credentials as arguments during the build process, but that did not work because my repo host, Bitbucket, creates archives with the last commit id appended to the directory name, so there was no way of knowing what unpacking the archive would result in, and I got stuck with git itself.
What are your ways of handling the source code?
Ideally, you would use continuous delivery patterns, which means using Travis CI, Bitbucket Pipelines or Jenkins to build the image on every code change.
That is, every time your code changes, your automated build gets triggered and builds a new Docker image, which will contain your source code. Then you can trigger a Deployment rolling update to update the Pods with the new image.
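A minimal sketch of that rolling update step, assuming a Deployment named myapp with a container named php and a freshly pushed image tag (all names hypothetical):

```
kubectl set image deployment/myapp php=registry.example.com/myapp:1.4
kubectl rollout status deployment/myapp   # wait for the Pods to be replaced
```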
If you have dynamic content, you likely put this in persistent storage, which will be re-mounted on Pod update.
What we've done traditionally with PHP is an overlay at runtime. Basically the container will have a volume mounted to it with deploy keys to your git repo. This will allow you to perform git pull operations.
The more buttoned-up approach is to have custom, tagged images of your code extended from fpm or whatever image you're using. That way you would run version 1.3 of YourImage, where YourImage would contain code version 1.3 of your application.
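A minimal sketch of such a tagged image, assuming the official php-fpm image as the base and hypothetical paths:

```
FROM php:8.2-fpm
COPY . /var/www/html
# Build it tagged with the application version, e.g.:
#   docker build -t yourimage:1.3 .
```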
Try to leverage continuous integration and continuous deployment. You can use Jenkins as a CI/CD server and create jobs for building, pushing and deploying the image.
I recommend putting your source code into the docker image instead of pulling it from the git repo at runtime. You can also extract configuration files from the docker image: Kubernetes v1.2 provides a new feature, ConfigMap, so you can put configuration files in a ConfigMap. When running a pod, the configuration files will be mounted automatically. It's very convenient.
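A small sketch of the ConfigMap part, with hypothetical names and file paths:

```
# Create a ConfigMap from a configuration file kept outside the image:
kubectl create configmap myapp-config --from-file=config/app.ini
# The ConfigMap can then be mounted as a volume (or exposed as environment
# variables) in the Pod spec, so configuration changes do not require
# rebuilding the image.
```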

docker run git checkout doesn't fetch changes in the branch?

Why does RUN git checkout -b mybranch switch to the branch, but the content remains what was fetched from the master branch?
The whole point of Docker is that it only rebuilds the parts of an image that have changed. It has no way of knowing that the content in the repo has changed; all it knows is that it already has a cached image "slice" (layer) for this step in the Dockerfile. So it uses the image it previously built.
As Mark notes, you can force a regeneration using --no-cache. Another option is to have a source code container that is always built using --no-cache, which you add volumes to, and then use that code via those volumes in a different container (look at 'volumes_from' for docker-compose). Then you always get the changes in the repo, as it is built every time from scratch. You may want to look into docker-compose for this sort of work.
When you run docker build, look carefully at the output. When it has a cached version of a step, it will say as much. When it has to build it, it will also note that.
Have you tried running your build with the "--no-cache" option?
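For example, the rebuild might look like this (the tag is a placeholder):

```
# Ignore all cached layers so the git clone/checkout steps run again:
docker build --no-cache -t myimage .
```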
I'd also recommend reading the following:
https://ryanfb.github.io/etc/2015/07/29/git_strategies_for_docker.html
Finally, do you really need to run git within your build? Personally, I use Jenkins, which runs my build within a workspace that is separately checked out from git.

How to install Dockerfile from GitLab to allow pull and commit

Is there a way to clone a Dockerfile from GitLab with the docker command?
I want to use the feature that allows pull and commit.
I am not sure if I have understood this well, but do pull and commit update the Dockerfile in the git repository? Or is it only local, in the next images?
If not, is there a way to get all the changes you made on top of the previous image (built by the Dockerfile) into another Dockerfile?
I know you can clone with Git directly, but, like with npm, can you also use a Git URL like git+https:// or git+ssh://?
The pull/commit commands affect the related image and operate directly against your configured registry, which is the official Docker Hub Registry unless configured otherwise. Perhaps some confusion may arise from the registry's support for Automated Builds, where the registry is directly bound to a repository and rebuilds the image every time the targeted repository branch changes.
If you wish to reuse someone's Docker image, the best approach is to simply reference it via the FROM instruction in your Dockerfile and effectively fork the image. While it's certainly possible to clone the original source repository and continue editing the Dockerfile contained therein, you usually do not want to go down that path.
So if there exists such a foo/bar image you want to continue building upon, the best and most direct approach is to create your own Dockerfile, inherit the image by setting it as the base for your succeeding instructions via FROM foo/bar, and possibly push your baz/bar image back into the registry if you want it to be publicly available for others to re-base upon.
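A minimal sketch of that approach, reusing the placeholder names foo/bar and baz/bar from above:

```
FROM foo/bar
# ...add your own instructions (packages, configuration, code) here...

# Then build your derived image and, if you want others to re-base on it,
# push it to a registry:
#   docker build -t baz/bar .
#   docker push baz/bar
```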

Resources