Our production/staging Docker image build pulls the latest code from our repository and then installs all required dependencies, which takes a while.
This is why, for development, we are using a volume to map the application code to a local folder.
Is there a way to commit the local changes in the mapped volume to the actual image, so that we don't have to rebuild it all the time?
Being able to bake your code into an image and label (tag) that image is exactly what people use Docker for. You shouldn't be deploying code at run time in production; instead, you should be building images and tagging them by version.
You want to know exactly what is running in production, and you also want the ability to roll back to a previous version.
Now, coming back to the rebuilding part: there are multiple ways to improve the build times.
Create base images
You can create base images and just put your code on top of that base image. The base image will have your necessary software (Node etc.), and in your application Dockerfile you will just copy the code.
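A minimal sketch of that idea, assuming a hypothetical base image myorg/node-base that already has Node and the other required software baked in:

```sh
# Build the base image once (its Dockerfile installs Node, system packages, etc.):
#   docker build -t myorg/node-base:16 -f Dockerfile.base .

# The application Dockerfile then only copies code on top of that base,
# so rebuilding is fast because no dependencies are reinstalled.
cat > Dockerfile <<'EOF'
FROM myorg/node-base:16
WORKDIR /app
COPY . .
CMD ["node", "server.js"]
EOF

docker build -t myorg/myapp:1.0.0 .
```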
Instead of Git Checkout Use Tag URL
You can download a specific branch/tag as an archive instead of cloning the whole repository, and unzip it. I have seen Git repos that are 100 MB when the code itself is only 4-5 MB, so this can save you time.
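Illustrative only (the repository URL and tag are placeholders): most hosts can serve a single tag as an archive, which is usually far smaller than the full clone.

```sh
# Download one tag as a tarball instead of cloning the whole repository.
curl -L -o app.tar.gz https://github.com/myorg/myapp/archive/refs/tags/v1.2.3.tar.gz
mkdir -p app
tar -xzf app.tar.gz --strip-components=1 -C app
```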
Use multistage build
Multi-stage builds are not going to save you build time, but they will save you image size. This mostly matters if you deploy lots of containers.
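A rough multi-stage sketch, assuming a hypothetical Node front-end whose build output lands in /app/dist (all names are illustrative): the first stage carries the full toolchain, while the final image carries only the output.

```sh
cat > Dockerfile <<'EOF'
# Stage 1: build with the full toolchain
FROM node:16 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: ship only the build output in a small runtime image
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
EOF

docker build -t myorg/myapp:1.0.0 .
```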
Related
Can someone explain to me why the normal Docker process is to build an image from a Dockerfile and then upload it to a repository, instead of just moving the Dockerfile to and from the repository?
Let's say we have a development laptop and a test server with Docker.
If we build the image, that means uploading and downloading all of the packages inside the Dockerfile. Sometimes this can be very large (e.g. PyTorch > 500MB).
Instead of transporting the large imagefile to and from the server, doesn't it make sense to, perhaps compile the image locally to verify it works, but mostly transport the small Dockerfile and build the image on the server?
This started out as a comment, but it got too long. It is likely to not be a comprehensive answer, but may contain useful information regardless.
Often the Dockerfile will form part of a larger build process, with output files from previous stages being copied into the final image. If you want to host the Dockerfile instead of the final image, you’d also have to host either the (usually temporary) processed files or the entire source repo & build script.
The latter is often done for open source projects, but for convenience pre-built Docker images are also frequently available.
One tidy solution to this problem is to write the entire build process in the Dockerfile using multi-stage builds (introduced in Docker CE 17.05 & EE 17.06). But even with the complete build process described in a platform-independent manner in a single Dockerfile, the complete source repository must still be provided.
TL;DR: Think of a Docker image as a regular binary. It’s convenient to download and install without messing around with source files. You could download the source for a C application and build it using the provided Makefile, but why would you if a binary was made available for your system?
Instead of transporting the large imagefile to and from the server, doesn't it make sense to, perhaps compile the image locally to verify it works, but mostly transport the small Dockerfile and build the image on the server?
Absolutely! You can, for example, set up an automated build on Docker Hub which will do just that every time you check in an updated version of your Dockerfile to your GitHub repo.
Or you can set up your own build server / CI pipeline accordingly.
IMHO, one of the reasons for building images and putting them into a repository is sharing with other people. For example, we use Python's out-of-the-box image in a Dockerfile for everything a Python program needs to run. Similarly, we can create custom images: for example, I did an Apache installation with some custom steps (like port changes and some additional configuration), created an image from it, and finally pushed it to my company's repository.
I came to know a few days later that many other teams are using it too, and now that it is shared they do NOT need to make any changes; they simply use my image and they're done.
Context: Reading through this blog post.
Pushing images to a registry seems to be the "right thing to do" ... but I don't understand why.
What purpose does this serve? Is it because the server I ssh into needs to have a local copy of the image? And to do that, one approach is to pull an image from a registry?
What purpose does this serve? Is it because the server I ssh into needs to have a local copy of the image? And to do that, one approach is to pull an image from a registry?
From the CI/CD perspective, a docker registry is the equivalent of an artifact repository for images. You want a central source of these images to download from as you go from one docker host to another since your build server is most likely different than your dev and prod servers.
Couldn't I just upload an image from one machine (say a CI/CD server) via ssh? Using Docker Hub seems needlessly ceremonious to me. Like in this example (I know this API is deprecated, but it illustrates my point).
It is possible to save/load images directly to a docker host, but there are a few major downsides. First, you lose any benefit from docker's layered filesystem. When building an app in CI/CD, most of the time only the last few layers should need to be rebuilt with your application changes. There should be the same previous base image and various common layers to build your app that remain identical. With a registry, these common layers are detected, and only the difference is pushed and pulled, making your deploys faster and saving you disk space. With a save/load command, all layers are sent every time, since you do not know the state of the remote server when you run the save.
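For concreteness, here is a sketch of the two approaches being compared (hostnames and image names are placeholders):

```sh
# save/load over ssh: the entire image is streamed every time.
docker save myorg/myapp:1.0.0 | ssh deploy@prod-host docker load

# registry push/pull: only the layers the other side is missing are transferred.
docker push myorg/myapp:1.0.0            # from the CI/CD server
ssh deploy@prod-host docker pull myorg/myapp:1.0.0
```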
Second, this doesn't scale as you add hosts to run images. Every host would need the image copied on the chance you want to run it on that host, e.g. to handle failover or load balancing. It also won't work if you move to swarm mode or kubernetes since you could easily add new nodes to the cluster that won't have your image. Swarm mode defaults to looking up the sha256 of the image on the registry to guarantee the same image is always used even if the tag is modified on the registry after the initial deploy.
Keep in mind you can run your own registry server (there's a docker image for it and the API is open). Many artifact repositories (e.g. Artifactory and Nexus) include support for a docker registry, and many cloud providers include a registry with their container offerings. So you do not need to push to a remote Docker Hub to deploy locally.
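A minimal sketch of that, using the official registry image (the port and hostnames are placeholders; a real setup would add TLS and authentication in front of it):

```sh
# Run a private registry on the current host.
docker run -d -p 5000:5000 --name registry registry:2

# Retag an image for that registry and push it.
docker tag myorg/myapp:1.0.0 localhost:5000/myapp:1.0.0
docker push localhost:5000/myapp:1.0.0

# Any host that can reach the registry can now pull the image
# (replace localhost with the registry's hostname from other machines).
docker pull localhost:5000/myapp:1.0.0
```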
One last point is that a registry server is useful to developers, who can now pull the same image used in dev and prod to test against other microservices they are writing locally, without needing to build everything locally or ssh to a CI/CD server (or even prod) to save and scp images back to their laptops.
Usually, you use a CI/CD pipeline when you want to streamline your build / test / deploy process, and usually this happens if you have production infrastructure to maintain that is actually critical to your business.
There is no need for a CI/CD pipeline if you're just playing around / prototyping, IMO, in which case you can build your docker images on the machine directly, or ssh an image over. That's perfectly reasonable.
Look at the 'registry' as a repository of your binary image (i.e. a fixed, versioned build of your code that you know works).
Then deploying is as simple as telling your servers to pull the image and run it, from anywhere.
On a flexible architecture, you might have nodes coming up or going down at any time, and they need to be able to pull the latest code from somewhere to get back up and running automatically, at any time, without intervention.
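Under those assumptions (image name and tag are illustrative), bringing a node up to date reduces to a pull and a run:

```sh
docker pull myorg/myapp:1.2.3
docker run -d --name myapp myorg/myapp:1.2.3
```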
The registry is the single source of truth in this case. It means that you can have multiple nodes (servers) or cluster(s) and a single place from which to get your images. Also, if one of your nodes goes down, you can quickly start your image on a new one. You can also automate image updates using the registry's webhooks: for example, when you add a new version of an image, the registry sends a webhook to any service that can upgrade your containers to the newest version.
Consider a Docker image as a new way of distributing your software to your servers, and a Docker registry as centralized storage for shared images (like npm.org for JS or maven.org for Java).
For example, if you develop a Java application, in the years before Docker you might have used .jar files for this. A Docker image is better in that it also includes all OS-level dependencies, like the JDK/JRE and system configuration, which helps you avoid the "it works on my machine" effect.
To distribute a Docker image you could also use just the Dockerfile and build it every time on every machine; a Docker repository instead gives you centralized storage of pre-built images.
Pushing to a Docker repository in your CI/CD allows you to build your artifact once and then work with that same artifact in both integration and prod environments.
Using just a Dockerfile will not guarantee the same state on every build at every moment in time, because you may install external dependencies in your Dockerfile that can be updated or even removed between two sequential builds.
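As an illustration of that point (the packages named here are just examples), the following Dockerfile can produce different images on different days even though the file itself never changes:

```sh
cat > Dockerfile <<'EOF'
# "latest" resolves to whatever the newest python image is on build day
FROM python:latest
# unpinned system and Python packages pick up whatever version is current
RUN apt-get update && apt-get install -y curl
RUN pip install requests
EOF
```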
I learnt the basics of github and docker and both work well in my environment. On my server, I have project directories, each with a docker-compose.yml to run the necessary containers. These project directories also have the actual source files for that particular app which are mapped to virtual locations inside the containers upon startup.
My question is now: how to create a pro workflow to encapsulate all of this? Should the whole directory (including the docker-compose files) live on github? Thus each time changes are made I push the code to my remote, SSH to the server, pull the latest files and rebuild the container. This rebuilding of course means pulling the required images from dockerhub each time.
Should the whole directory (including the docker-compose files) live on github?
It is best practice to keep all source code, including Dockerfiles and configuration, versioned. Thus you should put all the source code, the Dockerfile, and the docker-compose file in a git repository. This is very common for projects on GitHub that have a Docker image.
Thus each time changes are made I push the code to my remote, SSH to the server, pull the latest files and rebuild the container
Ideally this process should be encapsulated in a CI workflow using a tool like Jenkins. You basically push the code to the git repository, which triggers a Jenkins job that compiles the code, builds the image and pushes the image to a docker registry.
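Expressed as plain shell, such a CI job roughly does the following (repository URL, registry and image name are placeholders):

```sh
git clone https://github.com/myorg/myapp.git && cd myapp
TAG=$(git rev-parse --short HEAD)   # tag the image with the commit it was built from
docker build -t registry.example.com/myapp:"$TAG" .
docker push registry.example.com/myapp:"$TAG"
```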
This rebuilding of course means pulling the required images from dockerhub each time.
Docker is smart enough to cache the base images that have been previously pulled. Thus it will only pull the base images once on the first build.
I am new to Kubernetes, so I'm wondering: what are the best practices when it comes to putting your app's source code into a container run in Kubernetes or a similar environment?
My app is PHP, so I have PHP (FPM) and Nginx containers (running on Google Container Engine).
At first I had a git volume, but there was no way of changing app versions like this, so I switched to emptyDir and put my source code in a zip archive in one of the images, which would unzip it into this volume upon start. Now I have the source code separately in both images via git, with a separate git directory, so I have /app and /app-git.
This is good because I do not need to share or configure volumes (fewer resources and less configuration), the app's layer is reused in both images so there is no impact on space, and since it is git the "base" is built in, so I can simply adjust the command at the end of my Dockerfile and switch to a different branch or tag easily.
I wanted to download an archive with the source code directly from the repository by providing credentials as arguments during the build process, but that did not work because my repo host, Bitbucket, creates archives with the last commit id appended to the directory name, so there was no way of knowing what unpacking the archive would result in, and I got stuck with git itself.
What are your ways of handling the source code?
Ideally, you would use continuous delivery patterns, which means using Travis CI, Bitbucket Pipelines or Jenkins to build the image on every code change.
That is, every time your code changes, your automated build gets triggered and builds a new Docker image, which contains your source code. Then you can trigger a Deployment rolling update to update the Pods with the new image.
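A minimal sketch of such a rolling update, assuming a Deployment called myapp with a container named php (both names and the image tag are assumptions):

```sh
# Point the Deployment at the newly built image; Kubernetes rolls the Pods over.
kubectl set image deployment/myapp php=registry.example.com/myapp:1.3.0
kubectl rollout status deployment/myapp
```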
If you have dynamic content, you likely put this in persistent storage, which will be re-mounted on Pod update.
What we've done traditionally with PHP is an overlay at runtime: the container has a volume mounted to it with deploy keys for your git repo, which allows you to perform git pull operations.
The more buttoned-up approach is to have custom, tagged images of your code, extended from fpm or whatever image you're using. That way you would run version 1.3 of YourImage, where YourImage would contain version 1.3 of your application code.
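A sketch of that approach for the PHP case (image names and the tag are illustrative):

```sh
cat > Dockerfile <<'EOF'
FROM php:8.2-fpm
# Bake code version 1.3 into the image rather than mounting it at runtime.
COPY . /var/www/html
EOF

docker build -t myorg/yourimage:1.3 .
```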
Try to leverage continuous integration and continuous deployment. You can use Jenkins as the CI/CD server and create jobs for building, pushing and deploying the image.
I recommend putting your source code into the Docker image instead of fetching it from the git repo. You can also extract configuration files from the Docker image: Kubernetes v1.2 provides a new feature, 'ConfigMap', so we can put configuration files in a ConfigMap and have them mounted into the Pod when it runs. It's very convenient.
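For example (the name and file here are hypothetical), a ConfigMap can be created from a plain config file and then referenced as a volume in the Pod spec:

```sh
# Create the ConfigMap from a local configuration file.
kubectl create configmap myapp-config --from-file=app.ini

# The Pod spec then mounts "myapp-config" as a volume to expose app.ini as a file.
```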
I have an automated build on Docker Hub for an image based on Ubuntu with some custom configuration, which I then reuse as the base image in other, project-specific Dockerfiles. This works okay.
I made a change to it and committed to GitHub, which then triggered the automated build on Docker Hub.
In one of these other projects, the Dockerfile starts with FROM myuser/myimage, but it's not getting the latest image with the changes; it keeps using the cached old one.
Shouldn't this happen automatically?
You need to docker pull the latest version. Docker looks for the FROM image locally; it doesn't notice if that tag has been updated in the registry it came from. I have a script that runs docker pull before building images.
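A sketch of that workaround, using the image name from the question (the derived image tag is illustrative):

```sh
# Refresh the base image first, then build the derived image.
docker pull myuser/myimage
docker build -t myuser/derived-image .

# Alternatively, --pull tells docker build to check the registry for a newer
# version of the FROM image on every build.
docker build --pull -t myuser/derived-image .
```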