Since DockerHub will implement rate limiting on Nov 1, we will probably tell our engineers to create (free) DockerHub accounts in order to each individually enjoy 200 pulls per 6 hours.
We are however worried that an engineer might somehow accidentally push an image to DockerHub from their laptop.
Is there some way we can block our developers from accidentally pushing to DockerHub?
It's possible that Docker will provide the ability to create more limited tokens before the rate limits start, either pull only, or perhaps limited to specific repos. This functionality is needed for CI users that are running builds in the cloud and don't want to give out full access.
Otherwise, I'd recommend not tagging your local images with something that you can push to Docker Hub. The image reference includes the repository name, and can be prefixed with a registry. If they tag with a local registry name, or specify a repository they don't have access to push to, then they will not push anything to Hub.
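For example (a rough sketch; registry.internal.example and team/myapp are hypothetical names), tagging against an internal registry means an accidental push can never land on Docker Hub:

docker build -t registry.internal.example/team/myapp:1.0 .
# the registry prefix means "docker push" goes to the internal registry, not Docker Hub
docker push registry.internal.example/team/myapp:1.0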
Even better is to create a local registry, which removes the need to pull from Hub. Mirror the base images your developers need, and have them perform their builds against your internal registry. There are many implementations of the docker registry, including Docker's own registry image and, for an enterprise environment, the Harbor project (part of the CNCF), which extends it. Both are free and open source.
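As a rough sketch of the mirroring approach (mirror.internal.example is a hypothetical hostname, and the daemon.json path assumes a standard Linux install), Docker's registry image can run as a pull-through cache of Hub:

# on the mirror host: run the registry image as a pull-through cache of Docker Hub
docker run -d --name hub-mirror -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2

# on each developer machine: point the daemon at the mirror (merge with any existing daemon.json), then restart dockerd
echo '{ "registry-mirrors": ["http://mirror.internal.example:5000"] }' | sudo tee /etc/docker/daemon.json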
As an extreme measure, you can block the ability to run any POST /v2/*/blobs/uploads/ request (see the registry API spec) on an HTTP proxy that all developers use to access the internet, but you may find this breaks many legitimate use cases.
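A very rough sketch of that idea with Squid (a hypothetical config fragment; it only works if the proxy can actually see the HTTP methods, i.e. it terminates TLS for registry traffic, which is itself invasive):

# squid.conf: deny upload/delete methods toward Docker Hub's registry
acl dockerhub dstdomain registry-1.docker.io
acl push_methods method POST PUT PATCH DELETE
http_access deny dockerhub push_methods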
I'm writing a tool to sync container images from any container registry. In order to sync images, I need a way to check whether a local image:tag is different from a remote image:tag, possibly by comparing the image SHA ID (since the image SHA digest is registry-based). Due to the nature of my tool, pulling the image first and then comparing with docker inspect is not suitable.
I was able to find some posts like this or this. They either tell me to use the docker v2 API to fetch remote metadata (which contains the image ID) and then compare it with the local image ID, or to use container-diff (which seems to be made for a more complicated problem: comparing packages in package management systems inside images). The docker v2 API method is not universal because each registry (docker.io, gcr.io, ECR) requires different headers, authentication, etc. Therefore, container-diff seems to be the most suitable choice for me, but I have yet to figure out a way to simply output true/false if the local and remote images are different. Also, it seems like this tool pulls images before diffing them.
Is there any way to do this universally for all registries? I see that there are tools that have already implemented this feature, like fluxcd for Kubernetes, which can sync a remote image to a local pod image, but I have yet to learn their technical details.
At a high level, your approach of comparing the SHA values is correct; however, you need to dive deeper into the container image spec, as there is more to it (layers and blobs).
There are already tools out there that can copy images from one registry to another. By default, these tools don't copy the data if the image already exists in the target. Skopeo is a good starting point.
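For example (a sketch assuming skopeo and jq are installed; alpine:latest and registry.internal.example are just placeholders), you can compare the remote manifest digest with the digest recorded locally, without pulling anything:

# digest of the remote manifest (no pull needed)
remote_digest=$(skopeo inspect docker://docker.io/library/alpine:latest | jq -r .Digest)

# digest recorded for the local image when it was last pulled
# (RepoDigests is empty for images that were only built locally)
local_digest=$(docker image inspect alpine:latest --format '{{index .RepoDigests 0}}' | cut -d@ -f2)

[ "$remote_digest" = "$local_digest" ] && echo "in sync" || echo "different"

# skopeo copy also skips blobs that already exist in the destination registry
skopeo copy docker://docker.io/library/alpine:latest docker://registry.internal.example/mirror/alpine:latest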
If you plan to copy images from different registries, you need to deal with each registry individually. I would also recommend taking a look at Harbor. The Harbor container registry has the capability to copy images from and to various registries built in. You can use Harbor as your solution or as a starting point for your endeavor.
Let's say I have the following layers of docker:
OS
JRE
application server
application
The customer is running an image containing all the above.
What is the best practice when there is an urgent security fix or any other urgent update for one of the layers?
For example:
there is an urgent security update required on the OS layer, and the customer can't wait until we finish the entire CI/CD pipeline and certify the change in a new docker image.
My assumption was to give the customer the option to update on their own: update the image in their local docker repo with the updated OS, while everything else remains the same.
This seems to be a very demanding request. Is there any alternative, or are there any best practices?
General advice would be to make sure the containers you provide are stateless (you can use volumes to store data). This makes updating the images a question of stopping the old containers and starting the updated ones.
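As a small illustration (hypothetical image and volume names), keeping state in a named volume makes the swap trivial:

docker run -d --name app -v app-data:/var/lib/app vendor/app:1.2.0
# urgent fix released: stop the old container, start the patched image, data stays in the volume
docker stop app && docker rm app
docker run -d --name app -v app-data:/var/lib/app vendor/app:1.2.1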
Your clients could use docker exec on the target containers to apply a quick fix until you provide them with the new image, or you can host a link to the new image that they can pull and update from themselves.
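For example (a sketch assuming a Debian-based image and the hypothetical container name app), a client could patch a single package inside the running container as a stopgap:

docker exec -it app sh -c 'apt-get update && apt-get install --only-upgrade -y openssl'
# note: this change lives only in that container's writable layer and is lost when the container is recreated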
Outside of that, I don't think there is a standard for this process.
Suppose I have a private Docker repository at myrepo.myhost.com.
I now build an image off of a very large public Docker Registry image. Assume it's called bandwidthguy/five-gigabyte-image:latest.
I have a Dockerfile that does one simple thing, for example:
FROM bandwidthguy/five-gigabyte-image
COPY some-custom-file /etc/bigstuff
I build the image:
docker build -t myversionof-five-gigabyte-image .
and tag it.
docker tag myversionof-five-gigabyte-image:latest myrepo.myhost.com/myversions/five-gigabyte-image:latest
Now I push to my repo.
docker push myrepo.myhost.com/myversions/five-gigabyte-image
I noticed that when doing this, the entire large source image gets pushed to my repository.
What I'm wondering is if there is any way to somehow have Docker only push a difference image, and then pull the other layers from their respective sources when the image is pulled. Pushing the entire image to my private repo can have problems:
If the private repo is hosted on my home ISP, my limited upstream can cause major lag when pulling the image while out and about.
If the private repo is on a hosted service, it might have a disk quota and I am using 5GB of that quota needlessly.
It takes a long time to push the image, especially if I have slow upload speed at the time.
It may just be the case that you can't put the parts on different servers, but I figured it's worth an ask to see if it can be done. It would make sense that you could store all the layers on your own host for the purposes of running an air-gapped server, but it seems a bit of an oversight that you can't pull the source images from the Registry.
This question showcased my early misunderstanding of Docker. There is no current mechanism for storing different layers of an image on different repositories. While there's no theoretical reason this couldn't be implemented, I'm guessing it's just not worth the extra effort.
So, the answer to my question is no, you can't store only image differences in a private repo - you'll be storing all layers, including those that were pulled from the public repo, in your private repo. However, since layers are identified by their hashes, clients that have already pulled the image from the public repo won't need to re-download those layers from the private repo. This leads to the possibility that the hashes of the very large layers could be kicked out of the private repo manually, and users could then be required to pull the source image from the public registry first. (Pulling fresh from the private repo only would error out.) I haven't looked into this, but it might be a possible hacky solution.
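As a sketch of that workflow (using the repository names from the question), a client would warm its local layer cache from the public registry first:

# pull the large shared layers from Docker Hub first
docker pull bandwidthguy/five-gigabyte-image:latest
# then pull the custom image; layers already present locally are skipped, not downloaded again
docker pull myrepo.myhost.com/myversions/five-gigabyte-image:latest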
Luckily, there aren't too many Docker images that actually need multiple gigabytes of space. Even so, layers are stored compressed and deduplicated in the registry.
I have created one private repository on Docker Hub.
My questions are:
How many separate images can I store in my single private repository?
Is there any image size restriction?
When do I need more than one private repository?
Best practice is to use one repository per application. You use different tags to differentiate between versions and flavors of your app. Theoretically you can mix totally different Docker images in one repo. But maybe you can just choose a different Docker repository provider instead that offers more private repositories for free like Codefresh, GitLab and some others.
Each repository holds a single image name; however, that image can have many tags, so you can have 100 tags, for example. Each tag can represent a different image if you want to, which gives you a total of 100 images in this case: the same image name but different tags, e.g. myapplication:backend, myapplication:frontend, myapplication:xservice, and so on. Quoted from the documentation:
A single Docker Hub repository can hold many Docker images (stored as tags).
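For example (hypothetical account and build names), all of these flavors live in the single private repository myaccount/myapplication, distinguished only by their tags:

docker tag backend-build myaccount/myapplication:backend
docker tag frontend-build myaccount/myapplication:frontend
docker push myaccount/myapplication:backend
docker push myaccount/myapplication:frontend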
An image size restriction has not been announced as far as I know, but you should keep your images as small as you can: the larger an image gets, the more issues you may face when pushing and pulling it, so don't make your image 10 GB, for example, unless you really have to.
You may need more than one private repository if you want to keep each image in its own repository.
I have a question related to best practices for deploying applications to production based on docker swarm.
To simplify the discussion around this question/issue, let's consider the following scenario:
Our swarm contains:
6 servers (different hosts)
on each of these servers, we will have one service
each service will have only one task/replica (one container) running
Memcached1 and Memcached2 use public images from Docker Hub
"Recycle data 1" and "Recycle data 2" use a custom image from a private repository
"Client 1" and "Client 2" use a custom image from a private repository
So in the end, for our example application, we have 6 containers running across 6 different servers: 2 of them are memcached, and 4 of them are clients which communicate with memcached.
"Client 1" and "Client 2" are going to insert data into memcached based on some kind of rules. "Recycle data 1" and "Recycle data 2" are going to update or delete data from memcached based on some kind of rules. Simple as that.
Our applications which communicate with memcached are custom ones, written by us. The code for these applications resides on GitHub (or any other repository). What is the best way to deploy this application to production:
Build images which contain the code copied into the image, and use those images to deploy to the swarm
Build an image which uses a volume, where the code resides outside of the image.
Bearing in mind that I am deploying swarm to production for the first time, I can see a lot of issues with option 1. Having the code incorporated into the images seems illogical to me, given that 99% of the time the updates that happen are code-based. This would require building an image every time you want to update the code that runs in a specific container (no matter how small that change is).
Option 2 seems much more logical to me, but at this specific moment I am not sure whether it is even possible. So there are a number of questions here:
What is the best approach in case where we are going to host multiple dockers which will run the same code in the background?
Is it possible in docker swarm to have one central host/server (manager, anywhere) where we can clone our repositories and share those repositories as volumes across the docker swarm? (In our example, all 4 custom services would mount the volume where our code is hosted.)
If this is possible, what is the docker-compose.yml implementation for it?
After digging deeper and working with docker and docker swarm mode for the last 3 months, these are the answers to the questions above:
Answer 1: In general, you should consider your docker image as a "compiled" version of your program. Your image should contain either the code base or the compiled version of the program (depending on which programming language you are using), and that specific image represents a version of your app. Every time you want to deploy your next version, you generate a new image.
This is probably the best approach for 99% of the apps that are going to be hosted with docker (the exceptions are development environments and apps where you really want to shell in and control things directly from the docker container itself).
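A minimal sketch of that workflow (hypothetical registry, image, and service names): every code change produces a new, versioned image, and the running swarm service is rolled to it:

# build and push a new versioned image that contains the updated code
docker build -t registry.example.com/recycle-data:1.4.2 .
docker push registry.example.com/recycle-data:1.4.2

# roll the running swarm service to the new version
docker service update --image registry.example.com/recycle-data:1.4.2 recycle-data-1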
Answer 2: It is possible, but it is an extremely bad approach. As mentioned in answer 1, the best option is to copy the app code directly into the image and "consider" your image (the running container) as the app itself.
I was not able to wrap my head around this concept at the beginning, because it does not allow you to simply go to the server (or wherever you are hosting your containers), change the app, and restart docker (obviously the container will be back at the same starting point after a restart, using the same image and the same code base you deployed with that image). Any kind of change SHOULD and NEEDS to be deployed as a different image with a different version. That is what docker is all about.
Additionally, the initial idea of sharing the same code base across multiple swarm services is possible, but it totally ruins the purpose of versioning across docker swarm.
Consider having 3 services which are used as redundant services (failover), and you want to run a new version on one of them as a beta test. This would not be possible with a shared code base.
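With versioned images that beta scenario becomes trivial (hypothetical names): two services keep the stable tag while one is rolled to the beta tag:

docker service update --image registry.example.com/client:2.0.0-beta client-2
# client-1 and client-3 keep running registry.example.com/client:1.9.0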