Pulling docker images

Is there a way I can manually download a Docker image?
I have a pretty slow Internet connection, and it would be better for me to get a link to the image and download it elsewhere with a faster connection.
How can I get the direct URL of the image that docker pull fetches?

It's possible to obtain that, but let me suggest two other ways!
If you can connect to a remote server with a fast connection, and that server can run Docker, you can docker pull on that server, then docker save to export an image (and all its layers and metadata) as a tarball, and transfer that tarball any way you like.
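As a minimal sketch of that round trip (the image name and file name are just placeholders):

# on the machine with the fast connection
docker pull ubuntu:22.04
docker save -o ubuntu-22.04.tar ubuntu:22.04

# transfer ubuntu-22.04.tar however you like (USB drive, rsync, ...), then on the slow machine
docker load -i ubuntu-22.04.tar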
If you want to transfer multiple images sharing a common base, the previous method won't be great, because you will end up transferring multiple tarballs that share a lot of data. Another possibility is to run a private registry, e.g. on a "movable" computer (a laptop): connect it to the fast network, pull images, and push them to the private registry; then move the laptop to the "slow" network and pull images from it.
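A rough sketch of that registry approach, assuming the official registry:2 image and placeholder names (an HTTP-only registry must also be listed under "insecure-registries" in the Docker daemon config on the clients):

# on the laptop, while it is on the fast network
docker run -d -p 5000:5000 --name registry registry:2
docker pull ubuntu:22.04
docker tag ubuntu:22.04 localhost:5000/ubuntu:22.04
docker push localhost:5000/ubuntu:22.04

# after moving the laptop to the slow network, other machines pull from it
# (replace laptop.local with the laptop's hostname or IP)
docker pull laptop.local:5000/ubuntu:22.04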
If none of those solutions is acceptable for you, don't hesitate to give more details, we'll be happy to help!

You could pull down the individual layers with this:
https://github.com/samalba/docker-registry-debug
Use the curlme option.
Reassembling the layers into an image is left as an exercise for the reader.
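That tool targets the older registry API, but the same idea with plain curl looks roughly like this today for Docker Hub (other registries use different hosts and auth; the digest placeholder comes from the manifest):

TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/ubuntu:pull" | jq -r .token)

# fetch the manifest, which lists the layer digests
# (multi-arch images return a manifest list first; follow it to a platform manifest)
curl -s -H "Authorization: Bearer $TOKEN" \
     -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
     https://registry-1.docker.io/v2/library/ubuntu/manifests/latest

# download one layer blob, using a digest taken from the manifest
curl -L -H "Authorization: Bearer $TOKEN" -o layer.tar.gz \
     https://registry-1.docker.io/v2/library/ubuntu/blobs/sha256:<digest>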

Related

Universal way to check if container image changed without pulling that works for all container registries

I'm writing a tool to sync container images from any container registry. In order to sync images, I need a way to check whether the local image:tag is different from the remote image:tag, possibly by comparing image SHA IDs (since the image SHA digest is registry-specific). Due to the nature of my tool, pulling the image first and then comparing with docker inspect is not suitable.
I was able to find some posts like this or this. They either tell me to use the Docker v2 API to fetch remote metadata (which contains the image ID) and then compare it with the local image ID, or to use container-diff (which seems to be made for a more complicated problem: comparing packages from package management systems inside images). The Docker v2 API method is not universal because each registry (docker.io, gcr.io, ECR) requires different headers, authentication, etc. Therefore, container-diff seems to be the most suitable choice for me, but I have yet to figure out a way to simply output true/false if the local and remote images are different. Also, it seems like this tool does pull images before diffing them.
Is there any way to do this universally for all registries? I see there are tools that already implement this feature, like fluxcd for Kubernetes, which can sync a remote image to a local pod image, but I have yet to learn their technical details.
At a high level, your approach of comparing the SHA values is correct; however, you need to dive deeper into the container spec, as there is more to it (layers and blobs).
There are already tools out there that can copy images from one registry to another. By default, these tools don't copy the data if the image already exists in the target. Skopeo is a good starting point.
If you plan to copy images from different registries, you need to handle each registry individually. I would also recommend taking a look at Harbor. The Harbor container registry has the capability to copy images from and to various registries built in. You can use Harbor as your solution or as a starting point for your endeavor.
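As a rough sketch of the digest comparison with Skopeo (image names are just examples; Skopeo handles the per-registry authentication for you):

# manifest digest of the remote image, read without pulling it
REMOTE=$(skopeo inspect docker://docker.io/library/alpine:latest | jq -r .Digest)

# digest the local image was pulled as (recorded in RepoDigests)
LOCAL=$(docker image inspect --format '{{index .RepoDigests 0}}' alpine:latest | cut -d@ -f2)

if [ "$REMOTE" = "$LOCAL" ]; then echo "up to date"; else echo "changed"; fi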

How to get large amounts of data into a docker image?

I want to upload a significant number of images for processing inside a Docker instance.
As far as I have observed, this is normally done by a download script (where the images are downloaded into the instance).
I have several terabytes of images, so I do not want to download them each time. Is there a way to get my images to a specific location in the Docker instance?
What is the standard way of doing this?

Can you push a Docker image to a private repo and only include differences from a public image?

Suppose I have a private Docker repository at myrepo.myhost.com.
I now build an image off of a very large public Docker Registry image. Assume it's called bandwidthguy/five-gigabyte-image:latest.
I have a Dockerfile that does one simple thing, for example:
FROM bandwidthguy/five-gigabyte-image
COPY some-custom-file /etc/bigstuff
I build the image:
docker build -t myversionof-five-gigabyte-image .
and tag it.
docker tag myversionof-five-gigabyte-image:latest myrepo.myhost.com/myversions/five-gigabyte-image:latest
Now I push to my repo.
docker push myrepo.myhost.com/myversions/five-gigabyte-image
I noticed that when doing this, the entire large source image gets pushed to my repository.
What I'm wondering is if there is any way to somehow have Docker only push a difference image, and then pull the other layers from their respective sources when the image is pulled. Pushing the entire image to my private repo can have problems:
If the private repo is hosted on my home ISP, my limited upstream can cause major lag when pulling the image while out and about.
If the private repo is on a hosted service, it might have a disk quota and I am using 5GB of that quota needlessly.
It takes a long time to push the image, especially if I have slow upload speed at the time.
It may just be the case that you can't put the parts on different servers, but I figured it's worth an ask to see if it can be done. It would make sense that you could store all the layers on your own host for the purposes of running an air-gapped server, but it seems a bit of an oversight that you can't pull the source images from the Registry.
This question showcased my early misunderstanding of Docker. There is no current mechanism for storing different layers of an image on different repositories. While there's no theoretical reason this couldn't be implemented, I'm guessing it's just not worth the extra effort.
So, the answer to my question is no, you can't store only the image differences in a private repo; you'll be storing all layers, including those that were pulled from the public repo, in your private repo. However, since layers are identified by their hashes, clients that have already pulled the image from the public repo won't need to download those layers again from the private repo. This suggests that the hashes of the very large layers could perhaps be kicked out of the private repo manually, and users could then be required to first pull the source image from the public registry themselves. (Pulling fresh from the private repo alone would error out.) I haven't looked into this, but it might be a possible hacky solution.
Luckily, there aren't too many Docker images that actually need multiple gigabytes of space. Even so, layers are stored compressed and deduplicated in the registry.
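You can see the layer sharing for yourself by comparing the layer digests of the base and derived images; a quick sketch using the hypothetical names above:

docker image inspect --format '{{json .RootFS.Layers}}' bandwidthguy/five-gigabyte-image:latest
docker image inspect --format '{{json .RootFS.Layers}}' myrepo.myhost.com/myversions/five-gigabyte-image:latest

# the second list should be the first list plus one extra digest for the COPY layer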

How many docker images can I store in Docker Private Repository

I have created one private repository on Docker Hub.
My questions are:
How many separate images can I store in my single private repository?
Is there any image size restriction?
When do I need more than one private repository?
Best practice is to use one repository per application. You use different tags to differentiate between versions and flavors of your app. Theoretically you can mix totally different Docker images in one repo. But maybe you can just choose a different Docker repository provider instead, one that offers more private repositories for free, like Codefresh, GitLab, and some others.
Each repository holds a single image name; however, that image can have many tags, so you could have 100 tags. Each tag can represent a different image if you want, which in that case gives you a total of 100 images under the same image name but different tags, e.g. myapplication:backend, myapplication:frontend, myapplication:xservice, and so on. Quoted from the documentation:
A single Docker Hub repository can hold many Docker images (stored as tags).
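A quick sketch of what that looks like in practice (the image and account names are just examples):

docker tag backend-build myaccount/myapplication:backend
docker tag frontend-build myaccount/myapplication:frontend
docker push myaccount/myapplication:backend
docker push myaccount/myapplication:frontend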
As far as I know, no image size restriction is announced, but you should keep your image as small as you can: the larger it gets, the more issues you may face with push and pull, so don't make your image 10 GB, for example, unless you really must.
You may need more than one private repository if you want to keep each image in its own repository.

Does a docker feature exist similar to save that only saves the new layer as a .tar?

I have a question about whether or not a feature exists in Docker. I have a local network with multiple computers (all with Docker) and a local registry. I have one base image (2.5 GB) that is on every machine. Devs on the network run a container, make modifications, and can push to the registry and pull, and only that new layer is transferred. They can also save and load, and the load is almost instant because the base layer already exists. Is there any way (docker diff?) to use an alternative to docker save that saves only that single new layer, so the file transfer is smaller? My base image is 2.5 GB, so each additional layer is trivial in comparison, but there will be real scenarios where a registry won't be available and we will have to use save/load. Everything works, but if we could cut 2.5 GB out of each transfer, that would be amazing.
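For reference, the registry-based workflow described above, which does transfer only the new layer, looks roughly like this (the registry address and image names are placeholders):

# a dev commits (or builds FROM the base image) and pushes; only the new layer is uploaded
docker commit mycontainer localregistry:5000/devimage:v2
docker push localregistry:5000/devimage:v2

# on another machine that already has the base image, only the new layer is downloaded
docker pull localregistry:5000/devimage:v2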
This is a post from 2014 asking for it but I don't think anything official became of it.
Any ideas? Thanks.
