How to share images between multiple docker hosts?

I have two hosts and docker is installed in each.
As we know, each Docker host stores its images in the local /var/lib/docker directory.
So if I want to use an image such as ubuntu, I must run docker pull on each host to download it from the internet.
I think that's slow.
Can I store the images on a shared disk array instead? Then one host would pull the image once, and every host with access to the shared disk could use the image directly.
Is that possible, and is it good practice? Why is Docker not designed like this?
It might require hacking Docker's source code to implement this.

Have you looked at this article?
Dockerizing an Apt-Cacher-ng Service
http://docs.docker.com/examples/apt-cacher-ng/
Extract:
"This container makes the second download of any package almost instant."
At least one node will be very fast, and I think it should be possible to tell the second node to use the cache of the first node.
Edit: you can run your own registry with a command similar to
sudo docker run -p 5000:5000 registry
see
https://github.com/docker/docker-registry
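To round that out, here is a minimal sketch of how each host could then use that shared registry; registry-host is a placeholder for the machine running the registry container, and a registry without TLS has to be allowed via the daemon's --insecure-registry option on the client hosts.
# on the host running the registry container started above
docker pull ubuntu                          # pull from Docker Hub once
docker tag ubuntu registry-host:5000/ubuntu
docker push registry-host:5000/ubuntu       # push into the local registry
# on every other host
docker pull registry-host:5000/ubuntu       # pulled over the LAN instead of the internet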

What you are trying to do is not supposed to work, as explained by cpuguy83 in this github/docker issue.
Indeed:
The underlying storage driver would need to synchronize access.
Sharing /var/lib/docker alone is far from enough and won't work!
According to docs.docker.com/registry:
You should use the Registry if you want to:
tightly control where your images are being stored
fully own your images distribution pipeline
integrate image storage and distribution tightly into your in-house development workflow
So I guess this is the (/your) best option to work this out (you probably had that info already -- I just add it here to update the details).
Good luck!

Update (2016-01-25): the docker mirror feature is deprecated.
Therefore this answer is no longer applicable; it is left here for reference.
Old info:
What you need is the mirror mode for the docker registry, see https://docs.docker.com/v1.6/articles/registry_mirror/
It is supported directly by docker-registry.
And of course you can run a mirror of the public registry locally.
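For reference, a rough sketch of what a local mirror looks like in its later form (a registry:2 pull-through cache plus a daemon.json entry); the deprecated v1 article linked above used different flags, so treat this as an illustration rather than the original procedure.
# run a local pull-through cache of Docker Hub using the registry:2 image
docker run -d -p 5000:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    --name registry-mirror registry:2
# point each Docker daemon at it, e.g. in /etc/docker/daemon.json:
#   { "registry-mirrors": ["http://mirror-host:5000"] }
# then restart the daemon; subsequent pulls of Hub images go through the mirror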

Related

Does Docker collect data processed by the container (Docker Desktop)

I am going to use Azure's Computer Vision (Read 3.2) and run it on-premises as a container, and I am therefore using Docker Desktop.
However, I have not been able to figure out whether Docker collects any of the data that is processed by containers running on Docker.
In https://www.docker.com/legal/security-and-privacy-guidelines under the header 'Data Privacy and Security' Docker writes:
"In general, Docker does not collect or store personal data and the use of Docker products does not result in personal data being collected or stored."
Now, to me, this sounds ambiguous. We are using Azure's on-premises container in order to stay compliant, and that part works since Azure does not collect any data from the container. However, if Docker itself collects data then that is a show stopper. FYI, I am a beginner with Docker and I might be completely off.
EDIT: So my question is: does Docker collect any of the input or output going into and out of the container?
Thankful for any answers or wisdom you might be able to share.
Regards
As you saw, the Docker privacy policy "applies to Docker websites, products and services offered by Docker". I do not think running a Docker image as a container would be covered by those terms, and so I do not think Docker collects any information produced by the container as 'output' - i.e. standard output/error streams or the like.
Docker Desktop may collect statistics, metrics and information on the images/containers run directly under Docker Desktop, where it has access to that information; but many Docker-built images will be run under non-Docker environments (such as Kubernetes) where it could not have access to such information.
As an aside, the images themselves can be built from scratch, and you (or other interested parties) have access to the layers within the image, so you can see what has been added and what the effect of each layer is. Thus you could also verify that Docker (or other parties) is not harvesting data from a running container.
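As a concrete illustration of inspecting those layers with the plain docker CLI (the MCR image path shown here for the Read container is an assumption; substitute whatever image you actually pull):
# list every layer of the image together with the instruction that created it
docker history --no-trunc mcr.microsoft.com/azure-cognitive-services/vision/read:3.2
# dump the full image metadata (entrypoint, environment, layer digests) as JSON
docker image inspect mcr.microsoft.com/azure-cognitive-services/vision/read:3.2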

Trying to understand how docker works in production for a scalable symfony application

I'm trying to understand how docker works in production for a scalable symfony application.
Let's say we start with a basic LAMP stack:
Apache container
php container
mysql container
According to my research, once our images are built, we push them to the registry (Docker registry).
For production, the orchestrator will take care of creating the pods (in the case of Kubernetes) and will pull the images that we have uploaded to the registry.
Did I understand right? Do we push 3 separate images to the registry?
Let me clarify a couple of things:
push them to the registry (Docker registry)
You can push them to the Docker registry; however, depending on your company policy this may not be allowed, since Docker Hub (Docker's registry) is hosted somewhere on the internet and not at your company.
For that reason many enterprises deploy their own registry, something like JFrog's Artifactory for example.
For production, the orchestrator will take care of creating the pods (in the case of Kubernetes)
Well, you will have to tell the orchestrator what it needs to create; in the case of Kubernetes you will need to write a YAML file that describes what you want, and it will then take care of the creation (a minimal sketch follows below).
And having an orchestrator is not only suitable for production. Ideally you would want to have a staging environment as similar to your production environment as possible.
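A minimal sketch of such a YAML manifest, assuming a hypothetical symfony-app image already pushed to a private registry (names, tag and replica count are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: symfony-app
spec:
  replicas: 3                     # Kubernetes keeps three pods of the app running
  selector:
    matchLabels:
      app: symfony-app
  template:
    metadata:
      labels:
        app: symfony-app
    spec:
      containers:
      - name: web
        image: my-registry.example.com/symfony-app:1.0   # pulled from your registry
        ports:
        - containerPort: 80
Applying it with kubectl apply -f deployment.yaml is what actually tells the orchestrator to create the pods.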
will pull the images that we have uploaded to the registry
That may need to be configured. As mentioned above, Docker Hub is not the only registry out there, but in the end you need to make sure that it is possible to connect to the registry in order to pull the images you pushed.
Do we push 3 separate images to the registry?
You could do that; however, I would not advise you to do so.
As good as containers may have become, they also have downsides.
One of their biggest downsides is still stateful applications (read up on this article to understand the difference between stateful and stateless).
Although it is possible to run a stateful application (such as MySQL) inside a container and orchestrate it with something like Kubernetes, it is not advised.
Containers should be ephemeral, something that does not work well with databases for example.
For that reason I would not recommend having your database within a container but instead use a virtual or even a physical machine for it.
Regarding PHP and Apache:
You could have 2 separate images for these two, but it honestly is not worth the effort since there are images that already combine both.
The official PHP image has variants that include Apache; better to use one of those and save yourself some maintenance effort.
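A minimal Dockerfile sketch along those lines, based on the official image's documented Apache settings (the PHP version, extensions and Symfony public/ docroot are assumptions to adapt to your app):
FROM php:8.1-apache
# extensions a typical Symfony + MySQL app needs
RUN docker-php-ext-install pdo_mysql opcache
# Symfony serves from public/, so move Apache's document root there
ENV APACHE_DOCUMENT_ROOT /var/www/html/public
RUN sed -ri -e 's!/var/www/html!${APACHE_DOCUMENT_ROOT}!g' /etc/apache2/sites-available/*.conf \
    && sed -ri -e 's!/var/www/!${APACHE_DOCUMENT_ROOT}/!g' /etc/apache2/apache2.conf /etc/apache2/conf-available/*.conf \
    && a2enmod rewrite
COPY . /var/www/html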
Lastly I want to say that you cannot simply take everything from a virtual/physical server, put it into a container and expect it to work just as it used to.

What's the best way to cache downloads done by docker while building containers?

While testing new Docker builds (modifying the Dockerfile) it can take quite some time for the image to rebuild due to the huge download size (either directly via wget, or indirectly via apt, pip, etc.).
One way around this that I personally use often is to split the commands I plan to modify into their own RUN instructions. This avoids re-downloading some parts because previous layers are cached. This, however, doesn't cut it if the command that requires "tuning" comes early in the Dockerfile.
Another solution is to use an image that already contains most of the required packages so that it would just be pulled once and cached, but this can come with unnecessary "baggage".
So is there a straightforward way to cache all downloads done by Docker while building/running? I'm thinking of running Memcached on the host machine, but it seems like overkill. Any suggestions?
I'm also aware that I can test in an interactive shell first, but sometimes you need to test the Dockerfile itself and make sure it's production-ready (including arguments and defaults), especially if the only way you will ever see what's going on after that point is ELK or cluster crash logs.
This here:
https://superuser.com/questions/303621/cache-for-apt-packages-in-local-network
It is the same question, but for a local network instead of a single machine. The answer can still be used here; this is actually a simpler scenario than a network with multiple machines.
If you install Squid locally you can use it to cache all your downloads including your host-side downloads.
But more specifically, there's also a Squid Docker image!
Heads-up: if you use a Squid service in a docker-compose file, don't forget to use the Squid service name instead of Docker's subnet gateway: 172.17.0.1:3128 becomes squid:3128.
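A rough compose sketch of that setup (the ubuntu/squid image, my-app image and cache path are assumptions; any Squid image listening on 3128 would work the same way):
version: "3"
services:
  squid:
    image: ubuntu/squid                  # assumed Squid image; substitute your preferred one
    ports:
      - "3128:3128"
    volumes:
      - squid-cache:/var/spool/squid     # persist the download cache across restarts
  app:
    image: my-app                        # hypothetical application image
    environment:
      http_proxy: http://squid:3128      # service name, not 172.17.0.1, as noted above
      https_proxy: http://squid:3128
    depends_on:
      - squid
volumes:
  squid-cache: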
The way I did this was:
used the new --mount=type=cache,target=/home_folder/.cache/curl
wrote a script which looks into the cache before calling curl (a wrapper over curl with caching)
called the script in the Dockerfile during build
It is more of a hack, but it works.
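A hedged sketch of that cache-mount approach; cached-curl.sh is the hypothetical wrapper script described above, and BuildKit must be enabled for --mount to work:
# syntax=docker/dockerfile:1
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl
# hypothetical wrapper: checks the cache directory first, only calls curl on a miss
COPY --chmod=755 cached-curl.sh /usr/local/bin/cached-curl.sh
# the cache mount persists between builds, so the big download happens only once
RUN --mount=type=cache,target=/root/.cache/curl \
    cached-curl.sh https://example.com/big-dataset.tar.gz /opt/big-dataset.tar.gz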

Docker PGAdmin Container persistent config

I am new to Docker. What I want is a pgAdmin container that I can pull and always have my configs and connections up to date. I was not really sure how to do that, but can I have a volume that is always shared, for example between my Windows PC at home and the one at work? I couldn't find a good tutorial for that and don't know if it makes sense. Let's say my computer gets stolen: I just want to install Docker and my images and be up and running again.
What about a shared directory using Dropbox? As far as I know, the local Dropbox directories are always synced with the actual Dropbox account, which means you can have the config up to date on all of your devices.
Alternatively you can save the configuration - as long as it does not contain sensitive data - in a git repository which you can clone and then start using. Both cases can be used as volumes in Docker.
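For example, a sketch of running pgAdmin with its data directory bind-mounted into such a synced folder (the Dropbox path is illustrative; the dpage/pgadmin4 image keeps its config and server definitions under /var/lib/pgadmin, and on Linux the mounted folder may need to be writable by the container's pgadmin user):
docker run -d -p 8080:80 \
    -e PGADMIN_DEFAULT_EMAIL=me@example.com \
    -e PGADMIN_DEFAULT_PASSWORD=change-me \
    -v "$HOME/Dropbox/pgadmin:/var/lib/pgadmin" \
    --name pgadmin dpage/pgadmin4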
That's not something you can do with Docker itself. You can only push images to Docker Hub, and they do not contain information that you added to a container during its execution.
What you could do is use a backup routine to S3, for example, and sync your 'config and connections' between the Docker container running on your home PC and the one at work.

Editing Docker content

I am looking at Docker to share and contain applications. After reading several articles on the subject, I can't figure out what the steps would be to use a Docker container for actual development. Is that even acceptable?
My thought process goes like this:
Create a Dockerfile
Share the Dockerfile
Users A and B download the Dockerfile
Users A and B build the image
Users A and B are able to make changes to their local containers
Users A and B submit changes
From the articles I have read, Docker seems to be only for sharing applications, not for continuous development the way I am thinking. The closest thing to what I describe above is to make changes outside the containers and commit them to a repo outside the containers; the containers then update their local copy of the repo and re-run the application internally, but you would never develop on the container itself.
Using Docker for the development process is not only possible, but handy and convenient in my opinion.
What you might have missed during your study of the docker ecosystem is the concept of volumes.
With volumes you can bind mount a directory of your host (the developer computer) into the container.
You may want to use volumes to share the application data folder, making it possible for the devs to work on their local copies normally, while having their application served by a Docker container.
A link to get started: https://docs.docker.com/engine/admin/volumes/volumes/
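A minimal sketch of that workflow with a bind mount (php:8.1-apache is just an example image; any image that serves your application works the same way):
# run the application image, but serve the developer's local working copy
docker run --rm -p 8080:80 \
    -v "$(pwd):/var/www/html" \
    php:8.1-apache
# edits made in the host directory are visible inside the container immediately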
