Is there a limit to the number of parallel Docker push/pulls you can do?
E.g. if you thread docker pull / push commands so that they are
pulling/pushing different images at the same time, what would be the
upper limit on the number of parallel pushes/pulls?
Or alternatively:
on one terminal you run docker pull ubuntu, on another docker pull httpd,
and so on. What would be the limit Docker would support?
The options are set in the daemon configuration file (on a Linux-based OS it is located at /etc/docker/daemon.json; on Windows at C:\ProgramData\docker\config\daemon.json).
Open /etc/docker/daemon.json (if it doesn't exist, create it).
Add the values for pushes/pulls to set the parallel-operation limits:
{
  "max-concurrent-uploads": 1,
  "max-concurrent-downloads": 1
}
Restart daemon: sudo service docker restart
The docker daemon (dockerd) has two flags:
--max-concurrent-downloads int   Set the max concurrent downloads for each pull (default 3)
--max-concurrent-uploads int     Set the max concurrent uploads for each push (default 5)
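For example, the same limits can be passed straight to the daemon instead of via daemon.json (the values below are only illustrative; in practice you would set the flags in the daemon's service definition rather than start dockerd by hand):
sudo dockerd --max-concurrent-downloads 10 --max-concurrent-uploads 10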
The upper limit will likely depend on the number of open files you permit for the process (ulimit -n). There will be some overhead of other docker file handles, and I expect that each push and pull opens multiple handles, one for the remote connection, and another for the local file storage.
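For instance, you can check the limits that actually apply, assuming a typical Linux install where the daemon process is named dockerd:
# Open-file limit of the running Docker daemon
grep 'open files' /proc/$(pidof dockerd)/limits
# Open-file limit of your current shell
ulimit -n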
To compound the complication of this, each push and pull of an image will open multiple connections, one per layer, up to the concurrent limit. So if you run a dozen concurrent pulls, you may have 50-100 potential layers to pull.
While docker does allow these limits to be increased, there's a practical point where you'll see diminishing, if not negative, returns from opening more concurrent connections. Assuming the bandwidth to the remote registry is limited, more connections will simply split that bandwidth, and docker itself will wait until the very first layer finishes before it starts unpacking the transmission. Also, any aborted docker pull or push loses the partial transmission of a layer, so with more concurrent connections you increase the amount of data you may need to retransmit.
The default limits are well suited for a development environment, and if you find the need to adjust them, I'd recommend measuring the performance improvement before trying to find the max number of concurrent sessions.
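For example, one simple way to measure is to time a representative pull, adjust the limits, remove the local copy, and repeat (ubuntu:latest is just a stand-in for one of your own images):
time docker pull ubuntu:latest   # measure with the current limits
docker rmi ubuntu:latest         # remove the local copy so the next timed pull is a real download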
For anyone using Docker for Windows and WSL2:
You can (and should) set the options on the Settings > Docker Engine tab, which edits the same daemon.json shown above.
[Screenshot: Docker for Windows Docker Engine settings]
Related
The project in which I am working develops a Java service that uses MarkLogic 9 in the backend.
We are running a Jenkins build server that executes (amongst others) several tests in MarkLogic written in XQuery.
For those tests MarkLogic is running in a docker container on the Jenkins host (which is running Ubuntu Linux).
The Jenkins host has 12 GB of RAM and 8 GB of swap configured.
Recently I have noticed that the MarkLogic instance running in the container uses a huge amount of RAM (up to 10 GB).
As there are often other build jobs running in parallel, the Jenkins host starts to swap, sometimes even exhausting the swap space,
so that MarkLogic reports it cannot get more memory.
Obviously, this situation leads to failed builds quite often.
To analyse this further I ran some tests on my PC running Docker for Windows and found that the MarkLogic tests
can be run successfully with 5-6 GB of RAM. The MarkLogic logs show that it sees all of the host memory and wants to use all of it.
But as we have other build processes running on that host this behaviour is not desirable.
My question: is there any possibility to tell the MarkLogic to not use so much memory?
We are preparing the docker image during the build, so we could modify some configuration, but it has to be scripted somehow.
The issue of the container not detecting memory limit correctly has been identified, and should be addressed in a forthcoming release.
In the meantime, you might be able to mitigate the issue by:
changing the group cache sizing from automatic to manual and setting cache sizes appropriate for the allocated resources. There are a variety of ways to set these configs, whether deploying and setting configs from an ml-gradle project, making your own Manage API REST calls (a rough curl sketch is shown below), or programmatically:
admin:group-set-cache-sizing
admin:group-set-compressed-tree-cache-partitions
admin:group-set-compressed-tree-cache-size
admin:group-set-expanded-tree-cache-partitions
admin:group-set-expanded-tree-cache-size
admin:group-set-list-cache-partitions
admin:group-set-list-cache-size
reducing the in-memory limit
The in-memory limit specifies the maximum number of fragments in an in-memory stand. An in-memory stand contains the latest version of any new or changed fragments. Periodically, in-memory stands are written to disk as a new stand in the forest. Also, if a stand accumulates a number of fragments beyond this limit, it is automatically saved to disk by a background thread.
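As a rough sketch of the Manage API route mentioned above (the group name, credentials, port, and exact property names are assumptions to verify against the MarkLogic 9 Management API docs; the sizes, in MB, are placeholders to tune for your host):
# Switch the Default group to manual cache sizing and set explicit cache sizes
curl --anyauth -u admin:admin -X PUT \
  -H "Content-Type: application/json" \
  -d '{"cache-sizing": "manual", "list-cache-size": 512, "expanded-tree-cache-size": 1024, "compressed-tree-cache-size": 256}' \
  http://localhost:8002/manage/v2/groups/Default/properties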
Problem:
I have a couple of Docker images on a hosting server. I start multiple containers from a number of Jenkins jobs. Due to the limited capacity of the host, I'd like to limit the maximum number of containers per image. Setting a limit on the number of Jenkins executors doesn't really solve the problem, since some jobs can spin up 16 containers. It is possible to split them into several threads of parallel execution, but this is still not ideal; I'd like to have one solution for all jobs.
Question #1 (main):
Is it possible to set the maximum limit of containers Docker runs on a single machine to 10, and queue the rest of them?
Question #2:
If there is no such functionality, or there are better options in this case, what is the workaround?
One way is to use Kubernetes, as mentioned above, but this is a very time-consuming route.
A simpler way is to set up a master job that spins up your containers. Your pipeline calls this job, e.g. 16 times to spin up 16 containers. Then set the maximum number of executors on your Jenkins host to, for example, 6. When you kick off your job there will be 1 executor running plus 16 in the queue, 17 in total. Jenkins will start the first 6 and the rest will wait; once any of the running containers is done, the next container is allowed to run.
My workaround is to clean unused containers and images once in a while with a job.
Here it is:
https://gist.github.com/fredericrous/26e51ed936d710364fe1d1ab6572766e
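For reference, a minimal version of such a cleanup job could simply run Docker's built-in prune commands (the 24h filter is just an example retention window):
# Remove stopped containers and unused images older than 24 hours
docker container prune -f --filter "until=24h"
docker image prune -a -f --filter "until=24h"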
In a deployment bash script, I have two hosts:
localhost, the machine that typically builds the Docker images.
$REMOTE_HOST, which is a production web server.
I need to transfer the locally built Docker image to $REMOTE_HOST in the most efficient way (fast, reliable, private, storage-friendly). To date, I have the following command in my deployment script:
docker save $IMAGE_NAME:latest | ssh -i $KEY_FILE -C $REMOTE_HOST docker load
This has the following PROS:
Utilizes "compression-on-the-fly"
Does not store intermediate files on either the source or the destination
Transfers directly (the images may be private), which also reduces upload time and is "greener" in a broader sense.
However, there are also CONS: when transferring larger images you don't know the operation's progress, so you have to wait an unknown but significant amount of time that you can't estimate. I have heard that progress can be tracked with something like rsync --progress,
but rsync transfers files and doesn't play well with my old UNIX-style pipeline. Of course you could docker load from a file, but how can that be avoided?
How can I keep using a pipe to preserve the above advantages? (Or is there another tool to copy a built image to a remote Docker host that shows progress?)
You could invoke pv as part of your pipeline:
docker save $1:latest | pv [options...] | ssh -i $3 -C $2 docker load
pv works like cat, in that it reads from its standard input and writes to its standard output. Except that, like the documentation says,
pv allows a user to see the progress of data through a pipeline, by giving information such as time elapsed, percentage completed (with progress bar), current throughput rate, total data transferred, and ETA.
pv has a number of options to control what kind of progress information it prints. You should read the documentation and choose the output that you want. In order to display a percentage complete or an ETA, you will probably need to supply an expected size for the data transfer using the -s option.
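For example, using the variables from the question (the size reported by docker image inspect only approximates the size of the docker save stream, so treat the percentage and ETA as rough):
SIZE=$(docker image inspect "$IMAGE_NAME:latest" --format '{{.Size}}')   # size in bytes, approximate
docker save "$IMAGE_NAME:latest" \
  | pv -s "$SIZE" -pterb \
  | ssh -i "$KEY_FILE" -C "$REMOTE_HOST" docker load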
We are running our application on a DCOS cluster in Azure Container Service. The Docker image of our Marathon app is approx. 7 GB. I know this is against best practices, but let's keep that debate aside for this question. We pull the latest image on the worker nodes and it takes around 20 minutes; if no running container on a node currently uses the image, it gets deleted from that node by some cleanup routine.
Is there a way to prevent this from happening?
The amount of time to wait before Docker containers are removed can be set using the --docker_remove_delay flag (a Mesos agent option):
--docker_remove_delay=VALUE   The amount of time to wait before removing docker containers (e.g., 3days, 2weeks, etc.) (default: 6hrs)
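For illustration only (the master URL and work dir are placeholders; on a DC/OS install the agent is normally configured through its service configuration rather than launched by hand):
mesos-agent --master=zk://master.example.com:2181/mesos \
            --work_dir=/var/lib/mesos \
            --docker_remove_delay=2weeks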
To prevent accidental deletion (or modification) of a resource, you can create a lock that prevents users from deleting or modifying the resource while the lock is in place (even if they have the permissions to delete/modify it).
For more details, refer to "Lock resources to prevent unexpected changes".
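For example, with the Azure CLI (the resource group name is a placeholder; a lock created at this scope applies to everything in the group):
az lock create --name DoNotDelete --lock-type CanNotDelete --resource-group myResourceGroup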
I am trying to find a way to push hundreds of images to Docker Hub in a single shot. Is there a better, more efficient way to do this?
alpine:1.0.0.0
alpine:2.0.0.0
.
..
...
alpine:100.0.0.0
There are 100 images. I am looking for the best way to push all of them to Docker Hub.
moby issue 9132 seems to indicate that you can push in parallel:
this was included in the 1.10.3 release, but requires a registry 2.3.x.
You would need to:
thread your docker push
change the --max-concurrent-uploads setting of the docker daemon to at least 100, in order to maximize the number of parallel pushes (by default it is limited to 5)
make sure your network upload capacity can handle that kind of parallel queries!
Besides that, parallel push has been requested since 2014 (see issue 7336)
There is PR 458, but...
This change does not address the fundamental problems that are brought up when requesting this feature.
The idea behind multiple push/pull arguments is that they are parallelized, but this simply performs them in sequence. This design provides no advantage over for i in images; do docker push $i; done.
So you still need to script the threading of docker push.
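A minimal sketch of that threading, assuming the tag pattern from the question and a hypothetical repository name (adjust REPO to your Docker Hub repository; -P controls how many docker push processes run at once):
REPO=myuser/alpine   # hypothetical repository on Docker Hub
seq 1 100 | xargs -I{} -P 10 docker push "$REPO:{}.0.0.0"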