Multiple GitLab Runner Docker instances on one host?

I need to configure GitLab Runner to run multiple shared runners in Docker containers on one server (host).
So I registered two runners with gitlab-runner register as shared runners with the same tag.
But there is an issue: only one of them is ever in use, and all other tasks wait in Pending status until the first runner is free. In other words, the second runner instance is never used while the first one is busy.
All tasks have the same tag.
How can I run multiple runners on the same server host?

By default concurrent is 1, so unless you increase it your runner will just use one registration at a time: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
limits how many jobs globally can be run concurrently. The most upper limit of jobs using all defined runners. 0 does not mean unlimited

To utilize all your CPU cores, set concurrent in /etc/gitlab-runner/config.toml (when running as root) or ~/.gitlab-runner/config.toml (when running as non-root) to the number of your CPUs.
You can find the number of CPUs like this: grep -c ^processor /proc/cpuinfo.
In my case the config.toml says concurrent = 8
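If you want to script the change, a minimal sketch could look like this (it assumes the runner runs as root, so the config lives at /etc/gitlab-runner/config.toml):
# set the global concurrent limit to the number of CPU cores
CPUS=$(grep -c ^processor /proc/cpuinfo)
sudo sed -i "s/^concurrent = .*/concurrent = ${CPUS}/" /etc/gitlab-runner/config.toml
# restart the runner service so the new limit is picked up
sudo gitlab-runner restart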
Citations:
Gitlab-Runner advanced config: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
Find CPU cores on Linux: How to obtain the number of CPUs/cores in Linux from the command line?

Related

Is it possible to limit the maximum number of containers per docker image?

Problem:
I have a couple of Docker images on a hosting server. I start multiple containers from a bunch of Jenkins jobs. Due to the limited capabilities of the host, I'd like to limit the maximum number of containers per image. Setting a limit on the number of Jenkins executors doesn't really solve the problem, since some jobs can spin up 16 containers. It is possible to split them into several threads of parallel execution, but this is still not ideal. I'd like to have one solution for all jobs.
Question #1 (main):
Is it possible to set the maximum limit of containers Docker runs on a single machine to 10, and queue the rest of them?
Question #2:
If there is no such functionality, or there are better options in this case, what is the workaround?
One way is to use Kubernetes, as mentioned above. But this is a very time-consuming route.
A simpler way is to set up a master job that spins up your containers. Your pipeline calls this job, e.g. 16 times to spin up 16 containers. Then set the maximum number of executors on your Jenkins host to, for example, 6. When you kick off your job there will be 1 executor running plus 16 in the queue, 17 in total. Jenkins will start the first 6 and make the rest wait; once any of the running containers is done, the next one is allowed to run.
My workaround is to clean unused containers and images once in a while with a job.
Here it is:
https://gist.github.com/fredericrous/26e51ed936d710364fe1d1ab6572766e
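If you don't want to pull in a whole script, a bare-bones alternative is just a couple of prune commands run from cron (this assumes Docker 17.06+ for the until filter):
# remove stopped containers and unused images that are older than a day
docker container prune -f --filter "until=24h"
docker image prune -a -f --filter "until=24h"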

Slow install / upgrade through Helm (for Kubernetes)

Our application consists of circa 20 modules. Each module contains a (Helm) chart with several deployments, services and jobs. Some of those jobs are defined as Helm pre-install and pre-upgrade hooks. Altogether there are probably about 120 YAML files, which eventually result in about 50 running pods.
During development we are running Docker for Windows version 2.0.0.0-beta-1-win75 with Docker 18.09.0-ce-beta1 and Kubernetes 1.10.3. To simplify management of our Kubernetes yaml files we use Helm 2.11.0. Docker for Windows is configured to use 2 CPU cores (of 4) and 8GB RAM (of 24GB).
When creating the application environment for the first time, it takes more than 20 minutes to become available. This seems far too slow; we are probably making an important mistake somewhere. We have tried to improve the (re)start time, but to no avail. Any help or insights to improve the situation would be greatly appreciated.
A simplified version of our startup script:
#!/bin/bash
# Start some infrastructure (the release names below are illustrative)
helm upgrade --force --install infrastructure modules/infrastructure/chart
# Start ~20 modules in parallel
helm upgrade --force --install module01 modules/module01/chart &
[...]
helm upgrade --force --install module20 modules/module20/chart &
# await_modules waits for the backgrounded helm upgrades (essentially a wrapper around `wait`)
await_modules
Executing the same startup script again later to 'restart' the application still takes about 5 minutes. As far as I know, unchanged objects are not modified at all by Kubernetes. Only the circa 40 hooks are run by Helm.
Running a single hook manually with docker run is fast (~3 seconds). Running that same hook through Helm and Kubernetes regularly takes 15 seconds or more.
Some things we have discovered and tried are listed below.
Linux staging environment
Our staging environment consists of Ubuntu with native Docker. Kubernetes is installed through minikube with --vm-driver none.
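That is, the cluster is started with something along the lines of the following (older minikube syntax; run as root so Kubernetes uses the host's Docker daemon directly):
sudo minikube start --vm-driver=none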
Contrary to our local development environment, the staging environment retrieves the application code through a (deprecated) gitRepo volume for almost every deployment and job. Understandably, this only seems to worsen the problem. Starting the environment for the first time takes over 25 minutes; restarting it takes about 20 minutes.
We tried replacing the gitRepo volume with a sidecar container that retrieves the application code as a TAR. Although we have not modified the whole application, initial tests indicate this is not particularly faster than the gitRepo volume.
This situation can probably be improved with an alternative type of volume that enables sharing of code between deployments and jobs. We would rather not introduce more complexity, though, so we have not explored this avenue any further.
Docker run time
Executing a single empty alpine container through docker run alpine echo "test" takes roughly 2 seconds. This seems to be overhead of the setup on Windows. The same command takes less than 0.5 seconds on our Linux staging environment.
Docker volume sharing
Most of the containers - including the hooks - share code with the host through a hostPath. The command docker run -v <host path>:<container path> alpine echo "test" takes 3 seconds to run. Using volumes seems to increase the runtime by approximately 1 second.
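A quick way to reproduce that comparison yourself (the host path below is just a placeholder):
# start-up time without a bind mount
time docker run --rm alpine echo "test"
# start-up time with a bind mount from the host
time docker run --rm -v /c/Users/dev/app:/app alpine echo "test"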
Parallel or sequential
Sequential execution of the commands in the startup script does not improve startup time, nor does it drastically worsen it.
IO bound?
The Windows Task Manager indicates that IO is at 100% when executing the startup script. Our hooks and application code are not IO intensive at all, so the IO load seems to originate from Docker, Kubernetes or Helm. We have tried to find the bottleneck, but were unable to pinpoint the cause.
Reducing IO through ramdisk
To test the premise of being IO bound further, we replaced /var/lib/docker with a ramdisk in our Linux staging environment. Starting the application with this configuration was not significantly faster.
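For anyone who wants to repeat that experiment, a rough sketch of swapping /var/lib/docker for a ramdisk (destructive: existing images and containers become unavailable, and everything is lost on reboot):
sudo systemctl stop docker
sudo mv /var/lib/docker /var/lib/docker.orig   # keep the original data aside
sudo mkdir /var/lib/docker
sudo mount -t tmpfs -o size=8g tmpfs /var/lib/docker
sudo systemctl start docker                    # Docker now writes its data to RAM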
To compare Kubernetes with plain Docker, you need to consider that Kubernetes runs more or less the same Docker command as its final step. Before that happens, many other things take place:
authentication and authorization, creating objects in etcd, locating suitable nodes for the pods, scheduling them, provisioning storage, and more.
Helm itself also adds overhead to the process, depending on the size of the chart.
I recommend reading One year using Kubernetes in production: Lessons learned. The author explains what they achieved by switching to Kubernetes, as well as the differences in overhead:
Cost calculation
Looking at costs, there are two sides to the story. To run Kubernetes, an etcd cluster is required, as well as a master node. While these are not necessarily expensive components to run, this overhead can be relatively expensive when it comes to very small deployments. For these types of deployments, it’s probably best to use a hosted solution such as Google's Container Service.
For larger deployments, it’s easy to save a lot on server costs. The overhead of running etcd and a master node aren’t significant in these deployments. Kubernetes makes it very easy to run many containers on the same hosts, making maximum use of the available resources. This reduces the number of required servers, which directly saves you money. When running Kubernetes sounds great, but the ops side of running such a cluster seems less attractive, there are a number of hosted services to look at, including Cloud RTI, which is what my team is working on.

Best Practices for Cron on Docker

I've been running cron with Docker for some time, but I'm not sure my setup is optimal. I have one cron container that runs about 12 different scripts. I can edit the schedule of the scripts, but in order to deploy a new version of the software (some scripts run for about half a day) I have to create a new container to run some of the scripts while the others finish.
One option I'm considering is running one container per script (the containers would share everything in the image except the crontab). But this would still make it hard to coordinate updates across multiple containers that share some of the same code.
The other alternative I'm considering is running cron on the host machine, where each command is a docker run command. Doing this would let me select the image for the next run through an environment variable in the crontab.
Does anybody have any experience with either of these two solutions? Are there any other solutions that could help?
If you are just running standalone Docker (a single host) and need to run a bunch of cron jobs without thinking too much about their impact on the host, then keeping it simple and running them on the host works just fine.
It makes sense to run them in Docker if you benefit from Docker features like limiting memory and CPU usage (so they can't do anything disruptive). If you also use a log driver that writes container logs to some external logging service, so that you can easily monitor the jobs, that's another good reason to do it. The last (but obvious) advantage is that deploying new software as a Docker image instead of messing around on the host is often a winner.
It's a lot cleaner to build one single image containing all the code you need. You then trigger docker run commands from the host's cron daemon and override the command/entrypoint. The container dies and deletes itself after the job is done (you might need to capture the container output to logs on the host, depending on which logging driver is configured). Try not to pass in config values or parameters you change often, so that your cron setup stays as static as possible. It can get messy if a new image also means you have to edit your cron data on the host.
When you use docker run like this you don't have to worry about updating images while jobs are running. Just make sure you tag them with, for example, latest so that the next job will use the new image.
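A hypothetical crontab entry on the host could then look like this (the image name, script path and resource limits are made up):
# run the nightly report at 02:00 in a throw-away container, pulling the newest :latest first
0 2 * * * docker pull myorg/jobs:latest >/dev/null 2>&1; docker run --rm --memory 512m --cpus 1 myorg/jobs:latest /app/nightly-report.sh >> /var/log/cron-jobs.log 2>&1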
Having 12 containers running in the background, each with its own cron daemon, also wastes some memory, but the worst part is that cron doesn't pass the environment variables of the parent process on to the jobs, so if you are injecting config through env vars you'll have to hack around that mess (write them to disk when the container starts, and so on).
If you are worried about jobs running in parallel, there are plenty of task-scheduling services out there you can use, but that might be overkill for a single standalone Docker host.

Docker run cron jobs in parallel

Using simple server
I was using a simple node (CentOS or Ubuntu) to run my web application, and I also configured some cron jobs there to run scheduled tasks. At that point everything worked.
Using Docker Swarm Cluster
I migrated my application to a Docker Swarm cluster. Now the cron jobs run in multiple containers at the same time, and that is critical for me. I know Docker is working on a new feature called jobs, but I need a solution for now. I would like to know if there is any way to run only one instance of each cron job process.
Blocker
The cron jobs run tasks like:
creating a report about a process,
sending notifications to other services,
updating data in the application.
The cron jobs need to run on the server because they were configured to use internal interfaces and endpoints via the php command.
My Problem
I created multiple instances of the same Docker service to provide availability. All the instances run in a cluster of 3 nodes, and each of them runs its cron jobs at the same time, in parallel. I would like to run only one job per Docker service.
Maybe a solution would be to periodically create a Docker service with restart condition none and 1 replica, or to create a cron container with 1 replica and restart condition any that acts as the scheduler, with a volume attached containing the required cron scripts.
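Something along those lines, I guess (a rough sketch of the second idea; the image and volume names are made up):
# single-replica scheduler service that Swarm keeps alive and that carries the cron scripts
docker service create --name cron-scheduler --replicas 1 --restart-condition any \
  --mount type=volume,source=cron-scripts,target=/etc/crontabs \
  my-registry/cron-runner:latest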
There are multiple options.
Use a locking mechanism, locking over NFS or a database (MySQL, Redis, etc). You execute each job like this: /usr/local/bin/locker /path/to/script args. It may be good for the locker to offer options to either wait for the lock or fail immediately if the lock is not available (blocking vs. non-blocking). That way, if the job is long-running, only the first instance acquires the lock and the others fail. You may want to reuse existing software to simplify the hard job of creating reliable locks; see the flock sketch below.
Use leader election. When running in a swarm, there must be a mechanism to query the list of containers. List only the cron containers and sort them alphabetically. If the current container's id is the first one, allow execution: first=$(get-containers cron | sort | head -n 1); if [[ "$current_id" == "$first" ]]; then ... fi (here get-containers stands for whatever command lists your cron containers).
Run cron outside of the cluster, but use it to trigger jobs within the cluster through the load balancer. The load balancer will pick exactly one container to execute the job. For example: curl -H 'security-key: xxx' http://the.cluster/my-job.
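For the first option, a minimal sketch using flock over a shared mount (the paths are hypothetical, and the shared filesystem must support flock; a database-based lock would follow the same pattern):
# non-blocking: if another replica already holds the lock, skip this run
flock --nonblock /mnt/shared/locks/create-report.lock /usr/local/bin/create-report.sh \
  || echo "another instance holds the lock, skipping"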
I'm sure there are also swarm-specific tools and methods available.

Is there a limit of the number of tasks that can be launched by Mesos on a slave?

I am currently using Mesos + Marathon for the test.
When I launch a lot of tasks with the command ping 8.8.8.8, at some point the slaves cannot launch any more tasks. So I checked the stderr of the sandbox, which shows:
Failed to initialize, pthread_create
I launched the tasks with 0.00001 cpus and 0.00001 mem, so enough resources to launch a task remained on the slaves.
Is there a limit of the number of tasks that can be launched by Mesos on a slave?
My first guess would be that you are hitting a ulimit on your slaves.
Can you try the following:
# Check the maximum number of user processes (threads count against this limit):
$ ulimit -u
1024
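If that limit is what you are hitting, raising it for the user running the Mesos slave should help (a sketch; the value and the mesos user name are just examples):
# temporarily, for the current shell:
$ ulimit -u 32768
# persistently, via /etc/security/limits.conf:
mesos  soft  nproc  32768
mesos  hard  nproc  32768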
Btw: if you just want to launch dummy tasks, I would probably use sleep 3000 or something like that.
Hope this helps
Joerg
