Using a simple server
I was using a single server (CentOS or Ubuntu) to run my web application and had also configured some cron jobs there to run scheduled tasks. At that point, everything worked.
Using a Docker Swarm cluster
I migrated my application to a Docker Swarm cluster. Now the cron jobs run in multiple containers at the same time, and that is critical for me. I know Docker is working on a new feature called jobs, but I need a solution now. I would like to know if there is any way to run only a single instance of each cron job process.
Blocker
The cron jobs run tasks like:
creating reports about processes.
sending notifications to other services.
updating data in the application.
The cron jobs need to run on the server because they were configured to use the application's interfaces and endpoints via the php command.
My Problem
I created multiple instances of the same Docker service to provide availability. All the instances run in a cluster of 3 nodes, and each of them runs its cron jobs at the same time in parallel, while I would like to run just one job per Docker service.
Maybe a solution would be to periodically create a Docker service with restart condition none and replicas 1, or to create a cron container with replicas 1 and restart condition any that acts as the scheduler, attaching a volume with the required cron scripts.
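A rough sketch of that first idea using swarm mode's service flags (the service name, image, volume, and script path are placeholders):

    # One-shot task: a single replica that is never restarted once it exits
    docker service create \
      --name cron-runner \
      --replicas 1 \
      --restart-condition none \
      --mount type=volume,source=cron-scripts,target=/scripts \
      myapp/cron:latest /scripts/run-task.sh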
There are multiple options.
Use a locking mechanism, locking over NFS or a database (MySQL, Redis, etc.). You execute each job like this: /usr/local/bin/locker /path/to/script args. It may be good to give the locker options to either wait for the lock or fail immediately if the lock is not available (blocking vs. non-blocking). That way, if the job is long-running, only the first instance acquires the lock and the others fail. You may want to reuse existing software that simplifies the hard job of creating reliable locks.
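A minimal sketch of such a non-blocking locker built on flock, assuming all replicas mount the same shared volume (the path is a placeholder; flock's behaviour over NFS depends on the NFS version and lock daemon, so treat this as illustrative):

    #!/usr/bin/env bash
    # locker: run a command only if a shared lock can be acquired
    LOCKFILE="/mnt/shared/locks/$(basename "$1").lock"
    mkdir -p "$(dirname "$LOCKFILE")"
    # -n: non-blocking -- fail immediately if another replica holds the lock
    exec flock -n "$LOCKFILE" "$@"

Invoked as /usr/local/bin/locker /path/to/script args, only one replica runs the script; the others exit with a non-zero status.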
Use leader election. When running in a swarm, there must be a mechanism to query the list of containers. List only the cron containers, sort them alphabetically, and allow execution only if the current container's ID is the first one: first=$(get-containers cron | sort | head -n 1); if [[ "$current_id" == "$first" ]]; then ... fi (get-containers is pseudocode here).
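One concrete way to do this inside a swarm service is to use the built-in DNS name tasks.<service>, which resolves to the IPs of all running tasks; the service name cron and the script path below are placeholders:

    #!/usr/bin/env bash
    # Crude leader check: the task whose IP sorts first is the "leader"
    my_ip=$(hostname -i | awk '{print $1}')
    leader_ip=$(getent hosts tasks.cron | awk '{print $1}' | sort | head -n 1)
    if [[ "$my_ip" == "$leader_ip" ]]; then
        exec /path/to/script "$@"   # only the elected replica runs the job
    fi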
Run the cron outside of the cluster, but use it to trigger jobs within the cluster over the load balancer. The load balancer will pick exactly one container to execute the job. For example: curl -H 'security-key: xxx' http://the.cluster/my-job.
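As a sketch, the crontab entry on the external host could look like this (URL, header value, schedule, and log path are placeholders); Swarm's routing mesh hands each request to exactly one healthy task:

    # m h dom mon dow  command
    0 * * * * curl -fsS -H 'security-key: xxx' http://the.cluster/my-job >> /var/log/my-job.log 2>&1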
I'm sure there are Swarm-specific tools and methods available as well.
Related
Please pardon me if I ask a very amateur question, but after reading multiple threads, posts, references, etc., I still do not understand the differences.
My current understanding:
1st method)
A traditional Docker setup is composed of 3 containers:
a scheduler that manages the schedule of the jobs
a queue that manages the queued jobs
a worker that does the work for each queued job
I read about this in this source: https://laravel-news.com/laravel-scheduler-queue-docker
2nd method)
Docker + Apache Airflow composes a single container that does the same as the above 3 containers:
a worker (Airflow: since in Airflow we can set up the scheduler and also the queue)
I watched this tutorial: https://www.youtube.com/watch?v=vvr_WNzEXBE&t=575s
I first learned from these two sources (and others), but I am confused about the following:
Since I can use docker-compose to build all the services, all I need is just 1 container (2nd method) and then set up the scheduler that is already in Airflow to control the workflow, right? That would mean I do not need to create multiple containers as in the 1st method, which separates the tasks into different containers.
If the two are different, then what are the differences? I have tried to find out for days but still could not figure it out. I am sorry, I am new to this subject, so I am still studying it.
Thank you.
I think you are pointing at a multi-node Airflow vs. single-node Airflow comparison. A multi-node Airflow setup will give you more computing power and higher availability for your Apache Airflow instance. You can run everything (webserver, scheduler and worker) on one machine/Docker instance, but if your project grows, you can build a cluster and scale your pipeline.
Actually, an Airflow instance consists of a number of daemons that work together to provide the full functionality of Airflow.
With multiple workers you can pick up and execute more tasks from the queue in parallel/concurrently. On a single machine (depending on your use case and the machine's cores) you can configure this in {AIRFLOW_HOME}/airflow.cfg (for example, workers=6).
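As an illustration of where such knobs live (which ones matter depends on your executor; the values are arbitrary), Airflow settings can be edited in airflow.cfg or overridden through environment variables of the form AIRFLOW__<SECTION>__<KEY>:

    # Placeholder values -- tune to your machine and executor
    export AIRFLOW__CORE__PARALLELISM=6            # max task instances across the whole instance
    export AIRFLOW__CELERY__WORKER_CONCURRENCY=6   # task slots per Celery worker
    export AIRFLOW__WEBSERVER__WORKERS=4           # gunicorn processes that only serve the web UI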
Now, since these daemons are independent of each other, people distribute them over multiple nodes/instances (Docker containers in your case). So, probably, this is what you have seen.
Update:
About the tutorial links you shared:
As you asked in the comment section: the YouTube tutorial you pointed to also uses one Docker container where you run everything; you aren't using multi-node there.
For your first link (about Laravel scheduling), I am not sure, but it seems like it also uses only one container.
How you link multiple Airflow nodes in a multi-node setup:
As an example, you use the same external database instance (MySQL, Postgres) and all your nodes interact with it; similarly, they share the same queue from which they take tasks (for example an external/shared RabbitMQ cluster).
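For illustration, the same idea expressed with Airflow's environment-variable overrides (hosts and credentials are placeholders):

    # Every node points at the same metadata database ([core] section)
    export AIRFLOW__CORE__SQL_ALCHEMY_CONN='postgresql+psycopg2://airflow:secret@db-host:5432/airflow'
    # Every worker uses the same Celery broker and result backend ([celery] section)
    export AIRFLOW__CELERY__BROKER_URL='amqp://airflow:secret@rabbitmq-host:5672//'
    export AIRFLOW__CELERY__RESULT_BACKEND='db+postgresql://airflow:secret@db-host:5432/airflow'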
What the scheduler is for, and how to run it:
The scheduler is the component that actually schedules the DAGs, for example running them weekly/daily/monthly as you have declared. In essence, you have only one scheduler, while workers are the part you add more of. That doesn't mean you can't have more: you may have two webservers, etc., but then you need to handle the port differences and share metadata between them.
To run the scheduler, you just need to run airflow scheduler; once it is up, it will start picking up your DAGs and executing them. For the first run it looks at start_date, and for subsequent runs it uses the schedule_interval you have defined in the DAG (that's why it is called the scheduler).
On my local server, I have a bash script that sets some environment variables, opens a Docker container and executes a command in the container that runs the main program of interest via the Docker image. The bash script actually loops over this main command to fire off one job per (different) input file over the desired number of CPUs on the local server. In other words, the overall run is embarrassingly parallel.
How can I set up a workflow with SQS, Lambda, and CloudWatch to accomplish something similar, spinning up the needed EC2 instances (I would set the specific number)? I get that each bash command in the loop would act as a message sent to SQS, which would then somehow trigger an EC2 instance, but I need some help constructing the overall workflow.
Thanks!
I have a Docker image that needs to be run in an environment where I have no admin privileges, using Slurm 17.11.8 in RHEL. I am using udocker to run the container.
In this container, there are two applications that need to run:
[1] ROS simulation (there is a rosnode that is a TCP client talking to [2])
[2] An executable (TCP server)
So [1] and [2] need to run together, and they share some common files as well. Usually, I run them in separate terminals, but I have no idea how to do this with Slurm.
Possible solutions:
(A) Use two containers of the same image, but their files will be stored locally. I could use volumes instead, but this requires me to change my code significantly and maybe break compatibility when I am not running it as containers (e.g. in Eclipse).
(B) Use a bash script to launch two terminals and run [1] and [2]. Then srun this script.
I am looking at (B) but have no idea how to approach it. I looked into other approaches, but they address sequential execution of multiple processes; I need these to run concurrently.
If it helps, I am using xfce-terminal though I can switch to other terminals such as Gnome, Konsole.
This is a shot in the dark since I don't work with udocker.
In your Slurm submit script (to be submitted with sbatch), you could allocate enough resources for both jobs to run on the same node (so you just need to reference localhost for your client/server). Start your first process in the background with something like:
udocker run container_name container_args &
The & should start the first container in the background.
You would then start the second container:
udocker run 2nd_container_name more_args
This one runs without &, so it stays in the foreground. Ideally, when the second container completes, the script completes and Slurm's cleanup kills the first container. If both containers come to a clean end on their own, you can put a wait at the end of the script.
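A rough sketch of such a submit script, assuming udocker's run subcommand and two containers named server_container and sim_container (all names, resources, and commands are placeholders):

    #!/bin/bash
    #SBATCH --job-name=ros-sim
    #SBATCH --nodes=1
    #SBATCH --ntasks=2
    #SBATCH --cpus-per-task=2

    # [2] TCP server in the background
    udocker run server_container ./run_server &
    server_pid=$!

    # [1] ROS simulation / TCP client in the foreground, talking to localhost
    udocker run sim_container ./run_simulation

    # Wait for the server to finish, or kill it here if it never exits on its own
    wait "$server_pid"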
Caveats:
Depending on how Slurm is configured, processes may not be properly cleaned up at the end. You may need to capture the PID of the first udocker as a variable and kill it before you exit.
The first container may still be processing when the second completes. You may need to add a sleep command at the end of your submission script to give it time to finish.
Any number of other gotchas may exist that you will need to find and hopefully work around.
I've been using Docker with cron for some time, but I'm not sure my setup is optimal. I have one cron container that runs about 12 different scripts. I can edit the schedule of the scripts, but in order to deploy a new version of the software (some scripts run for about half a day) I have to create a new container to run some of the scripts while the others finish.
One option I'm considering is running one container per script (the containers would share everything in the image except the crontab). But this still makes it hard to coordinate updates across multiple containers sharing some of the same code.
The other alternative I'm considering is running cron on the host machine, where each command would be a docker run command. Doing this would let me switch the image used for the next run via an environment variable in the crontab.
Does anybody have any experience with either of these two solutions? Are there any other solutions that could help?
If you are just running Docker standalone (single host) and need to run a bunch of cron jobs without thinking too much about their impact on the host, then keeping it simple and running them on the host works just fine.
It would make sense to run them in Docker if you benefit from Docker features like limiting memory and CPU usage (so they can't do anything disruptive). If you also use a log driver that writes container logs to some external logging service so you can easily monitor the jobs, that's another good reason to do it. The last (but obvious) advantage is that deploying new software as a Docker image, instead of messing around on the host, is often a winner.
It's a lot cleaner to build one single image containing all the code you need. Then you trigger docker run commands from the host's cron daemon and override the command/entrypoint. The container then exits after the job is done and, if you run it with --rm, deletes itself (you might need to capture the container output to logs on the host, depending on which logging driver is configured). Try not to pass in config values or parameters you change often, so you keep your cron setup as static as possible; it gets messy if a new image also means you have to edit your cron data on the host.
When you use docker run like this, you don't have to worry about updating images while jobs are running. Just make sure you use a stable tag, for example latest, so that the next job picks up the new image.
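For example, a host crontab entry along those lines might look like this (image name, schedule, resource limits, and paths are placeholders):

    # Pull the current "latest", run one job in a throwaway container, log to the host
    15 2 * * * docker pull myorg/jobs:latest >/dev/null 2>&1 && docker run --rm --memory=512m --cpus=1 myorg/jobs:latest /app/scripts/nightly-report.sh >> /var/log/cron-jobs/nightly-report.log 2>&1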
Having 12 containers running in the background with their own cron daemon also wastes some memory, but the worst part is that cron doesn't use the environment variables from the parent process, so if you are injecting config with env vars you'll have to hack around that mess (write them to disk when the container starts, and so on).
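A common version of that workaround is an entrypoint that dumps the environment to a file which every cron job then sources; a rough sketch (the paths, quoting, and cron invocation are Debian/Ubuntu-flavoured assumptions):

    #!/usr/bin/env bash
    # Write the container's environment where cron jobs can pick it up
    # (naive quoting -- fine for simple values, not for ones containing quotes)
    printenv | sed 's/^/export /' > /etc/container_env.sh
    # Each crontab entry then starts with: . /etc/container_env.sh && ...
    exec cron -f   # run the cron daemon in the foreground as PID 1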
If you worry about jobs running in parallel, there are plenty of task scheduling services out there you can use, but that might be overkill for a single standalone Docker host.
My problem is that I have a dedicated server, but its resources are still limited: IO, memory, CPU, etc.
I need to run a lot of jobs every day. Some jobs are IO-intensive, some are computation-intensive. Is there a way to monitor the current status and decide whether or not to start a new job from my job pool?
For example, when it knows the currently running jobs are IO-intensive, it could launch a job that does not rely much on IO. Or it could pick a running job that uses a lot of disk IO, stop it, and re-schedule it for later.
I came up with the idea of using Docker, since it can monitor processes, but I do not know of such a scheduler built on top of Docker.
Thanks
You can check the docker stats command in order to get basic metrics on what is running in the containers managed by a docker daemon.
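For example, a one-shot snapshot of per-container CPU, memory, and block IO (the format string is just one possibility):

    docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.BlockIO}}"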
You cannot really assign a job to a node depending on its dynamic behavior. That would mean knowing in advance what type of resources a job will use, which is not something Docker describes at all.
Docker provides a way to label nodes, which enables Swarm filters; a cluster manager like Swarm can then select the right node based on the criteria represented by a label.
But Docker doesn't know about the "job" about to be launched.
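For completeness, in current swarm mode the label-and-constraint idea looks roughly like this (the type=io label, node name, service name, and image are made up for illustration):

    docker node update --label-add type=io worker-node-1
    docker service create --name heavy-io-job \
      --constraint 'node.labels.type==io' \
      myorg/jobs:latest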
Depending on the Docker version you're on, you have a number of options for production. You can use the native Docker Swarm (it just went GA in v1.9), you can give the more mature Kubernetes a try or HashiCorp's Nomad (early days), and there's of course Apache Mesos + Marathon. See also this comparison for more info on the topic.