Setting Up Simple Workflow with SQS and Lambda - amazon-sqs

On my local server, I have a bash script that sets some environment variables, opens a Docker container and executes a command in the container that runs the main program of interest via the Docker image. The bash script actually loops over this main command to fire off one job per (different) input file over the desired number of CPUs on the local server. In other words, the overall run is embarrassingly parallel.
How can I set up a workflow with SQS, Lamba, Cloudwatch to accomplish something similar to spin up the needed EC2 instances (I would set the specific number)? I get that each bash command in the loop would act as a message that is sent to SQS and then in some way would trigger an EC2 instance, but I need some help constructing the overall workflow.
Thanks!

Related

Using SLURM to run TCP client, server

I have a Docker image that needs to be run in an environment where I have no admin privileges, using Slurm 17.11.8 in RHEL. I am using udocker to run the container.
In this container, there are two applications that needs to run:
[1] ROS simulation (there is a rosnode that is a TCP client talking to [2])
[2] An executable (TCP server)
So [1] and [2] needs to run together and they shared some common files as well. Usually, I run them in separate terminals. But I have no idea how to do this with SLURM.
Possible Solution:
(A) Use two containers of the same image, but their files will be stored locally. Could use volumes instead. But this requires me to change my code significantly and maybe break compatibility when I am not running it as containers (e.g in Eclipse).
(B) Use a bash script to launch two terminals and run [1] and [2]. Then srun this script.
I am looking at (B) but have no idea how to approach it. I looked into other approaches but they address sequential executions of multiple processes. I need these to be concurrent.
If it helps, I am using xfce-terminal though I can switch to other terminals such as Gnome, Konsole.
This is a shot in the dark since I don't work with udocker.
In your slurm submit script, to be submitted with sbatch, you could allocate enough resources for both jobs to run on the same node(so you just need to reference localhost for your client/server). Start your first process in the background with something like:
udocker container_name container_args &
The & should start the first container in the background.
You would then start the second container:
udocker 2nd_container_name more_args
This would run without & to keep the process in the foreground. Ideally, when the second container completes the script would complete and slurm cleanup would kill the first container. If both containers will come to an end cleanly you can put a wait at the end of the script.
Caveats:
Depending on how Slurm is configured, processes may not be properly cleaned up at the end. You may need to capture the PID of the first udocker as a variable and kill it before you exit.
The first container may still be processing when the second completes. You may need to add a sleep command at the end of your submission script to give it time to finish.
Any number of other gotchas may exist that you will need to find and hopefully work around.

Port allocation when running build job in Jenkins

My project is structured in such a way that the build job in Jenkins is triggered from a push to Git. As part of my application logic, I spin up kafka and elastic search instances to be used in my test cases downstream.
The issue I have right now is, when a developer pushes his changes to Git, it triggers a build in Jenkins which in turn runs our code and spawns kafka broker in localhost:9092 and elastic search in localhost:9200.
When another developer working on some other change simultaneously, pushes his code, it triggers the build job again and tries to spin up another instance of kafka/elastic search but fails with the exception “Port already in use”.
I am looking at options on how to handle this scenario.
Will running these instances inside of docker container help to some extent? How do I handle the port issue in that case?
Yes dockerizing these instances can indeed help as you can spawn them multiple times.
You could create a docker container per component including your application and then let them talk to each other by linking them or using docker-compose
That way you would not have to expose the ports to the "outside" world but keep it internal within the docker environment.
That way you would not have the “Port already in use”. The only problem is memory in that case. e.g. if 100 pushes are done to the git repo, you might run out of memory...

Best Practices for Cron on Docker

I've transitioned to using docker with cron for some time but I'm not sure my setup is optimal. I have one cron container that runs about 12 different scripts. I can edit the schedule of the scripts but in order to deploy a new version of the software running (some scripts which run for about 1/2 day) I have to create a new container to run some of the scripts while others finish.
I'm considering either running one container per script (the containers will share everything in the image but the crontab). But this will still make it hard to coordinate updates to multiple containers sharing some of the same code.
The other alternative I'm considering is running cron on the host machine and each command would be a docker run command. Doing this would let me update the next run image by using an environment variable in the crontab.
Does anybody have any experience with either of these two solutions? Are there any other solutions that could help?
If you are just running docker standalone (single host) and need to run a bunch of cron jobs without thinking too much about their impact on the host, then making it simple running them on the host works just fine.
It would make sense to run them in docker if you benefit from docker features like limiting memory and cpu usage (so they don't do anything disruptive). If you also use a log driver that writes container logs to some external logging service so you can easily monitor the jobs.. then that's another good reason to do it. The last (but obvious) advantage is that deploying new software using a docker image instead of messing around on the host is often a winner.
It's a lot cleaner to make one single image containing all the code you need. Then you trigger docker run commands from the host's cron daemon and override the command/entrypoint. The container will then die and delete itself after the job is done (you might need to capture the container output to logs on the host depending on what logging driver is configured). Try not to send in config values or parameters you change often so you keep your cron setup as static as possible. It can get messy if a new image also means you have to edit your cron data on the host.
When you use docker run like this you don't have to worry when updating images while jobs are running. Just make sure you tag them with for example latest so that the next job will use the new image.
Having 12 containers running in the background with their own cron daemon also wastes some memory, but the worst part is that cron doesn't use the environment variables from the parent process, so if you are injecting config with env vars you'll have to hack around that mess (write them do disk when the container starts and such).
If you worry about jobs running parallel there are tons of task scheduling services out there you can use, but that might be overkill for a single docker standalone host.

Docker run cron jobs in parallel

Using simple server
I was using a simple node (centos or ubuntu) to run my web application and also configured some cron jobs there to run schedule tasks. In that moment everything worked.
Using Docker Swarm Cluster
I migrated my application to Docker Swarm cluster. Now the crons are running in multiple containers at same time and that is critical for me. I know Docker is working on new feature called jobs but I need a solution for now. I will like to know if there is any way to just run one kind of cron job process.
Blocker
The crons are running tasks like:
create report about process.
send notification to another services.
updating data in the application.
The crons need to be run on the server because were configured to use interfaces and endpoint using php command.
My Problem
I created multiple instance of the same docker service to provide availability. All the instances are running in a cluster of 3 nodes and each of them are running its cron jobs at same time in parallel and I will like to run just one job per docker service.
Maybe it would be a solution to create periodically a docker service with restart condition none and replicas 1 or create a cron container with replicas 1 and restart condition any, it would be the scheduler and attach a volume with the required cron scripts.
There are multiple options.
Use a locking mechanism locking over NFS, or a database (MySQL, Redis, etc). You execute each job like this: /usr/local/bin/locker /path/to/script args. It may good to provide the locker options to wait for the lock or fail immediately if the lock is not available (blocking, non-blocking). Therefore, if the job is long-running, only the first one will acquire the lock, and others will fail. You may want to reuse some existing software simplifying the hard job of creating reliable locks.
Use a leader selection. When running in a swarm, there must be a mechanism to query the list of containers. List only cron containers, and sort them alphabetically. If the current container's id is the first one, then allow executing. first=$(get-containers cron | sort | head -n 1); if [[ "$current-id" == "$first" ]]; then ... fi
Run the cron outside of the cluster but use it to trigger jobs within the cluster over the load balancer. The load balancer would pick exactly one container to execute the job. For example curl -H 'security-key: xxx' HTTP://the.cluster/my-job.
I'm sure there are swarm-specific tools/methods available. A link.

Docker for Jobs

I am trying to understand if this is a valid use case for Docker.
Suppose I am creating a python application, and the user submits a job of a specific type. I would like to then use docker run programmatically to start a docker image that is designed to process the job. The docker image would need to be able to receive the job, and then exit sending me a message that the job is successful.
Does this make sense? How could this be done using python? How can I observe docker containers and their status?
Or would it make more sense for the docker image to simply run on a loop, looking for jobs from a shared volume?
It's not totally crazy idea but, you most likely would be better off having only one container which contains a server, that listens for requests to run a "job", and then runs the relevant R code. No need to start a container for each request if the image you're running will be the same every time, instead you just send an HTTP request.
If you really want to manipulate Docker with Python you can use docker-py. Docker Compose is based on this library.

Resources