Single cron job across multiple AWS EC2 images - ruby-on-rails

We have a Ruby on Rails application running on EC2 with autoscaling enabled, and we use the whenever gem to manage cron. New instances are created automatically from an image of the main instance during traffic spikes and dropped when traffic is low. But this also copies the cron jobs to the newly created instances.
We have a specific requirement to limit cron to a single instance.
I found a gem which looks like it handles this specific requirement, but I am skeptical about it because it is built for Elastic Beanstalk and is no longer maintained.

As a workaround, you can add a condition to the cron job so that it only executes on a single instance elected from your autoscaling group, e.g. only the oldest instance, or only the instance with the "lowest" instance ID, or whatever condition you like.
You can achieve this by having your instances call the AWS API.
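The election rule described above can be sketched as follows. This is only a minimal illustration, assuming each instance has already fetched the list of group members from the AWS API; the `InstanceInfo` shape, the function names, and the instance IDs are hypothetical and no AWS call is made here.

```typescript
// Minimal description of an instance. In practice these fields would come
// from the AWS API; this sketch does not call it.
interface InstanceInfo {
  instanceId: string;
  launchTime: Date;
}

// Elect the oldest instance; ties on launch time go to the lowest instance ID.
function electCronInstance(instances: InstanceInfo[]): InstanceInfo | undefined {
  const sorted = instances.slice().sort((a, b) => {
    const byAge = a.launchTime.getTime() - b.launchTime.getTime();
    if (byAge !== 0) return byAge;
    if (a.instanceId < b.instanceId) return -1;
    return a.instanceId > b.instanceId ? 1 : 0;
  });
  return sorted[0];
}

// True only on the elected instance, so the cron job can exit early elsewhere.
function shouldRunCron(myInstanceId: string, instances: InstanceInfo[]): boolean {
  const leader = electCronInstance(instances);
  return leader !== undefined && leader.instanceId === myInstanceId;
}

// Hypothetical group of three instances.
const group: InstanceInfo[] = [
  { instanceId: "i-0bbb", launchTime: new Date("2021-03-01T10:00:00Z") },
  { instanceId: "i-0aaa", launchTime: new Date("2021-03-01T09:00:00Z") },
  { instanceId: "i-0ccc", launchTime: new Date("2021-03-01T09:00:00Z") },
];
console.log(shouldRunCron("i-0aaa", group)); // true
console.log(shouldRunCron("i-0bbb", group)); // false
```

Every instance evaluates the same rule on the same list, so they all agree on the winner without talking to each other. Note that the elected instance can change when the group scales in.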
As a more proper solution, you could perhaps use a single scheduled Lambda function that accesses your instances. This is now possible as per this page.

The best option is to set scale-in protection. It prevents your instance from being terminated during scale-in events.
You can find more information on the AWS blog: https://aws.amazon.com/blogs/aws/new-instance-protection-for-auto-scaling/

Related

AWS ECS restrict to only one container instance

I want only one instance of my container to run at any given time. For instance, say my ECS container is writing data to an EFS file system. The application I am running is built in such a way that multiple instances cannot write to this data at the same time. So is there a way to make sure that ECS never starts more than one instance? I am worried that while one container is being torn down and another stood up, two containers may end up running simultaneously.
I was also thinking about falling back to EC2 so that I could meet this requirement, but I wanted to try this out in ECS first.
I have tried setting the desired instances to 1, but I am worried that this will not work.
Just set min, desired and maximum number of tasks to 1.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-configure-auto-scaling.html#:~:text=To%20configure%20basic%20Service%20Auto%20Scaling%20parameters

Prevent multiple cron running in nest.js on docker

In Docker we have used deploy: replicas: 3 for our microservice. We have some cron jobs, and the problem is that every cron job gets called 3 times, once per replica, which is not what we want. We want each job to run only once. Sample of a cron job in Nest.js:
@Cron(CronExpression.EVERY_5_MINUTES)
async runBiEventProcessor() {
  const calculationDate = new Date()
  Logger.log(`Bi Event Processor started at ${calculationDate}`)
  // ...
}
How can I run this cron only once without changing the replicas to 1?
This is quite a generic problem whenever a cron or background job is part of an application that has multiple instances running concurrently.
There are multiple ways to deal with this kind of scenario. The following are some workarounds if you don't have a concrete solution:
1. Create a separate service only for the background processing and ensure only one instance of it is running at a time.
2. Expose the cron job as an API and trigger the API to start background processing. In this scenario, the load balancer hands the request to only one instance, which ensures that only one instance handles the job. You will still need an external entity to hit the API, which can be in-house or third-party.
3. Use the repeatable jobs feature from Bull Queue or any other tool or library that provides similar features. Bull will hand over the job to any active processor, which ensures the job is processed only once, by a single active processor. Nest.js has a wrapper for it. Read more about the Bull queue repeatable job here.
4. Implement a custom locking mechanism. It is not as difficult as it sounds. Many schedulers in other frameworks work on similar principles to handle concurrency.
   - If you are using an RDBMS, make use of transactions and locking. Create cron records in the database. The first cron run to arrive acquires the lock and processes the job. Other concurrent runs will either fail or time out because they cannot acquire the lock. You will need to handle a few edge cases in this approach to make it bug-free.
   - If you are using MongoDB or any similar database that supports a TTL (time-to-live) setting and unique indexes, insert a document in which one of the fields has a unique constraint. Another job will then be unable to insert a second document, because the insert fails on the database-level unique constraint. Also put a TTL index on the document, so that it is deleted after a configured time.
These are workarounds if you don't have any other concrete options.
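To make the locking idea concrete, here is a rough sketch of the "unique key plus TTL" approach. It is only an illustration under assumptions: `CronLockStore` is a hypothetical in-memory stand-in for a shared database table or collection with a unique index, and `tickKey` and `runOnce` are made-up helper names. In a real setup the atomicity would come from the database's unique constraint, not from this class.

```typescript
// In-memory stand-in for a shared table/collection with a unique index on
// the job key and a TTL. A real deployment would use the shared database.
class CronLockStore {
  private locks = new Map<string, number>(); // jobKey -> expiry (ms since epoch)

  // Returns true if this caller acquired the lock, false if an unexpired lock
  // already exists (the analogue of a unique-constraint violation).
  // Expired entries are simply overwritten here; a TTL index would delete them.
  tryAcquire(jobKey: string, ttlMs: number, nowMs: number): boolean {
    const expiresAt = this.locks.get(jobKey);
    if (expiresAt !== undefined && expiresAt > nowMs) return false;
    this.locks.set(jobKey, nowMs + ttlMs);
    return true;
  }
}

// Build a key that is identical on every replica for the same scheduled tick.
function tickKey(jobName: string, nowMs: number, intervalMs: number): string {
  return `${jobName}:${Math.floor(nowMs / intervalMs)}`;
}

// Run `job` only if this replica wins the lock for the current tick.
function runOnce(
  store: CronLockStore,
  jobName: string,
  nowMs: number,
  intervalMs: number,
  job: () => void,
): boolean {
  const key = tickKey(jobName, nowMs, intervalMs);
  if (!store.tryAcquire(key, intervalMs, nowMs)) return false;
  job();
  return true;
}

// Three replicas fire the same 5-minute cron a few milliseconds apart.
const store = new CronLockStore();
const FIVE_MINUTES = 5 * 60 * 1000;
let runs = 0;
for (const firedAt of [600000, 600050, 600120]) {
  runOnce(store, "bi-event-processor", firedAt, FIVE_MINUTES, () => {
    runs += 1;
  });
}
console.log(runs); // 1
```

Keying the lock on the tick number rather than on the job name alone means a crashed replica cannot block the next scheduled run.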
There are quite a few options for solving this, but I would suggest creating a separate NestJS microservice (or a plain Node.js process) that runs only the cron job and stores its result in a shared database, for example Redis.
The microservice that runs the cron job does not expose anything; it only starts your cron job:
import { NestFactory } from '@nestjs/core';
import { WorkerModule } from './worker.module'; // assumed path

async function bootstrap() {
  // No listen() call, so no HTTP port is opened
  const app = await NestFactory.create(WorkerModule);
  await app.init();
}
bootstrap();
Your WorkerModule imports the scheduler and configures it there. The cron job can write its result to a shared database such as Redis.
You can still run 3 replicas of the main service; because the cron jobs are registered only in the worker, none of the replicas runs them.

ECS Fargate - Is it possible to create container instances dynamically?

I am working on a project where I need to create multiple container instances dynamically, based on a count received from an AWS Lambda function. Each container will execute its own task. I have done a lot of research but I am still not sure how to achieve this. Also, how do I delete the container instance when the task execution is completed?
You're describing the use case AWS Batch has been built for. It essentially allows you to submit tasks that are being processed in Docker Containers and manages the lifecycle of those containers for you. Since pre:invent 2020 it also supports Fargate.
An alternative would be using a Step Function that processes the output of the Lambda function and dynamically creates ECS tasks for it. These are tasks without a service, so they just terminate when they're done processing. Depending on the number of jobs you have, I'd prefer AWS Batch.
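If you go the route of starting ECS tasks yourself, one detail to plan for is that a single RunTask call accepts a `count` of at most 10, so a larger count has to be split across several calls. A small sketch of that split is below. The function name `planRunTaskCalls` is made up for illustration, and the code only computes the per-call counts; it does not call AWS.

```typescript
// Split the requested number of tasks into per-call counts.
// `maxPerCall` defaults to 10, the documented upper bound for RunTask `count`.
function planRunTaskCalls(total: number, maxPerCall: number = 10): number[] {
  if (!Number.isInteger(total) || total < 0) {
    throw new RangeError("total must be a non-negative integer");
  }
  if (!Number.isInteger(maxPerCall) || maxPerCall < 1) {
    throw new RangeError("maxPerCall must be a positive integer");
  }
  const calls: number[] = [];
  for (let remaining = total; remaining > 0; remaining -= maxPerCall) {
    calls.push(Math.min(remaining, maxPerCall));
  }
  return calls;
}

console.log(planRunTaskCalls(23)); // [ 10, 10, 3 ]
```

Each entry in the result would become the `count` of one RunTask request. With AWS Batch you would not need this, since you submit jobs and it manages the containers for you.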

Is it possible to assign worker resources to dask distributed worker after creation?

As per title, if I am creating workers via helm or kubernetes, is it possible to assign "worker resources" (https://distributed.readthedocs.io/en/latest/resources.html#worker-resources) after workers have been created?
The use case is tasks that hit a database: I would like to limit the number of processes able to hit the database in a given run, without limiting the total size of the cluster.
As of 2019-04-09 there is no standard way to do this. You've found the Worker.set_resources method, which is reasonable to use. Eventually I would also expect Worker plugins to handle this, but they aren't implemented.
For your application of controlling access to a database, it sounds like what you're really after is a semaphore. You might help build one (it's actually decently straightforward given the current Lock implementation), or you could use a Dask Queue to simulate one.
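To illustrate what a semaphore buys you here, below is a minimal counting semaphore. Note that this is not Dask code and not Dask's API: it is a language-neutral sketch written in TypeScript, and the names `Semaphore` and `withPermit` are hypothetical. The point is only that at most `permits` tasks hold the resource (the database) at once, regardless of how many workers exist.

```typescript
// Counting semaphore: at most `permits` holders at a time; the rest wait
// in FIFO order until a holder releases.
class Semaphore {
  private permits: number;
  private waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.permits = permits;
  }

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits -= 1;
      return;
    }
    // No permit available: park until release() wakes this waiter.
    await new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit directly to the next waiter
    } else {
      this.permits += 1;
    }
  }
}

// Run `task` while holding a permit, releasing it even if the task throws.
async function withPermit<T>(sem: Semaphore, task: () => Promise<T>): Promise<T> {
  await sem.acquire();
  try {
    return await task();
  } finally {
    sem.release();
  }
}
```

For example, wrapping every database-touching task in `withPermit(new Semaphore(2), ...)` (sharing one semaphore) would let any number of tasks be scheduled while only two touch the database at a time.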

How to create new EC2 instance in AutoScaling group?

I want to create an EC2 instance in an autoscaling group. I want to do it manually using the Ruby SDK. I haven't found any parameter for passing the name of the AutoScaling group in the amazon-ec2 docs. Can anybody tell me how to do it?
You can't manually create an instance other than by updating the autoscaling group and increasing the desired number of instances by 1. Specific instances can be terminated with TerminateInstanceInAutoScalingGroup.
You can also call ExecutePolicy to run a pre-existing autoscaling policy. I don't think you can trigger a CloudWatch alarm from the API, other than by injecting enough metric data to trigger the alarm, which you might not be able to do for non-custom metrics.

Resources