In Docker we have used deploy: replicas: 3 for our microservice. We have some cron jobs, and the problem is that every cron job gets called 3 times, which is not what we want. We want each job to run only once. Sample cron in NestJS:
@Cron(CronExpression.EVERY_5_MINUTES)
async runBiEventProcessor() {
  const calculationDate = new Date();
  Logger.log(`Bi Event Processor started at ${calculationDate}`);
}
How can I run this cron only once without changing the replicas to 1?
This is a fairly generic problem that arises whenever a cron or background job is part of an application running multiple instances concurrently.
There are multiple ways to deal with this kind of scenario. The following are some workarounds if you don't have a concrete solution:
Create a separate service only for the background processing and ensure only one instance is running at a time.
Expose the cron job as an API and trigger the API to start background processing. In this scenario, the load balancer will hand over the request to only one instance. This approach will ensure that only one instance will handle the job. You will still need an external entity to hit the API, which can be in-house or third-party.
Use the repeatable jobs feature from Bull Queue, or any other tool or library that provides similar functionality.
Bull hands each scheduled job over to a single active processor, which ensures the job is processed only once.
NestJS has a wrapper for it (@nestjs/bull). Read more about Bull queue repeatable jobs here; a sketch follows.
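A minimal sketch of a repeatable job with @nestjs/bull, assuming a queue named 'bi-events' that is registered elsewhere via BullModule.registerQueue with your Redis connection (all names here are illustrative):

import { InjectQueue, Process, Processor } from '@nestjs/bull';
import { Injectable, Logger, OnModuleInit } from '@nestjs/common';
import { Job, Queue } from 'bull';

@Injectable()
export class BiEventScheduler implements OnModuleInit {
  constructor(@InjectQueue('bi-events') private readonly queue: Queue) {}

  async onModuleInit() {
    // Safe to call from every replica: Bull stores the repeatable job once,
    // keyed by its name and repeat options, so it is never duplicated.
    await this.queue.add(
      'run-bi-event-processor',
      {},
      { repeat: { cron: '*/5 * * * *' } },
    );
  }
}

@Processor('bi-events')
export class BiEventProcessor {
  @Process('run-bi-event-processor')
  async handle(job: Job): Promise<void> {
    // Only one active processor picks up each scheduled run.
    Logger.log(`Bi Event Processor started at ${new Date()}`);
  }
}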
Implement a custom locking mechanism
It is not as difficult as it sounds. Many schedulers in other frameworks handle concurrency on similar principles.
If you are using an RDBMS, make use of transactions and locking. Create cron records in the database and acquire a lock as soon as the first cron enters and starts processing. Other concurrent jobs will either fail or time out, as they will not be able to acquire the lock. You will need to handle a few cases in this approach to make it bug-free and flawless; see the sketch below.
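One way to sketch this idea with PostgreSQL is an advisory lock, a variation on the cron-record approach above, using the 'pg' driver (the lock key 42 is an arbitrary job identifier, and connection settings are assumed to come from the standard PG* environment variables):

import { Client } from 'pg';

async function runExclusively(task: () => Promise<void>): Promise<void> {
  const client = new Client(); // reads PG* environment variables
  await client.connect();
  try {
    // pg_try_advisory_lock returns immediately: true for the single session
    // that acquired the lock, false for every other replica.
    const { rows } = await client.query('SELECT pg_try_advisory_lock(42) AS locked');
    if (rows[0].locked) {
      try {
        await task();
      } finally {
        await client.query('SELECT pg_advisory_unlock(42)');
      }
    }
  } finally {
    await client.end();
  }
}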
If you are using MongoDB, or any similar database that supports TTL (time-to-live) and unique indexes, insert a document where one of the fields has a unique constraint. That constraint ensures another job cannot insert a second document, as its insert will fail at the database level. Also put a TTL index on the document, so that it is deleted after a configured time; a sketch follows.
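A minimal sketch of that unique-index-plus-TTL lock with the official 'mongodb' driver; the database and collection names and the 300-second expiry are illustrative:

import { MongoClient } from 'mongodb';

async function tryAcquireLock(client: MongoClient, jobName: string): Promise<boolean> {
  const locks = client.db('app').collection('cron_locks');

  // One-time setup (idempotent): a unique index so only one lock document per
  // job can exist, plus a TTL index so a crashed worker's lock expires on its own.
  await locks.createIndex({ jobName: 1 }, { unique: true });
  await locks.createIndex({ createdAt: 1 }, { expireAfterSeconds: 300 });

  try {
    // Only the first replica's insert succeeds; the rest hit the unique
    // constraint (duplicate key error, code 11000) and skip this run.
    await locks.insertOne({ jobName, createdAt: new Date() });
    return true;
  } catch (err) {
    if ((err as { code?: number }).code === 11000) return false;
    throw err;
  }
}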
These are workarounds if you don't have any other concrete options.
There are quite a few options here for how you could solve this, but I would suggest creating a NestJS microservice (or plain Node.js app) that runs only the cron job and stores its result in a shared DB, for example Redis.
Your microservice that runs the cron job does not expose anything; it only starts your cron job:
import { NestFactory } from '@nestjs/core';

const app = await NestFactory.create(WorkerModule);
await app.init();
Your WorkerModule imports the scheduler and configures it there. You can write the result of the cron job to a shared DB like Redis.
Now you can still use 3 replicas but avoid registering the cron jobs in all of them.
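A minimal sketch of such a worker entry point, assuming @nestjs/schedule; BiEventProcessorService is an illustrative name for the service holding the @Cron method:

import { Injectable, Logger, Module } from '@nestjs/common';
import { NestFactory } from '@nestjs/core';
import { Cron, CronExpression, ScheduleModule } from '@nestjs/schedule';

@Injectable()
class BiEventProcessorService {
  @Cron(CronExpression.EVERY_5_MINUTES)
  async runBiEventProcessor() {
    // ...do the work and write the result to the shared Redis instance
    Logger.log(`Bi Event Processor started at ${new Date()}`);
  }
}

@Module({
  imports: [ScheduleModule.forRoot()],
  providers: [BiEventProcessorService],
})
class WorkerModule {}

async function bootstrap() {
  // No app.listen(): this process serves no HTTP traffic, it only runs the scheduler.
  const app = await NestFactory.create(WorkerModule);
  await app.init();
}
bootstrap();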
I am working on a project where I need to create multiple container instances dynamically, based on a count received from an AWS Lambda function. Each container will execute its own task. I have done a lot of research but am still not sure how to achieve this. Also, how do I delete a container instance when its task has completed?
You're describing the use case AWS Batch was built for. It essentially allows you to submit tasks that are processed in Docker containers and manages the lifecycle of those containers for you. Since pre:Invent 2020 it also supports Fargate.
An alternative would be using a Step Function that processes the output of the Lambda function and dynamically creates ECS tasks for it. Tasks without a service, so they just terminate when they're done processing. Depending on the number of jobs you have, I'd prefer AWS Batch.
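For the AWS Batch route, a minimal sketch of a Lambda handler that submits one Batch job per task, assuming AWS SDK v3 and pre-created job queue and job definition (both names below are hypothetical):

import { BatchClient, SubmitJobCommand } from '@aws-sdk/client-batch';

const batch = new BatchClient({});

export const handler = async (event: { count: number }) => {
  for (let i = 0; i < event.count; i++) {
    await batch.send(
      new SubmitJobCommand({
        jobName: `dynamic-task-${i}`,
        jobQueue: 'my-job-queue',           // hypothetical queue name
        jobDefinition: 'my-job-definition', // hypothetical definition name
        containerOverrides: {
          environment: [{ name: 'TASK_INDEX', value: String(i) }],
        },
      }),
    );
  }
  // Batch starts a container for each job and tears it down when the job
  // exits, which covers the "delete the container when done" requirement.
};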
I have inherited a system that consists of a couple daemons that asynchronously process messages. I am trying to find a clean way to introduce integration testing into this system with minimal impact/risk on the existing programs. Here is a very simplified overview of their responsibilities:
Process 1 polls a queue for messages, and inserts a row into a DB for each one it dequeues.
Process 2 polls the DB for rows inserted by Process 1, does some calculations, and then deposits a file into a directory on the host and sends an email.
These processes are quite old and complex, and I am strongly inclined to avoid modifying them in any way. What I would like to do is put each of them in a container, and also stand up the dependencies (queue, DB, mail server) in other containers. This part is straightforward, but what I'm unsure about is the best way to orchestrate these tests. Since these processes consume and generate output asynchronously I will need to poll or wait for the expected outcome (mail sent, file created).
Normally I would just write a series of tests in a single test suite of my language of choice (Java, Go, etc), and make the setUp / tearDown hooks responsible for resetting the environment to the desired state. But because these processes have a lot of internal state I am afraid I cannot successfully "clean up" properly after each distinct test. This would be a problem if, for example, one test failed to generate the desired output in a specific period of time so I marked it as failed, but a subsequent test falsely got marked as passed because the original test case actually did output something (albeit much slower than anticipated) that was mistakenly attributed to the subsequent test. For these reasons I feel I need to recreate the world between each test.
In order to do this, the only options I can see are:
Use a shell script to actually run my tests: have it bring up the containers, execute a single test file, and then terminate the containers, once per test.
Follow my usual setUp / tearDown pattern in my existing test framework, but shell out to Docker to terminate and restart the containers between tests (sketched below).
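A minimal sketch of that second option using the 'testcontainers' npm package with Jest; the compose file name, service behavior, and timeout are illustrative:

import {
  DockerComposeEnvironment,
  StartedDockerComposeEnvironment,
} from 'testcontainers';

describe('daemon pipeline', () => {
  let env: StartedDockerComposeEnvironment;

  // Recreate the whole world before each test so lingering output from a
  // slow, failed test cannot be misattributed to the next one.
  beforeEach(async () => {
    env = await new DockerComposeEnvironment('.', 'docker-compose.test.yml').up();
  }, 120_000);

  afterEach(async () => {
    await env.down();
  });

  it('sends an email after a message is enqueued', async () => {
    // enqueue a message, then poll the fake mail server's API until the
    // mail appears or a timeout elapses (polling helper not shown)
  });
});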
Am I missing another option? Is there some kind of existing framework or pattern used for this sort of testing?
I am trying to create a background-processor Windows service using Hangfire.
I would like to increase the recurring job polling interval to more than 1 minute (hard-coded by default), because the recurring polling can affect the performance of the database.
Is there a way to enable/disable the Hangfire recurring job feature? This is needed when multiple instances of the service are installed.
When you create a recurring job in Hangfire, even if you have multiple Hangfire servers, the job will not be run on two servers at the same time.
You can use a cron expression to define the frequency at which your job runs, as described in the Hangfire docs:
RecurringJob.AddOrUpdate(() => YourJob(), "0 12 * */2");
However, your need may be to avoid triggering a job while the previous instance is still running. For this situation, I would recommend setting a flag (in the DB, for example) when your job starts and removing it when it ends, then checking whether the flag is present before actually starting your process.
Update
As you stated you want to prevent the RecurringJobScheduler from running on some servers, I have looked into the code and it seems there is no option to do this.
You can check the file BackgroundJobServer.cs where the scheduler is added to the process list and the RecurringJobScheduler.cs where the DB is queried. The value of 1 minute is hardcoded, as specified in the comments.
I think your only option is the pull request you have already made :(
I want to send an email batch at a specific time, like cron.
I think the whenever gem (https://github.com/javan/whenever) does not fit the Cloud Foundry environment, because Cloud Foundry can't use crontab.
Please let me know what options are available to me.
There's a node.js app here that you could use to schedule a specific rake task.
I haven't worked with Cloud Foundry, so I'm not sure if it'll serve your needs, but you can also try some of the batch-job processing tools Rails has available: Delayed Job and Sidekiq. Those store data for recurring jobs either in your database (DJ) or in a separate Redis database (Sidekiq), and both need extra processes kept up and running, so review them deeply, and the changes you'd need in your deployment process, before picking one. There's also Resque, and here's a tutorial for using it with Rails to schedule tasks.
There are multiple solutions here, but the short answer is that whatever you end up doing needs to implement its own scheduler. This is because there is no cron service available to your application when it runs on CF, which means there is nothing to trigger or schedule your actions. Any project or solution that depends on cron will not work when deployed to CF. Any project that implements its own scheduler should work fine.
Some specific things I've seen people do successfully:
Use a web service that sends HTTP requests to your app at predefined intervals. The requests trigger your action. It's the service's responsibility to let you define when to trigger and to send the HTTP requests. I'm intentionally avoiding mentioning any specific services, but you can find them by searching for "cron http service" or something like that.
Import a library that has cron-like functionality. I'm not familiar with Ruby, so I don't know the landscape there; @mlabarca has mentioned a couple that you might try out. Again, check that they implement the scheduling functionality themselves and do not depend on cron. I'm more familiar with Java, where you have Quartz, and Spring, which has some scheduling functionality too.
Implement a "clock" process or scheduler (sketched after this list). This would generally be a second app that you deploy on CF. It would be lightweight and probably not have a web interface. It could be as simple as: do something, sleep, loop, repeating those steps forever. It really depends on your needs. You could even get fancy and combine this with the first option above, sending some sort of request to your other apps to trigger the actual events.
There are probably other solutions as well, those are just some examples to get you started.
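For the third option, a minimal "clock" sketch in Node.js/TypeScript, assuming an illustrative sendEmailBatch() task; deploy it to CF as its own single-instance app with no route:

// What the batch actually does is up to you: call your main app's API,
// enqueue work, etc. (hypothetical placeholder).
async function sendEmailBatch(): Promise<void> {
  console.log(`email batch sent at ${new Date().toISOString()}`);
}

const INTERVAL_MS = 5 * 60 * 1000; // run every five minutes

async function main(): Promise<void> {
  // "do something, sleep, loop forever"
  for (;;) {
    await sendEmailBatch();
    await new Promise((resolve) => setTimeout(resolve, INTERVAL_MS));
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1); // exit on failure and let CF restart the clock app
});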
Probably also worth mentioning that the Cloud Controller v3 API will have first-class features to run tasks. In this case, a "task" is some job that runs for a finite amount of time and exits (like a batch job). This is opposed to the standard "app", which, when run on CF, should continue executing forever (i.e. if it exits, it's because of a crash). That said, I do not believe it will include a scheduler, so you'd still need something to trigger the task.
We have a Ruby on Rails application running on EC2 with the autoscaling feature enabled, and we have been using whenever to manage cron. New instances are created automatically from an image of the main instance on traffic spikes and dropped when traffic is low, but this also copies the cron jobs to the newly created instances.
We have a specific requirement where we want to limit cron to a single instance.
I found a gem that looks like it handles this specific requirement, but I am skeptical about it because it is built for Elastic Beanstalk and is no longer maintained.
As a workaround, you can add a condition to the cron job so that it only executes on a single instance elected from your autoscaling group, e.g. only the oldest instance, or only the instance with the "lowest" instance ID, or whatever condition you like.
You can achieve this by having your instances call the AWS API; see the sketch below.
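A minimal sketch of that election check, assuming AWS SDK v3 for JavaScript; the rule here is "lowest instance ID wins", and the current instance ID is assumed to come from instance metadata:

import {
  AutoScalingClient,
  DescribeAutoScalingGroupsCommand,
} from '@aws-sdk/client-auto-scaling';

async function amITheCronRunner(groupName: string, myInstanceId: string): Promise<boolean> {
  const client = new AutoScalingClient({});
  const { AutoScalingGroups } = await client.send(
    new DescribeAutoScalingGroupsCommand({ AutoScalingGroupNames: [groupName] }),
  );

  // Elect the in-service instance with the lexicographically lowest ID;
  // the cron body runs only on that instance.
  const ids = (AutoScalingGroups?.[0]?.Instances ?? [])
    .filter((i) => i.LifecycleState === 'InService')
    .map((i) => i.InstanceId ?? '')
    .sort();

  return ids[0] === myInstanceId;
}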
As a more proper solution, you could perhaps use a single cronified Lambda accessing your instances; this is now possible, as per this page.
Best is to set scale-in protection, which prevents your instances from being terminated during scale-in events.
You can find more information on the AWS blog: https://aws.amazon.com/blogs/aws/new-instance-protection-for-auto-scaling/