Run process as Daemon on AWS Infrastructure - docker

I would like to run a process as daemon on AWS Infrastructure that is responsible to read the AWS SQS Queue and make some process.
My first approach is to use a docker container deployed on ECS Container service. So I will be on while true loop, sleeping for some seconds. Using this, I can control the sleep time between processing, so If my SQS queue is full, I could decrease the sleep time. So
I know that is possible to use AWS Lambda scheduled as a cron job, but I have no control over the cron time (decrease or increase in response of sqs size).
The AWS Lambda approach is simpler and there is no need of "any" infrastructure, but it less flexible.
Does anyone know another approach?

Looking at the way AWS Lambda handles cron scheduling under the hood (see https://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html and http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html) you should be able to find the Cloudwatch event that is triggering your lambda and modify it from within the lambda itself. Documentation for the programmatic APIs for Lambda and Cloudwatch Events is fairly scarce, though, so you'll have to figure a lot out for yourself.
All in all, doesn't really sound like an easier approach than just running your own container, although if you have a lot of these sorts of things to do it might not hurt to package it all in a library.

Related

Mass Transit: Using AWS SQS and Job Consumers

Currently we're using Hangfire for scheduling and running long lived tasks. We need these tasks to be able to be retried in the event of an ungraceful shutdown, which Hangfire handles for us.
We're looking to try and move to a producer/consumer model and I've built a basic prototype with Masstransit and AWS SQS, but I have some concerns about how to handle the event of a task being processed during an ungraceful shutdown.
I understand that eventually the SQS visibility timeout will expire and the queued item will be picked up for processing again, but setting that timeout isn't trivial as the length of tasks can be quite varied and I'd prefer if the task could immediately resume/retry processing when the application starts up again.
I got reading about Job Consumers and they seemed to be better fitted to this type of scenario, but all the examples I've seen are using RabbitMQ. Wondering if it's possible/appropriate to do this using SQS, or if there's a better approach?
Thank you for taking the time to read this question :)
MassTransit will extend the visibility timeout as long as the consumer is still running.
I believe SQS has an upper-limit of something like 12 hours, but you should look it up and find out.
Job Consumers have significantly greater requirements (sagas, temporary queues, etc.) and SQS is really annoying about not having auto-delete/expiring queues, so I'd stick to a regular consumer if you can swing it.

ECS Fargate - Is it possible to create container instances dynamically?

I am working on a project where it is required to create multiple instance of container dynamically based on the count received from the AWS Lambda function. Each container will execute its own task. I have done a lot of research but still not sure how to achieve this. Also how to delete the container instance when the task execution is completed?
You're describing the use case AWS Batch has been built for. It essentially allows you to submit tasks that are being processed in Docker Containers and manages the lifecycle of those containers for you. Since pre:invent 2020 it also supports Fargate.
An alternative would be using a Step Function that processes the output of the Lambda function and dynamically creates ECS tasks for that. Tasks without a service, so they just terminate when they're done processing. Depending on the amount of jobs you have I'd prefer AWS Batch.

How to use delayed_job with Rails on Google App Engine?

This may be more of an App Engine question than a delayed_job question. But generally, how can I keep a long-lived process running to handling the scheduling of notifications and the sending of scheduled notifications on Google App Engine?
The maintainers of active_job https://github.com/collectiveidea/delayed_job include a script for production deploys, but this seems to stop after a few hours. Trying to figure out the best approach to ensure that the script stays running, and also that the script is able to access the logs for debugging purposes.
I believe that Google Pub/Sub is also a possibility, but I would ideally like to avoid setting up additional infrastructure for such a small project.
For running long processes that last for hours, App Engine will not be the ideal solution, since the requests are cap to 60 s (GAE Standard) and 60 m (GAE Flex).
The best would be to use a Compute Engine based solution, since the you would be able to keep the GCE VM up for long periods.
Once you have deployed on your GCE VM a RESTful application you can use Cloud Scheduler to create an scheduled job with this command:
gcloud scheduler jobs create http JOB --schedule=SCHEDULE --uri=APP_PATH
You can find more about this solution in this article
If App Engine is required take into consideration the mentioned maximum request times. And additionally you can give a look to Cloud Tasks, since those fit pretty much into your requirement.

AWS ECS scale up and down ways

We are using AWS ECS service where docker containers are running into it. These docker container having application code which continuously polling SQS and gets the single message, process it and kill their self and that's the life cycle of task.
Now we are scaling tasks and EC2 in cluster based on number of messages comes to SQS. We are able to scale up but it's difficult to scale down because we don't know whether any task is still processing any message because message processing time is huge due to some complex logic.
Could anybody suggest what's the based way to scale up and scale down in this case?
Have you considered using AWS Lambda for this use case rather than ECS (provided that your application logic runs in less than 5 mins). You can use SQS as a trigger for the Lambda. AWS Documentation : Using AWS Lambda with Amazon SQS provides a comprehensive guide on how to achieve this using Lambda.
The use case you have mentioned doesn't mean for ECS for EC2 instances. You should consider AWS ECS Fargate or AWS BATCH. On one side fargate will give you more capabilities in term of infrastructure like The task can be run for longer periods or scaling of tasks according to some parameters like CPU or MEM. On another side, you will be paying only the number of tasks running at a moment in your cluster.
Ref: https://aws.amazon.com/fargate/

Docker as "Function" (Create a Docker per request)

Is there a simple way to create an istance of a docker container for each request?
I have a Docker container that takes a very long time to compute a mathematical algorithm. When running, no other requests can be processed in parallel. Lambda Functions would be the best solution, but the container needs to download more than 1gb of data and needs at least 10 cores and 5GB ram to be executed, and therefore Lambda would be too expensive.
We have a big cluster (1000 cores, 0.5TB RAM) and I was considering to use a NGINX Load balancer or a Kubernetes bare metal.
Is it possible to configure in a way that creates an instance per request (similar to a Lambda Function)?
There are tools like Airflow or Argo that are designed for these things.
basically you can create a DAG will run very much like a function as a service but on what ever custom docker container you want.
You probably need to decouple the HTTP service from the backend processing. If the job takes minutes or longer to run, most browsers and other HTTP clients will time out before it will finish, so the HTTP end of it needs to start the job in some way and immediately return some sort of success message.
Once you’ve done that, you might find a job queue like RabbitMQ a useful piece of infrastructure technology. Again, this decouples the queue of jobs from the mechanism to actually run them. In a Docker/Kubernetes space you’d launch some number of persistent workers that all listened to the queue and did work as it appeared there. You wouldn’t necessarily launch one worker per job; or possibly you would have just one worker that launched other Docker containers or Kubernetes Jobs; but if the work backlog got too long you could launch additional workers.
In a pure-Docker space it’s theoretically possible to use the Docker API to launch additional containers. However, doing this gives your process unlimited root-level access to the host; if you are running this in the context of an HTTP server you need to be extremely careful about security considerations. Kubernetes also has an API and from a security point of view this is probably better: you can set up a service account that has permissions only to launch Jobs, and launch a Job per inbound job that arrives. (Security is still important but it’s much harder for a malicious input to root the host.)

Resources