Routing an activity task preferably to a specific worker in the SWF fleet - amazon-swf

Consider the fileprocessing sample, which executes the download activity on any host from the pool, then converts the file and uploads the result from the same host that performed the download. Is there a way to route an activity task preferably (not always) to a specific worker in the SWF fleet based on the worker's availability? The reason is that a worker should not be overloaded with tasks, which might affect latency.

There are two use cases for not overloading a host with tasks. The first is when a host just needs to limit the number of activities running on it in parallel. This is done by setting the task executor thread pool size via ActivityWorker.setTaskExecutorThreadPoolSize.
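For the first case, a minimal sketch using the AWS Flow Framework for Java (the domain, task list, pool size, and FileProcessingActivitiesImpl are placeholder names, not from the question):

    import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflow;
    import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClientBuilder;
    import com.amazonaws.services.simpleworkflow.flow.ActivityWorker;

    public class LimitedActivityHost {
        public static void main(String[] args) throws Exception {
            AmazonSimpleWorkflow swf = AmazonSimpleWorkflowClientBuilder.defaultClient();
            ActivityWorker worker =
                    new ActivityWorker(swf, "FileProcessingDomain", "common-task-list");
            worker.addActivitiesImplementation(new FileProcessingActivitiesImpl());
            // At most 4 activities execute in parallel on this host.
            worker.setTaskExecutorThreadPoolSize(4);
            worker.start();
        }
    }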
The second is when a host needs to limit the number of sequences of activities running on it, for example limiting the number of files being processed at the same time in the fileprocessing sample. The solution is to use a special semaphore-like activity that runs for the duration of the whole sequence. Here is the outline of the implementation (a sketch of the semaphore activity follows the outline):
Hosts listen for the "semaphore" activity on a special common task list using a separate activity worker.
The number of parallel activities on this worker is limited through setTaskExecutorThreadPoolSize.
Once the "semaphore" activity starts executing on a host, it sends a signal containing a host-specific task list to its workflow. Then it just keeps heartbeating to SWF periodically.
All other activities from the given workflow are dispatched to the host-specific task list received through the signal sent by the "semaphore" activity.
After the last host-specific activity is done, the "semaphore" activity is cancelled.
The semaphore activity learns about the cancellation on the next heartbeat. It then rethrows the cancellation exception, which completes the activity on the worker and releases a slot in the task executor thread pool.
A variation is to skip cancellation and instead have the last activity in the sequence notify the semaphore activity in-process.
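A rough sketch of that semaphore activity using the plain AWS SDK for Java (the domain, signal name, task-list naming, and heartbeat interval are illustrative; the task token and workflow IDs come from the dispatched activity task):

    import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflow;
    import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClientBuilder;
    import com.amazonaws.services.simpleworkflow.model.*;

    public class SemaphoreActivity {
        // Runs inside the size-limited thread pool; occupying a thread IS the semaphore.
        static void holdHostSlot(String taskToken, String workflowId, String runId)
                throws InterruptedException {
            AmazonSimpleWorkflow swf = AmazonSimpleWorkflowClientBuilder.defaultClient();

            // Tell the workflow which host-specific task list to dispatch to.
            swf.signalWorkflowExecution(new SignalWorkflowExecutionRequest()
                    .withDomain("FileProcessingDomain")
                    .withWorkflowId(workflowId)
                    .withRunId(runId)
                    .withSignalName("hostTaskList")
                    .withInput("host-" + System.getenv("HOSTNAME")));

            // Heartbeat until the workflow cancels this activity after the
            // last host-specific activity completes.
            while (true) {
                ActivityTaskStatus status = swf.recordActivityTaskHeartbeat(
                        new RecordActivityTaskHeartbeatRequest().withTaskToken(taskToken));
                if (status.isCancelRequested()) {
                    // Report cancellation, which frees the thread-pool slot.
                    swf.respondActivityTaskCanceled(new RespondActivityTaskCanceledRequest()
                            .withTaskToken(taskToken));
                    return;
                }
                Thread.sleep(10_000);
            }
        }
    }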

Related

Monitor Amazon SQS delayed processing

I have a series of applications that consume messages from SQS queues. If for some reason one of these consumers fails and stops consuming messages, I'd like to be notified. What's the best way to do this?
Note that some of these queues may only receive one message every 2-3 days, so waiting for the number of messages in the queue to trigger a notification is not a good option for me.
What I'm looking for is something that can monitor an SQS queue and say "This message has been here for an hour and nothing has processed it ... let someone know."
Here is a possible solution off the top of my head (possibly not the most elegant one) that does not require using CloudWatch at all (according to a comment from the OP, the required tracking cannot be implemented through CloudWatch alarms). Assume you have the queue processed by Service, and the receiving side is implemented through long polling.
Run a Lambda function (say, hourly) that listens to the queue and reads messages but never deletes them (Service deletes messages once they are processed). On the queue, set Maximum Receives to any value you want, say 3. If the Lambda function runs 3 times and the message is still in the queue each time, the message is pushed to the dead-letter queue (automatically, if the redrive policy is set). Whenever a new message arrives in the dead-letter queue, it is a good indicator that your service is either down or not handling requests fast enough. All variables can be changed to suit your needs.
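A sketch of the two pieces with the AWS SDK for Java (the queue URL and dead-letter-queue ARN are placeholders, and the hourly "peek" is shown as a plain main method rather than a Lambda handler):

    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
    import com.amazonaws.services.sqs.model.*;

    public class QueueWatchdog {
        public static void main(String[] args) {
            AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
            String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue";

            // One-time setup: after 3 receives without deletion, SQS moves the
            // message to the dead-letter queue.
            sqs.setQueueAttributes(new SetQueueAttributesRequest()
                    .withQueueUrl(queueUrl)
                    .addAttributesEntry("RedrivePolicy",
                            "{\"maxReceiveCount\":\"3\","
                          + "\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:work-dlq\"}"));

            // The hourly peek: each receive increments the message's receive
            // count, and we deliberately never delete anything here.
            ReceiveMessageResult result = sqs.receiveMessage(new ReceiveMessageRequest(queueUrl)
                    .withMaxNumberOfMessages(10)
                    .withWaitTimeSeconds(20));
            System.out.println("Messages still waiting: " + result.getMessages().size());
        }
    }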

Creating a FIFO queue in SWF to control access to critical code sections

At the moment we have an Amazon Simple Workflow application that has a few tasks that can occur in parallel at the beginning of the process, followed by one path through a critical region where we can only allow one process to proceed.
We have modeled the critical region as a child workflow and we only allow one process to run in the child workflow at a time (though there is a race condition in our code that hasn't caused us issues yet). This is doing the job, but it has some issues.
We have a method that keeps checking whether the child workflow is running; if it isn't, it proceeds, otherwise it throws an exception and retries with exponential backoff (the race condition mentioned above: the is-running check and the start of the run are not an atomic operation). The problems are:
1. With multiple workflows entering, which workflow proceeds first is non-deterministic; it would be better if this were a FIFO queue.
2. We can end up waiting a long time for the next workflow to start, so there is wasted time; it would be nice if each workflow proceeded as soon as the previous one finished.
We can address point 2 by reducing the retry interval, but we would still have the non-FIFO problem.
I can imagine modeling this quite easily on a single machine with a queue and locks, but what are our options in SWF?
You can have a "critical section" workflow that is always running, and signal it to queue execution requests. Upon receiving a signal, the "critical section" workflow either starts the activity, if none is running, or queues the request in the decider. When the activity execution completes, a "response" signal is sent back to the requester workflow. As the "critical section" workflow is always running, it has to periodically restart itself as a new run (passing the list of outstanding requests as a parameter), the same way all cron workflows do. A sketch of the workflow contract follows.
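What that always-running workflow's contract could look like in the AWS Flow Framework for Java (the names, timeout, and contract are illustrative, not from the answer; the decider implementation behind it would keep a FIFO list of requesters, run the guarded activity for the head of the list, signal each requester back on completion, and continue-as-new periodically with the still-queued requests):

    import java.util.List;
    import com.amazonaws.services.simpleworkflow.flow.annotations.Execute;
    import com.amazonaws.services.simpleworkflow.flow.annotations.Signal;
    import com.amazonaws.services.simpleworkflow.flow.annotations.Workflow;
    import com.amazonaws.services.simpleworkflow.flow.annotations.WorkflowRegistrationOptions;

    @Workflow
    @WorkflowRegistrationOptions(defaultExecutionStartToCloseTimeoutSeconds = 3600)
    public interface CriticalSectionWorkflow {

        // Started once; restarted as-new with the requests that are still queued.
        @Execute(version = "1.0")
        void run(List<String> outstandingRequesterIds);

        // Requester workflows signal here; arrival order gives FIFO semantics.
        @Signal
        void requestEntry(String requesterWorkflowId);
    }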

Amazon Simple Workflow - Choosing Worker From Same Activity Type Nodes Based on System Usage

I have four servers that are classified under the same activity type. All four servers are constantly polling SWF. I start one workflow, and one of the nodes starts a processing routine. This routine takes an hour and consumes 80% of the server's CPU.
How do I make sure that the next workflow I start does not utilize this same server? And so on for the third and fourth workflows I start? Is there any logic I can put in my decider to do this?
I think it is better handled at the level of the activity worker. The basic idea is that after a poll returns an activity task, the next poll is not issued until that task is completed; a sketch of such a worker loop follows. By monitoring the depth of the task list you can also support autoscaling of worker nodes if necessary.
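A minimal sketch of that poll-then-work loop with the AWS SDK for Java (domain, task-list name, and the routine itself are placeholders):

    import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflow;
    import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClientBuilder;
    import com.amazonaws.services.simpleworkflow.model.*;

    public class SingleTaskWorker {
        public static void main(String[] args) {
            AmazonSimpleWorkflow swf = AmazonSimpleWorkflowClientBuilder.defaultClient();
            while (true) {
                // Long poll; returns an empty task after ~60s if none is available.
                ActivityTask task = swf.pollForActivityTask(new PollForActivityTaskRequest()
                        .withDomain("mydomain")
                        .withTaskList(new TaskList().withName("heavy-cpu-tasks")));
                if (task.getTaskToken() == null || task.getTaskToken().isEmpty()) {
                    continue;
                }
                String result = runHourLongRoutine(task.getInput());
                swf.respondActivityTaskCompleted(new RespondActivityTaskCompletedRequest()
                        .withTaskToken(task.getTaskToken())
                        .withResult(result));
                // Only now does the loop poll again, so this host never holds
                // more than one task at a time.
            }
        }

        private static String runHourLongRoutine(String input) {
            // Placeholder for the CPU-heavy processing routine.
            return "done";
        }
    }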

Erlang VM: scheduler runtime information

I was searching for a way to retrieve information about how scheduling is done during a program's execution: which processes are in which scheduler, whether they migrate, which process is active on each scheduler, whether each scheduler runs on one core, etc.
Any ideas or related documentation/articles/anything?
I would suggest you take a look at the following tracing/profiling options:
erlang:system_profile/2
It has options for monitoring scheduler and run queue (runnable_procs) activity.
The scheduler option will report
{profile, scheduler, Id, State, NoScheds, Ts}
where State will tell you if it is active or not. NoScheds reports the number of currently active schedulers (if I remember correctly).
The runnable_procs option will let you know if a process is put into or removed from a run queue of a particular scheduler.
If you have a system that supports DTrace, you can use the Erlang DTrace probes being developed to see exactly when process scheduling events occur.
For example, I wrote a simple one-liner that shows you the number of nanoseconds that pass between sending a message to a process and the recipient process being scheduled for execution (give or take a few nanoseconds of cross-core clock variance and such).

Erlang: Job Scheduling Over a Dynamic Set of Nodes

I need some advice on writing a job scheduler in Erlang that can distribute jobs (external OS processes) over a set of worker nodes. A job can last from a few milliseconds to a few hours. The "scheduler" should be a global registry where jobs come in, get sorted, and then get assigned to and executed on connected "worker nodes". Worker nodes should be able to register with the scheduler by telling it how many jobs they can process in parallel (slots). Worker nodes should be able to join and leave at any time.
An Example:
Scheduler has 10 jobs waiting
Worker Node A connects and is able to process 3 jobs in parallel
Worker Node B connects and is able to process 1 job in parallel
Some time later, another worker node joins which is able to process 2 jobs in parallel
I seriously spent some time thinking about the problem, but I am still not sure which way to go. My current solution is to have a globally registered gen_server for the scheduler, which holds the jobs in its state. Every worker node spawns N worker processes and registers them with the scheduler. The worker processes then pull jobs from the scheduler (an indefinitely blocking call, answered with {noreply, ...} while no jobs are available).
Here are some questions:
Is it a good idea to assign every new job to an existing worker, knowing that I will have to re-assign the job to another worker when new workers connect? (I think this is how the Erlang SMP scheduler does things, but reassigning jobs seems like a big headache to me.)
Should I start a process for every worker processing slot, and where should this process live: on the scheduler node or on the worker node? Should the scheduler make RPC calls to the worker node, or would it be better for the worker nodes to pull new jobs and then execute them on their own?
And finally: has this problem already been solved, and where can I find the code for it? :-)
I already tried RabbitMQ for job scheduling but custom job sorting and deployment adds a lot of complexity.
Any advice is highly welcome!
Having read your answer in the comments, I'd still recommend using pool(3):
Spawning 100k processes is not a big deal for Erlang, because spawning a process is much cheaper than in most other systems.
One process per job is a very good pattern in Erlang: start a new process, run the job in that process keeping all the state in the process, and terminate the process when the job is done.
Don't bother with worker processes that process a job and then wait for a new one. That is the way to go if you are using OS processes or threads, because spawning is expensive; in Erlang it only adds unnecessary complexity.
The pool facility is useful as a low-level building block; the only thing it lacks for your functionality is the ability to start additional nodes automatically. What I would do is start with pool and a fixed set of nodes to get the basic functionality.
Then add some extra logic that watches the load on the nodes, e.g. like pool does it, with statistics(run_queue). If you find that all nodes are over a certain load threshold, just slave:start/2,3 a new node on an extra machine and use pool:attach/1 to add it to your pool.
This won't rebalance old running jobs, but new jobs will automatically be moved to the newly started node, since it is still idle.
With this you get fast, pool-controlled distribution of incoming jobs and a slower, totally separate way of adding and removing nodes.
If you get all this working and still find out -- after some real-world benchmarking, please -- that you need rebalancing of jobs, you can always build something into the jobs' main loops: after a rebalance message, a job can respawn itself via the pool master, passing its current state as an argument.
Most important just go ahead and build something simple and working and optimize it later.
My solution to the problem:
"distributor" - gen_server,
"worker" - gen_server.
"distributor" starts "workers" using slave:start_link, each "worker" is started with max_processes parameter,
"distributor" behavior:
handle_call(submit,...)
* put job to the queue,
* cast itself check_queue
handle_cast(check_queue,...)
* gen_call all workers for load (current_processes / max_processes),
* find the least busy,
* if chosen worker load is < 1 gen_call(submit,...) worker
with next job if any, remove job from the queue,
"worker" behavior (trap_exit = true):
handle_call(report_load, ...)
* return current_process / max_process,
handle_call(submit, ...)
* spawn_link job,
handle_call({'EXIT', Pid, Reason}, ...)
* gen_cast distributor with check_queue
In fact it is more complex than that, as I need to track running jobs and kill them if needed, but that is easy to implement in this architecture.
This is not a dynamic set of nodes, though; you can, however, start a new node from the distributor whenever you need one.
P.S. It looks similar to pool, but in my case I am submitting port processes, so I need to limit them and have better control of what goes where.
