what would be the possible approach to go : SQS or SNS? - ruby-on-rails

I am going to make the rails application which integrates the Amazon's cloud services.
I have explore amazon's SNS service which gives the facility of public subscription which i don't want to do. I want to notify only particular subscriber.
For example if I have 5 subscriber in one topic then the notification should be goes to particular subscriber.
I have also explored amazon's SQS in which i have to write a poller which monitor the queue for message. SQS has also a lock mechanism but the problem is that it is distributed so there would be a chance of getting same message from another copy of queue for process.
I want to know that what would be the possible approach to go.

SQS sounds like what you want.
You can run multiple "worker" processes that compete over messages in the queue. Each message is only consumed once. The logic behind the "lock" / timeout that you mention is as follows: if one of your workers were to die after downloading a message, but before processing it, then you want that message to eventually time out and be re-downloaded for processing on another node.
Yes, SQS is built on a polling model. For example, I have a number of use cases in which I use a minutely cron job to poll for new messages in the queue and take action on any messages found. This pattern is stupid simple to build and works wonders for a bunch of use cases -- a handy little "client" script that pushes a message into the queue, and the cron activated script that will process that message within a minute or so.
If your message pattern is extremely sparse -- eg, only a few messages a day -- it may seem wasteful to poll constantly while the queue is empty. It hardly matters.
My original calculation was that a minutely cron job would cost $0.04 (now $0.02) per month. Since then, SQS added a "Long-Polling" feature that lets you achieve sub-second latency on processing new messages by sending 1 "long-poll" message every 20 seconds to poll an idle queue. Plus, they dropped the price 50%. So per month, that's 131k messages (~$0.06), a little bit more expensive, but with near realtime request processing.
Keep in mind that a minutely cron job I described only costs ~$0.04 / month in request load (30d*24h*60m * 1c / 10k msgs). So at a minutely clip, cost shouldn't really be a concern here. Even polling every second, the price rises only to $2.59 / mo, not exactly a bank buster.
However, it is possible to avoid frequent polling using a webservice that takes an SNS HTTP message. Such an architecture would work as follows: client pushes message to SNS, which pushes message to SQS and routes an HTTP request to your webservice, triggering it to drain the queue. You'd still want to poll the queue hourly or daily, just in case an HTTP request was dropped. In the end though, I'm not sure I can think of any scenario which really justifies such complexity. I'd much rather pay $0.04 a month to have a dirt simple cron job polling my queue.

Related

Mass Transit: Using AWS SQS and Job Consumers

Currently we're using Hangfire for scheduling and running long lived tasks. We need these tasks to be able to be retried in the event of an ungraceful shutdown, which Hangfire handles for us.
We're looking to try and move to a producer/consumer model and I've built a basic prototype with Masstransit and AWS SQS, but I have some concerns about how to handle the event of a task being processed during an ungraceful shutdown.
I understand that eventually the SQS visibility timeout will expire and the queued item will be picked up for processing again, but setting that timeout isn't trivial as the length of tasks can be quite varied and I'd prefer if the task could immediately resume/retry processing when the application starts up again.
I got reading about Job Consumers and they seemed to be better fitted to this type of scenario, but all the examples I've seen are using RabbitMQ. Wondering if it's possible/appropriate to do this using SQS, or if there's a better approach?
Thank you for taking the time to read this question :)
MassTransit will extend the visibility timeout as long as the consumer is still running.
I believe SQS has an upper-limit of something like 12 hours, but you should look it up and find out.
Job Consumers have significantly greater requirements (sagas, temporary queues, etc.) and SQS is really annoying about not having auto-delete/expiring queues, so I'd stick to a regular consumer if you can swing it.

Monitor Amazon SQS delayed processing

I have a series of applications that consume messages from SQS Queues. If for some reason one of these consumers fails and stop consuming messages I'd like to be notified. What's the best way to do this?
Note that some of these queues could only have one message placed into the queue every 2 - 3 days, so waiting for the # of messages in the queue to trigger a notification is not a good option for me.
What I'm looking for is something that can monitor an SQS queue and say "This message has been here for an hour and nothing has processed it ... let someone know."
Possible solution off the top of my head (possibly not the most elegant one) which does not require using CloudWatch at all (according to the comment from OP the required tracking cannot be implemented through CloudWatch alarms). Assume you have the Queue to be processed at Service and the receiving side is implemented through long polling.
Run a Lambda function (say hourly) listening to the Queue and reading messages, however never deleting (Service deletes the messages once processed). On the Queue set the Maximum Receives to any value u want, let's say 3. If Lambda function ran 3 times and all three times message was present in the queue, the message will be pushed to Dead Letter Queue (automatically if the redrive policy is set). Whenever new message is pushed to dead letter queue, it is a good indicator that your service is either down or not handling the requests fast enough. All variables can be changed to suit your needs

Grails non time based queuing

I need to process files which get uploaded and it can take as little as 1 second or as much as 10 minutes. Currently my solution is to make a quartz job with a timer of 30 seconds and then process and arbitrary job whenever it hits. There are several problems with this.
One: if the job will take less than a few seconds it is wasteful to make things wait 30 seconds for the job queue.
Two: if there is only one long job in the queue it could feasibly try to do it twice.
What I want is a timeless queue. When things are added the are started immediately if there is a free worker. Is there a solution for this? I was looking at jesque, but I couldn't tell if it can do this.
What you are looking for is a basic message queue. There are lots of options out there, but my favorite for Grails is RabbitMQ. The Grails plugin for it is quite good and it performs well in my experience.
In general, message queues allow you to have N producers (things creating jobs") adding work messages to a queue and then M consumers pulling jobs off of the queue and processing them. When a worker completes it's job, it simply asks the queue for the next job to process and if there is none, it just waits for the queue to give it something to do. The queue also keeps track of success / failure of message processing (you can control this) so that you don't give the same message to more than one worker.
This has the advantage of not relying on polling (so you can start processing as soon as things come in) and it's also much more scaleable. You can scale both your producers and consumers up or down as needed, decoupling the inputs from the outputs so that you can take a traffic spike and then work your way through it as you have the resources (workers) available.
To solve problem one just make the job check for new uploaded files every 5 seconds (or 3 seconds, or 1 second). If the check for uploaded files is quick then there is no reason you can't run it often.
For problem two you just need to record when you start processing a file to ensure it doesn't get picked-up twice. You could create a table in the database, or store the information in memory somewhere.

Practical use of delayed background job when dealing with many users

When a background job starts, it's sent to the back of a queue where a worker handles it; a task clears and the other starts. I think I've got this one right except I don't understand the practical side of it in some cases. Sure, if you're a company sending out 15,000 newsletters once a week using a delayed job makes perfect sense. But when you have an application of even 100 users, in which some task is long enough to need background work (like sending/fetching emails that might take a minute) then each user will have to wait in line while another user gets cleared (in case there's a single worker).
This is the part I'm not sure I'm getting right. I'm talking about the same job, but individually for each user. Does that count as a job per user? If I have 100 users, do I need to keep 100 workers for each one's process to not get tied up?
I've tried using delayed_job to simulate that, and indeed when I sign in with a different account I have to wait until another user's email gets sent until mine is. While the plugin is swift and simple to work with, I think it's not the right approach here.
I've also tried using Ajax, but since it's an HTTP request it ties up the browser in loading mode until it gets a response from the server (even with async: true). Not sure if I ruled this one out too quickly, but I was sortof looking for a more elegant server solution.
Is there a way to achieve a background job like this? (I've heard of different, mostly commercial solutions promising little waiting time, but I'm interested in completely eliminating the queue between users). If not, is there a method to make an ajax request without waiting for a response? I realize my questions are both drastically different but both seem like an appropriate solution to this problem.
Resque is a background processing engine that can support multiple queues.
Ways you could use this:
Group your tasks into queues that make sense on their priority. If you need fast response times, use it in a 'foreground' queue. Slow? (like sending/receiving emails) can be in the 'background' queue
Have one queue per user (you will need to have many many workers for this)
This SO question also gives a way to use delayed_jobs with multiple queues/tables
The purpose of delayed_job and other message queues is to asynchronously process jobs outside of your core application. I always use a queue for sending email since I'm relying on an outside application (sometimes a third-party API like gmail) to send them and I can't guarantee available and operating efficiency.
So for your use case, even with very few users, I highly recommend offloading emails to delayed_job. This will speed up your front end (ajax) and will also give you retries upon failure. You could spin up multiple workers to process the queue, but it shouldn't be necessary with your numbers unless your calls to send mail are taking a really long time (more than a couple seconds?).
And yes in most situations I'd create separate jobs for each user even though the message might be identical. The only time I'd process them all together would be if the email application / API has bulk sending and you can reduce the number of calls significantly by sending a large payload in a few calls.

Executing large numbers of asynchronous IO-bound operations in Rails

I'm working on a Rails application that periodically needs to perform large numbers of IO-bound operations. These operations can be performed asynchronously. For example, once per day, for each user, the system needs to query Salesforce.com to fetch the user's current list of accounts (companies) that he's tracking. This results in huge numbers (potentially > 100k) of small queries.
Our current approach is to use ActiveMQ with ActiveMessaging. Each of our users is pushed onto a queue as a different message. Then, the consumer pulls the user off the queue, queries Salesforce.com, and processes the results. But this approach gives us horrible performance. Within a single poller process, we can only process a single user at a time. So, the Salesforce.com queries become serialized. Unless we run literally hundreds of poller processes, we can't come anywhere close to saturating the server running poller.
We're looking at EventMachine as an alternative. It has the advantage of allowing us to kickoff large numbers of Salesforce.com queries concurrently within a single EventMachine process. So, we get great parallelism and utilization of our server.
But there are two problems with EventMachine. 1) We lose the reliable message delivery we had with ActiveMQ/ActiveMessaging. 2) We can't easily restart our EventMachine's periodically to lessen the impact of memory growth. For example, with ActiveMessaging, we have a cron job that restarts the poller once per day, and this can be done without worrying about losing any messages. But with EventMachine, if we restart the process, we could literally lose hundreds of messages that were in progress. The only way I can see around this is to build a persistance/reliable delivery layer on top of EventMachine.
Does anyone have a better approach? What's the best way to reliably execute large numbers of asynchronous IO-bound operations?
I maintain ActiveMessaging, and have been thinking about the issues of a multi-threaded poller also, though not perhaps at the same scale you guys are. I'll give you my thoughts here, but am also happy to discuss further o the active messaging list, or via email if you like.
One trick is that the poller is not the only serialized part of this. STOMP subscriptions, if you do client -> ack in order to prevent losing messages on interrupt, will only get sent a new message on a given connection when the prior message has been ack'd. Basically, you can only have one message being worked on at a time per connection.
So to keep using a broker, the trick will be to have many broker connections/subscriptions open at once. The current poller is pretty heavy for this, as it loads up a whole rails env per poller, and one poller is one connection. But there is nothing magical about the current poller, I could imagine writing a poller as an event machine client that is implemented to create new connections to the broker and get many messages at once.
In my own experiments lately, I have been thinking about using Ruby Enterprise Edition and having a master thread that forks many poller worker threads so as to get the benefit of the reduced memory footprint (much like passenger does), but I think the EM trick could work as well.
I am also an admirer of the Resque project, though I do not know that it would be any better at scaling to many workers - I think the workers might be lighter weight.
http://github.com/defunkt/resque
I've used AMQP with RabbitMQ in a way that would work for you. Since ActiveMQ implements AMQP, I imagine you can use it in a similar way. I have not used ActiveMessaging, which although it seems like an awesome package, I suspect may not be appropriate for this use case.
Here's how you could do it, using AMQP:
Have Rails process send a message saying "get info for user i".
The consumer pulls this off the message queue, making sure to specify that the message requires an 'ack' to be permanently removed from the queue. This means that if the message is not acknowledged as processed, it is returned to the queue for another worker eventually.
The worker then spins off the message into the thousands of small requests to SalesForce.
When all of these requests have successfully returned, another callback should be fired to ack the original message and return a "summary message" that has all the info germane to the original request. The key is using a message queue that lets you acknowledge successful processing of a given message, and making sure to do so only when relevant processing is complete.
Another worker pulls that message off the queue and performs whatever synchronous work is appropriate. Since all the latency-inducing bits have already performed, I imagine this should be fine.
If you're using (C)Ruby, try to never combine synchronous and asynchronous stuff in a single process. A process should either do everything via Eventmachine, with no code blocking, or only talk to an Eventmachine process via a message queue.
Also, writing asynchronous code is incredibly useful, but also difficult to write, difficult to test, and bug-prone. Be careful. Investigate using another language or tool if appropriate.
also checkout "cramp" and "beanstalk"
Someone sent me the following link: http://github.com/mperham/evented/tree/master/qanat/. This is a system that's somewhat similar to ActiveMessaging except that it is built on top of EventMachine. It's almost exactly what we need. The only problem is that it seems to only work with Amazon's queue, not ActiveMQ.

Resources