Looking for suggestions on a background gem [closed] - ruby-on-rails

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I plan on running our web app on Heroku. I am looking for a gem to handle a few background jobs, e.g. sending emails and calling a few methods that submit files to an encoding service via an API.
A few that have come to mind so far are resque and delayed_job. I hear good things about resque, and it seems to be the more popular gem in its category. Ryan Bates has done an excellent screencast on delayed_job. However, I have heard that delayed_job has had a few problems, e.g. that it is not very stable in some areas.
Heroku offers Redis-to-Go. They have a free plan which offers 5 MB. If I go with resque, is this 5 MB plan enough to handle background jobs? I don't want to end up spending more just for background jobs.
I am just concerned that if I went with resque, I would need another db just to run background jobs. If I were using Redis for something else, then perhaps it would be worth it. Is it worth having another db just to handle background jobs?
Should I consider alternative gems? If so which ones?

Both delayed_job and resque work fairly well. resque should scale better as the volume of background requests increases.
resque's use of redis should be limited to the task request. Large data objects that are needed by the background tasks should be stored somewhere other than the background worker queue. For example, the files being sent to a background worker to be encoded should be stored in AWS S3 or some other persistent store, not the redis queue used by resque.
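To illustrate the point about keeping large objects out of the queue, here is a minimal sketch in plain Ruby. The `build_encode_job` helper, the `EncodeVideo` job name, and the bucket and key values are hypothetical; the payload shape only loosely resembles what a queue such as resque stores. It compares the size of a payload that carries a reference with one that carries the file itself.

```ruby
require "json"

# Hypothetical payload builder: the queue entry carries only a small
# reference to the file (bucket + key), never the file contents.
def build_encode_job(bucket:, key:)
  JSON.generate("class" => "EncodeVideo", "args" => [{ "bucket" => bucket, "key" => key }])
end

# What NOT to do: embed the file contents in the queued payload.
def build_inline_job(file_contents)
  JSON.generate("class" => "EncodeVideo", "args" => [file_contents])
end

# Stand-in for a multi-megabyte upload (assumed to be stored in S3 already).
large_file = "x" * 5_000_000

puts "reference payload: #{build_encode_job(bucket: 'uploads', key: 'videos/42.mov').bytesize} bytes"
puts "inline payload:    #{build_inline_job(large_file).bytesize} bytes"
```

With payloads of the first kind, even a small Redis instance can hold a large backlog of jobs; a single payload of the second kind would already exhaust a 5 MB plan.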
When using delayed_job or resque, you will need to run background workers which cost money. You might want to look at an autoscaling solution for dynamically starting and stopping background workers as needed.
See http://s831.us/h3pKE6 as an example.

We've used delayed_job very intensively, sending hundreds of concurrent emails, and it has worked flawlessly. Yes, it'll cost $36/mo for the worker. But a single worker gets a lot of jobs done: several fairly complex emails (lots of database lookups) sent per second.


Ruby Concurrency in cron job needed [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
I am developing a system in Rails 4.0 in which the API must handle simultaneous, continuous calls.
In the system, each user has 3 scripts that run in the background. The scripts fetch the user's information from the DB, call the API repeatedly, and process transactions. Currently I am using cron jobs (via the whenever gem) to run the scripts in the background for each individual user.
So my problem is that when the system has 1,000 users, I need to run 3,000 cron jobs.
I think this system will have problems. Can anyone help me solve this problem?
At this point you have a system that performs some tasks periodically, and the amount of work your system has to handle (let's say, per hour) is less than the amount of work it could handle.
However, the amount of work increases with the number of users in your system so, as you have already guessed, there will be a moment when the situation will be critical. Your system will not be able to handle all the tasks it has to do.
One way to solve this problem is to add more machines to your system: if you are currently using a single machine to run all your tasks, consider adding another one and splitting the work. You can split the work between the machines in a number of ways, but I would use a producer-consumer approach.
You will need a queue manager: your producer periodically sends a batch of tasks to be done (you can still use the whenever gem for that), and a number of consumers (1 makes no sense, 2 would be OK for now, but you could increase this number) work through the tasks one by one until there are none left.
The manager I like the most is Sidekiq but you can find some others that might match your needs better.
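The producer-consumer idea can be sketched with nothing but Ruby's standard library. This is only an in-process illustration: the `run_batch` helper is hypothetical, and in a real deployment the in-memory `Queue` would be replaced by a queue manager such as Sidekiq, with consumers running as separate processes or machines.

```ruby
# Runs every task on a fixed number of consumer threads and returns the
# collected results. Tasks must not be nil or false, because a nil from
# pop is used as the "queue drained" signal.
def run_batch(tasks, consumer_count: 2)
  queue   = Queue.new
  results = Queue.new          # thread-safe collection of finished work

  # Producer: one periodic run (e.g. triggered by cron) enqueues the batch.
  tasks.each { |task| queue << task }
  queue.close                  # pop returns nil once the queue is empty

  # Consumers: each takes tasks one by one until none are left.
  consumers = Array.new(consumer_count) do
    Thread.new do
      while (task = queue.pop)
        results << yield(task)
      end
    end
  end
  consumers.each(&:join)

  Array.new(results.size) { results.pop }
end

# 1,000 users x 3 scripts = 3,000 tasks handled by 4 consumers,
# instead of 3,000 separate cron entries.
tasks = (1..1000).flat_map { |user_id| %i[a b c].map { |script| [user_id, script] } }
done  = run_batch(tasks, consumer_count: 4) { |user_id, script| "#{script}:#{user_id}" }
puts done.size
```

The number of cron entries stays at one (the producer), and capacity is adjusted by changing the number of consumers.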

Importing data that may take 10-15 minutes to process, what are my options in Rails? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
I have a Rails application that displays thousands of products.
The products are loaded from product feeds, so the source may be a large XML file or web service API calls.
I want to be able to re-use my models in my existing rails application in my import process.
What are my options in importing data into my Rails application?
I could use sidekiq to fire off rake tasks, but I'm not sure if sidekiq is suitable for tasks that take 10+ minutes to run. Most use cases that I have seen are for sending emails and other similar light tasks.
I could maybe create a stand-alone ruby script, but I'm not sure how I could re-use my Rails models if I go this route.
Update
My total product count is around 30-50K items.
Sidekiq would be a great option for this, as others have mentioned. 10+ minutes isn't unreasonable as long as you understand that if you restart your Sidekiq process mid-run, that job will be stopped as well.
The concern I have is that if you are importing 50K items and you have a failure near the beginning, you'll never get to the last ones. I would suggest looking at your import routine and seeing if you can break it up into smaller components. Something like this:
1. Start the Sidekiq import job.
2. The first thing the job does is reschedule itself N hours later.
3. Fetch the data from the API/XML.
4. For each record in that result, schedule an "import this specific data" job with the data as an argument.
5. Done.
The key is the second to last step. By doing it this way your primary job has a much better chance of succeeding as all it is doing is reading API/XML and scheduling 50K more jobs. Each of those can run individually and if a single one fails it won't affect the others.
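The steps above can be sketched as follows. This is plain Ruby with an in-memory stand-in for the queue, so `FakeQueue`, `import_feed`, and the job names are all hypothetical; with Sidekiq, each `enqueue` call would correspond to scheduling a worker (for example via `perform_async` or `perform_in`).

```ruby
# In-memory stand-in for a job queue (hypothetical, for illustration only).
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(name, payload: nil, in_hours: 0)
    @jobs << { name: name, payload: payload, in_hours: in_hours }
  end
end

# Parent job: reschedule first, then fetch, then fan out one job per record.
# fetch_records is any callable returning an array of records.
def import_feed(queue, fetch_records, repeat_in_hours: 6)
  queue.enqueue(:import_feed, in_hours: repeat_in_hours)            # step 2
  records = fetch_records.call                                      # step 3
  records.each { |record| queue.enqueue(:import_record, payload: record) }  # step 4
end

queue = FakeQueue.new
feed  = -> { [{ "sku" => "A1", "price" => 10 }, { "sku" => "B2", "price" => 12 }] }
import_feed(queue, feed)
puts queue.jobs.length
```

Because the reschedule happens before the fetch, a failure while reading the feed does not stop the next scheduled run.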
The other thing to remember is that, unless you configure it not to, Sidekiq will rerun failed jobs. So make sure that the "import specific data" job can be run multiple times and still do the right thing.
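One common way to make such a job safe to rerun is to write it as an upsert keyed on a stable identifier. The sketch below assumes each record has a unique "sku" field, which is not stated in the question; `ProductStore` is a hypothetical in-memory stand-in for a Rails model with a unique index on that key.

```ruby
# Hypothetical in-memory product store keyed by SKU.
class ProductStore
  def initialize
    @products = {}
  end

  # Upsert: importing the same record twice leaves exactly one entry,
  # so a retried job does not create duplicates.
  def import(record)
    sku = record.fetch("sku")
    @products[sku] = (@products[sku] || {}).merge(record)
  end

  def count
    @products.size
  end

  def find(sku)
    @products[sku]
  end
end

store  = ProductStore.new
record = { "sku" => "A1", "price" => 10 }
3.times { store.import(record) }   # simulates the job being retried
puts store.count
```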
I have a very similar setup that has worked well for me for two years.

System for monitoring cron jobs and automated tasks? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
I have several cron-jobs and background tasks on a variety of servers. These tasks can fail for any number of reasons:
lack of disk space
processing strange, unreadable file types
logical errors/bugs in the programs
invalid cron entry
invalid json received
network connectivity failure
db locks
system library update breaks program
Why they failed to run is important, but the most important thing is knowing they failed to run.
Is there a uniform way to monitor multiple jobs, and be alerted if they fail to run at their scheduled time, for any reason? I'm using Ubuntu, the scripts are primarily in Ruby.
Note:
I'm specifically looking for a framework or system that works across multiple servers, that has alerting via email or text built in, and that can survive limited disk space. So the solution presented in "How can I setup a system to tell me if a cron job is NOT running fine?" doesn't seem applicable.
It's still under active development, but I would encourage you to take a look at https://github.com/jamesrwhite/minicron. I believe it meets all the requirements you specified and more!
Disclaimer: I'm the developer working on it.
Cronitor (https://cronitor.io) was a tool I built exactly for this purpose. It basically boils down to being a tracking beacon that uses http requests as the pings (similar to pushmon).
However, one of the needs that I had (and that pushmon and similar tools couldn't offer) was getting alerts if cron jobs started taking too long to run (or conversely if they started finishing too quickly). Cronitor solves this by allowing you to optionally trigger a begin event and an end event in order to keep track of duration.
Duration tracking was a must have for me because I had a cronjob that was scheduled every hour, but over time started taking over an hour to run. That was a disaster ;)
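As a rough illustration of duration tracking with begin and end events, here is a small plain-Ruby sketch. It is not Cronitor's API; the `RunTracker` class and its thresholds are hypothetical.

```ruby
# Tracks one job's run time between a begin event and an end event.
class RunTracker
  def initialize(min_seconds:, max_seconds:)
    @min = min_seconds
    @max = max_seconds
    @started_at = nil
  end

  def begin_run(now = Time.now)
    @started_at = now
  end

  # Returns :ok, :too_fast or :too_slow for the run that just finished.
  def end_run(now = Time.now)
    raise "end_run called before begin_run" unless @started_at

    duration = now - @started_at
    @started_at = nil
    return :too_fast if duration < @min
    return :too_slow if duration > @max

    :ok
  end
end

# An hourly job that is expected to take between 1 minute and 1 hour.
tracker = RunTracker.new(min_seconds: 60, max_seconds: 3600)
start = Time.at(0)
tracker.begin_run(start)
puts tracker.end_run(start + 4000)   # this run took longer than an hour
```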
Will http://www.pushmon.com fill your needs? It's built primarily to let you know if a cron job or scheduled task has failed to run. You can put it on any of your servers, and it has email and text alerts. The idea is that you "ping" PushMon when your job has run successfully, and PushMon will alert you if it didn't receive the ping.
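The "alert when the ping is missing" idea can be sketched in a few lines of plain Ruby. This is not PushMon's API; `HeartbeatMonitor`, the job names, and the grace period are hypothetical.

```ruby
# Records the last successful ping per job and reports jobs whose ping
# is older than the expected interval plus a grace period.
class HeartbeatMonitor
  def initialize(interval_seconds:, grace_seconds: 0)
    @allowed_gap = interval_seconds + grace_seconds
    @last_ping = {}
  end

  # Called by a job after it has finished successfully.
  def ping(job, now = Time.now)
    @last_ping[job] = now
  end

  # Names of jobs that have not pinged in time. A job that has never
  # pinged is not listed, so register each job with an initial ping.
  def overdue(now = Time.now)
    @last_ping.select { |_job, at| now - at > @allowed_gap }.keys
  end
end

monitor = HeartbeatMonitor.new(interval_seconds: 3600, grace_seconds: 300)
t0 = Time.at(0)
monitor.ping("nightly_backup", t0)
monitor.ping("feed_import", t0 + 3600)
puts monitor.overdue(t0 + 4000).inspect
```

Because the alert is triggered by the absence of a ping, it fires regardless of why the job failed, which matches the requirement in the question.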
Although it may not satisfy all your needs:
https://github.com/javan/whenever

How do I handle long requests for a Rails App so other users are not delayed too much?

I have a Rails app on a free tier on Heroku and it recently started getting some users. One of the events in my app involves querying another API and can take up to 10 seconds to finish. How do I make sure other users who visit a simple page at the same time (as another user's API event) don't need to wait 10 seconds for their page to load?
Do I need to pay for more Dynos? Is this something that can be solved with the delayed_job gem? Would another host (like AppFog or OpenShift) be able to handle simultaneous requests faster?
Update:
This question suggests manually handling threads instead of using delayed_job.
That sounds like a Delayed Job situation. If the first request is just waiting, the most efficient thing to do is assign a process to wait for it to complete and cut the Rails process loose to handle another request.
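A minimal sketch of the pattern, using only Ruby's standard library: the request handler enqueues the slow work and returns immediately, while a separate worker does the waiting. Here a thread stands in for the worker; with Delayed Job the worker would be a separate process (a worker dyno on Heroku). `BackgroundRunner` and its methods are hypothetical names.

```ruby
# Accepts slow work from request handlers and runs it on a worker thread.
class BackgroundRunner
  attr_reader :results

  def initialize
    @queue   = Queue.new
    @results = {}
    @worker  = Thread.new do
      # pop returns nil once the queue is closed and empty.
      while (job = @queue.pop)
        user_id, work = job
        @results[user_id] = work.call
      end
    end
  end

  # Called from the request cycle: enqueue and return right away.
  def handle_request(user_id, &slow_work)
    @queue << [user_id, slow_work]
    { status: 202, body: "queued" }
  end

  # Stop accepting work and wait for queued jobs to finish.
  def shutdown
    @queue.close
    @worker.join
  end
end

runner   = BackgroundRunner.new
started  = Time.now
response = runner.handle_request(7) { sleep 0.2; "api result" }   # stands in for the 10-second call
puts "responded in #{(Time.now - started).round(3)}s with #{response[:status]}"
runner.shutdown
puts runner.results[7]
```

The user who triggered the slow call gets an immediate "queued" response and can poll or be notified later, so the web process is free to serve other visitors in the meantime.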
Yes, you need more dynos, especially worker dynos, which are the ones that do work in the background. This Railscast on background jobs can also help:
http://railscasts.com/episodes/366-sidekiq
Also, here is a quick tutorial on adding Unicorn with multiple worker processes to your free Heroku instance:
https://devcenter.heroku.com/articles/rails-unicorn
You divide your dyno into two or more instances, and then each one can handle a different request.
What kind of app server are you using? If you are using Passenger or Unicorn, you can have multiple worker processes that can handle simultaneous requests:
http://www.modrails.com/documentation/Users%20guide%20Apache.html#_passengermaxinstancesperapp_lt_integer_gt

Is Amazon SQS the right choice here? Rails performance issue [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
I'm close to releasing a rails app with the common networking features (messaging, wall, etc.). I want to use some kind of background processing (most likely Bj) for off-loading tasks from the request/response cycle.
This would happen when users invite friends via email to join and for email notifications.
I'm not sure if I should just drop these invites and notifications in my database, using a model, and then process them with a worker process every x minutes, or if I should go for Amazon SQS, storing the messages and invites there and letting my worker retrieve them from Amazon SQS for processing (sending the invites / notifications).
The Amazon approach would take load off my database, but I guess it is slower to retrieve messages from there.
What do you think?
Your title states that you have a Rails performance issue, but do you know this for certain? From the rest of your question it sounds like you're trying to anticipate a possible future performance issue. The only way to deal sensibly with performance issues is to get your application into the wild and profile it. Doing so will give you empirical data as to what the real performance issues are.
Given that Amazon SQS isn't free and the fact that using it will almost certainly add complexity to your application, I would migrate to it if and when database load becomes a problem. Don't try to second guess problems before they arise, because you'll find that you'll likely face different problems when your app goes live, some of which you probably haven't considered.
The main point is that you've already decided to use background processing, which is the correct decision, given that any sort of processing that isn't instantaneous doesn't belong within the Rails request/response cycle, as it blocks that Rails process. You can always scale with Amazon later if you need to.
Is your app hosted on Amazon EC2 already? I probably wouldn't move an existing app over to AWS just so I can use SQS, but if you're already using Amazon's infrastructure, SQS is a great choice. You could certainly set up your own messaging system (such as RabbitMQ), but by going with SQS that's one less thing you have to worry about.
There are a lot of options to add background processing to Rails apps, such as delayed_job or background_job, but my personal favorite is Workling. It gives you a nice abstraction layer that allows you to plug in different background runners without having to change the actual implementation of your jobs.
I maintain a Workling fork that adds an SQS client. There are some shortcomings (read the comments or my blog post for more details), but overall it worked well for us at my last startup.
I've also used SQS for a separate Ruby (non-Rails) project and generally found it reliable and fast enough. Like James pointed out above, you can read up to 10 messages at once, so you'll definitely want to do that (my Workling SQS client does this and buffers the messages locally).
I agree with John Topley that you don't want to over-complicate your application if you don't need to. That being said, there are times when it is good to make this kind of decision early. Do you anticipate high load from the beginning? Are you rolling this out to an existing user base, or is it a public site that may or may not take off?
If you know you will need to handle a large amount of traffic from the beginning, then this might be a good step. If you don't want to spend the money to use SQS, take a look at some of the free queue solutions out there like RabbitMQ.
I currently push a couple million messages a month through SQS and it works pretty well. Make sure you plan for it being down or slow from time to time, so you will need to build in some retry facilities and exponential backoff. One of the nice things is that you can get 10 messages at a time, which speeds up working through the queue: you can use one request to get the 10 messages and process them one by one.
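A rough sketch of retry with exponential backoff combined with batched reads, in plain Ruby. Neither helper is part of any SQS client: `with_backoff`, `drain`, and the `fetch_batch` callable are hypothetical, and the callable stands in for whatever call retrieves up to 10 messages from the queue.

```ruby
# Retries the block with exponentially growing delays between attempts.
# The sleeper is injectable so callers (and tests) can avoid real waits.
def with_backoff(max_attempts: 5, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue StandardError
    raise if attempt >= max_attempts          # give up and re-raise the last error

    sleeper.call(base_delay * (2**(attempt - 1)))   # base, 2x base, 4x base, ...
    retry
  end
end

# Reads the queue in batches of up to batch_size and processes the
# messages one by one until a fetch returns an empty batch.
def drain(fetch_batch, batch_size: 10)
  processed = []
  loop do
    batch = with_backoff { fetch_batch.call(batch_size) }
    break if batch.empty?

    batch.each { |message| processed << message }
  end
  processed
end

# In-memory stand-in for a remote queue holding 25 messages.
messages = (1..25).to_a
fetch = ->(n) { messages.shift(n) }
puts drain(fetch).length
```

A production version would usually also cap the maximum delay and add some random jitter, so that many workers do not retry in lockstep.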
Amazon SQS is a fine service, except where the following things become important:
Performance
Legal
Acknowledgments and Transactions
Messaging Idioms
Message Properties
Security, Authenticity and Queue Permissions
If any of these things are important, you need to look at a real enterprise MQ service such as StormMQ, RabbitMQ, or even onlinemq.com.
I found this blog series interesting as it compares Amazon SQS to StormMQ without pulling any punches:
http://blog.stormmq.com/2011/01/06/apples-and-oranges-performance/
If you have issues with moving to EC2, you can use other services like onlinemq.com.
