Ruby Concurrency in cron job needed [closed] - ruby-on-rails

I am developing a system in Rails 4.0 whose API must handle simultaneous, continuous requests.
In this system, each user has 3 scripts that run in the background. The scripts grab the user's information from the DB, call an API repeatedly, and process transactions. Currently I am using cron jobs (via the whenever gem) to run the scripts in the background for each individual user.
My problem is that when the system has 1,000 users, I need to run 3,000 cron jobs.
I think this system will run into problems. Can anyone help me solve this?

At this point you have a system that performs some tasks periodically, and the amount of work your system has to handle (let's say, per hour) is less than the amount of work it could handle.
However, the amount of work increases with the number of users in your system so, as you have already guessed, there will be a moment when the situation will be critical. Your system will not be able to handle all the tasks it has to do.
One way to solve this problem is to add more machines: if you are currently running all your tasks on a single machine, consider adding another and splitting the job between them. You can split the job between the machines in a number of ways, but I would use a producer-consumer approach.
You will need a queue manager: your producer periodically enqueues a batch of tasks to be done (you can still use the whenever gem for that), and a number of consumers (one makes no sense; two would be OK for now, but you can increase this number later) work through the tasks one by one until none are left.
The queue manager I like most is Sidekiq, but you can find others that might match your needs better.
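A minimal sketch of that split, assuming a hypothetical UserScriptJob worker and fan-out class (the User model and the 15-minute interval are placeholders): one scheduled producer enqueues a job per user, and Sidekiq consumers drain the queue.

    # config/schedule.rb (whenever gem) -- one cron entry acts as the
    # producer, instead of one cron job per user.
    every 15.minutes do
      runner "UserScriptFanout.run"
    end

    # Hypothetical fan-out: enqueue one Sidekiq job per user.
    class UserScriptFanout
      def self.run
        User.find_each { |user| UserScriptJob.perform_async(user.id) }
      end
    end

    # The consumer: Sidekiq runs these with however many worker
    # threads/processes you configure, draining the queue until empty.
    class UserScriptJob
      include Sidekiq::Worker

      def perform(user_id)
        user = User.find(user_id)
        # fetch the user's info from the DB, call the API, process
        # the transaction...
      end
    end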

Related

Implementing an online compiler [closed]

I am doing a small project: hosting a site similar to ideone.com, i.e. an online compiler that compiles and runs code given as input. I am using Ruby on Rails for the backend.
What I have done is store the code entered in the textbox in a string, then use system calls in Ruby to create a file and write the string to it. Similarly, I store the input for the code in another file. I then use system calls again to compile and run the file, store the output in a string, and send it to the front end.
I have two problems with this method:
1) It only works for a single user at a time. Any idea how to implement it for multiple users, and if so, what would the limit on the number of users be?
2) Anyone can submit malicious code and harm the system. I need to sandbox the environment so the code runs in isolation. How can I do that?
A program running an infinite loop is not a problem, as I have put a limit on the execution time. I am using backticks to execute the shell commands. I am implementing this for C; if I manage to solve all these problems, I will extend it to other languages.
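For reference, here is a minimal sketch of the flow described above (write the source and stdin to files, compile, run with a time limit), using Open3 and the coreutils timeout command instead of bare backticks so stderr and the exit status are captured. The file names are placeholders, and nothing here sandboxes the code, which is exactly the problem raised in question 2.

    require "open3"

    # Hypothetical helper: compile and run submitted C code with a time
    # limit. NOTE: this provides no isolation at all -- see the answer
    # below about virtual machines.
    def compile_and_run(source, stdin_data, limit: 5)
      File.write("prog.c", source)
      _out, err, status = Open3.capture3("gcc", "prog.c", "-o", "prog")
      return [:compile_error, err] unless status.success?

      out, err, status = Open3.capture3("timeout", limit.to_s, "./prog",
                                        stdin_data: stdin_data)
      return [:timeout, ""] if status.exitstatus == 124 # timeout(1)'s code
      [status.success? ? :ok : :runtime_error, out + err]
    end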
For the sake of not letting people wipe out your hard drive, install spambots, etc., you will need to run all code inside a virtual machine to protect the host. This also solves the multi-user problem, since you can spin up a virtual machine for each user and spin it down after running the code. However, this might use a lot of resources on your server.
I'd be interested to find out what ideone.com does. I suspect that everything runs in the client's browser, which is obviously much safer since you can just use your server to save their code, but not actually run it. If it runs in their browser it is sandboxed anyway. Are you sure that you don't want to do this instead? I've never heard of anyone letting people upload code and then run it on the system server. Seems kind of insanely risky.

Importing data that may take 10-15 minutes to process, what are my options in Rails? [closed]

I have a Rails application that displays thousands of products.
The products are loaded from product feeds, so the source may be a large XML file or web service API calls.
I want to be able to re-use my models in my existing rails application in my import process.
What are my options in importing data into my Rails application?
I could use Sidekiq to fire off rake tasks, but I'm not sure if Sidekiq is suitable for tasks that take 10+ minutes to run. Most use cases I have seen are for sending emails and other similarly light tasks.
I could maybe create a stand-alone Ruby script, but I'm not sure how I could re-use my Rails models if I went that route.
Update
My total product count is around 30-50K items.
Sidekiq would be a great option for this, as others have mentioned. 10+ minutes isn't unreasonable, as long as you understand that if you restart your Sidekiq process mid-run, that job will be stopped as well.
The concern I have is that if you are importing 50K items and you have a failure near the beginning, you'll never get to the last ones. I would suggest looking at your import routine and seeing if you can break it up into smaller components. Something like this:
Start sidekiq import job.
The first thing the job does is reschedule itself N hours later.
Fetch the data from the API/XML.
For each record in that result, schedule an "import this specific data" job with the data as an argument.
Done.
The key is the second-to-last step. By doing it this way, your primary job has a much better chance of succeeding, as all it does is read the API/XML and schedule 50K more jobs. Each of those can run individually, and if a single one fails it won't affect the others.
The other thing to remember is that unless you configure it not to, Sidekiq will re-run failed jobs. So make sure that the "import specific data" job can be run multiple times and still do the right thing; the sketch below shows one way to structure this.
I have a very similar setup that has worked well for me for two years.
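A minimal sketch of that fan-out pattern, with hypothetical job and model names (ImportFeedJob, ImportRecordJob, Product) and the feed-fetching elided:

    # Parent job: reschedules itself first, then fans out one small,
    # idempotent job per record from the feed.
    class ImportFeedJob
      include Sidekiq::Worker

      def perform
        self.class.perform_in(6.hours)  # step 2: schedule the next run
        records = fetch_feed_records    # hypothetical: parse XML / call API
        records.each { |r| ImportRecordJob.perform_async(r) }
      end
    end

    # Child job: written so that running it twice does the right thing,
    # since Sidekiq re-runs failed jobs by default.
    class ImportRecordJob
      include Sidekiq::Worker

      def perform(record)
        product = Product.find_or_initialize_by(sku: record["sku"])
        product.update!(name: record["name"], price: record["price"])
      end
    end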

System for monitoring cron jobs and automated tasks? [closed]

I have several cron-jobs and background tasks on a variety of servers. These tasks can fail for any number of reasons:
lack of disk space
processing strange, unreadable file types
logical errors/bugs in the programs
invalid cron entry
invalid json received
network connectivity failure
db locks
system library update breaks program
Why they failed to run is important, but the most important thing is knowing they failed to run.
Is there a uniform way to monitor multiple jobs, and be alerted if they fail to run at their scheduled time, for any reason? I'm using Ubuntu, the scripts are primarily in Ruby.
Note:
I'm specifically looking for a framework or system that works across multiple servers, that has alerting via email or text built in, and that can survive limited disk space. So the solution presented in "How can I set up a system to tell me if a cron job is NOT running fine?" doesn't seem applicable.
It's still under active development, but I would encourage you to take a look at https://github.com/jamesrwhite/minicron; I believe it meets all the requirements you specified and more!
Disclaimer: I'm the developer working on it.
Cronitor (https://cronitor.io) is a tool I built exactly for this purpose. It basically boils down to a tracking beacon that uses HTTP requests as the pings (similar to PushMon).
However, one of the needs that I had (and that pushmon and similar tools couldn't offer) was getting alerts if cron jobs started taking too long to run (or conversely if they started finishing too quickly). Cronitor solves this by allowing you to optionally trigger a begin event and an end event in order to keep track of duration.
Duration tracking was a must-have for me because I had a cron job that was scheduled every hour but over time started taking more than an hour to run. That was a disaster ;)
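The begin/end pattern boils down to two HTTP pings wrapped around the task. A minimal sketch with a hypothetical monitor URL and job:

    require "net/http"

    # Hypothetical ping helper -- substitute the URLs your monitoring
    # service gives you for this job.
    def ping(event)
      Net::HTTP.get(URI("https://monitor.example.com/jobs/nightly-report/#{event}"))
    end

    ping("run")      # begin event: proves the job started on schedule
    run_the_report   # hypothetical: the actual work
    ping("complete") # end event: lets the monitor compute the duration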
Will http://www.pushmon.com fill your needs? It's built primarily to let you know if a cron job or scheduled task has failed to run. You can use it from any of your servers, and it has email and text alerts. The idea is that you "ping" PushMon when your job has run successfully, and PushMon will alert you if it didn't receive the ping.
Although it may not satisfy all your needs:
https://github.com/javan/whenever
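For context, whenever just generates crontab entries from a Ruby DSL; it does not do any monitoring itself. A typical config/schedule.rb, with hypothetical task names:

    # config/schedule.rb -- install with `whenever --update-crontab`
    every :day, at: "4:30 am" do
      rake "reports:nightly"     # hypothetical rake task
    end

    every 10.minutes do
      runner "HealthCheck.run"   # hypothetical class
    end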

Looking for suggestions on a background gem [closed]

I plan on running our web app on Heroku. I am looking for a gem to handle a few background jobs, e.g. sending emails, calling a few methods that submit files to an encoding service via an API, etc.
A few that have come to mind so far are resque and delayed_job. I hear good things about resque, and it also seems to be the more popular gem in its category. Ryan Bates has done an excellent screencast on delayed_job. However, I hear delayed_job has had a few problems, i.e. it is not very stable in some areas.
Heroku offers Redis-to-Go, which has a free plan offering 5 MB. If I go with resque, is this 5 MB plan enough to handle background jobs? I don't want to end up spending more just for background jobs.
I'm just concerned that if I went with resque, I would need another database just to run background jobs. If I were using Redis for something else, then perhaps it would be worth it. Is it worth having another database just to handle background jobs?
Should I consider alternative gems? If so which ones?
Both delayed_job and resque work fairly well. resque should scale better as the volume of background requests increases.
resque's use of redis should be limited to the task request. Large data objects that are needed by the background tasks should be stored somewhere other than the background worker queue. For example, the files being sent to a background worker to be encoded should be stored in AWS S3 or some other persistent store, not the redis queue used by resque.
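In other words, enqueue a pointer, not the payload. A minimal sketch with a hypothetical EncodeJob and S3 helpers:

    # Upload the large file first, then enqueue only its S3 key, so the
    # Redis queue holds a small reference instead of the file contents.
    s3_key = upload_to_s3(file)          # hypothetical helper
    Resque.enqueue(EncodeJob, s3_key)

    class EncodeJob
      @queue = :encoding

      def self.perform(s3_key)
        file = download_from_s3(s3_key)  # hypothetical helper
        # ...hand the file to the encoding service...
      end
    end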
When using delayed_job or resque, you will need to run background workers which cost money. You might want to look at an autoscaling solution for dynamically starting and stopping background workers as needed.
See http://s831.us/h3pKE6 as an example.
We've used delayed_job very intensively, sending hundreds of concurrent emails, and it has worked flawlessly. Yes, it'll cost $36/mo for the worker, but a single worker gets a lot of jobs done: several fairly complex emails (lots of database lookups) sent per second.

Is Amazon SQS the right choice here? Rails performance issue [closed]

I'm close to releasing a Rails app with the common social-networking features (messaging, wall, etc.). I want to use some kind of background processing (most likely Bj) for off-loading tasks from the request/response cycle.
This would happen when users invite friends via email to join, and for email notifications.
I'm not sure if I should just drop these invites and notifications into my database using a model and then process them with a worker process every X minutes, or if I should go with Amazon SQS, storing the messages and invites there and letting my worker retrieve them from SQS for processing (sending the invites/notifications).
The Amazon approach would take load off my database, but I guess it is slower to retrieve messages from there.
What do you think?
Your title states that you have a Rails performance issue, but do you know this for certain? From the rest of your question it sounds like you're trying to anticipate a possible future performance issue. The only way to deal sensibly with performance issues is to get your application into the wild and profile it. Doing so will give you empirical data as to what the real performance issues are.
Given that Amazon SQS isn't free and the fact that using it will almost certainly add complexity to your application, I would migrate to it if and when database load becomes a problem. Don't try to second guess problems before they arise, because you'll find that you'll likely face different problems when your app goes live, some of which you probably haven't considered.
The main point is that you've already decided to use background processing, which is the correct decision, given that any sort of processing that isn't instantaneous doesn't belong within the Rails' request/response cycle, as it blocks that Rails process. You can always scale with Amazon later if you need to.
Is your app hosted on Amazon EC2 already? I probably wouldn't move an existing app over to AWS just to use SQS, but if you're already using Amazon's infrastructure, SQS is a great choice. You could certainly set up your own messaging system (such as RabbitMQ), but by going with SQS that's one less thing you have to worry about.
There are a lot of options to add background processing to Rails apps, such as delayed_job or background_job, but my personal favorite is Workling. It gives you a nice abstraction layer that allows you to plug in different background runners without having to change the actual implementation of your jobs.
I maintain a Workling fork that adds an SQS client. There are some shortcomings (read the comments or my blog post for more details), but overall it worked well for us at my last startup.
I've also used SQS for a separate Ruby (non-Rails) project and generally found it reliable and fast enough. Like James pointed out above, you can read up to 10 messages at once, so you'll definitely want to do that (my Workling SQS client does this and buffers the messages locally).
I agree with John Topley that you don't want to over-complicate your application if you don't need to. That being said, there are times when it is good to make this kind of decision early: do you anticipate high load from the beginning? Are you rolling this out to an existing user base, or is it a public site that may or may not take off?
If you know you will need to handle a large amount of traffic from the beginning then this might be a good step. If you don't want to spend the money to use SQS take a look at some of the free queue solutions out there like RabbitMQ.
I currently push a couple million messages a month through SQS and it works pretty well. Make sure you plan for it being down or slow from time to time, so you'll need to build in retry facilities and exponential backoff. One of the nice things is that you can get 10 messages at a time, which speeds up working through the queue: you can use one request to fetch 10 messages and process them one by one.
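A sketch of that consume loop using the modern aws-sdk-sqs gem (which postdates this answer), batching 10 messages per request with a simple exponential backoff when SQS misbehaves; the queue URL and the process method are placeholders:

    require "aws-sdk-sqs"

    sqs = Aws::SQS::Client.new(region: "us-east-1")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
    backoff = 1

    loop do
      resp = sqs.receive_message(queue_url: queue_url,
                                 max_number_of_messages: 10, # batch of 10
                                 wait_time_seconds: 20)      # long polling
      resp.messages.each do |msg|
        process(msg.body)                                    # hypothetical
        sqs.delete_message(queue_url: queue_url,
                           receipt_handle: msg.receipt_handle)
      end
      backoff = 1
    rescue Aws::SQS::Errors::ServiceError
      sleep(backoff)                  # plan for SQS being down or slow
      backoff = [backoff * 2, 60].min # exponential backoff, capped
    end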
Amazon SQS is a fine service, except where the following things become important:
Performance
Legal
Acknowledgments and transactions
Messaging idioms
Message properties
Security, authenticity, and queue permissions
If any of these things are important, you need to look at a real enterprise MQ service such as StormMQ, RabbitMQ, or even onlinemq.com.
I found this blog series interesting, as it compares Amazon SQS to StormMQ without pulling any punches:
http://blog.stormmq.com/2011/01/06/apples-and-oranges-performance/
If you have issues with moving to EC2, you can use other services like onlinemq.com.
