Long running tasks in Rails

I have a controller that generates HTML, XML, and CSV reports. The queries used for these reports take over a minute to return their result.
What is the best approach to run these tasks in the background and then return the result to the user? I have looked into Backgroundrb. Is there anything more basic for my needs?

You could look at using DelayedJob to perform those queries for you, and have an additional table called "NotificationQueue". When a job is finished (with its result set), store the result set and the user ID of the person who made the query in the NotificationQueue table. Then, on every page load (and, if you like, every 15-20 seconds), poll that table and see if there are any completed queries.
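A rough sketch of that setup; the model and column names here are assumptions, not part of the original answer:

class NotificationQueue < ActiveRecord::Base
  # columns: user_id, report_type, result (text), completed_at (datetime)
end

# Inside the delayed job, once the slow query finishes:
NotificationQueue.create!(user_id: user.id, report_type: 'csv', result: data, completed_at: Time.now)

# On page load (or from a polling endpoint every 15-20 seconds):
NotificationQueue.where(user_id: current_user.id).where('completed_at IS NOT NULL')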
DelayedJob is really great because you write your code as if it wasn't going to be a delayed job, and just change the code to do the following:
# Your method
Query.do_something(params)

# Change to
Query.send_later(:do_something, params)
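Note that newer versions of delayed_job replace send_later with the delay proxy, so the same call becomes:

Query.delay.do_something(params)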
We use it all the time at work, and it works great.

Related

Automatically delete all records after 30 days

I have a concept for a Rails app that I want to make. I want a model whose records a user can create, with a boolean attribute. After 30 days (a month), unless the record's boolean attribute is true, the record should automatically delete itself.
In Rails 5 you have access to Active Job (http://guides.rubyonrails.org/active_job_basics.html).
There are two simple ways.
After creating the record, you could set this job to be executed after 30 days. This job checks if the record matches the specifications.
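A minimal sketch of that first approach with Active Job (MyModel and the destroyable flag are placeholder names, and whether a 30-day wait: is honored depends on your queue backend):

class RecordCleanupJob < ApplicationJob
  queue_as :default

  def perform(record_id)
    record = MyModel.find_by(id: record_id)
    # The record may already be gone, or may have been flagged to keep.
    record.destroy if record && record.destroyable?
  end
end

# Right after creating the record:
RecordCleanupJob.set(wait: 30.days).perform_later(record.id)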
The other alternative is a job that runs every day and queries the database for every record (of this specific model) created 30 days ago or more, destroying the ones that should not be kept. (If that check lives in the database, it can be as easy as: MyModel.where('created_at <= ?', 30.days.ago).where(destroyable: true).destroy_all)
There are a couple of options for achieving this:
Whenever and run a script or rake task
Clockwork and run a script or rake task
Background jobs, which in my opinion are the "Rails way".
For options 1 and 2, you need to check every day whether a record is 30 days old and delete it if its boolean isn't true (which means checking all the records, or optimizing the query to check only the 30-day-old records, etc.). With the third option you can, on record creation, schedule a job to run after 30 days and do the check for each record independently. It depends on how you process the jobs: for example, if you use sidekiq you can use scheduled jobs, or if you use resque check resque-scheduler.
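With Sidekiq, for example, that per-record scheduling is a one-liner (CheckRecordWorker is a hypothetical worker that performs the check and deletes the record):

CheckRecordWorker.perform_in(30.days, record.id)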
Performing the deletion is straightforward: create a class method (e.g. Record.prune) on the record class in question that performs the deletion based on a query, e.g. Record.where(retain: false).destroy_all, where retain is the boolean attribute you mention. I'd recommend then defining a rake task in lib/tasks that invokes this method, e.g.:
namespace :record do
  task :prune => :environment do
    Record.prune
  end
end
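A minimal sketch of that Record.prune method, assuming the boolean column is named retain and the 30-day threshold from the question:

class Record < ApplicationRecord
  # Remove records older than 30 days that were not flagged to keep.
  def self.prune
    where(retain: false).where('created_at <= ?', 30.days.ago).destroy_all
  end
end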
The scheduling is more difficult; a crontab entry is sufficient to provide the correct timing, but ensuring that an appropriate environment (e.g. one that has loaded rbenv/rvm and any required environment variables) is available is harder. Ensuring that your deployment process produces binstubs is probably helpful here; from there, bin/rake record:prune ought to be enough. It's hard to provide a more in-depth answer without more knowledge of all the environments in which you hope to accomplish this task.
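If you use the Whenever gem mentioned earlier, that crontab entry can be generated from a Ruby schedule file (a sketch; the run time is arbitrary):

# config/schedule.rb
every 1.day, at: '3:00 am' do
  rake 'record:prune'
end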
I want to mention a non-Rails approach. It depends on your database. If you use MongoDB, you can utilize its "Expire Data from Collections" (TTL) feature. If you use MySQL, you can utilize the MySQL Event Scheduler. You can find a good example here: What is the best way to delete old rows from MySQL on a rolling basis?

How to create unique delayed jobs

I have a method like this one
def abc
  # some stuff here
end
handle_asynchronously :abc, queue: :xyz
I want to create a delayed job for this only if there isn't one already in the queue.
I really feel like this should have an easy solution
Thanks!
I know this post is old, but it hasn't been answered yet.
Delayed Job does not provide a way to identify jobs: https://github.com/collectiveidea/delayed_job/issues/192
My suggestion is to have the job check whether it still needs to run when it executes, for example by comparing against a database value. Inserting jobs into the table should stay fast, and you would lose that speed if you started scanning the queue for a matching job on every enqueue.
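A sketch of that run-time check, reusing the method from the question; stale? is a hypothetical predicate backed by a database value:

def abc
  return unless stale?  # e.g. compare a timestamp column against the last run
  # some stuff here
end
handle_asynchronously :abc, queue: :xyz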
If you still want to look for duplicates when enqueuing, this might help you.
https://gist.github.com/landovsky/8c505ecab41eb38fa1c2cd23058a6ae3

Optimising export of DB using Rails

I have a RoR application which contains an API to manage applications, each of which contains recipes (and groups, ingredients, measurements).
Once the user has finished managing the recipes, they download a JSON file of the entire application. Because each application could have hundreds of recipes, the files can be large. It also means there are a lot of DB calls to get all the required data to export.
Now because of this, the request to download the application can take upwards of 30 seconds, sometimes more.
My current code looks something like this:
application.categories.each do |c|
  c.recipes.each do |r|
    r.groups.each do |g|
      g.ingredients.each do |i|
        # build the hash for this ingredient here
      end
    end
  end
end
Within each loop I'm storing the data in a hash and then giving it to the user.
My question is: where do I go from here?
Is there a way to grab all the data I require from the DB in one query? From looking at the log, I can see it is running hundreds of queries.
If the above solution is still slow, is this something I should put into a background process, and then email the user a link (or similar)?
There are of course ways to grab more data at once. This is done with Rails includes or joins, depending on your needs. See this article for some detailed information.
The basic idea is that you can join between your tables so that new queries aren't generated each time. When you do application.categories, that's one query. For each of those categories, you'll do another query: c.recipes - this creates N+1 queries, where N is the number of categories you have. Instead, you can eager-load them from the start, producing 1 or 2 queries (depending on what Rails does).
The basic syntax is easy:
Application.includes(:categories => :recipes).each do |application| ...
This generates 1 (or 2 - again, see the article) queries that grab all applications, their categories, and each category's recipes all at once. You can tack on the groups and ingredients too.
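Extending that to the full depth looks something like this (a sketch; the nested association names are assumed from the question):

Application.includes(categories: { recipes: { groups: :ingredients } }).each do |application|
  # categories, recipes, groups, and ingredients are all preloaded; no extra queries below
end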
As for putting the work in the background, my suggestion would be to just have a loading image, or get fancy by using a progress bar.
First of all I have to assume that the required has_many and belongs_to associations exist.
Generally you can do something like
c.recipes.includes(:groups)
or even
c.recipes.includes(:groups => :ingredients)
which will fetch recipes and groups (and ingredients) at once.
But since you have quite a big data set, IMO it would be better to limit that technique to the deepest levels.
The most useful approach would be to use find_each and includes together
(find_each fetches the items in batches in order to keep memory usage low),
perhaps something like:
application.categories.each do |c|
  c.recipes.find_each do |r|
    r.groups.includes(:ingredients).each do |g|
      g.ingredients.each do |i|
        # ...
      end
    end
  end
end
Now even that can take quite a long time (for an HTTP request), so you can consider some async processing: the client issues a request that the server processes as a background job, and when the result is ready you provide a download link (or send an email) to the client.
Resque is one possible solution for handling the async part.
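A rough sketch of that flow with Resque; ApplicationExporter and ExportMailer are hypothetical stand-ins for your existing export and notification code:

class ExportJob
  @queue = :exports

  def self.perform(application_id)
    application = Application.find(application_id)
    json = ApplicationExporter.new(application).to_json  # the existing export logic
    ExportMailer.export_ready(application, json).deliver  # or store the file and mail a link
  end
end

# Enqueue from the controller instead of rendering the JSON inline:
Resque.enqueue(ExportJob, application.id)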

Why would large job objects miss the queue?

In a Rails project with delayed_job, I am running into something strange.
I have an Article model whose records can be quite large, with many paragraphs of text in several fields.
If I do:
an_article.delay.do_something
the Delayed::Job that's created never makes it to the queue; it's never marked as failed or successful, and my logs don't acknowledge its existence. However, if I do:
def self.proxy_method(article_id)
  an_article = Article.find(article_id)
  an_article.do_something
end

Article.proxy_method(an_article.id)
it works as intended.
Is there some unwritten rule about the size of job objects? Why would A not work, but B work?
One theory I have is that, because I'm sort of close to my data cap on MongoLab (430/496 MB), the job never makes it to the DB, but I have no log or error to really prove this.
NOTE: delayed_job using mongoid on heroku, rails 3.1
From what I have experienced: never push whole objects onto the queue; pass object IDs instead. Serializing large objects into the job is problematic and very hard to debug.
I have written a detailed post on this topic at:
https://stackoverflow.com/a/15672001/226255

How do I run delayed job inserts in the background without affecting page load

I have an RoR application for posting answers to questions. When a user answers a question, notification messages are sent to all the users who watch-listed the question, to those who track the question, and to the owner of the question. I am using delayed jobs to create the notification messages, so while the answer is being created there are many inserts into the delayed_jobs table, which slows down the page load. It takes more time to redirect to the question show page after the answer is created.
Currently I am inserting into the answers table using an AJAX request. Is there any way to insert into the delayed_jobs table in the background after the AJAX request completes?
As we have been trying to say in comments:
It sounds like you have something like:
User.all.each do |user|
  user.delay.some_long_operation
end
This ends up inserting a lot of rows into delayed_jobs. What we are suggesting is to refactor that code into the delayed job itself, roughly:
def delayed_operation
  User.all.each do |user|
    user.some_long_operation
  end
end

self.delay.delayed_operation
Obviously, you'll have to adapt that, and probably put the delayed_operation into a model library somewhere, maybe as a class method... but the point is to put the delay call outside the big query and loop.
I really advise handling this in a separate process. Why should the user have to wait for these meta-actions? Stick to delivering the result page, and only notify your server that something has to be done.
Create a separate model PostponedAction to build a list of 'to-do' actions. When an answer is posted, add one PostponedAction to this table with the answer id as a parameter, then return the results to the user.
Use a separate process (cron job) to read the PostponedAction items and handle them, marking them as 'handled' (or deleting them) on successful handling. This way, the user is not bothered by slow server processes.
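A rough sketch of that model and the cron-driven handler; the column names and the notify_watchers method are assumptions:

class PostponedAction < ActiveRecord::Base
  # columns: answer_id, handled_at (nil until processed)
end

# In the controller, right after saving the answer: a single fast insert.
PostponedAction.create!(answer_id: @answer.id)

# In a rake task run by cron:
PostponedAction.where(handled_at: nil).find_each do |action|
  Answer.find(action.answer_id).notify_watchers  # hypothetical notification method
  action.update!(handled_at: Time.now)
end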
Besides the email jobs you currently have, invent another type of job that handles the creation of those jobs.
def email_all
  User.all.each do |user|
    user.delay.email_one
  end
end

def email_one
  # do the emailing
end

self.delay.email_all
This way the user action only triggers one insert before they see the response. You can also track individual jobs.
