ActiveRecord find records in intervals - ruby-on-rails

I want to execute some code for every user in my database. For this I created a script and run it with rails runner myscript.rb:
User.all.each do |user|
  # code on user
end
If I have 500 users in my database this works, but if I have, say, 4000 users, some of the code inside the .each starts failing because of the sheer number of records. (These numbers are made up.)
For this reason I want to run the script over the users in intervals of 500.
How can I process 500 users at a time until there are no more users?

Use find_each:
Looping through a collection of records from the database (using the Scoping::Named::ClassMethods.all method, for example) is very inefficient since it will try to instantiate all the objects at once.
In that case, batch processing methods allow you to work with the records in batches, thereby greatly reducing memory consumption.
You want to say:
User.find_each do |user|
  # code on user
end
That will load the users 1000 at a time and hand them one by one to your "code on user" block. You can specify different chunk sizes if you'd like.
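For example, to match the batches of 500 asked about in the question, pass batch_size (this is the standard ActiveRecord::Batches option; the block body is just a placeholder):

User.find_each(batch_size: 500) do |user|
  # code on user
end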

You can do it with a while loop, setting the user object to a random unprocessed record each time. That way the script keeps working until there are no users left, then quits. You do need to add a flag column to your table, though.
Let's imagine you are trying to send an e-mail to every user; you would add a boolean field to use as the flag, something like email_sent.
Your user object then becomes User.where(email_sent: false).sample, and you set the flag after handling each user (see the sketch below).
Another approach is to queue a job per user. Loop through your users, keep the loop index, and schedule each job index * amount_of_time seconds in the future. So if you have 1000 users and use a 2-second buffer per index, it will take about 2000 seconds to work through all the queued jobs.
There are other solutions as well. If you can explain in more detail what you are looking for and what you need in practice, I can suggest a better one.
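A minimal sketch of the flag-based loop, assuming a boolean email_sent column on users (the column and the mailer call are placeholders, not something from the question):

# Keep picking a random unprocessed user, mark it as done, and stop
# once none are left. Note that .sample loads the remaining matching
# users into memory on every iteration.
while (user = User.where(email_sent: false).sample)
  UserMailer.welcome(user).deliver  # hypothetical: replace with your code on user
  user.update_attribute(:email_sent, true)
end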

Related

Need advice: how to handle huge data to summarise a report in PHP

I am looking for advice on how to handle the following situation.
I have report which shows list of products; each product has a number of times it has been viewed and also the number of times the order has been requested.
Looking into the DB, I feel the setup is not good. There are three tables participating:
product
product_view
order_item
The following SELECT query is executed
SELECT product_title,
  (SELECT COUNT(views) FROM product_view pv WHERE p.pid = pv.pid) AS product_view,
  (SELECT COUNT(placed) FROM order_item o WHERE p.pid = o.pid) AS product_request_count
FROM product p
ORDER BY product_title
LIMIT 0, 10
This query returns 10 records successfully; however, it is very slow to load. Also, when the user uses the export functionality, approximately 2,000,000 records would be returned, and I get a memory-exhausted error.
I have not been able to find a suitable solution for this in ZF2 [PHP + MySQL].
Can someone suggest a good strategy to deal with this?
How about using background processes? It doesn't have to be purely ZF2.
Once the background process is done, the system can notify the user via email that the export is finished. :)
You can:
call set_time_limit(0) to lift the execution time limit, and
loop through the whole result set in lumps of, say, 1000 records, outputting the results to the user sequentially (a rough sketch of the batched idea follows).
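The rest of this thread is Rails, so purely to illustrate the same lump-by-lump idea in that stack (the Product model and product_title column mirror the question's product table; everything else is made up):

require 'csv'

# Write the export in batches of 1000 so the full multi-million-row
# result set is never held in memory at once.
CSV.open("export.csv", "w") do |csv|
  Product.find_in_batches(batch_size: 1000) do |batch|
    batch.each do |product|
      csv << [product.id, product.product_title]
    end
  end
end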

How do I optimise getting and updating the id for 500000 records?

I have a CSV file that contains data like the id of the user, unit and size.
I want to update member_id for 500,000 products:
500000.times do |i|
  user = User.find_by(id: tmp[i])
  hash = {
    unit: tmp[UNIT],
    size: tmp[SIZE]
  }
  hash.merge!(user_id: user.id) if user.present?
  Product.create(hash)
end
How do I optimize this procedure so it doesn't look up each User object individually, but perhaps works from an array of related hashes instead?
There are two things here that are massively holding back performance. First, you're doing N individual User.find calls, which gets out of control quickly. Second, you're creating records one at a time instead of doing a mass insert, so each create runs inside its own tiny transaction block.
Generally these sorts of bulk operations are better done purely in the SQL domain. You can insert a very large number of rows at the same time, often only limited by the size of the query you can submit, and that parameter is usually adjustable.
While a gigantic query may lock or block your database for a period of time, it will be the fastest way to do your updates. If you need to keep your system running during mass inserts, you'll need to break it up into a series of smaller commits.
Remember that Product.connection is a lower-level access layer that lets you manipulate the data directly with queries (see the sketch below).
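A minimal sketch of a multi-row insert through the raw connection; the exact shape of tmp in the question is unclear, so the row[...] accessors and column names below are placeholders (on Rails 6+ the built-in Product.insert_all does something very similar):

conn = Product.connection

# One multi-row INSERT per 1000 CSV rows instead of 500,000 single-row INSERTs.
tmp.each_slice(1000) do |slice|
  values = slice.map do |row|
    "(#{conn.quote(row[USER_ID])}, #{conn.quote(row[UNIT])}, #{conn.quote(row[SIZE])})"
  end
  conn.execute("INSERT INTO products (user_id, unit, size) VALUES #{values.join(', ')}")
end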

Mass updating many records in Rails with Resque?

If I had to update 50,000 users, how would I go about it in a way that is best with a background processing library and not a N+1 issue?
I have users, membership, and points.
Memberships are tied to total point values. If a membership's point values are modified, I have to run through all of the users to update them to the proper membership. This is what I need to queue so the server isn't hanging for 30+ minutes.
Right now I have this in a controller action:
def update_memberships
  User.find_each do |user|
    user.update_membership_level! # looks up the Membership defined by x points, assigns it to the user, then saves the user
  end
end
This is a very expensive operation. How can I move the processing into the background so the POST from the form returns near-instantaneously?
You seem to be after how to get this done with Resque or delayed_job. I'll give an example with delayed_job.
To create the job, add a method to app/models/user.rb:
def self.process_x_update
  User.where("z = 1").find_each(:batch_size => 5000) do |user|
    user.x = user.y + 3
    user.save
  end
end
handle_asynchronously :process_x_update
This will update all User records where z = 1, setting user.x = user.y + 3. It works through the records in batches of 5,000, so memory use stays flat and performance is a bit more predictable.
Calling User.process_x_update will now return very quickly. To actually process the job, you should be running rake jobs:work in the background or start a cluster of daemons with ./script/delayed_job start.
One other thing: can you move this logic to one SQL statement? That way you could have one statement that's fast and atomic. You'd still want to do this in the background as it could take some time to process. You could do something like:
def self.process_x_update
  User.where("z = 1").update_all("x = y + 3")
end
handle_asynchronously :process_x_update
You're looking for update_all. From the docs:
Updates all records with details given if they match a set of conditions supplied, limits and order can also be supplied.
It'll probably still take a while on the SQL side, but you can at least do it with one statement. Check out the documentation to see usage examples.
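For instance, applied to the membership example in the question (the points column and min_points/max_points thresholds are assumed, not taken from the original schema):

# One UPDATE statement per membership level instead of one save per user.
Membership.find_each do |membership|
  User.where(points: membership.min_points..membership.max_points)
      .update_all(membership_id: membership.id)
end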

Display a record sequentially with every refresh

I have a Rails 3 application that currently shows a single "random" record with every refresh; however, it repeats records too often, or never shows a particular record at all. I was wondering what a good way would be to loop through the records and display them so that all get shown before any are repeated. I was thinking of somehow using cookies or session IDs to sequentially loop through the record IDs, but I'm not sure whether that would work right, or exactly how to go about it.
The database consists of a single table with a single column, and currently only about 25 entries, but more will be added. IDs are generated automatically and are sequential.
Some suggestions would be appreciated.
Thanks.
The funny thing about 'random' is that it doesn't usually feel random when you get the same answer twice in short succession.
The usual answer to this problem is to generate a queue of responses, and make sure when you add entries to the queue that they aren't already on the queue. This can either be a queue of entries that you will return to the user, or a queue of entries that you have already returned to the user. I like your idea of using the record ids, but with only 25 entries, that repeating loop will also be annoying. :)
You could keep track of the queue of previous entries in memcached if you've already got one deployed or you could stuff the queue into the session (it'll probably just be five or six integers, not too excessive data transfer) or the database.
I think I'd avoid the database, because it sure doesn't need to be persistent, it doesn't need to take database bandwidth or compute time, and using the database just to keep track of five or six integers seems silly. :)
UPDATE:
In one of your controllers (maybe ApplicationController), add something like this to a method that you run in a before_filter:
class ApplicationController < ActionController::Base
  before_filter :find_quip

  def find_quip
    last_quip_id = session[:quip_id] || Quips.first.id
    next_quip = Quips.find_by_id(last_quip_id + 1) || Quips.first
    session[:quip_id] = next_quip.id
  end
end
I'm not so happy with the code to wrap around when you run out of quips; if there is ever a hole in the ID sequence it will wrap back to the start too early. Which is probably going to happen someday. And it will start on number 2. But I'm getting too tired to sort it out. :)
If there are only going to be a couple of dozen records, as you say, you could store the entire array of IDs as a session variable, together with another variable for the current index, and loop through them sequentially, incrementing the index (sketched below).
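A minimal sketch of that idea, reusing the Quips model from above (pluck needs Rails 3.2+; on older versions use select(:id).map(&:id)):

def next_quip
  # Build a shuffled list of IDs once per session, plus a cursor into it.
  session[:quip_ids]   ||= Quips.pluck(:id).shuffle
  session[:quip_index] ||= 0

  quip = Quips.find(session[:quip_ids][session[:quip_index]])

  # Advance the cursor; reshuffle once every record has been shown.
  session[:quip_index] += 1
  if session[:quip_index] >= session[:quip_ids].length
    session[:quip_ids]   = Quips.pluck(:id).shuffle
    session[:quip_index] = 0
  end

  quip
end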

Rails Random Active Record with Pagination

I need to find all records for a particular resource and display them in a random order, but with consistent pagination (you won't see the same record twice if you start paging). The display order should be randomized each time a user visits a page. I am using will_paginate. Any advice?
Store a random number in the user's session cookie, then use it as the seed for your database's random function. The seed stays the same until the user closes their browser, so they will see random but consistent records:
Get a large, random number:
cookies[:seed] = SecureRandom.random_number.to_s[2..20].to_i
Use this seed with e.g. MySQL:
SomeModel.order("RAND(#{cookies[:seed].to_i})")
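Combined with will_paginate (which the question already uses), a controller action might look roughly like this; the per_page value is arbitrary:

# The seed is generated once per browser session, so every page of
# results uses the same shuffled order.
cookies[:seed] ||= SecureRandom.random_number.to_s[2..20].to_i
@records = SomeModel.order("RAND(#{cookies[:seed].to_i})")
                    .paginate(page: params[:page], per_page: 20)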
This is not standard functionality as far as I know. I can see a use for it, for instance in online tests.
I would suggest using a list per session/user. When a user first goes to the page, you determine a list of IDs in a random order, and on all subsequent views you use that list to show the correct order for that user/session.
I hope the number of rows is limited; then this makes sense, for instance for tests. Also, when a user leaves a test before finishing it completely, they could continue where they left off. But maybe that is not relevant for you.
Hope this helps.
If you're using a database such as MySQL that has a randomize function such as RAND(), you can just add that to your pagination query like so:
Resource.paginate( ... :order => "RAND()" ... )
Check out some of the comments here regarding performance concerns: https://rails.lighthouseapp.com/projects/8994/tickets/1274-patch-add-support-for-order-random-in-queries
Not sure if you still need help with this. One solution I've used in the past is to run the query with RAND but without pagination at first, then store those record IDs and use that stored list to look up and paginate from there. The initial RAND query could be set to run only when the page is 1 or nil (rough sketch below). Just a thought.
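A rough sketch of that approach, assuming a Resource model; the session key, per-page size and sorting step are made up for illustration:

per_page = 10
page     = (params[:page] || 1).to_i

# Shuffle and remember the full ID order on the first page;
# later pages just slice the remembered list.
if page == 1 || session[:shuffled_ids].blank?
  session[:shuffled_ids] = Resource.order("RAND()").pluck(:id)
end

page_ids   = session[:shuffled_ids].slice((page - 1) * per_page, per_page) || []
@resources = Resource.where(id: page_ids).sort_by { |r| page_ids.index(r.id) }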
I ended up with this solution, which worked for me on Postgres:
session[:seed] ||= rand  # setseed expects a value between -1 and 1; rand returns [0, 1)
seed = session[:seed]
Product.select("setseed(#{seed})").first
Product.order('random()').limit(10).offset(params[:offset])
