Ruby where with find_each - ruby-on-rails

I am looking at the official Rails documentation, which shows how to use the "find_each" method. Here is the example they give:
Person.where("age > 21").find_each do |person|
  person.party_all_night!
end
This processes 1000 records at a time. However, I am still confused. How does this translate to SQL? What happens behind the scenes that allows Ruby to only process 1000 records at a time?
The reason I am confused is that it seems Person.where("age > 21") would execute first, which would return ALL results.
For instance:
Person.where("age > 21").limit(10)
would load all persons into memory first, then give you the first 10, right?

Person.where("age > 21") returns an ActiveRecord relation only. It doesn't return all the results.
Person.where("age > 21").limit(10) does NOT load all the models into memory; that would be awful and unusable. It just loads 10, because the LIMIT is part of the SQL sent to the database.
find_each doesn't really process 1000 records at a time. It loads 1000 records, and then processes each one of them.
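To see the "a relation is not the results" idea without a database, here is a toy sketch in plain Ruby. ToyRelation is hypothetical and far simpler than ActiveRecord::Relation. Its where and limit only return new relation objects that remember the conditions, and nothing "runs" until something iterates over the relation.

```ruby
# ToyRelation is a hypothetical, heavily simplified stand-in for
# ActiveRecord::Relation. Chaining builds new relations; only #each runs
# the "query".
class ToyRelation
  include Enumerable

  attr_reader :executions

  def initialize(rows, conditions = [], limit = nil)
    @rows = rows              # plays the role of the database table
    @conditions = conditions  # accumulated WHERE clauses
    @limit = limit            # LIMIT, if any
    @executions = 0           # how many times this relation actually "ran"
  end

  # Like where: returns a new relation with one more condition; runs nothing.
  def where(&condition)
    ToyRelation.new(@rows, @conditions + [condition], @limit)
  end

  # Like limit: returns a new relation with a row cap; still runs nothing.
  def limit(count)
    ToyRelation.new(@rows, @conditions, count)
  end

  # The "SQL" runs only here. Filtering and limiting happen on the
  # "database" side, so only the limited rows reach the caller.
  def each(&block)
    @executions += 1
    matches = @rows.select { |row| @conditions.all? { |c| c.call(row) } }
    matches = matches.first(@limit) if @limit
    matches.each(&block)
  end
end

people = ToyRelation.new([{ name: "Ann", age: 30 }, { name: "Bob", age: 18 }, { name: "Cy", age: 45 }])
adults = people.where { |person| person[:age] > 21 }.limit(1)
puts adults.executions                           # 0: building the chain ran no query
puts adults.map { |person| person[:name] }.inspect # ["Ann"]
puts adults.executions                           # 1
```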

I'd suggest running this from the console and looking at the SQL or reading the source code.
For example:
User.find_each(:batch_size => 40) do |user| end
User Load (1.0ms) SELECT "users".* FROM "users" WHERE ("users"."id" >= 0) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 96) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 156) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 219) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 272) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 314) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 355) ORDER BY "users"."id" ASC LIMIT 40
Or run:
bundle show activerecord
then point your favorite code editor at that location and find the source.
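The pattern in that log is keyset pagination. Each batch asks for the next batch_size rows whose id is greater than the last id seen, ordered by id. Here is a minimal plain-Ruby sketch that reproduces the shape of those queries over a hypothetical list of ids. keyset_batch_queries is a made-up helper, and the exact SQL varies between Rails versions; for example, the log above starts with id >= 0, while newer versions omit the WHERE on the first batch.

```ruby
# Hypothetical helper: given the ids present in a table, list the SQL that a
# keyset-paginated batch loop would send, in order.
def keyset_batch_queries(ids, batch_size)
  sorted = ids.sort
  queries = []
  last_id = nil # nil means "first batch": no WHERE on id yet

  loop do
    parts = [%(SELECT "users".* FROM "users")]
    parts << %(WHERE "users"."id" > #{last_id}) if last_id
    parts << %(ORDER BY "users"."id" ASC LIMIT #{batch_size})
    queries << parts.join(" ")

    # What the database would return for that query.
    batch = sorted.select { |id| last_id.nil? || id > last_id }.first(batch_size)
    break if batch.size < batch_size # a short (or empty) batch means we're done

    last_id = batch.last # the next query starts after the highest id seen
  end

  queries
end

puts keyset_batch_queries([3, 7, 12, 19, 25], 2)
```

Because each query filters on "id > last id" instead of using OFFSET, gaps in the ids (like the jumps from 96 to 156 in the log) don't matter.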

There is a lovely feature of Ruby called code blocks. What makes it really great is that every method can silently receive a code block as an implicit last parameter, and it can check at runtime whether a block was given with if block_given?.
I guess you are wondering why Ruby returns data with where alone but only prepares the query with a where.whatever chain? Well, ActiveRecord implicitly checks whether a code block was given. It either executes the underlying SQL statement and iterates through the result, or returns an iterator holding the prepared but not yet executed SQL statement. The latter is executed lazily, on demand, and its result is cached. The same practice is used in, say, Array#each. Behind the scenes, something like this is performed:
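sql_prepare                          # placeholder: build the SQL, don't run it yet
if block_given?
  @cache = sql_execute_and_cache     # placeholder: run the SQL once, keep the rows
  @cache.each { |row| yield row }    # hand each row to the caller's block
end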
Hope it helps.
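Here is the block_given? pattern in runnable plain Ruby. It is an analogy rather than ActiveRecord's actual code: RecordSource, each_record, and the fetches counter are hypothetical. In one respect ActiveRecord really does behave this way: calling find_each without a block returns an Enumerator, just as Array#each does.

```ruby
# Hypothetical record source (not ActiveRecord) using the block_given? idiom:
# with a block it "runs the query" and yields each row; without one it
# returns an Enumerator that does nothing until it is consumed.
class RecordSource
  attr_reader :fetches

  def initialize(rows)
    @rows = rows
    @fetches = 0 # counts simulated SQL executions
  end

  def each_record
    # No block given: hand back a lazy Enumerator instead of doing work now.
    return to_enum(:each_record) unless block_given?

    @fetches += 1 # stands in for executing the SQL
    @rows.each { |row| yield row }
  end
end

source = RecordSource.new(%w[a b c])
enum = source.each_record    # no block: nothing fetched yet
puts source.fetches          # 0
puts enum.to_a.inspect       # ["a", "b", "c"]
puts source.fetches          # 1
```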

Related

Alternative 'all' method when retrieving all records

I am writing a rake task to sync data from a Rails app to Solr.
I use Job.all to get all the records and process them afterwards. But what if Job has millions of records? As far as I know, the all method loads all records into RAM, which definitely affects performance. Is there any other way to process all records without using the all method?
Look at find_in_batches; it will not load all the records at once.
By default it loads 1,000 rows per batch, which helps reduce memory consumption.
2.5.1 :013 > User.find_in_batches do |group, batch|
2.5.1 :014 > puts "Processing group ##{batch}"
2.5.1 :015?> end
User Load (7.5ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
Processing group ##<User:0x00005596f8ef44c8>
User Load (5.9ms) SELECT "users".* FROM "users" WHERE "users"."id" > $1 ORDER BY "users"."id" ASC LIMIT $2 [["id", 1002], ["LIMIT", 1000]]
Processing group ##<User:0x00005596f908e900>
User Load (5.0ms) SELECT "users".* FROM "users" WHERE "users"."id" > $1 ORDER BY "users"."id" ASC LIMIT $2 [["id", 2002], ["LIMIT", 1000]]
Processing group ##<User:0x00007f36a4f1a428>
User Load (6.7ms) SELECT "users".* FROM "users" WHERE "users"."id" > $1 ORDER BY "users"."id" ASC LIMIT $2 [["id", 3004], ["LIMIT", 1000]]
Processing group ##<User:0x00007f36a4b82590>
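Note that find_in_batches yields a single argument per batch: an Array of records. With a block like |group, batch|, Ruby splats that array, so group becomes the first record and batch the second. That is why the transcript prints a User object where a batch number was expected. The Rails API documentation's version of this example chains .with_index to get a batch number. Here is a plain-Ruby sketch of the behavior; each_batch is a hypothetical stand-in built on each_slice, not the Rails method.

```ruby
# Hypothetical stand-in for find_in_batches over an in-memory list: like the
# real method, it yields ONE argument per batch, an Array of records.
def each_batch(records, batch_size = 1000)
  return enum_for(:each_batch, records, batch_size) unless block_given?

  records.each_slice(batch_size) { |batch| yield batch }
end

users = (1..5).map { |id| "user-#{id}" }

# One block parameter receives the whole batch.
each_batch(users, 2) { |group| puts "Processing a group of #{group.size}" }

# Two block parameters: Ruby splats the single Array argument, so `group`
# is the first record and `batch` the second (or nil), not a batch number.
each_batch(users, 2) { |group, batch| puts "group=#{group} batch=#{batch.inspect}" }

# Chaining with_index on the Enumerator gives a real batch number.
each_batch(users, 2).with_index(1) do |group, batch|
  puts "Processing group ##{batch} (#{group.size} users)"
end
```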

Handling a massive query in Rails

What's the best way to handle a large result set with Rails and Postgres? I didn't have a problem until today, but now I'm trying to return a 124,000-record object of @network_hosts, which has effectively DoS'd my development server.
My ActiveRecord code isn't the prettiest, but I'm pretty sure cleaning it up isn't going to help with performance.
@network_hosts = []
@host_count = 0
@company.locations.each do |l|
  if l.grace_enabled == nil || l.grace_enabled == false
    l.network_hosts.each do |h|
      @host_count += 1
      @network_hosts.push(h)
      @network_hosts.sort! { |x, y| x.ip_address <=> y.ip_address }
      @network_hosts = @network_hosts.first(5)
    end
  end
end
In the end, I need to be able to return @network_hosts to the controller for processing into the view.
Is this something that Sidekiq would be able to help with, or is it going to take just as long? If Sidekiq is the path to take, how do I handle not having the @network_hosts object on page load, since the job is running asynchronously?
I believe you want to (1) get rid of all that looping (you've got a lot of queries going on) and (2) do your sorting with your AR query instead of in the array.
Perhaps something like:
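# grace_enabled: [nil, false] also matches NULL; where.not(grace_enabled: true) would skip NULL rows in SQL
NetworkHost.
  where(location: Location.where(grace_enabled: [nil, false]).where(company: @company)).
  order(ip_address: :asc).
  tap do |network_hosts|
    @network_hosts = network_hosts.limit(5)  # lazy: runs when the view iterates
    @host_count = network_hosts.count        # runs a COUNT query right away
  end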
Something like that ought to do it in two DB queries (a COUNT and a SELECT with LIMIT 5) instead of a query per location.
I had to make some assumptions about how your associations are set up and that you're looking for locations where grace_enabled isn't true (nil or false).
I haven't tested this, so it may well be buggy. But, I think the direction is correct.
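For reference, here is what the original loop is trying to compute, done once instead of re-sorting inside the inner loop. This is a plain-Ruby sketch over hypothetical Location and Host structs, not ActiveRecord. It assumes "grace disabled" means nil or false. It also sorts IP addresses numerically with Ruby's standard IPAddr class, which is an assumption about the intended order: plain string sorting would put 10.0.0.10 before 10.0.0.9.

```ruby
require "ipaddr"

# Hypothetical plain structs standing in for the Location and NetworkHost models.
Location = Struct.new(:name, :grace_enabled, :hosts)
Host = Struct.new(:ip_address)

# Count hosts in locations without grace enabled (nil or false) and return
# the `n` lowest IP addresses, sorting once instead of inside the loop.
def top_hosts(locations, n = 5)
  eligible = locations.select { |l| [nil, false].include?(l.grace_enabled) }
  hosts = eligible.flat_map(&:hosts)
  lowest = hosts.sort_by { |h| IPAddr.new(h.ip_address).to_i }.first(n)
  [hosts.size, lowest]
end

locations = [
  Location.new("hq", nil, [Host.new("10.0.0.10"), Host.new("10.0.0.9")]),
  Location.new("lab", true, [Host.new("10.0.0.1")]),
  Location.new("branch", false, [Host.new("10.0.0.2")])
]
count, lowest = top_hosts(locations, 2)
puts count                             # 3
puts lowest.map(&:ip_address).inspect  # ["10.0.0.2", "10.0.0.9"]
```

In the database version, the same selection and ordering happen in SQL, so only five rows and one count ever reach Ruby.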
Something to remember: Rails won't execute any SQL queries until the result of the query is actually needed. (I'll be using User instead of NetworkHost so I can show you the console output as I go.)
@users = User.where(first_name: 'Random'); nil # No query run
=> nil
@users # query is now run because the results are needed (they are being output to the IRB window)
# User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 LIMIT $2 [["first_name", "Random"], ["LIMIT", 11]]
# => #<ActiveRecord::Relation [...]>
@users = User.where(first_name: 'Random') # query will be run because the results are needed for the output into the IRB window
# User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 LIMIT $2 [["first_name", "Random"], ["LIMIT", 11]]
# => #<ActiveRecord::Relation [...]>
Why is this important? It allows you to store the query you want to run in the instance variable and not execute it until you get to a view where you can use some of the nice methods of ActiveRecord::Batches. In particular, if you have some view (or export function, etc.) where you are iterating over @network_hosts, you can use find_each.
# Controller
@users = User.where(first_name: 'Random') # No query run
# view
@users.find_each(batch_size: 1) do |user|
  puts "User's ID is #{user.id}"
end
# User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 ORDER BY "users"."id" ASC LIMIT $2 [["first_name", "Random"], ["LIMIT", 1]]
# User's ID is 1
# User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 AND ("users"."id" > 1) ORDER BY "users"."id" ASC LIMIT $2 [["first_name", "Random"], ["LIMIT", 1]]
# User's ID is 2
# User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 AND ("users"."id" > 2) ORDER BY "users"."id" ASC LIMIT $2 [["first_name", "Random"], ["LIMIT", 1]]
# => nil
Your query is not executed until the view, where it will now load only 1,000 records (configurable via batch_size) into memory at a time. Once it reaches the end of those 1,000 records, it will automatically run another query to fetch the next 1,000. So your memory use is much saner, at the cost of extra database queries (which are usually pretty quick).
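Here is a plain-Ruby sketch of that fetch-as-you-go behavior. PagedSource and fetch_page are hypothetical; fetch_page stands in for one "SELECT ... WHERE id > ? ORDER BY id LIMIT ?" query. The sketch also shows a side benefit: if iteration stops early, no further queries are issued.

```ruby
# Hypothetical paged source: fetch_page simulates one
# "SELECT ... WHERE id > ? ORDER BY id LIMIT ?" query and counts it.
class PagedSource
  attr_reader :queries

  def initialize(ids)
    @ids = ids.sort
    @queries = 0
  end

  def fetch_page(after, limit)
    @queries += 1
    @ids.select { |id| after.nil? || id > after }.first(limit)
  end

  # find_each-style iteration: yield records one at a time and issue the
  # next query only after the current page has been used up.
  def each_record(batch_size)
    return enum_for(:each_record, batch_size) unless block_given?

    after = nil
    loop do
      page = fetch_page(after, batch_size)
      page.each { |id| yield id }
      break if page.size < batch_size

      after = page.last
    end
  end
end

full = PagedSource.new((1..10).to_a)
full.each_record(4) { |id| id } # walks all ten records
puts full.queries               # 3

partial = PagedSource.new((1..10).to_a)
puts partial.each_record(4).first(3).inspect # [1, 2, 3]
puts partial.queries                         # 1: stopping early skipped later pages
```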

Updating a Heroku record with rails console

I realize this may not be "best practice" but this is sort of a temporary fix/experiment. I'm trying to update a record on Heroku using rails console but whenever I save it just rolls back.
UserAdmin = User.find_by(email: "User@example.com")
UserAdmin.admin = true
UserAdmin.save
The result is:
(0.6ms) BEGIN
User Exists (0.6ms) SELECT 1 AS one FROM "users" WHERE (LOWER("users"."email") = LOWER('User@example.com') AND "users"."id" != 3) LIMIT 1
User Exists (0.5ms) SELECT 1 AS one FROM "users" WHERE (LOWER("users"."user_name") = LOWER('example') AND "users"."id" != 3) LIMIT 1
(0.4ms) ROLLBACK
Am I going about this wrong? Is there any particular reason the record is not saving on Heroku?
What do you get if you do:
user_admin = User.find_by(email: "User@example.com")
user_admin.admin = true
user_admin.valid?
user_admin.errors.full_messages
(I realize this is not an answer yet, but posted it this way because it would be too messy as a comment.)
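For context: the "User Exists" queries in your log are uniqueness checks run by validations. They look like case-insensitive uniqueness validations on email and user_name. When any validation fails, save returns false and the transaction rolls back without raising, which is why the snippet above checks valid? and errors.full_messages. Calling user_admin.save! would instead raise ActiveRecord::RecordInvalid with those messages. The plain-Ruby sketch below mimics that flow. ToyUser is hypothetical, and a duplicate email is just one example of a failing validation.

```ruby
# ToyUser is a hypothetical plain-Ruby model, not ActiveRecord. It mimics how
# save runs validations inside a transaction and returns false (ROLLBACK)
# instead of raising when one fails.
class ToyUser
  attr_accessor :email, :admin
  attr_reader :errors, :log

  # other_emails stands in for the "User Exists" lookups against other rows.
  def initialize(email, other_emails)
    @email = email
    @other_emails = other_emails
    @admin = false
    @errors = []
    @log = []
  end

  def valid?
    @errors = []
    # Case-insensitive uniqueness check, like LOWER(email) = LOWER(?).
    if @other_emails.any? { |other| other.casecmp?(email) }
      @errors << "Email has already been taken"
    end
    @errors.empty?
  end

  def save
    @log = ["BEGIN"]
    ok = valid?
    @log << (ok ? "COMMIT" : "ROLLBACK")
    ok
  end
end

user_admin = ToyUser.new("user@example.com", ["User@example.com"])
user_admin.admin = true
p user_admin.save    # false
p user_admin.log     # ["BEGIN", "ROLLBACK"]
p user_admin.errors  # ["Email has already been taken"]
```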

Rails - view turning complete white after refreshed or visited several times

I have a view page that will randomly turn into a blank white page after I have visited it a few times. If I change something in the view, it turns back to normal temporarily, but then after a few more page visits, the page turns white again. Also, it only happens in Safari. Here is the controller action for the page:
class ProjectsController < ApplicationController
  def show_current_projects_to_freelancer
    if current_user.type == 'Student'
      @projects = current_user.projects
      @schedules = current_user.projects.collect { |project| project.schedule }
    else
      redirect_to applicants_path, notice: 'Only Freelancers have access to this page.'
    end
  end
end
There are two models: Schedule and Project. Schedule belongs_to :project and Project has_one :schedule. The routes for Schedule and Project are nested like this:
get 'projects/current', to: 'projects#show_current', as: :current_freelancer_projects
resources :projects do
  resources :schedules
end
I've changed my view several times. This happens regardless of whether there is content in the view or no content. Here is what the view looks like now:
<div style="color: black;">
<h3>Current freelancer Projects</h3>
<table>
<tr>
<td>Project Name</td>
<td>Employer Name</td>
<td>Date of Bid</td>
<td>rating</td>
<td>Bid</td>
<td>Tags</td>
<td>Make Schedule</td>
</tr>
<% @projects.each do |project| %>
<tr>
<td><%= project.title %></td>
<td><%= project.employer.email %></td>
<td>date</td>
<td>rating</td>
<td>bid</td>
<td>tags</td>
<td><%= link_to 'Create Schedule', new_project_schedule_path(project.id, Schedule.new) %></td>
</tr>
<% end %>
</table>
</div>
I can't imagine what is causing this. I know it has to be independent of the view, because no matter how I change the view, it still happens. Does anyone have any ideas?
Here are the logs from when the page does not show up. When the page does show up, it's too long.
Started GET "/current" for 127.0.0.1 at 2013-11-22 17:08:18 -0500
ActiveRecord::SchemaMigration Load (0.4ms) SELECT "schema_migrations".* FROM "schema_migrations"
Processing by ProjectsController#show_current_projects_to_freelancer as HTML
User Load (1.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 226 ORDER BY "users"."id" ASC LIMIT 1
Project Load (3.3ms) SELECT "projects".* FROM "projects" WHERE "projects"."student_id" = $1 [["student_id", 226]]
Employer Load (1.0ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 202]]
Employer Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 201]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 201]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 201]]
Rendered projects/show_current_projects_to_freelancer.html.erb within layouts/application (97.3ms)
(0.9ms) SELECT COUNT(*) FROM "relationships" WHERE "relationships"."student_id" = $1 AND "relationships"."state" = 'active' [["student_id", 226]]
Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" WHERE "profiles"."user_id" = $1 ORDER BY "profiles"."id" ASC LIMIT 1 [["user_id", 226]]
Rendered layouts/_ssi_header_inner.html.erb (69.1ms)
Rendered layouts/_ssi_footer.html.erb (0.3ms)
Completed 200 OK in 547ms (Views: 384.9ms | ActiveRecord: 17.2ms)
The problem was the cache. By disabling the cache, I was able to fix the problem.
Looks like a WebKit bug; it's happening to a lot of people, even on iOS. https://bugs.webkit.org/show_bug.cgi?id=32829
Actually, the linked WebKit bug report doesn't apply: it refers to a server sending a 304 for an unconditional request, which is bad server behavior rather than a WebKit bug.
The bug that shows up in Safari is different: Safari sends a conditional request, the server correctly responds with a 304, and then Safari shows a white page, probably because of an invalid cache entry.
I doubt this is a WebKit bug after all, since no other browser seems to be affected; from what I have researched so far, I could only reproduce it in Safari.
I filed a radar on Apple's bug tracker (rdar://19074069). If anyone can reproduce this with the WebKit browser, then it is more likely a WebKit bug, but I wasn't able to do so.
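To make the conditional-request flow concrete, here is a minimal plain-Ruby sketch of how a server answers a GET using an ETag. conditional_get is a hypothetical helper; in a Rails app this is normally handled by the Rack::ETag and Rack::ConditionalGet middleware, or by fresh_when/stale? in the controller. The sketch ignores weak ETags, lists of ETags, and "If-None-Match: *". A 304 carries no body, so the browser must render its own cached copy. According to the description above, that is the step where Safari shows the white page.

```ruby
require "digest"

# Hypothetical helper (not Rails/Rack code): answer a GET for `body`, given
# the client's If-None-Match header value or nil. Returns [status, headers, body].
def conditional_get(body, if_none_match)
  etag = %("#{Digest::SHA256.hexdigest(body)}")
  if if_none_match == etag
    # The client's cached copy is current: 304 Not Modified with an empty body.
    # The browser is expected to render the copy it already has.
    [304, { "ETag" => etag }, ""]
  else
    # First visit or changed content: full 200 response with a fresh ETag.
    [200, { "ETag" => etag }, body]
  end
end

page = "<h3>Current freelancer Projects</h3>"
status, headers, _body = conditional_get(page, nil)
puts status # 200; the browser caches the page and its ETag

status, _headers, body = conditional_get(page, headers["ETag"])
puts status        # 304
puts body.empty?   # true: rendering now depends entirely on the browser's cache
```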

Rails 3 Eager Loading with conditions - how to access eager loaded data?

I have a lot of cases in my app where a user has no more than one object (say, a "Description") within its association to another object (a "Group").
For example:
class User < ActiveRecord::Base
  has_many :descriptions
  has_many :groups
end
class Group < ActiveRecord::Base
  has_many :users
  has_many :descriptions
end
class Description < ActiveRecord::Base
  belongs_to :user
  belongs_to :group
end
If I wanted to render all the users in a certain group and include their relevant descriptions, I could do something like this:
# User model
def description_for(group_id)
  descriptions.find_by_group_id(group_id)
end
# view
@group.users.each do |user|
  user.name
  user.description_for(@group.id).content
end
But this generates a huge number of Description queries. I've tried using joins:
# controller
@group = Group.find(params[:id], :joins => [{:users => :descriptions}], :conditions => ["descriptions.group_id = ?", params[:id]])
But since I'm still calling user.description_for(@group.id), it doesn't help with the page loading.
UPDATE: Sample generated SQL
Rendered users/_title.html.haml (1.6ms)
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 37 LIMIT 1
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = 7 LIMIT 1
CACHE (0.0ms) SELECT "groups".* FROM "groups" WHERE "groups"."id" = 28 LIMIT 1
Description Load (0.1ms) SELECT "descriptions".* FROM "descriptions" WHERE "descriptions"."target_id" = 7 AND "descriptions"."group_id" = 28 LIMIT 1
CACHE (0.0ms) SELECT "descriptions".* FROM "descriptions" WHERE "descriptions"."target_id" = 7 AND "descriptions"."group_id" = 28 LIMIT 1
Rendered users/_title.html.haml (1.7ms)
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 37 LIMIT 1
User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = 51 LIMIT 1
CACHE (0.0ms) SELECT "groups".* FROM "groups" WHERE "groups"."id" = 28 LIMIT 1
Description Load (0.1ms) SELECT "descriptions".* FROM "descriptions" WHERE "descriptions"."target_id" = 51 AND "descriptions"."group_id" = 28 LIMIT 1
CACHE (0.0ms) SELECT "descriptions".* FROM "descriptions" WHERE "descriptions"."target_id" = 51 AND "descriptions"."group_id" = 28 LIMIT 1
Rendered users/_title.html.haml (1.8ms)
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 37 LIMIT 1
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = 5 LIMIT 1
CACHE (0.0ms) SELECT "groups".* FROM "groups" WHERE "groups"."id" = 28 LIMIT 1
Description Load (0.1ms) SELECT "descriptions".* FROM "descriptions" WHERE "descriptions"."target_id" = 5 AND "descriptions"."group_id" = 28 LIMIT 1
CACHE (0.0ms) SELECT "descriptions".* FROM "descriptions" WHERE "descriptions"."target_id" = 5 AND "descriptions"."group_id" = 28 LIMIT 1
Rendered users/_title.html.haml (1.7ms)
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 37 LIMIT 1
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = 52 LIMIT 1
CACHE (0.0ms) SELECT "groups".* FROM "groups" WHERE "groups"."id" = 28 LIMIT 1
Description Load (0.2ms) SELECT "descriptions".* FROM "descriptions" WHERE "descriptions"."target_id" = 52 AND "descriptions"."group_id" = 28 LIMIT 1
CACHE (0.0ms) SELECT "descriptions".* FROM "descriptions" WHERE "descriptions"."target_id" = 52 AND "descriptions"."group_id" = 28 LIMIT 1
Rendered users/_title.html.haml (1.7ms)
Right, I think you actually don't need the joins clause in Rails 3. If you use includes and where, Arel will do the hard work for you.
I've tested this (albeit using a different set of models (and attributes) than yours) using models with the same underlying arrangement of associations, and I think this should work:
in models/user.rb:
scope :with_group_and_descriptions, lambda { |group_id|
  includes(:groups, :descriptions).
    where(:groups => { :id => group_id }, :descriptions => { :group_id => group_id })
}
Then in your controller you call:
@users = User.with_group_and_descriptions(params[:id])
Finally in the view you can then do:
@users.each do |user|
  user.name
  user.descriptions.each do |desc|
    desc.content
  end
end
# or
@users.each do |user|
  user.name
  user.descriptions[0].content
end
If my thinking is right, this should only make 2 DB calls: one to get a list of user IDs, and a second to get the user, group, and description data. Normally, a user object's descriptions method would return all of that user's descriptions, not just the ones for a particular group. Here, though, you've already populated the association, so Rails won't go off and grab all the associations again when you call user.descriptions. Instead, it will just list the ones you pulled from the DB using the descriptions.group_id where clause. Calling user.descriptions(true), however, forces a reload of the descriptions, so it returns an array of all the description associations for the user.
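Conceptually, eager loading replaces one description query per user with a single batched query plus an in-memory lookup. Here is a plain-Ruby sketch that counts the simulated queries both ways. FakeDb, UserRow, and DescriptionRow are hypothetical. The sketch mirrors the separate-query preload strategy (WHERE user_id IN (...)) rather than the single JOIN that includes can also generate.

```ruby
# Hypothetical rows and "database"; FakeDb counts simulated queries.
UserRow = Struct.new(:id, :name)
DescriptionRow = Struct.new(:user_id, :group_id, :content)

class FakeDb
  attr_reader :queries

  def initialize(descriptions)
    @descriptions = descriptions
    @queries = 0
  end

  # N+1 style: one query per user, like description_for(group_id).
  def description_for(user_id, group_id)
    @queries += 1
    @descriptions.find { |d| d.user_id == user_id && d.group_id == group_id }
  end

  # Preload style: one query for every user at once, like WHERE user_id IN (...).
  def descriptions_for(user_ids, group_id)
    @queries += 1
    @descriptions.select { |d| user_ids.include?(d.user_id) && d.group_id == group_id }
  end
end

users = [UserRow.new(7, "Ann"), UserRow.new(51, "Bob"), UserRow.new(5, "Cy")]
descriptions = [
  DescriptionRow.new(7, 28, "x"), DescriptionRow.new(51, 28, "y"),
  DescriptionRow.new(5, 28, "z"), DescriptionRow.new(7, 99, "other group")
]

n_plus_one = FakeDb.new(descriptions)
users.each { |u| n_plus_one.description_for(u.id, 28) }
puts n_plus_one.queries # 3: one per user

preloaded = FakeDb.new(descriptions)
by_user = preloaded.descriptions_for(users.map(&:id), 28).group_by(&:user_id)
puts users.map { |u| by_user[u.id]&.first&.content }.inspect # ["x", "y", "z"]
puts preloaded.queries  # 1
```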
Take a look at includes: it specifies associations that should be eager-loaded.
Railscasts #181: Include vs Joins (or the ASCIIcast version)
Ruby on Rails Guides - see section 4.1.2.7

Resources