I am writing a rake task to sync data from a Rails app to Solr.
I use:
Job.all to get all the records and process them afterwards. But what if Job has millions of records? As far as I know, the all method loads all records into RAM, which definitely affects performance. Is there any other way to process all records without using the all method?
Look at find_in_batches; it does not load all the records at once.
By default it loads 1000 rows per batch, which helps reduce memory consumption:
User.find_in_batches.with_index do |group, batch|
  puts "Processing group ##{batch}"
end
User Load (7.5ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
Processing group #0
User Load (5.9ms) SELECT "users".* FROM "users" WHERE "users"."id" > $1 ORDER BY "users"."id" ASC LIMIT $2 [["id", 1002], ["LIMIT", 1000]]
Processing group #1
User Load (5.0ms) SELECT "users".* FROM "users" WHERE "users"."id" > $1 ORDER BY "users"."id" ASC LIMIT $2 [["id", 2002], ["LIMIT", 1000]]
Processing group #2
User Load (6.7ms) SELECT "users".* FROM "users" WHERE "users"."id" > $1 ORDER BY "users"."id" ASC LIMIT $2 [["id", 3004], ["LIMIT", 1000]]
Processing group #3
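To see why this keeps memory flat, here is a framework-free sketch of the keyset pagination visible in the SQL log above: order by id, then ask for rows with id greater than the last one seen, LIMIT n. FakeTable and each_batch are hypothetical stand-ins written for illustration; Rails' actual implementation differs in detail.

```ruby
# Hypothetical in-memory stand-in for a table. `select_after` mimics:
#   SELECT ... WHERE id > last_id ORDER BY id LIMIT limit
class FakeTable
  def initialize(ids)
    @rows = ids.sort.map { |id| { id: id } }
  end

  def select_after(last_id, limit)
    @rows.select { |row| last_id.nil? || row[:id] > last_id }.first(limit)
  end
end

# Yields one batch (an Array of rows) plus its index, keyed on the last id
# seen, so the caller only ever holds one batch at a time.
def each_batch(table, batch_size = 1000)
  last_id = nil
  index = 0
  loop do
    batch = table.select_after(last_id, batch_size)
    break if batch.empty?

    yield batch, index
    index += 1
    last_id = batch.last[:id]
    break if batch.size < batch_size # short batch: nothing left to fetch
  end
end

table = FakeTable.new((1..2500).to_a)
sizes = []
each_batch(table) do |batch, index|
  puts "Processing group ##{index} (#{batch.size} rows)"
  sizes << batch.size
end
```

Paging on the last id rather than with OFFSET means each query can use the primary-key index directly, no matter how deep into the table the loop gets.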
In my app I have slightly overridden Devise's current_user method. The idea is that if a certain cookie is present, the method looks up the organization by the id stored in that cookie and returns the owner of that organization instead of the regular user:
def current_user
  user = warden.authenticate(scope: :user)
  return nil if user.nil?
  if user.admin? && cookies.key?('mock_admin_login')
    organization = Organization.includes(:creator).find(cookies.encrypted[:mock_admin_login])
    return organization.creator
  end
  user
end
Everything works correctly, but when I looked at my console I noticed that the Organization query is performed multiple times:
CACHE Organization Load (0.5ms) SELECT "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT $2 [["id", 9], ["LIMIT", 1]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 10]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE Organization Load (0.9ms) SELECT "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT $2 [["id", 9], ["LIMIT", 1]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 10]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE Organization Load (0.7ms) SELECT "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT $2 [["id", 9], ["LIMIT", 1]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 10]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE Organization Load (0.3ms) SELECT "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT $2 [["id", 9], ["LIMIT", 1]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE User Load (0.8ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 10]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE Organization Load (0.4ms) SELECT "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT $2 [["id", 9], ["LIMIT", 1]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 10]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE Organization Load (2.0ms) SELECT "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT $2 [["id", 9], ["LIMIT", 1]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE User Load (4.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 10]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE Organization Load (0.4ms) SELECT "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT $2 [["id", 9], ["LIMIT", 1]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 10]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE Organization Load (42.8ms) SELECT "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT $2 [["id", 9], ["LIMIT", 1]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE User Load (0.9ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 10]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE Organization Load (4.5ms) SELECT "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT $2 [["id", 9], ["LIMIT", 1]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
CACHE User Load (1.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 10]]
↳ app/controllers/concerns/current_methods_overwritten.rb:11:in `current_user'
It might not seem like a big deal, but the server spends an additional 30-40ms on this every time the current_user method is called. Why is this query run so many times instead of once, and how can I fix it?
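You need to memoize the result so that it's not re-evaluated every time you call current_user. The CACHE prefix means the query cache is answering instead of the database, but each call still runs the lookup and instantiates new records, which is likely where the extra time goes.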
If you look at the helper that Devise generates, you can see that it does just that:
def current_#{mapping}
  @current_#{mapping} ||= warden.authenticate(scope: :#{mapping})
end
If you want to keep your existing method, make sure to memoize the DB calls:
def current_user
  @current_user ||= warden.authenticate(scope: :user)
  if @current_user&.admin? && cookies.key?('mock_admin_login')
    # Look the organization up once per request, then reuse it.
    @current_org ||= Organization.includes(:creator)
                                 .find(cookies.encrypted[:mock_admin_login])
    @current_user = @current_org.creator
  end
  @current_user
end
But you really should implement this as a custom Warden strategy instead.
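A framework-free sketch of the memoization idea, so the effect can be checked without Rails. CurrentUserResolver and the counting block are hypothetical stand-ins for the controller and the Warden/ActiveRecord lookups; the point is only how `||=` behaves across repeated calls.

```ruby
# Hypothetical stand-in for the controller concern: the block passed to `new`
# plays the role of warden.authenticate / Organization.find and is counted.
class CurrentUserResolver
  attr_reader :lookups

  def initialize(&lookup)
    @lookup = lookup
    @lookups = 0
  end

  # Memoized with ||=: the lookup runs only on the first call,
  # unless it returned nil or false, in which case it runs again.
  def current_user
    @current_user ||= begin
      @lookups += 1
      @lookup.call
    end
  end
end

resolver = CurrentUserResolver.new { 'creator of organization 9' }
5.times { resolver.current_user }
puts resolver.lookups # => 1

anonymous = CurrentUserResolver.new { nil }
3.times { anonymous.current_user }
puts anonymous.lookups # => 3
```

The second case shows the usual caveat: `||=` does not memoize nil, so a logged-out request repeats the lookup. If that matters, a common pattern is `return @current_user if defined?(@current_user)` before doing the work.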
In my Rails app I have an N+1 problem: I'm making extra calls to the database to fetch associated data over and over again, especially when logging impressions per model.
For example:
Started GET "/" for 127.0.0.1 at 2018-12-02 16:21:05 -0500
Processing by JobsController#index as HTML
Job Load (4.4ms) SELECT "jobs".* FROM "jobs" WHERE "jobs"."published_at" IS NOT NULL ORDER BY "jobs"."published_at" DESC LIMIT $1 OFFSET $2 [["LIMIT", 30], ["OFFSET", 0]]
↳ app/controllers/jobs_controller.rb:22
User Load (1.1ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT $2 [["id", 13], ["LIMIT", 1]]
↳ app/controllers/jobs_controller.rb:24
Impression Exists (1.3ms) SELECT 1 AS one FROM "impressions" WHERE "impressions"."impressionable_id" = $1 AND "impressions"."impressionable_type" = $2 AND "impressions"."session_hash" = $3 LIMIT $4 [["impressionable_id", 705], ["impressionable_type", "Job"], ["session_hash", "d80d52dd401011a626d600167140e49f"], ["LIMIT", 1]]
↳ app/controllers/jobs_controller.rb:24
Impression Exists (0.6ms) SELECT 1 AS one FROM "impressions" WHERE "impressions"."impressionable_id" = $1 AND "impressions"."impressionable_type" = $2 AND "impressions"."session_hash" = $3 LIMIT $4 [["impressionable_id", 704], ["impressionable_type", "Job"], ["session_hash", "d80d52dd401011a626d600167140e49f"], ["LIMIT", 1]]
↳ app/controllers/jobs_controller.rb:24
Impression Exists (0.4ms) SELECT 1 AS one FROM "impressions" WHERE "impressions"."impressionable_id" = $1 AND "impressions"."impressionable_type" = $2 AND "impressions"."session_hash" = $3 LIMIT $4 [["impressionable_id", 703], ["impressionable_type", "Job"], ["session_hash", "d80d52dd401011a626d600167140e49f"], ["LIMIT", 1]]
↳ app/controllers/jobs_controller.rb:24
I'm using the impressionist gem and calling the impressionist method directly, taking all of the Jobs on the page and logging an impression for each. The problem is that, because I'm only recording unique impressions, these additional calls are redundant for records that already have an impression. I tried to use @jobs.includes(:impressions).each to preload the associated data, hoping Rails was smart enough to figure out which records already had one, but Rails still logs numerous Impression Exists queries.
@jobs.each { |job| impressionist(job, '', unique: [:session_hash]) }
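The redundant checks have the classic N+1 shape: one existence query per job. Below is a framework-free sketch of the batching alternative, assuming a hypothetical FakeImpressionStore that stands in for the impressions table and counts queries (this is not the impressionist gem's API): look up every job id that already has an impression for this session in one query, then record only the missing ones. The 30 job ids mirror the LIMIT 30 in the log but are otherwise made up.

```ruby
require 'set'

# Hypothetical stand-in for the impressions table; every method call counts as one query.
class FakeImpressionStore
  attr_reader :queries

  def initialize(seen_pairs = [])
    @rows = Set.new(seen_pairs) # [job_id, session_hash] pairs
    @queries = 0
  end

  # One query per job, like each "Impression Exists" line in the log.
  def exists?(job_id, session_hash)
    @queries += 1
    @rows.include?([job_id, session_hash])
  end

  # One query for the whole page, like WHERE impressionable_id IN (...).
  def recorded_ids(job_ids, session_hash)
    @queries += 1
    job_ids.select { |id| @rows.include?([id, session_hash]) }.to_set
  end

  def record(job_id, session_hash)
    @queries += 1
    @rows << [job_id, session_hash]
  end
end

# N+1 shape: an existence check for every job on the page.
def log_unique_naive(store, job_ids, session_hash)
  job_ids.each do |id|
    store.record(id, session_hash) unless store.exists?(id, session_hash)
  end
end

# Batched shape: one lookup, then inserts only for jobs not seen yet.
def log_unique_batched(store, job_ids, session_hash)
  seen = store.recorded_ids(job_ids, session_hash)
  job_ids.reject { |id| seen.include?(id) }.each { |id| store.record(id, session_hash) }
end

session = 'hypothetical-session-hash'
job_ids = (676..705).to_a # 30 hypothetical job ids, one page
already_seen = job_ids.map { |id| [id, session] } # every job already has an impression

naive = FakeImpressionStore.new(already_seen)
log_unique_naive(naive, job_ids, session)

batched = FakeImpressionStore.new(already_seen)
log_unique_batched(batched, job_ids, session)

puts "naive: #{naive.queries}, batched: #{batched.queries}" # => naive: 30, batched: 1
```

With ActiveRecord the single lookup would be something along the lines of a where(...) over the impressionable_id, impressionable_type and session_hash columns seen in the log, followed by pluck(:impressionable_id); how impressionist computes session_hash is not shown here, so treat that part as an assumption.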
I'm using the public_activity gem, and in the view I check whether the trackable owner is the same as the current user:
= a.owner == current_user ? 'You' : a.owner.name
did this activity
I get a bunch of cache calls in the log:
User Load (1.8ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Rendered public_activity/post/_create.html.haml (1.4ms)
Rendered public_activity/_snippet.html.haml (11.4ms)
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Rendered public_activity/post/_create.html.haml (13.9ms)
Rendered public_activity/_snippet.html.haml (18.9ms)
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Rendered public_activity/comment/_comment.html.haml (0.9ms)
Rendered public_activity/_snippet.html.haml (12.1ms)
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Rendered public_activity/comment/_comment.html.haml (2.7ms)
Rendered public_activity/_snippet.html.haml (56.3ms)
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Rendered public_activity/comment/_comment.html.haml (0.6ms)
Rendered public_activity/_snippet.html.haml (4.5ms)
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Rendered public_activity/content/_comment.html.haml (2.1ms)
Rendered public_activity/_snippet.html.haml (9.5ms)
Is there any way to eager load the conditional?
@jverban is correct that you can compare the record IDs to avoid needless record loading. To answer your question about eager loading, though: yes, you can eager load with the includes method in the ActiveRecord query chain. For example:
Activity.includes(:owner).latest
That tells Rails you intend to reference the owner association, so the owners are loaded up front together with the activities instead of once per activity.
I highly recommend adding the bullet gem to your project (only in development and test environments) to detect N+1 queries and warn you when you've got an N+1 query situation like this happening.
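Roughly what preloading buys you, as a framework-free sketch: collect the owner ids from all activities, load those owners in one query (like WHERE id IN (...)), and index them by id, instead of looking each owner up separately. FakeUsers, the Activity and User structs, and the query counter are hypothetical; Rails' preloader also handles polymorphic owners and other details not shown here.

```ruby
# Hypothetical stand-ins for activity rows and the users table.
Activity = Struct.new(:id, :owner_id)
User = Struct.new(:id, :name)

class FakeUsers
  attr_reader :queries

  def initialize(users)
    @by_id = users.to_h { |u| [u.id, u] }
    @queries = 0
  end

  # Like SELECT ... WHERE id = ? (one query per call)
  def find(id)
    @queries += 1
    @by_id.fetch(id)
  end

  # Like SELECT ... WHERE id IN (...) (one query for all ids)
  def where_ids(ids)
    @queries += 1
    ids.uniq.map { |id| @by_id.fetch(id) }
  end
end

users = FakeUsers.new([User.new(1, 'Ann'), User.new(2, 'Bob')])
activities = [Activity.new(10, 1), Activity.new(11, 1), Activity.new(12, 2), Activity.new(13, 1)]
current_user_id = 1

# Preload: one query, then an in-memory lookup per activity.
owners = users.where_ids(activities.map(&:owner_id)).to_h { |u| [u.id, u] }
lines = activities.map do |a|
  a.owner_id == current_user_id ? 'You' : owners.fetch(a.owner_id).name
end
p lines          # => ["You", "You", "Bob", "You"]
p users.queries  # => 1
```

Calling users.find(a.owner_id) inside the map instead would cost one lookup per activity, which is the pattern behind the repeated lines in the log.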
You shouldn't need to load the user record; just compare the id attributes:
= a.owner_id == current_user.id ? 'You' : a.owner.name
The cached calls will likely still happen for activities whose owner is not the current user, because a.owner.name still has to load the owner to get its name.
Does Rails actually cache query results? The documentation says that the same query will never be executed twice in the same request:
1.7 SQL Caching
The second time the same query is run against the database, it's not actually going to hit the database. The first time the result is returned from the query it is stored in the query cache (in memory) and the second time it's pulled from memory.
I ran an experiment to check whether Rails actually caches the query:
def test
  data = ""
  User.find(1).update(first_name: 'Suwir Suwirr')
  data << User.find(1).first_name
  data << "\n"
  User.find(1).update(first_name: 'Pengguna')
  data << User.find(1).first_name
  data << "\n"
  render plain: data
end
If the result were cached, I would get the same result for each User.find(1). However, Rails did not serve a cached result; I was expecting the update not to be reflected in the result, since it was "cached":
Suwir Suwirr
Pengguna
But the log says that it was cached (note the CACHE line):
Started GET "/diag/test" for 10.0.2.2 at 2017-02-21 10:30:16 +0700
Processing by DiagController#test as HTML
User Load (0.7ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT $2 [["id", 4], ["LIMIT", 1]]
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
(0.1ms) BEGIN
SQL (0.4ms) UPDATE "users" SET "first_name" = $1, "updated_at" = $2 WHERE "users"."id" = $3 [["first_name", "Suwir Suwirr"], ["updated_at", 2017-02-21 03:30:16 UTC], ["id", 1]]
(16.5ms) COMMIT
User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
(0.1ms) BEGIN
SQL (0.3ms) UPDATE "users" SET "first_name" = $1, "updated_at" = $2 WHERE "users"."id" = $3 [["first_name", "Pengguna"], ["updated_at", 2017-02-21 03:30:16 UTC], ["id", 1]]
(0.9ms) COMMIT
User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Rendering text template
Rendered text template (0.0ms)
Completed 200 OK in 380ms (Views: 3.5ms | ActiveRecord: 21.9ms)
So my question is: does Rails actually cache query results? Or only some query results in some requests?
Update: Using Batch #update_all
I ran another experiment to "fool" the query logic. This time Rails does not "cache" the query at all. Why does this happen?
# Controller
def test
  data = ""
  User.where(id: 1).update_all(first_name: 'Suwir Suwirr')
  data << User.find(1).first_name
  data << "\n"
  User.where(id: 1).update_all(first_name: 'Pengguna')
  data << User.find(1).first_name
  data << "\n"
  logger.info 'hi'
  render plain: data
end
# Console
Started GET "/diag/test" for 10.0.2.2 at 2017-02-21 10:45:43 +0700
Processing by DiagController#test as HTML
User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT $2 [["id", 4], ["LIMIT", 1]]
SQL (13.8ms) UPDATE "users" SET "first_name" = 'Suwir Suwirr' WHERE "users"."id" = $1 [["id", 1]]
User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
SQL (2.9ms) UPDATE "users" SET "first_name" = 'Pengguna' WHERE "users"."id" = $1 [["id", 1]]
User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
hi
Rendering text template
Rendered text template (0.0ms)
Completed 200 OK in 28ms (Views: 0.8ms | ActiveRecord: 17.8ms)
# Browser result
Suwir Suwirr
Pengguna
I was stupid.
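Yes, Rails does actually cache query results, but writes made through ActiveRecord (update, destroy, update_all and so on) clear the query cache. update_all does not iterate over the records calling update; it issues a single UPDATE statement (as the SQL line in the log shows), but because that statement goes through the connection's update path, it still invalidates the cache.
I repeated the experiment, this time really "fooling" ActiveRecord by sending the UPDATE as raw SQL through connection.execute, which (at least in the Rails version used here) does not clear the query cache. And yes, the second read now returns the stale cached value: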
# Controller
def test
  data = ""
  ActiveRecord::Base.connection.execute('UPDATE "users" SET "first_name" = \'Suwir Suwirr\' WHERE "users"."id" = 1')
  data << User.find(1).first_name
  data << "\n"
  ActiveRecord::Base.connection.execute('UPDATE "users" SET "first_name" = \'Pengguna\' WHERE "users"."id" = 1')
  data << User.find(1).first_name
  data << "\n"
  render plain: data
end
# Browser
Suwir Suwirr
Suwir Suwirr
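A toy model of the behavior observed in these experiments, as a framework-free sketch. ToyConnection and its methods are hypothetical and are not ActiveRecord's API: SELECT results are cached by SQL text, tracked writes clear the cache, and a raw write that bypasses invalidation leaves a stale entry behind. Exact invalidation rules may differ between Rails versions.

```ruby
# Hypothetical toy model of a per-request query cache (not ActiveRecord's API).
# A hash stands in for the database row; @query_cache maps SQL text to a result.
class ToyConnection
  def initialize(row)
    @row = row
    @query_cache = {}
  end

  # Returns the cached result if this exact SQL ran before ("CACHE" in the log).
  def cached_select(sql, &reader)
    @query_cache.fetch(sql) { @query_cache[sql] = reader.call(@row) }
  end

  # Models writes like update / update_all: change the row, then clear the cache.
  def tracked_write(&writer)
    writer.call(@row)
    @query_cache.clear
  end

  # Models a raw statement that bypasses cache invalidation.
  def raw_write(&writer)
    writer.call(@row)
  end
end

sql = 'SELECT first_name FROM users WHERE id = 1'
read = ->(row) { row[:first_name] }

tracked = ToyConnection.new({ first_name: 'original' })
tracked.tracked_write { |row| row[:first_name] = 'Suwir Suwirr' }
a = tracked.cached_select(sql, &read)
tracked.tracked_write { |row| row[:first_name] = 'Pengguna' }
b = tracked.cached_select(sql, &read)
p [a, b] # => ["Suwir Suwirr", "Pengguna"]

raw = ToyConnection.new({ first_name: 'original' })
raw.raw_write { |row| row[:first_name] = 'Suwir Suwirr' }
c = raw.cached_select(sql, &read)
raw.raw_write { |row| row[:first_name] = 'Pengguna' }
d = raw.cached_select(sql, &read)
p [c, d] # => ["Suwir Suwirr", "Suwir Suwirr"]
```

The two printed pairs correspond to the update_all experiment and the raw execute experiment respectively.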
I have a view page that will randomly turn into a blank white page after I have visited it a few times. If I change something in the view, it turns back to normal temporarily, but then after a few more page visits, the page turns white again. Also, it only happens in Safari. Here is the controller action for the page:
class ProjectsController < ApplicationController
  def show_current_projects_to_freelancer
    if current_user.type == 'Student'
      @projects = current_user.projects
      @schedules = current_user.projects.collect { |project| project.schedule }
    else
      redirect_to applicants_path, notice: 'Only Freelancers have access to this page.'
    end
  end
end
There are two models: Schedule and Project. Schedule belongs_to :project and Project has_one :schedule. The routes for Schedule and Project are nested like this:
get 'projects/current', to: 'projects#show_current', as: :current_freelancer_projects
resources :projects do
  resources :schedules
end
I've changed my view several times. This happens regardless of whether there is content in the view or no content. Here is what the view looks like now:
<div style="color: black;">
<h3>Current freelancer Projects</h3>
<table>
<tr>
<td>Project Name</td>
<td>Employer Name</td>
<td>Date of Bid</td>
<td>rating</td>
<td>Bid</td>
<td>Tags</td>
<td>Make Schedule</td>
</tr>
<% @projects.each do |project| %>
<tr>
<td><%= project.title %></td>
<td><%= project.employer.email %></td>
<td>date</td>
<td>rating</td>
<td>bid</td>
<td>tags</td>
<td><%= link_to 'Create Schedule', new_project_schedule_path(project.id, Schedule.new) %></td>
</tr>
<% end %>
</table>
</div>
I can't imagine what is causing this. I know it has to be independent of the view, because no matter how I change the view it still happens. Does anyone have any ideas?
Here are the logs from when the page does not show up. (When the page does show up, the log is too long to include.)
Started GET "/current" for 127.0.0.1 at 2013-11-22 17:08:18 -0500
Started GET "/current" for 127.0.0.1 at 2013-11-22 17:08:18 -0500
ActiveRecord::SchemaMigration Load (0.4ms) SELECT "schema_migrations".* FROM "schema_migrations"
ActiveRecord::SchemaMigration Load (0.4ms) SELECT "schema_migrations".* FROM "schema_migrations"
Processing by ProjectsController#show_current_projects_to_freelancer as HTML
Processing by ProjectsController#show_current_projects_to_freelancer as HTML
User Load (1.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 226 ORDER BY "users"."id" ASC LIMIT 1
User Load (1.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 226 ORDER BY "users"."id" ASC LIMIT 1
Project Load (3.3ms) SELECT "projects".* FROM "projects" WHERE "projects"."student_id" = $1 [["student_id", 226]]
Project Load (3.3ms) SELECT "projects".* FROM "projects" WHERE "projects"."student_id" = $1 [["student_id", 226]]
Employer Load (1.0ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 202]]
Employer Load (1.0ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 202]]
Employer Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 201]]
Employer Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 201]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 201]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 201]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 201]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."type" IN ('Employer') AND "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 201]]
Rendered projects/show_current_projects_to_freelancer.html.erb within layouts/application (97.3ms)
Rendered projects/show_current_projects_to_freelancer.html.erb within layouts/application (97.3ms)
(0.9ms) SELECT COUNT(*) FROM "relationships" WHERE "relationships"."student_id" = $1 AND "relationships"."state" = 'active' [["student_id", 226]]
(0.9ms) SELECT COUNT(*) FROM "relationships" WHERE "relationships"."student_id" = $1 AND "relationships"."state" = 'active' [["student_id", 226]]
Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" WHERE "profiles"."user_id" = $1 ORDER BY "profiles"."id" ASC LIMIT 1 [["user_id", 226]]
Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" WHERE "profiles"."user_id" = $1 ORDER BY "profiles"."id" ASC LIMIT 1 [["user_id", 226]]
Rendered layouts/_ssi_header_inner.html.erb (69.1ms)
Rendered layouts/_ssi_header_inner.html.erb (69.1ms)
Rendered layouts/_ssi_footer.html.erb (0.3ms)
Rendered layouts/_ssi_footer.html.erb (0.3ms)
Completed 200 OK in 547ms (Views: 384.9ms | ActiveRecord: 17.2ms)
Completed 200 OK in 547ms (Views: 384.9ms | ActiveRecord: 17.2ms)
The problem was the cache. By disabling the cache, I was able to fix the problem.
It looks like a WebKit bug; it's happening to a lot of people, even on iOS. https://bugs.webkit.org/show_bug.cgi?id=32829
Actually, the linked WebKit bug report doesn't match this case: it refers to a server sending a 304 in response to an unconditional request, which is bad behavior on the server's side, not a WebKit bug.
The bug that shows up in Safari is different: Safari sends a conditional request, the server correctly responds with a 304, and then Safari shows a white page, probably because of an invalid cache entry.
And I doubt this is a WebKit bug after all, since no other browser seems to be affected; so far I have only been able to reproduce it in Safari.
I filed a radar on Apple's bug tracker (rdar://19074069). If anyone can reproduce this with the WebKit browser, it is more likely a WebKit bug, but I wasn't able to do so.
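For readers unfamiliar with the conditional-request flow discussed above, here is a framework-free sketch of the server side: the first response carries an ETag, and when a later request sends it back in If-None-Match, the server answers 304 Not Modified with an empty body, and the browser is expected to reuse its cached copy (the step where Safari apparently shows a white page instead). The respond function and its hash-based request/response shapes are hypothetical; in a Rails controller this logic is normally handled by ActionController helpers such as fresh_when and stale?.

```ruby
require 'digest/md5'

# Hypothetical handler: request and response are plain hashes.
def respond(request, body)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  if request['If-None-Match'] == etag
    # Conditional request matched: no body, the browser should reuse its cache.
    { status: 304, headers: { 'ETag' => etag }, body: '' }
  else
    { status: 200, headers: { 'ETag' => etag }, body: body }
  end
end

page = '<h3>Current freelancer Projects</h3>'
first = respond({}, page)
second = respond({ 'If-None-Match' => first[:headers]['ETag'] }, page)
puts first[:status]  # => 200
puts second[:status] # => 304
```

A 304 is only correct when the client actually sent a validator such as If-None-Match; sending one for an unconditional request is the server-side bug described in the linked WebKit report.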