Performance: minimize database hits - ruby-on-rails

I am using Ruby on Rails 3.0.7 and I am trying to minimize the number of database hits. To do that, I retrieve all Article objects related to a User from the database and then search through those retrieved objects.
What I do is:
stored_objects = Article.where(:user_id => <id>) # => ActiveRecord::Relation
<some_iterative_function_1>.each { |...|
  stored_object = stored_objects.where(:status => 'published').limit(1)
  ...
  # perform operation on the current 'stored_object' considered
}
<some_iterative_function_2>.each { |...|
  stored_object = stored_objects.where(:visibility => 'public').limit(1)
  ...
  # perform operation on the current 'stored_object' considered
}
<some_iterative_function_n>.each { |...|
  ...
}
Does the stored_object = stored_objects.where(:status => 'published') code really avoid hitting the database? I ask because my log file seems to show a database query still being run on each iteration. If not, how can I minimize database hits?
P.S.: In short, what I would like to do is work on the ActiveRecord::Relation (an array of ), but the where method called on it seems to hit the database.

Rails can fetch records from the database in chunks and then iterate over the rows of each chunk in memory, so it does not hit the database again for every row.
See "Retrieving Multiple Objects in Batches" for more information about find_each and find_in_batches.
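Here is a plain-Ruby sketch of why batching cuts the number of round trips. It runs without Rails. FakeTable and its query counter are hypothetical stand-ins for a table. The loop roughly follows the idea behind find_in_batches, which pages through records ordered by primary key, so there is one query per batch rather than one per row. It illustrates the idea under those assumptions and is not Rails' actual implementation.

```ruby
# FakeTable is a hypothetical stand-in for a database table. It counts the
# simulated queries it serves so the effect of batching is visible.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows.sort_by { |row| row[:id] }
    @query_count = 0
  end

  # Simulates "SELECT ... WHERE id > last_id ORDER BY id LIMIT batch_size".
  def fetch_after(last_id, batch_size)
    @query_count += 1
    @rows.select { |row| row[:id] > last_id }.first(batch_size)
  end

  # Yields the rows in batches, issuing one simulated query per batch.
  def find_in_batches(batch_size: 1000)
    last_id = 0
    loop do
      batch = fetch_after(last_id, batch_size)
      break if batch.empty?

      yield batch
      break if batch.size < batch_size # a short batch means the table is exhausted

      last_id = batch.last[:id]
    end
  end
end

rows = (1..25).map { |i| { id: i, status: i.even? ? 'published' : 'draft' } }
table = FakeTable.new(rows)

published = []
table.find_in_batches(batch_size: 10) do |batch|
  # Filtering inside a batch happens in Ruby, with no extra queries.
  published.concat(batch.select { |row| row[:status] == 'published' })
end

puts published.size    # 12
puts table.query_count # 3 (rows 1-10, 11-20 and 21-25)
```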

Once you start iterating over stored_objects (if that's what you're doing), they'll be loaded from the database. If you want to load only the user's published articles, you could do this:
stored_objects = Article.where(:user_id => id, :status => 'published')
If you instead want to load published and unpublished articles and do something different with the published ones, you could do this:
stored_objects = Article.where(:user_id => id)
stored_objects.find_all { |a| a.status == 'published' }.each do |a|
  # ... do something with a published article
end
Or perhaps:
Article.where(:user_id => id).each do |article|
  case article.status
  when 'published'
    # ... do something with a published article
  else
    # ... do something with an article that's not published
  end
end
Each of these examples performs only one database query. Choosing which one depends on which data you really want to work with.
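The underlying idea is to load the rows once and then filter them with Enumerable methods instead of calling where again. This can be shown without Rails. In the sketch below, Article is a hypothetical Struct standing in for loaded records, and find plays the role of the question's where(...).limit(1) calls. With a real Relation, you would call to_a first so the filtering clearly happens on the loaded array instead of building a new query.

```ruby
# Article is a hypothetical stand-in for records already loaded from the database.
Article = Struct.new(:id, :user_id, :status, :visibility)

# Pretend this array is the result of Article.where(:user_id => id).to_a,
# i.e. the one and only query.
stored_objects = [
  Article.new(1, 7, 'draft',     'private'),
  Article.new(2, 7, 'published', 'private'),
  Article.new(3, 7, 'published', 'public')
]

# In-memory equivalents of where(...).limit(1): no further queries are needed.
first_published = stored_objects.find { |a| a.status == 'published' }
first_public    = stored_objects.find { |a| a.visibility == 'public' }

# Splitting the set in one pass, similar to the find_all / case examples above.
published, unpublished = stored_objects.partition { |a| a.status == 'published' }

puts first_published.id            # 2
puts first_public.id               # 3
puts published.map(&:id).inspect   # [2, 3]
puts unpublished.map(&:id).inspect # [1]
```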

Related

Updating Lots of Records at Once in Rails

I have a background job, and I run about 5,000 instances of it every 10 minutes. Each job makes a request to an external API and then either adds new records to my database or updates existing ones. Each API request returns around 100 items, so every 10 minutes I am making 50,000 INSERT or UPDATE SQL queries.
The way I handle this now is, each API item returned has a unique ID. I search my database for a post that has this id, and if it exists, it updates the model. If it doesn't exist, it creates a new one.
Imagine the api response looks like this:
[
  {
    external_id: '123',
    text: 'blah blah',
    count: 450
  },
  {
    external_id: 'abc',
    text: 'something else',
    count: 393
  }
]
This array is assigned to the variable collection.
Then I run this code in my parent model:
class ParentModel < ApplicationRecord
  def update
    collection.each do |attrs|
      child = ChildModel.find_or_initialize_by(external_id: attrs[:external_id], parent_model_id: self.id)
      child.assign_attributes attrs
      child.save if child.changed?
    end
  end
end
Each of these individual calls is extremely quick, but when I am doing 50,000 in a short period of time it really adds up and can slow things down.
I'm wondering if there's a more efficient way to handle this. I was thinking of doing something like this instead:
class ParentModel < ApplicationRecord
  def update
    eager_loaded_children = ChildModel.where(parent_model_id: self.id).limit(100)
    collection.each do |attrs|
      cached_child = eager_loaded_children.select {|child| child.external_id == attrs[:external_id] }.first
      if cached_child
        cached_child.update_attributes attrs
      else
        ChildModel.create attrs
      end
    end
  end
end
Essentially, I would be replacing the per-item lookups with one bigger query up front (which is also quite fast), trading some memory for fewer queries. But this doesn't seem like it would save much time. It might slightly speed up the lookup part, but I would still have to do 100 updates and creates.
Is there some kind of way I can do batch updates that I'm not thinking of? Anything else obvious that could make this go faster, or reduce the amount of queries I am doing?
You can do something like this:
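def update
  # Index the API items by external_id so that each match is a hash lookup
  collection2 = collection.map { |c| [c[:external_id], c.except(:external_id)] }.to_h
  # One query fetches every existing child that matches an incoming item
  ChildModel.where(parent_model_id: id, external_id: collection2.keys).each do |cm|
    ext_id = cm.external_id
    cm.assign_attributes collection2[ext_id]
    cm.save if cm.changed?
    collection2.delete(ext_id)
  end
  # Whatever is left in collection2 does not exist in the database yet
  if collection2.present?
    new_ids = collection2.keys
    new_records = collection.select { |c| new_ids.include? c[:external_id] }
    ChildModel.create(new_records.map { |attrs| attrs.merge(parent_model_id: id) })
  end
end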
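This is better because it:
fetches all required records in one query
creates all new records in a single create call (create with an array still issues one INSERT per record; on Rails 6 and later, insert_all can insert them in one statement, but it skips validations and callbacks)
You can use update_columns if you don't need callbacks/validations.
The only drawback is more Ruby-side data manipulation, which I think is a good tradeoff for fewer database queries.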
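The core of this answer is indexing the incoming items by external_id so that matching existing records becomes a hash lookup instead of a query per item. The plain-Ruby sketch below isolates that step. As an assumption for illustration, both existing records and API items are plain hashes, and the function decides which records need an update (only the changed attributes) and which need to be created. In the real model, the existing list would come from the single where(external_id: ...) query, and the results would feed assign_attributes/save and create. The sketch needs Ruby 2.7 or later for filter_map.

```ruby
# Splits incoming API items into updates and creates using one hash index.
# plan_upsert is a hypothetical helper; records are plain hashes here,
# whereas in the model they would be ActiveRecord objects.
def plan_upsert(existing_records, incoming_items)
  incoming_by_id = incoming_items.to_h { |item| [item[:external_id], item] }

  # filter_map drops the nils returned for unmatched or unchanged records.
  updates = existing_records.filter_map do |record|
    item = incoming_by_id.delete(record[:external_id])
    next unless item

    changes = item.reject { |key, value| record[key] == value }
    [record[:external_id], changes] unless changes.empty?
  end

  # Whatever was not matched above does not exist yet.
  { updates: updates.to_h, creates: incoming_by_id.values }
end

existing = [
  { external_id: '123', text: 'blah blah', count: 400 },
  { external_id: 'xyz', text: 'unchanged', count: 1 }
]
incoming = [
  { external_id: '123', text: 'blah blah', count: 450 },
  { external_id: 'abc', text: 'something else', count: 393 },
  { external_id: 'xyz', text: 'unchanged', count: 1 }
]

plan = plan_upsert(existing, incoming)
# '123' needs an update of count only, 'xyz' is untouched, 'abc' must be created.
```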

Rails Eager loading has_many associations for an existing object

I am fairly new to rails & I am having this performance issue that I would appreciate any help with.
I have a User model & each user has_many UserScores associated. I am preparing a dashboard showing different user stats including counts of user_scores based on certain conditions. Here is a snippet of the code:
def dashboard
  @users = Array.new
  users = User.order('created_at ASC')
  users.each do |u|
    user = {}
    user[:id] = u.id
    user[:name] = u.nickname
    user[:email] = u.email
    user[:matches] = u.user_scores.count
    user[:jokers_used] = u.user_scores.where(:joker => true).length
    user[:jokers] = u.joker
    user[:bonus] = u.user_scores.where(:bonus => 1).length
    user[:joined] = u.created_at.strftime("%y/%m/%d")
    if user[:matches] > 0
      user[:last_activity] = u.user_scores.order('updated_at DESC').first.updated_at.strftime("%y/%m/%d")
    else
      user[:last_activity] = u.updated_at.strftime("%y/%m/%d")
    end
    @users << user
  end
  @user_count = @users.count
end
The issue I am seeing is repeated UserScore db queries for each user to get the different counts.
Is there a way to avoid those multiple queries?
N.B. I'm not sure if my approach for preparing data for the view is the optimal way, so any advice or tips regarding that will be greatly appreciated as well.
Thanks
You need to eager load user_scores to avoid the repeated queries. @Slava.K provided a good explanation of how to eliminate them.
Add includes(:user_scores) to the users query, and use Ruby's collection methods once the data has been fetched from the database.
See the code below:
users = User.includes(:user_scores).order('created_at ASC')
users.each do |u|
  ....
  user[:matches] = u.user_scores.length
  user[:jokers_used] = u.user_scores.select{ |score| score.joker == true }.length
  user[:jokers] = u.joker
  user[:bonus] = u.user_scores.select{ |score| score.bonus == 1 }.length
  ....
end
Also, the way you are preparing the response is neither clean nor flexible. Instead, you should override the as_json method to prepare JSON that views can consume properly. as_json is defined for models by default. You can read more about it in the official documentation (http://api.rubyonrails.org/classes/ActiveModel/Serializers/JSON.html) or in my article on preparing a clean JSON response, where I explain overriding as_json in depth.
Use the includes method to eager load your has_many associations. This concept is explained here: https://www.youtube.com/watch?v=s2EPVMqOsTQ
First, reference the user_scores association in your query:
users = User.includes(:user_scores).order('created_at ASC')
See the Rails guide on eager loading associations: http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations
Also note that where makes a new query to the database even if your association is already preloaded. Therefore, instead of
u.user_scores.where(:joker => true).length
u.user_scores.where(:bonus => 1).length
try:
u.user_scores.count { |us| us.joker }
u.user_scores.count { |us| us.bonus == 1 }
You will probably have to rewrite .user_scores.order('updated_at DESC').first.updated_at.strftime("%y/%m/%d") as well, because order also builds a new query instead of sorting the preloaded records.
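Once user_scores is preloaded, all of the per-user numbers can be computed from the loaded records. The sketch below uses a hypothetical UserScore Struct in place of real records so it runs without Rails. count with a block replaces the where(...).length calls, and max_by(&:updated_at) is one possible replacement for the order('updated_at DESC').first lookup. With real records, you would pass something like u.user_scores.to_a, which makes it explicit that the work happens on the loaded array.

```ruby
# UserScore is a hypothetical stand-in for preloaded user_scores records.
UserScore = Struct.new(:joker, :bonus, :updated_at)

# Computes the dashboard numbers from an already-loaded array of scores,
# so no per-user queries are needed. score_stats is a hypothetical helper.
def score_stats(scores, fallback_time)
  latest = scores.max_by(&:updated_at) # nil when there are no scores
  {
    matches:       scores.size,
    jokers_used:   scores.count(&:joker),
    bonus:         scores.count { |s| s.bonus == 1 },
    last_activity: (latest ? latest.updated_at : fallback_time).strftime('%y/%m/%d')
  }
end

scores = [
  UserScore.new(true,  1, Time.new(2016, 3, 1)),
  UserScore.new(false, 1, Time.new(2016, 5, 9)),
  UserScore.new(true,  0, Time.new(2016, 4, 2))
]
stats = score_stats(scores, Time.new(2016, 1, 1))
# matches: 3, jokers_used: 2, bonus: 2, last_activity: "16/05/09"
```

Passing the user's own updated_at as fallback_time reproduces the else branch of the original dashboard code.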

How to make an expensive function faster in Rails?

In my Rails 4 application a user can click through the edit views of his invoices using a skip button.
My problem is that each invoice contains the same expensive select box...
def self.options
  names = []
  names << [ "Please select...", nil ]
  order(:last_name).includes(:company).with_projects.map do |person|
    names << [ person.name, person.id, :'data-address' => person.invoice_address, :'data-email' => person.email ]
  end
  names
end
...and it will be calculated from scratch for each invoice despite being practically the same for all invoices.
Is there a way to store these options somewhere so they can get called faster when skipping through the various invoices?
Thanks for any help.
You can use the Rails cache to do this:
def self.options
  Rails.cache.fetch('some_key', expires_in: 10.minutes) do
    # Wrap the placeholder in its own array so it becomes one option, not two
    [['Please select...', nil]] +
      order(:last_name).includes(:company).with_projects.map { |person|
        [person.name, person.id, :'data-address' => person.invoice_address, :'data-email' => person.email]
      }
  end
end
You can change 'some_key' to something unique and the timeout to something else. When the cache expires, Rails will regenerate it on the next request. Alternatively, you can create a task that regenerates this cache periodically.
Basically, this block looks up the cache entry named 'some_key' and returns it. If the key is not found or has expired, it runs the code inside the block, caches the result, and then returns it.
This method is explained in this guide, and if you're interested, I've written a post about the different options here.
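To make the fetch semantics concrete: the block runs only on a miss or after expiry, and otherwise the stored value is returned. The class below is a toy, in-process model written for illustration. TinyCache is hypothetical and is not how Rails.cache is implemented. It takes an injectable clock so that expiry can be demonstrated without sleeping.

```ruby
# A toy model of fetch-with-expiry semantics. TinyCache is hypothetical;
# it only illustrates the behaviour described above.
class TinyCache
  def initialize(clock: -> { Time.now })
    @clock = clock
    @store = {}
  end

  # Returns the cached value for key if present and fresh; otherwise runs the
  # block, stores its result with an expiry time, and returns it.
  def fetch(key, expires_in:)
    entry = @store[key]
    return entry[:value] if entry && @clock.call < entry[:expires_at]

    value = yield
    @store[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end

now = Time.at(0)
cache = TinyCache.new(clock: -> { now })
computations = 0
expensive = -> { computations += 1; [['Please select...', nil]] }

cache.fetch('invoice_options', expires_in: 600) { expensive.call } # miss: computed
cache.fetch('invoice_options', expires_in: 600) { expensive.call } # served from cache
now += 601                                                          # pretend 601 s passed
cache.fetch('invoice_options', expires_in: 600) { expensive.call } # expired: recomputed
puts computations # 2
```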
If all of the data is simply retrieved from the database, consider avoiding the overhead of instantiating objects and use pluck instead.
def self.options
  names = []
  names << [ "Please select...", nil ]
  order(:last_name).includes(:company).with_projects.pluck(:name, :id, :invoice_address, :email).each do |person|
    names << [ person[0], person[1], :'data-address' => person[2], :'data-email' => person[3] ]
  end
  names
end
You might also post the queries generated by Active Record. Ideally, this should be a single SELECT with an INNER JOIN to the companies table.
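When pluck is given several columns, it returns one array of values per row. Block parameters can therefore destructure each row by name, which reads better than positional indexing such as person[2]. In this plain-Ruby sketch, rows is a hypothetical stand-in for the pluck result, and person_options is a hypothetical helper.

```ruby
# rows stands in for the result of
# ...pluck(:name, :id, :invoice_address, :email): one array per row.
rows = [
  ['Ada Lovelace', 1, '12 Analytical St', 'ada@example.com'],
  ['Alan Turing',  2, '3 Bletchley Rd',   'alan@example.com']
]

# Destructures each row into named block parameters and builds select options.
def person_options(rows)
  people = rows.map do |name, id, address, email|
    [name, id, { 'data-address': address, 'data-email': email }]
  end
  [['Please select...', nil]] + people
end

options = person_options(rows)
```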

removing objects from an array during a loop

I am trying to filter the results of a user search in my app to only show users who are NOT friends. My friends table has 3 columns: f1 (user id of the person who sent the request), f2 (user id of the friend who received the request), and confirmed (a boolean). As you can see, @usersfiltered is the result of the search. Then the current user's friends are defined. Then I try to remove the friends from the search results. This does not seem to work, but it should be pretty straightforward. I've tried delete (not good) and destroy.
def index
  #THIS IS THE SEARCH RESULT
  @usersfiltered = User.where("first_name LIKE?", "%#{params[:first_name]}%" )
  #THIS IS DEFINING ROWS ON THE FRIEND TABLE THAT BELONG TO CURRENT USER
  @confirmedfriends = Friend.where(:confirmed => true)
  friendsapproved = @confirmedfriends.where(:f2 => current_user.id)
  friendsrequestedapproved = @confirmedfriends.where(:f1 => current_user.id)
  #GOING THROUGH SEARCH RESULTS
  @usersfiltered.each do |usersfiltered|
    if friendsapproved.present?
      friendsapproved.each do |fa|
        if usersfiltered.id == fa.f1
          #NEED TO REMOVE THIS FROM RESULTS HERE SOMEHOW
          usersfiltered.remove
        end
      end
    end
    #SAME LOGIC
    if friendsrequestedapproved.present?
      friendsrequestedapproved.each do |fra|
        if usersfiltered.id == fra.f2
          usersfiltered.remove
        end
      end
    end
  end
end
I would turn it around: take the loop-invariant logic out of the loop, which gives a good first-order simplification:
approved_ids = []
approved_ids = friendsapproved.map { |fa| fa.f1 } if friendsapproved.present?
approved_ids += friendsrequestedapproved.map { |fra| fra.f2 } if friendsrequestedapproved.present?
approved_ids.uniq! # (May not be needed)
@usersfiltered.delete_if { |user| approved_ids.include? user.id }
This could probably be simplified further if friendsapproved and friendsrequestedapproved have been created separately strictly for the purpose of the deletions. You could generate a single friendsapproval list consisting of both and avoid unioning id sets above.
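Here is a plain-Ruby sketch of the loop-invariant approach. It collects the friend ids once and then rejects matching users in a single pass. User and Friend are hypothetical Structs that mirror the columns described in the question (f1, f2, confirmed), and non_friends is a hypothetical helper. Using a Set is a swapped-in choice that makes each membership test a hash lookup instead of a scan through an array.

```ruby
require 'set'

User   = Struct.new(:id, :first_name)
Friend = Struct.new(:f1, :f2, :confirmed)

# Returns the users who are not confirmed friends of current_user_id,
# whichever side of the friendship sent the request.
def non_friends(users, friendships, current_user_id)
  friend_ids = Set.new
  friendships.each do |f|
    next unless f.confirmed

    friend_ids << f.f1 if f.f2 == current_user_id # they sent the request
    friend_ids << f.f2 if f.f1 == current_user_id # the current user sent it
  end
  users.reject { |user| friend_ids.include?(user.id) }
end

users = [User.new(1, 'Ann'), User.new(2, 'Bob'), User.new(3, 'Cid'), User.new(4, 'Dee')]
friendships = [
  Friend.new(2, 9, true),  # Bob -> current user, confirmed
  Friend.new(9, 3, true),  # current user -> Cid, confirmed
  Friend.new(9, 4, false)  # pending request, so Dee stays in the results
]
result = non_friends(users, friendships, 9)
```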
While I agree that there may be better ways to implement what you're doing, I think the specific problem you're facing is that in Rails 4, the where method returns an ActiveRecord::Relation, not an Array. While you can use each on a Relation, you cannot in general perform array operations on it.
However, you can convert a Relation to an Array with the to_a method as in:
@usersfiltered = User.where("first_name LIKE?", "%#{params[:first_name]}%" ).to_a
This would then allow you to do the following within your loop:
@usersfiltered.delete(usersfiltered)
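A caution about deleting inside the loop: Array#each walks the array by index. Removing the current element shifts the next one into its slot, so that element is skipped. The plain-Ruby sketch below shows the effect and the usual fix. delete_if (or reject, for a non-mutating copy) removes matching elements safely in one pass.

```ruby
# Deleting from an array while iterating it with each skips elements:
# after 2 is removed, 4 moves into index 0 and the loop continues at index 1.
numbers = [2, 4, 5]
numbers.each { |n| numbers.delete(n) if n.even? }
p numbers # => [4, 5] (4 survived because it was skipped)

# delete_if removes every matching element safely in a single pass.
safe = [2, 4, 5]
safe.delete_if(&:even?)
p safe # => [5]
```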

Unit Testing Tire (Elastic Search) - Filtering Results with Method from to_indexed_json

I am testing my Tire / ElasticSearch queries and am having a problem with a custom method I'm including in to_indexed_json. For some reason, it doesn't look like it's getting indexed properly - or at least I cannot filter with it.
In my development environment, my filters and facets work fine and I get the expected results. However, in my tests I consistently see zero results. I cannot figure out where I'm going wrong.
I have the following:
def to_indexed_json
  to_json methods: [:user_tags, :location_users]
end
For which my user_tags method looks as follows:
def user_tags
  tags.map(&:content) if tags.present?
end
Tags is a polymorphic relationship with my user model:
has_many :tags, :as => :tagable
My search block looks like this:
def self.online_sales(params)
  s = Tire.search('users') { query { string '*' }}
  filter = []
  filter << { :range => { :created_at => { :from => params[:start], :to => params[:end] } } }
  filter << { :terms => { :user_tags => ['online'] }}
  s.facet('online_sales') do
    date :created_at, interval: 'day'
    facet_filter :and, filter
  end
end
end
I have checked that user_tags are included, using User.last.to_indexed_json:
{"id":2,"username":"testusername", ... "user_tags":["online"] }
In my development environment, if I run the following query, I get a per day list of online sales for my users:
@sales = User.online_sales(start_date: Date.today - 100.days).results.facets["online_sales"]
"_type"=>"date_histogram", "entries"=>[{"time"=>1350950400000, "count"=>1, "min"=>6.0, "max"=>6.0, "total"=>6.0, "total_count"=>1, "mean"=>6.0}, {"time"=>1361836800000, "count"=>7, "min"=>3.0, "max"=>9.0, "total"=>39.0, "total_count"=>7, "mean"=>#<BigDecimal:7fabc07348f8,'0.5571428571 428571E1',27(27)>}....
In my unit tests, I get zero results unless I remove the facet filter:
{"online_sales"=>{"_type"=>"date_histogram", "entries"=>[]}}
My test looks like this:
it "should test the online sales facets", focus: true do
  User.index.delete
  User.create_elasticsearch_index
  user = User.create(username: 'testusername', value: 'pass', location_id: @location.id)
  user.tags.create content: 'online'
  user.tags.first.content.should eq 'online'
  user.index.refresh
  ws = User.online_sales(start: (Date.today - 10.days), :end => Date.today)
  puts ws.results.facets["online_sales"]
end
Is there something I'm missing, doing wrong or have just misunderstood to get this to pass? Thanks in advance.
-- EDIT --
It appears to be something to do with the tags relationship. I have another method, location_users, which is a has_many through relationship. This is updated on index using:
def location_users
  location.users.map(&:id)
end
I can see an array of location_users in the results when searching. It doesn't make sense to me why the other (polymorphic) relationship wouldn't work.
-- EDIT 2 --
I have fixed this by putting this in my test:
User.index.import User.all
sleep 1
This seems silly, and I don't really understand why it works. Why?!
By default, Elasticsearch refreshes its indexes once per second.
This is a performance thing because committing your changes to Lucene (which ES uses under the hood) can be quite an expensive operation.
If you need it to update immediately, include refresh=true in the URL when inserting documents. You normally don't want this, because committing on every insert is expensive when inserting lots of documents, but unit testing is one of those cases where you do want it.
From the documentation:
refresh
To refresh the index immediately after the operation occurs, so that the document appears in search results immediately, the refresh parameter can be set to true. Setting this option to true should ONLY be done after careful thought and verification that it does not lead to poor performance, both from an indexing and a search standpoint. Note, getting a document using the get API is completely realtime.
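A toy model of the refresh behaviour may help explain why sleep 1 made the test pass. Newly indexed documents sit in a buffer and only become searchable after a refresh, which Elasticsearch performs periodically (by default every second) or on request. ToyIndex is hypothetical and ignores everything else Elasticsearch does. Its immediate_refresh flag stands in for refresh=true. In a test, triggering the refresh explicitly is more predictable than sleeping.

```ruby
# ToyIndex is a hypothetical, drastically simplified model of an index with a
# refresh step: indexed documents are buffered and only become searchable
# after refresh runs.
class ToyIndex
  def initialize
    @searchable = []
    @buffer = []
  end

  # Stores a document; with immediate_refresh = true it is searchable at once,
  # which is the role refresh=true plays in the real API.
  def index(doc, immediate_refresh = false)
    @buffer << doc
    refresh if immediate_refresh
  end

  # Makes everything indexed so far visible to searches.
  def refresh
    @searchable.concat(@buffer)
    @buffer.clear
  end

  def search(&matcher)
    @searchable.select(&matcher)
  end
end

online = ->(doc) { doc[:user_tags].include?('online') }

idx = ToyIndex.new
idx.index({ username: 'testusername', user_tags: ['online'] })
before = idx.search(&online).size # 0: written but not refreshed yet
idx.refresh                       # what the periodic refresh (or sleep 1) achieves
after = idx.search(&online).size  # 1
```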
