Rails + ActiveRecord: caching all records of a model - ruby-on-rails

I've got a tiny model (let's call it "Node") that represents a tree-like structure. Each node contains only a name and a reference to its parent:
class Node < ActiveRecord::Base
  validates_presence_of :name, :parent_id
end
The table isn't very big: fewer than 100 rows. It's updated rarely; in the last 4 months, 20 new rows were added, on one occasion, by the site admin.
Yet it is used quite a lot in my application. Given its tree-like structure, on some occasions a request triggers more than 30 database hits (including ajax calls, which I use quite a lot).
I'd like to use some sort of caching to reduce database access. Since the table is so small, I thought about caching all records in memory.
Is this possible in Rails 2.3? Is there a better way to deal with this?

Why don't you just load them all every time to avoid getting hit with multiple loads?
Here's a simple example:
before_filter :load_all_nodes
def load_all_nodes
  # Build a hash of every node keyed by id. The block must return the accumulator (h).
  @nodes = Node.all.inject({}) { |h, n| h[n.id] = n; h }
end
This will give you a hash indexed by Node#id so you can use this cache in place of a find call:
# Previously
@node = Node.find(params[:id])
# Now
@node = @nodes[params[:id].to_i]
For small, simple records, loading them in quickly in one fetch is a fairly inexpensive operation.
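To illustrate why one fetch is enough for a tree, here is a minimal plain-Ruby sketch. The Node struct, the sample rows, and the helper names index_by_id and ancestors are all hypothetical stand-ins (no database or Rails involved), and the root is assumed to have a nil parent_id:

```ruby
# Hypothetical stand-in for the Node model; a real app would load these rows once.
Node = Struct.new(:id, :name, :parent_id)

ALL_NODES = [
  Node.new(1, "root", nil),
  Node.new(2, "child", 1),
  Node.new(3, "grandchild", 2)
]

# Index every node by id, as load_all_nodes does once per request.
def index_by_id(nodes)
  nodes.each_with_object({}) { |node, by_id| by_id[node.id] = node }
end

# Walk up the tree using only the in-memory index (no further lookups elsewhere).
def ancestors(by_id, node)
  chain = []
  current = by_id[node.parent_id]
  while current
    chain << current
    current = by_id[current.parent_id]
  end
  chain
end

nodes = index_by_id(ALL_NODES)
puts ancestors(nodes, nodes[3]).map(&:name).inspect
```

Every step of the walk is a hash lookup, so the depth of the tree no longer affects the number of queries.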

Have you looked at any of the plugins that give tree-like behaviour?
Ryan Bates has a railscast on acts_as_tree; however, acts_as_nested_set, or one of the projects inspired by it such as awesome_nested_set or acts_as_better_nested_set, may be a better fit for your needs.
These projects let you fetch a node and all of its descendants with one SQL query. The acts_as_better_nested_set site has a good description of how this method works.
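As a rough illustration of the nested set idea (not the plugins' actual code), each row stores left and right bounds, and a node's descendants are exactly the rows whose bounds fall inside its own. The Row struct, the lft/rgt column names, and the sample data below are assumptions made for this sketch:

```ruby
# Hypothetical nested-set rows: lft/rgt are assigned by a depth-first walk of the tree.
Row = Struct.new(:id, :name, :lft, :rgt)

ROWS = [
  Row.new(1, "root", 1, 8),
  Row.new(2, "a",    2, 5),
  Row.new(3, "a1",   3, 4),
  Row.new(4, "b",    6, 7)
]

# In-memory equivalent of a single range query such as:
#   SELECT * FROM nodes WHERE lft > ? AND rgt < ? ORDER BY lft
def descendants(rows, node)
  rows.select { |r| r.lft > node.lft && r.rgt < node.rgt }.sort_by(&:lft)
end

puts descendants(ROWS, ROWS[0]).map(&:name).inspect
puts descendants(ROWS, ROWS[1]).map(&:name).inspect
```

The trade-off is that inserts and moves have to renumber bounds, which is acceptable for a table that changes as rarely as the one described.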

After looking in several places, I think tadman's solution is the simplest one.
For a more flexible solution, I've found this gist:
http://gist.github.com/72250/
Regards!

Related

More efficient, rails way to check for any of three fields being unique?

So, I need to check three fields for uniqueness before creating an object (from a form), but I will create the object so long as any of the three fields is unique.
My first thought was to just pass the params from the controller to the model, and then run a query to check if a query with those three fields returns > 0 documents. However, I've since learned that this is a dangerous approach, and should not be used.
So I checked the docs, and based off of this snippet
Or even multiple scope parameters. For example, making sure that a teacher can only be on the schedule once per semester for a particular class.
class TeacherSchedule < ActiveRecord::Base
  validates_uniqueness_of :teacher_id, scope: [:semester_id, :class_id]
end
I thought I had found my answer, and implemented:
validates_uniqueness_of :link_to_event, :scope => [:name_of_event, :date_of_event]
which works! But, this dataset is going to get very large (not from this form alone, lol), and I'm under the impression that with this implementation, Rails is going to query for all fields with a link_to_event, and then all fields with a name_of_event, and then all fields with a date_of_event. So, my question(s) is:
A) Am I wrong about how rails will implement this? Is it going to be more efficient out of the box?
B) If this will not be efficient for a table with a couple million entries, is there a better (and still railsy) way to do this?
You can define a method that queries the records with all the fields that you want to be unique as a group:
validate :uniqueness_of_teacher_semester_and_class
def uniqueness_of_teacher_semester_and_class
  # One query that matches on all three fields together
  duplicates = self.class.where(teacher_id: teacher_id, semester_id: semester_id, class_id: class_id)
  errors.add :base, 'Record not unique.' if duplicates.exists?
end
To answer your questions:
A) Am I wrong about how rails will implement this? Is it going to be more efficient out of the box?
I think Rails will query for a match on all 3 fields, and you should check the Mongo (or Rails) log to see for sure.
B) If this will not be efficient for a table with a couple million entries, is there a better (and still railsy) way to do this?
This is the Rails way. There are 2 things you can do to make it efficient:
You would need indexes on all 3 fields, or a compound index of the 3 fields. The compound index *might* be faster, but you can benchmark to find out.
You can add a new field with the 3 fields concatenated, and an index on it. But this will take up extra space and may not be faster than the compound index.
These days a couple million documents is not that much, but it depends on document size and hardware.

Rails subquery reduce amount of raw SQL

I have two ActiveRecord models: Post and Vote. I want to make a simple query:
SELECT *,
       (SELECT COUNT(*)
        FROM votes
        WHERE votes.id = posts.id) AS vote_count
FROM posts
I am wondering what's the best way to do it in activerecord DSL. My goal is to minimize the amount of SQL I have to write.
I can do Post.select("COUNT(*) from votes where votes.id = posts.id as vote_count")
Two problems with this:
Raw SQL. Any way to write this in the DSL?
This returns only the attribute vote_count and not "*" + vote_count. I can append .select("*") but I will be repeating this every time. Is there a better/DRYer way to do this?
Thanks
Well, if you want to reduce the amount of SQL, you can split that query into two smaller ones and execute them separately. For instance, the vote counting part could be extracted into its own query:
SELECT votes.id, COUNT(*) FROM votes GROUP BY votes.id;
which you may write with ActiveRecord methods as:
Vote.group(:id).count
You can store the result for later use and access it directly from Post model, for example you may define #votes_count as a method:
class Post
  def votes_count
    # Class-level cache shared by all posts; filled by one grouped query on first use
    @@votes_count_cache ||= Vote.group(:id).count
    @@votes_count_cache[id] || 0
  end
end
(Of course every use of cache raises a question about invalidating or updating it, but this is out of the scope of this topic.)
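The grouped-count-plus-cache idea can be sketched in plain Ruby. This is only an illustration under assumptions: the Vote struct, the VoteCounts class, and its invalidate! method are hypothetical, and votes are assumed to carry a post_id foreign key (the question's SQL compares votes.id with posts.id, which may or may not be intended):

```ruby
# Hypothetical stand-in for vote rows; post_id is an assumed foreign key.
Vote = Struct.new(:id, :post_id)

VOTES = [Vote.new(1, 10), Vote.new(2, 10), Vote.new(3, 11)]

# Plain-Ruby equivalent of a grouped count: { post_id => number_of_votes }.
def votes_per_post(votes)
  votes.each_with_object(Hash.new(0)) { |vote, counts| counts[vote.post_id] += 1 }
end

# Computes the grouped count once and serves later lookups from memory.
class VoteCounts
  def initialize(votes)
    @votes = votes
    @counts = nil
  end

  def for_post(post_id)
    @counts ||= votes_per_post(@votes)
    @counts.fetch(post_id, 0)
  end

  # Call after votes change so the next lookup recomputes the counts.
  def invalidate!
    @counts = nil
  end
end

counts = VoteCounts.new(VOTES)
puts counts.for_post(10)
puts counts.for_post(99)
```

Keeping the cache in an object with an explicit invalidate! makes the invalidation question visible instead of hiding it in a class variable.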
But I strongly encourage you to consider yet another approach.
I believe writing complicated queries like yours with ActiveRecord methods (even if it were possible) or splitting queries into two as I proposed earlier are both bad ideas. They result in extremely cluttered code, far less readable than raw SQL. Instead, I suggest introducing query objects. IMO there is nothing wrong with using raw, complicated SQL when it's hidden behind a nice interface. See: M. Fowler's P of EAA and Brynary's post on the Code Climate Blog.
How about doing this with no additional SQL at all - consider using the Rails counter_cache feature.
If you add an integer votes_count column to the posts table, you can get Rails to automatically increment and decrement that counter by changing the belongs_to declaration in Vote to:
belongs_to :post, counter_cache: true
Rails will then keep each Post updated with the number of votes it has. That way the count is already calculated and no sub-query is needed.
Maybe you can create a MySQL view and map it to a new AR model. It works much like a table; you just need to point the model at it with set_table_name "your_view_name". It may run faster at the DB level, and the view's results are recalculated automatically.
Just stumbled upon the postgres_ext gem, which adds support for Common Table Expressions in Arel and ActiveRecord, which is exactly what you asked for. The gem is not for SQLite, but perhaps some portions could be extracted or serve as examples.

Rails app. Collect items from several models and show this paginated list of items on main page

In my Rails app I have three models (Posting, Shop::Item and Directory::Item). Every model has relations with other models (User, Category, City ...). I have to show all of this on the main page as a single paginated list of items. I'm using Postgres for the DB.
We can write such code in our controller:
@postings = Posting.includes(:category, :subcategory, :user, :city).all
@directory_items = Directory::Item.includes(:category, :user, :city).all
@shop_items = Shop::Item.includes(:category, :user, :city, :phone).all
and then sort these results. But that is not the best way.
A second solution is to write a custom query, bypassing ActiveRecord. But this query is really large and would be hard to maintain (we have a lot of joins there).
What is the best solution, with the smallest number of DB queries? (This is the main page, so it should be really fast.)
There are some solutions which are not fast in here.
Alternatively, you can use Sphinx with the Thinking Sphinx gem. This way it will be fast and clean.
ThinkingSphinx.search classes: [Posting, Shop::Item, Directory::Item], page: params[:page] || 1, per_page: 10
As the queries will be cached, only the sorting will take time. If you don't display too many items, keeping the code readable matters more than optimizing this. You can always improve it later if performance really becomes a bottleneck, and even then there are alternatives to making this code overly complex.
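For the simple "load, sort, paginate in Ruby" approach, a minimal sketch could look like the following. The Item struct, the paginate_merged helper, and the choice of created_at (here plain integers) as the sort key are assumptions for illustration; the question does not say what the list is sorted by:

```ruby
# Hypothetical stand-in for records from the three models.
Item = Struct.new(:kind, :title, :created_at)

# Merge several already-loaded collections, newest first, and return one page.
def paginate_merged(collections, page:, per_page:)
  merged = collections.flatten(1).sort_by { |item| -item.created_at }
  merged[(page - 1) * per_page, per_page] || []
end

postings        = [Item.new(:posting, "p1", 5), Item.new(:posting, "p2", 1)]
shop_items      = [Item.new(:shop, "s1", 4)]
directory_items = [Item.new(:directory, "d1", 3), Item.new(:directory, "d2", 2)]

page = paginate_merged([postings, shop_items, directory_items], page: 1, per_page: 3)
puts page.map(&:title).inspect
```

This loads every row on each request, so it only makes sense while the tables are small; past that point a search index or a single UNION-style query is the more realistic option.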

Rails optimization Question

In Rails, while using ActiveRecord, why are join queries considered bad?
For example:
Here I'm trying to find the number of companies that belong to a certain category.
class Company < ActiveRecord::Base
  has_one :company_profile
end
Finding the companies for a particular category_id:
number_of_companies = Company.find(:all, :joins => :company_profile, :conditions => ["company_profiles.category_id = ? AND is_published = ?", c_id, true])
How could this be better or is it just poor design?
company_profiles = CompanyProfile.find_all_by_category_id(c_id)
companies = []
company_profiles.each{|c_profile| companies.push(c_profile.company) }
Isn't the first approach better, since it issues a single query, while the second one runs several queries?
Could someone explain why joins are considered to be bad practice in Rails?
Thanks in advance
To my knowledge, there is no such rule. The rule is to hit the database as little as possible, and Rails gives you the right tools for that, including joins.
The example Sam gives above is exemplary. Simple code, but behind the scenes Rails has to do two queries, instead of only one using a join.
If there is one related rule that comes to mind, it is to avoid raw SQL where possible and use the Rails way as much as possible. This keeps your code database agnostic (as Rails handles the differences for you). But sometimes even that is unavoidable.
It comes down to good database design, creating the correct indexes (which you need to define manually in migrations), and sometimes big nested structures/joins are needed.
Join queries are not bad, in fact, they are good, and ActiveRecord has them at its very heart. You don't need to break into find_by_sql to use them, options like :include will handle it for you. You can stay within the ORM, which gives the readability and ease of use, whilst still, for the most part, creating very efficient SQL (providing you have your indexes right!)
Bottom line - you need to do the bare minimum of database operations. Joins are a good way of letting the database do the heavy lifting for you, and lowering the number of queries that you execute.
By the by, DataMapper and Arel (the query engine in Rails 3) feature a lot of lazy loading - this means that code such as:
@category = Category.find(params[:id])
@category.companies.size
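Would most likely issue a COUNT query for the second line instead of loading every company. Note that find with an id still hits the database immediately in ActiveRecord; the lazy loading applies to the association and to relations built with methods such as where.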
If you just want to find the number of companies in a category, all you need to do is find the category and then call size on the association, which behaves like an array.
@category = Category.find(params[:id])
@category.companies.size

Rails Caching DB Queries and Best Practices

The DB load on my site is getting really high so it is time for me to cache common queries that are being called 1000s of times an hour where the results are not changing.
So for instance on my city model I do the following:
def self.fetch(id)
  Rails.cache.fetch("city_#{id}") { City.find(id) }
end
def after_save
  Rails.cache.delete("city_#{self.id}")
end
def after_destroy
  Rails.cache.delete("city_#{self.id}")
end
So now when I call City.fetch(1), the first time I hit the DB, but the next 1000 times I get the result from memory. Great. But most of the calls to city are not City.fetch(1) but @user.city.name, where Rails does not use the fetch but queries the DB again... which makes sense, but is not exactly what I want it to do.
I can do City.fetch(@user.city_id) but that is ugly.
So my question to you guys: what are the smart people doing? What is the right way to do this?
With respect to the caching, a couple of minor points:
It's worth using a slash to separate object type and id, which is the Rails convention. Even better, ActiveRecord models provide the cache_key instance method, which returns a unique identifier built from table name and id, "cities/13" etc.
One minor correction to your after_save callback. Since you have the data on hand, you might as well write it back to the cache as opposed to deleting it. That saves you a single trip to the database ;)
def after_save
  Rails.cache.write(cache_key, self)
end
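The difference between delete-on-save and write-on-save can be seen with a tiny in-memory stand-in for a cache store. TinyCache, the "cities/1" key, and the city names are hypothetical and only mimic the fetch/write/delete shape of Rails.cache; this is a sketch, not how Rails implements it:

```ruby
# Minimal in-memory stand-in for a cache store (hypothetical; not Rails.cache).
class TinyCache
  def initialize
    @store = {}
  end

  # Return the cached value, or run the block once and remember its result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  def write(key, value)
    @store[key] = value
  end

  def delete(key)
    @store.delete(key)
  end
end

cache = TinyCache.new
db_hits = 0
load_city = -> { db_hits += 1; "Lisbon" }  # pretend database read

cache.fetch("cities/1") { load_city.call }
cache.fetch("cities/1") { load_city.call }  # served from memory

cache.write("cities/1", "Lisboa")           # write-through after a save
puts cache.fetch("cities/1") { load_city.call }
puts db_hits
```

With write-through the read after the save never reaches the loader, whereas a delete would force one more load on the next fetch.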
As to the root of the question, if you're continuously pulling @user.city.name, there are two real choices:
Denormalize the user's city name to the user row: @user.city_name (keep the city_id foreign key). This value should be written at save time.
-or-
Implement your User.fetch method to eager load the city. Only do this if the contents of the city row never change (i.e. name etc.), otherwise you can potentially open up a can of worms with respect to cache invalidation.
Personal opinion:
Implement basic id based fetch methods (or use a plugin) to integrate with memcached, and denormalize the city name to the user's row.
I'm personally not a huge fan of cached model style plugins, I've never seen one that's saved a significant amount of development time that I haven't grown out of in a hurry.
If you're getting way too many database queries, it's definitely worth checking out eager loading (through :include) if you haven't already. That should be the first step for reducing the number of database queries.
If you need to speed up SQL queries on data that doesn't change much over time, then you can use materialized views.
A matview stores the results of a query into a table-like structure of its own, from which the data can be queried. It is not possible to add or delete rows, but the rest of the time it behaves just like an actual table. Queries are faster, and the matview itself can be indexed.
At the time of this writing, matviews are natively available in Oracle DB, PostgreSQL, Sybase, IBM DB2, and Microsoft SQL Server. MySQL doesn’t provide native support for matviews, unfortunately, but there are open source alternatives to it.
Here are some good articles on how to use matviews in Rails:
sitepoint.com/speed-up-with-materialized-views-on-postgresql-and-rails
hashrocket.com/materialized-view-strategies-using-postgresql
I would go ahead and take a look at Memoization, which is now in Rails 2.2.
"Memoization is a pattern of initializing a method once and then stashing its value away for repeat use."
There was a great Railscast episode on it recently that should get you up and running nicely.
Quick code sample from the Railscast:
class Product < ActiveRecord::Base
  extend ActiveSupport::Memoizable
  belongs_to :category
  def filesize(num = 1)
    # some expensive operation
    sleep 2
    12345789 * num
  end
  memoize :filesize
end
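The same pattern can be written by hand in plain Ruby, which is useful to know because ActiveSupport::Memoizable was deprecated in later Rails versions. The sketch below is an assumption-laden illustration, not the Railscast's code: this Product is a standalone class, and the calls counter and @filesize_cache hash are hypothetical names added to show that the expensive body runs once per argument:

```ruby
# Standalone illustration (no Rails): memoize per argument with a hash.
class Product
  attr_reader :calls

  def initialize
    @calls = 0
    @filesize_cache = {}
  end

  def filesize(num = 1)
    # Hash#fetch runs the block only when num has not been computed yet.
    @filesize_cache.fetch(num) do
      @calls += 1                          # stands in for the expensive operation
      @filesize_cache[num] = 12345789 * num
    end
  end
end

product = Product.new
product.filesize
product.filesize
product.filesize(2)
puts product.calls
```

Note that this caches per object instance, so it helps within a request but does nothing across requests, unlike the cache store approaches discussed above.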
More on Memoization
Check out cached_model
