Rails Caching DB Queries and Best Practices - ruby-on-rails

The DB load on my site is getting really high so it is time for me to cache common queries that are being called 1000s of times an hour where the results are not changing.
So for instance on my city model I do the following:
def self.fetch(id)
Rails.cache.fetch("city_#{id}") { City.find(id) }
end
def after_save
Rails.cache.delete("city_#{self.id}")
end
def after_destroy
Rails.cache.delete("city_#{self.id}")
end
So now when I can City.find(1) the first time I hit the DB but the next 1000 times I get the result from memory. Great. But most of the calls to city are not City.find(1) but #user.city.name where Rails does not use the fetch but queries the DB again... which makes sense but not exactly what I want it to do.
I can do City.find(#user.city_id) but that is ugly.
So my question to you guys. What are the smart people doing? What is
the right way to do this?

With respect to the caching, a couple of minor points:
It's worth using slash for separation of object type and id, which is rails convention. Even better, ActiveRecord models provide the cacke_key instance method which will provide a unique identifier of table name and id, "cities/13" etc.
One minor correction to your after_save filter. Since you have the data on hand, you might as well write it back to the cache as opposed to delete it. That's saving you a single trip to the database ;)
def after_save
Rails.cache.write(cache_key,self)
end
As to the root of the question, if you're continuously pulling #user.city.name, there are two real choices:
Denormalize the user's city name to the user row. #user.city_name (keep the city_id foreign key). This value should be written to at save time.
-or-
Implement your User.fetch method to eager load the city. Only do this if the contents of the city row never change (i.e. name etc.), otherwise you can potentially open up a can of worms with respect to cache invalidation.
Personal opinion:
Implement basic id based fetch methods (or use a plugin) to integrate with memcached, and denormalize the city name to the user's row.
I'm personally not a huge fan of cached model style plugins, I've never seen one that's saved a significant amount of development time that I haven't grown out of in a hurry.
If you're getting way too many database queries it's definitely worth checking out eager loading (through :include) if you haven't already. That should be the first step for reducing the quantity of database queries.

If you need to speed up sql queries on data that doesnt change much over time then you can use materialized views.
A matview stores the results of a query into a table-like structure of
its own, from which the data can be queried. It is not possible to add
or delete rows, but the rest of the time it behaves just like an
actual table. Queries are faster, and the matview itself can be
indexed.
At the time of this writing, matviews are natively available in Oracle
DB, PostgreSQL, Sybase, IBM DB2, and Microsoft SQL Server. MySQL
doesn’t provide native support for matviews, unfortunately, but there
are open source alternatives to it.
Here is some good articles on how to use matviews in Rails
sitepoint.com/speed-up-with-materialized-views-on-postgresql-and-rails
hashrocket.com/materialized-view-strategies-using-postgresql

I would go ahead and take a look at Memoization, which is now in Rails 2.2.
"Memoization is a pattern of
initializing a method once and then
stashing its value away for repeat
use."
There was a great Railscast episode on it recently that should get you up and running nicely.
Quick code sample from the Railscast:
class Product < ActiveRecord::Base
extend ActiveSupport::Memoizable
belongs_to :category
def filesize(num = 1)
# some expensive operation
sleep 2
12345789 * num
end
memoize :filesize
end
More on Memoization

Check out cached_model

Related

Rails model caching

I have two models, called Division and Tier, which are queried a lot, but inserts or updates are hardly ever done. The tables only have a handful or records in, so the whole tables can be stored in memory.
class Division
# position: integer
belongs_to :tier
end
class Tier
# name: string
has_many :tiers
end
These tables are queried in almost every page, so it seems like a waste of database calls
Model caching solutions like identity cache, which cache records in memcached, only allow you to retrieve records from the cache by id.
A lot of queries are just selecting by id, but I also perform a lot of queries like
SELECT * FROM divisions WHERE divisions.position BETWEEN ? AND ?
I have a few questions
Will the database cache these tables in memory by itself, making another caching solution unnecessary?
If not, is there any way to cache these models in memory, and query against the cache using conditions other than just the id?
Am I essentially asking for something with the speed of redis/memcached, but the features of a relational database, which isn't possible?
Rails will cache models and avoid duplicate DB lookups but only for identical SQL queries, and only if that SQL query has already been made in the same client request. So another caching solution may be warranted.
For your case (non-volatile, reference data), caching sounds like a good idea. However, there are a few caveats:
As you pointed out, you aren't always querying by id. Which means that you either need to cache the results of given ranges (which seems like a bad idea), or you'll need a cache that can perform lookups by multiple indexes and find results across a range (elasticsearch would be a good example of this).
Caches can become stale when you modify data (INSERT/UPDATE/DELETE). Range caches are particularly vulnerable to this (another reason it was a bad idea), but even if you cache by indexes you'll still need to ensure that you update your caches accordingly.

Is it possible to alter a record in rails, without first reading it

I want to alter a record, but I don't want to read it out of the database first, because frankly I don't see the point. It's huge and across a network that is slow and expensive.
Can it be done (easily)?
Lets say I have a record with 100 fields (for arguments sake) and I want to alter one field in the table. I have a really bad connection to the database (this is true) because it's housed on a different box and there's nothing I can do to change this.
Right now I pull down the record and rails validates its contents (because I have serialized bits) I then alter one field (one of the hundred depending on X condition) and save the record again. Which I suppose writes the whole record to the database again, with no knowledge of the fact that I only changed one small bit. (this last bit is assumption)
Now to change one record it's sending a huge amount of data across the network, and it could be that I'm only changing one small small thing..
Also it's doing two queries on the database. First the select * then the update..
So my question.. are there smarter base classes that do this right, to write without read?
Top of my head I would think a setter method for each field with a bool flag for changed.
When saving, walk the flags and where true... Does this happen now, if so how do I make use of it?
Rails models have the update method, which does a SQL select then an SQL update.
Its use is simple (suppose you have a model named Person)
Person.update(person_id, :user_name => 'New Name')
The drawback is that you have to know beforehand the person id.
But then, you will always have to do a query to find out that. You could write your own SQL to change the column value, with a WHERE clause that searches for the param you have.
Rails models have the update_all method, which does only a SQL update without loading the data.
But I wouldn't recommend using it because it doesn't trigger any callbacks or validations that your model may have.
You could use the update_all like this:
Book.update_all "author = 'David'", "title LIKE '%Rails%'"
ps: examples were all taken from the Rails official documentation. If you search for update, or update_all, you'll find both methods in no time.
Take a look there, it's a good place to find out what rails can do.
Rails API link
It's not obvious from the documentation, but update_all is the answer to this question.
Book.where(id: 123).update_all(title: "War and Peace")
results in exactly one query:
UPDATE `books` SET `books`.`title` = "War and Peace" WHERE `books`.`id` = 123
It's been a while since this question was asked, but this is so for Rails 4/5/6.

Duplicating logic in methods and scopes (and sql)

Named scopes really made this problem easier but it is far from being solved. The common situation is to have logic redefined in both named scopes and model methods.
I'll try to demonstrate the edge case of this by using somewhat complex example. Lets say that we have Message model that has many Recipients. Each recipient is being able to mark the message as being read for himself.
If you want to get the list of unread messages for given user, you would say something like this:
Message.unread_for(user)
That would use the named scope unread_for that would generate the sql which will return the unread messages for given user. This sql is probably going to join two tables together and filter messages by those recipients that haven't already read them.
On the other hand, when we are using the Message model in our code, we are using the following:
message.unread_by?(user)
This method is defined in message class and even it is doing basically the same thing, it now has different implementation.
For simpler projects, this is really not a big thing. Implementing the same simple logic in both sql and ruby in this case is not a problem.
But when application starts to get really complex, it starts to be a problem. If we have permission system implemented that checks who is able to access what message based on dozens of criteria defined in dozens of tables, this starts to get very complex. Soon it comes to the point where you need to join 5 tables and write really complex sql by hand in order to define the scope.
The only "clean" solution to the problem is to make the scopes use the actual ruby code. They would fetch ALL messages, and then filter them with ruby. However, this causes two major problems:
Performance
Pagination
Performance: we are creating a lot more queries to the database. I am not sure about internals of DMBS, but how harder is it for database to execute 5 queries each on single table, or 1 query that is going to join 5 tables at once?
Pagination: we want to keep fetching records until specified number of records is being retrieved. We fetch them one by one and check whether it is accepted by ruby logic. Once 10 of them are accepted, process will stop.
Curious to hear your thoughts on this. I have no experience with nosql dbms, can they tackle the issue in different way?
UPDATE:
I was only speaking hypotetical, but here is one real life example. Lets say that we want to display all transactions on the one page (both payments and expenses).
I have created SQL UNION QUERY to get them both, then go through each record, check whether it could be :read by current user and finally paginated it as an array.
def form_transaction_log
sql1 = #project.payments
.select("'Payment' AS record_type, id, created_at")
.where('expense_id IS NULL')
.to_sql
sql2 = #project.expenses
.select("'Expense' AS record_type, id, created_at")
.to_sql
result = ActiveRecord::Base.connection.execute %{
(#{sql1} UNION #{sql2})
ORDER BY created_at DESC
}
result = result.map do |record|
klass = Object.const_get record["record_type"]
klass.find record["id"]
end.select do |record|
can? :read, record
end
#transactions = Kaminari.paginate_array(result).page(params[:page]).per(7)
end
Both payments and expenses need to be displayed within same table, ordered by creation date and paginated.
Both payments and expenses have completely different :read permissions (defined in ability class, CanCan gem). These permission are quite complex and they require querieng several other tables.
The "ideal" thing would be to write one HUGE sql query that would do return what I need. It would made pagination and everything else a lot easier. But that is going to duplicate my logic defined in ability.rb class.
I'm aware that CanCan provides a way of defining the sql query for the ability, but the abilities are so complex, that they couldn't be defined in that way.
What I did is working, but I'm loading ALL transactions, and then checking which ones I could read. I consider it a big performance issue. Pagination here seems pointless because I'm already loading all records (it only saves bandwidth). An alternative is to write really complex SQL that is going to be hard to maintain.
Sounds like you should remove some duplication and perhaps use DB logic more. There's no reason that you can't share code between named scopes between other methods.
Can you post some problematic code for review?

Dynamically creating new Active Record models and database tables

I am not sure exactly what I should name this question. I just started server-side programming and I need some help.
All the tutorials I have read so far on RoR deal with creating a pre-defined table and with pre-defined fields (id, name, email, etc etc). They use ActiveRecord as base class and saving to db is handled automatically by superclass.
What I am trying to program is something that allows user-defined table with fields. So think of this way. The web UI will have an empty table, the user will name the table, and add columns (field), and after that, add rows, and then later save it. How would I implement this? I am not asking for details, just an overview of it. As I said, all the tutorials I have read so far deal with pre-defined tables with fields where the ActiveRecord subclass is predefined.
So in a nutshell, I am asking, how to create tables in db on runtime, and add fields to the tables.
Hope I was clear, if not, please let me know and i will try to elaborate a bit more.
Thanks.
Unless you're building a DB administration tool (and even maybe then), allowing the user direct access to the database layer in the way you're suggesting is probably a bad idea. Apart from issues of stability and security, it'll get really slow if your users are creating lots of tables.
For instance, if you wanted to search for a certain value across 100 of your users' tables, you'd have to run 100 separate queries. The site would get exponentially slower the more user tables that were created.
A saner way to do it might be to have a Table model like this
class Table < ActiveRecord::Base
has_many :fields
has_many :rows
end
Every table would have fields attached to it, and rows to store the corresponding data (which would be encoded somehow).
However, as #Aditya rightly points out, this is not really beginner stuff!
I agree with previous answers generally speaking. It's not clear from your question why you want to create a table at runtime. It's not really obvious what the advantage of doing this would be. If you are just trying to store data that seems to fit into a table with rows and columns, why not just store it as an array in a field of your user table. If your user is allowed to create many tables, then you could have something like
class User < ActiveRecord::Base
has_many :tables
end
and then each table might have a field to store a serialized array. Or you could go with Alex's suggestion - the best choice really depends on what you are going to do with the data, how often it changes, whether you need to search it and so on ...
You can create a database as shown in tutorials which stores name of tables and their columns name those your user want. Then you can have worker (which can be build using Redis and Resque, here is simple Tut on Resque and Redis) and have those worker run migration (write migration with variables and use params to replace them) for you for new table in DB as soon as new entry is made in database. Tell me if you have questions on this.

Rails Minimizing Database Load

I am relatively new to rails. I understand that rails lets you play with your database values with much ease but I am a little bit in the blind about what kind of approach is more energy efficient on the database and which not.
Here is a case in point. I have a model appointment which belongs_to user. In my syntax I can sometimes say process_user #appointment.user. When I write that, does that run a separate SELECT query on the database to retrieve that user? Is it more efficient to write process_user #appointment.user_id where user_id is an attribute in the appointment and then try use the user_id value to perform my evaluation related tasks as long as I don't need the whole user object #appointment.user.
Frankly, from a peace of mind point of view, I just love to be able to use process_user #appointment.user because it reads better, looks nicer and works better when preparing logic. Is it a performance efficient way?
You are perfectly fine with using code like process_user #appointment.user, as ActiveRecord tries its best to minimize the number of database queries. Of course it does not handle all situations perfectly, but your example is a very basic one. There would probably no immediate database query happen and the object would only be loaded when its attributes are accessed.
If you notice performance problems in a running large-scaled application and you can track the problems down to ActiveRecord using profiling, it is probably time to optimize. Trying to pre-optimize from the very beginning would be against Rails' philosophy and will only result in ugly (and possible even slower) code. Remember that the real performance bottlenecks are often at places where you would never expect them.
EDIT: As Winfield pointed out, optimizing the number of queries does usually not mean to manage foreign keys or similar internals by yourself. There are quite a number of flags and options for DB access methods that allow you to control how your database is queries.
You can eagerly load your associated users with your Appointment models:
Appointment.all(:include => :user)
...which will join in the users or do a separate lookup for all the associated users in a single query.
This will then load the user association in advance (eagerly) so the user attribute is already populated with the object when you reference it, instead of having to stop and execute a separate query to look it up one by one (N+1 queries).

Resources