where query causing a timeout in rails - ruby-on-rails

I have this query in my rails application where I get the name of the Books.
Book.where(:name => #name_list).pluck(:name)
Basically it find the Book whose name is present in the #name_list and then return an array of their names if present.
But since there are a huge number of books present in the database, the request is getting timed out when I call this particular endpoint.
Please let me know if there is any way we can make this faster so that endpoint will work.
Also will the query speed increase if we add this name column as an index into the Books table ?
add_index :books, :name

You asked it an index on books.name will increase the performance of query filtering books by a given list of names.
The answer is yes. This is exactly the textbook use-case for database indexes. The bigger the table is, the bigger the performance benefit will be. For huge tables, it is not unlikely that queries using the index will be a magnitude faster.
I highly suggest adding such an index with the method you already named in your question and try again:
add_index :books, :name

I think this is something you're looking for, it's best if you process the data in batches to prevent memory bloat, or process this large amount of data at once.
In Rails 3.2, how to "pluck_in_batches" for a very large table

Related

Does normal indexing also work by creating unique index?

Hi Guys, i would like to know, if i create unique index of two columns on postgreSQL, does normal indexing for the both columns also work by same unique index or i have to create one unique index and two more index for both columns as shown in the code? I want to create unique index of talent_id, job_id, also both columns should separately indexed. I read many resources but does not get appropriate answer.
add_index :talent_actions, [:talent_id, :job_id], unique: true
Does above code also handles below indexing also or i have to add below indexing separately?
add_index :talent_actions, :talent_id
add_index :talent_actions, :job_id
Thank you.
An index is an object in the database, which can be used to look up data faster, if the query planner decides it will be appropriate. So the trivial answer to your question is "no", creating one index will not result in the same structures in the database as creating three different indexes.
I think what you actually want to know is this:
Do I need all three indexes, or will the unique index already optimise all queries?
This, as with any database optimisation, depends on the queries you run, and the data you have.
Here are some considerations:
The order of columns in a multi-column index matters. If you have an index of people sorted by surname then first name, then you can use it to search for everybody with the same surname; but you probably can't use it to search for somebody when you only know their first name.
Data distribution matters. If everyone in your list has the surnames "Smith" and "Jones", then you can use a surname-first index to search for a first name fairly easily (just look up under Jones, then under Smith).
Index size matters. The fewer columns an index has, the more of it fits in memory at once, so the faster it will be to use.
Often, there are multiple indexes the query planner could use, and its job is to estimate the cost of the above factors for the query you've written.
Usually, it doesn't hurt to create multiple indexes which you think might help, but it does use up disk space, and occasionally can cause the query planner to pick a worse plan. So the best approach is always to populate a database with some real data, and look at the query plans for some real queries.

What's add_index used for in this code and more generally?

I am trying to get my head around this code... it's from the Rails Tutorial Book and is part of the process of making a twitter like application.
class CreateRelationships < ActiveRecord::Migration
def change
create_table :relationships do |t|
t.integer :follower_id
t.integer :followed_id
t.timestamps
end
add_index :relationships, :follower_id
add_index :relationships, :followed_id
add_index :relationships, [:follower_id, :followed_id], unique: true
end
end
Since there are only 2 columns (follower_id and followed_id), why
would their be a need for an index?
Does the index sort them in some way? It just seems a bit strange to
me to add an index to a table with 2 columns.
What does the index do to the rows?
Is indexing optional? If so why/why not use it? Is it a good idea to use it in the code above?
Please answer all the questions if you can. I'm just trying to get my head around this concept and after reading about it I have these questions.
Since there are only 2 columns (follower_id and followed_id) why would
there be a need for an index?
Need for indexing doesn't depend on the number of columns. It's used for speeding up the lookups. Even if your table has only one column, verifying whether a particular value is present in that column will need you to scan the whole table in the worst case. With an index it can be answered immediately.
Does the index sort them in some way? It just seems a bit strange to
me to add an index to a table with 2 columns?
No, in general indexes don't sort the data in the table in any way. I say "in general" because clustered indices do sort the data. See this question for more details.
What does the index do to the rows?
Again, nothing in general. Different DBMSes use different mechanisms to associate a row in the table to the index. Indexing is one of the most important tasks in a DBA's work. It'd be great if you have basic ideas about it. Read the wikipedia article to get the basics.
Is indexing optional? If so why/why not use it?
Yes, indexing is optional. You should use indexes when you see your query performance go down. Again, different DBMSes provide different mechanisms for you to monitor your query performance and you should have monitoring in place to alert you when performance degrades beyond a threshold. With experience, you'll reach a point where most of the indexing needs of an application will be clear to you from the beginning.
Is it a good idea to use it in the code above?
Can't comment on that. Indexing needs of each application are different. You should be aware of downsides of over-indexing as well. If you have a lot of indexes, your updates, inserts and deletes will become slower with time since they will also need to update your indexes.
An index on a column or set of columns speeds lookups on that column or set of columns. It's usefulness has nothing to do with the number of columns in the table since it's purpose is to locate the row(s) associated with the column values.
No, the index doesn't sort the table.
The index doesn't "do" anything to the rows, although if it's a "unique" index it would prevent the creation/update of rows which duplicate the column(s) in question.
Indexing is optional. It speeds up lookups, but takes additional time for write operations. Whether or not it is a good idea depends entirely on the application.

Improve performance of Rails application by adding indexes?

In my Rails application I have an index view that lists all of my projects.
This list can be sorted by clicking on any of the table column headers, e.g. Date, Name, updated_at etc. This happens by appending a &sort= GET parameter to the URL.
My question is: From a performance point-of-view, would it be advisable to add indexes to these columns in my database?
This is what a migration might look like:
class AddMoreIndexes < ActiveRecord::Migration
def change
add_index :projects, :date
add_index :projects, :name
add_index :projects, :update_at
end
end
Will I get any performance gains from this?
Indexes can be used to speed an order-by, but if you were identifying a subset of rows to display then an index that is helpful for that is likely to be chosen in preference. You'd need composite indexes in such a situation.
There're a couple of other problems.
Firstly, ordering on an indexed string value may require a linguistically sorted index, not the regular ASCII/Binary sort, so multilingual applications may not be helped at all.
Secondly, it can discourage normalisation of the database because you really need the display values to be in the table you're selecting.
You might like to look at using another method for the sort. I've been very happy with using Google visualisation tables, which come with JQuery sorting built in.
Depending on how you query your database, then yes, it will give you performance gains. For example, whenever I add a foreign key to a table, I immediately index by it. Why? I know queries will be running through it in my application. If not, I wouldn't have put a foreign key. In this way, especially when you accumulate a large amount of data in your database, it will definitely give performance gains (sometimes, by an incredible amount). If you plan to query your database by date, name, or updated at, then yes, it could potentially be a performance gain depending on your query. Otherwise, there really is no point.
Note, you wouldn't want to add an index for every column. Having necessary indices will help you, but if you have an index for every column, then you run the risk of confusing the SQL Query Optimizer and actually hindering your performance.
My suggestion: Add an index for every foreign key you have in your table, but if you're also running some heavy queries with other columns, then add an index there too.

Best SQL indexes for join table

With performance improvements in mind, I was wondering if and which indexes are helpful on a join table (specifically used in a Rails 3 has_and_belongs_to_many context).
Model and Table Setup
My models are Foo and Bar and per rails convention, I have a join table called bars_foos. There is no primary key or timestamps making the old fields in this table bar_id:integer and foo_id:integer. I'm interested in knowing which of the following indexes is best and is without duplication:
A compound index: add_index :bars_foos, [:bar_id, :foo_id]
Two indexes
A. add_index :bars_foos, :bar_id
B. add_index :bars_foos, :foo_id
A combination of both 1 and 2-B
Basically, I'm not sure if the compound index is enough assuming it is helpful to begin with. I believe that a compound index can be used as a single index for the first item which is why I am pretty sure that using all three lines would certainly result in unnecessary duplication.
Likely Usage
The most common usage will be given an instance of model Foo, I will be asking for its associated bars using the RoR syntax of foo.bars and vice versa with bar.foos for an instance of the model Bar.
These will generate queries of the type SELECT * FROM bars_foos WHERE foo_id = ? and SELECT * FROM bars_foos WHERE bar_id = ? respectively and then using those resultant IDs to SELECT * FROM bars WHERE ID in (?) and SELECT * FROM foos WHERE ID in (?).
Please correct me in the comments if I am incorrect, but I do not believe that, in the context of the Rails application, it is ever going to try to do a query where it specifies both IDs like SELECT * FROM bars_foos where bar_id = ? AND foo_id = ?.
Databases
In the event there are database specific optimization techniques, I will most likely be using PostgreSQL. However, others using this code may want to use it in MySQL or SQLite depending on their Rails configuration so all answers are appreciated.
The Answer
The oft repeated answer, which tends to always be the case more often than not is, "it depends." More specifically, it depends on what your data is and how it will be used.
tl;dr Explanation
The short tl;dr answer for my specific case (and to cover all future bases) is choice #2 which is what I suspected. However, choice #3 would work just fine as, depending on my usage of the data, the extra time and space used creating the compound index could reduce future query lookups.
The Full Explanation
The reason for this is that databases try to be smart and try to do things as fast as possible regardless of programmer input. The most basic item to consider when adding an index is will this object be looked up by this key. If yes, an index can potentially help speed that up. However, whether this index is even used all comes down to selectivity and the cardinality of the field.
Since foreign keys are typically the IDs of another AR class, cardinality usually will be high. But again, this depends on your data. In my example if there are many Foos but few Bars, many of the entries in my join table will have simliar bar_ids. With bar_ids having a low cardinality, an index on bar_id may never be used and may be getting in the way by having the database devote time and resources* to adding to this index every time a new bars_foos entry is created. The same goes with many Bars and few Foos and few of both.
The general lesson is that when considering an index on a table, decide if the entries will be both looked up by this field and if this field has a high cardinality. That is, does this field have many distinct values? In the case of most join tables "it depends" and we must think more carefully about what the data represents and the relationships themselves. In my case, I will have both many Foos and Bars and will be looking up Foos by their associated bars and vice versa.
Another good answer I got at the office was, "why are you worrying about your indexes? Build your app!"
Footnotes
* In a similar question on indexes on STI it was pointed out that the cost of an index is very low so when in doubt, just add it.
Depends on how you are going to query the data.
Assuming you want to search for all of these...
WHERE bar_id = ?
WHERE foo_id = ?
WHERE bar_id = ? AND foo_id = ?
...then you should probably go with an index on {bar_id, foo_id} and an index on {foo_id}.
While you could also create a third index on {bar_id}, the price of maintaining additional index would probably outweigh the benefit of better clustering in the smaller index.
Also, how do you plan to cover your queries with indexes? Some of the alternatives, such as...
{foo_id, bar_id} and {bar_id}
{foo_id, bar_id} and {bar_id, foo_id}
...might cover certain kinds of queries better.
Covering is a balancing act - sometimes adding a field to an index just for covering purposes is justified, sometimes it's not. You won't know until you measure on realistic amounts of data.
(Disclaimer: I'm not familiar with Ruby. This answer is purely from the database perspective.)

Rails Caching DB Queries and Best Practices

The DB load on my site is getting really high so it is time for me to cache common queries that are being called 1000s of times an hour where the results are not changing.
So for instance on my city model I do the following:
def self.fetch(id)
Rails.cache.fetch("city_#{id}") { City.find(id) }
end
def after_save
Rails.cache.delete("city_#{self.id}")
end
def after_destroy
Rails.cache.delete("city_#{self.id}")
end
So now when I can City.find(1) the first time I hit the DB but the next 1000 times I get the result from memory. Great. But most of the calls to city are not City.find(1) but #user.city.name where Rails does not use the fetch but queries the DB again... which makes sense but not exactly what I want it to do.
I can do City.find(#user.city_id) but that is ugly.
So my question to you guys. What are the smart people doing? What is
the right way to do this?
With respect to the caching, a couple of minor points:
It's worth using slash for separation of object type and id, which is rails convention. Even better, ActiveRecord models provide the cacke_key instance method which will provide a unique identifier of table name and id, "cities/13" etc.
One minor correction to your after_save filter. Since you have the data on hand, you might as well write it back to the cache as opposed to delete it. That's saving you a single trip to the database ;)
def after_save
Rails.cache.write(cache_key,self)
end
As to the root of the question, if you're continuously pulling #user.city.name, there are two real choices:
Denormalize the user's city name to the user row. #user.city_name (keep the city_id foreign key). This value should be written to at save time.
-or-
Implement your User.fetch method to eager load the city. Only do this if the contents of the city row never change (i.e. name etc.), otherwise you can potentially open up a can of worms with respect to cache invalidation.
Personal opinion:
Implement basic id based fetch methods (or use a plugin) to integrate with memcached, and denormalize the city name to the user's row.
I'm personally not a huge fan of cached model style plugins, I've never seen one that's saved a significant amount of development time that I haven't grown out of in a hurry.
If you're getting way too many database queries it's definitely worth checking out eager loading (through :include) if you haven't already. That should be the first step for reducing the quantity of database queries.
If you need to speed up sql queries on data that doesnt change much over time then you can use materialized views.
A matview stores the results of a query into a table-like structure of
its own, from which the data can be queried. It is not possible to add
or delete rows, but the rest of the time it behaves just like an
actual table. Queries are faster, and the matview itself can be
indexed.
At the time of this writing, matviews are natively available in Oracle
DB, PostgreSQL, Sybase, IBM DB2, and Microsoft SQL Server. MySQL
doesn’t provide native support for matviews, unfortunately, but there
are open source alternatives to it.
Here is some good articles on how to use matviews in Rails
sitepoint.com/speed-up-with-materialized-views-on-postgresql-and-rails
hashrocket.com/materialized-view-strategies-using-postgresql
I would go ahead and take a look at Memoization, which is now in Rails 2.2.
"Memoization is a pattern of
initializing a method once and then
stashing its value away for repeat
use."
There was a great Railscast episode on it recently that should get you up and running nicely.
Quick code sample from the Railscast:
class Product < ActiveRecord::Base
extend ActiveSupport::Memoizable
belongs_to :category
def filesize(num = 1)
# some expensive operation
sleep 2
12345789 * num
end
memoize :filesize
end
More on Memoization
Check out cached_model

Resources