In my Rails application I have an index view that lists all of my projects.
This list can be sorted by clicking on any of the table column headers, e.g. Date, Name, updated_at etc. This happens by appending a &sort= GET parameter to the URL.
My question is: From a performance point-of-view, would it be advisable to add indexes to these columns in my database?
This is what a migration might look like:
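# On Rails 5+, inherit from a versioned class such as ActiveRecord::Migration[5.2]
class AddMoreIndexes < ActiveRecord::Migration
  def change
    # One index per sortable column in the projects table
    add_index :projects, :date
    add_index :projects, :name
    add_index :projects, :updated_at
  end
end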
Will I get any performance gains from this?
Indexes can be used to speed up an ORDER BY, but if the query also selects a subset of rows to display, an index that helps with that filter is likely to be chosen instead. In that situation you'd need a composite index (the filter column first, then the sort column).
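To make the composite-index point concrete, here is a toy Ruby sketch, not how any real database is implemented: the "index" is just an array of [owner_id, updated_at, row_id] entries kept sorted, the way a B-tree keeps its keys in order. The owner_id column and the sample rows are hypothetical. Filtering on the leading column lands on one contiguous run that is already ordered by updated_at, so no separate sort step is needed.

```ruby
# Toy model of a composite index on [owner_id, updated_at]; not a real B-tree.
# owner_id and the sample rows are hypothetical.
Row = Struct.new(:id, :owner_id, :updated_at)

rows = [
  Row.new(1, 2, 30),
  Row.new(2, 1, 10),
  Row.new(3, 2, 5),
  Row.new(4, 1, 20),
  Row.new(5, 2, 15)
]

# The "index": one [owner_id, updated_at, row_id] entry per row, kept sorted.
index = rows.map { |r| [r.owner_id, r.updated_at, r.id] }.sort

# WHERE owner_id = ? ORDER BY updated_at, answered from the index:
# binary-search to the first entry for owner_id, then read the contiguous
# run, which is already in updated_at order.
def ids_for_owner_sorted(index, owner_id)
  start = index.bsearch_index { |key| key[0] >= owner_id } || index.size
  index[start..].take_while { |key| key[0] == owner_id }.map { |key| key[2] }
end

# Without an index: filter every row, then sort the matches.
def ids_for_owner_by_scan(rows, owner_id)
  rows.select { |r| r.owner_id == owner_id }.sort_by(&:updated_at).map(&:id)
end

ids_for_owner_sorted(index, 2)  # => [3, 5, 1]
ids_for_owner_by_scan(rows, 2)  # => [3, 5, 1]
```

Both return the same ids; the difference is that the indexed version never sorts at query time, because the order was paid for when the index was built.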
There are a couple of other problems.
Firstly, ordering on an indexed string value may require a linguistically sorted index, not the regular ASCII/Binary sort, so multilingual applications may not be helped at all.
Secondly, it can discourage normalisation of the database, because the values you sort by need to live in the table you're selecting from rather than in a joined lookup table.
You might like to look at using another method for the sort. I've been very happy with Google visualisation tables, which come with jQuery sorting built in.
Depending on how you query your database, yes, it will give you performance gains. For example, whenever I add a foreign key to a table, I immediately index it. Why? Because I know queries in my application will run through it; otherwise I wouldn't have added the foreign key. Especially once you accumulate a large amount of data, this can give significant (sometimes dramatic) performance gains. If you plan to query your database by date, name, or updated_at, then an index on those columns could well help, depending on the query. Otherwise, there is really no point.
Note that you wouldn't want to add an index for every column. The necessary indexes will help you, but if you index every column, every write gets slower and you run the risk of confusing the query optimizer, actually hurting your performance.
My suggestion: Add an index for every foreign key you have in your table, but if you're also running some heavy queries with other columns, then add an index there too.
Related
I have this query in my Rails application where I get the names of books:
Book.where(:name => @name_list).pluck(:name)
Basically, it finds the books whose names are present in @name_list and then returns an array of their names.
But since there is a huge number of books in the database, the request times out when I call this particular endpoint.
Please let me know if there is any way to make this faster so that the endpoint works.
Also, will the query be faster if we add an index on the name column of the books table?
add_index :books, :name
You asked if an index on books.name will increase the performance of a query filtering books by a given list of names.
The answer is yes. This is exactly the textbook use case for database indexes. The bigger the table, the bigger the performance benefit. For huge tables, it is not unlikely that queries using the index will be an order of magnitude faster.
I highly suggest adding such an index with the method you already named in your question and trying again:
add_index :books, :name
I think this is what you're looking for: it's best to process the data in batches to prevent memory bloat, rather than processing this large amount of data all at once.
In Rails 3.2, how to "pluck_in_batches" for a very large table
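A minimal sketch of one batching variant, assuming the slow part is a very long @name_list: split the list into fixed-size slices and run one query per slice, so each statement stays small and can still use the index on name. The method name pluck_names_in_batches, the batch size, and the in-memory stand-in for the books table are all hypothetical; the Rails query appears only in a comment and is not tested here.

```ruby
require "set"

# Run one lookup per slice of names instead of a single huge IN (...) list.
# `lookup` is a stand-in for the real query; in the app it might be
# ->(slice) { Book.where(name: slice).pluck(:name) } (assumed, untested here).
def pluck_names_in_batches(names, lookup:, batch_size: 1_000)
  names.each_slice(batch_size).flat_map { |slice| lookup.call(slice) }
end

# Hypothetical in-memory stand-in for the books table, so this runs without Rails.
stored_names = %w[Dune Emma Ulysses Beloved].to_set
lookup = ->(slice) { slice.select { |name| stored_names.include?(name) } }

pluck_names_in_batches(%w[Emma Missing Dune Beloved], lookup: lookup, batch_size: 2)
# => ["Emma", "Dune", "Beloved"]
```

Batching does not replace the index; it just keeps each query and each result set bounded.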
I would like to know: if I create a unique index on two columns in PostgreSQL, does that same unique index also act as a normal index for each column, or do I have to create one unique index plus two more indexes, one per column, as shown in the code below? I want a unique index on (talent_id, job_id), and both columns should also be indexed separately. I have read many resources but have not found a clear answer.
add_index :talent_actions, [:talent_id, :job_id], unique: true
Does the code above also cover the indexes below, or do I have to add them separately?
add_index :talent_actions, :talent_id
add_index :talent_actions, :job_id
Thank you.
An index is an object in the database, which can be used to look up data faster, if the query planner decides it will be appropriate. So the trivial answer to your question is "no", creating one index will not result in the same structures in the database as creating three different indexes.
I think what you actually want to know is this:
Do I need all three indexes, or will the unique index already optimise all queries?
This, as with any database optimisation, depends on the queries you run, and the data you have.
Here are some considerations:
The order of columns in a multi-column index matters. If you have an index of people sorted by surname then first name, then you can use it to search for everybody with the same surname; but you probably can't use it to search for somebody when you only know their first name.
Data distribution matters. If everyone in your list has the surnames "Smith" and "Jones", then you can use a surname-first index to search for a first name fairly easily (just look up under Jones, then under Smith).
Index size matters. The fewer columns an index has, the more of it fits in memory at once, so the faster it will be to use.
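The surname/first-name points can be mirrored in a toy Ruby sketch, where a sorted array of [surname, first_name] pairs stands in for a surname-first composite index (real databases use B-trees; the names are made up). A surname lookup is a binary search to one contiguous run; a first-name-only lookup has to examine every entry; and when there are only a few surnames, you can do one exact probe per surname, which is the trick described in the data-distribution point.

```ruby
# Toy surname-first composite index: entries sorted by [surname, first_name].
PEOPLE_INDEX = [
  %w[Smith Carol], %w[Jones Bob], %w[Smith Alice], %w[Jones Alice]
].sort

# Leading column known: binary-search to the first match, read the run.
def by_surname(index, surname)
  start = index.bsearch_index { |s, _| s >= surname } || index.size
  index[start..].take_while { |s, _| s == surname }
end

# Only the second column known: entries are not grouped by first name,
# so every entry must be checked (a full scan of the index).
def by_first_name(index, first_name)
  index.select { |_, f| f == first_name }
end

# Few distinct surnames: probe [surname, first_name] once per surname.
def by_first_name_few_surnames(index, surnames, first_name)
  surnames.filter_map do |s|
    i = index.bsearch_index { |entry| (entry <=> [s, first_name]) >= 0 }
    index[i] if i && index[i] == [s, first_name]
  end
end

by_surname(PEOPLE_INDEX, "Smith")    # => [["Smith", "Alice"], ["Smith", "Carol"]]
by_first_name(PEOPLE_INDEX, "Alice") # => [["Jones", "Alice"], ["Smith", "Alice"]]
by_first_name_few_surnames(PEOPLE_INDEX, %w[Jones Smith], "Alice")
# => [["Jones", "Alice"], ["Smith", "Alice"]]
```

Whether a real query planner does anything like the third approach depends on the database and version, which is exactly why the answer recommends looking at real query plans.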
Often, there are multiple indexes the query planner could use, and its job is to estimate the cost of the above factors for the query you've written.
Usually, it doesn't hurt to create multiple indexes which you think might help, but it does use up disk space, and occasionally can cause the query planner to pick a worse plan. So the best approach is always to populate a database with some real data, and look at the query plans for some real queries.
Three questions:
1) When you create a model, is a foreign key automatically created as well?
2) I'm thinking I should use add_index when a column is unique to the table or unique in general, or when the column will relate to other tables. Am I right?
3) What will an index look like? Will it just use the contents of the cell?
1) Do you mean when using a generator? Generally speaking, you should generate migrations, rather than use a generator for the whole model/scaffolding. And then, no, a foreign key is not automatically created, only if you specify it.
2) add_index is going to come in handy for columns on big tables that need to be accessed quickly by your database. Let's say you've got a users table with an email column that must be unique, but isn't indexed. Your service grows, now you have millions of users, and you need to call User.find_by_email "someone@example.com". Without an index, that's going to take a while. With an index, it'll be quick. That's when an index comes in handy.
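The email example can be sketched in plain Ruby. This is an illustration, not how Rails or any database implements find_by_email: the users "table" is a hypothetical array of rows, and the "index" is a Hash built once from email to row. Real databases typically use a B-tree for this (the default index type in PostgreSQL and MySQL's InnoDB), which gives O(log n) lookups rather than a hash's single probe, but the contrast with scanning every row is the same.

```ruby
# Hypothetical users "table" with no index: an array of rows.
User = Struct.new(:id, :email)
users = (1..100_000).map { |i| User.new(i, "user#{i}@example.com") }

# Without an index, find_by_email has to compare rows one by one: O(n).
def find_by_email_scan(users, email)
  users.find { |user| user.email == email }
end

# With an index: build an email => row lookup structure once,
# then each lookup is a single probe instead of a scan.
email_index = users.to_h { |user| [user.email, user] }

find_by_email_scan(users, "user99999@example.com").id  # => 99999
email_index.fetch("user99999@example.com").id          # => 99999
```

The index costs memory and must be kept up to date on every write, which is the trade-off discussed further down.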
3) Really depends on your database engine afaik. Not something that will affect your day-to-day imho (though if you have a specific database engine in mind, you can certainly find out). Here's the info on MySQL, straight from the source: https://dev.mysql.com/doc/refman/5.5/en/column-indexes.html
I am trying to get my head around this code. It's from the Rails Tutorial book and is part of the process of making a Twitter-like application.
class CreateRelationships < ActiveRecord::Migration
  def change
    create_table :relationships do |t|
      t.integer :follower_id
      t.integer :followed_id
      t.timestamps
    end
    # Speed up lookups from either side of the relationship
    add_index :relationships, :follower_id
    add_index :relationships, :followed_id
    # Prevent the same follower/followed pair from being stored twice
    add_index :relationships, [:follower_id, :followed_id], unique: true
  end
end
Since there are only 2 columns (follower_id and followed_id), why would there be a need for an index?
Does the index sort them in some way? It just seems a bit strange to me to add an index to a table with 2 columns.
What does the index do to the rows?
Is indexing optional? If so why/why not use it? Is it a good idea to use it in the code above?
Please answer all the questions if you can. I'm just trying to get my head around this concept and after reading about it I have these questions.
Since there are only 2 columns (follower_id and followed_id), why would there be a need for an index?
The need for indexing doesn't depend on the number of columns. Indexes are used to speed up lookups. Even if your table has only one column, checking whether a particular value is present in it requires scanning the whole table in the worst case. With an index, it can be answered almost immediately.
Does the index sort them in some way? It just seems a bit strange to me to add an index to a table with 2 columns.
No, in general indexes don't sort the data in the table in any way. I say "in general" because clustered indices do sort the data. See this question for more details.
What does the index do to the rows?
Again, nothing in general. Different DBMSes use different mechanisms to associate a row in the table with the index. Indexing is one of the most important tasks in a DBA's work, so it's worth having a basic understanding of it. Read the Wikipedia article to get the basics.
Is indexing optional? If so why/why not use it?
Yes, indexing is optional. You should use indexes when you see your query performance go down. Again, different DBMSes provide different mechanisms for you to monitor your query performance and you should have monitoring in place to alert you when performance degrades beyond a threshold. With experience, you'll reach a point where most of the indexing needs of an application will be clear to you from the beginning.
Is it a good idea to use it in the code above?
Can't comment on that. Indexing needs of each application are different. You should be aware of downsides of over-indexing as well. If you have a lot of indexes, your updates, inserts and deletes will become slower with time since they will also need to update your indexes.
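The write-side cost of over-indexing can be illustrated with a toy Ruby table that keeps each index as a sorted array of [value, row_id]. This is a hypothetical model (real databases use B-trees and do much more per write), but it shows the shape of the cost: every insert has to touch every index, so three indexes mean three times the index maintenance of one.

```ruby
# Toy table that keeps each index as a sorted array of [value, row_id].
# Illustrative only: every insert must also update every index.
class ToyTable
  attr_reader :index_writes

  def initialize(indexed_columns)
    @rows = []
    @indexes = indexed_columns.to_h { |column| [column, []] }
    @index_writes = 0
  end

  def insert(row)
    row_id = @rows.size
    @rows << row
    @indexes.each do |column, entries|
      key = [row.fetch(column), row_id]
      # Keep the index sorted: find the insertion point by binary search.
      pos = entries.bsearch_index { |entry| (entry <=> key) >= 0 } || entries.size
      entries.insert(pos, key)
      @index_writes += 1
    end
    row_id
  end

  def index_for(column)
    @indexes.fetch(column)
  end
end

lean  = ToyTable.new(%i[follower_id])
heavy = ToyTable.new(%i[follower_id followed_id created_at])
10.times do |i|
  row = { follower_id: i % 3, followed_id: i, created_at: i }
  lean.insert(row)
  heavy.insert(row)
end
lean.index_writes   # => 10
heavy.index_writes  # => 30
```

Deletes and updates to indexed columns pay a similar per-index cost, which is why extra indexes should earn their keep through faster reads.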
An index on a column or set of columns speeds lookups on that column or set of columns. Its usefulness has nothing to do with the number of columns in the table, since its purpose is to locate the row(s) associated with the column values.
No, the index doesn't sort the table.
The index doesn't "do" anything to the rows, although if it's a "unique" index it would prevent the creation/update of rows which duplicate the column(s) in question.
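The "unique" behaviour can be sketched in plain Ruby as a set of [follower_id, followed_id] keys that rejects repeats, a toy stand-in for add_index :relationships, [:follower_id, :followed_id], unique: true. The class and error names are hypothetical. In a real Rails app the database itself enforces the constraint, and a violation surfaces as ActiveRecord::RecordNotUnique.

```ruby
require "set"

# Toy stand-in for a unique composite index on [follower_id, followed_id].
class Relationships
  class DuplicateKey < StandardError; end

  def initialize
    @rows = []
    @unique_index = Set.new # holds [follower_id, followed_id] pairs
  end

  def create(follower_id:, followed_id:)
    key = [follower_id, followed_id]
    # Set#add? returns nil when the key is already present.
    raise DuplicateKey, "duplicate #{key.inspect}" unless @unique_index.add?(key)

    @rows << key
    key
  end

  def count
    @rows.size
  end
end

rels = Relationships.new
rels.create(follower_id: 1, followed_id: 2)
rels.create(follower_id: 2, followed_id: 1) # different pair, allowed
# rels.create(follower_id: 1, followed_id: 2) would raise Relationships::DuplicateKey
```

Note that (1, 2) and (2, 1) are different keys, so "A follows B" and "B follows A" can both exist.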
Indexing is optional. It speeds up lookups, but takes additional time for write operations. Whether or not it is a good idea depends entirely on the application.
With performance improvements in mind, I was wondering whether any indexes, and if so which ones, are helpful on a join table (specifically one used in a Rails 3 has_and_belongs_to_many context).
Model and Table Setup
My models are Foo and Bar and, per Rails convention, I have a join table called bars_foos. There is no primary key or timestamps, making the only fields in this table bar_id:integer and foo_id:integer. I'm interested in knowing which of the following indexes is best and avoids duplication:
1. A compound index: add_index :bars_foos, [:bar_id, :foo_id]
2. Two indexes:
   A. add_index :bars_foos, :bar_id
   B. add_index :bars_foos, :foo_id
3. A combination of both 1 and 2-B
Basically, I'm not sure whether the compound index is enough, assuming it is helpful to begin with. I believe that a compound index can be used as a single index for its first column, which is why I am pretty sure that using all three lines would result in unnecessary duplication.
Likely Usage
The most common usage will be given an instance of model Foo, I will be asking for its associated bars using the RoR syntax of foo.bars and vice versa with bar.foos for an instance of the model Bar.
These will generate queries of the type SELECT * FROM bars_foos WHERE foo_id = ? and SELECT * FROM bars_foos WHERE bar_id = ? respectively and then using those resultant IDs to SELECT * FROM bars WHERE ID in (?) and SELECT * FROM foos WHERE ID in (?).
Please correct me in the comments if I am incorrect, but I do not believe that, in the context of the Rails application, it is ever going to try to do a query where it specifies both IDs like SELECT * FROM bars_foos where bar_id = ? AND foo_id = ?.
Databases
In the event there are database specific optimization techniques, I will most likely be using PostgreSQL. However, others using this code may want to use it in MySQL or SQLite depending on their Rails configuration so all answers are appreciated.
The Answer
The oft-repeated answer, which holds more often than not, is "it depends." More specifically, it depends on what your data is and how it will be used.
tl;dr Explanation
The short tl;dr answer for my specific case (and to cover all future bases) is choice #2, which is what I suspected. However, choice #3 would work just fine: depending on how I use the data, the extra time and space spent maintaining the compound index could be repaid by faster lookups later.
The Full Explanation
The reason for this is that databases try to be smart and do things as fast as possible regardless of programmer input. The most basic question when adding an index is whether this object will be looked up by this key. If so, an index can potentially speed that up. However, whether the index is actually used comes down to selectivity and the cardinality of the field.
Since foreign keys are typically the IDs of another AR class, cardinality will usually be high. But again, this depends on your data. In my example, if there are many Foos but few Bars, many of the entries in my join table will share the same bar_ids. With bar_id having low cardinality, an index on it may never be used, and may even get in the way by having the database devote time and resources* to updating this index every time a new bars_foos entry is created. The same goes for many Bars and few Foos, or few of both.
The general lesson is that when considering an index on a table, decide whether entries will be looked up by this field and whether this field has high cardinality, that is, many distinct values. In the case of most join tables "it depends", and we must think more carefully about what the data represents and about the relationships themselves. In my case, I will have many Foos and many Bars, and will be looking up Foos by their associated Bars and vice versa.
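Cardinality can be made concrete with a small Ruby calculation: the ratio of distinct values to total rows, where 1.0 means every value is distinct. The bars_foos rows below are hypothetical (1,000 Foos, only two Bars). With just two distinct bar_ids, each bar_id lookup matches about half the table, which is the situation where a planner may prefer a sequential scan over the index. PostgreSQL keeps its own estimate of this in the n_distinct column of the pg_stats view.

```ruby
# Ratio of distinct values to total rows (1.0 = every value distinct).
def selectivity(values)
  return 0.0 if values.empty?

  values.uniq.size.fdiv(values.size)
end

# Hypothetical bars_foos rows: many Foos, only two Bars.
rows = (1..1_000).map { |foo_id| { foo_id: foo_id, bar_id: foo_id % 2 + 1 } }

selectivity(rows.map { |r| r[:foo_id] }) # => 1.0   (every foo_id distinct)
selectivity(rows.map { |r| r[:bar_id] }) # => 0.002 (only 2 distinct bar_ids)
```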
Another good answer I got at the office was, "why are you worrying about your indexes? Build your app!"
Footnotes
* In a similar question on indexes on STI it was pointed out that the cost of an index is very low so when in doubt, just add it.
Depends on how you are going to query the data.
Assuming you want to search for all of these...
WHERE bar_id = ?
WHERE foo_id = ?
WHERE bar_id = ? AND foo_id = ?
...then you should probably go with an index on {bar_id, foo_id} and an index on {foo_id}.
While you could also create a third index on {bar_id}, the price of maintaining an additional index would probably outweigh the benefit of better clustering in the smaller index.
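Why {bar_id, foo_id} plus {foo_id} covers all three WHERE shapes can be sketched with the leftmost-prefix rule of thumb: a composite index helps an equality lookup only through the run of its leading columns that the query constrains. The toy Ruby chooser below applies just that rule and ignores the cost estimates real planners make; the index names follow Rails' default index_<table>_on_<columns> naming and are otherwise hypothetical.

```ruby
# Hypothetical index set matching the recommendation, using Rails'
# default index_<table>_on_<columns> naming.
INDEXES = {
  "index_bars_foos_on_bar_id_and_foo_id" => %i[bar_id foo_id],
  "index_bars_foos_on_foo_id" => %i[foo_id]
}.freeze

# Leftmost-prefix rule: score each index by how many of its leading columns
# the WHERE clause constrains; pick the best, or nil if none is usable.
# Real planners also weigh estimated costs, which this ignores.
def best_index(indexes, where_columns)
  return nil if indexes.empty?

  name, matched = indexes
    .map { |index_name, cols| [index_name, cols.take_while { |c| where_columns.include?(c) }.size] }
    .max_by { |_, size| size }
  matched.positive? ? name : nil
end

best_index(INDEXES, %i[bar_id])         # => "index_bars_foos_on_bar_id_and_foo_id"
best_index(INDEXES, %i[foo_id])         # => "index_bars_foos_on_foo_id"
best_index(INDEXES, %i[bar_id foo_id])  # => "index_bars_foos_on_bar_id_and_foo_id"
```

Under this rule the composite index alone would leave WHERE foo_id = ? without a usable index, which is what the separate {foo_id} index is for.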
Also, how do you plan to cover your queries with indexes? Some of the alternatives, such as...
{foo_id, bar_id} and {bar_id}
{foo_id, bar_id} and {bar_id, foo_id}
...might cover certain kinds of queries better.
Covering is a balancing act - sometimes adding a field to an index just for covering purposes is justified, sometimes it's not. You won't know until you measure on realistic amounts of data.
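A covering index contains every column a query needs, so the database can answer from the index without visiting the table (PostgreSQL calls this an index-only scan). The toy Ruby sketch below, with hypothetical rows, counts table reads for SELECT bar_id FROM bars_foos WHERE foo_id = ? under an index on {foo_id} versus one on {foo_id, bar_id}. For brevity it filters the index arrays linearly, so it models only table reads, not search cost.

```ruby
# Toy join table: row_id => row.
TABLE = {
  10 => { foo_id: 1, bar_id: 7 },
  11 => { foo_id: 2, bar_id: 7 },
  12 => { foo_id: 1, bar_id: 9 }
}.freeze

# Index on {foo_id} only: entries are [foo_id, row_id].
FOO_INDEX = TABLE.map { |id, r| [r[:foo_id], id] }.sort

# Covering index on {foo_id, bar_id}: entries are [foo_id, bar_id, row_id].
FOO_BAR_INDEX = TABLE.map { |id, r| [r[:foo_id], r[:bar_id], id] }.sort

# Returns [bar_ids, table_reads]: each match needs a trip to the table.
def bar_ids_via_foo_index(foo_id)
  table_reads = 0
  ids = FOO_INDEX.select { |f, _| f == foo_id }.map do |_, row_id|
    table_reads += 1
    TABLE.fetch(row_id)[:bar_id]
  end
  [ids, table_reads]
end

# Returns [bar_ids, table_reads]: bar_id is already in the index entry.
def bar_ids_via_covering_index(foo_id)
  ids = FOO_BAR_INDEX.select { |f, _, _| f == foo_id }.map { |_, bar_id, _| bar_id }
  [ids, 0]
end

bar_ids_via_foo_index(1)      # => [[7, 9], 2]
bar_ids_via_covering_index(1) # => [[7, 9], 0]
```

The covering index is wider, so it costs more space and more work on every write; that is the balancing act described above.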
(Disclaimer: I'm not familiar with Ruby. This answer is purely from the database perspective.)