I am having performance problems with these two queries:
any_impression = Impression.exists?(user_id: user_id, created_at: range)
any_visit = Visit.exists?(user_id: user_id, created_at: range)
The tables have about 500k records per user, and the queries take more than 15 s to run.
Based on this, I would like to create two indexes, one for each query.
My question is: are these the indexes I should create?
add_index :visits, [:user_id, :created_at]
add_index :impressions, [:user_id, :created_at]
Or do the queries above need something more specific in order to use the created indexes? Thanks very much.
Those indexes should be fine. In Postgres, whether an index can be used with a given operator depends on the index type. This page from the manual explains the details.
Your proposed indexes would be btree indexes. In my experiments, telling ActiveRecord to query a timestamp column based on a range produces BETWEEN ... AND ... SQL:
User.where(created_at: (Date.parse('2015-01-01') ..
Date.parse('2016-01-01'))).to_sql
gives:
SELECT "users".*
FROM "users"
WHERE ("users"."created_at" BETWEEN '2015-01-01' AND '2016-01-01')
Is that what you're seeing also? Then Postgres should use your index, because BETWEEN is just <= and >=.
You could also run the query by hand with EXPLAIN or EXPLAIN ANALYZE to see if the index is used as you expect.
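For example, here is a sketch of what that check might look like in psql, with table, index, and parameter values assumed from the question above:

```sql
-- Hypothetical check: does the planner use the new index?
EXPLAIN ANALYZE
SELECT 1 AS one
FROM impressions
WHERE user_id = 42
  AND created_at BETWEEN '2015-01-01' AND '2016-01-01'
LIMIT 1;

-- You want the plan to show an Index Scan (or Index Only Scan) using
-- index_impressions_on_user_id_and_created_at, not a Seq Scan.
```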
I have to run a query that references a single column twice.
Transaction.where("datetime >= ? && datetime <= ?", params[:date_one], params[:date_two])
Now, for indexing, this is the basic index we would add:
add_index :transactions, :datetime
Now my question is: can I do something like
add_index :transactions, [:datetime, :datetime]
Will it really speed up the search or benefit performance-wise? Thanks in advance.
You don't have to do that. Adding an index to a column speeds up queries on that column. It does not matter how many times you use the column in your query; only the presence or absence of the index matters. Also, you can rewrite your query like this:
Transaction.where("datetime BETWEEN ? AND ?", params[:date_one], params[:date_two])
I'm using the validates_overlap gem (https://github.com/robinbortlik/validates_overlap) in a Rails app. Here is the Model code:
validates :start_time, :end_time, overlap: { scope: "device_id", exclude_edges: ["start_time", "end_time"] }
And here is the SQL it triggers:
SELECT 1 AS one FROM "bookings" WHERE
((bookings.end_time IS NULL OR bookings.end_time > '2014-04-11 13:00:00.000000') AND
(bookings.start_time IS NULL OR bookings.start_time < '2014-04-11 16:00:00.000000') AND
bookings.device_id = 20) LIMIT 1
I just want to know whether I should add an index in my Postgres database that covers start_time, end_time, and device_id, or something similar, e.g.:
add_index :bookings, [:device_id, :start_time, :end_time], unique: true
Adding the above unique index to ensure database consistency would make no sense: you are validating overlap of the range while excluding the actual edges, whereas the unique index would check exactly the edges.
Adding a non-unique index to speed up the validation is a good idea. If you do, you should first analyze your data and your app's queries.
The easiest approach is to simply add a single index for each column. Postgres can still use these for the multicolumn query (see the Heroku Dev Center).
A multicolumn index is only necessary if it really matters (or if you never query the columns in other combinations). If so, device_id should come first in the index. Rule of thumb: index for equality first, then for ranges.
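Following that rule of thumb, a sketch of the non-unique multicolumn index (column order assumed from the query above):

```ruby
# A sketch: the equality column (device_id) first, then the range columns.
# Deliberately non-unique -- the overlap validation handles correctness.
add_index :bookings, [:device_id, :start_time, :end_time]
```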
My User table has about a million records.
My Region table has maybe 200 records.
Should I add_index :users, :region_id ?
And if it was the other way round where a region has a user_id, should I index the user_id in the region table?
Yes, you should add it, if you expect to run many queries with conditions on those fields.
In your case you will probably have many queries like "people living in one region", so I would create an index on the users table's region_id column:
add_index :users, :region_id
"Premature optimization is the root of all evil".
Add an index only if profiling of your application indicates that you need one.
My app uses a PostgreSQL database. I've got a migration that looks like this:
class CreateTagAssignments < ActiveRecord::Migration
def change
create_table :tag_assignments do |t|
t.integer :tag_id
t.integer :quote_id
t.integer :user_id
t.timestamps
end
add_index :tag_assignments, :tag_id
add_index :tag_assignments, :quote_id
end
end
Records will be quite frequently searched by these two columns so I want to have a separate index for each. But now I'd like to enforce uniqueness of the pair (tag_id, quote_id) on the database level. I tried add_index :tag_assignments, [:tag_id, :quote_id], unique: true but I got the error:
PG::Error: ERROR: could not create unique index "index_tag_assignments_on_tag_id_and_quote_id"
DETAIL: Key (tag_id, quote_id)=(10, 1) is duplicated.
: CREATE UNIQUE INDEX "index_tag_assignments_on_tag_id_and_quote_id" ON "tag_assignments" ("tag_id", "quote_id")
So multiple indexes apparently do the job of a multi-column index? If so, then I could add the constraint with ALTER TABLE ... ADD CONSTRAINT, but how can I do it in ActiveRecord?
edit: manually performing ALTER TABLE ... ADD CONSTRAINT produces the same error.
As Erwin points out, the "Key (tag_id, quote_id)=(10, 1) is duplicated" error message tells you that your unique constraint is already violated by your existing data. I infer from what's visible of your model that different users can each introduce the same association between a tag and a quote, so you see duplicates when you try to constrain uniqueness for just the (quote_id, tag_id) pair.
Compound indexes are still useful for index access on leading keys, though slightly less efficiently than a single-column index, since the compound index has lower key density. You could probably get the speed you require, along with the appropriate unique constraint, with two indexes: a single-column index on one of the ids, and a compound index on all three ids with the other id as its leading field. If mapping from tag to quote is a more frequent access path than mapping from quote to tag, I would try this:
add_index :tag_assignments, :tag_id
add_index :tag_assignments, [:quote_id, :tag_id, :user_id], unique: true
If you're using Pg >= 9.2, you can take advantage of 9.2's index-only scans (which use the visibility map) on covering indexes. In this case there may be a benefit to making the first index above contain all three ids, with tag_id and quote_id leading:
add_index :tag_assignments, [:tag_id, :quote_id, :user_id]
It's unclear how user_id constrains your queries, so you may find that you want indexes with its position promoted earlier as well.
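To answer the ALTER TABLE ... ADD CONSTRAINT part of the question: you can issue that statement from a migration with execute. A sketch, assuming the three-column combination above and that duplicate rows have been cleaned up first (the class and constraint names are made up):

```ruby
class AddTagAssignmentsUniqueConstraint < ActiveRecord::Migration
  def up
    # Hypothetical constraint name; a UNIQUE constraint is backed by
    # a multicolumn unique index, so it also speeds up lookups.
    execute <<-SQL
      ALTER TABLE tag_assignments
        ADD CONSTRAINT tag_assignments_unique
        UNIQUE (quote_id, tag_id, user_id);
    SQL
  end

  def down
    execute "ALTER TABLE tag_assignments DROP CONSTRAINT tag_assignments_unique;"
  end
end
```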
So multiple indexes apparently do the job of a multi-column index?
This conclusion is untrue, and unfounded given what you describe. The error message indicates the opposite: the unique index failed because your existing data already contains a duplicate (tag_id, quote_id) pair, not because the single-column indexes made it redundant. A multicolumn index, or a UNIQUE constraint on multiple columns (which is implemented with a multicolumn index, too), provides functionality that you cannot get out of multiple single-column indexes.
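A small sketch of the difference, in plain SQL against the table from the question:

```sql
-- Two single-column indexes: each can speed up lookups on its own column,
-- but neither one can enforce uniqueness of the *pair* of values.
CREATE INDEX ON tag_assignments (tag_id);
CREATE INDEX ON tag_assignments (quote_id);

-- Only a multicolumn unique index (or constraint) enforces pair
-- uniqueness and supports a combined lookup on both columns at once.
-- (On this particular data it fails until the duplicates are removed.)
CREATE UNIQUE INDEX ON tag_assignments (tag_id, quote_id);
```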
I would like to set an index based on 4 columns in my db, both to ensure fast lookups and to ensure that no two rows have identical entries in all 4 columns, i.e. uniqueness based on the 4 columns. I know this can be done in other languages and frameworks, but can it be done in Rails?
I have seen the following command to set an index in rails:
add_index "users", ["email"], :name => "index_users_on_email", :unique => true
However, can something similar be done for more than one column?
And if not, how do people handle uniqueness based on multiple columns in Rails?
Yes, you can create an index on multiple columns in Rails.
add_index :users, [:email, :col1, :col2, :col3], :name => "my_name", :unique => true
should work. As long as you specify the name you should be good.
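If you also want a friendly validation error before the database rejects the row, you can pair the index with a model-side uniqueness validation scoped to the other columns. A sketch, using the placeholder column names from above; note the validation alone is subject to race conditions, so the unique index remains the real guarantee:

```ruby
class User < ActiveRecord::Base
  # Checks the 4-column combination at the application level;
  # the unique database index still enforces it authoritatively.
  validates :email, uniqueness: { scope: [:col1, :col2, :col3] }
end
```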
PostgreSQL (not sure about MySQL) has a 63-character limit on identifier names, including index names, so when using add_index for multiple columns, make sure you either give a custom name or keep your column names short enough to fit under the limit; otherwise the auto-generated index_users_on_col1_and_col2_and_col3 could screw things up for you.