> player.records
Record Load (0.5ms) SELECT * FROM `records` WHERE (`records`.player_id = 1)
> player.records.first(:conditions => {:metric_id => "IS NOT NULL"})
Record Load (0.5ms) SELECT * FROM `records` WHERE (`records`.player_id = 1 AND (`records`.`metric_id` = 'IS NOT NULL')) LIMIT 1
Is there a way to make the second query not hit the database, but use the cache instead? It seems a bit excessive for it to hit the database again when the data is already in memory.
I need both results. I'm aware that Ruby can iterate through the values, but I'd prefer to do this through ActiveRecord if possible. I'm coming from a Django background where filter() did this just fine.
I'm using Rails 2.3.
No, simply because the condition is different.
But try to explain the context. Why do you need to use both queries? Can't you use only the second one?
If you need both, why can't you filter the Array with Ruby code instead of making another query?
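For example, a rough sketch of filtering the already-loaded association in plain Ruby (names taken from the question; the association is loaded once and then filtered in memory, so no second query is issued):

loaded = player.records                                # one SELECT; the association proxy caches the result
with_metric = loaded.select { |r| !r.metric_id.nil? }  # filtered in Ruby, no extra query
first_with_metric = with_metric.first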
For the analytics of my site, I'm required to extract the 4 states of my users.
@members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,000 records, which is evidently a huge data pull for Rails
# Member Load (155.5ms)
@invited = @members.where("user_id is null")
# Member Load (21.6ms)
@not_started = @members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id))
# Member Load (82.9ms)
@in_progress = @members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', @sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
@completes = Quiz.where(enterprise_member_id: registration.members, section_id: @sections.map(&:id)).completed
# Quiz Load (138.9ms)
The operation returns a 503, meaning my app gives up on the request. Any ideas how I can refactor this code to run faster? Maybe with better join syntax? I'm curious how sites with larger datasets handle what seem like such trivial DB calls.
The answer is your indexes. Check your Rails logs (or the console in development mode) and copy the queries into your DB tool. Slap an EXPLAIN in front of the query and it will give you a breakdown. From there you can see which indexes you need to optimize the query.
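On Rails 3.2 or later you can also ask for the plan directly from a relation; a quick sketch (the relation is copied from the question, not verified):

puts list.members.where(enterprise_registration_id: registration.id).explain
# Prints "EXPLAIN for: SELECT ..." followed by the database's query plan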
For a quick pass, you should at least have these in your schema (a migration sketch follows the list):
enterprise_members: needs an index on enterprise_member_id
members: user_id
quizzes: section_id
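A hedged migration sketch adding indexes for the columns the queries above filter on (table and column names are assumed from the question):

class AddAnalyticsIndexes < ActiveRecord::Migration
  def change
    add_index :enterprise_members, :user_id      # for the "user_id is null" filter
    add_index :quizzes, :section_id              # for the section_id IN (...) filters
    add_index :quizzes, :enterprise_member_id    # for the NOT IN subquery and Quiz lookups
  end
end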
As someone else posted, definitely look into adding indexes if needed. Some of how to refactor depends on what exactly you are trying to do with all these records. For the @members query, what are you using the @members records for? Do you really need to retrieve all attributes for every member record? If you are not using every attribute, I suggest only fetching the attributes you actually use for something; .pluck usage could be warranted. The 3rd and 4th queries look fishy. I assume you've run the queries in a console? Again, not sure what the queries are being used for, but I'll toss in that it is often useful to write the raw SQL and run it against the database first. Then you can apply your findings to rewriting the ActiveRecord queries.
What is the .completed tagged on the end? Is it supposed to be there? The only thing I found close in the Rails API is .completed? If it is a custom method, definitely look into it. You potentially also have a use case for scopes; a rough sketch of both the pluck and scope ideas follows.
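A rough sketch (names from the question, not verified against the real schema):

# Pull only the column you actually use instead of whole member records
member_ids = list.members.where(enterprise_registration_id: registration.id).pluck(:id)

# A scope that bundles a reusable condition (hypothetical model name)
class EnterpriseMember < ActiveRecord::Base
  scope :invited, -> { where(user_id: nil) }
end

list.members.invited.count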
THIRD QUERY:
I unfortunately don't know Ruby on Rails, but from a PostgreSQL perspective, changing your NOT IN to a LEFT OUTER JOIN should make it a little faster:
Your code:
"enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id)
Better version (in SQL):
select em.*
from enterprise_members em
left outer join quizzes q on q.enterprise_member_id = em.id
  and q.section_id in (?)
join users u on u.id = em.user_id
where q.enterprise_member_id is null
Based on my understanding, this will allow Postgres to sort both the enterprise_members table and the quizzes table and do a hash join. This is better than what it does now. Right now it finds everything in the quizzes subquery, brings it into memory, and then tries to match it to enterprise_members.
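If you want to express that LEFT OUTER JOIN through ActiveRecord rather than raw SQL, a hedged sketch (association and table names are assumptions based on the question) could look like:

section_ids = @sections.map(&:id)  # integer ids, safe to interpolate
@not_started = @members
  .joins(:user)  # mirrors "enterprise_members.user_id in (select id from users)", assuming a belongs_to :user
  .joins("LEFT OUTER JOIN quizzes ON quizzes.enterprise_member_id = enterprise_members.id AND quizzes.section_id IN (#{section_ids.join(',')})")
  .where("quizzes.enterprise_member_id IS NULL")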
FIRST QUERY:
You could also create a partial index on user_id for your first query. This will be especially good if there are a relatively small number of user_ids that are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Anytime you query enterprise_members with something that matches the index's where clause, the partial index can be used to quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.
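If you're on Rails 4+ with PostgreSQL, the same partial index can also be declared in a migration; a sketch under that assumption:

class AddPartialIndexOnUserId < ActiveRecord::Migration
  def change
    # Index only the rows where user_id is NULL (matches the "invited" query)
    add_index :enterprise_members, :user_id, name: "user_id_null_ix", where: "user_id IS NULL"
  end
end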
Thanks everyone for your ideas. I basically did what everyone said. I added indexes and reorganized how I called everything, but the major difference was using the pluck method. Here are my new stats:
@alt_members = list.members.pluck(:id) # 23ms
if list.course.sections.tests.present? && @sections = list.course.sections.tests
@quiz_member_ids = Quiz.where(section_id: @sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
@invited = list.members.count('user_id is null') # 12.5ms
@not_started = ( @alt_members - ( @alt_members & @quiz_member_ids ) ).count # 0ms
@in_progress = ( @alt_members & @quiz_member_ids ).count # 0ms
@completes = ( @alt_members & Quiz.where(section_id: @sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
@question_count = Quiz.where(section_id: @sections.map(&:id), completed: true).limit(5).map { |quiz| quiz.answers.count }.max # 3.5ms
After running two similar queries like
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(2)
I was expecting to see two SQL statements in my console being executed by the server. However, the first query is missing and only the second one is being run. Similarly, after executing the following two queries:
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(@articles.size - 2)
the first query is completely ignored as well. These two queries generate the SQL:
SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM "articles"
WHERE "articles"."magazine_id" = $1 LIMIT 2 OFFSET 0)
subquery_for_count [["magazine_id", 1]]
SELECT "articles".* FROM "articles"
WHERE "articles"."magazine_id" = $1
LIMIT 2 OFFSET 2 [["magazine_id", 1]]
Interestingly enough, if I change @articles.size to @articles.length, both queries are run as expected. I would think that since length requires the collection in memory, the first statement would be forced to run. Can anyone describe what's happening here, and if it's too broad a topic, point me to a good resource?
It's not so much optimising as deferring execution of the query until it really needs to execute it.
In both cases you're storing the result of building up a query in @articles. Active Record, or more accurately Arel, defers execution of the query until you call a method that needs the results. I suspect that you're actually seeing the query being executed against the database when you call something like @articles.each or @articles.count or some such.
You could build the query up in a series of steps and it won't actually get executed:
a = @magazine.articles
a = a.limit(2)
a = a.offset(0)
It also means you can leave a query clause that drastically reduces the result size until the end of the process:
a = a.where('created_at > ?', Time.now.at_beginning_of_day)
Still no query has been sent to the database.
The thing to watch out for is testing this logic in the Rails console. If you run these steps in the console, it tries to display the last return value (by calling .inspect, I think), and inspecting the return value causes the query to be executed. So if you put a = Magazine.find(1).articles into the console, you'll see a query executed immediately, which wouldn't have happened if the code was run in the context of a controller action, for example. If you then call a.limit(2) you'll see another query, and so on.
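As a rough illustration of the size/length difference the question observes (Rails 4 behaviour; worth verifying on your version):

articles = @magazine.articles.limit(2).offset(0)  # still no SQL sent

articles.size    # relation not loaded yet, so it runs the SELECT COUNT(...) subquery shown above
articles.length  # delegates to the loaded Array, so it runs SELECT "articles".* ... and loads the records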
I need to convert ~1,300,000 records in my database.
Do you know a method faster than this?
Article.find_each(&:save)
If you're looking to update a single field in a table, you can use update_all on your ActiveRecord model.
Post.update_all(:published=>true)
# UPDATE "posts" SET "published" = 't'
This works with ActiveRecord scopes as well.
Post.where(:published=>true).update_all(:published=>false)
# SQL (3.3ms) UPDATE "posts" SET "published" = 'f' WHERE "posts"."published" = 't'
By using this, you can use conditional statements (such as where) to pick out common rows in your table and perform update_all on them. This is assuming you want to do some form of attribute updating before saving the record.
You can increase the number of records per batch (the default is 1,000); the right number depends on how much memory your server has:
Article.find_each(:batch_size => 5000) { |r| r.save }
If you are creating, you need to bulk insert with a gem like activerecord-import. If you are updating, just use update_all.
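For the bulk-insert case, a hedged sketch with the activerecord-import gem (Model.import accepts an array of records; the attribute name here is made up):

require 'activerecord-import'

articles = 1_000.times.map { |i| Article.new(title: "Title #{i}") }
Article.import(articles)  # one multi-row INSERT instead of 1,000 individual INSERTs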
I have a custom query that looks like this:
self.account.websites.find(:all, :joins => [:group_websites => {:group => :users}], :conditions => ["users.id = ?", self])
where self is a User object.
I managed to generate the equivalent SQL for it. Here is how it looks:
sql = "select * from websites INNER JOIN group_websites on group_websites.website_id = websites.id INNER JOIN groups on groups.id = group_websites.group_id INNER JOIN group_users ON (groups.id = group_users.group_id) INNER JOIN users on (users.id = group_users.user_id) where (websites.account_id = #{account_id} AND (users.id = #{user_id}))"
With a decent understanding of SQL and ActiveRecord, I assumed (and I think most would agree) that the result obtained from the query above would take longer than the result obtained from the find_by_sql(sql) one.
But surprisingly, when I ran both, the ActiveRecord custom query came out ahead of find_by_sql in terms of load time.
Here are the test results:
ActiveRecord Custom Query load time
Website Load (0.9ms)
Website Columns(1.0ms)
find_by_sql load time
Website Load (1.3ms)
Website Columns(1.0ms)
I repeated the test again and again, and the result still came out the same (with the custom query winning).
I know the difference isn't big, but I still can't figure out why a plain find_by_sql query is slower than the custom query.
Can anyone shed some light on this?
With the find case, the query is parameterized; this means the database can cache the query plan and will not need to parse and compile the query again.
With the find_by_sql case the entire query is passed to the database as a string. This means there is no caching that the database can do on the structure of the query, and it needs to be parsed and compiled on each occasion.
I think you can test this: try find_by_sql in this way (parameterized):
Website.find_by_sql(["select * from websites INNER JOIN group_websites on group_websites.website_id = websites.id INNER JOIN groups on groups.id = group_websites.group_id INNER JOIN group_users ON (groups.id = group_users.group_id) INNER JOIN users on (users.id = group_users.user_id) where (websites.account_id = ? AND (users.id = ?))", account_id, user_id])
Well, the reason is probably quite simple: with custom SQL, the query is sent immediately to the database server for execution.
Remember that Ruby is an interpreted language, so Rails has to generate a new SQL query from the ORM meta-language you used before it can be sent to the actual database server for execution. I would say the additional 0.1 ms is the time the framework takes to generate the query.
In Rails I have two tables:
bans(ban_id, admin_id)
ban_reasons(ban_reason_id, ban_id, reason_id)
I want to find all the bans for a certain admin where there is no record in the ban_reasons table. How can I do this in Rails without looping through all the ban records and filtering out all the ones with ban.ban_reasons.nil? I want to do this (hopefully) using a single SQL statement.
I just need to do the following (but I want to do it the "Rails" way):
SELECT bans.* FROM bans WHERE admin_id=1234 AND
ban_id NOT IN (SELECT ban_id FROM ban_reasons)
Your solution works great (only one request) but it's almost plain SQL:
bans = Ban.where("bans.id NOT IN (SELECT ban_id FROM ban_reasons)")
You may also try the following, and let rails do part of the job:
bans = Ban.where("bans.id NOT IN (?)", BanReason.select(:ban_id).map(&:ban_id).uniq)
ActiveRecord only gets you so far; everything beyond that should be done with raw SQL. The good thing about AR is that it makes it pretty easy to do that kind of thing.
However, since Rails 3 you can do almost everything with the Arel API, although raw SQL may or may not be more readable.
I'd go with raw SQL and here is another query you could try if yours doesn't perform well:
SELECT b.*
FROM bans b
LEFT JOIN ban_reasons br on b.ban_id = br.ban_id
WHERE br.ban_reason_id IS NULL
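A hedged ActiveRecord version of that LEFT JOIN, assuming the column names above (adjust ban_id/id to your actual schema):

Ban.where(admin_id: 1234)
   .joins("LEFT JOIN ban_reasons ON ban_reasons.ban_id = bans.ban_id")
   .where("ban_reasons.ban_reason_id IS NULL")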
Using the Where Exists gem (which I'm the author of):
Ban.where(admin_id: 123).where_not_exists(:ban_reasons)