I have a custom query that looks like this:
self.account.websites.find(:all,:joins => [:group_websites => {:group => :users}],:conditions=>["users.id =?",self])
where self is a User object.
I managed to write the equivalent SQL for it.
Here is how it looks:
sql = "select * from websites INNER JOIN group_websites on group_websites.website_id = websites.id INNER JOIN groups on groups.id = group_websites.group_id INNER JOIN group_users ON (groups.id = group_users.group_id) INNER JOIN users on (users.id = group_users.user_id) where (websites.account_id = #{account_id} AND (users.id = #{user_id}))"
With a decent understanding of SQL and ActiveRecord, I assumed (as most would agree) that the query above would take longer than the same result obtained through find_by_sql(sql).
Surprisingly, when I ran the two, the ActiveRecord custom query beat find_by_sql in terms of load time.
Here are the test results:
ActiveRecord Custom Query load time
Website Load (0.9ms)
Website Columns(1.0ms)
find_by_sql load time
Website Load (1.3ms)
Website Columns(1.0ms)
I repeated the test again and again and the result came out the same (with the custom query winning the battle).
I know the difference isn't that big, but I still can't figure out why a plain find_by_sql query is slower than the custom query.
Can anyone shed some light on this?
Thanks anyway.
Regards
Viren Negi
With the find case, the query is parameterized; this means the database can cache the query plan and will not need to parse and compile the query again.
With the find_by_sql case the entire query is passed to the database as a string. This means there is no caching that the database can do on the structure of the query, and it needs to be parsed and compiled on each occasion.
I think you can test this: try find_by_sql in this way (parameterized):
Website.find_by_sql(["select * from websites INNER JOIN group_websites on group_websites.website_id = websites.id INNER JOIN groups on groups.id = group_websites.group_id INNER JOIN group_users ON (groups.id = group_users.group_id) INNER JOIN users on (users.id = group_users.user_id) where (websites.account_id = ? AND (users.id = ?))", account_id, user_id])
Well, the reason is probably quite simple - with custom SQL, the SQL query is sent immediately to db server for execution.
Remember that Ruby is an interpreted language, therefore Rails generates a new SQL query based on the ORM meta language you have used before it can be sent to the actual db server for execution. I would say additional 0.1 ms is the time taken by framework to generate the query.
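A difference of a few tenths of a millisecond in a single log line is hard to interpret either way. One rough way to compare two code paths is to run each of them many times and compare the totals. The sketch below uses Ruby's standard Benchmark module; `compare_timings` is a hypothetical helper, and the two lambdas are stand-ins for the find call and the find_by_sql call, which you would substitute in a Rails console.

```ruby
require 'benchmark'

# Hypothetical helper: runs each callable `iterations` times and returns
# the total elapsed seconds per label.
def compare_timings(candidates, iterations: 1_000)
  candidates.each_with_object({}) do |(label, callable), totals|
    totals[label] = Benchmark.realtime { iterations.times { callable.call } }
  end
end

# Stand-ins only. In a Rails console these would wrap the two queries,
# e.g. -> { Website.find_by_sql(sql).to_a }.
candidates = {
  finder:      -> { (1..100).map { |i| i * 2 } },
  find_by_sql: -> { (1..100).map { |i| i + i } }
}

timings = compare_timings(candidates, iterations: 500)
timings.each { |label, seconds| puts format('%-12s %.4fs', label, seconds) }
```

Running each candidate several hundred times smooths out noise from caching and garbage collection that can dominate a single sub-millisecond measurement.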
I have tried converting this plain SQL query to Rails ActiveRecord, but I am unable to do so.
select vote_shares.election_year as vs_election_name,
vote_shares.party as vs_party,
(sum(vote_shares.party_seats)/totals.total)*100 AS vs
from pcdemographics INNER JOIN vote_shares on vote_shares.pc_id = pcdemographics.pc_id,
(
SELECT vote_shares.election_name, sum(vote_shares.party_seats) as total
FROM `pcdemographics`
INNER JOIN vote_shares on vote_shares.pc_id = pcdemographics.pc_id
GROUP BY `election_name`
) AS totals
where vote_shares.election_name=totals.election_name
group by vote_shares.party,vote_shares.election_name;
This is what I have tried
@vssubquery = Pcdemographic.select('vote_shares.election_name, sum(vote_shares.party_seats) as total').joins('INNER JOIN vote_shares on vote_shares.pc_id = pcdemographics.pc_id')
Pcdemographic.select("vote_shares.election_year as vs_election_year,
vote_shares.party as vs_party,
(sum(vote_shares.party_seats)/'#{totals.total}')*100 AS vs").from(@vssubquery,:totals)
.joins("INNER JOIN vote_shares on vote_shares.pc_id = pcdemographics.pc_id and vote_shares.election_name='#{totals.election_name}'")
My answer might not be what you hoped for but I recommend not using AR, use Sequel (http://sequel.jeremyevans.net/) instead. It uses the concept of Datasets which I don't think has any equivalent in AR.
Disclaimer: Nobody asked me to advertise for it. I have used both AR and Sequel, and I found that Sequel is much better at performing complex queries and avoiding the N+1 problem.
Did you try the find_by_sql method?
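Whichever library ends up running it, it can help to pin down what the SQL in the question computes. Here is a plain-Ruby sketch of the same arithmetic: seats are summed per election (the totals subquery) and per party within each election, then divided. The rows are made-up sample data and the hash keys are assumptions that mirror the column names.

```ruby
# Made-up rows standing in for the joined pcdemographics/vote_shares data.
rows = [
  { election_name: 'GE2009', party: 'A', party_seats: 3 },
  { election_name: 'GE2009', party: 'B', party_seats: 1 },
  { election_name: 'GE2014', party: 'A', party_seats: 1 },
  { election_name: 'GE2014', party: 'B', party_seats: 1 }
]

# Counterpart of the `totals` subquery: total seats per election.
totals = rows.group_by { |r| r[:election_name] }
             .transform_values { |rs| rs.sum { |r| r[:party_seats] } }

# Counterpart of the outer query: seat share per party and election, in percent.
grouped = rows.group_by { |r| [r[:party], r[:election_name]] }
shares = grouped.map do |(party, election), rs|
  seats = rs.sum { |r| r[:party_seats] }
  { party: party, election_name: election, vs: seats.to_f / totals[election] * 100 }
end

shares.each { |share| puts share.inspect }
```

Note the `to_f`: dividing two Integers in Ruby truncates, so without it every share below 100% would come out as 0.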
I have some raw SQL and I'm not sure whether it would be better as an ActiveRecord call or left as raw SQL. Would this be easy to convert to AR?
select *
from logs t1
where
log_status_id = 2 and log_type_id = 1
and not exists
(
select *
from logs t2
where t2.log_version_id = t1.log_version_id
and t2.log_status_id in (1,3,4)
and log_type_id = 1
)
ORDER BY created_at ASC
So something like this?
Log.where(:log_status_id => 2, :log_type_id => 1).where.not(Log.where.....)
You could do this using AREL. See Rails 3: Arel for NOT EXISTS? for an example.
Personally I often find raw SQL to be more readable/maintainable than AREL queries, though. And I guess most developers are more familiar with it in general, too.
But in any case, your approach of separating the narrowing by log_status_id and log_type_id from the subquery is a good idea, even if your .where.not construct won't work as written.
This should do the trick however:
Log.where(log_status_id: 2, log_type_id: 1)
.where("NOT EXISTS (
select *
from logs t2
where t2.log_version_id = logs.log_version_id
and t2.log_status_id in (1,3,4)
and t2.log_type_id = logs.log_type_id)")
.order(:created_at)
The only situation where this might become problematic is when you join this query to other queries, because the outer table will likely receive a different alias than logs.
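To make the NOT EXISTS semantics concrete, here is a plain-Ruby sketch that applies the same filter to an in-memory array instead of the logs table. The rows are made-up sample data and the hash keys are assumptions that mirror the column names; nothing here touches ActiveRecord.

```ruby
# Made-up rows; created_at is a plain integer here to keep the sample short.
logs = [
  { id: 1, log_version_id: 10, log_status_id: 2, log_type_id: 1, created_at: 3 },
  { id: 2, log_version_id: 10, log_status_id: 3, log_type_id: 1, created_at: 4 },
  { id: 3, log_version_id: 11, log_status_id: 2, log_type_id: 1, created_at: 2 },
  { id: 4, log_version_id: 12, log_status_id: 2, log_type_id: 1, created_at: 1 },
  { id: 5, log_version_id: 12, log_status_id: 4, log_type_id: 2, created_at: 5 }
]

blocking_statuses = [1, 3, 4]

# Outer WHERE: status 2 and type 1.
candidates = logs.select { |t1| t1[:log_status_id] == 2 && t1[:log_type_id] == 1 }

# NOT EXISTS: drop a candidate if any row shares its version and type
# and has one of the blocking statuses.
surviving = candidates.reject do |t1|
  logs.any? do |t2|
    t2[:log_version_id] == t1[:log_version_id] &&
      t2[:log_type_id] == t1[:log_type_id] &&
      blocking_statuses.include?(t2[:log_status_id])
  end
end

result = surviving.sort_by { |row| row[:created_at] }
puts result.map { |row| row[:id] }.inspect
```

Row 1 is dropped because row 2 shares its version and type with a blocking status. Row 4 survives because row 5, although it has a blocking status, is of a different type.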
For the analytics of my site, I'm required to extract the 4 states of my users.
@members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,0000 records.. Which is evidently a huge data pull for Rails
# Member Load (155.5ms)
@invited = @members.where("user_id is null")
# Member Load (21.6ms)
@not_started = @members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
# Member Load (82.9ms)
@in_progress = @members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', @sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
@completes = Quiz.where(enterprise_member_id: registration.members, section_id: @sections.map(&:id)).completed
# Quiz Load (138.9ms)
The operation returns a 503 meaning my app gives up on the request. Any ideas how I can refactor this code to run faster? Maybe by better joins syntax? I'm curious how sites with larger datasets accomplish what seems like such trivial DB calls.
The answer is your indexes. Check your rails logs (or check the console in development mode) and copy the queries to your db tool. Slap an "Explain" in front of the query and it will give you a breakdown. From here you can see what indexes you need to optimize the query.
For a quick pass, you should at least have these in your schema,
enterprise_members: needs an index on enterprise_member_id
members: user_id
quizzes: section_id
As someone else posted, definitely look into adding indexes if needed. How to refactor depends partly on what exactly you are trying to do with all these records. For the @members query, what are you using the @members records for? Do you really need to retrieve all attributes for every member record? If you are not using every attribute, I suggest fetching only the attributes you actually use; .pluck could be warranted. The 3rd and 4th queries look fishy. I assume you've run the queries in a console? Again, I'm not sure what the queries are used for, but it is often useful to write raw SQL first and run it directly against the db. Then you can apply your findings when rewriting the ActiveRecord queries.
What is the .completed tagged on the end? Is it supposed to be there? The only thing close that I found in the Rails API is .completed?. If it is a custom method, definitely look into it. You potentially also have a use case for scopes.
THIRD QUERY:
I unfortunately don't know ruby on rails, but from a postgresql perspective, changing your "not in" to a left outer join should make it a little faster:
Your code:
enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
Better version (in SQL):
select blah
from enterprise_members em
join users u on u.id = em.user_id
left outer join quizzes q on q.enterprise_member_id = em.id
and q.section_id in (?)
where q.enterprise_member_id is null
Based on my understanding, this will allow postgres to sort both the enterprise_members table and the quizzes table and do a hash join. This is better than what it does now. Right now it finds everything in the quizzes subquery, brings it into memory, and then tries to match it to enterprise_members.
FIRST QUERY:
You could also create a partial index on user_id for your first query. This will be especially good if there are a relatively small number of user_ids that are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Anytime you query enterprise_members with something that matches the index's where clause, the partial index can be used and quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.
Thanks everyone for your ideas. I basically did what everyone said. I added indexes and reordered how I called everything, but the major difference came from using the pluck method. Here are my new stats:
@alt_members = list.members.pluck :id # 23ms
if list.course.sections.tests.present? && @sections = list.course.sections.tests
  @quiz_member_ids = Quiz.where(section_id: @sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
  @invited = list.members.count('user_id is null') # 12.5ms
  @not_started = ( @alt_members - ( @alt_members & @quiz_member_ids ) ).count # 0ms
  @in_progress = ( @alt_members & @quiz_member_ids ).count # 0ms
  @completes = ( @alt_members & Quiz.where(section_id: @sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
  @question_count = Quiz.where(section_id: @sections.map(&:id), completed: true).limit(5).map{|quiz|quiz.answers.count}.max # 3.5ms
end
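For anyone unfamiliar with the array operators used above, here is a small stand-alone sketch of how `&` (intersection) and `-` (difference) split the plucked ids into buckets. The ids are made up and the variable names are only illustrative.

```ruby
# Made-up ids standing in for the results of the two pluck calls.
member_ids      = [1, 2, 3, 4, 5] # e.g. list.members.pluck(:id)
quiz_member_ids = [2, 2, 4, 9]    # plucked from quizzes, so ids may repeat

# Intersection: members that have at least one quiz row (duplicates collapse).
started_ids = member_ids & quiz_member_ids

# Difference: members with no quiz row at all.
not_started_ids = member_ids - started_ids

puts "started:     #{started_ids.inspect}"
puts "not started: #{not_started_ids.inspect}"
```

Because only integer ids are loaded, none of this instantiates model objects, which is consistent with the big drop in load time reported above.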
I'm just beginning with ruby on rails and have a question regarding a bit more complex query. So far I've done simple queries while looking at rails guide and it worked really well.
Right now I'm trying to get some Ids from database and I would use those Ids to get the real objects and do something with them. Getting those is a bit more complex than simple Object.find method.
Here is what my query looks like:
select * from quotas q, requests r
where q.id=r.quota_id
and q.status=3
and r.text is not null
and q.id in
(
select A.id from (
select max(id) as id, name
from quotas
group by name) A
)
order by q.created_at desc
limit 1000;
This gives me 1000 ids when I execute the query from an SQL manager. I was thinking of obtaining the list of ids first and then finding the objects by id.
Is there a way to get these objects directly using this query, avoiding the id lookup? I found that you can execute a query like this:
ActiveRecord::Base.connection.execute(query);
Assuming Quota has_many :requests,
Quota.includes(:requests).
where(status:3).
where('requests.text is not null').
where("quotas.id in (#{subquery_string_here})").
order('quotas.created_at desc').limit(1000)
I'm by no means an expert but most basic SQL functionality is baked into ActiveRecord. You might also want to look at the #group and #pluck methods for ways to eliminate the ugly string subquery.
Calling #to_sql on a relation object will show you the SQL command it is equivalent to, and may help with your debugging.
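As a reference for what the inner subquery computes (the highest id for each name), here is a plain-Ruby sketch over made-up rows. It only illustrates the grouping logic and is not a replacement for the database query.

```ruby
# Made-up rows standing in for the quotas table.
quotas = [
  { id: 1, name: 'alpha' },
  { id: 2, name: 'beta' },
  { id: 3, name: 'alpha' },
  { id: 4, name: 'gamma' },
  { id: 5, name: 'beta' }
]

# Counterpart of: select max(id) as id, name from quotas group by name
latest_ids = quotas.group_by { |quota| quota[:name] }
                   .map { |_name, rows| rows.map { |quota| quota[:id] }.max }
                   .sort

puts latest_ids.inspect
```

On the ActiveRecord side, something along the lines of `Quota.group(:name).maximum(:id).values` should return the same list of ids, which could then be passed to `where(id: ...)`. Treat that as a starting point to check with #to_sql rather than a tested solution.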
I would use find_by_sql for this. I wouldn't swear that this is exactly right, but as I recall you can pretty much plonk an SQL statement into a find_by_sql and the resulting columns will be returned as attributes of an array of objects of the class you call it on:
status = 3
Quota.find_by_sql(['
select *
from quotas q, requests r
where q.id=r.quota_id
and q.status= ?
and r.text is not null
and q.id in
(
select A.id from (
select max(id) as id, name
from quotas
group by name) A
)
order by q.created_at desc
limit 1000;', status])
If you come to Rails as someone used to writing raw SQL, you're probably better off using this syntax than stringing together a bunch of ActiveRecord methods - the result is the same, so it's just a matter of what you find more readable.
Btw, you shouldn't use string interpolation (i.e. #{variable} syntax) inside an SQL query. Use the '?' syntax instead (see my example) to avoid SQL injection potential.
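To show why that matters, here is a deliberately simplified sketch. `bind_placeholders` and `quote_literal` are hypothetical helpers written for this example, not ActiveRecord's implementation; they only mimic the idea that each `?` is replaced by a safely quoted literal.

```ruby
# Hypothetical helper: turns a Ruby value into a quoted SQL literal.
def quote_literal(value)
  case value
  when Integer then value.to_s
  when nil     then 'NULL'
  else "'#{value.to_s.gsub("'", "''")}'" # double any embedded single quotes
  end
end

# Hypothetical helper: replaces each `?` in order with a quoted value.
# Simplification: a literal `?` inside the SQL text would be replaced too.
def bind_placeholders(sql, *values)
  remaining = values.dup
  sql.gsub('?') do
    raise ArgumentError, 'too few bind values' if remaining.empty?
    quote_literal(remaining.shift)
  end
end

malicious = '3 OR 1=1'

# Interpolation pastes the input into the statement as SQL.
interpolated = "select * from quotas where status = #{malicious}"

# Binding keeps the input as a single quoted value.
bound = bind_placeholders('select * from quotas where status = ?', malicious)

puts interpolated
puts bound
```

With interpolation the input becomes part of the WHERE clause and matches every row; with the placeholder it stays a single string value that is compared against status.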
In rails I have 2 tables:
bans(ban_id, admin_id)
ban_reasons(ban_reason_id, ban_id, reason_id)
I want to find all the bans for a certain admin where there is no record in the ban_reasons table. How can I do this in Rails without looping through all the ban records and filtering out all the ones with ban.ban_reasons.nil? I want to do this (hopefully) using a single SQL statement.
I just need to do: (But I want to do it the "rails" way)
SELECT bans.* FROM bans WHERE admin_id=1234 AND
ban_id NOT IN (SELECT ban_id FROM ban_reasons)
Your solution works great (only one request) but it's almost plain SQL:
bans = Ban.where("bans.ban_id NOT IN (SELECT ban_id FROM ban_reasons)")
You may also try the following, and let Rails do part of the job:
bans = Ban.where("bans.ban_id NOT IN (?)", BanReason.select(:ban_id).map(&:ban_id).uniq)
ActiveRecord only gets you to a point, everything after should be done by raw SQL. The good thing about AR is that it makes it pretty easy to do that kind of stuff.
However, since Rails 3, you can do almost everything with the AREL API, although raw SQL may or may not look more readable.
I'd go with raw SQL and here is another query you could try if yours doesn't perform well:
SELECT b.*
FROM bans b
LEFT JOIN ban_reasons br on b.ban_id = br.ban_id
WHERE br.ban_reason_id IS NULL
Using the Where Exists gem (which I'm the author of):
Ban.where(admin_id: 123).where_not_exists(:ban_reasons)