I have the following query:
select i.shipping_charges, i.shipping_carrier, i.shipping_method,
i.tracking_number, i.origin_zip_code, i.origin_city,
i.origin_country, i.weight_value, i.weight_unit,
i.delivery_date, i.shipping_date, i.shipping_description,
i.delivery_zip_code, i.delivery_street_add, i.item_id,
i.start_at, i.end_at, i.id
from (items it
left join item_shipping_details i on it.id = i.item_id)
left join users u on u.id = it.alert_user_id
where it.user_id=4 AND i.id in (35,602,1175,1176,1177,604,1178,1174,
1165,1179,930,1160,917,914,925,909,920,1147,910)
AND (it.alert_user_id is null OR u.user_type in (2,3))
AND (it.outbound != true OR it.outbound is null)
It takes 8 ms to run in PostgreSQL.
Is there an alternative way to write this so that it runs faster?
A couple of things were odd or outright nonsensical about your query.
The primary table in the FROM list was items, but you don't select a single column from it. The way you had it, the LEFT JOIN would at best add a bunch of rows with only NULL values, while confusing the query planner. You want neither.
I reversed the order and made item_shipping_details the primary table. This will be much faster.
The LEFT JOIN between items and item_shipping_details was contradictory, because the conditions in the WHERE clause require a row from both tables anyway. Simplified to a plain JOIN.
That also makes the order of appearance of the first two tables irrelevant again.
Removed the parentheses around the first JOIN, as they served no purpose.
Simplified (it.outbound != true OR it.outbound is null) to it.outbound IS NOT TRUE.
SELECT i.shipping_charges, i.shipping_carrier, i.shipping_method,
i.tracking_number, i.origin_zip_code, i.origin_city,
i.origin_country, i.weight_value, i.weight_unit,
i.delivery_date, i.shipping_date, i.shipping_description,
i.delivery_zip_code, i.delivery_street_add, i.item_id,
i.start_at, i.end_at, i.id
FROM item_shipping_details i
JOIN items it ON it.id = i.item_id
LEFT JOIN users u on u.id = it.alert_user_id
WHERE i.id IN (35,602,1175,1176,1177,604,1178,1174,1165,1179,
930,1160,917,914,925,909,920,1147,910)
AND it.user_id = 4
AND it.outbound IS NOT TRUE
AND (it.alert_user_id IS NULL OR u.user_type IN (2,3))
This should considerably increase performance. :)
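The simplification of the outbound check relies on SQL's three-valued logic, which is easy to get wrong. The following is a minimal sketch in plain Ruby, not PostgreSQL itself: nil stands in for NULL, and the helper names (sql_not_equal, sql_or, original_predicate, is_not_true) are made up for this illustration. It checks that both forms of the predicate keep the same rows.

```ruby
# Model of SQL three-valued logic; nil plays the role of NULL (unknown).
def sql_not_equal(a, b)
  return nil if a.nil? || b.nil? # comparing with NULL yields unknown
  a != b
end

def sql_or(a, b)
  return true if a == true || b == true # one true operand is enough
  return nil if a.nil? || b.nil?        # otherwise unknown stays unknown
  false
end

# Original form: (outbound != true OR outbound IS NULL)
def original_predicate(outbound)
  sql_or(sql_not_equal(outbound, true), outbound.nil?)
end

# Simplified form: outbound IS NOT TRUE (this test never yields NULL)
def is_not_true(outbound)
  outbound != true
end

# A WHERE clause keeps a row only when the predicate is exactly true.
OUTBOUND_VALUES = [true, false, nil].freeze
KEPT_BY_ORIGINAL = OUTBOUND_VALUES.select { |v| original_predicate(v) == true }
KEPT_BY_SIMPLIFIED = OUTBOUND_VALUES.select { |v| is_not_true(v) == true }
```

Both constants end up holding the false and NULL cases, so the shorter predicate filters identically.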
Assuming you meant decreasing the query execution time: adding indexes on the appropriate columns can greatly speed up result retrieval.
This part of your query is taking the time:
IN (35,602,1175,1176,1177,604,1178,1174,1165,1179,
930,1160,917,914,925,909,920,1147,910)
Yes, there are many ways to improve this:
Use joins() more and includes() less in associations.
Try to avoid subqueries and find([array]).
Use caching.
You may find better options in the Rails guide below:
http://guides.rubyonrails.org/active_record_querying.html
There are three models that matter here: Objective, Student, and Seminar. All are associated with has_and_belongs_to_many.
There is an ObjectiveStudent join model that includes columns "ready" and "points_all_time". There is an ObjectiveSeminar join model that includes column "priority".
I need to collect all of the objectives that are associated with a given student and also with a given seminar.
They need to also be marked with a "priority" above zero in the seminar. So I think I need this line:
obj_sems = ObjectiveSeminar.where(:seminar => given_seminar).where("priority > ?", 0)
Finally, they need to also be objectives where the student is ready, but has not scored above 7. So I think I need this line:
obj_studs = ObjectiveStudent.where(:user => given_student, :ready => true).where("points_all_time <= ?", 7)
Is there a way to gather all the objectives whose join table records appear in both of the above queries? Note that neither of the lists return objectives; they return objective_seminars, and objective_students, respectively. My end goal is to collect the objectives that meet all of the above criteria.
Or am I approaching this all wrong?
Bonus question: I would also love to sort the objectives by their priority in the given seminar. But I'm afraid that would add too much to the database load. What are your thoughts on this?
Thank you in advance for any insight.
To get Objectives, you'll need to start your query from the Objective model.
To apply AND conditions across the associated tables, you'll need inner joins with those tables.
Finally, you'll need a distinct operator to fetch each objective only once.
The extended version of what (I think) you need is:
Objective.joins(objective_seminars: :seminar, objective_students: :student).
  where(seminars: seminar_search_params, students: student_search_params).
  where('objective_seminars.priority > 0').
  where('objective_students.ready = 1 AND objective_students.points_all_time <= 7').
  order('objective_seminars.priority ASC').
  distinct
Now for the database load it all depends on your indexes and the size of your tables.
The above query will translate to the following SQL (or something similar).
SELECT DISTINCT objectives.* FROM objectives
INNER JOIN objective_students ON objective_students.objective_id = objectives.id
INNER JOIN students ON students.id = objective_students.student_id
INNER JOIN objective_seminars ON objective_seminars.objective_id = objectives.id
INNER JOIN seminars ON seminars.id = objective_seminars.seminar_id
WHERE seminars_query AND
students_query AND
objective_seminars.priority > 0 AND
objective_students.ready = 1 AND objective_students.points_all_time <= 7
ORDER BY objective_seminars.priority ASC
So you'll need to add or extend your indexes so that the lookups on all five tables can use an index. The actual index design is up to you and depends on your application's specifics (read/write load, table sizes, cardinality, etc.).
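To make the logic of the joined query concrete, here is a small in-memory sketch in plain Ruby. It is only a model of what the joins, the conditions and the ordering compute, not ActiveRecord code: the two Structs are stand-ins for the join records from the question, and matching_objective_ids and the sample data are hypothetical.

```ruby
# Stand-ins for the two join tables described in the question.
ObjectiveSeminar = Struct.new(:objective_id, :seminar_id, :priority)
ObjectiveStudent = Struct.new(:objective_id, :student_id, :ready, :points_all_time)

# Returns the ids of objectives that are linked to the seminar with a
# priority above zero AND linked to the student as ready with at most
# 7 points, ordered by their priority in that seminar (ascending).
def matching_objective_ids(obj_sems, obj_studs, seminar_id, student_id)
  priorities = {}
  obj_sems.each do |os|
    next unless os.seminar_id == seminar_id && os.priority > 0
    priorities[os.objective_id] = os.priority
  end

  student_objective_ids = obj_studs
    .select { |st| st.student_id == student_id && st.ready && st.points_all_time <= 7 }
    .map(&:objective_id)

  # Array#& intersects and removes duplicates, like INNER JOIN + DISTINCT.
  (priorities.keys & student_objective_ids).sort_by { |id| priorities[id] }
end

SAMPLE_SEMS = [
  ObjectiveSeminar.new(1, 10, 3),
  ObjectiveSeminar.new(2, 10, 1),
  ObjectiveSeminar.new(3, 10, 0), # priority 0: excluded
  ObjectiveSeminar.new(4, 11, 2)  # other seminar: excluded
].freeze

SAMPLE_STUDS = [
  ObjectiveStudent.new(1, 5, true, 4),
  ObjectiveStudent.new(2, 5, true, 7),
  ObjectiveStudent.new(3, 5, true, 2),
  ObjectiveStudent.new(4, 5, true, 1),
  ObjectiveStudent.new(5, 5, false, 0) # not ready: excluded
].freeze

SAMPLE_RESULT = matching_objective_ids(SAMPLE_SEMS, SAMPLE_STUDS, 10, 5)
```

On real tables the database should do this work; the sketch only shows which rows survive each condition and why sorting by priority needs the seminar join record.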
Say you are creating an IMDb-type site for TV shows. You have a Show with many attached episodes and a bunch of people.
Right now I link people to episodes through a contribution table, but if I want to list all the shows a person is on, I have to go through episodes.
Since this query takes a long time, I was thinking about adding show_id to the contributions table. Is this common practice to increase performance, or is there another way I haven't thought of?
"Since this query takes a long time"
Have you run a SQL explain plan to show why this is the case? What is the actual SQL query that is being run, and are you doing things like ordering or running subqueries within it?
If I understand your structure it is something like this:
|people| n---1 |contribution| 1---n |episodes| n---1 |shows|
A sql select of the sort:
select distinct s.name
from shows s,
episodes e,
contribution c
where c.people_id = <id>
and c.episode_id = e.id
and e.show_id = s.id
should really not have performance issues unless there are no indexes on the tables or the tables are massive.
Here's a way using where id in ( ... ) to select all shows a specific person appeared in
# assumes Contribution belongs_to :episode
Show.where(id: Contribution.select("episodes.show_id")
                           .joins(:episode)
                           .where(person_id: person_id)
                           .group("episodes.show_id"))
You may also want to try exists
Show.where("EXISTS (SELECT 1 FROM contributions c
            JOIN episodes e ON e.id = c.episode_id
            WHERE c.person_id = ? AND e.show_id = shows.id)", person_id)
For the analytics of my site, I'm required to extract the 4 states of my users.
@members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,0000 records.. Which is evidently a huge data pull for Rails
# Member Load (155.5ms)
@invited = @members.where("user_id is null")
# Member Load (21.6ms)
@not_started = @members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
# Member Load (82.9ms)
@in_progress = @members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', @sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
@completes = Quiz.where(enterprise_member_id: registration.members, section_id: @sections.map(&:id)).completed
# Quiz Load (138.9ms)
The operation returns a 503 meaning my app gives up on the request. Any ideas how I can refactor this code to run faster? Maybe by better joins syntax? I'm curious how sites with larger datasets accomplish what seems like such trivial DB calls.
The answer is your indexes. Check your rails logs (or check the console in development mode) and copy the queries to your db tool. Slap an "Explain" in front of the query and it will give you a breakdown. From here you can see what indexes you need to optimize the query.
For a quick pass, you should at least have these indexes in your schema:
enterprise_members: enterprise_registration_id, user_id
quizzes: enterprise_member_id, section_id
As someone else posted, definitely look into adding indexes if needed. How to refactor depends partly on what exactly you are trying to do with all these records. For the @members query, what are you using the @members records for? Do you really need to retrieve all attributes for every member record? If you are not using every attribute, I suggest fetching only the attributes you actually use; .pluck could be warranted. The 3rd and 4th queries look fishy. I assume you've run the queries in a console? Again, I'm not sure what the queries are being used for, but it is often useful to write raw SQL first and run it against the database directly. Then you can apply your findings when rewriting the ActiveRecord queries.
What is the .completed tacked on the end? Is it supposed to be there? The only close thing I found in the Rails API is .completed?. If it is a custom method, definitely look into it. You potentially also have a use case for scopes.
THIRD QUERY:
I unfortunately don't know ruby on rails, but from a postgresql perspective, changing your "not in" to a left outer join should make it a little faster:
Your code:
enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
Better version (in SQL):
select blah
from enterprise_members em
join users u on u.id = em.user_id
left outer join quizzes q on q.enterprise_member_id = em.id
                         and q.section_id in (?)
where q.enterprise_member_id is null
Based on my understanding, this will allow Postgres to sort both the enterprise_members table and the quizzes table and do a hash join. This is better than what it does now: it finds everything in the quizzes subquery, brings it into memory, and then tries to match it to enterprise_members.
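The idea behind replacing NOT IN with an outer join that keeps only unmatched rows (an anti-join) can be sketched in plain Ruby. This is only an in-memory model under stated assumptions, not what the PostgreSQL planner is guaranteed to do: members_without_quizzes and the hash keys are hypothetical names. It builds a hash-based set of the members that have a quiz in the wanted sections, then probes it once per member.

```ruby
require 'set'

# Hash anti-join: returns the member ids that have NO quiz in the given
# sections. Build phase: collect member ids from matching quizzes into a
# Set. Probe phase: keep each member whose id is absent from that Set.
def members_without_quizzes(member_ids, quizzes, section_ids)
  wanted_sections = section_ids.to_set
  members_with_quiz = Set.new

  quizzes.each do |quiz|
    next unless wanted_sections.include?(quiz[:section_id])
    members_with_quiz << quiz[:enterprise_member_id]
  end

  member_ids.reject { |id| members_with_quiz.include?(id) }
end

SAMPLE_QUIZZES = [
  { enterprise_member_id: 1, section_id: 7 },
  { enterprise_member_id: 2, section_id: 9 }, # section not of interest
  { enterprise_member_id: 3, section_id: 8 }
].freeze

NOT_STARTED_IDS = members_without_quizzes([1, 2, 3, 4], SAMPLE_QUIZZES, [7, 8])
```

Each quiz and each member is touched once, which is why this shape tends to scale better than re-checking a list for every row.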
FIRST QUERY:
You could also create a partial index on user_id for your first query. This will be especially good if there are a relatively small number of user_ids that are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Anytime you query enterprise_members with something that matches the index's where clause, the partial index can be used and quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.
Thanks everyone for your ideas. I basically did what everyone said: I added indexes and reordered how I called everything, but the major difference was using the pluck method. Here are my new stats:
@alt_members = list.members.pluck :id # 23ms
if list.course.sections.tests.present? && @sections = list.course.sections.tests
  @quiz_member_ids = Quiz.where(section_id: @sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
  # count('user_id is null') would count every row, so filter first
  @invited = list.members.where(user_id: nil).count # 12.5ms
  @not_started = ( @alt_members - ( @alt_members & @quiz_member_ids ) ).count # 0ms
  @in_progress = ( @alt_members & @quiz_member_ids ).count # 0ms
  @completes = ( @alt_members & Quiz.where(section_id: @sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
  @question_count = Quiz.where(section_id: @sections.map(&:id), completed: true).limit(5).map{|quiz|quiz.answers.count}.max # 3.5ms
end
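The array arithmetic in the snippet above can be tried without a database. The following is a minimal sketch in plain Ruby in which literal arrays stand in for the plucked id lists; quiz_status_counts is a hypothetical helper name, and the counts follow the same definitions as the post (so in_progress counts every member with at least one quiz).

```ruby
# Computes the three counts from plain id arrays, mirroring the set
# operations used on the plucked ids.
def quiz_status_counts(member_ids, quiz_member_ids, completed_member_ids)
  # Array#& intersects and also drops duplicate ids, so a member with
  # several quizzes is counted once.
  started = member_ids & quiz_member_ids

  {
    not_started: (member_ids - started).size,
    in_progress: started.size,
    completes: (member_ids & completed_member_ids).size
  }
end

SAMPLE_COUNTS = quiz_status_counts(
  [1, 2, 3, 4, 5], # ids of the members on the list
  [2, 2, 3, 9],    # member ids taken from quizzes (9 is not on the list)
  [3]              # member ids taken from completed quizzes
)
```

Working on small integer arrays like this avoids instantiating a model object per row, which is consistent with the timings reported above.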
By default, when you query a Sphinx table, the Sphinx engine returns rows that are already sorted by query weight, and it does so really fast.
So, when I do this:
select
article.name
from article
left join article_ft on article._id=article_ft.id
where article_ft.query='some text;mode=any;';
Where:
article is an InnoDB-like table.
article_ft is a Sphinx table.
Both of them (article.name and article_ft) contain this data (1 line = 1 row):
This is text.
This is also some text.
This is another text.
Sphinx engine will return rows like:
This is also some text.
This is text.
This is another text.
But, If I do something like this:
select
article.name
from article
left join article_ft on article._id=article_ft.id
left join article_category on article.category=article_category._id
where article_ft.query='some text;mode=any;';
It seems MariaDB sorts the rows in its own way here.
Even if I provide Sphinx's 'sort' option like this:
select
article.name
from article
left join article_ft on article._id=article_ft.id
left join article_category on article.category=article_category._id
where article_ft.query='some text;mode=any;sort=extended:@weight desc;';
It still doesn't work.
Changing the order of the joins doesn't help either.
If I use order by article_ft.weight DESC, MariaDB returns an error message like:
Error: ER_ILLEGAL_HA: Storage engine SPHINX of the table `article_ft` doesn't have this option
when article has no rows that match a condition such as article.category=50.
article_ft was created using this:
CREATE TABLE article_ft
(
id BIGINT NOT NULL,
weight INTEGER NOT NULL,
query VARCHAR(3072) NOT NULL,
INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://192.168.1.98:9402/article_ft";
How can I use this "magical" sort-by-weight feature when the query contains more joins, without getting errors in return?
Thanks in advance for any reply!
P.S. I can't provide a fiddle for this because I don't know of any online SQL fiddle service that supports Sphinx tables. Also, if you know of a more relevant existing question, I'd appreciate a pointer.
Put the article_ft table first in the query, i.e. ... article_ft inner join article ...
Or maybe use FORCE INDEX to force the use of the query index. Then it might honour the sort order.
Failing that, use a subquery?
(select name,weight from article_ft ... ) order by weight desc;
I'm using Rails 3.2.
I have a product model, and a variant model. A product can have many variants. A variant can belong to many products.
I want to make a lookup on the Products model, to find only products that have a specific variant count, like such (pseudocode):
Product.where("Product.variants.count == 0")
How do you do this with activerecord?
You can use a LEFT OUTER JOIN to return the records that you need. Rails issues a LEFT OUTER JOIN when you use includes.
For example:
Product.includes(:variants).where('variants.id' => nil)
That will return all products that have no variants. You can also use an explicit join:
Product.joins('LEFT OUTER JOIN variants ON variants.product_id = products.id').where('variants.id' => nil)
The LEFT OUTER JOIN will return records on the left side of the join, even if the right side is not present. It will place null values into the associated columns, which you can then use to check negative presence, as I did above. You can read more about left joins here: http://www.w3schools.com/sql/sql_join_left.asp.
The good thing about this solution is that you're not doing subqueries as a conditional, which will most likely be more performant.
products = Product.find(:all, :select => 'variant').select { |product| product.variants.count > 10 }
This is Rails 2.3 syntax, but that only affects the ActiveRecord part; the part to look at is the select block.
I don't know of any ActiveRecord way to do this but the following should help with your problem. The good thing about this solution is that everything's done on the db side.
Product.where('(SELECT COUNT(*) FROM variants WHERE variants.product_id = products.id) > 0')
If you want to pull products which have a specific non-0 number of variants, you could do that with something like this (admittedly untested):
Product.select('products.id, products.attr1_of_interest, ... products.attrN_of_interest, COUNT(*)')
  .joins('INNER JOIN variants ON products.id = variants.product_id')
  .group('products.id, products.attr1_of_interest, ... products.attrN_of_interest')
  .having('COUNT(*) = 5') # (or whatever number manipulation you want to do here)
If you want to allow for a count of 0, you would have to use Sean's solution above, because an inner join drops products that have no variants.
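The difference between the zero and non-zero cases can be shown with a small in-memory sketch in plain Ruby. It is a model of grouping and counting, not ActiveRecord code: product_ids_with_variant_count is a hypothetical helper, and the second array stands in for the product_id column of the variants table.

```ruby
# Returns the ids of products that have exactly `wanted` variants.
# Hash.new(0) gives products without any variant row a count of zero,
# which is the role the LEFT OUTER JOIN plays on the database side.
def product_ids_with_variant_count(product_ids, variant_product_ids, wanted)
  counts = Hash.new(0)
  variant_product_ids.each { |product_id| counts[product_id] += 1 }
  product_ids.select { |product_id| counts[product_id] == wanted }
end

SAMPLE_PRODUCT_IDS = [1, 2, 3].freeze
SAMPLE_VARIANT_PRODUCT_IDS = [1, 1, 3].freeze # product 2 has no variants

WITHOUT_VARIANTS = product_ids_with_variant_count(SAMPLE_PRODUCT_IDS, SAMPLE_VARIANT_PRODUCT_IDS, 0)
WITH_TWO_VARIANTS = product_ids_with_variant_count(SAMPLE_PRODUCT_IDS, SAMPLE_VARIANT_PRODUCT_IDS, 2)
```

Counting only the rows present in the variants array would never report product 2, which mirrors why an inner join with GROUP BY and HAVING cannot return products with zero variants.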