I run a simple query:
History.where(channel_id: 1).order('histories.id DESC').first
Result:
History Load (808.8ms) SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 ORDER BY histories.id DESC LIMIT 1 [["channel_id", 1]]
808.8 ms for 1 of 7 records with channel_id = 1. The total histories count is 2,110,443.
If I select all histories for channel_id = 1:
History.where(channel_id: 1)
History Load (0.5ms) SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 [["channel_id", 1]]
It took only 0.5 ms.
And if we try to take one record with the help of a Ruby Array:
History.where(channel_id: 1).order('histories.id DESC').to_a.first
History Load (0.5ms) SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 ORDER BY id DESC [["channel_id", 1]]
Where should I look for the problem?
PS: I already have an index on the channel_id column.
UPD:
History.where(channel_id: 1).order('histories.id DESC').limit(1).explain
History Load (848.9ms) SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 ORDER BY histories.id DESC LIMIT 1 [["channel_id", 1]]
=> EXPLAIN for: SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 ORDER BY histories.id DESC LIMIT 1 [["channel_id", 1]]
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Limit (cost=0.43..13.52 rows=1 width=42)
-> Index Scan Backward using histories_pkey on histories (cost=0.43..76590.07 rows=5849 width=42)
Filter: (channel_id = 1)
(3 rows)
There are two ways PostgreSQL can handle your query (with the ORDER BY and LIMIT clauses):
It can scan the table, order the found tuples, then apply the limit. PostgreSQL will choose this plan if it estimates that your table has a really low number of tuples, or that the index will be of no use;
It can use the index.
It seems that PostgreSQL chose the first option, which, in my humble opinion, can occur for only two reasons:
Your table statistics are not accurate. I recommend you run VACUUM ANALYZE on your table and try the query again;
The channel_id values are unevenly distributed (for example, almost all of the tuples have channel_id = 2), which is why PostgreSQL thinks the index is of no use. Here I recommend the use of a partial index.
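Such a partial index could be expressed in a Rails migration roughly as follows (class and index names are illustrative; a composite index on (channel_id, id) is an alternative that covers every channel):

```ruby
class AddPartialIndexToHistories < ActiveRecord::Migration[5.2]
  def change
    # Only rows with channel_id = 1 enter the index, so the
    # ORDER BY id DESC LIMIT 1 lookup touches only a handful of entries.
    add_index :histories, :id,
              where: "channel_id = 1",
              name: "index_histories_on_id_where_channel_1"
  end
end
```

With this in place, the planner can walk the tiny partial index backward instead of scanning histories_pkey and filtering millions of rows.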
Related
I have a jsonb column called lms_data with a hash data structure inside. I am trying to find elements that match an array of ids. This query works and returns the correct result:
CoursesProgram
.joins(:course)
.where(program_id: 12)
.where(
"courses.lms_data->>'_id' IN ('604d26cadb238f542f2fa', '604541eb0ff9d7b28828c')")
SQL LOG:
CoursesProgram Load (0.5ms) SELECT "courses_programs".* FROM "courses_programs" INNER JOIN "courses" ON "courses"."id" = "courses_programs"."course_id" WHERE "courses_programs"."program_id" = $1 AND (courses.lms_data->>'_id' IN ('604d26cadb61e238f542f2fa', '604541eb0ff9d8387b28828c')) [["program_id", 12]]
However, when I try to pass a variable as the array of ids:
CoursesProgram
.joins(:course)
.where(program_id: 12)
.where(
"courses.lms_data->'_id' IN (?)",
["604d26cadb61e238f542f2fa", "604541eb0ff9d8387b28828c"])
I don't get any results, and I see two queries performed in the logs:
CoursesProgram Load (16.6ms) SELECT "courses_programs".* FROM "courses_programs" INNER JOIN "courses" ON "courses"."id" = "courses_programs"."course_id" WHERE "courses_programs"."program_id" = $1 AND (courses.lms_data->'_id' IN ('604d26cadb61e238f542f2fa','604541eb0ff9d8387b28828c')) [["program_id", 12]]
CoursesProgram Load (0.8ms) SELECT "courses_programs".* FROM "courses_programs" INNER JOIN "courses" ON "courses"."id" = "courses_programs"."course_id" WHERE "courses_programs"."program_id" = $1 AND (courses.lms_data->'_id' IN ('604d26cadb61e238f542f2fa','604541eb0ff9d8387b28828c')) LIMIT $2 [["program_id", 12], ["LIMIT", 11]]
I cannot wrap my head around this one.
The queries performed in both cases seem to be the same. Why does one work and the other not? And why, in the second case, is the query performed twice?
The question mark is its own operator in Postgres's JSON query function set (meaning: does this key exist?). ActiveRecord attempts to do what it thinks you want, but the two meanings of ? collide.
Solution: don't use it. Since the ? can cause problems with Postgres's JSON queries, I use named substitution instead.
From the Postgres documentation:
?| text[] — Do any of these array strings exist as top-level keys? Example: '{"a":1, "b":2, "c":3}'::jsonb ?| array['b', 'c']
So first we use the ?| Postgres JSON operator to look for a match against ANY of the values of lms_data.
Secondly, we tell Postgres we'll be passing an array, using the Postgres array constructor array[:named_substitution].
And lastly, after the comma at the end of the query string, add your named-substitution variable (in this case I used :ids) and your array.
CoursesProgram
.joins(:course)
.where(program_id: 12)
.where(
"courses.lms_data->>'_id' ?| array[:ids]",
ids: ['604d26cadb238f542f2fa', '604541eb0ff9d7b28828c'])
My query is taking too long to load, so I'm wondering whether the position of the includes matters.
Example A:
people = Person.where(name: 'guillaume').includes(:jobs)
Example B:
people = Person.includes(:jobs).where(name: 'guillaume')
Is example A faster because I should have fewer people's jobs to load?
Short answer: no.
ActiveRecord builds your query lazily: as long as you don't need the records, it won't send the final SQL query to the database to fetch them. The two queries you pasted are identical.
Whenever you're in doubt, you can always open up the rails console, write your queries there, and observe the SQL printed out. In your example it would be something like:
SELECT "people".* FROM "people" WHERE "people"."name" = $1 LIMIT $2 [["name", "guillaume"], ["LIMIT", 11]]
SELECT "jobs".* FROM "jobs" WHERE "jobs"."person_id" = 1
in both cases.
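You can also convince yourself without hitting the database by comparing the SQL the two relations would generate (a sketch using the models from the question):

```ruby
a = Person.where(name: 'guillaume').includes(:jobs)
b = Person.includes(:jobs).where(name: 'guillaume')

a.to_sql == b.to_sql  # => true: both chains build the same relation
```

Relation methods like where and includes each return a new relation, so the order in which you chain them does not change the final query.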
An ActiveRecord query using Kaminari takes much longer to return results than the raw SQL. The table contains 80 million records.
Ticket.order("tickets.id desc").page(1).without_count.per(10)
The above code generates the SQL below:
Ticket Load (89598.2ms)
SELECT
*
FROM
(
SELECT
raw_sql_.*,
rownum raw_rnum_
FROM
(
SELECT
"TICKETS".*
FROM
"TICKETS"
ORDER BY
TICKETS.ID DESC
)
raw_sql_
WHERE
rownum <= (:a1 + :a2)
)
WHERE
raw_rnum_ > :a1
[["OFFSET", 0], ["LIMIT", 11]]
The same SQL, when executed in the rails console or directly in the database, takes 28.3 ms.
I'm not able to find out why ActiveRecord takes more time; the difference between the two timings is very large.
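One way to start narrowing this down is to ask ActiveRecord for the execution plan it actually gets, bind parameters included (same chain as above):

```ruby
Ticket.order("tickets.id desc").page(1).without_count.per(10).explain
```

If that plan differs from the one you see when pasting the literal SQL into the database, the bind variables (:a1/:a2) are a likely suspect: the optimizer may choose a different plan for bound limits than for literal values.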
My problem is relatively simple. I am trying to aggregate and count all records of the Job model that have values BETWEEN 20000 AND 30000. The problem is that my scope does not produce the correct results; I'm receiving the error below.
Error
Job Load (1.0ms) SELECT "jobs".* FROM "jobs" GROUP BY count(hourly_wage_salary) BETWEEN 20000 AND 30000 LIMIT $1 [["LIMIT", 11]]
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: aggregate functions are not allowed in GROUP BY
LINE 1: SELECT "jobs".* FROM "jobs" GROUP BY count(hourly_wage_sala...
^
: SELECT "jobs".* FROM "jobs" GROUP BY count(hourly_wage_salary) BETWEEN 20000 AND 30000 LIMIT $1
The code for my scope is:
scope :count_jobs_with_salaries_between_20k_30k, -> {group("count(hourly_wage_salary) BETWEEN 20000 AND 30000")}
SELECT
SUM(CASE WHEN SALARY BETWEEN 20000 AND 30000 THEN 1 ELSE 0 END) AS COUNT
FROM JOBS
Alternatively, this also works:
SELECT COUNT(*) FROM JOBS WHERE SALARY BETWEEN 20000 AND 30000;
You can check a working example on SQL Fiddle.
EDIT: Rails Scope
jobs.where(salary: 20000..30000).count
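The SUM(CASE ...) trick is just conditional counting; the same idea in plain Ruby (the sample salaries are made up for illustration):

```ruby
# Each salary inside the range contributes 1 to the sum, everything else 0 --
# exactly what SUM(CASE WHEN salary BETWEEN 20000 AND 30000 THEN 1 ELSE 0 END) does.
salaries = [15_000, 22_000, 25_000, 31_000, 28_500]
count = salaries.sum { |s| (20_000..30_000).cover?(s) ? 1 : 0 }
puts count  # => 3
```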
I'm creating a large XML output using Rails, and it contains a lot of Rails-generated URLs. There are so-called items and enclosures; every item may have one enclosure, so I'm using a has_one / belongs_to relation in my models.
I'm using
enclosure_url(item.enclosure, format: :json)
to generate the URL.
What I expect: Rails should generate the URL based on the id which is stored in the items table.
What actually happens is that Rails fetches every single enclosure from the database, which slows down my system.
Enclosure Load (2.6ms) SELECT "enclosures".* FROM "enclosures" WHERE "enclosures"."id" = ? LIMIT 1 [["id", 11107]]
Enclosure Load (3.1ms) SELECT "enclosures".* FROM "enclosures" WHERE "enclosures"."id" = ? LIMIT 1 [["id", 11108]]
Enclosure Load (0.7ms) SELECT "enclosures".* FROM "enclosures" WHERE "enclosures"."id" = ? LIMIT 1 [["id", 11109]]
Enclosure Load (1.5ms) SELECT "enclosures".* FROM "enclosures" WHERE "enclosures"."id" = ? LIMIT 1 [["id", 11110]]
Enclosure Load (6.8ms) SELECT "enclosures".* FROM "enclosures" WHERE "enclosures"."id" = ? LIMIT 1 [["id", 11111]]
Is there any trick to stop Rails from doing this, or do I have to generate the URL myself?
IMO you have two options:
1) Use includes to load all enclosures in one query:
#items = Item.where(...).includes(:enclosure)
2) Pass the id of the enclosure to the URL builder instead of the object:
enclosure_url(id: item.enclosure_id)
I would prefer the first option, because it ensures that the object you are linking to actually exists.
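Putting the first option together with the URL helper from the question, a sketch (the where condition is an illustrative placeholder, not from the original code):

```ruby
# Eager-load all enclosures in a single query, then build the URLs
# without triggering one SELECT per item.
items = Item.where(published: true).includes(:enclosure)
urls  = items.map { |item| enclosure_url(item.enclosure, format: :json) }
```

With includes, item.enclosure is served from the preloaded association cache, so the per-item "Enclosure Load" queries from the log disappear.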