So I want to translate this SQL query into Rails (and it must execute in this EXACT order). Suppose I have:
WITH sub_table as (
SELECT * FROM main_table ORDER BY id LIMIT 10 OFFSET 100
)
SELECT * FROM sub_table INNER JOIN other_table
ON sub_table.id = other_table.other_id
The important thing here is that the order of execution must be:
the LIMIT and OFFSET in that sub_table query MUST be executed first;
the second statement should happen after.
So if the models I have are called OtherTable and MainTable, does something like this work?
subTableRelation = MainTable.order(id: :asc).limit(10).offset(100)
subTableRelation.join(OtherTable, ....)
The main question here is how Rails Relation execution order impacts things.
While ActiveRecord does not provide CTEs in its high-level API, Arel will allow you to build this exact query.
Since you did not provide models and obfuscated the table names, I will build this completely in Arel for the time being.
sub_table = Arel::Table.new('sub_table')
main_table = Arel::Table.new('main_table')
other_table = Arel::Table.new('other_table')

sub_table_query = main_table.project(Arel.star).take(10).skip(100).order(main_table[:id])
sub_table_alias = Arel::Nodes::As.new(Arel.sql(sub_table.name), sub_table_query)

query = sub_table.project(Arel.star)
                 .join(other_table).on(sub_table[:id].eq(other_table[:other_id]))
                 .with(sub_table_alias)

query.to_sql
Output:
WITH sub_table AS (
SELECT
*
FROM main_table
ORDER BY main_table.id
-- Output here will differ by database
LIMIT 10 OFFSET 100
)
SELECT
*
FROM sub_table
INNER JOIN other_table ON sub_table.id = other_table.other_id
If you are able to provide better context, I can provide a better solution, most likely resulting in an ActiveRecord::Relation object, which is likely to be preferable for chaining and model access purposes.
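For reference, if you do have models, one way to get an ActiveRecord::Relation without a CTE is to inline the limited relation as a derived table. A rough sketch, assuming MainTable and OtherTable models backed by main_table and other_table:

# The database applies LIMIT/OFFSET inside the derived table before the
# join, just as it would inside the CTE.
sub = MainTable.order(:id).limit(10).offset(100)
MainTable.from(sub, :main_table)
         .joins("INNER JOIN other_table ON main_table.id = other_table.other_id")

This stays chainable as a normal relation, at the cost of a raw join string.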
Related
I have some raw SQL and I'm not sure if it would be better as an ActiveRecord call or if I should keep it as raw SQL. Would this be easy to convert to AR?
select *
from logs t1
where
  log_status_id = 2 and log_type_id = 1
  and not exists
  (
    select *
    from logs t2
    where t2.log_version_id = t1.log_version_id
      and t2.log_status_id in (1,3,4)
      and log_type_id = 1
  )
ORDER BY created_at ASC
So something like this?:
Log.where(log_status_id: 2, log_type_id: 1).where.not(Log.where.....)
You could do this using Arel. See "Rails 3: Arel for NOT EXISTS?" for an example.
Personally I often find raw SQL to be more readable/maintainable than AREL queries, though. And I guess most developers are more familiar with it in general, too.
But in any case, your approach of separating the narrowing by log_status_id and log_type_id from the subquery is a good idea, even though your .where.not construct won't work as written.
This should do the trick however:
Log.where(log_status_id: 2, log_type_id: 1)
.where("NOT EXISTS (
select *
from logs t2
where t2.log_version_id = logs.log_version_id
and t2.log_status_id in (1,3,4)
and t2.log_type_id = logs.log_type_id)")
.order(:created_at)
The only situation where this might become problematic is when you try to join this query to other queries, because the outer table will likely receive a different alias than logs.
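If you'd rather avoid the raw SQL string, the Arel route mentioned above looks roughly like this. A sketch, assuming a Log model backed by the logs table:

# Build the correlated subquery as a relation, then turn it into a
# NOT EXISTS node with Arel; where() accepts Arel nodes directly.
subquery = Log.select("1")
              .from("logs t2")
              .where("t2.log_version_id = logs.log_version_id
                      AND t2.log_status_id IN (1,3,4)
                      AND t2.log_type_id = logs.log_type_id")

Log.where(log_status_id: 2, log_type_id: 1)
   .where(subquery.arel.exists.not)
   .order(:created_at)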
I've rewritten this question as my previous explanation was causing confusion.
In the SQL world, you have an initial record set that you apply a query to. The output of this query is the result set. Generally, the initial record set is an entire table of records and the result set is the records from the initial record set that match the query ruleset.
I have a use case where I need my application to occasionally operate on only a subset of records in a table. If a table has 10,000 records in it, I'd like my application to behave as if only the first 1,000 records exist. These should be the same 1,000 records each time. In other words, I want the initial record set to be the first 1,000 devices in a table (when ordered by primary key), and the result set to be the matching records from those first 1,000 devices.
Some solutions have been proposed, and they revealed that my initial description was not very clear. To be more explicit, I am not trying to implement pagination. I'm also not trying to limit the number of results I receive (which .limit(1000) would indeed achieve).
Thanks!
This is the line in your question that I don't understand:
This causes issues though with both of the calls, as limit limits the results of the query, not the database rows that the query is performed on.
This is not a Rails thing, this is a SQL thing.
Device.limit(n) runs SELECT * FROM devices LIMIT n
Limit always returns a subset of the queried result set.
Would first(n) accomplish what you want? It will both order the result set ascending by the PK and limit the number of results returned.
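For reference, a rough sketch of what first(n) does, assuming a Device model (note that it returns an Array, not a chainable relation, so you can't add further SQL conditions to its result):

Device.first(1000)
# Roughly: SELECT "devices".* FROM "devices" ORDER BY "devices"."id" ASC LIMIT 1000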
SQL Statements can be chained together. So if you have your subset, you can then perform additional queries with it.
my_subset = Device.where(family: "Phone")
# SQL: SELECT * FROM Device WHERE `family` = "Phone"
my_results = my_subset.where(style: "Touchscreen")
# SQL: SELECT * FROM Device WHERE `family` = "Phone" AND `style` = "Touchscreen"
Which can also be written as:
my_results = Device.where(family: "Phone").where(style: "Touchscreen")
my_results = Device.where(family: "Phone", style: "Touchscreen")
# SQL: SELECT * FROM Device WHERE `family` = "Phone" AND `style` = "Touchscreen"
From your question, if you'd like to select the first 1,000 rows (ordered by primary key, pkey) and then query against that, you'll need to do:
my_results = Device.find_by_sql("SELECT *
                                 FROM (SELECT * FROM devices ORDER BY pkey ASC LIMIT 1000) subset
                                 WHERE `more_searching` = 'happens here'")
You could specifically ask for a set of IDs:
Device.where(id: (1..4).to_a)
That will construct a WHERE clause like:
WHERE id IN (1,2,3,4)
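If you want an ActiveRecord::Relation scoped to those first 1,000 rows that you can keep chaining, from with a subquery also works. A sketch, assuming a Device model; passing :devices as the alias keeps the outer query's column references intact:

# ORDER BY/LIMIT run inside the derived table first, so the outer
# conditions only ever see those 1,000 rows.
subset = Device.order(:id).limit(1000)
Device.from(subset, :devices).where(family: "Phone", style: "Touchscreen")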
So, the general question is: what's faster, taking an aggregate of a field, or having extra expressions in the GROUP BY clause? Here are the two queries.
Query 1 (extra expressions in GROUP BY):
SELECT sum(subquery.what_i_want)
FROM (
SELECT table_1.some_id,
(
CASE WHEN some_date_field IS NOT NULL
THEN
FLOOR(((some_date_field - current_date)::numeric / 7) + 1) * MAX(some_other_integer)
ELSE
some_integer * MAX(some_other_integer)
END
) what_i_want
FROM table_1
JOIN table_2 on table_1.some_id = table_2.id
WHERE ((some_date_field IS NOT NULL AND some_date_field > current_date) OR some_integer > 0) -- per the data and what i want, one of these will always be true
GROUP BY table_1.some_id, some_date_field, some_integer
) subquery
Query 2 (using an aggregate function; arbitrary, because in this dataset every record for the table_2 fields in question has the same value):
SELECT sum(subquery.what_i_want)
FROM (
SELECT table_1.some_id,
(
CASE WHEN MAX(some_date_field) IS NOT NULL
THEN
FLOOR(((MAX(some_date_field) - current_date)::numeric / 7) + 1) * MAX(some_other_integer)
ELSE
MAX(some_integer) * MAX(some_other_integer)
END
) what_i_want
FROM table_1
JOIN table_2 on table_1.some_id = table_2.id
WHERE ((some_date_field IS NOT NULL AND some_date_field > current_date) OR some_integer > 0) -- per the data and what i want, one of these will always be true
GROUP BY table_1.some_id
) subquery
As far as I can tell, psql doesn't provide good benchmarking tools. \timing only reports the time of each individual query, so running a benchmark with enough trials for meaningful results is... tedious at best.
For the record, I did do this at about n=50 and saw the aggregate method (Query 2) run faster on average, but with a p value of ~0.13, so not quite conclusive.
'sup with that?
The general answer: they should be more or less the same. There's a chance of hitting or missing a function-based index depending on whether you wrap a field in a function, but that matters more in the WHERE clause than in the column list, and not for aggregate functions. But this is speculation only.
What you should use for analyzing execution is EXPLAIN ANALYZE. In the plan you see not only the scan types, but also the number of iterations, the cost, and the time of individual operations. And of course you can use it from psql.
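A minimal sketch of how you might apply it here (the query body below is illustrative, not taken from the post): prefix either query with EXPLAIN ANALYZE and compare the per-node actual times over a few runs.

-- Runs the query for real and prints the plan with actual row counts
-- and per-node timings:
EXPLAIN ANALYZE
SELECT sum(subquery.what_i_want)
FROM (
    SELECT table_1.some_id, MAX(some_other_integer) AS what_i_want
    FROM table_1
    JOIN table_2 ON table_1.some_id = table_2.id
    GROUP BY table_1.some_id
) subquery;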
How can I select the top N percent of the rows of a table, according to some order clause? Hopefully with only one query to the database
According to this discussion, the following is a way to select the top 10% rows from a table from PostgreSQL:
SELECT * FROM mytbl ORDER BY num_sales DESC LIMIT
(SELECT (count(*) / 10) AS selnum FROM mytbl)
According to this answer, nesting a query inside a where clause in ActiveRecord will generate a nested SELECT instead of firing two queries:
Item.where(product_id: Product.where(price: 50))
How can something like this be done in ActiveRecord without too much SQL?
It would not be particularly inefficient to do this:
total_rows = MyClass.count
limit_rows = (0.1 * total_rows).to_i
MyClass.order("num_sales desc").limit(limit_rows)
Or of course:
MyClass.order("num_sales desc").limit((0.1 * MyClass.count).to_i)
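If you want to avoid the extra COUNT round trip entirely, a window function can do it in one query. A rough sketch (PostgreSQL, assuming MyClass is backed by the mytbl table; the bucket alias is an invention for this example):

# ntile(10) splits the ordered rows into 10 equal buckets, so bucket 1
# holds the top 10% by num_sales; this is a single round trip.
subquery = <<~SQL.squish
  (SELECT mytbl.*, ntile(10) OVER (ORDER BY num_sales DESC) AS bucket
   FROM mytbl) AS mytbl
SQL
MyClass.from(subquery).where("bucket = 1")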
I have a custom query that looks like this:
self.account.websites.find(:all,:joins => [:group_websites => {:group => :users}],:conditions=>["users.id =?",self])
where self is a User object.
I managed to generate the equivalent SQL. Here is how it looks:
sql = "select * from websites INNER JOIN group_websites on group_websites.website_id = websites.id INNER JOIN groups on groups.id = group_websites.group_id INNER JOIN group_users ON (groups.id = group_users.group_id) INNER JOIN users on (users.id = group_users.user_id) where (websites.account_id = #{account_id} AND (users.id = #{user_id}))"
With a decent understanding of SQL and ActiveRecord, I assumed (and most would agree) that the result obtained from the above query would take longer than the result obtained from the find_by_sql(sql) one.
But surprisingly, when I ran the above two, I found the ActiveRecord custom query beating find_by_sql in terms of load time.
Here are the test results:
ActiveRecord Custom Query load time
Website Load (0.9ms)
Website Columns(1.0ms)
find_by_sql load time
Website Load (1.3ms)
Website Columns(1.0ms)
I repeated the test again and again, and the results still came out the same (with the custom query winning the battle).
I know the difference isn't that big, but I still can't figure out why a plain find_by_sql query is slower than the custom query.
Can anyone shed some light on this?
With the find case, the query is parameterized; this means the database can cache the query plan and will not need to parse and compile the query again.
With the find_by_sql case the entire query is passed to the database as a string. This means there is no caching that the database can do on the structure of the query, and it needs to be parsed and compiled on each occasion.
I think you can test this: try find_by_sql in this way (parameterized):
Website.find_by_sql(["select * from websites INNER JOIN group_websites on group_websites.website_id = websites.id INNER JOIN groups on groups.id = group_websites.group_id INNER JOIN group_users ON (groups.id = group_users.group_id) INNER JOIN users on (users.id = group_users.user_id) where (websites.account_id = ? AND (users.id = ?))", account_id, user_id])
Well, the reason is probably quite simple: with custom SQL, the query is sent immediately to the db server for execution.
Remember that Ruby is an interpreted language, so Rails has to generate a new SQL query from the ORM meta-language you used before it can be sent to the actual db server for execution. I would say the additional 0.1 ms is the time the framework takes to generate the query.
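If you want to see (or time) the generation step on its own, to_sql builds the SQL string without sending anything to the database. A sketch in modern ActiveRecord syntax, assuming the same associations as above (account and user are hypothetical locals standing in for self.account and self):

# Builds the SQL string only; no query is executed.
sql = account.websites
             .joins(group_websites: { group: :users })
             .where(users: { id: user.id })
             .to_sql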