Select top N percent from query in ActiveRecord, Rails - ruby-on-rails

How can I select the top N percent of the rows of a table, according to some order clause, ideally with only one query to the database?
According to this discussion, the following is a way to select the top 10% rows from a table from PostgreSQL:
SELECT * FROM mytbl ORDER BY num_sales DESC LIMIT
(SELECT (count(*) / 10) AS selnum FROM mytbl)
According to this answer, nesting a query inside a where clause in ActiveRecord will generate a nested SELECT instead of firing two queries:
Item.where(product_id: Product.where(price: 50))
How can something like this be done in ActiveRecord without too much SQL?

Running two queries would not be particularly inefficient:
total_rows = MyClass.count
limit_rows = (0.1 * total_rows).to_i
MyClass.order("num_sales desc").limit(limit_rows)
Or of course:
MyClass.order("num_sales desc").limit((0.1 * MyClass.count).to_i)
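One detail worth checking in the snippets above is the rounding: multiplying by a float such as 0.1 and truncating with to_i can land one row short for some inputs, because the product may fall just under a whole number. A minimal sketch of the same calculation in integer arithmetic follows; top_percent_limit is a hypothetical helper name, not part of ActiveRecord, and it assumes the percentage is given as an Integer.

```ruby
# Hypothetical helper (not an ActiveRecord API): number of rows that make up
# the top `percent` percent of `total_rows`, rounded down, using integers only.
# Assumes both arguments are Integers.
def top_percent_limit(total_rows, percent)
  raise ArgumentError, "percent must be between 0 and 100" unless (0..100).cover?(percent)

  total_rows * percent / 100
end

puts top_percent_limit(57, 10)   # rows to fetch for the top 10% of 57 rows
```

With the model from the answer, the result could be passed along as MyClass.order("num_sales desc").limit(top_percent_limit(MyClass.count, 10)), which still issues two queries.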

Related

Rails relation ordering?

I want to translate this SQL query into Rails, preserving this EXACT order of execution.
Suppose I have:
WITH sub_table as (
SELECT * FROM main_table ORDER BY id LIMIT 10 OFFSET 100
)
SELECT * FROM sub_table INNER JOIN other_table
ON sub_table.id = other_table.other_id
The importance here is that the order of execution must be:
LIMIT and OFFSET in that sub_table query MUST be executed first
The second statement should happen after.
So if the relations I have are called OtherTable and MainTable, does something like this work?
subTableRelation = MainTable.order(id: :asc).limit(10).offset(100)
subTableRelation.join(OtherTable, ....)
The main question here is how Rails Relation execution order impacts things.
While ActiveRecord does not provide CTEs in its high-level API, Arel will allow you to build this exact query.
Since you did not provide models and obfuscated the table names, I will build this completely in Arel for the time being.
sub_table = Arel::Table.new('sub_table')
main_table = Arel::Table.new('main_table')
other_table = Arel::Table.new('other_table')
sub_table_query = main_table.project(Arel.star).take(10).skip(100).order(main_table[:id])
sub_table_alias = Arel::Nodes::As.new(Arel.sql(sub_table.name),sub_table_query)
query = sub_table.project(Arel.star)
.join(other_table).on(sub_table[:id].eq(other_table[:other_id]))
.with(sub_table_alias)
query.to_sql
Output:
WITH sub_table AS (
SELECT
*
FROM main_table
ORDER BY main_table.id
-- Output here will differ by database
LIMIT 10 OFFSET 100
)
SELECT
*
FROM sub_table
INNER JOIN other_table ON sub_table.id = other_table.other_id
If you are able to provide better context, I can provide a better solution, most likely one that results in an ActiveRecord::Relation object, which is likely to be preferable for chaining and model access purposes.
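To make the intended order of operations concrete, here is a plain-Ruby sketch (no database involved) of what the CTE computes: ordering, OFFSET and LIMIT are applied to main_table first, and only the surviving rows are joined. The method name limited_then_joined and the in-memory rows are made up for illustration.

```ruby
# Plain-Ruby stand-in for the CTE: order, offset and limit first, join second.
# `limited_then_joined` is a hypothetical name; rows are hashes, not models.
def limited_then_joined(main_rows, other_rows, limit:, offset:)
  # Step 1: the CTE body. ORDER BY id, then OFFSET, then LIMIT.
  sub_table = main_rows.sort_by { |row| row[:id] }.drop(offset).first(limit)

  # Step 2: INNER JOIN only the surviving rows against other_table.
  sub_table.flat_map do |sub|
    other_rows
      .select { |other| other[:other_id] == sub[:id] }
      .map { |other| sub.merge(other) }
  end
end

# Made-up data: 200 main rows, four other rows (one points outside the window).
main_table  = (1..200).map { |id| { id: id } }
other_table = [{ other_id: 101 }, { other_id: 105 }, { other_id: 105 }, { other_id: 150 }]

joined = limited_then_joined(main_table, other_table, limit: 10, offset: 100)
p joined.map { |row| row[:id] }   # only ids between 101 and 110 can appear
```

The row pointing at id 150 never shows up in the result, because id 150 was cut off before the join ran.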

how to get subset of activerecord objects after performing .limit()?

I want to limit the ActiveRecord objects being returned to 20, then perform a where() that returns a subset of those limited objects. I currently know that only 10 of them will fulfil the second column's criteria.
e.g. of ideal behaviour:
o = Object.limit(20)
o.where(column: criteria).count
=> 10
But instead, activerecord still looks for 20 objects that fulfil the where() criteria, but looks outside of the original 20 objects that the limit() would have returned on its own.
How can I get the desired response?
One way to decrease the search space is to use a nested query. You should search the first N records rather than all records which match a specific condition. In SQL this would be done like this:
select * from (select * from table order by ORDERING_FIELD limit 20) as limited_rows where column = value;
The query above will only search for the condition in 20 rows from the table. Notice how I have added an ORDERING_FIELD (and an alias for the subquery, which many databases require). The ordering is needed because, without it, the database may return the rows in a different order each time you run the query.
To do something similar in Rails, you could try the following:
Object.where(id: Object.order(:id).limit(20).select(:id)).where(column: criteria)
This will execute a query similar to the following:
SELECT [objects].* FROM [objects] WHERE [objects].[id] IN (SELECT TOP (20) [objects].[id] FROM [objects] ORDER BY [objects].id ASC) AND [objects].[column] = criteria
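The difference between the two behaviours can be sketched in plain Ruby on an in-memory array. This is only an illustration of the semantics, not of how ActiveRecord builds its SQL; the method names and the flagged column are made up.

```ruby
# What chaining limit(20).where(...) effectively asks for:
# filter first, then keep up to n of the matches.
def filter_then_limit(rows, n, &criteria)
  rows.select(&criteria).first(n)
end

# What the nested query asks for:
# take the first n rows by id, then filter only those.
def limit_then_filter(rows, n, &criteria)
  rows.sort_by { |r| r[:id] }.first(n).select(&criteria)
end

# Made-up rows: 40 records, every second one matches the criteria.
rows = (1..40).map { |id| { id: id, flagged: id.even? } }

outer = filter_then_limit(rows, 20) { |r| r[:flagged] }
inner = limit_then_filter(rows, 20) { |r| r[:flagged] }

puts outer.size   # 20
puts inner.size   # 10
```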

rewrite sql statement with max and groupby in ruby

I have this MySQL view:
SELECT
`reports`.`date` AS `date`,
`reports`.`book_title` AS `book_title`,
max(
`reports`.`royalty_type`
) AS `royalty_type`,
max(
`reports`.`avg_list_price`
) AS `avg_list_price`
FROM
`reports`
GROUP BY
`reports`.`date`,
`reports`.`book_title`,
`reports`.`marketplace`
As far as I understand, it groups the results by date, then by book_title, then by marketplace, and then selects the max royalty_type and avg_list_price within each of these small subgroups.
How do I rewrite this in Rails ActiveRecord?
I don't know how to select the max within these small groups in ActiveRecord.
Try this one
Report.group(:date, :book_title, :marketplace).select('date, book_title, MAX(royalty_type) AS royalty_type, MAX(avg_list_price) AS avg_list_price')
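As a rough illustration of what this grouping computes, the same logic can be written in plain Ruby over an array of hashes. The rows and the method name max_per_group are made up; only the column names come from the view above.

```ruby
# Plain-Ruby equivalent of GROUP BY date, book_title, marketplace with two MAX
# aggregates. `max_per_group` is a hypothetical name for this sketch.
def max_per_group(reports)
  reports
    .group_by { |r| r.values_at(:date, :book_title, :marketplace) }
    .map do |(date, book_title, _marketplace), rows|
      {
        date: date,
        book_title: book_title,
        # Each MAX is taken independently, so the two maxima
        # may come from different rows of the same group.
        royalty_type: rows.map { |r| r[:royalty_type] }.max,
        avg_list_price: rows.map { |r| r[:avg_list_price] }.max
      }
    end
end

# Made-up sample rows.
reports = [
  { date: "2015-01-01", book_title: "A", marketplace: "US", royalty_type: "35%", avg_list_price: 2.99 },
  { date: "2015-01-01", book_title: "A", marketplace: "US", royalty_type: "70%", avg_list_price: 1.99 },
  { date: "2015-01-01", book_title: "A", marketplace: "UK", royalty_type: "70%", avg_list_price: 3.49 }
]

max_per_group(reports).each { |row| p row }
```

As in the view, marketplace is part of the grouping key but not of the selected columns, so two result rows can share the same date and book_title.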

ActiveRecord SUM includes more objects in the calculation than it supplies the same query without sum

Today I ran into a problem that turned out to be a gotcha in how ActiveRecord is used.
For a specific query (with includes), ActiveRecord returns a certain number of objects in an ActiveRecord::Relation object.
If you chain sum(:attribute) onto the same ActiveRecord query, more objects are included in the calculated result. Here is an example of what I mean:
Environment:
ActiveRecord 4.2.3
Postgres 9.3.5
DB-structure:
Order has_many items
My query:
@orders = Order.includes(:items).where('orders.created_at >= ? AND orders.created_at <= ?', date_from, date_to)
The produced SQL-Query:
SELECT orders.* FROM order_containers WHERE orders.created_at >= '2015-08-11' AND orders.created_at <= '2015-08-17 23:59:59.999999';
This query returns, for example, 20 orders. As you can see, the includes doesn't play any role in the query. If I sum the price of the result in Ruby:
@orders.to_a.sum(&:price)
it returns 20.00
The same ActiveRecord query with SUM:
Order.includes(:items).where('orders.created_at >= ? AND orders.created_at <= ?', date_from, date_to).sum(:price)
it returns 45.00
It produces a different SQL statement:
SELECT SUM(orders.price_eur) FROM orders LEFT OUTER JOIN line_items ON items.order_container_id = orders.id WHERE orders.created_at >= '2015-08-11' AND orders.created_at <= '2015-08-17 23:59:59.999999'
Many more orders are summed in this case because the produced SQL query includes the same order more than once (because of the join). Every order has one or more items, so the joined result contains each order once per item, which means many more rows (duplicates) than the query without the LEFT OUTER JOIN.
I hope this helps you avoid this gotcha.
includes is generally used for eager loading. Why don't you replace it with joins?
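The row duplication described above can be reproduced in plain Ruby without a database. This is a sketch with made-up data and hypothetical method names; it only mimics the shape of a LEFT OUTER JOIN, not ActiveRecord itself.

```ruby
# Summing over the orders themselves counts each order once.
def sum_without_join(orders)
  orders.sum { |o| o[:price] }
end

# A LEFT OUTER JOIN yields one row per (order, item) pair, or a single row
# for an order with no items, so each price repeats once per item.
def sum_with_left_join(orders, items)
  joined_rows = orders.flat_map do |o|
    matches = items.select { |i| i[:order_id] == o[:id] }
    matches.empty? ? [o] : matches.map { |i| o.merge(i) }
  end
  joined_rows.sum { |r| r[:price] }
end

# Made-up data: two orders, the first with three items, the second with one.
orders = [{ id: 1, price: 10.0 }, { id: 2, price: 5.0 }]
items  = [{ order_id: 1 }, { order_id: 1 }, { order_id: 1 }, { order_id: 2 }]

puts sum_without_join(orders)            # 15.0
puts sum_with_left_join(orders, items)   # 35.0
```

In ActiveRecord terms, one way to avoid the repeated rows is to call sum on a relation that does not carry the eager-loaded association, so that no join is generated.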

How to select data for defined page and total count of records?

I have a table with paginated data and this is the way I select data for each page:
@visitors = EventsVisitor
.select('visitors.*, events_visitors.checked_in, events_visitors.checkin_date, events_visitors.source, events_visitors.id AS ticket_id')
.joins(:visitor)
.order(order)
.where(:event_id => params[:event_id])
.where(filter_search)
.where(mode)
.limit(limit)
.offset(offset)
Also, to build the table pagination I need to know the total count of records. Currently my solution for this is very rough:
total = EventsVisitor
.select('count(*) as count, events_visitors.*')
.joins(:visitor)
.order(order)
.where(:event_id => params[:event_id])
.where(filter_search)
.where(mode)
.first()
.count
So my question is as follows: what is the optimal Ruby way to select limited data for the current page together with the total count of records?
I noticed that if I call @visitors.count, an additional SQL query is generated:
SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM `events_visitors` INNER JOIN `visitors` ON `visitors`.`id` = `events_visitors`.`visitor_id` WHERE `events_visitors`.`event_id` = 1 LIMIT 15 OFFSET 0) subquery_for_count
First, I do not understand why an additional query is sent to count data that we already have. Once the data has been loaded from the database into @visitors, we can count it in Ruby without sending additional queries to the DB.
Second, I thought there might be a way to use something like .total_count that generates a similar count(*) query but without that useless limit/offset.
You should use except to drop limit and offset:
http://guides.rubyonrails.org/active_record_querying.html#except
See how kaminari does it
https://github.com/kaminari/kaminari/blob/92052eedf047d65df71cc0021a9df9df1e2fc36e/lib/kaminari/models/active_record_relation_methods.rb#L11
So it might be something like:
total = @visitors.except(:offset, :limit, :order).count
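Once the total is known, the remaining pagination values are simple arithmetic. A minimal sketch follows, assuming 1-based page numbers; page_window is a hypothetical helper name and is not part of Rails or kaminari.

```ruby
# Hypothetical helper: derive paging values from the total row count.
# Pages are 1-based; out-of-range page numbers are clamped into range.
def page_window(total_count, page, per_page)
  raise ArgumentError, "per_page must be positive" unless per_page.positive?

  total_pages = (total_count + per_page - 1) / per_page   # ceiling division
  page = page.clamp(1, [total_pages, 1].max)
  { offset: (page - 1) * per_page, limit: per_page, total_pages: total_pages }
end

p page_window(47, 2, 15)   # 47 rows, second page, 15 rows per page
```

The offset and limit values returned here are the ones that would be passed to .offset and .limit in the page query from the question.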
