How to get the most recent value grouped by year using SQL - ruby-on-rails

I have a Company model that has_many Statement.
class Company < ActiveRecord::Base
has_many :statements
end
I want to get, for each value of the fiscal_year_end field, the statement with the latest date.
I implemented it like this:
c = Company.first
c.statements.to_a.group_by{|s| s.fiscal_year_end }.map{|k,v| v.max_by(&:date) }
It works OK, but if possible I want to use an ActiveRecord query (SQL) so that I don't need to load unnecessary instances into memory.
How can I write it using SQL?

select t.username, t.date, t.value
from MyTable t
inner join (
select username, max(date) as MaxDate
from MyTable
group by username
) tm on t.username = tm.username and t.date = tm.MaxDate

For these kinds of things, I find it helpful to get the raw SQL working first, and then translate it into ActiveRecord afterwards. It sounds like a textbook case of GROUP BY:
SELECT fiscal_year_end, MAX(date) AS max_date
FROM statements
WHERE company_id = 1
GROUP BY fiscal_year_end
Now you can express that in ActiveRecord like so:
c = Company.first
c.statements.
group(:fiscal_year_end).
order(nil). # might not be necessary, depending on your association and Rails version
select("fiscal_year_end, MAX(date) AS max_date")
The reason for order(nil) is to prevent ActiveRecord from adding ORDER BY id to the query. Rails 4+ does this automatically. Since you aren't grouping by id, it will cause the error you're seeing. You could also order(:fiscal_year_end) if that is what you want.
That will give you a bunch of Statement objects. They will be read-only, and every attribute will be nil except for fiscal_year_end and the magically-present new field max_date. These instances don't represent specific statements, but statement "groups" from your query. So you can do something like this:
- @statements_by_fiscal_year_end.each do |s|
  %tr
    %td= s.fiscal_year_end
    %td= s.max_date
Note there is no n+1 query problem here, because you fetched everything you need in one query.
If you decide that you need more than just the max date, e.g. you want the whole statement with the latest date, then you should look at your options for the greatest n per group problem. For raw SQL I like LATERAL JOIN, but the easiest approach to use with ActiveRecord is DISTINCT ON.
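To make the "whole statement with the latest date" case concrete, here is a minimal plain-Ruby sketch of what a greatest-n-per-group query returns. The `Row` struct, the `latest_per_fiscal_year` helper, and the sample data are made up for illustration, and the SQL in the comment assumes PostgreSQL, since `DISTINCT ON` is specific to it.

```ruby
require 'date'

# Hypothetical stand-in for a row of the statements table.
Row = Struct.new(:id, :fiscal_year_end, :date)

# In-memory model of (PostgreSQL only):
#   SELECT DISTINCT ON (fiscal_year_end) *
#   FROM statements
#   ORDER BY fiscal_year_end, date DESC
# For each fiscal_year_end, keep the single row with the latest date.
def latest_per_fiscal_year(rows)
  rows.group_by(&:fiscal_year_end)
      .map { |_year, group| group.max_by(&:date) }
      .sort_by(&:fiscal_year_end)
end

rows = [
  Row.new(1, 2013, Date.new(2014, 1, 10)),
  Row.new(2, 2013, Date.new(2014, 3, 5)),
  Row.new(3, 2014, Date.new(2015, 2, 1)),
]

latest = latest_per_fiscal_year(rows)
puts latest.map(&:id).inspect
```

The database version does the same selection without instantiating the rows that lose out, which is the point of pushing the work into SQL.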
Oh one more tip: For debugging weird errors, I find it helpful to confirm what SQL ActiveRecord is trying to use. You can use to_sql to get that:
c = Company.first
puts c.statements.
group(:fiscal_year_end).
select("fiscal_year_end, MAX(date) AS max_date").
to_sql
In that example, I'm leaving off order(nil) so you can see that ActiveRecord is adding an ORDER BY clause you don't want.

For example, if you want to get all statements grouped by the start of their month, you can use this:
@company = Company.first
@statements = @company.statements.find(:all, :order => 'due_at, id', :limit => 50)
Then group them as you want:
@monthly_statements = @statements.group_by { |statement| statement.due_at.beginning_of_month }
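As a rough illustration of the grouping step without Rails loaded, the sketch below groups a few records by the first day of their month. `beginning_of_month` comes from ActiveSupport, so it is replaced here by a small helper; `Stmt`, `month_start`, and the sample dates are hypothetical.

```ruby
require 'date'

# Hypothetical stand-in for a Statement record with a due_at date.
Stmt = Struct.new(:id, :due_at)

# Plain-Ruby equivalent of ActiveSupport's Date#beginning_of_month.
def month_start(date)
  Date.new(date.year, date.month, 1)
end

statements = [
  Stmt.new(1, Date.new(2012, 1, 5)),
  Stmt.new(2, Date.new(2012, 1, 20)),
  Stmt.new(3, Date.new(2012, 2, 3)),
]

# Hash of month start => statements due in that month.
monthly = statements.group_by { |statement| month_start(statement.due_at) }
monthly.each do |month, group|
  puts "#{month}: #{group.map(&:id).inspect}"
end
```

Note that this grouping happens in Ruby after the rows are loaded, so it does not reduce the number of records fetched from the database.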

Building upon Bharat's answer, you can run this type of query in Rails using find_by_sql in this way:
Statement.find_by_sql ["SELECT t.* FROM statements t INNER JOIN (
SELECT fiscal_year_end, MAX(date) AS MaxDate FROM statements GROUP BY fiscal_year_end
) tm ON t.fiscal_year_end = tm.fiscal_year_end AND
t.date = tm.MaxDate WHERE t.company_id = ?", company.id]
Note the last where part to make sure the statements belong to a specific company instance, and that this is called from the class. I haven't tested this with the array form, but I believe you can turn this into a scope and use it like this:
# In Statement model
scope :latest_from_fiscal_year, lambda { |enterprise_id|
  find_by_sql([..., enterprise_id]) # Query above
}
# Wherever you need these statements for a particular company
company = Company.find(params[:id])
latest_statements = Statement.latest_from_fiscal_year(company.id)
Note that if you somehow need the latest statements for all companies, this will most likely leave you with an N+1 query problem. But that is a beast for another day.
Note: If anyone else has a way to have this query work on the association without using the last where part (company.statements.latest_from_year and such) let me know and I'll edit this, in my case in rails 3 it just pulled em from the whole table without filtering.

Related

Say `Post` is a model, a class that inherits from `ApplicationRecord`, in Rails. What does Post.arel_table.create_table_alias do?

Let's say I have this code:
new_and_updated = Post.where(:published_at => nil).union(Post.where(:draft => true))
post = Post.arel_table
Post.from(post.create_table_alias(new_and_updated, :posts))
I found this code in a post about Arel, but the post does not really explain what create_table_alias does, only that the end result is an ActiveRecord::Relation object representing the previously defined union. Why is it necessary to pass :posts as the second parameter to create_table_alias? Is this the name of the table in the database?
The Arel is essentially as follows
table_alias = Arel::Table.new(table_name) # "alias" itself is a reserved word in Ruby
table = Arel::Nodes::As.new(table_definition, table_alias)
This creates a SQL alias for the new table definition so that we can reference this in a query.
TL;DR
Let's explain how this works in terms of the code you posted.
new_and_updated= Post.where(:published_at => nil).union(Post.where(:draft => true))
This statement can be converted into the following SQL
SELECT
posts.*
FROM
posts
WHERE
posts.published_at IS NULL
UNION
SELECT
posts.*
FROM
posts
WHERE
posts.draft = 1
Well that is a great query but you cannot select from it as a subquery without a Syntax Error. This is where the alias comes in so this line (as explained above in terms of Arel)
post.create_table_alias(new_and_updated, :posts)
becomes
(SELECT
posts.*
FROM
posts
WHERE
posts.published_at IS NULL
UNION
SELECT
posts.*
FROM
posts
WHERE
posts.draft = 1) AS posts -- This is the alias
Now the wrapping Post.from can select from this sub-query such that the final query is
SELECT
posts.*
FROM
(SELECT
posts.*
FROM
posts
WHERE
posts.published_at IS NULL
UNION
SELECT
posts.*
FROM
posts
WHERE
posts.draft = 1) AS posts
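The wrapping step can be sketched with plain string building. This is not Arel: `alias_subquery` is a made-up helper, and real Arel also quotes identifiers, so treat it only as a picture of the shape of the SQL.

```ruby
# Wraps a subquery in parentheses and gives it an alias, so that an
# outer query can select from it by that name.
def alias_subquery(subquery_sql, alias_name)
  "(#{subquery_sql}) AS #{alias_name}"
end

# The two halves of the union from the question, as raw SQL.
union_sql = [
  "SELECT posts.* FROM posts WHERE posts.published_at IS NULL",
  "SELECT posts.* FROM posts WHERE posts.draft = 1",
].join(" UNION ")

from_clause = alias_subquery(union_sql, "posts")
final_sql = "SELECT posts.* FROM #{from_clause}"
puts final_sql
```

Because the alias is named posts, the outer `posts.*` resolves against the subquery instead of the real table, which is why the model's default select keeps working.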
BTW your query can be simplified a bit if you are using rails 5 and this removes the need for the rest of the code as well e.g.
Post.where(:published_at => nil).or(Post.where(:draft => true))
Will become
SELECT
posts.*
FROM
posts
WHERE
posts.published_at IS NULL OR posts.draft = 1
According to the official Rails documentation, the from query method does this:
Specifies table from which the records will be fetched.
So, in order to fetch posts from the new_and_updated relation, we need to have an alias table which is what post.create_table_alias(new_and_updated, :posts) is doing.
The Rubydoc for Arel's create_table_alias method tells us that this instance method is included in the Table module.
Here the :posts parameter specifies the name of the alias table to create, while new_and_updated provides the ActiveRecord::Relation object.
Hope that helps.

Postgres ORDER BY values in IN list using Rails Active Record

I receive a list of user IDs (about 1000 at a time) sorted by 'Income'. I have User records in "my system's database", but the 'Income' column is not there. I want to retrieve the Users from "my system's database" in the sorted order as received in the list. I tried the following using Active Record, expecting that the records would be retrieved in the same order as in the sorted list, but it does not work.
//PSEUDO CODE
User.all(:conditions => {:id => [SORTED LIST]})
I found an answer to a similar question at the link below, but am not sure how to implement the suggested solution using Active Record.
ORDER BY the IN value list
Is there any other way to do it?
Please guide.
Shardul.
The answer you linked to provides exactly what you need; you just need to code it in Ruby in a flexible manner.
Something like this:
class User < ActiveRecord::Base
  def self.find_as_sorted(ids)
    # Pair each id with its 1-based position in the list.
    # to_i keeps non-numeric input out of the SQL string.
    values = []
    ids.each_with_index do |id, index|
      values << "(#{id.to_i}, #{index + 1})"
    end
    relation = self.joins("JOIN (VALUES #{values.join(",")}) AS x (id, ordering) ON #{table_name}.id = x.id")
    relation = relation.order('x.ordering')
    relation
  end
end
In fact, you could easily put that in a module and mix it into any ActiveRecord classes that need it; since it uses table_name and self, it isn't tied to any specific class name.
MySQL users can do this via the FIELD function, but Postgres lacks it. However, this question has workarounds: Simulating MySQL's ORDER BY FIELD() in Postgresql
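The string-building part of the VALUES join can be tried on its own, without a database. The following is a minimal sketch; `values_join_sql` is a hypothetical helper name, and the strict `Integer()` conversion and the empty-list check are additions that the answer does not mention.

```ruby
# Builds the "JOIN (VALUES ...)" fragment used to order rows by the
# position of their id in a list. Integer() raises on anything that is
# not an integer, so untrusted text never reaches the SQL string.
def values_join_sql(table_name, ids)
  raise ArgumentError, "ids must not be empty" if ids.empty?

  values = ids.each_with_index.map do |id, index|
    "(#{Integer(id)}, #{index + 1})"
  end
  "JOIN (VALUES #{values.join(', ')}) AS x (id, ordering) " \
    "ON #{table_name}.id = x.id"
end

puts values_join_sql("users", [42, 7, 19])
```

The resulting fragment is what would be passed to joins, followed by an order on x.ordering.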

Is it possible to "dynamically" join a table only if that table is not joined yet?

I am using Ruby on Rails 3.2.2 and I would like to know whether, in scope methods, it is possible to "dynamically" join a table only if that table is not joined yet. That is, I have:
def self.scope_method_name(user)
joins(:joining_association_name).where("joining_table_name.user_id = ?", user.id)
end
I would like to make something like the following:
# Note: the following code is just a sample in order to understand what I mean.
def self.scope_method_name(user)
if table_is_joined?(joining_table_name)
where("joining_table_name.user_id = ?", user.id)
else
joins(:joining_association_name).where("joining_table_name.user_id = ?", user.id)
end
end
Is it possible / advisable to do that? If so, how could / should I proceed?
I would like to use this approach to avoid repeating the same table in the INNER JOIN clauses of SQL queries (in some cases the repeated table seems to make my SQL queries not work as expected), and so to use scope_method_name without having to care about the related SQL concerns (in my case, without caring whether the database table is already joined).
Note: SQL errors (for example, "ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'joining_table_name.user_id' in 'where clause'") could be raised when the database table has not been joined yet (for example, when you run code like ClassName.scope_method_name(@user) without previously joining joining_association_name and so without joining the related joining_table_name table).
There is the method loaded? to check if an association has been loaded. You could try to use that.
if association_name.loaded?
  where("joining_table_name.user_id = ?", user.id)
else
  joins(:joining_association_name).where("joining_table_name.user_id = ?", user.id)
end
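The "join only if not joined yet" idea itself can be shown with a toy query builder. This is deliberately not ActiveRecord: `ToyQuery`, `join_once`, and `joined?` are hypothetical names, and the sketch says nothing about how a real relation tracks its joins.

```ruby
# Toy query builder, NOT ActiveRecord: it records which tables have been
# joined so that asking for the same join twice adds it only once.
class ToyQuery
  def initialize(table)
    @table = table
    @joins = {}        # joined table name => ON condition
    @conditions = []
  end

  def joined?(table_name)
    @joins.key?(table_name)
  end

  # Adds the join only when the table has not been joined yet.
  def join_once(table_name, on)
    @joins[table_name] = on unless joined?(table_name)
    self
  end

  def where(condition)
    @conditions << condition
    self
  end

  def to_sql
    sql = "SELECT #{@table}.* FROM #{@table}"
    @joins.each { |name, on| sql += " INNER JOIN #{name} ON #{on}" }
    sql += " WHERE #{@conditions.join(' AND ')}" unless @conditions.empty?
    sql
  end
end

query = ToyQuery.new("articles")
query.join_once("comments", "comments.article_id = articles.id")
query.join_once("comments", "comments.article_id = articles.id") # no-op
query.where("comments.user_id = 1")
puts query.to_sql
```

With this shape, a scope-like method can always call join_once and never has to know whether an earlier step already added the table.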

Rails ActiveRecord Join

I'm using rails and am trying to figure out how to use ActiveRecord within the method to combine the following into one query:
def children_active(segment)
parent_id = Category.select('id').where('segment' => segment)
Category.where('parent_id'=>parent_id, 'active' => true)
end
Basically, I'm trying to get sub categories of a category that is designated by a unique column called segment. Right now, I'm getting the id of the category in the first query, and then using that value for the parent_id in the second query. I've been trying to figure out how to use AR to do a join so that it can be accomplished in just one query.
You can use a self join with an aliased table name:
Category.joins("LEFT OUTER JOIN categories AS segment_categories on segment_categories.id = categories.parent_id").where("segment_categories.segment = ?", segment).where("categories.active = ?", true)
This may not look so cool, but it performs the query in one statement, and it will lose much less performance than your solution when the data set is big, because "INCLUDE IN" is much slower than "JOIN".

rails - activerecord ... grab first result

I want to grab the most recent entry from a table. If I were just using SQL, I could do
Select top 1 * from table ORDER BY EntryDate DESC
I'd like to know if there is a good active record way of doing this.
I could do something like:
table.find(:order => 'EntryDate DESC').first
But it seems like that would grab the entire result set, and then use ruby to select the first result. I'd like ActiveRecord to create sql that only brings across one result.
You need something like:
Model.first(:order => 'EntryDate DESC')
which is shorthand for
Model.find(:first, :order => 'EntryDate DESC')
Take a look at the documentation for first and find for details.
The Rails documentation seems to be pretty subjective in this instance. Note that .first is the same as find(:first, blah...)
From:http://api.rubyonrails.org/classes/ActiveRecord/Base.html#M002263
"Find first - This will return the first record matched by the options used. These options can either be specific conditions or merely an order. If no record can be matched, nil is returned. Use Model.find(:first, *args) or its shortcut Model.first(*args)."
Digging into the ActiveRecord code, at line 1533 of base.rb (as of 9/5/2009), we find:
def find_initial(options)
options.update(:limit => 1)
find_every(options).first
end
This calls find_every which has the following definition:
def find_every(options)
include_associations = merge_includes(scope(:find, :include), options[:include])
if include_associations.any? && references_eager_loaded_tables?(options)
records = find_with_associations(options)
else
records = find_by_sql(construct_finder_sql(options))
if include_associations.any?
preload_associations(records, include_associations)
end
end
records.each { |record| record.readonly! } if options[:readonly]
records
end
Since it's doing a records.each, I'm not sure if the :limit is just limiting how many records it's returning after the query is run, but it sure looks that way (without digging any further on my own). Seems you should probably just use raw SQL if you're worried about the performance hit on this.
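Another reading of the code quoted above is that find_initial puts :limit => 1 into the options before construct_finder_sql runs, so the limit would end up in the generated SQL rather than being applied to the result afterwards. The sketch below models only that flow. It is not the Rails source: the `toy_*` methods are hypothetical, they handle just :order and :limit, and the exact limit syntax depends on the database adapter.

```ruby
# Simplified, hypothetical stand-in for the SQL-building step. It only
# handles :order and :limit, enough to show where the limit ends up.
def toy_finder_sql(table, options)
  sql = "SELECT * FROM #{table}"
  sql += " ORDER BY #{options[:order]}" if options[:order]
  sql += " LIMIT #{Integer(options[:limit])}" if options[:limit]
  sql
end

# Mirrors the shape of find_initial: force :limit => 1 into the options
# before any SQL is built.
def toy_find_initial_sql(table, options)
  toy_finder_sql(table, options.merge(:limit => 1))
end

puts toy_find_initial_sql("entries", :order => "EntryDate DESC")
```

Calling to_sql on a relation in newer Rails versions, or watching the development log, is a direct way to confirm what is actually sent to the database.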
Could just use find_by_sql http://api.rubyonrails.org/classes/ActiveRecord/Base.html#M002267
table.find_by_sql "Select top 1 * from table ORDER BY EntryDate DESC"
