Nested named scopes with joins (explosive error) - ruby-on-rails

So I have an ActiveRecord class with a couple different named scopes that include join parameters. While running a report, I happen to have a situation where one gets called inside of the other:
1 Model.scope_with_some_joins.find_in_batches do |models|
2 models.each do |mdl|
3 other_comparisons = Model.scope_with_other_joins
4 end
5 end
My problem is on line 3: I get a runtime error showing that the second query is, for some reason, keeping the join scope from the outer query. I really need it to run on its own, without sharing context with the outer query. Any thoughts or ideas?
(I should mention that the problem is an "ambiguous column" error, because one table is joined in by both queries.)

You're looking for
Model.with_exclusive_scope { ...do your find in here... }
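This will remove any scopes that are currently in use for the block. On newer Rails versions, where with_exclusive_scope is no longer available, Model.unscoped { ... } serves the same purpose.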
An example usage:
# in model.rb
def self.find_stuff
  self.scope_with_some_joins.find_in_batches do |models|
    models.each do |mdl|
      self.with_exclusive_scope do
        other_comparisons = self.scope_with_other_joins
      end
    end
  end
end
Then you query with Model.find_stuff. This way the logic is wrapped up in the model, not in the controller.

Related

Does splitting up an active record query over 2 methods hit the database twice?

I have a database query where I want to get an array of Users that are distinct for the set:
# @range is a predefined date range
# @shift_list is a list of filtered shifts
def listing
  Shift
    .where(date: @range, shiftname: @shift_list)
    .select(:user_id)
    .distinct
    .map { |id| User.find(id.user_id) }
    .sort
end
and I read somewhere that for readability, isolated testing, or code reuse, you could split this into separate methods:
def listing
  shift_list
    .select(:user_id)
    .distinct
    .map { |id| User.find(id.user_id) }
    .sort
end
def shift_list
  Shift
    .where(date: @range, shiftname: @shift_list)
end
So I rewrote this and some other code, and now the page takes 4 times as long to load.
My question is, does this type of method splitting cause the database to be hit twice? Or is it something that I did elsewhere?
And I'd love a suggestion to improve the efficiency of this code.
Further to the need to remove mapping from the code, this shift list is being created with the following code:
def _month_shift_list
  Shift
    .select(:shiftname)
    .distinct
    .where(date: @range)
    .map { |x| x.shiftname }
end
My intention is to create an array of shiftnames as strings.
I am obviously missing some key understanding in database access, as this method is clearly creating part of the problem.
And I think I have found the solution to this with the following:
def month_shift_list
  Shift
    .where(date: @range)
    .pluck(:shiftname)
    .uniq
end
Nope, the database will not be hit twice. The queries in both methods are lazily loaded, so chaining them across two methods still builds a single query. The slow page load comes from the map block: it calls User.find once per row, which translates to one SELECT per user. You can rewrite your query like this:
def listing
  User.
    joins(:shift).
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    distinct.
    sort
end
This has just one hit to the DB and will be much faster and should produce the same result as above.
The assumption here is that there is a has_one/has_many relationship on the User model for Shifts:
class User < ActiveRecord::Base
  has_one :shift
end
If you don't want to establish the has_one/has_many relationship on User, you can re-write it to:
def listing
  User.
    joins("INNER JOIN shifts ON shifts.user_id = users.id").
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    distinct.
    sort
end
ALTERNATIVE:
You can use 2 queries if you experience issues with using ActiveRecord#merge.
def listing
  user_ids = Shift.where(date: @range, shiftname: @shift_list).distinct.pluck(:user_id).sort
  User.find(user_ids)
end
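To make the cost difference concrete, here is a small plain-Ruby simulation of the query counts. It does not use ActiveRecord at all: FakeDb, its methods, and the sample data are hypothetical stand-ins, and every method call on FakeDb is counted as one database round trip. It is only a sketch of why one find per row is slower than a batched lookup.

```ruby
# Plain-Ruby simulation; FakeDb and its methods are hypothetical stand-ins,
# not part of ActiveRecord. Each public call counts as one round trip.
class FakeDb
  attr_reader :queries

  def initialize(users, shifts)
    @users = users     # { id => name }
    @shifts = shifts   # array of { user_id:, shiftname: }
    @queries = 0
  end

  # One round trip: distinct user ids of the matching shifts.
  def distinct_user_ids(shiftnames)
    @queries += 1
    @shifts.select { |s| shiftnames.include?(s[:shiftname]) }
           .map { |s| s[:user_id] }
           .uniq
  end

  # One round trip per call, like User.find(id) inside a map block.
  def find_user(id)
    @queries += 1
    @users.fetch(id)
  end

  # One round trip for the whole list, like User.find(ids).
  def find_users(ids)
    @queries += 1
    ids.map { |id| @users.fetch(id) }
  end
end

users = { 1 => "ann", 2 => "bob", 3 => "cy" }
shifts = [
  { user_id: 1, shiftname: "day" },
  { user_id: 2, shiftname: "day" },
  { user_id: 1, shiftname: "night" },
  { user_id: 3, shiftname: "night" }
]

# Pattern from the question: one find per distinct user id.
n_plus_one = FakeDb.new(users, shifts)
slow = n_plus_one.distinct_user_ids(%w[day night])
                 .map { |id| n_plus_one.find_user(id) }
                 .sort

# Pattern from the two-query alternative: fetch all users at once.
batched = FakeDb.new(users, shifts)
fast = batched.find_users(batched.distinct_user_ids(%w[day night])).sort

puts "per-record finds: #{n_plus_one.queries} queries"
puts "batched find:     #{batched.queries} queries"
```

Both variants return the same users; only the number of round trips differs, and the gap grows with the number of distinct users.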

Rails code refactor in call method to handle map

I'm wondering whether someone could take a fresh look at the code below and suggest a refactor.
def call
  inq_proc_ids = InquiryProcess.all.includes(inquiry_field_responses: :inquiry_field).select do |process|
    process.inquiry_field_responses.select do |inquiry_field_responses|
      inquiry_field_responses.inquiry_field.name == 'company_name'
    end.last&.value&.start_with?(company_filter)
  end.map(&:id)
  InquiryProcess.where(id: inq_proc_ids)
end
I think I should leave only InquiryProcess.where(id: inq_proc_ids) in my call method, but I don't know how to handle all the .last&.value&.start_with?(company_filter) and .map(&:id) parts.
EDIT:
I tried splitting it into new methods:
def call
  InquiryProcess.where(id: inquiry_process_id)
end

private

attr_reader :company_filter, :inquiry_field_response

def inquiry_process_id
  InquiryProcess.all.includes(inquiry_field_responses: :inquiry_field).select do |process|
    process.inquiry_field_responses.select_company_name
  end.map(&:id)
end

def select_company_name
  select do |inquiry_field_responses|
    inquiry_field_responses.inquiry_field.name == 'company_name'
  end.last&.value&.start_with?(company_filter)
end
but I got an error:
NoMethodError (undefined method `select_company_name' for #<ActiveRecord::Associations::CollectionProxy []>):
The code you posted is not only hard to follow, but I also remember that we had a massive memory leak connected to ActiveRecord caching when using precalculated ids in a query.
That said, I'd try to do the above within a single SQL query:
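def call
  # The question's code reads value from the response record, so the match is
  # done on that table; the InquiryFieldResponse class name is assumed from
  # the association name.
  id_select = InquiryProcess
    .joins(inquiry_field_responses: :inquiry_field)
    .where(inquiry_fields: { name: 'company_name' })
    .where(InquiryFieldResponse.arel_table[:value].matches("#{company_filter}%"))
    .select(:id)
  InquiryProcess.where(id: id_select)
end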
Note that id_select is not an array of ids but an ActiveRecord scope. The above will translate to the following SQL:
SELECT "inquiry_processes".*
FROM "inquiry_processes"
WHERE "inquiry_processes"."id" IN (
SELECT "inquiry_processes"."id"
FROM "inquiry_processes"
INNER JOIN ...
WHERE ...
)
And to answer another question: why do we query the table by matching id against the result of a subquery on the same table? This avoids all sorts of painful issues that come up when you deal with an ActiveRecord relation that has a join in it. For example, the join would affect all further includes statements, as the preloaded association would only include records matching the relation's join conditions.
For your sake, I really hope this bit is well tested or that you have someone who can verify the behaviour.

ActiveRecord query with alias'd table names

Using model concerns which include scopes, what is the best way to write these knowing that nested and/or self-referencing queries are likely?
In one of my concerns, I have scopes similar to these:
scope :current, ->(as_at = Time.now) { current_and_expired(as_at).current_and_future(as_at) }
scope :current_and_future, ->(as_at = Time.now) { where("#{upper_bound_column} IS NULL OR #{upper_bound_column} >= ?", as_at) }
scope :current_and_expired, ->(as_at = Time.now) { where("#{lower_bound_column} IS NULL OR #{lower_bound_column} <= ?", as_at) }
def self.lower_bound_column
  lower_bound_field
end
def self.upper_bound_column
  upper_bound_field
end
And is referred to via has_many's, example: has_many :company_users, -> { current }
If an ActiveRecord query is made which refers to a few models that include the concern, this results in an 'ambiguous column name' exception which makes sense.
To help overcome this, I changed the column name helper methods to:
def self.lower_bound_column
  "#{self.table_name}.#{lower_bound_field}"
end
def self.upper_bound_column
  "#{self.table_name}.#{upper_bound_field}"
end
Which works great, until you require self-referencing queries. Arel helps mitigate these issues by aliasing the table name in the resulting SQL, for example:
LEFT OUTER JOIN "company_users" "company_users_companies" ON "company_users_companies"."company_id" = "companies"."id"
and
INNER JOIN "company_users" ON "users"."id" = "company_users"."user_id" WHERE "company_users"."company_id" = $2
The issue here is that self.table_name no longer refers to the table name in the query. And this results in the tongue in cheek hint: HINT: Perhaps you meant to reference the table alias "company_users_companies"
In an attempt to migrate these queries over to Arel, I changed the column name helper methods to:
def self.lower_bound_column
  self.class.arel_table[lower_bound_field.to_sym]
end
def self.upper_bound_column
  self.class.arel_table[upper_bound_field.to_sym]
end
and updated the scopes to reflect:
lower_bound_column.eq(nil).or(lower_bound_column.lteq(as_at))
but this just ported the issue across since self.class.arel_table will always be the same regardless of the query.
I guess my question is: how do I create scopes that can be used in self-referencing queries and that require operators such as <= and >=?
Edits
I have created a basic application to help showcase this issue.
git clone git@github.com:fattymiller/expirable_test.git
cd expirable_test
createdb expirable_test-development
bundle install
rake db:migrate
rake db:seed
rails s
Findings and assumptions
Works in sqlite3, not Postgres. Most likely because Postgres enforces the order of queries in the SQL?
Well, well, well. After quite some time looking through the sources of Arel and ActiveRecord and through Rails issues (it seems this is not new), I was able to find a way to access the current arel_table object, with its table aliases if they are being used, inside the current scope at the moment of its execution.
That made it possible to know whether the scope is going to be used within a JOIN that has the table name aliased, or whether the scope can be used on the real table name.
I just added this method to your Expirable concern:
def self.current_table_name
  current_table = current_scope.arel.source.left
  case current_table
  when Arel::Table
    current_table.name
  when Arel::Nodes::TableAlias
    current_table.right
  else
    fail
  end
end
As you can see, I'm using current_scope as the base object to look for the Arel table, instead of the prior attempts with self.class.arel_table or even relation.arel_table, which, as you said, stay the same regardless of where the scope is used. Calling arel on that object gives an Arel::SelectManager, whose source holds the current table on its #left. At that point there are two options: either you have an Arel::Table (no alias, table name in #name) or an Arel::Nodes::TableAlias with the alias in its #right.
With that table_name you can revert to your first attempt of #{current_table_name}.#{lower_bound_field} and #{current_table_name}.#{upper_bound_field} in your scopes:
def self.lower_bound_column
  "#{current_table_name}.#{lower_bound_field}"
end
def self.upper_bound_column
  "#{current_table_name}.#{upper_bound_field}"
end
scope :current_and_future, ->(as_at = Time.now) { where("#{upper_bound_column} IS NULL OR #{upper_bound_column} >= ?", as_at) }
scope :current_and_expired, ->(as_at = Time.now) { where("#{lower_bound_column} IS NULL OR #{lower_bound_column} <= ?", as_at) }
This current_table_name method seems to me to be something that would be useful to have on the AR / Arel public API, so it can be maintained across version upgrades. What do you think?
If you are interested, here are some references I used down the road:
A similar question on SO, answered with a ton of code, that you could use instead of your beautiful and concise Ability.
This Rails issue and this other one.
And the commit on your test app on github that made tests green!
I have a slightly modified version of @dgilperez's approach, which uses the full power of Arel:
def self.current_table
  current_scope.arel.source.left
end
Now you can modify your methods to use arel_table syntax:
def self.lower_bound_column
  current_table[lower_bound_field.to_sym]
end
def self.upper_bound_column
  current_table[upper_bound_field.to_sym]
end
and use them in a query like this:
lower_bound_column.eq(nil).or(lower_bound_column.lteq(as_at))

Return a query with nested results with Rails ActiveRecord

I'm trying to grab a list of active tasks for a workflow tool I'm making, with the data structured like this:
User
  has_many Projects
    has_many Subprojects
      has_many Tasks
        has_many TimeLogs
Active tasks are defined as any task with a TimeLog that does not have a 'completed' timestamp.
I'm trying to make the main page display this full structure, but only show parent structures that have an active task at some level. Any Users/Projects/Subprojects that don't have an active task beneath them should not be returned by the query.
So far I've tried:
A join on all four tables, which produces duplicate rows
A WHERE EXISTS statement, which returns only the relevant users, but doesn't maintain the WHERE clause when I try to access its children
Is there a way to achieve this without manually culling the data in Ruby?
Because of the nesting, this might be quite complicated. As I understand it, you want to omit users, projects, and subprojects that do not have any active tasks. There's no simple SQL that will let you achieve that with:
users.each do |user|
  user.active_projects.each do |project|
    ...
  end
end
Instead, I would query for tasks first, i.e. @tasks = Task.includes(:subproject => [:project => :user]).where("status NOT IN ('completed')"). That gives you the incomplete tasks, and all that is left is to regroup the fetched data:
@tasks = @tasks
  .group_by { |t| t.subproject.project.user }
  .reduce({}) do |sum, (user, tasks)|
    sum[user] ||= {}
    tasks.each do |task|
      sum[user][task.subproject.project] ||= {}
      sum[user][task.subproject.project][task.subproject] ||= []
      sum[user][task.subproject.project][task.subproject] << task
    end
    sum
  end
I haven't tested this, but that's the idea.
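The regrouping idea can be tried without a database. Below is a minimal sketch in plain Ruby, where the four Struct classes are hypothetical stand-ins for the ActiveRecord models and the sample records are made up. It builds the same user => project => subproject => [tasks] nesting from a flat list of active tasks.

```ruby
# Hypothetical stand-ins for the ActiveRecord models, so the sketch runs
# without a database.
User       = Struct.new(:name)
Project    = Struct.new(:name, :user)
Subproject = Struct.new(:name, :project)
Task       = Struct.new(:name, :subproject)

alice = User.new("alice")
site  = Project.new("site", alice)
api   = Subproject.new("api", site)
ui    = Subproject.new("ui", site)

# Only tasks that are still active, as the task query would return them.
active_tasks = [
  Task.new("auth", api),
  Task.new("rate limit", api),
  Task.new("navbar", ui)
]

# Walk the flat list once and create each nesting level on first use.
tree = active_tasks.each_with_object({}) do |task, users|
  subproject = task.subproject
  project    = subproject.project
  by_project = users[project.user] ||= {}
  by_sub     = by_project[project] ||= {}
  (by_sub[subproject] ||= []) << task
end

# Print the nested structure; parents without active tasks never appear
# because they are only reached through a task.
tree.each do |user, projects|
  puts user.name
  projects.each do |project, subprojects|
    puts "  #{project.name}"
    subprojects.each do |subproject, tasks|
      puts "    #{subproject.name}: #{tasks.map(&:name).join(', ')}"
    end
  end
end
```

Because the tree is built bottom-up from tasks, no separate filtering of users, projects, or subprojects is needed.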
You could try this based on the activerecord queries documents:
# assumes the timestamp column on time_logs is named completed
user.projects.joins(subprojects: { tasks: :time_logs }).where(time_logs: { completed: nil }).distinct
I can't test it, but give it a shot.

Texticle and ActsAsTaggableOn

I'm trying to implement search over tags as part of a Texticle search. Since texticle doesn't search over multiple tables from the same model, I ended up creating a new model called PostSearch, following Texticle's suggestion about System-Wide Searching
class PostSearch < ActiveRecord::Base
  # We want to reference various models
  belongs_to :searchable, :polymorphic => true

  # Wish we could eliminate n + 1 query problems,
  # but we can't include polymorphic models when
  # using scopes to search in Rails 3
  # default_scope :include => :searchable

  # Search.new('query') to search for 'query'
  # across searchable models
  def self.new(query)
    debugger
    query = query.to_s
    return [] if query.empty?
    self.search(query).map!(&:searchable)
    #self.search(query) <-- this works, not sure why I shouldn't use it.
  end

  # Search records are never modified
  def readonly?; true; end

  # Our view doesn't have primary keys, so we need
  # to be explicit about how to tell different search
  # results apart; without this, we can't use :include
  # to avoid n + 1 query problems
  def hash
    id.hash
  end

  def eql?(result)
    id == result.id
  end
end
In my Postgres DB I created a view like this:
CREATE VIEW post_searches AS
SELECT posts.id, posts.name, string_agg(tags.name, ', ') AS tags
FROM posts
LEFT JOIN taggings ON taggings.taggable_id = posts.id
LEFT JOIN tags ON taggings.tag_id = tags.id
GROUP BY posts.id;
This allows me to get posts like this:
SELECT * FROM post_searches
 id | name  | tags
----+-------+---------------------------
  1 | Intro | introduction, funny, nice
So it seems like that should all be fine. Unfortunately calling
PostSearch.new("funny") returns [nil] (NOT []). Looking through the Texticle source code, it seems like this line in the PostSearch.new
self.search(query).map!(&:searchable)
maps the fields using some sort of searchable_columns method and does it ?incorrectly? and results in a nil.
On a different note, the tags field doesn't get searched in the texticle SQL query unless I cast it from a text type to a varchar type.
So, in summary:
Why does the object get mapped to nil when it is found?
AND
Why does texticle ignore my tags field unless it is varchar?
Texticle maps objects to nil instead of nothing so that you can check for nil? - it's a safeguard against erroring out checking against non-existent items. It might be worth asking tenderlove himself as to exactly why he did it that way.
I'm not completely positive as to why Texticle ignores non-varchars, but it looks like it's a performance safeguard so that Postgres does not do full table scans (under the section Creating Indexes for Super Speed):
You will need to add an index for every text/string column you query against, or else Postgresql will revert to a full table scan instead of using the indexes.
