Arel: join table, sort on joined field - ruby-on-rails

I have the following ActiveRecord Models
class Publication < ActiveRecord::Base
  attr_accessible :id, :pname
  has_many :advertisements
end
class Vendor < ActiveRecord::Base
  attr_accessible :id, :vname
  has_many :advertisements
end
class Advertisement < ActiveRecord::Base
  attr_accessible :id, :vendor_id, :publication_id, :prose, :aname
  belongs_to :vendor
  belongs_to :publication
end
The tables for these have the same fields as their accessible attributes.
I would like to be able to sort on the publication name, ad name, or vendor name, ascending or descending.
I also have a controller for the advertisements, where I want to display a list of ads. The list displays the name of the ad (aname), prose of the ad (prose), the name of the vendor (vname), and the name of the publication (pname).
The SQL query for ordering by publication name would look something like:
SELECT ads.aname AS aname, ads.id, ads.prose, ven.vname AS vname, pub.pname AS pname
FROM advertisements AS ads
INNER JOIN publications AS pub ON ads.publication_id = pub.id
INNER JOIN vendors AS ven ON ads.vendor_id = ven.id
ORDER BY <sort_column> <sort_order>
Where sort_column could be one of "pname", "aname", or "vname", and sort_order could be one of "ASC" or "DESC", and both would come as parameters from the web form along with the pagination page number.
The controller index code looks like this:
class AdvertisementsController < ApplicationController
  def index
    sort_column = params[:sort_column]
    sort_order = params[:sort_order]
    @ads = Advertisement.join( somehow join tables)
      .where(some condition).where(some other condition)
      .order("#{sort_column} #{sort_order}") ### I don't know what to do here
      .paginate(page: params[:page], per_page: 10) #from will_paginate
  end
  # other controller methods.......
end
The index view table snippet (written in SLIM) looks like this:
- @ads.each do |ad|
  tr
    td = ad.id
    td = ad.aname
    td = ad.pname
    td = ad.vname
I am aware that I could use Arel to do this. I have been experimenting with Arel in the Rails console, trying to generate and execute this query with pagination, and reading tutorials on the web, but I can't figure out how to express this query in Arel with sorting on joined fields while still being able to paginate it with will_paginate.
How does one use Arel, or even ActiveRecord, to do this? I appreciate any help I can get.

You can accomplish what you want with vanilla ActiveRecord methods, without Arel. What you have is pretty close; this might help you get there.
# whitelist incoming params
sort_column = %w(pname aname vname).include?(params[:sort_column]) ? params[:sort_column] : "pname"
sort_order = %w(asc desc).include?(params[:sort_order]) ? params[:sort_order] : "desc"
@ads = Advertisement.select("advertisements.*, vendors.vname, publications.pname").
  joins(:publication, :vendor).
  where(some condition).
  where(some other condition).
  order("#{sort_column} #{sort_order}").
  page(params[:page]).per_page(10)
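Because the `order` string is built by interpolation, the whitelist is what keeps it safe. A slightly more defensive variant, sketched below, maps each allowed sort key to a table-qualified column, which also avoids ambiguous-column errors if the joined tables ever share a column name. `SORTABLE_COLUMNS` and `order_clause` are hypothetical names, not part of Rails or will_paginate.

```ruby
# Hypothetical whitelist: maps the sort keys the form may send to
# table-qualified columns, so ORDER BY never sees raw user input.
SORTABLE_COLUMNS = {
  "pname" => "publications.pname",
  "aname" => "advertisements.aname",
  "vname" => "vendors.vname"
}.freeze

SORT_DIRECTIONS = %w[asc desc].freeze

# Returns an ORDER BY fragment such as "vendors.vname ASC", falling back to
# the defaults whenever the column or direction is not whitelisted.
def order_clause(column, direction, default_column: "pname", default_direction: "desc")
  qualified = SORTABLE_COLUMNS.fetch(column.to_s) { SORTABLE_COLUMNS.fetch(default_column) }
  dir = direction.to_s.downcase
  dir = default_direction unless SORT_DIRECTIONS.include?(dir)
  "#{qualified} #{dir.upcase}"
end

puts order_clause("vname", "ASC")            # vendors.vname ASC
puts order_clause("1; DROP TABLE ads", nil)  # publications.pname DESC
```

In the controller this would be used as `.order(order_clause(params[:sort_column], params[:sort_order]))` in place of the interpolated string.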

You can make the solution work with either Arel or ActiveRecord. I would suggest you stick to ActiveRecord as much as you can, unless you can't do it with AR.
Arel is great, but lately I have seen in my code base that it reduces overall readability, especially if you mix it with AR or use too much of it.
Also, a couple of other suggestions:
On the same query above, try using "includes" instead of "joins". You might find that it's easier than having to add the select clause. I use includes for outer joins, but for a more detailed comparison, search for "includes vs joins"; it is very interesting.
In the complete opposite direction of my first suggestion, if your queries are going to get complex, I highly recommend using https://github.com/activerecord-hackery/ransack or https://github.com/activerecord-hackery/squeel for your use case, especially if you are not doing the above just for learning purposes.

Related

Searching multiple tables with PostgreSQL 13 and Rails 6+

I provide a lot of context to set the stage for the question. What I'm trying to solve is fast and accurate fuzzysearch against multiple database tables using structured data, not full-text document search.
I'm using PostgreSQL 13.4+ and Rails 6+ if it matters.
I have fairly structured data for several tables:
class Contact
  attribute :id
  attribute :first_name
  attribute :last_name
  attribute :email
  attribute :phone
end
class Organization
  attribute :name
  attribute :license_number
end
...several other tables...
I'm trying to implement a fast and accurate fuzzysearch so that I can search across all these tables (Rails models) at once.
Currently I have a separate search query using ILIKE that concats the columns I want to search against on-the-fly for each model:
# contact.rb
scope :search, ->(q) { where("concat_ws(' ', first_name, last_name, email, phone) ILIKE :q", q: "%#{q}%") }
# organization.rb
scope :search, ->(q) { where("concat_ws(' ', name, license_number) ILIKE :q", q: "%#{q}%") }
In my search controller I query each of these tables separately and display the top 3 results for each model.
@contacts = Contact.search(params[:q]).limit(3)
@organizations = Organization.search(params[:q]).limit(3)
This works but is fairly slow and not as accurate as I would like.
Problems with my current approach:
Slow (relatively speaking) with only thousands of records.
Not accurate because ILIKE must have an exact match somewhere in the string and I want to implement fuzzysearch (i.e., with ILIKE, "smth" would not match "smith").
Not weighted; I would like to weight the contacts.last_name column over, say, organizations.name because the contacts table is, generally speaking, the higher-priority search item.
My solution
My theoretical solution is to create a search_entries polymorphic table that has a separate record for each contact, organization, etc, that I want to search against, and then this search_entries table could be indexed for fast retrieval.
class SearchEntry
  attribute :data
  belongs_to :searchable, polymorphic: true
  # Store data as all lowercase to optimize search (avoid lower method in PG)
  def data=(text)
    self[:data] = text.downcase
  end
end
However, what I'm getting stuck on is how to structure this table so that it can be indexed and searched quickly.
contact = Contact.first
SearchEntry.create(searchable: contact, data: "#{contact.first_name} #{contact.last_name} #{contact.email} #{contact.phone}")
organization = Organization.first
SearchEntry.create(searchable: organization, data: "#{organization.name} #{organization.license_number}")
This gives me the ability to do something like:
SearchEntry.where("data LIKE :q", q: "%#{q.downcase}%")
or even something like fuzzysearch using PG's similarity() function:
SearchEntry.connection.execute("SELECT * FROM search_entries ORDER BY similarity(data, #{SearchEntry.connection.quote(q)}) DESC LIMIT 10")
I believe I can use a GIN index with pg_trgm on this data field as well to optimize searching (not 100% on that...).
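The reason trigram matching helps with the "smth"/"smith" case from the problem list is that pg_trgm compares strings by the three-character chunks they share instead of requiring a contiguous substring. Below is a rough plain-Ruby approximation of that idea; the `trigrams` and `similarity` helpers are illustrative, not pg_trgm itself, and ignore locale and multibyte details. Each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings.

```ruby
require "set"

# Rough re-implementation of pg_trgm-style trigram extraction, for illustration:
# lowercase, split on non-alphanumerics, pad each word with two leading
# spaces and one trailing space, then collect the distinct 3-character windows.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# similarity = shared trigrams / all distinct trigrams across both strings
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 0.0 if union.empty?
  (ta & tb).size.to_f / union.size
end

puts similarity("smith", "smth")        # 0.375
puts "smith".include?("smth")           # false, the substring test ILIKE '%smth%' performs
```

By this measure the pair scores 0.375, above pg_trgm's default similarity threshold of 0.3, while the plain substring test fails. A GIN or GiST trigram index is what can speed up filters such as `data % 'smth'` or `data ILIKE '%smth%'` in the database.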
This simplifies my search into a single query on a single table, but it still doesn't allow me to do weighted column searching (ie, contacts.last_name is more important than organizations.name).
Questions
Would this approach enable me to index the data so that I could have very fast fuzzysearch? (I know "very fast" is subjective, so what I mean is an efficient usage of PG to get results as quickly as possible).
Would I be able to use a GIN index combined with pg_trgm tri-grams to index this data for fast fuzzysearch?
How would I implement weighting certain values higher than others in an approach like this?
One potential solution is to create a materialized view consisting of a union of data from the two (or more) tables. Take this simplified example:
CREATE MATERIALIZED VIEW searchables AS
SELECT
  resource_id,
  resource_type,
  name,
  weight
FROM (
  SELECT
    id AS resource_id,
    'Contact' AS resource_type,
    concat_ws(' ', first_name, last_name) AS name,
    2 AS weight
  FROM contacts
  UNION
  SELECT
    id AS resource_id,
    'Organization' AS resource_type,
    name,
    1 AS weight
  FROM organizations
) AS combined;
class Searchable < ApplicationRecord
  belongs_to :resource, polymorphic: true
  def readonly?
    true
  end
  # Search contacts and organizations with a higher weight on contacts
  def self.search(name)
    where(arel_table[:name].matches("%#{sanitize_sql_like(name)}%")).order(weight: :desc)
  end
end
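The ranking that the view plus `Searchable.search` performs can be mimicked in plain Ruby, which may help when reasoning about the weights. This is an illustration only: `ContactRow`, `OrgRow`, `SearchRow`, `build_searchables` and `search` are hypothetical, and a case-insensitive substring test stands in for whatever operator (ILIKE or a pg_trgm operator) the real query uses.

```ruby
# Hypothetical in-memory stand-ins for the contacts and organizations tables.
ContactRow = Struct.new(:id, :first_name, :last_name)
OrgRow = Struct.new(:id, :name)
SearchRow = Struct.new(:resource_id, :resource_type, :name, :weight)

# Mirrors the view's UNION: one row per record, tagged with its type and a
# per-type weight. Contacts get the larger weight so that ordering by weight
# descending ranks them first. compact mirrors concat_ws skipping NULLs.
def build_searchables(contacts, orgs)
  contacts.map { |c| SearchRow.new(c.id, "Contact", [c.first_name, c.last_name].compact.join(" "), 2) } +
    orgs.map { |o| SearchRow.new(o.id, "Organization", o.name, 1) }
end

# Mirrors Searchable.search: case-insensitive substring match, heaviest first,
# with name as a tie-breaker so the order is deterministic.
def search(rows, term)
  needle = term.downcase
  rows.select { |r| r.name.downcase.include?(needle) }
      .sort_by { |r| [-r.weight, r.name] }
end

rows = build_searchables([ContactRow.new(1, "Ann", "Smith")], [OrgRow.new(7, "Smith & Co")])
p search(rows, "SMITH").map(&:resource_type) # ["Contact", "Organization"]
```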
Since materialized views are stored in a table-like structure, you can apply indices just like you could with a normal table (the gist_trgm_ops operator class comes from the pg_trgm extension, which must be enabled):
CREATE INDEX searchables_name_trgm ON searchables USING gist (name gist_trgm_ops);
To ActiveRecord it also behaves just like a normal table.
Of course, the complexity here will grow with the number of columns you want to search, and the end result might end up both underwhelming in functionality and overwhelming in complexity compared to an off-the-shelf solution with thousands of hours behind it.
The scenic gem can be used to make the migrations for creating materialized views simpler.

ActiveRecord query for Users who don't own a Car

How do I get all the users who do not have a car?
class User < ActiveRecord::Base
  has_one :car
end
class Car < ActiveRecord::Base
  belongs_to :user
end
I was doing the following:
all.select {|user| not user.car }
That worked perfectly until my database of users and cars got too big, and now I get strange errors, especially when I try to sort the result. I need to do both the filtering and the ordering as part of the query.
UPDATE: What I did was the following:
where('id not in (?)', Car.pluck(:user_id)).order('first_name, last_name, middle_name')
It's fairly slow as Rails has to grab all the user_ids from the cars table and then issue a giant query. I know I can do a sub-query in SQL, but there must be a better Rails/ActiveRecord way.
UPDATE 2: I now have a noticeably more efficient query:
includes(:car).where(cars: {id: nil})
The answer I accepted below uses joins with a SQL string instead of includes. I don't know if includes is less efficient because it stores the nil data in Ruby objects whereas joins might not? I like not using strings...
One way is to use a left join from the users table to the cars table and only take user entries that don't have any corresponding values in the cars table, this looks like:
User.select('users.*').joins('LEFT JOIN cars ON users.id = cars.user_id').where('cars.id IS NULL')
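To see why `WHERE cars.id IS NULL` picks out exactly the car-less users, here is a plain-Ruby model of what the database does with the LEFT JOIN. It is purely illustrative: `UserRow`, `CarRow` and the helper methods are hypothetical, and in practice you want the database to do this in one query, as above.

```ruby
# Hypothetical stand-ins for rows of the users and cars tables.
UserRow = Struct.new(:id, :first_name)
CarRow = Struct.new(:id, :user_id)

# LEFT OUTER JOIN: every user yields at least one row; a user with no
# matching car is paired with nil, as SQL fills the cars columns with NULL.
def left_outer_join(users, cars)
  users.flat_map do |user|
    matches = cars.select { |car| car.user_id == user.id }
    matches.empty? ? [[user, nil]] : matches.map { |car| [user, car] }
  end
end

# WHERE cars.id IS NULL: keep only the rows where no car matched.
def users_without_cars(users, cars)
  left_outer_join(users, cars).select { |_user, car| car.nil? }.map(&:first)
end

users = [UserRow.new(1, "Ann"), UserRow.new(2, "Bob"), UserRow.new(3, "Cy")]
cars = [CarRow.new(10, 1)]
p users_without_cars(users, cars).map(&:first_name) # ["Bob", "Cy"]
```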
Most of the work that needs to be done here is SQL. Try this:
User.joins("LEFT OUTER JOIN cars ON users.id = cars.user_id").where("cars.id IS NULL")
It is incredibly inefficient to do this in Ruby, as you appear to be trying to do.
You can throw an order on there too:
User.
  joins("LEFT OUTER JOIN cars ON users.id = cars.user_id").
  where("cars.id IS NULL").
  order(:first_name, :last_name, :middle_name)
You can make this a scope on your User model so you only have one place to deal with it:
class User < ActiveRecord::Base
  has_one :car
  def self.without_cars
    joins("LEFT OUTER JOIN cars ON users.id = cars.user_id").
      where("cars.id IS NULL").
      order(:first_name, :last_name, :middle_name)
  end
end
This way you can do:
User.without_cars
In your controller or another method, or even chain the scope:
User.without_cars.where("users.birthday > ?", 18.years.ago)
to find users without cars that are under 18 years old (arbitrary example, but you get the idea). My point is, this kind of thing should always be made into a scope, so it can be chained with other scopes :) Arel is awesome that way.

ActiveRecord query array intersection?

I'm trying to figure out the count of certain types of articles. I have a very inefficient query:
Article.where(status: 'Finished').select { |x| (x.tags & Article::EXPERT_TAGS).any? }.size
In my quest to be a better programmer, I'm wondering how to make this a faster query. tags is an array of strings in Article, and Article::EXPERT_TAGS is another array of strings. I want to find the intersection of the arrays, and get the resulting record count.
EDIT: Article::EXPERT_TAGS and article.tags are defined as Mongo arrays. These arrays hold strings, and I believe they are serialized strings. For example: Article.first.tags = ["Guest Writer", "News Article", "Press Release"]. Unfortunately this is not set up properly as a separate table of Tags.
2nd EDIT: I'm using MongoDB, so actually it is using a MongoWrapper like MongoMapper or mongoid, not ActiveRecord. This is an error on my part, sorry! Because of this error, it screws up the analysis of this question. Thanks PinnyM for pointing out the error!
Since you are using MongoDB, you could also consider a MongoDB-specific solution (aggregation framework) for the array intersection, so that you could get the database to do all the work before fetching the final result.
See this SO thread: How to check if an array field is a part of another array in MongoDB?
Assuming that the entire tags list is stored in a single database field and that you want to keep it that way, I don't see much scope for improvement, since you need to get all the data into Ruby for processing.
However, there is one problem with your database query:
Article.where(status: 'Finished')
# This translates into the following query
SELECT * FROM articles WHERE status = 'Finished'
Essentially, you are fetching all the columns whereas you only need the tags column for your process. So, you can use pluck like this:
Article.where(status: 'Finished').pluck(:tags)
# This translates into the following query
SELECT tags FROM articles WHERE status = 'Finished'
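Working from the plucked arrays, the remaining filtering can at least be done cheaply in Ruby. One Ruby pitfall affects any `select { |x| x.tags & EXPERT_TAGS }` style filter: an empty array is truthy, so such a block keeps every record, and the filter has to test for a non-empty intersection explicitly. A sketch, with a hypothetical `EXPERT_TAGS` set standing in for `Article::EXPERT_TAGS`:

```ruby
require "set"

# Hypothetical stand-in for Article::EXPERT_TAGS.
EXPERT_TAGS = Set["Guest Writer", "Press Release"].freeze

# Counts tag lists that share at least one tag with EXPERT_TAGS.
# Array(tags) treats a missing (nil) tags field as an empty list.
def expert_count(tag_lists)
  tag_lists.count { |tags| Array(tags).any? { |tag| EXPERT_TAGS.include?(tag) } }
end

plucked = [
  ["Guest Writer", "News Article", "Press Release"],
  ["News Article"],
  []
]
puts expert_count(plucked) # 1
```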
I answered a question regarding general intersection-like queries in ActiveRecord here.
Extracted below:
The following is a general approach I use for constructing intersection-like queries in ActiveRecord:
class Service < ActiveRecord::Base
  belongs_to :person
  def self.with_types(*types)
    where(service_type: types)
  end
end
class City < ActiveRecord::Base
  has_and_belongs_to_many :services
  has_many :people, inverse_of: :city
end
class Person < ActiveRecord::Base
  belongs_to :city, inverse_of: :people
  has_many :services, inverse_of: :person # needed for joins(:services) below
  def self.with_cities(cities)
    where(city_id: cities)
  end
  # intersection-like query
  # (`scoped` is Rails 3; on Rails 4+ use `all` as the starting scope)
  def self.with_all_service_types(*types)
    types.map { |t|
      joins(:services).merge(Service.with_types t).select(:id)
    }.reduce(scoped) { |scope, subquery|
      scope.where(id: subquery)
    }
  end
end
Person.with_all_service_types(1, 2)
Person.with_all_service_types(1, 2).with_cities(City.where(name: 'Gold Coast'))
It will generate SQL of the form:
SELECT "people".*
FROM "people"
WHERE "people"."id" in (SELECT "people"."id" FROM ...)
AND "people"."id" in (SELECT ...)
AND ...
You can create as many subqueries as required with the above approach based on any conditions/joins etc so long as each subquery returns the id of a matching person in its result set.
Each subquery result set will be AND'ed together thus restricting the matching set to the intersection of all of the subqueries.
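The AND-ing of subqueries described above amounts to a set intersection over person ids. A plain-Ruby model of the same reduction may make that concrete; `ServiceRow`, `person_ids_with_type` and `with_all_service_types` here are hypothetical stand-ins, not the ActiveRecord methods above, and the database does this work itself in the real query.

```ruby
require "set"

# Hypothetical rows standing in for the services table.
ServiceRow = Struct.new(:person_id, :service_type)

# Plays the role of one subquery: ids of people offering service type `type`.
def person_ids_with_type(services, type)
  services.select { |s| s.service_type == type }.map(&:person_id).to_set
end

# Like reduce(scoped) { |scope, sub| scope.where(id: sub) }: start from every
# id and AND each subquery in, which is a running set intersection.
def with_all_service_types(all_person_ids, services, *types)
  types.reduce(all_person_ids.to_set) do |ids, type|
    ids & person_ids_with_type(services, type)
  end
end

services = [ServiceRow.new(1, 1), ServiceRow.new(1, 2), ServiceRow.new(2, 1)]
p with_all_service_types([1, 2, 3], services, 1, 2).to_a # [1]
```

Starting the reduction from every id (the role `scoped` plays) means that passing no types matches everyone, just as the relation with no extra `where` clauses would.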

How to write complex query in Ruby

I need advice on how to write a complex query in Ruby.
Query in PHP project:
$get_trustee = db_query("SELECT t.trustee_name,t.secret_key,t.trustee_status,t.created,t.user_id,ui.image from trustees t
left join users u on u.id = t.trustees_id
left join user_info ui on ui.user_id = t.trustees_id
WHERE t.user_id='$user_id' AND trustee_status ='pending'
group by secret_key
ORDER BY t.created DESC")
My guess in Ruby:
get_trustee = Trustee.find_by_sql(['SELECT t.trustee_name, t.secret_key, t.trustee_status, t.created, t.user_id, ui.image FROM trustees t
LEFT JOIN users u ON u.id = t.trustees_id
LEFT JOIN user_info ui ON ui.user_id = t.trustees_id
WHERE t.user_id = ? AND
t.trustee_status = ?
GROUP BY secret_key
ORDER BY t.created DESC',
user_id, 'pending'])
Option 1 (Okay)
Do you mean Ruby with ActiveRecord? Are you using ActiveRecord and/or Rails? #find_by_sql is a method that exists within ActiveRecord. Also, it seems like the users table isn't really needed in this query, but maybe you left something out? Either way, I'll include it in my examples. This query would work even if you haven't set up your relationships:
users_trustees = Trustee.
  select('trustees.*, ui.image').
  joins('LEFT OUTER JOIN users u ON u.id = trustees.trustees_id').
  joins('LEFT OUTER JOIN user_info ui ON ui.user_id = trustees.trustees_id').
  where(user_id: user_id, trustee_status: 'pending').
  order('trustees.created DESC')
Also, be aware of a few things with this solution:
I have not found a super elegant way to get the columns from the join tables out of the ActiveRecord objects that get returned. You can access them with users_trustees.each { |u| u['image'] }
This query isn't really THAT complex and ActiveRecord relationships make it much easier to understand and maintain.
I'm assuming you're using a legacy database and that's why your columns are named this way. If I'm wrong and you created these tables for this app, then your life would be much easier (and conventional) with your primary keys being called id and your timestamps being called created_at and updated_at.
Option 2 (Better)
If you set up your ActiveRecord relationships and classes properly, then this query is much easier:
class Trustee < ActiveRecord::Base
  self.primary_key = 'trustees_id' # wouldn't be needed if the column was id
  has_one :user
  has_one :user_info
end
class User < ActiveRecord::Base
  belongs_to :trustee, foreign_key: 'trustees_id' # relationship can also go the other way
end
class UserInfo < ActiveRecord::Base
  self.table_name = 'user_info'
  belongs_to :trustee
end
Your "query" can now be ActiveRecord goodness if performance isn't paramount. The Ruby convention is readability first, reorganizing code later if stuff starts to scale.
Let's say you want to get a trustee's image:
trustee = Trustee.where(trustees_id: 5).first
if trustee
  image = trustee.user_info.image
  ..
end
Or if you want to get all trustee's images:
Trustee.all.collect { |t| t.user_info.try(:image) } # using a #try in case user_info is nil
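On Ruby 2.3 or later, the safe-navigation operator `&.` gives the same nil guard as `try` without relying on ActiveSupport. One difference: in Rails 4+, `try` also returns nil when the receiver does not respond to the method, whereas `&.` only guards against nil. A sketch with hypothetical structs standing in for the models:

```ruby
# Hypothetical stand-ins: a trustee whose user_info association may be nil.
UserInfoRow = Struct.new(:image)
TrusteeRow = Struct.new(:name, :user_info)

# `&.` short-circuits to nil when user_info is nil instead of raising NoMethodError.
def trustee_images(trustees)
  trustees.map { |t| t.user_info&.image }
end

trustees = [
  TrusteeRow.new("alice", UserInfoRow.new("alice.png")),
  TrusteeRow.new("bob", nil)
]
p trustee_images(trustees) # ["alice.png", nil]
```

With the real models, `Trustee.includes(:user_info)` would load the associated rows up front instead of issuing one query per trustee.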
Option 3 (Best)
It seems like a trustee is just a special-case user of some sort. You can use STI if you don't mind restructuring your tables to simplify even further.
This is probably outside of the scope of this question so I'll just link you to the docs on this: http://api.rubyonrails.org/classes/ActiveRecord/Base.html see "Single Table Inheritance". Also see the article that they link to from Martin Fowler (http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html)
Resources
http://guides.rubyonrails.org/association_basics.html
http://guides.rubyonrails.org/active_record_querying.html
Yes, find_by_sql will work. You can also try this:
Trustee.connection.execute('...')
or for generic queries:
ActiveRecord::Base.connection.execute('...')

How to filter association_ids for an ActiveRecord model?

In a domain like this:
class User
  has_many :posts
  has_many :topics, :through => :posts
end
class Post
  belongs_to :user
  belongs_to :topic
end
class Topic
  has_many :posts
end
I can read all the Topic ids through user.topic_ids, but I can't see a way to apply filtering conditions to this method, since it returns an Array instead of an ActiveRecord::Relation.
The problem is, given a User and an existing set of Topics, marking the ones for which there is a post by the user. I am currently doing something like this:
def mark_topics_with_post(user, topics)
  # only returns the ids of the topics for which this user has a post
  topic_ids = user.topic_ids
  topics.each {|t| t[:has_post]=topic_ids.include?(t.id)}
end
But this loads all the topic ids regardless of the input set. Ideally, I'd like to do something like
def mark_topics_with_post(user, topics)
  # only returns the topics where user has a post within the subset of interest
  topic_ids = user.topic_ids.where(:id=>topics.map(&:id))
  topics.each {|t| t[:has_post]=topic_ids.include?(t.id)}
end
But the only thing I can do concretely is
def mark_topics_with_post(user, topics)
  # needlessly create Post objects only to unwrap them later
  topic_ids = user.posts.where(:topic_id=>topics.map(&:id)).select(:topic_id).map(&:topic_id)
  topics.each {|t| t[:has_post]=topic_ids.include?(t.id)}
end
Is there a better way?
Is it possible to have something like select_values on an association or scope?
FWIW, I'm on Rails 3.0.x, but I'd be curious about 3.1 too.
Why am I doing this?
Basically, I have a result page for a semi-complex search (which happens based on the Topic data only), and I want to mark the results (Topics) as stuff on which the user has interacted (wrote a Post).
So yeah, there is another option, which would be doing a join [Topic, Post] so that the results come out of the search already marked or not, but this would destroy my ability to cache the Topic query (the query, even without the join, is more expensive than fetching only the ids for the user).
Note that the approaches outlined above do work; they just feel suboptimal.
I think that your second solution is almost the optimal one (from the point of view of the queries involved), at least with respect to the one you'd like to use.
user.topic_ids generates the query:
SELECT `topics`.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1
If user.topic_ids.where(:id=>topics.map(&:id)) were possible, it would have generated this:
SELECT topics.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1 AND `topics`.`id` IN (...)
This is exactly the same query that is generated by doing: user.topics.select("topics.id").where(:id=>topics.map(&:id))
While user.posts.select(:topic_id).where(:topic_id=>topics.map(&:id)) generates the following query:
SELECT topic_id FROM `posts`
WHERE `posts`.`user_id` = 1 AND `posts`.`topic_id` IN (...)
Which one of the two is more efficient depends on the data in the actual tables, the indices defined, and which database is used.
If the topic ids list for the user is long and has topics repeated many times, it may make sense to group by topic id at the query level:
user.posts.select(:topic_id).group(:topic_id).where(:topic_id=>topics.map(&:id))
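Whichever query produces the ids, they arrive as a plain array, possibly with repeats (the reason for the `group` suggestion). For the marking step itself, converting the ids to a `Set` keeps each lookup constant-time instead of scanning an array once per topic. A sketch with a hypothetical `TopicRow` struct standing in for `Topic`, where `has_post` is an ordinary accessor rather than a database attribute:

```ruby
require "set"

# Hypothetical stand-in for a Topic record with a transient has_post flag.
TopicRow = Struct.new(:id, :title, :has_post)

# Marks each topic using the ids returned by whichever query you pick.
# The Set removes duplicates and makes each membership check O(1).
def mark_topics_with_post(topics, topic_ids_with_post)
  ids = topic_ids_with_post.to_set
  topics.each { |t| t.has_post = ids.include?(t.id) }
end

topics = [TopicRow.new(1, "Ruby"), TopicRow.new(2, "SQL"), TopicRow.new(3, "Arel")]
mark_topics_with_post(topics, [3, 1, 3])
p topics.map(&:has_post) # [true, false, true]
```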
Assuming your Topic model has a column named id, you can do something like this:
Topic.select("topics.id").joins(:posts).where("posts.user_id = ?", user_id)
This will run only one query against your database and will give you all the topics ids that have posts for a given user_id

Resources