Searching multiple tables with PostgreSQL 13 and Rails 6+ - ruby-on-rails

I provide a lot of context to set the stage for the question. What I'm trying to solve is fast and accurate fuzzysearch against multiple database tables using structured data, not full-text document search.
I'm using PostgreSQL 13.4+ and Rails 6+ if it matters.
I have fairly structured data for several tables:
class Contact
  attribute :id
  attribute :first_name
  attribute :last_name
  attribute :email
  attribute :phone
end

class Organization
  attribute :name
  attribute :license_number
end
...several other tables...
I'm trying to implement a fast and accurate fuzzysearch so that I can search across all these tables (Rails models) at once.
Currently I have a separate search query using ILIKE that concats the columns I want to search against on-the-fly for each model:
# contact.rb
scope :search, ->(q) { where("concat_ws(' ', first_name, last_name, email, phone) ILIKE :q", q: "%#{q}%") }
# organization.rb
scope :search, ->(q) { where("concat_ws(' ', name, license_number) ILIKE :q", q: "%#{q}%") }
In my search controller I query each of these tables separately and display the top 3 results for each model.
@contacts = Contact.search(params[:q]).limit(3)
@organizations = Organization.search(params[:q]).limit(3)
This works but is fairly slow and not as accurate as I would like.
Problems with my current approach:
Slow (relatively speaking) with only thousands of records.
Not accurate, because ILIKE requires an exact match somewhere in the string and I want fuzzy matching (i.e., with ILIKE, "smth" would not match "smith").
Not weighted; I would like to weight contacts.last_name over, say, organizations.name because the contacts table is generally the higher-priority search target.
My solution
My theoretical solution is to create a search_entries polymorphic table that has a separate record for each contact, organization, etc, that I want to search against, and then this search_entries table could be indexed for fast retrieval.
class SearchEntry
  attribute :data

  belongs_to :searchable, polymorphic: true

  # Store data as all lowercase to optimize search (avoid calling lower() in PG)
  def data=(text)
    self[:data] = text.downcase
  end
end
However, what I'm getting stuck on is how to structure this table so that it can be indexed and searched quickly.
contact = Contact.first
SearchEntry.create(searchable: contact, data: "#{contact.first_name} #{contact.last_name} #{contact.email} #{contact.phone}")
organization = Organization.first
SearchEntry.create(searchable: organization, data: "#{organization.name} #{organization.license_number}")
This gives me the ability to do something like:
SearchEntry.where("data LIKE :q", q: "%#{q}%")
or even something like fuzzysearch using PG's similarity() function:
SearchEntry.connection.execute("SELECT * FROM search_entries ORDER BY SIMILARITY(data, '#{q}') LIMIT 10")
I believe I can use a GIN index with pg_trgm on this data field as well to optimize searching (not 100% on that...).
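For reference, a minimal migration sketch of that index (treating the table, column, and index names from the example above as assumptions, not a drop-in):

class AddTrigramIndexToSearchEntries < ActiveRecord::Migration[6.1]
  def up
    # pg_trgm provides the trigram operators and the gin_trgm_ops operator class
    enable_extension "pg_trgm"
    add_index :search_entries, :data, using: :gin, opclass: :gin_trgm_ops,
              name: "index_search_entries_on_data_trigram"
  end

  def down
    remove_index :search_entries, name: "index_search_entries_on_data_trigram"
    disable_extension "pg_trgm"
  end
end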
This simplifies my search into a single query on a single table, but it still doesn't allow me to do weighted column searching (ie, contacts.last_name is more important than organizations.name).
Questions
Would this approach enable me to index the data so that I could have very fast fuzzysearch? (I know "very fast" is subjective, so what I mean is an efficient usage of PG to get results as quickly as possible).
Would I be able to use a GIN index combined with pg_trgm tri-grams to index this data for fast fuzzysearch?
How would I implement weighting certain values higher than others in an approach like this?

One potential solution is to create a materialized view consisting of a union of data from the two (or more) tables. Take this simplified example:
CREATE MATERIALIZED VIEW searchables AS
SELECT
  resource_id,
  resource_type,
  name,
  weight
FROM (
  SELECT
    id AS resource_id,
    'Contact' AS resource_type,
    concat_ws(' ', first_name, last_name) AS name,
    2 AS weight
  FROM contacts
  UNION
  SELECT
    id AS resource_id,
    'Organization' AS resource_type,
    name,
    1 AS weight
  FROM organizations
) AS resources;
class Searchable < ApplicationRecord
  belongs_to :resource, polymorphic: true

  def readonly?
    true
  end

  # Search contacts and organizations with a higher weight on contacts
  def self.search(name)
    where(arel_table[:name].matches(name)).order(weight: :desc)
  end
end
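If you want the view-based search to be fuzzy rather than a plain ILIKE match, a rough sketch of an extra class method (assuming the pg_trgm extension is enabled) could be:

class Searchable < ApplicationRecord
  # Sketch only: % and similarity() come from pg_trgm; rank by weight first,
  # then by trigram similarity to the query.
  def self.fuzzy_search(query)
    quoted = connection.quote(query)
    where("name % #{quoted}")
      .order(Arel.sql("weight DESC, similarity(name, #{quoted}) DESC"))
  end
end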
Since materialized views are stored in a table-like structure you can apply indices just like you could with a normal table:
CREATE INDEX searchables_name_trgm ON searchables USING gist (name gist_trgm_ops);
To ActiveRecord it also behaves just like a normal table.
Of course the complexity here will grow with the number of columns you want to search, and the end result might end up both underwhelming in functionality and overwhelming in complexity compared to an off-the-shelf solution with thousands of hours behind it.
The scenic gem can be used to make the migrations for creating materialized views simpler.
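With scenic, a sketch of such a migration (assuming the SQL above is saved as db/views/searchables_v01.sql and pg_trgm is already enabled) might look like:

class CreateSearchables < ActiveRecord::Migration[6.1]
  def change
    # scenic reads the view definition from db/views/searchables_v01.sql
    create_view :searchables, materialized: true
    add_index :searchables, :name, using: :gist, opclass: :gist_trgm_ops,
              name: "searchables_name_trgm"
  end
end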

Related

Arel: join table, sort on joined field

I have the following ActiveRecord Models
class Publication < ActiveRecord::Base
  attr_accessible :id, :pname
  has_many :advertisements
end

class Vendor < ActiveRecord::Base
  attr_accessible :id, :vname
  has_many :advertisements
end

class Advertisement < ActiveRecord::Base
  attr_accessible :id, :vendor_id, :publication_id, :prose, :aname
  belongs_to :vendor
  belongs_to :publication
end
The tables for these have the same fields as their accessible attributes.
I would like to be able to sort on the publication name, ad name, or vendor name, ascending or descending.
I also have a controller for the advertisements, where I want to display a list of ads. The list displays the name of the ad (aname), prose of the ad (prose), the name of the vendor (vname), and the name of the publication (pname).
The SQL query for ordering by publication name would look something like:
SELECT ads.aname AS aname, ads.id, ads.prose, ven.vname AS vname, pub.pname AS pname
FROM advertisements AS ads
INNER JOIN publications AS pub ON ads.publication_id = pub.id
INNER JOIN vendors AS ven ON ads.vendor_id = ven.id
ORDER BY <sort_column> <sort_order>
Where sort_column could be one of "pname", "aname", or "vname", and sort_order could be one of "ASC" or "DESC", and both would come as parameters from the web form along with the pagination page number.
The controller index code looks like this:
class AdvertisementsController < ApplicationController
  def index
    sort_column = params[:sort_column]
    sort_order = params[:sort_order]
    @ads = Advertisement.join( somehow join tables)
      .where(some condition).where(some other condition)
      .order("#{sort_column} #{sort_order}") ### I don't know what to do here
      .paginate(page: params[:page], per_page: 10) # from will_paginate
  end

  # other controller methods.......
end
The index view table snippet (written in SLIM) looks like this:
tr
  - @ads.each do |ad|
    td = ad.id
    td = ad.aname
    td = ad.pname
    td = ad.vname
I am aware that I could use AREL to do this, but I have been mucking around with AREL in the Rails console trying to generate and execute this query with pagination, and reading tutorials on the web and I can't figure out how to get this query in AREL, with sorting on joined fields, and with the ability to use a will_paginate Ruby query clause to paginate the query.
How does one use AREL, or even ActiveRecord to do this? I appreciate any help I can get.
You can accomplish what you want with vanilla ActiveRecord methods, without Arel. What you have is pretty close; this might help you get there.
# whitelist incoming params
sort_column = %w(pname aname vname).include?(params[:sort_column]) ? params[:sort_column] : "pname"
sort_order = %w(asc desc).include?(params[:sort_order]) ? params[:sort_order] : "desc"
@ads = Advertisement.select("advertisements.*, vendors.vname, publications.pname").
  joins(:publication, :vendor).
  where(some condition).
  where(some other condition).
  order("#{sort_column} #{sort_order}").
  page(params[:page]).per_page(10)
You can make this work with both Arel and ActiveRecord. I would suggest you stick to ActiveRecord as much as you can, unless you can't express the query with it.
Arel is great, but lately I have seen that in my code base it reduces overall readability, especially if you mix it with ActiveRecord or use too much of it.
A couple of other suggestions:
For the same query, try using includes instead of joins; you might find it's easier than having to add the select clause (see the sketch after these suggestions). I use includes for outer joins, but for a more detailed comparison, google "includes vs joins", it is very interesting.
In the complete opposite direction of my first suggestion, in case your queries are going to get complex, I highly recommend https://github.com/activerecord-hackery/ransack or https://github.com/activerecord-hackery/squeel for your use case.
Especially if you're not doing the above just for learning purposes.
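A rough sketch of the includes variant (written for Rails 4+, where references tells ActiveRecord that the order clause touches the joined tables; on Rails 3 references doesn't exist and the behavior differs):

@ads = Advertisement.includes(:publication, :vendor)
  .references(:publications, :vendors)
  .order("#{sort_column} #{sort_order}")
  .paginate(page: params[:page], per_page: 10)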

ActiveRecord query array intersection?

I'm trying to figure out the count of certain types of articles. I have a very inefficient query:
Article.where(status: 'Finished').select{|x| x.tags & Article::EXPERT_TAGS}.size
In my quest to be a better programmer, I'm wondering how to make this a faster query. tags is an array of strings in Article, and Article::EXPERT_TAGS is another array of strings. I want to find the intersection of the arrays, and get the resulting record count.
EDIT: Article::EXPERT_TAGS and article.tags are defined as Mongo arrays. These arrays hold strings, and I believe they are serialized strings. For example: Article.first.tags = ["Guest Writer", "News Article", "Press Release"]. Unfortunately this is not set up properly as a separate table of Tags.
2nd EDIT: I'm using MongoDB, so actually it is using a MongoWrapper like MongoMapper or mongoid, not ActiveRecord. This is an error on my part, sorry! Because of this error, it screws up the analysis of this question. Thanks PinnyM for pointing out the error!
Since you are using MongoDB, you could also consider a MongoDB-specific solution (aggregation framework) for the array intersection, so that you could get the database to do all the work before fetching the final result.
See this SO thread How to check if an array field is a part of another array in MongoDB?
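Even without the full aggregation pipeline, a hedged sketch of pushing the match and the count to MongoDB (assuming Mongoid/MongoMapper-style criteria) would be:

# $in matches any article whose tags array overlaps EXPERT_TAGS,
# and count runs on the server instead of in Ruby
Article.where(:status => 'Finished', :tags.in => Article::EXPERT_TAGS).count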
Assuming that the entire tags list is stored in a single database field and that you want to keep it that way, I don't see much scope of improvement, since you need to get all the data into Ruby for processing.
However, there is one problem with your database query
Article.where(status: 'Finished')
# This translates into the following query
SELECT * FROM articles WHERE status = 'Finished'
Essentially, you are fetching all the columns whereas you only need the tags column for your process. So, you can use pluck like this:
Article.where(status: 'Finished').pluck(:tags)
# This translates into the following query
SELECT tags FROM articles WHERE status = 'Finished'
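Sticking with that framing, and assuming each plucked tags value comes back as an array, the count could then be finished in Ruby over just that one column:

# Count articles whose tags overlap the expert list, loading only the tags column
Article.where(status: 'Finished')
       .pluck(:tags)
       .count { |tags| (tags & Article::EXPERT_TAGS).any? }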
I answered a question regarding general intersection like queries in ActiveRecord here.
Extracted below:
The following is a general approach I use for constructing intersection like queries in ActiveRecord:
class Service < ActiveRecord::Base
  belongs_to :person

  def self.with_types(*types)
    where(service_type: types)
  end
end

class City < ActiveRecord::Base
  has_and_belongs_to_many :services
  has_many :people, inverse_of: :city
end

class Person < ActiveRecord::Base
  belongs_to :city, inverse_of: :people

  def self.with_cities(cities)
    where(city_id: cities)
  end

  # intersection like query
  def self.with_all_service_types(*types)
    types.map { |t|
      joins(:services).merge(Service.with_types t).select(:id)
    }.reduce(scoped) { |scope, subquery|
      scope.where(id: subquery)
    }
  end
end
Person.with_all_service_types(1, 2)
Person.with_all_service_types(1, 2).with_cities(City.where(name: 'Gold Coast'))
It will generate SQL of the form:
SELECT "people".*
FROM "people"
WHERE "people"."id" in (SELECT "people"."id" FROM ...)
AND "people"."id" in (SELECT ...)
AND ...
You can create as many subqueries as required with the above approach based on any conditions/joins etc so long as each subquery returns the id of a matching person in its result set.
Each subquery result set will be AND'ed together thus restricting the matching set to the intersection of all of the subqueries.

Texticle and ActsAsTaggableOn

I'm trying to implement search over tags as part of a Texticle search. Since texticle doesn't search over multiple tables from the same model, I ended up creating a new model called PostSearch, following Texticle's suggestion about System-Wide Searching
class PostSearch < ActiveRecord::Base
  # We want to reference various models
  belongs_to :searchable, :polymorphic => true

  # Wish we could eliminate n + 1 query problems,
  # but we can't include polymorphic models when
  # using scopes to search in Rails 3
  # default_scope :include => :searchable

  # Search.new('query') to search for 'query'
  # across searchable models
  def self.new(query)
    debugger
    query = query.to_s
    return [] if query.empty?
    self.search(query).map!(&:searchable)
    # self.search(query) <-- this works, not sure why I shouldn't use it.
  end

  # Search records are never modified
  def readonly?; true; end

  # Our view doesn't have primary keys, so we need
  # to be explicit about how to tell different search
  # results apart; without this, we can't use :include
  # to avoid n + 1 query problems
  def hash
    id.hash
  end

  def eql?(result)
    id == result.id
  end
end
In my Postgres DB I created a view like this:
CREATE VIEW post_searches AS
SELECT posts.id, posts.name, string_agg(tags.name, ', ') AS tags
FROM posts
LEFT JOIN taggings ON taggings.taggable_id = posts.id
LEFT JOIN tags ON taggings.tag_id = tags.id
GROUP BY posts.id;
This allows me to get posts like this:
SELECT * FROM post_searches
id | name  | tags
 1 | Intro | introduction, funny, nice
So it seems like that should all be fine. Unfortunately calling
PostSearch.new("funny") returns [nil] (NOT []). Looking through the Texticle source code, it seems like this line in the PostSearch.new
self.search(query).map!(&:searchable)
maps the fields using some sort of searchable_columns method and does it (incorrectly?), resulting in nil.
On a different note, the tags field doesn't get searched in the texticle SQL query unless I cast it from a text type to a varchar type.
So, in summary:
Why does the object get mapped to nil when it is found?
AND
Why does texticle ignore my tags field unless it is varchar?
Texticle maps objects to nil instead of nothing so that you can check for nil? - it's a safeguard against erroring out checking against non-existent items. It might be worth asking tenderlove himself as to exactly why he did it that way.
I'm not completely positive as to why Texticle ignores non-varchars, but it looks like it's a performance safeguard so that Postgres does not do full table scans (under the section Creating Indexes for Super Speed):
You will need to add an index for every text/string column you query against, or else Postgresql will revert to a full table scan instead of using the indexes.

How to filter association_ids for an ActiveRecord model?

In a domain like this:
class User
  has_many :posts
  has_many :topics, :through => :posts
end

class Post
  belongs_to :user
  belongs_to :topic
end

class Topic
  has_many :posts
end
I can read all the Topic ids through user.topic_ids but I can't see a way to apply filtering conditions to this method, since it returns an Array instead of an ActiveRecord::Relation.
The problem is, given a User and an existing set of Topics, marking the ones for which there is a post by the user. I am currently doing something like this:
def mark_topics_with_post(user, topics)
  # only returns the ids of the topics for which this user has a post
  topic_ids = user.topic_ids
  topics.each { |t| t[:has_post] = topic_ids.include?(t.id) }
end
But this loads all the topic ids regardless of the input set. Ideally, I'd like to do something like
def mark_topics_with_post(user, topics)
  # only returns the topics where user has a post within the subset of interest
  topic_ids = user.topic_ids.where(:id => topics.map(&:id))
  topics.each { |t| t[:has_post] = topic_ids.include?(t.id) }
end
But the only thing I can do concretely is
def mark_topics_with_post(user, topics)
  # needlessly create Post objects only to unwrap them later
  topic_ids = user.posts.where(:topic_id => topics.map(&:id)).select(:topic_id).map(&:topic_id)
  topics.each { |t| t[:has_post] = topic_ids.include?(t.id) }
end
Is there a better way?
Is it possible to have something like select_values on an association or scope?
FWIW, I'm on rails 3.0.x, but I'd be curious about 3.1 too.
Why am I doing this?
Basically, I have a result page for a semi-complex search (which happens based on the Topic data only), and I want to mark the results (Topics) as stuff on which the user has interacted (wrote a Post).
So yeah, there is another option which would be doing a join [Topic,Post] so that the results come out as marked or not from the search, but this would destroy my ability to cache the Topic query (the query, even without the join, is more expensive than fetching only the ids for the user)
Notice the approaches outlined above do work, they just feel suboptimal.
I think that your second solution is almost the optimal one (from the point of view of the queries involved), at least with respect to the one you'd like to use.
user.topic_ids generates the query:
SELECT `topics`.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1
if user.topic_ids.where(:id=>topics.map(&:id)) was possible it would have generated this:
SELECT topics.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1 AND `topics`.`id` IN (...)
this is exactly the same query that is generated doing: user.topics.select("topics.id").where(:id=>topics.map(&:id))
while user.posts.select(:topic_id).where(:topic_id=>topics.map(&:id)) generates the following query:
SELECT topic_id FROM `posts`
WHERE `posts`.`user_id` = 1 AND `posts`.`topic_id` IN (...)
Which one of the two is more efficient depends on the data in the actual tables and the indices defined (and which DB is used).
If the topic ids list for the user is long and has topics repeated many times, it may make sense to group by topic id at the query level:
user.posts.select(:topic_id).group(:topic_id).where(:topic_id=>topics.map(&:id))
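Tying that back to the original helper, a sketch using pluck (available from Rails 3.2; on 3.0 the select/map version above is the equivalent):

def mark_topics_with_post(user, topics)
  # one query, fetching only the topic_id column for the topics of interest
  topic_ids = user.posts.where(:topic_id => topics.map(&:id)).pluck(:topic_id)
  topics.each { |t| t[:has_post] = topic_ids.include?(t.id) }
end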
Assuming your Topic model has a column named id, you can do something like this:
Topic.select(:id).joins(:posts).where("posts.user_id = ?", user_id)
This will run only one query against your database and will give you all the topics ids that have posts for a given user_id

How to join multiple tables with one to one relationships in rails

Ruby on Rails is very new to me. I am trying to retrieve set of columns from 3 different tables. I thought I could use SQL view to retrieve my results but could not find a way to use views in Rails. Here are my tables.
1) User table --> user name, password and email
2) UserDetails table --> foreign key: user_id, name, address1, city etc.
3) UserWorkDetails --> foreign key: user_id, work address1, work type, etc
These 3 tables have one to one relationships. So table 2 belongs to table 1 and table 3 also belongs to table 1. Table 1 has one userdetails and one userworkdetails.
I want to get user email, name, address1, city, work address1, work type using joins.
What is the best way to handle this?
The data is (are) in the models. Everything else is just an optimization. So address1 is at user.user_detail.address1, for instance.
If you have
class User
  has_one :user_detail
  has_one :user_work_detail
end

class UserDetail
  belongs_to :user
end

class UserWorkDetail
  belongs_to :user
end
with user_id columns in tables named user_details and user_work_details, then everything else is done for you.
If you later need to optimize you can :include the owned models, but it's not necessary for everything to work.
To get what you want done quickly use the :include option to include both the other tables when you query the primary table, so:
some_user_details = User.find(some_id, :include => [:user_detail, :user_work_detail])
This will just load all of the fields from the tables at once so there's only one query executed, then you can do what you need with the objects as they will contain all of the user data.
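A small usage sketch (the exact column names here are assumptions based on the question's description of the tables):

user = User.find(some_id, :include => [:user_detail, :user_work_detail])
user.email                       # from users
user.user_detail.address1        # from user_details
user.user_work_detail.work_type  # from user_work_details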
I find that this is simple enough and sufficient, and with this you're not optimising too early before you know where the bottlenecks are.
However if you really want to just load the required fields use the :select option as well on the ActiveRecord::Base find method:
some_user_details = User.find(some_id, :include => [:user_detail, :user_work_detail], :select => "id, email, name, address1, city, work_address1")
Although my SQL is a bit rusty at the moment.
User.find(:first, :joins => [:user_work_detail, :user_detail], :conditions => {:id => id})
I'd say that trying to just select the fields you want is a premature optimization at this point, but you could do it in a complicated :select hash.
:include will do 3 selects, 1 on user, one on user_details, and one on user_work_details. It's great for selecting a collection of objects from multiple tables in minimum queries.
http://apidock.com/rails/ActiveRecord/Base/find/class
