I have three tables: Accounts, Investments, and Games. An Investment has an account_id, game_id, some statistic counters, and is created the first time an Account participates in a Game.
I want to provide a JSON list of the latest Games along with the user's Investment in that Game, like this:
[{id: 666, name: "Foobar", ..., investment: {tokens: 58, credits: 42, ...}},...]
If they have not yet participated in the game, I still want to include an Investment object with default values, so I overrode the serializable_hash function in my Game model:
# game.rb
has_many :investments
def serializable_hash(options=nil)
options ||= {}
i = investments.find_or_initialize_by_account_id options[:uid]
{:id => id, ..., :investment => i.serializable_hash}
end
However, when I run something like Game.find(list_of_ids).to_json(:uid => current_user.id), Rails does a separate query on the Investments table for each Game. I tried Game.includes(:investments).find(list_of_ids).to_json(:uid => current_user.id) but not only does that load the investments for all users, it still does a separate query for each game to find or initialize the investment object.
In short, given a list of game IDs and an account id, what's a clean way to load the associated Investment objects that exist in one query, and initialize the rest?
You want to give the list of ids to the server in one go. I use the IN operator for this:
Game.includes(:investments).where("games.id IN (?)", list_of_ids)
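The IN query fetches the games in one round trip, but it does not cover the "find or initialize" part on its own. One possible approach is to load that account's existing investments with a single extra query, for example Investment.where(account_id: uid, game_id: list_of_ids), index them by game_id, and build defaults in memory for the rest. The sketch below is plain Ruby with Structs standing in for the models; the helper names and the default counters of 0 are assumptions for illustration, not part of the original code.

```ruby
# Stand-ins for the ActiveRecord models; only the fields used here.
Game = Struct.new(:id, :name)
Investment = Struct.new(:account_id, :game_id, :tokens, :credits)

# existing: investments already loaded for ONE account, e.g. from
#   Investment.where(account_id: uid, game_id: games.map(&:id))
# Returns a hash of game_id => Investment, with unsaved defaults filled in.
def investments_by_game(games, existing, uid)
  found = existing.each_with_object({}) { |inv, h| h[inv.game_id] = inv }
  games.each_with_object({}) do |game, result|
    # Default counters of 0 are an assumption; use your model's defaults.
    result[game.id] = found[game.id] || Investment.new(uid, game.id, 0, 0)
  end
end

# Builds the JSON-ready list from the question without per-game queries.
def games_with_investments(games, existing, uid)
  lookup = investments_by_game(games, existing, uid)
  games.map do |game|
    inv = lookup[game.id]
    { id: game.id, name: game.name,
      investment: { tokens: inv.tokens, credits: inv.credits } }
  end
end
```

With real models this comes to two queries in total (one for games, one for investments), however many games are in the list.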
I provide a lot of context to set the stage for the question. What I'm trying to solve is fast and accurate fuzzysearch against multiple database tables using structured data, not full-text document search.
I'm using PostgreSQL 13.4+ and Rails 6+, if it matters.
I have fairly structured data for several tables:
class Contact
attribute :id
attribute :first_name
attribute :last_name
attribute :email
attribute :phone
end
class Organization
attribute :name
attribute :license_number
end
...several other tables...
I'm trying to implement a fast and accurate fuzzysearch so that I can search across all these tables (Rails models) at once.
Currently I have a separate search query using ILIKE that concats the columns I want to search against on-the-fly for each model:
# contact.rb
scope :search, ->(q) { where("concat_ws(' ', first_name, last_name, email, phone) ILIKE :q", q: "%#{q}%") }
# organization.rb
scope :search, ->(q) { where("concat_ws(' ', name, license_number) ILIKE :q", q: "%#{q}%") }
In my search controller I query each of these tables separately and display the top 3 results for each model.
@contacts = Contact.search(params[:q]).limit(3)
@organizations = Organization.search(params[:q]).limit(3)
This works but is fairly slow and not as accurate as I would like.
Problems with my current approach:
Slow (relatively speaking) with only thousands of records.
Not accurate because ILIKE must have an exact match somewhere in the string and I want to implement fuzzysearch (ie, with ILIKE, "smth" would not match "smith").
Not weighted; I would like to weight the contacts.last_name column over say the organizations.name because the contacts table is generally speaking the higher priority search item.
My solution
My theoretical solution is to create a search_entries polymorphic table that has a separate record for each contact, organization, etc, that I want to search against, and then this search_entries table could be indexed for fast retrieval.
class SearchEntry
attribute :data
belongs_to :searchable, polymorphic: true
# Store data as all lowercase to optimize search (avoid lower method in PG)
def data=(text)
self[:data] = text.downcase
end
end
However, what I'm getting stuck on is how to structure this table so that it can be indexed and searched quickly.
contact = Contact.first
SearchEntry.create(searchable: contact, data: "#{contact.first_name} #{contact.last_name} #{contact.email} #{contact.phone}")
organization = Organization.first
SearchEntry.create(searchable: organization, data: "#{organization.name} #{organization.license_number}")
This gives me the ability to do something like:
SearchEntry.where("data LIKE :q", q: "%#{q}%")
or even something like fuzzysearch using PG's similarity() function:
SearchEntry.connection.execute("SELECT * FROM search_entries ORDER BY SIMILARITY(data, #{SearchEntry.connection.quote(q)}) DESC LIMIT 10")
I believe I can use a GIN index with pg_trgm on this data field as well to optimize searching (not 100% on that...).
This simplifies my search into a single query on a single table, but it still doesn't allow me to do weighted column searching (ie, contacts.last_name is more important than organizations.name).
Questions
Would this approach enable me to index the data so that I could have very fast fuzzysearch? (I know "very fast" is subjective, so what I mean is an efficient usage of PG to get results as quickly as possible).
Would I be able to use a GIN index combined with pg_trgm tri-grams to index this data for fast fuzzysearch?
How would I implement weighting certain values higher than others in an approach like this?
One potential solution is to create a materialized view consisting of a union of data from the two (or more) tables. Take this simplified example:
CREATE MATERIALIZED VIEW searchables AS
SELECT
  resource_id,
  resource_type,
  name,
  weight
FROM (
  SELECT
    id AS resource_id,
    'Contact' AS resource_type,
    concat_ws(' ', first_name, last_name) AS name,
    2 AS weight -- contacts rank above organizations
  FROM contacts
  UNION
  SELECT
    id AS resource_id,
    'Organization' AS resource_type,
    name,
    1 AS weight
  FROM organizations
) AS combined;
class Searchable < ApplicationRecord
belongs_to :resource, polymorphic: true
def readonly?
true
end
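# Search contacts and organizations with a higher weight on contacts
def self.search(name)
# matches generates ILIKE on PostgreSQL; wrap the term in wildcards for a substring match
where(arel_table[:name].matches("%#{sanitize_sql_like(name)}%")).order(weight: :desc)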
end
end
Since materialized views are stored in a table-like structure, you can apply indices just like you could with a normal table:
CREATE INDEX searchables_name_trgm ON searchables USING gist (name gist_trgm_ops);
To ActiveRecord it also behaves just like a normal table.
Of course, the complexity here will grow with the number of columns you want to search, and the end result might end up both underwhelming in functionality and overwhelming in complexity compared to an off-the-shelf solution with thousands of hours behind it.
The scenic gem can be used to make the migrations for creating materialized views simpler.
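To make the trigram matching and the weighting question more concrete, here is a rough plain-Ruby sketch. It approximates how pg_trgm scores a single word (shared trigrams divided by distinct trigrams) and then multiplies that score by the weight column. The method names are hypothetical, and the real extension handles multi-word strings and other details this sketch ignores. In SQL the same ranking idea would be an ORDER BY on similarity(name, query) * weight, which is an assumption about how you might combine the two rather than something tested against this view.

```ruby
require "set"

# Approximates pg_trgm's trigram extraction for a single word:
# lowercase, pad with two leading spaces and one trailing space,
# then take every run of three characters.
def trigrams(word)
  padded = "  #{word.downcase} "
  (0..padded.length - 3).map { |i| padded[i, 3] }.to_set
end

# Shared trigrams divided by total distinct trigrams.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  union.zero? ? 0.0 : (ta & tb).size.to_f / union
end

# Hypothetical ranking: multiply similarity by a per-row weight so a
# higher-priority source wins when names match equally well.
# rows are hashes with :name and :weight keys.
def rank(rows, query)
  rows
    .map { |row| row.merge(score: trigram_similarity(row[:name], query) * row[:weight]) }
    .sort_by { |row| -row[:score] }
end
```

In this sketch "smth" and "smith" share 3 of 8 distinct trigrams, so they score 0.375. ILIKE would not match them at all.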
I have this code:
Business.all.limit(50).each do |business|
card = {name: business.name, logo: business.logo, category: business.category.name}
feed << card
end
In my models, Business belongs to Category, and Category has many Businesses.
My problem is that this queries the DB 50 times, once for each business's category name.
I have seen Rails cache effectively by using :include, but all examples I have seen are for child records, for example:
Category.all :include => [:businesses]
but in this case I want to cache the parent's data.
It works the same way for a parent; just use the singular association name:
Business.includes(:category)
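Applied to the loop from the question, that could look like the sketch below. build_feed is a hypothetical helper name. It accepts any collection, so in Rails you would pass it Business.includes(:category).limit(50), and the category lookups then read preloaded records instead of issuing one query per business.

```ruby
# Works with any collection whose items respond to name, logo and category.
# In Rails, pass Business.includes(:category).limit(50) so that
# business.category is already loaded when the block runs.
def build_feed(businesses)
  businesses.map do |business|
    { name: business.name, logo: business.logo, category: business.category.name }
  end
end
```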
Data structure is as follows:
class Job
has_many :job_sections
has_many :job_products, through: :job_sections
end
class JobSection
belongs_to :job
has_many :job_products
end
class JobProduct
belongs_to :product
belongs_to :job_section
end
When I call job.job_products I could end up with something like this:
[#<JobProduct:0x007ff4128b0ca0
id: 18133,
product_id: 250,
quantity: 3,
frozen_cache: {},
discount: 0.0>,
#<JobProduct:0x007ff4128b00c0
id: 18134,
product_id: 250,
quantity: 1,
frozen_cache: {},
discount: 0.0>]
As you can see the product_id is identical in both instances.
How do I merge the contents of the arrays by product id so I retrieve and act on them as aggregated values?
In a way, I need to be able to act on job products by their product_id rather than their id.
Effectively, the result would be something like this:
[#<SomeFancyServiceObjectMaybe?
product_id: 250,
quantity: 4,
frozen_cache: {},
discount: 0.0>]
Do I opt for a little Plain Old Ruby Object to handle them all, or do I have to rethink the architecture of this, or is there (hopefully!) a bit of Rails secret sauce that can help me out?
*FYI Job Section is a recent addition to the architecture, and I don't think it has been particularly well thought through. However, I can't spend too much time reversing what's already in place.
This setup isn't ideal; I'm probably the sixth dev in as many years to start picking this apart.
Your suggestions are most welcome. Thank you
In SQL this would be something like this:
SELECT SUM(quantity)
FROM job_products
WHERE product_id = 250
GROUP BY product_id
You can do that in ActiveRecord too. If you just want an integer, you can use pluck:
total_quantity = job.job_products.
group(:product_id).
pluck("SUM(job_products.quantity)").
first
You can also pluck several columns if you want (in Rails 4+), which is why it returns an array. So if you want average discount at the same time, it's easy.
If you would prefer a JobProduct instance, you can get that too, but in your case a lot of the attributes will be nil because of the grouping. But you can say:
summary = job.job_products.
group(:product_id).
select("SUM(job_products.quantity) AS total_quantity").
first
And you'll get a read-only JobProduct with an extra attribute named total_quantity. So you can do summary.total_quantity. But because of the grouping, summary will have a nil id, discount, etc. Basically it will only have attributes matching the things you select. This is a little weird, but sometimes it lets you write methods that work both on "real" JobProducts and for these summary versions.
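If the rows are already loaded and you would rather have a small plain Ruby object per product, grouping in memory is another option. This is only a sketch: JobProductRow stands in for the real model and ProductSummary is a hypothetical name. On the database side, job.job_products.group(:product_id).sum(:quantity) gives a hash keyed by product_id in a single query.

```ruby
# Stand-in for the JobProduct model; only the fields being aggregated.
JobProductRow = Struct.new(:product_id, :quantity, :discount)
# Hypothetical summary object, one per product_id.
ProductSummary = Struct.new(:product_id, :total_quantity, :average_discount)

# Collapses rows that share a product_id into one ProductSummary each.
def summarize_by_product(job_products)
  job_products.group_by(&:product_id).map do |product_id, rows|
    total = rows.sum(&:quantity)
    average = rows.sum(&:discount) / rows.size.to_f
    ProductSummary.new(product_id, total, average)
  end
end
```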
I have the following classes and relationships
City has_many Cinemas
Cinemas has_many Movies
Movies has_many Ratings
Movies has_many Genres through GenreMovie
and I want to test queries like
* Show me all the movies in New York
* Show me all the movies in New York, ordered by rating
* Show me all the movies in New York, ordered by length_of_movie, in genre "Action"
* Show me all the movies in Cinema "X", ordered by rating, that are in genre "SciFi"
Currently I do it as below, using factory girl and chaining a bunch of models together to have data to check against,
city = create(:city)
cinema = create(:cinema, city: city)
5.times do
movie = create(:movie, cinema: cinema, tags: ["sci fi", "action"])
3.times do
create(:rating, score: 2.3, movie: movie)
end
end
and repeating that 3-4 times to generate enough data to query against, but it seems so clunky.
Is there a better way?
I normally test this using a very "minimalistic" approach:
e.g. for your first case I would create two movies, one in NY, and one outside. Your method should only return the one in NY
For the second, create three movies, all in NY, with different ratings. Create them out of order, so that the result only comes back sorted if your method actually sorts. Then check whether your method returns them in the right order.
Similar for the other cases.
I would not just create 5x3 movies. Makes no sense, and only costs time...
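As a rough illustration of how little data that takes, here is a plain-Ruby sketch. MovieStub stands in for records you would normally build with factories, and movies_in_city_by_rating is a hypothetical method under test, not anything from the question's code.

```ruby
# Stand-in for a Movie record; in a Rails spec these would come from factories.
MovieStub = Struct.new(:title, :city, :rating)

# Hypothetical method under test: movies in a city, highest rating first.
def movies_in_city_by_rating(movies, city)
  movies.select { |m| m.city == city }.sort_by { |m| -m.rating }
end

# Three NY movies created out of order, plus one elsewhere that must be excluded.
def minimal_fixture
  [
    MovieStub.new("B", "New York", 3.1),
    MovieStub.new("A", "New York", 4.8),
    MovieStub.new("D", "Boston", 5.0),
    MovieStub.new("C", "New York", 1.2)
  ]
end
```

Four records cover both the filter and the ordering. The Boston movie has the highest rating on purpose, so a missing city filter would show up immediately.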
There are several factory_girl constructs you could use to clean these up. create_list will create an array of objects, cleaning up your x.times do blocks. A with_ratings trait on your movie factory could allow you to opt in to having ratings automatically created when you create a movie (via FactoryGirl callbacks). You could even have that use a transient attribute in order to control the number and rating. So your result could look something like this:
cinema = create(:cinema)
movies = create_list(
:movie,
5,
:with_ratings,
cinema: cinema,
tags: [...],
ratings_count: 3,
ratings_value: 2.3
)
If you need access to the city, you can get it via cinema.city.
See:
transient attributes
traits
In my rails project, I have a query which finds the 10 most recent contests and orders by their associated poll dates:
@contests = Contest.find(
:all,
:limit => "10",
:include => :polls,
:order => "polls.start_date DESC" )
Currently this shows each contest and then iterates through associated polls sorting the master list by poll start date.
Some of these contests have the same :geo, :office and :cd attributes. I would like to combine those in the view, so rather than listing each contest and iterating through each associated poll (as I'm doing right now), I'd like to iterate through each unique combination of :geo, :office and :cd and then for each "group," iterate through all associated polls regardless of associated contest and sort by polls.start_date. I'd like to do this without having to create more cruft in the db.
Unless I've misunderstood, I think you might be looking for this:
@contests.group_by { |c| [c.geo, c.office, c.cd] }
It gives you a Hash, keyed on [c.geo, c.office, c.cd], each entry of which contains an Array of the contests that share the combination.
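To go one step further and get each group's polls merged and sorted, something along these lines might work. It is a sketch with Structs standing in for the Contest and Poll models, and polls_by_race is a hypothetical helper name. It assumes the polls are already loaded, as they are with :include => :polls.

```ruby
# Stand-ins for the Contest and Poll models.
PollRow = Struct.new(:name, :start_date)
ContestRow = Struct.new(:geo, :office, :cd, :polls)

# Groups contests by [geo, office, cd] and merges each group's polls,
# newest first, regardless of which contest a poll belongs to.
def polls_by_race(contests)
  contests
    .group_by { |c| [c.geo, c.office, c.cd] }
    .transform_values do |group|
      group.flat_map(&:polls).sort_by(&:start_date).reverse
    end
end
```

In the view you can then iterate over the hash: the key gives you geo, office and cd, and the value is the combined poll list for that group.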