I'm trying to implement search over tags as part of a Texticle search. Since Texticle doesn't search over multiple tables from the same model, I ended up creating a new model called PostSearch, following Texticle's suggestion about System-Wide Searching:
class PostSearch < ActiveRecord::Base
  # We want to reference various models
  belongs_to :searchable, :polymorphic => true
  # Wish we could eliminate n + 1 query problems,
  # but we can't include polymorphic models when
  # using scopes to search in Rails 3
  # default_scope :include => :searchable
  # Search.new('query') to search for 'query'
  # across searchable models
  def self.new(query)
    query = query.to_s
    return [] if query.empty?
    self.search(query).map!(&:searchable)
    # self.search(query) <-- this works, not sure why I shouldn't use it.
  end
  # Search records are never modified
  def readonly?; true; end
  # Our view doesn't have primary keys, so we need
  # to be explicit about how to tell different search
  # results apart; without this, we can't use :include
  # to avoid n + 1 query problems
  def hash
    id.hash
  end
  def eql?(result)
    id == result.id
  end
end
In my Postgres DB I created a view like this:
CREATE VIEW post_searches AS
SELECT posts.id, posts.name, string_agg(tags.name, ', ') AS tags
FROM posts
LEFT JOIN taggings ON taggings.taggable_id = posts.id
LEFT JOIN tags ON taggings.tag_id = tags.id
GROUP BY posts.id;
This allows me to get posts like this:
SELECT * FROM post_searches;
 id | name  | tags
----+-------+---------------------------
  1 | Intro | introduction, funny, nice
So it seems like that should all be fine. Unfortunately, calling PostSearch.new("funny") returns [nil] (NOT []). Looking through the Texticle source code, it seems like this line in PostSearch.new
self.search(query).map!(&:searchable)
maps the fields using some sort of searchable_columns method, does it (incorrectly?), and results in a nil.
On a different note, the tags field doesn't get searched in the texticle SQL query unless I cast it from a text type to a varchar type.
So, in summary:
Why does the object get mapped to nil when it is found?
AND
Why does texticle ignore my tags field unless it is varchar?
Texticle maps objects to nil instead of nothing so that you can check with nil?; it's a safeguard against erroring out when checking against non-existent items. It might be worth asking tenderlove himself exactly why he did it that way.
I'm not completely positive as to why Texticle ignores non-varchars, but it looks like it's a performance safeguard so that Postgres does not do full table scans (under the section Creating Indexes for Super Speed):
You will need to add an index for every text/string column you query against, or else Postgresql will revert to a full table scan instead of using the indexes.
I provide a lot of context to set the stage for the question. What I'm trying to solve is fast and accurate fuzzysearch against multiple database tables using structured data, not full-text document search.
I'm using PostgreSQL 13.4+ and Rails 6+, if it matters.
I have fairly structured data for several tables:
class Contact
  attribute :id
  attribute :first_name
  attribute :last_name
  attribute :email
  attribute :phone
end
class Organization
  attribute :name
  attribute :license_number
end
...several other tables...
I'm trying to implement a fast and accurate fuzzysearch so that I can search across all these tables (Rails models) at once.
Currently I have a separate search query using ILIKE that concats the columns I want to search against on-the-fly for each model:
# contact.rb
scope :search, ->(q) { where("concat_ws(' ', first_name, last_name, email, phone) ILIKE :q", q: "%#{q}%") }
# organization.rb
scope :search, ->(q) { where("concat_ws(' ', name, license_number) ILIKE :q", q: "%#{q}%") }
In my search controller I query each of these tables separately and display the top 3 results for each model.
@contacts = Contact.search(params[:q]).limit(3)
@organizations = Organization.search(params[:q]).limit(3)
This works but is fairly slow and not as accurate as I would like.
Problems with my current approach:
Slow (relatively speaking) with only thousands of records.
Not accurate, because ILIKE must have an exact match somewhere in the string and I want to implement fuzzysearch (i.e., with ILIKE, "smth" would not match "smith").
Not weighted; I would like to weight the contacts.last_name column over say the organizations.name because the contacts table is generally speaking the higher priority search item.
My solution
My theoretical solution is to create a search_entries polymorphic table that has a separate record for each contact, organization, etc, that I want to search against, and then this search_entries table could be indexed for fast retrieval.
class SearchEntry
  attribute :data
  belongs_to :searchable, polymorphic: true
  # Store data as all lowercase to optimize search (avoid lower method in PG)
  def data=(text)
    self[:data] = text.downcase
  end
end
However, what I'm getting stuck on is how to structure this table so that it can be indexed and searched quickly.
contact = Contact.first
SearchEntry.create(searchable: contact, data: "#{contact.first_name} #{contact.last_name} #{contact.email} #{contact.phone}")
organization = Organization.first
SearchEntry.create(searchable: organization, data: "#{organization.name} #{organization.license_number}")
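Plain string interpolation turns a nil column (say, a contact without a phone) into a stray space in data. A small helper can mimic concat_ws, which skips NULLs, before lowercasing. This is a sketch; the name search_data is hypothetical, and unlike concat_ws it also drops empty strings.

```ruby
# Hypothetical helper for building SearchEntry#data.
# Like concat_ws(' ', ...), it skips nil parts; it also drops empty strings
# (concat_ws would keep those), then lowercases the whole string.
def search_data(*parts)
  parts.compact.map(&:to_s).reject(&:empty?).join(" ").downcase
end

search_data("John", "Smith", "JOHN@EXAMPLE.COM", nil) # => "john smith john@example.com"
```

Used with the code above, that would look like SearchEntry.create(searchable: contact, data: search_data(contact.first_name, contact.last_name, contact.email, contact.phone)).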
This gives me the ability to do something like:
SearchEntry.where("data LIKE :q", q: "%#{q}%")
or even something like fuzzysearch using PG's similarity() function:
SearchEntry.find_by_sql(["SELECT * FROM search_entries ORDER BY similarity(data, ?) DESC LIMIT 10", q])
I believe I can use a GIN index with pg_trgm on this data field as well to optimize searching (not 100% on that...).
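To see why trigrams tolerate typos like "smth", here is a rough pure-Ruby approximation of how pg_trgm's similarity() scores two strings. It assumes pg_trgm's documented behaviour: lowercase the text, treat runs of alphanumerics as words, pad each word with two spaces in front and one behind, and compare the sets of 3-character substrings. It ignores finer details such as locale handling. For reference, pg_trgm's % operator matches when similarity reaches pg_trgm.similarity_threshold (0.3 by default), and both GIN (gin_trgm_ops) and GiST (gist_trgm_ops) indexes can speed up %, LIKE and ILIKE.

```ruby
require "set"

# Rough approximation of pg_trgm's trigram extraction (illustration only).
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "                      # two spaces before, one after
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Shared trigrams divided by all distinct trigrams of both strings.
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 0.0 if union.empty?
  (ta & tb).size.to_f / union.size
end

puts similarity("smth", "smith")  # 3 shared of 8 distinct trigrams: 0.375
puts "smith".include?("smth")     # what ILIKE '%smth%' tests: false
```

With a score of 0.375, "smth" clears the default 0.3 threshold even though a substring match fails. Note that ordering by similarity() alone can't use the index; with a GiST index, the <-> distance operator supports index-assisted nearest-match ordering.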
This simplifies my search into a single query on a single table, but it still doesn't allow me to do weighted column searching (ie, contacts.last_name is more important than organizations.name).
Questions
Would this approach enable me to index the data so that I could have very fast fuzzysearch? (I know "very fast" is subjective, so what I mean is an efficient usage of PG to get results as quickly as possible).
Would I be able to use a GIN index combined with pg_trgm tri-grams to index this data for fast fuzzysearch?
How would I implement weighting certain values higher than others in an approach like this?
One potential solution is to create a materialized view consisting of a union of data from the two (or more) tables. Take this simplified example:
CREATE MATERIALIZED VIEW searchables AS
SELECT
  resource_id,
  resource_type,
  name,
  weight
FROM (
  SELECT
    id AS resource_id,
    'Contact' AS resource_type,
    concat_ws(' ', first_name, last_name) AS name,
    2 AS weight
  FROM contacts
  UNION
  SELECT
    id AS resource_id,
    'Organization' AS resource_type,
    name,
    1 AS weight
  FROM organizations
) AS combined;
class Searchable < ApplicationRecord
  belongs_to :resource, polymorphic: true
  def readonly?
    true
  end
  # Search contacts and organizations with a higher weight on contacts
  def self.search(name)
    where(arel_table[:name].matches("%#{sanitize_sql_like(name)}%")).order(weight: :desc)
  end
end
Since materialized views are stored in a table-like structure, you can apply indexes just as you could with a normal table:
CREATE INDEX searchables_name_trgm ON searchables USING gist (name gist_trgm_ops);
To ActiveRecord it also behaves just like a normal table.
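The weight column in the view above only orders results by type. If you also want match quality to count, one option is to rank by weight multiplied by a similarity score. The plain-Ruby sketch below does that. It uses a rough pg_trgm-style trigram similarity as a stand-in for PostgreSQL's similarity(). Row and rank are hypothetical names, and the 0.3 cut-off mirrors pg_trgm's default similarity threshold. In SQL, the same idea would roughly be ORDER BY similarity(name, :q) * weight DESC.

```ruby
require "set"

# Hypothetical rows shaped like the `searchables` view: type, text, weight.
Row = Struct.new(:resource_type, :name, :weight)

# Rough pg_trgm-style trigrams: words padded with two spaces before, one after.
def trigram_set(text)
  words = text.downcase.scan(/[[:alnum:]]+/)
  grams = words.flat_map do |w|
    padded = "  #{w} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end
  grams.to_set
end

def similarity(a, b)
  x, y = trigram_set(a), trigram_set(b)
  union = x | y
  union.empty? ? 0.0 : (x & y).size.to_f / union.size
end

# Drop weak matches, then sort by weight * similarity, best first.
def rank(rows, query, threshold: 0.3)
  rows.map { |r| [r, similarity(query, r.name)] }
      .select { |_, sim| sim >= threshold }
      .sort_by { |r, sim| -(r.weight * sim) }
      .map(&:first)
end

rows = [
  Row.new("Organization", "Smith", 1),
  Row.new("Contact", "John Smith", 2),
  Row.new("Contact", "Jane Doe", 2)
]
rank(rows, "smith").map(&:name) # => ["John Smith", "Smith"]
```

Here the organization "Smith" is the closer textual match (1.0 versus about 0.55), but the contact's doubled weight still puts it first, and "Jane Doe" falls below the threshold.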
Of course, the complexity here will grow with the number of columns you want to search, and the end result might be both underwhelming in functionality and overwhelming in complexity compared to an off-the-shelf solution with thousands of hours behind it.
The scenic gem can be used to make the migrations for creating materialized views simpler.
Let's say I've got a User class with an :email field. And let's say I'm using ActiveAdmin to manage Users.
Making a filter that returns emails that match one string, e.g. "smith", is very simple. In admin/user.rb, I just include the line
filter :email
This gives me a filter widget that does the job.
However, this filter doesn't let me search for the intersection of multiple terms. I can search for emails containing "smith", but not for emails containing both "smith" AND ".edu".
Google tells me that ActiveAdmin uses Ransack under the hood, and the Ransack demo has an 'advanced' mode that permits multiple term searches.
What's the easiest way to get a multiple term search widget into activeadmin?
Ideally, I'd like a widget that would allow me to enter smith .edu or smith AND .edu to filter for emails containing both terms.
There is a simple solution using ransackable scopes.
Put something like this in your model:
class User < ActiveRecord::Base
  # ...
  scope :email_includes, ->(search) {
    current_scope = self
    search.split.uniq.each do |word|
      current_scope = current_scope.where('users.email ILIKE ?', "%#{word}%")
    end
    current_scope
  }
  def self.ransackable_scopes(auth_object = nil)
    [:email_includes]
  end
end
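For the "smith .edu" example, the email_includes scope chains one ILIKE condition per unique word. The pure-Ruby sketch below shows the condition/value pairs it would pass to where; the helper names are hypothetical. The scope above doesn't escape LIKE wildcards, so a literal "_" in the input would act as a wildcard. escape_like backslash-escapes %, _ and \, which is what ActiveRecord's sanitize_sql_like does by default, so you could wrap each word with it inside the scope.

```ruby
# Backslash-escape LIKE wildcards (mirrors sanitize_sql_like's default behaviour).
def escape_like(word)
  word.gsub(/[\\%_]/) { |m| "\\#{m}" }
end

# One [condition, value] pair per unique word; the scope ANDs them via chained where calls.
def email_conditions(search)
  search.split.uniq.map { |word| ["users.email ILIKE ?", "%#{escape_like(word)}%"] }
end

email_conditions("smith .edu smith")
# => [["users.email ILIKE ?", "%smith%"], ["users.email ILIKE ?", "%.edu%"]]
```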
After this, you can add the filter with the ActiveAdmin DSL, like:
filter :email_includes, as: :string, label: "Email"
UPD: this should work if you change email_contains_any to email_includes.
I've figured out a solution but it's not pretty.
The good news is that Ransack has no trouble with multiple terms searches. These searches use the 'predicate' cont_all. The following line works for finding emails containing 'smith' and '.edu'.
User.ransack(email_cont_all: ['smith','.edu'] ).result
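To make the difference between cont_all and its sibling cont_any concrete, here is a rough in-memory model of the two predicates in plain Ruby. It's only an illustration, since Ransack builds SQL instead. The case-insensitive matching assumes PostgreSQL, where Ransack's cont predicates use ILIKE.

```ruby
# Every term must appear (like *_cont_all).
def cont_all(values, terms)
  values.select { |v| terms.all? { |t| v.downcase.include?(t.downcase) } }
end

# At least one term must appear (like *_cont_any).
def cont_any(values, terms)
  values.select { |v| terms.any? { |t| v.downcase.include?(t.downcase) } }
end

emails = ["jo.smith@uni.edu", "smith@example.com", "ann@college.edu"]
cont_all(emails, ["smith", ".edu"]) # => ["jo.smith@uni.edu"]
cont_any(emails, ["smith", ".edu"]) # => ["jo.smith@uni.edu", "smith@example.com", "ann@college.edu"]
```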
Since these searches are easy in Ransack, they're probably straightforward in Activeadmin, right? Wrong! To get them working, I needed to do three things.
1. I put a custom ransack method (a.k.a. ransacker) into User.rb. I named the ransacker email_multiple_terms.
class User < ActiveRecord::Base
  # ...
  ransacker :email_multiple_terms do |parent|
    parent.table[:email]
  end
end
2. I declared a filter in my ActiveAdmin dashboard and associated it with the ransacker. Note that the search predicate cont_all is appended to the ransacker name.
admin/User.rb:
ActiveAdmin.register User do
  # ...
  filter :email_multiple_terms_cont_all, label: "Email", as: :string
end
This line creates the filter widget in Activeadmin. We're nearly there. One problem left: Activeadmin sends search queries to ransack as a single string (e.g. "smith .edu"), whereas our ransacker wants the search terms as an array. Somewhere, we need to convert the single string into an array of search terms.
3. I modified ActiveAdmin to split the search string under certain conditions. The logic is in a method that I added to lib/active_admin/resource_controller/data_access.rb.
def split_search_params(params)
  params.keys.each do |key|
    if (key.ends_with?("_any") || key.ends_with?("_all")) && params[key].is_a?(String)
      params[key] = params[key].split # turn into array
    end
  end
  params
end
I then called this method inside apply_filtering.
def apply_filtering(chain)
  @search = chain.ransack split_search_params clean_search_params params[:q]
  @search.result
end
This code is live in my own fork of activeadmin, here: https://github.com/d-H-/activeadmin
So, to get multiple term search working, follow steps 1 and 2 above, and include my fork of A.A. in your Gemfile:
gem 'activeadmin', :git => 'git://github.com/d-H-/activeadmin.git'
HTH.
If anyone's got a simpler method, please share!
Just add three filters to your ActiveAdmin resource (e.g. admin/user.rb):
filter :email_cont
filter :email_start
filter :email_end
It gives you a flexible way to manage your search.
With "smith" in the contains filter and ".edu" in the ends-with filter, this executes the following SQL:
SELECT "admin_users".* FROM "admin_users"
WHERE ("admin_users"."email" ILIKE '%smith%' AND
"admin_users"."email" ILIKE '%\.edu')
ORDER BY "admin_users"."id" desc LIMIT 30 OFFSET 0
I expect that's exactly what you're looking for.
I receive a list of UserIds (about 1000 at a time) sorted by 'Income'. I have User records in "my system's database", but the 'Income' column is not there. I want to retrieve the Users from "my system's database" in the sorted order in which they were received in the list. I tried the following using Active Record, expecting that the records would be retrieved in the same order as in the sorted list, but it does not work.
# PSEUDO CODE
User.all(:conditions => {:id => [SORTED LIST]})
I found an answer to a similar question at the link below, but am not sure how to implement the suggested solution using Active Record.
ORDER BY the IN value list
Is there any other way to do it?
Please guide.
Shardul.
The answer you linked to provides exactly what you need; you just need to code it in Ruby in a flexible manner.
Something like this:
class User < ActiveRecord::Base
  def self.find_as_sorted(ids)
    values = []
    ids.each_with_index do |id, index|
      # Integer() rejects non-numeric ids, since they are interpolated into SQL
      values << "(#{Integer(id)}, #{index + 1})"
    end
    relation = self.joins("JOIN (VALUES #{values.join(",")}) AS x (id, ordering) ON #{table_name}.id = x.id")
    relation = relation.order('x.ordering')
    relation
  end
end
In fact, you could easily put that in a module and mix it into any ActiveRecord classes that need it; since it uses table_name and self, it isn't tied to any specific class name.
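If you'd rather not build SQL by hand, another common option is to load the users with a plain IN query and restore the order in Ruby. With about 1000 ids, that is cheap. The sketch below uses UserRow, a hypothetical stand-in for User records, and a hypothetical sort_like helper.

```ruby
# Stand-in for User records (hypothetical; any object with #id works).
UserRow = Struct.new(:id, :name)

# Reorder records to follow the caller's id list.
def sort_like(ids, records)
  position = ids.each_with_index.to_h          # id => index in the sorted list
  records.sort_by { |r| position.fetch(r.id) }
end

loaded = [UserRow.new(1, "a"), UserRow.new(2, "b"), UserRow.new(3, "c")] # order from the DB
sort_like([3, 1, 2], loaded).map(&:id) # => [3, 1, 2]
```

In Rails this would be something like sort_like(ids, User.where(id: ids).to_a). The trade-off is that the ordering happens after the query, so you can't paginate on it in SQL.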
MySQL users can do this via the FIELD function, but Postgres lacks it. However, this question has workarounds: Simulating MySQL's ORDER BY FIELD() in Postgresql
I'm trying to do a simple query of a serialized column. How do you do this?
serialize :mycode, Array
1.9.3p125 :026 > MyModel.find(104).mycode
MyModel Load (0.6ms) SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`id` = 104 LIMIT 1
=> [43565, 43402]
1.9.3p125 :027 > MyModel.find_all_by_mycode("[43402]")
MyModel Load (0.7ms) SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`mycode` = '[43402]'
=> []
1.9.3p125 :028 > MyModel.find_all_by_mycode(43402)
MyModel Load (1.2ms) SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`mycode` = 43402
=> []
1.9.3p125 :029 > MyModel.find_all_by_mycode([43565, 43402])
MyModel Load (1.1ms) SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`mycode` IN (43565, 43402)
=> []
It's just a trick, but it won't slow your application down. You have to use .to_yaml.
For an exact match:
MyModel.where("mycode = ?", [43565, 43402].to_yaml)
#=> [#<MyModel id:...]
Tested only for MySQL.
Basically, you can't. The downside of #serialize is that you're bypassing your database's native abstractions. You're pretty much limited to loading and saving the data.
That said, one very good way to slow your application to a crawl could be:
MyModel.all.select { |m| m.mycode.include? 43402 }
Moral of the story: don't use #serialize for any data you need to query on.
A serialized array is stored in the database in a particular fashion (YAML by default), e.g.
[1, 2, 3, 4]
is stored as
---\n- 1\n- 2\n- 3\n- 4\n
hence the query would be
MyModel.where("mycode like ?", "% 2\n%")
Put a space between % and 2, so that 2 doesn't also match values such as 12.
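You can check what the stored string looks like, and what such a LIKE pattern matches, without a database. The sketch below uses Ruby's YAML library (Rails' serialize uses YAML by default) and a minimal hypothetical like? helper that emulates SQL LIKE with a regular expression. It assumes the column holds exactly YAML.dump output, and it ignores LIKE escape characters.

```ruby
require "yaml"

# What `serialize :mycode, Array` writes for the example value.
stored = [43565, 43402].to_yaml
# => "---\n- 43565\n- 43402\n"

# Minimal LIKE emulation: % is any run of characters, _ is one character.
def like?(text, pattern)
  parts = pattern.chars.map do |c|
    case c
    when "%" then ".*"
    when "_" then "."
    else Regexp.escape(c)
    end
  end
  Regexp.new("\\A#{parts.join}\\z", Regexp::MULTILINE).match?(text)
end

like?(stored, "% 43402\n%")      # => true
like?([1, 2].to_yaml, "% 2\n%")  # => true
like?([1, 12].to_yaml, "% 2\n%") # => false, " 2\n" never occurs in "- 12\n"
```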
Noodl's answer is mostly right, but not entirely.
It really depends on the database/ORM adapter you are using. For instance, PostgreSQL can now store and search hashes/JSON; check out hstore. I remember reading that the ActiveRecord adapter for PostgreSQL now handles it properly. And if you are using Mongoid or something like that, then you are using unstructured data (i.e. JSON) at the database level everywhere.
However, if you are using a database that can't really handle hashes, like the MySQL/ActiveRecord combination, then the only reason to use a serialized field is for data that you create or write in some background process and display or output on demand. The only two uses I have found in my experience are reports (like a stat field on a Product model, where I need to store some averages and medians for a product) and user options (like their preferred template color; I really don't need to query on that). User information, on the other hand, like their subscription to a mailing list, needs to be searchable for email blasts.
PostgreSQL hstore ActiveRecord Example:
MyModel.where("mycode @> 'KEY=>\"#{VALUE}\"'")
UPDATE
As of 2017 both MariaDB and MySQL support JSON field types.
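You can query the serialized column with a SQL LIKE statement.
MyModel.where("mycode LIKE ?", "%43402%")
This is quicker than loading every record and filtering with include?. However, you cannot use an array as the parameter, and it will also match values that merely contain 43402 (such as 143402).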
Good news! If you're using PostgreSQL with hstore (which is super easy with Rails 4), you can now totally search serialized data. This is a handy guide, and here's the syntax documentation from PG.
In my case, I have a dictionary stored as a hash in an hstore column called amenities. To check for a couple of queried amenities that have a value of 1 in the hash, I just do
House.where("amenities @> 'wifi => 1' AND amenities @> 'pool => 1'")
Hooray for improvements!
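If the @> ("contains") operator is unfamiliar, Ruby has a close analogy. Hash#>= (Ruby 2.3+) is true when the left hash includes every key/value pair of the right one, which is what amenities @> 'wifi => 1' asks of the hstore value. This is only an illustration of the semantics. Note that hstore stores values as text, hence "1" rather than 1.

```ruby
# In-memory analogy for hstore's @> operator.
amenities = { "wifi" => "1", "pool" => "1", "gym" => "0" }

amenities >= { "wifi" => "1", "pool" => "1" } # => true
amenities >= { "wifi" => "1", "gym" => "1" }  # => false, gym is "0"
```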
There's a blog post from 2009 from FriendFeed that describes how to use serialized data within MySQL.
What you can do is create tables that function as indexes for any data that you want to search.
Create a model that contains the searchable values/fields
In your example, the models would look something like this:
class MyModel < ApplicationRecord
  # id, name, other fields...
  serialize :mycode, Array
end
class Item < ApplicationRecord
  # id, value...
  belongs_to :my_model
end
Creating an "index" table for searchable fields
When you save MyModel, you can do something like this to create the index:
Item.where(my_model: self).destroy_all
self.mycode.each do |mycode_item|
  Item.create(my_model: self, value: mycode_item)
end
Querying and Searching
Then when you want to query and search just do:
Item.where(value: [43565, 43402]).all.map(&:my_model)
Item.where(value: 43402).all.map(&:my_model)
You can add a method to MyModel to make that simpler:
def self.find_by_mycode(value_or_values)
  Item.where(value: value_or_values).all.map(&:my_model)
end
MyModel.find_by_mycode([43565, 43402])
MyModel.find_by_mycode(43402)
To speed things up, you will want to create a SQL index on the items table's value column.
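The index-table idea can be pictured in plain Ruby: one (value, my_model_id) row per array element, and lookups that are a simple match on value. Record, items and owners_of below are hypothetical stand-ins for MyModel, the Item rows and find_by_mycode. One thing the sketch makes visible is that a model whose mycode contains several of the searched values matches once per row. The .map(&:my_model) version above can therefore return duplicates, so you may want .uniq.

```ruby
# Stand-in for MyModel records with a serialized mycode array (hypothetical).
Record = Struct.new(:id, :mycode)

records = [Record.new(104, [43565, 43402]), Record.new(105, [43402]), Record.new(106, [1])]

# What the Item table would contain after the save hook runs for each record.
items = records.flat_map { |r| r.mycode.map { |v| { value: v, my_model_id: r.id } } }

# Like Item.where(value: values), then collecting each owner once.
def owners_of(items, values)
  wanted = Array(values)
  items.select { |i| wanted.include?(i[:value]) }.map { |i| i[:my_model_id] }.uniq
end

owners_of(items, 43402)          # => [104, 105]
owners_of(items, [43565, 43402]) # => [104, 105]
```

In SQL terms, MyModel.where(id: Item.where(value: values).select(:my_model_id)) gets distinct models in one query via a subquery.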
Using the following comments in this post
https://stackoverflow.com/a/14555151/936494
https://stackoverflow.com/a/15287674/936494
I was successfully able to query a serialized Hash in my model
class Model < ApplicationRecord
serialize :column_name, Hash
end
When column_name holds a Hash like
{ my_data: [ { data_type: 'MyType', data_id: 113 } ] }
we can query it in following manner
Model.where("column_name = ?", hash.to_yaml)
That generates a SQL query like
Model Load (0.3ms) SELECT "models".* FROM "models" WHERE (column_name = '---
:my_data:
- :data_type: MyType
  :data_id: 113
')
If you want to execute the generated query in a SQL terminal, it should work, but take care that the value is in exactly the format stored in the DB. Another easy way, which I found at PostgreSQL newline character, is to use an escape string containing newline characters:
select * from table_name where column_name = E'---\n:my_data:\n- :data_type: MyType\n  :data_id: 113\n'
The most important part of the above query is the E prefix: it marks an escape string, so \n is read as a newline.
Note: The database on which I executed above is PostgreSQL.
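Because the match has to be byte-for-byte (including the two-space indent before :data_id), it can help to let Ruby produce both the YAML and the escape-string literal. The sketch below assumes the default YAML coder (Psych). pg_escape_string_literal is a hypothetical helper that escapes backslashes and single quotes and writes newlines as \n inside E'...'.

```ruby
require "yaml"

# The example value, serialized the way `serialize :column_name, Hash` stores it by default.
value = { my_data: [{ data_type: "MyType", data_id: 113 }] }
yaml = value.to_yaml
# => "---\n:my_data:\n- :data_type: MyType\n  :data_id: 113\n"

# Hypothetical helper: build a PostgreSQL escape string literal (E'...').
# Backslashes are escaped first so later replacements aren't escaped twice.
def pg_escape_string_literal(text)
  body = text.gsub("\\") { "\\\\" }.gsub("'") { "\\'" }.gsub("\n") { "\\n" }
  "E'#{body}'"
end

puts "SELECT * FROM models WHERE column_name = #{pg_escape_string_literal(yaml)};"
# prints: SELECT * FROM models WHERE column_name = E'---\n:my_data:\n- :data_type: MyType\n  :data_id: 113\n';
```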
To search a serialized list, you need to prefix and postfix each value with unique characters.
Example:
Rather than something like:
Rather than storing something like 2345,12345,1234567, which would cause issues if you tried to search for 2345 (it also occurs inside 12345 and 1234567), you store something like <2345>,<12345>,<1234567> and search for <2345> instead (the search query gets transformed). Of course, the choice of prefix/postfix characters depends on the valid data that will be stored. You might instead use something like ||| if you expect < (and possibly a single |) to appear in the data. That increases the amount of data the field uses and could cause performance issues.
Using a trigram index or something similar would avoid potential performance issues.
You can serialize it like data.map { |d| "<#{d}>" }.join(',') and deserialize it via data.gsub('<', '').gsub('>', '').split(','). A serializer class would do the job of loading/extracting the data quite well.
The way you do this is by setting the database field to text and using Rails' serialize model method with a custom lib class. The lib class needs to implement two methods:
def self.dump(obj) # (returns string to be saved to database)
def self.load(text) # (returns object)
Example with Duration, extracted from the article so link rot won't get it; please visit the article for more information. The example uses a single value, but it's fairly straightforward to serialize a list of values and deserialize the list using the methods mentioned above.
class Duration
  # Used for `serialize` method in ActiveRecord
  class << self
    def load(duration)
      self.new(duration || 0)
    end
    def dump(obj)
      unless obj.is_a?(self)
        raise ::ActiveRecord::SerializationTypeMismatch,
          "Attribute was supposed to be a #{self}, but was a #{obj.class}. -- #{obj.inspect}"
      end
      obj.length
    end
  end
  attr_accessor :minutes, :seconds
  def initialize(duration)
    @minutes = duration / 60
    @seconds = duration % 60
  end
  def length
    (minutes.to_i * 60) + seconds.to_i
  end
end
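Following the same load/dump shape, a list coder for the <value>,<value> format described above might look like the sketch below. BracketedList and like_pattern are hypothetical names. The sketch assumes values never contain ",", "<" or ">", and load returns strings. Depending on the Rails version, it would be wired up as serialize :mycode, BracketedList (older) or serialize :mycode, coder: BracketedList (Rails 7.1+).

```ruby
# Hypothetical coder for the bracketed list format.
class BracketedList
  # Array -> "<a>,<b>" string stored in the text column.
  def self.dump(values)
    Array(values).map { |v| "<#{v}>" }.join(",")
  end

  # Stored string -> array of strings (nil/empty column -> []).
  def self.load(text)
    return [] if text.nil? || text.empty?
    text.delete("<>").split(",")
  end

  # LIKE pattern that matches one whole value only.
  def self.like_pattern(value)
    "%<#{value}>%"
  end
end

stored = BracketedList.dump([2345, 12345, 1234567]) # => "<2345>,<12345>,<1234567>"
BracketedList.load(stored)                          # => ["2345", "12345", "1234567"]
"<12345>".include?("<2345>")                        # => false, so no false positive
```

A search would then be MyModel.where("mycode LIKE ?", BracketedList.like_pattern(2345)).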
If you have a serialized JSON column and want to apply a LIKE query to it, do it like this:
YourModel.where("hashcolumn like ?", "%#{search}%")
How would I do a query like this?
I have
@models = Model.near([latitude, longitude], 6.8)
Now i want to filter another model, which is associated with the one above.
(help me with getting the right way to do this)
model2 = Model2.where("model_id == :one_of_the_models_filtered_above", {:one_of_the_models_filtered_above => only_from_the_models_filtered_above})
model.rb looks like this:
has_many :model2s
and model2.rb:
belongs_to :model
Right now it is like this (after @models = Model.near([latitude, longitude], 6.8)):
model2s = []
@models.each do |model|
  model.model2s.each do |model2|
    model2s.push(model2)
  end
end
I want to accomplish the same thing, but with an active record query instead
I think I found something; why does this fail?
Model2.where("model.distance_from([:latitude,:longitude]) < :dist", {:latitude => latitude, :longitude => longitude, :dist => 6.8})
This query throws this error:
SQLite3::SQLException: near "(": syntax error: SELECT "tags".* FROM "tags" WHERE (model.distance_from([43.45101666666667,-80.49773333333333]) < 6.8)
Why?
Use includes. It will eager-load associated models (only two SQL queries instead of N+1):
@models = Model.near( [latitude, longitude], 6.8 ).includes( :model2s )
So when you do @models.first.model2s, the associated model2s will already be loaded (see the RoR guides for more info).
If you want to get an array of all model2s belonging to your collection of models, you can do:
@models.collect( &:model2s )
# add .flatten at the end of the chain if you want a one level deep array
# add .uniq at the end of the chain if you don't want duplicates
collect (also called map) gathers into an array the result of the block for each of the caller's elements (this does exactly the same as your code; see Enumerable's docs for more info). The & before the symbol converts it into a Proc that is called with each element of the collection, so this is the same as writing
@models.collect { |model| model.model2s }
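In plain Ruby, the loop from the question, collect followed by flatten, and flat_map all produce the same array. flat_map just does the collect and one level of flattening in a single pass. In the sketch, FakeModel is a hypothetical Struct standing in for the ActiveRecord records.

```ruby
# Hypothetical stand-in for Model records and their model2s association.
FakeModel = Struct.new(:id, :model2s)

models = [FakeModel.new(1, [:a, :b]), FakeModel.new(2, [:b, :c])]

# The loop from the question...
collected = []
models.each { |m| m.model2s.each { |m2| collected.push(m2) } }

# ...and two shorter equivalents.
models.collect(&:model2s).flatten # => [:a, :b, :b, :c]
models.flat_map(&:model2s)        # => [:a, :b, :b, :c]
models.flat_map(&:model2s).uniq   # => [:a, :b, :c]
```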
One more thing: @mu is right; it seems SQLite does not know about your distance_from stored procedure. As I suspect this is a GIS-related question, you may want to ask about this particular issue on gis.stackexchange.com.