Rails tokenized text search across fields with performance in mind - ruby-on-rails

I have a Rails app on Heroku whose search I'd like to make more user-friendly. To do this, I'd like to let users text search across multiple fields on multiple models through associations. The input from the user could be a mix of text from any of these fields (and often might span multiple fields) in no particular order.
Example: if you had a car database and wanted to allow the user to search "Honda Fit 2011", where "Honda" came from the manufacturer table, "Fit" came from the model table, and "2011" came from the model_year table.
I'm thinking that I need to build a single field on the root record that contains the unique list of words from each of these fields, and then tokenize the user's input. But that would cause me to use an IN clause, which I'm not sure could benefit from full-text search plugins like pg_search.
So, my question is: what's a good way to achieve a search like this in Rails?

I would take a look at Sunspot_rails. It uses Solr as its search engine, but allows you to index content in all sorts of fruity ways. For instance, I have models indexed with their associations pretty simply:
searchable do
  text :description
  text :category do
    category.present? ? category.name : ''
  end
end
You can then search with:
TYPES = [Asset, Product]
Sunspot.search(*TYPES) do |q|
  q.fulltext search_str
end
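For illustration only, the matching behaviour the question asks for (every token must appear somewhere across the associated fields, in any order) can be sketched in plain Ruby. The Car struct and the search_words, tokenize and search methods are hypothetical names, not part of Sunspot or pg_search, and a real search engine would answer this from an index rather than by scanning records:

```ruby
# Hypothetical in-memory stand-in for a car record and its associated fields.
Car = Struct.new(:manufacturer, :model, :year) do
  # All searchable words for this record, lowercased and de-duplicated.
  def search_words
    [manufacturer, model, year.to_s].flat_map { |value| value.downcase.split }.uniq
  end
end

# Split the user's input into lowercase tokens.
def tokenize(input)
  input.downcase.split
end

# Keep the records whose word list contains every token, in any order.
def search(cars, input)
  tokens = tokenize(input)
  cars.select { |car| (tokens - car.search_words).empty? }
end

cars = [
  Car.new("Honda", "Fit", 2011),
  Car.new("Honda", "Civic", 2011),
  Car.new("Toyota", "Yaris", 2010)
]
results = search(cars, "2011 fit honda")
```

Here results should contain only the Honda Fit record, even though the tokens were typed in a different order from the fields.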

Related

Rails Search One Column On Multiple Saved Terms (Saved Searches In Model)

One table, one column ('headline' in an RSS feed reader). On the front end, I want a text area in which I can enter a comma-separated list of search terms, some multi-word, like for 'politics':
rajoy, pp, "popular party", "socialist party", etc
This could either be stored as part of a separate search model or as a keyword column on the 'category' or 'story' models, so they can be edited and improved with different terms from the front end, as a story develops.
In the RSS reader, have a series of links, one for each story or category, that, on being clicked return the headlines that contain one (or more) of the search terms from the stored list.
In a later version, it would be good to find headlines containing several of the terms in the list, but let's start simple.
Have been doing lots of reading about postgres, rails, different types of searches and queries, but can't seem to find what I want, which I understand is basically "search 'headlines' column against this list of search terms".
Sounds like it might be an array thing that's more to do with controllers in Rails than postgres, or cycling through a giant OR query with some multi-word terms, but I'm not sure.
Does anyone have any better pointers about how to start?
Users
If this will be user specific, I would start with a User model that is responsible for persisting each unique set of search terms. Think logon or session.
Assuming you use the Category method mentioned before, and assuming there's a column called name. Each search term would be stored as a separate instance in the database. Think tags.
headlines that contain one (or more) of the search terms from the stored list
Categories
Since each Category has many terms, and all the queries are going to be OR queries, a model that joins the User and Category, storing a search term would be appropriate.
I'm also assuming you have a Story model that contains the actual stories, although this may not be persisted in the database. I'm predicting your story model has a heading and a body.
Terminal Console
rails generate model SearchTerm query:string user:references category:references && rake db:migrate
Models
On your existing User and Category models you would add:
# app/models/user.rb
has_many :search_terms
has_many :categories, through: :search_terms
# app/models/category.rb
has_many :search_terms
has_many :stories
Rails Console
This will automatically make it possible for you to do this:
@user = User.last # this is in the console, just to demonstrate
@category = Category.find_by_name("politics")
@user.search_terms.create(query: "rajoy", category: @category)
@user.search_terms.create(query: "pp", category: @category)
@user.search_terms.where(category_id: @category.id).pluck(:query)
# => ["rajoy", "pp"]
Controllers
What you will want to do with your controller (probably the Category controller) is to parse your text field and update the search terms in the database. If you want to require commas and spaces to separate fields, you could do:
@user.search_terms.where(category: @category).delete_all
params[:search_term][:query].split(", ").map { |x| x.gsub("\"", "") }.each do |term|
  @user.search_terms.create(category: @category, query: term)
end
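If you'd rather not depend on an exact comma-plus-space separator, a slightly more forgiving parser might look like the sketch below. parse_terms is a hypothetical helper name, and this version still assumes that no term contains a comma inside its quotes:

```ruby
# Split on commas, trim surrounding whitespace, and drop the double quotes.
# Blank entries (for example from a trailing comma) are discarded.
def parse_terms(input)
  input.split(",").map { |term| term.strip.delete('"') }.reject(&:empty?)
end

terms = parse_terms('rajoy, pp,"popular party" , "socialist party",')
```

The resulting terms array could then be fed to the same create loop as above.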
Front End
Personally though, I'd make the front end a bit less complicated to use, like either just require commas, no quotes, or just require spaces and quotes.
Search
For the grand finale, for the Stories to be displayed that have search terms in their heading:
Story.where(@user.search_terms.where(category: @category).pluck(:query).map { |term| "heading like '%#{term}%'" }.join(" OR "))
I would recommend using the pg_search gem rather than trying to maintain complicated queries like this.
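If you do stay with a hand-built query, note that interpolating the stored terms straight into the SQL string breaks on terms containing a quote and opens the door to SQL injection. A rough sketch of the same OR query with bind placeholders follows. heading_conditions is a hypothetical helper, and it assumes the list of terms is not empty:

```ruby
# Build an array of the form ["heading LIKE ? OR heading LIKE ?", "%a%", "%b%"],
# which is the array-conditions shape that ActiveRecord's `where` accepts.
def heading_conditions(terms)
  clause = terms.map { "heading LIKE ?" }.join(" OR ")
  binds  = terms.map { |term| "%#{term}%" }
  [clause, *binds]
end

conditions = heading_conditions(["rajoy", "popular party"])
# In the app this would be passed on as: Story.where(conditions)
```

The terms themselves never become part of the SQL text, so the database adapter handles the quoting.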
Note: I'm sure there are errors in this, since I wasn't able to actually create the entire app to answer your questions. I hope this helps you get started with what you actually need to do. I encourage you as you work through this to post questions that have some code.
References
Rails guides: choosing habtm or has many through
gem 'pg_search'
Stack Overflow: Search a database based on query

Postgres HStore vs HABTM

I am building an app that has a model that can be tagged with entries from another model, similar to the tagging function of Stack Overflow.
For example:
class Question < ActiveRecord::Base
  has_and_belongs_to_many :tags
end
class Tag < ActiveRecord::Base
  has_and_belongs_to_many :questions
end
I am debating between just setting up a has_and_belongs_to_many relationship with a join table, or adding the tags to a hash using Postgres' hstore feature.
Looking for anyone that has had a similar experience that can speak to performance differences, obstacles, or anything else that should persuade me one way or another.
EDIT:
I think I should also mention that this will be an API that will be using an AngularJS frontend.
You are describing the topic of a great debate :) Normalization vs. denormalization. A many-to-many relationship lets you run nice queries such as "how many people use a certain tag" in a very simple way. HStore is very nice as well, but you end up with thousands of copies of the same tags everywhere. I use both approaches in different projects, but the real problem comes when you decide one day to move your database: with HStore you will be stuck with PostgreSQL or have to rewrite your code. If super high speed is important, you also need to query in different ways, and you often want to load a user record in one fell swoop as fast as possible and show all of its tags, I normally do both. I create a many-to-many relationship, as tags are normally also connected to more objects (a user has many tags from the tags table, and tags are connected to, let's say, brands, which are connected to products, and so on).
Then I create an additional field with hstore or json objects on the user table which adds every tag or removes it when the many to many relationship is destroyed.
To give you an example: in one of my projects I have companies (almost 10 million) who are interested in certain keywords and their ranking on Google. This table has millions of rows but is connected to only 2 million keywords, which are connected to search results. This way I can quickly query which result is searched for by how many people and who they are.
If a customer opens their keyword search page, I load their keywords from a text column with JSON, which is faster than going through the table.
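As a rough, framework-free sketch of the "do both" idea: the normalized links stay the source of truth, and a JSON copy is rewritten whenever a tag is added or removed. Everything here is hypothetical. The Set stands in for the join table and tags_cache for a JSON text column on the users table; in a Rails app the refresh would more likely hang off association callbacks.

```ruby
require "json"
require "set"

# Hypothetical in-memory model of a user with normalized tag links
# plus a denormalized JSON copy for fast reads.
class User
  attr_reader :tags_cache

  def initialize
    @tag_links = Set.new   # stand-in for rows in the join table
    @tags_cache = "[]"     # stand-in for a JSON text column
  end

  def add_tag(name)
    @tag_links.add(name)
    refresh_cache
  end

  def remove_tag(name)
    @tag_links.delete(name)
    refresh_cache
  end

  # Fast read path: parse the cached column instead of walking the links.
  def cached_tags
    JSON.parse(@tags_cache)
  end

  private

  # Rewrite the cached copy from the normalized links.
  def refresh_cache
    @tags_cache = JSON.generate(@tag_links.to_a.sort)
  end
end

user = User.new
user.add_tag("rails")
user.add_tag("postgres")
user.remove_tag("rails")
```

The trade-off is the usual one for denormalization: reads get cheaper, but every write path has to remember to refresh the copy.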

How to group by multiple attributes on children, and then count?

In a Rails 3.2 app I have a User model that has many Awards.
The Award class has :type, :level and :image attributes.
On a User's show page I want to show their Awards, but with some criteria. User.awards should be grouped by both type and level, and for each type-level combination I want to display its image, and a count of the awards.
I'm struggling to construct the queries and views to achieve this (and to explain this clearly).
How can I group on two attributes of a child record, and then display both a count and attribute (i.e. image) of those children?
It took me some time to figure this out because of the complicated mix of Active Record objects, arrays and grouped arrays.
Anyway, in case this is useful for anyone else:
Given a User has many Awards, and Award has attributes :type, :level, :image.
for award in @user.awards.group_by { |award| [award.type, award.level] }.sort_by { |award| [award[0][0], award[0][1]] }
  puts "#{(award[0][0]).capitalize} - Level #{award[0][1]}" # e.g. Award_Name - Level 1
  puts award[1].first.image # outputs the value of award.image, i.e. the image url
  puts award[1].count # counts the number of grouped awards
end
A bit fiddly! Maybe there are ways to optimize this code?
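One way to make this less fiddly is to destructure the [type, level] key in the block instead of indexing into nested arrays. The sketch below uses a plain Struct as a hypothetical stand-in for the Award model so it runs outside Rails; with real records the awards array would be @user.awards:

```ruby
# Hypothetical stand-in for the Award model.
Award = Struct.new(:type, :level, :image)

awards = [
  Award.new("silver", 2, "s2.png"),
  Award.new("gold", 1, "g1.png"),
  Award.new("gold", 1, "g1.png")
]

# Group by [type, level], sort by that key, then build one summary hash per group.
summary = awards
  .group_by { |award| [award.type, award.level] }
  .sort_by { |key, _group| key }
  .map do |(type, level), group|
    { type: type, level: level, image: group.first.image, count: group.size }
  end

summary.each do |row|
  puts "#{row[:type].capitalize} - Level #{row[:level]} (#{row[:count]}): #{row[:image]}"
end
```

Building the summary first keeps the view simple: it only has to loop over hashes with named keys.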
Depending on the database you're using, you have to build a custom SQL query using a GROUP BY on type and level (these are columns on the awards table):
SELECT * FROM awards GROUP BY awards.type, awards.level
(Postgres has a special interpretation of GROUP BY, so check the documentation of the database you're using.)
To write it in Rails, read the documentation: http://guides.rubyonrails.org/active_record_querying.html#group
For the count, you'll have to do it in a second step (Ruby can do it by calling the size method on the Array of ActiveRecord objects the query returns).

ElasticSearch search on many types

I'm using Rails with the Tire gem (for ElasticSearch) and I need to search across multiple models. Something like:
# title is a field in all models
Tire.search :tasks, :projects, :posts, { :title => "word" }
I know I can search models one by one and then handle these results, but that should be unnecessary considering ElasticSearch (Lucene) is document oriented.
Any thoughts?
Thanks,
One possibility is not to treat them as distinct models. In a compound model, every document is an item that belongs to one or more different submodels, identified by a string constant that can be multivalued.
If you want to retrieve only results from one of those submodels, you can add a fixed part to the query which identifies the set of documents belonging to that submodel.
The only caveat is that you need a primary key which is unique (which is not that bad, because you can use something like an implicit document key).

Asking ThinkingSphinx not to search in specific attributes

In a model,
I have defined indexes on some columns and have defined some attributes.
Under some conditions in an external application, I don't want Sphinx to search in the values of some columns.
In the question name, do you mean attributes or fields? I'm guessing fields, as attributes are only used for filters...
So, to search on just specific fields, you can make a query like the following:
Model.search "@(title, body, user) foo bar", :match_mode => :extended
Put all the fields you want to search on within the parentheses, and you should be good to go.
