ElasticSearch search on many types - ruby-on-rails

I'm using Rails with the Tire gem (for ElasticSearch) and I need to search across multiple models. Something like:
# title is a field in all models
Tire.search :tasks, :projects, :posts, { :title => "word" }
I know I can search the models one by one and then merge the results, but that should be unnecessary, considering ElasticSearch (Lucene) is document-oriented.
Any thoughts?
Thanks,

One possibility is not to treat them as distinct models. In a compound model, every document is an item belonging to one or more different submodels, identified by a string constant that can be multivalued.
If you want to retrieve results from only one of those submodels, you can add a fixed part to the query that identifies the set of documents belonging to that submodel.
The only caveat is that you need a primary key that is unique across submodels (which is not that bad, because you can use something like an implicit document key).
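As a sketch of that fixed query part (field names here are assumptions, not from the question), the submodel discriminator can sit in a filter clause alongside the shared title match:

```ruby
# Sketch: build an ElasticSearch-style query hash that matches the shared
# "title" field but restricts hits to chosen submodels via a discriminator
# field (here called "submodel"; the name is an assumption).
def multi_model_query(word, submodels)
  {
    query: {
      bool: {
        must:   [{ match: { title: word } }],
        filter: [{ terms: { submodel: submodels } }]
      }
    }
  }
end

q = multi_model_query("word", %w[task post])
q[:query][:bool][:filter].first[:terms][:submodel]
# => ["task", "post"]
```

The same hash could then be handed to whatever client you use; with Tire you would pass it through its DSL rather than raw JSON.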

Related

Rails Search One Column On Multiple Saved Terms (Saved Searches In Model)

One table, one column ('headline' in an RSS feed reader). On the front end, I want a text area in which I can enter a comma-separated list of search terms, some multi-word, like for 'politics':
rajoy, pp, "popular party", "socialist party", etc
This could either be stored as part of a separate search model or as a keyword column on the 'category' or 'story' models, so they can be edited and improved with different terms from the front end, as a story develops.
In the RSS reader, have a series of links, one for each story or category, that, on being clicked return the headlines that contain one (or more) of the search terms from the stored list.
In a later version, it would be good to find headlines containing several of the terms in the list, but let's start simple.
Have been doing lots of reading about postgres, rails, different types of searches and queries, but can't seem to find what I want, which I understand is basically "search 'headlines' column against this list of search terms".
Sounds like it might be an array thing that's more to do with controllers in Rails than postgres, or cycling through a giant OR query with some multi-word terms, but I'm not sure.
Does anyone have any better pointers about how to start?
Users
If this will be user specific, I would start with a User model that is responsible for persisting each unique set of search terms. Think logon or session.
Assuming you use the Category approach mentioned before, and assuming there's a column called name, each search term would be stored as a separate record in the database. Think tags.
headlines that contain one (or more) of the search terms from the stored list
Categories
Since each Category has many terms, and all the queries are going to be OR queries, a model that joins the User and Category, storing a search term would be appropriate.
I'm also assuming you have a Story model that contains the actual stories, although this may not be persisted in the database. I'm predicting your story model has a heading and a body.
Terminal Console
rails generate model SearchTerm query:string user:references category:references && rake db:migrate
Models
On your existing User and Category models you would add:
# app/models/user.rb
has_many :search_terms
has_many :categories, through: :search_terms
# app/models/category.rb
has_many :search_terms
has_many :stories
Rails Console
This will automatically make it possible for you to do this:
@user = User.last # this is in the console, just to demonstrate
@category = Category.find_by_name("politics")
@user.search_terms.create(query: "rajoy", category: @category)
@user.search_terms.create(query: "pp", category: @category)
@user.search_terms.where(category_id: @category.id).pluck(:query)
# => ["rajoy", "pp"]
Controllers
What you will want to do with your controller (probably the Category controller) is to parse your text field and update the search terms in the database. If you want to require commas and spaces to separate fields, you could do:
@user.search_terms.where(category: @category).delete_all
params[:search_term][:query].split(", ").map { |x| x.gsub("\"", "") }.each do |term|
  @user.search_terms.create(category: @category, query: term)
end
Front End
Personally though, I'd make the front end a bit less complicated to use, like either just require commas, no quotes, or just require spaces and quotes.
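If you only require commas and no quotes, the parsing step collapses to a one-liner; a quick pure-Ruby sketch:

```ruby
# Split on commas and trim whitespace; multi-word terms work without quotes.
def parse_terms(input)
  input.split(",").map(&:strip).reject(&:empty?)
end

parse_terms("rajoy, pp, popular party")
# => ["rajoy", "pp", "popular party"]
```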
Search
For the grand finale, for the Stories to be displayed that have search terms in their heading:
Story.where(@user.search_terms.where(category: @category).pluck(:query).map { |term| "heading like '%#{term}%'" }.join(" OR "))
I would recommend using pg_search gem rather than trying to maintain complicated queries like this.
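If you do keep the raw-SQL approach, a safer variant uses bind parameters instead of string interpolation, so quotes and % in terms can't break the SQL. A sketch of the pure-Ruby clause-building part:

```ruby
# Build a parameterized OR clause; the result would be splatted into
# Story.where(clause, *values) in the actual app.
def like_clause(terms)
  clause = terms.map { "heading LIKE ?" }.join(" OR ")
  values = terms.map { |t| "%#{t}%" }
  [clause, *values]
end

like_clause(["rajoy", "pp"])
# => ["heading LIKE ? OR heading LIKE ?", "%rajoy%", "%pp%"]
```

`Story.where(*like_clause(terms))` would then do the lookup; this is a sketch, not the answer's original code.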
Note: I'm sure there are errors in this, since I wasn't able to actually create the entire app to answer your questions. I hope this helps you get started with what you actually need to do. I encourage you as you work through this to post questions that have some code.
References
Rails guides: choosing habtm or has many through
gem 'pg_search'
Stack Overflow: Search a database based on query

Rails Different types of Model have different number of and type of fields

In rails web app, user can "make" a Document. There are different types of documents :
Loan
Business
Insurance
Each type of Document will have some things in common such as: account_num, doc_id, at least 1 name, but then they have different attributes.
For example:
Loan is only doc with a loan_type field
Business documents can have 1+ name attributes
If these docs may have different numbers of attributes, do they need to be completely separate models, or is there a way to incorporate a doc_type attribute on Document that determines what attributes, and how many, are associated with the Document? If so, what would that look like?
What you're describing is the express purpose of single-table inheritance in Rails.
Use one table with a super-set of all the fields from all the models. Add a type column, and then create your three models, inheriting from a base model, and you're pretty much done.
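To illustrate the type-column mechanism without a database, here is a plain-Ruby sketch: each subclass reports its class name, which is exactly the string Rails stores in the type column.

```ruby
# Plain-Ruby illustration of STI's type column (no ActiveRecord here).
# In Rails, Document would inherit from ActiveRecord::Base and the "type"
# column would be filled in automatically on save.
class Document
  def type
    self.class.name
  end
end

class Loan < Document; end       # row saved with type = "Loan"
class Business < Document; end   # row saved with type = "Business"
class Insurance < Document; end  # row saved with type = "Insurance"

Loan.new.type  # => "Loan"
```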
Depends on what you're going to need, but in general if your models have a strong commonality that part can all be in the same table, and include a type column that specifies the class name. This is called single table inheritance.
Any differences between the models give you some interesting options. If there are only a few differences, the columns could simply be included. If there are several, or the columns in question may be only sparsely populated, you can introduce a new table for the extra columns that belongs_to one of the models. For example, you could have an alternate_names table for businesses.
class AlternateName < ActiveRecord::Base # model class names are singular by convention
  belongs_to :business
end
In the unlikely case that you don't need to search on the extra data, you can even keep it in the same table, with a column named something like extra_data, and serialize a hash of extra attributes. Each class can handle this data as appropriate.
class Document < ActiveRecord::Base
  # your code
  serialize :extra_data
end

class Business < Document
  def names
    [name] + (extra_data[:names] || [])
  end
end
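Since extra_data may never have been set for some rows, a nil guard is worth having. The same idea as a self-contained sketch, plain Ruby with no database:

```ruby
# Self-contained sketch of the serialized extra_data idea (no ActiveRecord):
# a hash of extra attributes rides along with the regular ones, and each
# class interprets it as appropriate.
class Business
  attr_reader :name, :extra_data

  def initialize(name:, extra_data: {})
    @name = name
    @extra_data = extra_data || {}
  end

  def names
    [name] + (extra_data[:names] || [])
  end
end

Business.new(name: "Acme", extra_data: { names: ["Acme Trading"] }).names
# => ["Acme", "Acme Trading"]
```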

rails semi-complex STI with ancestry data model planning the routes and controllers

I'm trying to figure out the best way to manage my controller(s) and models for a particular use case.
I'm building a review system where a User may build a review of several distinct types with a Polymorphic Reviewable.
Country (has_many reviews & cities)
Subdivision/State (optional; sometimes it doesn't exist; also reviewable, has_many cities)
City (has places & review)
Borough (optional, also reviewable, ex: Brooklyn)
Neighborhood (optional & reviewable, ex: Williamsburg)
Place (belongs to city)
I'm also wondering about adding more complexity. I also want to include subdivisions occasionally: i.e. for the US I might add Texas, or for Germany, Bavaria, and have it be reviewable as well. But not every country has regions, and even those that do might never be reviewed. So it's not at all strict. I would like it to be as simple and flexible as possible.
It'd kinda be nice if the user could just land on one form and select either a city or a country, and then drill down using data from say Foursquare to find a particular place in a city and make a review.
I'm really not sure which route I should take. For example, what happens if I have a Country and a City... and then I decide to add a Borough?
Could I give places tags (e.g. Williamsburg, Brooklyn) that belong_to NY City, with the tags belonging to NY?
Tags are more flexible and can optionally describe what areas a place might be in; the tags belong to a city, but can also have places and be reviewable.
So I'm looking for suggestions for anyone who's done something related.
Using Rails 3.2, and mongoid.
I've built something very similar and found two totally different way that both worked well.
Way 1: Country » Subcountry » City » Neighborhood
The first way that worked for me is to do it with Country, Subcountry, City, Neighborhood. This maps well to major geocoding services and is sufficient for most simple uses. This can be STI (as in your example) or with multiple tables (how I did it).
In your example you wrote "Subdivision/State". My two cents is to avoid using those terms and instead use "Subcountry" because it's an ISO standard and you'll save yourself some confusion when another developer thinks a subdivision is a tiny neighborhood of houses, or when you have a non-U.S. country that doesn't use states, but instead uses provinces.
This is what I settled on after many experiments with trying model names like Region, District, Area, Zone, etc. and abandoning these as too vague or too specific. In your STI case it may be fine to use more names.
One surprise is that it's a big help to write associations that go multi-level, for example to say country.cities (skipping subcountry). This is because sometimes the intermediary model doesn't exist (i.e. there's no subcountry). In your STI, this may be trickier.
Also, you get a big speedup if you denormalize your tables; for example, my city table has a country field. This makes updating info a bit trickier, but it's worth it. Your STI could implement an equivalent to this by using tags.
Way 2: Zones that are lists of lat/lng shapes with bounding boxes
The second way is to use an arbitrary Zone model and store latitude longitude shapes. This gives you enormous flexibility, and you can pre-calculate when shapes contain other shapes, or intersect them. So your "drill down" becomes "show me shapes immediately inside this one".
Postgres has some good geocoding helpers for this, and you can speed up lookups by using bounding boxes of min/max lat/lng. We also stored data like the expected center point of a Zone (which is where we would drop a pin on a map) and a radius (useful for queries like "show me all the x items within y distance").
With this approach we were able to do interesting zones like "Broadway in New York" which isn't really a neighborhood so much as long street, and "The Amazon Basin" which is defined by the river, not a country.
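The "show me shapes immediately inside this one" lookups reduce to bounding-box arithmetic. A minimal sketch (the field names are assumptions, and this is only the fast pre-filter before any exact shape test):

```ruby
# Minimal bounding-box containment checks for the Zone idea described above.
Zone = Struct.new(:min_lat, :max_lat, :min_lng, :max_lng) do
  # Is a lat/lng point inside this box?
  def contains_point?(lat, lng)
    (min_lat..max_lat).cover?(lat) && (min_lng..max_lng).cover?(lng)
  end

  # Is another box entirely inside this one?
  def contains_box?(other)
    contains_point?(other.min_lat, other.min_lng) &&
      contains_point?(other.max_lat, other.max_lng)
  end
end

nyc      = Zone.new(40.49, 40.92, -74.26, -73.69)
brooklyn = Zone.new(40.57, 40.74, -74.05, -73.83)
nyc.contains_box?(brooklyn)  # => true
```

In a database, the same comparisons become an indexable WHERE clause on the four min/max columns.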
STI Model with Ancestry and with a Polymorphic Relation
I built something similar for previous projects and went for STI with ancestry, because it is very flexible and allows you to model a tree of nodes. Not all intermediate nodes have to exist (as in your example of State/Subdivision/Subcountry).
For Mongoid there are at least two ancestry gems: mongoid-ancestry and mongestry (links below).
As an added benefit of using STI with ancestry, you can also model other location-related nodes, let's say restaurants or other places.
You can also add geo-location information lat/lon to all your nodes, so you can geo-tag them.
In the example below I just used one set of geo-location coordinates (center point) - but you could of course also add several geo-locations to model a bounding box.
You can arrange the nodes in any order you like, e.g. through this_node.children.create(...).
When using MySQL with ancestry, you can pass in the type of the newly created node. There must be a similar way with mongoid-ancestry (haven't tried it).
In addition to the tree-structured nodes, you can use a polymorphic collection to model the Reviews, and also Tags (well, there's an acts_as_taggable gem, so you don't have to model Tags yourself).
Compared to modeling every class with its own collection, this STI approach is much more flexible and keeps the schema simple. It's very easy to add a new type of node later.
This paradigm can be used with either Mongoid or SQL data stores.
# app/models/geo_node.rb
class GeoNode # this is the parent class; geo_nodes is the table name / collection name.
  include Mongoid::Document

  has_ancestry    # either this (mongoid-ancestry)
  # has_mongestry # or this (mongestry); use one of the two, not both

  has_many :reviews, :as => :reviewable

  field :lat, type: Float
  field :lon, type: Float
  field :name, type: String
  field :desc, type: String
  # ...
end
# app/models/geo_node/country.rb
class Country < GeoNode
end
# app/models/geo_node/subcountry.rb
class Subcountry < GeoNode
end
# app/models/geo_node/city.rb
class City < GeoNode
end
# app/models/review.rb
class Review
  include Mongoid::Document

  belongs_to :reviewable, :polymorphic => true

  field :title
  field :details
end
Check these links:
mongoid-ancestry gem https://github.com/skyeagle/mongoid-ancestry
mongestry gem https://github.com/DailyDeal/mongestry
mongoid-tree gem https://github.com/benedikt/mongoid-tree
Gist on Mongoid STI: https://gist.github.com/507721
ancestry gem (for MySQL)
A big thanks to Stefan Kroes for his awesome ancestry gem, and to Anton Orel for adapting it to Mongoid (mongoid-ancestry). ancestry is one of the most useful gems I've seen.
Sounds like a good candidate for nested routes/resources. In routes.rb, do something like:
resources :cities do
  resources :reviews
end

resources :countries do
  resources :reviews
end

resources :places do
  resources :reviews
end
Which should produce something along these lines in rake routes:
city_reviews GET /cities/:city_id/reviews(.:format) reviews#index
country_reviews GET /countries/:country_id/reviews(.:format) reviews#index
place_reviews GET /places/:place_id/reviews(.:format) reviews#index
...etc., etc.
In the controller action, you look up the :id of the reviewable record and send back only the reviews that are attached to that reviewable object.
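One hedged sketch of that lookup: scan params for the *_id key the nested route supplies. The pure-Ruby core is shown below; in a real controller you would then do something like prefix.classify.constantize.find(id) (names here are illustrative, not from the answer):

```ruby
# Pull the first *_id parameter out of the request params, returning the
# resource prefix and the id, e.g. ["city", "5"] for /cities/5/reviews.
def reviewable_param(params)
  params.each do |name, value|
    return [Regexp.last_match(1), value] if name.to_s =~ /\A(.+)_id\z/
  end
  nil
end

reviewable_param(city_id: "5", page: "2")
# => ["city", "5"]
```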
Also, see the nested resources section of the Rails Routing Guide, and this RailsCast on Polymorphic relationships, which has a quick section on routing, and getting everything to line up properly.
I would probably keep my data model very unrestrictive and handle any specifics about which filters to display in the controller/view. Make a mapping table where you can map attributes (i.e. city, borough, state, country) polymorphically, and also polymorphically to reviewable.
By assuming many-to-many, your schema is as flexible as it can be, and you can restrict which mappings to create using validations or filters in your models.
It's basically tagging, as you alluded to, but not really using a tags model per se; rather, a polymorphic association to different models that all act like tags.
Keep your DB schema clean and keep the business logic in ruby.

Rails tokenized text search across fields with performance in mind

I have a Rails app on Heroku whose search I'm looking to make more user-friendly. To do this, I'd like to allow users to text-search across multiple fields on multiple models through associations. The input from the user could be a mix of text from any of these fields (and often might span multiple fields) in no particular order.
Example: if you had a car database and wanted to allow the user to search "Honda Fit 2011", where "Honda" came from the manufacturer table, "Fit" came from the model table, and "2011" came from the model_year table.
I'm thinking that I need to build a single field on the root record that contains the unique list of words from each of these fields, and then tokenize the user's input. But that would cause me to use an IN clause, which I'm not sure could benefit from full-text search plugins like pg_search.
So, my question is: what's a good way to achieve a search like this in Rails?
I would take a look at Sunspot_rails. It uses Solr as its search engine, but allows you to index content in all sorts of fruity ways. For instance, I have models indexed with their associations pretty simply:
searchable do
  text :description
  text :category do
    category.present? ? category.name : ''
  end
end
You can then search with:
TYPES = [Asset, Product]

Sunspot.search(*TYPES) do |q|
  q.fulltext search_str
end

A database design for variable column names

I have a situation that involves Companies, Projects, and Employees who write Reports on Projects.
A Company owns many projects, many reports, and many employees.
One report is written by one employee for one of the company's projects.
Companies each want different things in a report. Let's say one company wants to know about project performance and speed, while another wants to know about cost-effectiveness. There are 5-15 criteria, set differently by each company, which ALL apply to all of that company's project reports.
I was thinking about different ways to do this, but my current stalemate is this:
To company table, add text field criteria, which contains an array of the criteria desired in order.
In the report table, have a company_id and columns criterion1, criterion2, etc.
I am completely aware that this is typically considered horrible database design - inelegant and inflexible. So, I need your help! How can I build this better?
Conclusion
I decided to go with the serialized option in my case, for these reasons:
My requirements for the criteria are simple - no searching or sorting will be required of the reports once they are submitted by each employee.
I wanted to minimize database load - where these are going to be implemented, there is already a large page with overhead.
I want to avoid complicating my database structure for what I believe is a relatively simple need.
CouchDB and Mongo are not currently in my repertoire so I'll save them for a more needy day.
This would be a great opportunity to use NoSQL! Seems like the textbook use-case to me. So head over to CouchDB or Mongo and start hacking.
With conventional DBs you are slightly caught in the problem of how much to normalize your data:
A sort of "good" way (meaning very normalized) would look something like this:
class Company < AR::Base
  has_many :reports
  has_many :criteria
end

class Report < AR::Base
  belongs_to :company
  has_many :criteria_values
  has_many :criteria, :through => :criteria_values
end

class Criteria < AR::Base # should be Criterion but whatever
  belongs_to :company
  has_many :criteria_values
  # one attribute 'name' (or 'type' and you can mess with STI)
end

class CriteriaValues < AR::Base
  belongs_to :report
  belongs_to :criteria
  # one attribute 'value'
end
This makes something very simple and fast in NoSQL a triple or quadruple join in SQL and you have many models that pretty much do nothing.
Another way is to denormalize:
class Company < AR::Base
  has_many :reports
  serialize :criteria
end

class Report < AR::Base
  belongs_to :company
  serialize :criteria_values

  def criteria
    self.company.criteria
  end

  # custom code here to validate that criteria_values correspond to criteria etc.
end
Related to that, a rather clever way of serializing at least the criteria (and maybe the values, if they were all boolean) is to use bit fields. This basically gives you more or less easy migrations (hard to delete and modify, but easy to add) and searchability without any overhead.
A good plugin that implements this is Flag Shih Tzu which I've used on a few projects and could recommend.
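For intuition, the bit-field trick amounts to packing each boolean criterion into one bit of a single integer column. This hand-rolled sketch shows the mechanism flag_shih_tzu automates (the flag names are made up):

```ruby
# Each criterion gets one bit in a single integer column.
FLAGS = { performance: 1, speed: 2, cost_effectiveness: 4 }

# Combine a list of flags into one integer.
def pack_flags(flags)
  flags.sum { |f| FLAGS.fetch(f) }
end

# Test whether a given flag's bit is set.
def flag_set?(bits, flag)
  (bits & FLAGS.fetch(flag)) != 0
end

bits = pack_flags([:performance, :cost_effectiveness])  # => 5
flag_set?(bits, :speed)                                 # => false
flag_set?(bits, :performance)                           # => true
```

Adding a new criterion is just a new power-of-two entry, which is why migrations stay easy.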
Variable columns (e.g. crit1, crit2, etc.)
I'd strongly advise against it. You don't get much benefit (it's still not very searchable, since you don't know which column your info is in), and it leads to maintainability nightmares. Imagine your DB gets to a few million records and suddenly someone needs 16 criteria. What could have been a complete non-issue is suddenly a migration that adds a completely useless field to millions of records.
Another problem is that a lot of the ActiveRecord magic doesn't work with this: you'll have to figure out what crit1 means by yourself, and if you want to add validations on these fields, that adds a lot of pointless work.
So to summarize: Have a look at Mongo or CouchDB and if that seems impractical, go ahead and save your stuff serialized. If you need to do complex validation and don't care too much about DB load then normalize away and take option 1.
Well, when you say "to the company table, add a text field criteria, which contains an array of the criteria desired in order", that smells like the company table wants to be normalized: you might break each criterion out into one of 15 columns called "criterion1", ..., "criterion15", where any or all columns can default to null.
To me, you are on the right track with your report table. Each row in that table might represent one report; and might have corresponding columns "criterion1",...,"criterion15", as you say, where each cell says how well the company did on that column's criterion. There will be multiple reports per company, so you'll need a date (or report-number or similar) column in the report table. Then the date plus the company id can be a composite key; and the company id can be a non-unique index. As can the report date/number/some-identifier. And don't forget a column for the reporting-employee id.
Any and every criterion column in the report table can be null, meaning (maybe) that the employee did not report on this criterion; or that this criterion (column) did not apply in this report (row).
It seems like that would work fine. I don't see that you ever need to do a join. It looks perfectly straightforward, at least to these naive and ignorant eyes.
Create a criteria table that lists the criteria for each company (company 1 .. * criteria).
Then, create a report_criteria table (report 1 .. * report_criteria) that lists the criteria for that specific report based on the criteria table (criteria 1 .. * report_criteria).
