I'm working with a Rails 3 application to allow people to apply for grants and such. We're using Elasticsearch/Tire as a search engine.
Documents (e.g., grant proposals) are composed of many answers of varying types, like contact information or essays. In ActiveRecord (and relational databases in general), you can't specify a polymorphic "has_many" relation directly, so instead:
class Document < ActiveRecord::Base
has_many :answerings
end
class Answering < ActiveRecord::Base
belongs_to :document
belongs_to :question
belongs_to :payload, :polymorphic => true
end
"Payloads" are models for individual answer types: contacts, narratives, multiple choice, and so on. (These models are namespaced under "Answerable.")
class Answerable::Narrative < ActiveRecord::Base
has_one :answering, :as => :payload
validates_presence_of :narrative_content
end
class Answerable::Contact < ActiveRecord::Base
has_one :answering, :as => :payload
validates_presence_of :fname, :lname, :city, :state, :zip...
end
Conceptually, an answer is composed of an answering (which functions like a join table and stores metadata common to all answers) and an answerable (which stores the actual content of the answer). This works great for writing data. Search and retrieval, not so much.
I want to use Tire/ES to expose a more sane representation of my data for searching and reading. In a normal Tire setup, I'd wind up with (a) an index for answerings and (b) separate indices for narratives, contacts, multiple choices, and so on. Instead, I'd like to just store Documents and Answers, possibly as parent/child. The Answers index would merge data from Answerings (id, question_id, updated_at...) and Answerables (fname, lname, email...). This way, I can search Answers from a single index, filter by type, question_id, document_id, etc. The updates would be triggered from Answering, but each answering will then pull in information from its answerable. I'm using RABL to template my search engine inputs, so that's easy enough.
Answering.find(123).to_indexed_json # let's say it's a narrative
=> { id: 123, question_id: 10, document_id: 24, updated_at: ..., updated_by: "root@me.com", narrative_content: "Back in the day, when I was a teenager, before I had...", answerable_type: "narrative" }
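To make the merged shape concrete, here is a minimal plain-Ruby sketch of an answering pulling its payload's fields into one hash. The Struct classes and the to_indexed_hash helper are hypothetical stand-ins, not the real ActiveRecord models or the RABL template, and the field list is trimmed for brevity.

```ruby
require "json"

# Hypothetical stand-ins for the ActiveRecord models in the question.
Narrative = Struct.new(:narrative_content)
Contact   = Struct.new(:fname, :lname)

Answering = Struct.new(:id, :question_id, :document_id, :payload) do
  # Merge the answering's own metadata with whatever fields its payload has.
  def to_indexed_hash
    {
      id: id,
      question_id: question_id,
      document_id: document_id,
      answerable_type: payload.class.name.downcase
    }.merge(payload.to_h)
  end

  # Serialize the merged hash as the document to be indexed.
  def to_indexed_json
    JSON.generate(to_indexed_hash)
  end
end

answering = Answering.new(123, 10, 24, Narrative.new("Back in the day..."))
puts answering.to_indexed_json
```

Because every payload type contributes its own keys, the resulting documents all share the answering metadata while differing in their content fields, which is what lets a single index be filtered by answerable_type.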
So, I have a couple of questions.
The goal is to provide a single-query solution for all answers, regardless of underlying (answerable) type. I've never set something like this up before. Does this seem like a sane approach to the problem? Can you foresee wrinkles I can't? Alternatives/suggestions/etc. are welcome.
The tricky part, as I see it, is mapping. My plan is to put explicit mappings in the Answering model for the fields that need indexing options, and just let the default mappings take care of the rest:
mapping do
indexes :question_id, :index => :not_analyzed
indexes :document_id, :index => :not_analyzed
indexes :narrative_content, :analyzer => :snowball
indexes :junk_collection_total, :index => :not_analyzed
indexes :some_other_crazy_field, :index
[...]
If I don't specify a mapping for some field (say, "fname"), will Tire/ES fall back on dynamic mapping? (Should I explicitly map every field that will be used?)
Thanks in advance. Please let me know if I can be more specific.
Indexing is the right way to go about this. Along with indexing field names, you can index the results of methods.
mapping do
indexes :payload_details, :as => 'payload_details', :analyzer => 'snowball',:boost => 0
end
def payload_details
"#{payload.fname} #{payload.lname}" #etc.
end
The indexed value becomes a duck type, so if you index all of the values that you reference in your view, the data will be available. If you access an attribute that is not indexed on the model of the indexed item, it will grab the instance from ActiveRecord. If you access an attribute of a related model, I am pretty sure you get a reference error, but the dynamic finder may take over.
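Since the payload can be any of several types, a method like payload_details probably has to cope with attributes that only some payloads have. A rough plain-Ruby sketch follows, assuming Struct stand-ins for the payload classes and a hypothetical SEARCHABLE_ATTRIBUTES list; only the attribute names come from the question.

```ruby
# Hypothetical payload classes; only the attribute names fname, lname and
# narrative_content come from the question.
Contact   = Struct.new(:fname, :lname)
Narrative = Struct.new(:narrative_content)

# Attributes worth folding into the single searchable string (an assumption).
SEARCHABLE_ATTRIBUTES = [:fname, :lname, :narrative_content].freeze

# Build one string from whichever of those attributes the payload responds to.
def payload_details(payload)
  SEARCHABLE_ATTRIBUTES
    .select { |attr| payload.respond_to?(attr) }
    .map { |attr| payload.public_send(attr) }
    .compact
    .join(" ")
end

puts payload_details(Contact.new("Ada", "Lovelace"))
puts payload_details(Narrative.new("Back in the day..."))
```

In the real model this would be an instance method on Answering reading from its own payload, as in the snippet above.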
I have a Gift model:
class Gift
include Mongoid::Document
include Mongoid::Timestamps
has_many :gift_units, :inverse_of => :gift
end
And I have a GiftUnit model:
class GiftUnit
include Mongoid::Document
include Mongoid::Timestamps
belongs_to :gift, :inverse_of => :gift_units
end
Some of my gifts have gift_units, but others have not. How do I query for all the gifts where gift.gift_units.size > 0?
FYI: Gift.where(:gift_units.exists => true) does not return anything.
That has_many is an assertion about the structure of GiftUnit, not the structure of Gift. When you say something like this:
class A
has_many :bs
end
you are saying that instances of B have an a_id field whose values are ids for A instances, i.e., for any b which is an instance of B, you can say A.find(b.a_id) and get an instance of A back.
MongoDB doesn't support JOINs so anything in a Gift.where has to be a Gift field. But your Gifts have no gift_units field so Gift.where(:gift_units.exists => true) will never give you anything.
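The point that the link lives on the GiftUnit side can be illustrated with a small in-memory sketch. The Structs and the gifts_with_units helper are hypothetical stand-ins for the two collections, not Mongoid code.

```ruby
# In-memory stand-ins for the two collections (hypothetical sample data).
Gift     = Struct.new(:id, :name)
GiftUnit = Struct.new(:id, :gift_id)

# Gifts that have at least one unit: collect the distinct gift_id values
# from the unit side, then pick the gifts with those ids.
def gifts_with_units(gifts, gift_units)
  referenced_ids = gift_units.map(&:gift_id).uniq
  gifts.select { |gift| referenced_ids.include?(gift.id) }
end

gifts = [Gift.new(1, "mug"), Gift.new(2, "pen")]
units = [GiftUnit.new(10, 1), GiftUnit.new(11, 1)]
puts gifts_with_units(gifts, units).map(&:name).inspect
```

Nothing on a Gift record itself says whether it has units. That information has to come from the GiftUnit records, or from a field on Gift that is kept in sync with them.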
You could probably use aggregation through GiftUnit to find what you're looking for but a counter cache on your belongs_to relation should work better. If you had this:
belongs_to :gift, :inverse_of => :gift_units, :counter_cache => true
then you would get a gift_units_count field in your Gifts and you could:
Gift.where(:gift_units_count.gt => 0)
to find what you're looking for. You might have to add the gift_units_count field to Gift yourself. I'm finding conflicting information about this, but I'm told (by a reliable source) in the comments that Mongoid 4 creates the field itself.
If you're adding the counter cache to existing documents then you'll have to use update_counters to initialize them before you can query on them.
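Why existing documents need their counters initialized can be seen in a toy model of a counter cache. This is plain Ruby, not Mongoid, and reset_gift_units_count is a hypothetical helper name used only for illustration.

```ruby
# Hypothetical in-memory model of a gift carrying a cached unit count.
class Gift
  attr_reader :id
  attr_accessor :gift_units_count

  def initialize(id)
    @id = id
    @gift_units_count = nil # not yet initialized, as on pre-existing documents
  end
end

GiftUnit = Struct.new(:gift_id)

# Recompute the cached count from the units that actually exist.
def reset_gift_units_count(gift, gift_units)
  gift.gift_units_count = gift_units.count { |unit| unit.gift_id == gift.id }
end

# The query side only ever looks at the cached field, never at the units.
def gifts_with_units(gifts)
  gifts.select { |gift| gift.gift_units_count.to_i > 0 }
end
```

Until the count has been recomputed for each existing gift, a query on the cached field misses gifts that do have units.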
I tried to find a solution for this problem several times already and always gave up. I just got an idea how this can be easily mimicked. It might not be a very scalable way, but it works for limited object counts. The key to this is a sentence from this documentation where it says:
Class methods on models that return criteria objects are also treated like scopes, and can be chained as well.
So, to get this done, you can define a class method like so:
def self.with_units
ids = Gift.all.select{|g| g.gift_units.count > 0}.map(&:id)
Gift.where(:id.in => ids)
end
The advantage is that you can run all kinds of queries on the associated (GiftUnit) model and return the Gift instances for which those queries are satisfied (which was the case for me). Most importantly, you can chain further queries like so:
Gift.with_units.where(:some_field => some_value)
My task is to abstract/inherit an ActiveRecord class. I'm making a blog where Post is a base superclass with title, slug, dates, etc.: all the redundant stuff you would expect to find.
Here's where things take a turn: I want to subclass Post into many other post types, such as audio post, video post, image post, and vanilla post. I think you get the point. Obviously each subtype will have its own respective attributes and members.
Instead of creating a name, slug, etc., for each post subtype, what is the best practice to inherit or possibly interface the base class? (I do favor composition over inheritance.)
Once I figure out how to properly abstract out my models, I would like to then figure out some polymorphic way to say something like Blog.find(1).posts and get an array of all the post types.
I realize that it may not be optimal for performance to query all the post types in a polymorphic way, so feel free to suggest a better way.
While I personally also prefer composition over inheritance, ActiveRecord does not. In this case, if you want to use tools that ActiveRecord offers, you should take a look at Single Table Inheritance, which would take care of both of your questions. It does use inheritance, however.
Switching to a non-ActiveRecord ORM may offer you a way of doing this without having to do everything via inheritance. I've used DataMapper, which prefers composition, with success in the past, but it isn't as feature-packed as ActiveRecord and may not offer what you need.
Other than single table inheritance, you may also consider using a has_one association.
Each of your subtypes has one post-info, which holds the general post name, slug, etc. (and a post-info belongs to a subtype polymorphically).
In this way, you would have a table of post-infos, and a table for each subtype.
However, in the model you have to do a little bit more handling:
class PostInfo < ActiveRecord::Base
belongs_to :post, :polymorphic => true
# will need these 2 fields: :post_id, :post_type (might be AudioPost, ImagePost, etc)
end
class AudioPost < ActiveRecord::Base
has_one :post_info, :as => :post
# you may also want these:
accepts_nested_attributes_for :post_info
delegate :name, :slug, :posted_at, :to => :post_info
end
So now if you want to get all the posts, you may:
Blog.find(1).post_infos
post_info.post # => audio_post, image_post, or whatever depending on post_type
If you don't want to use .post_infos, you may also change all those names, such as:
class Post < ActiveRecord::Base
belongs_to :actual_post, :polymorphic => true # actual_post_id, actual_post_type
end
class AudioPost < ActiveRecord::Base
has_one :post, :as => :actual_post
accepts_nested_attributes_for :post
delegate :name, :slug, :posted_at, :to => :post
end
Now, you have:
posts = Blog.find(1).posts
actual_post = posts.first.actual_post # => an audio_post instance
actual_post.name # => same as actual_post.post.name, so you do not need the name field in the AudioPost model
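The composition-plus-delegation idea can also be tried outside Rails. Below is a minimal sketch using Ruby's standard Forwardable module in place of ActiveSupport's delegate. The PostInfo Struct and the audio_url attribute are hypothetical stand-ins, not part of the models above.

```ruby
require "forwardable"

# Shared attributes live in one object (hypothetical plain-Ruby stand-in
# for the PostInfo / Post record).
PostInfo = Struct.new(:name, :slug, :posted_at)

class AudioPost
  extend Forwardable

  attr_reader :post_info, :audio_url

  # Forward the common readers to the composed PostInfo.
  def_delegators :post_info, :name, :slug, :posted_at

  def initialize(post_info, audio_url)
    @post_info = post_info
    @audio_url = audio_url # subtype-specific attribute (an assumption)
  end
end

post = AudioPost.new(PostInfo.new("Episode 1", "episode-1", Time.now), "ep1.mp3")
puts post.slug
```

The subtype exposes name, slug, and posted_at as if they were its own, while the values are stored only once in the shared object.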
I have got two classes
class Claim
include Mongoid::Document
embeds_many :claim_fields
belongs_to :user
...
end
class ClaimField
include Mongoid::Document
embedded_in :claim
field :title
field :value
...
end
I need to fetch all unique values for claim_fields with a given title for my claims, through the db (not in Ruby: it is too slow for thousands of records).
I've already tried this:
user = User.find(...)
Claim.collection.distinct("claim_fields.value", {:user_id => user.id, "claim_fields.title" => some_title})
# that is the same as user.claims.find(...).distinct("claim_fields.value")
But it returns ALL claim_fields values, and I need it to return only values for claim_fields with title that I need.
PS: It looks like I need some MapReduce here.
The fundamental problem here is that MongoDB queries only return entire documents. You are filtering on claim_fields.title, but the system is returning all Claim documents that match.
You're doing a distinct, but MongoDB treats sub-objects and Documents differently. As a result, the distinct is probably not doing what you want it to.
There are two possible solutions here:
1. Pre-calculate via M/R (as you suggest).
2. Break these out into two collections.
Regarding #2, there is no requirement to embed objects as you have. Embedding should be done based on the queries you plan to perform most. So if this is a common query, then it's fair to make these separate documents.
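The difference between matching whole documents and matching individual embedded fields can be shown on in-memory data. This is only an illustration of the semantics with hypothetical sample data, not a replacement for doing the work in the database.

```ruby
# Hypothetical sample data shaped like the Claim documents in the question.
CLAIMS = [
  { user_id: 1, claim_fields: [{ title: "city",  value: "Oslo" },
                               { title: "color", value: "red" }] },
  { user_id: 1, claim_fields: [{ title: "city",  value: "Rome" },
                               { title: "city",  value: "Oslo" }] }
].freeze

# Document-level match: keep every claim that has at least one field with
# the title, then collect the values of all of its fields.
def values_from_matching_claims(claims, title)
  claims
    .select { |claim| claim[:claim_fields].any? { |f| f[:title] == title } }
    .flat_map { |claim| claim[:claim_fields].map { |f| f[:value] } }
    .uniq
end

# Element-level match: filter the embedded fields themselves first.
def values_from_matching_fields(claims, title)
  claims
    .flat_map { |claim| claim[:claim_fields] }
    .select { |f| f[:title] == title }
    .map { |f| f[:value] }
    .uniq
end

p values_from_matching_claims(CLAIMS, "city")
p values_from_matching_fields(CLAIMS, "city")
```

The first method also returns "red", because the whole first claim matched. The second returns only the city values, which is the result the question is after.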
I have three models, basically:
class Vendor
has_many :items
end
class Item
has_many :sale_items
belongs_to :vendor
end
class SaleItem
belongs_to :item
end
Essentially, each sale_item points to a specific item (but has an associated quantity and sale price which might be different from the item's base price, hence the separate model), and each item is made by a specific vendor.
I'd like to sort all sale_items by vendor name, but this means going through the associated item, because that's where the association is.
My first attempt was to change SaleItem to the following:
class SaleItem
belongs_to :item
has_one :vendor, :through => :item
end
Which allows me to look for SaleItem.first.vendor, but doesn't allow me to do something like:
SaleItem.joins(:vendor).all(:order => "vendors.name")
Is there an easy way to figure out these complex associations and sorting? It would be especially great if there were a plugin that could take care of these sort of things. I have a lot of different types of tables to add sorting to in this application, and I feel like this will be a big chunk of the figuring-out work.
This could definitely be done with a more complex SQL query (possibly using find_by_sql), but you could also do it pretty easily in Ruby. Try something like the following:
SaleItem.find(:all, :include => { :item => :vendor }).sort do |first, second|
first.item.vendor.name <=> second.item.vendor.name
end
I haven't tested it, so it might not work exactly like this, but it should give you a good idea of one possible solution.
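The in-Ruby sort itself can be checked without a database. Here is a small sketch with Struct stand-ins for the three models. The sort_by_vendor_name helper is a hypothetical name, and sort_by is used in place of the explicit comparison block.

```ruby
# Hypothetical plain-Ruby stand-ins for the three models.
Vendor   = Struct.new(:name)
Item     = Struct.new(:vendor)
SaleItem = Struct.new(:item, :quantity)

# Sort sale items by the name of the vendor reached through the item.
def sort_by_vendor_name(sale_items)
  sale_items.sort_by { |sale_item| sale_item.item.vendor.name }
end

acme  = Item.new(Vendor.new("Acme"))
zenit = Item.new(Vendor.new("Zenit"))
sorted = sort_by_vendor_name([SaleItem.new(zenit, 2), SaleItem.new(acme, 5)])
puts sorted.map { |sale_item| sale_item.item.vendor.name }.inspect
```

With real records, the eager load in the find call is what keeps the repeated item.vendor lookups from issuing one query per sale item.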
Edit: Found an old blog post that seems to have solved this issue. Hopefully this still works in the latest version of ActiveRecord.
source: http://matthewman.net/2007/01/04/eager-loading-objects-in-a-rails-has_many-through-association/
Second Edit: Straight from the Rails documentation
To include a deep hierarchy of associations, use a hash:
for post in Post.find(:all, :include => [ :author, { :comments => { :author => :gravatar } } ])
That’ll grab not only all the comments but all their authors and gravatar pictures. You can mix and match symbols, arrays and hashes in any combination to describe the associations you want to load.
There's your explanation.
Do you really need your sale_items sorted by the database, or could you wait until they are presented and do the sorting client-side via JavaScript (there are some great sorting libraries out there)? That would save server CPU and (backend) code complexity.
Not sure this could fall in performance section as well as model/database section, so here goes....
Let's say I have 3 models:
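class Movie < ActiveRecord::Base
has_one :interest, :as => :resource
end
class Song < ActiveRecord::Base
has_one :interest, :as => :resource
end
class Story < ActiveRecord::Base
has_one :interest, :as => :resource
end
and ...
class Interest < ActiveRecord::Base
belongs_to :resource, :polymorphic => true
end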
Now, if I need a list of all interests for all movies, and I also want to show the date those Movie objects were created (to say how old they are), then I use the lookup on the resource_type attribute and then call @some_interest.resource.created_at.
The problem with this is that if I have 100 movie interests, then I will get 101 queries, right? So, linear degradation. I tried using :include => [:resource] in my query call, but it says it cannot use include on polymorphic associations.
How can I either eager load or otherwise optimize this to avoid this severe degradation?
Any help would be greatly appreciated!
If you are using searchlogic, there is a special syntax to deal with polymorphic relationships that may help. You can search on the has side of the relationship by specifying the name of the related class suffixed with type.
E.g. given your models, you ought to be able to do something like this to get movies created in the last 90 days:
Interest.resource_movie_type_created_at_gt(Time.now-90.days)
Searchlogic sets up the join on the related model for you, which should allay performance concerns.
Of course you can always write your own SQL using the find_by_sql method.
PS. one very helpful trick is to turn on logging in the Rails console while writing searches. This allows you to see the SQL generated right in the console without having to dig through the logs.
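For the hand-written route, the general shape of avoiding one query per interest is to group the interests by resource_type and fetch each type's rows in a single batch. The sketch below simulates that in plain Ruby. FakeTable, its query counter, and load_resources are hypothetical illustrations, not ActiveRecord or searchlogic APIs.

```ruby
# Hypothetical in-memory "table" that counts how many lookups hit it.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows # hash of id => row
    @query_count = 0
  end

  # One lookup per call, like loading resource lazily for each interest.
  def find(id)
    @query_count += 1
    @rows[id]
  end

  # One lookup for many ids, like a WHERE id IN (...) query.
  def find_all(ids)
    @query_count += 1
    @rows.values_at(*ids).compact
  end
end

Interest = Struct.new(:resource_type, :resource_id)

# Group interests by resource_type and fetch each type's rows in one batch.
def load_resources(interests, tables)
  interests.group_by(&:resource_type).flat_map do |type, group|
    tables.fetch(type).find_all(group.map(&:resource_id))
  end
end
```

With this grouping, the number of lookups grows with the number of distinct resource types rather than with the number of interests.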