How to work with elasticsearch and associations between index objects? - ruby-on-rails

I am using the Tire gem for Rails and a couple of questions have been raised about model associations. How do you work with them? Let say you have a relation between a Person and a car. Each Person have many Cars. Now if you want to index the car objects too, how do you do that? How can you retrieve a person by searching for car.make for example?
In general, I can see that elasticsearch which is document centric doesn't have the same concepts that RDBMS have. One-to-one, one-to-many and many-to-many.
If you have a many-to-many relationship for instance and you want only objects with a property of the other end of the relationship that will be impossible? Is elastic search a better fit with a NoSQL database like MongoDB?

There are many possible strategies how to model your data in Elasticsearch, including relationships. In Elasticsearch, there are at least three strategies for related data:
Just use JSON and it's ability to fluently express hierarchies,
for the situation where the Car is actually a list of cars, use the nested type,
use parent/child support for situations where you need to index both entities individually.
With Tire, first check and try out in code this answer: Elasticsearch, Tire, and Nested queries / associations with ActiveRecord. It should contain all information needed for your scenario. The code is available also separately.
References:
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
Elasticsearch, Tire, and Nested queries / associations with ActiveRecord
https://gist.github.com/karmi/3200212

Related

Indexing in many to many relationship grails

I have two domains Class A and Class B with many to many relationship. For performance tuning I want to add indexes in a_id,b_id columns of table a_b(table produced by grails). FYI I have added the indexes from query.
It looks like you'll need to create the index within the database itself. If there was an option to index the join table it'd likely be in the joinTable(), but it's not there.
I found this tutorial very helpful with many to many relationship in Grails.
Many-to-Many Mapping without Hibernate XML
This method will give you the separate index unlike joinTable().

Rails Data Modelling

In my company, we are trying to cache some data that we are querying from an API. We are using Rails. Two of my models are 'Query' and 'Response'. I want to create a one-to-many relationship between Query and Response, wherein, one query can have many responses.
I thought this is the right way to do it.
Query = [query]
Response = [query_id, response_detail_1, response_detail_2]
Then, in the Models, I did the following Data Associations:
class Query < ActiveRecord::Base
has_many :response
end
class Response < ActiveRecord::Base
belongs_to :query
end
So, canonically, whenever I want to find all the responses for a given query, I would do -
"_id" = Query.where(:query => "given query").id
Response.where(:query_id => "_id")
But my boss made me use an Array column in the Query model, remove the Data Associations between the models and put the id of each response record in that array column in the Query model. So, now the Query model looks like
Query = [query_id, [response_id_1, response_id_2, response_id_3,...]]
I just want to know what are the merits and demerits of doing it both ways and which is the right way to do it.
If the relationship is really a one-to-many relationship, the "standard" approach is what you originally suggested, or using a junction table. You're losing out on referential integrity that you could get with a FK by using the array. Postgres almost had FK constraints on array columns, but from what I researched it looks like it's not currently in the roadmap:
http://blog.2ndquadrant.com/postgresql-9-3-development-array-element-foreign-keys/
You might get some performance advantages out of the array approach if you consider it like a denormalization/caching assist. See this answer for some info on that, but it still recommends using a junction table:
https://stackoverflow.com/a/17012344/4280232. This answer and the comments also offer some thoughts on the array performance vs the join performance:
https://stackoverflow.com/a/13840557/4280232
Another advantage of using the array is that arrays will preserve order, so if order is important you could get some benefits there:
https://stackoverflow.com/a/2489805/4280232
But even then, you could put the order directly on the responses table (assuming they're unique to each query) or you could put it on a join table.
So, in sum, you might get some performance advantages out of the array foreign keys, and they might help with ordering, but you won't be able to enforce FK constraints on them (as of the time of this writing). Unless there's a special situation going on here, it's probably better to stick with the "FK column on the child table" approach, as that is considerably more common.
Granted, that all applies mainly to SQL databases, which I notice now you didn't specify in your question. If you're using NoSQL there may be other conventions for this.

Can I have a one way HABTM relationship?

Say I have the model Item which has one Foo and many Bars.
Foo and Bar can be used as parameters when searching for Items and so Items can be searched like so:
www.example.com/search?foo=foovalue&bar[]=barvalue1&bar[]=barvalue2
I need to generate a Query object that is able to save these search parameters. I need the following relationships:
Query needs to access one Foo and many Bars.
One Foo can be accessed by many different Queries.
One Bar can be accessed by many different Queries.
Neither Bar nor Foo need to know anything about Query.
I have this relationship set up currently like so:
class Query < ActiveRecord::Base
belongs_to :foo
has_and_belongs_to_many :bars
...
end
Query also has a method which returns a hash like this: { foo: 'foovalue', bars: [ 'barvalue1', 'barvalue2' } which easily allows me to pass these values into a url helper and generate the search query.
This all works fine.
My question is whether this is the best way to set up this relationship. I haven't seen any other examples of one-way HABTM relationships so I think I may be doing something wrong here.
Is this an acceptable use of HABTM?
Functionally yes, but semantically no. Using HABTM in a "one-sided" fashion will achieve exactly what you want. The name HABTM does unfortunately insinuate a reciprocal relationship that isn't always the case. Similarly, belongs_to :foo makes little intuitive sense here.
Don't get caught up in the semantics of HABTM and the other association, instead just consider where your IDs need to sit in order to query the data appropriately and efficiently. Remember, efficiency considerations should above all account for your productivity.
I'll take the liberty to create a more concrete example than your foos and bars... say we have an engine that allows us to query whether certain ducks are present in a given pond, and we want to keep track of these queries.
Possibilities
You have three choices for storing the ducks in your Query records:
Join table
Native array of duck ids
Serialized array of duck ids
You've answered the join table use case yourself, and if it's true that "neither [Duck] nor [Pond] need to know anything about Query", using one-sided associations should cause you no problems. All you need to do is create a ducks_queries table and ActiveRecord will provide the rest. You could even opt to use has_many :through relationship if you need to do anything fancy.
At times arrays are more convenient than using join tables. You could store the data as a serialized integer array and add handlers for accessing the data similar to the following:
class Query
serialize :duck_ids
def ducks
transaction do
Duck.where(id: duck_ids)
end
end
end
If you have native array support in your database, you can do the same from within your DB. similar.
With Postgres' native array support, you could make a query as follows:
SELECT * FROM ducks WHERE id=ANY(
(SELECT duck_ids FROM queries WHERE id=1 LIMIT 1)::int[]
)
You can play with the above example on SQL Fiddle
Trade Offs
Join table:
Pros: Convention over configuration; You get all the Rails goodies (e.g. query.bars, query.bars=, query.bars.where()) out of the box
Cons: You've added complexity to your data layer (i.e. another table, more dense queries); makes little intuitive sense
Native array:
Pros: Semantically nice; you get all the DB's array-related goodies out of the box; potentially more performant
Cons: You'll have to roll your own Ruby/SQL or use an ActiveRecord extension such as postgres_ext; not DB agnostic; goodbye Rails goodies
Serialized array:
Pros: Semantically nice; DB agnostic
Cons: You'll have to roll your own Ruby; you'll loose the ability to make certain queries directly through your DB; serialization is icky; goodbye Rails goodies
At the end of the day, your use case makes all the difference. That aside, I'd say you should stick with your "one-sided" HABTM implementation: you'll lose a lot of Rails-given gifts otherwise.

Dynamically retrieve a nested polymorphic-ally related object?

Is there a way to traverse relationships through their lineage?
In other words, if you have a hierarchy of three or more polymorphic-ally related objects, is there any relative way to retrieve objects that are two degrees away, since I won't know its class?
For example:
Let's say I have images, which is polymorphic to students and teachers. Let's say students and teachers is polymorphic to different objects, like church and university.
How would I easily grab the "church" or "university" object with the images one loaded?
The short answer is that you can't do this. That is, you can't make a has-many-through association across a polymorphic association.
You can, however, make such an association if you limit the scope to a specific class/table. So you can find (and eager load) all teachers' universities for a specific image or image relation, for example. But you can't broaden the query to look at multiple potential tables in the way you seem to be hoping.
Of course, you can always write a query to do this by hand if so inclined...

Rails: Multiple trees for a single item

I want to categorize objects in multiple trees to reflect their characteristics and to build a navigation on.
So, given the following trees:
Category1
-Category-1-1
-Category-1-2
Category2
-Category-2-1
-Category-2-2
--Category-2-2-1
An object could e.g. belong to both Category-1-2 and to Category-2-2-1.
The goal is to be able to fetch all objects from the database
that belong to a certain category
that belong to a certain category or its decendants
A more practical example:
A category might have a hierarchy of 'Tools > Gardening Tools > Cutters'.
A second category: 'Hard objects > Metal objects > Small metal objects'
An object 'Pruners' would be categorized as belonging to 'Cutters' as well as 'Small metal objects'.
I want to be able to
retrieve all 'Gardening Tools' -> 'Pruners'
retrieve all Category children of 'Gardening Tools' -> 'Cutters'
retrieve all 'Hard objects' -> 'Pruners'
retrieve all 'Hard objects' that are also 'Cutters' -> 'Pruners'
retrieve all 'Soft objects' that are also 'Cutters' -> []
Any pointers? I have briefly looked at closure_tree, awesome_nested_sets etc., but I am not sure they are a good match.
Please note that the code here is all pseudo code.
I would use ancestry gem and would model your data with three model classes.
This way your data is normalized and it's a good base to build on.
Category - ancestry tree
has_may Memberships
has_may Products through Memberships
Membership
belongs_to Category
belongs_to Products
Products
has_may Memberships
has_may Categories through Memberships
From there on you need to figure out how to perform the equerries efficiently.
My way of doing this is to understand how to do it with SQL and then figure out how to express the queries with activercord's DSL.
Some resources:
http://railsantipatterns.com/ This book has some examples of complex SQL queries turned into reusable scopes and helpers
http://guides.rubyonrails.org/active_record_querying.html#joining-tables Rails's documentation, see section on joins and includes
http://stackoverflow.com/questions/38549/difference-between-inner-and-outer-join A great explanation of SQL joins
Queries examples:
Find a category.
Category.find(category_id)
Find a category and include it's products inside the specified category.
Category.find(category_id).join(:memberships => :products)
Find a category's sub-tree ind include products
Category.subtree_of(category_id).join(:memberships => :products)
Find all categories a products belongs to.
Product.find(product_id).categories
I just did this and I chose not to use ancestry, but closure_tree because the author says it is faster and I agree with him. Know you need a `has_and_belongs_to_many' between Categories (which I like to call tags whenever I add multiple to a single object) and Objects.
Now the finders, the bad news is that without your own custom query you might not be able to do it with one. Using the gems methods you will do something like:
Item.joins(:tags).where(tags: {id: self_and_descendant_ids })
The code is clean and it executes two queries, one for the descendant_ids and another one in Objects. Slight variations of this, should give you what you need for all except the last. That one is tough and I haven't implemented it (I'm in the process).
For now, you will have to call tag.self_and_ancestor_ids on both (Query count: 2), all items in those tags (Query count: 4) and intersect. After this, some serious refactoring is needed. I think we need to write SQL to reduce the number of queries, I don't think Rails query interface will be enough.
Another reason I chose *closure_tree* was the use of parent_id, all siblings share it (just like any other Rails association) so it made it easier to interface with other gems (for example RankedModel to sort).
I think you could go for one of the tree gems, personally I like Ancestry. Then make an association for each category to have many objects and each object can belong to many categories.
Have you stumbled on any problems already or are you just researching your options?

Resources