RoR: Tagging tags with other tags - ruby-on-rails

I'm trying to prototype a system in Rails. Essentially, it's an abstract relational data model that takes in user input to create nodes of information. Each node can have meta-information associated with it, so some nodes may have CreateDate and DueDate while others may have StartDate, DueDate and PersonResponsible. In this way we're simply collecting lots of notes and attaching the information that a person would want to remember in relation to each note. Easy.
What I want to do to build on that is to make each node act as a tag which can be applied to any other node, building trees that can be browsed down with every node leading you to other nodes that are relationally its children. That way you can start by showing a list of your top level nodes (those not tagged by any others) and as each item is focused on, present a list of that node's children (all the other nodes which are tagged by it).
So my question is, which rails plugins should I look into to do this?

If I understood correctly - the data model you are describing is a graph.
Unfortunately, I haven't found a plugin that implements graphs with all the characteristics you need (the acts_as_graph plugin cannot do it), so you could try programming the model yourself.
You will need 3 tables and 2 active record classes for it (one table is used for the many to many relationship)
Classes
1. Node
has_and_belongs_to_many :nodes (self-referential, through the join table)
2. Metadata
belongs_to :node
Since you need dynamic metadata you can use 2 columns: Name (string), Data (text) but you'll have to serialize data when you put it in the Data field (since you need class information as well as the data so you can use it).
I think this model should be able to hold your data. It's up to you to program the user interface part.
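To make the shape of that model concrete, here is a minimal in-memory sketch in plain Python rather than Active Record. The names Node, NodeGraph, tag, children and top_level are hypothetical, as are the sample notes and dates. The taggings set stands in for the many-to-many join table and the metadata dict for the Metadata table.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A note that can also act as a tag on other nodes."""
    name: str
    # Dynamic metadata, name -> value (stands in for the Metadata table).
    metadata: dict = field(default_factory=dict)


class NodeGraph:
    def __init__(self):
        self.nodes = {}
        # Stands in for the join table: (tag name, tagged node name) pairs.
        self.taggings = set()

    def add(self, name, **metadata):
        self.nodes[name] = Node(name, dict(metadata))
        return self.nodes[name]

    def tag(self, tag_name, tagged_name):
        """Apply the node `tag_name` as a tag on the node `tagged_name`."""
        self.taggings.add((tag_name, tagged_name))

    def children(self, tag_name):
        """Names of the nodes tagged by `tag_name`, sorted."""
        return sorted(t for (g, t) in self.taggings if g == tag_name)

    def top_level(self):
        """Names of the nodes that no other node tags, sorted."""
        tagged = {t for (_, t) in self.taggings}
        return sorted(n for n in self.nodes if n not in tagged)


# Made-up sample data for illustration only.
graph = NodeGraph()
graph.add("Work")
graph.add("Project X", DueDate="2024-06-01")
graph.add("Write report", StartDate="2024-05-01", PersonResponsible="Ann")
graph.tag("Work", "Project X")
graph.tag("Project X", "Write report")
graph.tag("Work", "Write report")  # a node may be tagged by several nodes

print(graph.top_level())        # ['Work']
print(graph.children("Work"))   # ['Project X', 'Write report']
```

Because a node can be tagged by more than one node, the structure is a graph rather than a strict tree, which is why a single parent_id column would not be enough.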

Related

Best Way to Store Contextual Attributes in Core Data?

I am using Core Data to store objects. What is the most efficient possibility for me (i.e. best execution efficiency, least code required, greatest simplicity and greatest compatibility with existing functions/libraries/frameworks) to store different attribute values for each object depending on the context, knowing that the contexts cannot be pre-defined, will be legion and constantly edited by the user?
Example:
An Object is a Person (Potentially =Employer / =Employee)
Each person works for several other persons and has different titles in relation to their work relationships, and their title may change from one year to another (in case this detail matters: each person may also concomitantly employ one or several other persons, which is why a person is an employee but potentially also an employer)
So one attribute of my object would be “Title vs Employer vs Year Ended”
The best I could do with my current knowledge is save all three elements together as a string which would be an attribute value assigned to each object, and constantly parse that string to be able to use it, but this has the following (HUGE) disadvantages:
(1) Unduly Slowed Execution & Increased Energy Use. Using this contextual attribute is at the very heart of my prospective app's core function (so it would literally be used 10-100 times every minute). Having to constantly parse this information to be able to use it adds undue processing that I'd very much like to avoid.
(2) Undue Coding Overhead. Saving this contextual attribute as a string will unduly make additional coding for me necessary each time I’ll use this central information (i.e. very often).
(3) Undue Complexity & Potential Incompatibility. It will also add undue complexity and by departing from the expected practice it will escape the advantages of Core Data.
What would be the most efficient way to achieve my intended purpose without the aforementioned disadvantages?
Taking your example, one option is to create an Employment entity, with attributes for the title and yearEnded and two (to-one) relationships to Person. One relationship represents the employer and the other represents the employee.
The inverse relationships are in both cases to-many. One represents the employments where the Person is the employee (so you might name it employmentsTaken) and the other relationship represents the employments where the Person is the Employer (so you might name it employmentsGiven).
Generalising, this is the solution recommended by Apple for many-many relationships which have attributes (see "Modelling a relationship based on its semantics" in their documentation).
Whether that will address all of the concerns listed in your question, I leave to your experimentation: if things are changing 10-100 times a minute, the overhead of fetch requests and creating/updating/deleting the intermediate (Employment) entity might be worse than your string representation.
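As a rough illustration of the Employment idea, here is a plain-Python sketch, not Core Data code. The class and field names mirror the entity and relationship names suggested above; titles_for and the sample people are hypothetical. Each Employment object carries the attributes of one employer/employee link, so nothing has to be parsed out of a string.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Person:
    name: str
    # Inverse to-many relationships, filled in by Employment below.
    employments_taken: list = field(default_factory=list)  # this person is the employee
    employments_given: list = field(default_factory=list)  # this person is the employer


@dataclass(eq=False)
class Employment:
    employer: Person
    employee: Person
    title: str
    year_ended: Optional[int] = None  # None is used here to mean "still ongoing"

    def __post_init__(self):
        # Keep both inverse relationships in step with the to-one sides.
        self.employer.employments_given.append(self)
        self.employee.employments_taken.append(self)


def titles_for(employee, employer):
    """All (title, year_ended) pairs the employee has held under one employer."""
    return [
        (e.title, e.year_ended)
        for e in employee.employments_taken
        if e.employer is employer
    ]


# Made-up sample data for illustration only.
alice, bob, carol = Person("Alice"), Person("Bob"), Person("Carol")
Employment(employer=bob, employee=alice, title="Analyst", year_ended=2019)
Employment(employer=bob, employee=alice, title="Manager")
Employment(employer=alice, employee=carol, title="Assistant")

print(titles_for(alice, bob))  # [('Analyst', 2019), ('Manager', None)]
```

Note that Alice is an employee of Bob and an employer of Carol at the same time, which is the situation described in the question.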

Graph Database Data Model of One Type of Object

Say I'm a mechanic who's worked on many different cars and would like to keep a database of the cars I've worked on. These cars have different manufacturers, models, and some customers have modified versions of these cars with different parts so it's not guaranteed the same model gives you the same car. In addition, I would like to see all these different cars and their similarities/differences easily. Basically the database needs to both represent the logical similarities/differences between all cars that I encounter while still giving me the ability to push/pull each instance of a car I've encountered.
Is this more set up for a relational or graph database?
If a graph database, how would you go about designing it? Each of the relationship labels would just be a 'has_a' or 'is_a_type_of'. Would you have the logical structure amongst all the cars and for each individual car have them point to the leaf nodes? Or would you have each relationship represent each specific car and have those relationships span the logical tree structure of the cars?
Alright, so a "graphy" way to go about this would be to create a node type for each kind of domain object. You have a Car identified by a VIN, and it can be linked to a Make, Model, and Year. You also have Mechanic nodes that [:works_on] various Car nodes. Don't store make/model/year with the Car, but rather link them via relationships, e.g.:
CREATE (c:Car { VIN: "ABC"})-[:make]->(m:Make {label:"Toyota"});
...and so on.
"Each of the relationship labels would just be a 'has_a' or 'is_a_type_of'."
Probably not; I'd create different relationship types unique to pairings of node types. So Mechanic -> Car would be :works_on, Car -> Model would be :model, and so on. I don't recommend using the same relationship type like has_a everywhere, because from a modeling perspective it's harder to sort out the valid domains and ranges of those relationships (e.g. you'll end up in a situation where has_a can go from just about anything to just about anything, and picking out which has_a relationships you want will be hard).
"Or would you have each relationship represent each specific car and have those relationships span the logical tree structure of the cars?"
Each car is its own node, identified by something like a VIN, not by a make/model/year. (Splitting out make/model/year later will allow you to very easily query for all Volvos, etc).
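The point about distinct relationship types can be sketched with a tiny in-memory graph in Python. This is not Neo4j or Cypher; Graph, ALLOWED and the sample VINs are hypothetical. Giving each relationship type a fixed (source label, target label) pairing makes invalid links easy to reject and makes "all Volvos" a lookup on one relationship type.

```python
from collections import defaultdict

# Each relationship type is valid only between one pair of node labels.
ALLOWED = {
    ("Mechanic", "works_on", "Car"),
    ("Car", "make", "Make"),
}


class Graph:
    def __init__(self):
        self.labels = {}                   # node id -> label
        self.outgoing = defaultdict(list)  # (source id, type) -> target ids
        self.incoming = defaultdict(list)  # (target id, type) -> source ids

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def relate(self, source, rel_type, target):
        triple = (self.labels[source], rel_type, self.labels[target])
        if triple not in ALLOWED:
            raise ValueError(f"relationship not allowed: {triple}")
        self.outgoing[(source, rel_type)].append(target)
        self.incoming[(target, rel_type)].append(source)

    def targets(self, source, rel_type):
        return sorted(self.outgoing.get((source, rel_type), []))

    def sources(self, target, rel_type):
        return sorted(self.incoming.get((target, rel_type), []))


# Made-up sample data; node ids are plain strings for brevity.
g = Graph()
for vin in ("ABC", "DEF", "GHI"):
    g.add_node(vin, "Car")
g.add_node("Toyota", "Make")
g.add_node("Volvo", "Make")
g.add_node("Sam", "Mechanic")

g.relate("ABC", "make", "Toyota")
g.relate("DEF", "make", "Volvo")
g.relate("GHI", "make", "Volvo")
g.relate("Sam", "works_on", "ABC")
g.relate("Sam", "works_on", "GHI")

print(g.sources("Volvo", "make"))     # ['DEF', 'GHI']
print(g.targets("Sam", "works_on"))   # ['ABC', 'GHI']
```

With a single has_a type, the ALLOWED set would have to admit almost any pair of labels, and every query would need extra label checks to pick out the links it wants.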
Your last question (and the toughest one):
"Is this more set up for a relational or graph database?"
This question attracts opinionated answers, so let me put it to you this way: any data under the sun can be modeled both relationally and as a graph. So I could answer both "yes, relational" and "yes, graph". Your data and your domain don't determine whether you should use an RDBMS or a graph database; your queries and access patterns do. If you know how you need to use your data, which queries you'll run, and what you're trying to do, then with that information in hand you can do your own analysis and determine which one is better. Both have strengths and weaknesses and many points of tradeoff. Without knowing how you'll access the data, it's impossible to answer this question in a really fair way.

How to implement an EAV model in Neo4j?

The Entity-Attribute-Value (EAV) model is really powerful, but complex to implement using SQL, so people often look for alternatives to EAV. It seems like the perfect candidate for graph databases. I understand how to build a movie database where you have nodes with the Neo4j label "Movie" with the property "release_date" right on the node. How would you make this more generic, such that movies have the Neo4j label "Entity" following the general EAV model?
I've thought a lot about this, but I'm not confident I have a good solution. I'll take a stab at it anyway. Here's the most basic model:
<node> <relationship> <node>
Attribute --> :VALUE --> Entity
name="Label",type="string" --> value="Movie" --> name="The Matrix"
With this model, you can write code for how to display and edit Attribute.type. For example, maybe all labels have a text field with finite options on the front-end and all dates have a date-picker. You could break Attribute.type out into its own node, Type, if that was preferable (particularly would make sense for handling composite types). In that case, you have the relationship TYPE between Attribute and Type nodes.
This becomes a problem if entities have multiple relationships, as is the case for reviews, or if you want to relate the value to something else, such as the user who assigned the value. Now, I think, the relationship "VALUE" has to be its own node of type "Value" (i.e. it has the Neo4j label "Value") with an incoming relationship from both Attribute and User nodes.
The full form has Type nodes, Attribute nodes, User nodes, Value nodes, and Entity nodes, where the relationships have basically no properties on them.
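A small in-memory Python sketch of that full form may help to see how the pieces connect. It does not use Neo4j. The relationship name TYPE comes from the description above, while HAS_VALUE, ASSIGNED and OF are assumed names for the links around a Value node, and the Rating attribute and user names are made up.

```python
class Node:
    def __init__(self, label, **props):
        self.label = label
        self.props = props


edges = []  # (source node, relationship type, target node); no edge properties


def relate(source, rel_type, target):
    edges.append((source, rel_type, target))


def incoming(target, rel_type):
    return [s for (s, r, t) in edges if t is target and r == rel_type]


def outgoing(source, rel_type):
    return [t for (s, r, t) in edges if s is source and r == rel_type]


def describe(entity):
    """Rows of (attribute, type, value, assigning users) for one entity."""
    rows = []
    for value in incoming(entity, "OF"):
        attribute = incoming(value, "HAS_VALUE")[0]
        attr_type = outgoing(attribute, "TYPE")[0]
        users = [u.props["name"] for u in incoming(value, "ASSIGNED")]
        rows.append((attribute.props["name"], attr_type.props["name"],
                     value.props["value"], users))
    return rows


string_type = Node("Type", name="string")
integer_type = Node("Type", name="integer")
label_attr = Node("Attribute", name="Label")
rating_attr = Node("Attribute", name="Rating")
relate(label_attr, "TYPE", string_type)
relate(rating_attr, "TYPE", integer_type)

matrix = Node("Entity", name="The Matrix")
alice = Node("User", name="alice")
bob = Node("User", name="bob")

movie_value = Node("Value", value="Movie")
relate(label_attr, "HAS_VALUE", movie_value)
relate(alice, "ASSIGNED", movie_value)
relate(movie_value, "OF", matrix)

rating_value = Node("Value", value=5)
relate(rating_attr, "HAS_VALUE", rating_value)
relate(bob, "ASSIGNED", rating_value)
relate(rating_value, "OF", matrix)

for row in describe(matrix):
    print(row)
```

Turning the value into a node is what allows a second incoming link from the User, which a plain relationship between Attribute and Entity could not carry.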
Why do you need it in the first place?
I always thought that EAV was just a workaround for relational databases not being schema free.
Neo4j, like other NoSQL databases, is schema-free, so you can just add the attributes that you want to both nodes and relationships.
If you need to you can also record the EAV model in a meta-schema within the graph but in most cases it is good enough if the meta-schema lives within the application that creates and uses your attributes.
Usually I treat labels as roles which, in a certain context, provide certain properties and relationships. A node can have many labels, each representing one of those roles.
E.g. for the same node
:Person(name)-[:LIVES_IN]->(:City)
:Employee(empNo)-[:WORKS_AT]->(:Company)
:Developer()-[:HAS_SKILL]->(:CompSkill)
...
So in your case :Entity would just be a label that implies the name property.
And :Movie is a label that implies a release_date property and e.g. ACTED_IN relationships.
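One way to picture a meta-schema that lives in the application is a simple mapping from labels to the properties they imply. The sketch below is plain Python and does not touch Neo4j; LABEL_PROPERTIES and missing_properties are hypothetical names, and the mapping only restates the label/property pairs mentioned above.

```python
# Application-side meta-schema: which properties each label (role) implies.
LABEL_PROPERTIES = {
    "Entity": {"name"},
    "Movie": {"release_date"},
    "Person": {"name"},
    "Employee": {"empNo"},
}


def missing_properties(labels, props):
    """Property names implied by the node's labels but absent from props."""
    required = set()
    for label in labels:
        required |= LABEL_PROPERTIES.get(label, set())
    return sorted(required - set(props))


# A node carrying two roles at once: a generic Entity and a Movie.
labels = ["Entity", "Movie"]
print(missing_properties(labels, {"name": "The Matrix"}))
# ['release_date']
print(missing_properties(labels, {"name": "The Matrix", "release_date": 1999}))
# []
```

The database itself stays schema-free; only the application decides what a label is expected to carry.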

How to do a join in Elasticsearch -- or at the Lucene level

What's the best way to do the equivalent of an SQL join in Elasticsearch?
I have an SQL setup with two large tables: Persons and Items.
A Person can own many items.
Both Person and Item rows can change (i.e. be updated).
I have to run searches which filter by aspects of both the person and the item.
In Elasticsearch, it looks like you could make Person a nested document of Item, then use has_child.
But: if you then update a Person, I think you'd need to update every Item they own (which could be a lot).
Is that correct?
Is there a nice way to solve this query in Elasticsearch?
As already mentioned, the way to go is parent/child. The point is that nested documents are extremely performant, but in order for them to be updated you need to re-submit the whole structure (parent + nested documents). Although the internal implementation of nested documents consists of separate Lucene documents, those nested docs are neither visible nor directly accessible. In fact, when using nested documents you need to use the proper queries to access them (nested query, nested filter, nested facet etc.).
On the other hand parent/child allows you to have separate documents that refer to each other, which can be updated independently. It has a cost in terms of performance and memory used but it is way more flexible than nested documents.
As mentioned in this article though, the fact that Elasticsearch helps you manage relations doesn't mean that you must use those features. In a lot of complex use cases it is just better to have some custom logic on the application layer that handles relations. In fact there are limitations with parent/child too: for instance, you can never get back both parent and children at the same time, as opposed to nested documents, which don't allow you to get back only the matching children (for now).
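The update trade-off can be illustrated with plain Python dictionaries standing in for indexed documents. This is only a model of the two layouts, not the Elasticsearch API; the function names, ids and field names are made up. In the embedded layout a change to a person touches every item they own, while in the linked layout it touches one document and the join happens at search time.

```python
# Layout A: each item document embeds a copy of its owner (denormalized).
embedded_items = {
    1: {"title": "red bike", "owner": {"id": 10, "city": "Oslo"}},
    2: {"title": "blue bike", "owner": {"id": 10, "city": "Oslo"}},
    3: {"title": "red car", "owner": {"id": 11, "city": "Bergen"}},
}

# Layout B: persons and items are separate documents linked by a parent id.
persons = {10: {"city": "Oslo"}, 11: {"city": "Bergen"}}
linked_items = {
    1: {"title": "red bike", "parent": 10},
    2: {"title": "blue bike", "parent": 10},
    3: {"title": "red car", "parent": 11},
}


def update_owner_embedded(items, owner_id, changes):
    """Rewrite every item that embeds this owner; returns documents written."""
    writes = 0
    for doc in items.values():
        if doc["owner"]["id"] == owner_id:
            doc["owner"].update(changes)
            writes += 1
    return writes


def update_owner_linked(people, owner_id, changes):
    """Rewrite only the person document; returns documents written."""
    people[owner_id].update(changes)
    return 1


def search_linked(people, items, city, word):
    """Items whose title contains `word` and whose owner lives in `city`."""
    return sorted(
        item_id
        for item_id, doc in items.items()
        if word in doc["title"] and people[doc["parent"]]["city"] == city
    )


print(update_owner_embedded(embedded_items, 10, {"city": "Tromso"}))  # 2
print(update_owner_linked(persons, 10, {"city": "Tromso"}))           # 1
print(search_linked(persons, linked_items, "Tromso", "bike"))         # [1, 2]
```

The linked layout pays for its cheap updates with the lookup inside search_linked, which mirrors the performance and memory cost mentioned above.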
Take a look at my answer for: In Elasticsearch, can multiple top-level documents share a single nested document?
This discusses the use of _parent mapping as a way to avoid the issue with needing to update every Item when a Person is updated.

How to store many item flags in core data

I am trying to do the following in my iPad app. I have a structure that allows people to create grouped lists which we call "Templates". The top level is a CoreOffer (has Title), which can have many groups (has grouptitle/displayorder), each of which can have many items (has ItemTitle, DisplayOrder), as shown below. This works great; I can create Templates perfectly.
Image link
http://img405.imageshack.us/img405/9145/screenshot20110610at132.png
But once Templates are created, people can then use them to map against the Template, which I will call an Evaluation. A Template can be used many times. The Evaluation will contain a date (system generated) and a record of which items from this particular Template have been selected.
Example below, people will be able to check particular rows in the screen below, this is then an Evaluation.
Image link
http://img41.imageshack.us/img41/8049/screenshot20110610at133.png
I am struggling to figure out how to create and store this information in the Core Data model without duplicating the Template (struggling because I come from a SQL background!). In SQL this would involve something like an evaluation table recording each itemid and its selection status.
I expect it's quite simple, but I just can't get my head around it!
Thanks
The first thing you want to do is clean up the naming in your data model. Remember, you are dealing with unique objects here and not the names of tables, columns, rows, joins etc in SQL. So, you don't need to prefix everything with "Core" (unless you have multiple kinds of Offer, Group and Item entities.)
Names of entities start with uppercase letters, names of attributes and relationships with lower case. All entity names are singular because the modeling of the entity does not depend on how many instances of the entity there will be or what kind of relationships it will have. To-one relationship names should be singular and to-many plural. These conventions make the code easy to read and convey information about the data model without having to see the actual graphic.
So, we could clean up your existing model like:
Offer{
id:string
title:string
groups<-->>Group.offer
}
Group{
title:string
displayOrder:number
offer<<-->Offer.groups
items<-->>Item.group
}
Item{
title:string
displayOrder:number
isSelected:Bool
group<<-->Group.items
}
Now if you read a keypath in code that goes AnOfferObj.groups.items you can instantly tell you are traversing two to-many relationships without knowing anything else about the data model.
I am unclear exactly what you want your "Evaluations" to "copy". You appear to want them either to "copy" the entire graph of an Offer or to "copy" a set of Item objects.
In either case, the solution is to create an Evaluation entity that can form a relationship with either Offer or Item.
In the first case it would look like:
Evaluation{
title:string
offer<<-->Offer.evaluations
}
Offer{
id:string
title:string
groups<-->>Group.offer
evaluations<-->>Evaluation.offer
}
... and in the second case:
Evaluation{
title:string
items<<-->>Item.evaluations
}
Item{
title:string
displayOrder:number
isSelected:Bool
group<<-->Group.items
evaluations<<-->>Evaluation.items
}
Note that in neither case are you duplicating or copying anything. You are just creating a reference to an existing group of objects. In the first case, you would find all the related Item objects for a particular Evaluation object by walking a keypath of offer.groups.items. In the second case, you would walk just the keypath of the items relationship of the Evaluation object with items.
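The "reference, don't copy" idea in the second case can be sketched with plain Python objects. This is not Core Data code; the classes mirror the entity names above, and select, all_items and the sample titles are hypothetical. Two evaluations point at the same Item objects, and the template itself is never duplicated.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Item:
    title: str
    display_order: int
    evaluations: list = field(default_factory=list)  # inverse of Evaluation.items


@dataclass(eq=False)
class Group:
    title: str
    display_order: int
    items: list = field(default_factory=list)


@dataclass(eq=False)
class Offer:
    title: str
    groups: list = field(default_factory=list)


@dataclass(eq=False)
class Evaluation:
    title: str
    items: list = field(default_factory=list)

    def select(self, item):
        """Reference an existing Item and keep the inverse side in step."""
        self.items.append(item)
        item.evaluations.append(self)


def all_items(offer):
    """Plain-Python counterpart of walking the keypath offer.groups.items."""
    return [item for group in offer.groups for item in group.items]


# Made-up template for illustration only.
brakes = Item("Check brakes", 1)
lights = Item("Check lights", 2)
tyres = Item("Check tyres", 1)
template = Offer("Safety check", [
    Group("Front", 1, [brakes, lights]),
    Group("Rear", 2, [tyres]),
])

monday = Evaluation("Monday")
monday.select(brakes)
monday.select(tyres)
friday = Evaluation("Friday")
friday.select(brakes)

print([i.title for i in monday.items])       # ['Check brakes', 'Check tyres']
print([e.title for e in brakes.evaluations]) # ['Monday', 'Friday']
print(len(all_items(template)))              # 3
```

Both evaluations hold the very same brakes object, so the template still has exactly three items no matter how many evaluations are recorded.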
Note that how you ultimately display all this in the UI is independent of the data model. Once you have the objects in hand, you can sort or otherwise arrange them as you need to, based on the needs of the view currently in use.
Some parting advice: Core Data is not SQL. Entities are not tables. Objects are not rows. Attributes are not columns. Relationships are not joins. Core Data is an object graph management system that may or may not persist the object graph and may or may not use SQL far behind the scenes to do so. Trying to think of Core Data in SQL terms will cause you to completely misunderstand Core Data and result in much grief and wasted time.
Basically, forget everything you know about SQL. That knowledge won't help you understand Core Data and will actively impede your understanding of it.
